UF Health


UF College of Medicine research shows AI chatbot flawed when giving urology advice


A popular artificial intelligence chatbot fell short of the standard expected of physicians when tested on common medical questions often posed by patients visiting a urology practice, a new study shows.

Its authors believe the research is the first of its kind in the specialty of urology.

The research on the popular ChatGPT chatbot highlights the risk of asking AI engines for medical information even as they grow in accuracy and conversational ability. While this and other chatbots warn users that the programs are a work in progress, physicians believe some people will undoubtedly still rely on them.

One of the more dangerous characteristics of chatbots is that they can answer a patient’s inquiry with all the confidence of a veteran physician, even when completely wrong, the study said.

“I am not discouraging people from using chatbots,” said Terry, M.D., an assistant professor in the UF College of Medicine’s department of urology and the study’s senior author. “But don’t treat what you see as the final answer. Chatbots are not a substitute for a doctor.”

The UF College of Medicine is part of UF Health, the university’s academic health center. The study was published last month in the journal Urology.

The researchers generated 13 questions on common urologic topics often posed by patients. Each question was asked three times, since ChatGPT can formulate different answers to identical queries.

The researchers evaluated the answers based on guidelines produced by the three leading professional groups for urologists in the United States, Canada and Europe, including the American Urological Association.

Five UF Health urologists independently assessed the appropriateness of the chatbot’s answers using standardized methods.

Questions included topics such as vasectomies, overactive bladder, infertility, kidney stones, trauma and recurrent urinary tract infections, or UTIs, in women.

In the 39 responses evaluated overall, the chatbot provided appropriate responses 60% of the time. But otherwise, the study said, “it misinterprets clinical care guidelines, dismisses important contextual information, conceals its sources and provides inappropriate references.”

ChatGPT does not provide its sources of information by default. But when the researchers asked it to provide them, it was almost uniformly unable to do so.

“It provided sources that were either completely made up or completely irrelevant,” Terry said. “Transparency is important so patients can assess what they’re being told.”

In only one of the evaluated responses did the AI note it “cannot give medical advice,” the study said. The chatbot recommended consulting with a doctor or medical adviser in only 62% of its responses.

At times, the chatbot omitted key details or misinterpreted their significance, as when it failed to recognize the importance of pain from scar tissue in Peyronie’s disease. As a result, the paper said, the AI provided an improper treatment recommendation.

The urologists made their queries of ChatGPT in February, and since the chatbot is continually updated, its performance today might differ from that seen in the study, Terry said.

ChatGPT’s developers tell users the chatbot can provide bad information and warn them after logging in that ChatGPT “is not intended to give advice.”

The chatbot, Terry said, performed well on some topics, such as hypogonadism, infertility and overactive bladder. On others, like recurrent UTIs in women, it got very little right.

“It’s always a good thing when patients take ownership of their health care and do research to get information on their own,” Terry said. “And that’s great. But just as when you use Google, don’t accept anything at face value without checking with your health care provider.”


About the author

Bill Levesque
Science Writer

For the media

Media contact

Matt Walker
Media Relations Coordinator
mwal0013@shands.ufl.edu (352) 265-8395