
SynGatorTron™ to speed medical research, alleviate privacy worries

“Dr. Chatbot will see you now.”

The next generation of super-smart computers, tablets and cell phones may come equipped with artificial intelligence-generated medical chatbots that can interact with patients using human language and medical knowledge.

According to Yonghui Wu, Ph.D., director of natural language processing at the University of Florida Clinical and Translational Science Institute, the medical chatbot you interact with online will be able to use conversational language to communicate with and educate patients in much the same way we now interact with Apple’s Siri and Amazon’s Alexa.

The chatbot may also be culturally sensitive and matched to your age.

“It will be like having your own personal medical avatar,” Wu said.

Medical chatbots are just one of many possible applications of groundbreaking new AI tools developed by Wu and other researchers at UF and NVIDIA as part of a $100 million public-private artificial intelligence collaboration formed in 2020. Last year, they launched a clinical language AI model, GatorTron™. This AI tool enables computers to quickly access, read and interpret medical language in clinical notes and other unstructured narratives stored in real-world electronic health records. The model was trained on HiPerGator-AI, the university’s system, which ranks among the .

The GatorTron™ model is expected to accelerate research and medical decision-making by extracting information and insights from massive amounts of clinical data with unprecedented speed and clarity. It will also lead to innovative AI tools and advanced, data-driven health research methods that were unimaginable even 10 or 15 years ago.

This year, the team is rolling out another model, SynGatorTron™, with different capabilities. SynGatorTron™ can generate synthetic patient data that cannot be traced to real patients. This synthetic data can then be used to train the next generation of medical AI systems to understand conversational language and medical terminology.

Most data-driven health research and health-related AI applications today rely on ‘de-identified’ patient data in electronic health records, from which patients’ private information, such as name, address and birthdate, has been removed before the data are used for research and development.

Removing identifying information from patient data by hand is time-consuming and labor-intensive. Automated de-identification systems can produce machine de-identified data at large scale, but they are not an ironclad solution.

According to Wu, even after all identifying patient information has been removed, there’s still a remote chance that someone could identify a patient by tracking data over time.

“Generating synthetic patient data is a safe way to preserve the knowledge of medical language but mitigate the risks of patient privacy,” Wu said.

Patient privacy isn’t the only barrier to training the next generation of AI models for research and other applications. The sheer volume of data required to train AI models can also stand in the way.

“There’s a finite amount of patient data available to us, and training AI computer models requires a tremendous amount of data,” said Duane Mitchell, M.D., Ph.D., director of the UF Clinical and Translational Science Institute and associate dean for clinical and translational sciences at the UF College of Medicine. “With SynGatorTron™, we can generate all the data we need.”

Another advantage SynGatorTron™ has over its competitors, Mitchell said, is that because real-world patient data is used as the model for generating synthetic patient data, the synthetic data has “real human characteristics.”

“The synthetic patient data generated by SynGatorTron™ reflects the complexity and diversity of the human population,” he said. “This diversity in the synthetic data is crucial because AI is only as good as the data it is trained on.”

Low-quality data used in training algorithms has already been found to introduce or reinforce bias in a few high-profile applications, including gender bias in Google Translate and racial bias in Amazon’s Rekognition facial recognition technology.

The data produced by SynGatorTron™ could be used to address the underrepresentation of minority groups and other potential sources of bias, Mitchell said.

Having the ability to generate high-quality synthetic patient data that can be used to develop new AI applications opens up a new world of possibilities.

“We haven’t even begun to think of all the downstream uses that will spring from this,” Mitchell said.

One thing is certain: “There is a lot of interest in the race for AI applications to generate relevant and accurate synthetic patient data,” he said. “With the development and launch of SynGatorTron™, UF and NVIDIA will certainly be positioned at the forefront of these efforts within the field.”

Media contact: Ken Garcia, 352-265-9408

Media contact

Matt Walker
Media Relations Coordinator
(352) 265-8395