Can Artificial Intelligence Save Real Time?
Large language models could be a valuable tool for streamlining documentation and supporting patient-centered outpatient visits.

Artificial intelligence has the potential to streamline one of the most burdensome parts of health care: paperwork and documentation. But questions remain as to its reliability, accuracy, and efficacy.
Today’s abstract, “Leveraging artificial intelligence to streamline documentation and support patient-centered gynecologic oncology outpatient visits,” sought to answer some of those questions by evaluating large language models (LLMs) such as ChatGPT in terms of provider-perceived accuracy, completeness, and clinical utility.
To put the LLM through its paces, the research team — led by Katelyn Tondo-Steele, DO, a gynecologic oncologist at the Huntsman Cancer Institute in Salt Lake City, along with Madison Kieffer, MD, an OB-GYN resident at the University of Utah; Jarom Morris, BS, a medical student at the University of Utah School of Medicine; and Britton Trabert, MS, MSPH, PhD, an epidemiologist at the Huntsman Cancer Institute — developed six clinical scenarios representing a spectrum of outpatient gynecologic oncology visits, including postoperative follow-up and primary or recurrent treatment planning across disease sites.
Dr. Kieffer said one specific scenario, for example, asked the LLM to act as a board-certified gynecologic oncologist charting in the Epic electronic health record (EHR) system. The patient is 32 years old and received a diagnosis of a 5 cm cervical mass with biopsy-proven squamous cell carcinoma of the cervix. The LLM was asked to write an assessment and plan for this patient, including education, the next steps in management, and treatment, with the plan fully aligned with National Comprehensive Cancer Network (NCCN) guidelines and typical gynecologic oncology practice.
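The study’s verbatim prompt is not reproduced in this article, but based on that description, the scenario prompt might read roughly as follows (a hypothetical paraphrase, not the team’s exact wording):

```python
# Hypothetical paraphrase of the scenario prompt described above; the
# study's exact wording is not published in this article.
scenario_prompt = (
    "Act as a board-certified gynecologic oncologist charting in the Epic "
    "electronic health record (EHR) system. The patient is 32 years old "
    "with a 5 cm cervical mass and biopsy-proven squamous cell carcinoma "
    "of the cervix. Write an assessment and plan for this patient, "
    "including education, next steps in management, and treatment. The "
    "plan must be fully aligned with National Comprehensive Cancer Network "
    "(NCCN) guidelines and typical gynecologic oncology practice."
)
```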
Outputs from scenarios like the one above were independently reviewed by gynecologic, radiation, and medical oncology faculty at a National Cancer Institute-designated tertiary referral center. Reviewers were blinded to each other’s evaluations; accuracy was rated on a 5-point Likert scale and completeness on a 3-point Likert scale.
On the 5-point accuracy scale, the median rating was 4.0, with 91.7% of ratings “mostly correct” or higher. On the 3-point completeness scale, the median was 2.0, with 91.7% of ratings “adequate” or “comprehensive.” Across all scenarios, 86% of respondents indicated a willingness to use AI-generated plans in clinical practice.
“Our findings suggest that large language models like ChatGPT can produce treatment plans that multidisciplinary gynecologic oncology teams consider highly accurate and complete,” said Dr. Kieffer. “This opens the door for gynecologic oncology providers to consider AI as a practical tool for assisting with clinical documentation, which is one of the most time-consuming parts of outpatient practice.”
Analysis of the prompt-development process and reviewer feedback identified key elements for generating reliable outputs, including role specification and guideline parameters. In designing the prompts, Dr. Kieffer said, the team discovered what makes an optimal prompt, one that minimizes the common pitfalls of AI-generated answers.
“When you’re working with an LLM, how you ask the question matters enormously to your outputs,” she said. “First, by asking AI to act in a certain role (in our case, a gynecologic oncologist), we obtained outputs that were more applicable to our setting as opposed to documentation more appropriate for a visit with a general oncologist.”
Giving the LLM specific limitations and guidelines was crucial to keep extraneous information and “AI hallucinations” (false or misleading information generated by the model) out of the generated documentation. In this case, the team used the NCCN guidelines and specified that the LLM should use only the clinical information given in the prompt, Dr. Kieffer said.
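Taken together, these principles suggest a prompt structure like the sketch below: a minimal illustration assuming the OpenAI Python SDK, since the study reports using ChatGPT but does not specify the API, model, or settings used. The role specification goes in the system message, and the guideline and information constraints are stated explicitly in the request.

```python
# Minimal sketch of the prompting pattern described above, assuming the
# OpenAI Python SDK (pip install openai). The study reports using ChatGPT
# but does not specify the API, model, or exact prompt wording used.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

response = client.chat.completions.create(
    model="gpt-4o",  # hypothetical model choice; not stated in the study
    messages=[
        # Role specification: steer output toward the subspecialty setting.
        {
            "role": "system",
            "content": (
                "You are a board-certified gynecologic oncologist charting "
                "in the Epic electronic health record (EHR) system."
            ),
        },
        # Guideline parameters and limitations: constrain the model to NCCN
        # guidelines and to the supplied clinical information, which the
        # team found reduced extraneous or hallucinated content.
        {
            "role": "user",
            "content": (
                "Write an assessment and plan for the patient below, "
                "including education, next steps in management, and "
                "treatment. Align the plan fully with NCCN guidelines and "
                "typical gynecologic oncology practice. Use only the "
                "clinical information given in this prompt.\n\n"
                "Patient: 32 years old, 5 cm cervical mass, biopsy-proven "
                "squamous cell carcinoma of the cervix."
            ),
        },
    ],
)
print(response.choices[0].message.content)
```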
“These prompt engineering principles aren’t unique to our study,” she said. “They could be applied by any clinician or researcher looking to use AI tools more effectively, and we think they’re an important practical contribution to this work alongside the clinical findings.”
Even though the documentation would still require proofreading, AI-generated notes have the potential to reduce the time physicians spend charting, according to Dr. Kieffer.
“Although not directly captured by our results, if we can reduce the charting burden on physicians, that time can be redirected toward what matters most: counseling patients, engaging in shared decision-making, and providing individualized care,” she said.
Although AI has shown promise in areas such as early cancer detection, histopathologic classification, risk stratification, and outcome prediction, Dr. Kieffer said that, to her knowledge, there are no published studies specifically evaluating the clinical efficacy of LLM-generated treatment plans and the associated EHR documentation within gynecologic oncology.
“Our study was conducted at a single institution, so an important next step is understanding whether these findings hold across different practice settings and documentation preferences,” she said. “It’s also critical to directly compare AI-generated outputs to physician-authored documentation. Beyond that, there are also practical implementation questions that need to be studied: How do we ensure HIPAA compliance? How could something like this integrate into existing EHR workflows?”
View abstract presentation slides and session recordings on the event platform. Recordings will be available within 24 hours and accessible for 60 days.