CLEVELAND—As efficient and convenient as it may be, incorporating artificial intelligence and natural language models in medical writing is a process also fraught with potential pitfalls and setbacks. In a recent presentation, Abe Fingerhut, MD, discussed the finer points of AI-assisted medical writing and how surgeons might use the technology in accurate, responsible ways.
“Artificial intelligence is being used increasingly for assistance in medical writing,” began Dr. Fingerhut, the GuangCi Laureate Professor of General Surgery at Shanghai Jiao Tong University School of Medicine. “It is driven by advances in natural language processing and machine learning technologies with the creation of algorithms,” he said, speaking at the 2024 annual meeting of the Society of American Gastrointestinal and Endoscopic Surgeons.
Of the many current AI systems, none is as popular as ChatGPT, developed by the California-based company OpenAI. Trained on a database of conversational text comprising more than 300 billion words, ChatGPT generates responses to user prompts in a conversational context. As Dr. Fingerhut noted, it is also the most widely used AI program in medical writing.
Perhaps not surprisingly, ChatGPT’s possible uses in medical writing are numerous, and begin with generating reports and summaries of medical research papers and clinical trials. Other uses that Dr. Fingerhut noted include:
- creating patient-specific medical information like discharge summaries and patient education materials (the public-facing version of ChatGPT is not HIPAA-compliant; data input into it can subsequently be used to train the model and therefore generated into someone else’s query);
- assisting in writing medical textbooks and guidelines;
- generating product labels and package inserts for medical devices and drugs;
- creating a chatbot or virtual assistant capable of answering medical-related questions; and
- assisting in protocolized letter writing, such as preauthorization letters to insurance companies, work excuses or letters of recommendation.
But can it write your research paper? Here, Dr. Fingerhut urged caution. Despite the myriad benefits that AI language processing offers, its use carries distinct disadvantages, including an inability to exhibit genuine creativity or to ensure accuracy without quality control. Furthermore, AI cannot evaluate, praise or criticize on its own.
“It cannot interpret data in a diversified clinical way when things change,” Dr. Fingerhut noted. “It cannot make an on-the-spot decision and change its sequence since the algorithms are all predetermined. So, it cannot be used theoretically for a discussion or a peer review in the sense of medical writing.”
At the same time, many controversies surround the use of AI-assisted medical writing, not least ethical issues and legal liability. Furthermore, authorship in medical writing is strictly defined, and an AI program cannot be held accountable for all aspects of what has been written in a paper. Yet perhaps the most important drawback of using AI in medical writing is that the technology regularly makes errors.
Sometimes called “hallucinations,” these errors often stem from the fact that the program’s source material, often Wikipedia and Reddit, is not peer-reviewed, or is drawn from an article’s abstract rather than its full text.
“There are also lay texts that have been introduced into this database,” Dr. Fingerhut said. “Meanwhile, we are interested in scientific integrity.” AI programs have been known to generate entirely new text in convincing language, producing irrelevant, nonsensical or factually incorrect answers.
Another common error with AI-based medical writing is fabricated and erroneous bibliographic citations, which affected as many as 94% of citations in one study (Sci Rep 2023;13[1]:14045).
“This is not only annoying, but as the program will not admit that it doesn’t know the references, it fabricates them,” he explained. “If you use that reference, you’re going to be perpetuating that error in your own scientific work.”
Another, perhaps more embarrassing, AI-induced error in medical writing occurs when authors forget to remove the chatbot’s prompts from their finished product. Indeed, more than one article has been published with chatbot prompts left directly in the body of the text.
As a result, many scholarly journals and societies have published statements and guidelines regarding the responsible use of AI technologies in academic publishing. Generally, these statements agree on the following tenets, Dr. Fingerhut noted:
- AI tools should not be listed as authors on papers.
- Authors should be transparent about their use of generative AI, and editors should have access to tools and strategies to ensure authors’ transparency.
- Editors/reviewers should not rely solely on generative AI to review papers.
- Editors/reviewers retain final responsibility in selecting reviewers and should exercise active oversight of that task.
- The final responsibility for editing a paper lies with human authors and editors.
“In summary, these generative AI models are very powerful and are likely here to stay until the next innovation displaces them,” Dr. Fingerhut concluded. “The authors and scientists are ready to use it, but you have to be very careful and only use it properly. Artificial intelligence is a great invention, but we don’t want it to get out of hand.”
For session co-moderator Daniel A. Hashimoto, MD, MTR, an assistant professor of surgery, computer and information science at the University of Pennsylvania, in Philadelphia, the use of AI in medical writing is a potential minefield that needs to be nimbly navigated to maintain clinical and professional integrity.
“I see the value of these models in things like refinements and improving upon work that you have already written,” he told General Surgery News. “I think it’s a little disingenuous to ask ChatGPT to write a paper for you, but another thing entirely to have a draft and ask a large language model to refine it in some way. In my opinion, that’s not very different from giving it to a friend and asking them to look at it.”
As Dr. Hashimoto added, these models are only as good as the data on which they’re trained.
“If you’re using the publicly accessible base version of these models, where you can’t go in and tinker with some of the settings, you’re going to get a product that is very middle-of-the-road. And so, you might find that this writing becomes very bland.
“At the end of the day, I think it really falls on the medical field to think about what we are trying to accomplish with our medical writing,” Dr. Hashimoto said.
Drs. Fingerhut and Hashimoto reported no relevant financial disclosures.

