By Monica J. Smith

Tampa, Fla.—Applying artificial intelligence to physician documentation, in addition to documented physiologic parameters, may predict death in the surgical ICU better than physiologic parameters alone, according to a recent study.

The aspect of AI used in the study was natural language processing (NLP), software that converts unstructured text into a structured form from which relevant information can be extracted.

“NLP has been applied to electronic health records for a range of clinical uses, such as predicting breast cancer recurrence and augmenting the identification of postoperative complications,” said Joshua Parreco, MD, a general surgery chief resident at the University of Miami Miller School of Medicine.


“The purpose of our study was to use NLP of physician documentation with supervised machine learning to predict mortality in the surgical ICU, which is of vital importance in determining resource allocation and treatment decisions. We hypothesized that adding the subtle nuances of physician documentation to the processing could better predict mortality than processing objective data alone.”

Dr. Parreco and his colleagues used electronic health records from the Medical Information Mart for Intensive Care III (MIMIC-III) database, which contains detailed information on patient stays, including diagnoses, procedure codes, laboratory data, vital signs and medication administration. They queried the database for each patient admitted to a surgical ICU at Beth Israel Deaconess Medical Center, in Boston, between 2001 and 2012.

“The database was queried for text from the first note written by physicians for each ICU admission. We used six different severity of illness scores [OASIS, SOFA, SAPS, SAPS II, APS III and LODS] calculated on the first day of ICU admission,” Dr. Parreco said.

The researchers created three different machine learning classifiers, or algorithms, using gradient boosted decision trees. The classifiers were trained to predict mortality in the ICU using different variables: One used severity of illness scores alone, one used physician notes alone, and one used the two in combination.
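The study's models were gradient boosted decision trees. As a rough illustration of the first classifier (severity of illness scores alone), and not the authors' actual code, scikit-learn's `GradientBoostingClassifier` can be trained on synthetic score-like features; the data, risk rule and thresholds below are all invented:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

# Synthetic stand-ins for six first-day severity of illness scores;
# the real study used OASIS, SOFA, SAPS, SAPS II, APS III and LODS.
rng = np.random.default_rng(0)
n_stays = 400
scores = rng.normal(size=(n_stays, 6))

# Hypothetical rule: higher combined score -> higher chance of death
risk = scores.sum(axis=1) + rng.normal(scale=2.0, size=n_stays)
died = (risk > 2.0).astype(int)

# Gradient boosted decision trees, the model family named in the study
clf = GradientBoostingClassifier(n_estimators=100, random_state=0)
clf.fit(scores, died)

# Area under the ROC curve on the training data (illustrative only;
# the study's AUCs came from held-out validation folds)
auc = roc_auc_score(died, clf.predict_proba(scores)[:, 1])
```

The notes-only and combined classifiers would follow the same pattern, with TF-IDF term weights appended to (or replacing) the score columns.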

“We performed NLP on these physician notes using term frequency–inverse document frequency. This value increases proportionately every time a word appears in a note, and it’s offset by the frequency of the word used in all the notes evaluated, thus controlling for some words that are generally more common than others,” Dr. Parreco explained.
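The weighting Dr. Parreco describes is standard TF-IDF: a term's raw count in a note, discounted by how many notes contain that term at all. A minimal stdlib-only sketch, using made-up note text rather than data from the study:

```python
import math
from collections import Counter

def tfidf(notes):
    # Term frequency: raw word counts per note
    tf = [Counter(note.lower().split()) for note in notes]
    n = len(notes)
    # Document frequency: in how many notes each term appears
    df = Counter()
    for counts in tf:
        df.update(counts.keys())
    # Weight each count down by how common the term is across all notes
    return [
        {term: count * math.log(n / df[term]) for term, count in counts.items()}
        for counts in tf
    ]

# Invented snippets echoing the article's hemorrhage example
notes = [
    "large intracranial hemorrhage neurosurgery consulted for hemorrhage",
    "stable postoperative course no acute events",
    "hemorrhage evacuated patient stable",
]
weights = tfidf(notes)
```

In the first note, "hemorrhage" appears twice, so its weight there is double its weight in the third note, while terms appearing in every note would score zero.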

They cross-validated the classifiers by dividing the data into 10 subsets with equal proportions of outcomes, training on nine subsets and validating on the remaining one in each round.
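The splitting step they describe is stratified 10-fold cross-validation. A stdlib-only sketch of how such folds can be built, using toy labels that mimic the study's roughly 5% mortality rate rather than its data:

```python
from collections import defaultdict

def stratified_folds(labels, k=10):
    # Group stay indices by outcome so each fold keeps the class proportions
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    # Deal each class's indices round-robin across the k folds
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    return folds

# Toy cohort: 20 deaths among 400 stays (~5% mortality)
labels = [1] * 20 + [0] * 380
folds = stratified_folds(labels)
```

Each of the 10 folds then holds 40 stays with exactly 2 deaths; training on nine folds and validating on the tenth, rotated 10 times, yields the cross-validated estimates the study reports. Libraries such as scikit-learn provide this as `StratifiedKFold`.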

All told, they identified 3,838 surgical ICU stays, in 208 (5.4%) of which the patient died. The classifier trained on severity of illness scores combined with physician notes outperformed the other two for predicting mortality, with an area under the curve of 0.88, versus 0.86 for severity of illness scores alone and 0.84 for physician notes alone. The other performance measures (precision, accuracy, sensitivity and specificity) were likewise higher for the combined classifier.
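Precision, accuracy, sensitivity and specificity all derive from the same four confusion-matrix counts. A generic sketch with toy predictions, not the study's results:

```python
def binary_metrics(y_true, y_pred):
    # Tally the four confusion-matrix cells (1 = died, 0 = survived)
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp),    # of predicted deaths, how many occurred
        "sensitivity": tp / (tp + fn),  # of actual deaths, how many were flagged
        "specificity": tn / (tn + fp),  # of survivors, how many were cleared
    }

# Toy example: five stays, the model flags two as likely deaths
metrics = binary_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
```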

“When we combine physiologic patient parameters with NLP of physician documentation, we can improve mortality predictions by identifying terms that appear more frequently than a single diagnosis code,” Dr. Parreco said. “These classifiers are able to weigh the terms and appropriately further improve these mortality predictions.”

Dr. Parreco presented a sample note in which the chief complaint is a large intracranial hemorrhage. “The note goes on to discuss a neurosurgery consultation for the hemorrhage and the extent of the hemorrhage, and the diagnosis code mentions hemorrhage. So for this patient, the term ‘hemorrhage’ appears more frequently than it would if it were in the diagnosis code alone.”

Dr. Parreco presented his research at the 2018 Southeastern Surgical Congress.

John H. Stewart IV, MD, MBA, an associate professor of surgery at Duke University School of Medicine, in Durham, N.C., said that as he reviewed Dr. Parreco’s paper, he was impressed by the growth of AI over the past 20 years. “We’re moving from artificial intelligence to augmented intelligence,” he said.

“But language is a very complex tool, influenced not only by intelligence but by mood, and shaped by experience. How do you control for the level of experience of the note writer?”

Dr. Stewart also asked about future implications of the research. “Can we make broad applications of your findings for important advances, including personalized medicine and population management?”

Dr. Parreco said the database was queried for the first note written by a physician, as opposed to a nurse or anyone else who might write an assessment. “Most of them were written by the resident,” he said.

Moving forward, Dr. Parreco and his colleagues want to use large, freely available databases such as MIMIC-III to train the predictive models; validate those models on their own patients; and compare treatment decisions between physicians who have access to the models and those who do not, to see whether access makes a difference.

“We’re trying to tease out the subtle nuances, looking at the different writers of these notes, the words they use, and how those words can be used to predict and inform,” he said.

He noted that the MIMIC-III database included many medical patients in addition to surgery patients, and that the classifiers worked better for the latter. “This is something we want to explore. We think it may be that notes for the surgical ICU patients may be more detailed, or maybe the medical notes are just template driven. We definitely need the whole note.”