While most health systems are at the beginning of the journey in using artificial intelligence and machine learning to predict surgical complications, surgeons at the forefront of this science are expanding our knowledge by investigating ways to overcome the obstacles to AI, as described in two recently published studies.
“These articles pose different questions, but what they have in common is that they’re both examining the challenges of bringing AI to bear on the problem of surgical site infection,” commented Philip S. Barie, MD, MBA, a professor emeritus of surgery at Weill Cornell Medicine, in New York City, and the executive director of the Surgical Infection Society Foundation for Education and Research.
“Studies of SSI prevalence are challenging to perform and interpret if not done prospectively, using trained observers inspecting each incision,” Dr. Barie told General Surgery News. “Retrospective studies always leave doubt as to what exactly was observed, whether patients were omitted inadvertently because of sporadic reporting from the outpatient setting, or if data reporting is incomplete. Moreover, thousands of patients are required to achieve adequate statistical power to study clean operations owing to the low prevalence of infection.”
One study, conducted by researchers at Mayo Clinic in Rochester, Minn., addressed the problem of missing data, which can skew retrospective analyses and subsequent prospective predictions of SSIs.
“The nice thing about machine learning is that it allows the system to refine a model as it evolves, as long as you can get data for the system to look at; we wanted to know what the impact of missing data is on the ability to model infections,” said Robert Cima, MD, a professor of surgery at Mayo Clinic College of Medicine and Science, in Rochester, Minn.
“What we found is that unless you do certain corrections, your model is going to suffer from it.”
To evaluate a method for handling missing data, Dr. Cima and his colleagues compared a Bayesian probit regression model with multiple imputation (BPMI) against a generalized linear model (GLM) in predicting colorectal deep organ-space SSIs (C-OSIs, e.g., postoperative intraabdominal abscess).
Among the 2,376 elective colorectal resections performed at Mayo Clinic between 2006 and 2014, the C-OSI rate was 4.6% (n=108). The BPMI model identified 57 of these patients, a sensitivity of 56%, compared with 47% for the GLM. The BPMI model lost its advantage, however, when it was built with extra-institutional data (from the American College of Surgeons National Surgical Quality Improvement Program), which reduced its sensitivity to 47%.
They concluded that for optimal performance, the BPMI model should be built using “data specific to the individual institution” (Surg Infect 2021;22[5]:523-541).
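The intuition behind the Mayo approach can be illustrated with a toy comparison. The sketch below is not the study's model: the data are synthetic, and scikit-learn's logistic regression stands in for the probit link, with `IterativeImputer` providing the multiple imputation. It simply contrasts a model trained after dropping incomplete records (complete-case analysis) against one that averages predictions over several imputed data sets.

```python
# Illustrative sketch only (synthetic data; logistic regression as a
# stand-in for a probit model): multiple imputation vs. complete-case.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 1.5).astype(int)

# Knock out 30% of one predictor to mimic sporadic outpatient reporting.
X_missing = X.copy()
mask = rng.random(n) < 0.3
X_missing[mask, 1] = np.nan

# Strategy 1: complete-case analysis (incomplete rows dropped).
keep = ~np.isnan(X_missing).any(axis=1)
clf_cc = LogisticRegression().fit(X_missing[keep], y[keep])
sens_cc = recall_score(y[keep], clf_cc.predict(X_missing[keep]))

# Strategy 2: multiple imputation, averaging predictions over m data sets.
m = 5
preds = np.zeros(n)
for seed in range(m):
    imp = IterativeImputer(sample_posterior=True, random_state=seed)
    X_imp = imp.fit_transform(X_missing)
    clf = LogisticRegression().fit(X_imp, y)
    preds += clf.predict_proba(X_imp)[:, 1]
preds /= m
sens_mi = recall_score(y, (preds > 0.5).astype(int))

print(f"complete-case sensitivity: {sens_cc:.2f}")
print(f"multiple-imputation sensitivity: {sens_mi:.2f}")
```

Because the imputation model is fit to one institution's missingness patterns, a model tuned this way may not transfer elsewhere, which is consistent with the study's finding that performance dropped when extra-institutional data were used.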
“We’re going to be seeing more and more of these models, and people need to understand the limitations of them, and how to use them in their institution,” Dr. Cima said.
“My concern is that somebody will develop a big model based on a very heterogeneous data set that may not reflect the risk profile or the patient profile of an individual hospital. I’d hate to see them penalized or made to look like they’re not performing well when the model was never designed to be used in their environment,” Dr. Cima said.
Because retrospective chart review is cumbersome, other investigators have sought to automate the process using machine learning and natural-language processing. The other study, which specifically investigated the generalizability of machine learning–generated SSI-detection algorithms, found that models designed at one center worked just as well at another.
“We’re at the beginning of an acceleration of having machine learning and AI used more widely in health care, but the work to validate models isn’t always done optimally. In many instances, we expect it to be like a ‘plug-and-play’ technology, where you install the solution and it works. But the truth is, in some cases there is a degradation in performance or the need for more optimization,” said Genevieve Melton-Meaux, MD, PhD, a professor of surgery and Institute for Health Informatics core faculty at the University of Minnesota Medical School, in Minneapolis.
To test that generalizability, Dr. Melton and her colleagues took automated SSI-detection algorithms that had been developed and validated using electronic health record (EHR) data from 8,883 patients at their institution, and applied those algorithms to 1,473 patients at the University of California, San Francisco.
Looking at the detection of superficial incisional, organ-space and total SSI complications, the researchers found no difference in area under the curve for any outcome. They concluded that algorithms developed at one site are generalizable to another (J Am Coll Surg 2021;232[6]:963-971).
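The external-validation step described above can be sketched in a few lines. This is a simplified illustration, not the Minnesota team's pipeline: the clinical notes and labels below are invented, and a TF-IDF bag-of-words classifier stands in for whatever feature extraction the study used. The point is the workflow: train at the development site, freeze the model, then score a second site and compare areas under the ROC curve.

```python
# Illustrative sketch (invented notes/labels): external validation of a
# text-based SSI screen by comparing AUC at two sites.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline

# Development-site notes (site where the model is built).
dev_notes = ["wound erythema and purulent drainage", "incision clean dry intact",
             "intraabdominal abscess on CT", "afebrile, no drainage noted",
             "fascial dehiscence with pus", "healing well, staples removed"] * 30
dev_labels = [1, 0, 1, 0, 1, 0] * 30

# External-site notes (model applied unchanged, no retraining).
ext_notes = ["purulent discharge at incision", "no erythema, wound intact",
             "abscess requiring percutaneous drainage", "incision healing without issue"] * 20
ext_labels = [1, 0, 1, 0] * 20

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(dev_notes, dev_labels)

auc_dev = roc_auc_score(dev_labels, model.predict_proba(dev_notes)[:, 1])
auc_ext = roc_auc_score(ext_labels, model.predict_proba(ext_notes)[:, 1])
print(f"development-site AUC: {auc_dev:.2f}")
print(f"external-site AUC: {auc_ext:.2f}")
```

In a real validation the two AUCs would be compared statistically (e.g., with a DeLong test); similar values across sites are what support a claim of generalizability.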
“Currently there is no standard way SSIs are documented in the EHR that would make it easier for a person to extract the data—if they are documented at all. Here, they’re using machine learning and AI to go through records looking for certain terms that correlate with the presence of an SSI, saying that the process of screening might be automated, with a particular advantage that the need to manually review low-risk cases might be eliminated,” Dr. Barie commented.
“Basically, they’ve developed a tool that makes it easier for the surveillance people to find these SSI cases accurately.”
So, what explains the discrepancy in generalizability between the two papers? Both Drs. Cima and Melton suspect it has to do with characteristics of the institutions, the types of patients they see and the way their surgeons practice, and the questions that each of the algorithms is designed to answer.
“In our case, it appears that what we used to build the model is robust and good, but it’s unclear if that would scale across the country. These were both academic health systems; it might be different at a smaller center, or with different patient populations or over time as surgical practices change,” Dr. Melton said.
“These are important questions that we’re going to need to be able to answer more and more.”