Discussion
In this clinical outcomes study, we tested the hypothesis that use of an MLA for severe sepsis detection and prediction would result in reductions in adverse sepsis-related clinical outcomes. The design of this study involved minimal to no risk of patient harm, but offered potential benefits to both patients and providers. In particular, the algorithm’s ability to identify patients with severe sepsis prior to onset provided a significant opportunity for early intervention. Prior studies have shown that early detection or prediction of sepsis and severe sepsis, respectively, can lead to a decrease in the time to administration of antibiotics,40 53 and early intervention has been shown to reduce rates of patient mortality.54–56 Use of the MLA in this study was associated with a 39.5% reduction in in-hospital mortality (p<0.001), a 32.3% reduction in LOS (p<0.001) and a 22.7% reduction in 30-day readmissions (p<0.001).
Improvements in clinical outcomes were calculated by comparing outcomes before algorithm implementation with outcomes after implementation. Not all data fields were available for abstraction at all nine participating hospitals. In cases where pre-implementation measurements were not available, the first month of clinical implementation was used as an approximate baseline. During this initial period, the MLA alert sensitivity, specificity and clinical response were undergoing evaluation and development, and therefore did not represent the final state of the MLA alert and response. However, using this period as a baseline may result in an underestimation of the effect of the MLA relative to a comparison against the true pre-implementation period.
Results from the clinical outcomes analysis indicate that the algorithm has a greater effect on improving clinical outcomes than other screening tools such as MEWS, SOFA and SIRS.25–28 For example, in a prospective comparative analysis of qSOFA and SIRS for predicting adverse outcomes of patients with suspected sepsis, discrimination of in-hospital mortality using the SIRS score was reported to be significantly poorer than that of the qSOFA score, with an overall in-hospital mortality rate of 19%.25 A pre-implementation and post-implementation study evaluating the effect of a SIRS-based sepsis early warning system, which monitored SIRS criteria along with signs of organ dysfunction (based on systolic blood pressure and serum lactate thresholds), found that while the tool prompted more timely sepsis care, there was no significant reduction in mortality.53 A comprehensive review of the peer-reviewed literature on the effect of MEWS on clinical outcomes found limited data and no clinical trials linking the use of MEWS scoring systems to ‘robust’ outcomes.26 An analysis of a variety of disease severity scoring systems for the prognostic assessment of septic patients revealed that SOFA and MEWS showed only moderate discrimination in predicting 28-day mortality.28 Beyond the simple heuristics of rules-based scoring systems such as MEWS, SOFA, qSOFA and SIRS, several machine learning approaches have been retrospectively evaluated for the detection and prediction of incipient sepsis.37 38 57–66 These include dynamic Bayesian networks,60 support vector machines,57 survival-analytical models (TREWScore, Artificial Intelligence Sepsis Expert),61 62 smoothed disease severity score learning,63 hierarchical switching linear dynamical systems,64 autoregressive hidden Markov models,65 free-text models38 and random-forest models.57 These tools contribute notably to the field of sepsis detection because they offer generalisability, are scalable, and can be updated as new information is acquired.58 However, many do not use information about measurement trends or correlations,67 or do so ineffectively. Most machine learning approaches have been evaluated only on retrospective data as proof of concept.37 57 58 60 62 64–66 There remains a need for research evaluating the clinical utility of sepsis prediction models in prospective, real-world settings.
Towards this end, Nelson et al conducted a prospective trial of a real-time electronic surveillance system to expedite early care of severe sepsis.67 Outcome measures were the rate and timeliness of blood lactate sampling and blood cultures, performance of chest radiography and provision of antibiotics; however, only time to blood culture was significantly improved. The primary limitation of the trial was cited as the inability to detect severely septic cases before caregivers did. Umscheid et al conducted a real-world pre-implementation and post-implementation study of an early warning and response system (EWRS) for sepsis outside of the ICU.68 The EWRS identified at-risk patients with a sensitivity of 16% and a specificity of 97%. Compared with a control period, EWRS activation in the post-implementation period resulted in an increase in ICU transfer <6 hours after alert (p=0.06). However, additional outcome measures of hospital LOS (p=0.92), ICU transfer <24 hours after alert (p=0.20), renal replacement therapy ≤6 hours after alert (p=0.51) and all-patient mortality (p=0.45) failed to reach statistical significance.68 Austrian et al performed a time-series study evaluating the effect of an electronic surveillance system on mortality and LOS among emergency department patients with severe sepsis or septic shock,69 finding a modest decrease in LOS (16%) that did not reach statistical significance, and no difference in in-hospital mortality or other intermediate outcome measures. Alert fatigue due to low positive predictive value (PPV) (0.146) was proposed as the primary contributor to these results, and the researchers noted that more sophisticated approaches to early sepsis identification are needed to consistently improve patient outcomes.
Importantly, that study supports the principle that high PPV is critical for effective clinical decision support interventions.69 The early and accurate alerting system introduced in our study is associated with an LOS reduction of 32.3% and a mortality reduction of 39.5%, and achieves a PPV of approximately 40% for sepsis prediction, as demonstrated in prior work.44
In addition to improving patient outcomes, the sepsis prediction tool analysed in this study also provides economic advantages. The cost of severe sepsis has been reported to extend ‘well beyond’ patient impact, as a large part of the sepsis economic burden is incurred after discharge and during rehospitalisation.70 Timely treatment is therefore crucial to reducing costs, reducing rates of readmission and improving treatment outcomes. This clinical outcomes study provides a prospective analysis of machine learning algorithm performance in the sepsis care domain. To the extent possible, we also calculate a first-order approximation of the cost reductions achieved through the reductions in LOS associated with use of the algorithm. The average LOS reduction was found to be 1.56 days. At an average per diem cost of care of US$2271, with 343 patients included per month at the nine locations, the reduction in LOS translates to approximately US$14.5 million in annual cost savings across all nine hospitals included in this analysis. These findings on post-marketing real-world data confirm pre-marketing randomised clinical trial results.40 Previous research has shown that early detection and treatment of sepsis can improve patient outcomes and reduce hospital costs.12 13 71 72
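The first-order savings estimate above is simple arithmetic and can be reproduced directly from the reported figures (the only assumption beyond the study's own numbers is annualisation over 12 months):

```python
# First-order approximation of annual cost savings from LOS reduction,
# using the figures reported in this study.
los_reduction_days = 1.56    # average reduction in length of stay (days)
cost_per_day_usd = 2271      # average per diem cost of care (US$)
patients_per_month = 343     # patients per month across the nine hospitals

savings_per_patient = los_reduction_days * cost_per_day_usd
annual_savings = savings_per_patient * patients_per_month * 12  # assume 12 months

print(f"US${annual_savings / 1e6:.1f} million per year")
```

This yields roughly US$14.6 million before rounding, consistent with the approximately US$14.5 million figure reported above.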
Our real-world data analysis has several limitations. We cannot guarantee the usefulness of the resulting alerts to clinicians, which indicates a need for future studies that include qualitative analysis of algorithm utility (ie, clinician surveys or interviews). Future work including clinician surveys would also help to determine how clinicians responded to the alerts, including any diagnostic tests or treatment interventions that were ordered. This level of detail would be helpful in assessing the potential means through which the observed positive impacts on patient outcomes were achieved. Ideally, this future work would also include clinical adjudication of sepsis onset times, instead of defining onset times in terms of gold standard criteria, so that the extent to which alerts were accurate and early could be determined. Further, although machine learning systems have made significant advances in the healthcare domain over the past decade, it is important to consider the unintended ways in which they impact clinical practice. Unintended consequences of machine learning in medicine include over-reliance on the capabilities of automation; a lack of contextual information, which may lead to diagnostic misinterpretation; and observer variability affecting the accuracy and reliability of machine learning performance.73 However, it should be noted that these risks are minimised by screening or ‘sniffer’ algorithms such as this MLA, which are designed to increase clinician oversight of high-risk cases, not to replace expert clinical judgement and standards of care. Other limitations of our study include variation in clinician and team responses to patients at possible risk of sepsis. Only adults in US hospitals were included in the study. While nine diverse hospitals were included in the analysis, these hospitals may not be representative of all US hospitals or of international hospital settings.
Data were not available from all hospitals for all months and outcome measurements. Baseline data were not available for all hospitals, and in these cases the first month of MLA data was used as an approximation. This may lead to an underestimation of the effect of the MLA at these sites. However, the analysis was repeated on a subset of three hospitals with at least 1 month of baseline pre-MLA implementation data, and outcomes were similar. This study did not follow patient mortality after hospital discharge. Finally, we cannot eliminate the possibility that implementation of a sepsis algorithm raised general awareness of sepsis within a hospital, which may have led to higher recognition of septic patients independent of algorithm performance.