Original Research

Novel machine learning model for predicting multiple unplanned hospitalisations

Abstract

Background In the Australian public healthcare system, hospitals are funded based on the number of inpatient discharges and types of conditions treated (casemix). Demand for services is increasing faster than public funding and there is a need to identify and support patients who have high service usage. In 2016, the Victorian Department of Health and Human Services developed an algorithm to predict multiple unplanned admissions as part of a programme, HealthLinks Chronic Care (HLCC), which provided capitation funding instead of activity-based funding to support patients with high admissions.

Objectives The aim of this study was to determine whether an algorithm with higher performance than previously used algorithms could be developed to identify patients at high risk of three or more unplanned hospital admissions within 12 months of discharge.

Methods The HLCC and Hospital Unplanned Readmission Tool (HURT) models were evaluated using 34 801 unplanned inpatient episodes (27 216 patients) from 2017 to 2018, with an 8.3% prevalence of three or more unplanned admissions in the year following discharge.

Results HURT had a higher AUROC than HLCC (84%, 95% CI 83.4% to 84.9% vs 71%, 95% CI 69.4% to 71.8%), a difference that was statistically significant (DeLong test, p<0.05).

Discussion We found features, including socioeconomic status and social support, that appear to be strong predictors of admission risk and have not previously been used in models.

Conclusion The high AUROC, moderate sensitivity and high specificity of the HURT algorithm suggest it is a very good predictor of future multi-admission risk and that it can be used to provide targeted support for at-risk individuals.

What is already known on this topic

  • Case-finding algorithms for identifying patients at risk of unplanned readmissions have traditionally focused on detecting one or more admissions over a 30-day to 365-day period from discharge. This study focuses on patients who have more frequent admissions, and aims to predict three or more unplanned admissions within 365 days of an index admission. This allows for better targeting of patients who would otherwise use more resources. Accurately predicting those who represent the highest hospital use is likely to lead to greater healthcare cost savings.

What this study adds

  • This study presents an algorithm for identifying patients at risk of three or more unplanned admissions using not only clinical information, socioeconomic indicators and living arrangements, but also a novel cascading chronic condition feature.

How this study might affect research, practice or policy

  • This study demonstrates the importance of using a mixture of clinical, demographic and activity-based features for predicting patient outcomes. Our algorithm outperformed a similar algorithm that used existing weighted scoring approaches. The study demonstrates that machine learning-based methods for identifying patients who would benefit from targeted intervention have great potential in improving health system sustainability.

Background

Potentially preventable admissions include hospitalisations for acute and chronic conditions that might have been avoided with earlier intervention, as well as rehospitalisations within 30 days of discharge due to inadequate discharge planning and/or follow-up.1 In many high-income countries, potentially preventable hospitalisations have become an indicator of health system performance.2 Reported rates of preventable hospitalisation range from 5% to 79%.3 This wide range reflects not only differences in the definition of preventable admissions, but also geographical and socioeconomic differences in population composition.4 For hospitals, potentially preventable admissions increase demand, lead to bed blocking and patient flow issues, and in Australia account for 10% of all occupied beds and more than 748 000 admissions per year.5

The cost of providing healthcare in most high-income countries is considered unsustainable and will likely be unaffordable by 2050 in the absence of major reforms.6 Identifying and averting these preventable hospitalisations is important not only for improving individual health outcomes but also for controlling burgeoning healthcare expenditure. Case-finding algorithms to identify those at highest risk of preventable hospitalisations are emerging as a key initiative that may allow for targeted care to prevent deterioration and future admissions.7 A highly sensitive and specific case-finding algorithm should be able to identify only those patients most likely to have high future healthcare costs or hospital resource use.

Internationally, there is a growing body of literature on algorithms that aim to predict the likelihood of future admissions using different models, including traditional logistic regression, survival analysis and, more recently, machine learning techniques.8 Many approaches focus on patients at risk in specific disease categories such as chronic obstructive pulmonary disease (COPD),9 stroke/transient ischaemic attack (TIA),10 diabetes11 or heart failure.12 Others focus on unplanned readmissions for any cause, usually within 30 days of discharge, using non-linear models,13 gradient boosted decision trees14 or artificial neural networks.15

In 2016, the Victorian Department of Health and Human Services (DHHS) initiated the HealthLinks Chronic Care (HLCC) programme, which provided an alternative capitated funding model for patients with chronic and complex health conditions who were at high risk of multiple unplanned admissions. A key component of the programme was the use of a predictive algorithm called the HLCC model. The HLCC model uses an index unplanned admission as a triggering event and then combines diagnostic information from that admission and demographic information to create a ‘risk score’ for the probability of another three or more admissions in the next 12 months. Patients who score above a threshold value determined by logistic analysis of historical data are eligible to be included in the HLCC programme, receiving targeted preventative care.16 The HLCC risk score was found to have a sensitivity of 41% and specificity of 78% over the 2-year evaluation across five participating Victorian health service providers.17 The low sensitivity suggests that there are potentially many patients who would benefit from targeted intervention who are not being identified by this algorithm, and the moderate specificity suggests that targeted intervention effort was wasted on some individuals who would not have gone on to have a preventable admission. This paper describes the development and content of a machine learning case-finding prediction tool with a higher sensitivity and specificity for identifying patients who are at high risk of all-cause potentially avoidable admissions within 12 months of discharge in an Australian setting.

Methods

Setting

This was a single-centre study based at Northern Health (NH). NH is the major provider of acute (410 beds), subacute (251 beds) and ambulatory specialist services in Melbourne’s north. Residents originate from more than 184 countries, speak more than 106 languages and have lower levels of income, educational attainment and health literacy, and higher rates of unemployment, than state averages.18 The emergency department at NH is the busiest in the state with over 100 000 presentations per year.19

Study design and data sources

Participants

Eleven years of historical NH acute inpatient (IP) emergency admitted episode-level data, from 1 July 2008 to 30 June 2019, were used to build and test a new model that we named the HURT (Hospital Unplanned Readmission Tool) model. Outpatient (OP) and emergency department (ED) data were also linked to the unplanned IP activity. In addition, the Index of Relative Socio-economic Advantage and Disadvantage (IRSEAD) from the Australian Bureau of Statistics (ABS) Socio-Economic Indexes for Areas (SEIFA) data set was linked to each patient’s residential postal address.

An unplanned admission is an unexpected or sudden health issue or event that results in an emergency admission. We only included acute IP episodes where the patient was 18 years or older at admission, the admission was not related to mental health, obstetrics, oncology or renal dialysis, and the patient did not die during the episode. Records containing missing values were discarded.
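
As an illustration only, a minimal sketch of this inclusion filter in R (the language of the packages used in this study); the data frame `episodes` and all column names are hypothetical, not the actual NH schema:

```r
library(dplyr)

# Hypothetical episode-level data; column names are illustrative.
cohort <- episodes %>%
  filter(
    age_at_admission >= 18,                 # adults only
    admission_type == "emergency",          # unplanned acute IP episodes
    !program %in% c("mental health", "obstetrics", "oncology", "renal dialysis"),
    !died_in_episode                        # exclude in-episode deaths
  ) %>%
  na.omit()                                 # discard records with missing values
```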

Table 1 presents the summary statistics of the data used, including demographic information and the features used for the final model; the percentages are the proportion of separations with the given flag. The features are defined in table 2 later in the paper. These data were included because DHHS and other jurisdictions have them readily available. Data such as pathology and pharmacy were not included as the DHHS does not hold these data.

Table 1 | Summary statistics of the 11 years of data with mean value or percentage of separations with the relevant flag

Table 2 | List of variables in final model for predicting three or more unplanned admissions

Patient and public involvement

Patients and the public were not involved in the design of this work.

Variable selection for the HURT

Variable (feature) selection is a manual or automatic process by which variables that have the highest impact on model performance (in this case prediction of future unplanned admissions) are selected and variables that do not help learning are discarded.

The Boruta R package was used to select the features for HURT. Boruta implements an ‘all-relevant’ feature selection algorithm, where relevant means variables found to be associated with unplanned emergency admissions. Boruta can use a range of tree-based learners to derive the importance of each feature. Extreme Gradient Boosting (XGBoost) was used to measure feature importance in the Boruta algorithm, with 200 maximum runs to ensure feature importance was fully resolved.
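
A hedged sketch of this step is below; `train_df` and the outcome column `target` are assumed names, and the outcome is assumed to be coded numerically (0/1), as the XGBoost importance source expects numeric data:

```r
library(Boruta)

set.seed(42)
# All-relevant feature selection: shadow (permuted) copies of each feature
# set the importance baseline that real features must beat.
boruta_fit <- Boruta(
  target ~ ., data = train_df,
  maxRuns = 200,            # as in this study, to fully resolve importances
  getImp  = getImpXgboost   # XGBoost-based importance measure
)
selected <- getSelectedAttributes(boruta_fit, withTentative = FALSE)
```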

We also created a novel feature that we called a cascading chronic condition flag. If a patient was coded with a chronic condition, all subsequent episodes were also flagged for this chronic condition (ie, if a patient was diagnosed with COPD, all following episodes would be flagged with this condition, whereas previously the condition would not be recorded if the admission was for a different reason). This information can then be used by the machine learning (ML) modelling to help predict future unplanned admissions.
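
A minimal sketch of how such a cascading flag can be derived, assuming an episode-level data frame with illustrative column names:

```r
library(dplyr)

# Once a chronic condition (here COPD) is coded on any episode, every
# subsequent episode for that patient carries the flag.
episodes <- episodes %>%
  group_by(patient_id) %>%
  arrange(admission_date, .by_group = TRUE) %>%
  mutate(copd_cascade = cummax(copd_coded)) %>%  # stays 1 after first coding
  ungroup()
```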

Weighting variable importance

Over the past decade, ‘black box’ machine learning algorithms have been increasingly used in critical decision-making processes. However, because it is unclear or unknown how such algorithms reach their decisions, there have been reports of adverse results in some fields.20

To overcome this problem, we used interpretable models that allow for an understanding of why the machine learning algorithm makes particular decisions on individual cases. The SHAP (Shapley Additive exPlanations) score was used as it assigns each variable an importance value for each decision outcome. The SHAP score can then be visualised to illustrate how the decision tree-based machine learning is making a given decision in an interpretable manner.21
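
For tree models fitted with the xgboost R package, SHAP values can be obtained directly from the predict method; a sketch, assuming a fitted booster `bst` and a numeric test feature matrix `X_test`:

```r
library(xgboost)

# One SHAP value per feature per prediction, plus a bias column; on the
# log-odds scale each row sums to the model's raw prediction, so the
# contribution of every variable to an individual decision can be audited.
shap_vals <- predict(bst, as.matrix(X_test), predcontrib = TRUE)

# Mean absolute SHAP value per column gives a global importance ranking.
imp <- sort(colMeans(abs(shap_vals)), decreasing = TRUE)
```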

Training and optimisation of the model

HURT is trained and tested on historical data where we know in advance if a patient had three or more unplanned admissions 1 year from the index admission. We define this as the ‘target’ for the model to be trained and tested on.
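
A sketch of how this target can be constructed from episode-level data (column names are illustrative, and the data are assumed to be already restricted to unplanned acute episodes):

```r
library(dplyr)

# For each index episode, count the patient's later unplanned admissions
# that start within 365 days of the index discharge.
episodes <- episodes %>%
  group_by(patient_id) %>%
  arrange(admission_date, .by_group = TRUE) %>%
  mutate(
    n_next_year = sapply(seq_len(n()), function(i)
      sum(admission_date > discharge_date[i] &
          admission_date <= discharge_date[i] + 365)),
    target = as.integer(n_next_year >= 3)   # 1 = three or more returns
  ) %>%
  ungroup()
```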

The XGBoost machine learning algorithm uses an ensemble (collection) of weak decision trees that are sequentially created to progressively improve (ie, boosting) the learning performance.22 This has the advantage of quick training and has been shown to perform well on unseen test data. It also has the advantage of being able to handle unbalanced data where there are fewer positive cases (ie, patient returned three or more times in the future) compared with the negative cases (ie, patient did not return three or more times).

Like most machine learning algorithms, XGBoost has a set of training parameters that impact the final model performance. The parameter values that maximise performance cannot be determined by analysing the data alone; they can only be found by trying different training parameters and measuring the model performance.23 Hence, we performed hyperparameter optimisation using a simple grid search over a range of parameter values. The optimal XGBoost parameters were eta=0.05, max_depth=4, gamma=0, colsample_bytree=1, min_child_weight=2, subsample=0.5, nrounds=400.
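
A sketch of such a grid search with xgb.cv is below; the grid values are illustrative (with the reported optimum included) and `dtrain` is assumed to be an xgb.DMatrix of the training data:

```r
library(xgboost)

grid <- expand.grid(eta = c(0.05, 0.1), max_depth = c(4, 6),
                    subsample = c(0.5, 1), min_child_weight = c(1, 2))

cv_auc <- apply(grid, 1, function(g) {
  params <- list(objective = "binary:logistic", eval_metric = "auc",
                 eta = g[["eta"]], max_depth = g[["max_depth"]],
                 subsample = g[["subsample"]],
                 min_child_weight = g[["min_child_weight"]],
                 gamma = 0, colsample_bytree = 1)
  cv <- xgb.cv(params = params, data = dtrain, nrounds = 400,
               nfold = 10, verbose = 0)
  max(cv$evaluation_log$test_auc_mean)   # best cross-validated AUROC
})
best <- grid[which.max(cv_auc), ]
```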

The 11 years of historical data were divided into training and testing phases. The optimal model parameter values that produced the highest area under the receiver operating characteristic curve (AUROC) performance using 10-fold cross-validation on the training data (9 years, 171 913 separations, 98 527 patients) were used. Testing was performed on the episodes that were discharged in 2017–18 (1 year, 34 801 separations, 27 216 patients); 2018–19 data were needed to count the unplanned admissions up to 1 year from a 2017–18 discharge. The training and testing phases were performed using the caret R package.24
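
A sketch of the corresponding caret workflow, fitting the reported optimum with a temporal split; the outcome is assumed to be a two-level factor (eg, levels 'lt3'/'ge3', as class probabilities require valid R names), and `model_df` and `fin_year` are illustrative names:

```r
library(caret)

train_df <- subset(model_df, fin_year <= "2016-17", select = -fin_year)  # 9 years
test_df  <- subset(model_df, fin_year == "2017-18", select = -fin_year)  # test year

ctrl <- trainControl(method = "cv", number = 10,      # 10-fold cross-validation
                     classProbs = TRUE, summaryFunction = twoClassSummary)

fit <- train(target ~ ., data = train_df, method = "xgbTree",
             metric = "ROC", trControl = ctrl,
             tuneGrid = data.frame(nrounds = 400, eta = 0.05, max_depth = 4,
                                   gamma = 0, colsample_bytree = 1,
                                   min_child_weight = 2, subsample = 0.5))

p_test <- predict(fit, newdata = test_df, type = "prob")[, "ge3"]
```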

Final variables selected for the HURT

Table 2 provides an overview of the final 18 features selected for HURT from an initial set of 199 features. The definition of all features tested are available as online supplemental material 1.

The performance of the HURT was assessed retrospectively by calculating the AUROC; the sensitivity, which is the percentage of separations where the patient was correctly predicted to have three or more potentially avoidable admissions in the 12 months following their discharge; and the specificity, the percentage of separations where the patient was correctly predicted not to have three or more unplanned admissions. The higher the sensitivity of the model, the more patients are correctly identified and the fewer are ‘missed’ and go on to have a potentially preventable readmission.
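
A sketch of these retrospective metrics using the pROC package, assuming predicted probabilities `p` and observed outcomes `y` (1 = three or more unplanned admissions) on the test separations:

```r
library(pROC)

roc_hurt <- roc(y, p)
auc(roc_hurt)                     # AUROC
ci.auc(roc_hurt)                  # 95% CI (DeLong by default)

pred <- as.integer(p >= 0.5)      # illustrative decision threshold
sens <- sum(pred == 1 & y == 1) / sum(y == 1)  # correctly flagged returners
spec <- sum(pred == 0 & y == 0) / sum(y == 0)  # correctly cleared non-returners
```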

Comparison

The primary comparison of our algorithm is with the DHHS HLCC algorithm using AUROC, sensitivity and specificity. We also compare the decisions made by HURT and HLCC on the same separations and illustrate the differences with a Venn diagram. Of particular interest are the false-positive (FP) cases, where a patient is falsely flagged as likely to return three or more times and will be offered support services; this means resources may be used for patients who were not going to return. The false-negative (FN) cases are also of concern: these patients are not flagged and will not be offered extra services, yet go on to return three or more times after discharge. This places an avoidable strain on hospital resources and, more importantly, risks missing patients who may have deteriorated.
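
A sketch of these paired comparisons, assuming probabilities (`p_hurt`, `p_hlcc`) and binary predictions (`pred_hurt`, `pred_hlcc`) for both models on the same test separations, with observed outcome `y`:

```r
library(pROC)

# DeLong test for the difference between two correlated ROC curves
roc.test(roc(y, p_hurt), roc(y, p_hlcc), method = "delong")

# McNemar tests on paired classifications, applied separately to the
# positive separations (sensitivity) and negative separations (specificity)
pos <- y == 1
mcnemar.test(table(pred_hurt[pos] == 1, pred_hlcc[pos] == 1))  # sensitivity
neg <- y == 0
mcnemar.test(table(pred_hurt[neg] == 0, pred_hlcc[neg] == 0))  # specificity
```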

Given the lack of existing research using three or more unplanned admissions within 1 year of discharge as the target, we also applied the previously described methodology to predicting whether a patient has one or more unplanned admissions within 1 year. Our model is compared with other case-finding algorithms that predict one or more unplanned admissions over a year using weighted scores25 26 and machine learning.27 Even though this was not the focus of the research, it provided some reassurance about the methodology.

Results

The optimised HURT model had a final test AUROC of 84% (95% CI 83.4% to 84.9%), while HLCC had an AUROC of 71% (95% CI 69.4% to 71.8%) (figure 1). The difference between the HURT and HLCC ROC curves was statistically significant with Z=−22.6, p<0.001 (DeLong test).

Figure 1

ROC test performance of HLCC and HURT models predicting three or more unplanned admissions. HLCC, HealthLinks Chronic Care; HURT, Hospital Unplanned Readmission Tool; ROC, receiver operating characteristic.

Using the confusion matrix in table 3, the HURT algorithm had a sensitivity of 57%, while HLCC had a sensitivity of 48%. The 9% difference was statistically significant with χ2=85.03, p<0.001 (McNemar test). The HURT algorithm achieved 90% specificity, while HLCC had a specificity of 88%. The 2% difference was statistically significant with χ2=94.99, p<0.001 (McNemar test).

Table 3 | Comparison of the HURT and HLCC algorithms in identifying patients at risk of three or more unplanned readmissions

The Venn diagram in figure 2 provides an overview of the number of hospital unplanned admissions that were predicted by each of the HLCC and HURT models in terms of true-positive (TP) and FP cases. The separations that were predicted correctly (ie, the overlap between ‘returned ≥3’, HURT and HLCC) are the TP cases (HURT only: 528, both: 1120, HLCC only: 267), giving HURT 261 more correctly classified separations than HLCC. The FP cases (HURT only: 1577, both: 1708, HLCC only: 2175) show that HURT has 598 fewer FPs than HLCC. Both models missed 966 positive cases.

Figure 2

Venn diagram of number of separations that were predicted to return by machine learning, HLCC and overlap with the number of separations that actually returned three or more times. FP, false-positive; HLCC, HealthLinks Chronic Care; HURT, Hospital Unplanned Readmission Tool; TP, true-positive.

The 18 most important variables for predicting admission can be grouped into three categories: demographics (particularly age and marital status), medical conditions (complexity and cascading chronic conditions, in particular COPD and chronic cardiac failure) and past resource use (unplanned admissions, avoidable emergency presentations and failure to attend OP appointments). Figure 3 provides a SHAP plot of each of the 18 variables.

Figure 3

SHAP plot of impact of each feature on decision of XGBoost model. COPD, chronic obstructive pulmonary disease; ED, emergency department; HIP, Health Independence Program; IRSEAD, Index of Relative Socio-economic Advantage and Disadvantage; LOS, Length of Stay; OP, outpatient; SHAP, Shapley Additive exPlanations; XGBoost, Extreme Gradient Boosting.

Tables 4 and 5 present the test AUROC, sensitivity and specificity of the proposed algorithm and of other models, both Australian and international, for comparison, along with the 95% CIs. Not all the referenced papers provide full details on the data sizes and performance values for their models.

Table 4 | Summary of test performance for predicting three or more unplanned admissions within 1 year of discharge for different case-finding algorithms

Table 5 | Summary of test performance predicting one or more unplanned admissions within 1 year of discharge for different case-finding algorithms

Discussion

The HURT algorithm had an AUROC of 84%, sensitivity of 57% and specificity of 90%. In the model, the higher specificity and sensitivity of HURT over HLCC translated into 598 fewer FP and 261 more TP predictions in the 12-month time frame. The HURT algorithm also flagged more patients who would have benefited from targeted services but went on to have two or fewer unplanned admissions over 12 months.

Even though these findings are from a tertiary hospital in the state of Victoria, there are still lessons that can be applied to the broader healthcare system across Australia and internationally. In the local Australian context, the Independent Health and Aged Care Pricing Authority may apply penalties to hospitals that treat what are deemed avoidable readmissions (within 30 days). As the HURT model has a higher specificity than other Australian models, it may be a more cost-effective tool for Australian hospitals to use as it will select fewer FPs and, therefore, prevent hospitals that use this model from being avoidably penalised.

Researchers based in the UK have developed a number of case-finding algorithms25–27 over the years. Direct comparison with some of these other models is difficult given that data scientists use different datasets (both in terms of data captured and patient cohorts), different definitions of an unplanned admission, and different benchmarks for what is considered an acceptable number of these within a 12-month period. The UK models use one or more unplanned admissions of any cause as their benchmark,10 25–27 with SPARRA V4 demonstrating the highest AUROC (80%), with a sensitivity of approximately 52% and a specificity of approximately 90% (estimated from the ROC plot in figure 2(a) of that paper).27 Future algorithm research would benefit from the application of consistent definitions so that developed algorithms may be tested and applied within different healthcare contexts (rural, remote and metropolitan) and countries.

Of particular interest in this study were the results from the SHAP scores for the importance of each feature in the HURT algorithm. Higher numbers of unplanned hospital admissions and ED presentations in the past year were shown to be important predictive features of future unplanned readmissions. In addition, lower socioeconomic status and lack of social support were predictive of unplanned readmissions, in agreement with SPARRA, which used the Scottish Index of Multiple Deprivation and SHAP scores.21 Both QAdmission26 and SPARRA27 found pathology and medication history to be important features for prediction of readmission, which may explain their higher performance. These data were deliberately left out so that other jurisdictions could build our model with the data readily available to them; our next version will include these data.

A limitation of this study is that it focuses on the application of machine learning to the problem of predicting whether a patient will have unplanned readmissions given current and historical information for an index admission. Hence, we only examined the performance of HURT on NH data and compared it with the HLCC model, which was used in several Victorian health services. The model has not been subject to external validation and may not work well at non-tertiary (hospital) sites. Further work will involve multiple phases. The first phase will be to evaluate HURT within a live production system, both in terms of classification performance (eg, sensitivity) and operationally (cost savings, cohort selection). The second phase will draw on the first to improve HURT as part of a broader system that will be evaluated and optimised. Patient cohorts will be examined for FP/FN cases to determine any striking features that can be used or enhanced to improve ML identification of patients who will have unplanned admissions. Finally, the aim is to use general practice, pharmacy and pathology data, patient survey data and in-home sensor data to better predict patients likely to readmit. These data were not included in the current approach because they are not available to the Victorian Department of Health.

Conclusions

We developed HURT based on the XGBoost ML algorithm. We also created novel features from hospital medical and administrative data, called cascading chronic condition flags. The HURT algorithm was compared with the Victorian Department of Health HLCC scoring method for identifying patients at risk. The HURT model was found to have an AUROC of 84%, sensitivity of 57% and specificity of 90%: 14%, 9% and 2% higher than HLCC, respectively. Future research will use pathology and pharmacy data with the aim of improving model performance.