Introduction
Risk stratification tools that predict healthcare resource use are widely used in primary care settings.1–6 These tools are integral to population health management (PHM) strategies around the world, enabled by the availability of routinely collected data from sources such as electronic health records.7 Risk stratification tools typically use predictive models, developed through statistical or machine learning (ML) techniques, to generate an individual risk score for some measure of resource use. These scores form a key component of anticipatory care pathways, where those at the highest risk may be targeted for specific interventions aimed at reducing future morbidity.8–11 The process by which these tools are ideally developed and deployed within healthcare systems is summarised in figure 1.
A growing body of literature describes the development and validation of risk stratification tools in the primary care setting, reporting acceptable discriminatory power for the majority of models.1 2 12 13 However, existing work broadly focuses on the assessment of model performance within retrospective datasets, with little attention paid to efficacy in real-world settings, where the clinical impact of deploying these algorithms within a population is assessed. Commercial literature asserts the efficacy of interventions based on algorithmic case selection in improving key outcomes, such as hospital admission rates, but suffers from a lack of transparency in data and methodology.14 15
Predictive models that appear accurate in development are increasingly found to be ineffective or unsafe when deployed in clinical pathways. Predictive performance may be diminished when models are translated to demographically and culturally distinct populations, or when they are deployed using electronic health data with differing characteristics. Differences in how healthcare resources are used in local settings, alongside inherent biases embedded within such technologies, may result in varying clinical effectiveness, arising from inconsistent intervention thresholds, variation in the clinical interventions that are deployed, and sociotechnical variation across end-users and processes.16–20 Consequently, where an algorithm is deployed into an untested context without real-world evidence for a comparable integrated pathway, there are risks both to patient safety and of exacerbating healthcare inequalities through a lack of fairness in prediction or intervention allocation.
With extensive integration of risk stratification into pathways within primary care systems worldwide, it is of paramount importance to establish the current evidence base against which these care-defining interventions can be appraised. We therefore systematically review the available literature concerning risk stratification tools for predicting future healthcare utilisation in primary care populations. We address three aims: (1) to update existing evidence for algorithmic solutions, with attention paid to predictive performance and risk of bias in dataset evaluation, as well as real-world clinical outcomes; (2) to describe the transfer of algorithms from initial development to testing and deployment in different global contexts; and (3) to evaluate risks in cross-context transfer and application. Based on our findings, we provide recommendations for the responsible evaluation and deployment of predictive risk stratification tools.