
User-centred design for machine learning in health care: a case study from care management
Martin G Seneviratne¹, Ron C Li², Meredith Schreier¹, Daniel Lopez-Martinez¹, Birju S Patel¹, Alex Yakubovich¹, Jonas B Kemp¹, Eric Loreaux¹, Paul Gamble¹, Kristel El-Khoury¹, Laura Vardoulakis¹, Doris Wong¹, Janjri Desai³, Jonathan H Chen², Keith E Morse², N Lance Downing², Lutz T Finger¹, Ming-Jun Chen¹ and Nigam Shah²

¹ Research, Google Inc, Mountain View, California, USA
² Department of Medicine, Stanford University, Stanford, California, USA
³ Division of Pharmacy, Stanford Medicine, Stanford, California, USA

Correspondence to Dr Martin G Seneviratne; martsen{at}


Objectives Few machine learning (ML) models are successfully deployed in clinical practice. One of the common pitfalls across the field is inappropriate problem formulation: designing ML to fit the data rather than to address a real-world clinical pain point.

Methods We introduce a practical toolkit for user-centred design consisting of four questions covering: (1) solvable pain points, (2) the unique value of ML (eg, automation and augmentation), (3) the actionability pathway and (4) the model’s reward function. This toolkit was implemented in a series of six participatory design workshops with care managers in an academic medical centre.

Results Pain points amenable to ML solutions included outpatient risk stratification and risk factor identification. The endpoint definitions, triggering frequency and evaluation metrics of the proposed risk scoring model were directly influenced by care manager workflows and real-world constraints.

Conclusions Integrating user-centred design early in the ML life cycle is key for configuring models in a clinically actionable way. This toolkit can guide problem selection and influence choices about the technical setup of the ML problem.

  • Machine Learning
  • Decision Support Systems, Clinical
  • Software Design

Data availability statement

No data are available.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial.



Despite the proliferation of machine learning (ML) in healthcare, there remains a considerable implementation gap, with relatively few ML solutions deployed in real-world settings.1 One common pitfall is the tendency to develop models opportunistically, based on the availability of data or endpoint labels, rather than through ground-up design principles that identify solvable pain points for target users. There is a long history of clinical decision support tools failing to produce positive clinical outcomes because they do not fit into clinical workflows, cause alert fatigue or trigger other unintended consequences.2 3 Li et al introduced a ‘delivery science’ framework for ML in healthcare, which holds that successfully integrating ML into healthcare delivery requires treating ML as an enabling capability within a broader set of technologies and workflows, rather than as the end product itself.4 However, it remains unclear how to operationalise this framework, particularly how to select the healthcare problems for which an ML solution is appropriate. As ML becomes increasingly commoditised through advances such as AutoML,5 the real challenge shifts towards identifying and formulating ML problems in a clinically actionable way.

User-centred or human-centred design principles are recognised as an important part of ML development across a range of sectors.6 Here, we introduce a toolkit for user-centred ML design in healthcare and showcase its application in a case study involving care managers. There were an estimated 3.5 million preventable adult hospital admissions in the USA in 2017, accounting for over US$30 billion in health care spend.7 Care management aims to assist high-risk patients in navigating care by proactively targeting risk factors via social and medical interventions. In this case study, we provide practical guidance for ‘understanding the problem’ and ‘designing an intervention’ (stages in the Li et al framework) via user-centred design principles. We draw on cross-domain resources, specifically the Google People+AI Research guidebook8 and the Stanford design thinking framework (Empathise/Define/Ideate/Prototype/Test),9 which we adapt for a clinical setting.


Figure 1 illustrates the toolkit. First, a problem must satisfy a set of ‘hurdle criteria’: is it worth solving? Specifically, the problem must be associated with significant morbidity or clinical burden, have evidence of modifiability and have adequate data for ML techniques. For candidate problem areas, there are then four key user-centred questions that must be answered:

Figure 1

Toolkit for integrating user-centred design into the problem definition stages for ML development in healthcare. ML, machine learning; UXR, user-experience research.

Q1. Where are the current pain points?

Q2. Where could ML add unique value?

Q3. How will the model output be acted on?

Q4. What criteria should the model be optimised for?
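The hurdle criteria and the four questions can be read as a screening checklist applied to each candidate problem area. The Python sketch below is one hypothetical way to record that screen; the `CandidateProblem` class, its field names and the helper functions are illustrative assumptions, not part of the published toolkit.

```python
from dataclasses import dataclass, field

# The four user-centred questions from the toolkit.
QUESTIONS = {
    "Q1": "Where are the current pain points?",
    "Q2": "Where could ML add unique value?",
    "Q3": "How will the model output be acted on?",
    "Q4": "What criteria should the model be optimised for?",
}


@dataclass
class CandidateProblem:
    """Hypothetical record for one candidate problem area."""

    name: str
    significant_burden: bool  # significant morbidity or clinical burden
    modifiable: bool  # evidence that the outcome is modifiable
    adequate_data: bool  # enough data for ML techniques
    answers: dict[str, str] = field(default_factory=dict)  # question id -> answer


def passes_hurdle(problem: CandidateProblem) -> bool:
    """A problem is worth solving only if it meets all three hurdle criteria."""
    return problem.significant_burden and problem.modifiable and problem.adequate_data


def open_questions(problem: CandidateProblem) -> list[str]:
    """Question ids that still have no recorded answer."""
    return [qid for qid in QUESTIONS if not problem.answers.get(qid)]


care_management = CandidateProblem(
    name="Outpatient risk stratification",
    significant_burden=True,
    modifiable=True,
    adequate_data=True,
    answers={"Q1": "Identifying and prioritising the highest risk patients"},
)
if passes_hurdle(care_management):
    print(open_questions(care_management))  # ['Q2', 'Q3', 'Q4']
```

Only candidates that clear all three hurdle criteria move on to the questions, mirroring the order described in the text.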

The above toolkit was applied through a series of six user-experience research (UXR) workshops with multidisciplinary stakeholders, including care managers, nurses, population health leaders and physicians affiliated with a managed care programme at Stanford Health Care. Workshops were conducted virtually and were approved by Stanford and Advarra Institutional Review Boards, with consent obtained from all participants.

The schedule of workshops is detailed below:

  • Workshop 1 focused on mapping existing workflows. The output was a set of process maps, annotated with pain points.

  • Workshop 2 focused on where ML could add unique value (Q2). This yielded a mapping between pain points and possible ML formulations, categorised into automation (replicating repetitive, time-consuming tasks) versus augmentation (adding superhuman functionality).8

  • Workshops 3 and 4 focused on how a model output would be acted on (Q3). Low-fidelity study probes were developed—storyboards of how an ML tool might fit into a clinical workflow. These were presented to participants for feedback and refined iteratively.

  • Workshops 5 and 6 explored ML evaluation metrics for the most promising concept designs (Q4). This included how care managers would expect results to be presented and any auxiliary information required alongside the main model output.


Where are the current pain points?

The following pain points were identified:

  1. Identifying and prioritising the highest risk patients.

  2. Extracting relevant risk factors from the electronic health record.

  3. Selecting effective interventions.

  4. Evaluating intervention efficacy.

Where could ML add unique value?

Risk stratification (pain point 1) emerged as an opportunity for ‘augmentation’ given the difficulty of forecasting future deterioration. The ML formulation was a model to predict adverse outpatient events, with emergency department visits and unplanned chronic disease admissions chosen as the prediction endpoints (online supplemental table S1). Identifying risk factors (pain point 2) was classed as an opportunity for ‘automation’ because there is a large volume of unstructured clinical data to sift through. The proposed ML formulation was a natural language processing tool for summarising clinical notes and extracting modifiable risk factors. Selecting interventions and evaluating their efficacy (pain points 3 and 4) were also classed as augmentation opportunities; the proposed ML formulation used causal inference approaches to estimate individualised treatment effects.
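For the risk stratification formulation, the composite endpoint has to be turned into a binary label for each patient and index date. The sketch below is a minimal illustration under stated assumptions: the event-type strings and the `horizon_days` prediction window are hypothetical placeholders (the actual endpoint definitions are given in online supplemental table S1), and each patient's history is assumed to be a list of `(date, event_type)` pairs.

```python
from datetime import date, timedelta

# Hypothetical event-type codes for the two prediction endpoints described in the text.
ENDPOINT_EVENTS = {"ed_visit", "unplanned_chronic_disease_admission"}


def adverse_event_label(
    events: list[tuple[date, str]], index_date: date, horizon_days: int
) -> int:
    """Return 1 if any endpoint event falls in (index_date, index_date + horizon_days].

    Events on the index date itself are excluded so that the label only
    reflects what happens after the prediction is made.
    """
    window_end = index_date + timedelta(days=horizon_days)
    return int(
        any(
            event_type in ENDPOINT_EVENTS and index_date < event_date <= window_end
            for event_date, event_type in events
        )
    )


history = [
    (date(2021, 3, 10), "ed_visit"),
    (date(2021, 4, 20), "elective_admission"),  # not an endpoint event
]
print(adverse_event_label(history, date(2021, 3, 1), horizon_days=30))  # 1
print(adverse_event_label(history, date(2021, 4, 1), horizon_days=30))  # 0
```

Keeping the endpoint set explicit makes it easy to revise the definition as care manager feedback changes which events count as actionable.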


How will the model output be acted on?

Online supplemental figure S1 shows example workflows and storyboards addressing the first two ML formulations above. The actionability pathway for risk scores and personalised risk factor summaries is that care managers can prepare for calls more rapidly and target their calls more effectively at patients with modifiable risk. These risk summaries could be presented to care managers monthly, alongside the existing rule-based lists of high-risk patients. To match the existing workflow, the triggering frequency for inference was set to monthly and the inclusion criteria were tailored to the managed care population.
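Matching the monthly triggering frequency and the programme's inclusion criteria amounts to rebuilding a ranked worklist once per month. The Python sketch below assumes hypothetical `is_eligible` and `score_fn` callables, standing in for the managed care inclusion criteria and a trained risk model, and flags patients who already appear on the existing rule-based list so that the two can be reviewed side by side.

```python
from datetime import date
from typing import Callable


def monthly_index_dates(start: date, n_months: int) -> list[date]:
    """First day of each month, beginning with the month containing `start`."""
    dates = []
    year, month = start.year, start.month
    for _ in range(n_months):
        dates.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return dates


def monthly_worklist(
    patient_ids: list[str],
    index_date: date,
    is_eligible: Callable[[str, date], bool],
    score_fn: Callable[[str, date], float],
    rule_based_ids: set[str],
) -> list[dict]:
    """Score eligible patients as of `index_date`, highest risk first."""
    cohort = [pid for pid in patient_ids if is_eligible(pid, index_date)]
    scored = sorted(((score_fn(pid, index_date), pid) for pid in cohort), reverse=True)
    return [
        {
            "patient_id": pid,
            "risk_score": score,
            "on_rule_based_list": pid in rule_based_ids,
        }
        for score, pid in scored
    ]


# Usage with invented data: one run per month.
run_dates = monthly_index_dates(date(2021, 11, 15), 3)  # Nov 1, Dec 1, Jan 1
risk = {"a": 0.2, "b": 0.9, "c": 0.5}  # hypothetical model outputs
worklist = monthly_worklist(
    ["a", "b", "c"],
    run_dates[0],
    is_eligible=lambda pid, d: pid != "c",  # pretend "c" is outside the programme
    score_fn=lambda pid, d: risk[pid],
    rule_based_ids={"a"},
)
```

Passing the index date to both callables keeps eligibility and scoring anchored to the same monthly snapshot.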

What criteria should the model be optimised for?

Since care managers can contact only a limited number of patients, precision (positive predictive value) at c, where c is the care team's contact capacity, was selected as the primary evaluation metric. The value of c could be set either as a percentage of the total attributable population (more generalisable across health systems) or as a fixed value (more realistic, given that care manager staffing does not scale directly with patient load). We also selected realistic baselines against which to compare the ML models: rule-based risk stratification heuristics, such as selecting recently discharged patients or those with high past utilisation.10
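Precision at c can be computed by ranking patients by score, keeping the top c and measuring the fraction who went on to have the adverse event. The Python sketch below also shows the two ways of setting c described above and compares a model against a hypothetical past-utilisation baseline; the function names and all data values are invented for illustration.

```python
import math


def capacity_from_fraction(population_size: int, fraction: float) -> int:
    """c as a fraction of the attributable population (rounded down, at least 1)."""
    return max(1, math.floor(population_size * fraction))


def precision_at_c(scores: list[float], labels: list[int], c: int) -> float:
    """Positive predictive value among the c highest-scoring patients."""
    if c <= 0:
        raise ValueError("capacity c must be positive")
    if not scores:
        raise ValueError("no patients to rank")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = ranked[:c]  # if c exceeds the population, every patient is included
    return sum(labels[i] for i in top) / len(top)


# Invented example: 6 patients, 1 = had an adverse outpatient event.
labels = [1, 0, 1, 0, 0, 0]
model_scores = [0.8, 0.1, 0.7, 0.3, 0.2, 0.05]
# Hypothetical rule-based baseline: number of prior encounters (high past utilisation).
past_utilisation = [5, 0, 2, 7, 1, 0]

c_fixed = 2  # fixed capacity, e.g. calls a care manager can make
c_pct = capacity_from_fraction(population_size=1000, fraction=0.05)  # 50

print(precision_at_c(model_scores, labels, c_fixed))  # 1.0
print(precision_at_c(past_utilisation, labels, c_fixed))  # 0.5
```

Evaluating the baseline with the same function and the same c keeps the comparison honest: both rankings are judged only on the patients a care manager could actually call.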


We applied a practical toolkit for user-centred design, involving four key questions about pain points and ML formulations, via a series of participatory design workshops with care management teams. This guided us towards the pain points of outpatient risk stratification and risk factor identification, with ML formulations involving personalised risk scoring and extraction of potentially modifiable risk factors from clinical notes. Critical choices about the setup of the ML model were informed by workflow considerations: the endpoint definition, the triggering frequency and the inclusion criteria. Importantly, the evaluation metrics must be tailored to the care management workflow. In this case, there was a capacity constraint on how many patients a care manager can contact each day or week. Hence, the most pragmatic metric was the precision of the model on the c highest risk patients, rather than global discrimination metrics such as the area under the receiver operating characteristic curve (ROC-AUC) or the area under the precision-recall curve (PR-AUC).
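The gap between a global metric and a capacity-constrained one can be made concrete with a small constructed example. In the Python sketch below (invented data; ROC-AUC computed with the pairwise Mann-Whitney formulation rather than a library call), two models have the same ROC-AUC of 2/3, yet only one fills the top c = 2 slots that a care manager could call with true positives.

```python
import math


def roc_auc(scores: list[float], labels: list[int]) -> float:
    """ROC-AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_at_c(scores: list[float], labels: list[int], c: int) -> float:
    """Fraction of the c highest-scoring patients who are positive."""
    if c <= 0:
        raise ValueError("capacity c must be positive")
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:c]
    return sum(labels[i] for i in top) / len(top)


# Invented cohort: patients 0-2 had the adverse event, patients 3-7 did not.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.2, 0.7, 0.6, 0.5, 0.4, 0.3]  # two positives at the very top
model_b = [0.8, 0.7, 0.4, 0.9, 0.6, 0.5, 0.3, 0.2]  # a negative ranked first

for name, scores in [("A", model_a), ("B", model_b)]:
    print(name, round(roc_auc(scores, labels), 3), precision_at_c(scores, labels, c=2))
# A 0.667 1.0
# B 0.667 0.5
```

Model A sacrifices one positive to the bottom of the list but concentrates the others at the head, which is the part of the ranking a capacity-limited team acts on.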

This study is limited by its focus on a single clinical use case and by its reliance on workshops and concept probes as the only media for UXR, given the challenges of direct field observation during the pandemic. Future work will report the results of the ML models developed through this UXR collaboration.


User-centred design is important for developing ML tools that address a real clinical pain point and dovetail with existing workflows. An iterative approach involving stakeholder interviews and concept feedback can be used to identify pain points, pinpoint where a model could add unique value, understand the actionability pathway and prioritise evaluation metrics.


Ethics statements

Patient consent for publication


Special thanks to the leadership and clinical staff of the Stanford University HealthCare Alliance.


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


  • Twitter @martin_sen

  • MGS and RCL contributed equally.

  • Contributors MGS, RCL, NS devised the project. MS led the UX study sessions with support from MGS, LV, DL-M, DW, KE-K, LTF, JBK, EL, M-JC, PG, RCL, KEM, NLD, JD, JHC. MGS and RCL wrote the initial draft and all other authors reviewed the manuscript.

  • Funding This study was funded by Google (N/A).

  • Competing interests This research was funded by Google LLC. LTF is now an employee of Marpai Inc. but work for this paper was completed at Google.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.