Implementer Report

Healthcare provider evaluation of machine learning-directed care: reactions to deployment on a randomised controlled study

Abstract

Objectives Clinical artificial intelligence and machine learning (ML) face barriers related to implementation and trust. There have been few prospective opportunities to evaluate these concerns. System for High Intensity EvaLuation During Radiotherapy (NCT03775265) was a randomised controlled study demonstrating that ML accurately directed clinical evaluations to reduce acute care during cancer radiotherapy. We characterised subsequent perceptions and barriers to implementation.

Methods An anonymous 7-question Likert-type scale survey with optional free text was administered to multidisciplinary staff focused on workflow, agreement with ML and patient experience.

Results 59/71 (83%) responded. 81% disagreed/strongly disagreed their workflow was disrupted. 67% agreed/strongly agreed patients undergoing intervention were high risk. 75% agreed/strongly agreed they would implement the ML approach routinely if the study was positive. Free-text feedback focused on patient education and ML predictions.

Conclusions Randomised data and firsthand experience support positive reception of clinical ML. Providers highlighted future priorities, including patient counselling and workflow optimisation.

Introduction

Artificial intelligence (AI) and machine learning (ML) have the potential to transform medical practice. Despite many retrospective studies, randomised controlled trials (RCTs), particularly interventional trials, remain limited.1–3 Thus, there have been limited opportunities to formally characterise barriers to the implementation of healthcare AI and ML and identify solutions.4 5 There are minimal reports describing provider opinions following a prospective randomised interventional study of healthcare ML.

One application of healthcare ML is in the prediction and reduction of acute care (emergency visits and hospitalisations) during outpatient cancer therapy,1 6–9 prioritised by the Centers for Medicare and Medicaid Services.10 The System for High Intensity EvaLuation During Radiotherapy study (SHIELD-RT; NCT03775265) was a randomised controlled quality improvement study of an ML model predicting acute care visits (emergency department visits and/or hospitalisation) during radiotherapy (RT) or chemoradiotherapy (CRT).1 6 ML identified high-risk patients for supplemental clinical evaluations, which reduced acute care rates from 22.3% to 12.3%, with low-risk patients experiencing a 2.7% rate. Radiation oncology care uniquely requires a diverse clinical staff, including attending and resident physicians, advanced practice providers (APPs), nurses and radiation therapists (RTTs), each with different viewpoints on how ML can optimally play a role in delivering care. Following the completion but prior to final analysis of SHIELD-RT, we administered a survey to understand the perspectives of healthcare providers with regard to the acceptability and feasibility of ML-directed strategies, addressing key components of the implementation outcomes framework.11 The objective was to evaluate specific barriers to planned long-term implementation.
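For orientation, the acute care rates quoted above can be restated as absolute and relative reductions. This is simple arithmetic on the two reported percentages only; it is not an additional analysis from the study, and the symbols ARR and RRR are introduced here purely for illustration.

```latex
\begin{align*}
\text{ARR} &= 22.3\% - 12.3\% = 10.0 \text{ percentage points},\\
\text{RRR} &= \frac{22.3\% - 12.3\%}{22.3\%} \approx 44.8\%.
\end{align*}
```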

Methods

We conducted a single-institution survey of perceptions of SHIELD-RT, during which all outpatient adult courses of RT and CRT initiated from 7 January 2019 to 30 June 2019 were evaluated during the first week of treatment by ML to identify high-risk patients with >10% risk of an acute care visit during RT.1 6 Patients were randomised to standard of care (mandatory weekly on-treatment and clinically indicated ad hoc visits) versus mandatory twice-weekly visits. Interventional second weekly visits were facilitated through an alert that notified RTTs to bring patients to an appropriate clinic room to then be seen by an APP, nurse clinician, resident physician or attending physician. The primary endpoint was rate of acute care visits during RT. Additional details of SHIELD-RT and its primary analysis and implementation workflow were previously reported.1 12
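The triage step described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's software: the `Course` class, the `predicted_risk` field and both function names are hypothetical. It also assumes that only high-risk courses were randomised and that allocation was simple 1:1, neither of which is specified here. Only the >10% threshold and the two arms are taken from the text.

```python
import random
from dataclasses import dataclass

# From the text: "high risk" means a predicted acute care risk above 10%.
RISK_THRESHOLD = 0.10


@dataclass
class Course:
    """One RT/CRT treatment course (hypothetical structure)."""
    course_id: str
    predicted_risk: float  # hypothetical ML output, as a probability in [0, 1]


def is_high_risk(course: Course, threshold: float = RISK_THRESHOLD) -> bool:
    """Flag a course whose predicted risk strictly exceeds the threshold."""
    return course.predicted_risk > threshold


def assign_arms(courses: list[Course], seed: int = 0) -> dict[str, str]:
    """Assign each course to a follow-up schedule.

    Assumption: low-risk courses stay on standard weekly visits and
    high-risk courses are allocated 1:1 by simple randomisation.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    arms = {}
    for course in courses:
        if is_high_risk(course):
            arms[course.course_id] = rng.choice(["standard weekly", "twice weekly"])
        else:
            arms[course.course_id] = "standard weekly (low risk)"
    return arms
```

In this sketch, a course at exactly 10% predicted risk is treated as low risk, because the text defines high risk as greater than 10%.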

Involved attending and resident physicians, APPs, nurses and RTTs were invited to participate in an anonymous survey to characterise workflow satisfaction and evaluate potential barriers to future adoption. This included seven questions on a Likert-type scale characterising respondents’ attitudes with an optional free-text comment field.

Results

A total of 59/71 (83%) of invited staff completed the survey, including 14/16 attending physicians (MD), 9/9 resident physicians, 3/5 APPs, 10/11 nurses and 23/30 RTTs (table 1). Eighty-one per cent of staff disagreed or strongly disagreed that the study disrupted their workflow. Only 51% of respondents agreed or strongly agreed that they were aware of their patients undergoing the intervention; 3% agreed that their clinical management beyond the study intervention was altered. Of those aware of patients seen twice weekly, 67% agreed or strongly agreed that patients undergoing intervention were high risk. Most staff (64%) neither agreed nor disagreed that patients understood the study. Willingness for future adoption was favourable, as 75% of respondents agreed or strongly agreed that they would implement the intervention routinely if the study was positive; 41% agreed or strongly agreed and none disagreed that their opinion of clinical ML improved following the study.
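The role-level counts above can be checked against the reported totals with a short tally. This is only a sketch: the counts are copied from the text, while the dictionary layout and the `pct` helper are illustrative choices.

```python
# (responded, invited) per role, as reported in the Results.
responses = {
    "Attending physicians": (14, 16),
    "Resident physicians": (9, 9),
    "APPs": (3, 5),
    "Nurses": (10, 11),
    "RTTs": (23, 30),
}


def pct(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


total_responded = sum(n for n, _ in responses.values())
total_invited = sum(d for _, d in responses.values())

# Share of respondents who were APPs, nurses or RTTs.
non_physician = sum(responses[role][0] for role in ("APPs", "Nurses", "RTTs"))

print(f"Overall: {total_responded}/{total_invited} ({pct(total_responded, total_invited)}%)")
print(f"APPs, nurses and RTTs: {non_physician}/{total_responded} "
      f"({pct(non_physician, total_responded)}% of respondents)")
for role, (n, d) in responses.items():
    print(f"{role}: {n}/{d} ({pct(n, d)}%)")
```

The tally reproduces the overall 59/71 (83%) response rate and the 61% share of respondents who were APPs, nurses or RTTs.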

Table 1 Responses to survey questions

There were 8 (16%) free-text comments. Three (two RTT and one nurse) indicated confusion among staff and patients with the need and logistics of the supplemental visit. One nurse noted that they felt ML overestimated the risk of their patients (specifically in brain tumours). Two MD responses indicated that they had minimal contact with patients on study. Two (one MD and one RTT) responses expressed anticipation for the results of the study.

Discussion

Our study highlights an overall positive reception towards ML implementation in an academic radiation oncology clinic. Our survey supports that RCT results drive willingness to routinely adopt clinical ML. ML-guidance and supplemental visits were integrated successfully into our clinical workflow with minimal perceived disruption.

This analysis shows how some concerns regarding ML may be overcome. In addition to randomised evidence, direct observation of ML operating in a controlled setting may have improved the subjective opinions of clinical ML held prior to the study. This is instrumental given recent data demonstrating the limitations of commercial prediction models,13 and ultimately, subsequent to this survey, the SHIELD-RT analysis demonstrated a reduction in acute care events.1 While ML will continue to require complementary input from healthcare professionals, these survey results are promising for adoption.14 Our clinic is currently incorporating this ML-directed clinical strategy into routine practice.

Overall, ML implementation had limited provider-perceived impact on clinical workflows, to the point of reducing MD awareness as indicated by survey responses. This was intentional in the design to minimise extra cognitive and functional effort to improve the likelihood of MD adoption.1 12 One relative exception to this was surveyed APPs, the majority of whom participated in the interventional second mandatory clinical evaluation. This suggests that ML-guided interventions may place greater burden on specific staff. This cost must be considered in model and interventional design.

Among limited free-text comments, staff reservations focused on patient education and ML risk predictions. Patients were not surveyed, although staff both anecdotally and in the survey highlighted logistical challenges surrounding location and timing of supplemental visits. While patients were educated when undergoing the supplemental evaluation, the neutral evaluation of patient understanding and anecdotal responses highlight the reported challenges of explaining the algorithm and its clinical implications to patients. This emphasises the need for transparent and explainable approaches, especially given increasingly opaque AI methods. Despite the single comment noting concern for overestimation, calibration analyses previously reported in the primary study results demonstrated good model performance in comparison to clinicians, who were more inconsistent, with wide CIs, and assigned a 0% risk to a patient who had an acute care event.1 It is possible that, over time, both improved explainability and consistent observation of ML accuracy may lead to longitudinal improvements in clinician perception.

There are limitations to our study. We surveyed staff only following completion of the study, and direct comparisons pre-SHIELD-RT and post-SHIELD-RT were not possible. The results of this survey may be subject to bias, though we had a high rate of completion (83%) across a range of roles, with a high representation of non-academic staff (61% of respondents; APPs, nurses and RTTs).

The results of this study inform our future directions, primarily emphasising the importance of RCTs in demonstrating clinical ML benefit and highlighting the need for concerted efforts in patient and staff education. Other ongoing work focuses on optimising workflows, patient logistics, long-term ML surveillance and generalisability.