Integration of AI into clinical systems
When automation started to be deployed at scale in industrial systems, human factors research on ‘automation surprises’ and the ‘ironies of automation’ explained some of the problems that appeared with its introduction.13 14 The fundamental fallacy is the assumption that automation simply replaces people; in reality, automation changes and transforms what people do.15 Clinical systems are not necessarily comparable to commercial aircraft or autonomous vehicles. However, looking across these different industries can help to highlight human factors challenges that are likely to require consideration when adopting AI in patient care. The human factors challenges discussed below relate to cognitive aspects (automation bias and human performance), handover and communication between clinicians and AI systems, situation awareness and the impact on the interaction with patients (see figure 1).
Figure 1 Overview of human factors challenges of using artificial intelligence in patient care.
Automation bias
Studies in aviation dating back to the 1980s and 1990s and analyses of incident reports recorded in the Aviation Safety Reporting System found that pilots frequently failed to monitor important flight indicators or to disengage the autopilot and automated flight management systems in the event of a malfunction.16 17 For example, in 1985, the US National Transportation Safety Board (NTSB) investigated an incident involving a China Airlines Boeing 747SP flying from Taipei to Los Angeles. When the aircraft was close to San Francisco, an engine failed. The autopilot took mitigating actions but did not alert the pilots to the problem. The pilots only became aware of the engine failure when they disengaged the autopilot and the aircraft rolled over and entered an uncontrolled descent. The NTSB report concluded that ‘the probable cause of this accident was the captain’s preoccupation with an inflight malfunction and his failure to monitor properly the airplane’s flight instruments […] Contributing to the accident was the captain’s over-reliance on the autopilot […]’ (NTSB report AAR-86/03).
This phenomenon is referred to as automation bias or automation-induced complacency, and represents an example of inappropriate decision-making as a result of over-reliance on automation.18 Automation bias can lead to errors of omission, where people do not take a required action because the automation failed to alert them, and to errors of commission, where people follow the inappropriate advice of an automated system.16
The speed with which people start to rely, and over-rely, on automation and AI-driven autonomous systems might come as a surprise to many, as a recent study in the automotive domain has demonstrated.19 In that study, 49 experienced drivers were instructed on the limitations of a partially autonomous car and were asked to complete a 30 min commute in it for 1 week. By the end of the week, most of the drivers were no longer watching the road and instead spent about 80% of their time on their smartphones or reading books and documents.
Healthcare is transitioning towards digital and AI-supported clinical environments at a rapid pace, and we can and should expect clinicians to come to trust and rely on the technology. This brings with it the risk of automation bias, which could affect clinician decision-making for millions of patients. Automation bias introduced with clinical decision support systems has been highlighted in a number of studies. An early study of radiologists interpreting mammograms found that, in certain situations, the performance of expert radiologists deteriorated when they were supported by a decision support system that highlighted specific areas to focus on.20 A study investigating the impact of decision support on the accuracy of ECG interpretation found that while correct decision support classifications increased the accuracy of clinicians (non-cardiologists), incorrect classifications decreased their accuracy from 56% to 48%.21 Similar effects were found in a study of clinical decision support in electronic prescribing systems.22 That study found that clinical decision support reduced prescribing errors when working correctly but increased prescribing errors by around one-third when the system either did not alert the clinician to a potential problem or provided the wrong advice. A review of the literature on automation bias in healthcare identified six studies investigating its impact on errors.23 The review concluded that task complexity (eg, diagnosis supported by a clinical decision support system) and task load (ie, the number of task demands) increased the likelihood of over-reliance on automation.
Many, if not most, AI systems will be advertised as having ultrahigh reliability, and it is to be expected that in due course clinicians will come to rely on these systems. However, studies on automation bias suggest that reliability figures by themselves do not predict what will happen in clinical use, when the clinician is confronted with a potentially inaccurate system output.20 How easy or difficult will it be to spot such an output, and how will the potential for automation bias be guarded against?
Impact on human performance
Expertise is built through frequent exposure and training. The current generation of human car drivers is reasonably skilled in managing complex traffic situations because many of us do it on an everyday basis. Will the generation that has grown up with autonomous vehicles have the same level of basic driving skill needed to retake control in highly time-critical and complex traffic situations when the AI system fails? This question is particularly relevant in healthcare, where healthcare professionals take pride in their professional skill sets. Will the expertise of radiographers deteriorate when they are exposed only to the images preselected by an AI system rather than the broad range of images they currently train on day by day?24
Ironically, AI algorithms are frequently trained and validated against baseline data derived from human performance (eg, radiologists’ reading of images). The erosion of training opportunities and hands-on skills for clinicians following the introduction of AI systems might therefore create a vicious circle in which the quality of the baseline data deteriorates in the long term.
Handover
A key argument for the safety of autonomous vehicles is that the driver is able to take control in emergencies or unforeseen situations. However, the well-publicised fatal Tesla accidents of Josh Brown in 2016 and, more recently, of Jeremy Banner in March 2019 tragically demonstrate that drivers do not always take control from the autopilot when required. Research has called into question whether such an assumption is realistic in the first place, considering the short reaction time available.25
Handover is a well-recognised safety-critical task in the delivery of care, although traditionally we think of handover as taking place between clinicians or teams of clinicians.26 In the future, handover between humans and autonomous AI systems will become increasingly important, and one might assume that it will be even more complex than handover between the autopilot and the driver of an autonomous vehicle.
The AI system needs to recognise the need to hand over. While this might be achievable, the AI also needs to determine what to hand over, how this should be done and when. In human handover, we have recognised the need for structured communication protocols to convey clearly the salient features of a situation, for example, age, time, mechanism, injuries, signs, treatments (ATMIST) in emergency care or situation, background, assessment, recommendation (SBAR) more generally. Should there be an equivalent for human-AI handover?
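As a purely illustrative sketch of what such an equivalent might look like, the example below shows an SBAR-style handover message generated by an autonomous device. The field names, clinical content and rendering are assumptions for the purpose of illustration, not a validated protocol.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical SBAR-style handover message from an autonomous device to a clinician.
# Field names and example content are illustrative assumptions, not a validated protocol.
@dataclass
class AIHandover:
    situation: str        # what the system can no longer manage, and since when
    background: str       # relevant history, eg, adjustments the system has already made
    assessment: str       # the system's view of why it is handing over, including uncertainty
    recommendation: str   # suggested next action and how urgently it is needed
    issued_at: datetime = field(default_factory=datetime.now)

    def as_alert_text(self) -> str:
        """Render the handover as a short structured alert for the clinician."""
        return (f"[{self.issued_at:%H:%M}] SITUATION: {self.situation}\n"
                f"BACKGROUND: {self.background}\n"
                f"ASSESSMENT: {self.assessment}\n"
                f"RECOMMENDATION: {self.recommendation}")

# Example: an insulin infusion pump escalating control back to the clinician.
msg = AIHandover(
    situation="Blood glucose 14.2 mmol/L and rising despite maximum configured basal rate.",
    background="Basal rate increased three times over the last 2 hours; correction bolus given at 09:40.",
    assessment="Unable to maintain target range; possible infusion-set failure or unrecorded intake.",
    recommendation="Clinician review within 15 min; check infusion site and recent meals/medications.",
)
print(msg.as_alert_text())
```

The point of the sketch is not the particular fields but that a machine-generated handover, like a human one, needs an agreed structure so that the salient features of the situation are conveyed rather than a bare alarm.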
For example, if an autonomous infusion pump delivering insulin starts to recognise that it is struggling to maintain blood sugar levels, at what point should it trigger an alarm to initiate handover? Identifying the right moment requires trading off accuracy against timeliness. Should the handover simply convey the infusion pump’s inability to maintain blood sugar levels, or should the pump provide further information about the adjustments it has already made? Is the best strategy to wait for the infusion pump to trigger an alarm and initiate handover, or should we ensure that the clinician is able to anticipate that a need to retake control is emerging?
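The accuracy–timeliness trade-off can be made concrete with a minimal sketch, assuming a simple rule-based trigger; the glucose threshold, sampling interval and number of consecutive readings are illustrative assumptions, not clinical recommendations.

```python
# Illustrative escalation trigger for an autonomous insulin pump (assumed thresholds).
# Alerting on a single out-of-range reading is timely but noisy; requiring many
# consecutive readings is more accurate but delays handover to the clinician.
def should_hand_over(glucose_readings_mmol_l: list[float],
                     upper: float = 10.0,
                     consecutive_needed: int = 3) -> bool:
    """Return True once the last `consecutive_needed` readings all exceed `upper`."""
    if len(glucose_readings_mmol_l) < consecutive_needed:
        return False
    recent = glucose_readings_mmol_l[-consecutive_needed:]
    return all(reading > upper for reading in recent)

readings = [8.9, 9.6, 10.4, 11.1, 12.0]  # 5 min samples, most recent last
print(should_hand_over(readings))                         # True: last three readings above 10.0
print(should_hand_over(readings, consecutive_needed=5))   # False: earlier readings were in range
```

Raising `consecutive_needed` makes the trigger less prone to false alarms but hands over later; lowering it does the opposite. Where to place that dial is exactly the human factors question raised above.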
These questions are fundamentally about how the AI will support clinicians and clinical teams, and how their interaction can be optimised.
Situation awareness
Individuals and teams perform more successfully when they have good situation awareness.27 Traditional handover contributes to the development of shared situation awareness, and it enables discussion and dialogue.28 While it might be possible to create autonomous agents that have high reliability, questions arise about what an autonomous system should communicate to clinicians during normal operation to enable them to maintain situation awareness. This is not straightforward to answer by looking at one AI system in isolation, because clinicians might be interacting with many autonomous agents (eg, multiple infusion pumps) concurrently, and the design of this communication has to consider human information needs and limitations.
Autonomous agents need to build situation awareness, too. An autonomous infusion pump needs to know whether the patient is receiving other medications that might affect the patient’s physiology and response. These medications might come via other infusion pumps, or they might be given by the clinician. The saying ‘if it’s not documented, it didn’t happen’ applies here with critical consequence: if relevant activities are going on that are not documented and communicated to the autonomous agent (eg, the infusion pump), then as far as the AI is concerned, they literally did not happen, because the system has no way of knowing about them. The results could be catastrophic.
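A minimal sketch makes the point, assuming a hypothetical adjustment rule in which the pump raises its basal rate when a documented glucose-raising drug is on the record; the drug names, doses and adjustment factor are illustrative assumptions only.

```python
# Illustrative only: the pump's 'situation awareness' is limited to the documented record.
documented_comedications = [
    {"drug": "hydrocortisone", "route": "IV", "time": "08:00"},  # raises blood glucose
]

def adjust_insulin_rate(current_rate_u_per_h: float, comedications: list[dict]) -> float:
    """Increase the basal rate if a documented glucose-raising drug is on the record."""
    glucose_raising = {"hydrocortisone", "prednisolone", "dexamethasone"}
    if any(med["drug"] in glucose_raising for med in comedications):
        return current_rate_u_per_h * 1.2  # assumed adjustment factor
    return current_rate_u_per_h

# A steroid given at the bedside but never documented is invisible to the pump:
# as far as the algorithm is concerned, it did not happen, so no adjustment is made.
print(adjust_insulin_rate(1.0, documented_comedications))  # 1.2 (documented steroid seen)
print(adjust_insulin_rate(1.0, []))                        # 1.0 (undocumented steroid ignored)
```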
Patient interaction
AI can improve the efficiency of clinical processes and free up clinician time for other tasks. This is potentially very useful in a pressured health system. However, another way of looking at this is that there might be fewer clinicians, occupied with other tasks, potentially away from the patient. Might AI-enabled intensive care units make do with fewer nurses and therefore increase the number of patients per nurse? This might be a worry for patients, because they might see less of their clinicians and find it harder to provide feedback about their care and their condition. For example, if a needle comes unstuck, the patient might be aware of this before the AI system, and could potentially avoid or mitigate any adverse effect, but who does the patient communicate this to?
Providing healthcare means being responsive to a patient’s physiological as well as personal and emotional needs. In some clinical settings, such as the intensive care unit, the bond between nurse and patient is very strong, and for many patients, their episode in intensive care is traumatic. How will the introduction of AI and autonomous systems in these environments affect this unique relationship? It has been argued that AI might actually create more opportunities for empathy and caring because it might allow clinicians to focus more on these aspects of care.29 However, whether this is the case, or whether the caring aspect is eroded by transforming, for example, nursing care into AI specialist nurses who ‘care’ for autonomous systems (ie, supervise them), remains to be seen.
The introduction of AI at scale has the potential to fundamentally change and disrupt communication between patients and their clinicians. Will hospitals become similar to automated supermarket checkouts, with frustrated customers waiting for an overstretched employee to attend to the frequent hassles at the checkout? To date, these issues have received too little attention compared with the focus on accuracy and performance of the AI in isolation.