Computer Assisted Radiology and Surgery

Effects of incorrect computer-aided detection (CAD) output on human decision-making in mammography
Material and methods
We first outline the data collection methods used in the HTA trial, because we used essentially the same methodology in our follow-up studies. We then detail the specifics of our experiments, in particular their rationale and the methodological aspects in which they differ from the original trial.
Supplementary analyses of data from the HTA trial
The administrators of the HTA trial compared the readers' sensitivity and specificity in the unprompted condition with their sensitivity and specificity in the prompted condition. These analyses showed that the prompts had no significant impact on the readers' sensitivity and specificity, neither improving nor diminishing them (3).
We were granted access to the trial data and conducted supplementary analyses focusing on the instances in which the readers made different decisions for the same case.
Discussion
Our supplementary analyses of the data from the HTA trial suggest that the output of the CAD tool did affect the readers' decision-making, even though there was no statistically significant effect on their average performance in terms of sensitivity and specificity. We cannot entirely exclude the possibility that the variations we observed were due to random error (eg, it is not uncommon for experts to change their decisions in successive presentations of the same case). However, our
Acknowledgment
The authors would like to thank R2 Technologies (especially Gek Lim, Jimmy Roerigh, and Julian Marshall) for their support in obtaining the data samples for our follow-up studies; Paul Taylor and Jo Champness (from University College London) for granting us access to their data, facilitating the follow-up studies, and helping run them; and DIRC collaborators Mark Hartswood, Rob Procter, and Mark Rouncefield for their advice.
References (11)
- et al. Modelling software design diversity – a review. ACM Comput Surveys (2001)
- Strigini L, Povyakalo A, Alberdi E. Human-machine diversity in the use of computerised advisory systems: a case study....
- et al. An evaluation of the impact of computer-based prompts on screen readers' interpretation of mammograms. Br J Radiol (2004)
- US Food and Drug Administration. Pre-market approval decision. Application P970058. June 26, 1998. Available at...
- et al. Improved computer aided detection (CAD) algorithms for screening mammography. Radiology (2000)
The work described in this article has been partly funded by the UK's Engineering and Physical Sciences Research Council (EPSRC) through DIRC (the Interdisciplinary Research Collaboration on Dependability, which studies the dependability of computer-based systems; http://www.dirc.org.uk).