Academic Radiology

Volume 11, Issue 8, August 2004, Pages 909-918

Computer Assisted Radiology and Surgery
Effects of incorrect computer-aided detection (CAD) output on human decision-making in mammography

https://doi.org/10.1016/j.acra.2004.05.012

Rationale and objectives

To investigate the effects of incorrect computer output on the reliability of the decisions of human users. This work followed an independent UK clinical trial that evaluated the impact of computer-aided detection (CAD) in breast screening. The aim was to use data from this trial to feed probabilistic models (similar to those used in “reliability engineering”) that would identify and assess possible ways of improving the human–CAD interaction. Some analyses required extra data; therefore, two supplementary studies were conducted. Study 1 was designed to elucidate the effects of computer failure on human performance. Study 2 was conducted to clarify unexpected findings from Study 1.
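As a hedged illustration of the kind of reliability-engineering model referred to above (the models actually fitted in the project are not reproduced here), a reader working with CAD can be viewed as a two-channel detection system that misses a cancer only when both channels fail:

$$P(\text{system miss}) = P(H \cap C) = P(H)\,P(C \mid H),$$

where $H$ is the event that the human reader misses the cancer and $C$ the event that CAD fails to prompt it. The independence approximation $P(H)P(C)$ is optimistic whenever the two channels tend to fail on the same difficult cases, that is, whenever $P(C \mid H) > P(C)$.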

Materials and methods

In Study 1, 20 film readers viewed 60 sets of mammograms (30 of which contained cancer) and provided “recall/no recall” decisions for each case. Computer output for each case was available to the participants. The test set was designed to contain an unusually large proportion (50%) of cancers for which CAD had generated incorrect output. In Study 2, 19 different readers viewed the same set of cases in similar conditions except that computer output was not available.

Results

The average sensitivity of readers in Study 1 (with CAD) was significantly lower than the average sensitivity of readers in Study 2 (without CAD). The difference was most marked for cancers for which CAD failed to provide correct prompting.
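As a minimal sketch of how such per-subset sensitivities can be computed: the 15/15 split below follows the study design (30 cancers, 50% with incorrect CAD output), but all recall counts are invented for illustration and are not the trial's data.

```python
# Illustrative only: comparing reader sensitivity per case subset.
# Only the 15/15 subset split follows the study design; the recall
# counts are invented.

def sensitivity(recalled: int, total: int) -> float:
    """Sensitivity = TP / (TP + FN): fraction of cancer cases recalled."""
    return recalled / total

# (recalled, total) cancer cases per subset; "CAD failed" in Study 2
# means CAD *would have* failed, since those readers saw no computer output.
study1 = {"CAD correct": (13, 15), "CAD failed": (8, 15)}   # with CAD
study2 = {"CAD correct": (12, 15), "CAD failed": (12, 15)}  # without CAD

for subset in ("CAD correct", "CAD failed"):
    s1 = sensitivity(*study1[subset])
    s2 = sensitivity(*study2[subset])
    print(f"{subset}: Study 1 (with CAD) {s1:.2f} vs Study 2 (without) {s2:.2f}")
```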

Conclusion

Possible automation bias effects in CAD use deserve further study because they may degrade human decision-making for some categories of cases under certain conditions. This possibility should be taken into account in the assessment and design of CAD tools.

Section snippets

Material and methods

We first outline the data collection methods used in the HTA trial because we used essentially the same methodology in our follow-up studies. We then detail the specifics of our experiments, in particular their rationale and the methodologic aspects in which they differ from the original trial.

Supplementary analyses of data from the HTA trial

The administrators of the HTA trial compared the sensitivity and specificity of the readers in the unprompted condition with their sensitivity and specificity in the prompted condition. The analyses showed that the prompts had no statistically significant impact, positive or negative, on the readers' sensitivity and specificity (3).

We were granted access to the trial data and conducted supplementary analyses focusing on the instances in which the readers made different decisions for the same case…
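One standard way to analyze such discordant paired decisions is McNemar's test; the sketch below is illustrative only (the trial's actual statistical methods are not reproduced here, and all counts are invented).

```python
# A minimal sketch, assuming discordant paired decisions are analyzed
# with McNemar's test; the counts below are invented for illustration.
from statsmodels.stats.contingency_tables import mcnemar

# 2x2 table of recall decisions for the same cases under both conditions:
# rows = unprompted (recall / no recall), cols = prompted (recall / no recall).
table = [[45, 6],    # 6 cases recalled unprompted but not when prompted
         [9, 120]]   # 9 cases recalled only when prompted

# Only the discordant cells (6 and 9) matter: does prompting shift
# decisions systematically in one direction?
result = mcnemar(table, exact=True)
print(f"statistic = {result.statistic}, p-value = {result.pvalue:.3f}")
```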

Discussion

Our supplementary analyses of the data from the HTA trial suggest that the output of the CAD tool did have an effect on the readers' decision-making even if there was no statistically significant effect on their average performance in terms of sensitivity and specificity. We cannot entirely exclude the possibility of the variations we observed being because of random error (eg, it is not uncommon for experts to change their decisions in successive presentations of the same case). However, our…

Acknowledgment

The authors would like to thank R2 Technologies (especially Gek Lim, Jimmy Roerigh, and Julian Marshall) for their support in obtaining the data samples for our follow-up studies; Paul Taylor and Jo Champness (from University College London) for granting us access to their data, facilitating the follow-up studies, and helping run them; and DIRC collaborators Mark Hartswood, Rob Procter, and Mark Rouncefield for their advice.

References (11)

  • B. Littlewood et al. Modelling software design diversity – A review. ACM Comput Surveys (2001)
  • Strigini L, Povyakalo A, Alberdi E. Human-machine diversity in the use of computerised advisory systems: a case study....
  • P.M. Taylor et al. An evaluation of the impact of computer-based prompts on screen readers' interpretation of mammograms. Br J Radiol (2004)
  • US Food and Drug Administration. Pre-market approval decision. Application P970058. June 26, 1998. Available at...
  • R. Castellino et al. Improved computer aided detection (CAD) algorithms for screening mammography. Radiology (2000)
There are more references available in the full text version of this article.

Cited by (85)

  • Ethical Implications of Artificial Intelligence in Gastroenterology

    2024, Clinical Gastroenterology and Hepatology
  • Stakeholder perceptions of the safety and assurance of artificial intelligence in healthcare

    2022, Safety Science
    Citation Excerpt:

    The benefits might be particularly relevant in patients that require a lot of attention and receive several medications and interventions concurrently, as such situations are especially demanding. Similar expectations were held for previous generations of clinical decision support systems, as well as more broadly for highly automated systems across different industries, but numerous studies as well as accident investigations demonstrated that the assumption that automation reduces human error and thereby improves safety is overly simplistic (Cabitza et al., 2017; Bainbridge, 1983; Alberdi et al., 2004). However, some participants also pointed out that AI has the potential to cause or contribute to patient harm.

  • AI-assistance for predictive maintenance of renewable energy systems

    2021, Energy
    Citation Excerpt:

    In addition, researchers in the medical field have examined the system’s effect on diagnosis tasks in relation to the radiologist’s proficiency level, because radiologists may differ in how much room for performance improvement they have when affected by the assistance system [21,36]. The effect of incorrect detection by CAD has been investigated at more specific levels, such as false positives and false negatives, because the system cannot provide 100% correct answers [37,38]. The effects of the system on user perception also have to be considered.

  • Considering the Safety and Quality of Artificial Intelligence in Health Care

    2020, Joint Commission Journal on Quality and Patient Safety

The work described in this article has been partly funded by the UK's Engineering and Physical Sciences Research Council (EPSRC) through DIRC (the Interdisciplinary Research Collaboration on Dependability, which studies the dependability of computer-based systems; http://www.dirc.org.uk).
