Computer Assisted Radiology and Surgery

Effects of incorrect computer-aided detection (CAD) output on human decision-making in mammography
Material and methods
We first outline the data collection methods used in the HTA trial, because we used essentially the same methodology in our follow-up studies. We then detail the specifics of our experiments, in particular their rationale and the methodological aspects in which they differ from the original trial.
Supplementary analyses of data from the HTA trial
The administrators of the HTA trial compared the readers' sensitivity and specificity in the unprompted condition with their sensitivity and specificity in the prompted condition. These analyses showed that the prompts had no significant impact on the readers' sensitivity and specificity, neither improving nor diminishing them (3).
We were granted access to the trial data and conducted supplementary analyses focusing on the instances in which the readers made different decisions for the same case.
Discussion
Our supplementary analyses of the data from the HTA trial suggest that the output of the CAD tool did affect the readers' decision-making, even though there was no statistically significant effect on their average performance in terms of sensitivity and specificity. We cannot entirely exclude the possibility that the variations we observed were due to random error (eg, it is not uncommon for experts to change their decisions in successive presentations of the same case). However, our
Acknowledgment
The authors would like to thank R2 Technologies (especially Gek Lim, Jimmy Roerigh, and Julian Marshall) for their support in obtaining the data samples for our follow-up studies; Paul Taylor and Jo Champness (from University College London) for granting us access to their data, facilitating the follow-up studies, and helping run them; and DIRC collaborators Mark Hartswood, Rob Procter, and Mark Rouncefield for their advice.
References (11)
- et al. Modelling software design diversity – a review. ACM Comput Surveys (2001)
- Strigini L, Povyakalo A, Alberdi E. Human-machine diversity in the use of computerised advisory systems: a case study....
- et al. An evaluation of the impact of computer-based prompts on screen readers' interpretation of mammograms. Br J Radiol (2004)
- US Food and Drug Administration. Pre-market approval decision. Application P970058. June 26, 1998. Available at...
- et al. Improved computer aided detection (CAD) algorithms for screening mammography. Radiology (2000)
The work described in this article has been partly funded by the UK's Engineering and Physical Sciences Research Council (EPSRC) through DIRC (the Interdisciplinary Research Collaboration on Dependability, which studies the dependability of computer-based systems; http://www.dirc.org.uk).