Article Text

How machine learning is embedded to support clinician decision making: an analysis of FDA-approved medical devices
  1. David Lyell,
  2. Enrico Coiera,
  3. Jessica Chen,
  4. Parina Shah and
  5. Farah Magrabi
  1. Australian Institute of Health Innovation, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, NSW, Australia
  1. Correspondence to Dr David Lyell; david.lyell@mq.edu.au

Abstract

Objective To examine how and to what extent medical devices using machine learning (ML) support clinician decision making.

Methods We searched for medical devices that were (1) approved by the US Food and Drug Administration (FDA) up to February 2020; (2) intended for use by clinicians; (3) intended to support clinical tasks or decisions and (4) used ML. Descriptive information about the clinical task, device task, device input and output, and ML method was extracted. The stage of human information processing automated by ML-based devices and their level of autonomy were assessed.

Results Of 137 candidates, 59 FDA approvals for 49 unique devices were included. Most approvals (n=51) were granted since 2018. Devices commonly assisted with diagnostic (n=35) and triage (n=10) tasks. Twenty-three devices were assistive, providing decision support but leaving clinicians to make important decisions, including diagnosis. Twelve automated the provision of information (autonomous information), such as quantification of heart ejection fraction, while 14 automatically provided task decisions, like triaging the reading of scans according to suspected findings of stroke (autonomous decisions). The stages of human information processing most commonly automated by devices were information analysis (n=14), providing information as an input into clinician decision making, and decision selection (n=29), where devices provide a decision.

Conclusion Leveraging the benefits of ML algorithms to support clinicians while mitigating risks requires a solid relationship between clinicians and ML-based devices. Such relationships must be carefully designed, considering how algorithms are embedded in devices, the tasks supported, the information provided and clinicians’ interactions with them.

  • medical informatics

Data availability statement

All data relevant to the study are included in the article or uploaded as online supplemental information. All data relevant to the analysis are reported in the article. The FDA approval documents analysed are cited in the reference list.


This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.


Summary

What is already known?

  • Machine learning (ML)-based clinical decision support (CDS) operates within a human–technology system.

  • Clinician interaction with CDS influences how they make decisions, affecting care delivery and patient safety.

  • Little is known about how emerging ML-based CDS supports clinician decision making.

What does this paper add?

  • ML-based CDS approved by the FDA typically provide clinicians with decisions or information to support their decision making.

  • Most demonstrate limited autonomy, requiring clinicians to confirm information provided by CDS and to be responsible for decisions.

  • We demonstrate methods to examine how ML-based CDS are used by clinicians in the real world.

Introduction

Artificial intelligence (AI) technologies, which undertake recognition, reasoning or learning tasks typically associated with human intelligence,1 such as detecting disease in an image, making diagnoses and recommending treatments, have the potential to improve healthcare delivery and patient outcomes.2 Machine learning (ML) refers more specifically to AI methods that can learn from data.3 The current resurgence in ML is largely driven by developments in deep learning methods, which are based on neural networks. Despite the expanding research literature, relatively little is known about how ML algorithms are embedded in working clinical decision support (CDS).

CDS that diagnoses or treats human disease automates clinical tasks otherwise done by clinicians.4 Importantly, CDS operates within a human–technology system,5 and clinicians can elect to ignore CDS advice and perform those tasks manually. Clinician interaction with ML-based CDS influences how they work and make decisions, which in turn affects care quality and patient safety.

Alongside intended benefits, ML poses new risks that require specific attention. A fundamental challenge is that ML-based CDS may not generalise well beyond the data on which they are trained. Even for restricted tasks like image interpretation, ML algorithms can make erroneous diagnoses because of differences in the training and real-world populations, including new ‘edge’ cases, as well as differences in image capture workflows.6 Therefore, clinicians will need to use ML-based CDS within the bounds of their design, monitor performance and intervene when it fails. Clinician interaction with CDS is thus a critical point where the limitations of ML algorithms are either mitigated or translated into harmful patient safety events.7–9

One way to study the interaction between clinicians and ML-based CDS is to consider medical devices. In the USA, software, including CDS, that is intended to diagnose, cure, mitigate, treat or prevent disease in humans is considered a medical device10 and subject to regulation. Increasing numbers of devices that embody ML algorithms are being approved by the US Food and Drug Administration (FDA).11 12 Approval requires compliance with standards, as well as evaluation of device safety and efficacy.13 Regulators provide public access to approvals and selected documentation. Therefore, medical devices provide a useful sample for studying how ML algorithms are embedded into CDS for clinical use and how manufacturers intend clinicians to interact with them.

Research has predominantly focused on the development and validation of ML algorithms, and evaluation of their performance,11 14–16 with little focus on how ML is integrated into clinical practice and the human factors related to its use.17 In a recent systematic review of ML in clinical medicine, only 2% of studies were prospective; most were retrospective, providing ‘proof of concept’ for how ML might impact patient care, without comparison to standard care.18

While one recent study has described the general characteristics of 64 ML-based medical devices approved by the FDA,12 no previous study has examined how ML algorithms are embedded to support clinician decision making. Our analysis of ML medical devices thus seeks to bridge the gap between ML algorithms and how they are used in clinical practice.

Human information processing

In assessing human–machine interaction, it is useful to consider how clinicians process information and make decisions, and which stages of that process are automated by ML devices. Automation is the machine performance of functions that would otherwise be done by humans.4 Human information processing has been broken down into four distinct stages: (1) Sensing information in the environment, (2) Perceiving or interpreting what the information means, (3) Deciding the appropriate response and (4) Acting on decisions (figure 1).19 For example, the diagnosis of pneumonia requires clinicians to sense information relevant to the provisional and differential diagnoses of the patient’s condition from their medical history, physical examination and diagnostic tests. Information then needs to be interpreted: do chest X-rays show evidence of inflammation? These analyses inform decisions about diagnosis and treatment, which are then enacted by ordering or referring for treatment.

Figure 1

Stages of human information processing (top) and their automation (bottom).19

ML devices can automate any or all stages of human information processing: (1) Acquiring information, (2) Analysing information, (3) Decision selection from available alternatives and (4) Implementation of the selected decision (figure 1).19 Later stages represent higher levels of automation. For instance, an ML device assisting the diagnosis of cardiac arrhythmias that reports quantitative measurements from ECGs automates information analysis, whereas a device that indicates the presence or absence of atrial fibrillation automates decision selection. Identifying the stage of human information processing automated provides a useful framework for evaluating how ML devices change clinicians’ work, especially the division of labour between clinicians and ML devices.

Accordingly, we examined FDA-approved ML devices to understand:

  • Which ML devices have been approved for clinical practice, their intended use, the diseases they diagnose, treat or prevent, and how manufacturers intend for clinicians to interact with them?

  • How ML devices might change clinician decision making by exploring the stage of human information processing automated.

  • The extent to which ML devices function autonomously and how that impacts clinician–ML device interaction.

Method

We examined FDA-approved medical devices that use ML (online supplemental appendix A). Unable to directly search FDA databases for ML devices, we used an internet search to identify candidate devices that were:

Supplemental material

  1. FDA-approved medical devices.

  2. Intended for use by clinicians.

  3. Intended to support clinical tasks/decisions.

  4. Using ML.

The search identified 137 candidate devices for which 130 FDA approvals were retrieved. Of these, 59 approvals met the inclusion criteria covering 49 unique ML devices (figure 2).

Figure 2

Process to search for and identify FDA-approved ML devices. FDA, Food and Drug Administration; ML, machine learning.

Data extraction and analysis

For each included approval, we extracted the approval details (date, pathway, device risk class). For each unique device, we then extracted type (software as a medical device or integrated into hardware); characteristics (indicated disease, clinical task, device task, input, output); and ML method used as described in the approval. Clinical task, device input and output were identified from device indications and descriptions, and grouped according to natural categories emerging from the sample. The device task was summarised from the indications and device description in FDA approvals.
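To make the extraction schema concrete, the sketch below expresses the extracted fields as simple record types. It is illustrative only; the field names and example category values are our own shorthand, not the coding instrument used in the study.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ApprovalRecord:
    """Details extracted for each FDA approval (illustrative field names)."""
    approval_number: str
    date: str            # approval date
    pathway: str         # e.g. 'PMN', 'De Novo' or 'PMA'
    risk_class: int      # FDA device class (1-3)


@dataclass
class DeviceRecord:
    """Characteristics extracted once per unique device."""
    name: str
    device_type: str                 # software as a medical device, or integrated into hardware
    indicated_disease: Optional[str] # None where the indication is broader than a disease
    clinical_task: str               # e.g. diagnosis, triage, medical procedure, treatment, monitoring
    device_task: str                 # summarised from the indications and device description
    input_data: str                  # e.g. CT, MRI, ECG
    output: str                      # e.g. quantification, triage notification
    ml_method: str                   # as described in the approval, e.g. 'deep learning'
    approvals: list[ApprovalRecord] = field(default_factory=list)
```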

Stage of human information processing automated by ML devices

The device task was examined using the stages of automation of human information processing framework.19 We classified the highest stage of human information processing (figure 1) automated by ML devices according to the following criteria, from lowest to highest (a minimal coding sketch follows the list):

  1. Information acquisition: Device automates data acquisition and presentation for interpretation by clinicians. Data are preserved in raw form, but the device may aid presentation by sorting or enhancing data.

  2. Information analysis: Device automates data interpretation, producing new information from raw data. Importantly, interpretation contributes new information that supports decision making, without providing the decision. For example, the quantification of QRS duration from electrocardiograms provides new information from ECG tracings that may inform diagnosis without being a diagnosis.

  3. Decision selection: Device automates decision making, providing an outcome for the clinical task. For example, prompting, and thereby drawing attention to, malignant lesions on screening mammograms indicates a device decision about the presence of breast cancer.

  4. Action implementation: Device automates implementation of the selected decision where action is required. For example, an implantable cardioverter-defibrillator, having decided defibrillation is required, acts by automatically delivering treatment.
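As a minimal illustration of this rubric, the sketch below encodes the four stages as an ordered scale and maps example device tasks, drawn from the criteria above, to their highest automated stage. The mapping is hand-assigned for illustration; in the study the classification was made by two raters reading the full approval documents, not by any automated rule.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Stages of automation of human information processing (lowest to highest)."""
    INFORMATION_ACQUISITION = 1   # acquires/presents raw data (e.g. image enhancement)
    INFORMATION_ANALYSIS = 2      # derives new information from raw data (e.g. QRS duration)
    DECISION_SELECTION = 3        # provides an outcome for the clinical task (e.g. flags cancer)
    ACTION_IMPLEMENTATION = 4     # enacts the selected decision (e.g. delivers defibrillation)


# Example device tasks taken from the criteria above, assigned by hand for illustration.
examples = {
    "enhances MRI image quality for clinician interpretation": Stage.INFORMATION_ACQUISITION,
    "quantifies QRS duration from ECG tracings": Stage.INFORMATION_ANALYSIS,
    "prompts suspected malignant lesions on screening mammograms": Stage.DECISION_SELECTION,
    "automatically delivers defibrillation when required": Stage.ACTION_IMPLEMENTATION,
}

for task, stage in examples.items():
    print(f"{stage.name:<25} {task}")
```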

ML device autonomy

To understand the level of device autonomy, we examined the description and indications for use to determine the extent to which the device performs automated tasks independently of clinicians.19 For example, a device automating decision selection that requires clinician approval is less autonomous than a similar device that does not require approval. The approach is similar to existing levels of autonomy for specific tasks, such as driving automation20 and computer-based automation,21 which identify what the user and the automation are each responsible for in relation to a defined task. Taking these models as a starting point, we developed a three-level classification for ML device autonomy based on how clinical tasks are divided between clinician and ML device (lowest to highest; figure 3).

1. Assistive devices are characterised by overlap in what the clinician and the device contribute to the task, but where the clinician provides the decision on the task. Such overlap or duplication occurs when clinicians need to confirm or approve device-provided information or decisions.

Figure 3

Level of autonomy showing the relationship between clinician and device.

2. Autonomous information is characterised by a separation between what the device and the clinician contribute to the task, where the device contributes information that clinicians can use to make decisions.

3. Autonomous decision is where the device provides the decision for the clinical task, which can then be enacted by clinicians or the device itself.

Conceptually, there is also a zero level, representing the complete absence of automation where clinicians perform tasks manually without any device assistance.
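The autonomy classification can be written down in the same way. The sketch below is a hypothetical encoding of the three levels plus the conceptual zero level, with example assignments that mirror device categories reported in the Results; it is not an exhaustive coding of the sample.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    """Level of ML device autonomy (lowest to highest), per the classification above."""
    NONE = 0                    # clinician performs the task manually, no device assistance
    ASSISTIVE = 1               # device output must be confirmed or approved; clinician decides
    AUTONOMOUS_INFORMATION = 2  # device independently supplies information used in the decision
    AUTONOMOUS_DECISION = 3     # device supplies the decision, enacted by clinician or device


# Illustrative assignments consistent with the Results section.
device_autonomy = {
    "Profound AI (marks suspected breast cancer for radiologist confirmation)": Autonomy.ASSISTIVE,
    "IcoBrain (volumetric quantification of brain structures)": Autonomy.AUTONOMOUS_INFORMATION,
    "Briefcase (triage notification for suspected large vessel occlusion)": Autonomy.AUTONOMOUS_DECISION,
}

for device, level in device_autonomy.items():
    print(f"{level.name:<23} {device}")
```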

Two investigators (DL and FM) independently assessed the stage of automation and level of autonomy. Inter-rater agreement was assessed using the absolute-agreement, two-way mixed-effects intraclass correlation coefficient (ICC). Agreement for stage of automation was ICC=0.7 (95% CI 0.53 to 0.82), indicating moderate to good agreement, and for level of autonomy was ICC=0.97 (95% CI 0.95 to 0.98), indicating excellent agreement.22 Disagreements were resolved by consensus. A narrative synthesis then integrated findings into descriptive summaries for each category of ML devices.
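For readers unfamiliar with the agreement statistic, the sketch below computes a single-rater, absolute-agreement, two-way ICC from a ratings matrix using the standard mean-squares formulation. The ratings shown are fabricated for illustration and are not the study data; the specific software used for the published estimates is not described here.

```python
import numpy as np


def icc_absolute_agreement(ratings: np.ndarray) -> float:
    """Two-way, absolute-agreement, single-rater ICC (the ICC(A,1)/ICC(2,1) form).

    ratings: n_targets x k_raters matrix of scores.
    """
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)
    col_means = ratings.mean(axis=0)

    ss_total = ((ratings - grand) ** 2).sum()
    ss_rows = k * ((row_means - grand) ** 2).sum()   # between targets (devices)
    ss_cols = n * ((col_means - grand) ** 2).sum()   # between raters
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Fabricated example: two raters scoring the stage of automation (1-4) for six devices.
ratings = np.array([
    [2, 2],
    [3, 3],
    [3, 2],
    [4, 4],
    [1, 1],
    [3, 3],
])
print(round(icc_absolute_agreement(ratings), 2))
```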

Results

Fifty-nine FDA approvals for ML devices met the inclusion criteria covering 49 unique devices (table 1). Six devices had two approvals and two had three approvals.

Table 1

Characteristics of ML medical devices approved by the US Food and Drug Administration (2008–2020)

FDA approvals

The earliest approval was in 2008 for IB Neuro23 which produces perfusion maps and quantification of blood volume and flow from brain MRI. However, the majority of approvals were observed in recent years (2016=3; 2017=5; 2018=22; 2019=27; 2020=2).

Most approvals (n=51) were via premarket notification (PMN), for devices that are substantially equivalent to existing and legally marketed devices. Only two were via premarket approval (PMA), the most stringent pathway involving regulatory and scientific review, including clinical trials to evaluate safety and efficacy.13 The remaining six approvals were via De Novo classification, a less onerous alternative to PMA for low to moderate risk devices where there is no substantially equivalent predicate. All PMN and De Novo approvals (n=57) were for class 2 devices, while both PMAs (n=2) were for class 3 devices, which are classified as moderate and high levels of risk, respectively.

Clinical tasks and diseases supported by ML devices

We identified five distinct clinical tasks supported by ML devices. Most devices (n=35) supported diagnostic tasks, assisting with the detection, identification or assessment of disease or risk factors, such as breast density. The second most common were triage tasks (n=10), where devices assisted with prioritising cases for clinician review by flagging or notifying cases with suspected positive findings of time-sensitive conditions, such as stroke. Less common were medical procedure tasks (n=2), where devices assisted users performing diagnostic or interventional procedures; treatment tasks (n=1), where devices provided CDS recommendations for changes to therapy regimens; and monitoring tasks (n=1), where devices assisted clinicians to monitor patient trajectory over time.

Twenty-three devices were indicated for a specific disease, and nine could be reasonably associated with a disease. The most common diseases were cancers, especially of the breast, lung, liver and prostate (table 2). Others were stroke (intracranial haemorrhage and large vessel occlusion) and heart diseases. Two devices were indicated for two separate diseases.24 25 The remaining 17 devices were indicated for applications broader than a specific disease.

Table 2

Diseases indicated or associated with ML devices

Device inputs and outputs

The majority of devices used image data (n=42); these included computed tomography (CT; n=15), magnetic resonance imaging (MRI; n=10), X-ray (n=5), digital breast tomosynthesis (n=3), digital mammography (n=3), echocardiography (n=3), fluoroscopy (n=1), fundus imaging (n=1), optical coherence tomography (n=1), positron emission tomography (PET; n=1) and ultrasound (n=1).

The remaining seven used signal data. These included electrocardiography (n=3), phonocardiography (n=2), polysomnography (n=1), blood glucose and insulin pump data (n=1) and biometric data from wearables (n=1).

We identified nine common means by which ML devices communicated results (table 3).

Table 3

ML device output by type

ML method

Manufacturers’ descriptions of the ML method varied. Most described a family of techniques (ML=14; deep learning=11), followed by generic descriptors (AI=15). Specific ML techniques were the least frequently reported (convolutional neural network=6; neural network=1; deep neural network=1; deep convolutional neural network=1).

Stage of decision-making automated or assisted by ML devices

Most devices aided information analysis (n=14) and decision selection (n=29). ML devices also, but less commonly, aided in information acquisition (n=4) and action implementation (n=2), the earliest and latest stages of decision making, respectively.

Information acquisition

None of the devices acquired information; instead, they aided presentation by enhancing the quality of CT, MRI and PET images,26–29 thereby assisting clinician interpretation. One representative device, SubtleMR,28 reduces noise and increases image sharpness of head, neck, spine and knee MRI scans. SubtleMR receives DICOM (Digital Imaging and Communications in Medicine) image data from a PACS (picture archiving and communication system) server and returns enhanced DICOM images to it.
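The integration pattern described here, DICOM images in and enhanced DICOM images out, can be sketched as below. The enhancement step is a classical unsharp-mask placeholder rather than SubtleMR’s deep learning model, and the file paths are hypothetical; the sketch illustrates only the shape of the workflow and assumes uncompressed pixel data.

```python
import numpy as np
import pydicom
from scipy.ndimage import gaussian_filter


def enhance_dicom(in_path: str, out_path: str) -> None:
    """Illustrative DICOM-in/DICOM-out pipeline: denoise then sharpen one image.

    A classical unsharp-mask placeholder, not the device's ML model.
    Assumes uncompressed pixel data.
    """
    ds = pydicom.dcmread(in_path)
    original = ds.pixel_array
    img = original.astype(np.float32)

    smoothed = gaussian_filter(img, sigma=1.0)   # simple noise reduction
    sharpened = img + 0.5 * (img - smoothed)     # unsharp masking to increase sharpness

    out = np.clip(sharpened, img.min(), img.max()).astype(original.dtype)
    ds.PixelData = out.tobytes()                 # write enhanced pixels back into the dataset
    ds.save_as(out_path)


# Hypothetical usage: in practice images would arrive from and return to a PACS server.
# enhance_dicom("incoming/mri_slice.dcm", "enhanced/mri_slice.dcm")
```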

Information analysis

Information analysis provides clinicians with new information derived from processing raw inputs. Devices provided analysis in the form of quantification30–37 or automatic coding of features or events.38 39 For example, IcoBrain31 provides volumetric quantification of brain structures from MRI or CT scans, which can aid in the assessment of dementia and traumatic brain injury, while EnsoSleep39 automatically codes events in sleep studies such as stages of sleep and obstructive apnoeas to assist with the diagnosis of sleep disorders.

Decision selection

Decision selection provides a decision that is an outcome for the clinical task, such as triage notifications,25 40–48 case level findings of disease,24 49–53 identification of features indicative of disease,54–59 or clinical classifications or gradings.60–64 One device providing triage notifications is Briefcase.25 40 41 Briefcase assists radiologists to triage time-sensitive cases by flagging and displaying notifications for cases with suspected positive findings of cervical spine fracture,40 large vessel occlusion,41 intracranial haemorrhage and pulmonary embolism25 as they are received. A device providing case level findings of disease is AI-ECG Platform.53 It reports whether common cardiac conditions are present, such as arrhythmias and myocardial infarction. While clinicians can view the original tracings, the device reports on the entire case. In contrast, a device providing feature level detection of disease is Profound AI,56 which detects and marks features indicative of breast cancer on digital breast tomosynthesis exams. It is intended to be used concurrently by radiologists while interpreting exams, drawing attention to features which radiologists may confirm or dismiss. A device reporting clinical classifications or grades is DM-Density,63 which reports breast density grading for digital mammography cases according to the American College of Radiology’s Breast Imaging-Reporting and Data System Atlas.65
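A triage notification of this kind effectively reorders the reading worklist: flagged cases move ahead of routine cases while arrival order is preserved within each group. The sketch below illustrates that behaviour with invented case identifiers; it does not represent any particular device’s logic.

```python
import heapq
import itertools

counter = itertools.count()   # preserves arrival order within each priority level
worklist: list[tuple[int, int, str]] = []


def add_case(case_id: str, suspected_finding: bool) -> None:
    """Flagged cases (suspected time-sensitive findings) are read before routine cases."""
    priority = 0 if suspected_finding else 1
    heapq.heappush(worklist, (priority, next(counter), case_id))


# Invented example: cases arriving in order, two flagged by the triage device.
add_case("case-001", suspected_finding=False)
add_case("case-002", suspected_finding=True)    # e.g. suspected intracranial haemorrhage
add_case("case-003", suspected_finding=False)
add_case("case-004", suspected_finding=True)    # e.g. suspected large vessel occlusion

while worklist:
    _, _, case_id = heapq.heappop(worklist)
    print(case_id)   # reading order: case-002, case-004, case-001, case-003
```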

Action implementation

Devices providing action implementation included Caption Guidance66 and FluroShield67; these implemented decisions through the automatic control of an electronic or mechanical device. Caption Guidance66 assists with the acquisition of echocardiograms, providing real-time guidance to sonographers and feedback on detected image quality. Ultrasound images are automatically captured when the correct image quality is detected. FluroShield67 automatically controls the collimator during fluoroscopy to provide a live view of a region of interest, with a lower refresh rate of once or twice per second for the wider field of view, thereby reducing radiation exposure to patient and clinician.
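The auto-capture behaviour described for Caption Guidance amounts to a simple control loop: estimate the quality of each incoming frame and save it once a threshold is crossed. The sketch below shows that loop in outline; the quality scores, threshold and frame identifiers are invented placeholders, not the device’s actual model or interface.

```python
from typing import Callable, Iterable


def auto_capture(frames: Iterable[object],
                 quality_of: Callable[[object], float],
                 threshold: float = 0.9) -> list[object]:
    """Capture frames whose estimated quality crosses a threshold (illustrative only)."""
    captured = []
    for frame in frames:
        score = quality_of(frame)     # in the real device, an ML image-quality estimate
        if score >= threshold:
            captured.append(frame)    # automatic capture: no sonographer keypress needed
    return captured


# Invented usage: frames scored by a placeholder quality function.
frames = ["frame-a", "frame-b", "frame-c"]
scores = {"frame-a": 0.42, "frame-b": 0.95, "frame-c": 0.70}
print(auto_capture(frames, quality_of=scores.get))   # ['frame-b']
```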

ML device autonomy

Nearly half (n=23) of devices were assistive, characterised by indications emphasising clinician responsibility for the final decision or statements limiting the extent to which the device could be relied on (box 1). Assistive devices comprised all devices providing feature level detection,54–59 five of six devices reporting a case level finding of disease,24 50–53 and almost half of devices providing quantification.33–37 68 Notwithstanding clinician responsibility to patients, the indications for the remaining devices did not specify such limitations when used as indicated. Consequently, those devices appeared to automate functions otherwise performed by clinicians to a greater extent than assistive devices. Fourteen devices provided autonomous decisions that clinicians could act on; these were primarily devices providing triage notifications,25 40–48 but also included IDx-DR,49 a device providing case-level findings of diabetic retinopathy, allowing screening in primary practice where results are used as the basis for specialist referral for diagnosis and management. Twelve devices provided autonomous information that clinicians could use in their decision making to determine an outcome for clinical tasks. These included devices providing enhanced images,26–29 quantification23 30–32 69 70 and one device which coded features or events.38

Box 1

Examples of FDA-approved indications specifying that responsibility for the final decision on the device task resides with the clinician. For further examples, see online supplemental appendix A

‘All automatically scored events are subject to verification by a qualified clinician.’39

‘Not intended for making clinical decisions regarding patient treatment or for diagnostic purposes.’68

‘Intended as an additional input to standard diagnostic pathways and is only to be used by qualified clinicians.’37

‘Interpretations offered by (device) are only significant when considered in conjunction with healthcare provider over-read and including all other relevant patient data.’50

‘Should not be used in lieu of full patient evaluation or solely relied on to make or confirm a diagnosis.’51

‘The clinician retains the ultimate responsibility for making the pertinent diagnosis based on their standard practices.’62

‘Patient management decisions should not be made solely on the results.’64

‘Provides adjunctive information and is not intended to be used without the original CT series.’58

Discussion

Main findings and implications

The way that algorithms are embedded in medical devices shapes how clinicians interact with them, with different profiles of risk and benefit. We demonstrate how the stages of automation framework19 can be applied to determine the stage of clinician decision making assisted by ML devices. Together with our level of autonomy framework, these methods can be applied to examine how ML algorithms are used in clinical practice, which may help address the dearth of human factors evaluations related to the use of ML devices in clinical practice.17 Such analyses (table 1) permit insight into how ML devices may change clinical workflows and practices, and how these changes may affect healthcare delivery.

While FDA approval of ML devices is a recent development, only six approvals in this study were via De Novo classification for new types of medical devices. Most approvals were via the PMN pathway for devices that are substantially equivalent to existing predicate devices. Some predicates could be traced to ML device De Novos, while others were non-ML devices with similar indications but different algorithms. As the FDA assesses all medical devices on the same basis, regardless of ML utilisation, it is unsurprising that ML medical devices largely follow in the footsteps of their non-ML forebears. Most were assistive or provided autonomous information, leaving responsibility for clinical decisions with clinicians.

We identified an interesting group of devices, primarily triage devices, which provided autonomous decisions, independent of clinicians. These triage devices appeared to perform tasks intended to supplement clinician workflow, rather than to automate or replace existing clinician tasks. The expected benefit is prioritising the reading of cases with suspected positive findings for time-sensitive conditions, such as stroke, thereby reducing time to intervention, which may improve prognosis. Unlike PMNs, De Novo classifications report more details, including identified risks. The De Novo for the triage device, ContaCT,45 identifies risks associated with false-negatives that could lead to incorrect or delayed patient management, while false-positives may deprioritise other cases.

Likewise, the diabetic retinopathy screening device IDx-DR49 appears to supplement existing workflows by permitting screening in primary practice that would otherwise be impossible. The goal is to increase screening rates for diabetic retinopathy by improving access to screening and reducing costs.71 The De Novo describes risks that false-negatives may delay detection of retinopathy requiring treatment, while false-positives may subject patients to additional and unnecessary follow-up.49 However, the device may enable far greater accessibility to regular screening.

In contrast, with assistive devices there is overlap between what the clinician and the device do. Despite many of these ML devices providing decision selection, such as reporting on the presence of disease, the approved indications of all assistive devices—nearly half of reviewed devices—emphasised that decisions are the responsibility of the clinician (box 1). Such stipulations specify how device information should be used and may stem from several sources, such as legal requirements governing who may perform which tasks (eg, diagnose or prescribe medicines) and legal liability for who is accountable when things go wrong. However, the trustworthiness of devices cannot be inferred from the presence of such indications.

Assistive devices change how clinicians work and can introduce new risks.72 Instead of actively detecting and diagnosing disease through patient examination, diagnostic imaging or other procedures, the clinician’s role is changed by the addition of the ML device as a new source of information. Crucially, indications requiring clinicians to confirm or approve ML device findings create a new task for clinicians: to provide quality assurance for device results, possibly by scrutinising the same inputs as the ML device, together with consideration of additional information.

The benefit of assistive ML devices is the possibility of detecting something that might otherwise have been missed. However, there is a risk that devices might bias clinicians; that is, ML device errors may be accepted as correct by clinicians, resulting in errors that might not otherwise have occurred.9 73 Troublingly, people who suffer these automation biases exhibit reduced information seeking74–76 and reduced allocation of cognitive resources to process that information,77 which in turn reduces their ability to recognise when the decision support they have received is incorrect. While improving ML device accuracy reduces opportunities for automation bias errors, high accuracy is known to increase the rate of automation bias,78 likely rendering clinicians less able to detect failures when they occur. Of further concern is evidence showing far greater performance consequences when later stage automation fails, which is most evident when moving from information analysis to decision selection.79 Greater consequences could be due to reduced situational awareness as automation takes over more stages of human information processing.79

Indeed, the De Novo for Quantx,57 an assistive device which identifies features of breast cancer from MRI, describes the risk that false-negatives may lead to misdiagnosis and delayed intervention, while false-positives may lead to unnecessary procedures. The De Novo for OsteoDetect52 likewise identifies a risk of false-negatives, that ‘users may rely too heavily on the absence of (device) findings without sufficiently assessing the native image. This may result in missing fractures that may have otherwise been found’,52 while false-positives may result in unnecessary follow-up procedures. These describe the two types of automation bias errors which can occur when clinicians act on incorrect CDS: omission errors, where clinicians agree with CDS false-negatives and consequently fail to diagnose a disease, and commission errors, where clinicians act on CDS false-positives by ordering unnecessary follow-up procedures.9 80

Other risks identified in De Novo classifications45 52 57 include device failure, and use of devices on unintended patient populations, with incompatible hardware and for non-indicated uses. Such risks could result in devices providing inaccurate or no CDS. Controls outlined in De Novos focused on software verification and validation, and labelling, to mitigate risks of device and user errors, respectively.

These findings have several implications. For clinicians, use of ML devices needs to be consistent with labelling, and results scrutinised according to clinicians’ expertise and experience. Scrutiny of results is especially critical with assistive devices. There needs to be awareness of the potential for ML device-provided information to bias decision making. Clinicians also need to be supported to work effectively with ML devices, with the training and resources necessary to make informed decisions about use and how to evaluate device results. For ML device manufacturers and implementers, the choice of how to support clinicians is important, especially the choice of which tasks to support, what information to provide and how clinicians will integrate and use those devices within their work. For regulators, understanding the stage and extent of human information processing automated by ML devices may complement existing risk categorisation frameworks,81 82 by accounting for how the ML device contribution to decision making modifies risk for the intended use of device-provided information: to treat or diagnose, to drive clinical management or to inform clinical management.81 Regulators could also improve their reporting of the ML methods used to develop the algorithms utilised by devices. These algorithms are akin to the ‘active ingredient’ in medicines, as they are responsible for the device’s action. However, consistent with a previous study,12 we found that the public reporting of ML methods varied considerably and was generally opaque and lacking in detail. Presently, the FDA only approves devices with ‘locked’ algorithms,82 but is moving towards a framework that would permit ML devices which learn and adapt to real-world data.83 Such a framework is expected to involve precertification of vendors and submission of algorithm change protocols.82 It will be important to continually evaluate clinician–ML device interactions, which may change with regulatory frameworks.

Finally, there are important questions about responsibility for ML device-provided information and the extent to which clinicians should be able to rely on it. While exploration of these questions exceeds the scope of this article, models of use that require clinicians to double-check ML device results may be less helpful than devices whose output can be acted on. As ML devices become more common, there need to be clearly articulated guidelines on the division of labour between clinician and ML devices, especially in terms of who is responsible for which decisions and under what circumstances. In addition to the configuration of tasks between clinician and ML devices, how devices work and communicate with clinicians is crucial and requires further study. The ability of ML devices to explain decisions through presentation of information, such as marking suspected cancers on images or using explainable AI techniques,84 will impact how clinicians assess and make decisions based on ML device-provided information.

Limitations

There are several limitations. First, it was not possible to directly search FDA approval databases, the primary source of approvals. Second, the reporting in approvals varied considerably, with nearly one-third of included approvals not describing ML utilisation. Indeed, all disagreements on device selection occurred where evidence had to be sought from the manufacturer’s website and non-peer reviewed sources, where one reviewer located key information the other did not. Consequently, it is possible that some devices may have been missed. Nevertheless, the review provides useful insights in the absence of capability to systematically search primary sources. Our analysis focused on intended use as described in approvals, rather than actual use in the real world, which may differ. Finally, the focus on medical devices limits the review to ML algorithms approved by the FDA. Nevertheless, our methods to examine the stage of human information processing automated and level of autonomy can be applied to examine clinician interaction with the vast majority of ML CDS which are not regulated as medical devices. Indeed, there is an urgent need to ensure ML-based CDS are implemented safely and effectively in clinical settings.85

Conclusion

Our analysis demonstrates the variety of ways in which ML algorithms are embedded in medical devices to support clinicians, the tasks supported and the information provided. Leveraging the benefits of ML algorithms for CDS while mitigating risks requires a solid working relationship between clinician and CDS. Such a relationship must be carefully designed, considering how algorithms are embedded in devices, the clinical tasks they support, the information they provide and how clinicians will interact with them.


Acknowledgments

We wish to acknowledge the invaluable contributions of Didi Surian, Ying Wang and Rhonda Siu.

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • Twitter @David_Lyell, @EnricoCoiera, @farahmagrabi

  • Contributors DL conceived this study, and designed and conducted the analysis with advice and input from FM and EC. PS and DL screened ML devices for inclusion and performed data extraction. JC identified additional ML devices and screened devices for inclusion. FM and DL assessed the stage of automation and level of autonomy. DL drafted the manuscript with input from all authors. All authors provided revisions for intellectual content. All authors have approved the final manuscript.

  • Funding NHMRC Centre for Research Excellence (CRE) in Digital Health (APP1134919) and a Macquarie University Safety Net grant.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.