From IBM’s Watson to Google’s DeepMind and Tencent’s WeDoctor, the last few years have been characterised by unprecedented levels of research interest and new investment in artificial intelligence (AI) and digital healthcare technology. The number of publications on applications of AI and machine learning to medical diagnosis has increased dramatically since around 2015 (figure 1). Correspondingly, venture capital-backed digital health and AI startups worth over US$1 billion now number in the dozens (figure 1).1 Yet this influx of new investment has not been without controversy. Google’s recent partnership with national health group Ascension, which gave the company access to the clinical data of around 50 million patients, has been the target of significant media and congressional scrutiny.2 Likewise, pharmaceutical giant GlaxoSmithKline’s (GSK) US$300 million investment in direct-to-consumer genetic testing provider 23andMe has aroused similar concerns.3 Under the terms of their 4–5 year agreement, GSK gained access to 23andMe’s genetic data and became its exclusive collaborator for drug target discovery programmes.4 While much of the coverage of these partnerships has focused on issues of privacy and consent, we argue that another key consideration lies in the risks associated with exclusive or privileged access to databases of patient information and the development of proprietary diagnostic algorithms.
Why should we care about openness and transparency in AI development? Take the hypothetical case of a tech company developing a new proprietary AI to make prescription recommendations using electronic health record data from a large academic medical centre. Aware of this ongoing programme, a pharmaceutical company decides to make its drugs available at a discounted price to the hospital, resulting in increased prescription of its drugs relative to competitors. Now, without any overt collusion, the tech company’s AI may learn that these drugs are prescribed more often by the hospital’s physicians and therefore become more likely to recommend them in the future. Clearly, these recommendations are inappropriate and not based on any medical evidence, yet without the ability to inspect the proprietary AI or the data it was trained on, the possibilities for peer review and scrutiny would be severely limited. Should AIs have their own disclosures? How would such disclosures be regulated and enforced? Would it be desirable to avert healthcare ‘data monopolies’ with new antitrust legislation? These are questions regulators will need to answer sooner rather than later. While not AI-driven, the recent revelation that popular electronic health record vendor Practice Fusion received kickbacks in exchange for displaying alerts in its software designed to increase prescriptions of opioid analgesics5 is a chilling reminder of the ability of software vendors to influence treatment decisions. Allowing proprietary healthcare AIs trained on privately held datasets to operate unmonitored risks providing an avenue for plausible deniability and further hinders the detection of such complicit partnerships between drug manufacturers and software vendors.
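As an illustration only, the short sketch below uses entirely synthetic data to show how a recommendation model can absorb a hospital’s prescribing skew that has nothing to do with patient characteristics. The 80% prescribing rate, the placeholder covariates and the two-drug setup are all invented for the example and do not describe any real system.

```python
# Illustrative sketch only: synthetic data showing how a model can learn a
# hospital's skewed prescribing pattern. All numbers here are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Two clinically equivalent drugs for the same indication. Because "drug A"
# was discounted, the hospital's records show it prescribed 80% of the time,
# independent of any patient characteristic.
features = rng.normal(size=(n, 5))       # stand-in clinical covariates (pure noise)
prescribed_a = rng.random(n) < 0.8       # skew driven by pricing, not medicine

model = LogisticRegression().fit(features, prescribed_a)

# For new patients, the model "recommends" drug A almost every time, even
# though the covariates carry no signal about which drug is better.
new_patients = rng.normal(size=(100, 5))
print("share recommended drug A:",
      (model.predict_proba(new_patients)[:, 1] > 0.5).mean())
```

Because the covariates carry no clinical signal, the model falls back on the skewed base rate and recommends the discounted drug to essentially every new patient; without access to the training data, an outside reviewer would have no way to spot why.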
Beyond theoretical scenarios, consider a recent study by a group of Google researchers who designed a mammogram-reading AI system that outperformed radiologists on a breast cancer identification task.6 Although the authors acknowledged it as an unintentional limitation, 95% of the more than 90 000 mammograms used in the study were acquired on devices made by a single manufacturer. Would the AI perform as well on images from another manufacturer’s systems? What about the 10-year-old mammography system still operating in an under-resourced community? Further studies and clinical trials will be needed to obtain these answers, but this case highlights just how easily systemic biases can be introduced even when no foul play is involved. Nonetheless, AI presents a tremendous opportunity to reduce barriers to care in low-resource settings around the world.7 8 Unfortunately, current trends in AI research and private funding (figure 1) suggest a strong geographical bias. A select group of countries, notably including China and the USA, are responsible for most of the research and investment in AI-assisted medical diagnostics. Unless representative samples of patients are included, these tools are unlikely to provide equal benefits outside of their countries of origin. Collaboration and exchange of data and experience between healthcare systems on a global scale are needed if we are to benefit from truly generalisable and equitable AI systems. Exploratory research and development of AI systems on small single-centre sample datasets is necessary for identifying promising applications, but we suggest that a framework of ‘levels of evidence’ similar to that proposed by Woo et al 9 for biomarkers in translational neuroimaging could be applied more broadly to all AI system development in medicine. Under this framework, early exploratory AI model development is followed by progressively more comprehensive assessments of generalisability across larger and more diverse research contexts and population samples. Models whose initial results can be satisfactorily reproduced across larger multicentre studies and in diverse groups of patients then become strong candidates for translation into real-world clinical practice.
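A minimal sketch of the kind of stratified generalisability check implied by this framework is given below. It assumes a table of held-out predictions with hypothetical columns named 'label', 'score' and a metadata column such as scanner manufacturer or recruiting site; none of these names or any results come from the study cited above.

```python
# Sketch of a stratified performance check across sites or device manufacturers.
# Column names ("label", "score", group_col values) are assumptions for illustration.
import pandas as pd
from sklearn.metrics import roc_auc_score

def performance_by_subgroup(results: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Report sample size and AUC separately for each value of group_col."""
    rows = []
    for group, sub in results.groupby(group_col):
        if sub["label"].nunique() < 2:
            continue  # AUC is undefined when only one class is present
        rows.append({group_col: group,
                     "n": len(sub),
                     "auc": roc_auc_score(sub["label"], sub["score"])})
    return pd.DataFrame(rows).sort_values("n", ascending=False)

# Hypothetical usage on a held-out test set:
# print(performance_by_subgroup(test_results, "scanner_manufacturer"))
```

A large gap between the dominant subgroup and the rest, or subgroups too small to evaluate at all, would flag exactly the kind of dataset skew described above before a model progresses to multicentre validation.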
AI systems often replicate, without their creators’ knowledge, the societal biases present in the data they are trained on. In our own study,10 we found that the models we trained on data from over 60 000 patients in a national cancer registry to predict meningioma malignancy and survival predicted worse survival for black and uninsured patients. Another study, which developed an algorithm to predict no-show appointments in paediatric orthopaedic clinics, likewise identified insurance type as a significant predictor of no-show rates.11 While these predictions are factually representative of the data, the predicted outcomes are far more reflective of social and economic realities than of any biology. Other previously reported examples of bias include a melanoma diagnosis algorithm that did not account for skin colour and the use of genomic databases in which minorities are under-represented.12 Patient age is another factor from which disparities may arise. It has been reported, for example, that Babylon Health’s GP at Hand system, which offers online consultations and an AI-driven symptom checker, has on average attracted younger and healthier patients than in-person general practice clinics.13 Barriers to care driven by difficulties in adapting to new technologies are one issue, but the relative lack of training data in certain age groups could also lead to AI systems becoming more proficient at identifying the health issues most frequently experienced by the groups of patients for which they hold more data. These cases underscore the importance for healthcare practitioners of critically assessing the predictions of putatively ‘objective’ machine learning systems. They are also a reminder that while technological solutions will undoubtedly form part of our efforts to deliver better care, other systemic issues remain just as, if not more, critical to address.
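One simple way to surface such learnt disparities is to compare a model’s predictions across demographic or socioeconomic subgroups before deployment. The sketch below is illustrative only; the column names ('insurance_status', 'predicted_5yr_survival') are assumptions and do not correspond to the datasets or variables used in the studies cited above.

```python
# Hypothetical audit sketch: compare mean model predictions across subgroups.
# Large gaps should prompt the question of whether they reflect biology or the
# social and economic realities embedded in the training data.
import pandas as pd

def prediction_gap(cohort: pd.DataFrame, group_col: str, pred_col: str) -> pd.Series:
    """Mean model prediction per subgroup, sorted to make disparities visible."""
    return cohort.groupby(group_col)[pred_col].mean().sort_values()

# Hypothetical usage on a scored cohort:
# print(prediction_gap(scored_patients, "insurance_status", "predicted_5yr_survival"))
```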
While there is a strong argument to be made in favour of federating de-identified health data in national or even international databases to allow for the development of healthcare AI systems,14 15 we argue that these data should be considered a public good. As discussed above, there is a real risk that allowing exclusive or privileged access to databases of patient information may permit intentional or unintentional bias to be introduced into health AI systems. Treating patient data as a commodity could also create perverse incentives for companies to invest more in acquiring datasets or developing products in countries or among groups of individuals with greater purchasing power. This could be of particular concern for direct-to-consumer health products. The Apple Heart Study16 investigated the ability of an optical pulse sensor and smartwatch application to identify atrial fibrillation. The study recruited an impressive sample of over 400 000 individuals; however, all participants were from the USA and owned Apple smartphones and smartwatches. Moreover, the paper and data sharing statement for this study notably state that the data are ‘not available to be shared’ and that ‘Apple sponsored the study and owns the data’.16 While the challenges of balancing commercial and research interests are not unique to this study, the possibility that socioeconomic (and consequently demographic) groups were not equally represented does raise concerns given the cost of these devices. There is a need for greater advocacy for the inclusion of data from historically underserved communities in the datasets that will be used to train the next generation of health AI systems. While the scale of potential consequences is hard to estimate given the paucity of systems currently in real-world use, we must take the initiative to ensure that underserved communities are adequately represented in health AI development. Requiring the systematic evaluation and reporting of health AI systems’ performance in diverse population subsamples as a condition of approval for commercialisation is one step regulators could take in this direction.
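Such reporting could begin with something as simple as a representativeness check of the training cohort against a reference population. The figures, age bands and the 0.8 flagging threshold below are invented for illustration and are not regulatory guidance; a real check would use the actual training cohort and an appropriate reference such as census data.

```python
# Invented numbers for illustration: compare the demographic mix of a training
# cohort against a reference population to flag under-represented groups.
training_share = {"18-40": 0.45, "41-65": 0.40, "66+": 0.15}
reference_share = {"18-40": 0.30, "41-65": 0.40, "66+": 0.30}

for group, ref in reference_share.items():
    ratio = training_share.get(group, 0.0) / ref
    flag = "under-represented" if ratio < 0.8 else "ok"   # arbitrary threshold
    print(f"{group}: training/reference ratio = {ratio:.2f} ({flag})")
```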
In the context of primary care, the commoditisation of personal medical data runs counter to public expectations of the confidential nature of the physician-patient relationship.17 In this regard, we believe there is an urgent need for greater transparency and public discourse on how, and for what purpose, health data are exchanged. Even if data are de-identified, patients should ultimately have the ability to know and decide who has access to their data and for what purpose those data are being used. The question of ‘ownership’ of medical data remains legally ill-defined in many jurisdictions,18 19 despite the frequent mismatch between patient expectations and actual data usage. In the UK, for instance, the now-scrapped NHS care.data programme raised significant concerns about the provision of health data to the insurance industry.17 Beyond the technological challenges lies the issue of maintaining public confidence.14 While a cancer patient may support sharing data for research into a diagnostic AI system that could allow for earlier disease detection, the same patient may not agree to any data being used to train an AI system designed to calculate life insurance premiums. In a recent survey of patient attitudes towards sharing electronic health record data for research purposes, only 4% of patients recruited from two US academic medical centres declined to share data with researchers from their home institution, while 28% declined to share with other non-profit institutions and 47% were not willing to share data with for-profit institutions.20 Allowing the responsible use of aggregated health data to develop AI-driven diagnostic tools has considerable potential to benefit patients, but we must ensure that mechanisms for ethical oversight and independent validation remain available. Beyond preventing exclusive private ownership of patient data, this also means requiring a minimum level of transparency in disclosing what data were used to train health AI systems and actively informing patients about the use of their data. Open data and transparent reporting of the data sources used in AI development will provide the accountability needed to ensure that algorithm developers build generalisable health AI systems that minimise bias and respect public expectations of medical data usage.
In spite of the challenges, there is growing recognition of the need to intentionally design equitable AI systems.21 22 Human-centred AI, a perspective holding that AI systems must be designed for social responsibility and with an understanding of sociocultural context,23 24 has been gaining traction among AI researchers. There have, moreover, been encouraging steps towards policy discussion and legislation to protect personal information while requiring transparency, fairness and accountability from processors of personal data.25 These are promising developments, but we cannot stop here. In the end, sensitivity, specificity and other metrics tell only part of the story. While we can and should attempt to build high-performing AI systems that emulate ethical decision making, we must remember that human-designed AI remains subject to the same social, cultural and political biases that shaped the data these systems were trained on. The physician’s role as an advocate for patients’ interests is as important today as it has ever been. We will increasingly come to rely on AI-assisted diagnosis and prognosis in the years to come, but treatment recommendations must remain conscious of societal context and continue to represent a shared decision-making process between physician and patient.