Ensuring machine learning for healthcare works for all
Introduction
Machine learning, data science and artificial intelligence (AI) technology in healthcare (herein collectively referred to as machine learning for healthcare (MLHC)) is positioned to have substantial positive impacts on healthcare, enhancing progress in both the acquisition of healthcare knowledge and the implementation of this knowledge in numerous clinical contexts. However, concerns have been raised about the potential of these technologies for negative impacts,1–7 in particular that they may damage health equity by either introducing novel biases or uncritically reproducing and magnifying existing systemic disparities. These concerns have led to a growth of scholarship on the intersection of ethics, AI and healthcare,1–7 as well as significant restrictions on the use of patient data for MLHC research.8 9
Unfortunately, modern healthcare is already rife with treatments that fail to stand up to evidentiary scrutiny,10 while the evidence behind their use is riddled with biases that further deepen health inequities.11 Against this backdrop, it becomes clear that urgent and substantial change is needed, and that MLHC offers one of the most promising avenues toward achieving it. Ethical concerns regarding the impact of this technology should be addressed meaningfully and made foundational to the development of MLHC. However, those concerns must not be allowed to constrain the field in a manner that perpetuates the structural inequalities that presently exist.
Through the conceptual lens of MLHC, this paper explores various flaws in healthcare's current approaches to evidence, and the ways in which insufficient evidence and bias combine to produce ineffective and even harmful care. We examine the potential for data science and AI technologies to address some of these issues, and we tackle commonly raised ethical concerns in this space. Ultimately, we provide a series of recommendations for reform of MLHC policy that will facilitate the development of systems that provide a public benefit for all.
Bias and insufficiency of evidence in healthcare
Many common interventions in healthcare are performed without good evidence to support them. A 2012 National Academy of Medicine report noted that high-quality evidence is lacking or even non-existent for many clinical domains,12 and a similar investigation from the UK's National Institute for Health and Care Excellence and the BMJ found that 50% of current treatments have unknown effectiveness, 10% are still in use despite being ineffective or even harmful, and only 40% have some evidence for effectiveness.13 As Prasad et al have found, studies that contradict previous research and lead to 'medical reversal' changes to practice standards are common, comprising up to 40% of papers that evaluated the current standard of care in the New England Journal of Medicine from 2001 to 2010,14 as well as many papers in JAMA and The Lancet.10 It is clear that many interventions with insufficient evidence continue to be adopted and propagated on the basis of expert opinion, typically backed by professional societies. Even when prospective randomised controlled trials are performed, they are subject to numerous opportunities for bias, and even outright conflict of interest, which can impact the quality and transferability of results.15 16
The burdens of medicine's failures in evidentiary quality and applicability are not borne equally.11 17–19 The historical and ongoing omission of certain groups from research, including women and underserved populations, has skewed our understanding of health and disease.11 The concerns that exist regarding the generation of algorithms from racially biased datasets17 are unfortunately far from new; they represent the continuation of a long-standing history of minority groups being under-represented or entirely unrepresented in foundational clinical research.11 18 The Framingham study, for example, generated its cardiovascular risk scores from an overwhelmingly white and male population, and those scores have subsequently proven inaccurate when uncritically applied to black populations.19 Similarly, women have been and continue to be heavily under-represented in clinical trials.11 20 21 These problems extend to the global health context as well, as the trials used to inform clinical practice guidelines around the world tend to be conducted on a demographically restricted group of patients in high-income countries (mainly white males in the USA).11 These issues are compounded by structural biases in medical education,22 and by the biases of the healthcare providers tasked with interpreting and implementing this medical knowledge in the clinical context.23
Can MLHC help, or will it harm?
The question is whether MLHC will help to remedy these shortcomings or exacerbate them. Models trained uncritically on databases embedded with societal biases and disparities will end up learning, amplifying and propagating those biases and disparities under the guise of algorithmic pseudo-objectivity.2 17 24 25 Similarly, gaps in quality of care will be widened by the development and use of tools that benefit only certain populations, such as a melanoma detection algorithm trained on a dataset containing mostly images of light-toned skin.26 Concerns also exist around patient privacy and the safeguarding of sensitive data (particularly for vulnerable groups such as HIV-positive patients).27 Finally, there are structural concerns that the information technology prerequisites for implementing MLHC may only be available to already privileged groups.5 7
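To make the melanoma example concrete, performance gaps of this kind can be surfaced before deployment with a subgroup audit. The following is a minimal sketch, not drawn from the studies cited: the classifier, the 'group', 'label' and 'prob' column names, and the decision threshold are all illustrative assumptions.

```python
# Hypothetical sketch: surfacing subgroup performance gaps for a fitted
# binary classifier before deployment. Column names and the probability
# threshold are illustrative assumptions, not from the cited studies.
import pandas as pd
from sklearn.metrics import recall_score, roc_auc_score

def subgroup_performance(df: pd.DataFrame, group_col: str = "group",
                         label_col: str = "label", prob_col: str = "prob",
                         threshold: float = 0.5) -> pd.DataFrame:
    """Per-group sensitivity (TPR) and AUROC, so a model that performs
    well on aggregate cannot hide poor performance on a subgroup."""
    df = df.assign(pred=(df[prob_col] >= threshold).astype(int))
    rows = []
    for group, sub in df.groupby(group_col):
        rows.append({
            "group": group,
            "n": len(sub),
            "tpr": recall_score(sub[label_col], sub["pred"], zero_division=0),
            # AUROC is undefined when a subgroup contains only one class
            "auroc": roc_auc_score(sub[label_col], sub[prob_col])
                     if sub[label_col].nunique() > 1 else float("nan"),
        })
    result = pd.DataFrame(rows)
    result["tpr_gap_vs_best"] = result["tpr"].max() - result["tpr"]
    return result.sort_values("tpr_gap_vs_best", ascending=False)
```

Reporting the gap against the best-performing group, rather than a single aggregate metric, makes the disparity itself the object of scrutiny.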
Yet, as recent scholarship has indicated, the potential for MLHC to counter biases in healthcare is considerable.3 28 Data science methods can be used to audit healthcare datasets and processes, deriving insights and exposing implicit biases so that they might be directly investigated and addressed.1 3 29 While much has been made of the 'black box' characteristics of AI, it may be argued that human decision making in general is no more explainable.30 31 This is particularly true for the sort of implicit gender and racial biases that influence physicians' decisions but are unlikely to be consciously admitted.23 As checklist studies in healthcare have demonstrated,32 it may be possible to reduce these biases through standardised prompts and clinical decision support tools that move clinical decisions closer to the data and further from biasing subjective evaluations. At the structural level, there is hope that AI will drive down the costs of care, increasing access for groups that have traditionally been underserved and enabling greater patient autonomy for self-management.4 5
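One concrete form such a dataset audit might take is a simple comparison of a cohort's demographic composition against the population the resulting model or guideline is meant to serve. The sketch below is hypothetical: the column name and the reference shares are placeholders, not real demographic figures.

```python
# Hypothetical sketch: auditing how well a research cohort represents a
# reference population. Reference shares below are placeholders.
import pandas as pd

def representation_audit(cohort: pd.DataFrame, column: str,
                         reference_shares: dict) -> pd.DataFrame:
    """Compare each group's share of the cohort with its share of the
    population the resulting model is intended to serve."""
    observed = cohort[column].value_counts(normalize=True)
    rows = [{
        "group": group,
        "cohort_share": float(observed.get(group, 0.0)),
        "reference_share": share,
        "ratio": float(observed.get(group, 0.0)) / share if share else float("nan"),
    } for group, share in reference_shares.items()]
    # A ratio well below 1.0 flags a group under-represented relative
    # to the reference population.
    return pd.DataFrame(rows).sort_values("ratio")

# Illustrative call (shares invented for the example):
# representation_audit(trial_df, "sex", {"female": 0.51, "male": 0.49})
```

Such audits do not fix under-representation by themselves, but they make it visible and quantifiable at the point where data are selected for model development.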
Further, MLHC technologies may be able to address issues of disparity in the clinical research pipeline.33 Improvements in the use and analysis of electronic health records and mobile health technology herald the possibility of mobilising massive amounts of healthcare data from across domestic and global populations. The prospect of using 'big data' (ie, large and comprehensive datasets involving many patient records) that better represent all patients holds promise for counteracting issues of evidentiary insufficiency and limitation. As shown by the 'All of Us' programme, biological information database initiatives can be specifically tailored toward the active inclusion of traditionally under-represented groups.34 Recent progress in the ability to emulate a 'target trial' when no real trial exists may even enable scientists to regularly obtain real-world evidence and derive insights about the effectiveness of treatments in groups absent from initial clinical trials.35
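As a toy illustration of one ingredient that target-trial emulation builds on, the sketch below applies inverse probability of treatment weighting to observational records. This is a deliberate simplification: a real emulation, per Hernán and Robins,35 must also specify eligibility criteria, treatment strategies, time zero and censoring, and all column names here are illustrative assumptions.

```python
# Simplified sketch: inverse probability of treatment weighting (IPW),
# one component of target-trial emulation from observational data.
# Column names are illustrative; this omits eligibility alignment,
# time zero and censoring, which a real emulation must handle.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def ipw_risk_difference(df: pd.DataFrame, treatment: str, outcome: str,
                        confounders: list) -> float:
    """Weighted risk difference: each record is reweighted by the inverse
    probability of the treatment it actually received, given confounders."""
    X = df[confounders].to_numpy()
    t = df[treatment].to_numpy()
    y = df[outcome].to_numpy()
    # Propensity score: probability of treatment given measured confounders
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    ps = np.clip(ps, 0.01, 0.99)  # trim to avoid extreme weights
    weights = np.where(t == 1, 1.0 / ps, 1.0 / (1.0 - ps))
    risk_treated = np.average(y[t == 1], weights=weights[t == 1])
    risk_control = np.average(y[t == 0], weights=weights[t == 0])
    return risk_treated - risk_control
```

The appeal for equity is that such analyses can, in principle, be rerun within subgroups that were absent from the original trials, provided the observational data actually include those groups and their confounders are measured.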
Ensuring MLHC works for all
Despite this potential, MLHC is far from a magical solution, and should not be seen as such. Embracing it must not lead to neglect of the role played by other structural factors such as economic inequities36 and implicit physician bias.23 No simple set of data-focused technical interventions alone can effectively deal with complex sociopolitical environments and structural inequity,37 and simple 'race correction' methods can be deeply problematic.38 The potential of 'big data' synthetic clinical trials, for example, must come as a supplement to, and not a replacement for, efforts to improve the diversity of clinical trial recruitment. Similarly, issues of structural bias must be acknowledged and addressed at all levels of the MLHC development pipeline,17 39 from assessing the quality of the input data to ensuring adequate funding for the information technology needed to implement MLHC in underserved areas.
If MLHC is to be successful at reducing health disparities, it must reflect this function in its form. The troubling lack of diversity both in the field of AI40 and in biomedical research generally41 raises concerns about the perpetuation of biased perspectives in development, and the historical and ongoing flaws of healthcare and its research communities have led to distrust among minority communities.42 The onus is on the MLHC community to rebuild this trust and embrace structural reform. Inclusion and active empowerment of members of marginalised communities is essential, and concepts around individual or collective data ownership and sovereignty43 deserve further exploration.
At the same time, we must not forget the biases exerted by the status quo, which we cannot allow to slow the sort of progress that is necessary to address these problems. Problems evolving from the systematic exclusion of vulnerable populations from research will not be solved by the continued exclusion of these populations. While work certainly must be done to ensure that minoritised patients do not need to be saved from MLHC research, work must also be done to remedy disparities and improve outcomes for minoritised patients through MLHC research.
The vigorous discussions surrounding ethical issues in MLHC must be translated into active efforts to construct the field from the ground up. Both the field itself and the outputs it creates must be ethical and equitable at their core, with these concerns rendered structurally integral rather than addressed post hoc. An emphasis is already growing throughout the field on the establishment of codes of conduct44 and practical procedures6 33 for the ethical and equitable implementation of AI in healthcare. As outlined in table 1, we identify a number of critical areas of emphasis for developing MLHC in a way that fosters this vision. Just as the potential for problematic bias in MLHC has no single cause, the onus for achieving these recommendations does not fall on any single actor in the MLHC space. Open collaboration between universities, technology companies, ministries of health, regulators, patient advocates, and individual clinicians and data scientists will be essential to the success of these efforts.
Table 1 | Areas of emphasis for ensuring machine learning for healthcare (MLHC) works for all
Conclusion
The gaps in the medical knowledge system stem from the systematic exclusion of the majority of the world's population from health research. These gaps, combined with implicit and explicit biases, lead to suboptimal medical decision making that negatively impacts health outcomes for everyone, but especially for those in groups typically under-represented in health research.
Recent developments in machine learning and AI technologies hold some promise to address these issues in the generation of scientific evidence and in human decision making. They have also, however, spurred concerns about their potential to maintain, if not exacerbate, these problems. Such concerns must be aggressively addressed by adopting the structural reforms necessary to ensure that the field is both equitable and ethical by design.
Claims of ‘doing better’ have, of course, come before in healthcare with respect to bias, and the burden is on MLHC as a field to grow in a fashion that is deserving of the hype it has received. MLHC is not a magic bullet, nor can it address issues of structural health inequity by itself, but its potential may be substantial. Healthcare is flawed, and it must be reformed so that it equitably benefits all. Effective and equitable machine learning, data science and AI will be an essential component of these efforts.
Twitter: @liamgmccoy, @MITCriticalData
Contributors: Initial conceptions and design: LGM, JDB and LAC. Drafting of the paper: LGM, JDB, MG and LAC. Critical revision of the paper for important intellectual content: LGM, JDB, MG and LAC.
Funding: LAC is funded by the National Institute of Health through the NIBIB R01 grant EB017205. JDB receives grant support from the Advanced Radiology Services Foundation in Grand Rapids, Michigan, USA.
Competing interests: MG acts as an advisor to Radical Ventures in Toronto.
Patient consent for publication: Not required.
Provenance and peer review: Not commissioned; externally peer reviewed.
References
Zhang H, Lu AX, Abdalla M, et al. Hurtful words: quantifying biases in clinical contextual word embeddings. 2020.
Chen IY, Szolovits P, Ghassemi M, et al. Can AI help reduce disparities in general medical and mental health care? AMA J Ethics 2019;21:167–79.
Nuffield Council on Bioethics. Artificial intelligence (AI) in healthcare and research. Bioethics Briefing Note 2018.
Centre for Data Ethics and Innovation. CDEI AI Barometer. 2020.
Joshi I, Morley J. Artificial intelligence: how to get it right. Putting policy into practice for safe data-driven innovation in health and care. NHSX 2019.
Fenech M, Strukelj N, Buston O, et al. Ethical, social, and political challenges of artificial intelligence in health. London: Wellcome Trust and Future Advocacy 2018.
Loukides M, Mason H, Patil D, et al. Ethics and data science. O'Reilly Media 2018.
Herrera-Perez D, Haslam A, Crain T, et al. A comprehensive review of randomized clinical trials in three medical journals reveals 396 medical reversals. eLife 2019;8. doi:10.7554/eLife.45183
Institute of Medicine, Committee on the Learning Health Care System in America. Best care at lower cost: the path to continuously learning health care in America. National Academies Press 2013.
Smith QW, Street RL, Volk RJ, et al. Differing levels of clinical evidence: exploring communication challenges in shared decision making. Introduction. Med Care Res Rev 2013;70:3S–13S. doi:10.1177/1077558712468491
Gijsberts CM, Groenewegen KA, Hoefer IE, et al. Race/ethnic differences in the associations of the Framingham risk factors with carotid IMT and cardiovascular events. PLoS One 2015;10. doi:10.1371/journal.pone.0132321
Lippman A. The inclusion of women in clinical trials: are we asking the right questions? Women and Health Protection/Action pour la protection de la santé des femmes 2006.
Chapman EN, Kaatz A, Carnes M, et al. Physicians and implicit bias: how doctors may unwittingly perpetuate health care disparities. J Gen Intern Med 2013;28:1504–10. doi:10.1007/s11606-013-2441-1
Osoba OA, Welser W IV. An intelligence in our image: the risks of bias and errors in artificial intelligence. RAND Corporation 2017.
Seyyed-Kalantari L, Liu G, McDermott M, et al. CheXclusion: fairness gaps in deep chest X-ray classifiers. arXiv:2003.00827 [cs, eess, stat] 2020.
Nissenbaum H, Patterson H. Biosensing in context: health privacy in a connected world. In: Quantified: biosensing technologies in everyday life. 2016:79.
Kleinberg J, Ludwig J, Mullainathan S, et al. Discrimination in the age of algorithms. J Leg Anal 2018;10:113–74. doi:10.1093/jla/laz001
Lipton ZC. The mythos of model interpretability. arXiv:1606.03490 [cs, stat] 2017.
Zerilli J, Knott A, Maclaurin J, et al. Transparency in algorithmic and human decision-making: is there a double standard? Philos Technol 2019;32:661–83. doi:10.1007/s13347-018-0330-6
Nordell J. Opinion: a fix for gender bias in health care? Check. The New York Times.
Crawford K, Whittaker M, Elish M, et al. The AI Now report: the social and economic implications of artificial intelligence technologies in the near-term. 2016.
The All of Us Research Program Investigators. The "All of Us" research program. N Engl J Med 2019.
Hernán MA, Robins JM. Using big data to emulate a target trial when a randomized trial is not available. Am J Epidemiol 2016;183:758–64. doi:10.1093/aje/kwv254
The Royal Society and the British Academy. Data management and use: governance in the 21st century. A joint British Academy and Royal Society project 2017.
Vyas DA, Eisenstein LG, Jones DS, et al. Hidden in plain sight: reconsidering the use of race correction in clinical algorithms. N Engl J Med 2020;383:874–82. doi:10.1056/NEJMms2004740
Armstrong K, Ravenell KL, McMurphy S, et al. Racial/ethnic differences in physician distrust in the United States. Am J Public Health 2007;97:1283–9. doi:10.2105/AJPH.2005.080762
Kukutai T, Taylor J, eds. Indigenous data sovereignty: toward an agenda. ANU Press 2016:38.
Department of Health and Social Care. Code of conduct for data-driven health and care technology. 2019.