
Applications and challenges of AI-based algorithms in the COVID-19 pandemic
Danai Khemasuwan1, Henri G Colt2

1 Pulmonary and Critical Care Medicine, Virginia Commonwealth University Medical Center, Richmond, Virginia, USA
2 Pulmonary and Critical Care Medicine, University of California Irvine, Irvine, California, USA

Correspondence to Dr Danai Khemasuwan, Internal Medicine, Virginia Commonwealth University Medical Center, Richmond, VA 23298-5023, USA; danai.khemasuwan{at}vcuhealth.org

Abstract

The COVID-19 pandemic is shifting the digital transformation era into high gear. Artificial intelligence (AI) and, in particular, machine learning (ML) and deep learning (DL) are being applied on multiple fronts to overcome the pandemic. However, many obstacles prevent greater implementation of these innovative technologies in the clinical arena. The goal of this narrative review is to provide clinicians and other readers with an introduction to some of the concepts of AI and to describe how ML and DL algorithms are being used to respond to the COVID-19 pandemic. First, we describe the concept of AI and some of the requisites of ML and DL, including performance metrics of commonly used ML models. Next, we review some of the literature relevant to outbreak detection, contact tracing, forecasting an outbreak, detecting COVID-19 disease on medical imaging, prognostication and drug and vaccine development. Finally, we discuss major limitations and challenges pertaining to the implementation of AI to solve the real-world problem of the COVID-19 pandemic. Equipped with a greater understanding of this technology and AI’s limitations, clinicians may overcome challenges preventing more widespread applications in the clinical management of COVID-19 and future pandemics.

  • COVID-19
  • critical care
  • pulmonary medicine

Data availability statement

No data are available.



Summary box

What is already known?

  • Some AI-based algorithms are helpful in the fight against COVID-19, but wider implementation requires prospective validation studies of performance and accuracy as well as collaborative strategies to address ethical and legal issues.

  • Epidemiological models are more reliable than ML-based models for forecasting the spread of COVID-19 because historical data are insufficient in the early phase of an outbreak.

What are the new findings?

  • AI algorithms using natural language processing help detect information about possible outbreaks from social media platforms.

  • Machine learning (ML) algorithms are being used to help predict clinical outcomes such as mortality, risk for intubation and risk for requiring intensive care in patients with COVID-19.

  • Artificial intelligence facilitates development and repurposing of drugs and vaccines that may prove helpful to combat the effects of SARS-CoV-2 infection.

Introduction

The COVID-19 pandemic has affected healthcare professionals and society at large with a scale and speed unprecedented in modern history. Initially, some experts thought the pandemic was a black swan event, rare but with massive impact potential and only retrospective predictability.1 Others, however, argued it was clearly predictable, a white swan2 event signalled by sentinel respiratory virus epidemics such as SARS (2003), H1N1 (2009) and MERS (2012).3

Artificial intelligence (AI) and, in particular, machine learning (ML) and deep learning (DL) are used on multiple fronts to help medical scientists combat the effects of COVID-19. With the appropriate input and innovative algorithmic design, AI can recognise patterns, predict outcomes, assist with medical decision-making and help uncover relevant information from data.4 By seamlessly analysing millions of data points, AI is a potential game changer in the battle against the pandemic. Whereas most conventional statistical methods are unable to handle ‘big data’ in its various forms (eg, texts and images), ML and DL can analyse highly complex non-linear interactions in massive datasets5 6 in order to isolate relationships between predictors and outcomes. Not surprisingly, there has been an explosion of capital investment in AI-based medical imaging, and the AI healthcare market is expected to reach $6.6 billion by 2021.7

Research design

For this narrative review, several requisites of ML and DL in COVID-19 are described based on results from almost 2000 articles published since the beginning of the pandemic in 2019. Articles were identified using a PubMed literature search performed on 27 March 2021 using the search terms ‘artificial intelligence OR machine learning OR deep learning AND COVID-19’.

Given the narrative nature of our work, articles were carefully selected to complement other reviews and to provide readers with a general understanding of various roles for ML and DL algorithms in the COVID-19 pandemic, particularly (1) outbreak detection and contact tracing; (2) disease forecasting; (3) detection on medical imaging; (4) prognostication and (5) drug and vaccine development. We also describe existing challenges, solutions and future directions affecting further implementation of AI-based algorithms in response to the COVID-19 pandemic.

Overview of AI-based ML and DL

AI refers to the ability of computer systems to perform tasks that mimic human intelligence. ML is a subset of AI. ML algorithms are broadly categorised into supervised learning and unsupervised learning. Supervised ML aims to create predictive models using regression analysis or classification systems. Datasets must be labelled so that known outcomes can be predicted. Once algorithms are successfully trained, they are capable of predicting outcomes when applied to a new dataset. The process requires a large amount of data, and labelling input can be time consuming and labour intensive. Commonly used supervised ML techniques are tree-based models (random forest, gradient boosted trees), support vector machine models and the K-nearest neighbour algorithm.

In terms of performance metrics, several parameters are commonly used to evaluate classification models. The receiver operating characteristic (ROC) curve is a probability curve that plots the true positive rate (y-axis) against the false positive rate (x-axis) at various threshold values. The area under the curve (AUC) measures the ability of a classifier to distinguish between classes and is used as a summary of the ROC curve; the higher the AUC, the better the model classifies positive and negative groups. The F1 Score is another common measure of a classification model’s performance: it is the harmonic mean of precision and recall, calculated as 2×precision×recall/(precision+recall). Precision is the number of items correctly identified as positive out of the total items identified as positive (true positives/(true positives+false positives)). Recall (or sensitivity) is the number of items correctly identified as positive out of the total number of true positives (true positives/(true positives+false negatives)). A higher F1 Score means the model produces fewer false positives and false negatives.
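To make these metrics concrete, the following Python sketch computes precision, recall, F1 Score and AUC with scikit-learn; the labels and predicted probabilities are toy values chosen only for illustration.

```python
# Toy illustration of the classification metrics described above (scikit-learn).
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                   # ground-truth labels
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0]                   # hard class predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1]   # predicted probabilities

precision = precision_score(y_true, y_pred)   # TP / (TP + FP)
recall    = recall_score(y_true, y_pred)      # TP / (TP + FN), ie, sensitivity
f1        = f1_score(y_true, y_pred)          # 2 x precision x recall / (precision + recall)
auc       = roc_auc_score(y_true, y_score)    # area under the ROC curve

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} auc={auc:.2f}")
```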

Unsupervised ML identifies clusters in unlabelled data and detects previously unknown patterns. It identifies functions that map input datasets into clusters so that data points within each cluster are more similar to one another than to data points in other clusters.8 An example of unsupervised ML is the data mining of electronic medical records,9 where the goal is to reveal patterns among patients who share clinical, genetic or molecular characteristics and might theoretically respond to targeted therapies.
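As a minimal sketch of unsupervised clustering, the example below applies k-means to synthetic data; in the electronic medical record setting described above, the rows would instead be patients and the columns clinical, genetic or molecular features.

```python
# Minimal k-means clustering sketch (scikit-learn) on synthetic data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 4)),    # one synthetic group of records
               rng.normal(5, 1, (50, 4))])   # a second, distinct group

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])                   # cluster assignment for each record
```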

DL is a subset of ML inspired by the fundamental structure of neurons and synapses in the human neocortex. DL consists of a multilayered structure of algorithms called neural networks. Each layer consists of nodes (or neurons). The individual layers of a neural network function as filters that extract features from the input. The many connections between layers carry weights that are adjusted according to whether the network’s output arrives at the correct answer, mimicking the synapses of the human neocortex. A neuron sends an output signal if the weighted summation of its input signals reaches its activation threshold, which is applied through a non-linear function. In brief, the training process of a neural network can be summarised as follows (figure 1): (1) feed the data points forward through the network to obtain outputs; (2) use the backpropagation algorithm to calculate the gradient of the loss function with respect to each weight and bias; (3) use gradient descent to update the weights and biases at each layer and (4) iterate the above steps to minimise the output error. DL systems use labelled datasets to classify samples into different categories. DL is used for internet data searches, language translation and speech recognition on smart phones.

Figure 1

Graphical description of training an artificial neural network. (1) Feed the data points forward through the network to obtain outputs. (2) Use the backpropagation algorithm to calculate the gradient of the loss function with respect to each weight and bias. (3) Use gradient descent to update the weights and biases at each layer. (4) Iterate the above steps to minimise the output error.
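These four steps can be illustrated with a minimal numpy sketch of a one-hidden-layer network trained on synthetic data; this is a teaching example only, not a production implementation.

```python
# Minimal sketch of the four training steps in figure 1 (synthetic data).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                          # toy inputs
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)   # toy binary labels

W1, b1 = 0.1 * rng.normal(size=(3, 8)), np.zeros((1, 8))
W2, b2 = 0.1 * rng.normal(size=(8, 1)), np.zeros((1, 1))
lr = 0.5                                               # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(1000):
    # (1) feed the data forward through the network to obtain outputs
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # (2) backpropagation: gradient of the (cross-entropy) loss
    #     with respect to each weight and bias
    d_out = (out - y) / len(X)
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0, keepdims=True)
    d_h = (d_out @ W2.T) * h * (1 - h)
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0, keepdims=True)
    # (3) gradient descent: update weights and biases at each layer
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
    # (4) the loop iterates these steps to minimise the output error

loss = -(y * np.log(out) + (1 - y) * np.log(1 - out)).mean()
print(f"final training loss: {loss:.4f}")
```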

In the following paragraphs, we provide examples of how medical scientists and clinicians currently use innovative ML and DL algorithms to combat COVID-19, the disease caused by the SARS-CoV-2 virus.

Outbreak detection and contact tracing

Outbreak detection focuses on developing effective surveillance, prevention and operational capabilities for detecting biological hazards in the community.10 Natural language processing (NLP) is a subfield of AI focused on programming machines to read, understand and extract meaning from human language. In other words, NLP represents the automatic handling of unstructured data such as speech or text. NLP can consistently analyse language-based data without fatigue or bias, which is essential considering the staggering amount of unstructured data generated daily in social media, search queries and electronic health records. NLP, therefore, is an important tool for analysing text data efficiently.

For example, the popularity of online social networks such as Twitter has created massive social interaction among users with a consequent proliferation of big data. Supervised and semisupervised algorithms were used to investigate the use of Twitter data to deliver signals for syndromic surveillance (asthma/difficulty breathing).11 The advantage of semisupervised over classic supervised algorithms is that they enabled the authors to minimise the labelling effort required to build a classifier of comparable performance (table 1).

Table 1

Summary of AI approaches against the COVID-19 pandemic

Another example is how big data were used to develop an almost real-time pandemic surveillance system through Google Flu Trends (GFT).12 GFT was a search log-based detection method, a form of crowdsourced epidemiology that provided estimates of influenza-like illness (ILI) in the general population. Although GFT was not an AI approach itself, it processed billions of individual searches from Google web search logs.13 The frequency of certain queries is highly correlated with the percentage of physician visits for influenza-like symptoms, so the model could be used for real-time tracking of influenza activity in similar geographic areas.14 Although an initial report stated that GFT predictions were 97% accurate compared with Centers for Disease Control and Prevention (CDC) data,13 subsequent reports demonstrated that GFT consistently overestimated ILI incidence. One of the most common explanations for GFT’s error was a media-related panic in the 2012–2013 influenza season; other possible culprits were changes in Google’s search algorithm itself.15 These problems led to GFT being shut down in 2013.16

In terms of outbreak detection for COVID-19, rather than relying on traditional surveys and clinical reports, in which data collection is labour intensive and costly, social networks and governmental data allow exploratory analyses of geographical and temporal information.17 For example, rapid recognition of the SARS-CoV-2 outbreak was, in part, related to an AI epidemiology algorithm called ‘BlueDot’. Big data played an important role in the BlueDot algorithm: it used NLP to extract data from hundreds of thousands of sources in foreign-language news reports and official announcements to provide news of potential epidemics.18 BlueDot’s data scientists manually classified the data and developed relevant keywords for NLP to scan the data sources. As a result, the algorithm identified suspected cases for human data experts to analyse further. The BlueDot algorithm predicted the early spread of COVID-19 outside of Wuhan based on travel data generated from the International Air Transport Association.18 BlueDot algorithms also successfully predicted the spread of the Zika virus to South Florida in 2016.19 Another NLP technique that can be used to monitor public opinion regarding infectious disease outbreaks and governmental policies is called ‘sentiment analysis’. Data analysts use this method to extract online information from social media platforms in order to understand the public’s reactions toward disease outbreaks.20 The data in text format are separated into basic components (eg, sentences, phrases) that are labelled by human experts to assign and weight sentiment scores. This also provides government agencies with valuable insights for directing efforts towards public education (table 1).21 22

Digital contact tracing helps prevent the wider spread of a disease. It is an effective damage control method for minimising the development of an outbreak after initial exposures.23 Generally, the process identifies an infected individual with a follow-up period of 14 days after the reported exposure. Several countries have mobile applications for contact tracing, with bluetooth and the global positioning system as the main technologies for proximity tracing.24 China was one of the first countries to deploy centralised consolidation of personal and mobile phone tracking data.25 Its proprietary AI-generated health code (green, amber and red) produces an individualised risk score for each citizen.26 The algorithm behind the system is not entirely clear and is inaccessible to the public.27 A citizen with a red code is instructed to limit their mobility in certain geographic locations, and violations can be recorded in China’s social credit system, leading to devastating personal penalties from public authorities. In contrast, Singapore has taken a more privacy-aware approach, including an opt-in decision support application that helps health authorities track and communicate with at-risk contacts of infected users.28 This app uses a protocol for logging bluetooth encounters between two participating devices that exchange temporary identifiers. If users are infected, they are asked to share their encounter history with public health officials.28

In the USA (California), the Google/Apple Exposure Notification was recently deployed on smartphones.29 This technology uses bluetooth to notify users of potential exposure to other users with a diagnosis of COVID-19, regardless of whether the users know each other.29 The digital contact tracing process occurs virtually in real time and is faster than techniques used on non-digital platforms. However, weak privacy controls and data security breaches can lead to public outcry, particularly in cases of data privacy violations. For this reason, the implementation of contact tracing apps remains controversial in many countries.30
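The encounter-logging idea behind these bluetooth protocols can be sketched in a few lines of Python. The scheme below is a deliberately simplified illustration, not the actual Google/Apple or BlueTrace specification; real protocols add rotating daily keys, time windows and signal attenuation thresholds.

```python
# Simplified illustration of bluetooth encounter logging with temporary IDs.
# This is a toy scheme, not the Google/Apple or BlueTrace specification.
import hashlib, os

def temporary_id(device_secret: bytes, interval: int) -> bytes:
    # derive a rotating identifier from a device secret and a time interval
    return hashlib.sha256(device_secret + interval.to_bytes(8, "big")).digest()[:16]

alice_secret = os.urandom(32)

# Bob's device logs the temporary IDs it hears broadcast nearby
bob_log = {temporary_id(alice_secret, t) for t in range(100, 110)}

# if Alice tests positive, she publishes her key material; Bob's device
# recomputes her temporary IDs locally and checks for matches in its log
matches = {temporary_id(alice_secret, t) for t in range(0, 200)} & bob_log
print(f"{len(matches)} matching encounter(s) found")
```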

Forecasting an outbreak

Various mathematical and statistical population models have been used to forecast the extent and spread of outbreaks.31 In Brazil, an ML approach was used to forecast cumulative COVID-19 cases 3 and 6 days into the future. Support vector regression and a stacking ensemble achieved better performance than the other ML-based models evaluated32 (table 1). Short-term prediction ML models may assist Brazilian governments and public health agencies in decision-making and determination of public policy. However, the proposed models were specifically trained on datasets from the Brazilian health system, so applying them in other countries would be technically challenging.
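As a hedged sketch of this kind of short-horizon forecasting, the example below fits a support vector regressor on lagged counts from a synthetic cumulative case series; the lag length, horizon and hyperparameters are illustrative assumptions, not the study’s configuration.

```python
# Sketch of 3-day-ahead forecasting with support vector regression
# (synthetic data; feature scaling is omitted for brevity).
import numpy as np
from sklearn.svm import SVR

cases = np.cumsum(np.random.default_rng(0).poisson(50, size=60))  # toy cumulative cases

lag, horizon = 5, 3   # predict 3 days ahead from the previous 5 days
X = np.array([cases[i:i + lag] for i in range(len(cases) - lag - horizon)])
y = np.array([cases[i + lag + horizon - 1] for i in range(len(cases) - lag - horizon)])

model = SVR(kernel="rbf", C=1000.0).fit(X, y)
print("3-day-ahead forecast:", model.predict(cases[-lag:].reshape(1, -1))[0])
```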

Unfortunately, ML-based prediction models have not been very reliable, partly because historical data are insufficient and because, in the early phase of the pandemic, investigators used results from potentially biased studies.33 In addition to historical data, ML needs large amounts of training data to generate accurate prediction algorithms.34 Coordinating data collection, however, is challenging even for smaller geographical areas. In Singapore, a national health consortium was required to assemble detailed data from health systems, hospitals and clinics.35 Even when financial and logistical resources are plentiful, large-scale data may not be available during the early stage of an outbreak, which ironically is when predictions are most needed. Therefore, many models used to track and forecast COVID-19 relied on epidemiological models such as the Susceptible-Infected-Recovered (SIR) model36 37 or the Susceptible, Infected, Diagnosed, Ailing, Recognized, Threatened, Healed and Extinct (SIDARTHE) model to justify control measures such as social distancing, widespread testing and contact tracing.38 These mechanistic models mimic the dynamic pattern of COVID-19 spread and are used to simulate future transmission scenarios under various assumptions and sensitivity analyses.39
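For contrast with the ML approach, the SIR model referenced above can be written down directly; the sketch below integrates its three differential equations with scipy, using illustrative parameter values rather than fitted COVID-19 estimates.

```python
# Minimal SIR model sketch (parameters are illustrative, not fitted estimates).
import numpy as np
from scipy.integrate import solve_ivp

def sir(t, state, beta, gamma):
    S, I, R = state                     # population fractions, so N = 1
    return [-beta * S * I,              # susceptible become infected
            beta * S * I - gamma * I,   # infected recover at rate gamma
            gamma * I]                  # recovered accumulate

beta, gamma = 0.3, 0.1                  # R0 = beta/gamma = 3 (assumed)
y0 = [0.999, 0.001, 0.0]                # initial S, I, R fractions
sol = solve_ivp(sir, (0, 200), y0, args=(beta, gamma),
                t_eval=np.linspace(0, 200, 201))

peak_day = sol.t[np.argmax(sol.y[1])]
print(f"epidemic peaks near day {peak_day:.0f} at {sol.y[1].max():.1%} infected")
```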

Detection of COVID-19 on medical imaging

The clinical features of SARS-CoV-2 infection are sometimes indistinguishable from those of other viral infections. The chest X-rays (CXRs) of patients with COVID-19 typically reveal non-specific bilateral infiltrates, while CT scans may show non-specific ground-glass opacities and subsegmental consolidation. There is growing effort, however, to train DL to diagnose COVID-19 using chest imaging. The convolutional neural network (CNN) is a form of DL designed to process input images. The architecture of a CNN follows a hierarchical model that creates a funnel-like framework ending in a fully connected layer; this last, fully connected output layer provides final probabilities for each label as a final classification (figure 2).6 CNNs have achieved remarkable performance and accuracy in various imaging applications through training on existing datasets. However, most of the published studies used relatively small datasets (<1000 CXR images of COVID-19 cases).40–44 Transfer learning is an ML approach that can help investigators overcome limited data sizes. A CNN is pretrained with the results of a previous training round from a different domain; the pretrained CNN is then used to initialise the network’s weights, which are fine-tuned using the limited available medical datasets. This approach appears to outperform fully trained networks under certain circumstances.45 Minaee et al applied a transfer learning approach to train publicly available CNNs (ResNet18, ResNet50, SqueezeNet and DenseNet-121) to identify COVID-19 disease in CXR images. These CNNs achieved a sensitivity rate of 98% (±3%), while having a specificity rate of around 90%.46

Figure 2

Basic components of a convolutional neural network (CNN): a sequence of layers, each transforming one volume of activations to another through a differentiable function. Three main types of layers build CNN architectures: the convolutional layer, the pooling layer and the fully connected layer. The convolutional layers merge two sets of information with the use of a filter to produce a feature map as an output. The pooling layers reduce the number of parameters and computation in the network. The fully connected output layer provides the final probabilities for each label as a final classification.
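A hedged sketch of the transfer learning recipe described above is shown below using PyTorch and a pretrained ResNet18; the two-class head and training step are illustrative assumptions, not the published models’ exact configurations.

```python
# Sketch of transfer learning: freeze an ImageNet-pretrained ResNet18 and
# fine-tune a new two-class head (COVID-19 vs non-COVID-19). Illustrative only.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)        # weights from a different domain
for param in model.parameters():
    param.requires_grad = False                 # freeze pretrained feature layers

model.fc = nn.Linear(model.fc.in_features, 2)   # new trainable output layer

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    # one fine-tuning step on a batch from a chest X-ray DataLoader
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```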

In the months after the onset of the pandemic, COVID-19 CXR datasets from tertiary medical centres became more publicly available. Wehbe et al trained and validated a DL algorithm named DeepCOVID-XR on 14 788 images and externally tested the algorithm on 2214 images; positive CXRs were confirmed with real-time polymerase chain reaction results for SARS-CoV-2. This model was built on one of the largest datasets reported to date, and its performance is comparable to that of expert thoracic radiologists (AUC 0.88 for DeepCOVID-XR vs 0.85 for expert radiologists; p=0.13).47 Matching the performance of expert radiologists can be clinically useful: the model could serve as an automated tool to rapidly triage patients with suspicious CXRs for isolation, mitigating unnecessary exposure, especially in the emergency room or urgent care setting. However, the algorithm requires prospective validation in different environments, especially where COVID-19 is not a predominant cause of viral pneumonia.47

The number of studies using ML and DL to diagnose COVID-19 on chest CT scans is growing rapidly, and most algorithms demonstrate high performance and accuracy.48 However, these studies are highly heterogeneous in terms of available datasets, geography and imaging specifications, which may affect the consistency and uniformity of results. While imaging alone cannot be used to diagnose COVID-19, roles for AI are promising: such systems provide clinical decision support for clinicians, particularly by helping to flag suspicious cases.48 Further studies are needed to determine how image-based AI diagnostic systems will comply with regulatory and quality control requirements.49 Regardless of which models are used, performance should be validated on a diverse dataset and demonstrate effectiveness in the clinical setting.

Prognostication

The limited capacity of intensive care units makes predictive models that forecast disease severity crucial to healthcare professionals involved in caregiving, triage and public policy. Yan et al 50 developed a decision tree-based model (supervised XGBoost classifier) using supervised ML to predict mortality based on three serum biomarkers (serum lactate dehydrogenase, percentage of lymphocytes and high-sensitivity C reactive protein). This model demonstrated an AUC score on validation sets of 95.06%±2.21%, suggesting that a simple triage tool may be used to identify high-risk patients and allocate healthcare resources accordingly. In a different study, Singh et al 51 used an automated risk-score system integrated into electronic health records to construct a proprietary early-warning prediction model (the Epic Deterioration Index) with fair discrimination (AUC 0.76, 95% CI 0.68 to 0.84) for predicting the probability of hospitalised patients requiring intensive care.
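As a hedged illustration of the Yan et al approach, the sketch below fits a gradient boosted tree classifier to the three biomarkers; the values, outcome rule and hyperparameters are synthetic inventions for demonstration, not the study’s model.

```python
# Sketch of a gradient boosted tree classifier on three serum biomarkers
# (synthetic data and labels; not the published model).
import numpy as np
from xgboost import XGBClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# columns: LDH (U/L), lymphocyte %, hs-CRP (mg/L) -- synthetic values
X = np.column_stack([rng.normal(300, 100, 500),
                     rng.normal(20, 8, 500),
                     rng.normal(30, 20, 500)])
y = ((X[:, 0] > 350) & (X[:, 1] < 20)).astype(int)   # toy outcome rule

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = XGBClassifier(n_estimators=100, max_depth=3).fit(X_tr, y_tr)
print("validation AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```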

In patients with hypoxaemic respiratory failure from severe pneumonia, the ROX Index (based on SpO2, FiO2 and respiratory rate) has been used to predict failure of high-flow nasal cannula support and the need for escalation to intubation.52 A supervised ML model was also developed to predict intubation among hospitalised patients with COVID-19.53 The model was based on the first 24 hours of the index admission: Elixhauser comorbidity measures and time-series data were used to fit a random forest classifier predicting intubation risk. The algorithm outperformed the ROX Index, demonstrating an area under the receiver operating characteristic curve (AUC) of 0.84 for the ML model versus 0.64 for the ROX Index.52 53
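For orientation, the ROX Index is the ratio of SpO2/FiO2 to respiratory rate; a one-line calculation with illustrative vital signs follows (lower values suggest a higher risk of high-flow nasal cannula failure).

```python
# ROX Index = (SpO2 / FiO2) / respiratory rate; the values below are examples.
def rox_index(spo2_percent: float, fio2_fraction: float, resp_rate: float) -> float:
    return (spo2_percent / fio2_fraction) / resp_rate

print(round(rox_index(92, 0.6, 30), 2))  # SpO2 92% on FiO2 0.6 at 30/min -> 5.11
```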

Drug and vaccine development

Computational methods for screening potential compounds against target proteins have been shown to improve success rates and shorten the time of drug discovery.54 DL-based models have shown impressive performance in protein–ligand binding prediction and drug development,55 similar to how ML-based approaches were used to repurpose drugs for in vitro testing against Ebola virus.56 57 Thus, existing drugs might be repurposed to treat COVID-19. For SARS-CoV-2, several viral proteins appear to be potential targets, including RNA-dependent RNA polymerase, 3-chymotrypsin-like (3CL) protease, papain-like protease, helicase, spike (S) glycoprotein, exonuclease, endoRNAse, 2′-O-ribose methyltransferase and envelope protein.58 Hu et al used multitask neural networks to identify potential therapeutic agents. The amino acid sequences of these proteins were extracted from the National Center for Biotechnology Information, and the virus-specific dataset was obtained from the Global Health Drug Discovery Institute. The model outputs a score estimating the binding affinity (pKa) between a drug and its SARS-CoV-2 target. The authors suggested 10 promising drugs as potential SARS-CoV-2 inhibitors,58 of which abacavir and darunavir showed high binding affinity with multiple SARS-CoV-2 proteins; there is an ongoing clinical trial of darunavir against COVID-19 (ChiCTR2000029541). Beck et al used unsupervised ML to predict the inhibitory potency of atazanavir and remdesivir against the SARS-CoV-2 3CL proteinase.59 Remdesivir has been widely used in the management of COVID-19 infection,60 although its effects on disease course and survival are less remarkable than originally thought.61

SARS-CoV-2 uses cellular receptors to enter cells via endocytosis. One known regulator of endocytosis is AP-2-associated protein kinase 1 (AAK-1). Stebbing et al used Monte Carlo tree search, an iterative search algorithm commonly used to find promising solutions in large search spaces, to explore knowledge graphs and identify AAK-1 inhibitors. One of several AAK-1 inhibitors provides both antiviral and anti-inflammatory effects; this search ultimately identified the antirheumatoid arthritis drug baricitinib, an orally administered, selective inhibitor of Janus kinase 1 and 2, as a potential treatment warranting clinical investigation.62–64 In a double-blind, randomised controlled trial, the combination of baricitinib and remdesivir was shown to reduce recovery time and accelerate clinical improvement in patients with COVID-19 receiving non-invasive ventilation or high-flow oxygen.65

The use of hydroxychloroquine was deeply scrutinised after a publication by Mehra et al 66 in the Lancet. The authors claimed to have obtained deidentified data by automated extraction from a multinational registry. The paper was retracted 2 weeks after its online publication,66 however, because it was not clear whether ML was properly used to extract the data and because of possible inconsistencies in the data themselves, including an unrealistically high number of electronic health records from Africa as well as uncertainties regarding the timing of data collection in the UK.

In addition to revolutionising the biopharmaceutical industry, AI is positively impacting the field of vaccine development. In biochemistry applications, AI helps scientists better understand the proteins involved in SARS-CoV-2 and search for potential targets.67 ML allows rapid scanning of the entire viral proteome, enabling faster and potentially less expensive scientific inquiry than older techniques used for vaccine development.67 An ML-based reverse vaccinology approach was used to predict potential protein targets for COVID-19 vaccine development.68 Vaxign-ML, a supervised ML model (eXtreme Gradient Boosting), is designed to predict the protegenicity score of all proteins of the SARS-CoV-2 isolate Wuhan-Hu-1 (GenBank ID: MN908947.3).69 70 A protein with a higher protegenicity score is considered a stronger vaccine candidate with higher utility toward protection. This model identified six proteins, including the S protein and five non-structural proteins. These protein candidates were predicted to be adhesins, which are crucial to viral adherence and host invasion.71 In 2018, Google DeepMind used a DL algorithm, AlphaFold,72 to predict the distances and the distribution of angles between amino acid residues.73 This model was trained on structures extracted from a protein databank, a dataset comprising 31 247 protein domains. The technology has progressed to its next generation, AlphaFold 2, which achieved the highest score in the Critical Assessment of protein Structure Prediction (CASP), surpassing the initial 2018 version of AlphaFold74 and providing critical information to predict the structural proteins related to SARS-CoV-2 for vaccine development.73

Discussion

Limitations and possible solutions to using AI in the COVID-19 pandemic

Artificial intelligence provides opportunities to improve the quality of care and accelerate the evolution of precision medicine. Its limitations, however, are increasingly described in the growing field of AI ethics, which studies AI’s impact on technology, individual lives, economics and social transformation.75

One of the hurdles facing further implementation of AI in healthcare relates to a lack of prospective validation studies and the difficulty of improving an algorithm’s performance. Many investigations are validated in silico by dividing a single pre-existing dataset into a training and a testing dataset. External validation using an independent dataset, however, is critical prior to implementation in a real-world environment, and inherently opaque black-box type models should be avoided as much as possible because of their lack of interpretability and explainability. In fact, the term ‘black box’ describes one of the most important intrinsic drawbacks of ML and DL type algorithms. Several prediction models have black-box problems that make it difficult to determine which features are being used to define the output.76 Clinicians may not feel confident using ML-based predictive models to evaluate clinical scenarios or applying DL to automate routine tasks (such as interpretation of CXRs and CT scans of the chest), and they may have difficulty troubleshooting incorrect ML-based assessments.77 One method that can alleviate interpretability issues is the use of a surrogate model. The concept is similar to that in other fields where outcomes are difficult or expensive to measure and surrogate outcomes are less costly and easier to use. In ML, the surrogate model is an interpretable model trained to approximate the predictions of a black-box model. A surrogate can be built for any black-box model by choosing an interpretable model type (eg, a linear model or a decision tree) and training it on the same dataset. After training, investigators can evaluate how closely the surrogate reproduces the black-box model’s predictions on that dataset.78
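This idea can be sketched in a few lines: a shallow decision tree is trained to mimic the predictions (not the true labels) of a gradient boosted ‘black-box’ model on the same dataset, and its fidelity to the black box is then measured. The data and model choices below are illustrative.

```python
# Sketch of a global surrogate model: an interpretable tree approximating
# a black-box classifier's predictions on the same (synthetic) dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

black_box = GradientBoostingClassifier(random_state=0).fit(X, y)
bb_preds = black_box.predict(X)                  # predictions, not true labels

surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, bb_preds)

# fidelity: how often the interpretable tree agrees with the black box
fidelity = (surrogate.predict(X) == bb_preds).mean()
print(f"surrogate reproduces {fidelity:.0%} of the black-box predictions")
```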

As with any technology that involves individual health records, the right to privacy is another important ethical obligation that must be adhered to by AI-based applications.79 AI may require both access to and sharing of personal information in order to generate trends, make predictions and conduct assessments.79 One example of a mitigating strategy to protect privacy is a novel process called ‘federated learning’. This technique enables collaborative learning without moving patient data beyond the firewall of the patient’s institution: aggregated local ML models can analyse confidential, decentralised data without transferring sensitive information to a central server.79 Multiple organisations can thus share data without compromising individual privacy. Federated learning techniques may also prompt research to improve physician workflow and patient care.80 Prospective studies and randomised controlled trials (RCTs) are needed for these techniques to be more widely implemented.
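The core of federated averaging can be illustrated schematically: each of three hypothetical sites computes a local model update on data that never leaves the site, and only the resulting weights are averaged centrally. This is a toy sketch on synthetic data, not a production federated learning framework.

```python
# Schematic federated averaging: raw data stay at each site; only model
# weights are shared and averaged (synthetic data, toy logistic regression).
import numpy as np

def local_update(weights, X, y, lr=0.1):
    # one logistic-regression gradient step on a site's private data
    preds = 1.0 / (1.0 + np.exp(-X @ weights))
    grad = X.T @ (preds - y) / len(y)
    return weights - lr * grad

rng = np.random.default_rng(0)
sites = [(rng.normal(size=(200, 5)), rng.integers(0, 2, 200).astype(float))
         for _ in range(3)]                      # three hospitals' private datasets

global_w = np.zeros(5)
for round_ in range(20):
    local_ws = [local_update(global_w, X, y) for X, y in sites]  # local training
    global_w = np.mean(local_ws, axis=0)         # central server averages weights

print("global model weights:", np.round(global_w, 3))
```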

Challenges for clinicians using AI during the COVID-19 pandemic

There is great interest in examining the potential benefit of using AI to support responses to the COVID-19 pandemic across a wide range of clinical practices.81 As of this writing, however, AI-based applications appear to have only a small impact on clinical management, and several challenges prevent more widespread implementation.82

First of all, AI-based models of disease detection and surveillance rely heavily on data digitisation, such as picture archiving and communication systems, mobile phones and internet access. These resources are not widely available in less developed countries.83 Efforts are needed to increase infrastructure for the digitisation of healthcare data and to provide low-cost AI technologies, potentially with applications using readily accessible imaging alternatives such as CXRs and even electronic health records.

Second, legal barriers present a major hurdle. Determining legal liability for adverse outcomes when physicians use AI algorithms in patient care is controversial.84 Currently, AI is used as a clinical decision support tool rather than as a replacement for clinical judgement, implying that accountability for mistakes remains with the clinician. However, distinguishing liability between AI-derived protocols and clinical care providers is not easy. AI-based decision support tools (such as those using DL) may be limited by their lack of generalisability, privacy concerns and limited explainability. In addition, AI might mistakenly classify a healthy individual as COVID-19 positive (type I error, false positive) or an infected patient as COVID-19 negative (type II error, false negative) owing to incomplete training datasets.85 Furthermore, our present legal system has not yet adapted to accommodate all possible legal challenges,86 making this uncharted territory, especially for patients suffering from COVID-19. Consequently, healthcare professionals may hesitate to adopt AI-based applications in clinical practice.

Third, it is generally accepted that peer-reviewed RCTs are the gold standard of evidence-based medicine. Similar to the field of physics, in the AI community of data scientists many studies are published as preprints and subjected to critical analysis prior to submission, and sometimes instead of publication, in peer-reviewed journals. Furthermore, as of this writing, there are very few RCTs of AI systems in clinical medicine87 88 and, to our knowledge, no RCTs studying the use of AI systems in the clinical management of patients with COVID-19. In medicine, there are only 10 randomised trial registrations of DL algorithms.89 The lack of such studies may be a significant obstacle to adopting AI algorithms in clinical practice. Studies of ML and DL should follow best practice recommendations, and prospective RCTs of AI algorithms are warranted to ensure that models are safe and effective prior to widespread deployment in real-world clinical practice.

Finally, comparing algorithms objectively across published studies is challenging because of the variable methodologies used in different populations with different sample distributions and characteristics. Sample sizes are often small, there is frequent heterogeneity between samples and performance metrics may differ. Therefore, it is difficult for clinicians to determine which algorithm is likely to perform best for their particular patients. While one strategy to overcome these obstacles is to use independent local test sets to reasonably compare the performance of various algorithms in a representative sample of a population,90 guidelines from medical societies regarding AI-related research might also prove helpful.

Conclusions

Artificial intelligence driven by big data is fuelling the fourth industrial revolution.91 Similar to prior revolutionary technologies such as the internet, AI is fundamentally changing human society and our approach to public health. Nonetheless, we cannot elaborate, study and execute effective AI-based strategies without the dedicated input and ethical deliberations of collaborating, trustworthy, experienced clinicians and forward-thinking medical data scientists. While some may argue it is premature to unquestionably proclaim the added value of AI in our response to COVID-19, the global effort to organise massive datasets and elaborate new strategies is clear. This collaborative effort across knowledge domains will help us prevail over this crisis. Like other pandemics in human history, COVID-19 will surely pass, but future pandemics are inevitable. Whether the next one is a ‘white swan’ event for which we are well prepared may depend on whether innovative AI-based algorithms can accurately provide early warnings and help experts design and implement effective strategies for disease diagnosis, control and prevention.



Footnotes

  • Contributors DK and HGC made substantial contributions to conception and design and/or acquisition of data and analysis of this manuscript.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.