Review of study reporting guidelines for clinical studies using artificial intelligence in healthcare
Abstract
High-quality research is essential in guiding evidence-based care, and should be reported in a way that is reproducible and transparent and, where appropriate, provides sufficient detail for inclusion in future meta-analyses. Reporting guidelines for various study designs have been widely used for clinical (and preclinical) studies, consisting of checklists with a minimum set of points for inclusion. With the recent rise in the volume of research using artificial intelligence (AI), additional factors need to be evaluated which do not neatly conform to traditional reporting guidelines (eg, details relating to technical algorithm development). In this review, reporting guidelines are highlighted to promote awareness of the essential content required for studies evaluating AI interventions in healthcare. These include published and in-progress extensions to well-known reporting guidelines such as Standard Protocol Items: Recommendations for Interventional Trials-AI (study protocols), Consolidated Standards of Reporting Trials-AI (randomised controlled trials), Standards for Reporting of Diagnostic Accuracy Studies-AI (diagnostic accuracy studies) and Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis-AI (prediction model studies). Additionally, there are a number of guidelines that consider AI for health interventions more generally (eg, Checklist for Artificial Intelligence in Medical Imaging (CLAIM), minimum information (MI)-CLAIM, MI for Medical AI Reporting) or address a specific element such as the ‘learning curve’ (Developmental and Exploratory Clinical Investigation of Decision-support systems driven by AI). Economic evaluation of AI health interventions is not currently addressed, and may benefit from an extension to an existing guideline. In the face of a rapid influx of studies of AI health interventions, reporting guidelines help ensure that investigators and those appraising studies consider the well-recognised elements of good study design and reporting, while also adequately addressing the new challenges posed by AI-specific elements.
Introduction
Recent, rapid developments in computational technologies and increased volumes of digital data for analysis have resulted in an unprecedented growth in research activities relating to artificial intelligence (AI), particularly within healthcare. This volume of work has even led to several high-impact journals launching their own subjournals within the ‘AI healthcare’ field (eg, Nature Machine Intelligence,1 Lancet Digital Health,2 Radiology: Artificial Intelligence3). High-quality research should be accompanied by transparency, reproducibility and validity of techniques for adequate evaluation and translation into clinical practice. Standardised reporting guidelines help researchers define the key components of their study, ensuring that relevant information is provided in the final publication.4 Studies pertaining to algorithm development and the clinical application of AI, however, have brought unique challenges and added complexities in how such studies are reported, assessed and compared, relating to elements that are not conventionally prespecified in traditional reporting guidelines. This can lead to missing information and a high risk of hidden bias. If these actual or potential limitations are not identified, publication may confer tacit approval, which in turn may support the premature adoption of new technologies.5 6 Conversely, well-designed, well-delivered studies that are poorly reported may be judged unfavourably, being adjudged to have a high risk of bias simply because of a lack of information.
Inadequacies in the reporting of AI clinical studies are increasingly well recognised. In 2019, a systematic review by Liu et al7 reviewed over 20 500 articles but found that fewer than 1% of these were sufficiently robust in their design and reporting to allow independent reviewers to have confidence in their claims. Similarly, Nagendran et al8 identified high levels of bias in the field. In another study,9 it was reported that only 6% of over 500 eligible radiological AI research publications performed any external validation of their models, and none used multicentre or prospective data collection. Likewise, most studies using machine learning (ML) models for medical diagnosis10 provided neither adequate detail on how the models were evaluated nor sufficient detail for them to be reproduced. Inconsistencies in how ML models built from electronic health records are reported have also been noted, with details regarding the race and ethnicity of participants omitted in 64% of studies and only 12% of models being externally validated.11
To address these concerns, adapted research reporting guidelines based on the well-established EQUATOR Network (Enhancing the QUAlity and Transparency Of health Research)12 13 and de novo recommendations by individual societies have been published, with greater relevance for AI research. In this review, we highlight those that will cover the majority of healthcare-focused AI-related studies, and explain how they differ from the well-known guidance for non-AI clinical work. Our intention is to raise awareness of how such studies should be structured, thereby improving the quality of future submissions and providing a helpful aid for researchers, peer reviewers and editors.
In compiling a detailed yet relevant list of study guidelines, we reviewed the EQUATOR Network13 website for guidelines containing the terms AI, ML or deep learning. A separate search was also conducted using the Medline, Scopus and Google Scholar databases for publications using the same search terms with the addition of ‘reporting guideline’, ‘checklist’ or ‘template’. Opinion pieces were excluded. Articles were included where a description of the recommendations was provided and had been published at the time of the search (March 2021).
Types of research reporting guidelines
An ideal reporting guideline should be a clear, structured tool with a minimum list of key information to include within a published scientific manuscript. The EQUATOR Network13 is the international ‘standard bearer’ for reporting guidelines, committed to improving ‘the reliability and value of published health research literature by promoting transparent and accurate reporting and wider use of robust reporting guidelines’. Since the landmark publication of the Consolidated Standards of Reporting Trials (CONSORT) statement,14 the network has overseen the development and publication of a number of guidelines that address other types of study design (eg, diagnostic accuracy studies). The EQUATOR guidelines are centrally registered (available via a core library), which ensures adherence to a robust development methodology and avoids redundancy from parallel initiatives addressing the same issue. Importantly, these guidelines are not specific to a medical specialty but are focused on the type of study, which helps ensure a consistent approach and quality when addressing the same study design. It is recognised that certain scenarios may require specific extensions to these guidelines. For example, the increasing recognition of the importance of patient-reported outcomes (PROs) has led to the development of Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT)-PRO15 and CONSORT-PRO.16 In a similar way, the specific attributes of AI as an intervention have led to a number of AI extensions, both published and in progress, which build on the robust methodology of the original EQUATOR guidelines while ensuring AI-specific elements are also addressed.
In parallel to the work of the EQUATOR Network, a number of experts and institutions have developed their own recommendations for good practice and reporting. In contrast, these start with the intervention (ie, AI) rather than the study type (ie, randomised controlled trial (RCT)), and therefore cover essentially the same territory from a different starting point. They vary in depth, and there can be differences in nuance depending on their primary purpose. For example, some have originated from the need to support reviewers and editorial staff (‘is this complete and is it good enough?’), whereas others are aimed at building a shared understanding of appropriate design and delivery (‘this is what good looks like’).
Given the number of different reporting guidelines in this area, there is value in setting them in context, to help users understand which is most appropriate for a particular setting (table 1). Ultimately, the most important elements of a high-quality study are contained within the methodology of the study design itself and not within the intervention. It is these elements that help minimise the major biases that all studies must address. In line with leading journals, we would therefore recommend starting with the guideline that addresses the particular study design (eg, CONSORT14 for an RCT). If an AI extension already exists for that study type, then this is clearly the appropriate choice (eg, CONSORT-AI).17–19 If no such AI extension exists, then we recommend using the appropriate EQUATOR guideline (eg, Standards for Reporting of Diagnostic Accuracy Studies (STARD)20 for diagnostic accuracy studies), supplemented with the AI-specific elements recommended in other guidelines (eg, SPIRIT-AI,21–23 CONSORT-AI17–19 or the non-EQUATOR guidelines described below). Indeed, all the guidelines considered here contain valuable insights into the specific challenges of AI studies, and are recommended reading on good practice in design and reporting.
Table 1 | Summary of reporting guidelines for common study types used in radiological research, and their corresponding guideline extensions where these involve artificial intelligence
EQUATOR network guidelines
Clinical trials protocols
The quality of a study, and the trustworthiness of its findings, start at the design phase. The study protocol should contain all elements of the study design, sufficient for independent groups to carry out the study and expect replicability. Prepublication of the study protocol helps avoid biases such as post-hoc assignment of the primary outcome, in which the triallist can ‘cherry pick’ one of a number of outcomes that point in the desired direction.
Guidance on the recommended items to include in a trial protocol is provided by the SPIRIT Statement (latest version published in 2013),24 which has recently been adapted for trials with an AI-related focus, termed the ‘SPIRIT-AI’ guideline.21–23 This adaptation adds 15 items (12 extensions, 3 elaborations) to the existing 33-item SPIRIT 2013 guideline. The key differences are outlined in table 2; most are focused on the methodology of the trial (accounting for eight extensions and one elaboration), with emphasis on the inclusion/exclusion of data and participants, the handling of poor quality data, and how the AI intervention will be applied to and benefit clinical practice.
Table 2 | Additional items proposed for studies relating to AI intervention clinical protocols within the SPIRIT-AI statement (in addition to the SPIRIT 2013 statement)
Clinical trials reports
While most AI studies are currently at early-phase validation stages, those evaluating the use of ‘AI interventions’ in real-world settings are fast emerging and will become of increasing importance, since these are required to demonstrate real-world clinical benefit. RCTs are the exemplar study design for providing a robust evidence base for the efficacy and safety of a given intervention, with the CONSORT statement (2010 version)14 providing a 25-item checklist of the minimum reporting content for such studies. An adapted version, entitled the ‘CONSORT-AI’ extension,17–19 was published in September 2020 for ‘AI intervention’ studies. This adds 14 items (11 extensions, 3 elaborations) to the existing CONSORT 2010 statement, the majority of which (8 extensions, 1 elaboration) relate to the study participants and details of the ‘AI intervention’ being evaluated, similar to the additions already described in the SPIRIT-AI extension. Specific key differences in the new guideline are outlined in table 3. Although not specific to AI interventions, some aspects of the Template for Intervention Description and Replication checklist (2014)25 may be a helpful addition when reporting details of the interventional elements of a study (ie, as an extension of item 5 of the CONSORT 2010 statement or item 11 of the SPIRIT 2013 statement). These include details of any modifications to the intervention during a study, including how and why certain aspects were personalised or adapted. To the best of our knowledge, there are currently no publicly announced plans to publish an ‘AI’ extension to this guideline.
Table 3 | Additional criteria to be included for studies relating to AI interventions within the CONSORT-AI statement (in addition to the CONSORT 2010 statement)
Diagnostic accuracy studies
The STARD statement (2015 version)20 is the most widely accepted reporting standard for diagnostic accuracy studies. A steering group has been established to devise an AI-specific extension to the latest version of the 30-item STARD statement (the STARD-AI extension).26 At the time of writing, this is undergoing an international consensus survey among leaders in the AI field for suggested adaptations, with publication pending.
Prediction models
Extensions to reporting guidelines describing prediction models that use ML have been announced and are anticipated to be published soon. These include an adapted version of the ‘Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis’ (TRIPOD) statement (2015 version),27 to be entitled ‘TRIPOD-AI’,28 29 supported by an adaptation of the ‘Prediction model Risk Of Bias Assessment Tool’ (PROBAST, 2019 version),30 proposed to be entitled PROBAST-ML.28 29
Human factors
Another upcoming guideline, focused on the evaluation of ‘human factors’ in algorithm implementation, has been announced: the DECIDE-AI checklist (Developmental and Exploratory Clinical Investigation of Decision-support systems driven by AI).31 This checklist is intended for use in early, small-scale clinical trials that evaluate and provide information on how algorithms may be used in practice, bridging the gap between the algorithm development/validation stage (which would follow TRIPOD-AI, STARD-AI or the Checklist for Artificial Intelligence in Medical Imaging (CLAIM)) and large-scale clinical trials of AI interventions (where CONSORT-AI would be used). Publication is anticipated in late 2021 or early 2022.
Systematic reviews
Given the increasing volume of radiological AI-related research across a growing variety of conditions and clinical settings, it is also likely that we will encounter more systematic reviews and meta-analyses that aim to aggregate the evidence from studies in this field (eg, recent publications have already emerged that summarise research regarding the role of AI in COVID-19).32–34 At present, the ‘Preferred Reporting Items for Systematic Reviews and Meta-analyses’ (PRISMA, 2009)35 guidelines are the most established for systematic reviews and meta-analyses, with a modified version specifically tailored to meta-analyses of diagnostic test accuracy (PRISMA-DTA, 2018).36 Currently, there have not been any announcements of an AI-related update to these guidelines for systematic reviews or meta-analyses, and it is therefore suggested that the PRISMA 200935 or PRISMA-DTA 201836 guidance should be followed.
For the planning stages of systematic reviews of prediction models, the ‘Checklist for critical appraisal and data extraction for systematic reviews of prediction modelling studies’ (CHARMS, 2014)37 was developed by the Cochrane Prognosis Methods Group. It was not created specifically for publications relating to AI per se, but is applicable to a wide range of studies, which also happen to include evaluations of ML models. The developers provide the checklist to help authors frame their review question, design the review, extract relevant items from published reports of prediction models and guide the assessment of risk of bias (rather than to assist with the analysis itself). This checklist will therefore be useful to those who wish to plan a review of AI tools that provide a ‘risk score’ or ‘probability of diagnosis’. A tutorial on how to carry out a ‘CHARMS analysis’ for prognostic multivariate models, with real-life worked examples, has been published38 and may be a helpful resource for readers wishing to carry out similar work. It is worth noting that the authors of CHARMS still recommend reference to the PRISMA 200935 and PRISMA-DTA 201836 statements for the reporting and analysis of trial results, in conjunction with their own checklist for planning the review design.
Other (non-EQUATOR network) guidelines
Alternative guidelines have been published by expert interest groups and endorsed by different specialty societies. A few are described here for further reading and interest.
The Radiological Society of North America published the ‘CLAIM’ checklist39 in 2020, which contains elements of the STARD 2015 guideline and is applicable to trials addressing a wide spectrum of AI applications using medical images (eg, classification, reconstruction, text analysis, workflow optimisation). The checklist comprises 42 items, of which 6 are new (pertaining to model design and training), 8 are extensions of pre-existing STARD 2015 items, 14 are elaborations (mostly relating to methods and results) and 14 remain the same. Particular emphasis is given to the data, the reference standard of ‘ground truth’ and the precise development and methodology of the AI algorithm being tested. These are listed in further detail in table 4, where differences from STARD 2015 are highlighted. Care should be taken to avoid confusion with another similarly named checklist entitled ‘minimum information about clinical AI modelling’ (MI-CLAIM),40 which is less a reporting guideline than a document outlining the shared understanding required in the development and evaluation of AI models, aimed at clinical and data scientists, repository managers and model users.
Table 4 | Criteria for the CLAIM checklist for diagnostic accuracy studies using AI
It is also worth noting that the American Medical Informatics Association produced a set of guidelines in 2020 termed ‘Minimum Information for Medical AI Reporting’ (MINIMAR),41 specific to studies reporting the use of AI solutions in healthcare. Rather than a list of items for manuscript writing, this guidance provides suggestions on the details to report regarding the data sources used in algorithm development and their intended usage, spread across four key subject areas (ie, study population and setting, patient demographics, model architecture and model evaluation). There are many similarities with the aforementioned CLAIM checklist, although key differences include the granularity with which MINIMAR suggests researchers should explicitly state participant demographics (eg, ethnicity and socioeconomic status, rather than just age and sex) and how code and data can be shared with the wider community.
Further reading
There is an increasing need to build a cadre of researchers and reviewers with sufficient domain knowledge of the technical aspects (including limitations and risks) and of the principles of good trial methodology (including areas of potential bias, analysis issues, etc). There is also a need for ML experts and clinical trial communities to increasingly learn each other’s language, to ensure accurate and precise communication of concepts and to enable comparison between studies. A number of reviews are highlighted here for further reading,42–46 along with work47 explaining the different evaluation metrics used in AI and ML studies. It is also worth bearing in mind the wider clinical and ethical context of how any AI tool would fit into existing clinical pathways and healthcare systems.48
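As a purely illustrative aid (our own sketch, not drawn from any of the guidelines discussed above), the short Python example below shows how several of the evaluation metrics commonly reported in diagnostic AI studies, such as sensitivity, specificity and the area under the receiver operating characteristic curve, might be computed from a model’s outputs using the scikit-learn library; the labels, predicted probabilities and 0.5 decision threshold are hypothetical.

# Illustrative only: hypothetical labels and model outputs, not taken from any cited study.
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                  # reference standard ('ground truth')
y_prob = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]  # model-predicted probabilities
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]    # binary predictions at an assumed 0.5 threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)         # true positive rate (recall)
specificity = tn / (tn + fp)         # true negative rate
auc = roc_auc_score(y_true, y_prob)  # area under the ROC curve

print(f"Sensitivity {sensitivity:.2f}, specificity {specificity:.2f}, AUC {auc:.2f}")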
Conclusion
In conclusion, this article has provided readers with an overview of the changes to standard clinical reporting guidelines specific to AI-related studies. The fundamentals of describing the trial setup, the inclusion and exclusion criteria, the study methodology and standards used, together with details of algorithm development, should create transparency and address reproducibility. The guidelines most relevant for a particular healthcare specialty will depend on the type of research being conducted in that field (eg, guidelines for AI-related diagnostic accuracy trials may be more relevant for radiological or pathological specialties, whereas those addressing patient outcomes with the aid of an AI algorithm may be more relevant for oncological or surgical specialties).
Although the reporting guidelines outlined may seem comprehensive, there remain areas that will need to be addressed, such as the health economic evaluation of AI tools and algorithms (many existing guidelines were developed for ‘pharmacoeconomic evaluations’).49 It is likely that future guidance will take the form of an extension to the widely used CHEERS guidance (Consolidated Health Economic Evaluation Reporting Standards),50 51 available via the EQUATOR network.13 Nevertheless, wide variation in opinion regarding the most appropriate economic evaluation guideline already exists for non-AI tools, and this may be reflected in future iterations of such guidelines, depending on how the algorithms are funded in different healthcare systems.52
The guidelines outlined here will likely continue to be updated in the light of new understanding of the specific challenges of AI as an intervention and of how traditional study designs and reports need to be adapted.