Criteria for the CLAIM checklist for diagnostic accuracy studies using AI
| Section | Item no | STARD 2015 item | Amendment | CLAIM item |
| --- | --- | --- | --- | --- |
| **Title and abstract** | | | | |
| Title | 1 | Identification as a study of diagnostic accuracy using at least one measure of accuracy (such as sensitivity, specificity, predictive values or AUC). | Elaboration | Identification as a study of AI methodology, specifying the category of technology used (eg, deep learning). |
| Abstract | 2 | Structured summary of study design, methods, results and conclusions. | Same | |
| **Introduction** | | | | |
| Background | 3 | Scientific and clinical background, including the intended use and clinical role of the index test. | Elaboration | Scientific and clinical background, including the intended use and clinical role of the AI approach. |
| Objectives | 4 | Study objectives and hypotheses. | Same | |
| **Methods** | | | | |
| Study design | 5 | Whether data collection was planned before the index test and reference standard were performed (prospective study) or after (retrospective study). | Same | |
| | | | Extension | Study goal, such as model creation, exploratory study, feasibility study, non-inferiority trial. |
| Participants | 6 | Eligibility criteria (inclusion/exclusion). | Extension | State data sources. |
| | 7 | On what basis potentially eligible participants were identified (such as symptoms, results from previous tests, inclusion in registry). | Same | |
| | 8 | Where and when potentially eligible participants were identified (setting, location and dates). | Same | |
| | 9 | Whether participants formed a consecutive, random or convenience series. | Extension | Data preprocessing steps. |
| | | | Extension | Selection of data subsets, if applicable. |
| | | | Extension | Definitions of data elements, with references to common data elements. |
| | | | Extension | Deidentification methods. |
| Test methods | 10b | Reference standard, in sufficient detail to allow replication. | Elaboration | Definition of 'ground truth' (ie, reference standard), in sufficient detail to allow replication. |
| | | | Elaboration | Source of ground truth annotations; qualifications and preparation of annotators. |
| | | | Elaboration | Annotation tools. |
| | 11 | Rationale for choosing the reference standard (if alternatives exist). | Same | |
| | 12b | Definition of and rationale for test positivity cut-offs or result categories of the reference standard, distinguishing prespecified from exploratory. | Elaboration | Measurement of inter-rater and intra-rater variability; methods to mitigate variability and/or resolve discrepancies for ground truth. |
| Model | | | New | Detailed description of the model, including inputs, outputs, all intermediate layers and connections. |
| | | | New | Software libraries, frameworks and packages. |
| | | | New | Initialisation of model parameters (eg, randomisation, transfer learning). |
| Training | | | New | Details of training approach, including data augmentation, hyperparameters and number of models trained. |
| | | | New | Method of selecting the final model. |
| | | | New | Ensembling techniques, if applicable. |
| Analysis | 14 | Methods for estimating or comparing measures of diagnostic accuracy. | Elaboration | Metrics of model performance. |
| | 16 | How missing data on the index test and reference standard were handled. | Same | |
| | 17 | Any analyses of variability in diagnostic accuracy, distinguishing prespecified from exploratory. | Elaboration | Statistical measures of significance and uncertainty (eg, CIs). |
| | | | Elaboration | Robustness or sensitivity analysis. |
| | | | Elaboration | Methods for explainability or interpretability (eg, saliency maps) and how they were validated. |
| | | | Elaboration | Validation or testing on external data. |
| | 18 | Intended sample size and how it was determined. | Same | |
| | | | Extension | How data were assigned to partitions; specify proportions. |
| | | | Extension | Level at which partitions are disjoint (eg, image, study, patient, institution). |
| **Results** | | | | |
| Participants | 19 | Flow of participants, using a diagram. | Same | |
| | 20 | Baseline demographic and clinical characteristics of participants. | Elaboration | Demographic and clinical characteristics of cases in each partition. |
| Test results | 23 | Cross tabulation of the index test results (or their distribution) by the results of the reference standard. | Elaboration | Performance metrics for optimal model(s) on all data partitions. |
| | 24 | Estimates of diagnostic accuracy and their precision (such as 95% CIs). | Same | |
| | 25 | Any adverse events from performing the index test or the reference standard. | Elaboration | Failure analysis of incorrectly classified cases. |
| **Discussion** | | | | |
| Limitations | 26 | Study limitations, including sources of potential bias, statistical uncertainty and generalisability. | Same | |
| Implications | 27 | Implications for practice, including the intended use and clinical role of the index test. | Same | |
| **Other information** | | | | |
| Registration | 28 | Registration no and name of registry. | Same | |
| Protocol | 29 | Where the full study protocol can be accessed. | Same | |
| Funding | 30 | Sources of funding and other support; role of funders. | Same | |
The table is based on the STARD 2015 guidelines,20 indicating which items are new, the same as in STARD, or elaborated for AI studies. STARD items not included in the CLAIM checklist have been removed. Table adapted from Bossuyt et al20 and Mongan et al.39 Illustrative code sketches for four of the more technical items (inter-rater agreement, model initialisation, CIs for performance metrics and data partitioning) are given after the abbreviations.
AI, artificial intelligence; CLAIM, Checklist for Artificial Intelligence in Medical Imaging; STARD, Standards for Reporting of Diagnostic Accuracy Studies.
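To make some of the more technical checklist items concrete, the sketches below show one possible way of addressing them in Python; the library choices, variable names and all data shown are illustrative assumptions, not part of CLAIM. First, item 12b asks for measurement of inter-rater variability in the ground truth. A minimal sketch using Cohen's kappa from scikit-learn, assuming two readers labelled the same cases:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical binary labels from two readers on the same 12 cases
# (0 = negative, 1 = positive); a real study would use its full label set.
reader_a = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
reader_b = [0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0]

# Chance-corrected inter-rater agreement.
kappa = cohen_kappa_score(reader_a, reader_b)
print(f"Cohen's kappa = {kappa:.2f}")
```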
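The model-initialisation item (randomisation, transfer learning) can be reported by stating the random seed and any pretrained weights used. A sketch assuming PyTorch/torchvision and a ResNet-50 backbone, both of which are illustrative choices rather than CLAIM requirements:

```python
import torch
from torchvision import models
from torchvision.models import ResNet50_Weights

# Fix and report the seed that governs any randomised initialisation.
torch.manual_seed(0)

# Transfer learning: start from ImageNet-pretrained weights, then replace
# the classification head for a hypothetical two-class diagnostic task.
model = models.resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
model.fc = torch.nn.Linear(model.fc.in_features, 2)
```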
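Items 17 and 24 ask for statistical uncertainty around performance metrics, such as 95% CIs. One common approach (an assumption here, not mandated by CLAIM) is a percentile bootstrap over test cases:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the AUC (95% by default)."""
    rng = np.random.default_rng(seed)
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))  # resample cases with replacement
        if np.unique(y_true[idx]).size < 2:
            continue  # AUC is undefined when a resample contains only one class
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return roc_auc_score(y_true, y_score), (lo, hi)

# Hypothetical labels and model scores for 200 cases.
rng = np.random.default_rng(1)
y_true = np.tile([0, 0, 1, 1, 0, 1, 0, 1, 1, 0], 20)
y_score = np.clip(0.6 * y_true + rng.normal(0.2, 0.3, y_true.size), 0, 1)

auc, (lo, hi) = bootstrap_auc_ci(y_true, y_score)
print(f"AUC = {auc:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```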
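Finally, the extension items on data partitions ask how data were assigned to partitions and at what level the partitions are disjoint. A sketch of a patient-level-disjoint split using scikit-learn's GroupShuffleSplit, where the 80/20 proportions and the toy data are assumed for illustration:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)

# Toy dataset: 100 images drawn from 40 patients, several images per patient.
patient_ids = rng.integers(0, 40, size=100)  # illustrative grouping variable
X = rng.random((100, 16))                    # placeholder image features
y = rng.integers(0, 2, size=100)             # binary ground-truth labels

# Split 80/20 so that train and test are disjoint at the *patient* level:
# no patient contributes images to both partitions.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=patient_ids))

assert set(patient_ids[train_idx]).isdisjoint(patient_ids[test_idx])
print(f"train: {train_idx.size} images, test: {test_idx.size} images")
```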