Table 4

Criteria for the CLAIM checklist for diagnostic accuracy studies using AI

| Section | Item no | STARD 2015 item | Amendment | CLAIM item |
|---|---|---|---|---|
| **Title and abstract** | | | | |
| Title | 1 | Identification as a study of diagnostic accuracy using at least one measure of accuracy (such as sensitivity, specificity, predictive values or AUC). | Elaboration | Identification as a study of AI methodology, specifying the category of technology used (eg, deep learning). |
| Abstract | 2 | Structured summary of study design, methods, results and conclusions. | Same | |
| Background | 3 | Scientific and clinical background, including the intended use and clinical role of the index test. | Elaboration | Scientific and clinical background, including the intended use and clinical role of the AI approach. |
| Objectives | 4 | Study objectives and hypotheses. | Same | |
| Study design | 5 | Whether data collection was planned before the index test and reference standard were performed (prospective study) or after (retrospective study). | Extension | Study goal, such as model creation, exploratory study, feasibility study, non-inferiority trial. |
| Participants | 6 | Eligibility criteria (inclusion/exclusion). | Extension | State data sources. |
| | 7 | On what basis potentially eligible participants were identified (such as symptoms, results from previous tests, inclusion in registry). | Same | |
| | 8 | Where and when potentially eligible participants were identified (setting, location and dates). | Same | |
| | 9 | Whether participants formed a consecutive, random or convenience series. | Extension | Data preprocessing steps. |
| | | | Extension | Selection of data subsets, if applicable. |
| | | | Extension | Definitions of data elements, with references to common data elements. |
| | | | Extension | Deidentification methods. |
| Test methods | 10b | Reference standard, in sufficient detail to allow replication. | Elaboration | Definition of ‘ground truth’ (ie, reference standard), in sufficient detail to allow replication. |
| | | | Elaboration | Source of ground truth annotations; qualifications and preparation of annotators. |
| | | | Elaboration | Annotation tools. |
| | 11 | Rationale for choosing the reference standard (if alternatives exist). | Same | |
| | 12b | Definition of and rationale for test positivity cut-offs or result categories of the reference standard, distinguishing prespecified from exploratory. | Elaboration | Measurement of inter-rater and intrarater variability; methods to mitigate variability and/or resolve discrepancies for ground truth. |
| Model | | | New | Detailed description of model, including inputs, outputs, all intermediate layers and connections. |
| | | | New | Software libraries, frameworks and packages. |
| | | | New | Initialisation of model parameters (eg, randomisation, transfer learning). |
| Training | | | New | Details of training approach, including data augmentation, hyperparameters, number of models trained. |
| | | | New | Method of selecting the final model. |
| | | | New | Ensembling techniques, if applicable. |
| Analysis | 14 | Methods for estimating or comparing measures of diagnostic accuracy. | Elaboration | Metrics of model performance. |
| | 16 | How missing data on the index test and reference standard were handled. | Same | |
| | 17 | Any analyses of variability in diagnostic accuracy, distinguishing prespecified from exploratory. | Elaboration | Statistical measures of significance and uncertainty (eg, CIs). |
| | | | Elaboration | Robustness or sensitivity analysis. |
| | | | Elaboration | Methods for explainability or interpretability (eg, saliency maps) and how they were validated. |
| | | | Elaboration | Validation or testing on external data. |
| | 18 | Intended sample size and how it was determined. | Same | |
| | | | Extension | How data were assigned to partitions; specify proportions. |
| | | | Extension | Level at which partitions are disjoint (eg, image, study, patient, institution). |
| Participants | 19 | Flow of participants, using a diagram. | Same | |
| | 20 | Baseline demographic and clinical characteristics of participants. | Elaboration | Demographic and clinical characteristics of cases in each partition. |
| Test results | 23 | Cross tabulation of the index test results (or their distribution) by the results of the reference standard. | Elaboration | Performance metrics for optimal model(s) on all data partitions. |
| | 24 | Estimates of diagnostic accuracy and their precision (such as 95% CIs). | Same | |
| | 25 | Any adverse events from performing the index test or the reference standard. | Elaboration | Failure analysis of incorrectly classified cases. |
| Limitations | 26 | Study limitations, including sources of potential bias, statistical uncertainty and generalisability. | Same | |
| Implications | 27 | Implications for practice, including the intended use and clinical role of the index test. | Same | |
| **Other information** | | | | |
| Registration | 28 | Registration no and name of registry. | Same | |
| Protocol | 29 | Where the full study protocol can be accessed. | Same | |
| Funding | 30 | Sources of funding and other support; role of funders. | Same | |
  • This is based on the STARD 2015 guidelines [20], indicating which items are new, the same as in STARD or elaborations of STARD items. STARD items not carried over into the CLAIM checklist have been removed. Table adapted from Bossuyt et al [20] and Mongan et al [39].

  • AI, artificial intelligence; CLAIM, Checklist for Artificial Intelligence in Medical Imaging; STARD, Standards for Reporting of Diagnostic Accuracy Studies.