Top 10 Reviewer Critiques of Radiology Artificial Intelligence (AI) Articles: Qualitative Thematic Analysis of Reviewer Critiques of Machine Learning/Deep Learning Manuscripts Submitted to JMRI

J Magn Reson Imaging. 2020 Jul;52(1):248-254. doi: 10.1002/jmri.27035. Epub 2020 Jan 13.

Abstract

Background: Classical machine learning (ML) and deep learning (DL) articles have rapidly captured the attention of the radiology research community and comprise an increasing proportion of submissions to JMRI, with variable reporting and methodological quality.

Purpose: To identify the most frequent reviewer critiques of classical ML and DL articles submitted to JMRI.

Study type: Qualitative thematic analysis.

Population: In all, 1,396 manuscripts submitted to JMRI for consideration in 2018, with thematic analysis performed of reviewer critiques of 38 artificial intelligence (AI) articles, comprising 24 ML and 14 DL articles, submitted from January 9, 2018 to June 2, 2018.

Field strength/sequence: N/A.

Assessment: After identifying and sampling ML and DL articles, and collecting all reviews, qualitative thematic analysis was performed to identify major and minor themes of reviewer critiques.

Statistical tests: Descriptive statistics of article characteristics, and thematic review of major and minor themes.

Results: Thirty-eight articles were sampled for thematic review: 24 (63.2%) focused on classical ML and 14 (36.8%) on DL. The overall acceptance rate of classical ML/DL articles was 28.9%, similar to the overall 2017-2019 acceptance rate of 23.1-28.1%. These articles yielded 72 reviews containing a total of 713 critiques that underwent formal thematic analysis consensus encoding. Ten major themes of critiques were identified, with "Lack of Information" the most frequent, comprising 268 (37.6%) of all critiques. Frequent minor themes of critiques concerning ML/DL-specific recommendations included performing basic clinical statistics, such as ensuring similarity of training and test groups (N = 26); emphasizing strong clinical gold standards as the basis of training labels (N = 19); and ensuring strong radiological relevance of the topic and task performed (N = 16).

Data conclusion: Standardized reporting of ML and DL methods could help address nearly one-third of all reviewer critiques made.

Level of evidence: 4 Technical Efficacy Stage: 1 J. Magn. Reson. Imaging 2020;52:248-254.

Keywords: artificial intelligence; machine learning; thematic analysis.

MeSH terms

  • Artificial Intelligence
  • Deep Learning*
  • Machine Learning
  • Radiography
  • Radiology*