Table 9

Model performance by task type

| Outcome type | Mean validation performance, reported in study | Mean validation performance, NYU data | Mean performance difference, study vs NYU original validation | Mean validation performance, NYU retrained | Mean performance difference, study vs NYU retrained validation |
|---|---|---|---|---|---|
| Predicting deterioration (n=4) | Mean AUROC=0.82 (n=4; 0.75–0.88) | Mean AUROC=0.64 (n=3; 0.59–0.74) | Mean AUROC difference=0.18 (n=3; 0.14–0.26) | Mean AUROC=0.72 (n=4; 0.68–0.77) | Mean AUROC difference=0.10 (n=4; 0.01–0.17) |
| Predicting mortality (n=5) | Mean AUROC=0.93 (n=2; 0.88–0.98) | Mean AUROC=0.72 (n=2; 0.67–0.79) | Mean AUROC difference=0.31 (n=1) | Mean AUROC=0.77 (n=5; 0.74–0.93) | Mean AUROC difference=0.15 (n=2; 0.09–0.21) |
| Predicting either deterioration or mortality (n=3) | Mean AUROC=0.73 (n=3; 0.72–0.74) | Mean AUROC=0.72 (n=3; 0.71–0.73) | Mean AUROC difference=0.01 (n=2; −0.01 to 0.02) | – | – |
  • –=Value unavailable because authors did not provide feature weights when reporting model development.

  • AUROC, area under the receiver operating characteristic curve; NYU, New York University.