Table 6

F1 scores when patients’ transcribed speech is excluded

Model	Including GP and patient speech	Only GP speech
Naïve Bayes (multilabel) ICPC-2	0.323 (0.268, 0.362)	0.372 (0.300, 0.417)
Naïve Bayes (multiclass) ICPC-2	0.512 (0.462, 0.549)	0.484 (0.429, 0.521)
Nearest centroid ICPC-2	0.444 (0.384, 0.489)	0.425 (0.361, 0.470)
BERT conventional, CKS	0.550 (0.494, 0.593)	0.445 (0.384, 0.465)
BERT NSP, CKS	0.462 (0.424, 0.488)	0.436 (0.398, 0.464)
BERT MLM, CKS	0.567 (0.512, 0.604)	0.500 (0.434, 0.539)

The classifiers were trained using their most effective distant supervision source and evaluated on the OIAM training set (repurposed as a validation set). Bold indicates the better score for each classifier when comparing the inclusion and exclusion of patients’ speech.
CKS, Clinical Knowledge Summaries; GP, general practitioner; ICPC-2, International Classification of Primary Care-2; MLM, masked language modelling; NSP, next sentence prediction; OIAM, One in a Million.