Table 4

Performance metrics in the studies that used supervised learning (sentiment analysis and text classification). SVM and NB were the preferred classifiers, as they produced better results, as demonstrated by the F1 score. Only five studies reported multiple-fold validation.

Author | k-fold cross-validation | Sentiment analysis: classifier | Sentiment analysis: performance | Text classification: classifier | Text classification: performance
Alemi et al (34)*† | Five repetitions of twofold cross-validation | SVM | Positive 0.89; Negative 0.64 | SVM | Staff related 0.85; Doctor listens 0.34
 | | NB | Positive 0.94; Negative 0.68 | NB | Staff related 0.80; Doctor listens 0.37
Doing-Harris et al (24)* | NR | NB | 0.84 | NB | Explanation 0.74; Friendliness 0.40
Greaves et al (27) | Single-fold cross-validation | NB | 0.89 | NB | Dignity and respect 0.85; Cleanliness 0.84
 | | SVM | 0.84 | SVM | Dignity and respect 0.8; Cleanliness 0.84
Hawkins et al (52) | 10-fold cross-validation | SVM | 0.89‡
Jimenez-Zafra et al (54) | 10-fold cross-validation | SVM | COPOD 0.86; COPOS 0.71
Huppertz et al (6) | NR | SVM | 0.87‡
Wagland et al (48) | Single-fold cross-validation; 10-fold cross-validation | SVM | 0.80
 | | SVM | 0.83
Bahja et al (26) | Single-fold cross-validation; 4-fold cross-validation | SVM | 0.84 | SVM | 0.81
 | | NB | 0.78 | NB | 0.78
  • *Best and worst performing category, respectively.

  • †Classified as praise (positive), complaint (negative).

  • ‡Reported as overall accuracy.

  • COPOD, corpus of patient opinions in Dutch; COPOS, corpus of patient opinions in Spanish; NB, Naïve Bayes; NR, not reported; SVM, support vector machine.
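
For readers unfamiliar with the two quantities the table reports, the following is a minimal sketch of how a per-class F1 score (such as the separate "Positive" and "Negative" values) and a k-fold split can be computed. The function names `f1_score` and `k_fold_splits` and the toy labels are hypothetical illustrations; they are not taken from any of the reviewed studies, which may have computed these values differently.

```python
import random


def f1_score(y_true, y_pred, positive):
    """F1 for one class: the harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no correct predictions for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_splits(n_items, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k near-equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Toy labels, invented for illustration only.
y_true = ["pos", "pos", "neg", "pos", "neg", "neg", "pos", "neg"]
y_pred = ["pos", "neg", "neg", "pos", "pos", "neg", "pos", "neg"]
f1_pos = f1_score(y_true, y_pred, "pos")
f1_neg = f1_score(y_true, y_pred, "neg")

# Every item is held out exactly once across the k folds.
splits = list(k_fold_splits(10, k=5))
```

In a multiple-fold design, the classifier is trained and scored once per fold and the scores are averaged. Repeating the procedure with different shuffles (here, different `seed` values) gives a repeated design such as five repetitions of twofold cross-validation.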