Performance metrics in the studies that used supervised learning (sentiment analysis and text classification). SVM and NB were the preferred classifiers, as they produced better results, as demonstrated by the F1 score. Only five studies reported multiple-fold validation.
Author | k-fold cross-validation | Sentiment analysis: classifier | Sentiment analysis: performance | Text classification: classifier | Text classification: performance
Alemi et al34*† | Five repetitions of twofold cross-validation | SVM | Positive 0.89; Negative 0.64 | SVM | Staff related 0.85; Doctor listens 0.34
Alemi et al34*† | Five repetitions of twofold cross-validation | NB | Positive 0.94; Negative 0.68 | NB | Staff related 0.80; Doctor listens 0.37
Doing-Harris et al24* | NR | NB | 0.84 | NB | Explanation 0.74; Friendliness 0.40
Greaves et al27 | Single-fold cross-validation | NB | 0.89 | NB | Dignity and respect 0.85; Cleanliness 0.84
Greaves et al27 | Single-fold cross-validation | SVM | 0.84 | SVM | Dignity and respect 0.8; Cleanliness 0.84
Hawkins et al52 | 10-fold cross-validation | – | – | SVM | 0.89‡
Jimenez-Zafra et al54 | 10-fold cross-validation | SVM | COPOD 0.86; COPOS 0.71 | – | –
Huppertz et al6 | NR | SVM | 0.87‡ | – | –
Wagland et al48 | Single-fold cross-validation | SVM | 0.80 | – | –
Wagland et al48 | 10-fold cross-validation | SVM | 0.83 | – | –
Bahja et al26 | Single-fold cross-validation | SVM | 0.84 | – | –
Bahja et al26 | Single-fold cross-validation | NB | 0.78 | – | –
Bahja et al26 | 4-fold cross-validation | SVM | 0.81 | – | –
Bahja et al26 | 4-fold cross-validation | NB | 0.78 | – | –
*Best and worst performing category, respectively.
†Classified as praise (positive), complaint (negative).
‡Reported as overall accuracy.
COPOD, corpus of patient opinions in Dutch; COPOS, corpus of patient opinions in Spanish; NB, Naïve Bayes; NR, not reported; SVM, support vector machine.
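
To make the metrics in the table concrete, the following is a minimal sketch in Python of how an F1 score, and its mean over k held-out folds, could be computed. The function names `f1_score` and `k_fold_indices` and the toy labels are illustrative assumptions. No code or data from the reviewed studies are reproduced, and the predictions are fixed rather than produced by an SVM or NB model refitted on each training fold, as a real k-fold evaluation would require.

```python
from typing import List, Sequence, Tuple


def f1_score(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """Return F1, the harmonic mean of precision and recall, for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k contiguous folds of (train, test) index lists."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds


# Toy, made-up labels (not data from any reviewed study).
y_true = ["pos", "pos", "neg", "pos", "neg", "neg", "pos", "neg", "pos", "pos"]
y_pred = ["pos", "neg", "neg", "pos", "pos", "neg", "pos", "neg", "pos", "pos"]

# F1 over the whole toy set.
overall_f1 = f1_score(y_true, y_pred, positive="pos")

# F1 on each held-out fold, then the mean across folds.
# In a real study the classifier would be refitted on train_idx for each fold.
fold_scores = []
for train_idx, test_idx in k_fold_indices(len(y_true), k=5):
    fold_true = [y_true[i] for i in test_idx]
    fold_pred = [y_pred[i] for i in test_idx]
    fold_scores.append(f1_score(fold_true, fold_pred, positive="pos"))
mean_f1 = sum(fold_scores) / len(fold_scores)
```

With very small folds, as in this toy example, a single fold without any true positives pulls the mean down sharply, which is one reason a single-fold result and a multiple-fold result for the same classifier can differ.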