Experiment 3.1.1—unbalanced training data without feature selection, sex performance disparities
Mean difference averaged over n=100 | Random forest classifier | | Logistic regression classifier | | Support vector machine | | Gaussian Naïve Bayes | |
 | Sex performance disparities (%) | t-test p value | Sex performance disparities (%) | t-test p value | Sex performance disparities (%) | t-test p value | Sex performance disparities (%) | t-test p value |
Accuracy | 2.96 | 0.00 | −2.85 | 0.01 | −2.98 | 0.02 | −2.72 | 0.02 |
FScore | 15.63 | 0.00 | 15.86 | 0.00 | 4.14 | 0.00 | 16.19 | 0.00 |
ROC_AUC* | 6.80 | 0.00 | 2.93 | 0.00 | −2.41 | 0.08 | 5.53 | 0.00 |
Precision | 5.25 | 0.00 | −4.87 | 0.00 | 3.41 | 0.00 | −3.13 | 0.05 |
Recall | 21.02 | 0.00 | 24.07 | 0.00 | 2.58 | 0.04 | 19.31 | 0.00 |
False negative rate | −21.02 | 0.00 | −24.07 | 0.00 | −2.58 | 0.08 | −19.31 | 0.00 |
True negative rate | −7.42 | 0.00 | −18.20 | 0.00 | −7.40 | 0.00 | −8.24 | 0.00 |
False positive rate | 7.42 | 0.00 | 18.20 | 0.00 | 7.40 | 0.00 | 8.24 | 0.00 |
True positive rate | 21.02 | 0.00 | 24.07 | 0.00 | 2.58 | 0.04 | 19.31 | 0.00 |
*ROC AUC is the area under the receiver operating characteristic (ROC) curve; it measures how well a binary classifier separates the two classes.
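The recall, true positive rate, and false negative rate rows mirror one another, as do the true negative rate and false positive rate rows, because within each group FNR = 100% - TPR and FPR = 100% - TNR. A minimal Python sketch of that relationship follows. The confusion-matrix counts are hypothetical and are not values from this study, and the direction of subtraction (group A minus group B) is an assumption, since the table does not state it.

```python
# Sketch: per-group rates from confusion-matrix counts, and their disparity.
# All counts below are hypothetical and are NOT taken from the table.

def rates(tp, fn, fp, tn):
    """Return the four confusion-matrix rates as percentages."""
    positives = tp + fn  # actual positive cases in the group
    negatives = fp + tn  # actual negative cases in the group
    return {
        "true_positive_rate": 100.0 * tp / positives,   # same as recall
        "false_negative_rate": 100.0 * fn / positives,  # equals 100 - TPR
        "true_negative_rate": 100.0 * tn / negatives,
        "false_positive_rate": 100.0 * fp / negatives,  # equals 100 - TNR
    }


def disparity(group_a, group_b):
    """Difference in percentage points, group A minus group B (assumed direction)."""
    return {name: group_a[name] - group_b[name] for name in group_a}


# Hypothetical counts for two groups evaluated with the same classifier.
group_a = rates(tp=80, fn=20, fp=30, tn=70)
group_b = rates(tp=60, fn=40, fp=20, tn=80)

gap = disparity(group_a, group_b)
for name, value in gap.items():
    print(f"{name}: {value:+.2f}")
```

Because averaging is linear, the same sign-flipped pairing holds for disparities averaged over repeated runs, which is why the paired rows above carry equal magnitudes with opposite signs.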