Table 1

Text vectoriser and classifier hyperparameters for each text vectorisation model

Text vectoriser hyperparameters
  TF-IDF              n-gram range: 1–3; max document frequency: 1.0; min document frequency count: 1
  Word2Vec            Features size: 2000; window size: 3; min count: 1; training algorithm: CBOW; training epochs: 20
  Doc2Vec             Features size: 2000; window size: 3; min count: 1; training algorithm: distributed memory; training epochs: 20
Classifier hyperparameters
  SVC                 Kernel type: RBF; inverse regularisation coefficient: 1.0
  KNN    TF-IDF       Number of neighbours: 3; leaf size: 10
         Word2Vec     Number of neighbours: 10; leaf size: 10
         Doc2Vec      Number of neighbours: 3; leaf size: 10
  RF     TF-IDF       Number of estimators: 50; max tree depth: 10
         Word2Vec     Number of estimators: 50; max tree depth: 5
         Doc2Vec      Number of estimators: 50; max tree depth: 5
  • KNN, K-nearest neighbours; RBF, radial basis function; RF, random forest; SVC, support vector classification; TF-IDF, term frequency-inverse document frequency.
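
For readers who want to reuse these settings, the table can be transcribed into a small configuration structure. The sketch below is illustrative only: the source does not name the software used, so the keyword names are an assumption, chosen to match the argument names in scikit-learn (`ngram_range`, `max_df`, `min_df`, `kernel`, `C`, `n_neighbors`, `leaf_size`, `n_estimators`, `max_depth`) and gensim 4 (`vector_size`, `window`, `min_count`, `sg`, `dm`, `epochs`). The dictionary names, the helper `classifier_params`, and the `"any"` key are hypothetical, and applying the single SVC row to every vectoriser is also an assumption, since the table lists no per-vectoriser values for it.

```python
# Hyperparameters transcribed from Table 1.
# ASSUMPTION: keyword names follow scikit-learn / gensim 4 conventions;
# the source does not state which libraries were used.

VECTORISER_PARAMS = {
    # max_df as a float is a proportion; min_df as an int is a document count.
    "tfidf": {"ngram_range": (1, 3), "max_df": 1.0, "min_df": 1},
    # sg=0 selects CBOW in gensim's Word2Vec.
    "word2vec": {"vector_size": 2000, "window": 3, "min_count": 1,
                 "sg": 0, "epochs": 20},
    # dm=1 selects the distributed memory algorithm in gensim's Doc2Vec.
    "doc2vec": {"vector_size": 2000, "window": 3, "min_count": 1,
                "dm": 1, "epochs": 20},
}

CLASSIFIER_PARAMS = {
    # ASSUMPTION: the single SVC row applies to every vectoriser.
    "svc": {"any": {"kernel": "rbf", "C": 1.0}},
    "knn": {
        "tfidf": {"n_neighbors": 3, "leaf_size": 10},
        "word2vec": {"n_neighbors": 10, "leaf_size": 10},
        "doc2vec": {"n_neighbors": 3, "leaf_size": 10},
    },
    "rf": {
        "tfidf": {"n_estimators": 50, "max_depth": 10},
        "word2vec": {"n_estimators": 50, "max_depth": 5},
        "doc2vec": {"n_estimators": 50, "max_depth": 5},
    },
}


def classifier_params(classifier: str, vectoriser: str) -> dict:
    """Return a copy of the settings for one classifier/vectoriser pair."""
    if vectoriser not in VECTORISER_PARAMS:
        raise KeyError(f"unknown vectoriser: {vectoriser!r}")
    per_vectoriser = CLASSIFIER_PARAMS[classifier]
    # Fall back to the shared entry when no vectoriser-specific row exists.
    params = per_vectoriser.get(vectoriser, per_vectoriser.get("any"))
    return dict(params)
```

Under these assumptions, the returned dictionary could be unpacked straight into a constructor, for example `KNeighborsClassifier(**classifier_params("knn", "word2vec"))` if scikit-learn is the library in use.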