## In this notebook, we will explore various ML algorithms

1. Random forest
2. Naive Bayes
3. Logistic Regression
4. Support Vector Machines
5. Gradient Boosted Classifiers

## Converting text data into forms that ML models can read

Before we utilize machine learning models for prediction, we need to convert the text data into a numeric form that ML models can read.
Here are some forms we will explore:

1. Bag of Words representation
2. TF-IDF representation
3. Word2Vec representation of text
4. GloVe representation of text

In [None]:
%pip install gensim
%pip install scipy==1.12.0
%pip install xgboost

Collecting scipy==1.12.0
  Downloading scipy-1.12.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (38.4 MB)
Installing collected packages: scipy
  Attempting uninstall: scipy
    Found existing installation: scipy 1.11.4
    Uninstalling scipy-1.11.4:
      Successfully uninstalled scipy-1.11.4
Successfully installed scipy-1.12.0


In [1]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB, GaussianNB
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from nltk.data import find
import gensim
import nltk
from gensim.models import KeyedVectors, Word2Vec
from xgboost import XGBClassifier
from sklearn.preprocessing import MinMaxScaler

Reading the data

In [2]:
train_df = pd.read_csv('ML_train.csv') # data/ML/ML_train.csv
test_df = pd.read_csv('ML_test.csv') # data/ML/ML_test.csv

In [None]:
train_df.head()

Unnamed: 0,text,humor
0,watch swimmer disappear winter storm jonas,False
1,laughed reagan trump idea outlast political stage,False
2,hey cold go corner 90 degress,True
3,cant get standing desk almost good,False
4,wanna hear joke penis never mind long,True


In [None]:
test_df.head()

Unnamed: 0,text,humor
0,thought reddit joke today triangle rectangle f...,True
1,much pirate pay corn buck ear,True
2,hillary clinton sent book every gop candidatee...,False
3,italian union lambast new museum bos working hard,False
4,life ocean surface wholly depends live,False


In [3]:
X_train = train_df['text']
y_train = train_df['humor']
X_test = test_df['text']
y_test = test_df['humor']

Bag of Words

In [None]:
bow_vectorizer = CountVectorizer()
bow_vectorizer.fit(X_train)

bow_X_train = bow_vectorizer.transform(X_train)
bow_X_test = bow_vectorizer.transform(X_test)
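
To make the representation concrete, here is a minimal pure-Python sketch of what a bag-of-words matrix contains. It is a simplification and not how `CountVectorizer` is implemented: it only splits on whitespace, whereas `CountVectorizer` by default also lowercases the text and drops one-character tokens. The toy documents and the helper name `bag_of_words` are made up for illustration.

```python
from collections import Counter

def bag_of_words(docs):
    """Return (vocabulary, count matrix) for whitespace-tokenized documents."""
    tokenized = [doc.split() for doc in docs]
    # Vocabulary: every distinct token, sorted so the column order is deterministic
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        # One column per vocabulary word; words absent from the document count as 0
        rows.append([counts[word] for word in vocab])
    return vocab, rows

toy_docs = ["much pirate pay corn", "pirate joke pirate"]  # hypothetical examples
vocab, matrix = bag_of_words(toy_docs)
print(vocab)   # ['corn', 'joke', 'much', 'pay', 'pirate']
print(matrix)  # [[1, 0, 1, 1, 1], [0, 1, 0, 0, 2]]
```

Each row is one document and each column is one vocabulary word, so word order within a text is discarded. The real matrices are stored in sparse form because most entries are zero.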

TF-IDF

In [None]:
# ngram_range=(1, 3), min_df=2, max_df=0.85
tfidf_vectorizer = TfidfVectorizer()
tfidf_vectorizer.fit(X_train)

tfidf_X_train = tfidf_vectorizer.transform(X_train)
tfidf_X_test = tfidf_vectorizer.transform(X_test)
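
With its default settings, `TfidfVectorizer` multiplies each raw term count by a smoothed inverse document frequency, `idf(t) = ln((1 + n) / (1 + df(t))) + 1`, and then scales every row to unit Euclidean length. The following is a minimal pure-Python sketch under those assumptions, with whitespace tokenization only; the helper name `tfidf_rows` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_rows(docs):
    """Return (vocabulary, idf per word, L2-normalized TF-IDF rows)."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed idf: words found in every document get the minimum weight of 1
    idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1 for w in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        weights = [counts[w] * idf[w] for w in vocab]
        # Scale the row to unit length (guarding against an all-zero row)
        norm = math.sqrt(sum(x * x for x in weights)) or 1.0
        rows.append([x / norm for x in weights])
    return vocab, idf, rows

toy_docs = ["pirate joke", "pirate news"]  # hypothetical examples
vocab, idf, rows = tfidf_rows(toy_docs)
print(vocab)  # ['joke', 'news', 'pirate']
print(idf)    # 'pirate' appears in both documents, so its idf is 1.0
print(rows)
```

Compared with plain counts, the word shared by both documents ("pirate") ends up with a lower weight than the word that distinguishes each document.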

Word2Vec: We will use a pretrained model

In [4]:
import gensim.downloader as api
# path = api.load("word2vec-google-news-300", return_path = True)
import json
info = api.info()
for model_name, model_data in sorted(info['models'].items()):
    print(
        '%s (%d records): %s' % (
            model_name,
            model_data.get('num_records', -1),
            model_data['description'],
        )
    )

__testing_word2vec-matrix-synopsis (-1 records): [THIS IS ONLY FOR TESTING] Word vecrors of the movie matrix.
conceptnet-numberbatch-17-06-300 (1917247 records): ConceptNet Numberbatch consists of state-of-the-art semantic vectors (also known as word embeddings) that can be used directly as a representation of word meanings or as a starting point for further machine learning. ConceptNet Numberbatch is part of the ConceptNet open data project. ConceptNet provides lots of ways to compute with word meanings, one of which is word embeddings. ConceptNet Numberbatch is a snapshot of just the word embeddings. It is built using an ensemble that combines data from ConceptNet, word2vec, GloVe, and OpenSubtitles 2016, using a variation on retrofitting.
fasttext-wiki-news-subwords-300 (999999 records): 1 million word vectors trained on Wikipedia 2017, UMBC webbase corpus and statmt.org news dataset (16B tokens).
glove-twitter-100 (1193514 records): Pre-trained vectors based on  2B tweets, 27B toke

In [5]:
path = api.load('word2vec-google-news-300', return_path=True)



In [6]:
model = KeyedVectors.load_word2vec_format(path, binary=True)

In [7]:
# Each text has a different number of tokens, so we average the word vectors of its
# tokens to get one fixed-length vector per text. Tokens missing from the model are skipped.
# Note: ctr starts at 1 to avoid division by zero, so the sum is divided by
# (number of matched tokens + 1) rather than by the exact number of matched tokens.

def sent_vec(sent, model):
    vector_size = model.vector_size
    wv_res = np.zeros(vector_size)
    ctr = 1
    for w in sent:
        if w in model:
            ctr += 1
            wv_res += model[w]
    wv_res = wv_res / ctr
    return wv_res
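
To see what `sent_vec` returns without downloading a pretrained model, the sketch below runs it on a tiny stand-in. `ToyVectors` is a hypothetical class that only mimics the three things the function needs from a gensim `KeyedVectors` object (`vector_size`, `in`, and indexing). Because the counter starts at 1, the result is the sum divided by one more than the number of matched tokens; `mean_vec` is an optional variant, not part of the notebook, that returns the exact mean instead.

```python
import numpy as np

class ToyVectors:
    """Hypothetical stand-in for a gensim KeyedVectors object."""
    def __init__(self, table):
        self.table = {w: np.asarray(v, dtype=float) for w, v in table.items()}
        self.vector_size = len(next(iter(self.table.values())))

    def __contains__(self, word):
        return word in self.table

    def __getitem__(self, word):
        return self.table[word]

def sent_vec(sent, model):
    # Same logic as the notebook cell: the counter starts at 1
    wv_res = np.zeros(model.vector_size)
    ctr = 1
    for w in sent:
        if w in model:
            ctr += 1
            wv_res += model[w]
    return wv_res / ctr

def mean_vec(sent, model):
    # Variant: exact mean over in-vocabulary tokens, zeros if none match
    vecs = [model[w] for w in sent if w in model]
    if not vecs:
        return np.zeros(model.vector_size)
    return np.mean(vecs, axis=0)

toy = ToyVectors({"pirate": [1.0, 1.0], "corn": [3.0, 3.0]})
tokens = ["pirate", "corn", "zzz"]  # "zzz" is out of vocabulary and is skipped
print(sent_vec(tokens, toy))  # sum [4, 4] divided by 3
print(mean_vec(tokens, toy))  # sum [4, 4] divided by 2
```

The two versions differ only by a per-text scale factor, so the direction of each vector is the same; the difference is largest for short texts.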

In [8]:
# since we cleaned our text data previously, we only need to convert the text in each row into lists, where each element is a token

def split_text(text):
  return text.split()

In [9]:
train_df['tokens'] = train_df['text'].apply(split_text)
train_df.head()

Unnamed: 0,text,humor,tokens
0,watch swimmer disappear winter storm jonas,False,"[watch, swimmer, disappear, winter, storm, jonas]"
1,laughed reagan trump idea outlast political stage,False,"[laughed, reagan, trump, idea, outlast, politi..."
2,hey cold go corner 90 degress,True,"[hey, cold, go, corner, 90, degress]"
3,cant get standing desk almost good,False,"[cant, get, standing, desk, almost, good]"
4,wanna hear joke penis never mind long,True,"[wanna, hear, joke, penis, never, mind, long]"


In [10]:
test_df['tokens'] = test_df['text'].apply(split_text)
test_df.head()

Unnamed: 0,text,humor,tokens
0,thought reddit joke today triangle rectangle f...,True,"[thought, reddit, joke, today, triangle, recta..."
1,much pirate pay corn buck ear,True,"[much, pirate, pay, corn, buck, ear]"
2,hillary clinton sent book every gop candidatee...,False,"[hillary, clinton, sent, book, every, gop, can..."
3,italian union lambast new museum bos working hard,False,"[italian, union, lambast, new, museum, bos, wo..."
4,life ocean surface wholly depends live,False,"[life, ocean, surface, wholly, depends, live]"


In [11]:
train_df['w2v'] = train_df['tokens'].apply(lambda x: sent_vec(x, model))
train_df.head()

Unnamed: 0,text,humor,tokens,w2v
0,watch swimmer disappear winter storm jonas,False,"[watch, swimmer, disappear, winter, storm, jonas]","[0.012259347098214286, 0.12681361607142858, -0..."
1,laughed reagan trump idea outlast political stage,False,"[laughed, reagan, trump, idea, outlast, politi...","[0.00337982177734375, 0.07329559326171875, 0.0..."
2,hey cold go corner 90 degress,True,"[hey, cold, go, corner, 90, degress]","[-0.021931966145833332, -0.022908528645833332,..."
3,cant get standing desk almost good,False,"[cant, get, standing, desk, almost, good]","[0.064453125, 0.008039202008928572, -0.0647212..."
4,wanna hear joke penis never mind long,True,"[wanna, hear, joke, penis, never, mind, long]","[0.0291748046875, 0.032379150390625, 0.0023498..."


In [12]:
test_df['w2v'] = test_df['tokens'].apply(lambda x: sent_vec(x, model))
test_df.head()

Unnamed: 0,text,humor,tokens,w2v
0,thought reddit joke today triangle rectangle f...,True,"[thought, reddit, joke, today, triangle, recta...","[-0.00616455078125, 0.013763427734375, 0.07757..."
1,much pirate pay corn buck ear,True,"[much, pirate, pay, corn, buck, ear]","[0.06841169084821429, -0.0003574916294642857, ..."
2,hillary clinton sent book every gop candidatee...,False,"[hillary, clinton, sent, book, every, gop, can...","[0.016082763671875, -0.0219268798828125, 0.052..."
3,italian union lambast new museum bos working hard,False,"[italian, union, lambast, new, museum, bos, wo...","[0.011237250434027778, -0.003567165798611111, ..."
4,life ocean surface wholly depends live,False,"[life, ocean, surface, wholly, depends, live]","[0.0007498604910714286, 0.004481724330357143, ..."


In [13]:
w2v_X_train = train_df['w2v'].to_list()
w2v_X_test = test_df['w2v'].to_list()

GloVe: We will use a pretrained model as well

In [18]:
model = api.load('glove-twitter-50')



In [19]:
train_df['glove'] = train_df['tokens'].apply(lambda x: sent_vec(x, model))
train_df.head()

Unnamed: 0,text,humor,tokens,w2v,glove
0,watch swimmer disappear winter storm jonas,False,"[watch, swimmer, disappear, winter, storm, jonas]","[0.012259347098214286, 0.12681361607142858, -0...","[-0.3556693465049778, -0.009269003357206072, 0..."
1,laughed reagan trump idea outlast political stage,False,"[laughed, reagan, trump, idea, outlast, politi...","[0.00337982177734375, 0.07329559326171875, 0.0...","[0.21184250432997942, 0.5106900054961443, 0.52..."
2,hey cold go corner 90 degress,True,"[hey, cold, go, corner, 90, degress]","[-0.021931966145833332, -0.022908528645833332,...","[0.025682834287484486, -0.12756834427515665, 0..."
3,cant get standing desk almost good,False,"[cant, get, standing, desk, almost, good]","[0.064453125, 0.008039202008928572, -0.0647212...","[0.06572856647627694, 0.16731114579098566, 0.4..."
4,wanna hear joke penis never mind long,True,"[wanna, hear, joke, penis, never, mind, long]","[0.0291748046875, 0.032379150390625, 0.0023498...","[0.10481374897062778, 0.23063786217244342, 0.0..."


In [20]:
test_df['glove'] = test_df['tokens'].apply(lambda x: sent_vec(x, model))
test_df.head()

Unnamed: 0,text,humor,tokens,w2v,glove
0,thought reddit joke today triangle rectangle f...,True,"[thought, reddit, joke, today, triangle, recta...","[-0.00616455078125, 0.013763427734375, 0.07757...","[0.5678675062954426, 0.03985125944018364, 0.04..."
1,much pirate pay corn buck ear,True,"[much, pirate, pay, corn, buck, ear]","[0.06841169084821429, -0.0003574916294642857, ...","[-0.16855413998876298, -0.073170006275177, -0...."
2,hillary clinton sent book every gop candidatee...,False,"[hillary, clinton, sent, book, every, gop, can...","[0.016082763671875, -0.0219268798828125, 0.052...","[0.3847669509705156, 0.23221537447534502, 0.38..."
3,italian union lambast new museum bos working hard,False,"[italian, union, lambast, new, museum, bos, wo...","[0.011237250434027778, -0.003567165798611111, ...","[0.20736749470233917, 0.29444137308746576, 0.1..."
4,life ocean surface wholly depends live,False,"[life, ocean, surface, wholly, depends, live]","[0.0007498604910714286, 0.004481724330357143, ...","[-0.31856086158326696, -0.21392584671931608, -..."


In [21]:
glove_X_train = train_df['glove'].to_list()
glove_X_test = test_df['glove'].to_list()

In [22]:
glove_X_train[0]

array([-0.35566935, -0.009269  ,  0.08900571, -0.27721143, -0.40477286,
        0.063286  ,  0.73867287,  0.1427943 ,  0.07588857, -0.3484257 ,
        0.28370175, -0.09037186, -2.72174289,  0.39477429, -0.17517   ,
        0.22776143, -0.19473571,  0.34294714,  0.08298514,  0.01084857,
        0.29838984, -0.07146714,  0.23767201, -0.17200172,  0.18784872,
        0.42515572, -0.29825604,  0.35699901, -0.38086286, -0.03017143,
        0.10801815, -0.122653  ,  0.24594   ,  0.02470657,  0.08025772,
       -0.15680642,  0.16635428, -0.10197428,  0.04339858,  0.27099   ,
       -0.29980144, -0.15298671,  0.16641286,  0.28875857, -0.09712286,
        0.32549314,  0.01627629,  0.03342057, -0.26915658,  0.55722173])

# Function to test the models

In [14]:
def train_and_eval(model, trainX, trainY, testX, testY):

    # training the model
    fitted_model = model.fit(trainX, trainY)

    # getting predictions
    y_preds_train = fitted_model.predict(trainX)
    y_preds_test = fitted_model.predict(testX)

    # evaluating the model
    print()
    print(model)
    print(f"Train accuracy score : {accuracy_score(trainY, y_preds_train)}")
    print(f"Test accuracy score : {accuracy_score(testY, y_preds_test)}")
    print(classification_report(testY, y_preds_test))
    print('\n',40*'-')
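
The notebook fits each vectorizer and each classifier as separate steps. As an optional sketch, the same fit-then-predict flow can be bundled with the `Pipeline` class imported earlier, which keeps the vectorizer fitted on training text only. The toy texts and labels below are made up purely to show the call pattern and say nothing about the real dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data, only to illustrate the call pattern
toy_train_X = ["funny joke pun", "joke laugh pun",
               "market stock report", "stock news report"]
toy_train_y = [True, True, False, False]
toy_test_X = ["joke pun", "stock report"]
toy_test_y = [True, False]

# The vectorizer and the classifier are fitted together on the training texts
pipe = Pipeline([
    ("bow", CountVectorizer()),
    ("nb", MultinomialNB()),
])
pipe.fit(toy_train_X, toy_train_y)

# At prediction time the fitted vocabulary is reused; nothing is re-fitted
toy_preds = pipe.predict(toy_test_X)
toy_acc = accuracy_score(toy_test_y, toy_preds)
print(toy_preds, toy_acc)
```

A pipeline object exposes `fit` and `predict` like any other estimator, so it could be passed to `train_and_eval` with the raw text columns in place of a pre-computed matrix.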

# Multinomial Naive Bayes

Multinomial Naive Bayes with BoW

In [None]:
nb_model = MultinomialNB()
train_and_eval(nb_model, bow_X_train, y_train, bow_X_test, y_test)


MultinomialNB()
Train accuracy score : 0.9181125
Test accuracy score : 0.901575
              precision    recall  f1-score   support

       False       0.91      0.89      0.90     20000
        True       0.89      0.91      0.90     20000

    accuracy                           0.90     40000
   macro avg       0.90      0.90      0.90     40000
weighted avg       0.90      0.90      0.90     40000


 ----------------------------------------


Multinomial Naive Bayes with TF-IDF

In [None]:
nb_model = MultinomialNB()
train_and_eval(nb_model, tfidf_X_train, y_train, tfidf_X_test, y_test)


MultinomialNB()
Train accuracy score : 0.9185125
Test accuracy score : 0.899375
              precision    recall  f1-score   support

       False       0.91      0.89      0.90     20000
        True       0.89      0.91      0.90     20000

    accuracy                           0.90     40000
   macro avg       0.90      0.90      0.90     40000
weighted avg       0.90      0.90      0.90     40000


 ----------------------------------------


Multinomial Naive Bayes with Word2Vec

In [15]:
nb_model = MultinomialNB()
scaler = MinMaxScaler()

# scale every feature into [0, 1] because Multinomial Naive Bayes does not accept negative inputs
scaler.fit(w2v_X_train)
nb_w2v_train = scaler.transform(w2v_X_train)
nb_w2v_test = scaler.transform(w2v_X_test)

train_and_eval(nb_model, nb_w2v_train, y_train, nb_w2v_test, y_test)


MultinomialNB()
Train accuracy score : 0.7801375
Test accuracy score : 0.77595
              precision    recall  f1-score   support

       False       0.78      0.77      0.77     20000
        True       0.77      0.78      0.78     20000

    accuracy                           0.78     40000
   macro avg       0.78      0.78      0.78     40000
weighted avg       0.78      0.78      0.78     40000


 ----------------------------------------
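
Min-max scaling maps each feature to [0, 1] using the minimum and maximum seen in the training data. The NumPy sketch below reproduces that arithmetic on made-up two-dimensional vectors. It also shows a caveat worth keeping in mind: a test value below the training minimum is mapped to a negative number, which `MultinomialNB` would reject. Clipping is one possible remedy; recent scikit-learn versions expose it as `MinMaxScaler(clip=True)`.

```python
import numpy as np

# Hypothetical 2-feature embeddings; the real rows have 300 (Word2Vec) or 50 (GloVe) values
toy_train = np.array([[-1.0, 0.0], [1.0, 4.0], [0.0, 2.0]])
toy_test = np.array([[-2.0, 2.0]])  # first feature lies below the training minimum

# "Fit": per-feature minimum and range, learned from the training rows only
col_min = toy_train.min(axis=0)
col_range = toy_train.max(axis=0) - col_min

# "Transform": the same shift and scale is applied to both splits
scaled_train = (toy_train - col_min) / col_range
scaled_test = (toy_test - col_min) / col_range

print(scaled_train)  # every value lies in [0, 1]
print(scaled_test)   # [[-0.5  0.5]]: the first value is negative
# Clipping to the training range keeps test inputs non-negative
print(np.clip(scaled_test, 0.0, 1.0))
```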


Multinomial Naive Bayes with GloVe

In [23]:
nb_model = MultinomialNB()
scaler = MinMaxScaler()

scaler.fit(glove_X_train)
nb_glove_train = scaler.transform(glove_X_train)
nb_glove_test = scaler.transform(glove_X_test)

train_and_eval(nb_model, nb_glove_train, y_train, nb_glove_test, y_test)


MultinomialNB()
Train accuracy score : 0.7880375
Test accuracy score : 0.78685
              precision    recall  f1-score   support

       False       0.80      0.77      0.78     20000
        True       0.78      0.81      0.79     20000

    accuracy                           0.79     40000
   macro avg       0.79      0.79      0.79     40000
weighted avg       0.79      0.79      0.79     40000


 ----------------------------------------


BoW works best for the Multinomial Naive Bayes model, giving an accuracy of 90.2%

# Logistic Regression

Logistic Regression with BoW

In [None]:
log_model = LogisticRegression(random_state=42)
train_and_eval(log_model, bow_X_train, y_train, bow_X_test, y_test)


LogisticRegression(random_state=42)
Train accuracy score : 0.93783125
Test accuracy score : 0.90545
              precision    recall  f1-score   support

       False       0.91      0.90      0.91     20000
        True       0.90      0.91      0.91     20000

    accuracy                           0.91     40000
   macro avg       0.91      0.91      0.91     40000
weighted avg       0.91      0.91      0.91     40000


 ----------------------------------------


Logistic Regression with TF-IDF

In [None]:
log_model = LogisticRegression(random_state=42)
train_and_eval(log_model, tfidf_X_train, y_train, tfidf_X_test, y_test)


LogisticRegression(random_state=42)
Train accuracy score : 0.91860625
Test accuracy score : 0.9008
              precision    recall  f1-score   support

       False       0.90      0.90      0.90     20000
        True       0.90      0.90      0.90     20000

    accuracy                           0.90     40000
   macro avg       0.90      0.90      0.90     40000
weighted avg       0.90      0.90      0.90     40000


 ----------------------------------------


Logistic Regression with Word2Vec

In [16]:
log_model = LogisticRegression(random_state=42)
train_and_eval(log_model, w2v_X_train, y_train, w2v_X_test, y_test)


LogisticRegression(random_state=42)
Train accuracy score : 0.84119375
Test accuracy score : 0.8408
              precision    recall  f1-score   support

       False       0.84      0.84      0.84     20000
        True       0.84      0.84      0.84     20000

    accuracy                           0.84     40000
   macro avg       0.84      0.84      0.84     40000
weighted avg       0.84      0.84      0.84     40000


 ----------------------------------------


Logistic Regression with GloVe

In [25]:
log_model = LogisticRegression(random_state=42)
train_and_eval(log_model, glove_X_train, y_train, glove_X_test, y_test)


LogisticRegression(random_state=42)
Train accuracy score : 0.83088125
Test accuracy score : 0.829875
              precision    recall  f1-score   support

       False       0.84      0.82      0.83     20000
        True       0.82      0.84      0.83     20000

    accuracy                           0.83     40000
   macro avg       0.83      0.83      0.83     40000
weighted avg       0.83      0.83      0.83     40000


 ----------------------------------------


Logistic Regression works best with BoW, giving an accuracy of 90.5%

# Random Forest Classifier

Random Forest with BoW

In [None]:
clf = RandomForestClassifier(random_state=42)
train_and_eval(clf, bow_X_train, y_train, bow_X_test, y_test)


RandomForestClassifier(random_state=42)
Train accuracy score : 0.99999375
Test accuracy score : 0.87755
              precision    recall  f1-score   support

       False       0.89      0.86      0.88     20000
        True       0.86      0.90      0.88     20000

    accuracy                           0.88     40000
   macro avg       0.88      0.88      0.88     40000
weighted avg       0.88      0.88      0.88     40000


 ----------------------------------------


Random Forest with TF-IDF

In [None]:
clf = RandomForestClassifier(random_state=42)
train_and_eval(clf, tfidf_X_train, y_train, tfidf_X_test, y_test)


RandomForestClassifier(random_state=42)
Train accuracy score : 0.99999375
Test accuracy score : 0.874625
              precision    recall  f1-score   support

       False       0.89      0.85      0.87     20000
        True       0.86      0.90      0.88     20000

    accuracy                           0.87     40000
   macro avg       0.88      0.87      0.87     40000
weighted avg       0.88      0.87      0.87     40000


 ----------------------------------------


Random Forest with Word2Vec

In [24]:
clf = RandomForestClassifier(random_state=42)
train_and_eval(clf, w2v_X_train, y_train, w2v_X_test, y_test)


RandomForestClassifier(random_state=42)
Train accuracy score : 0.99999375
Test accuracy score : 0.830675
              precision    recall  f1-score   support

       False       0.83      0.84      0.83     20000
        True       0.83      0.82      0.83     20000

    accuracy                           0.83     40000
   macro avg       0.83      0.83      0.83     40000
weighted avg       0.83      0.83      0.83     40000


 ----------------------------------------


Random Forest with GloVe

In [None]:
clf = RandomForestClassifier(random_state=42)
train_and_eval(clf, glove_X_train, y_train, glove_X_test, y_test)


RandomForestClassifier(random_state=42)
Train accuracy score : 0.999975
Test accuracy score : 0.8408
              precision    recall  f1-score   support

       False       0.84      0.84      0.84     20000
        True       0.84      0.85      0.84     20000

    accuracy                           0.84     40000
   macro avg       0.84      0.84      0.84     40000
weighted avg       0.84      0.84      0.84     40000


 ----------------------------------------


Random Forest works best with BoW, giving an accuracy of 87.8%

# Gradient Boosted Classifier

XGB classifier with BoW

In [None]:
xgb = XGBClassifier(random_state=42)
train_and_eval(xgb, bow_X_train, y_train, bow_X_test, y_test)


XGBClassifier(base_score=None, booster=None, callbacks=None,
              colsample_bylevel=None, colsample_bynode=None,
              colsample_bytree=None, device=None, early_stopping_rounds=None,
              enable_categorical=False, eval_metric=None, feature_types=None,
              gamma=None, grow_policy=None, importance_type=None,
              interaction_constraints=None, learning_rate=None, max_bin=None,
              max_cat_threshold=None, max_cat_to_onehot=None,
              max_delta_step=None, max_depth=None, max_leaves=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              multi_strategy=None, n_estimators=None, n_jobs=None,
              num_parallel_tree=None, random_state=42, ...)
Train accuracy score : 0.836325
Test accuracy score : 0.8283
              precision    recall  f1-score   support

       False       0.80      0.88      0.84     20000
        True       0.87      0.78      0.82     20000

    accuracy      

XGB classifier with TF-IDF

In [None]:
xgb = XGBClassifier(random_state=42)
train_and_eval(xgb, tfidf_X_train, y_train, tfidf_X_test, y_test)


XGBClassifier(base_score=None, booster=None, callbacks=None,
              colsample_bylevel=None, colsample_bynode=None,
              colsample_bytree=None, device=None, early_stopping_rounds=None,
              enable_categorical=False, eval_metric=None, feature_types=None,
              gamma=None, grow_policy=None, importance_type=None,
              interaction_constraints=None, learning_rate=None, max_bin=None,
              max_cat_threshold=None, max_cat_to_onehot=None,
              max_delta_step=None, max_depth=None, max_leaves=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              multi_strategy=None, n_estimators=None, n_jobs=None,
              num_parallel_tree=None, random_state=42, ...)
Train accuracy score : 0.8391625
Test accuracy score : 0.8272
              precision    recall  f1-score   support

       False       0.79      0.88      0.84     20000
        True       0.87      0.77      0.82     20000

    accuracy     

XGB classifier with Word2Vec

In [17]:
xgb = XGBClassifier(random_state=42)
train_and_eval(xgb, w2v_X_train, y_train, w2v_X_test, y_test)


XGBClassifier(base_score=None, booster=None, callbacks=None,
              colsample_bylevel=None, colsample_bynode=None,
              colsample_bytree=None, device=None, early_stopping_rounds=None,
              enable_categorical=False, eval_metric=None, feature_types=None,
              gamma=None, grow_policy=None, importance_type=None,
              interaction_constraints=None, learning_rate=None, max_bin=None,
              max_cat_threshold=None, max_cat_to_onehot=None,
              max_delta_step=None, max_depth=None, max_leaves=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              multi_strategy=None, n_estimators=None, n_jobs=None,
              num_parallel_tree=None, random_state=42, ...)
Train accuracy score : 0.9145625
Test accuracy score : 0.854375
              precision    recall  f1-score   support

       False       0.86      0.85      0.85     20000
        True       0.85      0.86      0.85     20000

    accuracy   

XGB classifier with GloVe

In [None]:
xgb = XGBClassifier(random_state=42)
train_and_eval(xgb, glove_X_train, y_train, glove_X_test, y_test)


XGBClassifier(base_score=None, booster=None, callbacks=None,
              colsample_bylevel=None, colsample_bynode=None,
              colsample_bytree=None, device=None, early_stopping_rounds=None,
              enable_categorical=False, eval_metric=None, feature_types=None,
              gamma=None, grow_policy=None, importance_type=None,
              interaction_constraints=None, learning_rate=None, max_bin=None,
              max_cat_threshold=None, max_cat_to_onehot=None,
              max_delta_step=None, max_depth=None, max_leaves=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              multi_strategy=None, n_estimators=None, n_jobs=None,
              num_parallel_tree=None, random_state=42, ...)
Train accuracy score : 0.89088125
Test accuracy score : 0.84885
              precision    recall  f1-score   support

       False       0.85      0.84      0.85     20000
        True       0.84      0.86      0.85     20000

    accuracy   

XGB classifier works best with Word2Vec, giving an accuracy of 85.4%
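
To compare the sixteen runs side by side, the test accuracies printed in the cell outputs above can be collected in a plain dictionary. This is only a bookkeeping sketch: the numbers are copied from those outputs, and the variable names are our own.

```python
# Test accuracies copied from the cell outputs in this notebook
test_accuracy = {
    "Multinomial Naive Bayes": {"BoW": 0.901575, "TF-IDF": 0.899375,
                                "Word2Vec": 0.77595, "GloVe": 0.78685},
    "Logistic Regression": {"BoW": 0.90545, "TF-IDF": 0.9008,
                            "Word2Vec": 0.8408, "GloVe": 0.829875},
    "Random Forest": {"BoW": 0.87755, "TF-IDF": 0.874625,
                      "Word2Vec": 0.830675, "GloVe": 0.8408},
    "XGB classifier": {"BoW": 0.8283, "TF-IDF": 0.8272,
                       "Word2Vec": 0.854375, "GloVe": 0.84885},
}

# Best text representation for each model
best = {}
for model_name, scores in test_accuracy.items():
    rep = max(scores, key=scores.get)
    best[model_name] = (rep, scores[rep])
    print(f"{model_name:<24} {rep:<9} {scores[rep]:.1%}")

# Best model and representation overall
overall = max(best.items(), key=lambda item: item[1][1])
print("Best overall:", overall)
```

On these numbers, Logistic Regression with BoW has the highest test accuracy, and the XGB classifier is the only model that scores higher with the dense embeddings than with BoW or TF-IDF.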