## Product Sentiment Data - Learning Rates - 2nd Attempt

Data (public domain): https://data.world/crowdflower/brands-and-product-emotions

The notebook code is based on the IMDB example notebook in bert-sklearn/other_examples.

In [1]:
import numpy as np
import pandas as pd
import os
import sys
import csv
import re
from sklearn import metrics
from sklearn.metrics import classification_report
from sklearn.utils import shuffle
from ftfy import fix_text
 
from bert_sklearn import BertClassifier
from bert_sklearn import load_model

print(os.getcwd())

DATAFILE = "./data/judge-cleaned-up.csv"

/Users/joseph.porter/Data/nas2019/NAS2019


In [3]:
# Prep Data

def cleanup(txt):
    return fix_text(txt)
    
converters = {'tweet_text': cleanup}
    
raw_data = pd.read_csv(DATAFILE, converters=converters, encoding='unicode_escape')
raw_data.head(10)

|   | tweet_text | emotion_in_tweet_is_directed_at | is_there_an_emotion_directed_at_a_brand_or_product |
|---|---|---|---|
| 0 | .@wesley83 I have a 3G iPhone. After 3 hrs twe... | iPhone | Negative emotion |
| 1 | @jessedee Know about @fludapp ? Awesome iPad/i... | iPad or iPhone App | Positive emotion |
| 2 | @swonderlin Can not wait for #iPad 2 also. The... | iPad | Positive emotion |
| 3 | @sxsw I hope this year's festival isn't as cra... | iPad or iPhone App | Negative emotion |
| 4 | @sxtxstate great stuff on Fri #SXSW: Marissa M... | Google | Positive emotion |
| 5 | @teachntech00 New iPad Apps For #SpeechTherapy... |  | No emotion toward brand or product |
| 6 |  |  | No emotion toward brand or product |
| 7 | #SXSW is just starting, #CTIA is around the co... | Android | Positive emotion |
| 8 | Beautifully smart and simple idea RT @madebyma... | iPad or iPhone App | Positive emotion |
| 9 | Counting down the days to #sxsw plus strong Ca... | Apple | Positive emotion |
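`pd.read_csv` applies each function in `converters` to the raw string value of every cell in the named column as the file is parsed, which is how `cleanup` (and so `ftfy.fix_text`) reaches every tweet. A minimal sketch of that mechanism, using a small inline CSV and a hypothetical `strip_and_lower` function in place of `fix_text` so it runs without ftfy or the data file:

```python
import io

import pandas as pd


# Hypothetical stand-in for ftfy.fix_text: any str -> str function works here
def strip_and_lower(txt):
    return txt.strip().lower()


csv_text = io.StringIO(
    "tweet_text,emotion_in_tweet_is_directed_at\n"
    "  Loving my new iPad  ,iPad\n"
    "GOOGLE MAPS rocks,Google\n"
)

# The converter runs on each raw cell of 'tweet_text' while the file is parsed;
# other columns are parsed normally
df = pd.read_csv(csv_text, converters={"tweet_text": strip_and_lower})
print(df["tweet_text"].tolist())
```

Because the converter runs inside the parser, the cleanup happens once at load time rather than as a separate pass over the DataFrame.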


In [4]:
## Transform columns
## Run this cell only once: DataFrame.insert raises a ValueError if the column already exists.

# Add columns that make the labels usable by the model:
#   tweet_text => text
#   Positive / No emotion / Negative => 1, 0, -1
#   Product: Apple products, Google products, NaN => 'Apple', 'Google', ''

# Placeholder: passes the text through unchanged (further cleanup happens in a later cell)
def clean_text(txt):
    return txt
raw_data.insert(1, "text", np.vectorize(clean_text)(raw_data['tweet_text']))

def create_labels(sentiment):
    if sentiment.startswith('Positive'):
        return 1
    if sentiment.startswith('Negative'):
        return -1
    return 0
raw_data.insert(3, 'label', np.vectorize(create_labels)(raw_data['is_there_an_emotion_directed_at_a_brand_or_product']))

def get_company(product):
    if pd.isnull(product):
        return ''
    if 'iPad' in product or 'iPhone' in product or 'Apple' in product:
        return 'Apple'
    if 'Google' in product or 'Android' in product:
        return 'Google'
    return ''
raw_data.insert(2, 'company', np.vectorize(get_company)(raw_data['emotion_in_tweet_is_directed_at']))
raw_data.head(10)

|   | tweet_text | text | company | emotion_in_tweet_is_directed_at | label | is_there_an_emotion_directed_at_a_brand_or_product |
|---|---|---|---|---|---|---|
| 0 | .@wesley83 I have a 3G iPhone. After 3 hrs twe... | .@wesley83 I have a 3G iPhone. After 3 hrs twe... | Apple | iPhone | -1 | Negative emotion |
| 1 | @jessedee Know about @fludapp ? Awesome iPad/i... | @jessedee Know about @fludapp ? Awesome iPad/i... | Apple | iPad or iPhone App | 1 | Positive emotion |
| 2 | @swonderlin Can not wait for #iPad 2 also. The... | @swonderlin Can not wait for #iPad 2 also. The... | Apple | iPad | 1 | Positive emotion |
| 3 | @sxsw I hope this year's festival isn't as cra... | @sxsw I hope this year's festival isn't as cra... | Apple | iPad or iPhone App | -1 | Negative emotion |
| 4 | @sxtxstate great stuff on Fri #SXSW: Marissa M... | @sxtxstate great stuff on Fri #SXSW: Marissa M... | Google | Google | 1 | Positive emotion |
| 5 | @teachntech00 New iPad Apps For #SpeechTherapy... | @teachntech00 New iPad Apps For #SpeechTherapy... |  |  | 0 | No emotion toward brand or product |
| 6 |  |  |  |  | 0 | No emotion toward brand or product |
| 7 | #SXSW is just starting, #CTIA is around the co... | #SXSW is just starting, #CTIA is around the co... | Google | Android | 1 | Positive emotion |
| 8 | Beautifully smart and simple idea RT @madebyma... | Beautifully smart and simple idea RT @madebyma... | Apple | iPad or iPhone App | 1 | Positive emotion |
| 9 | Counting down the days to #sxsw plus strong Ca... | Counting down the days to #sxsw plus strong Ca... | Apple | Apple | 1 | Positive emotion |
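The `np.vectorize` calls are a convenience loop rather than true vectorization, and `DataFrame.insert` is what makes the cell above unsafe to re-run. A sketch of an alternative, assuming the same column names: plain column assignment simply overwrites on a second run, and pandas string methods with `np.select` replace the per-row functions. The helper name `add_label_columns` and the three sample rows are made up for illustration; the keyword rules copy `create_labels` and `get_company`.

```python
import numpy as np
import pandas as pd

# Made-up sample rows using the dataset's column names
raw = pd.DataFrame({
    "tweet_text": ["Love my iPhone", "Maps is down again", "At #SXSW today"],
    "emotion_in_tweet_is_directed_at": ["iPhone", "Google", np.nan],
    "is_there_an_emotion_directed_at_a_brand_or_product": [
        "Positive emotion",
        "Negative emotion",
        "No emotion toward brand or product",
    ],
})


def add_label_columns(df):
    """Hypothetical helper: return a copy with text/label/company columns; safe to re-run."""
    out = df.copy()
    out["text"] = out["tweet_text"]

    sentiment = out["is_there_an_emotion_directed_at_a_brand_or_product"]
    # Positive => 1, Negative => -1, anything else => 0 (same rule as create_labels)
    out["label"] = np.select(
        [sentiment.str.startswith("Positive", na=False),
         sentiment.str.startswith("Negative", na=False)],
        [1, -1],
        default=0,
    )

    product = out["emotion_in_tweet_is_directed_at"].fillna("")
    # Same keyword checks as get_company; Apple is tested first, as in the original
    is_apple = product.str.contains("iPad|iPhone|Apple")
    is_google = product.str.contains("Google|Android")
    out["company"] = np.select([is_apple, is_google], ["Apple", "Google"], default="")
    return out


data = add_label_columns(raw)
data = add_label_columns(data)  # re-running overwrites instead of raising
print(data[["text", "company", "label"]])
```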


In [5]:
# Last Data Preparation Step
# Clean up characters and pull out columns of interest

def clean(text):
    text = re.sub(r'<.*?>', '', text)
    text = re.sub(r"\"", "", text)       
    return text

data = raw_data.filter(['text', 'company', 'label'], axis=1)
data['text'] = data['text'].transform(clean)
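The two regular expressions strip tag-like `<...>` spans and straight double quotes. A quick check of what they do on made-up strings, including one side effect worth knowing about: because `<.*?>` matches from any `<` to the next `>`, a tweet with stray angle brackets (such as a `<3` heart followed later by a `>`) loses all the text in between.

```python
import re


def clean(text):
    # Drop anything that looks like a tag; the non-greedy .*? stops at the first '>'
    text = re.sub(r'<.*?>', '', text)
    # Drop straight double quotes
    text = re.sub(r'"', '', text)
    return text


samples = [
    'Loving the <b>new</b> "iPad 2" demo',
    'RT @user: <a href="http://example.com">link</a> #SXSW',
    'Score 3 < 5 and 7 > 2',  # not HTML, but the regex still removes "< 5 and 7 >"
]
for s in samples:
    print(repr(clean(s)))
```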

In [6]:
# Split into training and test data

msk = np.random.rand(len(data)) < 0.8
train = data[msk]
test = data[~msk]
print('Training data size: ' + str(train.shape))
print('Test data size: ' + str(test.shape))

Training data size: (7327, 3)
Test data size: (1766, 3)
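The boolean mask gives roughly 80/20 (here 7327 vs 1766 rows), but no seed is set, so the split changes on every run and the label proportions can drift between the halves. A sketch using scikit-learn's `train_test_split` with a fixed `random_state` and `stratify` on the label; the 100-row frame is made up, and `random_state=42` is an arbitrary choice.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up stand-in for `data`: 60% neutral, 25% positive, 15% negative
labels = np.array([0] * 60 + [1] * 25 + [-1] * 15)
data = pd.DataFrame({"text": [f"tweet {i}" for i in range(100)], "label": labels})

# test_size=0.2 matches the ~80/20 mask split; random_state makes it repeatable,
# and stratify keeps the label proportions the same in both halves
train, test = train_test_split(
    data, test_size=0.2, random_state=42, stratify=data["label"]
)
print("Training data size:", train.shape)
print("Test data size:", test.shape)
print(test["label"].value_counts().sort_index())
```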


In [7]:
train[:1].values

array([['.@wesley83 I have a 3G iPhone. After 3 hrs tweeting at #RISE_Austin, it was dead!  I need to upgrade. Plugin stations at #SXSW.',
        'Apple', -1]], dtype=object)

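As you can see, each tweet is short, only a sentence or two (unlike the long IMDB reviews in the notebook this one is based on). The Google AI BERT models were pretrained on sequences of up to 512 tokens, but a max_seq_length of 128 should be ample for tweets, so every run below uses max_seq_length = 128 and varies only the learning rate.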
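Whether 128 tokens is enough can be checked directly. A rough sketch that counts whitespace-separated words per tweet; this is only a lower bound on BERT's WordPiece token count (WordPiece also splits punctuation, hashtags, @-handles, and rare words into several pieces, and `[CLS]`/`[SEP]` add two more), so a definitive check would run the model's own tokenizer. The sample tweets are made up.

```python
import pandas as pd

# Made-up sample tweets standing in for data['text']
texts = pd.Series([
    "Loving the new iPad 2 demo at #SXSW",
    "@mention Google Maps keeps crashing on my Android phone, so annoying",
    "RT @mention Apple is opening a pop-up store in Austin",
])

MAX_SEQ_LENGTH = 128
word_counts = texts.str.split().str.len()
print(word_counts.describe())

# Words are a lower bound on WordPiece tokens, so this only flags definite overflows
n_too_long = int((word_counts + 2 > MAX_SEQ_LENGTH).sum())  # +2 for [CLS] and [SEP]
print("Tweets that certainly exceed max_seq_length:", n_too_long)
```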

### max_seq_length = 128

In [8]:
## Set up data for the classifier

train = train.sample(500)
test = test.sample(300)

print("Train data size: %d "%(len(train)))
print("Test data size: %d "%(len(test)))

X_train = train['text']
y_train = train['label']

X_test = test['text']
y_test = test['label']

Train data size: 500 
Test data size: 300 
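Because `sample` is called without a `random_state`, each run draws a different 500/300 subset. It is also worth checking the label mix of that subset: a classifier that always predicts the most common class scores exactly that class's share. A minimal sketch with made-up labels; `random_state=0` is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Made-up labels standing in for train['label'] (roughly 60% neutral)
rng = np.random.default_rng(0)
train = pd.DataFrame({"label": rng.choice([-1, 0, 1], size=2000, p=[0.1, 0.6, 0.3])})

# Fixing random_state makes the 500-row subsample reproducible across runs
sub_a = train.sample(500, random_state=0)
sub_b = train.sample(500, random_state=0)
assert sub_a.index.equals(sub_b.index)

# Share of each class; the largest value is the accuracy of always guessing that class
shares = sub_a["label"].value_counts(normalize=True)
print(shares)
print("Majority-class accuracy:", shares.max())
```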


In [11]:
## Create the model

model = BertClassifier(bert_model='bert-base-uncased', label_list=[-1,0,1])
model.max_seq_length = 128
model.learning_rate = 2e-05
model.train_batch_size = 16
model.epochs = 4

print(model)


Building sklearn text classifier...
BertClassifier(bert_config_json=None, bert_model='bert-base-uncased',
               bert_vocab=None, do_lower_case=None, epochs=4, eval_batch_size=8,
               fp16=False, from_tf=False, gradient_accumulation_steps=1,
               ignore_label=None, label_list=[-1, 0, 1], learning_rate=2e-05,
               local_rank=-1, logfile='bert_sklearn.log', loss_scale=0,
               max_seq_length=128, num_mlp_hiddens=500, num_mlp_layers=0,
               random_state=42, restore_file=None, train_batch_size=16,
               use_cuda=True, validation_fraction=0.1, warmup_proportion=0.1)
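Several numbers in the training logs follow from this configuration: `validation_fraction=0.1` holds out 50 of the 500 training rows (the log reports 450 train / 50 validation), 450 rows at `train_batch_size=16` give ceil(450/16) = 29 steps per epoch and 116 steps over 4 epochs, and the 300 test rows at `eval_batch_size=8` make ceil(300/8) = 38 evaluation batches. `warmup_proportion=0.1` ramps the learning rate up over roughly the first 12 of those steps. The schedule below is one common linear warmup / linear decay form and is an assumption; the exact formula bert_sklearn uses internally may differ.

```python
import math


def steps_per_epoch(n_examples, batch_size):
    # The last partial batch still counts as a step
    return math.ceil(n_examples / batch_size)


def warmup_linear_lr(step, total_steps, peak_lr, warmup_proportion):
    """Linear warmup to peak_lr, then linear decay to 0 (an assumed schedule)."""
    warmup_steps = warmup_proportion * total_steps
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))


n_train = 500 - int(0.1 * 500)            # validation_fraction=0.1 leaves 450
per_epoch = steps_per_epoch(n_train, 16)  # train_batch_size=16
total = per_epoch * 4                     # epochs=4
print(n_train, per_epoch, total)

# A 10x larger learning_rate scales the rate at every step by 10x
for lr in (2e-05, 2e-04):
    schedule = [warmup_linear_lr(s, total, lr, 0.1) for s in range(total + 1)]
    print(f"peak {lr:g}: step 6 -> {schedule[6]:.2e}, step 58 -> {schedule[58]:.2e}")
```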


In [12]:
%%time
## Train the model using our data (this could take a while)

model.fit(X_train, y_train)

accy = model.score(X_test, y_test)

Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
Loading Pytorch checkpoint
train data size: 450, validation data size: 50
Epoch 1, Train loss: 0.9549, Val loss: 0.8764, Val accy: 58.00%
Epoch 2, Train loss: 0.8602, Val loss: 0.8756, Val accy: 58.00%
Epoch 3, Train loss: 0.8539, Val loss: 0.8561, Val accy: 58.00%
Epoch 4, Train loss: 0.7902, Val loss: 0.8211, Val accy: 58.00%
Loss: 0.8150, Accuracy: 60.67%
CPU times: user 1h 3min 50s, sys: 5min 39s, total: 1h 9min 30s
Wall time: 26min 26s
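The validation accuracy stays at 58.00% for all four epochs even as the training loss falls, and 60.67% of 300 test tweets is 182 correct, a pattern consistent with a model that predicts the same class for every input. scikit-learn's `DummyClassifier` gives that majority-class baseline directly. A sketch with made-up labels; in the notebook, `y_train` and `y_test` would replace the toy arrays, and since the dummy model ignores its features a placeholder `X` is enough.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Made-up labels: 60% neutral (0), the rest split between 1 and -1
y_train = np.array([0] * 30 + [1] * 12 + [-1] * 8)
y_test = np.array([0] * 18 + [1] * 8 + [-1] * 4)

# The dummy model ignores the features, so a placeholder column is enough here
X_train = np.zeros((len(y_train), 1))
X_test = np.zeros((len(y_test), 1))

baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print("Predicted classes:", np.unique(baseline.predict(X_test)))
print("Majority-class accuracy:", baseline.score(X_test, y_test))
```

A fine-tuned model that does not beat this number has not yet learned anything beyond the class prior.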





In [17]:
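y_pred = model.predict(X_test)
# classification_report expects (y_true, y_pred); the report below was recorded
# with the two arguments swapped, so its precision/recall columns are transposed
report = classification_report(y_test, y_pred, labels=[-1,0,1])
print(report)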

              precision    recall  f1-score   support

          -1       0.00      0.00      0.00         0
           0       1.00      0.61      0.76       300
           1       0.00      0.00      0.00         0

    accuracy                           0.61       300
   macro avg       0.33      0.20      0.25       300
weighted avg       1.00      0.61      0.76       300
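In the report above, the predictions went in as the first argument, where `classification_report` expects the true labels. Read that way, `support` counts predictions: all 300 test tweets were predicted as class 0, and the class-0 row really means precision 0.61 and recall 1.00. The 60.67% accuracy is therefore just the share of neutral tweets in the test set. A small sketch with made-up labels shows the correct argument order plus a confusion matrix, which makes a collapsed model easy to spot; `zero_division=0` (scikit-learn 0.22 and later) suppresses the undefined-metric warning for classes that are never predicted.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Made-up test labels and a collapsed model that predicts 0 ("no emotion") every time
y_test = np.array([0, 0, 0, 0, 0, 0, -1, -1, 1, 1])
y_pred = np.zeros_like(y_test)

labels = [-1, 0, 1]
# Ground truth first, predictions second
print(classification_report(y_test, y_pred, labels=labels, zero_division=0))

report = classification_report(
    y_test, y_pred, labels=labels, zero_division=0, output_dict=True
)

# Rows are true labels, columns are predicted labels;
# a single non-empty column means the model predicts one class only
cm = confusion_matrix(y_test, y_pred, labels=labels)
print(cm)
```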


## Increase the Learning Rate 10x (to converge faster)

In [13]:
## Create the model

model2 = BertClassifier(bert_model='bert-base-uncased', label_list=[-1,0,1])
model2.max_seq_length = 128
model2.learning_rate = 2e-04
model2.train_batch_size = 16
model2.epochs = 4

print(model2)

Building sklearn text classifier...
BertClassifier(bert_config_json=None, bert_model='bert-base-uncased',
               bert_vocab=None, do_lower_case=None, epochs=4, eval_batch_size=8,
               fp16=False, from_tf=False, gradient_accumulation_steps=1,
               ignore_label=None, label_list=[-1, 0, 1], learning_rate=0.0002,
               local_rank=-1, logfile='bert_sklearn.log', loss_scale=0,
               max_seq_length=128, num_mlp_hiddens=500, num_mlp_layers=0,
               random_state=42, restore_file=None, train_batch_size=16,
               use_cuda=True, validation_fraction=0.1, warmup_proportion=0.1)


In [14]:
%%time
## Train the model using our data (this could take a while)

model2.fit(X_train, y_train)

accy = model2.score(X_test, y_test)

Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
Loading Pytorch checkpoint
train data size: 450, validation data size: 50
Epoch 1, Train loss: 0.9551, Val loss: 0.8929, Val accy: 58.00%
Epoch 2, Train loss: 0.9297, Val loss: 0.8889, Val accy: 58.00%
Epoch 3, Train loss: 0.8879, Val loss: 0.8923, Val accy: 58.00%
Epoch 4, Train loss: 0.8642, Val loss: 0.8801, Val accy: 58.00%
Loss: 0.8541, Accuracy: 60.67%
CPU times: user 1h 4min 38s, sys: 5min 45s, total: 1h 10min 24s
Wall time: 26min 26s





In [18]:
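y_pred = model2.predict(X_test)
# classification_report expects (y_true, y_pred); the report below was recorded
# with the two arguments swapped, so its precision/recall columns are transposed
report2 = classification_report(y_test, y_pred, labels=[-1,0,1])
print(report2)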

              precision    recall  f1-score   support

          -1       0.00      0.00      0.00         0
           0       1.00      0.61      0.76       300
           1       0.00      0.00      0.00         0

    accuracy                           0.61       300
   macro avg       0.33      0.20      0.25       300
weighted avg       1.00      0.61      0.76       300






## Decrease the Learning Rate /10 (to get better accuracy)

In [15]:
model3 = BertClassifier(bert_model='bert-base-uncased', label_list=[-1,0,1])
model3.max_seq_length = 128
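# 2e-05 / 10. The printed config and results below were recorded with
# learning_rate = 2e-04 (copied from model2), so they repeat model2's run.
model3.learning_rate = 2e-06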
model3.train_batch_size = 16
model3.epochs = 4

print(model3)

Building sklearn text classifier...
BertClassifier(bert_config_json=None, bert_model='bert-base-uncased',
               bert_vocab=None, do_lower_case=None, epochs=4, eval_batch_size=8,
               fp16=False, from_tf=False, gradient_accumulation_steps=1,
               ignore_label=None, label_list=[-1, 0, 1], learning_rate=0.0002,
               local_rank=-1, logfile='bert_sklearn.log', loss_scale=0,
               max_seq_length=128, num_mlp_hiddens=500, num_mlp_layers=0,
               random_state=42, restore_file=None, train_batch_size=16,
               use_cuda=True, validation_fraction=0.1, warmup_proportion=0.1)
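Building each model by copying a cell makes it easy to carry a setting over by accident: the configuration printed for `model3` shows `learning_rate=0.0002`, the same as `model2`, and its per-epoch losses match `model2`'s exactly. One way to guard against this is to derive every variant from a single base estimator with scikit-learn's `clone` and `set_params`. The printed repr suggests `BertClassifier` follows the scikit-learn estimator API, but that is an assumption here; the sketch uses a hypothetical `StubClassifier` with the same parameter names so it runs without BERT.

```python
from sklearn.base import BaseEstimator, clone


class StubClassifier(BaseEstimator):
    """Hypothetical stand-in for BertClassifier; it only stores its settings."""

    def __init__(self, max_seq_length=128, learning_rate=2e-05,
                 train_batch_size=16, epochs=4):
        self.max_seq_length = max_seq_length
        self.learning_rate = learning_rate
        self.train_batch_size = train_batch_size
        self.epochs = epochs


base = StubClassifier(max_seq_length=128, train_batch_size=16, epochs=4)

# Each variant starts from the same base settings and changes exactly one parameter
learning_rates = {"baseline": 2e-05, "10x": 2e-04, "/10": 2e-06}
models = {
    name: clone(base).set_params(learning_rate=lr)
    for name, lr in learning_rates.items()
}

for name, m in models.items():
    print(name, m.get_params()["learning_rate"])
```

With a real estimator, the loop body would go on to call `m.fit(X_train, y_train)` and `m.score(X_test, y_test)` for each variant.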


In [16]:
%%time
## Train the model using our data (this could take a while)

model3.fit(X_train, y_train)

accy = model3.score(X_test, y_test)

Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
Loading Pytorch checkpoint
train data size: 450, validation data size: 50
Epoch 1, Train loss: 0.9551, Val loss: 0.8929, Val accy: 58.00%
Epoch 2, Train loss: 0.9297, Val loss: 0.8889, Val accy: 58.00%
Epoch 3, Train loss: 0.8879, Val loss: 0.8923, Val accy: 58.00%
Epoch 4, Train loss: 0.8642, Val loss: 0.8801, Val accy: 58.00%
Loss: 0.8541, Accuracy: 60.67%
CPU times: user 1h 4min 48s, sys: 5min 47s, total: 1h 10min 35s
Wall time: 26min 28s





In [19]:
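y_pred = model3.predict(X_test)
# classification_report expects (y_true, y_pred); the report below was recorded
# with the two arguments swapped, so its precision/recall columns are transposed
report3 = classification_report(y_test, y_pred, labels=[-1,0,1])
print(report3)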

              precision    recall  f1-score   support

          -1       0.00      0.00      0.00         0
           0       1.00      0.61      0.76       300
           1       0.00      0.00      0.00         0

    accuracy                           0.61       300
   macro avg       0.33      0.20      0.25       300
weighted avg       1.00      0.61      0.76       300




