# ML4NLP1
## Starting Point for Exercise 1, part II

This notebook is meant to serve as a starting point and inspiration for exercise 1, part II.

One of the goals of this exercise is to make you acquainted with **skorch**. You will probably need to consult the [documentation](https://skorch.readthedocs.io/en/stable/).

# Installing skorch and loading libraries

In [1]:
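import subprocess
import sys

# Install skorch only when running on Google Colab
try:
    import google.colab  # imported only to detect the Colab environment
    # Use the running interpreter so the package lands in the active environment
    subprocess.run([sys.executable, '-m', 'pip', 'install', 'skorch'], check=True)
except ImportError:
    pass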

In [2]:
import torch
from torch import nn
import torch.nn.functional as F
from skorch import NeuralNetClassifier

In [3]:
torch.manual_seed(0)
torch.cuda.manual_seed(0)

In [4]:
import pandas as pd
import numpy as np
import csv
import re
import string
from collections import defaultdict

## Training a classifier and making predictions

In [5]:
# download dataset
!gdown 1QP6YuwdKFNUPpvhOaAcvv2Pcp4JMbIRs # x_train
!gdown 1QVo7PZAdiZKzifK8kwhEr_umosiDCUx6 # x_test
!gdown 1QbBeKcmG2ZyAEFB3AKGTgSWQ1YEMn2jl # y_train
!gdown 1QaZj6bI7_78ymnN8IpSk4gVvg-C9fA6X # y_test

Downloading...
From: https://drive.google.com/uc?id=1QP6YuwdKFNUPpvhOaAcvv2Pcp4JMbIRs
To: /content/x_train.txt
100% 64.1M/64.1M [00:00<00:00, 167MB/s]
Downloading...
From: https://drive.google.com/uc?id=1QVo7PZAdiZKzifK8kwhEr_umosiDCUx6
To: /content/x_test.txt
100% 65.2M/65.2M [00:00<00:00, 111MB/s]
Downloading...
From: https://drive.google.com/uc?id=1QbBeKcmG2ZyAEFB3AKGTgSWQ1YEMn2jl
To: /content/y_train.txt
100% 480k/480k [00:00<00:00, 78.1MB/s]
Downloading...
From: https://drive.google.com/uc?id=1QaZj6bI7_78ymnN8IpSk4gVvg-C9fA6X
To: /content/y_test.txt
100% 480k/480k [00:00<00:00, 37.1MB/s]


In [6]:
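# Read one example (or label) per line.
# The encoding is stated explicitly because the texts cover many scripts (UTF-8 is assumed here).
with open('x_train.txt', encoding='utf-8') as f:
    x_train = f.read().splitlines()
with open('y_train.txt', encoding='utf-8') as f:
    y_train = f.read().splitlines()
with open('x_test.txt', encoding='utf-8') as f:
    x_test = f.read().splitlines()
with open('y_test.txt', encoding='utf-8') as f:
    y_test = f.read().splitlines()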

In [7]:
import pandas as pd
# combine x_train and y_train into one dataframe
train_df = pd.DataFrame({'text': x_train, 'label': y_train})

# combine x_test and y_test into one dataframe
test_df = pd.DataFrame({'text': x_test, 'label': y_test})

In [8]:
# T: Please use again the train/test data that includes English, German, Dutch, Danish, Swedish and Norwegian, plus 20 additional languages of your choice (the labels can be found in the file labels.csv)
# and adjust the train/test split if needed

# Use again the labels selected in part1
given_languages = ['eng', 'deu', 'nld', 'dan', 'swe', 'nno']

additional_languages = ['ara', 'hak', 'lzh', 'tha', 'fra',
                      'mal', 'zh-yue', 'zho', 'bar', 'tam',
                      'tur', 'ukr', 'vie', 'cdo', 'ces',
                      'div', 'ell', 'jpn', 'kor', 'lad']

selected_languages = given_languages + additional_languages

train_df_filtered = train_df[train_df['label'].isin(selected_languages)]
test_df_filtered = test_df[test_df['label'].isin(selected_languages)]

combined_df_filtered = pd.concat([train_df_filtered, test_df_filtered], axis=0)

# check the size of training set and test set
# print(len(train_df_filtered))
# print(len(test_df_filtered))
# print(len(combined_df_filtered))

In [9]:
from sklearn.model_selection import train_test_split

# Adjust the data split into 80% training and 20% testing
X_train, X_test, Y_train, Y_test = train_test_split(
    combined_df_filtered['text'],
    combined_df_filtered['label'],
    test_size=0.2
)

# check the sizes of the adjusted training and test sets
# print(len(X_train))
# print(len(Y_train))
# print(len(X_test))
# print(len(Y_test))
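
The split above is drawn from a fresh random permutation on every run, because `train_test_split` is called without `random_state` (the `torch.manual_seed` calls earlier only seed PyTorch). The plain-Python sketch below illustrates what a seeded 80/20 shuffle split does. The helper `shuffle_split` and the toy data are hypothetical and not part of scikit-learn. In the notebook itself, passing an integer `random_state` to `train_test_split` should give the same kind of reproducibility.

```python
import random

def shuffle_split(items, labels, test_size=0.2, seed=0):
    """Hypothetical helper: seeded shuffle split, mimicking the idea of train_test_split."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)          # same seed -> same permutation
    n_test = int(round(len(items) * test_size))   # number of held-out examples
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(items, train_idx), pick(items, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))

# Toy data, made up for illustration
texts = [f"sentence {i}" for i in range(10)]
langs = ["eng", "deu"] * 5

split_a = shuffle_split(texts, langs, seed=0)
split_b = shuffle_split(texts, langs, seed=0)
print(len(split_a[0]), len(split_a[1]))  # 8 2
print(split_a == split_b)                # True: the split is reproducible
```

Without a fixed seed, accuracy differences of one or two points between runs may partly reflect a different split rather than a better model.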

In [10]:
# T: use your adjusted code to encode the labels here
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
Y_train = le.fit_transform(Y_train)
Y_test = le.transform(Y_test)

# check that all the labels are correctly encoded
# print(np.unique(Y_train))
# print(np.unique(Y_test))
# print(Y_train.dtype)
# print(Y_test.dtype)
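
Conceptually, `LabelEncoder` assigns each distinct label an integer according to the sorted order of the labels. The sketch below reproduces that idea in plain Python on a few made-up labels. The helper `fit_label_index` is hypothetical and only illustrates the mapping, it is not the scikit-learn implementation.

```python
def fit_label_index(labels):
    """Hypothetical helper: sorted unique labels -> integer ids (the idea behind LabelEncoder)."""
    classes = sorted(set(labels))
    return classes, {label: i for i, label in enumerate(classes)}

train_labels = ["eng", "deu", "swe", "deu", "eng"]   # toy labels
classes, to_id = fit_label_index(train_labels)

encoded = [to_id[label] for label in train_labels]   # labels -> integers
decoded = [classes[i] for i in encoded]              # integers -> labels

print(classes)   # ['deu', 'eng', 'swe']
print(encoded)   # [1, 0, 2, 0, 1]
print(decoded == train_labels)  # True
```

In the notebook, the fitted encoder `le` plays the role of `classes` and `to_id`, so integer predictions can be mapped back to language codes with `le.inverse_transform`.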

In [11]:
# T: In the following, you can find a small (almost) working example of a neural network. Unfortunately, again, the cat messed up some of the code. Please fix the code such that it is executable.

In [12]:
# First, we extract some simple features as input for the neural network
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(analyzer='char', ngram_range=(2, 2), max_features=100, binary=True)

# Vectorize X_train into a sparse matrix
X = vectorizer.fit_transform(X_train.to_numpy())
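
To make the feature representation concrete, the sketch below computes binary character-bigram features for two toy strings in plain Python. It is a simplified illustration of what `CountVectorizer(analyzer='char', ngram_range=(2, 2), binary=True)` produces. The helpers `char_bigrams` and `binary_bigram_matrix` are hypothetical, and unlike the vectorizer above the sketch keeps every bigram instead of only the 100 most frequent ones.

```python
def char_bigrams(text):
    """Set of overlapping character bigrams in `text` (lowercased first)."""
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}

def binary_bigram_matrix(texts):
    """Hypothetical helper: presence/absence matrix over a sorted bigram vocabulary."""
    vocab = sorted(set().union(*(char_bigrams(t) for t in texts)))
    rows = [[1 if bigram in char_bigrams(t) else 0 for bigram in vocab]
            for t in texts]
    return vocab, rows

vocab, rows = binary_bigram_matrix(["abab", "abc"])
print(vocab)  # ['ab', 'ba', 'bc']
print(rows)   # [[1, 1, 0], [1, 0, 1]]
```

Each row has one column per bigram in the vocabulary, which is why the first linear layer of the network takes 100 inputs when `max_features=100`.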

In [13]:
X = X.astype(np.float32)
y = Y_train.astype(np.int64)  # ready for NN training

In the following, we define a vanilla neural network with two hidden layers. The output layer should have as many outputs as there are classes. In addition, it should have a nonlinearity function.

In [14]:
class ClassifierModule(nn.Module):
    def __init__(
            self,
            num_units=200,
            nonlin=F.relu,
    ):
        super(ClassifierModule, self).__init__()
        self.num_units = num_units
        self.nonlin = nonlin

        self.dense0 = nn.Linear(100, num_units)  # 100 inputs = max_features of the vectorizer
        self.dense1 = nn.Linear(num_units, 50)
        self.output = nn.Linear(50, 26)  # updated to the number of classes: 26

    def forward(self, X, **kwargs):
        X = self.nonlin(self.dense0(X))
        X = F.relu(self.dense1(X))
        # Raw logits of shape (batch, 26); nn.CrossEntropyLoss expects unnormalized scores
        return self.output(X)
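
The module above returns one raw score (logit) per class. `nn.CrossEntropyLoss` expects such unnormalized scores and applies a log-softmax internally, so the module itself does not need a softmax layer. The plain-Python sketch below shows how logits turn into class probabilities. The three logit values are made up for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax: turns raw scores into probabilities that sum to 1."""
    m = max(logits)                            # subtract the max to avoid overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 0.5, -1.0]                      # hypothetical scores for three classes
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)  # index of the largest probability

print([round(p, 3) for p in probs])
print(predicted)  # 0
```

Because softmax preserves the ordering of the scores, the predicted class is the same whether it is taken from the logits or from the probabilities.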

In [15]:
from skorch.callbacks import EarlyStopping

net = NeuralNetClassifier(
    ClassifierModule,
    max_epochs=20,
    criterion=nn.CrossEntropyLoss(),
    lr=0.1,
    # device='cuda',  # uncomment this to train on a GPU
)

In [16]:
net.fit(X, y)

  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        2.9430       0.2820        2.5842  3.2308
      2        2.1704       0.4635        1.7737  3.1706
      3        1.5764       0.5488        1.3806  1.9274
      4        1.2986       0.5882        1.2092  1.8819
      5        1.1811       0.6099        1.1404  1.9342
      6        1.1235       0.6190        1.1033  1.9701
      7        1.0861       0.6269        1.0783  2.1285
      8        1.0583       0.6334        1.0600  2.6380
      9        1.0363       0.6356        1.0462  2.6414
     10        1.0184       0.6382        1.0356  1.9051
     11        1.0037       0.63

<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=ClassifierModule(
    (dense0): Linear(in_features=100, out_features=200, bias=True)
    (dense1): Linear(in_features=200, out_features=50, bias=True)
    (output): Linear(in_features=50, out_features=26, bias=True)
  ),
)

In [17]:
train_score = net.score(X, y)
print(f"Training accuracy: {train_score:.2f}")

Training accuracy: 0.66


In [18]:
# Prepare test set for testing the trained NN
X_T = vectorizer.transform(X_test.to_numpy())
X_T = X_T.astype(np.float32)
y_t = Y_test.astype(np.int64)

In [19]:
test_score = net.score(X_T, y_t)
print(f"Test accuracy: {test_score:.2f}")

Test accuracy: 0.64


## Question 2: Improving accuracy




Based on the model trained above, we now try to improve the training and test accuracy by tuning hyperparameters.  
*   In the first part, we explore the influence of individual hyperparameters by manually selecting values.  
*   In the second part, we use GridSearchCV to find the best hyperparameter combination automatically.



In [45]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import FunctionTransformer
from skorch import NeuralNetClassifier
from skorch.callbacks import EarlyStopping
from sklearn.pipeline import Pipeline
import torch.optim as optim
import torch.nn.functional as F

### First try
We increase the number of units in the first hidden layer from 200 to 300 and use tanh instead of ReLU as its activation function.  
We also switch to the Adam optimizer and use a smaller learning rate of 0.01.

In [21]:
class ClassifierModule_1(nn.Module):
    def __init__(
            self,
            num_units=300, # 200 to 300
            nonlin=F.tanh, # F.relu to F.tanh
    ):
        super(ClassifierModule_1, self).__init__()
        self.num_units = num_units
        self.nonlin = nonlin

        self.dense0 = nn.Linear(100, num_units)
        self.dense1 = nn.Linear(num_units, 50)
        self.output = nn.Linear(50, 26)

    def forward(self, X, **kwargs):
        X = self.nonlin(self.dense0(X))
        X = F.relu(self.dense1(X))
        # Raw logits of shape (batch, 26)
        return self.output(X)

In [22]:
net_1 = NeuralNetClassifier(
    ClassifierModule_1,
    max_epochs=20,
    criterion=nn.CrossEntropyLoss(),
    optimizer=optim.Adam,  # SGD (skorch default) to Adam
    lr=0.01,  # 0.1 to 0.01
    # device='cuda',  # uncomment this to train on a GPU
)

In [23]:
net_1.fit(X, y)

  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        1.5101       0.5702        1.2035  2.0220
      2        1.0868       0.6139        1.0901  2.0142
      3        1.0358       0.6197        1.0816  2.5133
      4        0.9940       0.6185        1.0871  2.4442
      5        0.9682       0.6288        1.0894  1.9952
      6        0.9478       0.6296        1.0778  1.9695
      7        0.9204       0.6291        1.1016  2.9079
      8        0.9003       0.6255        1.1018  2.3498
      9        0.8988       0.6236        1.1388  2.8139
     10        0.8622       0.6317        1.1325  1.9857
     11        0.8534       0.6200        1.1964  2.0056
     12        0.8445       0.6325        1.1820  1.

<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=ClassifierModule_1(
    (dense0): Linear(in_features=100, out_features=300, bias=True)
    (dense1): Linear(in_features=300, out_features=50, bias=True)
    (output): Linear(in_features=50, out_features=26, bias=True)
  ),
)

In [24]:
train_score = net_1.score(X, y)
print(f"Training accuracy: {train_score:.2f}")

Training accuracy: 0.68


In [25]:
test_score = net_1.score(X_T, y_t)
print(f"Test accuracy: {test_score:.2f}")

Test accuracy: 0.60


### Second try
Compared to the first try, we use an even smaller learning rate (0.005) and add early stopping with a patience of 5 epochs.

In [26]:
class ClassifierModule_2(nn.Module):
    def __init__(
            self,
            num_units=300,
            nonlin=F.tanh,
    ):
        super(ClassifierModule_2, self).__init__()
        self.num_units = num_units
        self.nonlin = nonlin

        self.dense0 = nn.Linear(100, num_units)
        self.dense1 = nn.Linear(num_units, 50)
        self.output = nn.Linear(50, 26)

    def forward(self, X, **kwargs):
        X = self.nonlin(self.dense0(X))
        X = F.relu(self.dense1(X))
        # Raw logits of shape (batch, 26)
        return self.output(X)

In [27]:
net_2 = NeuralNetClassifier(
    ClassifierModule_2,
    max_epochs=20,
    criterion=nn.CrossEntropyLoss(),
    optimizer=optim.Adam,
    lr=0.005, # 0.01 to 0.005
    callbacks=[EarlyStopping(patience=5)], # add early stopping condition
    # device='cuda',  # uncomment this to train on a GPU
)

In [28]:
net_2.fit(X, y)

  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        1.5355       0.5889        1.1388  2.0542
      2        1.0501       0.6317        1.0298  2.1422
      3        0.9873       0.6334        1.0312  2.1270
      4        0.9618       0.6320        1.0345  2.2988
      5        0.9359       0.6238        1.0644  2.6388
      6        0.9180       0.6358        1.0584  1.9403
Stopping since valid_loss has not improved in the last 5 epochs.


<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=ClassifierModule_2(
    (dense0): Linear(in_features=100, out_features=300, bias=True)
    (dense1): Linear(in_features=300, out_features=50, bias=True)
    (output): Linear(in_features=50, out_features=26, bias=True)
  ),
)
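
The patience-based stopping rule used in this try can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not skorch's implementation: any strict decrease of the validation loss counts as an improvement here, whereas skorch's `EarlyStopping` additionally applies a threshold. The helper `epochs_until_stop` and the loss values are hypothetical.

```python
def epochs_until_stop(valid_losses, patience=5):
    """Hypothetical helper: 1-based epoch at which training would stop,
    or None if patience is never exhausted."""
    best = float("inf")
    misses = 0
    for epoch, loss in enumerate(valid_losses, start=1):
        if loss < best:
            best, misses = loss, 0     # improvement: remember it and reset the counter
        else:
            misses += 1                # no improvement in this epoch
        if misses == patience:
            return epoch
    return None

# Made-up validation losses: best value at epoch 2, then no further improvement
losses = [1.20, 1.00, 1.01, 1.02, 1.05, 1.04, 1.06, 1.03]

print(epochs_until_stop(losses, patience=5))   # 7
print(epochs_until_stop(losses, patience=10))  # None
```

A small patience saves training time once the validation loss has flattened, while a larger one tolerates noisy or slowly improving curves.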

In [29]:
train_score = net_2.score(X, y)
print(f"Training accuracy: {train_score:.2f}")

Training accuracy: 0.67


In [30]:
test_score = net_2.score(X_T, y_t)
print(f"Test accuracy: {test_score:.2f}")

Test accuracy: 0.64


### Third try
In this round, we replace the Adam optimizer with SGD. With the previous budget of 20 epochs, the accuracy was much lower, but it was still clearly rising when training ended. We therefore increased the maximum number of epochs to 200 and raised the early stopping patience to 20.  
The intermediate runs of this exploration are not shown here.

In [31]:
class ClassifierModule_3(nn.Module):
    def __init__(
            self,
            num_units=300,
            nonlin=F.tanh,
    ):
        super(ClassifierModule_3, self).__init__()
        self.num_units = num_units
        self.nonlin = nonlin

        self.dense0 = nn.Linear(100, num_units)
        self.dense1 = nn.Linear(num_units, 50)
        self.output = nn.Linear(50, 26)

    def forward(self, X, **kwargs):
        X = self.nonlin(self.dense0(X))
        X = F.relu(self.dense1(X))
        # Raw logits of shape (batch, 26)
        return self.output(X)

In [32]:
net_3 = NeuralNetClassifier(
    ClassifierModule_3,
    max_epochs=200, # increase from 20 to 200
    criterion=nn.CrossEntropyLoss(),
    optimizer=optim.SGD, # Adam to SGD
    lr=0.005,
    callbacks=[EarlyStopping(patience=20)],  # patience raised from 5 to 20
    # device='cuda',  # uncomment this to train on a GPU
)

In [33]:
net_3.fit(X, y)

  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        3.2358       0.0423        3.2187  1.9601
      2        3.1987       0.0560        3.1782  2.8086
      3        3.1528       0.0716        3.1275  1.9926
      4        3.0973       0.0901        3.0691  1.8843
      5        3.0379       0.1474        3.0112  1.9032
      6        2.9826       0.1962        2.9608  1.8834
      7        2.9360       0.2385        2.9185  1.8979
      8        2.8952       0.2683        2.8794  2.7937
      9        2.8558       0.2892        2.8401  1.9987
     10        2.8153       0.3394        2.7990  1.9023
     11        2.7724       0.36

<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=ClassifierModule_3(
    (dense0): Linear(in_features=100, out_features=300, bias=True)
    (dense1): Linear(in_features=300, out_features=50, bias=True)
    (output): Linear(in_features=50, out_features=26, bias=True)
  ),
)

In [34]:
train_score = net_3.score(X, y)
print(f"Training accuracy: {train_score:.2f}")

Training accuracy: 0.66


In [35]:
test_score = net_3.score(X_T, y_t)
print(f"Test accuracy: {test_score:.2f}")

Test accuracy: 0.65


Unfortunately, accuracy does not improve noticeably compared to using the Adam optimizer: the test accuracy is 0.65, against 0.64 in the second try.

### Fourth try

In this round, we try a different nonlinear activation function: sigmoid.  
The tuning of the other hyperparameters in combination with sigmoid is not shown in full. Where several combinations gave very similar accuracy, we arbitrarily picked one of them to present here.

In [36]:
class ClassifierModule_4(nn.Module):
    def __init__(
            self,
            num_units=300,
            nonlin=F.sigmoid,  # F.tanh to F.sigmoid
    ):
        super(ClassifierModule_4, self).__init__()
        self.num_units = num_units
        self.nonlin = nonlin

        self.dense0 = nn.Linear(100, num_units)
        self.dense1 = nn.Linear(num_units, 50)
        self.output = nn.Linear(50, 26)

    def forward(self, X, **kwargs):
        X = self.nonlin(self.dense0(X))
        X = F.relu(self.dense1(X))
        # Raw logits of shape (batch, 26)
        return self.output(X)

In [37]:
net_4 = NeuralNetClassifier(
    ClassifierModule_4,
    max_epochs=20,
    criterion=nn.CrossEntropyLoss(),
    optimizer=optim.Adam,
    lr=0.01,
    # callbacks=[EarlyStopping(patience=5)],  # early stopping disabled for this run
    # device='cuda',  # uncomment this to train on a GPU
)

In [38]:
net_4.fit(X, y)

  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        2.3922       0.4267        1.6573  2.9829
      2        1.4156       0.5329        1.2848  2.6766
      3        1.2049       0.5832        1.1973  2.2800
      4        1.1340       0.5938        1.1609  2.1023
      5        1.0850       0.6029        1.1405  1.9761
      6        1.0481       0.6091        1.1203  2.0462
      7        1.0170       0.6188        1.1041  2.1022
      8        0.9882       0.6216        1.0955  3.9923
      9        0.9642       0.6252        1.0903  2.1661
     10        0.9420       0.6262        1.0905  2.0745
     11        0.9213       0.6267

<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=ClassifierModule_4(
    (dense0): Linear(in_features=100, out_features=300, bias=True)
    (dense1): Linear(in_features=300, out_features=50, bias=True)
    (output): Linear(in_features=50, out_features=26, bias=True)
  ),
)

In [39]:
train_score = net_4.score(X, y)
print(f"Training accuracy: {train_score:.2f}")

Training accuracy: 0.69


In [40]:
test_score = net_4.score(X_T, y_t)
print(f"Test accuracy: {test_score:.2f}")

Test accuracy: 0.62


With sigmoid, the test accuracy (0.62) is lower than the baseline (0.64) and lower than in the second and third tries (0.64 and 0.65). We therefore do not include sigmoid as an activation function option in the grid search experiment below.
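
To compare the manual runs at a glance, the short sketch below collects the training and test accuracies reported in the cell outputs above and prints the gap between them. The numbers are copied from this notebook, and the run names are just labels chosen for this summary.

```python
# (train accuracy, test accuracy) as printed in the cells above
results = {
    "baseline": (0.66, 0.64),
    "first try": (0.68, 0.60),
    "second try": (0.67, 0.64),
    "third try": (0.66, 0.65),
    "fourth try": (0.69, 0.62),
}

for name, (train_acc, test_acc) in results.items():
    gap = round(train_acc - test_acc, 2)   # a large gap hints at overfitting
    print(f"{name:<11} train={train_acc:.2f} test={test_acc:.2f} gap={gap:.2f}")

best = max(results, key=lambda name: results[name][1])
print("highest test accuracy:", best)  # third try
```

The runs with the highest training accuracy (first and fourth try) also show the largest train/test gap, which suggests they fit the training data more closely without generalizing better. Since the accuracies are rounded to two decimals, differences of 0.01 should be read with caution.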

### Grid Search

In the previous experiments, tuning individual hyperparameters by hand did not noticeably improve accuracy. We therefore use grid search to find the best combination of these hyperparameters.

In [41]:
# Redefine the network for GridSearchCV
class ClassifierModule_grid(nn.Module):
    def __init__(
            self,
            num_units=200,
            nonlin=F.relu,
    ):
        super(ClassifierModule_grid, self).__init__()
        self.num_units = num_units
        self.nonlin = nonlin

        self.dense0 = nn.Linear(100, num_units)
        self.dense1 = nn.Linear(num_units, 50)
        self.output = nn.Linear(50, 26)

    def forward(self, X, **kwargs):
        X = self.nonlin(self.dense0(X))
        X = F.relu(self.dense1(X))
        # Raw logits of shape (batch, 26)
        return self.output(X)

In [42]:
from skorch.callbacks import EarlyStopping

net_grid = NeuralNetClassifier(
    ClassifierModule_grid,
    max_epochs=20,
    criterion=nn.CrossEntropyLoss(),
    lr=0.1,
    callbacks=[EarlyStopping(patience=5)],  # stop after 5 epochs without improvement in valid_loss
    # device='cuda',  # uncomment this to train on a GPU
)

In [43]:
# Define a function that casts the features to float32 before they reach the network
def type_adjust(a):
    return a.astype(np.float32)

In [46]:
# Construct pipeline
pipeline = Pipeline(
    [
        ('vect', CountVectorizer(analyzer='char', max_features=100, binary=True)),
        ('adjust_datatype', FunctionTransformer(func=type_adjust, validate=False)),
        ('clf', net_grid),
    ]
)

pipeline

For some hyperparameters already explored during the manual tuning, we fix a single value in order to reduce the time and resources GridSearchCV needs.

In [47]:
# Parameters grid
para_grid = {
    # 'vect__max_df': [0.2, 0.4, 0.6],
    # 'vect__ngram_range': [(1, 1), (1, 2), (2, 2), (1, 4)],
    'vect__ngram_range': [(1, 1), (2, 2)],  # vary n-gram range
    'clf__module__nonlin': [F.relu, F.tanh], # activation function
    'clf__module__num_units': [300],  # hidden layer size
    'clf__optimizer': [optim.SGD, optim.Adam],  # solvers
    'clf__lr': [0.1, 0.01],  # learning rate
    'clf__callbacks__EarlyStopping__patience': [5], # early stopping
}
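
The cost of a grid search grows multiplicatively with the number of values per hyperparameter. The sketch below enumerates a grid of the same shape as the one above with `itertools.product` and counts the resulting fits for 3-fold cross-validation. Strings stand in for the activation functions and optimizer classes, so the block runs without PyTorch. It only illustrates the counting, not how GridSearchCV is implemented.

```python
from itertools import product

# Same shape as the grid above; strings are placeholders for the real objects
grid = {
    "vect__ngram_range": [(1, 1), (2, 2)],
    "clf__module__nonlin": ["relu", "tanh"],
    "clf__module__num_units": [300],
    "clf__optimizer": ["SGD", "Adam"],
    "clf__lr": [0.1, 0.01],
    "clf__callbacks__EarlyStopping__patience": [5],
}

# One candidate per element of the Cartesian product of all value lists
candidates = [dict(zip(grid, values)) for values in product(*grid.values())]
cv_folds = 3

print(len(candidates))             # 16
print(len(candidates) * cv_folds)  # 48
```

Hyperparameters fixed to a single value contribute a factor of 1, so they do not add to the cost. Adding a second value for `num_units`, for example, would double the number of fits.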

In [48]:
from sklearn.model_selection import GridSearchCV

grid_search = GridSearchCV(
    pipeline,
    para_grid,
    verbose=1,
    cv=3,
    scoring='accuracy',
)

In [49]:
# Use grid search on unvectorized training set
grid_search.fit(X_train, y)

Fitting 3 folds for each of 16 candidates, totalling 48 fits
  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        3.1187       0.3086        2.8608  1.2760
      2        2.4636       0.4369        2.0840  1.2802
      3        1.8189       0.5537        1.5671  1.2781
      4        1.4111       0.6056        1.2606  1.2609
      5        1.1744       0.6255        1.0918  1.2910
      6        1.0402       0.6478        0.9932  1.5814
      7        0.9584       0.6651        0.9309  1.8286
      8        0.9038       0.6716        0.8887  2.2498
      9        0.8655       0.6813        0.8592  1.4698
     10        0.8378       0.6889        0.83

In [50]:
# Get the best hyperparameters and corresponding accuracy
best_params = grid_search.best_params_
best_accuracy = grid_search.best_score_

print("Best score: {:.3f}, \nBest params: {}".format(best_accuracy, best_params))

Best score: 0.712, 
Best params: {'clf__callbacks__EarlyStopping__patience': 5, 'clf__lr': 0.01, 'clf__module__nonlin': <function relu at 0x7d49601c0a60>, 'clf__module__num_units': 300, 'clf__optimizer': <class 'torch.optim.adam.Adam'>, 'vect__ngram_range': (1, 1)}


In [53]:
# Get the test accuracy with the best model
test_accuracy = grid_search.score(X_test, y_t)

print(f"Best test accuracy: {test_accuracy:.3f}")

Best test accuracy: 0.714
