# Subjectivity classification with CNNs

In this notebook we implement the approach described in this [paper](https://arxiv.org/pdf/1408.5882.pdf) for classifying sentences using Convolutional Neural Networks. In particular, we will classify sentences as "subjective" or "objective".

## Subjectivity Dataset

The subjectivity dataset has 5000 subjective and 5000 objective processed sentences. To get the data:
```
wget http://www.cs.cornell.edu/people/pabo/movie-review-data/rotten_imdb.tar.gz
```

In [1]:
def unpack_dataset():
    ! wget http://www.cs.cornell.edu/people/pabo/movie-review-data/rotten_imdb.tar.gz
    ! mkdir data
    ! tar -xvf rotten_imdb.tar.gz -C data

In [2]:
unpack_dataset()

--2020-02-28 08:57:47--  http://www.cs.cornell.edu/people/pabo/movie-review-data/rotten_imdb.tar.gz
Resolving www.cs.cornell.edu (www.cs.cornell.edu)... 132.236.207.20
Connecting to www.cs.cornell.edu (www.cs.cornell.edu)|132.236.207.20|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 519599 (507K) [application/x-gzip]
Saving to: ‘rotten_imdb.tar.gz’


2020-02-28 08:57:48 (818 KB/s) - ‘rotten_imdb.tar.gz’ saved [519599/519599]

mkdir: data: File exists
x quote.tok.gt9.5000
x plot.tok.gt9.5000
x subjdata.README.1.0


In [3]:
from pathlib import Path
PATH = Path("data")
list(PATH.iterdir())

[PosixPath('data/ml-latest-small.zip'),
 PosixPath('data/ml-latest-small'),
 PosixPath('data/plot.tok.gt9.5000'),
 PosixPath('data/subjdata.README.1.0'),
 PosixPath('data/tripAdvisor.zip'),
 PosixPath('data/quote.tok.gt9.5000')]

From the readme file:
- quote.tok.gt9.5000 contains 5000 subjective sentences (or snippets)
- plot.tok.gt9.5000 contains 5000 objective sentences

In [4]:
! head data/plot.tok.gt9.5000

the movie begins in the past where a young boy named sam attempts to save celebi from a hunter . 
emerging from the human psyche and showing characteristics of abstract expressionism , minimalism and russian constructivism , graffiti removal has secured its place in the history of modern art while being created by artists who are unconscious of their artistic achievements . 
spurning her mother's insistence that she get on with her life , mary is thrown out of the house , rejected by joe , and expelled from school as she grows larger with child . 
amitabh can't believe the board of directors and his mind is filled with revenge and what better revenge than robbing the bank himself , ironic as it may sound . 
she , among others excentricities , talks to a small rock , gertrude , like if she was alive . 
this gives the girls a fair chance of pulling the wool over their eyes using their sexiness to poach any last vestige of common sense the dons might have had . 
styled after vh1's "

## String cleaning functions

In [5]:
import numpy as np
from collections import defaultdict
import re

In [6]:
def read_file(path):
    """Reads a file and returns its lines as a NumPy array (no shuffling)."""
    with open(path, encoding = "ISO-8859-1") as f:
        content = np.array(f.readlines())
    return content

In [7]:
def get_vocab(list_of_content):
    """Computes a dict of word counts.
    
    For each word, counts the number of lines (sentences) that contain it.
    """
    vocab = defaultdict(float)
    for content in list_of_content:
        for line in content:
            line = line.strip()
            words = set(line.split())
            for word in words:
                vocab[word] += 1
    return vocab       
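
Because each line is turned into a `set` before counting, `get_vocab` counts how many sentences contain a word (its document frequency), not how many times the word occurs overall. A minimal sketch of the difference, using `collections.Counter` and two made-up sentences (the toy data and the names `doc_freq` and `term_freq` are illustrative and not part of the notebook):

```python
from collections import Counter

# Two toy "sentences"; "the" appears twice in the first one.
sentences = ["the cat saw the dog", "the dog barked"]

doc_freq = Counter()   # number of sentences containing each word
term_freq = Counter()  # total number of occurrences of each word
for line in sentences:
    words = line.strip().split()
    doc_freq.update(set(words))  # each word counted once per sentence
    term_freq.update(words)      # every occurrence counted

print(doc_freq["the"], term_freq["the"])  # 2 3
```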

## Split train and test

In [8]:
sub_content = read_file(PATH/"quote.tok.gt9.5000")
obj_content = read_file(PATH/"plot.tok.gt9.5000")
sub_content = np.array([line.strip() for line in sub_content])
obj_content = np.array([line.strip() for line in obj_content])
sub_y = np.zeros(len(sub_content))
obj_y = np.ones(len(obj_content))
X = np.append(sub_content, obj_content)
y = np.append(sub_y, obj_y)

In [9]:
from sklearn.model_selection import train_test_split

In [10]:
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

In [11]:
X_train[:5], y_train[:5]

(array(['will god let her fall or give her a new path ?',
        "the director's twitchy sketchbook style and adroit perspective shifts grow wearisome amid leaden pacing and indifferent craftsmanship ( most notably wretched sound design ) .",
        "welles groupie/scholar peter bogdanovich took a long time to do it , but he's finally provided his own broadside at publishing giant william randolph hearst .",
        'based on the 1997 john king novel of the same name with a rather odd synopsis : " a first novel about a seasoned chelsea football club hooligan who represents a disaffected society operating by brutal rules .',
        'yet , beneath an upbeat appearance , she is struggling desperately with the emotional and physical scars left by the attack .'],
       dtype='<U691'), array([1., 0., 0., 1., 1.]))

In [12]:
# getting vocab from training sets
data_vocab = get_vocab([X_train])

In [13]:
#data_vocab

## Embedding Layer

In [14]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [15]:
# an Embedding module containing 10 (words) tensors of size 3
embed = nn.Embedding(10, 3)
a = torch.LongTensor([[1,2,4,5,1]])
embed(a)

tensor([[[-0.8974,  0.5728, -0.4083],
         [-2.3058,  0.2786,  0.0788],
         [-0.8688,  0.3737,  0.6774],
         [-0.5391, -0.1792, -0.1027],
         [-0.8974,  0.5728, -0.4083]]], grad_fn=<EmbeddingBackward>)

In [16]:
# here are the randomly initialized embeddings
embed.weight.data

tensor([[ 0.6361,  1.2975,  0.0431],
        [-0.8974,  0.5728, -0.4083],
        [-2.3058,  0.2786,  0.0788],
        [ 0.3489, -0.5256,  0.4184],
        [-0.8688,  0.3737,  0.6774],
        [-0.5391, -0.1792, -0.1027],
        [ 1.6468,  0.9773,  0.6132],
        [-0.3383, -0.6915, -0.5389],
        [-0.3251, -0.9121,  0.6641],
        [ 0.6914,  0.2644, -0.2163]])

### Initializing embedding layer with GloVe embeddings

To get GloVe pre-trained embeddings:
    `wget http://nlp.stanford.edu/data/glove.6B.zip`

In [None]:
def unpack_glove():
    ! wget http://nlp.stanford.edu/data/glove.6B.zip
    ! mkdir -p data
    # unzip takes the destination directory with -d (not -C, as tar does)
    ! unzip glove.6B.zip -d data

In this section I am keeping the whole set of GloVe embeddings. You could instead keep only the vectors for words that appear in your training set.

In [18]:
! head -2 data/glove.6B.50d.txt

the 0.418 0.24968 -0.41242 0.1217 0.34527 -0.044457 -0.49688 -0.17862 -0.00066023 -0.6566 0.27843 -0.14767 -0.55677 0.14658 -0.0095095 0.011658 0.10204 -0.12792 -0.8443 -0.12181 -0.016801 -0.33279 -0.1552 -0.23131 -0.19181 -1.8823 -0.76746 0.099051 -0.42125 -0.19526 4.0071 -0.18594 -0.52287 -0.31681 0.00059213 0.0074449 0.17778 -0.15897 0.012041 -0.054223 -0.29871 -0.15749 -0.34758 -0.045637 -0.44251 0.18785 0.0027849 -0.18411 -0.11514 -0.78581
, 0.013441 0.23682 -0.16899 0.40951 0.63812 0.47709 -0.42852 -0.55641 -0.364 -0.23938 0.13001 -0.063734 -0.39575 -0.48162 0.23291 0.090201 -0.13324 0.078639 -0.41634 -0.15428 0.10068 0.48891 0.31226 -0.1252 -0.037512 -1.5179 0.12612 -0.02442 -0.042961 -0.28351 3.5416 -0.11956 -0.014533 -0.1499 0.21864 -0.33412 -0.13872 0.31806 0.70358 0.44858 -0.080262 0.63003 0.32111 -0.46765 0.22786 0.36034 -0.37818 -0.56657 0.044691 0.30392


We would like to initialize the embeddings of our model with the pre-trained GloVe embeddings. After initializing, we should "freeze" the embeddings, at least initially. The rationale is that we first want the network to learn weights for the other parameters, which were randomly initialized. After that phase we can fine-tune the embeddings to our task.

`embed.weight.requires_grad = False` freezes the embedding parameters.
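
As a possible alternative to copying the weights and setting `requires_grad` by hand, PyTorch also provides `nn.Embedding.from_pretrained`, which builds the layer from a weight tensor and freezes it when `freeze=True`. The sketch below uses a small random matrix as a stand-in for the GloVe matrix; the sizes `V = 5` and `D = 3` are toy values chosen for illustration.

```python
import numpy as np
import torch
import torch.nn as nn

V, D = 5, 3  # toy vocabulary and embedding sizes (illustrative values)
# stand-in for the GloVe matrix; row 0 is the all-zero padding vector
pretrained_weight = np.random.uniform(-0.25, 0.25, (V, D)).astype("float32")
pretrained_weight[0] = 0.0

weights = torch.from_numpy(pretrained_weight)
# freeze=True sets embed.weight.requires_grad to False
embed = nn.Embedding.from_pretrained(weights, freeze=True, padding_idx=0)
print(embed.weight.requires_grad)  # False

# later, to fine-tune the embeddings:
# embed.weight.requires_grad = True
```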

The following code initializes the embedding. Here `V` is the vocabulary size and `D` is the embedding size. `pretrained_weight` is a numpy matrix of shape `(V, D)`.

In [19]:
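def loadGloveModel(gloveFile=PATH/"glove.6B.300d.txt"):
    """ Loads word vectors into a dictionary."""
    word_vecs = {}
    # the with-block closes the file once all lines have been read
    with open(gloveFile, 'r', encoding="utf-8") as f:
        for line in f:
            # each line is: word followed by its vector components
            splitLine = line.split()
            word = splitLine[0]
            word_vecs[word] = np.array([float(val) for val in splitLine[1:]])
    return word_vecs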

In [20]:
word_vecs = loadGloveModel()

In [21]:
print(len(word_vecs.keys()), len(data_vocab.keys()))

400000 21416


In [22]:
def delete_rare_words(word_vecs, data_vocab, min_df=2):
    """ Deletes rare words from data_vocab
    
    Deletes a word from data_vocab if it is not in word_vecs
    and has fewer than min_df occurrences in data_vocab.
    """
    words_delete = []
    for word in data_vocab:
        if data_vocab[word] < min_df and word not in word_vecs:
            words_delete.append(word)
    for word in words_delete: data_vocab.pop(word)
    return data_vocab
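
To make the deletion rule concrete: a word is dropped only when both conditions hold, that is, it is rare and it has no pre-trained vector. The following sketch applies the same rule to a toy vocabulary (the words, counts, and the name `kept` are made up for illustration):

```python
# toy stand-ins for the GloVe dictionary and the training vocabulary
word_vecs = {"movie": [0.1], "plot": [0.2]}
data_vocab = {"movie": 5, "plot": 1, "excentricities": 1, "celebi": 3}
min_df = 2

# keep a word if it is frequent enough OR has a pre-trained vector
kept = {w: c for w, c in data_vocab.items() if c >= min_df or w in word_vecs}
print(sorted(kept))  # ['celebi', 'movie', 'plot']
```

Here "plot" survives despite its count of 1 because it has a vector, "celebi" survives because it is frequent, and only "excentricities" is removed.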

In [23]:
len(data_vocab.keys())

21416

In [24]:
# drop rare words that have no pre-trained vector
data_vocab = delete_rare_words(word_vecs, data_vocab, min_df=2) 

In [25]:
len(data_vocab.keys())

18756

In [26]:
def create_embedding_matrix(word_vecs, data_vocab, min_df=2, D=300):
    """Creates embedding matrix from word vectors. """
    data_vocab = delete_rare_words(word_vecs, data_vocab, min_df)
    V = len(data_vocab.keys()) + 2
    vocab2index = {}
    W = np.zeros((V, D), dtype="float32")
    vocab = ["", "UNK"]
    # adding a vector for padding
    W[0] = np.zeros(D, dtype='float32')
    # adding a vector for rare words 
    W[1] = np.random.uniform(-0.25, 0.25, D)
    vocab2index["UNK"] = 1
    i = 2
    for word in data_vocab:
        if word in word_vecs:
            W[i] = word_vecs[word]
        else:
            # words without a pre-trained vector get a random one
            W[i] = np.random.uniform(-0.25,0.25,D)
        vocab2index[word] = i
        vocab.append(word)
        i += 1
    return W, np.array(vocab), vocab2index

In [27]:
pretrained_weight, vocab, vocab2index = create_embedding_matrix(word_vecs, data_vocab)

In [28]:
len(pretrained_weight) # note that index 0 is for padding

18758

In [29]:
D = 300
V = len(pretrained_weight)
emb = nn.Embedding(V, D)
emb.weight.data.copy_(torch.from_numpy(pretrained_weight))

tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [-0.0854,  0.2321, -0.0931,  ..., -0.0487,  0.0453,  0.0823],
        [ 0.1088, -0.2172, -0.5577,  ..., -0.2167, -0.0288,  0.1785],
        ...,
        [-0.1369, -0.0570, -0.1921,  ..., -0.2036, -0.4955, -0.2766],
        [-0.3170, -0.4958,  0.3020,  ..., -0.1990,  0.0607, -0.1257],
        [ 0.2028, -0.3397, -0.1055,  ...,  0.5841, -0.4893,  0.0245]])

Question: How many parameters do we have in this embedding matrix?

## Encoding training and validation sets

We will use a 1D convolutional neural network as our model. The implementation below works on fixed-length inputs, so we need to pick a sequence length and truncate or pad the sentences as needed. Let's find a good value for the sequence length.

In [30]:
x_len = np.array([len(x.split()) for x in X_train])

In [31]:
np.percentile(x_len, 95) # let's set the max sequence length to N=40

43.0

In [32]:
X_train[0]

'will god let her fall or give her a new path ?'

In [33]:
# returns the index of the word or the index of "UNK" otherwise
vocab2index.get("will", vocab2index["UNK"])

12

In [34]:
np.array([vocab2index.get(w, vocab2index["UNK"]) for w in X_train[0].split()])

array([12,  3, 10, 11,  7,  9,  2, 11,  5,  6,  4,  8])

In [35]:
def encode_sentence(s, N=40):
    enc = np.zeros(N, dtype=np.int32)
    enc1 = np.array([vocab2index.get(w, vocab2index["UNK"]) for w in s.split()])
    l = min(N, len(enc1))
    enc[:l] = enc1[:l]
    return enc

In [36]:
encode_sentence(X_train[0])

array([12,  3, 10, 11,  7,  9,  2, 11,  5,  6,  4,  8,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0], dtype=int32)

In [37]:
x_train = np.vstack([encode_sentence(x) for x in X_train])
x_train.shape

(8000, 40)

In [38]:
x_val = np.vstack([encode_sentence(x) for x in X_val])
x_val.shape

(2000, 40)

## Playing and debugging CNN layers

In [39]:
V = len(pretrained_weight)
D = 300
N = 40

In [40]:
emb = nn.Embedding(V, D)
emb.weight.data.copy_(torch.from_numpy(pretrained_weight))

tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [-0.0854,  0.2321, -0.0931,  ..., -0.0487,  0.0453,  0.0823],
        [ 0.1088, -0.2172, -0.5577,  ..., -0.2167, -0.0288,  0.1785],
        ...,
        [-0.1369, -0.0570, -0.1921,  ..., -0.2036, -0.4955, -0.2766],
        [-0.3170, -0.4958,  0.3020,  ..., -0.1990,  0.0607, -0.1257],
        [ 0.2028, -0.3397, -0.1055,  ...,  0.5841, -0.4893,  0.0245]])

In [41]:
x = x_train[:2]
x.shape

(2, 40)

In [42]:
x = torch.LongTensor(x)
x

tensor([[12,  3, 10, 11,  7,  9,  2, 11,  5,  6,  4,  8,  0,  0,  0,  0,  0,  0,
          0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
          0,  0,  0,  0],
        [26, 32, 24, 19, 25, 20, 16, 27, 22, 13, 17, 23, 36, 14, 20, 33, 18, 21,
         30, 28, 15, 35, 29, 31, 34,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
          0,  0,  0,  0]])

In [43]:
x1 = emb(x)
x1.shape

torch.Size([2, 40, 300])

In [44]:
x1 = x1.transpose(1,2)  # Conv1d expects (batch, embedding_dim, sentence_len)
x1.size()

torch.Size([2, 300, 40])

In [45]:
conv_3 = nn.Conv1d(in_channels=D, out_channels=100, kernel_size=3)

In [46]:
x3 = conv_3(x1)

In [47]:
x3.size()

torch.Size([2, 100, 38])

In [48]:
conv_4 = nn.Conv1d(in_channels=D, out_channels=100, kernel_size=4)
conv_5 = nn.Conv1d(in_channels=D, out_channels=100, kernel_size=5)

In [49]:
x4 = conv_4(x1)
x5 = conv_5(x1)
print(x4.size(), x5.size())

torch.Size([2, 100, 37]) torch.Size([2, 100, 36])
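
The output lengths 38, 37 and 36 follow from the output-length formula given in the PyTorch `Conv1d` documentation. With stride 1, no padding and no dilation it reduces to `N - kernel_size + 1`. A small sketch (the helper name `conv1d_out_len` is hypothetical):

```python
def conv1d_out_len(n, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1D convolution over a sequence of length n."""
    return (n + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

N = 40
for k in (3, 4, 5):
    print(k, conv1d_out_len(N, k))  # 3 38, then 4 37, then 5 36
```

These are the values used as `kernel_size` for the max-pooling layers, so that each filter is pooled over all of its positions.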


Note that the convolutions all apply to the same `x1`. How do we now combine the results of the convolutions?

In [50]:
# 100 3-gram detectors
x3 = nn.ReLU()(x3)
x3 = nn.MaxPool1d(kernel_size = 38)(x3)
x3.size()

torch.Size([2, 100, 1])

In [51]:
# 100 4-gram detectors
x4 = nn.ReLU()(x4)
x4 = nn.MaxPool1d(kernel_size = 37)(x4)
x4.size()

torch.Size([2, 100, 1])

In [52]:
# 100 5-gram detectors
x5 = nn.ReLU()(x5)
x5 = nn.MaxPool1d(kernel_size = 36)(x5)
x5.size()

torch.Size([2, 100, 1])

In [53]:
# concatenate x3, x4, x5
out = torch.cat([x3, x4, x5], 2)
out.size()

torch.Size([2, 100, 3])

In [54]:
out = out.view(out.size(0), -1)
out.size()

torch.Size([2, 300])

After this we have a fully connected network. Let's write a network that implements this.

## 1D CNN model for sentence classification

Notation:
* V -- vocabulary size
* D -- embedding size
* N -- maximum sentence length

In [55]:
class SentenceCNN(nn.Module):
    
    def __init__(self, V, D, glove_weights):
        super(SentenceCNN, self).__init__()
        self.glove_weights = glove_weights
        self.embedding = nn.Embedding(V, D, padding_idx=0)
        self.embedding.weight.data.copy_(torch.from_numpy(self.glove_weights))
        self.embedding.weight.requires_grad = False ## freeze embeddings

        self.conv_3 = nn.Conv1d(in_channels=D, out_channels=100, kernel_size=3)
        self.conv_4 = nn.Conv1d(in_channels=D, out_channels=100, kernel_size=4)
        self.conv_5 = nn.Conv1d(in_channels=D, out_channels=100, kernel_size=5)
        
        self.dropout = nn.Dropout(p=0.5)
        self.fc = nn.Linear(300, 1)
        
    def forward(self, x):
        x = self.embedding(x)
        x = x.transpose(1,2)
        x3 = F.relu(self.conv_3(x))
        x4 = F.relu(self.conv_4(x))
        x5 = F.relu(self.conv_5(x))
        x3 = nn.MaxPool1d(kernel_size = 38)(x3)
        x4 = nn.MaxPool1d(kernel_size = 37)(x4)
        x5 = nn.MaxPool1d(kernel_size = 36)(x5)
        out = torch.cat([x3, x4, x5], 2)
        out = out.view(out.size(0), -1)
        out = self.dropout(out)
        return self.fc(out)   
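
The pooling kernel sizes 38, 37 and 36 in `forward` are tied to `N = 40`. One possible variant, sketched here as an assumption rather than the notebook's method, takes the maximum over the whole time axis with `torch.max`, so the same model accepts any sentence length of at least 5 tokens. The class name `SentenceCNNGlobalPool` and the arguments `n_filters` and `kernel_sizes` are hypothetical, and the sizes in the usage lines are toy values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceCNNGlobalPool(nn.Module):
    """Hypothetical variant: same layers, max taken over the full time axis."""

    def __init__(self, V, D, n_filters=100, kernel_sizes=(3, 4, 5)):
        super().__init__()
        self.embedding = nn.Embedding(V, D, padding_idx=0)
        self.convs = nn.ModuleList(
            [nn.Conv1d(D, n_filters, k) for k in kernel_sizes])
        self.dropout = nn.Dropout(p=0.5)
        self.fc = nn.Linear(n_filters * len(kernel_sizes), 1)

    def forward(self, x):
        x = self.embedding(x).transpose(1, 2)  # (batch, D, N)
        # torch.max over dim=2 returns (values, indices); keep the values
        pooled = [torch.max(F.relu(conv(x)), dim=2)[0] for conv in self.convs]
        out = torch.cat(pooled, dim=1)         # (batch, 3 * n_filters)
        return self.fc(self.dropout(out))

model = SentenceCNNGlobalPool(V=50, D=8)
for N in (40, 25):
    x = torch.randint(0, 50, (2, N))
    print(model(x).shape)  # torch.Size([2, 1]) for both lengths
```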

In [56]:
V = len(pretrained_weight)
D = 300
N = 40
model = SentenceCNN(V, D, glove_weights=pretrained_weight)

In [57]:
# testing the model
x = x_train[:10]
print(x.shape)
x = torch.LongTensor(x)

(10, 40)


In [58]:
y_hat = model(x)
y_hat.size()

torch.Size([10, 1])

## Training

Note that I am not bothering with mini-batches since our dataset is small.

In [59]:
model = SentenceCNN(V, D, glove_weights=pretrained_weight) #.cuda()

In [60]:
def val_metrics(m):
    m.eval()  # put the model passed in (not the global `model`) in eval mode
    x = torch.LongTensor(x_val) #.cuda()
    y = torch.Tensor(y_val).unsqueeze(1) #.cuda()
    with torch.no_grad():  # gradients are not needed for evaluation
        y_hat = m(x)
        loss = F.binary_cross_entropy_with_logits(y_hat, y)
    y_pred = y_hat > 0
    correct = (y_pred.float() == y).float().sum()
    accuracy = correct/y_pred.shape[0]
    return loss.item(), accuracy.item()
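
The model outputs logits rather than probabilities, which is why predictions are obtained with `y_hat > 0`: the sigmoid of a logit is above 0.5 exactly when the logit is positive. A quick check in plain Python (the `sigmoid` helper and the sample logits are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

for logit in (-2.0, -0.1, 0.1, 2.0):
    p = sigmoid(logit)
    # thresholding the logit at 0 agrees with thresholding p at 0.5
    print(logit, round(p, 3), logit > 0, p > 0.5)
```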

In [61]:
# accuracy of a random model should be around 0.5
val_metrics(model)

(0.7114821672439575, 0.5059999823570251)

In [62]:
# this filters parameters with p.requires_grad=True
parameters = filter(lambda p: p.requires_grad, model.parameters())
optimizer = torch.optim.Adam(parameters, lr=0.01)

In [63]:
def train_epocs(model, epochs=10, lr=0.01):
    parameters = filter(lambda p: p.requires_grad, model.parameters())
    optimizer = torch.optim.Adam(parameters, lr=lr)
    model.train()
    for i in range(epochs):
        model.train()
        x = torch.LongTensor(x_train)  #.cuda()
        y = torch.Tensor(y_train).unsqueeze(1)
        y_hat = model(x)
        loss = F.binary_cross_entropy_with_logits(y_hat, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        val_loss, accuracy = val_metrics(model)
        print("train loss %.3f test loss %.3f and accuracy %.3f" % 
              (loss.item(), val_loss, accuracy))
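
The loop above feeds the whole training set in one step per epoch. For larger datasets, a mini-batch version could look like the sketch below, which uses `TensorDataset` and `DataLoader` from `torch.utils.data`. The random data and the small `ToyModel` are stand-ins chosen so the example runs on its own; they are not the notebook's data or model, and the batch size of 16 is an arbitrary choice.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
# toy stand-ins for x_train / y_train: 64 "sentences" of 40 word indices
x = torch.randint(0, 100, (64, 40))
y = torch.randint(0, 2, (64, 1)).float()

class ToyModel(nn.Module):
    """Stand-in for SentenceCNN: mean of embeddings plus a linear layer."""
    def __init__(self, V=100, D=16):
        super().__init__()
        self.embedding = nn.Embedding(V, D, padding_idx=0)
        self.fc = nn.Linear(D, 1)

    def forward(self, x):
        return self.fc(self.embedding(x).mean(dim=1))

model = ToyModel()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loader = DataLoader(TensorDataset(x, y), batch_size=16, shuffle=True)

for epoch in range(2):
    model.train()
    total, n = 0.0, 0
    for xb, yb in loader:  # one optimizer step per mini-batch
        loss = F.binary_cross_entropy_with_logits(model(xb), yb)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total += loss.item() * len(xb)
        n += len(xb)
    print("epoch %d train loss %.3f" % (epoch, total / n))
```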

In [64]:
model = SentenceCNN(V, D, glove_weights=pretrained_weight)

In [65]:
train_epocs(model, epochs=10, lr=0.005)

train loss 0.713 test loss 1.289 and accuracy 0.506
train loss 1.299 test loss 0.449 and accuracy 0.839
train loss 0.442 test loss 0.793 and accuracy 0.543
train loss 0.727 test loss 0.661 and accuracy 0.606
train loss 0.599 test loss 0.448 and accuracy 0.792
train loss 0.406 test loss 0.385 and accuracy 0.864
train loss 0.361 test loss 0.401 and accuracy 0.842
train loss 0.389 test loss 0.424 and accuracy 0.816
train loss 0.417 test loss 0.426 and accuracy 0.817
train loss 0.418 test loss 0.407 and accuracy 0.840


In [66]:
# list the shapes of the trainable parameters
parameters = filter(lambda p: p.requires_grad, model.parameters())
print([p.size() for p in parameters])

[torch.Size([100, 300, 3]), torch.Size([100]), torch.Size([100, 300, 4]), torch.Size([100]), torch.Size([100, 300, 5]), torch.Size([100]), torch.Size([1, 300]), torch.Size([1])]
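
The shapes printed above can be turned into a count of trainable parameters by multiplying out each shape and summing. A small sketch using only the standard library (the list `shapes` is copied from the output above):

```python
from math import prod

# shapes of the trainable tensors printed above (embeddings are frozen)
shapes = [(100, 300, 3), (100,), (100, 300, 4), (100,),
          (100, 300, 5), (100,), (1, 300), (1,)]
n_trainable = sum(prod(s) for s in shapes)
print(n_trainable)  # 360601
```

On the actual model, the same number can be obtained with `sum(p.numel() for p in model.parameters() if p.requires_grad)`.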


### Unfreezing the embeddings

In [67]:
# unfreezing the embeddings
model.embedding.weight.requires_grad = True

In [68]:
parameters = filter(lambda p: p.requires_grad, model.parameters())
print([p.size() for p in parameters])

[torch.Size([18758, 300]), torch.Size([100, 300, 3]), torch.Size([100]), torch.Size([100, 300, 4]), torch.Size([100]), torch.Size([100, 300, 5]), torch.Size([100]), torch.Size([1, 300]), torch.Size([1])]


In [69]:
train_epocs(model, epochs=10, lr=0.001)

train loss 0.396 test loss 0.369 and accuracy 0.867
train loss 0.343 test loss 0.361 and accuracy 0.863
train loss 0.320 test loss 0.364 and accuracy 0.853
train loss 0.314 test loss 0.366 and accuracy 0.850
train loss 0.307 test loss 0.358 and accuracy 0.852
train loss 0.294 test loss 0.341 and accuracy 0.854
train loss 0.274 test loss 0.321 and accuracy 0.866
train loss 0.255 test loss 0.304 and accuracy 0.875
train loss 0.240 test loss 0.291 and accuracy 0.878
train loss 0.227 test loss 0.284 and accuracy 0.882


In [70]:
train_epocs(model, epochs=10, lr=0.001)

train loss 0.218 test loss 0.298 and accuracy 0.874
train loss 0.213 test loss 0.287 and accuracy 0.881
train loss 0.201 test loss 0.271 and accuracy 0.883
train loss 0.187 test loss 0.263 and accuracy 0.887
train loss 0.179 test loss 0.257 and accuracy 0.893
train loss 0.167 test loss 0.253 and accuracy 0.895
train loss 0.160 test loss 0.252 and accuracy 0.891
train loss 0.147 test loss 0.253 and accuracy 0.888
train loss 0.141 test loss 0.251 and accuracy 0.891
train loss 0.133 test loss 0.246 and accuracy 0.896


## Without pretrained embeddings

In [77]:
class SentenceCNN2(nn.Module):
    
    def __init__(self, V, D):
        super(SentenceCNN2, self).__init__()
        self.embedding = nn.Embedding(V, D, padding_idx=0)

        self.conv_3 = nn.Conv1d(in_channels=D, out_channels=100, kernel_size=3)
        self.conv_4 = nn.Conv1d(in_channels=D, out_channels=100, kernel_size=4)
        self.conv_5 = nn.Conv1d(in_channels=D, out_channels=100, kernel_size=5)
        
        self.dropout = nn.Dropout(p=0.2)
        self.fc = nn.Linear(300, 1)
        
    def forward(self, x):
        x = self.embedding(x)
        x = x.transpose(1,2)
        x3 = F.relu(self.conv_3(x))
        x4 = F.relu(self.conv_4(x))
        x5 = F.relu(self.conv_5(x))
        x3 = nn.MaxPool1d(kernel_size = 38)(x3)
        x4 = nn.MaxPool1d(kernel_size = 37)(x4)
        x5 = nn.MaxPool1d(kernel_size = 36)(x5)
        out = torch.cat([x3, x4, x5], 2)
        out = out.view(out.size(0), -1)
        out = self.dropout(out)
        return self.fc(out)   

In [78]:
V = len(pretrained_weight)
model = SentenceCNN2(V, D=100)

In [79]:
train_epocs(model, epochs=10, lr=0.01)

train loss 0.708 test loss 2.417 and accuracy 0.506
train loss 2.336 test loss 0.589 and accuracy 0.679
train loss 0.573 test loss 1.059 and accuracy 0.531
train loss 1.043 test loss 0.563 and accuracy 0.701
train loss 0.537 test loss 0.603 and accuracy 0.674
train loss 0.513 test loss 0.748 and accuracy 0.612
train loss 0.627 test loss 0.538 and accuracy 0.725
train loss 0.436 test loss 0.432 and accuracy 0.806
train loss 0.359 test loss 0.510 and accuracy 0.762
train loss 0.440 test loss 0.492 and accuracy 0.768


In [80]:
train_epocs(model, epochs=10, lr=0.001)

train loss 0.411 test loss 0.415 and accuracy 0.815
train loss 0.322 test loss 0.403 and accuracy 0.823
train loss 0.295 test loss 0.427 and accuracy 0.799
train loss 0.305 test loss 0.448 and accuracy 0.783
train loss 0.316 test loss 0.447 and accuracy 0.786
train loss 0.310 test loss 0.429 and accuracy 0.798
train loss 0.293 test loss 0.407 and accuracy 0.813
train loss 0.277 test loss 0.389 and accuracy 0.828
train loss 0.263 test loss 0.379 and accuracy 0.831
train loss 0.257 test loss 0.376 and accuracy 0.835


In [81]:
train_epocs(model, epochs=10, lr=0.001)

train loss 0.255 test loss 0.389 and accuracy 0.822
train loss 0.251 test loss 0.380 and accuracy 0.831
train loss 0.240 test loss 0.368 and accuracy 0.835
train loss 0.230 test loss 0.363 and accuracy 0.841
train loss 0.223 test loss 0.360 and accuracy 0.841
train loss 0.219 test loss 0.357 and accuracy 0.841
train loss 0.208 test loss 0.355 and accuracy 0.840
train loss 0.197 test loss 0.355 and accuracy 0.841
train loss 0.191 test loss 0.354 and accuracy 0.843
train loss 0.187 test loss 0.351 and accuracy 0.846


In [82]:
train_epocs(model, epochs=10, lr=0.001)

train loss 0.181 test loss 0.348 and accuracy 0.851
train loss 0.185 test loss 0.342 and accuracy 0.849
train loss 0.168 test loss 0.347 and accuracy 0.849
train loss 0.164 test loss 0.349 and accuracy 0.845
train loss 0.159 test loss 0.343 and accuracy 0.850
train loss 0.152 test loss 0.337 and accuracy 0.857
train loss 0.144 test loss 0.334 and accuracy 0.854
train loss 0.141 test loss 0.332 and accuracy 0.855
train loss 0.135 test loss 0.331 and accuracy 0.856
train loss 0.129 test loss 0.329 and accuracy 0.858


In [None]:
train_epocs(model, epochs=10, lr=0.001)

train loss 0.124 test loss 0.339 and accuracy 0.855
train loss 0.121 test loss 0.328 and accuracy 0.862
train loss 0.113 test loss 0.327 and accuracy 0.859
train loss 0.107 test loss 0.326 and accuracy 0.862
train loss 0.104 test loss 0.325 and accuracy 0.865
train loss 0.096 test loss 0.327 and accuracy 0.862


## Lab 

* Improve tokenization
* Use fastText instead of the GloVe model. (https://fasttext.cc/docs/en/english-vectors.html)

   `! pip install git+https://github.com/facebookresearch/fastText.git`
* Extend this code to do cross-validation. Look at https://github.com/yoonkim/CNN_sentence/blob/master/process_data.py for an example on how to do it.

## References

The CNN is adapted from here https://github.com/junwang4/CNN-sentence-classification-pytorch-2017/blob/master/cnn_pytorch.py.
Code for the original paper can be found here https://github.com/yoonkim/CNN_sentence/blob/master/process_data.py.