**Load pre-trained Word2Vec**

In [47]:
from gensim.models import KeyedVectors
from gensim.models import Word2Vec


#w2v_model = Word2Vec.load('trained_w2v.bin')
w2v_model = KeyedVectors.load('trained_w2v.bin').wv

In [48]:
"""
from gensim.models import KeyedVectors
filename = 'GoogleNews-vectors-negative300.bin'
w2v_model = KeyedVectors.load_word2vec_format(filename, binary=True)
"""

"\nfrom gensim.models import KeyedVectors\nfilename = 'GoogleNews-vectors-negative300.bin'\nw2v_model = KeyedVectors.load_word2vec_format(filename, binary=True)\n"

In [49]:
print(w2v_model.wv.vectors)

[[-4.2674062e-01  1.0696528e-01 -1.4644409e+00 ...  4.9810160e-02
   2.0994665e-01  3.0871007e-01]
 [-1.2648560e+00  2.7750504e-01 -1.3830882e+00 ... -2.6240353e-02
   5.7821620e-01  3.2545090e-01]
 [-2.6542953e-01 -3.4512815e-01 -4.5403066e-01 ...  4.4313240e-01
   3.5355222e-01 -2.0003715e-01]
 ...
 [-4.9631060e-03 -1.6257972e-03 -1.8345876e-02 ...  2.1563626e-03
  -6.7100041e-03  2.7423985e-03]
 [-5.2258233e-03 -1.2848869e-03 -1.5606534e-02 ... -1.0143375e-03
  -1.7633450e-03  1.0247926e-03]
 [-3.0858757e-03 -2.2082239e-04 -1.2977008e-02 ...  2.3611996e-03
  -3.9495192e-03  2.3848482e-03]]


  print(w2v_model.wv.vectors)


---
### Load in and visualize the data

In [50]:
import numpy as np

# read data from text files
with open('data/tweets_full.txt', 'r') as f:
    reviews = f.read()
with open('data/tweets_full_target.txt', 'r') as f:
    labels = f.read()

In [51]:
# read data from text files
with open('data/tweets_authors.txt', 'r') as f:
    authors = f.read()

In [52]:
print(reviews[:1000])
print()
print(labels[:20])
print()
print(authors[:20])

@tiffanylue i know  i was listenin to bad habit earlier and i started freakin at his part =[ 
Layin n bed with a headache  ughhhh...waitin on your call... 
Funeral ceremony...gloomy friday... 
wants to hang out with friends SOON! 
@dannycastillo We want to trade with someone who has Houston tickets, but no one will. 
Re-pinging @ghostridah14: why didn't you go to prom? BC my bf didn't like my friends 
I should be sleep, but im not! thinking about an old friend who I want. but he's married now. damn, &amp; he wants me 2! scandalous! 
Hmmm. http://www.djhero.com/ is down 
@charviray Charlene my love. I miss you 
@kelcouch I'm sorry  at least it's Friday? 
cant fall asleep 
Choked on her retainers 
Ugh! I have to beat this stupid song to get to the next  rude! 
@BrodyJenner if u watch the hills in london u will realise what tourture it is because were weeks and weeks late  i just watch itonlinelol 
Got the news 
The storm is here and the electricity is gone 
@annarosekerr agreed 
So sleep

## Data pre-processing

The first step when building a neural network model is getting your data into the proper form to feed into the network. Since we're using embedding layers, we'll need to encode each word with an integer. We'll also want to clean it up a bit.

You can see an example of the reviews data above. Here are the processing steps, we'll want to take:
>* We'll want to get rid of periods and extraneous punctuation.
* Also, you might notice that the reviews are delimited with newline characters `\n`. To deal with those, I'm going to split the text into each review using `\n` as the delimiter. 
* Then I can combined all the reviews back together into one big string.

First, let's remove all punctuation. Then get all the text without the newlines and split it into individual words.

In [53]:
from string import punctuation

# get rid of punctuation
reviews = reviews.lower() # lowercase, standardize
all_text = ''.join([c for c in reviews if c not in punctuation])

# split by new lines and spaces
reviews_split = all_text.split('\n')
all_text = ' '.join(reviews_split)

# create a list of words
words = all_text.split()

In [54]:
words[:39]

['tiffanylue',
 'i',
 'know',
 'i',
 'was',
 'listenin',
 'to',
 'bad',
 'habit',
 'earlier',
 'and',
 'i',
 'started',
 'freakin',
 'at',
 'his',
 'part',
 'layin',
 'n',
 'bed',
 'with',
 'a',
 'headache',
 'ughhhhwaitin',
 'on',
 'your',
 'call',
 'funeral',
 'ceremonygloomy',
 'friday',
 'wants',
 'to',
 'hang',
 'out',
 'with',
 'friends',
 'soon',
 'dannycastillo',
 'we']

### Encoding the words

The embedding lookup requires that we pass in integers to our network. The easiest way to do this is to create dictionaries that map the words in the vocabulary to integers. Then we can convert each of our reviews into integers so they can be passed into the network.

> **Exercise:** Now you're going to encode the words with integers. Build a dictionary that maps words to integers. Later we're going to pad our input vectors with zeros, so make sure the integers **start at 1, not 0**.
> Also, convert the reviews to integers and store the reviews in a new list called `reviews_ints`. 

In [55]:
# feel free to use this import 
from collections import Counter

## Build a dictionary that maps words to integers
counts = Counter(words)
vocab = sorted(counts, key=counts.get, reverse=True)
vocab_to_int = {word: ii for ii, word in enumerate(vocab, 1)}
vocab_to_int["none"] = 0
int_to_vocab = {v: k for k, v in vocab_to_int.items()}

## use the dict to tokenize each review in reviews_split
## store the tokenized reviews in reviews_ints
reviews_ints = []
for review in reviews_split:
    reviews_ints.append([vocab_to_int[word] for word in review.split()])

**Test your code**

As a text that you've implemented the dictionary correctly, print out the number of unique words in your vocabulary and the contents of the first, tokenized review.

In [56]:
# stats about vocabulary
print('Unique words: ', len((vocab_to_int)))  # should ~ 74000+
print()

# print tokens in first review
print('Tokenized review: \n', reviews_ints[:1])

Unique words:  53747

Tokenized review: 
 [[14939, 1, 54, 1, 26, 3127, 2, 118, 5082, 802, 6, 1, 578, 1104, 25, 171, 519]]


### Encoding the labels

Our labels are "positive" or "negative". To use these labels in our network, we need to convert them to 0 and 1.


In [57]:
# 1=positive, 0=negative label conversion
labels_split = labels.split('\n')
labels_split = [label.strip() for label in labels_split]

word_2_int = {'neutral': 1, 'worry':0, 'happiness':0, 'sadness':0, 'love':1, 'surprise':1,
              'fun':1, 'relief':1, 'hate':0, 'empty':0, 
             'enthusiasm':1, 'boredom': 0, 'anger':0}

encoded_labels = np.array([0 if len(label) == 0 else word_2_int[label] for label in labels_split])



In [58]:
print(encoded_labels)

[0 0 0 ... 0 1 0]


### Encode the authors of each tweet

**Remove punctuation**

In [59]:
# get rid of punctuation
authors = authors.lower() # lowercase, standardize
all_authors_text = ''.join([c for c in authors if c not in punctuation])

# split by new lines and spaces
authors_split = all_authors_text.split('\n')
all_authors_text = ' '.join(authors_split)

# create a list of words
all_authors = all_authors_text.split()

**Encoding the authors**

In [60]:
# feel free to use this import 
from collections import Counter

## Build a dictionary that maps words to integers
counts_auth = Counter(all_authors)
vocab_auth = sorted(counts_auth, key=counts_auth.get, reverse=True)
vocab_to_int_auth = {word: ii for ii, word in enumerate(vocab_auth, 1)}

## use the dict to tokenize each review in reviews_split
## store the tokenized reviews in reviews_ints
authors_split = [author.strip() for author in authors_split][:-1]
authors_ints = [[vocab_to_int_auth[word] for word in authors_split]]

authors_ints = np.array(authors_ints).squeeze()

### Removing Outliers

As an additional pre-processing step, we want to make sure that our reviews are in good shape for standard processing. That is, our network will expect a standard input text size, and so, we'll want to shape our reviews into a specific length. We'll approach this task in two main steps:

1. Getting rid of extremely long or short reviews; the outliers
2. Padding/truncating the remaining data so that we have reviews of the same length.

Before we pad our review text, we should check for reviews of extremely short or long lengths; outliers that may mess with our training.

In [61]:
# outlier review stats
review_lens = Counter([len(x) for x in reviews_ints])
print("Zero-length reviews: {}".format(review_lens[0]))
print("Maximum review length: {}".format(max(review_lens)))

Zero-length reviews: 1
Maximum review length: 33


Okay, a couple issues here. We seem to have one review with zero length. And, the maximum review length is way too many steps for our RNN. We'll have to remove any super short reviews and truncate super long reviews. This removes outliers and should allow our model to train more efficiently.

> **Exercise:** First, remove *any* reviews with zero length from the `reviews_ints` list and their corresponding label in `encoded_labels`.

In [62]:
print('Number of reviews before removing outliers: ', len(reviews_ints))

## remove any reviews/labels with zero length from the reviews_ints list.

# get indices of any reviews with length 0
non_zero_idx = [ii for ii, review in enumerate(reviews_ints) if len(review) != 0]

# remove 0-length reviews and their labels
reviews_ints = [reviews_ints[ii] for ii in non_zero_idx]
encoded_labels = np.array([encoded_labels[ii] for ii in non_zero_idx])

print('Number of reviews after removing outliers: ', len(reviews_ints))

Number of reviews before removing outliers:  40001
Number of reviews after removing outliers:  40000


---
## Padding sequences

To deal with both short and very long reviews, we'll pad or truncate all our reviews to a specific length. For reviews shorter than some `seq_length`, we'll pad with 0s. For reviews longer than `seq_length`, we can truncate them to the first `seq_length` words. A good `seq_length`, in this case, is 200.

> **Exercise:** Define a function that returns an array `features` that contains the padded data, of a standard size, that we'll pass to the network. 
* The data should come from `review_ints`, since we want to feed integers to the network. 
* Each row should be `seq_length` elements long. 
* For reviews shorter than `seq_length` words, **left pad** with 0s. That is, if the review is `['best', 'movie', 'ever']`, `[117, 18, 128]` as integers, the row will look like `[0, 0, 0, ..., 0, 117, 18, 128]`. 
* For reviews longer than `seq_length`, use only the first `seq_length` words as the feature vector.

As a small example, if the `seq_length=10` and an input review is: 
```
[117, 18, 128]
```
The resultant, padded sequence should be: 

```
[0, 0, 0, 0, 0, 0, 0, 117, 18, 128]
```

**Your final `features` array should be a 2D array, with as many rows as there are reviews, and as many columns as the specified `seq_length`.**

This isn't trivial and there are a bunch of ways to do this. But, if you're going to be building your own deep learning networks, you're going to have to get used to preparing your data.

In [63]:
def pad_features(reviews_ints, seq_length):
    ''' Return features of review_ints, where each review is padded with 0's 
        or truncated to the input seq_length.
    '''
    
    # getting the correct rows x cols shape
    features = np.zeros((len(reviews_ints), seq_length), dtype=int)

    # for each review, I grab that review and 
    for i, row in enumerate(reviews_ints):
        features[i, -len(row):] = np.array(row)[:seq_length]
    
    return features

In [64]:
# Test your implementation!

seq_length = 30

features = pad_features(reviews_ints, seq_length=seq_length)

## test statements - do not change - ##
assert len(features)==len(reviews_ints), "Your features should have as many rows as reviews."
assert len(features[0])==seq_length, "Each feature row should contain seq_length values."

# print first 10 values of the first 30 batches 
print(features[:30,:10])

[[    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     1   134    23   131    19    13]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0  9423    70    57   207     3]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0

## Training, Validation, Test

With our data in nice shape, we'll split it into training, validation, and test sets.

> **Exercise:** Create the training, validation, and test sets. 
* You'll need to create sets for the features and the labels, `train_x` and `train_y`, for example. 
* Define a split fraction, `split_frac` as the fraction of data to **keep** in the training set. Usually this is set to 0.8 or 0.9. 
* Whatever data is left will be split in half to create the validation and *testing* data.

In [65]:
split_frac = 0.8

## split data into training, validation, and test data (features and labels, x and y)

split_idx = int(len(features)*split_frac)
train_x, remaining_x = features[:split_idx], features[split_idx:]
train_y, remaining_y = encoded_labels[:split_idx], encoded_labels[split_idx:]

test_idx = int(len(remaining_x)*0.5)
val_x, test_x = remaining_x[:test_idx], remaining_x[test_idx:]
val_y, test_y = remaining_y[:test_idx], remaining_y[test_idx:]

## print out the shapes of your resultant feature data
print("\t\t\tFeature Shapes:")
print("Train set: \t\t{}".format(train_x.shape), 
      "\nValidation set: \t{}".format(val_x.shape),
      "\nTest set: \t\t{}".format(test_x.shape))


			Feature Shapes:
Train set: 		(32000, 30) 
Validation set: 	(4000, 30) 
Test set: 		(4000, 30)


**Split the authors accordingly**

In [66]:
authors_x, remaining_authors = authors_ints[:split_idx], authors_ints[split_idx:]

authors_val, authors_test = remaining_authors[:test_idx], remaining_authors[test_idx:]

print("\t\t\tFeature Shapes:")
print("Train set: \t\t{}".format(authors_x.shape), 
      "\nValidation set: \t{}".format(authors_val.shape),
      "\nTest set: \t\t{}".format(authors_test.shape))

			Feature Shapes:
Train set: 		(32000,) 
Validation set: 	(4000,) 
Test set: 		(4000,)


**Check your work**

With train, validation, and test fractions equal to 0.8, 0.1, 0.1, respectively, the final, feature data shapes should look like:
```
                    Feature Shapes:
Train set: 		 (train_size, word_length) 
Validation set: 	(val_size, word_length) 
Test set: 		  (test_size, word_length)
```

---
## DataLoaders and Batching

After creating training, test, and validation data, we can create DataLoaders for this data by following two steps:
1. Create a known format for accessing our data, using [TensorDataset](https://pytorch.org/docs/stable/data.html#) which takes in an input set of data and a target set of data with the same first dimension, and creates a dataset.
2. Create DataLoaders and batch our training, validation, and test Tensor datasets.

```
train_data = TensorDataset(torch.from_numpy(train_x), torch.from_numpy(train_y))
train_loader = DataLoader(train_data, batch_size=batch_size)
```

This is an alternative to creating a generator function for batching our data into full batches.

In [67]:
import torch
from torch.utils.data import TensorDataset, DataLoader

# create Tensor datasets
train_data = TensorDataset(torch.from_numpy(train_x), torch.from_numpy(train_y), torch.from_numpy(authors_x))
valid_data = TensorDataset(torch.from_numpy(val_x), torch.from_numpy(val_y), torch.from_numpy(authors_val))
test_data = TensorDataset(torch.from_numpy(test_x), torch.from_numpy(test_y), torch.from_numpy(authors_test))

# dataloaders
batch_size = 50

# make sure the SHUFFLE your training data
train_loader = DataLoader(train_data, shuffle=True, batch_size=batch_size)
valid_loader = DataLoader(valid_data, shuffle=True, batch_size=batch_size)
test_loader = DataLoader(test_data, shuffle=True, batch_size=batch_size)

In [68]:
# obtain one batch of training data
dataiter = iter(train_loader)
sample_x, sample_y, author = dataiter.next()

print('Sample input size: ', sample_x.size()) # batch_size, seq_length
print('Sample input: \n', sample_x)
print()
print('Sample label size: ', sample_y.size()) # batch_size
print('Sample label: \n', sample_y)
print()
print('Sample label size: ', author.size()) # batch_size
print('Sample label: \n', author)

Sample input size:  torch.Size([50, 30])
Sample input: 
 tensor([[   0,    0,    0,  ...,  244,    2,  156],
        [   0,    0,    0,  ...,  277,   84,  128],
        [   0,    0,    0,  ...,  184,  120, 6038],
        ...,
        [   0,    0,    0,  ...,    4,  767, 3040],
        [   0,    0,    0,  ...,  109,   38,  236],
        [   0,    0,    0,  ...,   25, 8518,  996]])

Sample label size:  torch.Size([50])
Sample label: 
 tensor([1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1,
        1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0,
        0, 0])

Sample label size:  torch.Size([50])
Sample label: 
 tensor([ 5568, 16433, 13927,  1439, 11370, 20829, 10476,  4219, 11275, 21327,
         7173, 15374,  1718, 16474, 24823, 22837, 21232, 25705, 24003,    17,
         5310, 26467, 22195,  6255, 27811, 27058, 25563, 27175,  4853,  8464,
         7207,  1854, 11261,  2850,  6839, 15946, 11950, 23370,  7906, 16136,
        10150, 13698, 

---
# Sentiment Network with PyTorch

Below is where you'll define the network.

<img src="assets/network_diagram.png" width=40%>

The layers are as follows:
1. An [embedding layer](https://pytorch.org/docs/stable/nn.html#embedding) that converts our word tokens (integers) into embeddings of a specific size.
2. An [LSTM layer](https://pytorch.org/docs/stable/nn.html#lstm) defined by a hidden_state size and number of layers
3. A fully-connected output layer that maps the LSTM layer outputs to a desired output_size
4. A sigmoid activation layer which turns all outputs into a value 0-1; return **only the last sigmoid output** as the output of this network.

### The Embedding Layer

We need to add an [embedding layer](https://pytorch.org/docs/stable/nn.html#embedding) because there are 74000+ words in our vocabulary. It is massively inefficient to one-hot encode that many classes. So, instead of one-hot encoding, we can have an embedding layer and use that layer as a lookup table. You could train an embedding layer using Word2Vec, then load it here. But, it's fine to just make a new layer, using it for only dimensionality reduction, and let the network learn the weights.


### The LSTM Layer(s)

We'll create an [LSTM](https://pytorch.org/docs/stable/nn.html#lstm) to use in our recurrent network, which takes in an input_size, a hidden_dim, a number of layers, a dropout probability (for dropout between multiple layers), and a batch_first parameter.

Most of the time, you're network will have better performance with more layers; between 2-3. Adding more layers allows the network to learn really complex relationships. 

> **Exercise:** Complete the `__init__`, `forward`, and `init_hidden` functions for the SentimentRNN model class.

Note: `init_hidden` should initialize the hidden and cell state of an lstm layer to all zeros, and move those state to GPU, if available.

In [69]:
# First checking if GPU is available
train_on_gpu=torch.cuda.is_available()

if(train_on_gpu):
    print('Training on GPU.')
else:
    print('No GPU available, training on CPU.')

Training on GPU.


In [70]:

def int_2_word2vec(tensor, embed_size = 300):
    #1. Convert ints to words
    tensor = tensor.cpu().numpy()
    new_tensor_array = np.zeros(tensor.shape, dtype=object)
    
    for i, row in enumerate(tensor):
        for j, word in enumerate(row):
            new_tensor_array[i,j] = int_to_vocab[word]
            
    batch_size, length = tensor.shape
            
    #2. Pass words to wor2vec
    new_tensor = np.zeros((batch_size, length, embed_size), dtype=float)

    for i, batch in enumerate(new_tensor):
        for j, row in enumerate(batch):
            try:
                new_tensor[i,j] = w2v_model.wv[new_tensor_array[i,j]]
            except:
                new_tensor[i,j] = w2v_model.wv["none"]
            
    if train_on_gpu:
            return torch.tensor(new_tensor, dtype=torch.float64).cuda()

    else:
            return torch.tensor(new_tensor, dtype=torch.float64).cpu()

    


In [71]:
import torch.nn as nn

class SentimentRNN(nn.Module):
    """
    The RNN model that will be used to perform Sentiment analysis.
    """

    def __init__(self, vocab_size, output_size, embedding_dim, hidden_dim, n_layers, drop_prob=0.5):
        """
        Initialize the model by setting up the layers.
        """
        super(SentimentRNN, self).__init__()

        self.output_size = output_size
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim
        
        # embedding and LSTM layers
        self.embedding = nn.Embedding(vocab_size, embedding_dim) #.from_pretrained(torch.tensor(w2v_model.vectors))
        self.embedding_author = nn.Embedding(len(vocab_to_int_auth)+1, hidden_dim) # make it same dimension as h_t
            

        self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers, 
                            dropout=drop_prob, batch_first=True)
        
        # dropout layer
        self.dropout = nn.Dropout(0.3)
        
        # linear and sigmoid layers
        self.fc = nn.Linear(hidden_dim, output_size)
        self.sig = nn.Sigmoid()
        

    def forward(self, x, hidden, author):
        """
        Perform a forward pass of our model on some input and hidden state.
        """
        # embeddings and lstm_out
        x = x.long()
        
        # Embedding for tweet and auhors
        embeds = self.embedding(x)
        embeds_auth = self.embedding_author(author)
        
        # Lstm output
        lstm_out, hidden = self.lstm(embeds, hidden) # shape (batch_size, word_length, embedding_size)
        
        # Get the vector for last batch shape = (batch_size, embedding_size)
        lstm_out_last_batch = lstm_out[:, -1] # We are only getting the last word representation for each batch
        
        # ADD embedding and embedding_author tensors.
        fc_input = lstm_out_last_batch.add(embeds_auth)
                
        # dropout and fully-connected layer
        out = self.dropout(fc_input)
        out = self.fc(out)
        
        # sigmoid function
        sig_out = self.sig(out)
        # get last batch of labels
        sig_out = sig_out[:, -1].squeeze()
 
        # return last sigmoid output and hidden state
        return sig_out, hidden
    
    
    def init_hidden(self, batch_size):
        ''' Initializes hidden state '''
        # Create two new tensors with sizes n_layers x batch_size x hidden_dim,
        # initialized to zero, for hidden state and cell state of LSTM
        weight = next(self.parameters()).data
        
        if (train_on_gpu):
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda(),
                  weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda())
        else:
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_(),
                      weight.new(self.n_layers, batch_size, self.hidden_dim).zero_())
        
        return hidden
        

## Instantiate the network

Here, we'll instantiate the network. First up, defining the hyperparameters.

* `vocab_size`: Size of our vocabulary or the range of values for our input, word tokens.
* `output_size`: Size of our desired output; the number of class scores we want to output (pos/neg).
* `embedding_dim`: Number of columns in the embedding lookup table; size of our embeddings.
* `hidden_dim`: Number of units in the hidden layers of our LSTM cells. Usually larger is better performance wise. Common values are 128, 256, 512, etc.
* `n_layers`: Number of LSTM layers in the network. Typically between 1-3

> **Exercise:** Define the model  hyperparameters.


In [72]:
# Instantiate the model w/ hyperparams
vocab_size = len(vocab_to_int)+1 # +1 for the 0 padding + our word tokens
output_size = 1
embedding_dim = 300
hidden_dim = 256
n_layers = 2

net = SentimentRNN(vocab_size, output_size, embedding_dim, hidden_dim, n_layers)

print(net)

SentimentRNN(
  (embedding): Embedding(53748, 300)
  (embedding_author): Embedding(33863, 256)
  (lstm): LSTM(300, 256, num_layers=2, batch_first=True, dropout=0.5)
  (dropout): Dropout(p=0.3, inplace=False)
  (fc): Linear(in_features=256, out_features=1, bias=True)
  (sig): Sigmoid()
)


---
## Training

Below is the typical training code. If you want to do this yourself, feel free to delete all this code and implement it yourself. You can also add code to save a model by name.

>We'll also be using a new kind of cross entropy loss, which is designed to work with a single Sigmoid output. [BCELoss](https://pytorch.org/docs/stable/nn.html#bceloss), or **Binary Cross Entropy Loss**, applies cross entropy loss to a single value between 0 and 1.

We also have some data and training hyparameters:

* `lr`: Learning rate for our optimizer.
* `epochs`: Number of times to iterate through the training dataset.
* `clip`: The maximum gradient value to clip at (to prevent exploding gradients).

In [73]:
# loss and optimization functions
lr=0.000001

criterion = nn.BCELoss()
optimizer = torch.optim.Adam(net.parameters(), lr=lr)


In [74]:
# training params
from tqdm import tqdm


epochs = 10 # 3-4 is approx where I noticed the validation loss stop decreasing

counter = 0
print_every = 100
clip=5 # gradient clipping
net.double()
# move model to GPU, if available
if(train_on_gpu):
    net.cuda()

net.train()
# train for some number of epochs
for e in tqdm(range(epochs)):
    # initialize hidden state
    h = net.init_hidden(batch_size)

    # batch loop
    for inputs, labels, authors in train_loader:
        counter += 1

        if(train_on_gpu):
            inputs, labels, authors = inputs.cuda(), labels.cuda(), authors.cuda()

        # Creating new variables for the hidden state, otherwise
        # we'd backprop through the entire training history
        h = tuple([each.data for each in h])

        # zero accumulated gradients
        net.zero_grad()

        # get the output from the model
        output, h = net(inputs, h, authors)

        # calculate the loss and perform backprop
        loss = criterion(output.squeeze(), labels.double())
        loss.backward()
        # `clip_grad_norm` helps prevent the exploding gradient problem in RNNs / LSTMs.
        nn.utils.clip_grad_norm_(net.parameters(), clip)
        optimizer.step()

        # loss stats
        if counter % print_every == 0:
            # Get validation loss
            val_h = net.init_hidden(batch_size)
            val_losses = []
            net.eval()
            for inputs, labels, authors in valid_loader:

                # Creating new variables for the hidden state, otherwise
                # we'd backprop through the entire training history
                val_h = tuple([each.data for each in val_h])

                if(train_on_gpu):
                    inputs, labels, authors = inputs.cuda(), labels.cuda(), authors.cuda()

                output, val_h = net(inputs, val_h, authors)
                val_loss = criterion(output.squeeze(), labels.double())

                val_losses.append(val_loss.item())

            net.train()
            print("Epoch: {}/{}...".format(e+1, epochs),
                  "Step: {}...".format(counter),
                  "Loss: {:.6f}...".format(loss.item()),
                  "Val Loss: {:.6f}".format(np.mean(val_losses)))

  0%|          | 0/10 [00:00<?, ?it/s]

Epoch: 1/10... Step: 100... Loss: 0.738341... Val Loss: 0.723889
Epoch: 1/10... Step: 200... Loss: 0.742876... Val Loss: 0.724146
Epoch: 1/10... Step: 300... Loss: 0.783227... Val Loss: 0.724358
Epoch: 1/10... Step: 400... Loss: 0.821488... Val Loss: 0.724610
Epoch: 1/10... Step: 500... Loss: 0.762960... Val Loss: 0.724840
Epoch: 1/10... Step: 600... Loss: 0.769826... Val Loss: 0.725080


 10%|█         | 1/10 [04:22<39:19, 262.17s/it]

Epoch: 2/10... Step: 700... Loss: 0.659354... Val Loss: 0.725267
Epoch: 2/10... Step: 800... Loss: 0.721499... Val Loss: 0.725511
Epoch: 2/10... Step: 900... Loss: 0.659714... Val Loss: 0.725727
Epoch: 2/10... Step: 1000... Loss: 0.709184... Val Loss: 0.725946
Epoch: 2/10... Step: 1100... Loss: 0.842020... Val Loss: 0.726175
Epoch: 2/10... Step: 1200... Loss: 0.846518... Val Loss: 0.726380


 20%|██        | 2/10 [08:43<34:55, 261.93s/it]

Epoch: 3/10... Step: 1300... Loss: 0.696207... Val Loss: 0.726653
Epoch: 3/10... Step: 1400... Loss: 0.800339... Val Loss: 0.726826
Epoch: 3/10... Step: 1500... Loss: 0.720274... Val Loss: 0.727138
Epoch: 3/10... Step: 1600... Loss: 0.678574... Val Loss: 0.727438
Epoch: 3/10... Step: 1700... Loss: 0.769911... Val Loss: 0.727512
Epoch: 3/10... Step: 1800... Loss: 0.837292... Val Loss: 0.727793
Epoch: 3/10... Step: 1900... Loss: 0.723575... Val Loss: 0.728066


 30%|███       | 3/10 [13:11<30:45, 263.63s/it]

Epoch: 4/10... Step: 2000... Loss: 0.679692... Val Loss: 0.728257
Epoch: 4/10... Step: 2100... Loss: 0.663667... Val Loss: 0.728495
Epoch: 4/10... Step: 2200... Loss: 0.825859... Val Loss: 0.728723
Epoch: 4/10... Step: 2300... Loss: 0.694486... Val Loss: 0.728972
Epoch: 4/10... Step: 2400... Loss: 0.770328... Val Loss: 0.729206
Epoch: 4/10... Step: 2500... Loss: 0.712810... Val Loss: 0.729421


 40%|████      | 4/10 [17:32<26:17, 262.89s/it]

Epoch: 5/10... Step: 2600... Loss: 0.760062... Val Loss: 0.729641
Epoch: 5/10... Step: 2700... Loss: 0.692623... Val Loss: 0.729824
Epoch: 5/10... Step: 2800... Loss: 0.730543... Val Loss: 0.730059
Epoch: 5/10... Step: 2900... Loss: 0.740862... Val Loss: 0.730217
Epoch: 5/10... Step: 3000... Loss: 0.697437... Val Loss: 0.730531
Epoch: 5/10... Step: 3100... Loss: 0.749698... Val Loss: 0.730792


 50%|█████     | 5/10 [21:59<22:00, 264.15s/it]

Epoch: 5/10... Step: 3200... Loss: 0.716705... Val Loss: 0.731104
Epoch: 6/10... Step: 3300... Loss: 0.797482... Val Loss: 0.731387
Epoch: 6/10... Step: 3400... Loss: 0.898836... Val Loss: 0.731568
Epoch: 6/10... Step: 3500... Loss: 0.701266... Val Loss: 0.731841
Epoch: 6/10... Step: 3600... Loss: 0.761352... Val Loss: 0.732111
Epoch: 6/10... Step: 3700... Loss: 0.762428... Val Loss: 0.732423
Epoch: 6/10... Step: 3800... Loss: 0.732591... Val Loss: 0.732703


 60%|██████    | 6/10 [26:19<17:32, 263.03s/it]

Epoch: 7/10... Step: 3900... Loss: 0.643148... Val Loss: 0.732800
Epoch: 7/10... Step: 4000... Loss: 0.769051... Val Loss: 0.733259
Epoch: 7/10... Step: 4100... Loss: 0.710745... Val Loss: 0.733558
Epoch: 7/10... Step: 4200... Loss: 0.778492... Val Loss: 0.733776
Epoch: 7/10... Step: 4300... Loss: 0.674938... Val Loss: 0.733963
Epoch: 7/10... Step: 4400... Loss: 0.738237... Val Loss: 0.734238


 70%|███████   | 7/10 [30:40<13:07, 262.40s/it]

Epoch: 8/10... Step: 4500... Loss: 0.663398... Val Loss: 0.734536
Epoch: 8/10... Step: 4600... Loss: 0.824229... Val Loss: 0.734974
Epoch: 8/10... Step: 4700... Loss: 0.738850... Val Loss: 0.735093
Epoch: 8/10... Step: 4800... Loss: 0.700915... Val Loss: 0.735350
Epoch: 8/10... Step: 4900... Loss: 0.704260... Val Loss: 0.735552
Epoch: 8/10... Step: 5000... Loss: 0.791826... Val Loss: 0.735887
Epoch: 8/10... Step: 5100... Loss: 0.667017... Val Loss: 0.736242


 80%|████████  | 8/10 [35:11<08:49, 264.76s/it]

Epoch: 9/10... Step: 5200... Loss: 0.687790... Val Loss: 0.736581
Epoch: 9/10... Step: 5300... Loss: 0.724714... Val Loss: 0.736788
Epoch: 9/10... Step: 5400... Loss: 0.753155... Val Loss: 0.737060
Epoch: 9/10... Step: 5500... Loss: 0.794759... Val Loss: 0.737317
Epoch: 9/10... Step: 5600... Loss: 0.796686... Val Loss: 0.737753
Epoch: 9/10... Step: 5700... Loss: 0.723066... Val Loss: 0.738056


 90%|█████████ | 9/10 [39:34<04:24, 264.35s/it]

Epoch: 10/10... Step: 5800... Loss: 0.775900... Val Loss: 0.738107
Epoch: 10/10... Step: 5900... Loss: 0.800279... Val Loss: 0.738283
Epoch: 10/10... Step: 6000... Loss: 0.761304... Val Loss: 0.738640
Epoch: 10/10... Step: 6100... Loss: 0.660460... Val Loss: 0.738997
Epoch: 10/10... Step: 6200... Loss: 0.726162... Val Loss: 0.739312
Epoch: 10/10... Step: 6300... Loss: 0.791093... Val Loss: 0.739800


100%|██████████| 10/10 [44:04<00:00, 264.43s/it]

Epoch: 10/10... Step: 6400... Loss: 0.688135... Val Loss: 0.740076





---
## Testing

There are a few ways to test your network.

* **Test data performance:** First, we'll see how our trained model performs on all of our defined test_data, above. We'll calculate the average loss and accuracy over the test data.

* **Inference on user-generated data:** Second, we'll see if we can input just one example review at a time (without a label), and see what the trained model predicts. Looking at new, user input data like this, and predicting an output label, is called **inference**.

In [75]:
# Get test data loss and accuracy

test_losses = [] # track loss
num_correct = 0

# init hidden state
h = net.init_hidden(batch_size)

net.eval()
# iterate over test data
for inputs, labels, authors in test_loader:

    # Creating new variables for the hidden state, otherwise
    # we'd backprop through the entire training history
    h = tuple([each.data for each in h])

    if(train_on_gpu):
        inputs, labels, authors = inputs.cuda(), labels.cuda(), authors.cuda()
    
    # get predicted outputs
    output, h = net(inputs, h, authors)
    
    # calculate loss
    test_loss = criterion(output.squeeze(), labels.double())
    test_losses.append(test_loss.item())
    
    # convert output probabilities to predicted class (0 or 1)
    pred = torch.round(output.squeeze())  # rounds to the nearest integer
    
    # compare predictions to true label
    correct_tensor = pred.eq(labels.float().view_as(pred))
    correct = np.squeeze(correct_tensor.numpy()) if not train_on_gpu else np.squeeze(correct_tensor.cpu().numpy())
    num_correct += np.sum(correct)


# -- stats! -- ##
# avg test loss
print("Test loss: {:.3f}".format(np.mean(test_losses)))

# accuracy over all test data
test_acc = num_correct/len(test_loader.dataset)
print("Test accuracy: {:.3f}".format(test_acc))

Test loss: 0.737
Test accuracy: 0.506


### Inference on a test review

You can change this test_review to any text that you want. Read it and think: is it pos or neg? Then see if your model predicts correctly!
    
> **Exercise:** Write a `predict` function that takes in a trained net, a plain text_review, and a sequence length, and prints out a custom statement for a positive or negative review!
* You can use any functions that you've already defined or define any helper functions you want to complete `predict`, but it should just take in a trained net, a text review, and a sequence length.


In [76]:
# negative test review
test_review_neg = 'The worst movie I have seen; acting was terrible and I want my money back. This movie had bad acting and the dialogue was slow.'


In [77]:
from string import punctuation

def tokenize_review(test_review):
    test_review = test_review.lower() # lowercase
    # get rid of punctuation
    test_text = ''.join([c for c in test_review if c not in punctuation])

    # splitting by spaces
    test_words = test_text.split()

    # tokens
    test_ints = []
    test_ints.append([vocab_to_int[word] for word in test_words])

    return test_ints

# test code and generate tokenized review
test_ints = tokenize_review(test_review_neg)
print(test_ints)

[[3, 805, 201, 1, 16, 409, 2217, 26, 1300, 6, 1, 76, 5, 358, 55, 30, 201, 64, 118, 2217, 6, 3, 43882, 26, 753]]


In [78]:
# test sequence padding
seq_length=35
features = pad_features(test_ints, seq_length)

print(features)

[[    0     0     0     0     0     0     0     0     0     0     3   805
    201     1    16   409  2217    26  1300     6     1    76     5   358
     55    30   201    64   118  2217     6     3 43882    26   753]]


In [79]:
# test conversion to tensor and pass into your model
feature_tensor = torch.from_numpy(features)
print(feature_tensor.size())

torch.Size([1, 35])


In [80]:
def predict(net, test_review, author=None, sequence_length=200, embed_size=300, batch_size=50):
    
    if author==None:
        author = np.zeros((batch_size, embed_size)) 
    
    net.eval()
    
    # tokenize review
    test_ints = tokenize_review(test_review)
    
    # pad tokenized sequence
    seq_length=sequence_length
    features = pad_features(test_ints, seq_length)
    
    # convert to tensor to pass into your model
    feature_tensor = torch.from_numpy(features)
    author_tensor = torch.from_numpy(author, dtype=torch.long)
    
    batch_size = feature_tensor.size(0)
    
    # initialize hidden state
    h = net.init_hidden(batch_size)
    
    if(train_on_gpu):
        feature_tensor = feature_tensor.cuda()
        author_tensor = author_tensor.cuda()
    
    # get the output from the model
    output, h = net(feature_tensor, h, author_tensor)
    
    # convert output probabilities to predicted class (0 or 1)
    pred = torch.round(output.squeeze()) 
    # printing output value, before rounding
    print('Prediction value, pre-rounding: {:.6f}'.format(output.item()))
    
    # print custom response
    if(pred.item()==1):
        print("Positive")
    else:
        print("Negative")
        

In [81]:
# positive test review
test_review_pos = "I love my life!"
author = "tiffanylue"

In [82]:
# call function
seq_length=35 # good to use the length that was trained on

predict(net, test_review_pos)

RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.DoubleTensor instead (while checking arguments for embedding)

### Try out test_reviews of your own!

Now that you have a trained model and a predict function, you can pass in _any_ kind of text and this model will predict whether the text has a positive or negative sentiment. Push this model to its limits and try to find what words it associates with positive or negative.

Later, you'll learn how to deploy a model like this to a production environment so that it can respond to any kind of user data put into a web app!