# TV Script Generation
In this notebook, we'll build an RNN model that generates scripts for a TV show.

The notebook implements some ideas found in [this /r/machinelearning post](https://www.reddit.com/r/MachineLearning/comments/3psqil/sentence_to_sentence_text_generation_using_lstms/), [this paper on applying dropout to LSTM units](https://arxiv.org/pdf/1409.2329.pdf), and [this paper on output embedding for language models](https://arxiv.org/pdf/1409.2329.pdf).

In [1]:
import torch

GPU_AVAILABLE = torch.cuda.is_available()
GPU_AVAILABLE

True

## Preprocessing the data

To start, we can use the [The Simpsons by the Data](https://www.kaggle.com/wcukierski/the-simpsons-by-the-data) dataset from Kaggle.

The TV scripts are originally in .csv format, but since we're only going to train our model on the raw text of the scripts, I've converted the .csv into a simple text file, [here](./data/simpsons/simpsons_script_lines.txt).

I've also split the original dataset into 50% [training](./data/simpsons/train.txt), 25% [testing](./data/simpsons/test.txt), and 25% [validation](./data/simpsons/valid.txt) sets.

In [2]:
%%bash
tail ./data/simpsons/simpsons_script_lines.txt

Marge Simpson: Yes.
Lisa Simpson: Can we do it this week?
(Springfield Elementary School: INT. ELEMENTARY - HALLWAY)
Lisa Simpson: (REHEARSING) Mr. Bergstrom, we request the pleasure of your company... no... Mr. Bergstrom, if you're not doing anything this Friday... no... Mr. Bergstrom, do you like pork chops... oh no, of course you wouldn't...
Miss Hoover: Good morning, Lisa.
Miss Hoover: (OFF LISA'S REACTION) I'm back.
Miss Hoover: You see, class, my Lyme disease turned out to be...
Miss Hoover: Psy-cho-so-ma-tic.
Ralph Wiggum: Does that mean you were crazy?
JANEY: No, that means she was faking it.

In Lisa's rehearsal line above, the word `no` appears three times, followed either by `...` or by `,`. Unfortunately, when we start adding words to our dictionary, `no...` and `no,` will be treated as entirely different words. To remedy this, we can create a token lookup function that maps each punctuation mark to its own word-like token:

In [3]:
def token_lookup():
    return {
        '.': '||period||',
        ',': '||comma||',
        '"': '||quotation_mark||',
        ';': '||semicolon||',
        '!': '||exclamation_mark||',
        '?': '||question_mark||',
        '(': '||l_parantheses||',
        ')': '||r_parantheses||',
        '-': '||dash||',
        '\'': '||apostrophe||',
        '\n': '||return||'
    }
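To see why this helps, here is a minimal sketch that applies the same replace-then-split idea to a short made-up line. It uses a trimmed-down copy of the lookup table, and `split_into_words` is a hypothetical helper written for this illustration, not part of the notebook's pipeline.

```python
# Illustrative only: a two-entry subset of the lookup table above.
PUNCTUATION_TOKENS = {
    '.': '||period||',
    ',': '||comma||',
}

def split_into_words(line):
    """Pad each punctuation mark with spaces as a token, then split on whitespace."""
    for mark, token in PUNCTUATION_TOKENS.items():
        line = line.replace(mark, ' {} '.format(token))
    return line.lower().split()

words = split_into_words("no... oh no, of course")
print(words)
print(words.count('no'))  # 2
```

Both occurrences of `no` now map to the same word, and the punctuation survives as separate tokens that the model can learn to emit.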

### Dictionary Class
The `Dictionary` class stores the word-to-index and index-to-word mappings for later reference. It provides `add_word` to register a word (returning its index) and `__len__` to report the number of distinct words.

In [4]:
import os
import torch

class Dictionary(object):
    def __init__(self):
        self.word_to_idx = {}
        self.idx_to_word = []
        
    def add_word(self, word):
        if word not in self.word_to_idx:
            self.idx_to_word.append(word)
            self.word_to_idx[word] = len(self.idx_to_word) -1
        return self.word_to_idx[word]
    
    def __len__(self):
        return len(self.idx_to_word)
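As a quick sanity check, the sketch below repeats the class definition so that it runs on its own, then adds a few words. The word list is made up for illustration.

```python
class Dictionary(object):
    """Same mapping logic as the class above, repeated so this block is self-contained."""
    def __init__(self):
        self.word_to_idx = {}
        self.idx_to_word = []

    def add_word(self, word):
        # Only new words get a fresh index; repeats return the existing one
        if word not in self.word_to_idx:
            self.idx_to_word.append(word)
            self.word_to_idx[word] = len(self.idx_to_word) - 1
        return self.word_to_idx[word]

    def __len__(self):
        return len(self.idx_to_word)

vocab = Dictionary()
ids = [vocab.add_word(w) for w in ['no', '||period||', 'no', 'yes']]
print(ids)                   # [0, 1, 0, 2]
print(len(vocab))            # 3
print(vocab.idx_to_word[2])  # yes
```

Adding `no` a second time returns the index it was given the first time, so the dictionary only grows when a genuinely new word appears.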

### Corpus Class
The `Corpus` class holds the training, testing, and validation sets for the model, along with a `Dictionary` of every word it has seen. Its `tokenize` method replaces punctuation with the tokens from `token_lookup` and lowercases the text. Note that the tokenization pads each punctuation token with spaces (delimiters), so that the token is split off from any adjacent word.

In [5]:
class Corpus(object):
    def __init__(self, path):
        self.dictionary = Dictionary()
        self.train = self.tokenize(os.path.join(path, 'train.txt'))
        self.test = self.tokenize(os.path.join(path, 'test.txt'))
        self.valid = self.tokenize(os.path.join(path, 'valid.txt'))
        
    def tokenize(self, path):
        assert os.path.exists(path), "Could not find file matching %s" % path
        
        # Add words to the dictionary
        with open(path, 'r') as f:
            tokens = 0
            for line in f:
                for key, tok in token_lookup().items():
                    line = line.replace(key, ' {} '.format(tok))
                line = line.lower()
                
                words = line.split() + ['||return||']
                tokens += len(words)
                
                for word in words:
                    self.dictionary.add_word(word)
        
        # Tokenize the words and return a LongTensor of ids
        with open(path, 'r') as f:
            ids = torch.LongTensor(tokens)
            token = 0
            for line in f:
                for key, tok in token_lookup().items():
                    line = line.replace(key, ' {} '.format(tok))
                line = line.lower()
                words = line.split() + ['||return||']
                
                for word in words:
                    ids[token] = self.dictionary.word_to_idx[word]
                    token += 1
        return ids            

### Load the Data
With those two classes, we can actually load our data:

In [6]:
pathname = './data/simpsons/'
corpus = Corpus(pathname)

### Batch the data
We need to split up our data into batches for training. We can do so with a batchify method like so:

In [7]:
def batchify(data, batch_size):
    # Work out how cleanly we can divide the dataset into :batch_size: parts
    n_batches = data.size(0) // batch_size
    
    # Trim off any extra elements that wouldn't cleanly fit (remainders)
    data = data.narrow(0, 0, n_batches * batch_size)
    
    # Evenly divide the data across the :batch_size: batches.
    data = data.view(batch_size, -1).t().contiguous()
    
    if GPU_AVAILABLE:
        data = data.cuda()
    
    return data
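The layout that `batchify` produces is easier to see on a toy input. The sketch below mimics the trim, `view`, and transpose steps with plain Python lists; `batchify_list` is a hypothetical stand-in written for this illustration and does not touch tensors or the GPU.

```python
def batchify_list(data, batch_size):
    # Number of rows each column can hold once the remainder is trimmed
    n_rows = len(data) // batch_size
    data = data[:n_rows * batch_size]

    # Each column is one contiguous chunk of the original sequence
    columns = [data[j * n_rows:(j + 1) * n_rows] for j in range(batch_size)]

    # Transpose so that row i holds the i-th element of every column
    return [[column[i] for column in columns] for i in range(n_rows)]

letters = list('abcdefghijklmnopqrstuvwxyz')
batched = batchify_list(letters, 4)
for row in batched:
    print(row)
```

With 26 letters and a batch size of 4, the last two letters are dropped and the result has 6 rows. Reading down a column gives a contiguous stretch of the text (`a` to `f`, `g` to `l`, and so on), which is what lets the hidden state carry over from one slice of rows to the next during training.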

In [8]:
evaluation_batch_size = 10
batch_size = 20
train_data = batchify(corpus.train, batch_size)
test_data = batchify(corpus.test, evaluation_batch_size)
validation_data = batchify(corpus.valid, evaluation_batch_size)
n_tokens = len(corpus.dictionary)

## Build the Model

In [9]:
import torch.nn as nn
from torch.autograd import Variable

class RNNModel(nn.Module):
    def __init__(self, n_tokens, n_inputs, n_hidden, n_layers, dropout=0.5):
        super(RNNModel, self).__init__()
        
        # Randomly zeroes some of the elements of
        #  the input tensor with probability :dropout:
        #  using samples from a bernoulli distribution
        self.dropout = nn.Dropout(dropout)
        
        # A simple lookup table that stores embeddings 
        #  of a fixed dictionary and size.
        # We are using it to store word embeddings
        self.encoder = nn.Embedding(n_tokens, n_inputs)
        
        # A multi-layer long short-term memory cell
        self.rnn = nn.LSTM(n_inputs, n_hidden, n_layers, dropout=dropout)
        
        # A layer to apply a linear transformation to the incoming data
        self.decoder = nn.Linear(n_hidden, n_tokens)
        
        self.init_weights()
        
        self.n_hidden = n_hidden
        self.n_layers = n_layers
        
    def init_weights(self):
        # Initialize the weights of our encoder and decoder with random 
        #  values sampled from a uniform distribution over (-0.1, 0.1).
        # Also set the bias of our linear transformation to 0
        initrange = 0.1
        self.encoder.weight.data.uniform_(-initrange, initrange)
        self.decoder.bias.data.fill_(0)
        self.decoder.weight.data.uniform_(-initrange, initrange)
        
    def forward(self, input, hidden):
        # At every call, run the input through here
        embedding = self.dropout(self.encoder(input))
        output, hidden = self.rnn(embedding, hidden)
        output = self.dropout(output)
        decoded = self.decoder(output.view(output.size(0)*output.size(1), output.size(2)))
        return decoded.view(output.size(0), output.size(1), decoded.size(1)), hidden
    
    def init_hidden(self, batch_size):
        # Initialize the hidden layer with zeros
        weight = next(self.parameters()).data
        
        return (
            Variable(weight.new(self.n_layers, batch_size, self.n_hidden).zero_()),
            Variable(weight.new(self.n_layers, batch_size, self.n_hidden).zero_())
        )

### Define the Hyperparameters

In [10]:
embedding_size = 1500
num_hidden = 650
dropout_probability = 0.65
num_epochs = 50
num_layers = 2
num_tokens = len(corpus.dictionary)
sequence_length = 15
learning_rate = 10.0
gradient_clipping = 0.3
logging_interval = 250
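To get a feel for how large a model these settings describe, we can count its parameters by hand. This is a rough sketch: the vocabulary size of 10,000 is a made-up placeholder (the real value is `len(corpus.dictionary)`), and the LSTM formula assumes the usual layout of four gates per layer, each with input weights, recurrent weights, and two bias vectors.

```python
def embedding_params(n_tokens, n_inputs):
    # One vector of size n_inputs per word in the vocabulary
    return n_tokens * n_inputs

def lstm_params(n_inputs, n_hidden, n_layers):
    total = 0
    for layer in range(n_layers):
        # Only the first layer sees the embeddings; later layers see hidden states
        layer_input = n_inputs if layer == 0 else n_hidden
        total += 4 * (layer_input * n_hidden + n_hidden * n_hidden + 2 * n_hidden)
    return total

def linear_params(n_hidden, n_tokens):
    # Weight matrix plus one bias per output word
    return n_hidden * n_tokens + n_tokens

assumed_vocab_size = 10000  # hypothetical, for illustration only

lstm_total = lstm_params(1500, 650, 2)
print(embedding_params(assumed_vocab_size, 1500))  # 15000000
print(lstm_total)                                  # 8980400
print(linear_params(650, assumed_vocab_size))      # 6510000
```

The LSTM count does not depend on the vocabulary, while the embedding and decoder layers both grow linearly with it.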

### Build the Model

In [11]:
model = RNNModel(num_tokens, embedding_size, num_hidden, num_layers, dropout_probability)

if GPU_AVAILABLE:
    model.cuda()

### Define our Criterion
We will calculate the loss with PyTorch's `nn.CrossEntropyLoss`. This criterion applies `LogSoftmax` to the model's raw output scores and passes the result to a negative log likelihood loss (`nn.NLLLoss`).

In [12]:
criterion = nn.CrossEntropyLoss()
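For a single prediction, that combination can be written out in a few lines of plain Python. This is a sketch of the arithmetic only, with made-up scores; the real criterion works on batched tensors and averages over all predictions by default.

```python
import math

def log_softmax(scores):
    # Subtract the maximum first so that exp() cannot overflow
    largest = max(scores)
    log_total = largest + math.log(sum(math.exp(s - largest) for s in scores))
    return [s - log_total for s in scores]

def cross_entropy(scores, target):
    # Negative log likelihood of the correct class
    return -log_softmax(scores)[target]

scores = [2.0, 0.5, -1.0]  # made-up output scores for a three-word vocabulary
loss = cross_entropy(scores, target=0)
uniform_loss = cross_entropy([0.0, 0.0, 0.0, 0.0], target=1)

print(loss)
print(uniform_loss, math.log(4))
```

When every word gets the same score, the loss equals the log of the vocabulary size, which is a useful baseline: a model that has learned anything should do better than that.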

## Train the Model

`repackage_hidden_states` detaches the hidden states from their history, so that we avoid backpropagating through the entire training history.

In [13]:
def repackage_hidden_states(h):
    if type(h) == Variable:
        return Variable(h.data)
    else:
        return tuple(repackage_hidden_states(v) for v in h)

`get_batch` slices out one input sequence together with its targets (the same sequence shifted one step ahead) for a training or evaluation pass.

In [14]:
def get_batch(source, i, evaluation=False):
    sequence_len = min(sequence_length, len(source)-1-i)
    data = Variable(source[i:i+sequence_len], volatile=evaluation)
    target = Variable(source[i+1:i+1+sequence_len].view(-1))
    
    return data, target
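The one-step shift between inputs and targets is the heart of the training setup, so here is the same slicing arithmetic on a plain list. `get_batch_from_list` is a hypothetical simplification for a single column of made-up ids; the real function slices rows of the batched tensor.

```python
def get_batch_from_list(source, i, sequence_length):
    # Shorten the final slice so that every input still has a next-word target
    seq_len = min(sequence_length, len(source) - 1 - i)
    data = source[i:i + seq_len]
    target = source[i + 1:i + 1 + seq_len]
    return data, target

token_ids = [10, 11, 12, 13, 14, 15]  # made-up ids for illustration

first = get_batch_from_list(token_ids, 0, sequence_length=3)
last = get_batch_from_list(token_ids, 3, sequence_length=3)
print(first)  # ([10, 11, 12], [11, 12, 13])
print(last)   # ([13, 14], [14, 15])
```

Each target is simply the word that follows the corresponding input, and the last slice is shorter because the final id has no successor to predict.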

`evaluate` turns on evaluation mode, which disables dropout, and returns the average loss over a dataset.

In [15]:
def evaluate(data_source):
    model.eval()
    total_loss = 0
    ntok = len(corpus.dictionary)
    hidden = model.init_hidden(evaluation_batch_size)
    
    for i in range(0, data_source.size(0) -1, sequence_length):
        data, targets = get_batch(data_source, i, evaluation=True)
        output, hidden = model(data, hidden)
        output_flat = output.view(-1, ntok)
        total_loss += len(data) * criterion(output_flat, targets).data
        hidden = repackage_hidden_states(hidden)
    return total_loss[0] / len(data_source)
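Two small pieces of arithmetic sit behind the numbers we report: a length-weighted average of the per-chunk losses, and perplexity, which is the exponential of that average. The sketch below shows both with made-up values; `average_loss` and `perplexity` are hypothetical helpers, not functions from the notebook.

```python
import math

def average_loss(chunk_losses, chunk_lengths):
    # Weight each chunk by its length so a short final chunk counts for less
    total = sum(loss * n for loss, n in zip(chunk_losses, chunk_lengths))
    return total / sum(chunk_lengths)

def perplexity(mean_loss):
    return math.exp(mean_loss)

mean_loss = average_loss([2.0, 4.0], [15, 5])
print(mean_loss)                     # 2.5
print(perplexity(math.log(50.0)))
```

A perplexity of 50 means the model is, on average, about as uncertain as if it were choosing uniformly among 50 words at every step.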

In [16]:
import math
import time
import numpy as np
import matplotlib.pyplot as plt

def train():
    lr = learning_rate
    best_validation_loss = None
#     validation_losses = np.zeros(num_epochs)
#     learning_rates = np.zeros(num_epochs)
#     print(validation_losses)
#     print(learning_rates)
    
    for epoch in range(1, num_epochs+1):
        epoch_start_time = time.time()
        
        model.train()
        total_loss = 0
        n_tokens = len(corpus.dictionary)
        hidden = model.init_hidden(batch_size)
        start_time = time.time()

        for batch, i in enumerate(range(0, train_data.size(0) - 1, sequence_length)):
            data, targets = get_batch(train_data, i)

            hidden = repackage_hidden_states(hidden)
            model.zero_grad()
            output, hidden = model(data, hidden)
            loss = criterion(output.view(-1, n_tokens), targets)
            loss.backward()

            # clip_grad_norm prevents the exploding gradient problem in RNNs and LSTMs
            torch.nn.utils.clip_grad_norm(model.parameters(), gradient_clipping)

            for p in model.parameters():
                p.data.add_(-lr, p.grad.data)

            total_loss += loss.data

            if batch % logging_interval == 0 and batch > 0:
                current_loss = total_loss[0] / logging_interval
                elapsed = time.time() - start_time
                print('Epoch {:3d} {:5d}/{:5d} batches: lr {:02.2f}, loss {:5.2f}, perplexity: {:8.2f}'.format(
                        epoch, batch, len(train_data) // sequence_length, lr, current_loss, math.exp(current_loss))
                     )
                total_loss = 0
        validation_loss = evaluate(validation_data)
        
#         validation_losses = np.put(validation_losses, epoch+1, validation_loss)
#         learning_rates = np.put(learning_rates, epoch+1, lr)
        
#         fig = plt.figure()
#         ax = plt.subplot(111)
#         print(learning_rates)
#         print(np.arange(1, num_epochs+1))
#         ax.plot(np.arange(1, num_epochs+1), learning_rates, label='Learning Rate')
#         ax.plot(np.arange(1, num_epochs+1), validation_losses, label='Validation Loss')
#         ax.legend()
#         plt.show()
    
        
        
        print('='*76)
        print('Epoch {:3d} results: time: {:5.2f}s, validation loss {:5.2f}, perplexity {:8.2f}'.format(
            epoch, (time.time() - epoch_start_time), validation_loss, math.exp(validation_loss)
        ))
        print('='*76)

        
        # If the model's validation loss is the best we've seen, save the model
        if not best_validation_loss or validation_loss < best_validation_loss:
            with open('model_checkpoint.pt', 'wb') as f:
                torch.save(model, f)
            best_validation_loss = validation_loss
        else:
            # Anneal the learning rate if no improvement has been seen in the validation dataset
            lr /= 4.0
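The clipping step in the loop above rescales all gradients together whenever their combined norm exceeds `gradient_clipping`. Here is a minimal sketch of that idea on nested lists. The small `eps` term and the exact scaling rule are assumptions about how such a utility is typically written, not a copy of PyTorch's implementation.

```python
import math

def clip_by_global_norm(grads, max_norm, eps=1e-6):
    # L2 norm over every gradient value, across all parameters
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))

    # Shrink everything by the same factor, but never scale gradients up
    scale = max_norm / (total_norm + eps)
    if scale < 1.0:
        grads = [[g * scale for g in vec] for vec in grads]
    return grads, total_norm

clipped, norm_before = clip_by_global_norm([[3.0, 0.0], [4.0]], max_norm=0.3)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
print(norm_before)  # 5.0
print(norm_after)

untouched, _ = clip_by_global_norm([[0.1], [0.1]], max_norm=0.3)
print(untouched)    # [[0.1], [0.1]]
```

Because every value is multiplied by the same factor, the direction of the update is preserved and only its size is capped. After clipping, the loop applies a plain gradient descent step, moving each parameter by `-lr` times its gradient.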

In [17]:
# At any point we can hit Ctrl+C to break out of training early
try:
    train()
except KeyboardInterrupt:
    print('='*75)
    print('Exiting from training early')
    
# Load the best saved model
with open('model_checkpoint.pt', 'rb') as f:
    model = torch.load(f)

# Run on the test data
test_loss = evaluate(test_data)
print('='*75)
print('\nEnd of training. Test loss: {:5.2f}, Test perplexity: {:8.2f}'.format(
    test_loss, math.exp(test_loss)
))
print('='*75)

Epoch   1   250/ 6203 batches: lr 10.00, loss  6.41, perplexity:   608.77
Epoch   1   500/ 6203 batches: lr 10.00, loss  5.05, perplexity:   156.34
Epoch   1   750/ 6203 batches: lr 10.00, loss  4.74, perplexity:   114.48
Epoch   1  1000/ 6203 batches: lr 10.00, loss  4.53, perplexity:    92.61
Epoch   1  1250/ 6203 batches: lr 10.00, loss  4.44, perplexity:    84.88
Epoch   1  1500/ 6203 batches: lr 10.00, loss  4.39, perplexity:    80.91
Epoch   1  1750/ 6203 batches: lr 10.00, loss  4.42, perplexity:    82.88
Epoch   1  2000/ 6203 batches: lr 10.00, loss  4.36, perplexity:    78.38
Epoch   1  2250/ 6203 batches: lr 10.00, loss  4.32, perplexity:    74.83
Epoch   1  2500/ 6203 batches: lr 10.00, loss  4.22, perplexity:    68.10
Epoch   1  2750/ 6203 batches: lr 10.00, loss  4.25, perplexity:    69.90
Epoch   1  3000/ 6203 batches: lr 10.00, loss  4.27, perplexity:    71.78
Epoch   1  3250/ 6203 batches: lr 10.00, loss  4.16, perplexity:    64.11
Epoch   1  3500/ 6203 batches: lr 10.0

Epoch   2   250/ 6203 batches: lr 10.00, loss  3.97, perplexity:    52.92
Epoch   2   500/ 6203 batches: lr 10.00, loss  3.93, perplexity:    51.16
Epoch   2   750/ 6203 batches: lr 10.00, loss  3.92, perplexity:    50.56
Epoch   2  1000/ 6203 batches: lr 10.00, loss  3.85, perplexity:    47.03
Epoch   2  1250/ 6203 batches: lr 10.00, loss  3.87, perplexity:    47.75
Epoch   2  1500/ 6203 batches: lr 10.00, loss  3.89, perplexity:    48.82
Epoch   2  1750/ 6203 batches: lr 10.00, loss  3.94, perplexity:    51.54
Epoch   2  2000/ 6203 batches: lr 10.00, loss  3.94, perplexity:    51.48
Epoch   2  2250/ 6203 batches: lr 10.00, loss  3.92, perplexity:    50.51
Epoch   2  2500/ 6203 batches: lr 10.00, loss  3.86, perplexity:    47.57
Epoch   2  2750/ 6203 batches: lr 10.00, loss  3.90, perplexity:    49.44
Epoch   2  3000/ 6203 batches: lr 10.00, loss  3.95, perplexity:    51.89
Epoch   2  3250/ 6203 batches: lr 10.00, loss  3.86, perplexity:    47.28
Epoch   2  3500/ 6203 batches: lr 10.0

Epoch   6  1000/ 6203 batches: lr 10.00, loss  3.52, perplexity:    33.76
Epoch   6  1250/ 6203 batches: lr 10.00, loss  3.55, perplexity:    34.98
Epoch   6  1500/ 6203 batches: lr 10.00, loss  3.56, perplexity:    35.29
Epoch   6  1750/ 6203 batches: lr 10.00, loss  3.62, perplexity:    37.27
Epoch   6  2000/ 6203 batches: lr 10.00, loss  3.63, perplexity:    37.72
Epoch   6  2250/ 6203 batches: lr 10.00, loss  3.62, perplexity:    37.30
Epoch   6  2500/ 6203 batches: lr 10.00, loss  3.56, perplexity:    35.12
Epoch   6  2750/ 6203 batches: lr 10.00, loss  3.59, perplexity:    36.40
Epoch   6  3000/ 6203 batches: lr 10.00, loss  3.65, perplexity:    38.60
Epoch   6  3250/ 6203 batches: lr 10.00, loss  3.56, perplexity:    35.24
Epoch   6  3500/ 6203 batches: lr 10.00, loss  3.55, perplexity:    34.75
Epoch   6  3750/ 6203 batches: lr 10.00, loss  3.61, perplexity:    37.10
Epoch   6  4000/ 6203 batches: lr 10.00, loss  3.59, perplexity:    36.15
Epoch   6  4250/ 6203 batches: lr 10.0

Epoch  10  1750/ 6203 batches: lr 10.00, loss  3.50, perplexity:    33.04
Epoch  10  2000/ 6203 batches: lr 10.00, loss  3.51, perplexity:    33.53
Epoch  10  2250/ 6203 batches: lr 10.00, loss  3.50, perplexity:    33.14
Epoch  10  2500/ 6203 batches: lr 10.00, loss  3.43, perplexity:    31.03
Epoch  10  2750/ 6203 batches: lr 10.00, loss  3.48, perplexity:    32.57
Epoch  10  3000/ 6203 batches: lr 10.00, loss  3.54, perplexity:    34.38
Epoch  10  3250/ 6203 batches: lr 10.00, loss  3.45, perplexity:    31.59
Epoch  10  3500/ 6203 batches: lr 10.00, loss  3.44, perplexity:    31.10
Epoch  10  3750/ 6203 batches: lr 10.00, loss  3.50, perplexity:    33.11
Epoch  10  4000/ 6203 batches: lr 10.00, loss  3.48, perplexity:    32.33
Epoch  10  4250/ 6203 batches: lr 10.00, loss  3.44, perplexity:    31.11
Epoch  10  4500/ 6203 batches: lr 10.00, loss  3.43, perplexity:    30.98
Epoch  10  4750/ 6203 batches: lr 10.00, loss  3.46, perplexity:    31.90
Epoch  10  5000/ 6203 batches: lr 10.0

Epoch  14  2500/ 6203 batches: lr 10.00, loss  3.37, perplexity:    29.01
Epoch  14  2750/ 6203 batches: lr 10.00, loss  3.41, perplexity:    30.27
Epoch  14  3000/ 6203 batches: lr 10.00, loss  3.47, perplexity:    32.16
Epoch  14  3250/ 6203 batches: lr 10.00, loss  3.38, perplexity:    29.27
Epoch  14  3500/ 6203 batches: lr 10.00, loss  3.37, perplexity:    29.03
Epoch  14  3750/ 6203 batches: lr 10.00, loss  3.43, perplexity:    30.88
Epoch  14  4000/ 6203 batches: lr 10.00, loss  3.40, perplexity:    30.05
Epoch  14  4250/ 6203 batches: lr 10.00, loss  3.37, perplexity:    29.05
Epoch  14  4500/ 6203 batches: lr 10.00, loss  3.36, perplexity:    28.85
Epoch  14  4750/ 6203 batches: lr 10.00, loss  3.39, perplexity:    29.66
Epoch  14  5000/ 6203 batches: lr 10.00, loss  3.37, perplexity:    28.98
Epoch  14  5250/ 6203 batches: lr 10.00, loss  3.38, perplexity:    29.48
Epoch  14  5500/ 6203 batches: lr 10.00, loss  3.40, perplexity:    29.84
Epoch  14  5750/ 6203 batches: lr 10.0

Epoch  18  3500/ 6203 batches: lr 0.62, loss  3.22, perplexity:    25.14
Epoch  18  3750/ 6203 batches: lr 0.62, loss  3.28, perplexity:    26.57
Epoch  18  4000/ 6203 batches: lr 0.62, loss  3.26, perplexity:    25.97
Epoch  18  4250/ 6203 batches: lr 0.62, loss  3.22, perplexity:    25.08
Epoch  18  4500/ 6203 batches: lr 0.62, loss  3.22, perplexity:    24.94
Epoch  18  4750/ 6203 batches: lr 0.62, loss  3.24, perplexity:    25.49
Epoch  18  5000/ 6203 batches: lr 0.62, loss  3.21, perplexity:    24.79
Epoch  18  5250/ 6203 batches: lr 0.62, loss  3.22, perplexity:    25.13
Epoch  18  5500/ 6203 batches: lr 0.62, loss  3.23, perplexity:    25.19
Epoch  18  5750/ 6203 batches: lr 0.62, loss  3.25, perplexity:    25.76
Epoch  18  6000/ 6203 batches: lr 0.62, loss  3.25, perplexity:    25.72
Epoch  18 results: time: 454.12s, validation loss  3.60, perplexity    36.68
Epoch  19   250/ 6203 batches: lr 0.62, loss  3.27, perplexity:    26.24
Epoch  19   500/ 6203 batches: lr 0.62, loss  3

Epoch  22  4500/ 6203 batches: lr 0.16, loss  3.19, perplexity:    24.24
Epoch  22  4750/ 6203 batches: lr 0.16, loss  3.21, perplexity:    24.74
Epoch  22  5000/ 6203 batches: lr 0.16, loss  3.18, perplexity:    24.16
Epoch  22  5250/ 6203 batches: lr 0.16, loss  3.20, perplexity:    24.45
Epoch  22  5500/ 6203 batches: lr 0.16, loss  3.20, perplexity:    24.63
Epoch  22  5750/ 6203 batches: lr 0.16, loss  3.22, perplexity:    25.10
Epoch  22  6000/ 6203 batches: lr 0.16, loss  3.23, perplexity:    25.25
Epoch  22 results: time: 454.37s, validation loss  3.60, perplexity    36.50
Epoch  23   250/ 6203 batches: lr 0.16, loss  3.24, perplexity:    25.63
Epoch  23   500/ 6203 batches: lr 0.16, loss  3.24, perplexity:    25.59
Epoch  23   750/ 6203 batches: lr 0.16, loss  3.24, perplexity:    25.56
Epoch  23  1000/ 6203 batches: lr 0.16, loss  3.17, perplexity:    23.90
Epoch  23  1250/ 6203 batches: lr 0.16, loss  3.21, perplexity:    24.87
Epoch  23  1500/ 6203 batches: lr 0.16, loss  3

Epoch  26  5500/ 6203 batches: lr 0.04, loss  3.20, perplexity:    24.61
Epoch  26  5750/ 6203 batches: lr 0.04, loss  3.23, perplexity:    25.18
Epoch  26  6000/ 6203 batches: lr 0.04, loss  3.22, perplexity:    25.04
Epoch  26 results: time: 453.92s, validation loss  3.60, perplexity    36.45
Epoch  27   250/ 6203 batches: lr 0.04, loss  3.24, perplexity:    25.61
Epoch  27   500/ 6203 batches: lr 0.04, loss  3.24, perplexity:    25.57
Epoch  27   750/ 6203 batches: lr 0.04, loss  3.24, perplexity:    25.48
Epoch  27  1000/ 6203 batches: lr 0.04, loss  3.18, perplexity:    23.93
Epoch  27  1250/ 6203 batches: lr 0.04, loss  3.21, perplexity:    24.84
Epoch  27  1500/ 6203 batches: lr 0.04, loss  3.21, perplexity:    24.67
Epoch  27  1750/ 6203 batches: lr 0.04, loss  3.26, perplexity:    25.99
Epoch  27  2000/ 6203 batches: lr 0.04, loss  3.27, perplexity:    26.42
Epoch  27  2250/ 6203 batches: lr 0.04, loss  3.27, perplexity:    26.26
Epoch  27  2500/ 6203 batches: lr 0.04, loss  3

Epoch  31   250/ 6203 batches: lr 0.01, loss  3.24, perplexity:    25.50
Epoch  31   500/ 6203 batches: lr 0.01, loss  3.24, perplexity:    25.56
Epoch  31   750/ 6203 batches: lr 0.01, loss  3.23, perplexity:    25.36
Epoch  31  1000/ 6203 batches: lr 0.01, loss  3.17, perplexity:    23.85
Epoch  31  1250/ 6203 batches: lr 0.01, loss  3.21, perplexity:    24.82
Epoch  31  1500/ 6203 batches: lr 0.01, loss  3.20, perplexity:    24.59
Epoch  31  1750/ 6203 batches: lr 0.01, loss  3.26, perplexity:    26.08
Epoch  31  2000/ 6203 batches: lr 0.01, loss  3.27, perplexity:    26.28
Epoch  31  2250/ 6203 batches: lr 0.01, loss  3.27, perplexity:    26.25
Epoch  31  2500/ 6203 batches: lr 0.01, loss  3.19, perplexity:    24.25
Epoch  31  2750/ 6203 batches: lr 0.01, loss  3.23, perplexity:    25.41
Epoch  31  3000/ 6203 batches: lr 0.01, loss  3.29, perplexity:    26.97
Epoch  31  3250/ 6203 batches: lr 0.01, loss  3.20, perplexity:    24.58
Epoch  31  3500/ 6203 batches: lr 0.01, loss  3.19,

Epoch  35  1250/ 6203 batches: lr 0.00, loss  3.20, perplexity:    24.66
Epoch  35  1500/ 6203 batches: lr 0.00, loss  3.21, perplexity:    24.76
Epoch  35  1750/ 6203 batches: lr 0.00, loss  3.26, perplexity:    26.05
Epoch  35  2000/ 6203 batches: lr 0.00, loss  3.27, perplexity:    26.23
Epoch  35  2250/ 6203 batches: lr 0.00, loss  3.27, perplexity:    26.24
Epoch  35  2500/ 6203 batches: lr 0.00, loss  3.20, perplexity:    24.45
Epoch  35  2750/ 6203 batches: lr 0.00, loss  3.23, perplexity:    25.40
Epoch  35  3000/ 6203 batches: lr 0.00, loss  3.29, perplexity:    26.97
Epoch  35  3250/ 6203 batches: lr 0.00, loss  3.21, perplexity:    24.71
Epoch  35  3500/ 6203 batches: lr 0.00, loss  3.19, perplexity:    24.38
Epoch  35  3750/ 6203 batches: lr 0.00, loss  3.25, perplexity:    25.78
Epoch  35  4000/ 6203 batches: lr 0.00, loss  3.23, perplexity:    25.22
Epoch  35  4250/ 6203 batches: lr 0.00, loss  3.19, perplexity:    24.34
Epoch  35  4500/ 6203 batches: lr 0.00, loss  3.18,

Epoch  39  2250/ 6203 batches: lr 0.00, loss  3.27, perplexity:    26.22
Epoch  39  2500/ 6203 batches: lr 0.00, loss  3.19, perplexity:    24.36
Epoch  39  2750/ 6203 batches: lr 0.00, loss  3.24, perplexity:    25.51
Epoch  39  3000/ 6203 batches: lr 0.00, loss  3.29, perplexity:    26.87
Epoch  39  3250/ 6203 batches: lr 0.00, loss  3.20, perplexity:    24.63
Epoch  39  3500/ 6203 batches: lr 0.00, loss  3.19, perplexity:    24.39
Epoch  39  3750/ 6203 batches: lr 0.00, loss  3.25, perplexity:    25.71
Epoch  39  4000/ 6203 batches: lr 0.00, loss  3.23, perplexity:    25.18
Epoch  39  4250/ 6203 batches: lr 0.00, loss  3.20, perplexity:    24.47
Epoch  39  4500/ 6203 batches: lr 0.00, loss  3.18, perplexity:    24.10
Epoch  39  4750/ 6203 batches: lr 0.00, loss  3.21, perplexity:    24.70
Epoch  39  5000/ 6203 batches: lr 0.00, loss  3.18, perplexity:    24.02
Epoch  39  5250/ 6203 batches: lr 0.00, loss  3.19, perplexity:    24.35
Epoch  39  5500/ 6203 batches: lr 0.00, loss  3.20,

Epoch  43  3250/ 6203 batches: lr 0.00, loss  3.21, perplexity:    24.70
Epoch  43  3500/ 6203 batches: lr 0.00, loss  3.19, perplexity:    24.24
Epoch  43  3750/ 6203 batches: lr 0.00, loss  3.25, perplexity:    25.83
Epoch  43  4000/ 6203 batches: lr 0.00, loss  3.23, perplexity:    25.24
Epoch  43  4250/ 6203 batches: lr 0.00, loss  3.19, perplexity:    24.36
Epoch  43  4500/ 6203 batches: lr 0.00, loss  3.18, perplexity:    24.04
Epoch  43  4750/ 6203 batches: lr 0.00, loss  3.20, perplexity:    24.66
Epoch  43  5000/ 6203 batches: lr 0.00, loss  3.18, perplexity:    23.93
Epoch  43  5250/ 6203 batches: lr 0.00, loss  3.19, perplexity:    24.37
Epoch  43  5500/ 6203 batches: lr 0.00, loss  3.20, perplexity:    24.54
Epoch  43  5750/ 6203 batches: lr 0.00, loss  3.22, perplexity:    25.14
Epoch  43  6000/ 6203 batches: lr 0.00, loss  3.22, perplexity:    25.05
Epoch  43 results: time: 454.72s, validation loss  3.60, perplexity    36.43
Epoch  44   250/ 6203 batches: lr 0.00, loss  3

Epoch  47  4250/ 6203 batches: lr 0.00, loss  3.19, perplexity:    24.34
Epoch  47  4500/ 6203 batches: lr 0.00, loss  3.18, perplexity:    24.12
Epoch  47  4750/ 6203 batches: lr 0.00, loss  3.20, perplexity:    24.65
Epoch  47  5000/ 6203 batches: lr 0.00, loss  3.18, perplexity:    23.96
Epoch  47  5250/ 6203 batches: lr 0.00, loss  3.19, perplexity:    24.38
Epoch  47  5500/ 6203 batches: lr 0.00, loss  3.20, perplexity:    24.55
Epoch  47  5750/ 6203 batches: lr 0.00, loss  3.22, perplexity:    25.11
Epoch  47  6000/ 6203 batches: lr 0.00, loss  3.22, perplexity:    25.05
Epoch  47 results: time: 454.89s, validation loss  3.60, perplexity    36.43
Epoch  48   250/ 6203 batches: lr 0.00, loss  3.24, perplexity:    25.56
Epoch  48   500/ 6203 batches: lr 0.00, loss  3.24, perplexity:    25.57
Epoch  48   750/ 6203 batches: lr 0.00, loss  3.23, perplexity:    25.30
Epoch  48  1000/ 6203 batches: lr 0.00, loss  3.17, perplexity:    23.82
Epoch  48  1250/ 6203 batches: lr 0.00, loss  3



/n End of training. Test loss:  3.63, Test perplexity:    37.88
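The learning rates in the log can be reproduced from the annealing rule in `train`, which divides the rate by 4 whenever the validation loss fails to improve. The sketch below simply applies that division repeatedly, using the same format string as the training log; `annealed_rates` is a hypothetical helper for illustration.

```python
def annealed_rates(initial_lr, n_reductions, factor=4.0):
    # Each reduction divides the previous rate by the same factor
    rates = [initial_lr]
    for _ in range(n_reductions):
        rates.append(rates[-1] / factor)
    return rates

rates = annealed_rates(10.0, 6)
formatted = ['{:02.2f}'.format(lr) for lr in rates]
print(formatted)  # ['10.00', '2.50', '0.62', '0.16', '0.04', '0.01', '0.00']
```

Note that the `lr 0.00` shown for the later epochs is a rounding effect of the two-decimal format: after six reductions the rate is about 0.0024, small but not zero. At that point the updates are tiny, which is consistent with the validation loss sitting at 3.60 for the second half of training.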


## Generate Sample Output
Now that we've trained our model, we can sample new script text from it.

In [18]:
random_seed = 423
num_output_words = 500

In [19]:
torch.manual_seed(random_seed)

if GPU_AVAILABLE:
    torch.cuda.manual_seed(random_seed)

with open('model_checkpoint.pt', 'rb') as f:
    model = torch.load(f)

model.eval()

if GPU_AVAILABLE:
    model.cuda()
else:
    model.cpu()

corpus = Corpus(pathname)
n_tokens = len(corpus.dictionary)
hidden = model.init_hidden(1)
input = Variable(torch.rand(1, 1).mul(n_tokens).long(), volatile=True)

if GPU_AVAILABLE:
    input.data = input.data.cuda()

def get_token_map():
    return {v: k for k, v in token_lookup().items()}

with open('output.txt', 'w') as outf:
    for i in range(num_output_words):
        output, hidden = model(input, hidden)
        word_weights = output.squeeze().data.exp().cpu()
        word_idx = torch.multinomial(word_weights, 1)[0]
        input.data.fill_(word_idx)
        word = corpus.dictionary.idx_to_word[word_idx]
        
        token_map = get_token_map()
        is_token = False
        
        if token_map.get(word):
            is_token = True
            word = token_map[word]
            
        outf.write(('' if is_token else ' ') + word)
        
        if i % logging_interval == 0:
            print('Generated {}/{} words'.format(i, num_output_words))

Generated 0/500 words
Generated 250/500 words
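The key step in the loop above is drawing the next word at random, with probability proportional to the exponential of its output score. Here is a minimal sketch of that sampling step using only the standard library. The scores are made up, and subtracting the maximum before exponentiating is an added safeguard against overflow that the cell above does not use.

```python
import math
import random

def sample_index(scores, rng):
    # Weights proportional to exp(score); the shift does not change the proportions
    largest = max(scores)
    weights = [math.exp(s - largest) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]

rng = random.Random(423)
scores = [5.0, 1.0, -1000.0]  # made-up scores for a three-word vocabulary
draws = [sample_index(scores, rng) for _ in range(1000)]

print(draws.count(0), draws.count(1), draws.count(2))
```

Sampling, rather than always taking the highest-scoring word, is what keeps the generated script from repeating the same phrase, while very unlikely words (like the third one here) are effectively never chosen.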


In [20]:
%%bash
cat ./output.txt

, doctor.

 crowd:( concerned gasp)

( springfield mall: int. springfield mall- airport)

 reporter #1: catch, nelson, our only score our beatles are as big as a mess of a tiara!

 martin prince: i reckon we' ve been asleep.

 miss hoover: not for chief wiggum, you have changed the pudding super- cheeked! i got big and case you' re the husband!( snaps fingers)

 homer simpson: really? well, they got sex and worthwhile.

 lenny leonard: yeah, i' m lookin' for me.

 lenny leonard: oh, hi.

 homer simpson: you know if emotion must be in the water scrupulous mae?

 moe szyslak: ah, hello, family shootout.

 moe szyslak: ah, you' re not worse than i mean, lou. that' s what i' m thinking of--

 barney gumble:( pained sounds)( fading quietly) don' t worry. we have to be out. here they are... whoever' s all our names. we' ll be thankful for all your communist!

 grampa simpson: stop pushing snowball! i' ve got to talk to that right.

 lenny leonard:( trying to stop homer) you merely get him a 