Link: https://pytorch.org/docs/master/nn.html


# RNN for Text Generation

## Generating Text (encoded variables)

We saw how to generate continuous values. Now let's see how to generalize this to generate categorical sequences (such as words or letters).

## Imports

In [1]:
import torch
from torch import nn
import torch.nn.functional as F

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## Get Text Data

In [2]:
from google.colab import drive
drive.mount('drive')

Mounted at drive


In [3]:
import os
import zipfile
import numpy as np

dir = "/content/drive/My Drive/Colab Notebooks/PyTorch/Udemy/Data/"
files = os.listdir(dir)
files

['meditations_by_marcus_aurelius.txt',
 'NYCTaxiFares.csv',
 'war_and_peace.txt',
 'sinewave.csv',
 'iris.csv',
 'TomSawyer.txt',
 'UK_Food',
 'income.csv',
 'bank.csv',
 'shakespeare.txt',
 'pride_and_prejudice.txt',
 '.ipynb_checkpoints',
 'TimeSeriesData',
 'MNIST',
 'FashionMNIST']

In [4]:
with open(dir + 'shakespeare.txt','r',encoding='utf8') as f:
    text = f.read()

In [5]:
text[:1000]

"\n                     1\n  From fairest creatures we desire increase,\n  That thereby beauty's rose might never die,\n  But as the riper should by time decease,\n  His tender heir might bear his memory:\n  But thou contracted to thine own bright eyes,\n  Feed'st thy light's flame with self-substantial fuel,\n  Making a famine where abundance lies,\n  Thy self thy foe, to thy sweet self too cruel:\n  Thou that art now the world's fresh ornament,\n  And only herald to the gaudy spring,\n  Within thine own bud buriest thy content,\n  And tender churl mak'st waste in niggarding:\n    Pity the world, or else this glutton be,\n    To eat the world's due, by the grave and thee.\n\n\n                     2\n  When forty winters shall besiege thy brow,\n  And dig deep trenches in thy beauty's field,\n  Thy youth's proud livery so gazed on now,\n  Will be a tattered weed of small worth held:  \n  Then being asked, where all thy beauty lies,\n  Where all the treasure of thy lusty days;\n  To sa

In [6]:
print(text[:1000])


                     1
  From fairest creatures we desire increase,
  That thereby beauty's rose might never die,
  But as the riper should by time decease,
  His tender heir might bear his memory:
  But thou contracted to thine own bright eyes,
  Feed'st thy light's flame with self-substantial fuel,
  Making a famine where abundance lies,
  Thy self thy foe, to thy sweet self too cruel:
  Thou that art now the world's fresh ornament,
  And only herald to the gaudy spring,
  Within thine own bud buriest thy content,
  And tender churl mak'st waste in niggarding:
    Pity the world, or else this glutton be,
    To eat the world's due, by the grave and thee.


                     2
  When forty winters shall besiege thy brow,
  And dig deep trenches in thy beauty's field,
  Thy youth's proud livery so gazed on now,
  Will be a tattered weed of small worth held:  
  Then being asked, where all thy beauty lies,
  Where all the treasure of thy lusty days;
  To say within thine own deep su

In [7]:
len(text)

5445609

## Encode Entire Text

In [8]:
all_characters = set(text)
all_characters

{'\n',
 ' ',
 '!',
 '"',
 '&',
 "'",
 '(',
 ')',
 ',',
 '-',
 '.',
 '0',
 '1',
 '2',
 '3',
 '4',
 '5',
 '6',
 '7',
 '8',
 '9',
 ':',
 ';',
 '<',
 '>',
 '?',
 'A',
 'B',
 'C',
 'D',
 'E',
 'F',
 'G',
 'H',
 'I',
 'J',
 'K',
 'L',
 'M',
 'N',
 'O',
 'P',
 'Q',
 'R',
 'S',
 'T',
 'U',
 'V',
 'W',
 'X',
 'Y',
 'Z',
 '[',
 ']',
 '_',
 '`',
 'a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'j',
 'k',
 'l',
 'm',
 'n',
 'o',
 'p',
 'q',
 'r',
 's',
 't',
 'u',
 'v',
 'w',
 'x',
 'y',
 'z',
 '|',
 '}'}

In [9]:
decoder = dict(enumerate(all_characters))
decoder

{0: 'i',
 1: 'm',
 2: 'a',
 3: ']',
 4: '"',
 5: 'J',
 6: 'q',
 7: 'c',
 8: ';',
 9: 'P',
 10: 'W',
 11: 'v',
 12: '>',
 13: 'y',
 14: 'O',
 15: '}',
 16: 'e',
 17: 'V',
 18: '[',
 19: 'E',
 20: 'd',
 21: '2',
 22: '\n',
 23: 'U',
 24: 'j',
 25: 'o',
 26: 'G',
 27: 'C',
 28: ':',
 29: 'x',
 30: 'L',
 31: '_',
 32: 'u',
 33: '&',
 34: ')',
 35: 'D',
 36: 'A',
 37: 'B',
 38: 'l',
 39: ' ',
 40: '1',
 41: 'I',
 42: '(',
 43: ',',
 44: '3',
 45: 's',
 46: 'w',
 47: 'S',
 48: 'f',
 49: 'Q',
 50: '`',
 51: '?',
 52: '8',
 53: 'K',
 54: '-',
 55: 'F',
 56: '!',
 57: '.',
 58: 'Y',
 59: 'R',
 60: '9',
 61: '0',
 62: 'T',
 63: 'n',
 64: 'h',
 65: 'M',
 66: 'z',
 67: "'",
 68: '6',
 69: '7',
 70: 'k',
 71: 'b',
 72: 'H',
 73: 'p',
 74: '<',
 75: '4',
 76: 'Z',
 77: 'X',
 78: '|',
 79: 'r',
 80: '5',
 81: 'N',
 82: 't',
 83: 'g'}

In [10]:
decoder.items()

dict_items([(0, 'i'), (1, 'm'), (2, 'a'), (3, ']'), (4, '"'), (5, 'J'), (6, 'q'), (7, 'c'), (8, ';'), (9, 'P'), (10, 'W'), (11, 'v'), (12, '>'), (13, 'y'), (14, 'O'), (15, '}'), (16, 'e'), (17, 'V'), (18, '['), (19, 'E'), (20, 'd'), (21, '2'), (22, '\n'), (23, 'U'), (24, 'j'), (25, 'o'), (26, 'G'), (27, 'C'), (28, ':'), (29, 'x'), (30, 'L'), (31, '_'), (32, 'u'), (33, '&'), (34, ')'), (35, 'D'), (36, 'A'), (37, 'B'), (38, 'l'), (39, ' '), (40, '1'), (41, 'I'), (42, '('), (43, ','), (44, '3'), (45, 's'), (46, 'w'), (47, 'S'), (48, 'f'), (49, 'Q'), (50, '`'), (51, '?'), (52, '8'), (53, 'K'), (54, '-'), (55, 'F'), (56, '!'), (57, '.'), (58, 'Y'), (59, 'R'), (60, '9'), (61, '0'), (62, 'T'), (63, 'n'), (64, 'h'), (65, 'M'), (66, 'z'), (67, "'"), (68, '6'), (69, '7'), (70, 'k'), (71, 'b'), (72, 'H'), (73, 'p'), (74, '<'), (75, '4'), (76, 'Z'), (77, 'X'), (78, '|'), (79, 'r'), (80, '5'), (81, 'N'), (82, 't'), (83, 'g')])

In [11]:
encoder = {char: ind for ind,char in decoder.items()}

In [12]:
encoder

{'\n': 22,
 ' ': 39,
 '!': 56,
 '"': 4,
 '&': 33,
 "'": 67,
 '(': 42,
 ')': 34,
 ',': 43,
 '-': 54,
 '.': 57,
 '0': 61,
 '1': 40,
 '2': 21,
 '3': 44,
 '4': 75,
 '5': 80,
 '6': 68,
 '7': 69,
 '8': 52,
 '9': 60,
 ':': 28,
 ';': 8,
 '<': 74,
 '>': 12,
 '?': 51,
 'A': 36,
 'B': 37,
 'C': 27,
 'D': 35,
 'E': 19,
 'F': 55,
 'G': 26,
 'H': 72,
 'I': 41,
 'J': 5,
 'K': 53,
 'L': 30,
 'M': 65,
 'N': 81,
 'O': 14,
 'P': 9,
 'Q': 49,
 'R': 59,
 'S': 47,
 'T': 62,
 'U': 23,
 'V': 17,
 'W': 10,
 'X': 77,
 'Y': 58,
 'Z': 76,
 '[': 18,
 ']': 3,
 '_': 31,
 '`': 50,
 'a': 2,
 'b': 71,
 'c': 7,
 'd': 20,
 'e': 16,
 'f': 48,
 'g': 83,
 'h': 64,
 'i': 0,
 'j': 24,
 'k': 70,
 'l': 38,
 'm': 1,
 'n': 63,
 'o': 25,
 'p': 73,
 'q': 6,
 'r': 79,
 's': 45,
 't': 82,
 'u': 32,
 'v': 11,
 'w': 46,
 'x': 29,
 'y': 13,
 'z': 66,
 '|': 78,
 '}': 15}
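
One caveat with building the mapping from a `set`: Python does not guarantee set iteration order, and for strings it can differ between interpreter sessions, so a model saved in one session could end up paired with a different mapping in another. A minimal sketch of one way around this, assuming it is acceptable to sort the characters first (the `build_vocab` helper and the `sample` string are hypothetical, not part of the notebook):

```python
def build_vocab(text):
    """Hypothetical helper: build index<->character maps in a stable order."""
    # sorted() fixes the order, so the same text always gives the same mapping
    chars = sorted(set(text))
    decoder = dict(enumerate(chars))
    encoder = {char: ind for ind, char in decoder.items()}
    return encoder, decoder

sample = "hello world"
encoder, decoder = build_vocab(sample)

# Round trip: characters -> indices -> characters
encoded = [encoder[c] for c in sample]
decoded = ''.join(decoder[i] for i in encoded)
print(encoder)
print(decoded)
```

The rest of this notebook keeps the unsorted mapping shown above, so its printed indices will not match a sorted vocabulary.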

In [13]:
encoded_text = np.array([encoder[char] for char in text])
encoded_text

array([22, 39, 39, ..., 19, 81, 35])

In [14]:
encoded_text[:500]

array([22, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39,
       39, 39, 39, 39, 39, 40, 22, 39, 39, 55, 79, 25,  1, 39, 48,  2,  0,
       79, 16, 45, 82, 39,  7, 79, 16,  2, 82, 32, 79, 16, 45, 39, 46, 16,
       39, 20, 16, 45,  0, 79, 16, 39,  0, 63,  7, 79, 16,  2, 45, 16, 43,
       22, 39, 39, 62, 64,  2, 82, 39, 82, 64, 16, 79, 16, 71, 13, 39, 71,
       16,  2, 32, 82, 13, 67, 45, 39, 79, 25, 45, 16, 39,  1,  0, 83, 64,
       82, 39, 63, 16, 11, 16, 79, 39, 20,  0, 16, 43, 22, 39, 39, 37, 32,
       82, 39,  2, 45, 39, 82, 64, 16, 39, 79,  0, 73, 16, 79, 39, 45, 64,
       25, 32, 38, 20, 39, 71, 13, 39, 82,  0,  1, 16, 39, 20, 16,  7, 16,
        2, 45, 16, 43, 22, 39, 39, 72,  0, 45, 39, 82, 16, 63, 20, 16, 79,
       39, 64, 16,  0, 79, 39,  1,  0, 83, 64, 82, 39, 71, 16,  2, 79, 39,
       64,  0, 45, 39,  1, 16,  1, 25, 79, 13, 28, 22, 39, 39, 37, 32, 82,
       39, 82, 64, 25, 32, 39,  7, 25, 63, 82, 79,  2,  7, 82, 16, 20, 39,
       82, 25, 39, 82, 64

In [16]:
decoder[39]

' '

## One Hot Encoding

As previously discussed, we need to one-hot encode our data in order for it to work with the network structure. Make sure to review NumPy if any of these operations confuse you!

In [17]:
def one_hot_encoder(encoded_text, num_uni_chars):
    '''
    encoded_text : batch of encoded text
    
    num_uni_chars = number of unique characters (len(set(text)))
    '''
    
    # METHOD FROM:
    # https://stackoverflow.com/questions/29831489/convert-array-of-indices-to-1-hot-encoded-numpy-array
      
    # Create a placeholder array of zeros.
    one_hot = np.zeros((encoded_text.size, num_uni_chars))
    
    # Convert data type for later use with PyTorch (errors if we don't!)
    one_hot = one_hot.astype(np.float32)

    # Using fancy indexing fill in the 1s at the correct index locations
    one_hot[np.arange(one_hot.shape[0]), encoded_text.flatten()] = 1.0
    

    # Reshape it so it matches the batch shape
    one_hot = one_hot.reshape((*encoded_text.shape, num_uni_chars))
    
    return one_hot

In [18]:
one_hot_encoder(np.array([1,2,0,1]),3)

array([[0., 1., 0.],
       [0., 0., 1.],
       [1., 0., 0.],
       [0., 1., 0.]], dtype=float32)
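
The same result can be obtained more compactly by indexing into an identity matrix. A minimal sketch, assuming the input holds valid integer indices below `num_uni_chars` (the name `one_hot_eye` is hypothetical):

```python
import numpy as np

def one_hot_eye(encoded_text, num_uni_chars):
    # Row i of the identity matrix is the one-hot vector for index i,
    # so fancy indexing with the encoded array builds the whole batch at once.
    return np.eye(num_uni_chars, dtype=np.float32)[encoded_text]

# Works for flat input and for (batch, seq_len) input alike
flat = one_hot_eye(np.array([1, 2, 0, 1]), 3)
batch = one_hot_eye(np.array([[1, 2], [0, 1]]), 3)
print(flat)
print(batch.shape)
```

This trades a little memory (the identity matrix) for brevity; with 84 characters that cost is negligible.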

--------------
---------------
# Creating Training Batches

We need to create a function that will generate batches of characters along with the next character in the sequence as a label.

-----------------
------------

In [19]:
example_text = np.arange(10)

In [20]:
example_text

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [21]:
# If we wanted 5 batches
example_text.reshape((5,-1))

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

In [22]:
def generate_batches(encoded_text, samp_per_batch=10, seq_len=50):
    
    '''
    Generate (using yield) batches for training.
    
    X: Encoded Text of length seq_len
    Y: Encoded Text shifted by one
    
    Example:
    
    X:
    
    [[1 2 3]]
    
    Y:
    
    [[ 2 3 4]]
    
    encoded_text : Complete Encoded Text to make batches from
    samp_per_batch : Number of samples per batch
    seq_len : Length of character sequence
       
    '''
    
    # Total number of characters per batch
    # Example: If samp_per_batch is 2 and seq_len is 50, then 100 characters come out per batch.
    char_per_batch = samp_per_batch * seq_len
    
    
    # Number of batches available to make
    # Integer division rounds down, so only full batches are counted
    num_batches_avail = len(encoded_text) // char_per_batch
    
    # Cut off end of encoded_text that won't fit evenly into a batch
    encoded_text = encoded_text[:num_batches_avail * char_per_batch]
    
    
    # Reshape text into samp_per_batch rows, each a contiguous stretch of the text
    encoded_text = encoded_text.reshape((samp_per_batch, -1))
    

    # Step across the columns in windows of seq_len characters.
    for n in range(0, encoded_text.shape[1], seq_len):
        
        # Grab feature characters
        x = encoded_text[:, n:n+seq_len]
        
        # y is the target shifted over by 1
        y = np.zeros_like(x)
       
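        # The last target in each row is the character right after the window
        try:
            y[:, :-1] = x[:, 1:]
            y[:, -1]  = encoded_text[:, n+seq_len]
            
        # At the final window there is no next character, so wrap around to the start of the row
        except IndexError:
            y[:, :-1] = x[:, 1:]
            y[:, -1] = encoded_text[:, 0]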
            
        yield x, y

### Example of generating a batch

In [23]:
sample_text = np.arange(20) # encoded_text[:20]

In [24]:
sample_text

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

In [25]:
batch_generator = generate_batches(sample_text,samp_per_batch=2,seq_len=5)

In [26]:
# Grab first batch
x, y = next(batch_generator)

In [27]:
x

array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14]])

In [28]:
y

array([[ 1,  2,  3,  4,  5],
       [11, 12, 13, 14, 15]])

In [29]:
x, y = next(batch_generator)

In [30]:
x

array([[ 5,  6,  7,  8,  9],
       [15, 16, 17, 18, 19]])

In [31]:
y

array([[ 6,  7,  8,  9,  0],
       [16, 17, 18, 19, 10]])
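
The wrap-around in the last batch (the `0` and `10` at the end of each row) comes from the fallback branch in `generate_batches`. As a rough sketch of the same idea without exception handling, the target window can be built with modular column indices. The helper name `shifted_targets` is hypothetical and only illustrates the indexing:

```python
import numpy as np

def shifted_targets(rows, start, seq_len):
    """Hypothetical helper: slice a window and its next-character targets."""
    x = rows[:, start:start + seq_len]
    # Same window moved one step right; the modulo wraps to column 0 at the row end
    cols = np.arange(start + 1, start + seq_len + 1) % rows.shape[1]
    y = rows[:, cols]
    return x, y

rows = np.arange(20).reshape((2, -1))

x1, y1 = shifted_targets(rows, 0, 5)   # first window
x2, y2 = shifted_targets(rows, 5, 5)   # last window, wraps around
print(y1)
print(y2)
```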

--------

## GPU Check

Remember this will take a lot longer on CPU!

In [32]:
torch.cuda.is_available()

True

# Creating the LSTM Model

**Note! We will have options for GPU users and CPU users. CPU will take MUCH LONGER to train and you may encounter RAM issues depending on your hardware. If that is the case, consider using cloud services like AWS, GCP, or Azure. Note, these may cost you money to use!**

https://discuss.pytorch.org/uploads/default/original/2X/d/d8688e43375fd7a246b5d64047b64393d80983be.png

In [33]:
# num_layers in RNN is just stacking RNNs on top of each other. 
# So you get a hidden from each layer and an output only from the topmost layer

# The output for the LSTM is the output for all the hidden nodes on the final layer.
# hidden_size - the number of LSTM blocks per layer.
#            - the number of hidden features in hidden state h
# input_size - the number of input features per time-step.
#            - the number of expected features in input x
# num_layers - the number of hidden layers.

# Input of LSTM nn.LSTM(input_size=input_size, hidden_size=hidden_size,
                            # num_layers=num_layers, batch_first=True)

# In total there are hidden_size * num_layers LSTM blocks.

# With batch_first=True (as used below), the input dimensions are (batch, seq_len, input_size).
# Without it, the default layout is (seq_len, batch, input_size).
# seq_len - the number of time steps in each input stream.
# batch - the size of each batch of input sequences.

# The hidden and cell dimensions are: (num_layers, batch, hidden_size)
# batch_first does not change these.

# output shape (batch, seq_len, hidden_size * num_directions) with batch_first=True: 
# tensor containing the output features (h_t) from the last layer of the RNN, for each t.

# So there will be hidden_size * num_directions outputs. The LSTM below is not bidirectional, so num_directions is 1 and output_size = hidden_size.


class CharModel(nn.Module):
    
    def __init__(self, all_chars, num_hidden=256, num_layers=4,drop_prob=0.5,use_gpu=False):
        
        
        # SET UP ATTRIBUTES
        super().__init__()
        self.drop_prob = drop_prob
        self.num_layers = num_layers
        self.num_hidden = num_hidden
        self.use_gpu = use_gpu
        
        #CHARACTER SET, ENCODER, and DECODER
        self.all_chars = all_chars
        self.decoder = dict(enumerate(all_chars))
        # Build the encoder from this model's own decoder, not the notebook-level one
        self.encoder = {char: ind for ind,char in self.decoder.items()}
        
        # LSTM input x of shape (seq_len, batch, input_size) by default
        # If h0 and c0 are not provided they default to 0
        # Hidden state h0 and cell state c0 are each of shape (num_layers*num_directions, batch, hidden_size)
        # Output of shape (seq_len, batch, num_directions * hidden_size); h_n and c_n have the same shape as h0 and c0

        # If batch_first=True, input and output are provided in this format (batch, seq, feature)
        self.lstm = nn.LSTM(len(self.all_chars), num_hidden, num_layers, dropout=drop_prob, batch_first=True)
        
        self.dropout = nn.Dropout(drop_prob)
        
        self.fc_linear = nn.Linear(num_hidden, len(self.all_chars))
      
    
    def forward(self, x, hidden):
                  
        
        lstm_output, hidden = self.lstm(x, hidden)
        
        
        drop_output = self.dropout(lstm_output)
        
        drop_output = drop_output.contiguous().view(-1, self.num_hidden)
        
        
        final_out = self.fc_linear(drop_output)
        
        
        return final_out, hidden
    
    
    def hidden_state(self, batch_size):
        '''
        Used as separate method to account for both GPU and CPU users.
        '''
        
        if self.use_gpu:
            # The hidden and cell dimensions are: (num_layers, batch, hidden_size)
            hidden = (torch.zeros(self.num_layers,batch_size,self.num_hidden).cuda(),
                     torch.zeros(self.num_layers,batch_size,self.num_hidden).cuda())
        else:
            hidden = (torch.zeros(self.num_layers,batch_size,self.num_hidden),
                     torch.zeros(self.num_layers,batch_size,self.num_hidden))
        
        return hidden
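
The shape comments above can be checked directly on a small LSTM. This is only a sketch with arbitrary toy sizes (they are assumptions for illustration, not the notebook's settings), using `batch_first=True` as the model does:

```python
import torch
from torch import nn

# Small, arbitrary sizes chosen only for illustration
batch, seq_len, input_size, hidden_size, num_layers = 4, 7, 10, 16, 2

lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)

x = torch.zeros(batch, seq_len, input_size)
h0 = torch.zeros(num_layers, batch, hidden_size)
c0 = torch.zeros(num_layers, batch, hidden_size)

output, (h_n, c_n) = lstm(x, (h0, c0))

print(output.shape)  # layout: (batch, seq_len, hidden_size)
print(h_n.shape)     # layout: (num_layers, batch, hidden_size)

# Flattening as in forward() gives one row per character position
flat = output.contiguous().view(-1, hidden_size)
print(flat.shape)    # layout: (batch * seq_len, hidden_size)
```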
        

## Instance of the Model

In [34]:
len(all_characters)

84

In [35]:
model = CharModel(
    all_chars=all_characters,
    num_hidden=512,
    num_layers=3,
    drop_prob=0.5,
    use_gpu=True,
)

In [36]:
total_param  = []
for p in model.parameters():
  print(int(p.numel()))
  total_param.append(int(p.numel()))

172032
1048576
2048
2048
1048576
1048576
2048
2048
1048576
1048576
2048
2048
43008
84


Try to keep the total number of parameters at roughly the same order of magnitude as the number of characters in the text.

In [37]:
sum(total_param)

5470292
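
The per-tensor counts printed above follow from the LSTM layout: each layer has four gates, and each gate has input weights, hidden weights, and two bias vectors. A short sketch that reproduces the total by formula, assuming a unidirectional LSTM with biases (the helper names are hypothetical):

```python
def lstm_param_count(input_size, hidden_size, num_layers):
    """Hypothetical helper: parameters in a unidirectional LSTM with biases."""
    total = 0
    for layer in range(num_layers):
        # Layers after the first take the previous layer's hidden state as input
        layer_input = input_size if layer == 0 else hidden_size
        # 4 gates, each with input weights, hidden weights, and two bias vectors
        total += 4 * hidden_size * (layer_input + hidden_size + 2)
    return total

def linear_param_count(in_features, out_features):
    # Weight matrix plus one bias per output
    return in_features * out_features + out_features

total = lstm_param_count(84, 512, 3) + linear_param_count(512, 84)
print(total)
```

With 84 characters, 512 hidden units, and 3 layers this agrees with the sum reported by the notebook, which makes it easy to try other values of `num_hidden` before building a model.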

In [38]:
len(encoded_text)

5445609

In [39]:
# Choose hidden_size such that the number of parameters roughly matches the number of characters in the text

### Optimizer and Loss

In [40]:
optimizer = torch.optim.Adam(model.parameters(),lr=0.001)
criterion = nn.CrossEntropyLoss()
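
`nn.CrossEntropyLoss` takes raw scores of shape (N, num_classes) and integer targets of shape (N,), applies a log-softmax internally, and averages over N by default. The NumPy sketch below mimics that computation (the function name `cross_entropy` is hypothetical) and shows a useful reference point: a model that spreads probability evenly over all 84 characters scores ln(84), about 4.43, so a validation loss below that means the network has learned something.

```python
import numpy as np

def cross_entropy(logits, targets):
    # Numerically stable log-softmax: subtract each row's max first
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Mean negative log-probability assigned to the correct class
    return -log_probs[np.arange(len(targets)), targets].mean()

# All-zero logits give a uniform distribution over 84 classes
logits = np.zeros((5, 84))
targets = np.array([0, 10, 20, 30, 83])
uniform_loss = cross_entropy(logits, targets)
print(uniform_loss)
```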

## Training Data and Validation Data

In [41]:
# fraction of data to be used for training (0.1 keeps 10% for training, the rest for validation)
train_percent = 0.1

In [42]:
len(encoded_text)

5445609

In [43]:
int(len(encoded_text) * (train_percent))

544560

In [44]:
train_ind = int(len(encoded_text) * (train_percent))

In [45]:
train_data = encoded_text[:train_ind]
val_data = encoded_text[train_ind:]

In [46]:
train_data.shape, val_data.shape

((544560,), (4901049,))

# Training the Network

## Variables

Feel free to play around with these values!

In [47]:
## VARIABLES

# Epochs to train for
epochs = 50
# batch size 
batch_size = 128

# Length of sequence
seq_len = 100

# for printing report purposes
# always start at 0
tracker = 0

# number of characters in text
num_char = max(encoded_text)+1

------

In [None]:
# Set model to train
model.train()


# Check to see if using GPU
if model.use_gpu:
    model.cuda()

for i in range(epochs):
    
    hidden = model.hidden_state(batch_size)
    
    
    for x,y in generate_batches(train_data,batch_size,seq_len):
        
        tracker += 1
        
        # One Hot Encode incoming data
        x = one_hot_encoder(x,num_char)
        
        # Convert Numpy Arrays to Tensor
        
        inputs = torch.from_numpy(x)
        targets = torch.from_numpy(y)
        
        # Adjust for GPU if necessary
        
        if model.use_gpu:
            
            inputs = inputs.cuda()
            targets = targets.cuda()
            
        # Detach the hidden state from the previous batch's graph
        # If we don't, we would backpropagate through all training history
        hidden = tuple([state.data for state in hidden])
        
        model.zero_grad()
        
        lstm_output, hidden = model.forward(inputs,hidden)
        loss = criterion(lstm_output,targets.view(batch_size*seq_len).long())
        
        loss.backward()
        
        # POSSIBLE EXPLODING GRADIENT PROBLEM!
        # LET'S CLIP JUST IN CASE
        nn.utils.clip_grad_norm_(model.parameters(),max_norm=5)
        
        optimizer.step()
        
        
        
        ###################################
        ### CHECK ON VALIDATION SET ######
        #################################
        
        if tracker % 25 == 0:
            
            val_hidden = model.hidden_state(batch_size)
            val_losses = []
            model.eval()
            
            for x,y in generate_batches(val_data,batch_size,seq_len):
                
                # One Hot Encode incoming data
                x = one_hot_encoder(x,num_char)
                

                # Convert Numpy Arrays to Tensor

                inputs = torch.from_numpy(x)
                targets = torch.from_numpy(y)

                # Adjust for GPU if necessary

                if model.use_gpu:

                    inputs = inputs.cuda()
                    targets = targets.cuda()
                    
                # Detach the hidden state from the previous batch's graph
                # If we don't, the graph would keep growing across 
                # all validation batches
                val_hidden = tuple([state.data for state in val_hidden])
                
                lstm_output, val_hidden = model.forward(inputs,val_hidden)
                val_loss = criterion(lstm_output,targets.view(batch_size*seq_len).long())
        
                val_losses.append(val_loss.item())
            
            # Reset to training mode after val for loop
            model.train()
            
            print(f"Epoch: {i} Step: {tracker} Val Loss: {val_loss.item()}")

Epoch: 0 Step: 25 Val Loss: 3.241183280944824
Epoch: 1 Step: 50 Val Loss: 3.2209343910217285
Epoch: 1 Step: 75 Val Loss: 3.2246036529541016
Epoch: 2 Step: 100 Val Loss: 3.103549003601074
Epoch: 2 Step: 125 Val Loss: 3.0078160762786865
Epoch: 3 Step: 150 Val Loss: 2.8424694538116455
Epoch: 4 Step: 175 Val Loss: 2.7311224937438965
Epoch: 4 Step: 200 Val Loss: 2.6245357990264893
Epoch: 5 Step: 225 Val Loss: 2.530056953430176
Epoch: 5 Step: 250 Val Loss: 2.511744737625122
Epoch: 6 Step: 275 Val Loss: 2.424506187438965
Epoch: 7 Step: 300 Val Loss: 2.3757734298706055
Epoch: 7 Step: 325 Val Loss: 2.3281121253967285
Epoch: 8 Step: 350 Val Loss: 2.287860631942749
Epoch: 8 Step: 375 Val Loss: 2.258666515350342
Epoch: 9 Step: 400 Val Loss: 2.219432830810547
Epoch: 10 Step: 425 Val Loss: 2.1962826251983643
Epoch: 10 Step: 450 Val Loss: 2.1531155109405518
Epoch: 11 Step: 475 Val Loss: 2.12485408782959
Epoch: 11 Step: 500 Val Loss: 2.102055072784424
Epoch: 12 Step: 525 Val Loss: 2.0815775394439697
E
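
The training loop above guards against exploding gradients with `nn.utils.clip_grad_norm_`. The idea behind norm clipping can be sketched in NumPy: treat all gradients as one long vector, and if its L2 norm exceeds `max_norm`, scale every gradient by the same factor. The helper below is a hypothetical illustration of that idea, not PyTorch's implementation:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Hypothetical helper illustrating gradient norm clipping."""
    # Treat all gradients as one long vector and measure its L2 norm
    total_norm = np.sqrt(sum(float((g ** 2).sum()) for g in grads))
    # Scale down only when the norm exceeds the limit
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads], total_norm

grads = [np.array([3.0, 4.0]), np.array([12.0])]
clipped, norm_before = clip_by_global_norm(grads, max_norm=5)
norm_after = np.sqrt(sum(float((g ** 2).sum()) for g in clipped))

print(norm_before)
print(norm_after)
```

Because every gradient is scaled by the same factor, the direction of the update is preserved and only its length is capped.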

-------
------

## Saving the Model

https://pytorch.org/tutorials/beginner/saving_loading_models.html

In [None]:
# Be careful not to overwrite our original file!
model_name = 'example.net'

In [None]:
torch.save(model.state_dict(),model_name)

## Load Model

In [None]:
# MUST MATCH THE EXACT SAME SETTINGS AS MODEL USED DURING TRAINING!

model = CharModel(
    all_chars=all_characters,
    num_hidden=512,
    num_layers=3,
    drop_prob=0.5,
    use_gpu=True,
)

In [None]:
model.load_state_dict(torch.load(model_name))
model.eval()

CharModel(
  (lstm): LSTM(84, 512, num_layers=3, batch_first=True, dropout=0.5)
  (dropout): Dropout(p=0.5)
  (fc_linear): Linear(in_features=512, out_features=84, bias=True)
)
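
If the weights were saved from a GPU session and are later loaded on a machine without one, `torch.load` accepts a `map_location` argument that remaps the stored tensors. A minimal sketch, using a tiny `nn.Linear` as a stand-in for the notebook's model and a temporary path (both are assumptions for illustration):

```python
import os
import tempfile

import torch
from torch import nn

# A tiny stand-in model; CharModel would be saved and restored the same way
model = nn.Linear(4, 2)

path = os.path.join(tempfile.mkdtemp(), 'example.net')
torch.save(model.state_dict(), path)

# map_location remaps stored tensors onto whatever device is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# The new instance must be built with the same settings as the saved one
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(path, map_location=device))
restored.eval()
```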

# Generating Predictions

--------

In [None]:
def predict_next_char(model, char, hidden=None, k=1):
        
        # Encode raw letters with model
        encoded_text = model.encoder[char]
        
        # set as numpy array for one hot encoding
        # NOTE THE [[ ]] dimensions!!
        encoded_text = np.array([[encoded_text]])
        
        # One hot encoding
        encoded_text = one_hot_encoder(encoded_text, len(model.all_chars))
        
        # Convert to Tensor
        inputs = torch.from_numpy(encoded_text)
        
        # Check for GPU
        if(model.use_gpu):
            inputs = inputs.cuda()
        
        
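        # Start from a fresh hidden state if none was passed in,
        # then detach it from any earlier graph
        if hidden is None:
            hidden = model.hidden_state(1)
        hidden = tuple([state.data for state in hidden])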
        
        
        # Run model and get predicted output
        lstm_out, hidden = model(inputs, hidden)

        
        # Convert lstm_out to probabilities
        probs = F.softmax(lstm_out, dim=1).data
        
        
        
        if(model.use_gpu):
            # move back to CPU to use with numpy
            probs = probs.cpu()
        
        
        # k determines how many characters to consider
        # for our probability choice.
        # https://pytorch.org/docs/stable/torch.html#torch.topk
        
        # Return k largest probabilities in tensor
        probs, index_positions = probs.topk(k)
        
        
        # flatten (rather than squeeze) so a 1-D array is kept even when k=1
        index_positions = index_positions.numpy().flatten()
        
        # Create array of probabilities
        probs = probs.numpy().flatten()
        
        # Convert to probabilities per index
        probs = probs/probs.sum()
        
        # randomly choose a character based on probabilities
        char = np.random.choice(index_positions, p=probs)
       
        # return the decoded predicted character and the hidden state
        return model.decoder[char], hidden
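
The sampling step in the function above (keep the `k` most likely characters, renormalise, then draw one) can be looked at in isolation. A NumPy-only sketch, with a made-up probability vector and a hypothetical helper name:

```python
import numpy as np

def sample_top_k(probs, k, rng):
    """Hypothetical helper: sample an index from the k most likely entries."""
    # Indices of the k largest probabilities
    top_idx = np.argsort(probs)[-k:]
    # Renormalise so the kept probabilities sum to 1
    top_probs = probs[top_idx] / probs[top_idx].sum()
    return int(rng.choice(top_idx, p=top_probs))

rng = np.random.default_rng(0)
probs = np.array([0.05, 0.5, 0.1, 0.3, 0.05])

# With k=2 only indices 1 and 3 can ever be drawn
draws = [sample_top_k(probs, k=2, rng=rng) for _ in range(200)]
print(sorted(set(draws)))

# With k=1 the choice is deterministic: always the most likely index
greedy = sample_top_k(probs, k=1, rng=rng)
print(greedy)
```

A small `k` keeps the output conservative, while a larger `k` lets less likely characters through and makes the text more varied.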

In [None]:
def generate_text(model, size, seed='The', k=1):
        
      
    
    # CHECK FOR GPU
    if(model.use_gpu):
        model.cuda()
    else:
        model.cpu()
    
    # Evaluation mode
    model.eval()
    
    # begin output from initial seed
    output_chars = [c for c in seed]
    
    # initiate hidden state
    hidden = model.hidden_state(1)
    
    # feed every character in seed through the model to warm up the hidden state
    for char in seed:
        char, hidden = predict_next_char(model, char, hidden, k=k)
    
    # add the first predicted character (the one following the seed) to output
    output_chars.append(char)
    
    # Now generate for size requested
    for i in range(size):
        
        # predict based off very last letter in output_chars
        char, hidden = predict_next_char(model, output_chars[-1], hidden, k=k)
        
        # add predicted character
        output_chars.append(char)
    
    # return string of predicted text
    return ''.join(output_chars)

In [None]:
print(generate_text(model, 1000, seed='The ', k=3))

The will true and breathed to me.
    If thou wert better to the stare and send thee,
    Which hath any trives and sound and stretged,
    That have the better send of the constance,
    That then that thou shaltst but that have seem surpet
    And we had been the self-fight and had their strange,
    With his sward shall strave a servant state.
    Where this't she is that to the wind of held
    That have this serve that she he with the child
    Which they were beauty of their command strowes
    And truth and strength to the serves and song.
    If thou say'st he that hath seen this should still
    To she with his both shall see him.
    The world was a solder thou to heaven with me,
    And should this can stay that I heave make
    Which his charge in her shames, and to his state.
    That have tho stol'd of this starts to have,  
    And we and to the cheeks that to the stol'd
    To serve the courtier time of that sense is.
    In the summer that that shall not,
    That he w