# Character-Level RNN

In character-level RNN problems, the goal is to feed in characters and output the characters most likely to come next in the text. Individual characters are passed to the network using one-hot encoding.

> One-hot-encoded characters: Every character is a binary feature 0/1, i.e. character is there vs. character not there.

These one-hot-encoded features constitute the input layer. 

The input layer is passed to one (or more) hidden layers. The hidden layer(s) use LSTM cells. 

> An LSTM cell takes in the input as well as a hidden state. The hidden state is passed to the next cell. Simultaneously the cell produces an output. 

The last layer is the output layer. The activation function is a softmax as we'd like to get a probability distribution for the next character. This answers the question: Which character is most likely the next one in the sequence?
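As a minimal sketch of what the softmax does, the snippet below turns raw scores into probabilities. The four-character vocabulary and the score values are hypothetical, chosen only for illustration.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    # Subtract the maximum score for numerical stability
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical output-layer scores for a 4-character vocabulary
vocab = ["a", "b", "c", "d"]
scores = [2.0, 1.0, 0.1, -1.0]

probs = softmax(scores)
most_likely = vocab[probs.index(max(probs))]
print(probs, most_likely)
```

The largest score always gets the largest probability, so the most likely next character here is the one with the highest raw score.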

#### How to get the batches right?

How does batching work for RNNs?

With RNNs, we're training on sequences of data. Splitting these sequences into multiple shorter ones enables us to take advantage of **matrix operations to make training more efficient.**

RNNs are usually trained on multiple sequences in parallel. 

Example sequence: 

```
# starting sequence
[ 1 2 3 4 5 6 7 8 9 10 11 12 ]

# split in two sequences, i.e. batch_size=2
[ 1 2 3 4 5 6 ]
[ 7 8 9 10 11 12 ]

# Along with batch size, we also choose seq_length
# seq_length = 3, batch_size = 2

Batch 1: [1 2 3 ] ---> Batch 2: [4 5 6 ]
         [7 8 9 ]               [10 11 12 ]
```

The hidden state from batch 1 is transferred to batch 2. In this way, the sequence information is transferred across batches for each mini sequence.
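The toy example above can be reproduced with NumPy. This is only a sketch of the splitting idea, using the same numbers as the example (12 tokens, `batch_size = 2`, `seq_length = 3`).

```python
import numpy as np

batch_size = 2
seq_length = 3

sequence = np.arange(1, 13)                # [1 2 ... 12]
rows = sequence.reshape(batch_size, -1)    # one row per parallel sequence

# Slide a window of seq_length columns over the rows
batches = [rows[:, n:n + seq_length]
           for n in range(0, rows.shape[1], seq_length)]

for i, batch in enumerate(batches, start=1):
    print("Batch", i)
    print(batch)
```

Row 0 of batch 2 continues row 0 of batch 1, which is why carrying the hidden state from one batch to the next preserves the sequence information.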

---

# Character-Level LSTM in PyTorch

- Construct a character-level LSTM in PyTorch
- Network will train character by character on some text, then generate new text character by character. 
- Here, trained on the beginning of the book *Anna Karenina*
- This model will be able to generate new text based on the text from the book

This network is based on: 

- Andrej Karpathy's blog post: [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)
- Implementation in Torch: [Github link](https://github.com/karpathy/char-rnn)

The general architecture of character-wise RNN:

<img src="../images/charseq.jpeg">

#### Load required resources 

In [35]:
# standard libraries
import numpy as np

# time
import time

# torch
import torch
from torch import nn
import torch.nn.functional as F

## Load in Data

- Load Anna Karenina text file
- Convert it into integers for our network to use

In [2]:
with open("../data/anna.txt", "r") as f:
    text = f.read()
    
print("Text length (characters):", len(text))
print("First 100 characters:\n")
text[:100]

Text length (characters): 1985223
First 100 characters:



'Chapter 1\n\n\nHappy families are all alike; every unhappy family is unhappy in its own\nway.\n\nEverythin'

#### Tokenization

- Creating a couple of dictionaries
 - These dictionaries are used to convert the characters to and from integers
- Encoding the characters as integers makes it easier to use as input in the network.

We create two dictionaries:

1. `int2char`: maps integers to characters
2. `char2int`: maps characters to integers

In [3]:
# map each character to an integer
chars = tuple(set(text))

int2char = dict(enumerate(chars))
char2int = {ch: ii for ii, ch in int2char.items()}

### map text characters to int

# example of how this mapping works
print("Character 'i' is mapped to:", char2int["i"])

# use for loop
start = time.time()

encoded = []
for char in text:
    encoded.append(char2int[char])

encoded = np.array(encoded)
end = time.time()
print("Time for-loop:", end-start)

# use list comprehension
start = time.time()
encoded = np.array([char2int[char] for char in text])
end = time.time()

print("Time list comprehension:", end-start)
print("Length of encoded:", len(encoded))
encoded[:10]

Character 'i' is mapped to: 42
Time for-loop: 0.8123791217803955
Time list comprehension: 0.40152788162231445
Length of encoded: 1985223


array([46, 17, 39, 56, 38, 25, 50, 60, 48, 61])
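A quick way to sanity-check the two dictionaries is to encode a string and decode it back. The sketch below uses a toy string and `toy_`-prefixed names (both hypothetical) so that it runs on its own. It also sorts the character set, which the cell above does not do: the order of a Python `set` of strings can differ between runs, so the integer assigned to each character is not guaranteed to be stable.

```python
toy_text = "hello world"

# Sorting makes the mapping reproducible between runs
# (an assumption of this sketch; the notebook uses the unsorted set)
toy_chars = tuple(sorted(set(toy_text)))
toy_int2char = dict(enumerate(toy_chars))
toy_char2int = {ch: ii for ii, ch in toy_int2char.items()}

# Encode to integers, then decode back to text
toy_encoded = [toy_char2int[ch] for ch in toy_text]
toy_decoded = "".join(toy_int2char[ii] for ii in toy_encoded)

print(toy_encoded)
print(toy_decoded)
```

Because the mapping may change between runs, the character tuple has to be stored together with a trained model, otherwise the saved weights cannot be matched to characters again.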

#### Pre-processing the data

- LSTM expects input that is one-hot encoded
 - Each character is converted into an integer (using the dictionary),
 - then converted into a column vector. 
 - This vector has the value 1 at the index of the character and 0s everywhere else

In [4]:
def one_hot_encode(arr, n_labels):
    
    # Initialize arrays with zeros
    one_hot = np.zeros((arr.size, n_labels), dtype=np.float32)
    
    # Fill the appropriate elements with ones
    one_hot[np.arange(one_hot.shape[0]), arr.flatten()] = 1.
    
    # Finally reshape it to get back to the original array
    one_hot = one_hot.reshape((*arr.shape, n_labels))
    
    return one_hot

In [5]:
# check the function
test_seq = np.array([[1,2,3]])
one_hot_test = one_hot_encode(test_seq, 4)
print(one_hot_test)

[[[0. 1. 0. 0.]
  [0. 0. 1. 0.]
  [0. 0. 0. 1.]]]
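An alternative sketch of the same encoding indexes into an identity matrix. The name `one_hot_eye` is hypothetical; it should give the same result as `one_hot_encode` for integer arrays with values in `[0, n_labels)`.

```python
import numpy as np

def one_hot_eye(arr, n_labels):
    """One-hot encode an integer array by looking up rows of an identity matrix."""
    # Row i of the identity matrix is the one-hot vector for label i
    return np.eye(n_labels, dtype=np.float32)[arr]

test_seq = np.array([[1, 2, 3]])
one_hot_alt = one_hot_eye(test_seq, 4)
print(one_hot_alt.shape)
print(one_hot_alt)
```

The output keeps the shape of the input and appends one dimension of size `n_labels`.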


### Making training mini-batches

- We want to create mini-batches for training
- We want our batches to be multiple sequences of some desired number of sequence steps. 

<img src="../images/sequence_batching@1x.png" width=500px>

- We'll take the encoded characters (passed in as the `arr` parameter)
- split them into multiple sequences, given by `batch_size`. 
- Each sequence will be of length `seq_length`. 

## How to create batches?

- We have 1,985,223 characters
- N (`batch_size`): Number of sequences in a batch
- M (`seq_length`): Number of time steps in a sequence
- K: Total number of full batches that we can make from the array `arr`, i.e. `len(arr) // (N * M)`
- Each batch contains $N \times M$ characters, i.e. `batch_size * seq_length`
- $N * M * K$ is the total number of characters to keep

#### 1. Discard text so we only have full mini-batches

- What is a "good" `batch_size` and `seq_length`? 
- Decide and keep number of total characters $N * M * K$
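As a worked example of this bookkeeping, the sketch below computes $K$ and the number of characters kept for the text length reported above, assuming `batch_size = 8` and `seq_length = 50` (the values used further down when testing `get_batches`).

```python
n_chars_total = 1_985_223   # length of the text, as printed when loading the data
batch_size = 8              # N
seq_length = 50             # M

chars_per_batch = batch_size * seq_length       # N * M
n_batches = n_chars_total // chars_per_batch    # K, full batches only
n_kept = chars_per_batch * n_batches            # N * M * K
n_dropped = n_chars_total - n_kept

print(n_batches, n_kept, n_dropped)  # 4963 1985200 23
```

With these settings only the last 23 characters of the text are discarded.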

#### 2. Split `arr` into $N$ sequences

This can be done using `arr.reshape(size)`, where `size` is a tuple containing the dimensions of the reshaped array.

- We want N (`batch_size`) sequences in a batch
 - N can be determined as first dimension. 
 - Second dimension can be set as `-1` as a placeholder in the size. It'll be fitted according to the data

This yields an array that is $N \times (M * K)$

#### 3. Now iterate through the array to get mini-batches.

- Every batch is a $N \times M$ window on the array
- For each batch, the window moves over by `seq_length`. 
- In addition, we want to create input and target arrays
 - Targets are inputs shifted over by one character

Recommendation: 

- Use `range` to take steps of size `seq_length` from 0 to `arr.shape[1]`, the total number of tokens in each sequence.

In [6]:
batch_size = 3 # N
seq_length = 10 # M 
n_batches = 5

# create data 
my_arr = np.arange(batch_size * seq_length * n_batches + 4)

### function prep ###

# calculate batch_size_total
batch_size_total = batch_size * seq_length
print("batch_size_total: {}, seq_length: {}".format(batch_size_total, seq_length))

# calculate number of batches
n_batches = len(my_arr) // batch_size_total # K
print("n_batches:", n_batches)

# Keep only enough chars to make full batches 
print("my_arr.size before cutting:", my_arr.size)
my_arr = my_arr[:n_batches * batch_size_total]
print("my_arr.size after cutting:", my_arr.size)
print("my_arr.shape:", my_arr.shape)

# reshape so that each of the batch_size rows is one long sequence
my_arr = my_arr.reshape(batch_size, -1)
print("my_arr.shape:", my_arr.shape)

for n in range(0, my_arr.shape[1], seq_length):
    print(n)
    print(my_arr[:, n:n+seq_length])  

batch_size_total: 30, seq_length: 10
n_batches: 5
my_arr.size before cutting: 154
my_arr.size after cutting: 150
my_arr.shape: (150,)
my_arr.shape: (3, 50)
0
[[  0   1   2   3   4   5   6   7   8   9]
 [ 50  51  52  53  54  55  56  57  58  59]
 [100 101 102 103 104 105 106 107 108 109]]
10
[[ 10  11  12  13  14  15  16  17  18  19]
 [ 60  61  62  63  64  65  66  67  68  69]
 [110 111 112 113 114 115 116 117 118 119]]
20
[[ 20  21  22  23  24  25  26  27  28  29]
 [ 70  71  72  73  74  75  76  77  78  79]
 [120 121 122 123 124 125 126 127 128 129]]
30
[[ 30  31  32  33  34  35  36  37  38  39]
 [ 80  81  82  83  84  85  86  87  88  89]
 [130 131 132 133 134 135 136 137 138 139]]
40
[[ 40  41  42  43  44  45  46  47  48  49]
 [ 90  91  92  93  94  95  96  97  98  99]
 [140 141 142 143 144 145 146 147 148 149]]


In [7]:
def get_batches(arr, batch_size, seq_length):
    '''Create a generator that returns batches of size
       batch_size x seq_length from arr.
       
       Arguments
       ---------
       arr: Array you want to make batches from
       batch_size: Batch size, the number of sequences per batch
       seq_length: Number of encoded chars in a sequence
    '''
    
    batch_size_total = batch_size * seq_length
    # total number of batches we can make
    n_batches = len(arr)//batch_size_total
    
    # Keep only enough characters to make full batches
    arr = arr[:n_batches * batch_size_total]
    # Reshape into batch_size rows
    arr = arr.reshape((batch_size, -1))
    
    # iterate through the array, one sequence at a time
    for n in range(0, arr.shape[1], seq_length):
        # The features
        x = arr[:, n:n+seq_length]
        # The targets, shifted by one
        y = np.zeros_like(x)
        try:
            y[:, :-1], y[:, -1] = x[:, 1:], arr[:, n+seq_length]
        except IndexError:
            y[:, :-1], y[:, -1] = x[:, 1:], arr[:, 0]
        yield x, y

In [8]:
batches = get_batches(encoded, 8, 50)
x, y = next(batches)

# printing out the first 10 items in a sequence
print('x\n', x[:10, :10])
print('\ny\n', y[:10, :10])

x
 [[46 17 39 56 38 25 50 60 48 61]
 [55 44 45 60 38 17 39 38 60 39]
 [25 45 62 60 44 50 60 39 60 22]
 [55 60 38 17 25 60 23 17 42 25]
 [60 55 39 12 60 17 25 50 60 38]
 [23 57 55 55 42 44 45 60 39 45]
 [60 21 45 45 39 60 17 39 62 60]
 [15 10  9 44 45 55 59  7 70 60]]

y
 [[17 39 56 38 25 50 60 48 61 61]
 [44 45 60 38 17 39 38 60 39 38]
 [45 62 60 44 50 60 39 60 22 44]
 [60 38 17 25 60 23 17 42 25 22]
 [55 39 12 60 17 25 50 60 38 25]
 [57 55 55 42 44 45 60 39 45 62]
 [21 45 45 39 60 17 39 62 60 55]
 [10  9 44 45 55 59  7 70 60 64]]


#### Notes

- The y sequence is shifted by one character!
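The shift can be checked on a small array. This is a self-contained sketch with toy data, not a call to `get_batches`.

```python
import numpy as np

arr = np.arange(20).reshape(2, 10)   # two toy sequences of 10 tokens each
seq_length = 4

x = arr[:, 0:seq_length]             # inputs: tokens 0..3 of each row
y = arr[:, 1:seq_length + 1]         # targets: tokens 1..4 of each row

print("x\n", x)
print("y\n", y)

# Every target except the last is the next input in the same row
shift_ok = bool((y[:, :-1] == x[:, 1:]).all())
print(shift_ok)
```

The last target of each row is the first token of the following window, which is what `get_batches` fetches with `arr[:, n+seq_length]`.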

## Defining the network in PyTorch

Next step is to implement the network in PyTorch. 

<img src="../images/charRNN.png" width=500px>

#### Model Structure

Suggested structure in `__init__` :

- Create and store the dictionaries
- Define LSTM layer that takes in 
 - `input_size` (number of characters), 
 - hidden layer size: `n_hidden`
 - number of layers: `n_layers`
 - a dropout probability: `drop_prob`
- Define fully-connected layer with parameters
 - `input_size`: `n_hidden`
 - `output_size`: number of characters
- Finally, initialize the weights

## LSTM Inputs/Outputs

Basic LSTM layer: [Link](https://pytorch.org/docs/stable/nn.html#lstm)

```python
# Example
self.lstm = nn.LSTM(input_size, 
                    n_hidden, 
                    n_layers, 
                    dropout=drop_prob,
                    batch_first=True)
```

- `input_size`: Number of characters this cell expects as sequential input
- `n_hidden`: Number of units in the hidden layers
- forward function: stack up the LSTM outputs using `.view` so they can be passed to the fully-connected layer.
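The tensor shapes involved can be checked with a small sketch. All sizes below are hypothetical toy values, and the input is a zero tensor rather than real one-hot data.

```python
import torch
from torch import nn

# Hypothetical toy sizes
n_chars, n_hidden, n_layers = 5, 8, 2
batch_size, seq_length = 3, 4

lstm = nn.LSTM(n_chars, n_hidden, n_layers, batch_first=True)

# With batch_first=True the input is (batch, seq, features)
x = torch.zeros(batch_size, seq_length, n_chars)

# Hidden and cell state are (n_layers, batch, n_hidden), regardless of batch_first
h0 = torch.zeros(n_layers, batch_size, n_hidden)
c0 = torch.zeros(n_layers, batch_size, n_hidden)

output, (hn, cn) = lstm(x, (h0, c0))

# Stack all time steps of all sequences into rows for a fully-connected layer
stacked = output.contiguous().view(-1, n_hidden)

print(output.shape)    # torch.Size([3, 4, 8])
print(hn.shape)        # torch.Size([2, 3, 8])
print(stacked.shape)   # torch.Size([12, 8])
```

After stacking there is one row per character position, so a linear layer applied to `stacked` produces one score vector per position.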

Also required to create an initial hidden state of all zeros:

```python
self.init_hidden(batch_size)
```

In [9]:
# Check GPU
train_on_gpu = torch.cuda.is_available()
if(train_on_gpu):
    print('Training on GPU!')
else: 
    print('No GPU available, training on CPU; consider making n_epochs very small.')

No GPU available, training on CPU; consider making n_epochs very small.


In [10]:
# Define CharRNN class
class CharRNN(nn.Module): 
    def __init__(self, tokens, n_hidden=256, n_layers=2,
                               drop_prob=0.5, lr=0.001):
        super().__init__()
        self.drop_prob = drop_prob
        self.n_layers = n_layers
        self.n_hidden = n_hidden
        self.lr = lr
        
        # Creating character dictionaries
        self.chars = tokens
        self.int2char = dict(enumerate(self.chars))
        self.char2int = {ch: ii for ii, ch in self.int2char.items()}
        
        # Define the LSTM
        # Batch_first changes order for inputs!
        self.lstm = nn.LSTM(len(self.chars), 
                            n_hidden, 
                            n_layers, 
                            dropout=drop_prob, 
                            batch_first=True)
        
        # Dropout layer
        self.dropout = nn.Dropout(drop_prob)
        
        # Fully-connected output layer
        self.fc = nn.Linear(n_hidden, len(self.chars))
      
    # Define forward method
    def forward(self, x, hidden):
        """ 
        Forward pass through the network. 
        Inputs: x, hidden/cell state. 
        """
                
        # Outputs, new hidden state from lstm
        r_output, hidden = self.lstm(x, hidden)
        
        # Pass through a dropout layer
        out = self.dropout(r_output)
        
        # Stack up LSTM outputs using view
        # contiguous to reshape output
        out = out.contiguous().view(-1, self.n_hidden)
        
        ## Put x through fully-connected layer
        out = self.fc(out)
        
        # return final output and the hidden state
        return out, hidden
    
    def init_hidden(self, batch_size):
        """
        Method that initializes hidden state
        Call at the beginning 
        """
        # Create two new tensors with sizes n_layers x batch_size x n_hidden,
        # initialized to zero, for hidden state and cell state of LSTM
        weight = next(self.parameters()).data
        
        if (train_on_gpu):
            hidden = (weight.new(self.n_layers, batch_size, self.n_hidden).zero_().cuda(),
                      weight.new(self.n_layers, batch_size, self.n_hidden).zero_().cuda())
        else:
            hidden = (weight.new(self.n_layers, batch_size, self.n_hidden).zero_(),
                      weight.new(self.n_layers, batch_size, self.n_hidden).zero_())
        
        return hidden

## Training

Writing a function so we have better control over number of epochs, learning rate and other parameters. 

- Adam optimizer
- Cross-entropy loss

Within the batch loop: 

- Detach hidden state from its history
- Use [clip_grad_norm_](https://pytorch.org/docs/stable/_modules/torch/nn/utils/clip_grad.html) to prevent gradients from exploding
 - [gradient clipping](https://deepai.org/machine-learning-glossary-and-terms/gradient-clipping)
 - Function clips gradient norm of an iterable of parameters. 
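To illustrate what clipping by norm means, here is a NumPy sketch of the idea. It is not the PyTorch source; the helper name `clip_by_global_norm` and the gradient values are hypothetical.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Scale all gradient arrays by one common factor so that
    their joint L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(float((g ** 2).sum()) for g in grads))
    # Only scale down, never up; the small constant avoids division by zero
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads], total_norm

# Two hypothetical gradient arrays with a joint norm of 13
grads = [np.array([3.0, 4.0]), np.array([12.0])]

clipped, norm_before = clip_by_global_norm(grads, max_norm=5.0)
norm_after = np.sqrt(sum(float((g ** 2).sum()) for g in clipped))

print(norm_before, norm_after)
```

All gradients are scaled by the same factor, so the direction of the update is preserved and only its length is limited.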
 

In [11]:
def train(net, data, epochs=10, batch_size=10, seq_length=50, lr=0.001, clip=5, val_frac=0.1, print_every=10):
    ''' Function to train the network
        
        Inputs
        
        net: CharRNN network
        data: text data to train the network
        epochs: Number of epochs to train
        batch_size: Number of mini-sequences per mini-batch, aka batch size
        seq_length: Number of character steps per mini-batch, timesteps
        lr: learning rate
        clip: gradient clipping
        val_frac: Fraction of data to hold out for validation
        print_every: Number of steps for printing training and validation loss
    '''
    # train mode on
    net.train()
    
    # Adam optimizer
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    
    # Cross-Entropy loss
    criterion = nn.CrossEntropyLoss()
    
    # Create training and validation data
    val_idx = int(len(data)*(1-val_frac))
    data, val_data = data[:val_idx], data[val_idx:]
    
    # Check for gpu
    if(train_on_gpu):
        net.cuda()
    
    # initialize counter
    counter = 0
    n_chars = len(net.chars)
    
    # training loop
    for e in range(epochs):
        
        # initialize hidden state with init_hidden method
        h = net.init_hidden(batch_size)
        
        # batch loop
        for x, y in get_batches(data, batch_size, seq_length):
            counter += 1
            
            # One-hot encode our data 
            x = one_hot_encode(x, n_chars)
            
            # Make them Torch tensors
            inputs, targets = torch.from_numpy(x), torch.from_numpy(y)
            
            # Send to gpu if applicable
            if(train_on_gpu):
                inputs, targets = inputs.cuda(), targets.cuda()

            # Creating new variables for the hidden state, otherwise
            # we'd backprop through the entire training history
            h = tuple([each.data for each in h])

            # zero accumulated gradients
            net.zero_grad()
            
            # get output from the model
            output, h = net(inputs, h)
            
            # calculate loss
            loss = criterion(output, targets.view(batch_size*seq_length).long())
            
            # backward pass
            loss.backward()
            
            # `clip_grad_norm` helps prevent the exploding gradient problem in RNNs / LSTMs.
            nn.utils.clip_grad_norm_(net.parameters(), clip)
            optimizer.step()
            
            # Validation
            if counter % print_every == 0:
                
                # Get validation loss
                val_h = net.init_hidden(batch_size)
                val_losses = []
                net.eval()
                
                # iterate over the validation batches
                for x, y in get_batches(val_data, batch_size, seq_length):
                    # One-hot encode our data and make them Torch tensors
                    x = one_hot_encode(x, n_chars)
                    x, y = torch.from_numpy(x), torch.from_numpy(y)
                    
                    # Creating new variables for the hidden state, otherwise
                    # we'd backprop through the entire training history
                    val_h = tuple([each.data for each in val_h])
                    
                    inputs, targets = x, y
                    if(train_on_gpu):
                        inputs, targets = inputs.cuda(), targets.cuda()

                    output, val_h = net(inputs, val_h)
                    val_loss = criterion(output, targets.view(batch_size*seq_length).long())
                
                    val_losses.append(val_loss.item())
                
                # train mode on after validation
                net.train()
                
                print("Epoch: {}/{}...".format(e+1, epochs),
                      "Step: {}...".format(counter),
                      "Loss: {:.4f}...".format(loss.item()),
                      "Val Loss: {:.4f}".format(np.mean(val_losses)))

#### Instantiating and training the model

- Create an instance of the CharRNN network
 - define hyperparameters
- Define mini-batches size
- start training

In [36]:
# instantiate CharRNN class
n_hidden = 256 # 512
n_layers = 2 

net = CharRNN(tokens=chars, 
              n_hidden = n_hidden, 
              n_layers = n_layers,
              drop_prob=0.5, 
              lr = 0.001)
print(net)

CharRNN(
  (lstm): LSTM(83, 256, num_layers=2, batch_first=True, dropout=0.5)
  (dropout): Dropout(p=0.5)
  (fc): Linear(in_features=256, out_features=83, bias=True)
)


In [38]:
batch_size = 64 # 128
seq_length = 32 # 100
n_epochs = 1 # start smaller if you are just testing initial behavior

# train the model
train(net, encoded, epochs=n_epochs, batch_size=batch_size, seq_length=seq_length, lr=0.01, print_every=10)

Epoch: 1/1... Step: 10... Loss: 3.1237... Val Loss: 3.1249
Epoch: 1/1... Step: 20... Loss: 3.4172... Val Loss: 3.4152
Epoch: 1/1... Step: 30... Loss: 3.1932... Val Loss: 3.2921
Epoch: 1/1... Step: 40... Loss: 3.1119... Val Loss: 3.1488
Epoch: 1/1... Step: 50... Loss: 3.1418... Val Loss: 3.1239
Epoch: 1/1... Step: 60... Loss: 3.1149... Val Loss: 3.1096
Epoch: 1/1... Step: 70... Loss: 3.0897... Val Loss: 3.0551
Epoch: 1/1... Step: 80... Loss: 3.0201... Val Loss: 2.9858
Epoch: 1/1... Step: 90... Loss: 2.9510... Val Loss: 2.9470
Epoch: 1/1... Step: 100... Loss: 2.9281... Val Loss: 2.8912
Epoch: 1/1... Step: 110... Loss: 2.8835... Val Loss: 2.8221
Epoch: 1/1... Step: 120... Loss: 2.8082... Val Loss: 2.7513
Epoch: 1/1... Step: 130... Loss: 2.7148... Val Loss: 2.6715
Epoch: 1/1... Step: 140... Loss: 2.6953... Val Loss: 2.6104
Epoch: 1/1... Step: 150... Loss: 2.6770... Val Loss: 2.5547
Epoch: 1/1... Step: 160... Loss: 2.5842... Val Loss: 2.5128
Epoch: 1/1... Step: 170... Loss: 2.5460... Val Lo

## How to improve performance?

- Change parameters based on training and validation loss
- Training loss much lower than validation? Overfitting!
 - Increase regularization (more dropout)
 - Use a smaller network
- Training and validation losses close? Probably underfitting
 - Increase the size of the network
 
#### Hyperparameters

Model: 

- `n_hidden` - Number of units in the hidden layers
- `n_layers` - Number of hidden LSTM layers to use.

Training: 

- `batch_size` - Number of sequences running through the network in one pass
- `seq_length` - Number of characters in the sequence the network is trained on. 
- `lr` - learning rate for training

## Tips and Tricks (Karpathy)

- [Link](https://github.com/karpathy/char-rnn#tips-and-tricks)

#### Monitoring Validation Loss vs. Training Loss

- Keep track of the difference between training loss and validation loss
- Training loss much lower than validation loss? Network might be **overfitting**.
 - Solutions: Decrease network size or increase dropout
- Training/validation loss about equal? Then the model is underfitting.
 - Increase the size of your model (layers or neurons per layer)
 
#### Approximate number of parameters

- `n_layers` of 2 or 3 is advised
- `n_hidden` can be adjusted based on how much data are available. 
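To get a feeling for the model size, the trainable parameters can be counted by hand. The sketch below assumes the layout of PyTorch's `nn.LSTM` (four gates per layer, each with input weights, recurrent weights and two bias vectors) followed by one linear layer. The helper names are hypothetical; the sizes are the ones printed for the network above (83 characters, 256 hidden units, 2 layers).

```python
def lstm_param_count(input_size, n_hidden, n_layers):
    """Number of trainable parameters in a stacked LSTM."""
    total = 0
    layer_input = input_size
    for _ in range(n_layers):
        # 4 gates, each with input weights, recurrent weights and two bias vectors
        total += 4 * (layer_input * n_hidden + n_hidden * n_hidden + 2 * n_hidden)
        # Every layer after the first takes the previous layer's output as input
        layer_input = n_hidden
    return total

def char_rnn_param_count(n_chars, n_hidden, n_layers):
    """LSTM parameters plus the fully-connected output layer."""
    fc = n_hidden * n_chars + n_chars   # weights + biases
    return lstm_param_count(n_chars, n_hidden, n_layers) + fc

n_params = char_rnn_param_count(n_chars=83, n_hidden=256, n_layers=2)
print(n_params)  # 896851
```

This puts the model at roughly 0.9 million parameters, compared with about 2 million characters of training text. The linked tips suggest comparing these two numbers when choosing `n_hidden`.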

#### Best models strategy

- Err on the side of making the network larger
- Try different dropout values
- Whatever model has the best validation performance is the one you should use in the end

Note that it is common practice to run many different models with many different hyperparameter settings, and in the end take whatever checkpoint gave the best validation performance. 

- In addition, the sizes of your training and validation splits are also parameters. Make sure you have a decent amount of data in your validation set, otherwise the validation performance will be noisy and not very informative. 

## Checkpoint

After training, save the model so it can be loaded again if we need it. 

- Save the parameters needed to recreate the same architecture: the hidden layer hyperparameters and the text characters.


In [39]:
model_name = "rnn_1_epoch.net"

# save inputs to the model
checkpoint = {"n_hidden": net.n_hidden,
              "n_layers": net.n_layers,
              "state_dict": net.state_dict(),
              "tokens": net.chars}

with open(model_name, "wb") as f:
    torch.save(checkpoint, f)

## Making Predictions

Once the model is trained, how do we make predictions about the next characters? 

- Pass in a character and have the network predict the next character.
- Take this character, pass it back in, get another predicted character. 
- Keep doing this as long as you want...

#### Note on the predict function

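- The output of the network comes from a fully-connected layer: a vector of **scores for the next character**. Applying a softmax turns these scores into a probability distribution.
- Instead of always taking the character with the highest probability, the `predict` function below samples the next character from this distribution.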

#### Top K Sampling

- Predictions are from a categorical probability distribution over all the possible characters (tokens)
- Only consider some $K$ most probable characters
- Use [topk](https://pytorch.org/docs/stable/torch.html#torch.topk)
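The idea behind top-K sampling can be sketched in plain NumPy. The helper name `sample_top_k` and the probability values are hypothetical; the `predict` function in these notes uses `topk` on a tensor instead.

```python
import numpy as np

def sample_top_k(probs, k, rng):
    """Sample an index from the k most probable entries of probs."""
    top_idx = np.argsort(probs)[-k:]        # indices of the k largest probabilities
    top_p = probs[top_idx]
    top_p = top_p / top_p.sum()             # renormalize so they sum to 1
    return int(rng.choice(top_idx, p=top_p))

rng = np.random.default_rng(0)

# Hypothetical next-character distribution over a 5-character vocabulary
probs = np.array([0.05, 0.40, 0.10, 0.30, 0.15])

draws = [sample_top_k(probs, k=2, rng=rng) for _ in range(100)]
print(sorted(set(draws)))
```

With `k=2` only the two most probable indices (1 and 3) can ever be drawn, which removes unlikely characters while keeping some randomness in the generated text.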

In [40]:
# numpy squeeze 
in_arr = np.array([[[2, 2, 2],
                   [2, 2, 2]]])

print("Shape before squeeze:", in_arr.shape)

# Remove single-dimensional entries from the shape of an array
print("Now squeeze it:\n", in_arr.squeeze())
print("Shape after squeeze", in_arr.squeeze().shape)

Shape before squeeze: (1, 2, 3)
Now squeeze it:
 [[2 2 2]
 [2 2 2]]
Shape after squeeze (2, 3)


In [44]:
def predict(net, char, h=None, top_k = None):
    """
    Given a character, predict next character
    Returns predicted character and hidden state
    """
    
    # inputs as tensor
    x = np.array([[net.char2int[char]]])
    x = one_hot_encode(x, len(net.chars))
    inputs = torch.from_numpy(x)
    
    # check if gpu
    if(train_on_gpu):
        inputs = inputs.cuda()
    
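    # start from a zero hidden state if none is given,
    # then detach the hidden state from its history
    if h is None:
        h = net.init_hidden(1)
    h = tuple([each.data for each in h])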
    
    # get the output of the model
    out, h = net(inputs, h)
    
    # get character probs
    p = F.softmax(out, dim=1).data
    
    # move to cpu
    if(train_on_gpu):
        p = p.cpu() 
        
    # get top characters
    if top_k is None:
        top_ch = np.arange(len(net.chars))
    else:
        p, top_ch = p.topk(top_k)
        top_ch = top_ch.numpy().squeeze()
        
    # Convert p to numpy, remove single-dimensional entries
    p = p.numpy().squeeze()
    
    # Select the likely next character
    # ...with some element of randomness
    char = np.random.choice(top_ch, p=p/p.sum())
    
    # return encoded value of the predicted char
    # ...as well as the hidden state
    return net.int2char[char], h

#### Priming and generating text

- Prime the network to build up a hidden state. Otherwise the network starts out generating characters at random.
- In general, the first few characters will be a little rough since the network hasn't built up a long history of characters to predict from. 

In [45]:
def sample(net, size, prime='The', top_k=None):
        
    if(train_on_gpu):
        net.cuda()
    else:
        net.cpu()
    
    net.eval() # eval mode
    
    # First off, run through the prime characters
    chars = [ch for ch in prime]
    h = net.init_hidden(1)
    for ch in prime:
        char, h = predict(net, ch, h, top_k=top_k)

    chars.append(char)
    
    # Now pass in the previous character and get a new one
    for ii in range(size):
        char, h = predict(net, chars[-1], h, top_k=top_k)
        chars.append(char)

    return ''.join(chars)

In [48]:
print(sample(net, 300, prime='Anna', top_k=5))

Anna, when the mome her
was his charted a dear over that she and to his she shilt, the hound on
the was how, but tood a comen of stopan, he was a call how, whither the wifed how
tern that and a dinter as in the
head to a she shall of the daid
her with him a said on their she would how at her trun is
her 


## Loading a checkpoint

In [51]:
# Load the checkpoint saved above (`rnn_1_epoch.net`, trained for 1 epoch)
with open('rnn_1_epoch.net', 'rb') as f:
    checkpoint = torch.load(f)
    
loaded = CharRNN(checkpoint['tokens'], n_hidden=checkpoint['n_hidden'], n_layers=checkpoint['n_layers'])
loaded.load_state_dict(checkpoint['state_dict'])

# Sample using a loaded model
print(sample(loaded, 2000, top_k=5, prime="And Levin said"))

And Levin said all think of the will, but tome than his tree he sould his shird that when she sorting in a still the dealt of ans her,
than intorest tank the his and soman the came time
seast that with and that a stopt of a serition in anst the compontather.

Afting on sered, and always with the werl was all his hir a little, but his but
and his to herself all the
say and the wart in its stret is thas he had betanter that, her, and
had begon to her the cart, and to
to the somet of it true, as he had seaning, hore which and trang in sick and
strat the dids to at his
she
sond into her would she sure his
hungand, and all him and
tome all her strest one he came the was
that ang alway, had thinked,
and to be all shiming have have
all had been that thould not him the conversated in a sand him to he
could, though, but the herting as the
harsed in the did that and she sand
a seck a lade of in the wonding, with the horsian in a much stired to this with which the sort in
herself, and his
content
