# Testing a Char-RNN text generator

In this notebook we will load and test our Char-RNN text generator. This notebook is similar to the `char-rnn-training.ipynb` notebook, but this time there is no training loop: we just load a trained model and generate text from it. This code is heavily modified from the repo: https://github.com/nikhilbarhate99/Char-RNN-PyTorch

First, let's do some imports:

In [None]:
import torch
import random

import numpy as np
import torch.nn as nn
import torch.nn.functional as F

from torch.distributions import Categorical

In PyTorch, there are different implementations for storing and processing data on different kinds of computer hardware. By default, all computers will work by training and running neural networks on the Central Processing Unit (CPU), which we can specify with `'cpu'`. 

If you have an NVIDIA Graphics Processing Unit (GPU) (and you have correctly installed CUDA and a matching version of PyTorch), then you can use the device `'cuda'`, which will make training your neural networks **much faster**. Most of you won't have powerful NVIDIA GPUs in your laptops, however. Don't worry if you don't: the notebooks we are using in this class are designed to work on laptop CPUs.

If you have a Mac with an Apple silicon processor (such as an M1 or M2), then you can use the device `'mps'`, which will run on Apple's accelerated Metal Performance Shaders (MPS) for potentially faster and more powerful training (though sometimes running on the CPU can be faster).

In [None]:
device = 'cpu'
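
If you would rather not set the device by hand, the choice described above can be automated. The following is a minimal sketch, not part of the original notebook: `pick_device` is a hypothetical helper name, and it simply prefers CUDA, then MPS, then falls back to the CPU.

```python
import torch

def pick_device():
    """Return 'cuda', 'mps' or 'cpu' depending on what this machine supports."""
    # Prefer an NVIDIA GPU if PyTorch can see one
    if torch.cuda.is_available():
        return 'cuda'
    # torch.backends.mps is missing in older PyTorch versions, so look it up defensively
    mps_backend = getattr(torch.backends, 'mps', None)
    if mps_backend is not None and mps_backend.is_available():
        return 'mps'
    # Every machine can fall back to the CPU
    return 'cpu'

device = pick_device()
print(f'Using device: {device}')
```

Since running on `'mps'` is not always faster than the CPU, it is still worth timing both on your own machine before settling on one.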

#### Set hyperparameters

This is where we specify our *hyperparameters*. We have fewer hyperparameters this time, as we don't need any of the training parameters. The `hidden_size` and `num_layers` parameters need to be the same as the values that were set when the model was trained in the other notebook.

The `temperature` parameter controls how random or conservative our predicted characters will be. With a low temperature (below 1) we will more often than not pick the most likely character. With a high temperature (above 1) our generated sequences will be more random.

In [None]:
hidden_size = 512   # size of hidden state
num_layers = 3      # number of layers in LSTM layer stack
gen_seq_len = 100   # number of characters to generate
temperature = 1     # how random do we want our predictions to be
load_path = "char_rnn_model.pt"
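
To get a feel for what the temperature does, here is a small worked example in plain Python. It is only a sketch under one assumption: that temperature is applied in the common way, by dividing the model's raw scores (logits) by the temperature before the softmax. The function name `softmax_with_temperature` and the toy scores are made up for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by the temperature."""
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability
    largest = max(scaled)
    exps = [math.exp(x - largest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy scores for three characters; the first one is the most likely
toy_logits = [2.0, 1.0, 0.0]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(f'temperature={t}: {[round(p, 3) for p in probs]}')
```

The probability of the most likely character is highest at temperature 0.5 and lowest at temperature 2.0, which is why low temperatures give conservative text and high temperatures give more random text.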

#### Defining the network 

Here we define our network in the same way as before. This code must be the same as the code used in the training notebook where we saved the model.

In [None]:
class RNN(nn.Module):
    def __init__(self, input_size, output_size, hidden_size, num_layers):
        super(RNN, self).__init__()
        self.embedding = nn.Embedding(input_size, input_size)
        self.rnn = nn.LSTM(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)
        self.decoder = nn.Linear(hidden_size, output_size)
    
    def forward(self, input_batch, hidden_state):
        embedding = self.embedding(input_batch)
        output, hidden_state = self.rnn(embedding, hidden_state)
        output = self.decoder(output)
        return output, (hidden_state[0].detach(), hidden_state[1].detach())

#### Setting up the network and loading the weights

Here we will create an instance of our network, `rnn`. Because we are not training this time, we don't need a loss function or an optimiser. Instead, we load the checkpoint file saved by the training notebook, which contains the `char_to_ix` and `ix_to_char` dictionaries and the trained model weights. Any data we pass to the network has to be a PyTorch `tensor`. This is the data type that we have to use with PyTorch so that our neural networks can read and process the data correctly. [PyTorch tensors](https://pytorch.org/docs/stable/tensors.html) have been designed to work in almost exactly the same way as [numpy arrays](https://numpy.org/doc/stable/reference/generated/numpy.array.html).

In [None]:
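# map_location lets us load a checkpoint that was saved on a different device (e.g. trained on a GPU, loaded on a CPU)
checkpoint = torch.load(load_path, map_location=device)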

# Load char_to_ix and ix_to_char dictionaries from checkpoint file
char_to_ix = checkpoint['char_to_ix']
ix_to_char = checkpoint['ix_to_char']

# Calculate vocab size
vocab_size = len(char_to_ix)

# Instantiate RNN
rnn = RNN(vocab_size, vocab_size, hidden_size, num_layers).to(device)

# Load model weights from checkpoint file 
rnn.load_state_dict(checkpoint['state_dict'])
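
If `load_state_dict` complains about mismatched shapes, the most likely cause is that `hidden_size` or `num_layers` differs from the values used in training. As a rough sketch, these two values can be read back from the saved weights themselves. This assumes the LSTM is stored under the attribute name `rnn`, as in the class above; `infer_lstm_hyperparameters` and `DemoNet` are hypothetical names, and a small freshly created network stands in for the real checkpoint.

```python
import torch.nn as nn

class DemoNet(nn.Module):
    """Stand-in network that stores its LSTM under the name `rnn`, like the notebook's model."""
    def __init__(self, input_size, hidden_size, num_layers):
        super().__init__()
        self.rnn = nn.LSTM(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)

def infer_lstm_hyperparameters(state_dict, prefix='rnn.'):
    """Read hidden_size and num_layers from the LSTM weights in a state dict."""
    # Each LSTM layer has exactly one hidden-to-hidden weight matrix: weight_hh_l0, weight_hh_l1, ...
    hh_keys = [k for k in state_dict
               if k.startswith(prefix + 'weight_hh_l') and not k.endswith('_reverse')]
    num_layers = len(hh_keys)
    # weight_hh_l0 has shape (4 * hidden_size, hidden_size)
    hidden_size = state_dict[prefix + 'weight_hh_l0'].shape[1]
    return hidden_size, num_layers

demo_state_dict = DemoNet(input_size=20, hidden_size=16, num_layers=2).state_dict()
inferred = infer_lstm_hyperparameters(demo_state_dict)
print(inferred)  # (16, 2)
```

With the real checkpoint you would pass `checkpoint['state_dict']` instead of `demo_state_dict` and compare the result with the hyperparameters set earlier.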

#### Generate a random sequence

In [None]:
with torch.no_grad():
    hidden_state = None
    
    # Pick a random starting character
    random_start = np.array(random.choice(list(char_to_ix.values())))
    
    # Convert to PyTorch Tensor on the same device as the model
    input_seq = torch.tensor(random_start, dtype=torch.int64, device=device)
    
    # Change dimensionality of tensor for PyTorch compatibility
    # For more info on this function see: https://stackoverflow.com/questions/57237352/what-does-unsqueeze-do-in-pytorch
    input_seq = input_seq.unsqueeze(0).unsqueeze(0)

    # Iterate over our sequence length
    for i in range(gen_seq_len):
        # Forward pass
        output, hidden_state = rnn(input_seq, hidden_state)
        
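        # Divide the raw scores by the temperature before the softmax, then construct
        # a categorical distribution. (Dividing the probabilities themselves would have
        # no effect, because Categorical renormalises them.)
        probs = F.softmax(torch.squeeze(output) / temperature, dim=0)
        dist = Categorical(probs)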
        index = dist.sample()
        
        # Print the sampled character
        print(ix_to_char[index.item()], end='')
        
        # Next input is current output
        input_seq[0][0] = index.item()
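
The two `unsqueeze(0)` calls in the cell above can look mysterious, so here is a short sketch of what they do to the tensor's shape. An `nn.LSTM` created without `batch_first=True` expects input of shape `(sequence length, batch size, features)`, so a single character index has to become a `(1, 1)` tensor before the embedding layer adds the feature dimension. The index `7` and the embedding size `10` are arbitrary values chosen for illustration.

```python
import torch
import torch.nn as nn

# A single character index is a tensor with no dimensions at all
scalar = torch.tensor(7, dtype=torch.int64)
print(scalar.shape)   # shape ()

# First unsqueeze: add a dimension of size 1
one_d = scalar.unsqueeze(0)
print(one_d.shape)    # shape (1,)

# Second unsqueeze: sequence length 1, batch size 1
two_d = one_d.unsqueeze(0)
print(two_d.shape)    # shape (1, 1)

# The embedding layer appends the feature dimension the LSTM needs
embedding = nn.Embedding(10, 10)
embedded = embedding(two_d)
print(embedded.shape)  # shape (1, 1, 10)
```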

#### Map string to indexes

Let's write a function that lets us manually create our own starting sequence. We will take a string and use our `char_to_ix` dictionary to get the mapped numerical values. It is important to remember that **only the characters in the original dataset** can be **mapped into the index values for the model**. Try printing out `char_to_ix` to see all the available characters. Any characters not in the original data will unfortunately be skipped:

In [None]:
def map_str_to_ix(input_str):
    index_list = []
    for char in input_str:
        ix = char_to_ix.get(char, None)
        if ix is not None:
            index_list.append(ix)
        else:
            print(f'The char {char} is not in the dictionary')
    return index_list
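
To see the skipping behaviour without loading a model, here is a small variant run on a toy dictionary. It is a sketch only: `map_str_with_skipped` and `toy_char_to_ix` are hypothetical names, and unlike the function above this version takes the dictionary as an argument and returns the skipped characters instead of printing them.

```python
def map_str_with_skipped(input_str, char_to_ix):
    """Map a string to indexes, collecting any characters the dictionary does not know."""
    index_list = []
    skipped = []
    for char in input_str:
        if char in char_to_ix:
            index_list.append(char_to_ix[char])
        else:
            skipped.append(char)
    return index_list, skipped

# A toy vocabulary of three characters
toy_char_to_ix = {'a': 0, 'b': 1, 'c': 2}

indexes, skipped = map_str_with_skipped('cab!', toy_char_to_ix)
print(indexes)  # [2, 0, 1]
print(skipped)  # ['!']
```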

#### Define new starting string

Now let's define a starting string and create our index list from it. Each index will be converted to a PyTorch tensor when we pass it to the model in the next step:

In [None]:
input_str = 'cci'
index_list = map_str_to_ix(input_str)
print(f'Our list is: {index_list}')

#### Generate from our own starting sequence

Now let's have a go at generating from our own sequence. We need two loops here. The first passes each character into the model to update the **hidden state**. Here we are not doing anything with the model's predictions, just *conditioning* the model on our sequence. Once the model is conditioned on the sequence, we can start to generate new characters from it in the second loop.

How do these predictions compare to the random generations? What happens when you put in a starting sequence that is very different to the original data? Try [changing the temperature parameter](#set-hyperparameters) to see how it affects the results.

In [None]:
with torch.no_grad():
    hidden_state = None

    # We need at least one known character to condition the model on
    assert index_list, 'The starting string contains no characters the model knows'

    # First loop: condition the model on the starting sequence
    for ix in index_list:
        
        # Print current input character
        print(ix_to_char[ix], end='')

        # Convert the current index to a PyTorch Tensor on the same device as the model
        input_seq = torch.tensor(np.array(ix), dtype=torch.int64, device=device)
        
        # Change dimensionality of tensor for PyTorch compatibility
        # For more info on this function see: https://stackoverflow.com/questions/57237352/what-does-unsqueeze-do-in-pytorch
        input_seq = input_seq.unsqueeze(0).unsqueeze(0)
        
        # Update the hidden state. After the last character, `output` holds
        # the scores for the first new character
        output, hidden_state = rnn(input_seq, hidden_state)

    # Second loop: generate new characters
    for i in range(gen_seq_len):
        # Divide the raw scores by the temperature before the softmax, then sample a character
        probs = F.softmax(torch.squeeze(output) / temperature, dim=0)
        dist = Categorical(probs)
        index = dist.sample()
        
        # Print the sampled character
        print(ix_to_char[index.item()], end='')
        
        # Next input is current output
        input_seq[0][0] = index.item()

        # Forward pass to get the scores for the following character
        output, hidden_state = rnn(input_seq, hidden_state)
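
To compare several temperatures or starting strings quickly, it can help to wrap the conditioning and generation loops in a single function. The following is a minimal sketch rather than part of the original notebook: `generate` is a hypothetical helper, it samples the first new character from the model's prediction after the last prompt character, and it is demonstrated on a tiny *untrained* network with a toy vocabulary, so the text it produces is meaningless.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Categorical

class RNN(nn.Module):
    """Same architecture as the network defined earlier in this notebook."""
    def __init__(self, input_size, output_size, hidden_size, num_layers):
        super().__init__()
        self.embedding = nn.Embedding(input_size, input_size)
        self.rnn = nn.LSTM(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)
        self.decoder = nn.Linear(hidden_size, output_size)

    def forward(self, input_batch, hidden_state):
        embedding = self.embedding(input_batch)
        output, hidden_state = self.rnn(embedding, hidden_state)
        output = self.decoder(output)
        return output, (hidden_state[0].detach(), hidden_state[1].detach())

def generate(model, char_to_ix, ix_to_char, prompt, length, temperature=1.0, device='cpu'):
    """Condition `model` on `prompt`, then sample `length` new characters."""
    # Unknown characters are skipped, as in map_str_to_ix
    prompt_ix = [char_to_ix[c] for c in prompt if c in char_to_ix]
    if not prompt_ix:
        raise ValueError('prompt contains no characters known to the model')

    generated = []
    with torch.no_grad():
        hidden_state = None
        # Conditioning: feed the prompt one character at a time
        for ix in prompt_ix:
            input_seq = torch.tensor([[ix]], dtype=torch.int64, device=device)
            output, hidden_state = model(input_seq, hidden_state)
        # Generation: sample a character, then feed it back in
        for _ in range(length):
            probs = F.softmax(output.squeeze() / temperature, dim=0)
            index = Categorical(probs).sample().item()
            generated.append(ix_to_char[index])
            input_seq = torch.tensor([[index]], dtype=torch.int64, device=device)
            output, hidden_state = model(input_seq, hidden_state)

    return ''.join(ix_to_char[ix] for ix in prompt_ix) + ''.join(generated)

# Toy vocabulary and an untrained network, purely to show the function running
toy_char_to_ix = {'a': 0, 'b': 1, 'c': 2, 'd': 3}
toy_ix_to_char = {ix: char for char, ix in toy_char_to_ix.items()}
toy_rnn = RNN(4, 4, hidden_size=8, num_layers=2)

for t in (0.5, 1.0, 2.0):
    text = generate(toy_rnn, toy_char_to_ix, toy_ix_to_char, prompt='abc', length=20, temperature=t)
    print(f'temperature={t}: {text}')
```

With the real model you would pass `rnn`, `char_to_ix` and `ix_to_char` from the cells above, along with your own `device`.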