### RNN Example ###

Here is a simple example of an RNN implemented in PyTorch. We will build a character-level RNN model that is trained to predict the next character in a sequence, given the characters that precede it.

In [1]:
# import modules

import torch # PyTorch
import torch.nn as nn # neural network layers and loss functions
import string # standard library module with ASCII character constants
import random # standard library module for pseudo-random number generation

In [2]:
torch.manual_seed(42) # set random seed for reproducibility

<torch._C.Generator at 0x11c57c130>

In [3]:
# Build the character vocabulary

#all_characters = string.printable  # alternative: every printable ASCII character
all_characters = string.ascii_letters + '.' + ' ' # ASCII letters (lower and upper case), period and space
print(f"Characters:\n{all_characters}") # print all characters

n_characters = len(all_characters) # get the number of characters
print(f"Total characters = {n_characters}") # print the number of characters


Characters:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ. 
Total characters = 54


In [4]:
# Whole dataset
dataset = "This is a larger dataset. It contains many more sentences and characters than before."

# Split dataset into 80% training and 20% validation
train_size = int(len(dataset) * 0.8) # number of characters in the training split
train_data = dataset[:train_size] # get the first 80% of the dataset for training
valid_data = dataset[train_size:] # get the last 20% of the dataset for validation

print('Train:', train_data,'\nTest: ',valid_data, '\nTest_Length: ', len(valid_data))

Train: This is a larger dataset. It contains many more sentences and charac 
Test:  ters than before. 
Test_Length:  17


#### Parameters required to instantiate the layer ####
When you create an instance of the nn.RNN class in PyTorch, you need to provide a few parameters. These parameters help to define the structure and function of the recurrent neural network. Here are the main parameters:

## Required
    input_size: The number of expected features in the input x. When characters are fed to the RNN directly (for example as one-hot vectors), this equals the number of unique characters, n_characters, which is 54 here. In the model below, an embedding layer first maps each character to a vector of length hidden_size, so the nn.RNN layer itself receives input_size = hidden_size.

    hidden_size: The number of features in the hidden state h. This essentially represents the "memory" of the RNN and can be thought of as the number of "neurons" or "nodes" in the hidden layer of the RNN. You can set this to any positive integer, but keep in mind that larger values increase the complexity and computational requirements of the model.

## Optional
    num_layers: Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and producing the final results. Default: 1

    nonlinearity: The non-linearity to use ('tanh' or 'relu'). Default: 'tanh'

    bias: If False, then the layer does not use bias weights b_ih and b_hh. Default: True

    batch_first: If True, then the input and output tensors are provided as (batch, seq, feature). Default: False

    dropout: If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0

    bidirectional: If True, becomes a bidirectional RNN. Default: False

For most simple use cases, you'll probably just need to set input_size, hidden_size, and num_layers.
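
To make the parameters above concrete, here is a minimal sketch that instantiates nn.RNN and inspects the shapes it returns. The sizes (a sequence of 5 steps, hidden size 8, 2 layers) are illustrative assumptions and are not the values used later in this notebook.

```python
import torch
import torch.nn as nn

# Illustrative sizes only (assumptions for this sketch)
seq_len, batch_size, n_features, n_hidden, n_stacked = 5, 1, 54, 8, 2

layer = nn.RNN(input_size=n_features, hidden_size=n_hidden, num_layers=n_stacked)

x = torch.randn(seq_len, batch_size, n_features)   # default layout: (seq, batch, feature)
h0 = torch.zeros(n_stacked, batch_size, n_hidden)  # one initial hidden state per layer

output, hn = layer(x, h0)
output_shape = tuple(output.shape)  # last layer's hidden state at every time step
hn_shape = tuple(hn.shape)          # final hidden state of every layer
print(output_shape, hn_shape)       # (5, 1, 8) (2, 1, 8)
```

With batch_first=True the input and output would instead be laid out as (batch, seq, feature); the hidden state keeps its (num_layers, batch, hidden_size) layout.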


In [5]:
# Build the RNN model
# Remember that at each time step an RNN takes two inputs: the current input and the hidden state from the previous time step

class RNN(nn.Module): # Create a class RNN that inherits from nn.Module
    def __init__(self, input_size, hidden_size, output_size, n_layers=1): # Define the constructor
        super(RNN, self).__init__() # Call the constructor of the parent class to initialize the object
        
        # Define all the details of the RNN architecture
        self.input_size = input_size # number of expected features (unique characters) in the training data
        self.hidden_size = hidden_size # number of features (nodes/neurons) in the hidden state (hidden layer) where the larger the hidden size, the more complex patterns the model can capture
        self.output_size = output_size # number of expected features (unique characters) in the output data
        self.n_layers = n_layers # number of recurrent layers (stacked RNNs) where the larger the number of layers, the more complex patterns the model can capture

        self.embedding = nn.Embedding(input_size, hidden_size) # maps each character index to a learned vector of length hidden_size (pre-trained embeddings such as word2vec or GloVe exist for words, but here we train our own character embeddings)
        self.rnn = nn.RNN(hidden_size, hidden_size, n_layers) # RNN layer: the first argument is the number of input features (the embedding length), the second is the number of features in the hidden state, and the third is the number of stacked RNN layers
        self.decoder = nn.Linear(hidden_size, output_size) # linear layer that maps the RNN output to one score (logit) per character; no activation function is applied here

    def forward(self, input, hidden): # (input = current input character, hidden = previous state) - forward pass of the RNN
        embedded = self.embedding(input.view(1, -1)) # reshape so it is 1 row, and then send it through the embedding layer to generate embeddings and convert it into a vector
        output, hidden = self.rnn(embedded, hidden) # feed the embedded input and hidden state into the RNN layer to get the output and hidden state
        output = self.decoder(output.view(1, -1)) # send the output to the linear layer to predict the next character in the sequence
        return output, hidden # return the output and the hidden layer for the next iteration
    
    # Initialize the hidden layer parameters
    def init_hidden(self): # initialize some values for the hidden state for the first time step
        return torch.zeros(self.n_layers, 1, self.hidden_size) # initialize the hidden state with zeros
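
The forward pass above can be traced one tensor at a time. The sketch below rebuilds the same three layers as standalone objects (rather than through the RNN class) and records the shape after each stage for a single character. The hidden size of 16 is an illustrative assumption.

```python
import torch
import torch.nn as nn

vocab_size, hidden_size, n_layers = 54, 16, 1  # hidden_size=16 is illustrative only

embedding = nn.Embedding(vocab_size, hidden_size)
rnn_layer = nn.RNN(hidden_size, hidden_size, n_layers)
decoder = nn.Linear(hidden_size, vocab_size)

char_index = torch.tensor(26)                   # a single character index (a 0-dim tensor)
hidden = torch.zeros(n_layers, 1, hidden_size)  # same shape that init_hidden() returns

embedded = embedding(char_index.view(1, -1))    # (1, 1, hidden_size): seq=1, batch=1
output, hidden = rnn_layer(embedded, hidden)    # output and hidden: (1, 1, hidden_size)
logits = decoder(output.view(1, -1))            # (1, vocab_size): one score per character

shapes = (tuple(embedded.shape), tuple(output.shape),
          tuple(hidden.shape), tuple(logits.shape))
print(shapes)
```

The final (1, vocab_size) tensor is what the loss function later compares against the index of the true next character.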

    

To train this RNN we need to create input and target tensors for each sequence in our dataset. Here we use sequences of chunk_len characters (chunk_len is set to 50 in the training cell):

In [6]:
# Convert a string into a tensor of character indices (positions in all_characters)
def char_tensor(text): # the parameter is named text so that it does not shadow the imported string module
    tensor = torch.zeros(len(text)).long() # integer tensor of zeros, one slot per character
    for c in range(len(text)): # loop over the character positions
        tensor[c] = all_characters.index(text[c]) # index of this character in all_characters (raises ValueError if the character is not in the vocabulary)
    return tensor # return the tensor of indices

In [7]:
# Encode the string into a tensor
print(char_tensor('All is well')) # each letter or space is replaced by its index in all_characters

tensor([26, 11, 11, 53,  8, 18, 53, 22,  4, 11, 11])
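
The encoding can be checked by mapping the indices back to characters. The sketch below uses plain Python lists instead of tensors; the helper names encode and decode are hypothetical and are not part of the notebook.

```python
import string

all_characters = string.ascii_letters + '.' + ' '

def encode(text):
    # Position of each character in the vocabulary (hypothetical helper)
    return [all_characters.index(ch) for ch in text]

def decode(indices):
    # Inverse mapping: look each index up in the vocabulary (hypothetical helper)
    return ''.join(all_characters[i] for i in indices)

encoded = encode('All is well')
decoded = decode(encoded)
print(encoded)  # [26, 11, 11, 53, 8, 18, 53, 22, 4, 11, 11]
print(decoded)  # All is well
```

Lowercase letters occupy indices 0 to 25, uppercase letters 26 to 51, the period is 52 and the space is 53, which is why 'A' maps to 26 and each space to 53.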


In [9]:
# Create randomly positioned chunks of the training and validation data
# Both functions read the global chunk_len; if it is not smaller than the data, the whole split is used

# Training data generation function
def random_training_set(): # make an (input, target) pair from the training data
    if chunk_len < len(train_data): # the chunk fits, so pick a random window of chunk_len characters
        start_index = random.randint(0, len(train_data) - chunk_len - 1) # random start index (randint includes both endpoints, so the last character of train_data is never part of a chunk)
        end_index = start_index + chunk_len # get the end index by adding the chunk length to the start index
    else: # chunk_len is at least as long as the training data, so fall back to using all of it
        start_index = 0 # set the start index to 0
        end_index = len(train_data) # set the end index to the length of the training data
    
    chunk = train_data[start_index:end_index] # get the chunk of data from the training data
    inp = char_tensor(chunk[:-1]) # input: every character in the chunk except the last one
    target = char_tensor(chunk[1:]) # target: every character in the chunk except the first one, i.e. the input shifted by one position
    return inp, target # both are tensors of character indices; target[i] is the character that follows inp[i]


# Validation data generation function
def random_valid_set(): # make an (input, target) pair from the validation data
    if chunk_len < len(valid_data): # the chunk fits, so pick a random window of chunk_len characters
        start_index = random.randint(0, len(valid_data) - chunk_len - 1) # random start index (randint includes both endpoints, so the last character of valid_data is never part of a chunk)
        end_index = start_index + chunk_len # get the end index by adding the chunk length to the start index
    else: # chunk_len is at least as long as the validation data, so fall back to using all of it
        start_index = 0 # set the start index to 0
        end_index = len(valid_data) # set the end index to the length of the validation data
    chunk = valid_data[start_index:end_index] # get the chunk of data from the validation data
    inp = char_tensor(chunk[:-1]) # input: every character in the chunk except the last one
    target = char_tensor(chunk[1:]) # target: every character in the chunk except the first one, i.e. the input shifted by one position
    return inp, target # both are tensors of character indices; target[i] is the character that follows inp[i]
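
The relationship between input and target is easiest to see on a short string. This is a small sketch in plain Python, without tensors; the chunk "dataset." is just an illustrative slice of the training text.

```python
chunk = "dataset."          # an illustrative chunk of text

inp_text = chunk[:-1]       # "dataset"  (everything except the last character)
target_text = chunk[1:]     # "ataset."  (everything except the first character)

# Pair each input character with the character the model should predict next
pairs = list(zip(inp_text, target_text))
for current_char, next_char in pairs:
    print(f"{current_char!r} -> {next_char!r}")
```

A chunk of n characters therefore yields n - 1 training pairs, so a chunk_len of 50 gives 49 prediction steps.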


In [10]:
# Let us implement our training function
# Takes input and target tensors, runs them through the RNN model and returns the loss divided by the chunk length

def train(inp, target): # training function: one optimization step on one chunk
    hidden = rnn.init_hidden() # start from a zero hidden state (first time step)
    rnn.zero_grad() # clear the gradients left over from the previous step
    loss = 0 # initialize the accumulated loss to 0

    for c in range(len(inp)): # loop over the characters of the input tensor
        output, hidden = rnn(inp[c], hidden) # feed one character and the previous hidden state into the model to get the scores for the next character and the new hidden state
        loss += loss_fn(output, target[c].unsqueeze(0)) # cross-entropy between the predicted scores and the true next character

    loss.backward() # backpropagate the loss through all time steps
    optimizer.step() # update the model parameters using the gradients

    return loss.item() / chunk_len # summed loss divided by chunk_len (the loop runs len(inp) = chunk length - 1 steps, so this slightly understates the per-character mean)


In [11]:
# Evaluation function for the validation data
# Here we simply feed the validation data to the RNN model and compare each prediction to the actual next character
# Returns the summed loss divided by the chunk length

def evaluate(inp, valid_target): # evaluation function: takes the input and target tensors
    hidden = rnn.init_hidden() # start from a zero hidden state (first time step)
    loss = 0 # initialize the accumulated loss to 0
    with torch.no_grad(): # gradients are not needed for evaluation
        for c in range(len(inp)): # loop over the characters of the input tensor
            output, hidden = rnn(inp[c], hidden) # feed one character and the previous hidden state into the model to get the scores for the next character and the new hidden state
            loss += loss_fn(output, valid_target[c].unsqueeze(0)) # cross-entropy between the predicted scores and the true next character
    return loss.item() / chunk_len # summed loss divided by chunk_len (when the validation data is shorter than chunk_len, as with the 17 characters here, the loop runs only len(inp) steps, so this understates the per-character loss; dividing by len(inp) would give the true mean)
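
For intuition about the numbers these functions return, the sketch below computes the cross-entropy of a single prediction by hand in plain Python. For raw scores and a class-index target, nn.CrossEntropyLoss computes the negative log of the softmax probability assigned to the true class; the helper cross_entropy and the example scores are illustrative assumptions.

```python
import math

def cross_entropy(logits, target_index):
    # -log(softmax(logits)[target_index]), using the log-sum-exp trick for stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]

# A model that scores all 54 characters equally is guessing uniformly
uniform_loss = cross_entropy([0.0] * 54, target_index=3)

# A confident prediction: low loss if it is right, high loss if it is wrong
right_loss = cross_entropy([0.0, 5.0, 0.0], target_index=1)
wrong_loss = cross_entropy([0.0, 5.0, 0.0], target_index=0)

print(round(uniform_loss, 4), round(right_loss, 4), round(wrong_loss, 4))
```

The uniform case gives ln(54), roughly 3.99, which is a useful baseline: a per-character loss well below that value means the model has learned something about the text.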

#### Calling a function with the tuple returned by another function ####

The asterisk (*) in Python has a few different meanings depending on the context, but in this case, when it's used in a function call like train(*random_training_set()), it's used for unpacking the elements of random_training_set().

In Python, the single asterisk (*) operator can be used in a function call to unpack an iterable into positional arguments passed to the function.

In our case, random_training_set() returns a tuple of two elements: inp and target. When we call train(*random_training_set()), the * operator unpacks this tuple, and passes the elements of the tuple as separate arguments to the train function.

This is the same as:

    inp, target = random_training_set()
    train(inp, target)
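
The equivalence can be verified with two stand-in functions. The names make_pair and describe are hypothetical placeholders for random_training_set and train.

```python
def make_pair():
    # Stands in for random_training_set(): returns a tuple of two values
    return "inp", "target"

def describe(inp, target):
    # Stands in for train(): takes two positional arguments
    return f"{inp} -> {target}"

# Unpack the returned tuple directly into positional arguments
unpacked = describe(*make_pair())

# The explicit two-step version
a, b = make_pair()
explicit = describe(a, b)

print(unpacked == explicit)  # True
```

Without the asterisk, describe(make_pair()) would pass the whole tuple as a single argument and raise a TypeError because target is missing.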


In [12]:
# Create the training loop by calling the functions

# Set hyperparameters
n_epochs = 10000 # number of training iterations (each one trains on a single random chunk); enough to overfit the very small dataset, so we can see the model learning
print_every = 1000 # print the losses every 1000 iterations
hidden_size = 200 # number of features in the hidden state
n_layers = 1 # number of RNN layers
lr = 0.00001 # very low learning rate, so the model changes slowly over the many iterations
chunk_len = 50 # chunk length of the data (the number of characters in each training sequence)

# Instantiate the model object, optimizer and loss function
rnn = RNN(n_characters, hidden_size, n_characters, n_layers) # input size, hidden size, output size and number of layers
optimizer = torch.optim.Adam(rnn.parameters(), lr=lr) # Adam is a widely used optimizer; it updates the model parameters from the gradients
loss_fn = nn.CrossEntropyLoss() # cross-entropy loss, applied to the raw scores from the decoder

# Run the training
for epoch in range(1, n_epochs + 1): # loop over the training iterations
    loss = train(*random_training_set()) # train on one random chunk and get the training loss
    valid_loss = evaluate(*random_valid_set())  # calculate the validation loss
    if epoch % print_every == 0: # print the losses every 1000 iterations
        print('Epoch: %d, Training Loss: %.4f, Validation Loss: %.4f' % (epoch, loss, valid_loss)) # print the iteration number, training loss, and validation loss

Epoch: 1000, Training Loss: 1.5496, Validation Loss: 1.0214
Epoch: 2000, Training Loss: 0.8298, Validation Loss: 1.0342
Epoch: 3000, Training Loss: 0.3885, Validation Loss: 1.0668
Epoch: 4000, Training Loss: 0.3468, Validation Loss: 1.1076
Epoch: 5000, Training Loss: 0.1540, Validation Loss: 1.1524
Epoch: 6000, Training Loss: 0.0623, Validation Loss: 1.1929
Epoch: 7000, Training Loss: 0.0558, Validation Loss: 1.2375
Epoch: 8000, Training Loss: 0.0366, Validation Loss: 1.2898
Epoch: 9000, Training Loss: 0.0252, Validation Loss: 1.3426
Epoch: 10000, Training Loss: 0.0110, Validation Loss: 1.3854
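
In the log above, the training loss keeps falling while the validation loss rises, which is the usual signature of overfitting and is expected here given the tiny dataset. As a rough sketch, the logged values can be inspected programmatically; the list below is copied from the printed output, and picking the row with the lowest validation loss is just one simple way to read it.

```python
# (iteration, training loss, validation loss), copied from the printed log
log = [
    (1000, 1.5496, 1.0214), (2000, 0.8298, 1.0342), (3000, 0.3885, 1.0668),
    (4000, 0.3468, 1.1076), (5000, 0.1540, 1.1524), (6000, 0.0623, 1.1929),
    (7000, 0.0558, 1.2375), (8000, 0.0366, 1.2898), (9000, 0.0252, 1.3426),
    (10000, 0.0110, 1.3854),
]

# Logged row with the lowest validation loss
best_epoch, _, best_valid = min(log, key=lambda row: row[2])

# Gap between validation and training loss at each logged point
gaps = [(epoch, valid - train) for epoch, train, valid in log]

print(best_epoch, best_valid)  # 1000 1.0214
for epoch, gap in gaps:
    print(f"{epoch:>5}: gap = {gap:+.4f}")
```

The gap starts negative and grows steadily, so among the logged points the model generalizes best at the first one.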


### Generate text with this model ###

In [14]:
def generate(decoder, prime_str='ch', predict_len=100, temperature=0.8): # generate text with the model 'decoder', starting from the string 'prime_str', predicting 'predict_len' characters; 'temperature' controls randomness (lower is more conservative, higher is more varied)
    hidden = decoder.init_hidden() # start from a zero hidden state
    prime_input = char_tensor(prime_str) # convert the prime string 'ch' to a tensor of character indices
    predicted = prime_str # the generated text starts with the prime string

    # Use priming string to "build up" hidden state (only runs if the prime string has more than one character)
    for p in range(len(prime_str) - 1): # loop over every prime character except the last one
        _, hidden = decoder(prime_input[p], hidden) # feed the character, discard the output and keep the updated hidden state
    inp = prime_input[-1] # the last character of the prime string is the first input for prediction
    
    for p in range(predict_len): # loop through the number of characters we want to predict
        output, hidden = decoder(inp, hidden) # feed the input character and the hidden state into the RNN model to get the output and hidden state
        
        # Sample the next character from the network's output distribution
        output_dist = output.data.view(-1).div(temperature).exp() # divide the scores by the temperature and exponentiate them to get unnormalized sampling weights
        top_i = torch.multinomial(output_dist, 1)[0] # draw one index at random in proportion to the weights (a sample, not necessarily the most likely character)
        
        # Add predicted character to string and use as next input
        predicted_char = all_characters[top_i] # look up the character that corresponds to the sampled index
        predicted += predicted_char # add the predicted character to the predicted string
        inp = char_tensor(predicted_char) # convert the predicted character to a tensor to be fed back into the model for the next iteration

    return predicted # return the predicted string


In [18]:
print(generate(rnn)) # the rnn model has been trained on the dataset, and now we generate text with it

charger dataset. It contains many more sentences and characes and chara larger dataset. It contains ma
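
The effect of the temperature argument in generate can be illustrated without PyTorch. The sketch below applies the same divide-then-exponentiate step to three made-up scores and normalizes the result so the probabilities are easy to compare. The helper temperature_probs and the scores are illustrative assumptions; generate itself passes unnormalized weights to torch.multinomial, which does not require them to sum to one.

```python
import math
import random

def temperature_probs(logits, temperature):
    # Divide scores by the temperature, exponentiate, then normalize
    scaled = [z / temperature for z in logits]
    m = max(scaled)                                # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

logits = [2.0, 1.0, 0.0]                           # made-up scores for three characters

cold = temperature_probs(logits, 0.5)              # low temperature: sharper distribution
warm = temperature_probs(logits, 2.0)              # high temperature: flatter distribution
print([round(p, 3) for p in cold])
print([round(p, 3) for p in warm])

# Draw one index in proportion to the probabilities, as torch.multinomial does
random.seed(0)
sampled = random.choices(range(len(logits)), weights=cold, k=1)[0]
```

At low temperature almost all of the probability goes to the highest-scoring character, so generated text closely follows the training data; at high temperature the choices become more varied and errors become more likely.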
