
**Recurrent Neural Networks with PyTorch:**

Recurrent Neural Networks (RNNs) are a class of neural networks particularly well-suited for sequential data processing. They are widely used in natural language processing (NLP), time series analysis, and other tasks where the order of data points matters. In this overview, we'll delve into the theory behind RNNs and how to implement them using PyTorch, a popular deep learning framework.

**1. Introduction to Recurrent Neural Networks (RNNs):**
   - RNNs are designed to handle sequences of data by maintaining an internal state or memory that captures information about previous inputs. This memory allows RNNs to process sequences of arbitrary length and learn patterns in sequential data.
   - At each time step, an RNN takes an input vector and its internal state from the previous time step, computes a new state based on these inputs, and produces an output vector.
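
The per-step update described above can be sketched in NumPy. This is a minimal illustration that assumes the common tanh recurrence; the function name `rnn_step`, the weight names (`W_xh`, `W_hh`, `W_hy`), and all sizes are hypothetical and chosen only for this example.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h, W_hy, b_y):
    """One time step of a vanilla RNN (illustrative, not PyTorch's implementation)."""
    # New hidden state mixes the current input with the previous state
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
    # Output is a linear read-out of the hidden state
    y_t = W_hy @ h_t + b_y
    return h_t, y_t

rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 4, 3, 2  # assumed sizes

W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))
b_y = np.zeros(output_size)

# Process a sequence of 5 input vectors, carrying the state forward
sequence = rng.normal(size=(5, input_size))
h = np.zeros(hidden_size)
outputs = []
for x_t in sequence:
    h, y = rnn_step(x_t, h, W_xh, W_hh, b_h, W_hy, b_y)
    outputs.append(y)

outputs = np.stack(outputs)
print(outputs.shape)  # (5, 2)
```

The loop is the important part: the same `rnn_step` and the same weights are used at every position, and only `h` changes from step to step.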

**2. Understanding the RNN Architecture:**
   - A basic RNN consists of a single recurrent unit (cell) that is applied once per time step; unrolled over a sequence, it looks like a chain of identical cells. The cell maintains a hidden state that serves as its memory.
   - Because the same parameters are reused at every time step, the network can handle sequences of varying length and capture temporal dependencies in the data.
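
One way to see the parameter sharing is to check that the number of parameters in `nn.RNN` does not depend on the sequence length. The sketch below uses arbitrary sizes (4 input features, 3 hidden units) that are assumptions for illustration only.

```python
import torch
from torch import nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=4, hidden_size=3, num_layers=1, batch_first=True)

# Parameter count depends only on the input and hidden sizes
n_params = sum(p.numel() for p in rnn.parameters())
print(n_params)  # 4*3 + 3*3 + 3 + 3 = 27

short_seq = torch.randn(2, 5, 4)   # (batch, seq_len=5, features)
long_seq = torch.randn(2, 50, 4)   # (batch, seq_len=50, features)

# The same layer, with the same weights, handles both lengths
out_short, h_short = rnn(short_seq)
out_long, h_long = rnn(long_seq)

print(out_short.shape, h_short.shape)  # torch.Size([2, 5, 3]) torch.Size([1, 2, 3])
print(out_long.shape, h_long.shape)    # torch.Size([2, 50, 3]) torch.Size([1, 2, 3])
```

The output has one hidden vector per time step, while the returned hidden state only holds the final step, so its shape is the same for both inputs.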

**3. Challenges with Standard RNNs:**
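   - While RNNs are powerful, they suffer from the vanishing gradient problem: during backpropagation through time, the gradient is repeatedly multiplied by the recurrent weights and the activation derivatives, so it can shrink toward zero over many time steps. This hinders their ability to capture long-term dependencies in sequences.
   - As a consequence, standard RNNs have difficulty retaining information from earlier time steps when processing long sequences, which is often called the "short-term memory" problem.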
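
The shrinking gradient can be reproduced numerically. The following NumPy sketch is a toy setup, not a trained network: it assumes a tanh recurrence and deliberately builds the recurrent matrix as an orthogonal matrix scaled by 0.5, so the effect is guaranteed to appear. All names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, n_steps = 8, 30

# Recurrent weights: an orthogonal matrix scaled by 0.5 (assumed for illustration),
# so every backward step at least halves the gradient norm
Q, _ = np.linalg.qr(rng.normal(size=(hidden_size, hidden_size)))
W_hh = 0.5 * Q

# Forward pass: h_t = tanh(W_hh @ h_{t-1} + x_t), storing the states
h = np.zeros(hidden_size)
states = []
for _ in range(n_steps):
    h = np.tanh(W_hh @ h + rng.normal(size=hidden_size))
    states.append(h)

# Backward pass: push a gradient from the last step back to the first
grad = np.ones(hidden_size)
norms = []
for h_t in reversed(states):
    grad = W_hh.T @ (grad * (1.0 - h_t ** 2))  # chain rule through tanh and W_hh
    norms.append(np.linalg.norm(grad))

print(f"after 1 step back:  {norms[0]:.3e}")
print(f"after {n_steps} steps back: {norms[-1]:.3e}")
```

By the time the gradient reaches the earliest time steps it is many orders of magnitude smaller, so those steps contribute almost nothing to the weight update.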

**4. Introduction to Long Short-Term Memory (LSTM) Networks:**
   - To address the limitations of standard RNNs, more advanced architectures such as Long Short-Term Memory (LSTM) networks have been developed.
   - LSTMs add a separate cell state and gating mechanisms (input, forget, and output gates) that control the flow of information, allowing them to retain information over longer sequences and mitigate the vanishing gradient problem.
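
For reference, one common formulation of the LSTM update is written out below. The notation is an assumption made for this sketch: $x_t$ is the input, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the logistic sigmoid, $\odot$ element-wise multiplication, and $W$, $U$, $b$ the learned weights and biases of each gate.

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

The cell state update is additive: when $f_t$ is close to 1, $c_{t-1}$ is carried forward almost unchanged, which gives gradients a path that is not squashed at every step.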

**5. Implementing RNNs with PyTorch:**
   - PyTorch provides a user-friendly interface for building and training RNNs and other neural network architectures.
   - By leveraging PyTorch's `nn.Module` class and its built-in RNN modules (`nn.RNN`, `nn.LSTM`, `nn.GRU`), users can easily define and customize their RNN architectures.
   - PyTorch also offers efficient handling of sequence data, automatic differentiation for gradient computation, and seamless integration with GPUs for accelerated training.
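
The three built-in recurrent modules share nearly the same calling convention, which makes them easy to swap. The sketch below feeds a random tensor with the input shape printed later in this notebook, `(3, 14, 17)`, through each one; the `shapes` dictionary is a hypothetical helper used only to collect the results.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(3, 14, 17)  # (batch, seq_len, features)

shapes = {}
for layer_cls in (nn.RNN, nn.LSTM, nn.GRU):
    layer = layer_cls(input_size=17, hidden_size=10, num_layers=1, batch_first=True)
    out, state = layer(x)
    # nn.LSTM returns a (hidden, cell) tuple; nn.RNN and nn.GRU return only the hidden state
    hidden = state[0] if isinstance(state, tuple) else state
    shapes[layer_cls.__name__] = (tuple(out.shape), tuple(hidden.shape))
    print(layer_cls.__name__, shapes[layer_cls.__name__])
```

The only difference a caller has to handle is the extra cell state that `nn.LSTM` returns and expects as part of its initial state.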

In [30]:
import torch
from torch import nn
import numpy as np

# Define the input text
text = ['hey how are you', 'good i am fine', 'have a nice day']

# Join all the sentences together and extract the unique characters from the combined sentences
# (sorted so the character-to-integer mapping is the same on every run)
chars = sorted(set(''.join(text)))

# Creating a dictionary that maps characters to integers
char2int = {char: ind for ind, char in enumerate(chars)}

# Creating another dictionary that maps integers to characters
int2char = {ind: char for char, ind in char2int.items()}

# Find the length of the longest string in the text
maxlen = len(max(text, key=len))

# Padding the sequences
input_seq = []
target_seq = []

for i in range(len(text)):
    input_seq.append(text[i][:-1].ljust(maxlen - 1))  # Pad the input sequence
    target_seq.append(text[i][1:].ljust(maxlen - 1))  # Pad the target sequence

# Convert characters to integers using the char2int dictionary
input_seq = [[char2int[char] for char in seq] for seq in input_seq]
target_seq = [[char2int[char] for char in seq] for seq in target_seq]

# Convert to numpy arrays
input_seq = np.array(input_seq)
target_seq = np.array(target_seq)

# One-hot encode the input sequences
def one_hot_encode(sequence, dict_size, seq_len):
    features = np.zeros((sequence.shape[0], seq_len, dict_size), dtype=np.float32)
    for i in range(len(sequence)):
        for u in range(seq_len):
            features[i, u, sequence[i][u]] = 1
    return features

dict_size = len(char2int)
seq_len = maxlen - 1  # Since we removed one character for input and target sequences
batch_size = len(text)

input_seq = one_hot_encode(input_seq, dict_size, seq_len)
print("Input shape: {} --> (Batch Size, Sequence Length, One-Hot Encoding Size)".format(input_seq.shape))

input_seq = torch.from_numpy(input_seq)
# Targets are class indices, so store them as integers (long) rather than floats
target_seq = torch.from_numpy(target_seq).long()

# Check if GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

class Model(nn.Module):
    def __init__(self, input_size, output_size, hidden_dim, n_layers):
        super(Model, self).__init__()
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers
        self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_size)

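    def forward(self, x, hidden):
        # x: (batch, seq_len, input_size); hidden: (n_layers, batch, hidden_dim)
        out, hidden = self.rnn(x, hidden)
        # Flatten to (batch * seq_len, hidden_dim) so the linear layer scores every time step
        out = out.contiguous().view(-1, self.hidden_dim)
        out = self.fc(out)
        return out, hidden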

    def init_hidden(self, batch_size):
        return torch.zeros(self.n_layers, batch_size, self.hidden_dim).to(device)

# Instantiate the model
model = Model(input_size=dict_size, output_size=dict_size, hidden_dim=10, n_layers=1)
model = model.to(device)

# Define hyperparameters
n_epochs = 100
lr = 0.01

# Define Loss, Optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)

# Training Run
input_seq = input_seq.to(device)
target_seq = target_seq.to(device)

for epoch in range(1, n_epochs + 1):
    optimizer.zero_grad()
    hidden = model.init_hidden(batch_size)
    output, hidden = model(input_seq, hidden)
    loss = criterion(output, target_seq.view(-1).long())
    loss.backward()
    optimizer.step()

    if epoch % 10 == 0:
        print('Epoch: {}/{}.............'.format(epoch, n_epochs), end=' ')
        print("Loss: {:.4f}".format(loss.item()))

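def predict(model, character, hidden):
    # One-hot encode the input character(s): shape (1, len(character), dict_size)
    character = np.array([[char2int[c] for c in character]])
    character = one_hot_encode(character, dict_size, character.shape[1])
    character = torch.from_numpy(character).float().to(device)
    with torch.no_grad():  # inference only, no gradients needed
        out, hidden = model(character, hidden)
    # Greedy decoding: pick the most likely character at the last time step
    prob = nn.functional.softmax(out[-1], dim=0)
    char_ind = torch.argmax(prob).item()
    return int2char[char_ind], hidden

def sample(model, out_len, start='hey'):
    model.eval()
    start = start.lower()
    chars = [ch for ch in start]
    hidden = model.init_hidden(1)
    # Prime the hidden state with every seed character except the last,
    # so the first prediction is conditioned on the whole seed
    if len(chars) > 1:
        _, hidden = predict(model, chars[:-1], hidden)
    for _ in range(out_len - len(start)):
        char, hidden = predict(model, chars[-1], hidden)
        chars.append(char)
    return ''.join(chars)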

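# Set the desired total length of the generated sample (seed characters included).
# Note: with out_len equal to the seed length, as here, no new characters are generated;
# increase out_len to see the model continue the seed.
out_len = 4

generated_sample = sample(model, out_len, 'good')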
print("Generated Sample:", generated_sample)

Input shape: (3, 14, 17) --> (Batch Size, Sequence Length, One-Hot Encoding Size)
Using device: cpu
Epoch: 10/100............. Loss: 2.4282
Epoch: 20/100............. Loss: 2.2164
Epoch: 30/100............. Loss: 1.9051
Epoch: 40/100............. Loss: 1.5340
Epoch: 50/100............. Loss: 1.1665
Epoch: 60/100............. Loss: 0.8531
Epoch: 70/100............. Loss: 0.6207
Epoch: 80/100............. Loss: 0.4625
Epoch: 90/100............. Loss: 0.3519
Epoch: 100/100............. Loss: 0.2709
Generated Sample: good


**Data Preparation and Preprocessing:**

Before model creation and training, the data must be prepared. We begin with a small collection of short sentences (`'hey how are you'`, `'good i am fine'`, `'have a nice day'`) and train the model to predict the next character in each. The sentences are joined, and from the combined text we derive the set of unique characters. By mapping each character to an integer and vice versa, we build dictionaries for encoding and decoding. Finally, we find the length of the longest sentence so that all sequences can be padded to a uniform length.
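
The mapping and padding steps can be tried in isolation with plain Python. This is a minimal sketch on a hypothetical two-sentence corpus; the names `sentences`, `char_to_int`, `int_to_char`, `encode`, and `decode` are illustrative and differ from the variables used in the notebook code.

```python
# Hypothetical toy corpus, used only for this illustration
sentences = ["hi there", "hello"]

# Sorted so the character-to-integer mapping is the same on every run
vocab = sorted(set("".join(sentences)))
char_to_int = {ch: i for i, ch in enumerate(vocab)}
int_to_char = {i: ch for ch, i in char_to_int.items()}

def encode(s):
    """Turn a string into a list of integer indices."""
    return [char_to_int[ch] for ch in s]

def decode(ids):
    """Turn a list of integer indices back into a string."""
    return "".join(int_to_char[i] for i in ids)

# Pad every sentence with spaces up to the longest length
max_len = max(len(s) for s in sentences)
padded = [s.ljust(max_len) for s in sentences]

print(padded)                   # ['hi there', 'hello   ']
print(encode("hi"))             # [2, 3]
print(decode(encode("hello")))  # hello
```

Padding with spaces only works because the space character is already part of the vocabulary; a corpus without spaces would need a dedicated padding symbol.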

1. **One-Hot Encoding and Sequence Padding**:
   - To enable effective processing by our neural network, we encode each character using one-hot encoding. Each character becomes a binary vector whose length equals the vocabulary size, with a single 1 at the index of that character and 0 everywhere else.
   - Additionally, we pad the input and target sequences with spaces so that they all have the same length. The input drops the last character of each sentence and the target drops the first, so the target at each position is the character that follows the corresponding input character.

2. **Model Architecture Specification**:
   - Our model architecture is based on a recurrent neural network (RNN), a type of neural network particularly adept at capturing sequential dependencies in data. Employing PyTorch's `nn.Module`, we define an RNN model equipped with both an RNN layer and a linear layer.
   - The RNN layer serves as the core component responsible for processing sequential data, while the linear layer maps each hidden state to one score per character in the vocabulary, matching the size of the one-hot encoded vectors.

3. **Training and Optimization**:
   - With the model architecture defined, we proceed to configure the training process. Key hyperparameters such as the number of epochs and the learning rate are specified to govern the training dynamics.
   - To optimize the model's performance, we employ the cross-entropy loss function and the Adam optimizer. These components work in tandem to minimize the discrepancy between predicted and actual outputs, thereby refining the model's predictive capabilities.
   - Throughout the training process, we iteratively update the model's parameters based on computed gradients, gradually improving its ability to generate coherent text.

4. **Text Generation and Inference**:
   - Once the model is trained, we use it to generate new text. Starting from a seed string, the model predicts the most likely next character, which is appended to the text and fed back in to predict the character after it.
   - Repeating this step until the requested length is reached produces text that resembles the sentences the model was trained on.
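
As a companion to the training step described above, the sketch below checks how the shapes line up for the loss. It assumes the sizes printed in the run above (3 sequences, 14 positions, 17 characters) and uses random tensors as stand-ins for the model output and the targets, so the loss value itself is meaningless.

```python
import torch
from torch import nn

torch.manual_seed(0)
batch, seq_len, vocab = 3, 14, 17  # sizes assumed from the run above

# Stand-in for the flattened model output: one row of scores per character position
logits = torch.randn(batch * seq_len, vocab)
# Integer class index for every position
targets = torch.randint(0, vocab, (batch, seq_len))

# CrossEntropyLoss expects scores of shape (N, C) and integer targets of shape (N,)
criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets.view(-1))

# Same quantity computed by hand: mean negative log-probability of the true class
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(batch * seq_len), targets.view(-1)].mean()

print(loss.item(), manual.item())
```

This is why the model flattens its output to one row per time step and why the targets are passed as integer indices rather than one-hot vectors.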