<a href="https://colab.research.google.com/github/talhaakbarmohal/Mastering-LLMs/blob/main/Notebooks/Seq_Seq_Part_1_RNN_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Sequence-to-Sequence Models: Starting with RNNs

This section begins our exploration of sequence-to-sequence (seq2seq) models. We start with Recurrent Neural Networks (RNNs), the foundational building blocks of early seq2seq architectures.

### Types of RNNs

RNNs are commonly categorized into four types based on their input and output structures:

1. **One-to-One:** A single input produces a single output. With no sequence involved, this behaves like a standard feedforward network and is often illustrated with tasks like image classification.

2. **One-to-Many:** A single input produces a sequence of outputs. This is commonly used in tasks such as music generation or image captioning.

3. **Many-to-One:** A sequence of inputs produces a single output. Sentiment analysis of a sentence is a typical application.

4. **Many-to-Many:** A sequence of inputs produces a sequence of outputs. Machine translation and frame-by-frame video labeling are typical applications.
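
These configurations differ mainly in which time steps receive an input and which outputs are kept. The sketch below illustrates that idea, assuming PyTorch's built-in `nn.RNN` layer and arbitrary illustrative sizes (neither is part of the original notebook at this point):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
seq_len, input_size, hidden_size = 4, 3, 5

rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size, batch_first=True)

# One batch containing one sequence of 4 time steps.
x = torch.randn(1, seq_len, input_size)

# outputs holds the hidden state at every time step;
# h_n holds only the final hidden state.
outputs, h_n = rnn(x)

per_step = outputs             # one vector per input step (sequence output)
last_step = outputs[:, -1, :]  # only the final step (single output)

print(per_step.shape)   # torch.Size([1, 4, 5])
print(last_step.shape)  # torch.Size([1, 5])
```

Keeping every step gives a sequence output, while keeping only the last step gives a single summary of the whole input sequence.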

### How RNNs Work

Recurrent Neural Networks (RNNs) are designed to process sequential data, where the order of inputs matters. They achieve this by incorporating a feedback loop, allowing information from previous inputs to influence the processing of current inputs. Here's a breakdown:

1. **Sequential Input:** RNNs take a sequence of data as input, processing one element at a time. This could be a series of words in a sentence, frames in a video, or any other sequential data.

2. **Recurrent Connections:** The key to RNNs is their recurrent connections. These connections create a feedback loop within the network, allowing it to maintain an internal memory or state. This memory captures information from previous inputs in the sequence.

3. **Hidden State:** The internal memory of an RNN is represented by its hidden state. As the network processes each input in the sequence, it updates its hidden state based on the current input and the previous hidden state. This hidden state acts as a summary of the information seen so far in the sequence.

4. **Output:** Based on the current input and its hidden state, the RNN produces an output. This output could be a prediction, a classification, or another form of information relevant to the task.

5. **Maintaining Context:** By updating its hidden state with each input, the RNN effectively remembers the relationships between different elements in the sequence. This allows it to capture context and make informed decisions based on the entire input history.

**In essence, RNNs work by iteratively processing sequential data, using recurrent connections to maintain a memory of past inputs and leverage this memory to understand the context and relationships within the sequence.**
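
The breakdown above can be sketched as a short loop. This is a minimal illustration, assuming a `tanh` activation, randomly initialized (untrained) weights, and hypothetical sizes; `rnn_step` is an illustrative helper name, not a library function.

```python
import torch

torch.manual_seed(0)

input_size, hidden_size = 3, 4  # hypothetical sizes

# Randomly initialized parameters (illustrative, not trained).
W_xh = torch.randn(hidden_size, input_size) * 0.1
W_hh = torch.randn(hidden_size, hidden_size) * 0.1
b_h = torch.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One recurrent update: mix the current input with the previous state."""
    return torch.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

sequence = torch.randn(5, input_size)  # 5 time steps
h = torch.zeros(hidden_size)           # initial hidden state
states = []
for x_t in sequence:
    h = rnn_step(x_t, h)  # the new state depends on the input and the old state
    states.append(h)

states = torch.stack(states)
print(states.shape)  # torch.Size([5, 4])
```

The same weights are reused at every time step; only the hidden state `h` changes as the sequence is consumed.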

In [1]:
!pip install torch==2.2.2
!pip install torchtext==0.17.2
!pip install numpy==1.26.0



In [1]:
import torch
import torch.nn as nn
import torch.optim as optim

# You can also use this section to suppress warnings generated by your code:
def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn
warnings.filterwarnings('ignore')

**Simple RNN Flow for Prediction**

1. **Input:** The RNN receives a sequence of data points as input, for example `x = [1, 5, 7, 4]`.
2. **Hidden State Initialization:** The RNN initializes its hidden state, which will store information about previous inputs. It is commonly initialized to zeros; the code in this notebook starts from `h_0 = -1`.
3. **Sequential Processing:** The RNN processes the input sequence one element at a time, applying `h_t = f(W_hh * h_(t-1) + W_xh * x_t + b_h)`, where `f` is an activation function such as `tanh`.
    - For each input element, it combines the input with the current hidden state.
    - This combination is used to update the hidden state, capturing information about the current input and previous context.
    - The updated hidden state is then used to generate an output prediction for the current input.
4. **Output:** The RNN produces a sequence of output predictions, one for each input element.
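
The update rule in step 3 can be written out and traced by hand. The worked values below assume `tanh` for $f$, a sigmoid on the output, and the scalar parameters used in this notebook's code cells ($W_{hh}=10$, $W_{xh}=-10$, $b_h=0$, $W_{hy}=4$, $b_y=5$, $h_0=-1$). Treat it as a sketch for checking the code, not a general result.

$$
h_t = \tanh(W_{hh}\, h_{t-1} + W_{xh}\, x_t + b_h), \qquad y_t = \sigma(W_{hy}\, h_t + b_y)
$$

$$
\begin{aligned}
h_1 &= \tanh(10 \cdot (-1) + (-10) \cdot 1 + 0) = \tanh(-20) \approx -1 \\
h_2 &= \tanh(10 \cdot (-1) + (-10) \cdot 5 + 0) = \tanh(-60) \approx -1 \\
h_3 &= \tanh(10 \cdot (-1) + (-10) \cdot 7 + 0) = \tanh(-80) \approx -1 \\
h_4 &= \tanh(10 \cdot (-1) + (-10) \cdot 4 + 0) = \tanh(-50) \approx -1 \\
y_4 &= \sigma(4 \cdot (-1) + 5) = \sigma(1) \approx 0.7311
\end{aligned}
$$

With weights this large, `tanh` saturates at $-1$ on every step, so the final hidden state carries almost no information about which inputs were seen.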



In [12]:
# This code demonstrates how prediction works in a simple RNN
X = [1, 5, 7, 4]  # Input values, processed sequentially
W_xh = torch.tensor(-10.0, requires_grad=True)  # Weight applied to the current input (input-to-hidden)
W_hh = torch.tensor(10.0, requires_grad=True)  # Weight applied to the previous hidden state (hidden-to-hidden)
b_h = torch.tensor(0.0, requires_grad=True)  # Bias term for the hidden state calculation
x_t = 1  # Current input value (placeholder)
h_prev = torch.tensor(-1.0, requires_grad=True)  # Initial hidden state
W_hy = torch.tensor(4.0, requires_grad=True)  # Weight applied to the hidden state for output (hidden-to-output)
b_y = torch.tensor(5.0, requires_grad=True)  # Bias term for the output calculation
y_hat_t = torch.tensor(15.0)  # Target output (expected value); a fixed target needs no gradient

In [13]:
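for x in X:
    x_t = x  # Current input value
    # Update the hidden state from the previous state and the current input
    h_t = torch.tanh(W_hh * h_prev + W_xh * x_t + b_h)
    h_prev = h_t  # Carry the state forward to the next step
print(h_t)  # Final hidden state

# Output computed from the final hidden state
y_t = torch.sigmoid(W_hy * h_t + b_y)
print(y_t)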

tensor(-1., grad_fn=<TanhBackward0>)
tensor(0.7311, grad_fn=<SigmoidBackward0>)
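
The cell above computes an output only after the last input. To produce one prediction per input element, as described in step 4, the output can be computed inside the loop. This is a minimal sketch that reuses the notebook's scalar values; the `outputs` list is an illustrative addition.

```python
import torch

X = [1.0, 5.0, 7.0, 4.0]

# Same scalar parameters as in the notebook (gradients not needed here).
W_xh, W_hh, b_h = torch.tensor(-10.0), torch.tensor(10.0), torch.tensor(0.0)
W_hy, b_y = torch.tensor(4.0), torch.tensor(5.0)

h = torch.tensor(-1.0)  # initial hidden state
outputs = []
for x_t in X:
    h = torch.tanh(W_hh * h + W_xh * x_t + b_h)     # update the hidden state
    outputs.append(torch.sigmoid(W_hy * h + b_y))   # prediction for this step

outputs = torch.stack(outputs)
print(outputs)  # four values, each approximately 0.7311
```

Every step yields the same value here because the hidden state is saturated at -1 throughout the sequence.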


**For Backpropagation**

5. **Loss Calculation:** The predicted outputs are compared to the actual target values using a loss function (e.g., Mean Squared Error).
6. **Backpropagation:** The error is backpropagated through the network to update the RNN's weights, improving its prediction accuracy.
7. **Prediction:** Once trained, the RNN can be used to predict values for new input sequences.
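
For a single scalar prediction, the loss and one of its gradients can be worked out by hand. The sketch below assumes a squared-error loss, a sigmoid output, and the values from this notebook (prediction $y \approx 0.7311$, target $15$, final hidden state $h \approx -1$). The symbol $y^{*}$ for the target is notation introduced here; the code calls it `y_hat_t`.

$$
L = (y - y^{*})^2, \qquad \frac{\partial L}{\partial W_{hy}} = 2\,(y - y^{*}) \cdot y\,(1 - y) \cdot h
$$

$$
L \approx (0.7311 - 15)^2 \approx 203.60, \qquad \frac{\partial L}{\partial W_{hy}} \approx 2 \cdot (-14.2689) \cdot 0.1966 \cdot (-1) \approx 5.61
$$

The factor $y(1-y)$ is the derivative of the sigmoid, and $h$ enters because $W_{hy}$ multiplies the hidden state before the sigmoid is applied.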

In [14]:
# Define the loss function
loss_fn = nn.MSELoss()

# Compare the prediction y_t with the target y_hat_t
loss = loss_fn(y_t, y_hat_t)
print(loss)

# Adam updates the trainable weights and biases
optimizer = optim.Adam([W_xh, W_hh, b_h, W_hy, b_y], lr=0.1)

# A single training step. For more than one epoch, the forward pass and
# the loss must be recomputed inside the loop before calling backward().
for epoch in range(1):
    optimizer.zero_grad()  # Reset gradients
    loss.backward()        # Calculate gradients
    optimizer.step()       # Update weights

tensor(203.6027, grad_fn=<MseLossBackward0>)
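
To train for several epochs, the forward pass has to be repeated in each iteration so that every `backward()` call has a fresh computation graph. The following is a sketch under stated assumptions: `forward` and `target` are hypothetical names, 20 epochs is an arbitrary choice, and the parameter values are copied from the notebook.

```python
import torch
import torch.nn as nn
import torch.optim as optim

X = [1.0, 5.0, 7.0, 4.0]

# Same starting values as the notebook's scalar parameters.
W_xh = torch.tensor(-10.0, requires_grad=True)
W_hh = torch.tensor(10.0, requires_grad=True)
b_h = torch.tensor(0.0, requires_grad=True)
W_hy = torch.tensor(4.0, requires_grad=True)
b_y = torch.tensor(5.0, requires_grad=True)
# A sigmoid output stays below 1, so with this target the loss cannot reach 0.
target = torch.tensor(15.0)

def forward(xs):
    """Hypothetical helper: run the sequence and return the final prediction."""
    h = torch.tensor(-1.0)  # initial hidden state
    for x_t in xs:
        h = torch.tanh(W_hh * h + W_xh * x_t + b_h)
    return torch.sigmoid(W_hy * h + b_y)

loss_fn = nn.MSELoss()
optimizer = optim.Adam([W_xh, W_hh, b_h, W_hy, b_y], lr=0.1)

losses = []
for epoch in range(20):
    optimizer.zero_grad()               # reset gradients from the previous step
    loss = loss_fn(forward(X), target)  # fresh forward pass builds a fresh graph
    loss.backward()                     # compute gradients
    optimizer.step()                    # update parameters
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
print("gradient w.r.t. W_xh:", W_xh.grad.item())
```

With these values the loss should decrease only slightly, and the gradient for `W_xh` should be essentially zero: `tanh` is saturated, so the recurrent weights receive almost no learning signal and only `W_hy` and `b_y` move. Smaller initial weights and a target inside the sigmoid's range would make the example train more meaningfully.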


## Simple RNN Explanation and Future Directions

This notebook provided a basic introduction to how Recurrent Neural Networks (RNNs) work. We covered the following key concepts:

1. **RNN Structure:** We explored the basic components of an RNN, including the input, hidden state, and output.
2. **Hidden State Update:** We discussed how the hidden state is updated at each time step, incorporating information from the current input and the previous hidden state.
3. **Loss Calculation:** We used the Mean Squared Error (MSE) loss function to measure the difference between the predicted output and the actual target values.
4. **Backpropagation:** We used backpropagation to compute gradients and the Adam optimizer to update the RNN's weights, aiming to minimize the loss function and improve prediction accuracy.

**Future Directions:**

This was a simplified explanation of RNNs. We will delve deeper in later parts of this series.

**Project Exploration:**

To see RNNs in action, explore the projects related to RNNs in the "project" folder. These projects will demonstrate how RNNs can be used to solve real-world problems.