# **MNIST Classification Tutorial using LeNet Architecture**

---

## **1. Overview**

The **LeNet** architecture, best known through the LeNet-5 network published by Yann LeCun and colleagues in 1998, is one of the earliest convolutional neural networks (CNNs). It was designed for image classification tasks such as recognizing handwritten digits, the task the MNIST dataset is built around.

In this tutorial, we will build a CNN based on the **LeNet** architecture to classify handwritten digits from the MNIST dataset. We'll cover the following topics:

1. **Data Loading**: Loading and preprocessing the MNIST dataset.
2. **Model Definition**: Defining the LeNet architecture for image classification.
3. **Training**: Using an optimizer and a loss function to train the model.
4. **Evaluation**: Assessing the model's performance on the test set.

---

## **2. Data Loading**

We will use the `torchvision` library to download and preprocess the MNIST dataset, which contains 60,000 training images and 10,000 test images of handwritten digits (0-9). Each image is a 28x28 grayscale image.

### **Code:**

```python
import torch
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

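# Convert images to tensors with pixel values in [0, 1], then normalize each
# pixel to [-1, 1] via (x - mean) / std with mean = 0.5 and std = 0.5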
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# Download and load the training data
trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
trainloader = DataLoader(trainset, batch_size=64, shuffle=True)

# Download and load the test data
testset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)
testloader = DataLoader(testset, batch_size=64, shuffle=False)
```

---

## **3. Model Definition (LeNet)**

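The network below follows the LeNet-5 layer structure but is adapted to MNIST's 28x28 inputs (the original LeNet-5 took 32x32 inputs) and uses ReLU activations and max pooling in place of the original's squashing activations and subsampling layers. It consists of the following layers: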

1. **Convolutional Layer 1**: Input (1x28x28) → Output (6x24x24)
2. **Pooling Layer 1**: Input (6x24x24) → Output (6x12x12)
3. **Convolutional Layer 2**: Input (6x12x12) → Output (16x8x8)
4. **Pooling Layer 2**: Input (16x8x8) → Output (16x4x4)
5. **Fully Connected Layer 1**: Flattened input (16x4x4 = 256) → 120 neurons
6. **Fully Connected Layer 2**: 120 neurons → 84 neurons
7. **Fully Connected Layer 3**: 84 neurons → 10 output neurons (one for each digit)
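The sizes in this list follow from the standard output-size formula for convolution and pooling, `out = floor((n + 2*padding - kernel_size) / stride) + 1`. A minimal pure-Python sketch (the helper name `conv_output_size` is our own, not a PyTorch function) traces a 28x28 image through the network:

```python
def conv_output_size(n, kernel_size, stride=1, padding=0):
    """Output height/width of a conv or pooling layer on an n x n input (dilation 1)."""
    return (n + 2 * padding - kernel_size) // stride + 1

size = 28                                               # Input: 1x28x28
size = conv_output_size(size, kernel_size=5)            # Conv1: 6x24x24
size = conv_output_size(size, kernel_size=2, stride=2)  # Pool1: 6x12x12
size = conv_output_size(size, kernel_size=5)            # Conv2: 16x8x8
size = conv_output_size(size, kernel_size=2, stride=2)  # Pool2: 16x4x4
print(size, 16 * size * size)  # 4 256
```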

### **Code:**

```python
import torch.nn as nn
import torch.nn.functional as F

class LeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5)  # Convolutional Layer 1
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)  # Pooling Layer
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)  # Convolutional Layer 2
        self.fc1 = nn.Linear(16 * 4 * 4, 120)  # Fully Connected Layer 1
        self.fc2 = nn.Linear(120, 84)  # Fully Connected Layer 2
        self.fc3 = nn.Linear(84, 10)  # Fully Connected Layer 3
    
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # Conv1 -> ReLU -> Pool
        x = self.pool(F.relu(self.conv2(x)))  # Conv2 -> ReLU -> Pool
        x = x.view(x.size(0), -1)  # Flatten to (batch_size, 16 * 4 * 4 = 256)
        x = F.relu(self.fc1(x))  # Fully connected layer 1
        x = F.relu(self.fc2(x))  # Fully connected layer 2
        x = self.fc3(x)  # Fully connected layer 3
        return F.log_softmax(x, dim=1)  # Log-probabilities, used with NLLLoss in training
```
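A quick way to confirm the shapes listed above is to push a random batch through the layers and print each intermediate shape. The sketch below rebuilds the same layer stack with `nn.Sequential` so that it runs on its own; it is an equivalent formulation for checking purposes, not a replacement for the `LeNet` class. It also counts the trainable parameters.

```python
import torch
import torch.nn as nn

# Same layers as the LeNet class, written as nn.Sequential so this snippet
# is self-contained (an equivalent sketch for shape checking)
lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2, 2),   # -> 6x12x12
    nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2, 2),  # -> 16x4x4
    nn.Flatten(),                                                    # -> 256
    nn.Linear(16 * 4 * 4, 120), nn.ReLU(),
    nn.Linear(120, 84), nn.ReLU(),
    nn.Linear(84, 10), nn.LogSoftmax(dim=1),
)

x = torch.randn(8, 1, 28, 28)  # a fake batch of 8 MNIST-sized images
for layer in lenet:
    x = layer(x)
    print(f"{layer.__class__.__name__:<12} {tuple(x.shape)}")

num_params = sum(p.numel() for p in lenet.parameters())
print(num_params)  # 44426
```

Most of these parameters sit in `fc1` (256 x 120 weights plus 120 biases = 30,840), so the first fully connected layer, not the convolutions, dominates the model's size.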

---

## **4. Training the Model**

### **Loss Function:**

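Because the model's `forward` method returns log-probabilities (via `F.log_softmax`), we use the negative log-likelihood loss (`torch.nn.NLLLoss`). This combination is equivalent to applying `torch.nn.CrossEntropyLoss` to the raw, un-normalized outputs (logits).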
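`NLLLoss` applied to log-softmax outputs computes the same value as `CrossEntropyLoss` applied to raw logits, since `CrossEntropyLoss` performs the log-softmax internally. The sketch below checks this on random scores (the seed, batch size, and labels are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)          # raw scores for 4 images over 10 digit classes
labels = torch.tensor([3, 7, 0, 9])  # made-up true digit for each image

# NLLLoss expects log-probabilities; CrossEntropyLoss expects raw logits
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), labels)
ce = nn.CrossEntropyLoss()(logits, labels)
print(nll.item(), ce.item())  # the two values match
```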

### **Optimizer:**

We use the Adam optimizer with a learning rate of 0.001.

### **Training Loop:**

We train the model for 5 epochs, updating the weights after each mini-batch and printing the average training loss at the end of each epoch.

### **Code:**

```python
import torch.optim as optim

# Instantiate the model, define the loss function and the optimizer
model = LeNet()
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
epochs = 5
model.train()  # Put the model in training mode (matters for layers like dropout or batch norm)

for epoch in range(epochs):
    running_loss = 0.0
    for images, labels in trainloader:
        # Zero the parameter gradients
        optimizer.zero_grad()
        
        # Forward pass
        output = model(images)
        loss = criterion(output, labels)
        
        # Backward pass and optimization
        loss.backward()
        optimizer.step()
        
        # Accumulate the batch loss to report the epoch average
        running_loss += loss.item()
    
    print(f"Epoch {epoch+1}, Loss: {running_loss/len(trainloader):.4f}")
```

---

## **5. Evaluation**

After training the model, we evaluate its performance on the test dataset to see how well it generalizes.

### **Code:**

```python
correct = 0
total = 0

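model.eval()  # Switch to evaluation mode (a good habit, even without dropout/batch norm)
with torch.no_grad():  # No gradients are needed for evaluation
    for images, labels in testloader:
        outputs = model(images)
        predicted = outputs.argmax(dim=1)  # Index of the largest log-probability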
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f"Accuracy: {100 * correct / total:.2f}%")
```
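The accuracy computation reduces to taking the `argmax` of each row of log-probabilities and comparing it with the label. The sketch below walks through that logic on three made-up predictions over four classes (the numbers are invented for illustration, not model outputs):

```python
import torch

# Made-up scores for 3 images over 4 classes, turned into log-probabilities
outputs = torch.log_softmax(torch.tensor([[2.0, 0.1, 0.3, 0.0],
                                          [0.2, 0.1, 3.0, 0.5],
                                          [0.0, 1.5, 0.2, 2.5]]), dim=1)
labels = torch.tensor([0, 2, 1])

predicted = outputs.argmax(dim=1)             # predicted classes: [0, 2, 3]
correct = (predicted == labels).sum().item()  # 2 of 3 predictions match
accuracy = 100 * correct / labels.size(0)
print(f"Accuracy: {accuracy:.2f}%")           # Accuracy: 66.67%
```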

---

## **Conclusion**

In this tutorial, we built a convolutional neural network based on the **LeNet** architecture to classify handwritten digits from the MNIST dataset. We trained the model using the Adam optimizer and evaluated its performance on the test set.

You can further improve the model by experimenting with different hyperparameters, adding data augmentation, or trying more advanced architectures.


### **1. Architecture**

#### **Architecture Summary**:
- **Input Layer**: Accepts grayscale images (1x28x28).
- **Convolutional Layer 1 (Conv1)**: Applies 6 convolutional filters (kernels) to the input image.
- **Pooling Layer 1**: Reduces the spatial dimensions by down-sampling the feature maps.
- **Convolutional Layer 2 (Conv2)**: Applies 16 convolutional filters to the feature maps produced by the first pooling layer.
- **Pooling Layer 2**: Further down-samples the feature maps.
- **Fully Connected Layers**: Three fully connected layers are used for classification.

#### **LeNet Architecture Breakdown**:
1. **Input**: 1x28x28 (grayscale image).
2. **Conv1**: 6 filters of size 5x5 → Output size: 6x24x24.
3. **Max Pooling**: 2x2 pooling → Output size: 6x12x12.
4. **Conv2**: 16 filters of size 5x5 → Output size: 16x8x8.
5. **Max Pooling**: 2x2 pooling → Output size: 16x4x4.
6. **Fully Connected Layer 1**: Input size 16x4x4 = 256 → 120 neurons.
7. **Fully Connected Layer 2**: 120 → 84 neurons.
8. **Output Layer**: 84 → 10 neurons (for 10 digit classes).

### **2. Kernel Size**
- **What is Kernel Size?**
  - The **kernel** (or filter) in a convolutional layer is a small array of learnable weights that slides over the input, computing a dot product (plus a bias) with each local patch it covers. Each of these results becomes one value in the output feature map.
  - The **kernel size** refers to the dimensions of this matrix.
  
- **In LeNet**:
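  - Both convolutional layers in LeNet use a **kernel size of 5x5**, meaning each filter covers a 5x5 spatial region of its input. A filter also spans all input channels, so each Conv1 filter is 1x5x5 and each Conv2 filter is 6x5x5.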

- **Effect of Kernel Size**:
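  - Larger kernels see a larger region of the input at once (a bigger receptive field per layer), but the number of weights grows with the square of the kernel size, increasing parameters and computation.
  - Smaller kernels are cheaper but capture more local patterns per layer; stacking several small kernels can cover the same region (two stacked 3x3 convolutions see a 5x5 area).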
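Kernel size affects both the parameter count and the output size. As a rough illustration, the sketch below counts parameters for a layer shaped like LeNet's Conv2 (6 input channels, 16 filters, 12x12 input) at three kernel sizes. The helper `conv_params` is hypothetical, and the sketch assumes square kernels, stride 1, no padding, and one bias per filter (PyTorch's default for `nn.Conv2d`).

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus one bias per filter for a square kernel."""
    return out_channels * in_channels * kernel_size ** 2 + out_channels

for k in (3, 5, 7):
    out = 12 - k + 1  # stride 1, no padding, 12x12 input (as for Conv2)
    print(f"k={k}: {conv_params(6, 16, k)} parameters, output {out}x{out}")
# k=3: 880 parameters, output 10x10
# k=5: 2416 parameters, output 8x8
# k=7: 4720 parameters, output 6x6
```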

### **3. Number of Kernels (Filters)**
- **What is the Number of Kernels?**
  - The number of kernels (or filters) determines how many different feature maps are learned at each convolutional layer. Each kernel extracts a different feature from the input.

- **In LeNet**:
  - **Conv1**: 6 kernels (filters) are applied, producing 6 different feature maps.
  - **Conv2**: 16 kernels are applied, producing 16 feature maps.

- **Effect of Number of Kernels**:
  - More kernels allow the network to learn more diverse features but increase the computational load and memory requirements.
  - Too few kernels may lead to underfitting, as the model may not capture enough complexity in the data.
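In PyTorch, the number of kernels is the `out_channels` argument of `nn.Conv2d`, and it appears as the first dimension of the layer's weight tensor. The sketch below builds standalone layers with the same arguments as LeNet's Conv1 and Conv2 (outside the `LeNet` class) and inspects them:

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(1, 6, kernel_size=5)   # 6 kernels over 1 input channel
conv2 = nn.Conv2d(6, 16, kernel_size=5)  # 16 kernels over 6 input channels

print(conv1.weight.shape)  # torch.Size([6, 1, 5, 5]): 6 kernels, each 1x5x5
print(conv2.weight.shape)  # torch.Size([16, 6, 5, 5]): 16 kernels, each 6x5x5

x = torch.randn(1, 1, 28, 28)  # one MNIST-sized image
print(conv1(x).shape)  # torch.Size([1, 6, 24, 24]): one feature map per kernel
```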

### **4. Padding**
- **What is Padding?**
  - **Padding** refers to adding extra pixels (usually zeros) around the border of the input before applying the convolution. This technique controls the spatial size of the output feature maps.
  - Padding can prevent the reduction in the spatial dimensions of the input as it passes through the convolutional layers.

- **In LeNet**:
  - Neither convolution uses padding, as in the original LeNet-5 (`padding=0` is also PyTorch's default for `nn.Conv2d`), so the feature maps shrink after each convolution:
    - **Conv1 Input**: 28x28 → No padding → **Conv1 Output**: 24x24.
    - **Conv2 Input**: 12x12 → No padding → **Conv2 Output**: 8x8.

- **Effect of Padding**:
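  - **Same padding** pads the input so that the output has the same spatial size as the input. For stride 1 and an odd kernel size k, this means padding (k - 1) / 2 pixels on each side, e.g. 2 pixels for a 5x5 kernel.
  - **Valid padding** (no padding) reduces each spatial dimension by k - 1 per convolution, as seen in LeNet (28 → 24 and 12 → 8).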
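With a 5x5 kernel and stride 1, padding each side by (5 - 1) / 2 = 2 pixels keeps a 28x28 input at 28x28. The sketch below compares LeNet's unpadded Conv1 with two padded variants (the padded layers are illustrations, not part of LeNet); `padding="same"` asks PyTorch to compute the padding itself and is only supported for stride 1.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)  # one MNIST-sized image

valid = nn.Conv2d(1, 6, kernel_size=5)                     # padding=0 (default), as in LeNet
same = nn.Conv2d(1, 6, kernel_size=5, padding=2)           # (5 - 1) / 2 = 2 pixels per side
same_str = nn.Conv2d(1, 6, kernel_size=5, padding="same")  # let PyTorch choose the padding

print(valid(x).shape)     # torch.Size([1, 6, 24, 24])
print(same(x).shape)      # torch.Size([1, 6, 28, 28])
print(same_str(x).shape)  # torch.Size([1, 6, 28, 28])
```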

### **5. Stride**
- **What is Stride?**
  - **Stride** defines how many pixels the filter moves at each step when sliding over the input. With a stride of 1, the filter is applied at every position; with a stride of 2, it is applied at every other position, roughly halving the output's height and width.

- **In LeNet**:
  - Both convolutional layers in LeNet use a stride of **1**, meaning the filter moves one pixel at a time.
  - The pooling layers use a 2x2 window with a **stride of 2**, so the windows do not overlap and each pooling step halves the height and width (24 → 12 and 8 → 4).

- **Effect of Stride**:
  - A larger stride reduces the size of the output feature maps faster, resulting in a lower spatial resolution.
  - Stride controls the trade-off between computational efficiency and the level of detail in the feature maps.
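The sketch below shows the effect of stride on output size using the same 28x28 input (the stride-2 convolution is an illustrative variant, not a LeNet layer). It also shows that `nn.MaxPool2d` uses `stride = kernel_size` when no stride is given, so `MaxPool2d(kernel_size=2)` behaves like the `MaxPool2d(kernel_size=2, stride=2)` used in the model.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)  # one MNIST-sized image

out_s1 = nn.Conv2d(1, 6, kernel_size=5, stride=1)(x)  # (28 - 5) // 1 + 1 = 24
out_s2 = nn.Conv2d(1, 6, kernel_size=5, stride=2)(x)  # (28 - 5) // 2 + 1 = 12
pooled = nn.MaxPool2d(kernel_size=2)(out_s1)          # stride defaults to 2 here

print(out_s1.shape)  # torch.Size([1, 6, 24, 24])
print(out_s2.shape)  # torch.Size([1, 6, 12, 12])
print(pooled.shape)  # torch.Size([1, 6, 12, 12])
```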

---

### **Summary**
In Week 04, we will cover how LeNet works, focusing on these architectural components:
- **Kernel Size**: Determines the receptive field of each filter and what features are captured.
- **Number of Kernels**: Controls the number of different features learned in each convolutional layer.
- **Padding**: Affects the spatial dimensions of the output and whether the borders of the input are preserved.
- **Stride**: Controls how much the filter shifts across the input and the size of the output feature maps.
