## __HOMEWORK 3__
#### Zankhana Mehta
#### 002320268
#### mehta.zan@northeastern.edu

#### __PROBLEM 2__

#### Part a

a) Convolution layer: A convolution layer is made up of a number of convolution filters, each of which is a template that detects whether a particular local feature is present in an image. A convolution filter relies on a simple operation, the convolution, which amounts to sliding the filter across the image, multiplying the overlapping elements, and adding the results. Convolutional layers are used primarily for feature extraction: they identify patterns such as edges, textures, and shapes in the input data, which are crucial for tasks like image recognition. The convolved image highlights regions of the original image that resemble the convolution filter.
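
To make the multiply-and-add operation concrete, here is a minimal pure-Python sketch of a single filter sliding over a small image with no padding and stride 1. The function name `conv2d_valid`, the toy image, and the 2x2 edge filter are illustrative assumptions, not part of the assignment. Like most deep learning libraries, the sketch does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply overlapping elements and add the results
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out


# Toy 4x4 image: dark left half (0), bright right half (1)
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical 2x2 filter that responds to a dark-to-bright vertical edge
kernel = [
    [-1, 1],
    [-1, 1],
]

response = conv2d_valid(image, kernel)
print(response)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The response is largest in the middle column, exactly where the image patch resembles the filter, and zero over the flat regions.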

b) Pooling layer: Pooling layers reduce the spatial dimensions of the feature maps generated by the convolution layers. Common pooling methods include max pooling (taking the maximum value in a specified window) and average pooling (taking the average value in the window). Pooling layers down-sample the feature maps, which reduces the computational load and provides a degree of translation invariance: the network becomes robust to small shifts and distortions in the input.
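
The two pooling methods can be sketched in a few lines of pure Python. This is an illustrative example only: the helper names `pool2d` and `mean` and the 4x4 feature map are assumptions, and the window is fixed to non-overlapping 2x2 blocks (stride equal to window size).

```python
def pool2d(feature_map, size=2, reduce=max):
    """Apply `reduce` to each non-overlapping size x size window."""
    out = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(reduce(window))
        out.append(row)
    return out


def mean(values):
    return sum(values) / len(values)


# Made-up 4x4 feature map
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]

max_pooled = pool2d(feature_map, size=2, reduce=max)
avg_pooled = pool2d(feature_map, size=2, reduce=mean)
print(max_pooled)  # [[4, 2], [2, 8]]
print(avg_pooled)  # [[2.5, 1.0], [1.25, 6.5]]
```

Both variants shrink the 4x4 map to 2x2. With max pooling, moving the largest value to a different position inside the same window leaves the output unchanged, which is the source of the robustness to small shifts.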

c) Fully connected layer: This layer connects every neuron from the previous layer to every neuron in the current layer, forming a dense layer. It typically follows the convolution and pooling layers. Fully-connected layers serve to combine the features learned by previous layers and make final classifications. They are commonly used in the final stages of CNNs to produce the output probabilities for different classes in tasks like image classification.
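
As a rough sketch of how a fully connected layer turns features into class probabilities, the example below computes a weighted sum per output neuron and applies a softmax. All weights, biases, and feature values are made up for illustration, and the helper names `dense` and `softmax` are assumptions.

```python
import math


def dense(inputs, weights, biases):
    """Each output neuron is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # Subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up flattened features from earlier layers (3 values)
features = [1.0, 2.0, 0.5]

# Made-up parameters for a layer with 2 output classes
weights = [
    [0.2, -0.1, 0.4],
    [0.0, 0.3, -0.2],
]
biases = [0.1, -0.1]

logits = dense(features, weights, biases)  # approximately [0.3, 0.4]
probs = softmax(logits)
print(probs)
```

Every output depends on every input, which is what "dense" means here; the class with the larger logit receives the larger probability.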

![image.png](attachment:b5a8f965-5a2a-4cce-9745-12194612a048.png)

Examples of feasible configurations of CNNs and their use cases:

1) AlexNet:

Architecture: Comprises five convolutional layers, max pooling layers, and three fully connected layers. It also uses ReLU activations and dropout for regularization.

Use Case: AlexNet was designed for classifying images into categories, and it can be used to identify objects in images, such as animals, vehicles, and everyday items. It won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012 and revolutionized the field by demonstrating the effectiveness of deep learning for large-scale image classification.

2) ResNet (Residual Network):

Architecture: Features many convolutional layers (often 50 or more, as in ResNet-50, ResNet-101, and ResNet-152), organized into residual blocks whose skip connections let gradients flow through the network effectively.

Use Case: Well suited to complex computer vision tasks such as image classification, object detection, and image segmentation. ResNet's skip connections mitigate the vanishing gradient problem, enabling the training of very deep networks while maintaining performance.
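
The idea behind a skip connection can be illustrated with a toy sketch on plain Python lists. This is not the actual ResNet block (which uses convolutions and batch normalization); the function names and the two stand-in transforms are assumptions chosen only to show the `F(x) + x` structure.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def residual_block(x, transform):
    """Compute relu(F(x) + x): the input skips past the transform and is added back."""
    fx = transform(x)
    return relu([f + xi for f, xi in zip(fx, x)])


def zero_transform(x):
    # Stand-in for a learned transform that has learned nothing yet
    return [0.0 for _ in x]


def small_transform(x):
    # Stand-in for a learned transform that adds a small correction
    return [0.5 * v for v in x]


x = [1.0, 2.0, 3.0]
identity_out = residual_block(x, zero_transform)
refined_out = residual_block(x, small_transform)
print(identity_out)  # [1.0, 2.0, 3.0]
print(refined_out)   # [1.5, 3.0, 4.5]
```

Even when the transform contributes nothing, the block passes its (non-negative) input through unchanged, so stacking many such blocks does not by itself degrade the signal.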

3) ImageNet:

Architecture: The term "ImageNet" refers to a large-scale dataset and the challenges associated with it rather than a specific CNN architecture. However, several CNN architectures have been developed and optimized to perform well on the ImageNet dataset, such as AlexNet, VGGNet, GoogLeNet, and ResNet. These architectures typically consist of multiple convolutional layers, pooling layers, and fully connected layers.

Use Case: ImageNet is primarily used for image classification tasks, where the goal is to categorize images into one of many classes (over 20,000 categories in the full dataset; the ILSVRC benchmark uses a 1,000-class subset). The dataset contains millions of labeled images, making it a benchmark for training and evaluating deep learning models. The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) has been a pivotal event that drove advances in CNN architectures, showing how deeper and more complex networks can significantly improve classification accuracy.

#### Part b
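
The `16 * 4 * 4` input size of `fc1` in the code for this part can be checked by tracing the spatial size of a 28x28 MNIST image through the two convolution and pooling stages. The sketch below uses the standard output-size formula for unpadded layers; the helper names `conv_out` and `pool_out` are illustrative assumptions.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride):
    """Spatial output size of a pooling layer along one dimension."""
    return (size - kernel) // stride + 1


size = 28                    # MNIST images are 28x28
size = conv_out(size, 5)     # conv1, kernel 5 -> 24
size = pool_out(size, 2, 2)  # 2x2 max pool   -> 12
size = conv_out(size, 5)     # conv2, kernel 5 -> 8
size = pool_out(size, 2, 2)  # 2x2 max pool   -> 4

flat_features = 16 * size * size  # 16 channels from conv2
print(size, flat_features)  # 4 256
```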

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms

# Download and prepare MNIST dataset (replace path if needed)
data_dir = './MNIST_data'  # Download and extract data here
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))  # MNIST's standard normalization
])
train_dataset = datasets.MNIST(root=data_dir, train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root=data_dir, train=False, download=True, transform=transform)

# Define data loaders with appropriate batch sizes
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)

# Create the CNN architecture
class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)  # Input channels=1 (grayscale), output channels=6, kernel size=5
        self.pool = nn.MaxPool2d(2, 2)  # Pooling size=2
        self.conv2 = nn.Conv2d(6, 16, 5)  # Input channels=6 (from conv1), output channels=16, kernel size=5
        self.fc1 = nn.Linear(16 * 4 * 4, 120)  # 16 channels x 4x4 map (28 -> 24 -> 12 -> 8 -> 4 after two conv+pool stages)
        self.fc2 = nn.Linear(120, 84)  # Additional hidden layer for better learning
        self.fc3 = nn.Linear(84, 10)  # Output layer with 10 classes (digits 0-9)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 4 * 4)  # Flatten for fully connected layers
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))  # Additional ReLU activation for better learning
        x = self.fc3(x)
        return x

# Initialize the model and optimizer
model = CNN()
criterion = nn.CrossEntropyLoss()  # Suitable for multi-class classification
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Adam optimizer with learning rate=0.001

# Training loop with loss tracking
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
num_epochs = 10

training_losses = []
test_losses = []

for epoch in range(num_epochs):
    model.train()  # Switch back to training mode (the evaluation step below sets eval mode)
    for i, (images, labels) in enumerate(train_loader):
        images, labels = images.to(device), labels.to(device)

        # Forward pass, calculate loss, and backpropagation
        outputs = model(images)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i+1) % 100 == 0:
            print(f'Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{len(train_loader)}], Loss: {loss.item():.4f}')

    # Evaluate on test set after each epoch
    with torch.no_grad():
        model.eval()  # Set model to evaluation mode
        test_loss = 0
        correct = 0
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            test_loss += criterion(outputs, labels).item()
            predicted = outputs.argmax(dim=1)  # Predicted class = index of the largest logit
            correct += (predicted == labels).