# Task description

The task is to use a pre-trained ResNet-18 model as the starting point and adapt it to classify images into one of the 10 classes in the CIFAR-10 dataset. This is a common practice known as transfer learning, where you leverage the knowledge a model has gained from a prior task (in this case, image classification on ImageNet) to improve performance on a new, related task.

By using the pre-trained ResNet-18 model, you benefit from the model's already learned filters and weights that can detect edges, textures, and patterns efficiently. Since these low-level features are similar across different image datasets, the pre-trained model only needs to be fine-tuned to learn the specifics of the new dataset, which in this case are the distinct features of the CIFAR-10 classes. **This fine-tuning typically requires less data and computation time than training a model from scratch.**

### CIFAR-10
In the CIFAR-10 dataset, the 10 classes are airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The dataset consists of 60,000 32x32 color images in total, with 6,000 images per class. The dataset is split into 50,000 training images and 10,000 test images.

More info at: https://www.cs.toronto.edu/~kriz/cifar.html

In [None]:
# Load libraries
import numpy as np
import matplotlib.pyplot as plt
import cv2

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models
from torchvision import datasets, transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

In [None]:
# Check if CUDA is available and set the device accordingly
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Load a pre-trained model

We load a pre-trained ResNet-18 model. ResNet-18 is a convolutional neural network with 18 layers. It's already trained on ImageNet, a large dataset with over a million images and 1000 classes.

In [None]:
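# Load a pre-trained model
# `pretrained=True` is deprecated since torchvision 0.13; the `weights` argument
# selects the same ImageNet weights explicitly.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.to(device)  # Move the model to the selected device (GPU if available, else CPU)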

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

ResNet-18 was chosen because it is one of the lightest members of the ResNet family, which makes it faster to train and better suited to a demo, especially with limited compute. ResNet models are commonly pre-trained on ImageNet, whose wide range of images across 1000 classes makes the learned features transferable to a variety of tasks.

More info at: https://pytorch.org/vision/main/models/resnet.html

# Modify the Model for a New Task
ImageNet has 1000 classes, but our task has fewer, so we replace the last fully connected layer with one whose output size matches our number of classes. For CIFAR-10, that is 10.

In [None]:
# Modify the last fully connected layer for the new task
num_ftrs = model.fc.in_features # fc=fully connected layer
model.fc = nn.Linear(num_ftrs, 10)  # CIFAR-10 has 10 classes
model.fc.to(device)  # Move the new fully connected layer to the CUDA device

Linear(in_features=512, out_features=10, bias=True)

In [None]:
print(num_ftrs)

512


For ResNet-18, `num_ftrs` is 512: the final convolutional stage outputs 512 channels, and global average pooling reduces each channel to a single value, so the fully connected layer receives a 512-dimensional feature vector.
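
To make the origin of the 512 input features concrete, here is a minimal sketch of the tail of the network. It does not load ResNet-18 itself; `feature_map` is a random stand-in (an assumption for illustration) with the shape that ResNet-18's last convolutional stage produces for one 224x224 image.

```python
import torch
import torch.nn as nn

# Stand-in for the output of ResNet-18's last convolutional stage for one
# 224x224 image: 512 channels on a 7x7 spatial grid (random values here)
feature_map = torch.randn(1, 512, 7, 7)

# Global average pooling collapses each 7x7 map to a single number
pooled = nn.AdaptiveAvgPool2d((1, 1))(feature_map)  # shape (1, 512, 1, 1)
flat = torch.flatten(pooled, 1)                     # shape (1, 512)

# The replacement head maps the 512 features to 10 class scores
head = nn.Linear(512, 10)
logits = head(flat)
print(flat.shape, logits.shape)
```

The flattened vector has 512 entries, which is why `model.fc.in_features` reports 512.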

# Transform the Input Data
Input data needs to be pre-processed to the format the pre-trained model expects. This usually includes resizing, cropping, converting to a tensor, and normalizing to match the ImageNet training data.

In [None]:
# Define transformations for the input data
transform = transforms.Compose([
    transforms.Resize((224, 224)),  # Resize images to size used by ResNet
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

`transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])`: This transformation standardizes the pixel values of the input images. The mean and std arguments are lists corresponding to the means and standard deviations for each of the three color channels (RGB) of the ImageNet dataset. By normalizing the images, we ensure that the input distribution matches the distribution that the model was originally trained with. This step is crucial because it helps the model to learn more effectively and converge faster during training.

The values [0.485, 0.456, 0.406] and [0.229, 0.224, 0.225] are the per-channel mean and standard deviation of the ImageNet dataset, computed on pixel values scaled to [0, 1]. Using these specific numbers is common practice when working with models pre-trained on ImageNet.
Normalization subtracts the mean from each pixel and then divides by the standard deviation, channel-wise: (pixel_value - mean) / std.
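
As a worked example of this formula, the sketch below normalizes a single RGB pixel by hand. The pixel value is hypothetical and is assumed to be already scaled to [0, 1], which is what `ToTensor()` produces.

```python
# ImageNet per-channel statistics used in the transform above
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

# Hypothetical mid-gray pixel, already scaled to [0, 1] as ToTensor() would do
pixel = [0.5, 0.5, 0.5]

# Channel-wise normalization: (pixel_value - mean) / std
normalized = [(p - m) / s for p, m, s in zip(pixel, mean, std)]
print([round(v, 3) for v in normalized])
```

Each channel ends up with a different value even though the input pixel is gray, because each channel has its own mean and standard deviation.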

# Load the Dataset
We load CIFAR-10 with torchvision's built-in `datasets.CIFAR10` class, which downloads the data on first use. To load your own images instead, `ImageFolder` expects a directory with one subdirectory per class, as described further below.

In [None]:
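# Load CIFAR-10 dataset
train_dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
# Note: CIFAR-10 ships only a train and a test split. Here the test split is
# reused for validation, so the validation and test metrics are not independent.
val_dataset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
test_dataset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)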

Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
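
Because `val_dataset` and `test_dataset` above both point at the CIFAR-10 test split, one possible alternative is to hold out part of the training split for validation. The sketch below only computes the index lists; the hold-out size and seed are hypothetical choices, not values from this notebook.

```python
import random

num_train = 50_000  # size of the CIFAR-10 training split
val_size = 5_000    # hypothetical hold-out size
seed = 0            # fixed seed so the split is reproducible

# Shuffle all training indices once, then cut off the first val_size of them
indices = list(range(num_train))
random.Random(seed).shuffle(indices)

val_indices = indices[:val_size]
train_indices = indices[val_size:]
print(len(train_indices), len(val_indices))
```

These index lists could then be wrapped with `torch.utils.data.Subset`, for example `Subset(train_dataset, train_indices)`, leaving the test split untouched until the final evaluation.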


To load your own images:

```
train_data = ImageFolder(root='root_dir/train', transform=transform)
val_data = ImageFolder(root='root_dir/val', transform=transform)
test_data = ImageFolder(root='root_dir/test', transform=transform)
```

When using the ImageFolder class in PyTorch, you do not need to specify a train parameter. The ImageFolder class is designed to automatically load images from a directory where the structure of the directory defines the class labels.

```
root_dir/
    train/
        class1/
            img1.jpg
            img2.jpg
        class2/
            img3.jpg
            ...
    val/
        class1/
            img4.jpg
            ...
        class2/
            img5.jpg
            ...
```
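
`ImageFolder` turns the class folder names into integer labels by sorting them alphabetically and numbering them from 0. The sketch below mirrors that rule with the standard library only; it does not call `ImageFolder`, and the class names are hypothetical.

```python
import os
import tempfile

# Build a throwaway directory tree that mimics root_dir/train
with tempfile.TemporaryDirectory() as root:
    for class_name in ["dog", "cat", "bird"]:  # hypothetical class folders
        os.makedirs(os.path.join(root, "train", class_name))

    train_dir = os.path.join(root, "train")
    # Class folders are sorted by name, then numbered from 0
    classes = sorted(e.name for e in os.scandir(train_dir) if e.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(classes)}

print(class_to_idx)
```

The order in which the folders were created does not matter; only the sorted names determine the label indices.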

# Create Data Loaders
We create DataLoader instances that provide an iterable over our dataset with the specified batch size and shuffling option for the training data.

In [None]:
# Create DataLoaders
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
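
As a quick check of what these loaders produce, the sketch below computes the number of batches per epoch for the CIFAR-10 split sizes. It assumes the default `drop_last=False`, under which the final batch may be smaller than `batch_size`.

```python
import math

batch_size = 32
num_train, num_test = 50_000, 10_000  # CIFAR-10 split sizes

# With drop_last=False, a DataLoader yields ceil(N / batch_size) batches
train_batches = math.ceil(num_train / batch_size)
test_batches = math.ceil(num_test / batch_size)

# Size of the final (possibly partial) batch
last_train = num_train - (train_batches - 1) * batch_size
last_test = num_test - (test_batches - 1) * batch_size
print(train_batches, last_train, test_batches, last_test)
```

Neither split is divisible by 32, so both loaders end with a partial batch of 16 images.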

# Define the Optimizer
We define an optimizer that receives only the parameters of the new fully connected layer, so only that layer's weights are updated and the pre-trained layers keep their ImageNet weights. SGD with momentum is used here, but the best choice of optimizer depends on the problem.

In [None]:
# Define the optimizer and loss function
optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)
criterion = nn.CrossEntropyLoss().to(device)  # Move the loss function to the CUDA device
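
Passing only `model.fc.parameters()` to the optimizer restricts which weights are updated, but gradients are still computed for every layer during the backward pass. A common complementary step, not used in this notebook, is to set `requires_grad = False` on the pre-trained layers. The sketch below shows the idea on a tiny stand-in model; the `backbone` and `fc` modules and their sizes are hypothetical.

```python
import torch
import torch.nn as nn

# Tiny stand-in for a pre-trained network with a replaceable head
# (hypothetical; the notebook's model is ResNet-18)
model = nn.Sequential()
model.add_module("backbone", nn.Linear(8, 4))
model.add_module("fc", nn.Linear(4, 2))

# Freeze everything, then re-enable gradients for the head only
for param in model.parameters():
    param.requires_grad = False
for param in model.fc.parameters():
    param.requires_grad = True

optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)

# One backward pass: only the head receives gradients
loss = model(torch.randn(3, 8)).sum()
loss.backward()
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

Freezing can save memory and time because no gradients are stored for the frozen layers. Note that in training mode, BatchNorm layers still update their running statistics even when their weights are frozen.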

# Train the Model
We iterate over the epochs, then over the batches of data in the train_loader, calculate the loss, and update the model's weights.

In [None]:
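# Function to calculate accuracy
def calculate_accuracy(y_pred, y_true):
    """Return the batch accuracy as a percentage, rounded to a whole number."""
    # argmax over the class dimension; softmax is monotonic, so it is not needed here
    y_pred_tags = torch.argmax(y_pred, dim=1)
    correct_pred = (y_pred_tags == y_true).float()
    acc = correct_pred.sum() / len(correct_pred)
    # Rounding per batch makes the averaged epoch accuracy slightly approximate
    acc = torch.round(acc * 100)
    return acc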
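
The validation loop in this notebook averages the per-batch values returned by `calculate_accuracy`. When the last batch is smaller than the others, that average differs from the accuracy over all samples. The sketch below illustrates the effect with made-up counts.

```python
# Hypothetical (correct, batch_size) pairs; the last batch is smaller
batches = [(24, 32), (28, 32), (4, 16)]

# Mean of per-batch accuracies: every batch counts equally
per_batch = [100 * c / n for c, n in batches]
mean_of_batches = sum(per_batch) / len(per_batch)

# Overall accuracy: every sample counts equally
total_correct = sum(c for c, _ in batches)
total_samples = sum(n for _, n in batches)
overall = 100 * total_correct / total_samples

print(f"mean of batches: {mean_of_batches:.1f}%, overall: {overall:.1f}%")
```

Counting correct predictions and dividing once by the total number of samples, as the test evaluation later in this notebook does, avoids this discrepancy.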

In [None]:
# Train the model
num_epochs = 5
for epoch in range(num_epochs):
    model.train()  # Set the model to training mode
    running_loss = 0.0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)  # Move data to the CUDA device
        optimizer.zero_grad()  # Zero the gradients
        outputs = model(inputs)  # Forward pass
        loss = criterion(outputs, labels)  # Compute the loss
        loss.backward()  # Backward pass
        optimizer.step()  # Update weights
        running_loss += loss.item() * inputs.size(0) # the total loss over all batches within the epoch

    # Validation phase
    model.eval()  # Set the model to evaluation mode
    val_running_loss = 0.0
    val_running_accuracy = 0.0
    with torch.no_grad():  # No need to track the gradients, reducing memory usage and speeding up computations.
        for val_inputs, val_labels in val_loader:
            val_inputs, val_labels = val_inputs.to(device), val_labels.to(device)
            val_outputs = model(val_inputs) # Forward pass
            val_loss = criterion(val_outputs, val_labels) # Compute the loss
            val_running_loss += val_loss.item() * val_inputs.size(0) # Cumulative loss
            val_running_accuracy += calculate_accuracy(val_outputs, val_labels).item() # Cumulative accuracy

    # Calculate average losses
    epoch_loss = running_loss / len(train_loader.dataset)
    val_epoch_loss = val_running_loss / len(val_loader.dataset)
    val_epoch_accuracy = val_running_accuracy / len(val_loader)

    # Print out the information
    print(f'Epoch {epoch+1}/{num_epochs} - Training Loss: {epoch_loss:.4f}, Validation Loss: {val_epoch_loss:.4f}, Validation Accuracy: {val_epoch_accuracy:.2f}%')

Epoch 1/5 - Training Loss: 0.6638, Validation Loss: 0.6203, Validation Accuracy: 78.62%
Epoch 2/5 - Training Loss: 0.6326, Validation Loss: 0.5978, Validation Accuracy: 79.71%
Epoch 3/5 - Training Loss: 0.6143, Validation Loss: 0.5778, Validation Accuracy: 80.25%
Epoch 4/5 - Training Loss: 0.6011, Validation Loss: 0.5916, Validation Accuracy: 80.12%
Epoch 5/5 - Training Loss: 0.5939, Validation Loss: 0.5831, Validation Accuracy: 80.27%


# Overall performance metrics

In [None]:
# Assuming training is complete and the model is already trained...

# Evaluate on the test dataset
model.eval()  # Set the model to evaluation mode
test_running_accuracy = 0.0
total_samples = 0

# No need to track the gradients since we are not training
with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)  # Move data to the CUDA device
        outputs = model(inputs)  # Forward pass: compute the predicted outputs
        _, predicted = torch.max(outputs, 1)  # Get the predicted classes (index of the largest score)
        total_samples += labels.size(0)  # Increment the total count of samples
        test_running_accuracy += (predicted == labels).sum().item()  # Increment the correct predictions count

# Calculate the overall accuracy by dividing the number of correct predictions by the total number of samples
overall_test_accuracy = (test_running_accuracy / total_samples) * 100

# Print overall accuracy
print(f'Overall Test Accuracy: {overall_test_accuracy:.2f}%')

Overall Test Accuracy: 80.24%


# Saving model after training

In [None]:
# Save the model state dictionary
torch.save(model.state_dict(), 'model_cifar10.pth')
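
A saved state dictionary contains only the weights, so loading it later requires rebuilding the same architecture first. The sketch below shows the round trip on a small stand-in layer; the file name `model_demo.pth` and the stand-in model are hypothetical. For this notebook, the architecture to rebuild would be a ResNet-18 whose `fc` layer has been replaced with `nn.Linear(512, 10)`.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Small stand-in model (hypothetical) with the same shape as the new head
model = nn.Linear(512, 10)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model_demo.pth")
    torch.save(model.state_dict(), path)

    # Rebuild the same architecture, then load the saved weights into it
    restored = nn.Linear(512, 10)
    state = torch.load(path, map_location="cpu")
    restored.load_state_dict(state)

restored.eval()  # switch to evaluation mode before inference
same = torch.equal(model.weight, restored.weight)
print(same)
```

Using `map_location="cpu"` lets a checkpoint saved on a GPU be loaded on a machine without one.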