# SIT319/SIT744 Practical 5: Build an Image Classification Model

<div class="alert alert-info">
We suggest that you run this notebook using Google Colab.
</div>

## Learning objectives

- Construct and train a Convolutional Neural Network

## Pre-practical readings

- [Training a Classifier](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html)

## Task 1 Understanding Loss Functions in PyTorch

In this task, we will explore the concept of **loss functions** and how they guide the training process of neural networks. We’ll discuss why loss functions are essential, examine the differences between common regression and classification losses, and implement practical examples in PyTorch.

---

### Overview of Loss Functions

#### What is a Loss Function?
A **loss function** (or **cost function**) measures how far off a model’s predictions are from the actual target values. During training, the goal is to **minimize** this loss, guiding the model’s parameters to better fit the data.

#### Regression vs. Classification Losses
- **Regression Loss Functions**: Used when the output is a continuous value.  
  - **Mean Squared Error (MSE)**: Measures the average of the squares of the errors between predictions and targets.  
  - **Mean Absolute Error (MAE)**: Measures the average of the absolute differences between predictions and targets.

- **Classification Loss Functions**: Used when the output is a discrete class label.  
  - **Binary Cross-Entropy (BCE)**: Suitable for binary classification tasks (0 vs. 1).  
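  - **Cross-Entropy Loss**: Generalization of BCE for multi-class classification. Sometimes referred to as *Softmax Loss* when combined with a softmax layer. In PyTorch, `nn.CrossEntropyLoss` applies the (log-)softmax internally, so it expects raw logits rather than probabilities.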
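
As a rough illustration of the two regression losses listed above, the sketch below computes MSE and MAE by hand on three made-up values and compares them with PyTorch's built-in `nn.MSELoss` and `nn.L1Loss` (PyTorch's name for MAE). The tensors `preds` and `targets` are illustrative assumptions, not data from this practical.

```python
import torch
import torch.nn as nn

# Made-up predictions and targets (assumed values, for illustration only)
preds = torch.tensor([2.5, 0.0, 2.0])
targets = torch.tensor([3.0, -0.5, 2.0])

errors = preds - targets

# Manual definitions
mse_manual = (errors ** 2).mean()   # mean of the squared errors
mae_manual = errors.abs().mean()    # mean of the absolute errors

# Built-in equivalents
mse_builtin = nn.MSELoss()(preds, targets)
mae_builtin = nn.L1Loss()(preds, targets)

print("MSE (manual, built-in):", mse_manual.item(), mse_builtin.item())
print("MAE (manual, built-in):", mae_manual.item(), mae_builtin.item())
```

Because MSE squares each error, a single large error raises it much more than it raises MAE.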

---


Below, we will illustrate how to implement and compute some of these loss functions using PyTorch’s `torch.nn` module. We’ll set up minimal synthetic examples for both **regression** and **classification** tasks to highlight the key differences.




### Regression with MSE Loss

Let's create a simple dataset $X$ and a target $y$ for a regression problem.

In [None]:
import torch
import torch.nn as nn

# For reproducibility
torch.manual_seed(42)

# Synthetic data: y = 2x + 1 with some noise
X = torch.randn(10, 1)  # 10 samples, 1 feature
y = 2 * X + 1 + 0.2 * torch.randn(10, 1)

print("Features (X):\n", X)
print("Targets (y):\n", y)

We'll define a single-layer linear model (`nn.Linear`) and compute its output on $X$.

In [None]:
model_reg = nn.Linear(in_features=1, out_features=1)

# Forward pass
predictions = model_reg(X)
print("Predictions:\n", predictions)

We'll instantiate PyTorch's MSELoss and compute the loss between predictions and $y$.

In [None]:
criterion_mse = nn.MSELoss()
loss_mse = criterion_mse(predictions, y)
print("MSE Loss:", loss_mse.item())

The loss is a single scalar value. If we train this model with gradient descent, this value decreases over time and the fitted line moves closer to the underlying relation $y = 2x + 1$.
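
A minimal sketch of such a training loop is given here, assuming plain full-batch SGD. The learning rate (`0.1`) and the number of steps (`200`) are arbitrary choices for illustration, not values prescribed by this practical.

```python
import torch
import torch.nn as nn

torch.manual_seed(42)

# Same kind of synthetic data as above: y = 2x + 1 plus noise
X = torch.randn(10, 1)
y = 2 * X + 1 + 0.2 * torch.randn(10, 1)

model_reg = nn.Linear(in_features=1, out_features=1)
criterion_mse = nn.MSELoss()
# Learning rate and step count are assumptions made for this sketch
optimizer = torch.optim.SGD(model_reg.parameters(), lr=0.1)

initial_loss = criterion_mse(model_reg(X), y).item()

for step in range(200):
    optimizer.zero_grad()                  # clear gradients from the previous step
    loss = criterion_mse(model_reg(X), y)  # forward pass and loss
    loss.backward()                        # compute gradients
    optimizer.step()                       # update weight and bias

final_loss = criterion_mse(model_reg(X), y).item()
print(f"Loss before training: {initial_loss:.4f}, after: {final_loss:.4f}")
print("Learned weight and bias:", model_reg.weight.item(), model_reg.bias.item())
```

The learned weight and bias should end up close to 2 and 1, the values used to generate the data.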

### Binary Classification with BCE Loss

We’ll create a small dataset of 0s and 1s to mimic a binary classification scenario.

In [None]:
# 10 samples, each with 3 "features"
X_class = torch.randn(10, 3)
# Binary labels (0 or 1)
y_class = torch.randint(0, 2, (10, 1)).float()

print("Features (X_class):\n", X_class)
print("Labels (y_class):\n", y_class)

We’ll define a single-layer linear model, but we’ll treat it as a logistic regression by passing its output through a sigmoid before computing loss.

In [None]:
model_class = nn.Linear(in_features=3, out_features=1)

# Forward pass
logits = model_class(X_class)
preds = torch.sigmoid(logits)  # Convert logits to probabilities
print("Predictions (sigmoid output):\n", preds)

PyTorch provides two main ways to compute BCE loss:

1. `nn.BCELoss()` expects probabilities, so you must apply the sigmoid yourself.
2. `nn.BCEWithLogitsLoss()` combines sigmoid + BCE in one step.

Below, we'll use `BCEWithLogitsLoss()`, which is the more common and numerically stable choice:

In [None]:
criterion_bce = nn.BCEWithLogitsLoss()
loss_bce = criterion_bce(logits, y_class)
print("BCE Loss:", loss_bce.item())
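
To check that the two options listed above agree, the following sketch computes the loss both ways on made-up logits and also compares against the BCE formula written out by hand. The random `logits` and `targets` are assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

logits = torch.randn(10, 1)                      # raw model outputs (made up)
targets = torch.randint(0, 2, (10, 1)).float()   # binary labels (made up)

# Option 1: apply the sigmoid yourself, then use BCELoss
probs = torch.sigmoid(logits)
loss_a = nn.BCELoss()(probs, targets)

# Option 2: pass the logits straight to BCEWithLogitsLoss
loss_b = nn.BCEWithLogitsLoss()(logits, targets)

# Formula by hand: -mean(y * log(p) + (1 - y) * log(1 - p))
loss_manual = -(targets * probs.log() + (1 - targets) * (1 - probs).log()).mean()

print(loss_a.item(), loss_b.item(), loss_manual.item())
```

All three values should match up to floating-point rounding. The difference only becomes visible for very large positive or negative logits, where the combined version is more stable.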


> Note: We pass logits directly (no sigmoid needed) to `BCEWithLogitsLoss()`. If you are using `BCELoss()`, you must pass the sigmoid output instead.

📝 **Exercise**

1. Create a Synthetic Binary Classification Dataset
   - Generate 100 data points with 2 input features (e.g., using `torch.randn`).
   - Assign binary labels (0 or 1) based on some simple rule (e.g., `label = 1 if x1 + x2 > 0 else 0`).

2. Define a Simple Logistic Regression Model
   - Use `nn.Linear(in_features=2, out_features=1)` to map your 2D inputs to a single output (logit).
   - Remember to apply `torch.sigmoid` if you plan to use `nn.BCELoss`, or pass logits directly if you use `nn.BCEWithLogitsLoss`.

3. Train with Different Loss Functions  
   1. Using `nn.MSELoss` (**intentionally mismatched for classification**):  
      - Set up a training loop for a few epochs (e.g., 50).  
      - Observe how the loss decreases and track accuracy on the training data.  
   2. Using `nn.BCEWithLogitsLoss` (more appropriate for binary classification):  
      - Repeat the training loop.  
      - Compare the loss curve and final accuracy with what you obtained using MSE.

4. Compare and Discuss  
   - Which loss function yields better accuracy and why?  
   - What happens if you increase or decrease the number of data points?  
   - How does the learning rate affect the training convergence for each loss function?



## Task 2: Data Preparation for Image Classification

In this task, we will:
1. **Load** the Cats vs. Dogs dataset directly from [Hugging Face Datasets](https://huggingface.co/datasets).
2. **Split** the data into training, validation, and test sets.
3. **Transform** the images (resize, normalize, and augment).
4. **Explore** the dataset visually and numerically.

Using Hugging Face Datasets allows us to focus on **image classification** without manually downloading or organizing files.

---


❗ **If you haven’t installed the `datasets` library yet, run**:

In [None]:
!pip install datasets

### Loading the Cats vs. Dogs Dataset

Hugging Face hosts a version of the Cats vs. Dogs dataset under `microsoft/cats_vs_dogs`. The code below automatically downloads and caches the dataset.

In [None]:
import torch
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import numpy as np
from datasets import load_dataset
import PIL  # For converting array-based images to PIL before transforms

# For reproducibility
torch.manual_seed(42)


# Load the entire 'train' split
dataset = load_dataset("microsoft/cats_vs_dogs", split="train")
print(dataset)

The dataset has ~23k images, so we sample a smaller subset for faster experimentation. For example:

In [None]:
# Shuffle and select the first 1,000 images
dataset = dataset.shuffle(seed=42).select(range(1000))

### Splitting into Train, Validation, and Test Sets

We can use the built-in `train_test_split` method on our `Dataset` object to create train/validation/test subsets.

In [None]:
# 80% train, 20% temporary (val+test)
train_val = dataset.train_test_split(test_size=0.2, seed=42)

# From the remaining 20%, split equally into val (10%) and test (10%)
val_test = train_val["test"].train_test_split(test_size=0.5, seed=42)

train_ds = train_val["train"]  # 80%
val_ds   = val_test["train"]   # 10%
test_ds  = val_test["test"]    # 10%

print("Train size:", len(train_ds))
print("Val size:", len(val_ds))
print("Test size:", len(test_ds))

### Image Transformations

We’ll define a transformation pipeline that:

1. Randomly crops and resizes each image to 150×150 (data augmentation).
2. Randomly flips images horizontally (data augmentation).
3. Scales pixel values to [0, 1] and normalizes them per channel.

In [None]:
from torchvision.transforms import v2 as T

transform = T.Compose([
    T.ToImage(),  # converts input image (PIL) to a tv_tensors.Image
    T.RandomResizedCrop(
        size=(150, 150),
        scale=(0.5, 1.0),         # Adjust these values based on how much random zoom you need
        ratio=(0.75, 1.33),       # Adjust aspect ratio range if needed
        antialias=True
    ),
    T.RandomHorizontalFlip(),
    T.ToDtype(torch.float32, scale=True),  # Converts from uint8 [0, 255] to float [0, 1]
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])

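# Validation and test images should not be randomly augmented,
# so they get a deterministic resize instead of a random crop and flip.
eval_transform = T.Compose([
    T.ToImage(),
    T.Resize((150, 150), antialias=True),
    T.ToDtype(torch.float32, scale=True),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])

def make_apply_transform(tf):
    def apply_transform(batch):
        # Convert to RGB first (in case an image is grayscale or has an alpha channel),
        # then apply the transform to each image individually
        batch["image"] = [tf(img.convert("RGB")) for img in batch["image"]]
        return batch
    return apply_transform

train_ds.set_transform(make_apply_transform(transform))
val_ds.set_transform(make_apply_transform(eval_transform))
test_ds.set_transform(make_apply_transform(eval_transform))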
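
To see what the `Normalize` step in the pipeline above computes, here is a small sketch in plain PyTorch. It applies `(x - mean) / std` per channel to a stand-in tensor and then inverts it, which is the same inversion needed later when plotting images. The random `img` tensor is an assumption standing in for a real image.

```python
import torch

# Per-channel statistics used in the pipeline, reshaped to broadcast over (C, H, W)
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

torch.manual_seed(0)
img = torch.rand(3, 4, 4)            # stand-in "image" with values in [0, 1]

normalized = (img - mean) / std      # what normalization computes per channel
restored = normalized * std + mean   # the inverse operation (denormalization)

print("Normalized range:", normalized.min().item(), normalized.max().item())
print("Round trip matches:", torch.allclose(restored, img, atol=1e-6))
```

After normalization the values are no longer confined to [0, 1], which is why images must be denormalized before they are displayed.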

### Creating PyTorch DataLoaders

Now that our subsets are ready, we can create PyTorch `DataLoader`s to handle batching and shuffling:

In [None]:
batch_size = 32

train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
val_loader   = DataLoader(val_ds, batch_size=batch_size, shuffle=False)
test_loader  = DataLoader(test_ds, batch_size=batch_size, shuffle=False)
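
As a quick sanity check on batching, the sketch below uses stand-in `TensorDataset`s of the same sizes as the 800/100 train/validation split (the names `train_toy` and `val_toy` are hypothetical) and counts the batches a `DataLoader` produces with `batch_size=32`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in datasets with the same sizes as the train and validation splits
train_toy = TensorDataset(torch.zeros(800, 1), torch.zeros(800))
val_toy = TensorDataset(torch.zeros(100, 1), torch.zeros(100))

batch_size = 32
train_toy_loader = DataLoader(train_toy, batch_size=batch_size, shuffle=True)
val_toy_loader = DataLoader(val_toy, batch_size=batch_size, shuffle=False)

# 800 / 32 divides evenly; 100 / 32 leaves a smaller final batch
num_train_batches = len(train_toy_loader)
val_batch_sizes = [x.shape[0] for x, _ in val_toy_loader]

print("Train batches per epoch:", num_train_batches)
print("Validation batch sizes:", val_batch_sizes)
```

The last validation batch is smaller than the others, which is why the training and evaluation code should weight each batch's loss by its actual size rather than assume 32 samples per batch.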

Let’s inspect a batch of images to ensure they look correct and confirm that augmentation is happening.

In [None]:
import torchvision
import torch
import matplotlib.pyplot as plt
import numpy as np

def denormalize(img_tensor, mean, std):
    """
    Denormalizes a single image tensor using the given mean and std.
    img_tensor: (C, H, W)
    mean, std: lists of length C
    """
    # clone to avoid modifying tensor in-place
    img_tensor = img_tensor.clone().detach()
    for c in range(img_tensor.shape[0]):
        img_tensor[c] = img_tensor[c] * std[c] + mean[c]
    return img_tensor

def imshow(img_tensor, mean, std):
    # Denormalize
    img_tensor = denormalize(img_tensor, mean, std)
    # Move channel dimension to the end for plotting
    np_img = img_tensor.permute(1, 2, 0).numpy()
    # Clip values to the valid [0, 1] range for display
    np_img = np.clip(np_img, 0, 1)
    plt.imshow(np_img)
    plt.axis('off')
    plt.show()


# Fetch one batch
batch = next(iter(train_loader))
images = batch["image"]
labels = batch["labels"]
print("Batch shape:", images.shape, labels.shape)

# Create a grid of 8 images (2 rows, 4 columns for instance)
grid_img = torchvision.utils.make_grid(images[:8], nrow=4)
imshow(grid_img, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
plt.show()

print("Labels:", labels[:8].tolist())  # 0 for cat, 1 for dog (in this dataset)

It's often helpful to see how many cats vs. dogs are in your dataset. We can quickly iterate over the training set:


In [None]:
from collections import Counter

all_labels = []
for batch in train_loader:
    all_labels.extend(batch["labels"].tolist())

counts = Counter(all_labels)
print("Class 0 (Cats):", counts[0])
print("Class 1 (Dogs):", counts[1])

❓ Do you see a large imbalance? If yes, what techniques can you consider to address it?

## Task 3: Building the ConvNet (CNN) Model in PyTorch

A CNN typically consists of:
- **Convolutional layers** to extract spatial features.
- **Activation functions** (e.g., ReLU) to introduce non-linearity.
- **Pooling layers** (e.g., MaxPooling) to reduce spatial dimensions and parameters.

Below is a simple example of a CNN class with multiple convolutional blocks and a final fully connected (dense) layer.



In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=2):
        super(SimpleCNN, self).__init__()

        # Convolutional Block 1
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1)
        self.bn1   = nn.BatchNorm2d(32)           # Batch Normalization (optional)
        self.pool1 = nn.MaxPool2d(kernel_size=2)  # Halves the spatial dimensions

        # Convolutional Block 2
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1)
        self.bn2   = nn.BatchNorm2d(64)
        self.pool2 = nn.MaxPool2d(kernel_size=2)

        # Convolutional Block 3
        self.conv3 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1)
        self.bn3   = nn.BatchNorm2d(128)
        self.pool3 = nn.MaxPool2d(kernel_size=2)

        # Use AdaptiveMaxPool2d to get a consistent 9×9 output
        self.adaptive_pool = nn.AdaptiveMaxPool2d((9, 9))

        # Fully Connected Layers
        self.fc1   = nn.Linear(128*9*9, 512)
        self.dropout = nn.Dropout(p=0.5)  # Dropout for regularization
        self.fc2   = nn.Linear(512, num_classes)

    def forward(self, x):
        # Block 1
        x = self.conv1(x)
        x = self.bn1(x)
        x = F.relu(x)
        x = self.pool1(x)

        # Block 2
        x = self.conv2(x)
        x = self.bn2(x)
        x = F.relu(x)
        x = self.pool2(x)

        # Block 3
        x = self.conv3(x)
        x = self.bn3(x)
        x = F.relu(x)
        x = self.pool3(x)


        # Adaptive max pooling to 9×9
        x = self.adaptive_pool(x)

        # Flatten: (batch_size, 128, 9, 9) → (batch_size, 128*9*9)
        x = x.view(x.size(0), -1)

        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout(x)  # Dropout
        x = self.fc2(x)      # Final layer (logits)

        return x
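
To follow how the tensor shape changes through the network, this sketch pushes one fake 150×150 RGB image through convolution and pooling layers configured like those above. Batch normalization and ReLU are left out because they do not change the shape. This is a standalone illustration, not a replacement for `SimpleCNN`.

```python
import torch
import torch.nn as nn

x = torch.zeros(1, 3, 150, 150)   # one fake RGB image of size 150×150
shapes = []

channels = [3, 32, 64, 128]
for c_in, c_out in zip(channels[:-1], channels[1:]):
    conv = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)  # padding=1 keeps H and W
    pool = nn.MaxPool2d(kernel_size=2)                        # halves H and W, rounding down
    x = pool(conv(x))
    shapes.append(tuple(x.shape))

# Adaptive pooling forces a fixed 9×9 output whatever the incoming size
x = nn.AdaptiveMaxPool2d((9, 9))(x)
shapes.append(tuple(x.shape))

# Flatten everything except the batch dimension, as done before fc1
flat = x.view(x.size(0), -1)

for s in shapes:
    print(s)
print(tuple(flat.shape))
```

The spatial size goes 150 → 75 → 37 → 18 → 9, so the flattened vector has 128 × 9 × 9 = 10368 features, matching the input size of `fc1`.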

### Incorporating Regularisation

- **Dropout**: Randomly zeros a fraction (`p`) of the neurons during training, reducing overfitting by preventing co-adaptation of features.
- **Batch Normalization**: Applied after each convolutional layer, it normalises the activations. This can accelerate training and improve stability, often allowing higher learning rates.

You can experiment with adding/removing dropout or batch normalisation to observe their effects on overfitting.
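
Dropout behaves differently in training and evaluation mode, which is why switching modes with `model.train()` and `model.eval()` matters. The following sketch shows this on a standalone `nn.Dropout` layer with a tensor of ones (an assumed toy input).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

dropout.train()
out_train = dropout(x)   # each entry is either 0 or scaled to 1 / (1 - p) = 2.0

dropout.eval()
out_eval = dropout(x)    # dropout is disabled: the input passes through unchanged

print("Training mode:  ", out_train)
print("Evaluation mode:", out_eval)
```

The scaling by `1 / (1 - p)` during training keeps the expected activation the same in both modes.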

### Model Summary

After defining the CNN, you can instantiate it and print its structure to confirm the layer arrangement.

In [None]:
model = SimpleCNN(num_classes=2)
print(model)

❓ What does the output say about each layer?

To get the total parameter count, you can do:

In [None]:
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total Params: {total_params}")
print(f"Trainable Params: {trainable_params}")
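
It can help to verify a couple of these counts by hand. A convolutional layer has `kernel_h * kernel_w * in_channels * out_channels` weights plus one bias per output channel, and a linear layer has `in_features * out_features` weights plus one bias per output. The sketch below checks this for layers configured like `conv1` and `fc1`; the helper `count_params` is a hypothetical name introduced for this example.

```python
import torch.nn as nn

def count_params(module):
    """Total number of parameters in a module (hypothetical helper)."""
    return sum(p.numel() for p in module.parameters())

conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1)
fc1 = nn.Linear(128 * 9 * 9, 512)

# Counts worked out by hand: weights + biases
conv1_expected = (3 * 3 * 3) * 32 + 32        # 896
fc1_expected = (128 * 9 * 9) * 512 + 512      # 5,308,928

print("conv1:", count_params(conv1), "expected:", conv1_expected)
print("fc1:  ", count_params(fc1), "expected:", fc1_expected)
```

Most of the parameters sit in the first fully connected layer, not in the convolutional layers.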

## Task 4: Training, Evaluation, and Visualization


With the CNN model defined, you’re ready to:

- Initialize an optimizer (e.g., Adam or SGD) and loss function (e.g., CrossEntropyLoss for 2-class classification).
- Train the model by looping over batches from your data loaders.
- Evaluate on validation and test sets.

### Training Setup

1. **Model & Device**: Instantiate your CNN (e.g., `SimpleCNN`) and move it to the appropriate device (CPU or GPU).  
2. **Loss Function**: Use `nn.CrossEntropyLoss` for a 2-class (cat vs. dog) classification problem.  
3. **Optimizer**: Choose an optimizer like **Adam** or **SGD**.  
4. **Data Loaders**: Use the `train_loader` and `val_loader` created in Task 2 for training and validation.
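
Before wiring things together, it may help to see what `nn.CrossEntropyLoss` expects: raw logits of shape `(batch, num_classes)` and integer class labels of shape `(batch,)`. The sketch below uses made-up logits for four images, reproduces the loss by hand with `log_softmax`, and derives predictions and accuracy with `argmax`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Made-up logits for a batch of 4 images and 2 classes (0 = cat, 1 = dog)
logits = torch.tensor([[ 2.0, 0.5],
                       [ 0.1, 1.2],
                       [ 0.3, 0.9],
                       [-1.0, 3.0]])
labels = torch.tensor([0, 1, 1, 0])   # integer class indices (dtype int64)

loss = nn.CrossEntropyLoss()(logits, labels)

# By hand: mean negative log-probability assigned to the true class
log_probs = F.log_softmax(logits, dim=1)
loss_manual = -log_probs[torch.arange(len(labels)), labels].mean()

# Predicted class is the index of the largest logit
preds = logits.argmax(dim=1)
accuracy = (preds == labels).float().mean().item()

print("Loss:", loss.item(), "manual:", loss_manual.item())
print("Predictions:", preds.tolist(), "accuracy:", accuracy)
```

No softmax layer is needed at the end of the model, since the loss applies it internally.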

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim

# Assume SimpleCNN is defined (from Task 3)
# and train_loader, val_loader, test_loader are defined (from Task 2)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

model = SimpleCNN(num_classes=2).to(device)  # Move model to GPU if available
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-4)

### Model Training

We'll define two helper functions to simplify the main training loop:

- `train_one_epoch`: Runs a single epoch over the training set, computing the average loss and accuracy.
- `evaluate`: Evaluates on a given data loader (e.g., validation), returning loss and accuracy.

In [None]:
def train_one_epoch(model, dataloader, optimizer, criterion, device):
    model.train()  # Set model to training mode
    running_loss = 0.0
    running_corrects = 0
    total_samples = 0

    for batch in dataloader:
        images, labels = batch["image"], batch["labels"]
        images, labels = images.to(device), labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Statistics
        _, preds = torch.max(outputs, 1)
        running_loss += loss.item() * images.size(0)
        running_corrects += torch.sum(preds == labels).item()
        total_samples += images.size(0)

    epoch_loss = running_loss / total_samples
    epoch_acc = running_corrects / total_samples
    return epoch_loss, epoch_acc

def evaluate(model, dataloader, criterion, device):
    model.eval()  # Set model to evaluation mode
    running_loss = 0.0
    running_corrects = 0
    total_samples = 0

    with torch.no_grad():
        for batch in dataloader:
            images, labels = batch["image"], batch["labels"]
            images, labels = images.to(device), labels.to(device)

            outputs = model(images)
            loss = criterion(outputs, labels)

            _, preds = torch.max(outputs, 1)
            running_loss += loss.item() * images.size(0)
            running_corrects += torch.sum(preds == labels).item()
            total_samples += images.size(0)

    val_loss = running_loss / total_samples
    val_acc = running_corrects / total_samples
    return val_loss, val_acc


## Main Training Loop
num_epochs = 10  # Adjust based on your dataset size and desired training time
train_losses, val_losses = [], []
train_accuracies, val_accuracies = [], []

for epoch in range(num_epochs):
    # Train for one epoch
    train_loss, train_acc = train_one_epoch(model, train_loader, optimizer, criterion, device)

    # Evaluate on the validation set
    val_loss, val_acc = evaluate(model, val_loader, criterion, device)

    # Store metrics
    train_losses.append(train_loss)
    train_accuracies.append(train_acc)
    val_losses.append(val_loss)
    val_accuracies.append(val_acc)

    print(f"Epoch [{epoch+1}/{num_epochs}] "
          f"Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.4f}, "
          f"Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.4f}")

### Evaluation and Visualization

We can plot training curves to visualize how loss and accuracy change over epochs:

In [None]:
import matplotlib.pyplot as plt

epochs_range = range(1, num_epochs + 1)

plt.figure(figsize=(12, 5))

# Plot Loss
plt.subplot(1, 2, 1)
plt.plot(epochs_range, train_losses, label='Train Loss')
plt.plot(epochs_range, val_losses, label='Val Loss')
plt.title('Loss over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

# Plot Accuracy
plt.subplot(1, 2, 2)
plt.plot(epochs_range, train_accuracies, label='Train Accuracy')
plt.plot(epochs_range, val_accuracies, label='Val Accuracy')
plt.title('Accuracy over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

plt.show()

❓ Do you observe overfitting or underfitting?


After choosing your best model (e.g., the one at the last epoch or with the best validation accuracy), evaluate on unseen test data:

In [None]:
test_loss, test_acc = evaluate(model, test_loader, criterion, device)
print(f"Test Loss: {test_loss:.4f}, Test Accuracy: {test_acc:.4f}")
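
If you want to evaluate the checkpoint with the best validation accuracy, not the weights from the last epoch, you need to keep a copy of the weights during training. One possible way is sketched below with a stand-in model and made-up validation accuracies; in the real loop, the placeholder step would be the calls to `train_one_epoch` and `evaluate`.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)                 # stand-in for SimpleCNN
fake_val_accs = [0.60, 0.72, 0.68]      # made-up validation accuracies per epoch

best_acc, best_epoch, best_state = -1.0, None, None

for epoch, val_acc in enumerate(fake_val_accs, start=1):
    # Placeholder for real training: nudge the weights so they change each epoch
    with torch.no_grad():
        model.weight.add_(0.1)

    if val_acc > best_acc:
        best_acc, best_epoch = val_acc, epoch
        # deepcopy is needed because state_dict() refers to the live tensors
        best_state = copy.deepcopy(model.state_dict())

final_weight = model.weight.detach().clone()   # weights after the last epoch
model.load_state_dict(best_state)              # roll back to the best epoch

print(f"Best epoch: {best_epoch} (val acc {best_acc:.2f})")
```

Without the `deepcopy`, the saved dictionary would keep tracking the model as it trains and would end up holding the last epoch's weights.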


It is important to check how the model performs on individual test images. You can display a few images and compare predictions to ground truth labels:

In [None]:
import torchvision

model.eval()
data_iter = iter(test_loader)
batch = next(data_iter)
images, labels = batch["image"], batch["labels"]
images, labels = images.to(device), labels.to(device)

# Get predictions (no gradients are needed for inference)
with torch.no_grad():
    outputs = model(images)
_, preds = torch.max(outputs, 1)

# Move tensors back to CPU for plotting
images = images.cpu()
labels = labels.cpu()
preds = preds.cpu()

# Display some images with predicted vs. actual labels
class_names = ['cat', 'dog']  # Adjust if your dataset is reversed
plt.figure(figsize=(10, 8))
for i in range(8):
    plt.subplot(2, 4, i+1)
    img = denormalize(images[i], mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    npimg = np.clip(img.numpy(), 0, 1)  # keep values in the valid [0, 1] display range
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.title(f"Pred: {class_names[preds[i]]} | True: {class_names[labels[i]]}")
    plt.axis('off')
plt.show()

### Saving model

You can save your trained model's parameters for future use or inference:

In [None]:
torch.save(model.state_dict(), 'cats_vs_dogs_cnn.pth')
print("Model saved as cats_vs_dogs_cnn.pth")

Later, you can re-create the model and load the saved parameters back:

In [None]:
model_loaded = SimpleCNN(num_classes=2)
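# map_location lets a checkpoint saved on a GPU be loaded on a CPU-only machine
model_loaded.load_state_dict(torch.load('cats_vs_dogs_cnn.pth', map_location=device))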
model_loaded.to(device)
model_loaded.eval()
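
A quick way to confirm that saving and loading worked is to check that the restored model produces the same outputs as the original. The sketch below does this with a tiny stand-in model and a temporary file (the file name `tiny.pth` is hypothetical); the same check applies to `SimpleCNN`.

```python
import os
import tempfile
import torch
import torch.nn as nn

torch.manual_seed(0)
tiny = nn.Linear(4, 2)     # stand-in for SimpleCNN
x = torch.randn(3, 4)      # made-up input batch

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tiny.pth")   # hypothetical file name
    torch.save(tiny.state_dict(), path)

    restored = nn.Linear(4, 2)                 # must have the same architecture
    restored.load_state_dict(torch.load(path, map_location="cpu"))

tiny.eval()
restored.eval()
with torch.no_grad():
    same = torch.allclose(tiny(x), restored(x))

print("Outputs match after reload:", same)
```

Note that `state_dict` stores only the parameters, so the class definition must be available when loading.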

You now have a complete end-to-end pipeline for cats vs. dogs classification using PyTorch: from dataset preparation (Task 2) and model building (Task 3) to training and evaluation (Task 4).

## More Exercises

### Exercise 1: Custom Data Splitting

   - Instead of using automatic splitting (like `train_test_split`), manually create `train/`, `val/`, and `test/` folders.  
   - Ensure each split has a balanced distribution of cats and dogs (e.g., 80% train, 10% val, 10% test).
   - Print how many cat vs. dog images are in each split. Check for imbalance.


### Exercise 2: More Data Augmentation

   - Extend the `transform` pipeline (which already uses `RandomResizedCrop` and `RandomHorizontalFlip`) with `RandomRotation` or `ColorJitter`.  
   - Compare training performance (loss/accuracy) with and without these additional augmentations.
   - Temporarily remove *all* augmentations. Observe if the model overfits more quickly (e.g., higher training accuracy, lower validation accuracy).
   - Display 5 augmented images (with flips, rotations, color jitter) to see how they differ from the originals.



## Additional Reading

- [Training a Classifier](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html)