<a href="https://colab.research.google.com/github/urness/CS167Fall2025/blob/main/Day21_CNNs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CS167: Day21
## Intro to Convolutional Neural Networks (CNNs)

#### CS167: Machine Learning, Fall 2025


## __Put the Model on Training Device (GPU or CPU)__


We want to accelerate the training process using a graphics processing unit (GPU). Fortunately, Colab gives us access to one. You need to enable it from _Runtime (or click on the down arrow near RAM & DISK in the upper right)-->Change runtime type-->GPU or TPU_

In [None]:
# check to see if torch.cuda is available, otherwise it will use CPU
import torch
import torch.nn as nn
import numpy as np
device = (
    "cuda"
    if torch.cuda.is_available()
    else "cpu"
)
print(f"Using {device} device")
# if it prints 'cuda' then colab is running using GPU device



---



# __Load the Dataset for your CNN__

We can easily import some [built-in datasets](https://pytorch.org/vision/stable/datasets.html) from PyTorch's `torchvision.datasets` module
- [Fashion-MNIST](https://github.com/zalandoresearch/fashion-mnist)
  - each image size: 28x28 grayscale image
  - each image is associated with a label from __10 classes__
  - training set of 60,000 examples and a test set of 10,000 examples

<div>
<img src="https://analytics.drake.edu/~reza/teaching/cs167_sp25/notes/images/fashion-mnist-sprite.png" width=500/>
</div>


In [None]:
# import libraries
import torch
from torch.utils.data import Dataset
from torchvision import datasets # torchvision has many deep learning benchmark datasets: Fashion-MNIST, CIFAR-10, Caltech-101, etc.
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt

In [None]:
# download FashionMNIST data
training_data = datasets.FashionMNIST(
    root="/content/drive/MyDrive/CS167/datasets", # heads up! replace this path so that it points to a directory in your (mounted) Google Drive
    train=True,
    download=True,
    transform=ToTensor() # feature transformation: converts each image to a float tensor scaled to [0, 1]
)

test_data = datasets.FashionMNIST(
    root="/content/drive/MyDrive/CS167/datasets", # heads up! replace this path so that it points to a directory in your (mounted) Google Drive
    train=False,
    download=True,
    transform=ToTensor()
)

__Explore some sample training images__

In [None]:
# Visualize a random set of images and their labels from the training split
# The following labels represent 10 classes, each with specific indices as defined by the creator of the Fashion-MNIST dataset
# reference: https://github.com/zalandoresearch/fashion-mnist#labels
labels_map = {
    0: "T-Shirt",
    1: "Trouser",
    2: "Pullover",
    3: "Dress",
    4: "Coat",
    5: "Sandal",
    6: "Shirt",
    7: "Sneaker",
    8: "Bag",
    9: "Ankle Boot",
}
figure = plt.figure(figsize=(5, 5))
cols, rows = 5, 2
for i in range(1, cols * rows + 1):
    sample_idx = torch.randint(len(training_data), size=(1,)).item()
    img, label = training_data[sample_idx]
    figure.add_subplot(rows, cols, i)
    plt.title(labels_map[label])
    plt.axis("off")
    #print('image tensor size:', img.shape)
    plt.imshow(img.squeeze(), cmap="gray") # .squeeze() method removes the '1' from first dimension of the tensor [1, 28, 28]
    #print('after removing the first dimension with ', img.squeeze().shape)
plt.show()

## __Prepare Your Data with DataLoader for Training/Testing__
So far we have explored one sample of data at a time. As we saw in our discussion of the optimizer, specifically __Stochastic Gradient Descent (SGD)__, a network is usually trained on __minibatches__ of samples rather than on single samples. PyTorch has a class called __DataLoader__, which will do this automatically for us as long as we provide the right arguments. It can:
- prepare the __minibatches__ with the given _batch_size_, e.g., 16, 32, 64, 128, etc.
- use multiprocessing to speed up the data retrieval
- reshuffle the data at every __epoch__
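
To make the minibatch arithmetic concrete, here is a minimal sketch that counts how many minibatches one epoch contains for the Fashion-MNIST split sizes quoted above. The helper name `num_batches` is made up for illustration, and the sketch assumes the DataLoader default of keeping the last, smaller batch (`drop_last=False`).

```python
import math

def num_batches(num_samples, batch_size):
    """Number of minibatches per epoch when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)

# Fashion-MNIST split sizes and the batch size used in this notebook
train_samples, test_samples, batch_size = 60_000, 10_000, 128

train_batches = num_batches(train_samples, batch_size)
# every batch is full except possibly the last one
last_train_batch = train_samples - (train_batches - 1) * batch_size

print("training batches per epoch:", train_batches)
print("size of the last training batch:", last_train_batch)
print("test batches:", num_batches(test_samples, batch_size))
```

Because 60,000 is not a multiple of 128, the final training batch is smaller than the others, which is why indexing a batch by a fixed position such as 127 is only safe for full batches.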


In [None]:
from torch.utils.data import DataLoader
#                              pairs of items,   minibatch size,        random shuffling turned ON
train_dataloader = DataLoader(training_data,     batch_size=128,        shuffle=True)
test_dataloader  = DataLoader(test_data,         batch_size=128,        shuffle=False) # for testing/inference: it is not necessary to shuffle

In [None]:

# explore the data from the train_dataloader
train_inputs, train_labels = next(iter(train_dataloader)) # returns a batch of 128 train-images and train-labels
print(f"Images batch shape: {train_inputs.shape}")
print(f"Labels batch shape: {train_labels.shape}")

# visualize one of the samples in this batch of 128
figure = plt.figure(figsize=(2, 2))
img = train_inputs[127].squeeze() # I picked 127, but you can pick any index from 0 to batch_size-1 = 127
label = train_labels[127]         # It's a tensor datatype
plt.title(labels_map[int(label)]) # For indexing, convert the 'tensor' datatype --> integer number datatype
plt.imshow(img, cmap="gray")
plt.show()
print(f"Label: {label}")

# __Building Convolutional Neural Network (CNN)__

Create a network class with two methods:
- `__init__()`
- `forward()`

In general, we will follow this same template when constructing other neural networks in PyTorch, such as an MLP. Here are the useful PyTorch modules we will be using for CNN construction:
- [nn.Conv2d()](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html)
  - applies a 2D convolution to an input volume of $(C_{in},H_{in},W_{in})$ and produces an output volume of $(C_{out},H_{out},W_{out})$ for the next layer.
  - to create this, you need to provide the following:
    - __channel_dimension_of_input_layer__ i.e., $C_{in}$
    - __channel_dimension_of_output_layer__ i.e., $C_{out}$
    - __filter_size__ i.e., $F$

  - the other two parameters are optional: __stride__ $S=1$ and __padding__ $P=0$, shown with their default values.
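
As a quick check on these parameters, the spatial output size of a convolution along one dimension follows the standard formula $\lfloor (H_{in} + 2P - F)/S \rfloor + 1$. The sketch below applies it to a 28x28 input passed through two 3x3 convolutions with the default stride and padding. The helper name `conv_output_size` is hypothetical and not part of PyTorch.

```python
def conv_output_size(size_in, filter_size, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size_in + 2 * padding - filter_size) // stride + 1

h_in = 28                          # Fashion-MNIST images are 28x28
h1 = conv_output_size(h_in, 3)     # after a first 3x3 convolution
h2 = conv_output_size(h1, 3)       # after a second 3x3 convolution

channels_out = 64                  # output channels of the second convolution
flattened = channels_out * h2 * h2 # number of values fed to the first linear layer

print(h1, h2, flattened)
```

The final number is the input size a fully connected layer would need if it directly follows a 64-channel second convolution.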


In [None]:
import torch
from torch import nn

class SimpleCNNv1(nn.Module):
  def __init__(self):
    super().__init__()
    # your network layer construction should take place here

    # note input image is greyscale and has dimension of [1,28,28]

    # Beginning layers: a series of 2D convolutional layers (useful for feature map learning from the grid layouts of an image)
    self.conv_layers = nn.Sequential(
            nn.Conv2d(1, 32, 3),   # -> maps input grey scale image (1 channel) to a conv layer of 32 channels; output dimension of [32,26,26]
            nn.ReLU(),
            nn.Conv2d(32, 64, 3),  # -> input of 32 channels to conv. layer of 64 channels; output dimensions of [64,24,24]
            nn.ReLU()
    )

    # --------------------------------------------------------------------------
    # -------                 heads up!                                 --------
    # you need to calculate the total size of the output volume of the last conv layer in self.conv_layers,
    # as it will be needed by the upcoming nn.Linear(). This number will be used as the first argument for the next nn.Linear().
    # I pre-calculated this number, and it is 64*24*24 = 36864. I will plug this number in the next layer
    # --------------------------------------------------------------------------

    self.flatten = nn.Flatten() # -> flatten the tensor to prepare for a fully connected MLP layer; resulting layer is [64*24*24]

    self.linear_layers = nn.Sequential(
            nn.Linear(64*24*24, 128),
            nn.ReLU(),
            nn.Linear(128, 10)     # 10 is the number of classes in the classification task
    )

  def forward(self, x):
    # your code for Conv_2d forward pass should take place here
    output = self.conv_layers(x)
    output = self.flatten(output)
    output = self.linear_layers(output)
    return output


In [None]:
# check the structure of your CNN
cnn_model = SimpleCNNv1()
print(cnn_model)

# __Forward Pass using your Dataset and your CNN__
Test a forward pass of our first CNN using one of the training samples.
The `forward` method inside our network class, __SimpleCNNv1__, will be invoked when we pass an input tensor __X__ to the network object we instantiated earlier, i.e., __cnn_model__, as follows:
- _output = cnn_model(X)_

Finally, we convert the output from the model into probabilities using the __Softmax()__ module:
- _pred_probab = softmax_activation(output)_
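
To see what the softmax step does to raw scores, here is a small pure-Python sketch. The function `softmax` and the logit values are invented for illustration; in the notebook itself this job is done by `nn.Softmax`.

```python
import math

def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                       # made-up raw scores for 3 classes
probs = softmax(logits)
predicted_class = probs.index(max(probs))      # index of the most probable class

print("probabilities:", probs)
print("sum:", sum(probs))
print("predicted class:", predicted_class)
```

Softmax preserves the ordering of the scores, so the predicted class is the same whether you take the argmax of the logits or of the probabilities.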

In [None]:
img   = train_inputs[100] # I picked 100, but you can pick any index from 0 to batch_size-1 = 127
label = train_labels[100]

softmax_activation = nn.Softmax(dim=1)

# reuse the model instantiated earlier and switch it to evaluation mode
cnn_model.eval()  # note the parentheses: without them the method is never called

# data and model should be placed on the same device (either GPU or CPU)
X = img.unsqueeze(0).to(device)         # add a batch dimension -> [1, 1, 28, 28], then send the tensor to the GPU (if available)
cnn_model.to(device)                    # send the model to the GPU (if available)
# print(f"device {device} and model: \n {cnn_model}")
output = cnn_model(X)

predicted_probability = softmax_activation(output)  # raw scores scaled to values in [0, 1], representing the model’s predicted probabilities for each class

print('predicted probability \n', predicted_probability)
y_pred = predicted_probability.argmax()
print(f"Predicted class: {y_pred}")
print(f"Actual class: {label}")


## __Defining Loss function__

- [nn.CrossEntropyLoss()](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss)
  - useful when training a __classification problem__ with __C__ classes.
  - criterion computes the cross entropy loss between input logits (raw scores before softmax) and target
- [nn.MSELoss()](https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss)
  - useful when training a __regression problem__
  - criterion that measures the mean squared error (squared L2 norm) between each element in the input _x_ and target _y_
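
The sketch below illustrates the point about logits: `nn.CrossEntropyLoss` is given raw scores, with no softmax applied beforehand, and it matches the mean of the negative log-softmax value at each true class. The logits and targets are invented values used only for illustration.

```python
import torch
from torch import nn

# made-up logits for a batch of 2 samples and 3 classes (illustrative values only)
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])   # true class index for each sample

loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(logits, targets)  # takes raw logits directly

# the same quantity by hand: mean of -log(softmax(logits)) at the true class
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), targets].mean()

print(loss.item(), manual.item())
```

This is why the network's `forward()` can return raw scores during training: applying softmax before this loss would effectively apply it twice.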


In [None]:
# initialize the loss function
loss_fn = nn.CrossEntropyLoss() # this is useful for multiclass classification task

## __Initializing the Optimizer__

Optimization, as we have discussed earlier, is the process of adjusting model parameters to reduce model error in each training step.

PyTorch provides a selection of optimization algorithms in the [torch.optim](https://pytorch.org/docs/stable/optim.html) package. Some of them are as follows:
- [torch.optim.SGD](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html#torch.optim.SGD)
- [torch.optim.Adam](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html#torch.optim.Adam)
- [torch.optim.RMSprop](https://pytorch.org/docs/stable/generated/torch.optim.RMSprop.html#torch.optim.RMSprop)

In addition to selecting the optimizer, we also choose the *hyperparameters*: adjustable settings that control the model optimization process. Tweaking them influences how the model trains and converges:
- __epochs:__ the number of full passes over the training dataset
- __batch size:__ the number of data samples propagated through the network before each parameter update
- __learning rate:__ how large a step each parameter update takes; smaller values train more slowly, while larger values may make training unstable
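
To build intuition for the learning rate, here is a toy sketch of plain (non-stochastic) gradient descent on a single parameter. The loss $f(w) = (w-3)^2$ and the function names are invented for illustration, and nothing here uses PyTorch.

```python
def grad(w):
    # gradient of the toy loss f(w) = (w - 3)^2, whose minimum is at w = 3
    return 2.0 * (w - 3.0)

def run_gradient_descent(learning_rate, steps, w=0.0):
    """Repeatedly apply the update w <- w - learning_rate * gradient."""
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

w_small = run_gradient_descent(learning_rate=0.001, steps=50)
w_large = run_gradient_descent(learning_rate=0.1, steps=50)

print("lr=0.001 after 50 steps:", w_small)
print("lr=0.1   after 50 steps:", w_large)
```

With the same number of steps, the smaller learning rate has barely moved away from the starting point, while the larger one has essentially reached the minimum. Real networks behave less neatly, so treat this only as intuition.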



In [None]:
learning_rate = 1e-3
batch_size    = 64
epochs        = 10
# let's use SGD optimization algorithm for training our model
optimizer     = torch.optim.SGD(cnn_model.parameters(), lr=learning_rate)

# __Putting Everything Together: CNN__

In [None]:
# Step 1: load the Torch library and other utilities
#----------------------------------------------------

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
import time

In [None]:
# Step 2: load the dataset, i.e., we are experimenting with FashionMNIST
#--------------------------------------------------------------------------------------------------

training_data = datasets.FashionMNIST(
    root="/content/drive/MyDrive/CS167/datasets", # heads up! replace this path so that it points to a directory in your (mounted) Google Drive
    train=True,
    download=True,
    transform=ToTensor() # feature transformation: converts each image to a float tensor scaled to [0, 1]
)

test_data = datasets.FashionMNIST(
    root="/content/drive/MyDrive/CS167/datasets", # heads up! replace this path so that it points to a directory in your (mounted) Google Drive
    train=False,
    download=True,
    transform=ToTensor()
)

In [None]:
# Step 3: Create your CNN Network (here SimpleCNNv1) with 2 conv_2d layers + 2 layers of MLP
#--------------------------------------------------------------------------------------------------

class SimpleCNNv1(nn.Module):
  def __init__(self):
    super().__init__()
    # your network layer construction should take place here

    # note input image is greyscale and has dimension of [1,28,28]

    # Beginning layers: a series of 2D convolutional layers (useful for feature map learning from the grid layouts of an image)
    self.conv_layers = nn.Sequential(
            nn.Conv2d(1, 32, 3),   # -> maps input grey scale image (1 channel) to a conv layer of 32 channels; output dimension of [32,26,26]
            nn.ReLU(),
            nn.Conv2d(32, 64, 3),  # -> input of 32 channels to conv. layer of 64 channels; output dimensions of [64,24,24]
            nn.ReLU()
    )

    # --------------------------------------------------------------------------
    # -------                 heads up!                                 --------
    # you need to calculate the total size of the output volume of the last conv layer in self.conv_layers,
    # as it will be needed by the upcoming nn.Linear(). This number will be used as the first argument for the next nn.Linear().
    # I pre-calculated this number, and it is 64*24*24 = 36864. I will plug this number in the next layer
    # --------------------------------------------------------------------------

    self.flatten = nn.Flatten() # -> flatten the tensor to prepare for a fully connected MLP layer; resulting layer is [64*24*24]

    self.linear_layers = nn.Sequential(
            nn.Linear(64*24*24, 128),
            nn.ReLU(),
            nn.Linear(128, 10)     # 10 is the number of classes in the classification task
    )

  def forward(self, x):
    # your code for Conv_2d forward pass should take place here
    output = self.conv_layers(x)
    output = self.flatten(output)
    output = self.linear_layers(output)
    return output

In [None]:
## Step 4: Your training and testing functions (updated -- now outputs accuracy to be visualized)
#--------------------------------------------------------------------------------------

def train_loop(dataloader, model, loss_fn, optimizer):
    """
    Executes one full training epoch for the given model.

    Iterates over all batches in the provided DataLoader, performing the following steps:
    - Moves input and target tensors to the selected device (CPU or GPU)
    - Computes predictions and loss for each batch
    - Performs backpropagation and optimizer updates
    - Tracks and prints training loss periodically

    Args:
        dataloader (torch.utils.data.DataLoader):
            The DataLoader providing batches of training data (inputs and labels).
        model (torch.nn.Module):
            The neural network model to be trained.
        loss_fn (torch.nn.Module or callable):
            The loss function used to compute the training loss.
        optimizer (torch.optim.Optimizer):
            The optimizer responsible for updating the model’s parameters.

    Returns:
        tuple[float, float]: The average training loss across all batches in this epoch,
        and the training accuracy (in percent) over the whole dataset.
    """
    model.train()                   # set the model to training mode for best practices

    size        = len(dataloader.dataset)
    train_loss, correct = 0, 0

    for batch, (X, y) in enumerate(dataloader):

        # compute prediction and loss
        X = X.to(device)                  # send data to the GPU device (if available)
        y = y.to(device)
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()      # compute gradients
        optimizer.step()     # apply updates
        optimizer.zero_grad()# clear old gradients

        train_loss += loss.item()
        correct += (pred.argmax(1) == y).type(torch.float).sum().item()

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
    correct /= size

    return train_loss/len(dataloader), 100*correct

def test_loop(dataloader, model, loss_fn):
    """
    Evaluates the model’s performance on a test (or validation) dataset.

    Runs a forward pass over all batches in the provided DataLoader with gradient
    computation disabled, accumulating loss and accuracy metrics.

    Args:
        dataloader (torch.utils.data.DataLoader):
            The DataLoader providing batches of test or validation data.
        model (torch.nn.Module):
            The trained neural network model to evaluate.
        loss_fn (torch.nn.Module or callable):
            The loss function used to compute the evaluation loss.

    Returns:
        tuple[float, float]: The average loss over all test batches, and the
        test accuracy (in percent).

    Prints:
        Accuracy (% of correct predictions) and average test loss.
    """

    model.eval()                    # set the model to evaluation mode for best practices

    size        = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    # Evaluating the model with torch.no_grad() ensures that no gradients are computed during test mode
    # also serves to reduce unnecessary gradient computations and memory usage for tensors with requires_grad=True
    with torch.no_grad():
        for X, y in dataloader:

            X = X.to(device)                     # send data to the GPU device (if available)
            y = y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
    return test_loss, 100*correct


In [None]:

# Step 5: prepare the DataLoader and select your optimizer and set the parameters for learning the model from DataLoader
#------------------------------------------------------------------------------------------------------------------------------
mlp_model = SimpleCNNv1() ## model Class name here
mlp_model.to(device)      ## device should have been determined earlier (at top of notebook)
learning_rate = 0.001
batch_size_val = 64
epochs = 10
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(mlp_model.parameters(), lr=learning_rate)

train_dataloader = DataLoader(training_data, batch_size=batch_size_val, shuffle=True)  # reshuffle the training data at every epoch
test_dataloader = DataLoader(test_data, batch_size=batch_size_val)                     # no shuffling needed for testing


train_losses = []
test_losses  = []
train_accuracy = []
test_accuracy  = []
start_time   = time.time()
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    avg_train_loss, train_acc = train_loop(train_dataloader, mlp_model, loss_fn, optimizer)
    avg_test_loss, test_acc  = test_loop(test_dataloader, mlp_model, loss_fn)
    train_losses.append(avg_train_loss)
    test_losses.append(avg_test_loss)
    train_accuracy.append(train_acc)
    test_accuracy.append(test_acc)

print("Done!")

elapsed = time.time() - start_time  # measure once so both lines report the same duration
print("Total training time: %.3f sec" % elapsed)
print("Total training time: %.3f hrs" % (elapsed / 3600))

print(mlp_model.__class__.__name__, " model has been trained!")


In [None]:
#######
# visualizing the accuracy curves
plt.plot(range(1,epochs+1), train_accuracy)
plt.plot(range(1,epochs+1), test_accuracy)
plt.title('Model accuracy after each epoch')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'])
plt.show()


In [None]:
import matplotlib.pyplot as plt

# x-axis values: one point per completed epoch (uses the lists filled in during training)
epoch_range = range(1, len(train_losses) + 1)

# Accuracy
plt.figure()
plt.plot(epoch_range, train_accuracy, label="Train accuracy")
plt.plot(epoch_range, test_accuracy,  label="Test accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy (%)")
plt.title("Accuracy vs. Epoch")
plt.legend()
plt.grid(True)
plt.show()

# (Optional) Loss
plt.figure()
plt.plot(epoch_range, train_losses, label="Train loss")
plt.plot(epoch_range, test_losses,  label="Test loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Loss vs. Epoch")
plt.legend()
plt.grid(True)
plt.show()

In [None]:
# Now with a trained model.... let's see how well it does on a few specific examples:

labels_map = {
    0: "T-Shirt",
    1: "Trouser",
    2: "Pullover",
    3: "Dress",
    4: "Coat",
    5: "Sandal",
    6: "Shirt",
    7: "Sneaker",
    8: "Bag",
    9: "Ankle Boot",
}

from torch.utils.data import DataLoader
test_dataloader  = DataLoader(test_data,         batch_size=128,        shuffle=False) # for testing/inference: it is not necessary to shuffle
# we need to load data a batch at a time -- loading all of the data in memory is not efficient (or even possible sometimes)

mlp_model.eval() # puts model into evaluation mode (training = False)
images_shown = 24

test_inputs, test_labels = next(iter(test_dataloader)) # returns the first batch of 128 test images and test labels
test_inputs = test_inputs.to(device) # make sure we are on the same device (GPU or CPU) as the model
test_labels = test_labels.to(device)

# run a forward pass -- no need to compute gradients
with torch.no_grad():
    logits = mlp_model(test_inputs)

# what are the predictions?
preds = logits.argmax(dim=1)

# plot values in a grid
plt.figure(figsize=(10,6))
for i in range(images_shown):
    ax = plt.subplot(3, 8, i+1)
    plt.imshow(test_inputs[i].cpu().squeeze(), cmap="gray", interpolation="nearest")
    title = f"P: {labels_map[int(preds[i])]}"
    if preds[i] == test_labels[i]:
        title += " ✓"
    else:
        title += f" ✗ (T: {labels_map[int(test_labels[i])]})"
    ax.set_title(title, fontsize=8)
    ax.axis("off")
plt.tight_layout()
plt.show()