# **CIS 4190/5190 Homework 3 - Fall 2025**

In [None]:
import random
import numpy as np
import pandas as pd
import os
import sys
import matplotlib.pyplot as plt
from numpy.linalg import *
from sklearn.decomposition import PCA
from sklearn import preprocessing
from scipy.spatial import distance

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import transforms
from torchvision.transforms import ToTensor
from torch.utils.data import Dataset, DataLoader
from PIL import Image

# Random seeds for reproducibility
np.random.seed(42)
random.seed(42)

import base64

In [None]:
# For autograder only, do not modify this cell.
# True for Google Colab, False for autograder
NOTEBOOK = (os.getenv('IS_AUTOGRADER') is None)
if NOTEBOOK:
    print("[INFO, OK] Google Colab.")
else:
    print("[INFO, OK] Autograder.")

# **PennGrader Setup**
First, you'll need to set up the PennGrader, an autograder we are going to use throughout the semester. The PennGrader will automatically grade your answers and give you instant feedback. Unless otherwise stated, you can resubmit up to a reasonable number of attempts (e.g. 100 attempts per day). **We will only record your latest score in our backend database**.

After finishing each homework assignment, you must submit your iPython notebook to Gradescope before the homework deadline. Gradescope will then retrieve and display your scores from our backend database.

In [None]:
%%capture
!pip install penngrader-client

In [None]:
%%writefile student_config.yaml
grader_api_url: 'https://23whrwph9h.execute-api.us-east-1.amazonaws.com/default/Grader23'
grader_api_key: 'flfkE736fA6Z8GxMDJe2q8Kfk8UDqjsG3GVqOFOa'

In [None]:
from penngrader.grader import *

In [None]:
# PLEASE ENSURE YOUR PENN-ID IS ENTERED CORRECTLY. IF IT IS NOT, THE AUTOGRADER WON'T KNOW
# WHOM TO ASSIGN POINTS TO IN OUR BACKEND.
STUDENT_ID = 12345678          # YOUR PENN-ID GOES HERE AS AN INTEGER

Run the following cell to initialize the autograder. This autograder will let you submit your code directly from this notebook and immediately get a score.

**NOTE:** Remember that we store your submissions and check them against other students' submissions... so, not that you would, but no cheating.

In [None]:
#GRADER TODO
grader = PennGrader('student_config.yaml', 'cis5190_f25_HW3', STUDENT_ID, STUDENT_ID)

In [None]:
# Serialization code needed by the autograder
import inspect, sys
from IPython.core.magics.code import extract_symbols

def new_getfile(object, _old_getfile=inspect.getfile):
    if not inspect.isclass(object):
        return _old_getfile(object)

    # Lookup by parent module (as in current inspect)
    if hasattr(object, '__module__'):
        object_ = sys.modules.get(object.__module__)
        if hasattr(object_, '__file__'):
            return object_.__file__

    # If parent module is __main__, lookup by methods (NEW)
    for name, member in inspect.getmembers(object):
        if inspect.isfunction(member) and object.__qualname__ + '.' + member.__name__ == member.__qualname__:
            return inspect.getfile(member)
    else:
        raise TypeError('Source for {!r} not found'.format(object))
inspect.getfile = new_getfile

def grader_serialize(obj):
    cell_code = "".join(inspect.linecache.getlines(new_getfile(obj)))
    class_code = extract_symbols(cell_code, obj.__name__)[0][0]
    return class_code

## **Datasets**
Next, we will download the datasets from GitHub to your local runtime. After a successful download, you may verify that all datasets are present in your Colab instance.

- [cis5190_hw3_observations.csv](https://raw.githubusercontent.com/upenn/cis-4190-5190-fall-25/main/hw3/cis5190_hw3_observations.csv)
- [cis5190_hw3_test_student.csv](https://raw.githubusercontent.com/upenn/cis-4190-5190-fall-25/main/hw3/cis5190_hw3_test_student.csv)


#### Acknowledgement
Dataset obtained from kaggle.com: [Hourly Weather Surface - Brazil (Southeast region)](https://www.kaggle.com/PROPPG-PPG/hourly-weather-surface-brazil-southeast-region/metadata)

In [None]:
if NOTEBOOK:
  if not os.path.exists("cis5190_hw3_observations.csv"):
    !wget https://raw.githubusercontent.com/upenn/cis-4190-5190-fall-25/main/hw3/cis5190_hw3_observations.csv
  if not os.path.exists("cis5190_hw3_test_student.csv"):
    !wget https://raw.githubusercontent.com/upenn/cis-4190-5190-fall-25/main/hw3/cis5190_hw3_test_student.csv


# **1. [4190: 12 autograded, 4 manual; 5190: 12 autograded, 9 manual] K-means Clustering**

We will implement the K-means clustering algorithm using the Breast Cancer dataset. As with all unsupervised learning problems, our goal is to discover and describe some hidden structure in unlabeled data. The K-means algorithm, in particular, attempts to determine how to separate the data into <em>k</em> distinct groups over a set of features ***given that we know (are provided) the value of k***.

Knowing that there are <em>k</em> distinct 'classes', however, doesn't tell us anything about the content or properties of each class. If we could find samples that are representative of each of these *k* groups, then we could label the rest of the data based on how similar they are to each of the prototypical samples. We will refer to these representatives as the centroids (cluster centers) that correspond to each cluster.

## **1.1. Import the dataset**


In [None]:
from sklearn.datasets import load_breast_cancer
cancer_dataset = load_breast_cancer()

# STUDENT TODO START:
"""
First load the dataset X from cancer_dataset.
X -  (m, n) -> m x n matrix where m is the number of training points = 569 and n is the number of features = 30
"""
# STUDENT TODO END

## **1.2. [10 pts] K-means clustering implementation**

We will first implement a class for K-means clustering.<br>
These are the main functions: <br>
- `__init__`: The constructor (This is implemented for you)
- `fit`: Entrypoint function that takes in the dataset (X) as well as centroid initializations and returns:
    - the cluster labels for each row (data point) in the dataset
    - list of centroids corresponding to each cluster
    - the number of iterations taken to converge.

Inside the `fit()` function, you will need to implement the actual K-means functionality. <br>
The K-means process you should follow is listed below:
1. Initialize each of the `n_clusters` centroids to a **random datapoint** if initialization is not provided.
2. Update each data point's cluster to the closest *centroid*
3. Calculate the new *centroid* of each cluster
4. Repeat the previous two steps until no centroid value changes. Make sure you break out of the loop once `max_iter` iterations are reached, regardless of whether you have converged.

To help streamline this process, three helper functions are stubbed out for you to complete in the `KMeans` class:
- `compute_distance()`: used for Step 2 above
- `find_closest_cluster()`: used for Step 2 above
- `compute_centroids()`: used for Step 3 above
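As a rough illustration of Steps 2 to 4, here is a minimal sketch of a single K-means iteration on a tiny toy array. It is not the required class implementation: the data, the starting centroids, and the variable names are all made up for illustration, and it assumes Euclidean distance and that no cluster ends up empty.

```python
import numpy as np

# Toy data: two well-separated groups in 2D (illustrative values only)
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])  # k = 2 starting centroids

# Step 2: pairwise Euclidean distances via broadcasting -> shape (m, k)
dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
labels = np.argmin(dist, axis=1)  # index of the closest centroid per point

# Step 3: each new centroid is the mean of the points assigned to it
# (assumes every cluster has at least one point; an empty cluster would yield NaN)
new_centroids = np.array(
    [X[labels == j].mean(axis=0) for j in range(len(centroids))]
)

# Step 4: convergence check, stop once the centroids no longer move
converged = np.allclose(new_centroids, centroids)

print(labels)         # cluster id per point
print(new_centroids)  # updated centroids
print(converged)      # whether another iteration is needed
```

Broadcasting `X[:, None, :]` against `centroids[None, :, :]` avoids an explicit double loop over points and centroids, which matters once `m` grows beyond a toy example.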


In [None]:
class KMeans:
    '''Implementing Kmeans clustering'''

    def __init__(self, n_clusters, max_iter=1000):
        self.n_clusters = n_clusters
        self.max_iter = max_iter

    def compute_centroids(self, X, clusters):
        """
        Computes new centroids positions given the clusters

        INPUT:
        X - m by n matrix, where m is the number of training points
        clusters -  m dimensional vector, where m is the number of training points
                    At an index i, it contains the cluster id that the i-th datapoint
                    in X belongs to.

        OUTPUT:
        centroids - k by n matrix, where k is the number of clusters.
        """
        centroids = np.zeros((self.n_clusters, X.shape[1]))
        # STUDENT TODO START:

        # STUDENT TODO END
        return centroids

    def compute_distance(self, X, centroids):
        """
        Computes the distance of each datapoint in X from the centroids of all the clusters

        INPUT:
        X - m by n matrix, where m is the number of training points
        centroids - k by n matrix, where k is the number of clusters

        OUTPUT:
        dist - m by k matrix, for each datapoint in X, the distances from all the k cluster centroids.

        """
        dist = np.zeros((X.shape[0], self.n_clusters))
        # STUDENT TODO START:

        # STUDENT TODO END
        return dist

    def find_closest_cluster(self, dist):
        """
        Finds the cluster id that each datapoint in X belongs to

        INPUT:
        dist - m by k matrix, for each datapoint in X, the distances from all the k cluster centroids.

        OUTPUT:
        clusters - m dimensional vector, where m is the number of training points
                    At an index i, it contains the cluster id that the i-th datapoint
                    in X belongs to.

        """
        clusters = np.zeros(dist.shape[0])
        # STUDENT TODO START:

        # STUDENT TODO END
        return clusters

    def fit(self, X, init_centroids=None):
        """
        Fit KMeans clustering to given dataset X.

        INPUT:
        X - m by n matrix, where m is the number of training points
        init_centroids (optional) - k by n matrix, where k is the number of clusters

        OUTPUT:
        clusters - m dimensional vector, where m is the number of training points
                    At an index i, it contains the cluster id that the i-th datapoint
                    in X belongs to.
        centroids - k by n matrix, where k is the number of clusters.
                    These are the k cluster centroids, for cluster ids 0 to k-1
        iters_taken - total iterations taken to converge. Should not be more than max_iter.

        """
        # STUDENT TODO START:
        # Initialize centroids to random points in the dataset if not provided (i.e. None)

        # Iterate until k-means converges or max_iter is reached. In each iteration:
        #  - Update each datapoint's cluster to that whose *centroid* is closest
        #  - Calculate the new *centroid* of each cluster
        #  - Repeat the previous two steps until no centroid value changes.

        # STUDENT TODO END
        return self.clusters, self.centroids, iters_taken

In [None]:
# Test case centroids should be around (1.5,1.5) and (4.5,4.5)
points = []
result = []
for _ in range(500):
  x = random.random()*3
  y = random.random()*3
  points.append((x,y))
  result.append(0)
for _ in range(500):
  x = random.random()*3 + 3
  y = random.random()*3 + 3
  points.append((x,y))
  result.append(1)
clf = KMeans(2)
points = np.asarray(points)

In [None]:
#Test for sanity check
def test_compute_centroids():
  clf = KMeans(2)
  centroid_p = clf.compute_centroids(np.array(points),np.array(result))
  centroid_r = [[1.54079082, 1.534581],
 [4.49834714,4.495329]]
  assert(np.linalg.norm(centroid_p - np.array(centroid_r)) <= 1e-2 )
test_compute_centroids()

In [None]:
# PennGrader Grading Cell
grader.grade(test_case_id = 'test_compute_centroids', answer = grader_serialize(KMeans))

In [None]:
def test_distance():
    centroid_r = [[1.5185255, 1.45970038],
      [4.51568108,4.54138552]]
    clf = KMeans(2)
    distance = clf.compute_distance(np.array(points),np.array(centroid_r))
    distance_for_0 = [1.44121815, 5.16670124]
    assert(np.linalg.norm(distance_for_0-distance[0]) <= 1e-2)
test_distance()

In [None]:
# PennGrader Grading Cell
grader.grade(test_case_id = 'test_distance', answer = grader_serialize(KMeans))

In [None]:
def test_find_clusters():
  centroid_r = [[1.5185255, 1.45970038],
      [4.51568108,4.54138552]]
  clf = KMeans(2)
  distance = clf.compute_distance(np.array(points),np.array(centroid_r))
  cluster = clf.find_closest_cluster(distance)
  assert(cluster[0] == 0)
test_find_clusters()

In [None]:
# PennGrader Grading Cell
grader.grade(test_case_id = 'test_find_clusters', answer = grader_serialize(KMeans))

In [None]:
def test_fit():
  clf = KMeans(2)
  clusters, centroids, _ = clf.fit(np.array(points),np.array([[1,1],[4,4]]))
  centroid_r = [[1.54079082, 1.534581],
      [4.49834714,4.495329]]
  assert(np.linalg.norm(centroids - np.array(centroid_r)) <= 1e-2 )
  assert(sum(np.array(clusters)-np.array(result)) == 0)
test_fit()

In [None]:
# PennGrader Grading Cell
grader.grade(test_case_id = 'test_fit', answer = grader_serialize(KMeans))

## **1.3. [2 pts] Compute distortion**

One way to decide on a value for k is to run K-means and plot the distortion (the sum of squared distances between each point and its assigned centroid). From that plot we can find the "elbow of the graph", which indicates the best tradeoff between the number of clusters and the corresponding distortion. See the distortion equation below, where $m$ is the number of data points and $C_i$ is the centroid assigned to point $i$:
>$\sum_{i=1}^{m}{\lVert X_i - C_i \rVert^2}$

In the function `test_cluster_size`, iterate over possible cluster counts from 2 to `max_k` (inclusive). For each *k* from 2 to `max_k`, run K-means and calculate its distortion.
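To make the distortion formula concrete, here is a small worked example on three hand-picked points. The arrays `X`, `centroids`, and `labels` are hypothetical toy values, not outputs of your `KMeans` class.

```python
import numpy as np

# Three toy points, two centroids, and a fixed cluster assignment (illustrative only)
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0]])
centroids = np.array([[0.0, 1.0], [10.0, 0.0]])
labels = np.array([0, 0, 1])  # point i belongs to cluster labels[i]

# Look up the centroid assigned to each point -> shape (m, n)
assigned = centroids[labels]

# Sum of squared Euclidean distances between each point and its centroid
distortion = float(np.sum((X - assigned) ** 2))
print(distortion)  # 1 + 1 + 0 = 2.0
```

Indexing `centroids[labels]` builds the per-point centroid matrix in one step, so the sum runs over all points and all features at once.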

In [None]:
def test_cluster_size(X, max_k):
    """
    Iterates over possible cluster counts from 2 to max_k, running k-means and calculating the distortion for each.

    INPUT:
    X - m by n matrix, where m is the number of training points
    max_k - the maximum number of clusters to consider

    OUTPUT:
    scores - a list of scores, that contains the distortion for k = 2 to max_k, in order.
    """
    scores = [0] * (max_k-1)
    # STUDENT TODO START:

    # STUDENT TODO END
    return scores

In [None]:
def test_test_cluster_size():
  scores = test_cluster_size(np.array(points),5)
  assert(np.argmax(scores) == 0)
test_test_cluster_size()

In [None]:
# PennGrader Grading Cell
max_k = 20
scores = test_cluster_size(X, max_k)
grader.grade(test_case_id = 'test_test_cluster_size', answer = scores)

## **1.4. [2 pts, manually graded] Plot distortion vs. k (without feature scaling)**

Plot **distortion vs. different k values** using the function we just wrote on dataset X (no feature scaling) and add the plot to the written report. Use `max_k` = 20. Determine the best k value from this plot and state it in the written report. Make sure your plot has **axes labels, a legend labeling the different k values, and a title**.

In [None]:
# STUDENT TODO START:

# STUDENT TODO END

## **1.5. [2 pts, manually graded] Plot distortion vs. k (with feature scaling)**

So far we have run k-means clustering over the dataset X without any feature scaling. This time, we will rescale each feature to the range [0, 1] before passing it to k-means and computing the distortion.

Use `sklearn.preprocessing.MinMaxScaler` ([docs](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html)) to scale the dataset X before passing it to the `test_cluster_size` function. As before, plot **distortion vs. different k values** and add the plot to the written report. Use `max_k` = 20. Determine the best k value from this plot and state it in the written report. Make sure your plot has **axes labels, a legend labeling the different k values, and a title**.
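For intuition, the arithmetic behind min-max scaling to [0, 1] can be sketched in plain NumPy. This is only an illustration of the per-feature formula `(x - min) / (max - min)` on made-up values; the assignment itself asks you to use `MinMaxScaler`, and the sketch assumes no feature is constant (which would cause a division by zero here).

```python
import numpy as np

# Toy matrix: the second feature has a much larger range than the first
X_toy = np.array([[1.0, 100.0],
                  [2.0, 300.0],
                  [3.0, 500.0]])

col_min = X_toy.min(axis=0)  # per-feature minimum
col_max = X_toy.max(axis=0)  # per-feature maximum

# Rescale every feature (column) independently to the range [0, 1]
X_scaled = (X_toy - col_min) / (col_max - col_min)
print(X_scaled)
```

Before scaling, Euclidean distances in `X_toy` are dominated by the second column; after scaling, both features contribute on the same footing.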

In [None]:
# STUDENT TODO START:

# STUDENT TODO END

## **1.6. [5 pts, manually graded 5190 only] Comments about K-Means**

Answer these questions in the written report.

1. Why do you get different results with and without feature scaling?
2. Should you scale the features before fitting k-means? Why or why not?

# **2. [4190: 2 autograded, 8 manual; 5190: 2 autograded, 13 manual] Principal Component Analysis**

## **2.1. [6 pts, manually graded] Exploring Effects of Different Principal Components in Linear Regression**
We introduced Principal Component Analysis (PCA), a dimensionality reduction technique, in class. Now we ask you to apply PCA from `sklearn` to the breast cancer dataset, observe its performance, and interpret the major components.

To better isolate the effects of PCA, we load the labels from the dataset and use the features **without feature scaling**. Then we evaluate a `LinearRegression` model, whose predictions are thresholded into binary class labels, on the raw dataset and on various numbers of PCA components.

In this section, you are asked to draw a plot of **test accuracy vs. number of principal components**. Detailed instructions are included in the following cells. Remember to **attach the plot** in your written submission, with **axes labels and a title**. Also **comment** on what you observe, explain the reason behind the trend, and state what conclusion you can draw from the graph.
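Because a linear regression model outputs real numbers rather than class labels, its predictions need to be thresholded before accuracy can be computed. The snippet below sketches that step on made-up values; `y_true` and `y_pred_continuous` are hypothetical and stand in for your test labels and model outputs.

```python
import numpy as np

# Hypothetical ground-truth labels and raw regression outputs
y_true = np.array([0, 1, 1, 0, 1])
y_pred_continuous = np.array([0.1, 0.8, 0.5, 0.6, 1.2])

# Cast to binary labels: predictions >= 0.5 become 1, everything else 0
y_pred = (y_pred_continuous >= 0.5).astype(int)

# Accuracy is the fraction of predictions that match the true labels
accuracy = float(np.mean(y_pred == y_true))
print(y_pred, accuracy)
```

Note that regression outputs can fall outside [0, 1] (such as 1.2 above); the threshold handles those values without any clipping.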

In [None]:
# load the label from the dataset, which is a binary label 0/1 representing whether the cancer is benign or malignant

# STUDENT TODO START:

# STUDENT TODO END

In [None]:
# try raw data vs PCA data
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.decomposition import PCA

# STUDENT TODO START:
# Step 1: split the data into train and test set by a test_size of 0.33.


# Step 2: Train a linear regression model using train set and predict on the test set.
# As the labels are binary, we should cast the predictions into binary labels as well. (Set predictions >=0.5 as 1)
# You might want to print out accuracy scores here

# Step 3: Iterate the number of components from 1 to 10 (exclusive).
# For each number of PCs, we are training a linear regression model and save its accuracy on the test set following the same style as above.
# Remember to only fit the train set and not the test set.
# You might want to store your accuracies in a list

# Step 4: Make a plot to compare accuracy vs number of PCs on Linear Regression for the test set.
# Add a black, dashed line for the test accuracy of linear regression by feeding the raw input data.
# Remember to add x, y labels and a title to your plot, and comment on your observations.

# STUDENT TODO END

## **2.2. [4 pts] Understanding PCA**

### **2.2.1 [2 pts, autograded] Explained Ratio of PCA**
Given a threshold on the explained variance ratio (0 < ratio < 1), compute the number of PCs required to reach the threshold.
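The idea is to accumulate the per-component explained variance ratios (such as those exposed by `pca.explained_variance_ratio_` in `sklearn`) and find the first position where the running total reaches the threshold. The sketch below uses a hand-written list of ratios; the helper name `n_components_for` and the ratio values are hypothetical.

```python
import numpy as np

def n_components_for(ratios, threshold):
    """Smallest number of leading components whose cumulative ratio reaches threshold."""
    cumulative = np.cumsum(ratios)
    # searchsorted returns the first index where cumulative >= threshold
    return int(np.searchsorted(cumulative, threshold) + 1)

# Hypothetical explained variance ratios, sorted from largest to smallest
ratios = np.array([0.5, 0.25, 0.125, 0.125])

print(n_components_for(ratios, 0.75))  # cumulative 0.5, 0.75 -> 2 components
print(n_components_for(ratios, 0.8))   # cumulative 0.5, 0.75, 0.875 -> 3 components
```

With real data the ratios are floating-point values, so a threshold that lands exactly on a cumulative sum may be sensitive to rounding.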

In [None]:
def select_n_principal_components(data, variation):
  # STUDENT TODO START:

  # STUDENT TODO END

In [None]:
# PennGrader Grading Cell
student_ans = [select_n_principal_components(cancer_dataset['data'], 0.98), select_n_principal_components(cancer_dataset['data'], 0.99)]
grader.grade(test_case_id = 'test_select_n_principal_components', answer = student_ans)

### **2.2.2 [2 pts, manually graded] Composition of PCA's Principal Components**
In this section, we ask you to understand which features specifically in the dataset contribute to the most important PCs. We ask that you select the **best number of principal components** you got from **Section 2.1** and analyze their composition. Please comment on and analyze the **top 3 features** that make up the PCs.

In [None]:
# STUDENT TODO START:

# STUDENT TODO END

# Display the PCs and related metrics
df = pd.DataFrame(abs(pca.components_.T),index=cancer_dataset['feature_names'],columns = ['PC1','PC2', 'PC3','PC4'])
df['total'] = df.sum(axis=1)
df.sort_values(by='total', ascending=False).head(3)

## **2.3. [5 pts, manually graded 5190 only] PCA and KMeans**
It is common practice to run PCA and KMeans together: PCA reduces the number of features while retaining most of the variance, which can help KMeans, since distance-based clustering tends to degrade in high dimensions due to the curse of dimensionality.

We first run PCA on the dataset for visualization in 2D space. Note that K-means is actually being fit on the entire feature set.

Next, call your K-means class on the dataset X and obtain the clusters. **Make sure to populate the `clusters` variable here.** We have provided the plotting code for you.

**Add these plots in the written report.**

In [None]:
# PCA for visualization in 2D.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)  # project X onto its first two principal components

for k in [3, 5, 7, 9, 11]:

    clusters = np.zeros(X.shape[0])

    # STUDENT TODO START:

    # STUDENT TODO END

    # Plot the 2D PCA projection, colored by the clusters found on the full feature set
    plt.scatter(X_2d[:, 0], X_2d[:, 1], c=clusters, s=18)
    plt.title("Breast Cancer Clusters (k = " + str(k) + ")")
    plt.show()

In [None]:
# Assuming X is your data matrix with shape (569, 30)

# PCA for visualization in 2D
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)  # Shape: (569, 2)

# Explained variance
print("Explained variance by each component:", pca.explained_variance_ratio_)

# Iterate over different values of k for KMeans
for k in [3, 5, 7, 9, 11]:
    # Initialize our KMeans class with k clusters
    kmeans = KMeans(n_clusters=k)

    # Fit KMeans on the original data and get cluster assignments
    clusters, _, _ = kmeans.fit(X)

    # Plot the PCA-transformed data colored by cluster assignments
    plt.figure(figsize=(8, 6))
    scatter = plt.scatter(X_pca[:, 0], X_pca[:, 1], c=clusters, cmap='viridis', s=50)
    plt.title(f"Breast Cancer Clusters (k = {k})")
    plt.xlabel('Principal Component 1')
    plt.ylabel('Principal Component 2')
    plt.colorbar(scatter, label='Cluster')
    plt.show()


# **3. Image Classification using CNN [14 pts, autograded]**

#### **Import libraries**

In [None]:
import os
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import transforms
from torchvision.transforms import ToTensor
from torch.utils.data import Dataset, DataLoader
from PIL import Image
import matplotlib.pyplot as plt

#### **Set the random seed**

In [None]:
np.random.seed(42)
torch.manual_seed(42)

#### **Set GPU**: Make sure you are using `cuda`

In [None]:
# Make sure you're using cuda (GPU) by checking the hardware accelerator under Runtime -> Change runtime type
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("We're using:", device)

#### **Download and extract the data**

In [None]:
if not os.path.exists("cis5190_hw3_supertuxkart_data.zip"):
    !wget https://raw.githubusercontent.com/upenn/cis-4190-5190-fall-25/main/hw3/cis5190_hw3_supertuxkart_data.zip

In [None]:
!unzip "cis5190_hw3_supertuxkart_data.zip"

## **3.1. Dataset class implementation**

In this section, you will train, validate, and test a CNN model to classify images of objects from a car racing video game called SuperTuxKart. There are 6 classes: kart is 1, pickup is 2, nitro is 3, bomb is 4, and projectile is 5, while the background class (all other images) is assigned the label 0. First, you need to load the data in a way that PyTorch can deal with easily. We will lean on PyTorch's `Dataset` class to do this.

Complete the `STKDataset` class that inherits from `Dataset`.

1. `__init__` is a constructor, and would be the natural place to perform operations common to the full dataset, such as parsing the labels and image paths.
2. The `__len__` function should return the size of the dataset, i.e., the number of samples.
3. The `__getitem__` function should return a python tuple of (image, label). The image should be a torch.Tensor of size (3, 64, 64) and the label should be an int.

The labels of the images under a particular folder (`train/` or `val/`) are stored in that same folder as `labels.csv`. Read the `labels.csv` file using `pandas` to understand what it looks like before proceeding. There is also a `labels.csv` in the `test/` folder, which contains only the file names of the test samples.

In [None]:
ENCODING_TO_LABELS = {0: "background",
                    1: "kart",
                    2: "pickup",
                    3: "nitro",
                    4: "bomb",
                    5: "projectile"}

LABELS_TO_ENCODING = {"background": 0,
                    "kart": 1,
                    "pickup": 2,
                    "nitro": 3,
                    "bomb": 4,
                    "projectile": 5}

In [None]:
class STKDataset(Dataset):

    def __init__(self, image_path, transform=None):
        self.image_path = image_path
        self.labels = pd.read_csv(image_path + "/labels.csv")
        self.transform = transform

    def __len__(self):

        # STUDENT TODO START: Return the number of samples in the dataset

        # STUDENT TODO END

    def __getitem__(self, idx):

        if torch.is_tensor(idx):
            idx = idx.tolist()

        # STUDENT TODO START: Create the path to each image by joining the root path with the name of the file as found in labels.csv

        # STUDENT TODO END

        # Read the image from the file path
        image = Image.open(img_name)
        # Transform the image using self.transform
        if self.transform:
            image = self.transform(image)

        if "label" in self.labels.columns:
            # STUDENT TODO START: Extract label name and encode it using the LABELS_TO_ENCODING dictionary

            # STUDENT TODO END
            sample = (image, label)
        else:
            sample = (image)
        return sample

In [None]:
# STUDENT TODO START: Use transforms.Compose to transform the image such that every pixel takes on a value between -1 and 1
# Hint: Refer to transforms.ToTensor() and transforms.Normalize()

# STUDENT TODO END

train_dataset = STKDataset(image_path="train", transform=transform)
val_dataset = STKDataset(image_path="val", transform=transform)
test_dataset = STKDataset(image_path="test", transform=transform)

#### **Visualization**

The following cell visualizes the data as a sanity check for your implementation of the `STKDataset` class.
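The plotting code for this check multiplies each image by 0.5 and adds 0.5 before display. Assuming your transform normalizes with a mean of 0.5 and a standard deviation of 0.5 per channel (an assumption about your own `transform`, since that cell is left for you to write), this undoes the normalization and maps pixel values from [-1, 1] back to [0, 1]. The arithmetic can be checked on a few sample values:

```python
import numpy as np

# Sample pixel intensities in [0, 1], the range produced by ToTensor()
pixels = np.array([0.0, 0.25, 0.5, 1.0])

# Normalization with mean 0.5 and std 0.5: (x - mean) / std -> range [-1, 1]
normalized = (pixels - 0.5) / 0.5

# Inverse used for display: x * std + mean -> back to [0, 1]
restored = normalized * 0.5 + 0.5

print(normalized)
print(restored)
```

If your transform uses a different mean or standard deviation, the displayed images will look washed out or clipped, which is a useful hint that something is off.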

In [None]:
figure = plt.figure(figsize=(8, 8))
cols, rows = 3, 3
torch.manual_seed(0)
for i in range(1, cols * rows + 1):
    sample_idx = torch.randint(len(train_dataset), size=(1,)).item()
    img, label = train_dataset[sample_idx]
    figure.add_subplot(rows, cols, i)
    plt.title(ENCODING_TO_LABELS[label])
    plt.axis("off")
    plt.imshow(img.permute(1, 2, 0)*0.5 + 0.5)
plt.show()

#### **Data loaders**

In [None]:
# STUDENT TODO START: Create data loaders for training, validation, and test sets each having a batch size of 64.
# Set shuffle to be True for the training data loader, False for validation and test data loader.

# STUDENT TODO END

## **3.2. CNN architecture**

Your goal is to devise a CNN that passes the threshold accuracy (80%) on the test set. You get full score (20 pts) if you get at least 80% test set accuracy and 0 if you get 30% or below. The score varies linearly between 0 and 20 for accuracies between 30% and 80%.
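The linear scoring rule described above can be written out as a small function. This is a sketch of the arithmetic as stated in the text, not the autograder's actual code; the function name `cnn_score` and its parameters are hypothetical.

```python
def cnn_score(accuracy_pct, max_points=20.0, low=30.0, high=80.0):
    """Linearly interpolate a score between `low` and `high` test accuracy (in percent)."""
    fraction = (accuracy_pct - low) / (high - low)
    # Clamp to [0, 1] so accuracies below `low` or above `high` saturate
    return max_points * min(1.0, max(0.0, fraction))

print(cnn_score(30))  # at or below 30% accuracy earns nothing
print(cnn_score(55))  # halfway between 30% and 80% earns half the points
print(cnn_score(80))  # 80% or above earns full points
```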

There are several decisions to make when building your CNN, including but not limited to:

- the number of convolutional layers
- the kernel size, stride, padding and number of out channels for each convolutional layer
- number of fully connected layers
- number of nodes in each fully connected layer

You are free to decide the architecture. To make your search easier, we recommend using **no more than four convolutional layers and four fully connected layers**. We also suggest that you use the **ReLU activation function** between the layers.
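When choosing kernel size, stride, and padding, it helps to track how the spatial size of a 64 x 64 input changes from layer to layer, because the first fully connected layer needs the exact flattened size. The helper below is a sketch of the standard output-size formula for convolution and pooling layers (assuming a dilation of 1); the layer stack and the channel count of 32 are hypothetical examples, not a recommended architecture.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 64                                   # input images are 64 x 64
size = conv_out(size, kernel=3, padding=1)  # 3x3 conv, padding 1 -> 64
size = conv_out(size, kernel=2, stride=2)   # 2x2 max pool, stride 2 -> 32
size = conv_out(size, kernel=3, padding=1)  # 3x3 conv, padding 1 -> 32
size = conv_out(size, kernel=2, stride=2)   # 2x2 max pool, stride 2 -> 16

out_channels = 32                           # hypothetical channel count of the last conv
flattened = out_channels * size * size      # input features for the first linear layer
print(size, flattened)
```

A mismatch between this flattened size and the `in_features` of your first linear layer is a common source of shape errors in the forward pass.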

In [None]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # STUDENT TODO START: Create the layers of your CNN here

        # STUDENT TODO END

    def forward(self, x):
        # STUDENT TODO START: Perform the forward pass through the layers

        # STUDENT TODO END

# STUDENT TODO START: Create an instance of Net and move it to the GPU

# STUDENT TODO END

## **3.3. Training, validation, and testing**

In [None]:
# STUDENT TODO START:
# 1. Set the criterion to be cross entropy loss


# 2. Experiment with different optimizers

# STUDENT TODO END

In [None]:
train_loss, validation_loss = [], []
train_acc, validation_acc = [], []

# STUDENT TODO START:
# Note that we have set the number of epochs to be 10. You can choose to increase or decrease the number of epochs.
num_epochs = 10
for epoch in range(num_epochs):

    for i, data in enumerate(train_dataloader, 0):

        inputs, labels = data
        # 1. Store the inputs and labels in the GPU

        # 2. Get the model predictions

        # 3. Zero the gradients out

        # 4. Get the loss

        # 5. Calculate the gradients

        # 6. Update the weights


    for i, data in enumerate(val_dataloader, 0):

        # 1. Store the inputs and labels in the GPU

        # 2. Get the model predictions

        # 3. Get the loss



    print(f"Epoch {epoch+1}:")

    print(f"Training Loss:", round(train_loss[epoch], 3))
    print(f"Validation Loss:", round(validation_loss[epoch], 3))

    print(f"Training Accuracy:", round(train_acc[epoch], 3))
    print(f"Validation Accuracy:", round(validation_acc[epoch], 3))

    print("------------------------------")

In [None]:
model.eval()

test_predictions = np.array([])

for i, data in enumerate(test_dataloader, 0):

    inputs = data
    # STUDENT TODO START:
    # 1. Store the inputs in the GPU

    # 2. Get the model predictions

    # STUDENT TODO END

    _, predicted = torch.max(predictions, 1)

    test_predictions = np.concatenate((test_predictions, predicted.detach().cpu().numpy()))

In [None]:
# PennGrader Grading Cell
if NOTEBOOK:
    grader.grade(test_case_id = 'test_cnn_predictions', answer = test_predictions)

# **Submit to Gradescope**
You've finished the homework. Please submit your final notebook on [Gradescope](https://www.gradescope.com).