# Homework 3, Part A: Build a Tiny CNN on MNIST (Starter)


This assignment guides you through building and training a **very small CNN** on **MNIST** using **PyTorch**. It is intentionally minimal and CPU-friendly.

**What you'll do:**
1) Load a small subset of MNIST
2) Write a small CNN (two convs + a linear head)
3) Train with a simple loop
4) Evaluate accuracy & confusion matrix
5) Run tiny experiments (channels/kernel size)


In [1]:
import os, random, math, time
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms
import matplotlib.pyplot as plt

# Reproducibility
def set_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(42)

In [None]:
# === Data: MNIST (28x28 grayscale) ===
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

root = "./data"
train_full = datasets.MNIST(root, train=True, download=True, transform=transform)
test_full  = datasets.MNIST(root, train=False, download=True, transform=transform)

# Use a small subset for speed
train_indices = list(range(0, 10000))       # 10k train samples
val_indices   = list(range(10000, 12000))   # 2k val samples from the rest of train
test_indices  = list(range(0, 2000))        # 2k test samples

train_ds = Subset(train_full, train_indices)
val_ds   = Subset(train_full, val_indices)
test_ds  = Subset(test_full, test_indices)

# Pin memory only when a GPU is available (avoids a warning on CPU-only machines)
pin = torch.cuda.is_available()
train_loader = DataLoader(train_ds, batch_size=64, shuffle=True, num_workers=2, pin_memory=pin)
val_loader   = DataLoader(val_ds,   batch_size=256, shuffle=False, num_workers=2, pin_memory=pin)
test_loader  = DataLoader(test_ds,  batch_size=256, shuffle=False, num_workers=2, pin_memory=pin)

for images, labels in train_loader:
    print('Batch:', images.shape, labels.shape)
    break

In [None]:
# === Visualize a few samples ===

images, labels = next(iter(train_loader))
images = images[:8]
labels = labels[:8]

plt.figure(figsize=(8,2))
for i in range(len(images)):
    plt.subplot(1, len(images), i+1)
    plt.imshow(images[i,0].numpy(), cmap='gray')
    plt.title(int(labels[i]))
    plt.axis('off')
plt.show()

## Part 1 — Implement a tiny CNN
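
The `(c2, 14, 14)` shape mentioned in the skeleton can be checked by hand before writing any layers. The following is a minimal sketch of the usual output-size arithmetic for convolution and pooling, assuming 3x3 convolutions with `padding=1` and stride 1 followed by a single 2x2 max pool. The helper `conv_out_size` is a hypothetical name used only for illustration, not a PyTorch function, and your own layer choices may differ.

```python
def conv_out_size(n, kernel_size, padding=0, stride=1):
    """Spatial output size of a conv or pooling layer along one dimension."""
    return (n + 2 * padding - kernel_size) // stride + 1

h = 28                                           # MNIST images are 28x28
h = conv_out_size(h, kernel_size=3, padding=1)   # first conv keeps 28
h = conv_out_size(h, kernel_size=3, padding=1)   # second conv keeps 28
h = conv_out_size(h, kernel_size=2, stride=2)    # 2x2 max pool halves to 14

c2 = 16                                          # assumed channels after the second conv
flat_features = c2 * h * h                       # input size of the Linear head
print(h, flat_features)                          # 14 3136
```

If you pick a different kernel size or padding, rerun the arithmetic: the Linear head's input size must match the flattened feature map exactly.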

In [None]:
# === TODO: Define a very small CNN ===
# Requirements:
# - Input: (N, 1, 28, 28)
# - Use 2 Conv2d layers with ReLU, a MaxPool2d, then a Linear head
# - Keep it tiny: first conv out_channels ~ 8-16, second ~ 16-32
# - Print the number of parameters
#
# Suggested skeleton:
class SmallCNN(nn.Module):
    def __init__(self, c1=8, c2=16, num_classes=10):
        super().__init__()
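        # TODO: layers (Conv2d, ReLU, MaxPool2d, Linear)

        # With padding that preserves 28x28 (e.g., kernel_size=3, padding=1) and one
        # 2x2 max pool, the feature map after the convs is (c2, 14, 14),
        # so the Linear head takes c2 * 14 * 14 input features.

    def forward(self, x):
        # TODO: forward pass with ReLU and pooling; flatten before the Linear head
        raise NotImplementedError("TODO: implement SmallCNN.forward")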

model = SmallCNN()

In [None]:
# === Check shapes ===
x, y = next(iter(train_loader))
with torch.no_grad():
    logits = model(x)
print("Input:", x.shape, "Logits:", logits.shape)
assert logits.shape == (x.shape[0], 10), "Logits must be [batch, 10]"
print("Shape check passed ✅")

## Part 2 — Train the model
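
Before filling in the loop, it may help to see the usual PyTorch optimization pattern in isolation. The sketch below runs it on synthetic data with a linear classifier. Everything in it (`toy_model`, the random data, the learning rate, the number of steps) is a stand-in chosen for illustration and is not the intended solution for this assignment.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Synthetic stand-in data: 256 samples, 20 features, 3 classes (not MNIST).
X = torch.randn(256, 20)
true_w = torch.randn(20, 3)
y = (X @ true_w).argmax(dim=1)

toy_model = nn.Linear(20, 3)            # placeholder for your CNN
criterion = nn.CrossEntropyLoss()       # takes raw logits and integer class labels
optimizer = optim.SGD(toy_model.parameters(), lr=0.1)

losses = []
for step in range(50):
    optimizer.zero_grad()               # clear gradients from the previous step
    logits = toy_model(X)               # forward pass
    loss = criterion(logits, y)
    loss.backward()                     # backpropagate
    optimizer.step()                    # update the weights
    losses.append(loss.item())

# Accuracy: fraction of samples whose highest-scoring class matches the label.
with torch.no_grad():
    acc = (toy_model(X).argmax(dim=1) == y).float().mean().item()
print(f"first loss={losses[0]:.4f} | last loss={losses[-1]:.4f} | acc={acc:.3f}")
```

In the assignment the same steps run once per mini-batch from `train_loader`, and the per-batch loss and accuracy are accumulated so that they can be averaged at the end of each epoch.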

In [None]:
# === TODO: Training loop (fill the marked TODOs) ===
lr = 1e-2
epochs = 5
# TODO: define the loss function and the optimizer


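def accuracy(logits, y):
    # TODO: return the fraction of samples whose argmax prediction matches y
    raise NotImplementedError

def valid_metrics(model, val_loader):
    # TODO: evaluate the model on val_loader (eval mode, no gradients)
    # and return the validation metrics (at least accuracy)
    raise NotImplementedError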

# TODO
best_val_acc = 0.0
for epoch in range(1, epochs+1):
    model.train()
    running_loss, running_acc, n = 0.0, 0.0, 0
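    for xb, yb in train_loader:

        # --- forward
        # TODO: compute the logits and the loss for this batch

        # --- backward
        # TODO: zero the gradients, backpropagate, and take an optimizer step

        # TODO: accumulate running_loss, running_acc, and n (number of samples)

    # TODO: compute train_loss and train_acc from the running totals

    # --- validation
    # TODO: compute val_acc with valid_metrics and update best_val_acc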

    print(f"Epoch {epoch:02d} | train_loss={train_loss:.4f} | train_acc={train_acc:.4f} | val_acc={val_acc:.4f}")

## Part 3 — Evaluate on the test set
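
The rubric asks for a confusion matrix in addition to test accuracy. One way to build it is sketched below with NumPy on hypothetical labels and predictions for a 3-class toy problem. The function name `confusion_matrix` and the toy arrays are illustrative assumptions; for MNIST you would collect the true labels and argmax predictions over `test_loader` and use `num_classes=10`.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes=10):
    """Rows index the true class, columns index the predicted class."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)   # correctly counts repeated (true, pred) pairs
    return cm

# Hypothetical labels and predictions (not real model output)
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2])

cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm)
print("accuracy:", np.trace(cm) / cm.sum())   # diagonal entries are correct predictions
```

Off-diagonal entries show which digits are confused with which. The matrix can be displayed with `plt.imshow(cm)` if a plot is preferred over the raw table.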

In [None]:
# === TODO: Test loss, test accuracy, and confusion matrix ===




## Mini-Experiments (Answer in Markdown)
1. Increase `c1` and `c2` (e.g., `c1=16, c2=32`). What happens to parameter count and accuracy?\
   *Run and report numbers.*
2. Change `kernel_size` in conv layers to 5 (with padding=2). Any difference?\
   *Explain briefly.*
3. Add `Dropout(p=0.2)` before the linear layer. Does validation accuracy change?\
   *Why might that be?*
4. Reduce the training subset to 2,000 samples. How does train vs. val accuracy change?\
   *What does this tell you about capacity and data size?*
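
For experiments 1 and 2, the parameter count can be predicted before training. The sketch below assumes one particular layout: conv(1 to `c1`), conv(`c1` to `c2`), one 2x2 max pool giving a 14x14 map, then a linear layer, all with biases and with padding that preserves 28x28. The helper `count_params` is hypothetical, and if your architecture differs, the numbers will too.

```python
def count_params(c1=8, c2=16, kernel_size=3, num_classes=10, pooled_hw=14):
    """Parameter count for conv(1->c1) -> conv(c1->c2) -> pool -> linear, with biases."""
    conv1 = c1 * (1 * kernel_size * kernel_size) + c1      # weights + biases
    conv2 = c2 * (c1 * kernel_size * kernel_size) + c2
    linear = (c2 * pooled_hw * pooled_hw) * num_classes + num_classes
    return conv1 + conv2 + linear

configs = [
    dict(c1=8, c2=16),                   # baseline
    dict(c1=16, c2=32),                  # wider convs
    dict(c1=8, c2=16, kernel_size=5),    # larger kernels
]
for cfg in configs:
    print(cfg, count_params(**cfg))
# baseline: 32618, wider: 67530, 5x5 kernels: 34794
```

Under these assumptions most parameters sit in the linear head, so doubling `c2` roughly doubles the total while a larger kernel adds comparatively little. Check the prediction against `sum(p.numel() for p in model.parameters())` on your actual model.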

## What to Submit
- The completed notebook with all code cells executed.
- A short paragraph answering the 4 mini-experiments.
- Report final `val_acc` and `test_acc`.

## Grading Rubric (10 pts)
- (3 pts) Model implemented correctly; shapes & param count shown.
- (3 pts) Training loop works; learning curves reasonable; no crashes.
- (2 pts) Evaluation + confusion matrix produced.
- (2 pts) Mini-experiments: clear, concise answers with evidence.