<table style="background-color:#FFFFFF">   
  <tr>     
  <td><img src="https://upload.wikimedia.org/wikipedia/commons/9/95/Logo_EPFL_2019.svg" width="150x"/>
  </td>     
  <td>
  <h1> <b>CS-461: Foundation Models and Generative AI</b> </h1>
  Prof. Charlotte Bunne  
  </td>   
  </tr>
</table>

# 📚 Graded Assignment 1  
### CS-461: Foundation Models and Generative AI - Fall 2025  - Due: October 8, 23:59 CET

Welcome to the first graded assignment!
In this assignment, you will **implement and explore self-supervised learning** on a downsampled subset of the [ImageNet-1k dataset](https://www.image-net.org/), and evaluate how well your model generalizes **both in-distribution and out-of-distribution (OOD)**.  

---

## 🎯 Learning Objectives
By completing this assignment, you will learn to:
- Implement a custom **encoder** and **projection head** for images  
- Experiment with **data augmentations** for self-supervised learning  
- Train a model using a **self-supervised loss**  
- Evaluate learned representations with **k-NN** and **linear probes**  
- Assess **out-of-distribution (OOD) generalization** to unseen classes  
- Save, visualize, and submit results in a reproducible way  

---

## ⚡ Practical Notes
- **Dataset:**  
  - Training: 200 ImageNet classes, 500 images each (100k total)  
  - Validation: 200 ImageNet classes, 50 images each (10k total)  
  - **OOD dataset:** 200 unseen classes, 50 images each (10k total)  
- Use OOD only for **evaluation**, never for training.  
- Checkpoints and evaluation intervals are already set up — your main tasks are to fill in missing functions and customize the model.  
- Some helper utilities (e.g., dataset loaders, probes) are provided in `utils.py`.  

---

👉 **Deliverables:** You will submit:
- Your modified **`models.py`**  
- Trained weights in **`final_model.safetensors`**  
- A short **report.md** (max 500 words) — including **discussion of OOD results**  
- This completed notebook **CS461_Assignment1.ipynb**  

---

⚠️ **Important:** Don’t forget to fill in your **SCIPER number** and **full name** in Section 0, otherwise you will receive **0 points**.  

First, we import packages and set up the device. \
Feel free to add any additional packages you may need.

In [40]:
# Automatically reloads modules when you make changes (useful during development)
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [41]:
from pathlib import Path
import shutil

import numpy as np

import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import ImageGrid
from tqdm import tqdm

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as T
from torch.utils.data import DataLoader
from safetensors.torch import save_model

from torch.amp import autocast, GradScaler

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# 🆔 0. SCIPER Number and Name  

⚠️ **IMPORTANT!** ⚠️  
You **must** fill in your **SCIPER number** and **full name** below.  

This is **required for automatic grading**.  
If you do **not** provide this information, you will receive **0️⃣ (zero)** for this assignment. 

In [42]:
SCIPER = "395715"
LAST_NAME = "Kroknes-Gomez"
FIRST_NAME = "Yasmine"

## 1. Datasets & Utilities

- In the following, we will work with a subset of the ImageNet-1k dataset: color images downsampled to 64×64, covering 200 classes.
- The training set contains 500 images per class (100,000 images in total), and the validation set contains 50 images per class (10,000 images in total).
- The Out-Of-Distribution (OOD) datasets contain images from classes not present in the training set. It contains 50 images from 200 different classes (1,000 images in total).
- The purpose of these OOD datasets is to evaluate the generalization capabilities of the learned representations. You should not use it for training.
- During evalution, we will measure your model's performance on another OOD dataset (different from the one provided here), so make sure to not overfit on the provided OOD dataset.

<!-- Let's download/load it and define a default transformation turning a PIL Image into a `torch.tensor` -->
Make sure that you have access to the `/shared/CS461/cs461_assignment1_data/` folder. The folder structure should look like this:
```
cs461_assignment1_data/
└── train.npz
└── val.npz
└── ood.npz
```


Import dataset class and other utilities you developed in previous homeworks:

In [43]:
from utils import ImageDatasetNPZ, default_transform, seed_all
from utils import run_knn_probe, run_linear_probe, extract_features_and_labels

For reproducibility, you can use the provided `seed_all` function to set the random seed for all relevant libraries (Python, NumPy, PyTorch).

In [44]:
seed_all(42)  # For reproducibility, you can use any integer here

You probably want to implement custom data augmentations for the self-supervised learning method you choose. \
Feel free to swap the `default_transform` defined below and create multiple instances of datasets with different transforms.

In [45]:
data_dir = Path('/shared/CS461/cs461_assignment1_data/')

In [46]:
class TwoCropTransform:
    def __init__(self, base_transform):
        self.base = base_transform
    def __call__(self, x):
        return self.base(x), self.base(x)

# two stochastic for train
simclr_train_transform = T.Compose([
    T.ToPILImage(), 
    T.RandomResizedCrop(64, scale=(0.08, 1.0)),
    T.RandomHorizontalFlip(),
    T.RandomApply([T.ColorJitter(0.8, 0.8, 0.8, 0.2)], p=0.8),
    T.RandomGrayscale(p=0.2),
    T.GaussianBlur(kernel_size=9, sigma=(0.1, 2.0)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])

# lighter, deterministic for eval
simclr_eval_transform = T.Compose([
    T.ToPILImage(),
    T.Resize(64),
    T.CenterCrop(64),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])

train_dataset = ImageDatasetNPZ(data_dir / 'train.npz', transform=TwoCropTransform(simclr_train_transform))
val_dataset = ImageDatasetNPZ(data_dir / 'val.npz', transform=simclr_eval_transform)

# train "eval" dataset that is single view to build the kNN/linear feature bank
train_eval_dataset = ImageDatasetNPZ(data_dir/'train.npz', transform=simclr_eval_transform)

You can split the provided OOD dataset into a training and validation set using the code below. \
You should not use the training split for actually training your models, but only for evaluation (e.g. kNN or linear probing).

In [47]:
rng = np.random.RandomState(42)
ds_ood = ImageDatasetNPZ(data_dir / 'ood.npz', transform=default_transform)
ood_val_ratio = 0.2
train_mask = rng.permutation(len(ds_ood)) >= int(len(ds_ood) * ood_val_ratio)
ds_oods_train = torch.utils.data.Subset(ds_ood, np.where(train_mask)[0])
ds_oods_val = torch.utils.data.Subset(ds_ood, np.where(~train_mask)[0])

In [48]:
batch_size = 128
num_workers = 4
pin_memory = True
collate_fn = None  # Replace with your custom collate function if needed

In [49]:
train_loader = DataLoader(train_dataset, batch_size=batch_size, num_workers=num_workers, pin_memory=pin_memory, shuffle=True, collate_fn=collate_fn)
val_loader  = DataLoader(val_dataset,  batch_size=batch_size, num_workers=num_workers, pin_memory=pin_memory, shuffle=False, collate_fn=collate_fn)

train_eval_loader  = DataLoader(train_eval_dataset, batch_size=batch_size, shuffle=False, num_workers=num_workers, pin_memory=pin_memory)

# 2. Load Your Model

- Load your model from `models.py`.
- You will need to modify the `encoder` and `projection` modules, as the provided template implementation is only a placeholder.
- You SHOULD NOT change the `input_dim`, `input_channels`, and `feature_dim` parameters of the `ImageEncoder` class.
- You can use an existing architecture (e.g., ResNet, ViT) but you SHOULD NOT use any pre-trained weights.

In [50]:
from models import ImageEncoder

model = ImageEncoder().to(device)
model

ImageEncoder(
  (encoder): Sequential(
    (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (4): Sequential(
      (0): Bottleneck(
        (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (downsample): Sequential(
          (0): Conv2d

## 3. Helpers for Training & Evaluation

We suggest you to implement the following helper functions to keep your training and evaluation loops clean and organized. 
- `training_step`: Performs a single training step (forward pass, loss computation, backward pass, optimizer step) and returns the loss value.
- `evaluation_step`: Evaluates the model on the validation dataset and returns the accuracy.

Depending on your specific requirements, you may also want to implement additional utility functions for tasks such as data loading, metric computation, and logging.

As you have seen from previous assignments, loss functions for self-supervised learning objectives can be quite complex. \
Feel free to implement any helper functions you may need to compute the loss.


In [51]:
def training_step(model, batch, optimizer, scaler=None, clip_grad_norm=None, amp_dtype=torch.float16):
    # TODO: Implement the training step
    model.train()
    
    (x1, x2), _ = batch
    device = next(model.parameters()).device
    x1 = x1.to(device, non_blocking=True)
    x2 = x2.to(device, non_blocking=True)

    optimizer.zero_grad(set_to_none=True)
    if scaler is not None:
        with autocast('cuda', dtype=amp_dtype):
            _, z1 = model(x1)
            _, z2 = model(x2)
            loss = custom_loss_function(z1, z2)
        scaler.scale(loss).backward()

        if clip_grad_norm is not None:
            scaler.unscale_(optimizer)
            nn.utils.clip_grad_norm_(model.parameters(), clip_grad_norm)
            
        scaler.step(optimizer)
        scaler.update()

    else:
        _, z1 = model(x1)
        _, z2 = model(x2)
        loss = custom_loss_function(z1, z2)
        loss.backward()

        if clip_grad_norm is not None:
            nn.utils.clip_grad_norm_(model.parameters(), clip_grad_norm)

        optimizer.step()

    return float(loss.item())

In [52]:
def evaluation_step(model, eval_loader, batch, do_linear=True):
    # TODO: Implement the evaluation step
    model.eval()

    x_tr, y_tr = extract_features_and_labels(model, train_eval_loader, normalize=True)
    x_va, y_va = extract_features_and_labels(model, val_loader, normalize=True)

    x_tr, y_tr = x_tr.cpu().numpy(), y_tr.cpu().numpy()
    x_va, y_va = x_va.cpu().numpy(), y_va.cpu().numpy()

    knn_acc = run_knn_probe(x_tr, y_tr, x_va, y_va)
    out = {"knn_accuracy": 100.0 * knn_acc}

    if do_linear:
        lin_acc = run_linear_probe(x_tr, y_tr, x_va, y_va)
        out["linear_accuracy"] = 100.0 * lin_acc
    
    return out

In [53]:
def custom_loss_function(z1, z2, temperature: float = 0.1):
    # TODO: Depend on your training paradigm, implement your custom loss function
    B, D = z1.size()
    z = torch.cat([z1, z2], dim=0)
    sim = (z @ z.T)

    logits = sim / temperature
    diag = torch.eye(2*B, device=z.device, dtype=torch.bool)
    logits = logits.masked_fill(diag, float('-inf'))

    targets = torch.arange(B, device=z.device)
    targets = torch.cat([targets + B, targets], dim=0)

    loss = F.cross_entropy(logits, targets)
    return loss

# 4. Optimizer Configuration

In [54]:
# Feel free to adapt and add more arguments
lr = 1e-3
weight_decay = 5e-2
lr_step_size = 10
lr_gamma = 0.1

In [55]:
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=lr_step_size, gamma=lr_gamma)

# 5. Training Loop

Adapt your training configuration and implement the training loop. \
You probably want to save model checkpoints and evaluate the model on the validation set at regular intervals.

In [56]:
n_epochs = 200  # Adjust the number of epochs as needed
eval_interval = 5  # Evaluate the model every 'eval_interval' epochs
save_interval = 10  # Save the model every 'save_interval' epochs

checkpoints_dir = Path('checkpoints')
if not checkpoints_dir.exists():
    checkpoints_dir.mkdir(parents=True, exist_ok=False)

In [57]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
scaler = GradScaler(enabled=(device.type == "cuda"))

for epoch in tqdm(range(n_epochs)):
    # TODO: Implement the training and evaluation loop
    running_loss = 0.0

    for batch_idx, batch in enumerate(train_loader):
        loss_val = training_step(model, batch, optimizer, scaler)
        running_loss += loss_val

    avg_train_loss = running_loss / max(1, len(train_loader))

    if lr_scheduler is not None:
        lr_scheduler.step()

    # if (epoch + 1) % eval_interval == 0:
    #     model.eval()
    #     with torch.no_grad():
    #         val_stats = evaluation_step(model, val_loader)  # should internally handle device
    #     print(f"Epoch {epoch+1}/{n_epochs} | "
    #           f"Train Loss: {avg_train_loss:.4f} | "
    #           f"Val kNN-5 Acc: {val_stats['knn_accuracy']:.2f}%")

    if (epoch + 1) % eval_interval == 0:
        stats = evaluation_step(model, train_eval_loader, val_loader, do_linear=True)
        line = (f"Epoch {epoch+1}/{n_epochs} | "
                f"Train Loss: {avg_train_loss:.4f} | "
                f"kNN-5: {stats['knn_accuracy']:.2f}%")
        if 'linear_accuracy' in stats:
            line += f" | Linear: {stats['linear_accuracy']:.2f}%"
        print(line)

    if (epoch + 1) % save_interval == 0:
        checkpoint_path = checkpoints_dir / f"model_epoch_{epoch+1}.safetensors"
        save_model(model, checkpoint_path)
        print(f"Model checkpoint saved at {checkpoint_path}")

# Save the final model
final_model_path = checkpoints_dir / 'model_final.safetensors'
save_model(model, final_model_path)

  2%|▏         | 4/200 [07:17<5:56:58, 109.28s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.
  2%|▎         | 5/200 [10:26<7:28:23, 137.96s/it]

Epoch 5/200 | Train Loss: 4.4382 | kNN-5: 1.52% | Linear: 2.48%


  4%|▍         | 9/200 [17:39<6:04:08, 114.39s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.


Epoch 10/200 | Train Loss: 3.5836 | kNN-5: 2.44% | Linear: 4.23%


  5%|▌         | 10/200 [21:04<7:31:00, 142.42s/it]

Model checkpoint saved at checkpoints/model_epoch_10.safetensors


  7%|▋         | 14/200 [28:27<6:05:54, 118.03s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.
  8%|▊         | 15/200 [31:48<7:21:15, 143.11s/it]

Epoch 15/200 | Train Loss: 3.2819 | kNN-5: 2.62% | Linear: 4.49%


 10%|▉         | 19/200 [39:02<5:52:08, 116.73s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.


Epoch 20/200 | Train Loss: 3.2368 | kNN-5: 2.98% | Linear: 4.85%


 10%|█         | 20/200 [42:05<6:49:37, 136.54s/it]

Model checkpoint saved at checkpoints/model_epoch_20.safetensors


 12%|█▏        | 24/200 [49:20<5:38:20, 115.35s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.
 12%|█▎        | 25/200 [52:29<6:41:10, 137.54s/it]

Epoch 25/200 | Train Loss: 3.1847 | kNN-5: 2.97% | Linear: 4.96%


 14%|█▍        | 29/200 [59:38<5:25:20, 114.15s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.


Epoch 30/200 | Train Loss: 3.1798 | kNN-5: 2.84% | Linear: 4.71%


 15%|█▌        | 30/200 [1:02:39<6:20:14, 134.20s/it]

Model checkpoint saved at checkpoints/model_epoch_30.safetensors


 17%|█▋        | 34/200 [1:09:40<5:10:04, 112.08s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.
 18%|█▊        | 35/200 [1:12:47<6:10:09, 134.60s/it]

Epoch 35/200 | Train Loss: 3.1872 | kNN-5: 2.84% | Linear: 4.82%


 20%|█▉        | 39/200 [1:19:47<5:00:53, 112.13s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.


Epoch 40/200 | Train Loss: 3.1790 | kNN-5: 3.00% | Linear: 4.51%


 20%|██        | 40/200 [1:23:03<6:06:06, 137.29s/it]

Model checkpoint saved at checkpoints/model_epoch_40.safetensors


 22%|██▏       | 44/200 [1:30:09<4:56:33, 114.06s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.
 22%|██▎       | 45/200 [1:33:13<5:48:51, 135.04s/it]

Epoch 45/200 | Train Loss: 3.1834 | kNN-5: 2.66% | Linear: 4.87%


 24%|██▍       | 49/200 [1:40:22<4:46:02, 113.66s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.


Epoch 50/200 | Train Loss: 3.1660 | kNN-5: 2.99% | Linear: 4.43%


 25%|██▌       | 50/200 [1:43:44<5:50:38, 140.26s/it]

Model checkpoint saved at checkpoints/model_epoch_50.safetensors


 27%|██▋       | 54/200 [1:50:52<4:39:39, 114.93s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.
 28%|██▊       | 55/200 [1:54:04<5:33:24, 137.96s/it]

Epoch 55/200 | Train Loss: 3.1825 | kNN-5: 2.91% | Linear: 4.58%


 30%|██▉       | 59/200 [2:01:12<4:29:08, 114.53s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.


Epoch 60/200 | Train Loss: 3.1805 | kNN-5: 2.84% | Linear: 5.46%


 30%|███       | 60/200 [2:04:31<5:26:14, 139.82s/it]

Model checkpoint saved at checkpoints/model_epoch_60.safetensors


 32%|███▏      | 64/200 [2:11:38<4:19:25, 114.45s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.
 32%|███▎      | 65/200 [2:14:57<5:14:39, 139.85s/it]

Epoch 65/200 | Train Loss: 3.1721 | kNN-5: 2.80% | Linear: 4.46%


 34%|███▍      | 69/200 [2:22:01<4:09:18, 114.19s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.


Epoch 70/200 | Train Loss: 3.1685 | kNN-5: 2.83% | Linear: 5.22%


 35%|███▌      | 70/200 [2:25:13<4:57:37, 137.37s/it]

Model checkpoint saved at checkpoints/model_epoch_70.safetensors


 37%|███▋      | 74/200 [2:32:16<3:57:56, 113.30s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.
 38%|███▊      | 75/200 [2:35:25<4:43:50, 136.25s/it]

Epoch 75/200 | Train Loss: 3.1686 | kNN-5: 2.96% | Linear: 4.96%


 40%|███▉      | 79/200 [2:42:29<3:48:13, 113.17s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.


Epoch 80/200 | Train Loss: 3.1854 | kNN-5: 2.98% | Linear: 4.96%


 40%|████      | 80/200 [2:46:00<4:45:08, 142.57s/it]

Model checkpoint saved at checkpoints/model_epoch_80.safetensors


 42%|████▏     | 84/200 [2:53:04<3:42:13, 114.94s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.
 42%|████▎     | 85/200 [2:56:15<4:24:00, 137.75s/it]

Epoch 85/200 | Train Loss: 3.1824 | kNN-5: 2.78% | Linear: 4.90%


 44%|████▍     | 89/200 [3:03:19<3:30:32, 113.80s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.


Epoch 90/200 | Train Loss: 3.1748 | kNN-5: 3.04% | Linear: 5.13%


 45%|████▌     | 90/200 [3:06:38<4:15:15, 139.23s/it]

Model checkpoint saved at checkpoints/model_epoch_90.safetensors


 47%|████▋     | 94/200 [3:13:40<3:21:12, 113.89s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.
 48%|████▊     | 95/200 [3:16:46<3:57:13, 135.55s/it]

Epoch 95/200 | Train Loss: 3.1719 | kNN-5: 2.88% | Linear: 3.96%


 50%|████▉     | 99/200 [3:23:51<3:10:30, 113.17s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.


Epoch 100/200 | Train Loss: 3.1784 | kNN-5: 2.69% | Linear: 5.07%


 50%|█████     | 100/200 [3:27:02<3:47:19, 136.39s/it]

Model checkpoint saved at checkpoints/model_epoch_100.safetensors


 52%|█████▏    | 104/200 [3:34:08<3:02:08, 113.84s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.
 52%|█████▎    | 105/200 [3:37:20<3:37:05, 137.11s/it]

Epoch 105/200 | Train Loss: 3.1753 | kNN-5: 2.90% | Linear: 4.47%


 55%|█████▍    | 109/200 [3:44:26<2:52:42, 113.87s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.


Epoch 110/200 | Train Loss: 3.1824 | kNN-5: 2.75% | Linear: 4.57%


 55%|█████▌    | 110/200 [3:47:35<3:24:24, 136.28s/it]

Model checkpoint saved at checkpoints/model_epoch_110.safetensors


 57%|█████▋    | 114/200 [3:54:41<2:42:43, 113.53s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.
 57%|█████▊    | 115/200 [3:57:58<3:16:10, 138.47s/it]

Epoch 115/200 | Train Loss: 3.1696 | kNN-5: 2.79% | Linear: 5.14%


 60%|█████▉    | 119/200 [4:05:03<2:33:44, 113.89s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.


Epoch 120/200 | Train Loss: 3.1791 | kNN-5: 2.95% | Linear: 4.47%


 60%|██████    | 120/200 [4:08:20<3:04:55, 138.70s/it]

Model checkpoint saved at checkpoints/model_epoch_120.safetensors


 62%|██████▏   | 124/200 [4:15:28<2:24:58, 114.46s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.
 62%|██████▎   | 125/200 [4:18:45<2:53:59, 139.20s/it]

Epoch 125/200 | Train Loss: 3.1765 | kNN-5: 2.91% | Linear: 4.38%


 64%|██████▍   | 129/200 [4:25:46<2:14:10, 113.39s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.


Epoch 130/200 | Train Loss: 3.1798 | kNN-5: 2.78% | Linear: 4.69%


 65%|██████▌   | 130/200 [4:28:53<2:37:58, 135.40s/it]

Model checkpoint saved at checkpoints/model_epoch_130.safetensors


 67%|██████▋   | 134/200 [4:35:56<2:04:20, 113.03s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.
 68%|██████▊   | 135/200 [4:39:12<2:29:14, 137.77s/it]

Epoch 135/200 | Train Loss: 3.1810 | kNN-5: 2.86% | Linear: 5.08%


 70%|██████▉   | 139/200 [4:46:18<1:55:39, 113.76s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.


Epoch 140/200 | Train Loss: 3.1729 | kNN-5: 2.82% | Linear: 5.01%


 70%|███████   | 140/200 [4:49:55<2:24:45, 144.75s/it]

Model checkpoint saved at checkpoints/model_epoch_140.safetensors


 72%|███████▏  | 144/200 [4:57:04<1:48:56, 116.73s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.
 72%|███████▎  | 145/200 [5:00:26<2:10:23, 142.24s/it]

Epoch 145/200 | Train Loss: 3.1771 | kNN-5: 2.88% | Linear: 5.10%


 74%|███████▍  | 149/200 [5:07:34<1:38:17, 115.63s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.


Epoch 150/200 | Train Loss: 3.1733 | kNN-5: 2.83% | Linear: 4.26%


 75%|███████▌  | 150/200 [5:10:45<1:55:04, 138.08s/it]

Model checkpoint saved at checkpoints/model_epoch_150.safetensors


 77%|███████▋  | 154/200 [5:17:55<1:28:13, 115.08s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.
 78%|███████▊  | 155/200 [5:21:06<1:43:33, 138.09s/it]

Epoch 155/200 | Train Loss: 3.1877 | kNN-5: 2.87% | Linear: 4.47%


 80%|███████▉  | 159/200 [5:28:17<1:18:41, 115.16s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.


Epoch 160/200 | Train Loss: 3.1761 | kNN-5: 2.73% | Linear: 4.37%


 80%|████████  | 160/200 [5:31:23<1:30:53, 136.33s/it]

Model checkpoint saved at checkpoints/model_epoch_160.safetensors


 82%|████████▏ | 164/200 [5:38:30<1:08:07, 113.53s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.
 82%|████████▎ | 165/200 [5:41:40<1:19:32, 136.35s/it]

Epoch 165/200 | Train Loss: 3.1698 | kNN-5: 2.87% | Linear: 4.68%


 84%|████████▍ | 169/200 [5:48:46<58:39, 113.53s/it]  hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.


Epoch 170/200 | Train Loss: 3.1842 | kNN-5: 2.85% | Linear: 4.95%


 85%|████████▌ | 170/200 [5:51:51<1:07:33, 135.10s/it]

Model checkpoint saved at checkpoints/model_epoch_170.safetensors


 87%|████████▋ | 174/200 [5:58:59<49:09, 113.46s/it]  hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.
 88%|████████▊ | 175/200 [6:02:04<56:15, 135.04s/it]

Epoch 175/200 | Train Loss: 3.1751 | kNN-5: 2.84% | Linear: 4.41%


 90%|████████▉ | 179/200 [6:09:10<39:41, 113.39s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.


Epoch 180/200 | Train Loss: 3.1754 | kNN-5: 2.84% | Linear: 5.51%


 90%|█████████ | 180/200 [6:12:26<46:02, 138.13s/it]

Model checkpoint saved at checkpoints/model_epoch_180.safetensors


 92%|█████████▏| 184/200 [6:19:34<30:29, 114.36s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.
 92%|█████████▎| 185/200 [6:22:41<33:58, 135.93s/it]

Epoch 185/200 | Train Loss: 3.1765 | kNN-5: 2.94% | Linear: 5.00%


 94%|█████████▍| 189/200 [6:29:44<20:40, 112.74s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.


Epoch 190/200 | Train Loss: 3.1668 | kNN-5: 2.96% | Linear: 4.78%


 95%|█████████▌| 190/200 [6:32:55<22:44, 136.43s/it]

Model checkpoint saved at checkpoints/model_epoch_190.safetensors


 97%|█████████▋| 194/200 [6:39:58<11:18, 113.02s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.
 98%|█████████▊| 195/200 [6:43:12<11:26, 137.22s/it]

Epoch 195/200 | Train Loss: 3.1829 | kNN-5: 2.83% | Linear: 4.79%


100%|█████████▉| 199/200 [6:50:14<01:53, 113.22s/it]hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.


Epoch 200/200 | Train Loss: 3.1793 | kNN-5: 2.92% | Linear: 5.19%


100%|██████████| 200/200 [6:53:29<00:00, 124.05s/it]

Model checkpoint saved at checkpoints/model_epoch_200.safetensors





# 6. Visualize Results

To better understand the performance of your trained model, visualize some results. \
You can visualize:
- Sample images from the validation set along with their predicted labels.
- Training and validation loss curves over epochs.

In [None]:
# TODO: Visualize some results from your trained model.

# 7. Submission Instructions

You must submit the following files:
- `models.py`: Contains the implementation of your model architecture.
- `final_model.safetensors`: The trained model weights saved in the safetensors format.
- `report.md`: A brief report summarizing your approach, design choices, and results.
- `CS461_Assignment1.ipynb`: The Jupyter notebook containing your code and explanations. Make sure to save your progress before running the cell below.

You will submit your assignment under a single folder named `/home/cs461_assignment1_submission` containing the above files. \
Make sure to replace `<SCIPER>`, `<LAST_NAME>`, and `<FIRST_NAME>` with your actual SCIPER number, last name, and first name respectively. \
The following cell will help you move the files into the submission folder.

In [None]:
work_dir = Path('.')
output_dir = Path.home() / 'cs461_assignment1_submission'

if not output_dir.exists():
    output_dir.mkdir(parents=True, exist_ok=False)
    
shutil.copy(final_model_path, output_dir / 'final_model.safetensors')
shutil.copy(work_dir / 'models.py', output_dir / 'models.py')
shutil.copy(work_dir / 'CS461_Assignment1.ipynb', output_dir / 'CS461_Assignment1.ipynb')
shutil.copy(work_dir / 'report.md', output_dir / 'report.md')

Check that all required files are present in the submission folder before running the cell below.

In [None]:
assert SCIPER is not None and LAST_NAME is not None and FIRST_NAME is not None, "Please set your SCIPER, LAST_NAME, and FIRST_NAME variables."

list_of_files = ['final_model.safetensors', 'models.py', 'CS461_Assignment1.ipynb', 'report.md']
files_found = all((output_dir / f).exists() for f in list_of_files)
assert files_found, f"One or more required files are missing in the submission folder: {list_of_files}"


You can test whether your submission folder is appropriately structured by using the `eval.py`:
```bash
python eval.py
```

In [2]:
### Uncomment the line below to run the evaluation script and check your model's performance

# !python eval.py

---
🎉 **Congratulations!**  
You’ve completed Assignment 1. Good luck, and don’t forget to double-check your submission!