# Computer Vision Homework 3: Big vs Small Models

## Brief

Due date: Nov 13, 2023

Required files: `homework-3.ipynb`, `report.pdf`

To download the Jupyter notebook from Colab, refer to the Colab tutorial we provided.


## Code for Problem 1 and Problem 2

### Import Packages

In [None]:
import glob
import os
import random

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import torch
import torch.nn as nn
import torch.optim as optim

from PIL import Image
from torch.utils.data import DataLoader, Dataset, RandomSampler
from torchvision import transforms, models, datasets
from tqdm import tqdm

%matplotlib inline

### Check GPU Environment

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using {device} device')

In [None]:
! nvidia-smi -L

### Set the Seed to Reproduce the Result

In [None]:
def set_all_seed(seed):
    np.random.seed(seed)
    random.seed(seed)
    torch.manual_seed(seed)
set_all_seed(123)

### Create Dataset and Dataloader

In [None]:
batch_size = 256

mean = (0.4914, 0.4822, 0.4465)
std = (0.2471, 0.2435, 0.2616)
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])

train_dataset = datasets.CIFAR10(root='data', train=True, download=True, transform=train_transform)
valid_dataset = datasets.CIFAR10(root='data', train=False, download=True, transform=test_transform)

train_dataloader = DataLoader(train_dataset, batch_size = batch_size, shuffle=True, pin_memory=True)   # shuffle: whether to reshuffle the data at the start of every epoch
valid_dataloader = DataLoader(valid_dataset, batch_size = batch_size, shuffle=False, pin_memory=True)
# len(train_dataloader) = 196
# len(valid_dataloader) = 40

sixteenth_train_sampler = RandomSampler(train_dataset, num_samples=len(train_dataset)//16)
half_train_sampler = RandomSampler(train_dataset, num_samples=len(train_dataset)//2)

sixteenth_train_dataloader = DataLoader(train_dataset, batch_size=batch_size, sampler=sixteenth_train_sampler)
half_train_dataloader = DataLoader(train_dataset, batch_size=batch_size, sampler=half_train_sampler)
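
One caveat about the two sampler-based loaders: `RandomSampler` with `num_samples` draws a new random set of indices every time the loader is iterated, so over several epochs the model can see more than one sixteenth (or one half) of the training set. If the experiment is meant to train on a fixed fraction, one option is `torch.utils.data.Subset` with indices drawn once. The sketch below is an assumption about the intended setup, not part of the assignment; `fixed_fraction_subset` is a hypothetical helper, and a small `TensorDataset` stands in for CIFAR10.

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset


def fixed_fraction_subset(dataset, fraction, seed=123):
    """Return a Subset holding a fixed random fraction of `dataset` (hypothetical helper)."""
    generator = torch.Generator().manual_seed(seed)
    num_keep = int(len(dataset) * fraction)
    indices = torch.randperm(len(dataset), generator=generator)[:num_keep].tolist()
    return Subset(dataset, indices)


# Stand-in for CIFAR10: 160 fake samples whose single feature is the sample index.
toy_dataset = TensorDataset(torch.arange(160).float().unsqueeze(1), torch.arange(160) % 10)
sixteenth_subset = fixed_fraction_subset(toy_dataset, 1 / 16)
sixteenth_loader = DataLoader(sixteenth_subset, batch_size=4, shuffle=True)

# Every epoch iterates over the same 10 samples, only in a different order.
epoch_1 = sorted(int(x) for batch, _ in sixteenth_loader for x in batch.flatten())
epoch_2 = sorted(int(x) for batch, _ in sixteenth_loader for x in batch.flatten())
print(len(sixteenth_subset), epoch_1 == epoch_2)
```

With this approach `len(loader.dataset)` equals the subset size, so dividing the number of correct predictions by it gives the accuracy on the samples actually used.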

### Load Models

In [None]:
# HINT: Remember to change the model to 'resnet50' and the weights to weights="IMAGENET1K_V1" when needed.
small_model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', weights = "IMAGENET1K_V1")
big_model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50', weights = "IMAGENET1K_V1")

small_model.fc = torch.nn.Linear(small_model.fc.in_features, 10)  # resnet18: fc.in_features = 512; replace the 1000-class head with a 10-class one
big_model.fc = torch.nn.Linear(big_model.fc.in_features, 10)  # resnet50: fc.in_features = 2048; replace the 1000-class head with a 10-class one

# Background: The original resnet18 is designed for the ImageNet dataset and predicts 1000 classes.
# TODO: Change the output of the model to 10 classes.

### Training and Testing Models

In [None]:
# TODO: Fill in the code cell according to the pytorch tutorial we gave.
loss_fn = nn.CrossEntropyLoss()
small_model_optimizer = torch.optim.Adam(small_model.parameters(), lr=1e-3)
big_model_optimizer = torch.optim.Adam(big_model.parameters(), lr=1e-3)

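def train(dataloader, model, loss_fn, optimizer):
  num_batches = len(dataloader)
  # Run on whichever device the model's parameters live on.
  model_device = next(model.parameters()).device
  epoch_loss = 0
  correct = 0
  seen = 0  # samples actually drawn this epoch

  model.train()

  for X, Y in tqdm(dataloader):   # tqdm: progress bar
    X, Y = X.to(model_device), Y.to(model_device)

    # Compute prediction error
    pred = model(X)
    loss = loss_fn(pred, Y)

    # Backpropagation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    epoch_loss += loss.item()
    pred = pred.argmax(dim = 1, keepdim = True)
    correct += pred.eq(Y.view_as(pred)).sum().item()
    seen += Y.size(0)

  avg_epoch_loss = epoch_loss / num_batches
  # Divide by the samples seen, not len(dataloader.dataset): with a RandomSampler
  # the loader draws only a fraction of the dataset per epoch.
  avg_acc = correct / seen
  return avg_epoch_loss, avg_acc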

def test(dataloader, model, loss_fn):
  num_batches = len(dataloader)
  # Evaluate on whichever device the model's parameters live on.
  model_device = next(model.parameters()).device
  epoch_loss = 0
  correct = 0
  seen = 0  # number of samples evaluated

  model.eval()

  with torch.no_grad():
    for X, Y in tqdm(dataloader):
      X, Y = X.to(model_device), Y.to(model_device)

      pred = model(X)

      epoch_loss += loss_fn(pred, Y).item()
      pred = pred.argmax(dim = 1, keepdim = True)
      correct += pred.eq(Y.view_as(pred)).sum().item()
      seen += Y.size(0)

  avg_epoch_loss = epoch_loss / num_batches
  avg_acc = correct / seen
  return avg_epoch_loss, avg_acc

# Re-create the models and optimizers before each training loop, so that a run
# does not build on the previous run's weights and distort the reported accuracy.
def reset(weight_select):
  set_all_seed(123)
  small_model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', weights = weight_select)
  big_model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50', weights = weight_select)
  small_model.fc = torch.nn.Linear(small_model.fc.in_features, 10)  # resnet18: fc.in_features = 512; replace the 1000-class head with a 10-class one
  big_model.fc = torch.nn.Linear(big_model.fc.in_features, 10)  # resnet50: fc.in_features = 2048; replace the 1000-class head with a 10-class one
  small_model_optimizer = torch.optim.Adam(small_model.parameters(), lr=1e-3)
  big_model_optimizer = torch.optim.Adam(big_model.parameters(), lr=1e-3)
  return small_model, small_model_optimizer, big_model, big_model_optimizer
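
The notebook detects a GPU in `device`, but none of the cells shown call `.to(device)` on the models, so they stay on the CPU. A minimal sketch of the usual pattern follows, assuming a tiny `nn.Linear` as a stand-in for the ResNets: move the model first, build the optimizer from the moved parameters, and move every batch to the same device.

```python
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Tiny stand-in for the ResNets; move it to the device before building the optimizer.
model = nn.Linear(8, 10).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One fake batch: 4 samples, 8 features, labels in [0, 10).
X = torch.randn(4, 8)
Y = torch.randint(0, 10, (4,))
X, Y = X.to(device), Y.to(device)  # batches must live on the same device as the model

loss = loss_fn(model(X), Y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f'One step on {device}, loss = {loss.item():.4f}')
```

In this notebook the equivalent change would be to call `.to(device)` on both models inside `reset` before the optimizers are created; that placement is a suggestion, not something the assignment prescribes.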

In [None]:
epochs = 5
x_train_dataloader = 1.0
x_half_train_dataloader = 0.5
x_sixteenth_train_dataloader = 1/16
data_size = np.array([x_sixteenth_train_dataloader, x_half_train_dataloader, x_train_dataloader])

###############Small Model#################
print("Starting Small Model Training\n")

small_model, small_model_optimizer, big_model, big_model_optimizer = reset("IMAGENET1K_V1")

print("==========sixteenth_train_dataloader==========\n")
for epoch in range(epochs):
  small_model_train_loss, small_model_train_acc = train(sixteenth_train_dataloader, small_model, loss_fn, small_model_optimizer)
  small_model_test_loss, small_model_test_acc = test(valid_dataloader, small_model, loss_fn)
  print(f"Epoch {epoch+1:2d}:Loss = {small_model_train_loss:.4f} Acc = {small_model_train_acc:.2f} Test_Loss = {small_model_test_loss:.4f} Test_Acc = {small_model_test_acc:.2f}")
  if (epoch == epochs-1):
    y_small_model_sixteenth_train = small_model_train_acc
    y_small_model_sixteenth_test = small_model_test_acc
print("Done!")

small_model, small_model_optimizer, big_model, big_model_optimizer = reset("IMAGENET1K_V1")

print("==========half_train_dataloader==========\n")
for epoch in range(epochs):
  small_model_train_loss, small_model_train_acc = train(half_train_dataloader, small_model, loss_fn, small_model_optimizer)
  small_model_test_loss, small_model_test_acc = test(valid_dataloader, small_model, loss_fn)
  print(f"Epoch {epoch+1:2d}:Loss = {small_model_train_loss:.4f} Acc = {small_model_train_acc:.2f} Test_Loss = {small_model_test_loss:.4f} Test_Acc = {small_model_test_acc:.2f}")
  if (epoch == epochs-1):
    y_small_model_half_train = small_model_train_acc
    y_small_model_half_test = small_model_test_acc
print("Done!")

small_model, small_model_optimizer, big_model, big_model_optimizer = reset("IMAGENET1K_V1")

print("==========train_dataloader==========\n")
for epoch in range(epochs):
  small_model_train_loss, small_model_train_acc = train(train_dataloader, small_model, loss_fn, small_model_optimizer)
  small_model_test_loss, small_model_test_acc = test(valid_dataloader, small_model, loss_fn)
  print(f"Epoch {epoch+1:2d}:Loss = {small_model_train_loss:.4f} Acc = {small_model_train_acc:.2f} Test_Loss = {small_model_test_loss:.4f} Test_Acc = {small_model_test_acc:.2f}")
  if (epoch == epochs-1):
    y_small_model_train = small_model_train_acc
    y_small_model_test = small_model_test_acc
print("Done!")


###############Big Model#################
print("Starting Big Model Training\n")

small_model, small_model_optimizer, big_model, big_model_optimizer = reset("IMAGENET1K_V1")

print("==========sixteenth_train_dataloader==========\n")
for epoch in range(epochs):
  big_model_train_loss, big_model_train_acc = train(sixteenth_train_dataloader, big_model, loss_fn, big_model_optimizer)
  big_model_test_loss, big_model_test_acc = test(valid_dataloader, big_model, loss_fn)
  print(f"Epoch {epoch+1:2d}:Loss = {big_model_train_loss:.4f} Acc = {big_model_train_acc:.2f} Test_Loss = {big_model_test_loss:.4f} Test_Acc = {big_model_test_acc:.2f}")
  if (epoch == epochs-1):
    y_big_model_sixteenth_train = big_model_train_acc
    y_big_model_sixteenth_test = big_model_test_acc
print("Done!")

small_model, small_model_optimizer, big_model, big_model_optimizer = reset("IMAGENET1K_V1")

print("==========half_train_dataloader==========\n")
for epoch in range(epochs):
  big_model_train_loss, big_model_train_acc = train(half_train_dataloader, big_model, loss_fn, big_model_optimizer)
  big_model_test_loss, big_model_test_acc = test(valid_dataloader, big_model, loss_fn)
  print(f"Epoch {epoch+1:2d}:Loss = {big_model_train_loss:.4f} Acc = {big_model_train_acc:.2f} Test_Loss = {big_model_test_loss:.4f} Test_Acc = {big_model_test_acc:.2f}")
  if (epoch == epochs-1):
    y_big_model_half_train = big_model_train_acc
    y_big_model_half_test = big_model_test_acc
print("Done!")

small_model, small_model_optimizer, big_model, big_model_optimizer = reset("IMAGENET1K_V1")

print("==========train_dataloader==========\n")
for epoch in range(epochs):
  big_model_train_loss, big_model_train_acc = train(train_dataloader, big_model, loss_fn, big_model_optimizer)
  big_model_test_loss, big_model_test_acc = test(valid_dataloader, big_model, loss_fn)
  print(f"Epoch {epoch+1:2d}:Loss = {big_model_train_loss:.4f} Acc = {big_model_train_acc:.2f} Test_Loss = {big_model_test_loss:.4f} Test_Acc = {big_model_test_acc:.2f}")
  if (epoch == epochs-1):
    y_big_model_train = big_model_train_acc
    y_big_model_test = big_model_test_acc
print("Done!")

small_model_accuracy_train = np.array([y_small_model_sixteenth_train, y_small_model_half_train, y_small_model_train])
big_model_accuracy_train = np.array([y_big_model_sixteenth_train, y_big_model_half_train, y_big_model_train])
small_model_accuracy_test = np.array([y_small_model_sixteenth_test, y_small_model_half_test, y_small_model_test])
big_model_accuracy_test = np.array([y_big_model_sixteenth_test, y_big_model_half_test, y_big_model_test])

print("small_model_accuracy_train:", small_model_accuracy_train)
print("small_model_accuracy_test:", small_model_accuracy_test)
print("big_model_accuracy_train:", big_model_accuracy_train)
print("big_model_accuracy_test:", big_model_accuracy_test)
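
The six training loops above differ only in the model and the dataloader, so they can be driven by one nested loop. The following is a sketch under stated assumptions, not the assignment's required structure: `run_experiments`, `make_model`, `train_one_epoch`, and `evaluate` are hypothetical names, and the stand-ins only count calls so the example runs without PyTorch.

```python
def run_experiments(model_names, loaders, make_model, train_one_epoch, evaluate, epochs=5):
    """Train a fresh model for every (model, data split) pair.

    Returns {(model_name, split_name): (final_train_acc, final_test_acc)}.
    """
    results = {}
    for model_name in model_names:
        for split_name, loader in loaders.items():
            model = make_model(model_name)  # fresh weights for every run
            for _ in range(epochs):
                train_acc = train_one_epoch(model, loader)
                test_acc = evaluate(model)
            results[(model_name, split_name)] = (train_acc, test_acc)
    return results


# Dummy stand-ins (assumptions for illustration); they only record what was called.
created = []

def make_model(name):
    model = {"name": name, "epochs_trained": 0}
    created.append(model)
    return model

def train_one_epoch(model, loader):
    model["epochs_trained"] += 1
    return 0.0  # placeholder, not a real accuracy

def evaluate(model):
    return 0.0  # placeholder, not a real accuracy

loaders = {"sixteenth": None, "half": None, "full": None}
results = run_experiments(["resnet18", "resnet50"], loaders, make_model, train_one_epoch, evaluate)
print(sorted(results))
```

In the notebook, `make_model` would wrap the model construction done in `reset`, and the two callables would wrap `train` and `test`.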


## Code for Problem 3

In [10]:
# TODO: Try to achieve the best performance given all training data using whatever model and training strategy.
# (New) (You cannot use the model that was pretrained on CIFAR10)

def set_all_seed(seed):
    np.random.seed(seed)
    random.seed(seed)
    torch.manual_seed(seed)
set_all_seed(123)
# ---------- Dataset and dataloaders ----------
batch_size = 32   # hyperparameter

mean = (0.4914, 0.4822, 0.4465)
std = (0.2471, 0.2435, 0.2616)
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])

train_dataset = datasets.CIFAR10(root='data', train=True, download=True, transform=train_transform)
valid_dataset = datasets.CIFAR10(root='data', train=False, download=True, transform=test_transform)

train_dataloader = DataLoader(train_dataset, batch_size = batch_size, shuffle=True, pin_memory=True)   # shuffle: whether to reshuffle the data at the start of every epoch
valid_dataloader = DataLoader(valid_dataset, batch_size = batch_size, shuffle=False, pin_memory=True)

sixteenth_train_sampler = RandomSampler(train_dataset, num_samples=len(train_dataset)//16)
half_train_sampler = RandomSampler(train_dataset, num_samples=len(train_dataset)//2)

sixteenth_train_dataloader = DataLoader(train_dataset, batch_size=batch_size, sampler=sixteenth_train_sampler)
half_train_dataloader = DataLoader(train_dataset, batch_size=batch_size, sampler=half_train_sampler)
# ---------- Models ----------
small_model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', weights = "IMAGENET1K_V1")
big_model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50', weights = "IMAGENET1K_V1")

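small_model.fc = torch.nn.Linear(small_model.fc.in_features, 10)  # resnet18: fc.in_features = 512; replace the 1000-class head with a 10-class one
big_model.fc = torch.nn.Linear(big_model.fc.in_features, 10)  # resnet50: fc.in_features = 2048; replace the 1000-class head with a 10-class one

# Add dropout in front of the final fully connected layer.
# Note: optimization() below builds new models without this dropout head,
# so it only takes effect if these two models are the ones being trained.
small_model.fc = nn.Sequential(nn.Dropout(0.2), nn.Linear(small_model.fc.in_features, 10))
big_model.fc = nn.Sequential(nn.Dropout(0.2), nn.Linear(big_model.fc.in_features, 10))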
# ---------- Training and evaluation functions ----------
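def train(dataloader, model, loss_fn, optimizer):
  num_batches = len(dataloader)
  # Run on whichever device the model's parameters live on.
  model_device = next(model.parameters()).device
  epoch_loss = 0
  correct = 0
  seen = 0  # samples actually drawn this epoch

  model.train()

  for X, Y in tqdm(dataloader):   # tqdm: progress bar
    X, Y = X.to(model_device), Y.to(model_device)

    # Compute prediction error
    pred = model(X)
    loss = loss_fn(pred, Y)

    # Backpropagation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    epoch_loss += loss.item()
    pred = pred.argmax(dim = 1, keepdim = True)
    correct += pred.eq(Y.view_as(pred)).sum().item()
    seen += Y.size(0)

  avg_epoch_loss = epoch_loss / num_batches
  # Divide by the samples seen, not len(dataloader.dataset): with a RandomSampler
  # the loader draws only a fraction of the dataset per epoch.
  avg_acc = correct / seen
  return avg_epoch_loss, avg_acc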

def test(dataloader, model, loss_fn):
  num_batches = len(dataloader)
  # Evaluate on whichever device the model's parameters live on.
  model_device = next(model.parameters()).device
  epoch_loss = 0
  correct = 0
  seen = 0  # number of samples evaluated

  model.eval()

  with torch.no_grad():
    for X, Y in tqdm(dataloader):
      X, Y = X.to(model_device), Y.to(model_device)

      pred = model(X)

      epoch_loss += loss_fn(pred, Y).item()
      pred = pred.argmax(dim = 1, keepdim = True)
      correct += pred.eq(Y.view_as(pred)).sum().item()
      seen += Y.size(0)

  avg_epoch_loss = epoch_loss / num_batches
  avg_acc = correct / seen
  return avg_epoch_loss, avg_acc

def optimization(weight_select):
    set_all_seed(123)
    small_model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', weights = weight_select)
    big_model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50', weights = weight_select)
    small_model.fc = torch.nn.Linear(small_model.fc.in_features, 10)  # resnet18: fc.in_features = 512; replace the 1000-class head with a 10-class one
    big_model.fc = torch.nn.Linear(big_model.fc.in_features, 10)  # resnet50: fc.in_features = 2048; replace the 1000-class head with a 10-class one
    loss_fn = nn.CrossEntropyLoss()   # hyperparameter
    learning_rate = 1e-3   # hyperparameter
    small_model_optimizer_ADAM = torch.optim.Adam(small_model.parameters(), lr=learning_rate)   # hyperparameter
    big_model_optimizer_ADAM = torch.optim.Adam(big_model.parameters(), lr=learning_rate)   # hyperparameter
    small_model_optimizer_SGD = torch.optim.SGD(small_model.parameters(), lr=learning_rate)   # hyperparameter
    big_model_optimizer_SGD = torch.optim.SGD(big_model.parameters(), lr=learning_rate)   # hyperparameter
    return small_model, small_model_optimizer_ADAM, small_model_optimizer_SGD, big_model, big_model_optimizer_ADAM, big_model_optimizer_SGD, loss_fn
# ---------- Experiments ----------
epochs = 5
x_train_dataloader = 1.0
x_half_train_dataloader = 0.5
x_sixteenth_train_dataloader = 1/16
data_size = np.array([x_sixteenth_train_dataloader, x_half_train_dataloader, x_train_dataloader])


###############Small Model#################
print("Starting Small Model Training\n")

small_model, small_model_optimizer_ADAM, small_model_optimizer_SGD, big_model, big_model_optimizer_ADAM, big_model_optimizer_SGD, loss_fn = optimization("IMAGENET1K_V1")

print("==========sixteenth_train_dataloader==========\n")
for epoch in range(epochs):
  small_model_train_loss, small_model_train_acc = train(sixteenth_train_dataloader, small_model, loss_fn, small_model_optimizer_ADAM)
  small_model_test_loss, small_model_test_acc = test(valid_dataloader, small_model, loss_fn)
  print(f"Epoch {epoch+1:2d}:Loss = {small_model_train_loss:.4f} Acc = {small_model_train_acc:.2f} Test_Loss = {small_model_test_loss:.4f} Test_Acc = {small_model_test_acc:.2f}")
  if (epoch == epochs-1):
    y_small_model_sixteenth_train = small_model_train_acc
    y_small_model_sixteenth_test = small_model_test_acc
print("Done!")

small_model, small_model_optimizer_ADAM, small_model_optimizer_SGD, big_model, big_model_optimizer_ADAM, big_model_optimizer_SGD, loss_fn = optimization("IMAGENET1K_V1")

print("==========half_train_dataloader==========\n")
for epoch in range(epochs):
  small_model_train_loss, small_model_train_acc = train(half_train_dataloader, small_model, loss_fn, small_model_optimizer_ADAM)
  small_model_test_loss, small_model_test_acc = test(valid_dataloader, small_model, loss_fn)
  print(f"Epoch {epoch+1:2d}:Loss = {small_model_train_loss:.4f} Acc = {small_model_train_acc:.2f} Test_Loss = {small_model_test_loss:.4f} Test_Acc = {small_model_test_acc:.2f}")
  if (epoch == epochs-1):
    y_small_model_half_train = small_model_train_acc
    y_small_model_half_test = small_model_test_acc
print("Done!")

small_model, small_model_optimizer_ADAM, small_model_optimizer_SGD, big_model, big_model_optimizer_ADAM, big_model_optimizer_SGD, loss_fn = optimization("IMAGENET1K_V1")

print("==========train_dataloader==========\n")
for epoch in range(epochs):
  small_model_train_loss, small_model_train_acc = train(train_dataloader, small_model, loss_fn, small_model_optimizer_ADAM)
  small_model_test_loss, small_model_test_acc = test(valid_dataloader, small_model, loss_fn)
  print(f"Epoch {epoch+1:2d}:Loss = {small_model_train_loss:.4f} Acc = {small_model_train_acc:.2f} Test_Loss = {small_model_test_loss:.4f} Test_Acc = {small_model_test_acc:.2f}")
  if (epoch == epochs-1):
    y_small_model_train = small_model_train_acc
    y_small_model_test = small_model_test_acc
print("Done!")


###############Big Model#################
print("Starting Big Model Training\n")

small_model, small_model_optimizer_ADAM, small_model_optimizer_SGD, big_model, big_model_optimizer_ADAM, big_model_optimizer_SGD, loss_fn = optimization("IMAGENET1K_V1")

print("==========sixteenth_train_dataloader==========\n")
for epoch in range(epochs):
  big_model_train_loss, big_model_train_acc = train(sixteenth_train_dataloader, big_model, loss_fn, big_model_optimizer_ADAM)
  big_model_test_loss, big_model_test_acc = test(valid_dataloader, big_model, loss_fn)
  print(f"Epoch {epoch+1:2d}:Loss = {big_model_train_loss:.4f} Acc = {big_model_train_acc:.2f} Test_Loss = {big_model_test_loss:.4f} Test_Acc = {big_model_test_acc:.2f}")
  if (epoch == epochs-1):
    y_big_model_sixteenth_train = big_model_train_acc
    y_big_model_sixteenth_test = big_model_test_acc
print("Done!")

small_model, small_model_optimizer_ADAM, small_model_optimizer_SGD, big_model, big_model_optimizer_ADAM, big_model_optimizer_SGD, loss_fn = optimization("IMAGENET1K_V1")

print("==========half_train_dataloader==========\n")
for epoch in range(epochs):
  big_model_train_loss, big_model_train_acc = train(half_train_dataloader, big_model, loss_fn, big_model_optimizer_ADAM)
  big_model_test_loss, big_model_test_acc = test(valid_dataloader, big_model, loss_fn)
  print(f"Epoch {epoch+1:2d}:Loss = {big_model_train_loss:.4f} Acc = {big_model_train_acc:.2f} Test_Loss = {big_model_test_loss:.4f} Test_Acc = {big_model_test_acc:.2f}")
  if (epoch == epochs-1):
    y_big_model_half_train = big_model_train_acc
    y_big_model_half_test = big_model_test_acc
print("Done!")

small_model, small_model_optimizer_ADAM, small_model_optimizer_SGD, big_model, big_model_optimizer_ADAM, big_model_optimizer_SGD, loss_fn = optimization("IMAGENET1K_V1")

print("==========train_dataloader==========\n")
for epoch in range(epochs):
  big_model_train_loss, big_model_train_acc = train(train_dataloader, big_model, loss_fn, big_model_optimizer_ADAM)
  big_model_test_loss, big_model_test_acc = test(valid_dataloader, big_model, loss_fn)
  print(f"Epoch {epoch+1:2d}:Loss = {big_model_train_loss:.4f} Acc = {big_model_train_acc:.2f} Test_Loss = {big_model_test_loss:.4f} Test_Acc = {big_model_test_acc:.2f}")
  if (epoch == epochs-1):
    y_big_model_train = big_model_train_acc
    y_big_model_test = big_model_test_acc
print("Done!")

small_model_accuracy_train = np.array([y_small_model_sixteenth_train, y_small_model_half_train, y_small_model_train])
big_model_accuracy_train = np.array([y_big_model_sixteenth_train, y_big_model_half_train, y_big_model_train])
small_model_accuracy_test = np.array([y_small_model_sixteenth_test, y_small_model_half_test, y_small_model_test])
big_model_accuracy_test = np.array([y_big_model_sixteenth_test, y_big_model_half_test, y_big_model_test])

print("small_model_accuracy_train:", small_model_accuracy_train)
print("small_model_accuracy_test:", small_model_accuracy_test)
print("big_model_accuracy_train:", big_model_accuracy_train)
print("big_model_accuracy_test:", big_model_accuracy_test)

Files already downloaded and verified
Files already downloaded and verified


Using cache found in C:\Users\zeus9/.cache\torch\hub\pytorch_vision_v0.10.0
Using cache found in C:\Users\zeus9/.cache\torch\hub\pytorch_vision_v0.10.0


Starting Small Model Training





Starting Big Model Training








100%|██████████| 1563/1563 [19:51<00:00,  1.31it/s]
100%|██████████| 313/313 [00:27<00:00, 11.32it/s]


Epoch  1:Loss = 1.7107 Acc = 0.41 Test_Loss = 1.7576 Test_Acc = 0.48


100%|██████████| 1563/1563 [19:27<00:00,  1.34it/s]
100%|██████████| 313/313 [00:28<00:00, 11.13it/s]


Epoch  2:Loss = 1.4558 Acc = 0.50 Test_Loss = 1.2797 Test_Acc = 0.56


100%|██████████| 1563/1563 [20:11<00:00,  1.29it/s]
100%|██████████| 313/313 [00:27<00:00, 11.31it/s]


Epoch  3:Loss = 1.1656 Acc = 0.59 Test_Loss = 1.3779 Test_Acc = 0.59


100%|██████████| 1563/1563 [19:38<00:00,  1.33it/s]
100%|██████████| 313/313 [00:25<00:00, 12.31it/s]


Epoch  4:Loss = 0.9997 Acc = 0.66 Test_Loss = 0.9029 Test_Acc = 0.69


100%|██████████| 1563/1563 [18:59<00:00,  1.37it/s]
100%|██████████| 313/313 [00:25<00:00, 12.15it/s]

Epoch  5:Loss = 0.9058 Acc = 0.69 Test_Loss = 0.7747 Test_Acc = 0.73
Done!





NameError: name 'y_small_model_sixteenth_train' is not defined
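
The progress-bar lengths in the output above (1563 and 313) match the number of batches expected for `batch_size = 32` on the full training and test sets. As a quick check, assuming CIFAR10's 50,000 training and 10,000 test images and a `DataLoader` with the default `drop_last=False`:

```python
import math


def num_batches(num_samples, batch_size):
    """Number of batches a DataLoader yields when drop_last=False."""
    return math.ceil(num_samples / batch_size)


train_size, test_size = 50_000, 10_000

print(num_batches(train_size, 32), num_batches(test_size, 32))    # 1563 313
print(num_batches(train_size, 256), num_batches(test_size, 256))  # 196 40
print(num_batches(train_size // 16, 32))                          # 98
```

The second line reproduces the `196` and `40` noted for `batch_size = 256` in the Problem 1 and 2 cell.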

## Problems

1. (30%) Finish the rest of the code for Problem 1 and Problem 2 according to the hints. (2 code cells in total.)
2. Train the small model (resnet18) and the big model (resnet50) from scratch on `sixteenth_train_dataloader`, `half_train_dataloader`, and `train_dataloader`, respectively.
3. (30%) Achieve the best performance given all the training data, using any model and training strategy.  
  (You cannot use a model that was pretrained on CIFAR10.)



## Discussion

Write down your insights in the report. The file name should be report.pdf.
For the following discussion, please present the results graphically as shown in Fig. 1 and discuss them.

- (30%) The relationship between accuracy, model size, and training dataset size.  
    (Six models in total: the small model trained on one sixteenth, half, and all of the data, and the big model trained on the same three splits. If your results differ from Fig. 1, please explain the possible reasons.)
- (10%) What happens if we train the ResNets starting from ImageNet-initialized weights (`weights="IMAGENET1K_V1"`)?
Please explain why the relationship changes in this way.
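
For the graphical presentation, one possible layout is accuracy on the y-axis against the fraction of training data on the x-axis, with one line per model. The sketch below assumes that layout (Fig. 1 itself is not reproduced here); `plot_accuracy` is a hypothetical helper, and the zero arrays are placeholders to be replaced by the accuracy arrays printed by the training cell.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_accuracy(data_size, curves, title):
    """Plot accuracy against training-set fraction, one line per model."""
    fig, ax = plt.subplots()
    for label, accuracy in curves.items():
        ax.plot(data_size, accuracy, marker="o", label=label)
    ax.set_xlabel("Fraction of training data")
    ax.set_ylabel("Accuracy")
    ax.set_title(title)
    ax.legend()
    return fig, ax


data_size = np.array([1 / 16, 0.5, 1.0])
# Placeholder values only, not results; substitute e.g. small_model_accuracy_test.
placeholder = np.zeros(3)
fig, ax = plot_accuracy(
    data_size,
    {"resnet18 (test)": placeholder, "resnet50 (test)": placeholder},
    "Test accuracy vs. training dataset size",
)
```

Calling the helper once for training accuracy and once for test accuracy keeps the two comparisons on separate figures.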

Hint: You can try different hyperparameter combinations when training the models.
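
One simple way to enumerate such combinations is a grid over the candidate values. In the sketch below the batch sizes, learning rate, and optimizer names are the ones that appear in this notebook; the `search_space` dictionary and the `grid` helper are illustrative assumptions, and each resulting configuration would still need to be passed to your own training code.

```python
from itertools import product

# Candidate values taken from the notebook; extend the lists as needed.
search_space = {
    "batch_size": [32, 256],
    "learning_rate": [1e-3],
    "optimizer": ["Adam", "SGD"],
}


def grid(space):
    """Yield one dict per combination of the values in `space`."""
    keys = list(space)
    for values in product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


configs = list(grid(search_space))
for config in configs:
    print(config)
```

A full grid grows multiplicatively with every added value, so with training runs as long as those shown above it may be more practical to vary one hyperparameter at a time.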

## Credits

1. [CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html)