# Exercise

## 0. Prerequisites

In [None]:
# install packages
!pip install -qq torchinfo

In [None]:
# import scripts from GitHub
!git clone https://github.com/yhs2773/PyTorch-for-Deep-Learning-Machine-Learning-Full-Course
!mv PyTorch-for-Deep-Learning-Machine-Learning-Full-Course/going_modular .
!mv PyTorch-for-Deep-Learning-Machine-Learning-Full-Course/helper_functions.py .
!rm -rf PyTorch-for-Deep-Learning-Machine-Learning-Full-Course

In [None]:
# load libraries
import torch
import torchvision

import matplotlib.pyplot as plt

from torch import nn
from torchvision import transforms, models
from torchinfo import summary

from going_modular import data_setup, engine, predictions
from helper_functions import download_data, set_seeds, plot_loss_curves

from PIL import Image
from timeit import default_timer as timer
from tqdm.auto import tqdm
from typing import List, Dict
from pathlib import Path

In [None]:
# device agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
device

In [None]:
# get data (use the raw file URL; the /blob/ URL returns an HTML page, not the zip)
data_20 = download_data(source='https://github.com/yhs2773/PyTorch-for-Deep-Learning-Machine-Learning-Full-Course/raw/main/data/pizza_steak_sushi_20_percent.zip',
                        destination="pizza_steak_sushi_20_percent")

In [None]:
# set directories
train_dir = data_20 / "train"
test_dir = data_20 / "test"

train_dir, test_dir

In [None]:
# create model function
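def create_model(num_classes: int=3,
                 seed: int=42,
                 is_effnetb2: bool=True):
    """Create a frozen feature extractor with a new classifier head.

    Returns an EffNetB2 model if `is_effnetb2` is True, otherwise a ViT-B/16
    model, together with the transforms for its pretrained weights.
    """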
    if is_effnetb2:
        weights = models.EfficientNet_B2_Weights.DEFAULT
        transforms = weights.transforms()
        model = models.efficientnet_b2(weights=weights)

        for param in model.parameters():
            param.requires_grad = False

        torch.manual_seed(seed)
        model.classifier = nn.Sequential(
            nn.Dropout(0.3, inplace=True),
            nn.Linear(in_features=1408, out_features=num_classes)
        )
    else:
        weights = models.ViT_B_16_Weights.DEFAULT
        transforms = weights.transforms()
        model = models.vit_b_16(weights=weights)

        for param in model.parameters():
            param.requires_grad = False

        torch.manual_seed(seed)
        model.heads = nn.Sequential(
            nn.Linear(in_features=768,
                      out_features=num_classes)
        )

    return model, transforms

In [None]:
# EffNetB2 model
effnetb2, effnetb2_transforms = create_model(num_classes=3,
                                             seed=42,
                                             is_effnetb2=True)

In [None]:
# ViT model
vit, vit_transforms = create_model(num_classes=3,
                                   seed=42,
                                   is_effnetb2=False)

In [None]:
# create EffNetB2 dataloaders
train_dataloader_effnetb2, test_dataloader_effnetb2, class_names = data_setup.create_dataloaders(
    train_dir=train_dir,
    test_dir=test_dir,
    transforms=effnetb2_transforms,
    batch_size=32
)

In [None]:
# create ViT dataloaders
train_dataloader_vit, test_dataloader_vit, class_names = data_setup.create_dataloaders(
    train_dir=train_dir,
    test_dir=test_dir,
    transforms=vit_transforms,
    batch_size=32
)

## 1. Make and time predictions with both feature extractor models on the test dataset using the GPU (`device="cuda"`). Compare the models' prediction times on GPU vs. CPU: does this close the gap between them? That is, does making predictions on the GPU bring the ViT feature extractor's prediction times closer to the EffNetB2 feature extractor's prediction times?
- You'll find code to do these steps in [section 5. Making predictions with our trained models and timing them](https://www.learnpytorch.io/09_pytorch_model_deployment/#5-making-predictions-with-our-trained-models-and-timing-them) and [section 6. Comparing model results, prediction times and size](https://www.learnpytorch.io/09_pytorch_model_deployment/#6-comparing-model-results-prediction-times-and-size).

In [None]:
# get test data paths
test_data_paths = list(Path(test_dir).glob("*/*.jpg"))

In [None]:
# GPU model results
effnetb2_results_gpu = predictions.pred_and_store(paths=test_data_paths,
                                                  model=effnetb2,
                                                  transform=effnetb2_transforms,
                                                  class_names=class_names,
                                                  device="cuda")

vit_results_gpu = predictions.pred_and_store(paths=test_data_paths,
                                             model=vit,
                                             transform=vit_transforms,
                                             class_names=class_names,
                                             device="cuda")

In [None]:
# CPU model results
effnetb2_results_cpu = predictions.pred_and_store(paths=test_data_paths,
                                                  model=effnetb2,
                                                  transform=effnetb2_transforms,
                                                  class_names=class_names,
                                                  device="cpu")

vit_results_cpu = predictions.pred_and_store(paths=test_data_paths,
                                             model=vit,
                                             transform=vit_transforms,
                                             class_names=class_names,
                                             device="cpu")
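
One way to compare the four result lists is to average the per-image prediction time for each model and device. The sketch below assumes `pred_and_store()` returns a list of dictionaries with a `time_for_pred` key (seconds per image); that key name is an assumption about this repository's `predictions` module, and `mean_pred_time` and `time_ratio` are hypothetical helper names.

```python
from typing import Dict, List


def mean_pred_time(results: List[Dict]) -> float:
    """Average seconds per prediction over a list of result dictionaries.

    Assumes each dictionary has a "time_for_pred" key (an assumption about
    what pred_and_store() returns).
    """
    if not results:
        raise ValueError("results is empty")
    return sum(r["time_for_pred"] for r in results) / len(results)


def time_ratio(slow_results: List[Dict], fast_results: List[Dict]) -> float:
    """How many times slower the first model is than the second, on average."""
    return mean_pred_time(slow_results) / mean_pred_time(fast_results)
```

Comparing `time_ratio(vit_results_gpu, effnetb2_results_gpu)` with `time_ratio(vit_results_cpu, effnetb2_results_cpu)` would show whether the GPU narrows the gap: a ratio closer to 1 means the two models are closer in speed.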

## 2. The ViT feature extractor seems to have more learning capacity (due to more parameters) than EffNetB2. How does it perform on the larger 20% split of the entire Food101 dataset?
- Train a ViT feature extractor on the 20% Food101 dataset for 5 epochs, just like we did with EffNetB2 in [section 10. Creating FoodVision Big](https://www.learnpytorch.io/09_pytorch_model_deployment/#10-creating-foodvision-big).
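
A minimal sketch of how a reproducible 20% split could be drawn, assuming the split is taken as a random subset of indices. `sample_split_indices` is a hypothetical helper, not part of the course code.

```python
import random
from typing import List


def sample_split_indices(dataset_len: int,
                         fraction: float = 0.2,
                         seed: int = 42) -> List[int]:
    """Pick a reproducible random subset of indices covering `fraction` of a dataset."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    num_samples = round(dataset_len * fraction)
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    return sorted(rng.sample(range(dataset_len), num_samples))


# Food101 has 75,750 training and 25,250 test images (101 classes x 750 / 250).
train_indices = sample_split_indices(75_750, fraction=0.2, seed=42)
test_indices = sample_split_indices(25_250, fraction=0.2, seed=42)
print(len(train_indices), len(test_indices))
```

The index lists could then be wrapped with `torch.utils.data.Subset(dataset, indices)`; `torch.utils.data.random_split()` is an alternative that returns the subsets directly.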


## 3. Make predictions across the 20% Food101 test dataset with the ViT feature extractor from exercise 2 and find the "most wrong" predictions.
- The "most wrong" predictions are the ones with the highest prediction probability but the wrong predicted label.
- Write a sentence or two about why you think the model got these predictions wrong.
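
The selection described above can be sketched as a filter followed by a sort. The example assumes each prediction is a dictionary with `correct` (bool) and `pred_prob` (float) keys; those key names and the `most_wrong` helper are assumptions for illustration.

```python
from typing import Dict, List


def most_wrong(results: List[Dict], top_k: int = 5) -> List[Dict]:
    """Return the `top_k` incorrect predictions with the highest confidence.

    Assumes each dictionary has "correct" and "pred_prob" keys.
    """
    wrong = [r for r in results if not r["correct"]]
    # Highest prediction probability first: confident but wrong.
    return sorted(wrong, key=lambda r: r["pred_prob"], reverse=True)[:top_k]
```

Viewing the images behind the returned entries is usually the quickest way to see whether the errors come from visually similar classes, unclear labels, or unusual photos.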

## 4. Evaluate the ViT feature extractor across the whole Food101 test dataset rather than just the 20% version. How does it perform?
- Does it beat the original Food101 paper's best result of 56.4% accuracy?
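
A small sketch of the comparison, assuming predicted and true labels have been collected as two equal-length sequences of class indices. `top1_accuracy` and `beats_paper` are hypothetical helper names; the 56.4% baseline is the figure quoted above.

```python
from typing import Sequence

FOOD101_PAPER_ACCURACY = 0.564  # baseline quoted in the exercise text


def top1_accuracy(pred_labels: Sequence[int], true_labels: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    if len(pred_labels) != len(true_labels):
        raise ValueError("pred_labels and true_labels must have the same length")
    if len(true_labels) == 0:
        raise ValueError("no labels to evaluate")
    correct = sum(int(p == t) for p, t in zip(pred_labels, true_labels))
    return correct / len(true_labels)


def beats_paper(accuracy: float, baseline: float = FOOD101_PAPER_ACCURACY) -> bool:
    """True if `accuracy` is strictly higher than the baseline."""
    return accuracy > baseline
```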

## 5. Head to Paperswithcode.com and find the current best performing model on the Food101 dataset.
- What model architecture does it use?

## 6. Write down 1-3 potential failure points of our deployed FoodVision models and what some potential solutions might be.
- For example, what happens if someone was to upload a photo that wasn't of food to our FoodVision Mini model?
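
One possible mitigation for the non-food photo case is to refuse to answer when the model is not confident. The sketch below assumes the app has a dictionary of class probabilities for an image; `predict_or_reject`, the `"unsure"` label, and the 0.6 threshold are illustrative choices, not part of the course code.

```python
from typing import Dict, Tuple


def predict_or_reject(class_probs: Dict[str, float],
                      threshold: float = 0.6) -> Tuple[str, float]:
    """Return the top class, or "unsure" if its probability is below `threshold`."""
    label, prob = max(class_probs.items(), key=lambda item: item[1])
    if prob < threshold:
        return "unsure", prob
    return label, prob
```

A threshold like this is only a partial fix, since a model can still be confidently wrong on images unlike anything it was trained on. A dedicated "food / not food" classifier in front of the model is another option worth considering.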

## 7. Pick any dataset from [`torchvision.datasets`](https://pytorch.org/vision/stable/datasets.html) and train a feature extractor model on it using a model from [`torchvision.models`](https://pytorch.org/vision/stable/models.html) (you could use one of the models we've already created, e.g. EffNetB2 or ViT) for 5 epochs and then deploy your model as a Gradio app to Hugging Face Spaces.
- You may want to pick a smaller dataset or make a smaller split of it so training doesn't take too long.
- I'd love to see your deployed models! So be sure to share them in Discord or on the [course GitHub Discussions page](https://github.com/mrdbourke/pytorch-deep-learning/discussions).