# 05. Going Modular: Part 2 (script mode)

This notebook is part 2/2 of section [05. Going Modular](https://www.learnpytorch.io/05_pytorch_going_modular/).

For reference, the two parts are:

1. [**05. Going Modular: Part 1 (cell mode)**](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/going_modular/05_pytorch_going_modular_cell_mode.ipynb) - this notebook is run as a traditional Jupyter Notebook/Google Colab notebook and is a condensed version of [notebook 04](https://www.learnpytorch.io/04_pytorch_custom_datasets/).
2. [**05. Going Modular: Part 2 (script mode)**](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/going_modular/05_pytorch_going_modular_script_mode.ipynb) - this notebook is the same as number 1 but with added functionality to turn each of the major sections into Python scripts, such as, `data_setup.py` and `train.py`.

Why two parts?

Because sometimes the best way to learn something is to see how it _differs_ from something else.

If you run each notebook side-by-side you'll see how they differ and that's where the key learnings are.


## What is script mode?

**Script mode** uses [Jupyter Notebook cell magic](https://ipython.readthedocs.io/en/stable/interactive/magics.html) (special commands) to turn specific cells into Python scripts.

For example if you run the following code in a cell, you'll create a Python file called `hello_world.py`:

```
%%writefile hello_world.py
print("hello world, machine learning is fun!")
```

You could then run this Python file on the command line with:

```
python hello_world.py

>>> hello world, machine learning is fun!
```

The main cell magic we're interested in using is `%%writefile`.

Putting `%%writefile filename` at the top of a cell in Jupyter or Google Colab will write the contents of that cell to a specified `filename`.

> **Question:** Do I have to create Python files like this? Can't I just start directly with a Python file and skip using a Google Colab notebook?
>
> **Answer:** Yes. This is only _one_ way of creating Python scripts. If you know the kind of script you'd like to write, you could start writing it straight away. But since using Jupyter/Google Colab notebooks is a popular way of starting off data science and machine learning projects, knowing about the `%%writefile` magic command is a handy tip.


### PyTorch in the wild

For example, if you find a PyTorch project on GitHub, it may be structured in the following way:

```
pytorch_project/
├── pytorch_project/
│   ├── data_setup.py
│   ├── engine.py
│   ├── model.py
│   ├── train.py
│   └── utils.py
├── models/
│   ├── model_1.pth
│   └── model_2.pth
└── data/
    ├── data_folder_1/
    └── data_folder_2/
```

Here, the top level directory is called `pytorch_project` but you could call it whatever you want.

Inside there's another directory called `pytorch_project` which contains several `.py` files, the purposes of these may be:

- `data_setup.py` - a file to prepare data (and download data if needed).
- `engine.py` - a file containing various training functions.
- `model_builder.py` or `model.py` - a file to create a PyTorch model.
- `train.py` - a file to leverage all other files and train a target PyTorch model.
- `utils.py` - a file dedicated to helpful utility functions.

And the `models` and `data` directories could hold PyTorch models and data files respectively (though due to the size of models and data files, it's unlikely you'll find the _full_ versions of these on GitHub, these directories are present above mainly for demonstration purposes).

> **Note:** There are many different ways to structure a Python project and subsequently a PyTorch project. This isn't a guide on _how_ to structure your projects, only an example of how you _might_ come across PyTorch projects in the wild. For more on structuring Python projects, see Real Python's [_Python Application Layouts: A Reference_](https://realpython.com/python-application-layouts/) guide.


## What we're going to cover

By the end of this notebook you should finish with a directory structure of:

```
going_modular/
├── going_modular/
│   ├── data_setup.py
│   ├── engine.py
│   ├── model_builder.py
│   ├── train.py
│   └── utils.py
├── models/
│   ├── 05_going_modular_cell_mode_tinyvgg_model.pth
│   └── 05_going_modular_script_mode_tinyvgg_model.pth
└── data/
    └── pizza_steak_sushi/
        ├── train/
        │   ├── pizza/
        │   │   ├── image01.jpeg
        │   │   └── ...
        │   ├── steak/
        │   └── sushi/
        └── test/
            ├── pizza/
            ├── steak/
            └── sushi/
```

Using this directory structure, you should be able to train a model from within a notebook with the command:

```
!python going_modular/train.py
```

Or from the command line with:

```
python going_modular/train.py
```

In essence, we will have turned our helpful notebook code into **reusable modular code**.


## 0. Creating a folder for storing Python scripts

Since we're going to be creating Python scripts out of our most useful code cells, let's create a folder for storing those scripts.

We'll call the folder `going_modular` and create it using Python's [`os.makedirs()`](https://docs.python.org/3/library/os.html) method.


In [1]:
import os
from pathlib import Path

GOING_MODULAR = Path("going_modular/")
if GOING_MODULAR.is_dir():
    print(f"{GOING_MODULAR} directory exists.")
    
else:
    os.makedirs("going_modular", exist_ok = True)

going_modular directory exists.


## 1. Get data

We're going to start by downloading the same data we used in [notebook 04](https://www.learnpytorch.io/04_pytorch_custom_datasets/#1-get-data), the `pizza_steak_sushi` dataset with images of pizza, steak and sushi.


In [2]:
import os
import zipfile

from pathlib import Path

import requests

# Setup path to data folder
DATA_PATH = Path("data/")
IMG_PATH = DATA_PATH/ "pizza_steak_sushi"
ZIP_PATH = DATA_PATH/ "pizza_steak_sushi.zip"

# If the image folder doesn't exist, download it and prepare it...
if IMG_PATH.is_dir():
    print(f"{IMG_PATH} directory exists.")
else:
    print(f"Did not find {IMG_PATH} directory, creating one...")
    IMG_PATH.mkdir(parents=True, exist_ok=True)

# Download pizza, steak, sushi data
with open (ZIP_PATH, 'wb') as f:
    req = requests.get("https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip")
    print("Downloading pizza, steak, sushi data...")
    f.write(req.content)
    
# Unzip pizza, steak, sushi data
with zipfile.ZipFile(ZIP_PATH, 'r') as zip_ref:
    print("Unzipping pizza, steak, sushi data...") 
    zip_ref.extractall(IMG_PATH)
    
# Delete the zip file after extraction
ZIP_PATH.unlink()
print("Deleted ZIP file after extraction.")

data\pizza_steak_sushi directory exists.
Downloading pizza, steak, sushi data...
Unzipping pizza, steak, sushi data...
Deleted ZIP file after extraction.


In [3]:
# Setup train and testing paths
TRAIN_DIR = IMG_PATH/ "train"
TEST_DIR = IMG_PATH/ "test"

TRAIN_DIR, TEST_DIR

(WindowsPath('data/pizza_steak_sushi/train'),
 WindowsPath('data/pizza_steak_sushi/test'))

## 2. Create Datasets and DataLoaders

Now we'll turn the image dataset into PyTorch `Dataset`'s and `DataLoader`'s.


In [4]:
from torchvision import datasets, transforms

# Create simple transform
data_trans = transforms.Compose([
    transforms.Resize((64,64)),
    transforms.ToTensor()
])

# Use ImageFolder to create dataset(s)
train_dataset = datasets.ImageFolder(TRAIN_DIR,
                                    transform= data_trans,
                                    target_transform= None)

test_dataset = datasets.ImageFolder(TEST_DIR,
                                    transform=data_trans)

print(f"Train data:\n{train_dataset}\nTest data:\n{test_dataset}")

Train data:
Dataset ImageFolder
    Number of datapoints: 225
    Root location: data\pizza_steak_sushi\train
    StandardTransform
Transform: Compose(
               Resize(size=(64, 64), interpolation=bilinear, max_size=None, antialias=True)
               ToTensor()
           )
Test data:
Dataset ImageFolder
    Number of datapoints: 75
    Root location: data\pizza_steak_sushi\test
    StandardTransform
Transform: Compose(
               Resize(size=(64, 64), interpolation=bilinear, max_size=None, antialias=True)
               ToTensor()
           )


In [5]:
# Turn train and test Datasets into DataLoaders
from torch.utils.data import DataLoader

train_loader = DataLoader(train_dataset,
                        batch_size=1,
                        num_workers=1,
                        shuffle=True)

test_loader = DataLoader(test_dataset,
                        batch_size=1,
                        num_workers=1,
                        shuffle=False)

train_loader, test_loader

(<torch.utils.data.dataloader.DataLoader at 0x1c72f0792d0>,
 <torch.utils.data.dataloader.DataLoader at 0x1c72f079930>)

### 2.1 Create Datasets and DataLoaders (script mode)

Rather than rewriting all of the code above everytime we wanted to load data, we can turn it into a script called `data_setup.py`.

Let's capture all of the above functionality into a function called `create_dataloaders()`.


In [6]:
%%writefile going_modular/data_setup.py
"""
Contains functionality for creating PyTorch DataLoaders for 
image classification data.
"""

import os 

import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

NUM_WORKERS = os.cpu_count()

def create_dataloader( 
    train_dir: str,
    test_dir: str,
    transform: transforms.Compose,
    batch_size: int,
    num_workers: int = NUM_WORKERS,
    transform_test: transforms.Compose = None
):
    """Creates training and testing DataLoaders.

    Takes in a training directory and testing directory path and turns
    them into PyTorch Datasets and then into PyTorch DataLoaders.

    Args:
    train_dir: Path to training directory.
    test_dir: Path to testing directory.
    transform: torchvision transforms to perform on training and testing data.
    batch_size: Number of samples per batch in each of the DataLoaders.
    num_workers: An integer for number of workers per DataLoader.
    transform_test (optional): specific transform for test dataset.
    
    Returns:
    A tuple of (train_dataloader, test_dataloader, class_names).
    Where class_names is a list of the target classes.
    Example usage:
        train_dataloader, test_dataloader, class_names = \
        = create_dataloaders(train_dir=path/to/train_dir,
                                test_dir=path/to/test_dir,
                                transform=some_transform,
                                batch_size=32,
                                num_workers=4)
    """
    
    # 1) Use ImageFolder to create dataset(s)
    train_dataset = datasets.ImageFolder(train_dir,
                                        transform= transform,
                                        target_transform= None)

    test_dataset = datasets.ImageFolder(test_dir,
                                        transform= transform_test if transform_test else transform)
    
    # 2) Get class_names
    class_names = train_dataset.classes
    
    # 3) Turn train and test Datasets into DataLoaders
    train_loader = DataLoader(train_dataset,
                            batch_size=batch_size,
                            num_workers=num_workers,
                            shuffle=True,
                            pin_memory=True)

    test_loader = DataLoader(test_dataset,
                            batch_size=batch_size,
                            num_workers=num_workers,
                            shuffle=False,
                            pin_memory=True)
    
    return train_loader, test_loader, class_names

Overwriting going_modular/data_setup.py


## 3. Making a model (TinyVGG)

We're going to use the same model we used in notebook 04: TinyVGG from the CNN Explainer website.

The only change here from notebook 04 is that a docstring has been added using [Google's Style Guide for Python](https://google.github.io/styleguide/pyguide.html#384-classes).


In [7]:
import torch
from torch import nn

class TinyVGG (nn.Module):
    """Creates the TinyVGG architecture.

    Replicates the TinyVGG architecture from the CNN explainer website in PyTorch.
    See the original architecture here: https://poloclub.github.io/cnn-explainer/

    Args:
    input_shape: An integer indicating number of input channels.
    hidden_units: An integer indicating number of hidden units between layers.
    output_shape: An integer indicating number of output units.
    """
    def __init__ (self, input_shape, hidden_units, output_shape):
        super().__init__()
        
        self.conv_block1 = nn.Sequential(
            nn.Conv2d(in_channels= input_shape,
                    out_channels= hidden_units,
                    
                    kernel_size= 3,
                    padding=1,
                    stride=1),
            
            nn.ReLU(),
            
            nn.Conv2d(in_channels= hidden_units,
                    out_channels= hidden_units,
                    
                    kernel_size= 3,
                    padding=1,
                    stride=1),
            
            nn.ReLU(),
            
            nn.MaxPool2d(2)
        )
        
        self.conv_block2 = nn.Sequential(
            nn.Conv2d(in_channels= hidden_units,
                    out_channels= hidden_units,
                    
                    kernel_size= 3,
                    padding=1,
                    stride=1),
            
            nn.ReLU(),
            
            nn.Conv2d(in_channels= hidden_units,
                    out_channels= hidden_units,
                    
                    kernel_size= 3,
                    padding=1,
                    stride=1),
            
            nn.ReLU(),
            
            nn.MaxPool2d(2)
        )
        
        with torch.no_grad():
            # This ensures the dummy input matches the input shape expected by the model.
            temp = torch.zeros(1, input_shape, 64, 64)
            dummy = self.conv_block2(self.conv_block1(temp))
            num_features = dummy.shape[1] * dummy.shape[2] * dummy.shape[3]
            
        self.classifier = nn.Sequential(
            nn.Flatten(),
            
            nn.Linear(in_features= num_features,
                    out_features= output_shape)
        )
        
    def forward(self, x):
        return self.classifier(self.conv_block2(self.conv_block1(x)))

In [8]:
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Instantiate an instance of the model
torch.manual_seed(42)
model_0 = TinyVGG(input_shape=3, # number of color channels (3 for RGB) 
                hidden_units=10, 
                output_shape=len(train_dataset.classes)).to(device)
model_0

TinyVGG(
  (conv_block1): Sequential(
    (0): Conv2d(3, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv_block2): Sequential(
    (0): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=2560, out_features=3, bias=True)
  )
)

In [9]:
# 1. Get a batch of images and labels from the DataLoader
img_batch, label_batch = next(iter(train_loader))

# 2. Get a single image from the batch and unsqueeze the image so its shape fits the model
img_single, label_single = img_batch[0].unsqueeze(dim=0), label_batch[0]
print(f"Single image shape: {img_single.shape}\n")

# 3. Perform a forward pass on a single image
model_0.eval()
with torch.inference_mode():
    pred = model_0(img_single.to(device))
    
# 4. Print out what's happening and convert model logits -> pred probs -> pred label
print(f"Output logits:\n{pred}\n")
print(f"Output prediction probabilities:\n{torch.softmax(pred, dim=1)}\n")
print(f"Output prediction label:\n{torch.argmax(torch.softmax(pred, dim=1), dim=1)}\n")
print(f"Actual label:\n{label_single}")

Single image shape: torch.Size([1, 3, 64, 64])

Output logits:
tensor([[0.0578, 0.0634, 0.0351]], device='cuda:0')

Output prediction probabilities:
tensor([[0.3352, 0.3371, 0.3277]], device='cuda:0')

Output prediction label:
tensor([1], device='cuda:0')

Actual label:
2


### 3.1 Making a model (TinyVGG) (script mode)

Over the past few notebooks (notebook 03 and notebook 04), we've built the TinyVGG model a few times.

So it makes sense to put the model into its file so we can reuse it again and again.

Let's put our `TinyVGG()` model class into a script called `model_builder.py` with the line `%%writefile going_modular/model_builder.py`. 

In [10]:
%%writefile going_modular/model_builder.py
"""
Contains PyTorch model code to instantiate a TinyVGG model.
"""

import torch
from torch import nn

class TinyVGG(nn.Module):
    """Creates the TinyVGG architecture.

    Replicates the TinyVGG architecture from the CNN explainer website in PyTorch.
    See the original architecture here: https://poloclub.github.io/cnn-explainer/

    Args:
        input_shape (int): Number of input channels (e.g., 3 for RGB images).
        hidden_units (int): Number of output channels (filters) for convolutional layers.
        output_shape (int): Number of output classes for classification.
    """
    def __init__(self, input_shape: int,
                hidden_units: int,
                output_shape: int):
        super().__init__()

        # First convolutional block
        self.conv_block1 = nn.Sequential(
            # First convolution layer
            nn.Conv2d(
                in_channels=input_shape,
                out_channels=hidden_units,
                kernel_size=3,
                stride=1,
                padding=1  # preserve spatial size
            ),
            nn.ReLU(),

            # Second convolution layer
            nn.Conv2d(
                in_channels=hidden_units,
                out_channels=hidden_units,
                kernel_size=3,
                stride=1,
                padding=1
            ),
            nn.ReLU(),

            # Downsample feature map (halves height and width)
            nn.MaxPool2d(kernel_size=2)
        )

        # Second convolutional block
        self.conv_block2 = nn.Sequential(
            nn.Conv2d(
                in_channels=hidden_units,
                out_channels=hidden_units,
                kernel_size=3,
                stride=1,
                padding=1
            ),
            nn.ReLU(),

            nn.Conv2d(
                in_channels=hidden_units,
                out_channels=hidden_units,
                kernel_size=3,
                stride=1,
                padding=1
            ),
            nn.ReLU(),

            nn.MaxPool2d(kernel_size=2)
        )

        # Automatically calculate the number of features going into the linear layer
        with torch.no_grad():
            # Simulate a dummy input tensor to pass through the conv layers
            temp = torch.zeros(1, input_shape, 64, 64)  # batch size 1, 64x64 image
            dummy = self.conv_block2(self.conv_block1(temp))
            num_features = dummy.shape[1] * dummy.shape[2] * dummy.shape[3]  # flatten dims

        # Final classification layer
        self.classifier = nn.Sequential(
            nn.Flatten(),  # Flatten 3D feature map to 1D vector
            nn.Linear(
                in_features=num_features,
                out_features=output_shape  # One output per class
            )
        )

    def forward(self, x):
        """Defines the forward pass of the network."""
        # Pass input through both convolutional blocks and the classifier
        return self.classifier(self.conv_block2(self.conv_block1(x)))

Overwriting going_modular/model_builder.py


In [11]:
import torch
from going_modular import model_builder

device = 'cuda' if torch.cuda.is_available() else 'cpu'

model_1 = model_builder.TinyVGG(input_shape= 3,
                                hidden_units=10,
                                output_shape= len(train_dataset.classes)).to(device)

model_1

TinyVGG(
  (conv_block1): Sequential(
    (0): Conv2d(3, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv_block2): Sequential(
    (0): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=2560, out_features=3, bias=True)
  )
)

In [12]:
# 1. Get a batch of images and labels from the DataLoader
img_batch, label_batch = next(iter(train_loader))

# 2. Get a single image from the batch and unsqueeze the image so its shape fits the model
img_single, label_single = img_batch[0].unsqueeze(dim=0), label_batch[0]
print(f"Single image shape: {img_single.shape}\n")

# 3. Perform a forward pass on a single image
model_1.eval()
with torch.inference_mode():
    pred = model_1(img_single.to(device))
    
# 4. Print out what's happening and convert model logits -> pred probs -> pred label
print(f"Output logits:\n{pred}\n")
print(f"Output prediction probabilities:\n{torch.softmax(pred, dim=1)}\n")
print(f"Output prediction label:\n{torch.argmax(torch.softmax(pred, dim=1), dim=1)}\n")
print(f"Actual label:\n{label_single}")

Single image shape: torch.Size([1, 3, 64, 64])

Output logits:
tensor([[ 0.0352, -0.0059,  0.0215]], device='cuda:0')

Output prediction probabilities:
tensor([[0.3394, 0.3258, 0.3348]], device='cuda:0')

Output prediction label:
tensor([0], device='cuda:0')

Actual label:
1


## 4. Creating `train_step()` and `test_step()` functions and `train()` to combine them

Rather than writing them again, we can reuse the `train_step()` and `test_step()` functions from [notebook 04](https://www.learnpytorch.io/04_pytorch_custom_datasets/#75-create-train-test-loop-functions).

The same goes for the `train()` function we created.

The only difference here is that these functions have had docstrings added to them in [Google's Python Functions and Methods Style Guide](https://google.github.io/styleguide/pyguide.html#383-functions-and-methods).

Let's start by making `train_step()`.


In [13]:
import torchinfo
    
from torchinfo import summary
summary(model= model_0,
        input_size=[1, 3, 64, 64])

Layer (type:depth-idx)                   Output Shape              Param #
TinyVGG                                  [1, 3]                    --
├─Sequential: 1-1                        [1, 10, 32, 32]           --
│    └─Conv2d: 2-1                       [1, 10, 64, 64]           280
│    └─ReLU: 2-2                         [1, 10, 64, 64]           --
│    └─Conv2d: 2-3                       [1, 10, 64, 64]           910
│    └─ReLU: 2-4                         [1, 10, 64, 64]           --
│    └─MaxPool2d: 2-5                    [1, 10, 32, 32]           --
├─Sequential: 1-2                        [1, 10, 16, 16]           --
│    └─Conv2d: 2-6                       [1, 10, 32, 32]           910
│    └─ReLU: 2-7                         [1, 10, 32, 32]           --
│    └─Conv2d: 2-8                       [1, 10, 32, 32]           910
│    └─ReLU: 2-9                         [1, 10, 32, 32]           --
│    └─MaxPool2d: 2-10                   [1, 10, 16, 16]           --
├─Sequentia

In [14]:
from typing import Tuple
import torchmetrics 

def train_step (model: torch.nn.Module,
                dataloader: torch.utils.data.DataLoader,
                loss_fn: torch.nn.Module ,
                acc_fn: torchmetrics.Accuracy,
                optimizer: torch.optim.Optimizer,
                device: torch.device) -> Tuple[float, float]:
    
    """Trains a PyTorch model for a single epoch.

    Turns a target PyTorch model to training mode and then
    runs through all of the required training steps (forward
    pass, loss calculation, optimizer step).

    Args:
    model: A PyTorch model to be trained.
    dataloader: A DataLoader instance for the model to be trained on.
    loss_fn: A PyTorch loss function to minimize.
    optimizer: A PyTorch optimizer to help minimize the loss function.
    device: A target device to compute on (e.g. "cuda" or "cpu").

    Returns:
    A tuple of training loss and training accuracy metrics.
    In the form (train_loss, train_accuracy). For example:

    (0.1112, 0.8743)
    """
    model.train()
    loss_total, acc_total = 0,0
    
    for batch, (x, y) in enumerate(dataloader):
        x,y = x.to(device), y.to(device)

        y_pred = model (x)
        
        loss = loss_fn(y_pred, y)
        loss_total += int(loss)
        acc_fn.update(y_pred.argmax(dim=1), y)
        
        optimizer.zero_grad()
        
        loss.backward()
        
        optimizer.step()
        
    # Calculate loss and accuracy per epoch and print out what's happening
    loss_total /= len(dataloader)
    acc_total = acc_fn.compute().item()
    acc_total *= 100
    acc_fn.reset()  
    
    return loss_total, acc_total   

  from .autonotebook import tqdm as notebook_tqdm


In [15]:
import torchmetrics 

def test_step ( model: torch.nn.Module,
                dataloader: torch.utils.data.DataLoader,
                loss_fn: torch.nn.Module ,
                acc_fn: torchmetrics.Accuracy,
                device: torch.device) -> Tuple[float, float]:
    
    """Tests a PyTorch model for a single epoch.

    Turns a target PyTorch model to "eval" mode and then performs
    a forward pass on a testing dataset.

    Args:
    model: A PyTorch model to be tested.
    dataloader: A DataLoader instance for the model to be tested on.
    loss_fn: A PyTorch loss function to calculate loss on the test data.
    device: A target device to compute on (e.g. "cuda" or "cpu").

    Returns:
    A tuple of testing loss and testing accuracy metrics.
    In the form (test_loss, test_accuracy). For example:

    (0.0223, 0.8985)
    """
    
    model.eval()
    
    loss_total, acc_total = 0,0
    
    with torch.inference_mode():
        for batch, (x, y) in enumerate(dataloader):
            x,y = x.to(device), y.to(device)

            y_pred = model (x)
            
            loss = loss_fn(y_pred, y)
            loss_total += int(loss)
            acc_fn.update(y_pred.argmax(dim=1), y)
        
    # Calculate loss and accuracy per epoch and print out what's happening
    loss_total /= len(dataloader)
    acc_total = acc_fn.compute().item()
    acc_total *= 100
    acc_fn.reset()  
    
    return loss_total, acc_total    

In [16]:
from tqdm.auto import tqdm
from torchmetrics import Accuracy
import torch
from typing import Dict, List

from tqdm.auto import tqdm

# 1. Take in various parameters required for training and test steps
def train(model: torch.nn.Module, 
        train_dataloader: torch.utils.data.DataLoader, 
        test_dataloader: torch.utils.data.DataLoader, 
        
        optimizer: torch.optim.Optimizer,
        loss_fn: torch.nn.Module = nn.CrossEntropyLoss(),
        
        epochs: int = 5,
        device: torch.device = 'cpu') -> Dict[str, List[float]]:
    
    """Trains and tests a PyTorch model.

    Passes a target PyTorch models through train_step() and test_step()
    functions for a number of epochs, training and testing the model
    in the same epoch loop.

    Calculates, prints and stores evaluation metrics throughout.

    Args:
    model: A PyTorch model to be trained and tested.
    train_dataloader: A DataLoader instance for the model to be trained on.
    test_dataloader: A DataLoader instance for the model to be tested on.
    optimizer: A PyTorch optimizer to help minimize the loss function.
    loss_fn: A PyTorch loss function to calculate loss on both datasets.
    epochs: An integer indicating how many epochs to train for.
    device: A target device to compute on (e.g. "cuda" or "cpu").

    Returns:
    A dictionary of training and testing loss as well as training and
    testing accuracy metrics. Each metric has a value in a list for 
    each epoch.
    In the form: {train_loss: [...],
                    train_acc: [...],
                    test_loss: [...],
                    test_acc: [...]} 
    For example if training for epochs=2: 
                    {train_loss: [2.0616, 1.0537],
                    train_acc: [0.3945, 0.3945],
                    test_loss: [1.2641, 1.5706],
                    test_acc: [0.3400, 0.2973]} 
    """
    
    acc_fn = Accuracy(task="multiclass", num_classes=3).to(device)
    
    # 2. Create empty results dictionary
    results = {"train_loss": [],
        "train_acc": [],
        "test_loss": [],
        "test_acc": []
    }
    
    # 3. Loop through training and testing steps for a number of epochs
    for epoch in tqdm(range(epochs)):
        train_loss, train_acc = train_step(model=model,
                                        dataloader=train_dataloader,
                                        loss_fn=loss_fn,
                                        optimizer=optimizer,
                                        acc_fn = acc_fn,
                                        device= device)
        test_loss, test_acc = test_step(model=model,
            dataloader=test_dataloader,
            loss_fn=loss_fn,
            acc_fn= acc_fn,
            device= device)
        
        # 4. Print out what's happening
        print(
            f"Epoch: {epoch+1} | "
            f"train_loss: {train_loss:.4f} | "
            f"train_acc: {train_acc:.4f} | "
            f"test_loss: {test_loss:.4f} | "
            f"test_acc: {test_acc:.4f}"
        )

        # 5. Update results dictionary
        # Ensure all data is moved to CPU and converted to float for storage
        results["train_loss"].append(train_loss.item() if isinstance(train_loss, torch.Tensor) else train_loss)
        results["train_acc"].append(train_acc.item() if isinstance(train_acc, torch.Tensor) else train_acc)
        results["test_loss"].append(test_loss.item() if isinstance(test_loss, torch.Tensor) else test_loss)
        results["test_acc"].append(test_acc.item() if isinstance(test_acc, torch.Tensor) else test_acc)

    # 6. Return the filled results at the end of the epochs
    return results

### 4.1 Creating `train_step()` and `test_step()` functions and `train()` to combine them (script mode)   

To create a script for `train_step()`, `test_step()` and `train()`, we'll combine their code all into a single cell.

We'll then write that cell to a file called `engine.py` because these functions will be the "engine" of our training pipeline.

We can do so with the magic line `%%writefile going_modular/engine.py`.

We'll also make sure to put all the imports we need (`torch`, `typing`, and `tqdm`) at the top of the cell.

In [17]:
%%writefile going_modular/engine.py
"""
Contains functions for training and testing a PyTorch model.
"""

from typing import Dict, Tuple, List
import torch 
from torch import nn
from tqdm.auto import tqdm
import torchmetrics
from torchmetrics import Accuracy


# ------------------------ TRAINING STEP ------------------------
def train_step(model: torch.nn.Module,
               dataloader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               acc_fn: torchmetrics.Accuracy,
               optimizer: torch.optim.Optimizer,
               device: torch.device) -> Tuple[float, float]:
    """
    Trains a PyTorch model for a single epoch.

    Args:
        model: A PyTorch model to be trained.
        dataloader: A DataLoader providing batches of training data.
        loss_fn: The loss function used for optimization (e.g., CrossEntropyLoss).
        acc_fn: A torchmetrics Accuracy instance to track accuracy.
        optimizer: An optimizer instance (e.g., SGD, Adam).
        device: The device to run computations on (CPU or CUDA).

    Returns:
        Tuple of average training loss and accuracy over the epoch.
    """
    model.train()  # Set model to training mode
    loss_total, acc_total = 0, 0

    # Loop over training batches
    for batch, (x, y) in enumerate(dataloader):
        # Move input and target to the target device
        x, y = x.to(device), y.to(device)

        # Forward pass - get model predictions
        y_pred = model(x)

        # Compute loss
        loss = loss_fn(y_pred, y)
        loss_total += int(loss)

        # Update accuracy metric
        acc_fn.update(y_pred.argmax(dim=1), y)

        # Clear previous gradients
        optimizer.zero_grad()

        # Backpropagation - calculate gradients
        loss.backward()

        # Update model weights
        optimizer.step()

        # Optional logging every 400 batches
        if batch % 400 == 0:
            print(f"Looked at {batch * len(x)}/{len(dataloader.dataset)} samples")

    # Compute average loss and accuracy for the entire epoch
    loss_total /= len(dataloader)
    acc_total = acc_fn.compute().item() * 100  # Convert to %
    acc_fn.reset()  # Reset metric for next epoch

    return loss_total, acc_total


# ------------------------ TESTING STEP ------------------------
def test_step(model: torch.nn.Module,
              dataloader: torch.utils.data.DataLoader,
              loss_fn: torch.nn.Module,
              acc_fn: torchmetrics.Accuracy,
              device: torch.device) -> Tuple[float, float]:
    """
    Evaluates the model on the test dataset.

    Args:
        model: A PyTorch model to be evaluated.
        dataloader: A DataLoader for test data.
        loss_fn: Loss function to evaluate prediction error.
        acc_fn: Accuracy metric from torchmetrics.
        device: Device to compute on.

    Returns:
        Tuple of average test loss and accuracy.
    """
    model.to(device)
    model.eval()  # Turn off dropout, batchnorm, etc.
    loss_total, acc_total = 0, 0

    # Disable gradient calculation for inference
    with torch.inference_mode():
        for batch, (x, y) in enumerate(dataloader):
            x, y = x.to(device), y.to(device)

            # Get predictions
            y_pred = model(x)

            # Compute loss and update accumulators
            loss = loss_fn(y_pred, y)
            loss_total += int(loss)
            acc_fn.update(y_pred.argmax(dim=1), y)

    # Compute epoch-level loss and accuracy
    loss_total /= len(dataloader)
    acc_total = acc_fn.compute().item() * 100  # Convert to %
    acc_fn.reset()  # Reset metric for reuse

    return loss_total, acc_total


# ------------------------ TRAINING LOOP ------------------------
def train(model: torch.nn.Module,
          train_dataloader: torch.utils.data.DataLoader,
          test_dataloader: torch.utils.data.DataLoader,
          optimizer: torch.optim.Optimizer,
          loss_fn: torch.nn.Module = nn.CrossEntropyLoss(),
          epochs: int = 5,
          device: torch.device = 'cpu') -> Dict[str, List[float]]:
    """
    Runs the full training loop: training + testing for multiple epochs.

    Args:
        model: PyTorch model to train and evaluate.
        train_dataloader: DataLoader for training data.
        test_dataloader: DataLoader for test data.
        optimizer: Optimization algorithm.
        loss_fn: Loss function (default = CrossEntropyLoss).
        epochs: Total number of epochs to train.
        device: Target computation device.

    Returns:
        Dictionary containing loss and accuracy history for training and testing.
    """

    # Initialize accuracy metric for classification task
    acc_fn = Accuracy(task="multiclass", num_classes=3).to(device)

    # Initialize history dictionary for storing results
    results = {
        "train_loss": [],
        "train_acc": [],
        "test_loss": [],
        "test_acc": []
    }

    # Loop through epochs
    for epoch in tqdm(range(epochs)):
        # Training step
        train_loss, train_acc = train_step(
            model=model,
            dataloader=train_dataloader,
            loss_fn=loss_fn,
            optimizer=optimizer,
            acc_fn=acc_fn,
            device=device
        )

        # Display training results
        print("\n")
        print("\033[91m======================================================\033[0m")
        print(f"\033[94mTrain loss: {train_loss:.5f} | Train accuracy: {train_acc:.2f}%\033[0m")

        # Testing step
        test_loss, test_acc = test_step(
            model=model,
            dataloader=test_dataloader,
            loss_fn=loss_fn,
            acc_fn=acc_fn,
            device=device
        )

        # Display testing results
        print(f"\033[92mTest loss: {test_loss:.5f} | Test accuracy: {test_acc:.2f}%\033[0m")
        print("\033[91m======================================================\033[0m")
        print("\n")

        # Save results to history (converted to CPU floats if needed)
        results["train_loss"].append(train_loss.item() if isinstance(train_loss, torch.Tensor) else train_loss)
        results["train_acc"].append(train_acc.item() if isinstance(train_acc, torch.Tensor) else train_acc)
        results["test_loss"].append(test_loss.item() if isinstance(test_loss, torch.Tensor) else test_loss)
        results["test_acc"].append(test_acc.item() if isinstance(test_acc, torch.Tensor) else test_acc)

    # Return metrics for analysis/visualization
    return results


Overwriting going_modular/engine.py


## 5. Creating a function to save the model

Let's setup a function to save our model to a directory.


In [18]:
from pathlib import Path

def save_model(model: torch.nn.Module,
                target_dir: str,
                model_name: str):
    """Saves a PyTorch model to a target directory.

    Args:
    model: A target PyTorch model to save.
    target_dir: A directory for saving the model to.
    model_name: A filename for the saved model. Should include
        either ".pth" or ".pt" as the file extension.

    Example usage:
    save_model(model=model_0,
                target_dir="models",
                model_name="05_going_modular_tingvgg_model.pth")
    """
    # Create target directory
    target_dir_path = Path(target_dir)
    target_dir_path.mkdir(parents=True,
                        exist_ok=True)

    # Create model save path
    assert model_name.endswith(".pth") or model_name.endswith(".pt"), "model_name should end with '.pt' or '.pth'"
    model_save_path = target_dir_path / model_name

    # Save the model state_dict()
    print(f"[INFO] Saving model to: {model_save_path}")
    torch.save(obj=model.state_dict(),
                f=model_save_path)

### 5.1 Creating a function to save the model (script mode)

How about we add our `save_model()` function to a script called `utils.py` which is short for "utilities".

We can do so with the magic line `%%writefile going_modular/utils.py`.

In [19]:
%%writefile going_modular/utils.py
"""
Contains various utility functions for PyTorch model training and saving.
"""

from pathlib import Path
import torch

def save_model(model: torch.nn.Module,
                target_dir: str,
                model_name: str):
    """
    Saves a PyTorch model's state_dict to a specified directory.

    Args:
        model (torch.nn.Module): The PyTorch model to be saved.
        target_dir (str): The directory where the model should be saved.
        model_name (str): The name for the saved model file. Should end with '.pt' or '.pth'.

    Example:
        save_model(model=model_0,
                target_dir="models",
                model_name="05_going_modular_tinyvgg_model.pth")
    """

    # Convert target_dir to a Path object and create the directory if it doesn't exist
    target_dir_path = Path(target_dir)
    target_dir_path.mkdir(parents=True, exist_ok=True)

    # Check the file extension is valid
    assert model_name.endswith(".pth") or model_name.endswith(".pt"), \
        "❌ Error: model_name should end with '.pt' or '.pth'"

    # Create the full path to where the model will be saved
    model_save_path = target_dir_path / model_name

    # Save only the state_dict (recommended for most use cases)
    torch.save(obj=model.state_dict(), f=model_save_path)

    print(f"[INFO] ✅ Model saved to: {model_save_path}")



Overwriting going_modular/utils.py


## 6. Train, evaluate and save the model

Let's leverage the functions we've got above to train, test and save a model to file.


In [20]:
# Set random seeds
torch.manual_seed(42) 
torch.cuda.manual_seed(42)

# Set number of epochs
NUM_EPOCHS = 5

# Recreate an instance of TinyVGG
model_0 = TinyVGG(input_shape=3, # number of color channels (3 for RGB) 
                    hidden_units=10, 
                    output_shape=len(train_dataset.classes)).to(device)

# Setup loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params=model_0.parameters(), lr=0.001)

# Start the timer
from timeit import default_timer as timer 
start_time = timer()

# Train model_0 
# model_0_results = train(model=model_0, 
#                         train_dataloader=train_loader,
#                         test_dataloader=test_loader,
#                         optimizer=optimizer,
#                         loss_fn=loss_fn, 
#                         epochs=NUM_EPOCHS,
#                         device=device)

# End the timer and print out how long it took
end_time = timer()
print(f"[INFO] Total training time: {end_time-start_time:.3f} seconds")

# Save the model
save_model(model=model_0,
            target_dir="models",
            model_name="05_going_modular_cell_mode_tinyvgg_model.pth")

[INFO] Total training time: 0.000 seconds
[INFO] Saving model to: models\05_going_modular_cell_mode_tinyvgg_model.pth


### 6.1 Train, evaluate and save the model (script mode)

Let's combine all of our modular files into a single script `train.py`.

This will allow us to run all of the functions we've written with a single line of code on the command line:

`python going_modular/train.py`

Or if we're running it in a notebook:

`!python going_modular/train.py`

We'll go through the following steps:
1. Import the various dependencies, namely `torch`, `os`, `torchvision.transforms` and all of the scripts from the `going_modular` directory, `data_setup`, `engine`, `model_builder`, `utils`.
  * **Note:** Since `train.py` will be *inside* the `going_modular` directory, we can import the other modules via `import ...` rather than `from going_modular import ...`.
2. Setup various hyperparameters such as batch size, number of epochs, learning rate and number of hidden units (these could be set in the future via [Python's `argparse`](https://docs.python.org/3/library/argparse.html)).
3. Setup the training and test directories.
4. Setup device-agnostic code.
5. Create the necessary data transforms.
6. Create the DataLoaders using `data_setup.py`.
7. Create the model using `model_builder.py`.
8. Setup the loss function and optimizer.
9. Train the model using `engine.py`.
10. Save the model using `utils.py`. 

In [21]:
%%writefile going_modular/train.py
"""
Trains a PyTorch image classification model using device-agnostic code.
"""

import os
import torch
from torchvision import transforms
from going_modular import data_setup, engine, model_builder, utils

def init(num_epochs: int = 3,
        batch_size: int = 32,
        hidden_units: int = 10,
        learning_rate: float = 0.001,
        train_dir: str = "data/pizza_steak_sushi/train",
        test_dir: str = "data/pizza_steak_sushi/test"):
    """
    Initializes training pipeline using modular components.

    Args:
        num_epochs (int): Number of epochs to train the model.
        batch_size (int): Number of samples per batch for training/testing.
        hidden_units (int): Number of hidden units in the TinyVGG model.
        learning_rate (float): Learning rate for optimizer.
        train_dir (str): Path to training data.
        test_dir (str): Path to testing data.
    """

    # Setup hyperparameters for easy tracking and reference
    HYPER_PARAMS = {
        'NUM_EPOCHS': num_epochs,
        'BATCH_SIZE': batch_size,
        'HIDDEN_UNITS': hidden_units,
        'LEARNING_RATE': learning_rate
    }

    # Detect the device (GPU if available, otherwise CPU)
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Define image transforms (resize + convert to tensor)
    data_transform = transforms.Compose([
        transforms.Resize((64, 64)),  # Resize all images to 64x64
        transforms.ToTensor()         # Convert PIL image to PyTorch tensor
    ])

    # Create training and testing dataloaders
    train_dataloader, test_dataloader, class_names = data_setup.create_dataloader(
        train_dir=train_dir,
        test_dir=test_dir,
        transform=data_transform,
        batch_size=HYPER_PARAMS["BATCH_SIZE"]
    )
    print("✅ Dataloaders created successfully!")

    # Build the TinyVGG model based on the provided parameters
    model = model_builder.TinyVGG(
        input_shape=3,  # RGB images
        hidden_units=HYPER_PARAMS["HIDDEN_UNITS"],
        output_shape=len(class_names)  # Number of classes
    ).to(device)
    print("✅ Model built successfully!")

    # Set up the loss function and optimizer
    loss_fn = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(),
                                lr=HYPER_PARAMS["LEARNING_RATE"])

    # Train and evaluate the model using the engine module
    engine.train(
        model=model,
        train_dataloader=train_dataloader,
        test_dataloader=test_dataloader,
        loss_fn=loss_fn,
        optimizer=optimizer,
        epochs=HYPER_PARAMS["NUM_EPOCHS"],
        device=device
    )

    # Save the trained model using a utility function
    utils.save_model(
        model=model,
        target_dir="models",  # Save directory
        model_name="05_going_modular_script_mode_tinyvgg_model.pth"
    )
    print("✅ Model saved successfully!")

# When running the script directly, start training
if __name__ == "__main__":
    init()

    
    

Overwriting going_modular/train.py


Now our final directory structure looks like:
```
data/
  pizza_steak_sushi/
    train/
      pizza/
        train_image_01.jpeg
        train_image_02.jpeg
        ...
      steak/
      sushi/
    test/
      pizza/
        test_image_01.jpeg
        test_image_02.jpeg
        ...
      steak/
      sushi/
going_modular/
  data_setup.py
  engine.py
  model_builder.py
  train.py
  utils.py
models/
  saved_model.pth
```

Now to put it all together!

Let's run our `train.py` file from the command line with:

```
!python going_modular/train.py
```


In [22]:
from going_modular import train

train.init()

✅ Dataloaders created successfully!
✅ Model built successfully!


  0%|          | 0/3 [00:00<?, ?it/s]

Looked at 0/225 samples


[94mTrain loss: 1.00000 | Train accuracy: 29.33%[0m


 33%|███▎      | 1/3 [00:36<01:12, 36.23s/it]

[92mTest loss: 1.00000 | Test accuracy: 33.33%[0m


Looked at 0/225 samples


[94mTrain loss: 1.00000 | Train accuracy: 34.67%[0m


 67%|██████▋   | 2/3 [01:14<00:37, 37.72s/it]

[92mTest loss: 0.66667 | Test accuracy: 33.33%[0m


Looked at 0/225 samples


[94mTrain loss: 1.00000 | Train accuracy: 34.67%[0m


100%|██████████| 3/3 [01:55<00:00, 38.54s/it]

[92mTest loss: 1.00000 | Test accuracy: 33.33%[0m


[INFO] ✅ Model saved to: models\05_going_modular_script_mode_tinyvgg_model.pth
✅ Model saved successfully!



