# PyTorch Model Notebook #
This notebook will allow you to get practice in building and working with PyTorch models.  Code excersises denoted by a problem number (i.e. Problem #1) will include a task and a code block that asks for your solution.  These blocks will be denoted by comments of the form '# YOUR CODE HERE #'.  The code immediately following include assertions that are used to check completeness of the response.  They will raise an exception if the previous solution is not complete or not correct.

## Datasets and DataLoaders

Reference:  The Linux Foundation, "Datasets & DataLoaders - PyTorch Tutorials 2.6.0 +cu124 documentation," pytorch.org https://pytorch.org/tutorials/beginner/basics/data_tutorial.html (accessed Mar. 20, 2025).

In [1]:
import torch
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

**Problem #1:**  Finish implementing the "RandDataset" Dataset by 1) setting "self.mapping" to a random tensor of dimension (output_dims, input_dims), 2) implementing the '\_\_len\_\_' method by returning the length of the dataset, and 3) setting the **output_tensor** = **Mx** in the '\_\_getitem\_\_' method, where **M** is the "self.mapping" tensor and **x** is the "input_tensor".  Also remember to implement the "self.target_transform" (if not None) on the "output_tensor", analagous to the "self.transform" already implemented.

In [2]:
class RandDataset(Dataset):
    def __init__(self, input_dims, output_dims, length, transform=None, target_transform=None):
        self.input_dims = input_dims
        self.output_dims = output_dims
        self.transform = transform
        self.target_transform = target_transform
        ### BEGIN SOLUTION
        self.mapping = torch.rand(output_dims, input_dims)  #Hint this is the random mapping tensor
        ### END SOLUTION
        self.length = length
    ### BEGIN SOLUTION
    def __len__(self):
        return self.length
    ### END SOLUTION

    def __getitem__(self, idx):
        input_tensor = torch.rand(self.input_dims)
        if self.transform:
            input_tensor = self.transform(input_tensor)
        ### BEGIN SOLUTION
        output_tensor = self.mapping.matmul(input_tensor)
        if self.target_transform:
            output_tensor = self.target_transform(output_tensor)
        ### END SOLUTION
        return input_tensor, output_tensor

assert len(RandDataset(5,10,1000)) == 1000
assert (RandDataset(5, 10, 1000)).mapping.shape == (10,5)
assert (RandDataset(5, 10, 1000))[1][1].shape[0] == 10
assert ((RandDataset(5, 10, 1000, target_transform=lambda x: x + 20))[1][1] > 20).all()

**Problem #2:**  Instatiate the RandDataset class with a length=32000 and variables "input_dims" and "output_dims".  Set variables named "input_dims" and "output_dims" to apropriate values and use in the RandDataset instantiation call. Also, instantiate a DataLoader using this dataset object using a "batch_size" of 32, already implemented. Name this dataloader object "rand_dataset".

In [3]:
batch_size = 32
### BEGIN SOLUTION
input_dims, output_dims = 5, 10
rand_dataset = RandDataset(input_dims, output_dims, 32000)
rand_dataloader = DataLoader(rand_dataset, batch_size=batch_size)
### END SOLUTION

assert input_dims > 0 and output_dims > 0
assert len(rand_dataloader.dataset) == 32000
assert rand_dataloader.batch_size == 32
assert len(rand_dataloader) == 1000

**Setting the train and test dataloaders from above**

In [4]:
import copy

train_dataloader = rand_dataloader

test_dataset = copy.deepcopy(train_dataloader.dataset)
test_dataset.length = 1000
test_dataloader = DataLoader(test_dataset, batch_size=batch_size)

assert len(test_dataloader.dataset) == 1000
assert (train_dataloader.dataset.mapping == test_dataloader.dataset.mapping).all()

## Building the PyTorch Model ##

Reference:  The Linux Foundation, "Build the Neural Network - PyTorch Tutorials 2.6.0 +cu124 documentation," pytorch.org https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html (accessed Mar. 13, 2025).

**Problem #3:**  Implement a Pytorch model class named "NNModel". Fill in the instantiation of the model's layers, which should include a nn.Linear, nn.ReLU, nn.Linear, nn.ReLU, and nn.Linear layers.  There should be **n** input neurons, **h** hidden neurons, and **m** output neurons.  Hint: Both hidden linear layers (first two) should have **h** neurons. Implement the forward computation of the model using the **input_tensor** as input and return the result.

In [5]:
from torch import nn

class NNModel(nn.Module):
    def __init__(self,n,m,h):
        super().__init__()

        ### BEGIN SOLUTION
        self.stack = nn.Sequential(
             nn.Linear(n,h),
             nn.ReLU(),
             nn.Linear(h,h),
             nn.ReLU(),
             nn.Linear(h, m),
        )
        ### END SOLUTION
    
    def forward(self, input_tensor):
        ### BEGIN SOLUTION
        y = self.stack(input_tensor)
        return y
        ### END SOLUTION

import re
assert ((NNModel(input_dims, output_dims, 50))(torch.rand(5,input_dims))).shape == (5,output_dims)
assert len(re.findall('Linear', str(NNModel(input_dims, output_dims, 50)))) == 3
assert len(re.findall('ReLU()', str(NNModel(input_dims, output_dims, 50)))) == 2

## Optimizing the PyTorch Model ##

Reference:  The Linux Foundation, "Optimizing Model Parameters - PyTorch Tutorials 2.6.0 +cu124 documentation," pytorch.org https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html (accessed Mar. 24, 2025).

**Problem #4:**  Instantiate the nn.MSELoss function with reduction='sum' and name the object, "my_loss_fn".  Instantiate the NNModel using "input_dims", "output_dims", and any number of hidden neurons and name the object, "my_model".   Instantiate the optim.SGD optimizer with the model parameters and the learning_rate defined above and name the object, "my_optimizer".

In [6]:
from torch import optim

learning_rate = 1e-2
epochs = 50
tolerance = 1e-2

### BEGIN SOLUTION
my_loss_fn = nn.MSELoss(reduction='sum')
my_model = NNModel(input_dims, output_dims, 20)
my_optimizer = optim.SGD(my_model.parameters(), lr=learning_rate)
### END SOLUTION

assert isinstance(my_loss_fn, nn.MSELoss)
assert my_loss_fn.reduction == 'sum'
assert isinstance(my_model, NNModel)
assert isinstance(my_optimizer, optim.SGD)

**Problem #5:**  Implement the training loop by including the 1) model predictions from the batch inputs, **X**, and calculating the loss via the loss_fn using the predictions and batch outputs, **Y**.  Divide the loss by the number of predictions to get the **avg_loss**.

In [7]:
def train_loop(dataloader, model, loss_fn, optimizer, device=None):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X,Y) in enumerate(dataloader):
        if device:
            X = X.to(device)
            Y = Y.to(device)
        ### BEGIN SOLUTION
        pred = model(X)
        loss = loss_fn(pred, Y)
        avg_loss = loss / len(pred)
        ### END SOLUTION

        avg_loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if (batch+1) %100 == 0:
            avg_loss, current = avg_loss.item(), batch * batch_size + len(pred)
            print(f"Avg. loss: {avg_loss:>7f}, [current:{current:>5d}/{size:>5d}]")

    return True

assert train_loop(DataLoader(RandDataset(3,5,batch_size*100),batch_size=batch_size), NNModel(3,5,20), nn.MSELoss(reduction='sum'), optim.SGD((NNModel(3,5,20)).parameters(), lr=learning_rate))

Avg. loss: 4.258340, [current: 3200/ 3200]


**Problem #6:**  Implement the test loop by including the 1) model predictions from the batch inputs, **X**, and calculating the test loss via the loss_fn using the predictions and batch outputs, **Y**.  Divide the test_loss by the number of predictions and remember to use the .item() method to extract the scalar value.

In [8]:
def test_loop(dataloader, model, loss_fn, tolerance, device=None):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()

    test_loss = 0
    correct = 0

    with torch.no_grad():
        for (X,Y) in dataloader:
            if device:
                X = X.to(device)
                Y = Y.to(device)
            ### BEGIN SOLUTION
            pred = model(X)
            test_loss += (loss_fn(pred, Y) / len(pred)).item()
            ### END SOLUTION
            correct += ((pred - Y).abs() < tolerance).all(dim=1).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size

    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg. loss: {test_loss:>8f}\n")
    return True

assert test_loop(DataLoader(RandDataset(3,5,batch_size),batch_size=batch_size), NNModel(3,5,20), nn.MSELoss(reduction='sum'), tolerance=tolerance)

Test Error: 
 Accuracy: 0.0%, Avg. loss: 3.864772



**Implementing the epoch loop and running the training loop**

**Problem #7:**  Implement the epoch loop by using the train_loop and test_loop functions defined above.

In [9]:
def epoch_loop(epochs, train_dataloader, test_dataloader, model, loss_fn, optimizer, tolerance, device=None):
    for t in range(epochs):
        print(f"Epoch {t+1}\n------------------------------")
        ### BEGIN SOLUTION ###
        train_loop(train_dataloader, model, loss_fn, optimizer, device)
        test_loop(test_dataloader, model, loss_fn, tolerance, device)
        ### END SOLUTION ###
    print("Done")
    return True

epochs_cpu = 2
assert epoch_loop(epochs_cpu, train_dataloader, test_dataloader, my_model, my_loss_fn, my_optimizer, tolerance)

Epoch 1
------------------------------
Avg. loss: 0.466994, [current: 3200/32000]
Avg. loss: 0.275905, [current: 6400/32000]
Avg. loss: 0.173477, [current: 9600/32000]
Avg. loss: 0.155771, [current:12800/32000]
Avg. loss: 0.138271, [current:16000/32000]
Avg. loss: 0.105612, [current:19200/32000]
Avg. loss: 0.106437, [current:22400/32000]
Avg. loss: 0.107267, [current:25600/32000]
Avg. loss: 0.081672, [current:28800/32000]
Avg. loss: 0.061309, [current:32000/32000]
Test Error: 
 Accuracy: 0.0%, Avg. loss: 0.069360

Epoch 2
------------------------------
Avg. loss: 0.052960, [current: 3200/32000]
Avg. loss: 0.045673, [current: 6400/32000]
Avg. loss: 0.034378, [current: 9600/32000]
Avg. loss: 0.030830, [current:12800/32000]
Avg. loss: 0.018746, [current:16000/32000]
Avg. loss: 0.015056, [current:19200/32000]
Avg. loss: 0.011399, [current:22400/32000]
Avg. loss: 0.010450, [current:25600/32000]
Avg. loss: 0.007051, [current:28800/32000]
Avg. loss: 0.007651, [current:32000/32000]
Test Error:

## Utilizing the GPU for Training ##

In [10]:
device = torch.device('cpu')

if torch.cuda.is_available():
    device = torch.device(torch.cuda.current_device())

print(f"Using device - {device}")

Using device - cuda:0


**Problem #8:**  If a GPU device is available, move "my_model" to the device and save as "my_model_gpu".  Reinitialize the SGD optimizer using "my_model_gpu.parameters()".

In [11]:
if torch.cuda.is_available():
    ### BEGIN SOUTION
    my_model_gpu = my_model.to(device)
    my_optimizer_gpu = optim.SGD(my_model_gpu.parameters(), lr=learning_rate)
    ### END SOLUTION

    assert epoch_loop(epochs, train_dataloader, test_dataloader, my_model_gpu, my_loss_fn, my_optimizer_gpu, tolerance, device)

Epoch 1
------------------------------
Avg. loss: 0.004681, [current: 3200/32000]
Avg. loss: 0.002996, [current: 6400/32000]
Avg. loss: 0.003277, [current: 9600/32000]
Avg. loss: 0.001482, [current:12800/32000]
Avg. loss: 0.001766, [current:16000/32000]
Avg. loss: 0.001653, [current:19200/32000]
Avg. loss: 0.000958, [current:22400/32000]
Avg. loss: 0.002503, [current:25600/32000]
Avg. loss: 0.000641, [current:28800/32000]
Avg. loss: 0.000622, [current:32000/32000]
Test Error: 
 Accuracy: 42.7%, Avg. loss: 0.001426

Epoch 2
------------------------------
Avg. loss: 0.001284, [current: 3200/32000]
Avg. loss: 0.001154, [current: 6400/32000]
Avg. loss: 0.001246, [current: 9600/32000]
Avg. loss: 0.000883, [current:12800/32000]
Avg. loss: 0.000802, [current:16000/32000]
Avg. loss: 0.000592, [current:19200/32000]
Avg. loss: 0.000378, [current:22400/32000]
Avg. loss: 0.000213, [current:25600/32000]
Avg. loss: 0.000165, [current:28800/32000]
Avg. loss: 0.000207, [current:32000/32000]
Test Error

## Saving and Loading PyTorch Models ##

Reference:  The Linux Foundation, "Save and Load the Model - PyTorch Tutorials 2.6.0 +cu124 documentation," pytorch.org https://pytorch.org/tutorials/beginner/basics/saveloadrun_tutorial.html (accessed Mar. 24, 2025).

Reference:  The Linux Foundation, "Saving and Loading Models - PyTorch Tutorials 2.6.0 +cu124 documentation," pytorch.org https://pytorch.org/tutorials/beginner/saving_loading_models.html (accessed Mar. 24, 2025).

**Problem #9:**  Finish the implementation of the save_model_checkpoint funciton. Add the elements to the save_dict dictionary corresponding to keys, "model_state_dict", "optimizer_state_dict", and "epoch".  The values for these should be the .state_dict() for the model and optimizer and the epoch. Secondly, add the function to save the save_dict as a file given in file_path. Hint:  Use torch.save.

In [12]:
def save_model_checkpoint(model, optimizer, dataloader, epoch, file_path):
    dataloader_mapping = dataloader.dataset.mapping
    save_dict = dict(
        ### BEGIN SOLUTION ###
        model_state_dict = model.state_dict(), 
        optimizer_state_dict = optimizer.state_dict(), 
        epoch = epoch, 
        ### END SOLUTION ###
        dataloader_mapping = dataloader_mapping,
    )
    ### BEGIN SOLUTION ###
    torch.save(save_dict,file_path)
    ### END SOLUTION ###
    return True

from pathlib import Path
assert save_model_checkpoint(NNModel(3,5,20), optim.SGD((NNModel(3,5,20)).parameters(), lr=learning_rate), DataLoader(RandDataset(3,5,batch_size),batch_size=batch_size), 20, Path() / "test_checkpoint.pth")
assert not set(['model_state_dict','optimizer_state_dict','epoch','dataloader_mapping']) - set((torch.load(Path() / "test_checkpoint.pth", weights_only=True)).keys())
for key, val in (torch.load(Path() / "test_checkpoint.pth", weights_only=True)).items():
    if 'dict' in key:
        assert isinstance(val, dict)
    elif key == 'epoch':
        assert val == 20

**Problem #9:**  Finish the implementation of the "restore_model_checkpoint" function. Load the checkpoint file defined at "file_path" using torch.load(...).  Update the "model" and "optimizer" state_dicts from the checkpoint.  Update the "epoch" variable from the checkpoint.

In [13]:
def restore_model_checkpoint(model, optimizer, train_dataloader, test_dataloader, file_path):
    epoch = -1
    if file_path.exists():
        print(f"Restarting from checkpoint: {str(file_path)}")
        ### BEGIN SOLUTION ###
        checkpoint = torch.load(file_path, weights_only=True)
        model.load_state_dict(checkpoint['model_state_dict'])
        optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
        epoch = checkpoint['epoch']
        ### END SOLUTION
        train_dataloader.dataset.mapping = checkpoint['dataloader_mapping']
        test_dataloader.dataset.mapping = checkpoint['dataloader_mapping']
    return epoch

test_model = NNModel(3,5,20)
test_optim = optim.SGD(test_model.parameters(), lr=learning_rate)
assert restore_model_checkpoint(test_model,test_optim,DataLoader(RandDataset(3,5,batch_size),batch_size=batch_size),DataLoader(RandDataset(3,5,batch_size),batch_size=batch_size),Path() / "test_checkpoint.pth") == 20
(Path() / "test_checkpoint.pth").unlink()

Restarting from checkpoint: test_checkpoint.pth


**Problem #10:**  Reimplement the "epoch_loop" with the restore_model_checkpoint and save_model_checkpoint functions. The epoch returned by restore_model_checkpoint should be saved to the "epoch_last" variable.

In [14]:
def epoch_loop(epochs, train_dataloader, test_dataloader, model, loss_fn, optimizer, tolerance, device=None, file_path=None):
    if file_path:
        ### BEGIN SOLUTION ###
        epoch_last = restore_model_checkpoint(model, optimizer, train_dataloader, test_dataloader, file_path)
        ### END SOLUTION ###
    for t in range(epoch_last+1,epochs):
        print(f"Epoch {t+1}\n------------------------------")
        train_loop(train_dataloader, model, loss_fn, optimizer, device)
        test_loop(test_dataloader, model, loss_fn, tolerance, device)
        if file_path:
            ### BEGIN SOLUTION ###
            save_model_checkpoint(model, optimizer, train_dataloader, t, file_path)
            ### END SOLUTION ###
    print("Done")
    return True

use_model = NNModel(input_dims, output_dims, 20).to(device)
use_optimizer = optim.SGD(use_model.parameters(), lr=learning_rate)
my_epochs = 3
assert epoch_loop(my_epochs, train_dataloader, test_dataloader, use_model, my_loss_fn, use_optimizer, tolerance, device, Path()/"my_checkpoint_file.pth")
if torch.cuda.is_available():
    my_epochs = epochs
else:
    my_epochs = 6
assert epoch_loop(my_epochs, train_dataloader, test_dataloader, use_model, my_loss_fn, use_optimizer, tolerance, device, Path()/"my_checkpoint_file.pth")
(Path()/"my_checkpoint_file.pth").unlink()

Epoch 1
------------------------------
Avg. loss: 0.688889, [current: 3200/32000]
Avg. loss: 0.300688, [current: 6400/32000]
Avg. loss: 0.207974, [current: 9600/32000]
Avg. loss: 0.161903, [current:12800/32000]
Avg. loss: 0.165507, [current:16000/32000]
Avg. loss: 0.160524, [current:19200/32000]
Avg. loss: 0.169589, [current:22400/32000]
Avg. loss: 0.153118, [current:25600/32000]
Avg. loss: 0.108491, [current:28800/32000]
Avg. loss: 0.100971, [current:32000/32000]
Test Error: 
 Accuracy: 0.0%, Avg. loss: 0.111498

Epoch 2
------------------------------
Avg. loss: 0.103993, [current: 3200/32000]
Avg. loss: 0.093950, [current: 6400/32000]
Avg. loss: 0.069131, [current: 9600/32000]
Avg. loss: 0.050902, [current:12800/32000]
Avg. loss: 0.056272, [current:16000/32000]
Avg. loss: 0.042546, [current:19200/32000]
Avg. loss: 0.035998, [current:22400/32000]
Avg. loss: 0.028343, [current:25600/32000]
Avg. loss: 0.022143, [current:28800/32000]
Avg. loss: 0.023384, [current:32000/32000]
Test Error: