## **Why Do We Need to Save and Load Models?**
Saving and loading PyTorch models is a routine part of deep learning projects. It lets you preserve trained models for later use, share them with others, or continue training without starting from scratch. This notebook trains a small network on a toy task (predicting the sum of two integers), saves its state dictionary with `torch.save`, loads it back with `load_state_dict`, and checks the predictions of the loaded model.

In [1]:
import torch              # Importing the PyTorch library
import torch.nn as nn     # Importing the PyTorch neural network module
import numpy as np        # Importing NumPy for numerical operations
from torch.utils.data import Dataset, DataLoader  # Importing PyTorch data utilities
!pip install torch_summary  # Installing the torch_summary package for model summary
from torchsummary import summary  # Importing the summary function from torch_summary
from torch.optim import SGD  # Importing the stochastic gradient descent optimizer from PyTorch
import time  # Importing the time module for measuring time intervals



In [2]:
# Determine the compute device (CPU or CUDA GPU) availability
# and set the 'device' variable accordingly.
device = "cuda" if torch.cuda.is_available() else "cpu"
# 'device' will be "cuda" if a GPU is available, otherwise, it will be "cpu"
print(device)

cpu


## **Data Preprocessing**

In [3]:
class DataProcessing(Dataset):
    def __init__(self, X, y):
        """
        Initialize the dataset with input data (X) and labels (y).

        Args:
        - X: Input data as a NumPy array.
        - y: Labels as a NumPy array.
        """
        self.X = torch.tensor(X).float().to(device)  # Convert and move input data to the specified device (CPU or GPU)
        self.y = torch.tensor(y).float().to(device)  # Convert and move labels to the specified device (CPU or GPU)

    def __getitem__(self, index):
        """
        Get a single data sample and its corresponding label by index.

        Args:
        - index: Index of the sample to retrieve.

        Returns:
        A tuple containing the data sample and its label.
        """
        return self.X[index], self.y[index]

    def __len__(self):
        """
        Get the total number of data samples in the dataset.

        Returns:
        The length of the dataset.
        """
        return len(self.X)

In the code below, we generate a random dataset `X` of 10 samples with 2 features each, holding integer values from 0 to 9 (the upper bound of `np.random.randint` is exclusive). We then sum each row of `X` and reshape the result into a column vector `y`, so each label is the sum of its sample's two features. Learning this sum is the task the model is trained on.

In [4]:
# Generate random integer data in the range [0, 10) with a shape of (10, 2)
X = np.random.randint(0, 10, size=(10, 2))

# Calculate the sum of each row along axis=1 to get row-wise sums
row_sums = np.sum(X, axis=1)

# Reshape the 'row_sums' array to have a column shape, effectively converting it to a column vector
y = row_sums.reshape(-1, 1)

# 'X' contains the original random integer data (10 samples, 2 features)
# 'y' contains the row-wise sums of 'X' as a column vector
X, y

(array([[0, 4],
        [6, 8],
        [0, 9],
        [6, 2],
        [8, 9],
        [4, 1],
        [8, 0],
        [4, 2],
        [6, 8],
        [3, 6]]),
 array([[ 4],
        [14],
        [ 9],
        [ 8],
        [17],
        [ 5],
        [ 8],
        [ 6],
        [14],
        [ 9]]))

In [5]:
# Create an instance of the custom dataset 'DataProcessing' with input data 'X' and labels 'y'
dp = DataProcessing(X, y)

# Access the second data sample and its corresponding label (index 1)
sample_1, label_1 = dp[1]

# Access the first data sample and its corresponding label (index 0)
sample_0, label_0 = dp[0]

print("Sample 1:", sample_1)
print("Label 1:", label_1)
print("Sample 0:", sample_0)
print("Label 0:", label_0)

Sample 1: tensor([6., 8.])
Label 1: tensor([14.])
Sample 0: tensor([0., 4.])
Label 0: tensor([4.])


In [6]:
# Create a DataLoader named 'dloader' with the custom dataset 'dp'
# Using a batch size of 2 and enabling data shuffling
dloader = DataLoader(dp, batch_size=2, shuffle=True)
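To see what a loader like this yields, here is a minimal sketch. It assumes a stand-in `TensorDataset` in place of `DataProcessing` so that it runs on its own; the names `demo_X`, `demo_y`, and `demo_loader` are hypothetical and not part of the notebook.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data with the same layout as X and y above (10 rows, 2 features)
demo_X = torch.tensor(np.random.randint(0, 10, size=(10, 2))).float()
demo_y = demo_X.sum(dim=1, keepdim=True)

# TensorDataset is used here only to keep the sketch self-contained
demo_loader = DataLoader(TensorDataset(demo_X, demo_y), batch_size=2, shuffle=True)

# Collect the shape of every (inputs, labels) batch in one pass over the data
batch_shapes = [(tuple(xb.shape), tuple(yb.shape)) for xb, yb in demo_loader]

print(len(demo_loader))   # number of batches per epoch
print(batch_shapes[0])    # shape of one (inputs, labels) batch
```

With 10 samples and a batch size of 2, one pass over the loader produces 5 batches, each holding 2 samples and their 2 labels.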

## **Creating Model**

In [7]:
# Create a neural network model using a sequential architecture
model = nn.Sequential(
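    nn.Linear(2, 5),  # Input layer: 2 features -> 5 hidden units
    nn.ReLU(),        # Non-linear activation
    nn.Linear(5, 1)   # Output layer: 5 hidden units -> 1 predicted value
).to(device)  # Move the model to the selected device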

In [8]:
# Generate a summary of the model's architecture by passing a sample input of shape (1, 2)
summary(model, torch.zeros(1, 2))

Layer (type:depth-idx)                   Output Shape              Param #
├─Linear: 1-1                            [-1, 5]                   15
├─ReLU: 1-2                              [-1, 5]                   --
├─Linear: 1-3                            [-1, 1]                   6
Total params: 21
Trainable params: 21
Non-trainable params: 0
Total mult-adds (M): 0.00
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
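The parameter counts in the summary can be checked by hand: a `Linear(in, out)` layer holds `in * out` weights plus `out` biases. The sketch below rebuilds the same architecture under the hypothetical name `demo_model` and compares the hand calculation with a direct count.

```python
import torch.nn as nn

# Same architecture as the model above, rebuilt so the sketch runs on its own
demo_model = nn.Sequential(nn.Linear(2, 5), nn.ReLU(), nn.Linear(5, 1))

# Linear(2, 5): 2*5 weights + 5 biases; Linear(5, 1): 5*1 weights + 1 bias
expected = (2 * 5 + 5) + (5 * 1 + 1)

# Count the elements of every trainable parameter tensor
counted = sum(p.numel() for p in demo_model.parameters() if p.requires_grad)

print(expected, counted)
```

Both numbers should agree with the `Total params: 21` line reported by `summary`.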



In [9]:
# Define a loss function (Mean Squared Error) and an optimizer (Stochastic Gradient Descent) for the model
loss_fn = nn.MSELoss()  # Mean Squared Error loss function
optimizer = SGD(model.parameters(), lr=0.001)  # Stochastic Gradient Descent (SGD) optimizer with a learning rate of 0.001

## **Model Training**

In [10]:
# Initialize an empty list to store loss values
loss_history = []

# Record the start time for training
start = time.time()

# Iterate through 50 epochs
for epoch in range(50):
    # Iterate through mini-batches from the data loader
    for X_batch, y_batch in dloader:  # Distinct names so the NumPy arrays X and y are not overwritten
        optimizer.zero_grad()       # Zero the gradients
        y_pred = model(X_batch)     # Forward pass to get predictions
        loss_value = loss_fn(y_pred, y_batch)  # Calculate the loss
        loss_value.backward()       # Backpropagation to compute gradients
        optimizer.step()            # Update model parameters using the optimizer
        loss_history.append(loss_value.item())  # Append the loss value to the history

# Record the end time for training
end = time.time()

# Print the total training time
print(f"Total Training Time = {end - start} seconds")

Total Training Time = 0.16399168968200684 seconds
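The loop stores one loss value per mini-batch, which is noisy to read. One way to summarize it is to average the values per epoch. The helper `epoch_means` below is a hypothetical sketch, and `example_history` is placeholder data used only so the snippet runs on its own; in the notebook you would pass `loss_history` and `len(dloader)` instead.

```python
import numpy as np

def epoch_means(history, batches_per_epoch):
    """Average per-batch losses into one value per epoch (hypothetical helper)."""
    values = np.asarray(history, dtype=float)
    # Drop a trailing partial epoch, if any, so the reshape is valid
    full = (len(values) // batches_per_epoch) * batches_per_epoch
    return values[:full].reshape(-1, batches_per_epoch).mean(axis=1)

# Placeholder losses for illustration only: 3 epochs of 2 batches each
example_history = [4.0, 2.0, 3.0, 1.0, 2.0, 0.0]
means = epoch_means(example_history, batches_per_epoch=2)
print(means)
```

A steadily decreasing sequence of epoch means is a quick sign that training is converging.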


## **Saving Model**
In this code, we define a `save_path` and write the model's state dictionary to it with `torch.save`. We then use `os.path.getsize` to read the file size in bytes and convert it to megabytes (MB) for display. The model has only 21 parameters, so the file is small enough that its size rounds to 0.00 MB.

In [11]:
# Define the path for saving the model
save_path = "mymodel.pth"

# Save the model's state_dict to the specified path
torch.save(model.state_dict(), save_path)

# Calculate and print the size of the saved model on disk
import os
model_size = os.path.getsize(save_path)
print(f"Size of the model on disk: {model_size / (1024 * 1024):.2f} MB")

Size of the model on disk: 0.00 MB
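To see what is actually written to disk, it can help to inspect the state dictionary itself and to report the file size in bytes rather than megabytes. The sketch below is self-contained: `demo_model` and `demo_model.pth` are hypothetical names, and the file goes to a temporary directory so that `mymodel.pth` is left untouched.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Same architecture as the notebook's model, rebuilt for a standalone example
demo_model = nn.Sequential(nn.Linear(2, 5), nn.ReLU(), nn.Linear(5, 1))

# state_dict maps parameter names to tensors; ReLU has no parameters,
# so no entry with index 1 appears
shapes = {name: tuple(t.shape) for name, t in demo_model.state_dict().items()}
for name, shape in shapes.items():
    print(name, shape)

# Save to a temporary directory and measure the file in bytes
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_model.pth")
    torch.save(demo_model.state_dict(), demo_path)
    size_bytes = os.path.getsize(demo_path)

print(f"{size_bytes} bytes ({size_bytes / 1024:.2f} KB)")
```

The keys follow the layer positions inside `nn.Sequential`, which is why loading later requires a model built with the same structure.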


## **Loading Model**
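Now we load the state dictionary that we just saved back into the model. `load_state_dict` needs a model with the same architecture as the one that was saved, and it reports whether all parameter keys matched.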

In [12]:
# Define the path for loading the pre-trained model
load_path = "mymodel.pth"

# Load the saved state_dict into the model; map_location places the tensors on the current device
model.load_state_dict(torch.load(load_path, map_location=device))

<All keys matched successfully>
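In the cell above, the weights are loaded back into the same object that was just trained, so the load cannot visibly change anything. A more telling check is to load them into a freshly built model and compare outputs. The following is a minimal sketch under stated assumptions: `build_model`, `trained`, `restored`, and `probe` are hypothetical names, and an untrained model stands in for the trained one.

```python
import os
import tempfile

import torch
import torch.nn as nn

def build_model():
    """Rebuild the architecture; it must match the one that was saved."""
    return nn.Sequential(nn.Linear(2, 5), nn.ReLU(), nn.Linear(5, 1))

trained = build_model()   # stands in for the trained model
probe = torch.tensor([[3.0, 4.0], [2.0, 7.0]])

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_model.pth")
    torch.save(trained.state_dict(), demo_path)

    restored = build_model()   # fresh model with different random weights
    # map_location remaps saved tensors, e.g. a GPU-saved file on a CPU-only machine
    state = torch.load(demo_path, map_location="cpu")
    result = restored.load_state_dict(state)

restored.eval()   # inference mode (no effect here; matters for dropout or batch norm)
with torch.no_grad():
    same_output = torch.allclose(trained(probe), restored(probe))

print(result)
print(same_output)
```

If the architectures differ, `load_state_dict` raises an error listing the missing or unexpected keys instead of loading silently.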

## **Prediction Using the Loaded Model**
In this code, we generate random test data `X_test` in the same way as the training data: 10 samples of 2 integers in the range [0, 10). We sum each row and reshape the result to form `y_test`, then convert both arrays to PyTorch tensors on the selected device and display them. This prepares the test data for evaluation with the loaded model.

In [13]:
# Generate random integer test data in the range [0, 10) with a shape of (10, 2)
X_test = np.random.randint(0, 10, size=(10, 2))

# Calculate the row-wise sum along axis=1
row_sums = np.sum(X_test, axis=1)

# Reshape the row_sums array to have a column shape
y_test = row_sums.reshape(-1, 1)

# Convert the test data and labels to PyTorch tensors and move them to the specified device (CPU or GPU)
X_test = torch.tensor(X_test).float().to(device)
y_test = torch.tensor(y_test).float().to(device)

# Display the converted test data and labels
X_test, y_test

(tensor([[3., 4.],
         [2., 7.],
         [3., 0.],
         [6., 1.],
         [9., 6.],
         [2., 5.],
         [7., 8.],
         [2., 1.],
         [5., 6.],
         [5., 0.]]),
 tensor([[ 7.],
         [ 9.],
         [ 3.],
         [ 7.],
         [15.],
         [ 7.],
         [15.],
         [ 3.],
         [11.],
         [ 5.]]))

In [14]:
# Make predictions on the test data using the loaded model, without tracking gradients
with torch.no_grad():
    y_test_pred = model(X_test)

# Round the predictions to the nearest integer and convert to torch.int dtype
y_test_pred = y_test_pred.round().to(dtype=torch.int)

# Display the predicted labels for the test data
y_test_pred

tensor([[ 7],
        [ 9],
        [ 3],
        [ 7],
        [15],
        [ 7],
        [15],
        [ 3],
        [11],
        [ 5]], dtype=torch.int32)

## **Accuracy**

In [15]:
def calculate_accuracy(predicted_labels, true_labels):
    # Convert predicted and true labels to numpy arrays if they are tensors
    if torch.is_tensor(predicted_labels):
        predicted_labels = predicted_labels.cpu().numpy()
    if torch.is_tensor(true_labels):
        true_labels = true_labels.cpu().numpy()

    # Calculate accuracy
    num_correct = (predicted_labels == true_labels).sum()
    total_samples = len(true_labels)
    accuracy = num_correct / total_samples

    return accuracy

In [16]:
accuracy = calculate_accuracy(y_test_pred, y_test)
print(f"Accuracy: {accuracy*100}%")

Accuracy: 100.0%
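Exact-match accuracy after rounding is a strict measure for what is really a regression task, so it can be useful to look at error metrics on the raw predictions as well. The tensors `raw_pred` and `target` below are placeholder values for illustration only; in the notebook you would use the output of `model(X_test)` before rounding, together with `y_test`.

```python
import torch

# Placeholder values for illustration only, not results from the notebook
raw_pred = torch.tensor([[6.8], [9.1], [3.4]])
target = torch.tensor([[7.0], [9.0], [3.0]])

# Mean absolute error and mean squared error on the unrounded predictions
mae = (raw_pred - target).abs().mean().item()
mse = ((raw_pred - target) ** 2).mean().item()

# Fraction of predictions that equal the label after rounding
exact_match = (raw_pred.round() == target).float().mean().item()

print(f"MAE: {mae:.3f}, MSE: {mse:.3f}, exact match after rounding: {exact_match:.2f}")
```

A prediction can be off by up to 0.5 and still count as correct after rounding, which the error metrics make visible.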


## **Closing Thoughts**
In this notebook, we covered the basics of saving and loading PyTorch models. We saved a trained model's state dictionary to disk with `torch.save`, restored it with `load_state_dict`, and used the restored model to make predictions on new data. The same steps let you reuse a model later, share it with others, or continue training without starting from scratch.
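Continuing training usually needs more than the model weights: the optimizer state and the epoch counter are commonly saved alongside them. The notebook does not do this, so the following is only a sketch. The dictionary keys, the `build` helper, and the file name `checkpoint.pth` are hypothetical conventions rather than a fixed PyTorch format.

```python
import os
import tempfile

import torch
import torch.nn as nn
from torch.optim import SGD

def build():
    """Create the model and optimizer with the same settings as in the notebook."""
    net = nn.Sequential(nn.Linear(2, 5), nn.ReLU(), nn.Linear(5, 1))
    return net, SGD(net.parameters(), lr=0.001)

net, opt = build()

with tempfile.TemporaryDirectory() as tmp_dir:
    ckpt_path = os.path.join(tmp_dir, "checkpoint.pth")   # hypothetical file name
    # Bundle everything needed to resume; the key names are a convention, not an API
    torch.save(
        {"epoch": 50, "model_state": net.state_dict(), "optimizer_state": opt.state_dict()},
        ckpt_path,
    )

    # Later (or in another session): rebuild, then restore both state dictionaries
    resumed_net, resumed_opt = build()
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    resumed_net.load_state_dict(checkpoint["model_state"])
    resumed_opt.load_state_dict(checkpoint["optimizer_state"])
    start_epoch = checkpoint["epoch"]

resumed_net.train()   # back to training mode before continuing
print(f"Resuming from epoch {start_epoch}")
```

Plain SGD without momentum keeps little internal state, but optimizers such as Adam track running statistics, and restoring those is what makes a resumed run behave like an uninterrupted one.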

## **Learn More**

#### **Here are some of my contributions so far:**
- **[Mastering PyTorch Tensors: Fundamentals to Advance](https://www.kaggle.com/code/tanvirnwu/mastering-pytorch-tensors-fundamentals-to-advance)**
- **[Comprehensive Guide on NumPy for Beginners](https://www.kaggle.com/code/tanvirnwu/comprehensive-guide-on-numpy-for-beginners#Learn-More)**
- **[Boolean in Python with Example and Explanation](https://www.kaggle.com/code/tanvirnwu/boolean-in-python-with-example-and-explanation)**
- **[Dictionary in Python with Examples & Explanations](https://www.kaggle.com/code/tanvirnwu/dictionary-in-python-with-examples-explanations)**
- **[List in Python for Beginners](https://www.kaggle.com/code/tanvirnwu/list-in-python-for-beginners)**
- **[A Brief Introduction of Graph Neural Network (GNN): Concepts, Types, and Uses](https://www.kaggle.com/discussions/general/449125#2493256)**
- **[Essential Python Libraries for Data Visualization](https://www.kaggle.com/discussions/getting-started/450857)**

#### **Thank you!!**