<a href="https://colab.research.google.com/github/willdphan/pet-classifier-cnn/blob/main/Pet_Classifier_CNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Pet Classifier CNN


This notebook fine-tunes a pre-trained `ResNet-50` model for image classification on a dataset of cat and dog images. The training process iterates over mini-batches of images, calculates the loss, and updates the model parameters using the `Adam` optimizer. After training, the model is evaluated on a held-out 20% split of the training data to measure its accuracy.

During the testing phase, the code loads the separate test dataset and iterates over the images. Each image is passed through the trained model, which predicts whether it contains a cat or a dog, and the predictions are tallied per class. A final helper function prints the predicted class for a single image file.

> ## Steps to CNN

>>[Import Libraries](#scrollTo=ZKJON9aGFwmG)

>>[Set Device and Retrieve Data](#scrollTo=iOdOihFWFsf_)

>>[Split Training Dataset](#scrollTo=Dhc5MmpQHE_7)

>>[Preprocess with ImageLoader Function](#scrollTo=iPJdkKQocVwk)

>>[Transform Datasets and Load into Preprocessor](#scrollTo=KXPGe3-Fke97)

>>[Load Preprocessed Datasets into DataLoader for Batch Training](#scrollTo=coFdrtxtlYKP)

>>[Load Pre-Trained Model](#scrollTo=j1heu6iUm1rk)

>>[Load Loss Function and Optimizer](#scrollTo=ZiuaaZMrIO5Y)

>>[Train Function](#scrollTo=3s8QcW8fsTao)

>>[Test Function](#scrollTo=QMQDEmdRtYzG)

>>[Save checkpoint](#scrollTo=_HMqTmiIJ4o7)

>>[Preprocess Untouched Testing Data and Get Results](#scrollTo=rExehwpHKPnQ)

>>[Create Function to Try with Random Images](#scrollTo=birF5CVvKt1L)



### Import Libraries


In [8]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
import torch
import torchvision
import torch.nn as nn # All neural network modules, nn.Linear, nn.Conv2d, BatchNorm, Loss functions
import torchvision.datasets as datasets # Has standard datasets we can import in a nice way
import torchvision.transforms as transforms # Transformations we can perform on our dataset
import torch.nn.functional as F # All functions that don't have any parameters
from torch.utils.data import DataLoader, Dataset # Gives easier dataset management and creates mini batches
from torchvision.datasets import ImageFolder
import torch.optim as optim # For all Optimization algorithms, SGD, Adam, etc.
from PIL import Image
from sklearn.model_selection import train_test_split
import zipfile

### Set Device and Retrieve Data

In [9]:
# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [10]:
zip_path = '/content/archive.zip'
extract_path = '/content/sample_data'

with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(extract_path)

### Split Training Dataset

Notice that only the training dataset is split: 80% is used for training and 20% is held out to evaluate the model during development. The separate testing dataset stays untouched until the final predictions.

In [11]:
dataset = ImageFolder("/content/sample_data/training_set/training_set")

In [12]:
# hold out 20% of the (image path, label) pairs; random_state makes the split reproducible
train_data, test_data, train_label, test_label = train_test_split(dataset.imgs, dataset.targets, test_size=0.2, random_state=42)

### Preprocess with ImageLoader Function

The `__init__` method filters the original dataset and removes any images that are not in RGB format, using the `checkChannel` method of the `ImageLoader` class.
`ImageLoader` then loads and preprocesses the images for training and testing.

In [13]:
class ImageLoader(Dataset):
    # note that the transform method is optional
    def __init__(self, dataset, transform=None):
        self.dataset = self.checkChannel(dataset) # some images are CMYK, Grayscale, check only RGB
        self.transform = transform

    # get total number of images in dataset
    def __len__(self):
        return len(self.dataset)

    # allows indexing and retrieving items from the dataset
    def __getitem__(self, item):
        # retrieves the item (contains image path and class category) at the given index from the dataset
        # the 0 retrieves the image path (file name)
        image = Image.open(self.dataset[item][0])
        # retrieves the class category as second item
        classCategory = self.dataset[item][1]
        # if transform is not 'none' it applies transformation and
        # returns transformed image and class category as tuple
        if self.transform:
            image = self.transform(image)
        return image, classCategory

    # filters out images in the dataset that are not in RGB format
    # returns a new dataset that only contains RGB images
    def checkChannel(self, dataset):
        # initialize list of RGB formatted images
        datasetRGB = []
        # loops over images in dataset
        for index in range(len(dataset)):
            # checks if bands contain RGB, if so then add to datasetRGB[]
            if (Image.open(dataset[index][0]).getbands() == ("R", "G", "B")): # Check Channels
                datasetRGB.append(dataset[index])
        return datasetRGB
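
To see what `checkChannel` is testing, the sketch below builds tiny in-memory images with Pillow and inspects their bands. The images and the `is_rgb` helper are illustrative assumptions, not part of the notebook. It also shows `convert("RGB")`, one possible alternative to dropping non-RGB images (the notebook uses it later for single-image prediction).

```python
from PIL import Image


def is_rgb(image):
    """Return True when the image has exactly the R, G, B bands."""
    return image.getbands() == ("R", "G", "B")


# Tiny in-memory images stand in for files on disk (illustrative only).
rgb = Image.new("RGB", (4, 4))
gray = Image.new("L", (4, 4))
cmyk = Image.new("CMYK", (4, 4))

for name, img in [("rgb", rgb), ("gray", gray), ("cmyk", cmyk)]:
    print(name, img.getbands(), is_rgb(img))

# Converting instead of filtering keeps the sample in the dataset.
converted = gray.convert("RGB")
print("converted", converted.getbands(), is_rgb(converted))
```

Filtering keeps the data untouched at the cost of losing samples; converting keeps every sample but changes how grayscale and CMYK images are represented.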

### Transform Datasets and Load into Preprocessor

`Normalize` is given `[0.5]*3`, which expands to `[0.5, 0.5, 0.5]`: one mean and one standard deviation per color channel (R, G, B). During normalization, `0.5` is subtracted from each channel's pixel values and the result is divided by `0.5`. Because `ToTensor()` first scales pixel values to the range 0 to 1, this centers them around zero and maps them to the range -1 to 1.

In [14]:
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.5]*3, [0.5]*3),  # per-channel mean, std dev
]) # train transform

test_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.5]*3, [0.5]*3),  # per-channel mean, std dev
]) # test transform

train_dataset = ImageLoader(train_data, train_transform)
test_dataset = ImageLoader(test_data, test_transform)
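
The arithmetic behind `Normalize([0.5]*3, [0.5]*3)` can be checked by hand. This is a minimal sketch, assuming pixel values have already been scaled to the range 0 to 1 by `ToTensor()`; the `normalize` helper is hypothetical and only mirrors the per-channel formula.

```python
def normalize(value, mean=0.5, std=0.5):
    """Apply the per-channel formula (value - mean) / std."""
    return (value - mean) / std


# ToTensor() yields values in [0, 1]; these are the two ends and the midpoint.
for pixel in (0.0, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
```

The darkest pixel maps to the bottom of the range, the brightest to the top, and mid-gray lands on zero.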

### Load Preprocessed Datasets into DataLoader for Batch Training

`DataLoader` groups a dataset into mini-batches and controls whether the data is shuffled on each pass. Here it wraps the transformed, preprocessed datasets.

In [15]:
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=True)
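
How `batch_size` carves a dataset into mini-batches can be seen on toy data. This is a small sketch, assuming random tensors in place of real images; `toy_dataset` and `toy_loader` are illustrative names and the sizes are arbitrary.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 150 fake "images" (3x8x8) with binary labels; sizes are arbitrary.
images = torch.randn(150, 3, 8, 8)
labels = torch.randint(0, 2, (150,))
toy_dataset = TensorDataset(images, labels)

toy_loader = DataLoader(toy_dataset, batch_size=64, shuffle=True)

# Collect the number of samples in each batch of one full pass.
batch_sizes = [batch_images.shape[0] for batch_images, _ in toy_loader]
print(len(toy_loader), batch_sizes)
```

The last batch is smaller whenever the dataset size is not a multiple of `batch_size`, unless `drop_last=True` is passed.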

### Load Pre-Trained Model

Load the pre-trained ResNet model. The loop iterates over all parameters in the model and sets their `requires_grad` attribute to False.

Doing so freezes the parameters of the pre-trained layers, meaning their weights will not be updated during training. This is common practice in transfer learning: no gradients are computed for the frozen weights, so only the new classifier layer is trained.

`model.fc = nn.Linear(num_ftrs, 2)` replaces the last fully connected layer (`model.fc`) with a new `nn.Linear` layer. The new layer has `num_ftrs` input features (matching the previous layer's output features) and 2 output features, one per class (cat and dog). Replacing the classifier layer adapts the model to our classification problem.

In [16]:
# display progress bars
from tqdm import tqdm
# import torch models
from torchvision import models
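# load pretrained resnet model and modify
# (the `weights` argument replaces the deprecated `pretrained=True` in torchvision >= 0.13)
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)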

# freeze params of pre-trained layers
for param in model.parameters():
    param.requires_grad = False

# replace last fc layer of pre-trained model
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, 2)

# model to device
model.to(device)

Downloading: "https://download.pytorch.org/models/resnet50-0676ba61.pth" to /root/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth
100%|██████████| 97.8M/97.8M [00:00<00:00, 239MB/s]


ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 
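
The effect of freezing the backbone and replacing the head can be checked by counting trainable parameters. The sketch below uses a small `nn.Sequential` as a stand-in for ResNet-50 so that nothing has to be downloaded; `toy_model` and `toy_optimizer` are illustrative names, and handing only the trainable parameters to the optimizer is an optional variation, not what the notebook does.

```python
import torch.nn as nn
import torch.optim as optim

# Small stand-in for a pre-trained backbone (not ResNet-50; illustrative only).
toy_model = nn.Sequential(
    nn.Linear(16, 8),
    nn.ReLU(),
    nn.Linear(8, 4),  # plays the role of the original classifier head
)

# Freeze every existing parameter.
for param in toy_model.parameters():
    param.requires_grad = False

# Replace the head; parameters of a new module require gradients by default.
toy_model[2] = nn.Linear(toy_model[2].in_features, 2)

trainable = [p for p in toy_model.parameters() if p.requires_grad]
n_trainable = sum(p.numel() for p in trainable)
n_total = sum(p.numel() for p in toy_model.parameters())
print(f"trainable: {n_trainable} / {n_total}")

# Optionally hand only the trainable parameters to the optimizer.
toy_optimizer = optim.Adam(trainable, lr=0.01)
```

Because the head is replaced after the freezing loop, only its weight and bias remain trainable.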

### Load Loss Function and Optimizer

In [17]:
# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

### Train Function

The `train` function is responsible for training a deep learning model.

1. Initializes a list to track the losses during training.
2. Sets the model in training mode.
3. Creates a progress bar to visualize the training progress.
4. Iterates over the data batches.
   * Moves data and targets to the device for computation.
   * Computes predicted scores using the model.
   * Calculates the loss between the scores and targets.
   * Resets gradients to zero.
   * Performs backpropagation and updates the model's parameters.
   * Obtains predicted class labels.
   * Updates the progress bar and displays the current loss.
5. Saves the model's state and optimizer's state at the end of each epoch.

TL;DR: The `train` function trains the model by iteratively updating its parameters based on computed gradients, tracking the losses, and saving a checkpoint after each epoch.

In [18]:
# Train Function
# use train function over number of epoch with model
def train(num_epoch, model):
    for epoch in range(0, num_epoch):
        losses = [] # keep track of the loss of each batch
        model.train() # set model in training mode
        loop = tqdm(enumerate(train_loader), total=len(train_loader)) # create a progress bar

        for batch_idx, (data, targets) in loop:
            data = data.to(device=device)
            targets = targets.to(device=device)
            scores = model(data)

            loss = criterion(scores, targets) # calculate loss with CE
            optimizer.zero_grad() # reset gradient params to 0
            losses.append(loss.item()) # store the loss as a float so the computation graph is not kept in memory
            loss.backward() # computes the gradients of the loss with respect to the model's params with backpropagation
            optimizer.step() # updates the model's params with an optimization step based on the computed gradients
            _, preds = torch.max(scores, 1) # returns max value and index associated with value

            # updates progress bar description with current epoch and batch progress
            loop.set_description(f"Epoch {epoch+1}/{num_epoch} process: {int((batch_idx / len(train_loader)) * 100)}")
            # updates progress bar with current loss value for batch
            loop.set_postfix(loss=loss.item())

        # saves the model's state, model's parameters and optimizer's state
        # saves the model's progress at the end of each epoch.
        torch.save({
                    'model_state_dict': model.state_dict(),
                    'optimizer_state_dict': optimizer.state_dict(),
                    }, 'checpoint_epoch_'+str(epoch)+'.pt')

### Test Function

`model.eval()` switches layers that behave differently during training and inference into inference mode. For example, `Dropout` layers stop dropping activations and `BatchNorm` layers use their running statistics instead of per-batch statistics.

In addition, evaluation code commonly pairs `model.eval()` with `torch.no_grad()` to turn off gradient computation, which saves memory and time.

The format of `test()` is similar to that of `train()`.

In [19]:
def test():
    # set to evaluation mode and initialize total test loss and correct
    model.eval()
    test_loss = 0
    correct = 0

    with torch.no_grad():
        for x, y in test_loader:
            x = x.to(device)
            y = y.to(device)
            # output of model
            output = model(x)
            # Get Prediction number and Class (Dog/Cat)
            _, predictions = torch.max(output, 1)
            # update total correct
            correct += (predictions == y).sum().item()
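            # loss of the current batch (this overwrites the value from the previous batch)
            test_loss = criterion(output, y)

    # divide the last batch's loss by the number of samples in the test dataset
    test_loss /= len(test_loader.dataset)
    # print the loss value and the accuracy of the model on the test dataset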
    print("Average Loss: ", test_loss, "  Accuracy: ", correct, " / ",
    len(test_loader.dataset), "  ", int(correct / len(test_loader.dataset) * 100), "%")
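
Note that `test()` reassigns `test_loss` on every batch, so the printed value is the last batch's mean loss divided by the dataset size rather than an average over the whole dataset. Below is a minimal sketch of one way to accumulate a per-sample average instead. The `evaluate` helper and the toy model and data are assumptions for illustration, not the notebook's code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, criterion, device):
    """Return (average loss per sample, accuracy) over a whole loader."""
    model.eval()
    total_loss, correct, seen = 0.0, 0, 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            output = model(x)
            # criterion returns the batch mean, so weight it by the batch size
            total_loss += criterion(output, y).item() * x.size(0)
            correct += (output.argmax(dim=1) == y).sum().item()
            seen += x.size(0)
    return total_loss / seen, correct / seen


# Toy stand-ins for the real model and test_loader.
torch.manual_seed(0)
toy_model = nn.Linear(4, 2)
toy_loader = DataLoader(
    TensorDataset(torch.randn(10, 4), torch.randint(0, 2, (10,))),
    batch_size=4,
)
avg_loss, accuracy = evaluate(
    toy_model, toy_loader, nn.CrossEntropyLoss(), torch.device("cpu")
)
print(f"Average loss: {avg_loss:.4f}  Accuracy: {accuracy:.2%}")
```

Weighting each batch by its size keeps the average correct even when the last batch is smaller than the others.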

Call the `train` and `test` functions. The `if __name__ == "__main__"` condition ensures that they are only executed when the script is run directly.

In [20]:
if __name__ == "__main__":
    train(5, model) # train
    test() # test

Epoch 1/5 process: 99: 100%|██████████| 101/101 [00:49<00:00,  2.04it/s, loss=3.96]
Epoch 2/5 process: 99: 100%|██████████| 101/101 [00:42<00:00,  2.36it/s, loss=1.49e-7]
Epoch 3/5 process: 99: 100%|██████████| 101/101 [00:40<00:00,  2.48it/s, loss=0.00606]
Epoch 4/5 process: 99: 100%|██████████| 101/101 [00:40<00:00,  2.47it/s, loss=3.55e-6]
Epoch 5/5 process: 99: 100%|██████████| 101/101 [00:41<00:00,  2.41it/s, loss=0.00484]


Average Loss:  tensor(0.0004, device='cuda:0')   Accuracy:  1523  /  1601    95 %


### Save checkpoint

Training saves a checkpoint at the end of each epoch. The code below loads the last one, `"checpoint_epoch_4.pt"`, and restores the saved state of the model and its optimizer. The model's state is loaded using `model.load_state_dict(checkpoint["model_state_dict"])`, and the optimizer's state is loaded using `optimizer.load_state_dict(checkpoint["optimizer_state_dict"])`. After loading, the message "...Loading checkpoint" is printed to indicate that the process has completed.

In [21]:
checkpoint = torch.load("./checpoint_epoch_4.pt") # Try to load last checkpoint
model.load_state_dict(checkpoint["model_state_dict"])
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
print("...Loading checkpoint")

...Loading checkpoint
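
The save and load steps can be checked end to end on a toy model. This is a sketch under stated assumptions: `toy_model`, `restored_model`, and the file name `toy_checkpoint.pt` are hypothetical, and `map_location="cpu"` is an optional argument that lets a checkpoint saved on a GPU be loaded on a CPU-only machine.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim

toy_model = nn.Linear(4, 2)
toy_optimizer = optim.Adam(toy_model.parameters(), lr=0.01)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_checkpoint.pt")  # hypothetical file name
    # Same dictionary layout as the checkpoints written by train().
    torch.save({
        "model_state_dict": toy_model.state_dict(),
        "optimizer_state_dict": toy_optimizer.state_dict(),
    }, path)
    checkpoint = torch.load(path, map_location="cpu")

# A freshly initialized model and optimizer receive the saved state.
restored_model = nn.Linear(4, 2)
restored_optimizer = optim.Adam(restored_model.parameters(), lr=0.01)
restored_model.load_state_dict(checkpoint["model_state_dict"])
restored_optimizer.load_state_dict(checkpoint["optimizer_state_dict"])

same = all(
    torch.equal(a, b)
    for a, b in zip(
        toy_model.state_dict().values(),
        restored_model.state_dict().values(),
    )
)
print("weights match:", same)
```

Restoring the optimizer state matters mainly when training is resumed; for inference alone, the model state is enough.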


### Preprocess Untouched Testing Data and Get Results

Preprocess the testing dataset and load it into a `DataLoader`. Set the model to evaluation mode and disable gradient tracking with `torch.no_grad()`, then get predictions for the untouched testing dataset.

TL;DR: test the model with the testing data and track the total number of predicted dogs and cats.

In [23]:
dataset = ImageFolder("/content/sample_data/test_set",
                     transform=transforms.Compose([
                         transforms.Resize((224, 224)),
                         transforms.ToTensor(),
                         transforms.Normalize([0.5]*3, [0.5]*3)
                     ]))
print(dataset)
dataloader = DataLoader(dataset, batch_size=1, shuffle = False)

Dataset ImageFolder
    Number of datapoints: 2023
    Root location: /content/sample_data/test_set
    StandardTransform
Transform: Compose(
               Resize(size=(224, 224), interpolation=bilinear, max_size=None, antialias=warn)
               ToTensor()
               Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
           )


In [27]:
with torch.no_grad():
    model.eval()
    Dogs = 0
    Cats = 0
    for data, target in dataloader:
        data, target = data.to(device), target.to(device)
        output = model(data)
        _, predicted = torch.max(output, 1)
        if predicted[0] == 1:
            Dogs += 1
        else:
            Cats += 1
    print(f"Total Dogs: {Dogs}")
    print(f"Total Cats: {Cats}")

Total Dogs: 1092
Total Cats: 931
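
The check `predicted[0] == 1` relies on index 1 meaning dog. `ImageFolder` assigns indices to class folders in sorted order and exposes the mapping as `class_to_idx`, so folders named `cats` and `dogs` (assumed here) would map to 0 and 1. The sketch below counts predictions per class for a whole batch at once; the logits are made up for illustration.

```python
import torch

# Hypothetical mapping; ImageFolder exposes the real one as dataset.class_to_idx.
class_to_idx = {"cats": 0, "dogs": 1}
idx_to_class = {idx: name for name, idx in class_to_idx.items()}

# Made-up scores for five images, two scores each (cat, dog).
logits = torch.tensor([
    [2.0, -1.0],
    [0.1, 0.9],
    [-0.5, 1.5],
    [3.0, 0.0],
    [0.2, 0.4],
])

# Index of the highest score per row is the predicted class.
predicted = logits.argmax(dim=1)
# Count how often each class index occurs.
counts = torch.bincount(predicted, minlength=len(class_to_idx))

for idx, count in enumerate(counts.tolist()):
    print(f"Total {idx_to_class[idx]}: {count}")
```

Counting with `torch.bincount` also works with batch sizes larger than one, which would speed up the loop over the test set.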


### Create Function to Try with Random Images

`RandomImagePrediction` takes a file path as input, opens the image file, and converts it to RGB format.

It then applies the same transformations used for training: resizing the image to 224x224 pixels, converting it to a tensor, and normalizing the pixel values. The transformed image tensor is given a leading batch dimension of size one.

The function uses a `DataLoader` to load the transformed image tensor. It then iterates over the `DataLoader`, passing each batch of data to the model for prediction. The predicted class labels are obtained by finding the maximum value along the predicted scores axis. The predicted class label is printed as either "Dog" or "Cat" based on the value of `preds[0]`.

Overall, the function performs image preprocessing, passes the preprocessed image through a model for prediction, and prints the predicted class label.

In [26]:
def RandomImagePrediction(filepath):
    # open the file and convert it to RGB
    img_array = Image.open(filepath).convert("RGB")
    # same preprocessing as for the training and test data
    data_transforms = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.5]*3, [0.5]*3)
    ])

    # transform the image and insert a batch dimension of size one at position 0
    img = data_transforms(img_array).unsqueeze(dim=0)
    # wrap the single image in a DataLoader so it reaches the model as a batch
    load = DataLoader(img)

    # inference only: evaluation mode and no gradient tracking
    model.eval()
    with torch.no_grad():
        for x in load:
            x = x.to(device)
            # raw class scores from the model
            pred = model(x)
            # the index of the highest score is the predicted class
            _, preds = torch.max(pred, 1)

            # print the predicted class index
            print(f"Class (Dog/Cat): {preds}")

            # if 1, then it's Dog, if 0, then it's Cat
            if preds[0] == 1:
                print("Prediction: Dog")
            else:
                print("Prediction: Cat")

In [28]:
if __name__ == "__main__":
    RandomImagePrediction("/content/sample_data/dog-puppy-on-garden-royalty-free-image-1586966191.jpg") # dog image

Class (Dog/Cat): tensor([1], device='cuda:0')
Prediction: Dog
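
The prediction above reports only the winning class. If a confidence value is also wanted, one option is to pass the model's raw scores through a softmax. This is a minimal sketch with made-up logits; `class_names` follows the notebook's convention that index 0 is Cat and index 1 is Dog.

```python
import torch
import torch.nn.functional as F

class_names = ["Cat", "Dog"]  # index 0 = Cat, index 1 = Dog

# Made-up raw scores for a single image (batch of one).
logits = torch.tensor([[-1.2, 2.3]])

# Softmax turns the scores into probabilities that sum to 1 per row.
probs = F.softmax(logits, dim=1)
confidence, pred = torch.max(probs, dim=1)
print(f"Prediction: {class_names[pred.item()]} ({confidence.item():.1%})")
```

Softmax probabilities from a network are not necessarily well calibrated, so the value is best read as a relative score rather than a true likelihood.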
