
# Multiclass Classification

**Summary:** In this lesson, we'll work with the full wildlife dataset, which has eight classes. This is more than the network in the previous notebook can handle. Here we'll build and train a more complicated neural network, called a Convolutional Neural Network, that is meant for working with images. We'll use this network to get the predictions we need for the competition at [DrivenData.org](https://www.drivendata.org/competitions/87/competition-image-classification-wildlife-conservation/).

**Objectives:**

* Read in data with multiple classes
* Normalize our data to improve performance
* Create a Convolutional Neural Network that works well with images
* Train that network to do multiclass classification
* Reformat the network predictions to complete the competition

**New Terms:**

* Multiclass
* Normalize
* Convolution
* Max pooling

# The Competition

As a reminder, the data we're working with came from a competition at [DrivenData.org](https://www.drivendata.org/competitions/87/competition-image-classification-wildlife-conservation/). The goal of the competition is to build a model that takes an image and classifies what animal is in it. There are seven animals, plus a 'blank' where no animal is present in the image.

So far, we have

* Read in image data
* Loaded that data in PyTorch
* Built a neural network
* Used that neural network to do binary classification

We're almost ready for the competition now. We need to expand our network to handle all eight categories. We could do this with the simple network we've already built, but it would perform poorly. Instead, we're going to build a more complex network that's meant for working with images. This is called a Convolutional Neural Network, and involves arranging the neurons in a different pattern.

Once we have built and trained this network, its output will be what the competition is looking for. We'll get our predictions and save them into the requested format.

# Getting Started

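First, we'll import the packages we'll need in this notebook. If `torchinfo` isn't installed in your environment, run the `pip install torchinfo` cell below before running the imports.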

In [3]:
import os
import sys
from collections import Counter

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import PIL
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
from torch.utils.data import DataLoader, random_split
from torchinfo import summary
from torchvision import datasets, transforms
from tqdm.notebook import tqdm

torch.backends.cudnn.deterministic = True

In [2]:
pip install torchinfo

Collecting torchinfo
  Downloading torchinfo-1.8.0-py3-none-any.whl.metadata (21 kB)
Downloading torchinfo-1.8.0-py3-none-any.whl (23 kB)
Installing collected packages: torchinfo
Successfully installed torchinfo-1.8.0


Let's print out the versions of our packages again. If we come back to this later, we'll know what we used.

In [5]:
print("Platform:", sys.platform)
print("Python version:", sys.version)
print("---")
print("matplotlib version:", matplotlib.__version__)
print("pandas version:", pd.__version__)
print("PIL version:", PIL.__version__)
print("torch version:", torch.__version__)
print("torchvision version:", torchvision.__version__)



Platform: linux
Python version: 3.11.11 (main, Dec  4 2024, 08:55:07) [GCC 11.4.0]
---
matplotlib version: 3.10.0
pandas version: 2.2.2
PIL version: 11.1.0
torch version: 2.5.1+cu124
torchvision version: 0.20.1+cu124


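Ideally we're running on a GPU, in which case the device will be `cuda`. If no GPU is available, the code below falls back to `mps` (Apple's Metal backend) or `cpu`.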

In [6]:
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

print(f"Using {device} device.")

Using cpu device.


# Reading files

We'll need to read in our data. Since we'll be using images once again, we'll need to convert them to something our network can understand. To start with, we'll use the same set of transformations we used in the previous notebook.

These transformations are

* Convert any grayscale images to RGB format with a custom class
* Resize the images, so that they're all the same size (we chose 224 x 224, but other sizes would work as well)
* Convert the image to a Tensor of pixel values

This should result in each image becoming a Tensor of size 3 x 224 x 224. We'll check this once we read in the data.

In [7]:
class ConvertToRGB:
    def __call__(self, img):
        if img.mode != "RGB":
            img = img.convert("RGB")
        return img

In [8]:
transform = transforms.Compose(
    [
        ConvertToRGB(),
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ]
)

In the previous notebook, we were working with only two categories. That data was in the `data_binary` subdirectory. Here we'll work with all eight categories, in the `data_multiclass` subdirectory. Let's load that data. We will follow the same pattern we used in the last notebook.

**Task 1.4.1:** Assign the path to the multi-class training data to `train_dir`. Then use the `ImageFolder` tool to open those files and apply our transforms.

In [None]:
data_dir = 'data_p1/data_multiclass/'
train_dir = os.path.join(data_dir, 'train')

print("Will read data from", train_dir)

#output
Will read data from data_p1/data_multiclass/train

In [None]:
dataset = datasets.ImageFolder(root=train_dir, transform=transform)

Now that we have our data, let's verify that we got what we wanted. We should have classes for each of the seven animals, and one 'blank' for when there wasn't an animal in the image. Additionally, the tensors we get should be of size 3 x 224 x 224.

In [None]:
print("Classes:")
print(dataset.classes)
print(f"That's {len(dataset.classes)} classes")
print()
print("Tensor shape for one image:")
print(dataset[0][0].shape)

#output
Classes:
['antelope_duiker', 'bird', 'blank', 'civet_genet', 'hog', 'leopard', 'monkey_prosimian', 'rodent']
That's 8 classes

Tensor shape for one image:
torch.Size([3, 224, 224])

In principle, we could work with the data like this. But PyTorch expects the data to be broken into batches with a `DataLoader`. Batching keeps PyTorch from trying to load all of the files into memory at once, which could crash our notebook. Instead, it loads just a few images at a time (the `batch_size`), works with them, then discards them. Since all the tools expect a `DataLoader`, we should wrap our dataset in one. The best batch size depends on our system, but something in the 20 to 100 range is usually fine. We'll pick 32.

In [None]:
batch_size = 32
dataset_loader = DataLoader(dataset, batch_size=batch_size)

# Get one batch
first_batch = next(iter(dataset_loader))

print(f"Shape of one batch: {first_batch[0].shape}")
print(f"Shape of labels: {first_batch[1].shape}")

#output
Shape of one batch: torch.Size([32, 3, 224, 224])
Shape of labels: torch.Size([32])

When we loop over this loader, it'll produce small batches of our images. This is what we want: these are the "minibatches" that will speed up our computations. In our case, each batch is 32 images, with each image 3 x 224 x 224. The loader also provides us with the labels for the correct answers. This is the information we need to train a network.
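Looping over a loader can be sketched on synthetic data. The example below uses 100 random 3 x 8 x 8 tensors as stand-ins for images (an assumption made to keep it fast), so the shapes differ from our real 3 x 224 x 224 images.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical data: 100 fake "images" and labels drawn from 8 classes.
images = torch.rand(100, 3, 8, 8)
labels = torch.randint(0, 8, (100,))

toy_loader = DataLoader(TensorDataset(images, labels), batch_size=32)

# Each iteration yields one batch of images together with its labels.
batch_sizes = []
for batch_images, batch_labels in toy_loader:
    batch_sizes.append(batch_images.shape[0])

print(len(toy_loader))  # 4
print(batch_sizes)      # [32, 32, 32, 4]
```

Because 100 isn't a multiple of 32, the final batch is smaller than the rest. The same will happen with the real dataset unless its size divides evenly by the batch size.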

# Prepare Our Data

As we were reading in the data, we already did some preparation. Our images are all the same shape, and have been converted to tensors. But neural networks tend to perform best with data that has a mean of 0 and a standard deviation of 1. Data that has that property is called *normalized*. In our case, that would be the mean and standard deviation of all of the pixels in all of the images.
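To make "normalized" concrete, here is a small sketch on a random batch. It assumes per-channel statistics and uses made-up data, so the numbers have nothing to do with the wildlife images.

```python
import torch

torch.manual_seed(0)

# Hypothetical batch: 16 fake images, 3 channels, 8 x 8 pixels, values in 0-1.
batch = torch.rand(16, 3, 8, 8)

# One mean and one standard deviation per color channel,
# computed over the batch, height, and width dimensions.
mean = batch.mean(dim=(0, 2, 3))
std = batch.std(dim=(0, 2, 3))

# Subtract the mean and divide by the standard deviation, channel by channel.
normalized = (batch - mean[None, :, None, None]) / std[None, :, None, None]

new_mean = normalized.mean(dim=(0, 2, 3))
new_std = normalized.std(dim=(0, 2, 3))
print(new_mean)  # each entry is very close to 0
print(new_std)   # each entry is very close to 1
```

After the shift and rescale, every channel has a mean of roughly 0 and a standard deviation of roughly 1, which is the property described above.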

Let's see what they are for our data. Here's a function that computes the mean and standard deviation for each color channel (red, green, and blue) separately. It takes in a DataLoader and returns the mean and standard deviation of each channel.