# Dog and Cat Detection

- Torchvision provides several pre-trained object detection models, such as Faster R-CNN and SSD.
- These models have been pre-trained on the MS COCO image dataset, which contains 91 classes such as person, bicycle and traffic light.
- We will use the Faster R-CNN model.

##### During training, the model expects the following inputs:
- Images as a list of torch tensors.
- Targets as a list of dictionaries containing the bounding box coordinates (x1, y1, x2, y2) as well as the labels.
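
To make the target layout concrete, here is a minimal pure-Python sketch of one target dictionary together with a sanity check on its box. `check_target` is a hypothetical helper written for illustration and is not part of torchvision; in the real pipeline the values are torch tensors rather than plain lists.

```python
# Hypothetical helper: validates one target in the (x1, y1, x2, y2) layout.
def check_target(target, image_width, image_height):
    """Return True if every box lies inside the image and has positive area."""
    for x1, y1, x2, y2 in target["boxes"]:
        inside = 0 <= x1 and 0 <= y1 and x2 <= image_width and y2 <= image_height
        positive_area = x2 > x1 and y2 > y1
        if not (inside and positive_area):
            return False
    # One label is expected per box.
    return len(target["labels"]) == len(target["boxes"])


# One image containing a single object of class 1.
target = {"boxes": [[48.0, 30.0, 210.0, 190.0]], "labels": [1]}
ok = check_target(target, image_width=256, image_height=256)

# Same box with x1 and x2 swapped: the width would be negative.
bad = check_target({"boxes": [[210.0, 30.0, 48.0, 190.0]], "labels": [1]},
                   image_width=256, image_height=256)
print(ok, bad)
```

A check like this can be run over the annotations before training to catch swapped or out-of-range coordinates early.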

##### During inference, the model takes a list of tensors as input, and returns a list of dictionaries with the following keys:

- boxes: the predicted bounding boxes of any detected objects,
- labels: the predicted object labels,
- scores: the prediction confidence scores, which range from 0 to 1.

#### Outline

1. download and prepare the training images and annotations from Kaggle,
2. build the Datasets and Dataloaders required for inputting data into a torchvision model,
3. download a pre-trained Faster R-CNN model, and modify it to detect only dogs and cats out of the original 91 classes,
4. perform transfer learning on the model using the downloaded dataset.

In [1]:
import os
import time
import numpy as np
import torch
import torchvision

In [4]:
import matplotlib.pyplot as plt
import matplotlib.patches as patches

import PIL

https://pypi.org/project/torch-snippets/
- torch-snippets does a lot of default importing for you.
- It bundles common imports such as numpy, pandas and matplotlib together with its own utility functions.

In [5]:
!pip install torch-snippets

Collecting torch-snippets
  Downloading torch_snippets-0.499.9-py3-none-any.whl (54 kB)
Collecting typing
  Downloading typing-3.7.4.3.tar.gz (78 kB)
  Preparing metadata (setup.py) ... done
Collecting xmltodict
  Downloading xmltodict-0.13.0-py2.py3-none-any.whl (10.0 kB)
Collecting loguru
  Downloading loguru-0.6.0-py3-none-any.whl (58 kB)
Building wheels for collected packages: typing
  Building wheel for typing (setup.py) ... done
  Created wheel for typing: filename=typing-3.7.4.3-py3-none-any.whl size=26325 sha256=aa0304199d2f8dedc5563fff117509e71579ba850bf33df35d45b4c504e8372d
  Stored in directory: /r

In [6]:
torch.cuda.current_device()

0

In [7]:
torch.cuda.device(0)

<torch.cuda.device at 0x7facdf9cd050>

In [8]:
torch.cuda.device_count()

2

In [9]:
torch.cuda.get_device_name(0)

'Tesla T4'

In [10]:
torch.cuda.is_available()

True

In [12]:
# setting device on GPU if available, else CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)
print()

#Additional Info when using cuda
if device.type == 'cuda':
    print(torch.cuda.get_device_name(0))
    print('Memory Usage:')
    print('Allocated:', round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
    print('Cached:   ', round(torch.cuda.memory_reserved(0)/1024**3,1), 'GB')

Using device: cuda

Tesla T4
Memory Usage:
Allocated: 0.0 GB
Cached:    0.0 GB


In [13]:
!nvidia-smi

Wed Dec  7 17:39:45 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   33C    P8     8W /  70W |      3MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:00:05.0 Off |                    0 |
| N/A   37C    P8     9W /  70W |      3MiB / 15109MiB |      0%      Default |
|       

To download and use Kaggle data within Google Colab:
1. Go to your Kaggle account, scroll to the API section and click Expire API Token to remove any previous tokens.
2. Click Create New API Token. This downloads a kaggle.json file to your machine.
3. Go to your Google Colab notebook and run the following commands:

In [None]:
!pip install -q kaggle

In [None]:
from google.colab import files
files.upload()         # expire any previous token(s) and upload recreated token

The next cell removes any existing .kaggle directory, creates a new one, copies the uploaded token into it and restricts the file permissions.

In [None]:
# Remove any existing .kaggle directory and its contents.
!rm -rf ~/.kaggle

# Create a fresh .kaggle directory and copy the uploaded kaggle.json into it.
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/

# Restrict the permissions of the token file.
!chmod 600 ~/.kaggle/kaggle.json

# Check that everything is okay by listing the available datasets.
!kaggle datasets list

In [None]:
!kaggle datasets download 'andrewmvd/dog-and-cat-detection'
!unzip -q dog-and-cat-detection.zip

In [14]:
ROOT = "/kaggle/input/dog-and-cat-detection/"

##### Decoding the Downloaded Annotation Files
- We have two folders:
  - images: contains the .png images,
  - annotations: contains the .xml annotation files.
- Annotation files come in many formats; here they are .xml files. Let's define a function to extract the bounding boxes and labels from them.

#### xml_to_dict(xml_path)
- uses xml.etree.ElementTree to decode the .xml annotation files,
- and returns the following as a dictionary:
  - image size,
  - object label,
  - object bounding box coordinates.
- Only the first object in each annotation file is read.
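
To illustrate the kind of file being decoded, here is a small sketch that parses a made-up annotation string. The XML content is invented for illustration, but its layout follows the element paths used in this notebook (`./size/width`, `./object/name`, `./object/bndbox/xmin` and so on). `parse_objects` is a hypothetical helper that uses `findall` to collect every object, whereas `find` returns only the first match.

```python
import xml.etree.ElementTree as ET

# Made-up annotation for illustration; real files live in the annotations folder.
SAMPLE_XML = """
<annotation>
    <filename>example.png</filename>
    <size><width>500</width><height>375</height><depth>3</depth></size>
    <object>
        <name>cat</name>
        <bndbox><xmin>100</xmin><ymin>50</ymin><xmax>300</xmax><ymax>250</ymax></bndbox>
    </object>
    <object>
        <name>dog</name>
        <bndbox><xmin>320</xmin><ymin>80</ymin><xmax>480</xmax><ymax>360</ymax></bndbox>
    </object>
</annotation>
"""


def parse_objects(xml_text):
    """Return a list of (label, [x1, y1, x2, y2]) for every <object> element."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("./object"):
        box = [int(obj.find("./bndbox/" + tag).text)
               for tag in ("xmin", "ymin", "xmax", "ymax")]
        objects.append((obj.find("./name").text, box))
    return objects


objects = parse_objects(SAMPLE_XML)
print(objects)
```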

In [15]:
import xml.etree.ElementTree as ET

In [16]:
def xml_to_dict(xml_path):
    # Decode the .xml file
    tree = ET.parse(xml_path)
    root = tree.getroot()
    # Return the image size, object label and bounding box 
    # coordinates together with the filename as a dict.
    return {"filename": xml_path,
            "image_width": int(root.find("./size/width").text),
            "image_height": int(root.find("./size/height").text),
            "image_channels": int(root.find("./size/depth").text),
            "label": root.find("./object/name").text,
            "x1": int(root.find("./object/bndbox/xmin").text),
            "y1": int(root.find("./object/bndbox/ymin").text),
            "x2": int(root.find("./object/bndbox/xmax").text),
            "y2": int(root.find("./object/bndbox/ymax").text)}

##### Preparing the Datasets
- Let's first create a CatDogDataset class 
- it inherits from torch.utils.data.Dataset 
- and loads the downloaded images and annotations into Python. 

##### CatDogDataset() 
- takes 2 arguments: 
  - root: a string containing the path to the data directory, and
  - transforms: which contains torchvision image transformations.

In [17]:
# Convert human readable str label to int.
label_dict = {"dog": 1, "cat" : 2}
# Convert label int to human readable str.
reverse_label_dict = {1: "dog", 2: "cat"}

In [18]:
class CatDogDataset(torch.utils.data.Dataset):
    def __init__(self, root, transforms = None):
        """
        Inputs
            root: str
                Path to the data folder.
            transforms: Compose or list
                Torchvision image transformations.
        """
        self.root = root
        self.transforms = transforms
        # List the image files under root and strip their extensions.
        self.files = sorted(os.listdir(os.path.join(root, "images")))
        for i in range(len(self.files)):
            self.files[i] = self.files[i].split(".")[0]
        self.label_dict = label_dict
    def __getitem__(self, i):
        # Load image from the hard disc.
        img = PIL.Image.open(os.path.join(self.root, 
              "images/" + self.files[i] + ".png")).convert("RGB")
        # Load annotation file from the hard disc.
        ann = xml_to_dict(os.path.join(self.root, 
              "annotations/" + self.files[i] + ".xml"))
        # The target is given as a dict.
        target = {}
        target["boxes"] = torch.as_tensor([[ann["x1"], 
                                            ann["y1"], 
                                            ann["x2"], 
                                            ann["y2"]]], 
                                   dtype = torch.float32)
        target["labels"]=torch.as_tensor([label_dict[ann["label"]]],
                         dtype = torch.int64)
        target["image_id"] = torch.as_tensor(i)
        # Apply any transforms to the data if required.
        if self.transforms is not None:
            img, target = self.transforms(img, target)
        return img, target
    def __len__(self):
        return len(self.files)

## Image Transformations
[https://pytorch.org/vision/main/transforms.html]

Transforms are common image transformations available in the torchvision.transforms module. They can be chained together using Compose. Most transform classes have a function equivalent: functional transforms give fine-grained control over the transformations. This is useful if you have to build a more complex transformation pipeline (e.g. in the case of segmentation tasks).

Most transformations accept both PIL images and tensor images, although some transformations are PIL-only and some are tensor-only. The Conversion Transforms may be used to convert to and from PIL images.

The transformations that accept tensor images also accept batches of tensor images. A Tensor Image is a tensor with (C, H, W) shape, where C is a number of channels, H and W are image height and width. A batch of Tensor Images is a tensor of (B, C, H, W) shape, where B is a number of images in the batch.

The expected range of the values of a tensor image is implicitly defined by the tensor dtype. Tensor images with a float dtype are expected to have values in [0, 1). Tensor images with an integer dtype are expected to have values in [0, MAX_DTYPE] where MAX_DTYPE is the largest value that can be represented in that dtype.

Randomized transformations will apply the same transformation to all the images of a given batch, but they will produce different transformations across calls. For reproducible transformations across calls, you may use functional transforms.

##### Image Transformations
- Image transformations augment the training images to make the model more robust to noise or distortions.
- Torchvision contains several such transformations, which can be composed together into a sequence using Compose. 
- Compose also allows for the composed transformations to be applied to images directly.

In [19]:
import torchvision.transforms.functional as F
import torchvision.transforms.transforms as T

In [20]:
class Compose:
    """
    Composes several torchvision image transforms 
    as a sequence of transformations.
    Inputs
        transforms: list
            List of torchvision image transformations.
    Returns
        image: tensor
        target: dict
    """
    def __init__(self, transforms = None):
        # Avoid sharing a mutable default argument between instances.
        self.transforms = transforms if transforms is not None else []
    # __call__ sequentially performs the image transformations on
    # the input image, and returns the augmented image.
    def __call__(self, image, target):
        for t in self.transforms:
            image, target = t(image, target)
        return image, target

- We use two transformations:
  - ToTensor converts a PIL image into a torch tensor,
  - RandomHorizontalFlip randomly flips an image horizontally.
- ToTensor is applied to all images in order to convert them into a format which can be input into the model, while RandomHorizontalFlip is applied only to the training set.

In [21]:
class ToTensor(torch.nn.Module):
    """
    Converts a PIL image into a torch tensor.
    Inputs
        image: PIL Image
        target: dict
    Returns
        image: tensor
        target: dict
    """
    def forward(self, image, target = None):
        image = F.pil_to_tensor(image)
        image = F.convert_image_dtype(image)
        return image, target

class RandomHorizontalFlip(T.RandomHorizontalFlip):
    """
    Randomly flips an image horizontally.
    Inputs
        image: tensor
        target: dict
    Returns
        image: tensor
        target: dict
    """
    def forward(self, image, target = None):
        if torch.rand(1) < self.p:
            image = F.hflip(image)
            if target is not None:
                width, _ = F.get_image_size(image)
                target["boxes"][:, [0, 2]] = width - \
                                     target["boxes"][:, [2, 0]]
        return image, target

# get_transform is a helper function which composes several image
# transformations together for CatDogDataset using Compose.
# Other transformations such as Gaussian blurring, image resizing or
# grayscale conversion can be added to get_transform later on.
def get_transform(train):
    """
    Transforms a PIL Image into a torch tensor, and performs
    random horizontal flipping of the image if training a model.
    Inputs
        train: bool
            Flag indicating whether model training will occur.
    Returns
        compose: Compose
            Composition of image transforms.
    """
    transforms = []
    # ToTensor is applied to all images.
    transforms.append(ToTensor())
    # The following transforms are applied only to the train set.
    if train:
        transforms.append(RandomHorizontalFlip(0.5))
        # Other transforms can be added here later on.
    return Compose(transforms)
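
When an image is flipped horizontally, its bounding boxes have to be mirrored as well. The sketch below shows the same arithmetic as the flip transform on a single box in plain Python; `hflip_box` is a hypothetical helper used only for illustration.

```python
def hflip_box(box, image_width):
    """Mirror an (x1, y1, x2, y2) box about the vertical centre line."""
    x1, y1, x2, y2 = box
    # The old right edge becomes the new left edge, and vice versa,
    # so the result still satisfies x1 < x2.
    return [image_width - x2, y1, image_width - x1, y2]


flipped = hflip_box([100, 50, 300, 250], image_width=500)
print(flipped)  # [200, 50, 400, 250]

# Flipping twice gives back the original box.
restored = hflip_box(flipped, image_width=500)
print(restored)  # [100, 50, 300, 250]
```

Note that swapping the two x coordinates matters: subtracting each coordinate from the width without swapping would produce a box whose left edge lies to the right of its right edge.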

##### Train-Validation-Test Split
- load and shuffle the data, and split the entire dataset into train, validation and test splits.

- First of all, we load the data using CatDogDataset.

In [25]:
os.getcwd()

'/kaggle/working'

In [26]:
os.chdir(ROOT)
os.getcwd()

'/kaggle/input/dog-and-cat-detection'

In [27]:
os.listdir()

['annotations', 'images']

In [28]:
# Train dataset | Set train = True to apply the training image transforms.
train_ds = CatDogDataset("./", get_transform(train = True))

# Validation dataset.
val_ds = CatDogDataset("./", get_transform(train = False))

# Test dataset.
test_ds = CatDogDataset("./", get_transform(train = False))

- we randomly shuffle the data and split the data into the train-validation-test splits. 
- 80/20 train and test splits, 
- further split the train split into 80/20 train and validation sub-splits. 
- This results in a 64/16/20 split.

In [29]:
# shuffle
indices = torch.randperm(len(train_ds)).tolist()

# Train dataset: 64% of the entire data
train_ds = torch.utils.data.Subset(train_ds,
           indices[:int(len(indices) * 0.64)])

# Validation dataset: 16% of the entire data
val_ds = torch.utils.data.Subset(val_ds, 
         indices[int(len(indices) * 0.64):int(len(indices) * 0.8)])

# Test dataset: 20% of the entire data.
test_ds = torch.utils.data.Subset(test_ds, 
          indices[int(len(indices) * 0.8):])
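
The 64/16/20 figures follow from splitting 80/20 twice: 0.8 × 0.8 = 0.64 for training, 0.8 × 0.2 = 0.16 for validation, and the remaining 0.2 for testing. A minimal sketch of the same index arithmetic in plain Python, with a hypothetical dataset size of 1000 and a hypothetical helper `split_indices`:

```python
import random


def split_indices(n, seed=0):
    """Shuffle range(n) and cut it at 64% and 80%."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    train_end = int(n * 0.64)
    val_end = int(n * 0.8)
    return indices[:train_end], indices[train_end:val_end], indices[val_end:]


train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 640 160 200
```

Because all three subsets are sliced from one shuffled index list, no sample can appear in more than one split.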

##### Feed the data into the torchvision models
- create DataLoaders to feed the data into the torchvision models.
- We use a batch size of 16 images per batch. 
- Depending on your available GPU memory, you might need to change the batch size: less memory means smaller batch sizes!

In [30]:
BATCH_SIZE = 16

### TORCH.UTILS.DATA
At the heart of PyTorch data loading utility is the torch.utils.data.DataLoader class. It represents a Python iterable over a dataset, with support for
  - map-style and iterable-style datasets,
  - customizing data loading order,
  - automatic batching,
  - single- and multi-process data loading,
  - automatic memory pinning.

These options are configured by the constructor arguments of a DataLoader, which has signature:

[https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader]


DataLoader(dataset, batch_size=1, shuffle=False, sampler=None,
           batch_sampler=None, num_workers=0, collate_fn=None,
           pin_memory=False, drop_last=False, timeout=0,
           worker_init_fn=None, *, prefetch_factor=2,
           persistent_workers=False)

Data loader. Combines a dataset and a sampler, and provides an iterable over the given dataset.

The DataLoader supports both map-style and iterable-style datasets with single- or multi-process loading, customizing loading order and optional automatic batching (collation) and memory pinning.

### Dataset Types
The most important argument of DataLoader constructor is dataset, which indicates a dataset object to load data from. PyTorch supports two different types of datasets:
  - map-style datasets,
  - iterable-style datasets.

##### Map-style datasets
A map-style dataset is one that implements the __getitem__() and __len__() protocols, and represents a map from (possibly non-integral) indices/keys to data samples.

For example, such a dataset, when accessed with dataset[idx], could read the idx-th image and its corresponding label from a folder on the disk.


##### Iterable-style datasets
An iterable-style dataset is an instance of a subclass of IterableDataset that implements the __iter__() protocol, and represents an iterable over data samples. This type of datasets is particularly suitable for cases where random reads are expensive or even improbable, and where the batch size depends on the fetched data.

For example, such a dataset, when called iter(dataset), could return a stream of data reading from a database, a remote server, or even logs generated in real time.
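
The difference between the two styles can be sketched with plain Python classes that only mimic the protocols. These classes are hypothetical and do not subclass anything from PyTorch; real datasets would derive from `torch.utils.data.Dataset` or `torch.utils.data.IterableDataset`.

```python
class SquaresMapStyle:
    """Map-style: random access via __getitem__ plus a known __len__."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        if not 0 <= idx < self.n:
            raise IndexError(idx)
        return idx * idx


class SquaresIterableStyle:
    """Iterable-style: samples are produced sequentially via __iter__."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        for i in range(self.n):
            yield i * i


map_ds = SquaresMapStyle(5)
iter_ds = SquaresIterableStyle(5)
print(map_ds[3], len(map_ds))  # 9 5
print(list(iter_ds))           # [0, 1, 4, 9, 16]
```

CatDogDataset in this notebook is a map-style dataset: it implements `__getitem__` and `__len__`, which is what allows shuffling and index-based subsets.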

In [31]:
# Collate image-target pairs into a tuple.
def collate_fn(batch):
    return tuple(zip(*batch))
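
Detection images usually differ in size, so they cannot simply be stacked into one batch tensor; this collate function keeps them as tuples instead. A small sketch of what it does, using strings as stand-ins for image tensors and plain dicts as stand-ins for targets:

```python
def collate_fn(batch):
    # Turn a list of (image, target) pairs into (images, targets).
    return tuple(zip(*batch))


# Placeholder samples: strings stand in for image tensors.
batch = [("img0", {"labels": [1]}),
         ("img1", {"labels": [2]}),
         ("img2", {"labels": [1]})]

images, targets = collate_fn(batch)
print(images)   # ('img0', 'img1', 'img2')
print(targets)  # ({'labels': [1]}, {'labels': [2]}, {'labels': [1]})
```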

In [32]:
# Create the DataLoaders from the Datasets. 
train_dl = torch.utils.data.DataLoader(train_ds, 
                                 batch_size = BATCH_SIZE, 
                                 shuffle = True, 
                        collate_fn = collate_fn)

val_dl = torch.utils.data.DataLoader(val_ds, 
                             batch_size = BATCH_SIZE, 
                            shuffle = False, 
                    collate_fn = collate_fn)

test_dl = torch.utils.data.DataLoader(test_ds, 
                               batch_size = BATCH_SIZE, 
                              shuffle = False, 
                      collate_fn = collate_fn)

#### Download the Object Detection Model
- The original Faster R-CNN model locates 91 classes, so we have to modify the output layer of the model in order to focus on cats and dogs only.
- Cats and dogs comprise 2 different classes. Additionally, the model also makes predictions for the background, which is set to class 0 by default. Therefore we have to modify the model to locate 3 classes.

In [33]:
NUM_CLASSES = 3
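
The same count can be derived from the label mapping rather than hard-coded. This is only a small sketch of the arithmetic, using the same mapping as `label_dict`:

```python
label_dict = {"dog": 1, "cat": 2}

# Index 0 is reserved for the background, so no real class may use it.
assert 0 not in label_dict.values()

# Number of foreground classes plus one background class.
num_classes = len(label_dict) + 1
print(num_classes)  # 3
```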

The torchvision.models subpackage contains definitions of models for addressing different tasks, including: image classification, pixelwise semantic segmentation, object detection, instance segmentation, person keypoint detection, video classification, and optical flow.

The model used here is the Faster R-CNN model with a ResNet-50-FPN backbone from the [Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](https://arxiv.org/abs/1506.01497) paper. [https://pytorch.org/vision/main/models/faster_rcnn.html]

In [34]:
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

In [36]:
def get_object_detection_model(num_classes = NUM_CLASSES, 
                               feature_extraction = True):
    """
    Inputs
        num_classes: int
            Number of classes to predict. includes the 
            background which is class 0 by definition
        feature_extraction: bool
            Flag indicating whether to freeze the pre-trained 
            weights. If set to True the pre-trained weights will be  
            frozen and not be updated during training.
    Returns
        model: FasterRCNN
    """
    # Load the pretrained Faster R-CNN model.
    # Note: torchvision 0.13 and later replace `pretrained` with a `weights` argument.
    model = fasterrcnn_resnet50_fpn(pretrained = True)

    # If True, the pre-trained weights will be frozen.
    if feature_extraction == True:
        for p in model.parameters():
            p.requires_grad = False
    
    # Replace the original 91 class top layer with a new layer tailored for num_classes.
    
    # get number of input features for the classifier
    in_feats = model.roi_heads.box_predictor.cls_score.in_features
    
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_feats,
                                                   num_classes)
    return model

##### Model Training Helper Functions
- unbatching data from the Dataloaders
- performing model training using back propagation.

In [37]:
def unbatch(batch, device):
    """
    Unbatches a batch of data from the Dataloader.
    Inputs
        batch: tuple
            Tuple containing a batch from the Dataloader.
        device: str
            Indicates which device (CPU/GPU) to use.
    Returns
        X: list
            List of images.
        y: list
            List of dictionaries.
    """
    X, y = batch
    X = [x.to(device) for x in X]
    y = [{k: v.to(device) for k, v in t.items()} for t in y]
    return X, y

def train_batch(batch, model, optimizer, device):
    """
    Uses back propagation to train a model.
    Inputs
        batch: tuple
            Tuple containing a batch from the Dataloader.
        model: torch model
        optimizer: torch optimizer
        device: str
            Indicates which device (CPU/GPU) to use.
    Returns
        loss: float
            Sum of the batch losses.
        losses: dict
            Dictionary containing the individual losses.
    """
    model.train()
    X, y = unbatch(batch, device = device)
    optimizer.zero_grad()
    losses = model(X, y)
    loss = sum(loss for loss in losses.values())
    loss.backward()
    optimizer.step()
    return loss, losses

@torch.no_grad()
def validate_batch(batch, model, optimizer, device):
    """
    Evaluates a model's loss value using validation data.
    Inputs
        batch: tuple
            Tuple containing a batch from the Dataloader.
        model: torch model
        optimizer: torch optimizer
        device: str
            Indicates which device (CPU/GPU) to use.
    Returns
        loss: float
            Sum of the batch losses.
        losses: dict
            Dictionary containing the individual losses.
    """
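    # Faster R-CNN only returns its losses in train mode, so train mode is
    # kept here; gradient tracking is disabled by the torch.no_grad() decorator.
    model.train()
    X, y = unbatch(batch, device = device)
    optimizer.zero_grad()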
    losses = model(X, y)
    loss = sum(loss for loss in losses.values())
    return loss, losses

- Finally, we create a main driver function to actually train the downloaded Faster R-CNN model using the helper functions above.
- As PyTorch does not natively include a training history recorder, we use Report from the torch_snippets package to record our train and validation losses.

In [38]:
def train_fasterrcnn(model, 
                 optimizer, 
                  n_epochs, 
              train_loader, 
        test_loader = None, 
                log = None, 
               keys = None, 
            device = "cpu"):
    """
    Trains a FasterRCNN model using train and validation 
    Dataloaders over n_epochs. 
    Returns a Report on the training and validation losses.
    Inputs
        model: FasterRCNN
        optimizer: torch optimizer
        n_epochs: int
            Number of epochs to train.
        train_loader: DataLoader
        test_loader: DataLoader
        log: Report
            torch_snippets Report to record training progress.
        keys: list
            List of strs containing the FasterRCNN loss names.
        device: str
            Indicates which device (CPU/GPU) to use.
    Returns
        log: Report
            torch_snippets Report containing the training records.
    """
    if log is None:
        log = Report(n_epochs)
    if keys is None:
        # FasterRCNN loss names.
        keys = ["loss_classifier", 
                   "loss_box_reg", 
                "loss_objectness", 
               "loss_rpn_box_reg"]
    
    model.to(device)
    
    for epoch in range(n_epochs):
        N = len(train_loader)
        for ix, batch in enumerate(train_loader):
            loss, losses = train_batch(batch, model, 
                                  optimizer, device)
            # Record the current train loss.
            pos = epoch + (ix + 1) / N
            log.record(pos = pos, trn_loss = loss.item(), 
                       end = "\r")
        if test_loader is not None:
            N = len(test_loader)
            for ix, batch in enumerate(test_loader):
                loss, losses = validate_batch(batch, model, 
                                         optimizer, device)
                
                # Record the current validation loss.
                pos = epoch + (ix + 1) / N
                log.record(pos = pos, val_loss = loss.item(), 
                           end = "\r")
    
        # Report the average losses at the end of each epoch.
        log.report_avgs(epoch + 1)
    return log
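
The training loop records each loss at a fractional epoch position, `epoch + (ix + 1) / N`, so that batches within an epoch are spread evenly along the x-axis of the loss history. A minimal sketch of that arithmetic, with a hypothetical helper `epoch_position` and 4 batches per epoch:

```python
def epoch_position(epoch, batch_index, n_batches):
    """Fractional epoch reached after finishing the given batch."""
    return epoch + (batch_index + 1) / n_batches


# Two epochs of four batches each.
positions = [epoch_position(e, ix, 4) for e in range(2) for ix in range(4)]
print(positions)  # [0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]
```

The last batch of an epoch lands exactly on the next integer, which marks the end of that epoch.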

#### Training the Faster R-CNN Model
Now that we have everything ready, we can finally start training the Faster R-CNN model! We use the stochastic gradient descent optimizer. EPOCHS is set to 15 below, although the training call further down runs for only 1 epoch.

In [39]:
EPOCHS = 15
LEARNING_RATE = 0.001
MOMENTUM = 0.9
WEIGHT_DECAY = 0.0005

In [41]:
#create model
model = get_object_detection_model(num_classes = NUM_CLASSES,   
                        feature_extraction = False)

Downloading: "https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth" to /root/.cache/torch/hub/checkpoints/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth


  0%|          | 0.00/160M [00:00<?, ?B/s]

In [42]:
# Use the stochastic gradient descent optimizer.
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, 
                        lr = LEARNING_RATE, 
                    momentum = MOMENTUM, 
             weight_decay = WEIGHT_DECAY)

In [43]:
from torch_snippets import Report

In [59]:
del train_ds
del val_ds
del test_ds

In [61]:
torch.cuda.empty_cache()

##### TORCH.CUDA.MEM_GET_INFO
- Read: [https://pytorch.org/docs/stable/generated/torch.cuda.mem_get_info.html#torch.cuda.mem_get_info]
- Returns the global free and total GPU memory for a given device, in bytes, using cudaMemGetInfo.

In [65]:
torch.cuda.mem_get_info(device=torch.device(torch.cuda.current_device()))

(148635648, 15843721216)
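
The two values are byte counts, which are easier to read once converted. A small sketch that converts the numbers shown above to GiB; `bytes_to_gib` is a hypothetical helper:

```python
def bytes_to_gib(n_bytes):
    """Convert a byte count to gibibytes (1 GiB = 1024**3 bytes)."""
    return n_bytes / 1024**3


# (free, total) as returned in the output above.
free_bytes, total_bytes = 148635648, 15843721216

free_gib = bytes_to_gib(free_bytes)
total_gib = bytes_to_gib(total_bytes)
used_fraction = 1 - free_bytes / total_bytes

print(f"free: {free_gib:.2f} GiB, total: {total_gib:.2f} GiB, "
      f"used: {used_fraction:.1%}")
```

In other words, only about 0.14 GiB of roughly 14.76 GiB was free at this point, so almost all of the memory on GPU 0 was already in use.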

In [66]:
!nvidia-smi

Wed Dec  7 18:30:08 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   59C    P0    26W /  70W |  14968MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:00:05.0 Off |                    0 |
| N/A   37C    P8     9W /  70W |      3MiB / 15109MiB |      0%      Default |
|       

##### Hugging Face Accelerate
Accelerate is a lightweight framework designed for 4 things:

1. Consolidating your code to be run on CPU, TPU, and multi-GPU,
2. A simple CLI launcher to do the above without having to remember complex commands,
3. A simple notebook launcher to do the above while inside a Jupyter Notebook,
4. An interface to let you perform inference on huge models with small compute.

The third item is what we focus on in this notebook.

- Kaggle comes with accelerate preinstalled.
- Upgrade it through GitHub to enable the notebook_launcher (or through PyPI, once it is released there and if Kaggle hasn't updated it yet).

###### notebook_launcher
The notebook_launcher is a small function that can launch your code from a notebook on TPUs or multiple GPUs.

To use it, you pass in the function, its arguments as a tuple, and the number of processes to train on.

See the Accelerate basic tutorial to learn more.

In [72]:
# Train the model for 1 epoch (EPOCHS above is set to 15).
log = train_fasterrcnn(model = model, 
               optimizer = optimizer, 
                        n_epochs = 1,
             train_loader = train_dl, 
                test_loader = val_dl,
             log = None, keys = None,
                     device = device)

RuntimeError: CUDA out of memory. Tried to allocate 1.45 GiB (GPU 0; 14.76 GiB total capacity; 12.92 GiB already allocated; 441.75 MiB free; 13.41 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

### Model Predictions
- Now that we have a Faster R-CNN model trained specifically to detect cats and dogs, we will make some predictions using images from the test set.

###### Helper functions
- predict()
- predict_batch()

In [None]:
@torch.no_grad()  # disable gradient tracking when testing the model
def predict_batch(batch, model, device):
    """
    Gets the predictions for a batch of data.
    Inputs
        batch: tuple
            Tuple containing a batch from the Dataloader.
        model: torch model
        device: str
            Indicates which device (CPU/GPU) to use.
    Returns
        images: list
            List of tensors of the images.
        predictions: list
            List of dicts containing the predictions for the 
            bounding boxes, labels and confidence scores.
    """
    model.to(device)
    model.eval()
    X, _ = unbatch(batch, device = device)
    predictions = model(X)
    return [x.cpu() for x in X], predictions

def predict(model, data_loader, device = "cpu"): 
    """
    Gets the predictions for every batch in a Dataloader.
    Inputs
        model: torch model
        data_loader: torch Dataloader
        device: str
            Indicates which device (CPU/GPU) to use.
    Returns
        images: list
            List of tensors of the images.
        predictions: list
            List of dicts containing the predictions for the 
            bounding boxes, labels and confidence scores.
    """
    images = []
    predictions = []
    for i, batch in enumerate(data_loader):
        X, p = predict_batch(batch, model, device)
        images = images + X
        predictions = predictions + p
    
    return images, predictions

The model outputs predictions as a list of dictionaries with the following keys:
- boxes: bounding boxes of any detected objects,
- scores: model prediction confidence scores,
- labels: labels of any detected objects.


###### decode_prediction()
- decode_prediction decodes this dictionary and filters out any predictions with a score lower than score_threshold.
- Additionally, non-maximum suppression (NMS) removes overlapping bounding boxes: of any two boxes whose IoU exceeds nms_iou_threshold, only the higher-scoring one is kept.

In [None]:
def decode_prediction(prediction, 
                      score_threshold = 0.8, 
                      nms_iou_threshold = 0.2):
    """
    Inputs
        prediction: dict
        score_threshold: float
        nms_iou_threshold: float
    Returns
        prediction: tuple
    """
    boxes = prediction["boxes"]
    scores = prediction["scores"]
    labels = prediction["labels"]

    # Remove any low-score predictions.
    if score_threshold is not None:
        want = scores > score_threshold
        boxes = boxes[want]
        scores = scores[want]
        labels = labels[want]
    # Remove any overlapping bounding boxes using NMS.
    
    if nms_iou_threshold is not None:
        want = torchvision.ops.nms(boxes = boxes, scores = scores, 
                                iou_threshold = nms_iou_threshold)
        boxes = boxes[want]
        scores = scores[want]
        labels = labels[want]
    return (boxes.cpu().numpy(), 
            labels.cpu().numpy(), 
            scores.cpu().numpy())
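
To show what score filtering followed by non-maximum suppression does, here is a plain-Python sketch of greedy NMS on a few made-up boxes. The helpers `iou` and `nms` are simplified illustrations written for this example, not the torchvision implementation.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    its IoU with every already kept box is at most iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Made-up predictions: two overlapping boxes, one separate box, one weak box.
boxes = [[10, 10, 110, 110], [20, 20, 120, 120],
         [200, 200, 300, 300], [0, 0, 50, 50]]
scores = [0.95, 0.90, 0.85, 0.30]

# Step 1: drop low-score predictions.
confident = [i for i, s in enumerate(scores) if s > 0.8]

# Step 2: suppress overlapping boxes among the remaining ones.
kept = nms([boxes[i] for i in confident],
           [scores[i] for i in confident], iou_threshold=0.2)
final = [confident[k] for k in kept]
print(final)  # [0, 2]
```

Box 3 is removed by the score threshold, and box 1 is suppressed because its IoU with the higher-scoring box 0 is about 0.68, well above the 0.2 threshold.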

With these functions prepared, the model’s predictions on the test dataset can be obtained together with the images used.

In [None]:
images, predictions = predict(model, test_dl, device)

The predicted bounding box, labels and scores can be displayed together with the images to visualize the model’s outputs.

In [None]:
img_index = 0
boxes, labels, scores = decode_prediction(predictions[img_index])
fig, ax = plt.subplots(figsize = [5, 5])
ax.imshow(images[img_index].permute(1, 2, 0).numpy())

for i, b in enumerate(boxes):
    rect = patches.Rectangle(b[:2].astype(int),
                             (b[2] - b[0]).astype(int),
                             (b[3] - b[1]).astype(int),
                             linewidth = 1,
                             edgecolor = "r",
                             facecolor = "none")
    ax.add_patch(rect)
    ax.text(b[0].astype(int),
            b[1].astype(int) - 5,
            "{} : {:.3f}".format(reverse_label_dict[labels[i]],
            scores[i]), color = "r")
plt.show()