## Exploratory data analysis and training

This notebook walks through my exploratory data analysis (EDA) and model training with PyTorch.
The model is trained by fine-tuning a Faster R-CNN detector with PyTorch, following this tutorial: https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html

I also copied some code from this notebook to quickly familiarise myself with the dataset:
https://www.kaggle.com/mrinath/efficientdet-train-pytorch

### Setting up my training environment

Since I am using PyTorch and fine-tuning a Faster R-CNN (https://arxiv.org/pdf/1506.01497.pdf), I need to install
some packages and download a few files. Unfortunately, I ran into a few dependency conflicts between the
packages, so I had to reinstall most of them on Kaggle (even though Kaggle already ships with many of them installed).

If you don't want to run my model and only want to familiarise yourself with the data, you can skip these first few cells and go straight to the EDA section below.


In [None]:
# Try doing all the installations here
!pip install -I numpy

!pip install -I torchvision
!pip install -I torch -U   

# First, we need to install pycocotools. This library will be used for computing the evaluation metrics following the COCO metric for intersection over union.

!pip install cython
# Install pycocotools, the version by default in Colab
# has a bug fixed in https://github.com/cocodataset/cocoapi/pull/354
!pip install -I 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI' --no-binary pycocotools
# !ls /kaggle/input/baseline-predict-pytorch

# !pip install -I pycocotools==2.0.0

In [None]:
# clone torchvision to get the detection utility and COCO evaluation scripts

!git clone https://github.com/pytorch/vision.git
# each `!` command runs in its own shell, so `cd` has to be chained with the checkout
!cd vision && git checkout v0.8.2

Note that on Kaggle all your input files live in the directory /kaggle/input/

So run !ls /kaggle/input/ if you want to list the files in that directory.

All the output files generated by your notebook are stored in /kaggle/working/

If you want to use external scripts in your notebook, you have to put them in /kaggle/working/ as well.

So after cloning vision from PyTorch, I copy the following files to /kaggle/working/ because
I will need them during training.

In [None]:
!cp vision/references/detection/utils.py /kaggle/working/
!cp vision/references/detection/transforms.py /kaggle/working/
!cp vision/references/detection/coco_eval.py /kaggle/working/
!cp vision/references/detection/engine.py /kaggle/working/
!cp vision/references/detection/coco_utils.py /kaggle/working/

### EDA

In [None]:
# import some necessary packages
import numpy as np
import pandas as pd
import torch
import matplotlib.pyplot as plt
import seaborn as sns
import ast
import sys
import os


# os.environ['TORCH_HOME'] = '\\kaggle\\input\\resnet'

import glob
import sklearn
import math
import random

import cv2
import albumentations as A
from albumentations.pytorch import ToTensorV2

from transformers import get_cosine_schedule_with_warmup

from PIL import Image

import warnings
warnings.filterwarnings('ignore')
from sklearn import metrics, model_selection, preprocessing
from sklearn.model_selection import GroupKFold

# Make sure to select a GPU accelerator when training on Kaggle; otherwise training
# will take ages

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

print(device)

In [None]:
# read the train csv with pandas
df = pd.read_csv("../input/tensorflow-great-barrier-reef/train.csv")
# df = df[df.annotations != '[]'] # in my next training I will activate this to drop images with no starfish
# df = df.reset_index(drop = True)
df.head(20)

In [None]:
# Assign a fold to every row so that cross-validation is possible later
# (look up cross-validation if the term is new to you).
# Rows are grouped by sequence, so frames of one sequence never end up in both train and validation.
df['fold'] = -1
kf = GroupKFold(n_splits = 5)
for fold, (train_idx, val_idx) in enumerate(kf.split(df, y = df.video_id.tolist(), groups=df.sequence)):
    df.loc[val_idx, 'fold'] = fold
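
To see what grouping by `sequence` buys us, here is a minimal pure-Python sketch of the idea. It is not scikit-learn's actual algorithm: the `assign_group_folds` helper and the toy sequence ids are made up for illustration. It assigns whole groups to folds and then checks that no sequence is split across folds.

```python
from collections import Counter

def assign_group_folds(groups, n_splits):
    """Greedy sketch: put each group (largest first) into the currently smallest fold."""
    sizes = Counter(groups)
    fold_sizes = [0] * n_splits
    group_to_fold = {}
    for group, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        fold = fold_sizes.index(min(fold_sizes))  # fold with the fewest rows so far
        group_to_fold[group] = fold
        fold_sizes[fold] += size
    return [group_to_fold[g] for g in groups]

# toy sequence ids, one per frame (hypothetical values)
sequences = [101, 101, 101, 202, 202, 303, 303, 303, 303, 404]
folds = assign_group_folds(sequences, n_splits=2)

# every frame of a sequence lands in the same fold
for seq in set(sequences):
    assert len({f for s, f in zip(sequences, folds) if s == seq}) == 1
print(folds)
```

Keeping a sequence in one fold matters here because neighbouring video frames look almost identical, so splitting them would leak validation images into training.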

In [None]:
df.head()

In [None]:
df.fold.value_counts()

In [None]:
# add the image paths to the dataframe
df['path'] = [f"../input/tensorflow-great-barrier-reef/train_images/video_{a}/{b}.jpg" for a,b in zip(df["video_id"],df["video_frame"])]
# parse the annotation strings into lists of dicts (literal_eval is safer than eval)
df['annotations'] = df['annotations'].apply(ast.literal_eval)
df['number_boxes'] = df['annotations'].apply(len)
df.head()
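
The `annotations` column stores each frame's boxes as a string, so it has to be parsed before the boxes can be counted. Below is a small sketch of that parsing step on two hand-written strings (the coordinate values are hypothetical). It uses `ast.literal_eval`, which only accepts plain Python literals.

```python
import ast

# hypothetical annotation strings in the same format as the `annotations` column
raw_with_boxes = "[{'x': 559, 'y': 213, 'width': 50, 'height': 32}, {'x': 10, 'y': 20, 'width': 30, 'height': 40}]"
raw_empty = "[]"

boxes = ast.literal_eval(raw_with_boxes)   # list of dicts, one per starfish
no_boxes = ast.literal_eval(raw_empty)     # empty list for a frame without starfish

print(len(boxes), boxes[0]["width"])
print(len(no_boxes))

# literal_eval refuses anything that is not a plain literal, e.g. a function call
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```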

In [None]:
# plot some of the images
import matplotlib.pyplot as plt
from matplotlib import patches

def get_rectangle_edges_from_pascal_bbox(bbox):
    xmin_top_left, ymin_top_left, xmax_bottom_right, ymax_bottom_right = bbox

    bottom_left = (xmin_top_left, ymax_bottom_right)
    width = xmax_bottom_right - xmin_top_left
    height = ymin_top_left - ymax_bottom_right

    return bottom_left, width, height

def draw_pascal_voc_bboxes(
    plot_ax,
    bboxes,
    get_rectangle_corners_fn=get_rectangle_edges_from_pascal_bbox,
):
    for bbox in bboxes:
        bottom_left, width, height = get_rectangle_corners_fn(bbox)

        # thicker black rectangle underneath, so the red box gets a visible outline
        rect_1 = patches.Rectangle(
            bottom_left,
            width,
            height,
            linewidth=4,
            edgecolor="black",
            fill=False,
        )
        rect_2 = patches.Rectangle(
            bottom_left,
            width,
            height,
            linewidth=2,
            edgecolor="red",
            fill=False,
        )

        # Add the patch to the Axes
        plot_ax.add_patch(rect_1)
        plot_ax.add_patch(rect_2)

def draw_image(
    image, bboxes=None, draw_bboxes_fn=draw_pascal_voc_bboxes, figsize=(10, 10)
):
    fig, ax = plt.subplots(1, figsize=figsize)
    ax.imshow(image)

    if bboxes is not None:
        draw_bboxes_fn(ax, bboxes)

    plt.show()

In [None]:
class DataAdaptor:
    def __init__(self,df):
        self.df = df
    def __len__(self):
        return len(self.df)
    
    def get_boxes(self, row):
        """Returns the bboxes for a given row as an (N, 4) array with format [x_min, y_min, x_max, y_max]"""
        
        boxes = pd.DataFrame(row['annotations'], columns=['x', 'y', 'width', 'height']).astype(float).values
        
        # Change from [x_min, y_min, w, h] to [x_min, y_min, x_max, y_max]
        # if you check you will see the images have shape 1280 by 720
        boxes[:, 2] = np.clip(boxes[:, 0] + boxes[:, 2],0,1280)
        boxes[:, 3] = np.clip(boxes[:, 1] + boxes[:, 3],0,720) 
        
        return boxes
    
    def get_image_bb(self , idx):
        img_src = self.df.loc[idx,'path']
        image   = cv2.imread(img_src)
        image   = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        row     = self.df.iloc[idx]
        bboxes  = self.get_boxes(row) 
        class_labels = np.ones(len(bboxes))
        return image, bboxes, class_labels, idx
    
        
    def show_image(self, index):
        image, bboxes, class_labels, image_id = self.get_image_bb(index)
        print(f"image_id: {image_id}")
        draw_image(image, bboxes.tolist())
#         print(class_labels) 
        return image
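
The box conversion inside `get_boxes` can be checked by hand on a single box. This is a minimal pure-Python sketch of the same arithmetic, assuming the 1280 by 720 image size mentioned above; `xywh_to_xyxy` is a hypothetical helper name, not part of the notebook.

```python
def xywh_to_xyxy(box, img_w=1280, img_h=720):
    """Convert [x_min, y_min, width, height] to [x_min, y_min, x_max, y_max], clipped to the image."""
    x, y, w, h = box
    x_max = min(max(x + w, 0), img_w)  # same effect as np.clip(x + w, 0, img_w)
    y_max = min(max(y + h, 0), img_h)
    return [x, y, x_max, y_max]

# a box fully inside the image is only shifted from width/height to corner format
print(xywh_to_xyxy([559, 213, 50, 32]))
# a box that sticks out of the bottom-right corner is clipped to the image border
print(xywh_to_xyxy([1250, 700, 50, 32]))
```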

In [None]:
train_ds = DataAdaptor(df)

In [None]:
im,bb,_,_ = train_ds.get_image_bb(4005)
# bb holds the positions of the boxes in this image.
# These are what we are trying to predict.
bb

In [None]:
img = train_ds.show_image(20163)

In [None]:
np.where(df["number_boxes"] > 2)

In [None]:
# Checking the number of images in each video folder

num_seq = [len(df[df['video_id'] == i]) for i in range(3)]
labels = ["0", "1", "2"]

fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(9,6))
ax.set_facecolor('aliceblue')
plt.grid(color="gray", linestyle="-", zorder=0)
plt.ylabel("Number of Frames", fontsize=16, fontweight="bold")
plt.xlabel("Video ID", fontsize=16, fontweight="bold")
plt.title("Length of train videos", fontsize=20, fontweight="bold")
plt.bar(labels, num_seq, color="orange", zorder=3)
plt.show()

In [None]:
max_num = max(df.number_boxes)
max_sample = df[df["number_boxes"] == max_num].sample()
max_vid_id = max_sample.video_id.values[0]
max_vid_frame = max_sample.video_frame.values[0]

print('\033[1m' + f"Maximum number of starfish in one frame: {max_num} (Video {max_vid_id}, Frame {max_vid_frame})" + '\033[0m')

In [None]:
img = train_ds.show_image(max_sample.index[0])

In [None]:
# Check the number of frames without boxes
min_sample = df[df["number_boxes"] == 0]
# prints: frames without boxes, total frames, frames with at least one box
print(len(min_sample), len(df), len(df)-len(min_sample))

### Training the model
This is the part where I define and train the model with PyTorch.

In [None]:
import os
import numpy as np
import pandas as pd  # needed by CotsData.get_boxes
import torch
import torch.utils.data
from PIL import Image

import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset,DataLoader
from torch import optim
from torchvision import transforms


class CotsData(torch.utils.data.Dataset):
    def __init__(self, df, transforms=None):
        self.ds = df
        self.transforms = transforms
    
    def get_boxes(self, row):
        """Returns the bboxes for a given row as an (N, 4) array with format [x_min, y_min, x_max, y_max]"""
        
        boxes = pd.DataFrame(row['annotations'], columns=['x', 'y', 'width', 'height']).astype(float).values
        
        # Change from [x_min, y_min, w, h] to [x_min, y_min, x_max, y_max]
        boxes[:, 2] = np.clip(boxes[:, 0] + boxes[:, 2],0,1280)
        boxes[:, 3] = np.clip(boxes[:, 1] + boxes[:, 3],0,720) 
        
        return boxes
            
    def __getitem__(self, idx):
        # load the image
        img_path = self.ds.loc[idx,'path']
        img = Image.open(img_path).convert("RGB")
        
        row = self.ds.iloc[idx]
        boxes = self.get_boxes(row)
        # number of annotated starfish in this frame (0 for empty frames)
        num_objs = len(boxes)

        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # there is only one class
        labels = torch.ones((num_objs,), dtype=torch.int64)
        
        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # assume that no instance is a crowd region
        # TODO: check whether some instances should be marked as crowd
        iscrowd = torch.zeros((num_objs,), dtype=torch.int64)

        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd

        if self.transforms is not None:
            img, target = self.transforms(img, target)

        return img, target

    def __len__(self):
        return len(self.ds)

In [None]:
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor


def get_instance_segmentation_model(num_classes):
    # load a Faster R-CNN detection model pre-trained on COCO
    # (Mask R-CNN would also need mask targets, which this dataset does not provide)
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    
    # during predict I will use this instead
#     model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False, pretrained_backbone=False)
#     model.load_state_dict(torch.load('/kaggle/input/resnet/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth'))

    # get the number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)


    return model

In [None]:
from engine import train_one_epoch, evaluate
import utils
import transforms as T


# I can use this function to define the different augmentations
# I want to apply to the data
def get_transform(train):
    transforms = []
    # converts the image, a PIL image, into a PyTorch Tensor
    transforms.append(T.ToTensor())
    if train:
        # during training, randomly flip the training images
        # and ground-truth for data augmentation
        transforms.append(T.RandomHorizontalFlip(0.5))
    return T.Compose(transforms)
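
When an image is flipped horizontally, its boxes have to be flipped too. This pure-Python sketch shows the box arithmetic for a single box: the flipped `x_min` comes from the old `x_max` and vice versa. The `hflip_box` helper is hypothetical and only illustrates the idea; the `transforms.py` script copied earlier applies the flip to the image and the whole box tensor.

```python
def hflip_box(box, img_w):
    """Mirror an [x_min, y_min, x_max, y_max] box around the vertical centre line of the image."""
    x_min, y_min, x_max, y_max = box
    # the old right edge becomes the new left edge, and the other way around
    return [img_w - x_max, y_min, img_w - x_min, y_max]

box = [100, 50, 300, 200]
flipped = hflip_box(box, img_w=1280)
print(flipped)

# flipping twice gives back the original box
print(hflip_box(flipped, img_w=1280) == box)
```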

In [None]:
# for now I will not do any cross validation
# I will just train the model and use the first fold, fold_n = 0 as validation data set

fold_n = 0
train_df= df[df.fold != fold_n]
val_df  = df[df.fold == fold_n]

# use our dataset and defined transformations
dataset = CotsData(train_df.reset_index(drop=True), get_transform(train=True))
dataset_test = CotsData(val_df.reset_index(drop=True), get_transform(train=False))

In [None]:
# split the dataset in train and test set
torch.manual_seed(1)
# # indices = torch.randperm(len(dataset)).tolist()
# # dataset = torch.utils.data.Subset(dataset, indices[:-50])
# # dataset_test = torch.utils.data.Subset(dataset_test, indices[-50:])

# define training and validation data loaders
# we need to use a small batch size, otherwise we may run out of memory

data_loader = torch.utils.data.DataLoader(
    dataset, batch_size=2, shuffle=True, num_workers=4,
    collate_fn=utils.collate_fn)

data_loader_test = torch.utils.data.DataLoader(
    dataset_test, batch_size=1, shuffle=False, num_workers=4,
    collate_fn=utils.collate_fn)
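
A custom `collate_fn` is needed because every image can have a different number of boxes, so the targets cannot be stacked into one tensor. The sketch below mimics what I understand `utils.collate_fn` to do: it regroups a list of `(image, target)` pairs into one tuple of images and one tuple of targets. The strings standing in for images are placeholders.

```python
def collate_fn(batch):
    """Turn [(img0, tgt0), (img1, tgt1), ...] into ((img0, img1, ...), (tgt0, tgt1, ...))."""
    return tuple(zip(*batch))

# placeholder batch: strings stand in for image tensors
batch = [
    ("img0", {"boxes": [[559, 213, 609, 245]]}),  # one starfish
    ("img1", {"boxes": []}),                       # empty frame
]
images, targets = collate_fn(batch)

print(images)
print([len(t["boxes"]) for t in targets])
```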

In [None]:
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

# our dataset has two classes only - background and starfish
num_classes = 2

# get the model using our helper function
model = get_instance_segmentation_model(num_classes)
# move model to the right device
model.to(device)

# construct an optimizer
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005,
                            momentum=0.9, weight_decay=0.0005)

# and a learning rate scheduler which decreases the learning rate by
# 10x every 3 epochs
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                               step_size=3,
                                               gamma=0.1)
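
With `step_size=3` and `gamma=0.1`, the schedule multiplies the learning rate by 0.1 after every third epoch. A short sketch of that arithmetic, written as a plain function rather than with the real scheduler (`step_lr` is a hypothetical name):

```python
def step_lr(base_lr, epoch, step_size=3, gamma=0.1):
    """Learning rate in a given epoch for a step schedule: base_lr * gamma ** (epoch // step_size)."""
    return base_lr * gamma ** (epoch // step_size)

# learning rate for the first seven epochs, starting from lr=0.005
for epoch in range(7):
    print(epoch, f"{step_lr(0.005, epoch):.6f}")
```

Note that with a single training epoch the decay never kicks in; it only matters once `num_epochs` is larger than 3.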

In [None]:
# train the model and save a checkpoint
# let's train it for 1 epoch
num_epochs = 1

for epoch in range(num_epochs):
    # train for one epoch, printing every 10 iterations
    train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
    # update the learning rate
    lr_scheduler.step()
    # evaluate on the test dataset
    evaluate(model, data_loader_test, device=device)

torch.save(model.state_dict(), 'checkpoint.pth')
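
The numbers printed by `evaluate` are COCO-style metrics, which are built on the intersection over union (IoU) between predicted and ground-truth boxes. To make that concrete, here is a minimal sketch of IoU for two boxes in `[x_min, y_min, x_max, y_max]` format. `box_iou` is a hypothetical helper written for illustration, not the pycocotools implementation.

```python
def box_iou(a, b):
    """Intersection over union of two [x_min, y_min, x_max, y_max] boxes."""
    # width and height of the overlapping region (zero if the boxes do not overlap)
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(box_iou([0, 0, 10, 10], [0, 0, 10, 10]))    # identical boxes
print(box_iou([0, 0, 10, 10], [5, 5, 15, 15]))    # partial overlap
print(box_iou([0, 0, 10, 10], [20, 20, 30, 30]))  # no overlap
```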

In [None]:
# pick one image from the test set
img, target = dataset_test[5]
# put the model in evaluation mode
model.eval()
with torch.no_grad():
    prediction = model([img.to(device=device, dtype=torch.float)])[0]
    
print('predicted #boxes: ', len(prediction['labels']))
print('real #boxes: ', len(target['labels']))
print('scores: ', prediction['scores'])
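
The model returns every candidate box together with a confidence score, so the predicted count is usually higher than the real one until low-scoring boxes are dropped. The sketch below filters a prediction by score using plain lists; the `filter_by_score` helper, the 0.5 threshold and the example values are all assumptions for illustration. With the real tensors the same thing can be done with a boolean mask.

```python
def filter_by_score(prediction, threshold=0.5):
    """Keep only the entries whose score is at least `threshold`."""
    keep = [i for i, score in enumerate(prediction["scores"]) if score >= threshold]
    return {key: [values[i] for i in keep] for key, values in prediction.items()}

# made-up prediction with three candidate boxes
example_prediction = {
    "boxes": [[559, 213, 609, 245], [10, 20, 40, 60], [700, 300, 760, 350]],
    "labels": [1, 1, 1],
    "scores": [0.91, 0.42, 0.77],
}
confident = filter_by_score(example_prediction, threshold=0.5)

print(len(confident["boxes"]))
print(confident["scores"])
```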
