<a href="https://colab.research.google.com/github/yurigalindo/PyTorchSamples/blob/main/Colab%20Notebooks/Fruit_Detection_v3_Reduced_Size.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Visualizing the dataset

First, let's get the dataset to a local folder.

In [None]:
from google.colab import drive
drive.mount('/content/gdrive') #Mounting my google drive in colab

Mounted at /content/gdrive


In [None]:
%%capture
!unzip /content/gdrive/My\ Drive/Dataset_questao2_adroit -d . #Unzip the file to current directory

Now I'll also download some helper functions from torchvision. The references directory also contains a lot of code that I adapted in this notebook.

In [None]:
%%shell

# Download TorchVision repo to use some files from
# references/detection
git clone https://github.com/pytorch/vision.git
cd vision

cp references/detection/utils.py ../
cp references/detection/transforms.py ../
cp references/detection/coco_eval.py ../
cp references/detection/engine.py ../
cp references/detection/coco_utils.py ../

Cloning into 'vision'...
remote: Enumerating objects: 29130, done.
remote: Counting objects: 100% (1673/1673), done.
remote: Compressing objects: 100% (436/436), done.
remote: Total 29130 (delta 1249), reused 1603 (delta 1206), pack-reused 27457
Receiving objects: 100% (29130/29130), 37.49 MiB | 33.86 MiB/s, done.
Resolving deltas: 100% (21859/21859), done.




Let's define the dataset

In [None]:
import os
import numpy as np
import torch
import torch.utils.data
from PIL import Image
import json

LABEL_ENCODING = {'Fruta Madura': 1, 'Fruta Verde': 2, 'Madura Anomalia':1, 'Verde Anomalia': 2}
#The labels are in string format, we'll convert to integer. Ripe will be 1 and Green will be 2.
#0 will be reserved for the background class
class FruitDataset(torch.utils.data.Dataset):
    def __init__(self, root, transforms=None):
        self.root = root
        self.transforms = transforms
        #store all the filename of the images
        self.imgs = list(sorted([os.path.splitext(file)[0] for file in os.listdir(root) if file.endswith(".jpg")])) #gets only the filename of jpgs

    def __getitem__(self, idx):
        # load images and boxes
        img_path = os.path.join(self.root, self.imgs[idx] + '.jpg')
        img = Image.open(img_path).convert("RGB")
        
        json_path = os.path.join(self.root,self.imgs[idx] + '.json')
        with open(json_path) as ofile:
          json_file = json.load(ofile)

        
        # get bounding box coordinates for each annotated object
        num_objs = len(json_file['shapes'])
        boxes = []
        labels = []
        for shape in json_file['shapes']:
          pos = shape['points']
          # take min/max over both corners so the box is always
          # [xmin, ymin, xmax, ymax], whatever order the corners were drawn in
          xs = [point[0] for point in pos]
          ys = [point[1] for point in pos]
          boxes.append([min(xs), min(ys), max(xs), max(ys)])
          labels.append(LABEL_ENCODING[shape['label']])

        # reshape keeps a (0, 4) tensor for images without annotations
        boxes = torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4)
        labels = torch.as_tensor(labels, dtype=torch.int64)
        

        target = {}
        target["image_id"] =  torch.tensor([idx])
        target["boxes"] = boxes
        target["labels"] = labels
        # the evaluate function needs this extra info, so let's add it
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        iscrowd = torch.zeros((num_objs,), dtype=torch.int64)
        target["area"] = area
        target["iscrowd"] = iscrowd 

        if self.transforms is not None:
            img,target = self.transforms(img,target)

        return img, target

    def __len__(self):
        return len(self.imgs)
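
To make the annotation handling concrete, here is a minimal standalone sketch of how one entry of `shapes` can be turned into a `[xmin, ymin, xmax, ymax]` box and an integer label. The JSON text is made up for illustration (only the `shapes`, `points` and `label` keys come from the dataset code above), and `shape_to_box` is a hypothetical helper, not part of the notebook.

```python
import json

# Same mapping as LABEL_ENCODING above; 0 is reserved for the background.
LABEL_ENCODING = {'Fruta Madura': 1, 'Fruta Verde': 2,
                  'Madura Anomalia': 1, 'Verde Anomalia': 2}


def shape_to_box(shape):
    """Return ([xmin, ymin, xmax, ymax], class_id) for one annotation."""
    xs = [point[0] for point in shape['points']]
    ys = [point[1] for point in shape['points']]
    # min/max over both corners makes the result independent of corner order
    box = [min(xs), min(ys), max(xs), max(ys)]
    return box, LABEL_ENCODING[shape['label']]


# Hypothetical annotation file content, invented for this example.
example_json = '''
{"shapes": [
  {"label": "Fruta Madura", "points": [[120.0, 300.0], [180.0, 240.0]]},
  {"label": "Verde Anomalia", "points": [[10.5, 20.0], [50.5, 80.0]]}
]}
'''

annotations = json.loads(example_json)
parsed = [shape_to_box(shape) for shape in annotations['shapes']]
for box, class_id in parsed:
    print(box, class_id)
```

The first made-up annotation lists its corners as bottom-left then top-right, and the second as top-left then bottom-right; both come out in the same canonical order.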

In [None]:
from engine import  evaluate
import utils
import transforms as T


def get_transform(train):
    transforms = []
    # converts the image, a PIL image, into a PyTorch Tensor
    transforms.append(T.ToTensor())
    if train:
        # during training, randomly flip the training images
        # and boxes for data augmentation
        transforms.append(T.RandomHorizontalFlip(p=0.5))
    return T.Compose(transforms)

In [None]:
# use our dataset and defined transformations
dataset = FruitDataset('Dados', get_transform(train=True))
dataset_test = FruitDataset('Dados', get_transform(train=False))

# split the dataset in train and test set
torch.manual_seed(1)
indices = torch.randperm(len(dataset)).tolist()
dataset = torch.utils.data.Subset(dataset, indices[:65]) #approx 80%
dataset_test = torch.utils.data.Subset(dataset_test, indices[65:])

# define training and validation data loaders
data_loader = torch.utils.data.DataLoader(
    dataset, batch_size=8, shuffle=True,
    collate_fn=utils.collate_fn)

data_loader_test = torch.utils.data.DataLoader(
    dataset_test, batch_size=8, shuffle=False,
    collate_fn=utils.collate_fn)
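
The split above hard-codes 65 training images. As a sketch of the same idea with the cut point derived from a fraction, the snippet below uses Python's `random` module instead of `torch.randperm`; `split_indices` and the dataset size of 100 are assumptions made for illustration.

```python
import random


def split_indices(num_items, train_fraction=0.8, seed=1):
    """Shuffle range(num_items) and cut it into train and test index lists."""
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    indices = list(range(num_items))
    rng.shuffle(indices)
    cut = int(round(num_items * train_fraction))
    return indices[:cut], indices[cut:]


# 100 is a made-up dataset size, not the size of the fruit dataset.
train_idx, test_idx = split_indices(num_items=100)
print(len(train_idx), len(test_idx))
```

The two index lists could then be passed to `torch.utils.data.Subset` exactly as `indices[:65]` and `indices[65:]` are above.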

Now let's take a look at the dataset

In [None]:
import matplotlib.pyplot as plt
import torchvision.transforms.functional as F
plt.rcParams["savefig.bbox"] = 'tight'
plt.rcParams["figure.figsize"] = [20,20]

#We'll define this function for showing an image with the boxes
def show(imgs):
    if not isinstance(imgs, list):
        imgs = [imgs]
    fig, axs = plt.subplots(ncols=len(imgs), squeeze=False)
    for i, img in enumerate(imgs):
        img = img.detach()
        img = F.to_pil_image(img)
        axs[0, i].imshow(np.asarray(img))
        axs[0, i].set(xticklabels=[], yticklabels=[], xticks=[], yticks=[])
        

In [None]:
from torchvision.transforms.functional import convert_image_dtype
from torchvision.utils import draw_bounding_boxes

image = dataset[0][0]
boxes = dataset[0][1]['boxes']
labels = dataset[0][1]['labels']
image_int = convert_image_dtype(image,dtype=torch.uint8)
pic_with_boxes = [
    draw_bounding_boxes(image_int, boxes=boxes, colors = [['red','yellow','green'][x] for x in labels],width=4)
]
show(pic_with_boxes)

*Images omitted due to copyright and filesize of the notebook*

We can see that the dataset is working as intended. Let's move on to defining and training the model.

In [None]:
import math
import sys  # needed for sys.exit below
import time


def train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq):
    #This function trains the model on the dataloader once (for one epoch)
    #Since the dataset is small, I accumulated the gradient for the whole dataset instead of updating for each batch.
    model.train()
    metric_logger = utils.MetricLogger(delimiter="  ")
    metric_logger.add_meter('lr', utils.SmoothedValue(window_size=1, fmt='{value:.6f}'))
    header = 'Epoch: [{}]'.format(epoch)

    optimizer.zero_grad()
    for images, targets in metric_logger.log_every(data_loader, print_freq, header):
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
    
        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())
        #Gets the image and target of the batch and calculate losses

        # reduce losses over all GPUs if using for logging purposes
        loss_dict_reduced = utils.reduce_dict(loss_dict)
        losses_reduced = sum(loss for loss in loss_dict_reduced.values())

        loss_value = losses_reduced.item()

        if not math.isfinite(loss_value): #Checks if the loss has exploded
            print("Loss is {}, stopping training".format(loss_value))
            print(loss_dict_reduced)
            sys.exit(1)

        
        losses.backward() #Accumulate the gradient
        metric_logger.update(loss=losses_reduced, **loss_dict_reduced)
        metric_logger.update(lr=optimizer.param_groups[0]["lr"])
    optimizer.step() #Update the model with the gradients and optimizer
    return metric_logger
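
The accumulation pattern used in `train_one_epoch` (zero the gradients once, call `backward()` on every batch, step once at the end) can be illustrated on a toy problem in plain Python. This is only a sketch under simplifying assumptions: a one-parameter model `y = w * x`, a mean squared error loss, plain gradient descent instead of AdamW, and made-up data.

```python
def batch_gradient(w, batch):
    """Gradient of the mean squared error over one batch with respect to w."""
    return sum(2 * x * (w * x - y) for x, y in batch) / len(batch)


# Made-up data roughly following y = 3x, split into two batches.
batches = [[(1.0, 3.0), (2.0, 6.5)], [(3.0, 8.5), (4.0, 12.0)]]
w, lr = 0.0, 0.01

for epoch in range(200):
    accumulated = 0.0                             # plays the role of optimizer.zero_grad()
    for batch in batches:
        accumulated += batch_gradient(w, batch)   # like losses.backward(): gradients add up
    w -= lr * accumulated                         # one update per epoch, like optimizer.step()

print(round(w, 3))
```

Note that the accumulated value is the sum of the per-batch gradients, so it grows with the number of batches; dividing it by `len(batches)` would give the mean gradient instead.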


Now defining the model

In [None]:
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
# load a model pre-trained on COCO
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)


# replace the classifier with a new one, that has
# number of classes + background
num_classes = 3  
# get number of input features for the classifier
in_features = model.roi_heads.box_predictor.cls_score.in_features
# replace the pre-trained head with a new one
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes) 
model.roi_heads.box_predictor.requires_grad_(True) # make sure the new head is trainable

# move model to the right device
model.to(device)

# construct an optimizer
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW (params,lr=3e-4,weight_decay=1e-2)

# and a learning rate scheduler which multiplies the learning rate by
# 0.36 every 10 epochs (roughly an 8x decrease every 20 epochs)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                               step_size=10,
                                               gamma=0.36)

Downloading: "https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth" to /root/.cache/torch/hub/checkpoints/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth
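
For reference, the `StepLR` settings above (`step_size=10`, `gamma=0.36`, starting from `lr=3e-4`) give a piecewise-constant schedule. The short sketch below reproduces the arithmetic in plain Python, assuming the scheduler is stepped once per epoch; `lr_at_epoch` is a hypothetical helper, not a PyTorch function.

```python
base_lr, step_size, gamma = 3e-4, 10, 0.36


def lr_at_epoch(epoch):
    """Learning rate used during a given 0-indexed epoch under a step schedule."""
    return base_lr * gamma ** (epoch // step_size)


for epoch in (0, 10, 20, 30, 40, 50):
    print(epoch, f"{lr_at_epoch(epoch):.2e}")
```

Two consecutive steps multiply the rate by 0.36 * 0.36 = 0.1296, so over the 60 epochs used here the learning rate is reduced five times in total.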






# And finally, training:


In [None]:
# let's train it for 60 epochs
num_epochs = 60

for epoch in range(num_epochs):
    # train for one epoch, printing every 10 iterations
    train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
    # update the learning rate
    lr_scheduler.step()
    # evaluate on the training dataset to see if the model is improving
    evaluate(model, data_loader, device=device)

    if (epoch % 5 ==0):
      #evaluate on the validation dataset every 5 epochs
      evaluate(model, data_loader_test, device=device)



Epoch: [0]  [0/9]  eta: 0:01:08  lr: 0.000300  loss: 7.0007 (7.0007)  loss_classifier: 1.0869 (1.0869)  loss_box_reg: 0.5195 (0.5195)  loss_objectness: 5.0139 (5.0139)  loss_rpn_box_reg: 0.3804 (0.3804)  time: 7.6445  data: 4.8433  max mem: 10452
Epoch: [0]  [8/9]  eta: 0:00:06  lr: 0.000300  loss: 6.8501 (6.7968)  loss_classifier: 1.0869 (1.0822)  loss_box_reg: 0.5233 (0.5391)  loss_objectness: 4.8744 (4.8404)  loss_rpn_box_reg: 0.3265 (0.3351)  time: 6.2689  data: 3.8982  max mem: 10557
Epoch: [0] Total time: 0:00:56 (6.2696 s / it)
creating index...
index created!
Test:  [0/9]  eta: 0:00:52  model_time: 0.9965 (0.9965)  evaluator_time: 0.2816 (0.2816)  time: 5.8430  data: 4.1494  max mem: 10557
Test:  [8/9]  eta: 0:00:05  model_time: 1.0153 (0.9148)  evaluator_time: 0.3771 (0.3519)  time: 5.4625  data: 3.8228  max mem: 10557
Test: Total time: 0:00:49 (5.4627 s / it)
Averaged stats: model_time: 1.0153 (0.9148)  evaluator_time: 0.3771 (0.3519)
Accumulating evaluation results...
DONE (

# Let's evaluate the model on the validation set:

In [None]:
evaluate(model, data_loader_test, device=device)

creating index...
index created!
Test:  [0/3]  eta: 0:00:16  model_time: 0.9942 (0.9942)  evaluator_time: 0.3886 (0.3886)  time: 5.3419  data: 3.5588  max mem: 10870
Test:  [2/3]  eta: 0:00:04  model_time: 0.9942 (0.8836)  evaluator_time: 0.3886 (0.3376)  time: 4.6797  data: 3.1064  max mem: 10870
Test: Total time: 0:00:14 (4.6806 s / it)
Averaged stats: model_time: 0.9942 (0.8836)  evaluator_time: 0.3886 (0.3376)
Accumulating evaluation results...
DONE (t=0.03s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.166
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.370
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.105
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.015
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.150
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.274
 Average Recall     (AR) @[ IoU=0.

<coco_eval.CocoEvaluator at 0x7f0fbd778690>

After 60 epochs (approximately 3 hours of training), we obtained 37% average precision at an IoU threshold of 50% on the validation set (the model we used achieves 60% on the original COCO dataset).

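Here a prediction counts as correct when its intersection over union (IoU) with a ground-truth box of the predicted class is at least 50%. Average precision is the area under the precision-recall curve obtained by sweeping the confidence threshold, averaged over the classes, so the 37% summarizes precision over all confidence levels rather than at a single cutoff.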
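
The overlap criterion behind these numbers is intersection over union. A minimal sketch for two boxes in `[xmin, ymin, xmax, ymax]` format is shown below; the two example boxes are made up, and the `iou` helper is written here for illustration rather than taken from the evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union of two [xmin, ymin, xmax, ymax] boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # width and height of the overlapping rectangle (zero if the boxes are disjoint)
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


predicted = [0.0, 0.0, 10.0, 10.0]
ground_truth = [5.0, 0.0, 15.0, 10.0]
print(iou(predicted, ground_truth))
```

These two boxes share half of their area each, yet their IoU is only one third, so this prediction would not count as a match at the 50% threshold.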

We obtained 29% average recall, meaning that, averaged over IoU thresholds from 50% to 95%, about 29% of the annotated objects were matched by a prediction of the correct class (the original model obtains 51% on the COCO dataset).
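
To show how precision and recall follow from the IoU criterion, here is a simplified sketch for a single class and a single IoU threshold. It greedily matches detections, highest score first, to the best still-unmatched ground-truth box. This is an illustrative approximation, not the exact procedure of the COCO evaluator, and all boxes and scores are made up.

```python
def iou(box_a, box_b):
    """Intersection over union of two [xmin, ymin, xmax, ymax] boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, ground_truths, iou_threshold=0.5):
    """detections: list of (score, box); ground_truths: list of boxes of one class."""
    matched = set()
    true_positives = 0
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt_box in enumerate(ground_truths):
            if idx in matched:
                continue  # each ground-truth box can be matched only once
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            true_positives += 1
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(ground_truths) if ground_truths else 0.0
    return precision, recall


ground_truths = [[0, 0, 10, 10], [20, 20, 30, 30], [40, 40, 50, 50]]
detections = [(0.9, [1, 1, 10, 10]),         # overlaps the first object well
              (0.8, [21, 20, 31, 30]),       # overlaps the second object well
              (0.6, [100, 100, 110, 110])]   # false positive, far from everything
precision, recall = precision_recall(detections, ground_truths)
print(precision, recall)
```

In this toy case two of the three detections are correct and two of the three objects are found, so the missed third object lowers recall while the stray detection lowers precision.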

We can probably improve these results, but they are still respectable and useful.

We can also visualize all the predictions and store them in a Google Drive folder.

In [None]:
from torchvision.transforms.functional import convert_image_dtype
from torchvision.utils import draw_bounding_boxes
model.eval()
score_threshold = .5
palette = ['red','yellow','green'] #indexed by label: 0 background, 1 ripe, 2 green
for i,example in enumerate(dataset_test):
  image = example[0]
  with torch.no_grad(): #gradients are not needed for inference
    prediction = model(image.unsqueeze(dim=0).to(device))[0]
  keep = prediction['scores'] > score_threshold #keep only confident predictions
  image_int = convert_image_dtype(image,dtype=torch.uint8)
  pic_with_boxes = [
      draw_bounding_boxes(image_int, boxes=prediction['boxes'][keep].cpu(), colors = [palette[x] for x in prediction['labels'][keep].tolist()],width=4)
  ]
  show(pic_with_boxes)
  plt.savefig('prediction_v3_{0}.jpg'.format(i))
!mv *.jpg /content/gdrive/My\ Drive/fruit_predictions

*Images omitted due to copyright and filesize of the notebook*

We can see that the model gets confused by images with unusual lighting, which alters the color of the leaves and fruits. Leaves are the main source of false positives.

The model also misses a lot of fruits, a problem that is aggravated by darker images and occlusion.

Aside from these mistakes, the model does a good job overall, and we can see that it has learned to detect fruits and to distinguish green from ripe ones.

We can save the model for future use and reference:

In [None]:
torch.save(model.state_dict(), 'v3_statedict.pt') #state_dict() must be called to save the weights
torch.save(model, 'v3_wholemodel.pt' )
!mv *.pt /content/gdrive/My\ Drive/fruit_predictions

And also visualize the predictions on the training data.

In [None]:
from torchvision.transforms.functional import convert_image_dtype
from torchvision.utils import draw_bounding_boxes
model.eval()
score_threshold = .5
palette = ['red','yellow','green'] #indexed by label: 0 background, 1 ripe, 2 green
for i,example in enumerate(dataset):
  image = example[0]
  with torch.no_grad(): #gradients are not needed for inference
    prediction = model(image.unsqueeze(dim=0).to(device))[0]
  keep = prediction['scores'] > score_threshold #keep only confident predictions
  image_int = convert_image_dtype(image,dtype=torch.uint8)
  pic_with_boxes = [
      draw_bounding_boxes(image_int, boxes=prediction['boxes'][keep].cpu(), colors = [palette[x] for x in prediction['labels'][keep].tolist()],width=4)
  ]
  show(pic_with_boxes)
  plt.savefig('prediction_train_v3_{0}.jpg'.format(i))
!mv *.jpg /content/gdrive/My\ Drive/fruit_train_pred

*Images omitted due to copyright and filesize of the notebook*

The model seems to do better on the training set but still makes the same type of mistakes, indicating that overfitting is probably not a factor.

The model I presented was the third one I trained completely. For comparison, below is the evaluation of the second version. 

The main differences between the two versions are the training loop code and the learning rate.

In [None]:
evaluate(model, data_loader_test, device=device)

creating index...
index created!
Test:  [0/3]  eta: 0:00:16  model_time: 0.9451 (0.9451)  evaluator_time: 0.3384 (0.3384)  time: 5.3931  data: 3.6936  max mem: 10871
Test:  [2/3]  eta: 0:00:04  model_time: 0.9364 (0.8338)  evaluator_time: 0.3384 (0.3050)  time: 4.8150  data: 3.3196  max mem: 10871
Test: Total time: 0:00:14 (4.8154 s / it)
Averaged stats: model_time: 0.9364 (0.8338)  evaluator_time: 0.3384 (0.3050)
Accumulating evaluation results...
DONE (t=0.03s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.073
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.195
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.039
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.072
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.104
 Average Recall     (AR) @[ IoU=0.

<coco_eval.CocoEvaluator at 0x7fac2439eb50>

After this third version, Google Colaboratory stopped allocating me a GPU. I believe the model could be improved further by exploring the following directions:

- Experimenting with different optimizers, learning rates and learning rate schedules. The model was not able to reach a small loss on the training set, probably due to ineffective training.

- Trying models of different sizes and training only part of the network. The inability to decrease the loss may be due to the network being too big and the function too complex, which could be helped by using smaller models or keeping the first layers fixed. It may also be that the problem is too complex and we need a bigger network. Some experiments are needed to decide which direction to take.

- Other forms of data augmentation, in particular changing the brightness and possibly applying small variations of hue. The model seems to struggle with images in which the light alters the colors of the fruit. By applying these augmentations, we can effectively enlarge the dataset.
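
As a rough sketch of the brightness augmentation mentioned in the last item, the snippet below scales pixel values by a random factor and leaves the target untouched, since a purely photometric change does not move the boxes. It works on nested Python lists with values in [0, 1] to stay self-contained; `adjust_brightness`, `random_brightness` and the 0.7 to 1.3 range are assumptions for illustration. In the notebook itself one would more likely wrap an existing transform such as torchvision's `ColorJitter` so that it accepts and returns the `(image, target)` pair.

```python
import random


def adjust_brightness(pixels, factor):
    """Scale every channel value by `factor`, clamping the result to [0.0, 1.0]."""
    return [[[min(1.0, max(0.0, value * factor)) for value in pixel]
             for pixel in row]
            for row in pixels]


def random_brightness(pixels, target, low=0.7, high=1.3, rng=random):
    """Apply a random brightness factor; boxes and labels are returned unchanged."""
    factor = rng.uniform(low, high)
    return adjust_brightness(pixels, factor), target


# A made-up 1x2 RGB image and a made-up target, for illustration only.
image = [[[0.2, 0.4, 0.6], [0.9, 0.8, 0.1]]]
target = {"boxes": [[0.0, 0.0, 2.0, 1.0]], "labels": [1]}

brighter = adjust_brightness(image, 1.5)
augmented, same_target = random_brightness(image, target, rng=random.Random(0))
print(brighter)
```

Values that would exceed 1.0 after scaling are clamped, which mimics the saturation of overexposed regions.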