# Image Tampering Recognition Algorithm Based on Improved YOLOv5s

### A paper by: Zhen Liu 

### **Author**: Tadesse Abateneh

# Object Detection Task in Computer Vision

Object detection, a fundamental task in computer vision, involves identifying objects within images or video frames. While humans effortlessly recognize objects in their surroundings, computers require sophisticated algorithms to achieve similar capabilities. Object detection for computers entails two primary objectives: classification, which involves identifying the type of objects present, and localization, which determines the precise location of these objects within the image.

Various approaches have been devised to address this challenge, with one leading algorithm being YOLO (You Only Look Once). YOLO stands out among contemporary solutions for combining real-time speed with competitive accuracy. In this notebook, we train YOLO on a custom dataset using PyTorch.

## YOLO: A Swift Real-Time Object Detector

### Understanding YOLO

**`YOLO`**, short for **"You Only Look Once,"** detects all objects within an image in a single forward pass of the network. This is achieved by dividing the image into a grid, where each cell predicts bounding boxes and class probabilities.

The raw output of YOLO often contains multiple overlapping bounding boxes for the same object, varying in shape and size. To refine these predictions, a non-maximum suppression (`NMS`) algorithm is applied. NMS first discards boxes whose confidence falls below a threshold, typically `0.5`. Among the remaining boxes, it repeatedly keeps the highest-confidence box and removes every other box whose intersection over union (IoU) with it exceeds an overlap threshold, so that each object ends up with a single box.
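
The NMS procedure described above can be sketched in plain Python. This is a minimal illustration, not YOLOv5's own implementation: the function names, the corner-format boxes `(x1, y1, x2, y2)`, the `0.45` overlap threshold, and the sample boxes are all assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, conf_thres=0.5, iou_thres=0.45):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    # Step 1: drop low-confidence boxes and rank the rest by score.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= conf_thres),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    # Step 2: keep the best box, then remove boxes that overlap it too much.
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thres]
    return keep


# Hypothetical detections: two near-duplicates, one separate box, one weak box.
boxes = [(10, 10, 110, 110), (12, 8, 112, 108), (200, 200, 260, 260), (15, 15, 100, 100)]
scores = [0.9, 0.8, 0.75, 0.3]
kept = nms(boxes, scores)
print(kept)
```

Here the second box is suppressed because it overlaps the first almost completely, and the fourth never enters the ranking because its confidence is below the threshold.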

#### `YOLO` versus Other Detectors

While `YOLO` employs a convolutional neural network (`CNN`) like many other detectors, its distinctive single-stage approach enables real-time performance. In contrast, slower algorithms such as Faster R-CNN adopt a two-stage methodology: they first propose image regions that may contain objects, then classify each proposed region with a CNN. This two-stage process is more time-consuming because numerous regions must be classified individually. YOLO avoids this by directly predicting bounding boxes and classes for the entire image in a single forward pass, which removes the region-selection step and significantly improves speed.

# Introduction: Implementing `YOLOv5` for Effective Object Detection

Welcome to our `YOLOv5` implementation notebook! 

`YOLOv5` is an object detection model known for its speed and accuracy. In this notebook, we define a CBAM attention module and a CIoU loss function, write a custom YOLOv5s configuration, train it with PyTorch on a custom dataset, and evaluate the results.

Let's dive in!

### Clone YOLOv5 model

In [None]:
!git clone https://github.com/ultralytics/yolov5
%cd yolov5

### Install Necessary Dependencies

In [None]:
!pip install -qr requirements.txt

In [None]:
!pip install roboflow

In [None]:
!pip install "clearml>=1.2.0"

## Integrating ClearML for visualization

In [None]:
%env CLEARML_WEB_HOST=https://app.clear.ml
%env CLEARML_API_HOST=https://api.clear.ml
%env CLEARML_FILES_HOST=https://files.clear.ml
%env CLEARML_API_ACCESS_KEY="Insert your API Key"
%env CLEARML_API_SECRET_KEY="Insert your Secret Key"

### Import Necessary Libraries

In [None]:

import torch
import torch.nn as nn
from roboflow import Roboflow
import yaml
from IPython.core.magic import register_line_cell_magic
from utils.plots import plot_results  
from IPython.display import Image, clear_output  # to display images
from utils.downloads import attempt_download  # to download models/datasets


# clear_output()
print('Setup complete. Using torch %s %s' % (torch.__version__, torch.cuda.get_device_properties(0) if torch.cuda.is_available() else 'CPU'))



### CBAM Attention Module

In [None]:
# Define CBAM Module
class ChannelAttention(nn.Module):
    """
    Channel Attention Module
    """
    def __init__(self, in_planes, ratio=16):
        super(ChannelAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False),
            nn.ReLU(),
            nn.Conv2d(in_planes // ratio, in_planes, 1, bias=False)
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = self.fc(self.avg_pool(x))
        max_out = self.fc(self.max_pool(x))
        out = avg_out + max_out
        return self.sigmoid(out)

class SpatialAttention(nn.Module):
    """
    Spatial Attention Module
    """
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()
        assert kernel_size in (3, 7), 'kernel size must be 3 or 7'
        padding = 3 if kernel_size == 7 else 1
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        x = torch.cat([avg_out, max_out], dim=1)
        x = self.conv(x)
        return self.sigmoid(x)

class CBAMBlock(nn.Module):
    """
    CBAM (Convolutional Block Attention Module) Block
    """
    def __init__(self, in_planes, ratio=16):
        super(CBAMBlock, self).__init__()
        self.ca = ChannelAttention(in_planes, ratio)
        self.sa = SpatialAttention()

    def forward(self, x):
        x = x * self.ca(x)
        x = x * self.sa(x)
        return x



### Optimization of Loss Function - CIoU Loss Function

In [None]:
# Define CIoU Loss Function
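import math

def compute_ciou_loss(pred_boxes, target_boxes, eps=1e-7):
    """
    Compute CIoU (Complete Intersection over Union) loss.
    Boxes are expected in (center_x, center_y, width, height) format.
    """
    # Extract coordinates for ease of calculation
    x1, y1, w1, h1 = pred_boxes[..., 0], pred_boxes[..., 1], pred_boxes[..., 2], pred_boxes[..., 3]
    x2, y2, w2, h2 = target_boxes[..., 0], target_boxes[..., 1], target_boxes[..., 2], target_boxes[..., 3]

    # Compute intersection and union areas
    x_left = torch.max(x1 - w1 / 2, x2 - w2 / 2)
    y_top = torch.max(y1 - h1 / 2, y2 - h2 / 2)
    x_right = torch.min(x1 + w1 / 2, x2 + w2 / 2)
    y_bottom = torch.min(y1 + h1 / 2, y2 + h2 / 2)

    intersection = torch.clamp((x_right - x_left), min=0) * torch.clamp((y_bottom - y_top), min=0)
    union = w1 * h1 + w2 * h2 - intersection + eps

    # Compute IoU
    iou = intersection / union

    # Squared distance between the two box centers
    center_distance = (x2 - x1) ** 2 + (y2 - y1) ** 2

    # Squared diagonal of the smallest box that encloses both boxes
    enclose_w = torch.max(x1 + w1 / 2, x2 + w2 / 2) - torch.min(x1 - w1 / 2, x2 - w2 / 2)
    enclose_h = torch.max(y1 + h1 / 2, y2 + h2 / 2) - torch.min(y1 - h1 / 2, y2 - h2 / 2)
    enclose_diagonal = enclose_w ** 2 + enclose_h ** 2 + eps

    # Aspect-ratio consistency term v and its trade-off weight alpha
    v = (4 / math.pi ** 2) * (torch.atan(w2 / (h2 + eps)) - torch.atan(w1 / (h1 + eps))) ** 2
    with torch.no_grad():
        alpha = v / (1 - iou + v + eps)

    # Compute CIoU: IoU minus the center-distance and aspect-ratio penalties
    ciou = iou - center_distance / enclose_diagonal - alpha * v

    # Compute loss
    ciou_loss = 1 - ciou

    return ciou_loss.mean()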
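
For reference, the standard CIoU definition is written out below so the code in this section can be checked against it. The symbols are assumptions introduced for this note: $b$ and $b^{gt}$ are the centers of the predicted and target boxes, $\rho$ is the Euclidean distance between them, $c$ is the diagonal length of the smallest box enclosing both, and $w, h$ are box width and height.

```latex
\mathrm{CIoU} = \mathrm{IoU} - \frac{\rho^{2}(b, b^{gt})}{c^{2}} - \alpha v,
\qquad
v = \frac{4}{\pi^{2}} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^{2},
\qquad
\alpha = \frac{v}{(1 - \mathrm{IoU}) + v},
\qquad
\mathcal{L}_{\mathrm{CIoU}} = 1 - \mathrm{CIoU}
```

The distance term penalizes misaligned centers even when the boxes do not overlap, and the $\alpha v$ term penalizes mismatched aspect ratios.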


### Dataset Loading from Roboflow

#### Citation of Dataset

```bibtex

@misc{
    forge-eq4rh_dataset,
    title = { forge Dataset },
    type = { Open Source Dataset },
    author = { Pavan Kumar },
    howpublished = { \url{ https://universe.roboflow.com/pavan-kumar/forge-eq4rh } },
    url = { https://universe.roboflow.com/pavan-kumar/forge-eq4rh },
    journal = { Roboflow Universe },
    publisher = { Roboflow },
    year = { 2023 },
    month = { nov },
    note = { visited on 2024-02-29 },
    }

```

### Dataset Count

- **Total:** `7257`
- **Train set:** `5075 (70%)`
- **Valid set:** `1459 (20%)`
- **Test set:** `723 (10%)`
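
As a quick sanity check on the split listed above, the percentages can be recomputed from the counts. This is only an arithmetic sketch; the dictionary name and keys are chosen for illustration.

```python
# Image counts per split, copied from the list above.
counts = {"train": 5075, "valid": 1459, "test": 723}

total = sum(counts.values())
# Share of each split as a percentage, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

print(total)
print(shares)
```

The counts add up to the stated total, and the shares round to the 70/20/10 split given above.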

In [None]:
# Roboflow Dataset Download
rf = Roboflow(api_key="Insert your Roboflow API Key")
project = rf.workspace("pavan-kumar").project("forge-eq4rh")
dataset = project.version(1).download("yolov5")


### YAML Configuration

In [None]:
# Load YAML file
with open(dataset.location + "/data.yaml", 'r') as stream:
    num_classes = str(yaml.safe_load(stream)['nc'])



In [None]:
# Model Configuration
@register_line_cell_magic
def writetemplate(line, cell):
    with open(line, 'w') as f:
        f.write(cell.format(**globals()))
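
The magic above relies on `str.format` to fill placeholders such as `{num_classes}` from notebook globals. The sketch below shows the same substitution outside a notebook. The template text and the class count of `"2"` are hypothetical values chosen for illustration. Note that any literal braces in a template must be doubled, otherwise `format` treats them as placeholders.

```python
# A tiny template in the same style as the model YAML written later.
template = "nc: {num_classes}  # number of classes\nnames: {{0: 'example'}}\n"

# Stand-in for the notebook's globals(); the value here is hypothetical.
values = {"num_classes": "2"}

rendered = template.format(**values)
print(rendered)
```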



### Customize the Configuration of the Model

In [None]:
%%writetemplate /kaggle/working/yolov5/models/custom_yolov5s.yaml

# parameters
nc: {num_classes}  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple

# anchors
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# YOLOv5 backbone (the CBAM layer below is commented out)
backbone:
  # [from, number, module, args]
  [[-1, 1, Focus, [64, 3]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, BottleneckCSP, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 9, BottleneckCSP, [256]],
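   #[-1, 1, CBAMBlock, [512]],  # CBAM (disabled): enabling it requires CBAMBlock to be importable in models/yolo.py and the later layer indices to be renumbered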
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, BottleneckCSP, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 1, SPP, [1024, [5, 9, 13]]],
   [-1, 3, BottleneckCSP, [1024, False]],  # 9
  ]

# YOLOv5 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, BottleneckCSP, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, BottleneckCSP, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, BottleneckCSP, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, BottleneckCSP, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
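
The `depth_multiple` and `width_multiple` values in the configuration scale the repeat counts and channel widths listed in the backbone and head. The sketch below reproduces that arithmetic as I understand it from YOLOv5's model parser. The helper names are my own, and rounding channels up to a multiple of 8 is an assumption based on that parser.

```python
import math


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scaled_repeats(n, depth_multiple):
    """Scale a module repeat count; single modules are left unchanged."""
    return max(round(n * depth_multiple), 1) if n > 1 else n


def scaled_channels(c, width_multiple):
    """Scale an output channel count and keep it a multiple of 8."""
    return make_divisible(c * width_multiple, 8)


depth_multiple, width_multiple = 0.33, 0.50

# (repeats, channels) of the four BottleneckCSP stages in the backbone above.
for n, c in [(3, 128), (9, 256), (9, 512), (3, 1024)]:
    print(n, c, "->", scaled_repeats(n, depth_multiple), scaled_channels(c, width_multiple))
```

With these multipliers, a stage declared with 9 repeats and 512 channels is built with 3 repeats and 256 channels, which is what makes the "s" variant small.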


### Model Training

In [None]:
%%time
# Train YOLOv5s on custom data for 50 epochs and time the whole cell
%cd /kaggle/working/yolov5
!python train.py --img 416 --batch 32 --epochs 50 --data {dataset.location}/data.yaml --cfg ./models/custom_yolov5s.yaml --weights 'yolov5s.pt' --name yolov5s_results  --cache


- `Epoch`: This indicates the current epoch of training.
- `GPU_mem`: Represents the GPU memory usage during training.
- `box_loss`, `obj_loss`, `cls_loss`: These represent the losses associated with bounding box coordinates, objectness prediction, and class prediction, respectively.
- `Instances`: The number of labeled object instances (ground-truth boxes) in the current training batch.
- `Size`: Indicates the input size of the images being processed during training.
- `Class`, `Images`, `Instances`, `P`, `R`, `mAP50`: These metrics provide evaluation results on the validation set.
    - `Class`: Indicates the class being evaluated.
    - `Images`: The number of images evaluated.
    - `Instances`: Total instances of the class.
    - `P`: Precision for the class.
    - `R`: Recall for the class.
    - `mAP50`: Mean Average Precision for the class at IoU (Intersection over Union) threshold of 0.5.

### Detection

In [None]:
# Detection using YOLOv5s on custom test set
%cd /kaggle/working/yolov5/
!python detect.py --weights runs/train/yolov5s_results/weights/best.pt --img 416 --conf 0.4 --source /kaggle/working/yolov5/forge-1/test/images

### Evaluation

In [None]:
%%time
# Validate YOLOv5s on the custom validation set and time the whole cell
%cd /kaggle/working/yolov5
!python val.py --weights runs/train/yolov5s_results/weights/best.pt \
               --data {dataset.location}/data.yaml \
               --img 640 \
               --half


### Visualization of Training Data

In [None]:
# Display results.png, the training curves saved by train.py
Image(filename='/kaggle/working/yolov5/runs/train/yolov5s_results/results.png', width=1000)  # View results.png

In [None]:
# Display an augmented training batch
Image(filename='/kaggle/working/yolov5/runs/train/yolov5s_results/train_batch0.jpg', width=900)


In [None]:
# Display another augmented training batch
Image(filename='/kaggle/working/yolov5/runs/train/yolov5s_results/train_batch2.jpg', width=900)


In [None]:
Image(filename='/kaggle/working/yolov5/runs/train/yolov5s_results/val_batch0_labels.jpg', width=900)

### Metrics Plots

#### Precision (P):

Precision measures the accuracy of the positive predictions made by the model. It is calculated as the ratio of true positive predictions to the total number of positive predictions made by the model. In object detection, precision indicates how many of the detected objects are relevant (true positives) out of all the objects detected by the model.

In [None]:
# Print out P Curve - Precision

Image(filename='/kaggle/working/yolov5/runs/train/yolov5s_results/P_curve.png', width=900)


#### Recall (R):

Recall measures the ability of the model to correctly detect all relevant instances of objects. It is calculated as the ratio of true positive predictions to the total number of actual positive instances present in the dataset. In object detection, recall indicates how many of the actual objects were detected by the model.
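
Both precision and recall reduce to simple ratios of counts. The sketch below computes them from true positives, false positives, and false negatives. The function names and the counts are hypothetical and are not results from this model.

```python
def precision(tp, fp):
    """Fraction of predicted objects that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of actual objects that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts for a single class.
tp, fp, fn = 80, 20, 40

p = precision(tp, fp)
r = recall(tp, fn)
print(f"precision={p:.3f} recall={r:.3f}")
```

In this example the model is right about most of what it reports, but it misses a third of the actual objects.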

In [None]:
# Print out R Curve - Recall

Image(filename='/kaggle/working/yolov5/runs/train/yolov5s_results/R_curve.png', width=900)


#### Precision-Recall (PR) Curve:

A Precision-Recall curve is a graphical representation of the trade-off between precision and recall for different threshold values used in the object detection model. It is created by plotting precision on the y-axis against recall on the x-axis for various threshold values. The curve helps in understanding how the model's performance varies as the threshold for detection changes.
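
To make the threshold sweep concrete, the sketch below builds precision-recall points from a ranked list of detections. It is a simplified illustration, not YOLOv5's plotting code. Each detection is assumed to be already marked as a true or false positive, and the sample data is hypothetical.

```python
def pr_points(detections, num_ground_truth):
    """Return (threshold, precision, recall) for each confidence cut-off.

    detections: list of (confidence, is_true_positive) pairs.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    points = []
    tp = fp = 0
    # Lowering the threshold to each confidence admits one more detection.
    for conf, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        points.append((conf, tp / (tp + fp), tp / num_ground_truth))
    return points


# Hypothetical detections for one class with 4 ground-truth objects.
detections = [(0.95, True), (0.90, True), (0.80, False), (0.70, True), (0.60, False)]
points = pr_points(detections, num_ground_truth=4)
for threshold, p, r in points:
    print(f"conf>={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
```

As the threshold drops, recall can only stay flat or rise, while precision moves up and down depending on whether the newly admitted detection is correct.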

In [None]:
# Print out PR Curve - Precision Recall

Image(filename='/kaggle/working/yolov5/runs/train/yolov5s_results/PR_curve.png', width=900)


#### F1 Score:

The F1 Score is the harmonic mean of precision and recall. It provides a single metric that balances both precision and recall.

F1 Score ranges from 0 to 1, where a higher value indicates better performance, with 1 being the best possible score.
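
A short sketch of the harmonic mean shows why F1 rewards balance between the two metrics. The function name and the input values are illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


balanced = f1_score(0.5, 0.5)
lopsided = f1_score(0.9, 0.1)
print(f"balanced={balanced:.2f} lopsided={lopsided:.2f}")
```

Both pairs have the same arithmetic mean, but the lopsided pair scores far lower because the harmonic mean is pulled toward the smaller value.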

In [None]:
# Print out F1 score

Image(filename='/kaggle/working/yolov5/runs/train/yolov5s_results/F1_curve.png', width=900)


#### Confusion Matrix

A confusion matrix is a table that is often used to evaluate the performance of a classification model. It summarizes the predictions made by the model on a test dataset in comparison to the actual ground truth labels. However, in the context of object detection tasks such as those performed by YOLOv5, the confusion matrix concept is not as straightforward as in traditional classification tasks where each class is mutually exclusive.

Instead, in object detection tasks, a confusion matrix is typically adapted to account for the bounding boxes and the objects they represent. It may include metrics like:

- **True Positives (TP)**: The model correctly predicts the presence of an object in an image.
- **False Positives (FP)**: The model predicts the presence of an object when there is none.
- **False Negatives (FN)**: The model fails to predict the presence of an object when it is actually present.
- **True Negatives (TN)**: Not usually applicable in object detection tasks.

However, since object detection tasks involve not just binary classification but also localization, the calculation of these metrics can be more complex. The bounding boxes need to be evaluated in terms of their location, size, and overlap with ground truth boxes.

YOLOv5 does not plot a confusion-matrix curve. It saves a single matrix, `confusion_matrix.png`, computed at fixed confidence and IoU thresholds. In addition to the dataset classes, the matrix has a `background` row and column, which capture the false positives and false negatives described above.
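
The role of overlap in deciding TP, FP, and FN can be sketched with a simple greedy matcher. This is a single-class simplification under assumed conventions (corner-format boxes, an IoU threshold of `0.5`, hypothetical sample boxes) and is not YOLOv5's own confusion-matrix code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_tp_fp_fn(preds, gts, iou_thres=0.5):
    """preds: list of (box, confidence); gts: list of ground-truth boxes."""
    matched = set()
    tp = fp = 0
    # Match the most confident predictions first.
    for box, _ in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j is not None and best_iou >= iou_thres:
            matched.add(best_j)  # each ground-truth box is matched at most once
            tp += 1
        else:
            fp += 1
    fn = len(gts) - len(matched)  # ground-truth boxes nobody matched
    return tp, fp, fn


# Hypothetical image: two objects, one good detection and one stray box.
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [((1, 1, 10, 10), 0.9), ((50, 50, 60, 60), 0.8)]
result = count_tp_fp_fn(preds, gts)
print(result)
```

The first prediction overlaps a ground-truth box well enough to count as a true positive, the stray box is a false positive, and the unmatched object is a false negative.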

In [None]:
# Print out confusion matrix
Image(filename='/kaggle/working/yolov5/runs/train/yolov5s_results/confusion_matrix.png', width=900)
