# **AI VIETNAM - Course 2023: Project 1 - Object Detection with YOLOv8**
<hr>

- **Author**: Uyen Nguyen
- **Date**: May 2023
- **Course**: AI Vietnam - Course 2023
- **Module**: Basic Python Introduction

<hr>

## **I. Introduction**

Object detection is a core problem in computer vision. Given an image, our task is to build a program that returns the bounding box coordinates and the class name of each object we want to detect.

<figure style="text-align: center;">
  <img src="../images/1.input-output.png" alt="Input/Output of Object Detection" width="700"/>
  <figcaption>Image 1: Input/Output of Object Detection for cats and dogs</figcaption>
</figure>

In this project, we will learn about and practice implementing a Python program for Human Detection in an image using an algorithm called YOLOv8.

## **II. Project Implementation**

### **1. Download YOLOv8 Source Code from GitHub**
To use YOLOv8, we need to download the source code to our current code environment. The source code of YOLOv8 is publicly available in the [YOLOv8 GitHub repository](https://github.com/ultralytics/ultralytics). We will clone the code to our notebook as follows:

In [1]:
!git clone https://github.com/ultralytics/ultralytics

Cloning into 'ultralytics'...
remote: Enumerating objects: 25393, done.
remote: Counting objects: 100% (13/13), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 25393 (delta 2), reused 8 (delta 1), pack-reused 25380
Receiving objects: 100% (25393/25393), 14.86 MiB | 10.10 MiB/s, done.
Resolving deltas: 100% (17996/17996), done.


Since the YOLOv8 source code depends on several Python libraries, we need to install them before running the algorithm. YOLOv8 is distributed through the `ultralytics` package, which we can install with the pip command as follows:

In [2]:
%cd ultralytics
!pip install ultralytics
import ultralytics

ultralytics.checks()

Ultralytics YOLOv8.2.15 🚀 Python-3.11.4 torch-2.3.0 CPU (Apple M2)
Setup complete ✅ (8 CPUs, 8.0 GB RAM, 224.1/460.4 GB disk)


### **2. Download Pretrained Model**
We can download the pretrained YOLOv8 model file from [here](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8s.pt) using the curl command as shown below. Alternatively, we can use the wget command, or download the file directly from GitHub and place it in the corresponding folder.

In [None]:
!curl -L -o ../ultralytics/yolov8s.pt https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8s.pt

### **3. Prepare the Data**
To train YOLOv8 on any dataset, we need two main components:
1. **Dataset Directory:** We will need to prepare a directory containing the dataset (this directory should be inside the YOLOv8 directory) with the following structure:
<figure style="text-align: center;">
  <img src="../images/2.folder-organization.png" alt="Folder Organization in YOLOv8 model" width="400"/>
  <figcaption>Image 2: Dataset Folder Organization in YOLOv8</figcaption>
</figure>

The structure includes:
- **images directory**: Contains image files (.jpg, .png, etc.).
- **labels directory**: Contains .txt files corresponding to each image (with the same filenames as in the images directory).
- **data.yaml file**: A configuration file that specifies the details of the dataset for YOLOv8 training.
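The pairing between images and labels described above can be checked before training. The sketch below is a minimal, hypothetical helper: the function name `find_unlabeled_images` and the list of image extensions are assumptions made for illustration, not part of YOLOv8. It lists the images in a split's `images` directory that have no same-named `.txt` file in the sibling `labels` directory.

```python
from pathlib import Path

# Image extensions to look for; an assumption made for illustration.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_unlabeled_images(split_dir):
    """Return sorted names of images in split_dir/images with no matching .txt in split_dir/labels."""
    split_dir = Path(split_dir)
    images_dir = split_dir / "images"
    labels_dir = split_dir / "labels"
    missing = []
    for image_path in sorted(images_dir.iterdir()):
        if image_path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        # Build the label name from the stem so names containing dots still match.
        label_path = labels_dir / (image_path.stem + ".txt")
        if not label_path.exists():
            missing.append(image_path.name)
    return missing
```

For example, `find_unlabeled_images("../data/human_detection_dataset/train")` would return an empty list if every training image has a label file.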

For this project, we will use a human dataset that can be downloaded manually [here](https://drive.usercontent.google.com/download?id=1--0QuKMwj31K-CSvD8oq5fceFweiFPuN&authuser=0), or fetched with the gdown and unzip commands as follows:


In [16]:
import os
# Create the data directory at the location used by the ../data/ paths in the following cells
os.makedirs('../data', exist_ok=True)

In [18]:
# We might need to install gdown beforehand using the pip command
# !pip install gdown
!gdown "https://drive.google.com/u/0/uc?id=1--0QuKMwj31K-CSvD8oq5fceFweiFPuN&export=download" -O ../data/

Downloading...
From (original): https://drive.google.com/u/0/uc?id=1--0QuKMwj31K-CSvD8oq5fceFweiFPuN&export=download
From (redirected): https://drive.google.com/uc?id=1--0QuKMwj31K-CSvD8oq5fceFweiFPuN&export=download&confirm=t&uuid=e217c429-d84c-46af-ae49-0c1d8311e550
To: /Users/amourtu1934/Documents/Personal Projects/object-detection-yolov8/data/human_detection_dataset.zip
100%|██████████████████████████████████████| 2.67G/2.67G [04:26<00:00, 10.0MB/s]


In [21]:
!unzip ../data/human_detection_dataset.zip -d ../data/

Archive:  ../data/human_detection_dataset.zip
   creating: ../data/human_detection_dataset/
  inflating: ../data/human_detection_dataset/data.yaml  
   creating: ../data/human_detection_dataset/train/
   creating: ../data/human_detection_dataset/train/images/
  inflating: ../data/human_detection_dataset/train/images/frame318008.00.00-08.05.00.jpg  
  inflating: ../data/human_detection_dataset/train/images/frame6006.25.00-06.30.00.jpg  
  inflating: ../data/human_detection_dataset/train/images/frame348006.55.00-07.00.00.jpg  
  inflating: ../data/human_detection_dataset/train/images/frame42007.25.00-07.30.00.jpg  
  inflating: ../data/human_detection_dataset/train/images/frame480006.45.00-06.50.00.jpg  
  inflating: ../data/human_detection_dataset/train/images/frame174008.00.00-08.05.00.jpg  
  inflating: ../data/human_detection_dataset/train/images/frame126007.05.00-07.10.00.jpg  
  inflating: ../data/human_detection_dataset/train/images/frame126008.05.00-08.10.00.jpg  
  inflating: ..

2. **.yaml file**: We also need to prepare a .yaml file, placed in the dataset directory, that describes the dataset mentioned above. Here are some sample .yaml files provided by the YOLOv8 authors:
<figure style="text-align: center;">
  <img src="../images/3.yaml-file.png" alt="Information for .yaml file" width="700"/>
  <figcaption>Image 3: Important information field in .yaml file</figcaption>
</figure>

For the human dataset, we need a .yaml file called *data.yaml* with the corresponding fields filled in. The downloaded dataset directory already provides one, but we can also create it manually or use the code snippet below to generate it automatically. Note that the .yaml file needs to be placed in the `human_detection_dataset` directory.


In [22]:
import yaml

dataset_info = {
    "train" : "./train/images",
    "val"   : "./val/images",
    "nc"    : 1,
    "names" : ["Human"]
}

# yaml.dump writes directly to the open file, so its return value is not needed
with open("../data/human_detection_dataset/data.yaml", "w") as f:
    yaml.dump(dataset_info, f, default_flow_style=None, sort_keys=False)

Each dataset has different content and information, so we need to adjust these fields to match the dataset we are using (number of classes, names of the classes, paths to the data directories, etc.).
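When adapting these fields to another dataset, a quick consistency check can catch simple mistakes, such as a class count that does not match the list of names. The sketch below is a hypothetical helper (`check_dataset_info` is not part of YOLOv8 or PyYAML); it only inspects the four keys used in this notebook's *data.yaml*.

```python
def check_dataset_info(info):
    """Return a list of problems found in a data.yaml-style dictionary."""
    problems = []
    for key in ("train", "val", "nc", "names"):
        if key not in info:
            problems.append(f"missing key: {key}")
    # The class count must agree with the number of class names.
    if "nc" in info and "names" in info and info["nc"] != len(info["names"]):
        problems.append(
            f"nc is {info['nc']} but names lists {len(info['names'])} classes"
        )
    return problems


human_info = {
    "train": "./train/images",
    "val": "./val/images",
    "nc": 1,
    "names": ["Human"],
}
print(check_dataset_info(human_info))  # prints []
```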


### **4. Model Training**
After completing the above preparation steps, we will proceed with training the model using the prepared dataset. Execute the following command (for different datasets, you will need to change the corresponding .yaml file name):

In [1]:
# To keep the run time short, this training command uses only 2 epochs. We can increase the number of epochs (for example, to 20) for higher accuracy
!yolo train model=../ultralytics/yolov8s.pt data=../data/human_detection_dataset/data.yaml epochs=2 imgsz=640

Ultralytics YOLOv8.2.15 🚀 Python-3.11.4 torch-2.3.0 CPU (Apple M2)
engine/trainer: task=detect, mode=train, model=../ultralytics/yolov8s.pt, data=../data/human_detection_dataset/data.yaml, epochs=2, time=None, patience=100, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=train2, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True

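After training finishes, the `../ultralytics/runs` directory contains a new directory, `./detect/train` (the name gets a numeric suffix, such as `train2`, on repeated runs). This is the output directory of YOLOv8; the trained weights are stored in its `weights` subdirectory.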
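The same training run can also be started from Python instead of the command line. The sketch below assumes the `ultralytics` package installed earlier and uses its `YOLO` class; the wrapper names `train_detector` and `best_weights_path` are hypothetical, and the run directory in the example is taken from the paths used in this notebook.

```python
from pathlib import Path


def best_weights_path(run_dir):
    """Return the path where a training run stores its best checkpoint."""
    return Path(run_dir) / "weights" / "best.pt"


def train_detector(weights, data_yaml, epochs=2, imgsz=640):
    """Fine-tune a pretrained YOLOv8 model with the same arguments as the CLI command."""
    # Imported lazily so the path helper above works without the package installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(data=data_yaml, epochs=epochs, imgsz=imgsz)


print(best_weights_path("../ultralytics/runs/detect/train").as_posix())
# prints ../ultralytics/runs/detect/train/weights/best.pt
```

Calling `train_detector("../ultralytics/yolov8s.pt", "../data/human_detection_dataset/data.yaml")` would then start a 2-epoch run comparable to the command above.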

### **5. Perform Prediction with the Trained Model**
To use the trained model on any image, we can use the code snippet below:
```
# With uploaded image
!yolo predict model=<weight_path> source=<image_path>
```
where:
* **<weight_path>:** The path to the model weight file produced by the training process. Use the `best.pt` file in the `weights` subdirectory of the training output.
* **<image_path>:** The path to the input image file.

Here is an example of running the command above:

In [3]:
# With uploaded image
!yolo predict model='../ultralytics/runs/detect/train/weights/best.pt' source='../images/5.test-predict.png'

Ultralytics YOLOv8.2.15 🚀 Python-3.11.4 torch-2.3.0 CPU (Apple M2)
Model summary (fused): 168 layers, 11125971 parameters, 0 gradients, 28.4 GFLOPs

image 1/1 /Users/amourtu1934/Documents/Personal Projects/object-detection-yolov8/code/../images/5.test-predict.png: 416x640 9 Humans, 138.0ms
Speed: 4.1ms preprocess, 138.0ms inference, 539.1ms postprocess per image at shape (1, 3, 416, 640)
Results saved to /Users/amourtu1934/Documents/Personal Projects/object-detection-yolov8/ultralytics/runs/detect/predict2
💡 Learn more at https://docs.ultralytics.com/modes/predict


Here is the result of the code snippet above:
<figure style="text-align: center;">
  <img src="../ultralytics/runs/detect/predict/5.test-predict.png" alt="Object Detection Result I" width="700"/>
  <figcaption>Image 4: Object Detection result with uploaded image</figcaption>
</figure>

As we can see from the image, the bounding boxes with high confidence scores correctly locate the humans in the picture. However, many low-confidence bounding boxes cover regions that contain no human at all, and the model misses a blurred person in the background. We can improve the results by setting a higher confidence threshold and by training for more epochs.
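The effect of a higher confidence threshold can be sketched in plain Python. The detections below are made-up values for illustration, not output from the trained model, and `filter_by_confidence` is a hypothetical helper. With the `yolo` command, the corresponding setting is the `conf` argument that appears in the training configuration log (for example, `conf=0.5`).

```python
def filter_by_confidence(detections, threshold):
    """Keep only detections whose confidence score is at least threshold."""
    return [d for d in detections if d["conf"] >= threshold]


# Hypothetical detections: box is (x1, y1, x2, y2) in pixels.
detections = [
    {"box": (34, 50, 120, 310), "conf": 0.91},
    {"box": (200, 40, 280, 300), "conf": 0.78},
    {"box": (400, 10, 430, 60), "conf": 0.27},
]

kept = filter_by_confidence(detections, threshold=0.5)
print(len(kept))  # prints 2
```

Raising the threshold removes low-confidence boxes, at the cost of possibly dropping some correct but uncertain detections.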


We can also run our model with an image online as follows:

In [5]:
# With online image
# https://c.files.bbci.co.uk/1260/production/_108240740_beatles-abbeyroad-index-reuters-applecorps.jpg
!yolo predict model='../ultralytics/runs/detect/train/weights/best.pt' source='https://c.files.bbci.co.uk/1260/production/_108240740_beatles-abbeyroad-index-reuters-applecorps.jpg'

Ultralytics YOLOv8.2.15 🚀 Python-3.11.4 torch-2.3.0 CPU (Apple M2)
Model summary (fused): 168 layers, 11125971 parameters, 0 gradients, 28.4 GFLOPs

Downloading https://c.files.bbci.co.uk/1260/production/_108240740_beatles-abbeyroad-index-reuters-applecorps.jpg to '_108240740_beatles-abbeyroad-index-reuters-applecorps.jpg'...
100%|████████████████████████████████████████| 702k/702k [00:00<00:00, 4.10MB/s]
image 1/1 /Users/amourtu1934/Documents/Personal Projects/object-detection-yolov8/code/_108240740_beatles-abbeyroad-index-reuters-applecorps.jpg: 384x640 10 Humans, 157.9ms
Speed: 4.5ms preprocess, 157.9ms inference, 655.3ms postprocess per image at shape (1, 3, 384, 640)
Results saved to /Users/amourtu1934/Documents/Personal Projects/object-detection-yolov8/ultralytics/runs/detect/predict
💡 Learn more at https://docs.ultralytics.com/modes/predict


<figure style="text-align: center;">
  <img src="../ultralytics/runs/detect/predict/_108240740_beatles-abbeyroad-index-reuters-applecorps.jpg" alt="Object Detection Result II" width="700"/>
  <figcaption>Image 5: Object Detection result with online image</figcaption>
</figure>

The model has correctly identified all four members of The Beatles in the picture. However, as mentioned above, there are still spurious bounding boxes, which we can eliminate by setting a higher confidence threshold for the model.

Besides single images, the `source` argument accepts other types of input data, summarized in the figure below.
<figure style="text-align: center;">
  <img src="../images/4.input-support.png" alt="Input Data Types Supported" width="700"/>
  <figcaption>Image 6: Summary of the input data types supported by YOLOv8 for prediction.</figcaption>
</figure>

Now, we will try to run the model on a YouTube video to see the result.

In [9]:
# With youtube video
!yolo predict model='../ultralytics/runs/detect/train/weights/best.pt' source='https://youtu.be/MsXdUtlDVhk'

Ultralytics YOLOv8.2.15 🚀 Python-3.11.4 torch-2.3.0 CPU (Apple M2)
Model summary (fused): 168 layers, 11125971 parameters, 0 gradients, 28.4 GFLOPs

1/1: https://youtu.be/MsXdUtlDVhk... Success ✅ (5744 frames of shape 1920x1080 at 30.00 FPS)

0: 384x640 (no detections), 229.2ms
0: 384x640 3 Humans, 320.0ms
0: 384x640 4 Humans, 211.2ms
0: 384x640 3 Humans, 235.4ms
0: 384x640 4 Humans, 431.1ms
0: 384x640 4 Humans, 372.9ms
0: 384x640 4 Humans, 329.1ms
0: 384x640 4 Humans, 217.9ms
0: 384x640 3 Humans, 228.8ms
0: 384x640 4 Humans, 314.8ms
0: 384x640 3 Humans, 316.3ms
0: 384x640 5 Humans, 349.5ms
0: 384x640 4 Humans, 300.0ms
0: 384x640 4 Humans, 305.4ms
0: 384x640 4 Humans, 242.8ms
0: 384x640 5 Humans, 129.7ms
0: 384x640 4 Humans, 262.3ms
0: 384x640 4 Humans, 283.8ms
0: 384x640 4 Humans, 305.8ms
0: 384x640 2 Humans, 283.9ms
0: 384x640 3 Humans, 234.5ms
0: 384x640 2 Humans, 155.1ms
0: 384x640 3 Humans, 314.0ms
0: 384x640 2 Humans, 276.2ms
0: 384x640 3 Humans, 279.6ms
0: 384x640 3 Humans, 510.

The code snippet above outputs a fast, 26-second video with humans detected across the scenes of the input music video. The result can be found at `../ultralytics/runs/detect/predict/MsXdUtlDVhk.mp4`. Overall, the model performs moderately well: it mostly detects the main subject in each frame but misses the people in the background.
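The per-frame lines printed during video prediction can be summarized with a few lines of Python. The sketch below is a hypothetical helper, assuming the log format shown in the output above; the sample lines are the first four frame lines of that output.

```python
import re

# Matches lines such as "0: 384x640 3 Humans, 320.0ms"
FRAME_PATTERN = re.compile(
    r"^\d+: \d+x\d+ (?:(\d+) Humans?|\(no detections\)), [\d.]+ms$"
)


def humans_per_frame(log_lines):
    """Return the number of detected humans for each frame line in the log."""
    counts = []
    for line in log_lines:
        match = FRAME_PATTERN.match(line.strip())
        if match:
            # group(1) is None for "(no detections)" lines.
            counts.append(int(match.group(1) or 0))
    return counts


log = [
    "0: 384x640 (no detections), 229.2ms",
    "0: 384x640 3 Humans, 320.0ms",
    "0: 384x640 4 Humans, 211.2ms",
    "0: 384x640 3 Humans, 235.4ms",
]
counts = humans_per_frame(log)
print(counts)                     # prints [0, 3, 4, 3]
print(sum(counts) / len(counts))  # prints 2.5
```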

## **III. Model Evaluation**

Before evaluating the model, note that the training command in subsection 4 relies on several default parameters, which we can customize as desired. Different parameter values can result in different model performance. Here are descriptions of a few of the parameters:
- **imgsz**: The size of the training images. The training and testing images will be resized to the size that we specify, with a default of 640 px.
- **batch**: During the training process, models can either read all the training data at once or read it in batches. With the default of 16 (the value shown in the training log above), the training dataset is divided into batches of 16 samples each. Other values are commonly chosen as powers of two, $2^n \, (n \geq 0)$.
- **epochs**: The number of times the entire dataset is passed through during training.
- **data**: The dataset information (.yaml file) we want to use for training.
- **model**: The pretrained model file used. We can download and use different pretrained model files from this [list](https://docs.ultralytics.com/tasks/detect/#models).
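To make the relationship between dataset size and batch size concrete, the sketch below computes how many batches one epoch needs. The dataset size of 1000 images is a made-up number for illustration, and `iterations_per_epoch` is a hypothetical helper.

```python
import math


def iterations_per_epoch(num_images, batch_size):
    """Number of batches needed to pass once over num_images samples."""
    return math.ceil(num_images / batch_size)


# 1000 is a made-up dataset size for illustration.
print(iterations_per_epoch(1000, 16))  # prints 63
print(iterations_per_epoch(1000, 64))  # prints 16
```

A larger batch means fewer weight updates per epoch and more memory per step, which is one reason the value is worth tuning to the available hardware.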

Since the performance of a model varies with these parameter values, we can compare models quantitatively by running the validation command below on each trained model:

In [13]:
!yolo val model='../ultralytics/runs/detect/train/weights/best.pt' data='../data/human_detection_dataset/data.yaml'

Ultralytics YOLOv8.2.15 🚀 Python-3.11.4 torch-2.3.0 CPU (Apple M2)
Model summary (fused): 168 layers, 11125971 parameters, 0 gradients, 28.4 GFLOPs
val: Scanning /Users/amourtu1934/Documents/Personal Projects/object-detection-yo
val: New cache created: /Users/amourtu1934/Documents/Personal Projects/object-detection-yolov8/data/human_detection_dataset/val/labels.cache
                 Class     Images  Instances      Box(P          R      mAP50  m
                   all       1642      13171      0.806      0.693      0.782      0.508
Speed: 1.3ms preprocess, 729.0ms inference, 0.0ms loss, 2.2ms postprocess per image
Results saved to /Users/amourtu1934/Documents/Personal Projects/object-detection-yolov8/ultralytics/runs/detect/val2
💡 Learn more at https://docs.ultralytics.com/modes/val


After running the command, we obtain several plots describing the performance of our model, which can be found in `../ultralytics/runs/detect/val2`. Some highlights worth mentioning:
* **Confusion Matrix (Normalized)**
<figure style="text-align: center;">
  <img src="../ultralytics/runs/detect/val/confusion_matrix_normalized.png" alt="Confusion Matrix" width="500"/>
  <figcaption>Image 7: Normalized Confusion Matrix</figcaption>
</figure>

The normalized confusion matrix presents the proportions of correct and incorrect predictions, offering a clearer perspective on the model's performance relative to the number of instances in each class. The true positive rate (0.76) indicates that the model correctly identifies 76% of the actual human instances, and the false negative rate (0.24) indicates that the remaining 24% are missed, that is, treated as background. The value of 1.00 in the background column should be read with care: it covers the boxes predicted on background regions (false positives), and because the dataset has a single class, all of them are necessarily labelled Human. It therefore does not mean that every background region is misclassified, and it says nothing about how many false positives there are. Object detection also has no meaningful true negative rate, since the background is not a set of countable instances.
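To see where values such as 0.76 and 0.24 can come from, the sketch below normalizes a small confusion matrix by column, so that each column of true labels sums to 1. The raw counts are hypothetical, chosen only to reproduce the proportions reported above, and column normalization is an assumption about how the plot is produced.

```python
def normalize_columns(matrix):
    """Divide every entry by its column total (columns with total 0 stay 0)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    totals = [sum(matrix[r][c] for r in range(n_rows)) for c in range(n_cols)]
    return [
        [matrix[r][c] / totals[c] if totals[c] else 0.0 for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Rows: predicted (Human, background); columns: true (Human, background).
# Hypothetical counts: 760 humans found, 240 missed, 150 boxes on background.
counts = [
    [760, 150],
    [240, 0],
]
normalized = normalize_columns(counts)
print(normalized[0])  # prints [0.76, 1.0]
print(normalized[1])  # prints [0.24, 0.0]
```

With a single class, every box drawn on background can only be labelled Human, so the background column shows 1.00 regardless of how many such boxes there are.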

* **F1-Confidence Curve**
<figure style="text-align: center;">
  <img src="../ultralytics/runs/detect/val/F1_curve.png" alt="F1 Curve" width="500"/>
  <figcaption>Image 8: F1-Confidence Curve</figcaption>
</figure>

The F1-Confidence curve illustrates how the F1 score, which balances precision and recall, varies with different confidence thresholds. The peak F1 score of 0.75 at a confidence level of 0.358 suggests that this is the optimal threshold for achieving the best trade-off between precision and recall. 
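The F1 score is the harmonic mean of precision and recall. As a quick check, the sketch below applies that formula to the precision (0.806) and recall (0.693) reported by the validation command; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall taken from the validation table above.
print(round(f1_score(0.806, 0.693), 3))  # prints 0.745
```

The result, about 0.745, is in line with the peak of 0.75 read from the curve.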

* **Precision-Confidence Curve**
<figure style="text-align: center;">
  <img src="../ultralytics/runs/detect/val/P_curve.png" alt="P Curve" width="500"/>
  <figcaption>Image 9: Precision-Confidence Curve</figcaption>
</figure>

The Precision-Confidence curve shows the relationship between precision and confidence thresholds. The curve reveals that at a confidence level of 0.962, the model achieves a maximum precision of 1.00, meaning that all predictions made at this threshold are correct. This insight is valuable for scenarios where precision is critical, and false positives need to be minimized. By understanding how precision changes with confidence, we can adjust the model's threshold to achieve the desired level of accuracy in specific applications.
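The trade-off behind this curve can be illustrated with a toy example. The sketch below computes precision and recall at two confidence thresholds from a handful of hypothetical scored detections; the data, the helper name, and the convention of reporting a precision of 1.0 when no detection is kept are all assumptions made for illustration.

```python
def precision_recall_at(scored, num_ground_truth, threshold):
    """Precision and recall when only detections with conf >= threshold are kept.

    scored is a list of (confidence, is_correct) pairs.
    """
    kept = [ok for conf, ok in scored if conf >= threshold]
    true_positives = sum(kept)
    # Convention (assumption): precision is 1.0 when nothing is predicted.
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / num_ground_truth
    return precision, recall


# Hypothetical detections for an image set containing 4 humans.
scored = [(0.97, True), (0.85, True), (0.60, False), (0.45, True), (0.20, False)]

print(precision_recall_at(scored, 4, 0.3))  # prints (0.75, 0.75)
print(precision_recall_at(scored, 4, 0.9))  # prints (1.0, 0.25)
```

In this toy case, the high threshold reaches perfect precision only by keeping a single detection, so recall drops sharply.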

### **Note on Further Dataset and Model Expansion**
Since YOLOv8 is trained with supervised learning, every sample in the training dataset needs a corresponding label. Therefore, to add more data or create a new dataset with new classes, we need to label the data, and this labeling has to be done manually. If we want to further expand the dataset for higher accuracy, we can annotate additional images using labelImg, which can be done via Anaconda.
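Labels for YOLO-style training are typically stored as one line per object: a class index followed by the box centre, width, and height, each normalized by the image size. The sketch below converts a pixel-coordinate box into such a line; the function name and the example numbers are illustrative assumptions.

```python
def to_yolo_label(class_id, x_min, y_min, x_max, y_max, image_width, image_height):
    """Format a pixel box as 'class x_center y_center width height' (normalized to 0-1)."""
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 160x240 box in a 640x480 image, class 0 ("Human").
print(to_yolo_label(0, 240, 120, 400, 360, 640, 480))
# prints 0 0.500000 0.500000 0.250000 0.500000
```

Each image's `.txt` file in the labels directory would contain one such line per annotated object.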

## **IV. Conclusion**
In this notebook, we explored the process of setting up and using YOLOv8 for object detection. We began with an introduction to object detection, followed by downloading the YOLOv8 source code and installing the necessary libraries. We then prepared the dataset by organizing the directory structure and creating a `data.yaml` file. The training process was discussed, including important parameters such as image size, batch size, epochs, dataset paths, and pretrained weights. We also covered model evaluation to optimize performance. Lastly, we emphasized the importance of labeled data and introduced `labelImg` for manual data labeling. By following these steps, we can effectively utilize YOLOv8 for various object detection tasks.