# Object Detection with YOLO

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Eagleshot/CustomYOLOModel/blob/main/yolo.ipynb)

Hello! In this tutorial, we will look at object detection with YOLO (You Only Look Once). YOLO is a state-of-the-art, real-time object detection algorithm, known for its speed and accuracy. It combines object classification and localization into a single neural network, making it highly efficient.

In this tutorial, we will cover the following topics:

* Introduction to object detection with YOLO.
* Using pre-trained models for object detection.
* Training a custom model on your own dataset.
* Outlook: Deploy your model on different hardware.

By the end of this tutorial, you will have an understanding of how to use YOLO and will be able to apply it to various object detection tasks.<br>
While we won't dive into the algorithm's technical details, you can learn more from the
[original paper](https://arxiv.org/pdf/1506.02640v5) if you are interested. The YOLO architecture was originally released in 2016. Since then, it has been continuously improved and adapted to other tasks such as image segmentation and pose estimation, which makes it useful for a wide range of applications. So let's dive in and get started! <br><br>

![banner](https://raw.githubusercontent.com/ultralytics/assets/main/im/banner-tasks.png)

## Installation
First, we need to install and import the required packages. In a Jupyter notebook, the `%` prefix marks a magic command (e.g. `%pip` installs a package into the running notebook environment), and the `!` prefix runs a shell command.

In [None]:
# Install Ultralytics YOLO
%pip install ultralytics
import ultralytics
from ultralytics import YOLO
ultralytics.checks()

import os
import cv2
import matplotlib.pyplot as plt
%matplotlib inline

Ultralytics is a company that provides an easy-to-use Python package for YOLO. The package lets you use pre-trained models, train your own models and deploy them. It is licensed under the open-source [AGPL-3.0 license](https://www.ultralytics.com/license), which is well suited to research and educational purposes. However, if you need to use YOLO in a commercial project, you may want to consider another implementation (e.g. [YOLOv9 MIT](https://github.com/WongKinYiu/YOLO)) or purchase a license from Ultralytics.

As we need a graphics card to run YOLO at a reasonable speed, please make sure that the GPU is detected. Otherwise, you may need to change the runtime type in Google Colab. Make sure the output above is similar to the one below.

```txt
Ultralytics YOLOv8.2.91 🚀 Python-3.10.12 torch-2.4.0+cu121 CUDA:0 (Tesla T4, 15102MiB)
Setup complete ✅ (2 CPUs, 12.7 GB RAM, 32.6/112.6 GB disk)
```

Before we start detecting objects, let's quickly look at how YOLO is used in Python. It always follows the same pattern:


### 1. Load the model
First, you need to load a model. Which model you choose depends on the task you want to perform. You can find a list of available pre-trained models [here](https://docs.ultralytics.com/models/) and [here](https://docs.ultralytics.com/tasks/). The pre-trained detection models were trained on the COCO dataset, which contains 80 object classes, and they are accurate enough for many use cases. However, as we will see later, you can also train or fine-tune your own model and use it the same way.

![yolo-models](https://raw.githubusercontent.com/ultralytics/assets/main/yolov8/yolo-comparison-plots.png)


As you can see in the graph above, the YOLO architecture is constantly evolving. Different model generations (e.g. YOLOv8, YOLOv10) try to improve detection accuracy and efficiency. The y-axis shows the mAP (mean average precision) of each model on a standardized dataset (COCO), which measures detection accuracy. The x-axes show the model size (number of parameters) and the inference latency, which measure the model's cost and speed. Each generation comes in different sizes (e.g. nano, small, medium, balanced, large, xlarge). Larger models are generally more accurate, but they are also slower and require more memory. Returns also diminish as models grow, so each further increase in size brings a smaller accuracy gain. Depending on your use case and hardware, you need to choose the right model and balance speed against accuracy.

```python
model = YOLO('yolov8n.pt')
```

To load a pre-trained model, use the code above and replace the model name with the one you want. If the weights are not available locally, they are downloaded automatically. You can also specify the path to a local model file (e.g. when using your own trained model).
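The weight file names used throughout this tutorial follow a simple pattern: generation, size letter and an optional task suffix (`-cls`, `-seg`, `-pose`). The hypothetical helper below is a minimal sketch that only builds such names as strings. It does not check whether a given combination is actually published. For example, the `b` size only exists for some generations.

```python
# Hypothetical helper: builds an Ultralytics-style weight file name
# from a generation (e.g. "yolov8"), a size letter and a task.
TASK_SUFFIX = {"detect": "", "classify": "-cls", "segment": "-seg", "pose": "-pose"}
SIZES = {"n", "s", "m", "b", "l", "x"}  # nano, small, medium, balanced, large, xlarge


def weight_name(generation: str, size: str, task: str = "detect") -> str:
    """Return a name such as 'yolov8n.pt' or 'yolov8m-cls.pt'."""
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}, expected one of {sorted(SIZES)}")
    if task not in TASK_SUFFIX:
        raise ValueError(f"unknown task {task!r}, expected one of {sorted(TASK_SUFFIX)}")
    return f"{generation}{size}{TASK_SUFFIX[task]}.pt"


print(weight_name("yolov8", "n"))              # yolov8n.pt
print(weight_name("yolov8", "m", "classify"))  # yolov8m-cls.pt
```

You could then load a model with, for example, `YOLO(weight_name("yolov8", "s", "segment"))`.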


### 2. Inference
Inference is the process of using the model to make predictions on new data. A wide variety of [input sources](https://docs.ultralytics.com/modes/predict/#inference-sources), such as images, videos and live streams, are supported out of the box. Most popular [image and video formats](https://docs.ultralytics.com/modes/predict/#image-and-video-formats) are supported as well. With [inference arguments](https://docs.ultralytics.com/modes/predict/#inference-arguments), you can customize the inference process (e.g. which classes to detect or the confidence threshold) and the visualization. Calling `model(...)` directly is equivalent to calling `model.predict(...)`, and the `predict()` call returns a list of Results objects.

```python
# You can predict on a single image by passing the path or URL to the image
results = model('image.jpg')

# You can also use batch processing by passing a list of images to the model (faster)
results = model(["im1.jpg", "im2.jpg"])

# You can do the same with videos or live streams
results = model('video.mp4')
```
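Here is a plain-Python sketch that shows what the `classes` and `conf` inference arguments do. It applies the same kind of filtering to a hypothetical list of `(class_id, confidence)` pairs. This is only an illustration, because the real filtering happens inside the Ultralytics post-processing. The default threshold of 0.25 is an assumption based on the library's documented default.

```python
# Illustration only: mimics the effect of the `classes` and `conf` inference
# arguments on hypothetical (class_id, confidence) pairs.
def filter_detections(detections, classes=None, conf=0.25):
    """Keep detections with confidence >= conf and, if given, a class in `classes`."""
    if isinstance(classes, int):
        classes = [classes]  # allow a single class index, like classes=0
    kept = []
    for class_id, score in detections:
        if score < conf:
            continue
        if classes is not None and class_id not in classes:
            continue
        kept.append((class_id, score))
    return kept


detections = [(0, 0.91), (0, 0.55), (2, 0.80), (27, 0.72)]  # made-up values
print(filter_detections(detections, classes=0, conf=0.7))  # [(0, 0.91)]
```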

### 3. Process the results
The `predict()` call returns a list of Results objects that contain the detected objects. You can access them by iterating over the results and reading the bounding boxes and other information. How you process them depends heavily on the task you are trying to solve, so we will look at this later in several exercises. Below is a simple function that plots the detected objects on an image. You can use it to visualize the results.

In [None]:
def plot_results(results):
    img = results[0].plot()
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) # Convert image to RGB
    plt.imshow(img_rgb)  # Display results
    plt.axis('off')  # Hide axes
    plt.show()

You can also use the command line interface (CLI) to run YOLO with the same functionality. This is useful if you want to run YOLO from the terminal without writing any Python code. You can find more information about the command line interface [here](https://docs.ultralytics.com/usage/cli/). The example below runs object detection on a single image, just like the Python examples above:

```bash
yolo detect predict model=yolov8n.pt source='https://ultralytics.com/images/bus.jpg'
```
<br>

## Image classification
![classification](https://user-images.githubusercontent.com/26833433/243418606-adf35c62-2e11-405d-84c6-b84e7d013804.png)
With [image classification](https://docs.ultralytics.com/tasks/classify/), you can assign an entire image to one of a set of predefined classes. The image classifier outputs a class label for the image and a confidence score for that class. This is useful if you want to classify an image efficiently and don't need to know where in the image the object is located. Classification models have the suffix `-cls` in their name.

In [None]:
# Load a pre-trained model
model = YOLO("yolov8m-cls.pt")

# Predict on an image
results = model("https://ultralytics.com/images/bus.jpg")

# Display results (doesn't do a lot here)
plot_results(results)

*Please note that the pre-trained classification models were trained on a different dataset (ImageNet, with 1000 classes) than the detection models (COCO, with 80 classes). Their classes and performance therefore differ.*<br>
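A classifier outputs one probability per class, and the reported label is the class with the highest probability. The sketch below shows how the top-1 and top-k predictions are derived from such a list. The probabilities and class names are made up. In Ultralytics, these values should be available via `results[0].probs` (e.g. its `top1` and `top5` attributes), but check the documentation for the exact fields.

```python
# Illustration: derive top-1 / top-k predictions from class probabilities.
def top_k(probs, k=5):
    """Return the indices of the k largest probabilities, highest first."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


probs = [0.02, 0.60, 0.05, 0.25, 0.01, 0.07]  # made-up classifier output
names = {0: "cat", 1: "bus", 2: "car", 3: "trolleybus", 4: "dog", 5: "truck"}  # hypothetical

best = top_k(probs, k=1)[0]
print(f"{names[best]}: {probs[best]:.2f}")    # bus: 0.60
print([names[i] for i in top_k(probs, k=3)])  # ['bus', 'trolleybus', 'truck']
```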

## Object detection
![detection](https://user-images.githubusercontent.com/26833433/243418624-5785cb93-74c9-4541-9179-d5c6782d491a.png)
With [object detection](https://docs.ultralytics.com/tasks/detect/), you can identify the class and location of objects in an image or video stream. The object detector outputs bounding boxes and class labels for each detected object as well as a confidence score for each detection. The confidence score indicates how confident the model is that the object is correctly detected. This is useful if you need to know where the object is located in the image.

**Exercise 1:** Modify the code below. Test different model sizes and generations on different images. Do you notice a difference in detection speed and accuracy?
<details>
    <summary>Click here to hide/unhide the answer!</summary>

You can test different models by changing the model name in the code below. You can also test different images by changing the image URL or by uploading an image to the Colab file storage. You can find a list of available models [here](https://docs.ultralytics.com/models/). Example:
```python
model = YOLO('yolov8n.pt')
model = YOLO('yolov8x.pt')
model = YOLO('yolov10n.pt')
```
</details>

**Exercise 2:** Try to detect only people with a confidence score of at least 70%.<br>

<details>
    <summary>Click here to hide/unhide the answer!</summary>
You can find the classes online or with the `model.names` attribute:

```python
# Load a pre-trained model
model = YOLO("yolov8n.pt")

# Predict on an image
results = model("https://ultralytics.com/images/zidane.jpg", classes=0, conf=0.7)  # class 0 = person
print(model.names)  # Dictionary mapping class indices to class names

# Plot the results
plot_results(results)
```
</details>

**Exercise 3:** Output all the detected objects with their class name and confidence score. *Hint: They are stored in the bounding boxes in the `results[0].boxes` attribute.*

<details>
    <summary>Click here to hide/unhide the answer!</summary>
  
```python
for box in results[0].boxes:
    cls = box.cls # Class index
    class_label = model.names[int(cls)] # get the class label
    conf = box.conf.item() # Confidence
    print(f"Class: {class_label} - Confidence: {conf:.2f}")
```

</details>

In [None]:
# Load a pre-trained model
from ultralytics import YOLO

model = YOLO("yolov8n.pt")

# Predict on an image
results = model("https://ultralytics.com/images/zidane.jpg")

# Plot the results
plot_results(results)

**Parking spot exercise 🅿️:** Let's try to detect cars in an image and check whether they are parked in a specific parking spot. The image used in the code cell below shows several parked cars.

**Exercise 1:** Print the number of detected cars in the image below.

<details>
    <summary>Click here to hide/unhide the answer!</summary>
  
```python
# Load a pre-trained model
model = YOLO("yolov8n.pt")

# Predict on an image
results = model("https://upload.wikimedia.org/wikipedia/commons/7/72/Bombala_-_backward_parking_cars.jpg", classes=2)
print(f"Detected {len(results[0])} cars.")

# Plot the results
plot_results(results)
```
</details>

**Exercise 2:** Get the bounding box coordinates and the width and height of the bounding boxes of the detected cars in the image below.

<details>
    <summary>Click here to hide/unhide the answer!</summary>
  
```python
for box in results[0].boxes:
    w = box.xywh[0][2].item() # width
    h = box.xywh[0][3].item() # height
    x = box.xywh[0][0].item() - w/2 # x-coordinate
    y = box.xywh[0][1].item() - h/2 # y-coordinate
    print(f"Box at ({x}, {y}) with width {w} and height {h}")
```
</details>
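The solution above converts YOLO's center-based `xywh` format into a top-left corner plus width and height. Below, the same conversion is written as a small standalone function that returns both corners. This is a sketch. Ultralytics also exposes corner coordinates directly via `box.xyxy`.

```python
# Convert a center-based box to corner coordinates.
def xywh_to_xyxy(xc, yc, w, h):
    """(x_center, y_center, width, height) -> (x1, y1, x2, y2)."""
    return (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)


print(xywh_to_xyxy(100.0, 50.0, 40.0, 20.0))  # (80.0, 40.0, 120.0, 60.0)
```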

**Exercise 3:** Write a program that uses the bounding box coordinates to check whether a car is parked in the parking spot below the sign on the right.

<details>
    <summary>Click here to hide/unhide the answer!</summary>
  
```python
from matplotlib.patches import Rectangle

# Load a pre-trained model
model = YOLO("yolov8n.pt")

# Predict on an image
results = model("https://upload.wikimedia.org/wikipedia/commons/7/72/Bombala_-_backward_parking_cars.jpg", classes=2)

def plot_box(x, y, w, h, img, title="", color='r'):
    # Define Matplotlib figure and axis
    _, ax = plt.subplots()

    # Display the image
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # Convert image to RGB
    ax.set_title(title, color=color)
    ax.imshow(img_rgb)
    
    # Add rectangle to plot
    rect = Rectangle((x, y), w, h, linewidth=1, edgecolor=color, facecolor='none')
    ax.add_patch(rect)

    # Display plot
    plt.axis('off')  # Hide axes
    plt.show()

parking_spot = [1350, 600, 200, 175]

plot_box(*parking_spot, results[0].orig_img, title="Parking spot", color='b')

for box in results[0].boxes:
    w = box.xywh[0][2].item() # width
    h = box.xywh[0][3].item() # height
    x = box.xywh[0][0].item() - w/2 # x-coordinate
    y = box.xywh[0][1].item() - h/2 # y-coordinate

    img = results[0].orig_img # Get the original image

    # Check if the two boxes intersect
    if (x < parking_spot[0] + parking_spot[2] and x + w > parking_spot[0] and y < parking_spot[1] + parking_spot[3] and y + h > parking_spot[1]):
        plot_box(x, y, w, h, img, title="Car is inside the parking spot.", color='g')
    else:
        plot_box(x, y, w, h, img, title="Car is not inside the parking spot.", color='r')
```
</details>
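The intersection test above counts a car as parked even if its box only touches the edge of the parking spot. The sketch below is a somewhat more robust variant. It computes what fraction of the parking spot is covered by the car's box and compares that against a threshold. Both boxes use the `(x, y, w, h)` top-left format from the solution. The example car box and the 0.5 threshold are assumptions that you would tune on your own images.

```python
# Fraction of the parking spot covered by a car's bounding box.
def overlap_fraction(car, spot):
    """Boxes are (x, y, w, h) with (x, y) the top-left corner."""
    cx, cy, cw, ch = car
    sx, sy, sw, sh = spot
    inter_w = max(0.0, min(cx + cw, sx + sw) - max(cx, sx))
    inter_h = max(0.0, min(cy + ch, sy + sh) - max(cy, sy))
    return (inter_w * inter_h) / (sw * sh)


parking_spot = (1350, 600, 200, 175)
car = (1300, 620, 220, 160)  # hypothetical detection
frac = overlap_fraction(car, parking_spot)
occupied = frac >= 0.5       # threshold is an assumption
print(f"{frac:.2f} of the spot covered, occupied: {occupied}")
```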

In [None]:
# Load a pre-trained model
model = YOLO("yolov8n.pt")

# Predict on an image
results = model("https://upload.wikimedia.org/wikipedia/commons/7/72/Bombala_-_backward_parking_cars.jpg")

# Plot the results
plot_results(results)

## Instance segmentation
![segmentation](https://user-images.githubusercontent.com/26833433/243418644-7df320b8-098d-47f1-85c5-26604d761286.png)
With [instance segmentation](https://docs.ultralytics.com/tasks/segment/#predict), you can identify the class and location of an object in an image or video stream and segment it from the background. This is useful if you need to know the exact position and shape of the object. Segmentation models have the suffix `-seg` in their name.

**Exercise 1:** Do instance segmentation on an image in the code cell below.

<details>
    <summary>Click here to hide/unhide the answer!</summary>
  
```python
# Load a pre-trained model
model = YOLO("yolov8n-seg.pt")

# Predict on an image
results = model("https://ultralytics.com/images/zidane.jpg")

# Plot the results
plot_results(results)
```
</details>

**Exercise 2:** Get and plot the binary segmentation masks of the objects detected in Exercise 1.

<details>
    <summary>Click here to hide/unhide the answer!</summary>
  
```python
import numpy as np

# Get binary segmentation masks
for mask in results[0].masks.xy:
    binary_mask = np.zeros(results[0].orig_shape, dtype=np.uint8)

    contour = mask.astype(np.int32)
    contour = contour.reshape(-1, 1, 2)

    binary_mask = cv2.drawContours(binary_mask, [contour], -1, (255, 255, 255), cv2.FILLED)

    # Plot binary mask
    fig, ax = plt.subplots(1)
    ax.imshow(binary_mask, cmap='gray')
    ax.axis('off')
    plt.show()
```
</details>
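Instead of rasterizing the mask, you can also work with the polygon from `results[0].masks.xy` directly. For example, the shoelace formula gives the area enclosed by the polygon in square pixels, which is handy for comparing object sizes. This is a minimal sketch. The polygon area can differ slightly from the pixel count of the filled mask.

```python
# Area of a simple polygon with the shoelace formula.
def polygon_area(points):
    """points: [(x, y), ...] in order around the polygon (either direction)."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(polygon_area(square))  # 100.0
```

With a segmentation result, you could call, for example, `polygon_area(results[0].masks.xy[0].tolist())`.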

**Exercise 3:** Compare the speed of classification, detection and segmentation using the "same" model architecture on the same image. Also try using the CPU instead of the GPU for inference (can be configured in the model call).

<details>
    <summary>Click here to hide/unhide the answer!</summary>
  
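```python
image = "https://ultralytics.com/images/zidane.jpg"

# Compare classification, detection and segmentation with the same model size (nano)
for weights in ["yolov8n-cls.pt", "yolov8n.pt", "yolov8n-seg.pt"]:
    model = YOLO(weights)
    for device in ["cpu", 0]:  # 0 = first CUDA GPU
        results = model(image, device=device)
        # Preprocess, inference and postprocess times in milliseconds.
        # The first run on a device includes warm-up, so repeat it for fair timings.
        print(weights, device, results[0].speed)
```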
</details>

In [None]:
# Insert your code here ✏️

## Object tracking
![tracking](https://github.com/ultralytics/docs/releases/download/0/multi-object-tracking-examples.avif)
With [object tracking](https://docs.ultralytics.com/modes/track/), you can track the location of an object in a video or video stream over time. The object tracker outputs bounding boxes for each frame as well as a unique ID for each object. You can process the output in the same way as object detection results. This is useful if you need to know the trajectory of an object or want to count objects over time. Since the tracker is built on top of the detection model, you can also use the detection, segmentation and pose estimation models for tracking.
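To turn tracker output into trajectories or counts, you only need to group positions by their track ID across frames. The sketch below does this with hypothetical per-frame dictionaries that map track IDs to box centers. With Ultralytics, the IDs of a frame should be available via `results[i].boxes.id`, which, as far as we know, is `None` when nothing is being tracked.

```python
from collections import defaultdict


def build_trajectories(frames):
    """Group per-frame {track_id: (x, y)} dicts into {track_id: [(x, y), ...]}."""
    tracks = defaultdict(list)
    for detections in frames:
        for track_id, center in detections.items():
            tracks[track_id].append(center)
    return dict(tracks)


# Hypothetical tracker output for three frames
frames = [
    {1: (10, 50), 2: (200, 80)},
    {1: (15, 52), 2: (195, 82)},
    {2: (190, 85), 3: (400, 30)},
]
tracks = build_trajectories(frames)
print(len(tracks))  # 3 unique objects seen
print(tracks[1])    # [(10, 50), (15, 52)]
```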

**Exercise:** Track objects in a video in the code cell below. Save the video to a file (showing a video in Google Colab is difficult).

<details>
    <summary>Click here to hide/unhide the answer!</summary>
    
```python
# Load a pre-trained model
model = YOLO("yolov8n.pt")

results = model.track("https://www.youtube.com/watch?v=CftLBPI1Ga4", stream_buffer=True, save=True)  # Tracking with default tracker
```
</details>


In [None]:
# Insert your code here ✏️

## Pose estimation
![pose estimation](https://github.com/ultralytics/docs/releases/download/0/pose-estimation-examples.avif)
With [pose estimation](https://docs.ultralytics.com/tasks/pose/), you can identify the pose of a person in an image or video stream. The pose estimator outputs key points for each person detected as well as a confidence score for each key point. This is useful if you need to know the position of a person's body parts (e.g. eyes, shoulders, hips).
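Keypoints become useful once you compute something from them, for example the angle at a joint. The sketch below computes the angle at point `b` formed by three 2D points, such as shoulder, elbow and wrist. With Ultralytics, the pixel coordinates should be available in `results[0].keypoints.xy`. Indices 5, 7 and 9 are assumed to be the left shoulder, elbow and wrist, following the COCO keypoint order. Check this against the docs.

```python
import math


def joint_angle(a, b, c):
    """Angle at point b (in degrees) between the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("keypoints must be distinct")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding errors
    return math.degrees(math.acos(cos_theta))


shoulder, elbow, wrist = (0, 0), (0, 10), (10, 10)  # made-up keypoints
print(round(joint_angle(shoulder, elbow, wrist)))  # 90
```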

**Exercise:** Estimate the pose of a person in an image in the code cell below.

<details>
    <summary>Click here to hide/unhide the answer!</summary>
    
```python
# Load a model
model = YOLO("yolov8n-pose.pt")

# Predict with the model
results = model("https://img.freepik.com/free-photo/group-people-performing-stretching-exercise_1170-116.jpg")

# Display the results
plot_results(results)
```
</details>

In [None]:
# Insert your code here ✏️

## Custom Object Detection 🏔️⛷️
Now that you have mastered the basics of object detection with YOLO, it's time to train your own model! A ski resort in the Grisons mountains has asked you to develop a system that detects and counts skiers on the slopes in real time. To do this, we need to train a custom YOLO model.<br>

### Transfer learning
While you could train a model from scratch, it is often much more efficient to take an existing model and fine-tune it on your own dataset. This process, where a model trained on one task is repurposed for a second, related task, is called transfer learning. Since the model has already learned to detect and classify objects in images, it can be adapted relatively easily to detect new objects it hasn't seen before. Transfer learning is widely used because it requires much less training data, trains much faster and often achieves better performance than training a model from scratch.<br>

### Dataset
To train our custom YOLO model, we need a dataset of labeled images containing the objects we want to detect. This is called supervised learning. Many datasets with thousands of already labeled images are available online and can be used as-is or expanded. Some of them are generic, while others are application-specific (e.g. medical imaging, aerial imaging, self-driving cars). Here are two examples of popular datasets:
* https://cocodataset.org/#explore
* https://storage.googleapis.com/openimages/web/index.html

The COCO dataset, for example, contains images of 80 common object classes. It was used to train the pre-trained YOLO models in this tutorial. Take a minute to look at the website and see how the images are labelled. For this exercise, however, we want to create our own dataset from scratch. Creating your own dataset can be a very time-consuming process. Fortunately, many images of skiers are available online that we can use for our model.

### Synthetic data
![Unreal GT](https://unrealgt.github.io/images/image_overview.png)

Sometimes it can be difficult or tedious to get enough labeled images to train your model. There can be several reasons for this:

* Real data can be hard to collect (e.g. a defect in a production line that only occurs very rarely)
* The number of different objects to detect is very large
* The number of objects grows over time, and the model needs to be retrained with new data
* ...

Labelling can also be difficult. In some cases, not even experts can agree on the correct label (e.g. when classifying a defect). In these cases, it can help to generate synthetic data, i.e. data that is artificially created by a computer program (e.g. by rendering a scene). Once set up, synthetic data can be generated in large quantities with different lighting conditions, backgrounds, positions, etc., which makes it ideal for training models. Since the ground truth is known, the labels can be generated automatically. Synthetic data can therefore be very useful for training a model and, depending on the application, save you a lot of time. How you generate the data depends heavily on the task you are trying to solve. You can find several examples of synthetic data in use [here](https://blogs.nvidia.com/blog/what-is-synthetic-data/). Depending on the task, relatively simple methods can also be enough to create synthetic data, as in [this study on tomato detection](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5426829/).
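The following example shows why synthetic data comes with free labels. If a program places an object at a known position in an image, the bounding box is known exactly and can be written out directly in the YOLO label format (the format itself is explained further below). The hypothetical helper below only picks a random position and produces the label line. The actual image compositing is left out, for example pasting a cut-out skier onto a snowy background.

```python
import random


def random_paste_label(bg_w, bg_h, obj_w, obj_h, class_id=0, rng=random):
    """Pick a random top-left position for an obj_w x obj_h object inside a
    bg_w x bg_h background and return (x, y, label_line) in YOLO format."""
    if obj_w > bg_w or obj_h > bg_h:
        raise ValueError("object is larger than the background")
    x = rng.randint(0, bg_w - obj_w)  # inclusive bounds keep the object inside
    y = rng.randint(0, bg_h - obj_h)
    # YOLO labels: class x_center y_center width height, normalized to [0, 1]
    xc = (x + obj_w / 2) / bg_w
    yc = (y + obj_h / 2) / bg_h
    label = f"{class_id} {xc:.6f} {yc:.6f} {obj_w / bg_w:.6f} {obj_h / bg_h:.6f}"
    return x, y, label


rng = random.Random(42)  # fixed seed for reproducibility
x, y, label = random_paste_label(1280, 720, 64, 160, rng=rng)
print(f"paste at ({x}, {y}) -> {label}")
```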

As always with training object detection models, it's good to take an iterative approach and test the model with real world data early on and then improve the model with more data and fine-tuning.

## Training your custom model
For this exercise, we will be doing this the traditional way and labelling the images ourselves. I have prepared a small dataset of images of skiers for you to use. Execute the code below to download the dataset:

In [None]:
# Download training data
!wget https://github.com/Eagleshot/CustomYOLOModel/archive/refs/heads/main.zip --quiet
!unzip -q main.zip
!mv /content/CustomYOLOModel-main/ /content/datasets/ # Rename folder
PATH = "/content/datasets/"

Take a look at the dataset we downloaded. The data.yaml file describes the dataset and its location on the hard drive. Since we only have one class, we only need to specify the name of the class (skier).

```yaml
path: ./SkiDataset
train: ./train
val: ./val
test:

names:
  0: skier
```

The dataset is split into a training set (70%) and a validation set (30%). The training set is used to train the model, i.e. to adjust its weights based on the images and their corresponding labels. The validation set is used periodically during training to monitor the model's performance on unseen data. This is important to detect and prevent overfitting, where the model "memorizes" the training data and no longer works well on new data (poor generalization). The validation data itself is not used to train the model. A test set would only be used after training has finished, to evaluate the model on completely new data. Our dataset is very small, and a test set does not improve the model itself because it is only used for evaluation, so we skip it in this exercise.

We also have several background images (10%) that don't contain any skiers and therefore have no labels. They are not strictly necessary, but they can help to improve model performance and reduce false positives. As you may have noticed, there are also several images of regular people in the dataset. Do you know why they are in the dataset? Keep this question in the back of your mind for later.
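If you add your own images (see Exercise 1 below), you need to split them between the training and validation folders. Below is a minimal sketch of a reproducible 70/30 split using the standard library. The file names are hypothetical. Remember to move each image's label file along with it.

```python
import random


def split_dataset(image_names, val_fraction=0.3, seed=42):
    """Shuffle image names reproducibly and split them into (train, val) lists."""
    names = sorted(image_names)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(names)
    n_val = round(len(names) * val_fraction)
    return names[n_val:], names[:n_val]


images = [f"ski_{i:03d}.jpg" for i in range(20)]  # hypothetical file names
train, val = split_dataset(images)
print(len(train), len(val))  # 14 6
```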

Now let's look at the images and labels in the dataset. For each image, there is a text file with the same name that contains the labels for the bounding boxes in that image. The labels are stored in the following format: `class x_center y_center width height`.

```txt
0 0.458187 0.537281 0.405458 0.425439
0 0.692593 0.161550 0.076998 0.130117
0 0.610721 0.155702 0.067251 0.165205
```
The coordinates are normalized to the range [0, 1] relative to the image width and height, so they are independent of the image size. Each line in the label file corresponds to one object in the image. As we only have one class, every line begins with the class index 0.
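You can convert these normalized labels back into pixel coordinates, for example to draw them or to check your own annotations. To do so, multiply by the image width and height and convert the center to corners. The function below is a minimal sketch. The 1000×800 image size in the usage example is an assumption, since the actual image size is not given here.

```python
def parse_yolo_labels(text, img_w, img_h):
    """Convert YOLO label lines ('class x_center y_center width height', normalized)
    into (class_id, x1, y1, x2, y2) tuples in pixel coordinates."""
    boxes = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # skip empty lines
        cls, xc, yc, w, h = line.split()
        xc, w = float(xc) * img_w, float(w) * img_w
        yc, h = float(yc) * img_h, float(h) * img_h
        boxes.append((int(cls), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2))
    return boxes


labels = """0 0.458187 0.537281 0.405458 0.425439
0 0.692593 0.161550 0.076998 0.130117
0 0.610721 0.155702 0.067251 0.165205"""

for box in parse_yolo_labels(labels, img_w=1000, img_h=800):
    print(box)
```

An empty label file (as for the background images) simply yields an empty list.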

![label](https://github.com/ultralytics/docs/releases/download/0/two-persons-tie.avif)

To label images yourself, you can use a labeling tool to import the images, draw the bounding boxes and export the labels in the correct format. There are many different tools available. I recommend [makesense.ai](https://www.makesense.ai/), which is free and open source. There, you can import the images, create the label(s) (in this case only one: skier), draw the bounding boxes and then export the labels (annotations) in the YOLO format using the Actions button. Then simply copy the label files into the correct folder of the dataset.

When creating or searching for images for your own dataset, there are a few things you need to consider: In order to train a robust model, you need a diverse dataset. This means that you want images from different angles, lighting conditions, distances, etc. This will make your model more robust and generalizable. If you know that your model will be used in a specific environment, you should also try to collect images from that environment.

**Exercise 1:** Now it's your turn: find some more images of skiers and label them. Put them into the dataset folder and split them between the training and validation sets. Label at least 25 images. If you want, you can share the labeled images with me.

**Exercise 2:** Train the model using the code below. This will take some time to run. While the model is training, look at the training folder in `runs/detect/train`. A few files (labels.jpg, labels_correlogram.jpg, train_batch0.jpg, etc.) have been created there automatically. Can you figure out what the train_batch images show?

<details>
    <summary>Click here to hide/unhide the answer!</summary>

The train_batch images show the augmented images that are used to train the model. Data augmentation artificially increases the size of the training dataset by applying many different transformations to the images. Examples include changing their hue, saturation and value (brightness), rotating, scaling or flipping them, and combining several images into a mosaic. This makes better use of the limited training data and helps the model generalize. You can also use these images to check that the labels were loaded correctly.
</details>

In [None]:
# Load a pretrained model for transfer learning
model = YOLO("yolov8n.pt")

# Train the model for 100 epochs
results = model.train(data = PATH + "data.yaml", epochs=100)

**Exercise 3:** After the model has finished training, look at the files that were generated. Several metrics are used to evaluate how successful the training was. Look up what they mean. You will look at them more in-depth later this semester.

<details>
    <summary>Click here to hide/unhide the answer!</summary>

* Intersection over Union (IoU): IoU is a measure that quantifies the overlap between a predicted bounding box and a ground truth bounding box. It plays a fundamental role in evaluating the accuracy of object localization.

* Precision and Recall (P and R): Precision quantifies the proportion of true positives among all positive predictions, assessing the model's capability to avoid false positives. On the other hand, Recall calculates the proportion of true positives among all actual positives, measuring the model's ability to detect all instances of a class.

* Average Precision (AP): AP computes the area under the precision-recall curve, providing a single value that encapsulates the model's precision and recall performance.

* Mean Average Precision (mAP): mAP extends the concept of AP by calculating the average AP values across multiple object classes. This is useful in multi-class object detection scenarios to provide a comprehensive evaluation of the model's performance. Since we only have one class, it is identical to the AP in this example.

* F1 Score: The F1 Score is the harmonic mean of precision and recall, providing a balanced assessment of a model's performance while considering both false positives and false negatives.

See: https://docs.ultralytics.com/guides/yolo-performance-metrics/#object-detection-metrics

</details>
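The definitions above translate directly into a few lines of code. The sketch below computes IoU for two boxes in `(x1, y1, x2, y2)` format. It also computes precision, recall and F1 from counts of true positives, false positives and false negatives. In practice, a prediction counts as a true positive when its IoU with a ground-truth box exceeds a threshold: 0.5 for mAP50, while mAP50-95 averages over thresholds from 0.5 to 0.95. The counts in the example are made up.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


print(round(iou((0, 0, 10, 10), (5, 0, 15, 10)), 3))  # 0.333
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)      # made-up counts
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")
```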

## Testing the model
Now you can load your newly trained model to test it. You can use it just like any other YOLO model. The model is saved in the `runs/detect/trainX/weights/` directory. Replace `trainX` with the folder of your last training run, or use the function below to find the latest model:

In [None]:
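def getLastModelPath():
    # Get the weights folder of the most recent training run
    path = os.path.join(os.getcwd(), "runs", "detect")
    train_folders = [f for f in os.listdir(path) if f.startswith("train")]

    # Runs are named train, train2, train3, ...; sort by that number, because
    # a plain alphabetical sort would put "train9" after "train10"
    def run_number(folder):
        suffix = folder[len("train"):]
        return int(suffix) if suffix.isdigit() else 1

    last_train_folder = max(train_folders, key=run_number)
    model_path = os.path.join(path, last_train_folder, "weights")
    print(f"Last train folder: {model_path}")
    return model_path

model_path = getLastModelPath()

# Use newly trained model for inference
model = YOLO(os.path.join(model_path, "best.pt"))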

# Test the model on an image
image_url = "https://www.graubuenden.ch/sites/graubuenden/files/styles/hero_xlarge_2x/public/2022-11/skifahren-davos-klosters-skigebiet-parsenn.jpg"
results = model(image_url)
plot_results(results)

As you can see, the new model should perform well on our test image and should be able to detect our skiers.🥳

**Exercise 1:** Do you know why the model performs well with only these few images? Would it work as well with the same number of images if we trained it on something completely different (e.g. a paraglider or cyclist)? Why or why not? *Hint: Look at the training data of the pre-trained YOLO model.*
<details>
    <summary>Click here to hide/unhide the answer!</summary>
    
In the COCO dataset (https://cocodataset.org/#explore) that was used to train the YOLO model, there are already many images of people and skis, so it is not very difficult for the model to recognize skiers, even with relatively few training images. If you want to detect something completely different, that is not similar to objects in the COCO dataset, you might need a lot more training data, as the model has never seen this object before and has no idea what it looks like.
</details>

**Exercise 2:** Did you figure out why the unlabeled images of regular people were in the dataset? What would happen if we didn't include them in the dataset?
<details>
    <summary>Click here to hide/unhide the answer!</summary>

The images of people were included in the dataset to prevent the model from detecting every person as a skier. The original model can already detect people and skis. Without these negative examples, the new model could learn to label every person as a skier, even one without skis.

</details>

**Exercise 3:** Can you test it on some challenging images (e.g. from this video https://www.youtube.com/watch?v=B5xckyNsWKw)? How does it perform? What are the limitations of the model and how could you improve it? If you want you can also use the model to track a skier in a short video.

<details>
    <summary>Click here to hide/unhide the answer!</summary>
The model performs well on images and videos that are similar to the training data. However, on images or videos that differ from the training data, performance drops noticeably. For example, in a video of a ski race filmed from a drone or helicopter, or in different lighting or weather, the skiers are not reliably recognized. To improve the model, it's always good to have a diverse dataset with images from different perspectives, lighting conditions, etc. This way, the model can generalize better and perform well on unseen data. Large datasets (e.g. the COCO dataset used to train YOLO) can contain well over 100,000 images, which takes a lot of effort to create and label. So, generally speaking, you have to test the model on your specific use case and then decide whether you need more data to improve it.

Sometimes you can also get creative, as can be seen with synthetic data. For example you could take the COCO dataset and combine the bounding boxes of people and skis to get a dataset of several thousand images of people skiing.
</details>

In [None]:
# Insert your code here ✏️

## Outlook: Deploying the model on hardware
After you have trained your model, you can deploy it in the field. Training models is computationally very expensive, but inference requires far less compute power. For example, YOLO can even run on low-power devices like smartphones if they have hardware acceleration. The YOLOv8s model runs at 80 FPS on a Raspberry Pi 5 with the AI Kit, using only 6 W of power. Without an AI accelerator, the model can also run on the CPU. However, performance is then limited to around [1 FPS](https://docs.ultralytics.com/guides/raspberry-pi/#__tabbed_2_2), which can still be useful depending on the application.
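A quick way to interpret such numbers is to convert the frame rate into time and energy per frame. The small calculation below uses the figures quoted above (80 FPS at 6 W with the accelerator, about 1 FPS on the CPU) to show the per-frame latency and energy. It ignores pre- and post-processing overhead and assumes the power figure covers the whole device.

```python
def frame_time_ms(fps):
    """Time per frame in milliseconds at a given frame rate."""
    return 1000.0 / fps


def energy_per_frame_j(power_w, fps):
    """Energy per frame in joules (a watt is one joule per second)."""
    return power_w / fps


print(frame_time_ms(80))          # 12.5 ms per frame with the accelerator
print(energy_per_frame_j(6, 80))  # 0.075 J per frame
print(frame_time_ms(1))           # 1000.0 ms per frame on the CPU
```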


Depending on the requirements, you can use edge or cloud computing:

Edge computing means running models directly on a local device. This approach offers benefits like improved privacy (since data doesn't leave the device), reduced latency (quicker response times) and lower ongoing costs. However, edge devices that can run models typically draw more power and have a higher upfront cost than simple devices that send their data elsewhere for processing.

Cloud computing, on the other hand, refers to using remote servers for processing. Cloud services provide scalable computing power, which is ideal for intensive or bursty workloads. In exchange, they come with recurring costs, higher latency, the need for constant connectivity and potential privacy concerns. The cloud can be useful if you want very low-cost, low-power hardware on site and have the connectivity to send data to the cloud for processing. It can also be useful if you want to use very large models (e.g. ChatGPT).

![onnx](https://www.aurigait.com/wp-content/uploads/2023/01/unnamed-1.png)

As you saw in the exercises, the hardware you use can have a significant impact on model speed. There are many different hardware accelerators for AI (GPU, TPU, NPU, FPGA, …), and unfortunately, a lot of software only works with specific hardware architectures. For example, out of the box, the Ultralytics YOLO library supports GPU acceleration on NVIDIA graphics cards but not on AMD ones. Because of this, there are open standards like ONNX (Open Neural Network Exchange) that make converting models and running them on different hardware a lot easier. You can see the command to convert a model to ONNX below.

In [None]:
# Export model as ONNX
# https://docs.ultralytics.com/modes/export/#how-do-i-export-a-yolov8-model-to-onnx-format
onnx_model_path = model.export(format="onnx")

You can also use the https://netron.app/ tool to visualize the model architecture.

Using hardware accelerators can also lead to special challenges. One example is running object detection on the GPU and then processing the results on the CPU, which requires copying data between the two. When running models on edge devices, optimization is also key to good performance and efficiency. Techniques like pruning, half precision or batch processing can significantly speed up inference, but their benefits depend on the hardware.

**With this, you have completed the tutorial on object detection with YOLO. I hope you enjoyed the tutorial and learned something new. Feel free to give me feedback on how to improve this course! Good luck with your own projects!** 🚀
