<a href="https://colab.research.google.com/github/tamrachan/ScribeSound/blob/main/train_YOLO_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Training YOLO model  
This notebook is based on [EJ Technology Consultants' article](https://www.ejtech.io/learn/train-yolo-models) and EdjeElectronics's [GitHub repository](https://github.com/EdjeElectronics/Train-and-Deploy-YOLO-Models).
I have condensed both sources while learning how to train a YOLO model.


This notebook assumes that you already have a `data.zip` archive containing:  
- An `images` folder containing all the images
- A `labels` folder containing the labels in YOLO annotation format
- A `classes.txt` labelmap file that contains all the classes  
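
For reference, a YOLO detection label file is plain text with one line per object: a class index followed by the box center, width, and height, each normalized to the range 0 to 1. The sketch below converts one such line to pixel coordinates. The function name and the example values are illustrative and not part of the original notebook.

```python
def yolo_line_to_pixels(line, img_w, img_h):
    """Convert 'class xc yc w h' (normalized) to (class_id, x_min, y_min, x_max, y_max) in pixels."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    xc, yc, w, h = (float(p) for p in parts[1:])
    # The stored values are fractions of the image size, so scale them back up
    x_min = (xc - w / 2) * img_w
    y_min = (yc - h / 2) * img_h
    x_max = (xc + w / 2) * img_w
    y_max = (yc + h / 2) * img_h
    return class_id, x_min, y_min, x_max, y_max

# Class 0, box centered in a 640x480 image, half as wide and half as tall as the image
print(yolo_line_to_pixels("0 0.5 0.5 0.5 0.5", 640, 480))
# (0, 160.0, 120.0, 480.0, 360.0)
```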

## 1. Upload Image Dataset  
First, upload your `data.zip` by clicking the "Files" icon in the left menu panel and dragging your file into it.  
We will unzip `data.zip` by running the following code block:



In [None]:
# Unzip images to a 'custom_data' folder
!unzip -q /content/data.zip -d /content/custom_data
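
Before going further, it can help to confirm that the unzipped folder has the layout described above. The following is a minimal sketch, not part of the original notebook: the helper name `check_dataset` and the list of image extensions are assumptions you may need to adjust.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}  # assumption: adjust to your image types

def check_dataset(root):
    """Return a list of problems found in an unzipped dataset folder."""
    root = Path(root)
    problems = []
    for name in ("images", "labels"):
        if not (root / name).is_dir():
            problems.append(f"missing folder: {name}")
    if not (root / "classes.txt").is_file():
        problems.append("missing file: classes.txt")
    if problems:
        return problems
    # Labels are matched to images by file name without the extension
    label_stems = {p.stem for p in (root / "labels").glob("*.txt")}
    for img in sorted((root / "images").iterdir()):
        if img.suffix.lower() in IMAGE_EXTS and img.stem not in label_stems:
            problems.append(f"no label file for image: {img.name}")
    return problems
```

In this notebook you would call `check_dataset('/content/custom_data')` and expect an empty list. An image without a label file is not necessarily an error, since it may be an intentional background example with no objects in it.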

We will be using Ultralytics to train the YOLO model so we need to install the Ultralytics Python library.

In [None]:
!pip install ultralytics

## 2. Splitting images into train and validation folders
Ultralytics expects the dataset in a root folder (here `data`) with two subfolders:
- `train` - the images used to train the model
- `validation` - the images used to check the model's performance at the end of each training epoch    

Each of these contains an `images` folder and a `labels` folder, which hold the image and annotation files respectively.

In [None]:
# Get Python script by EdjeElectronics to split the data
!wget -O /content/train_val_split.py https://raw.githubusercontent.com/EdjeElectronics/Train-and-Deploy-YOLO-Models/refs/heads/main/utils/train_val_split.py

# Run Python script to create the 'data' folder with the train/validation split
!python train_val_split.py --datapath="/content/custom_data" --train_pct=0.9
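
The idea behind the split can be sketched in a few lines. This is not the EdjeElectronics script, only an illustration of the technique, assuming the files are shuffled randomly and divided according to `train_pct`. The function name and the fixed seed are hypothetical.

```python
import random

def split_train_val(filenames, train_pct=0.9, seed=0):
    """Shuffle a list of file names and split it into (train, val) lists."""
    if not 0.0 < train_pct < 1.0:
        raise ValueError("train_pct must be between 0 and 1")
    shuffled = sorted(filenames)           # sort first so the result is reproducible
    random.Random(seed).shuffle(shuffled)  # seeded shuffle, independent of global state
    n_train = int(len(shuffled) * train_pct)
    return shuffled[:n_train], shuffled[n_train:]

train, val = split_train_val([f"img_{i}.jpg" for i in range(100)])
print(len(train), len(val))  # 90 10
```

In a real split, each image's label file has to move to the same side as the image, so that `images` and `labels` stay in sync.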

## 3. Create the Ultralytics training configuration YAML file
This file defines the model's classes and specifies the location of your train and validation data.  
Make sure the `custom_data/classes.txt` file exists, then run the following code to generate the `data.yaml` configuration file:

In [None]:
# Python function to automatically create data.yaml config file
# 1. Reads "classes.txt" file to get list of class names
# 2. Creates data dictionary with correct paths to folders, number of classes, and names of classes
# 3. Writes data in YAML format to data.yaml

import yaml
import os

def create_data_yaml(path_to_classes_txt, path_to_data_yaml):

  # Read classes.txt to get class names
  if not os.path.exists(path_to_classes_txt):
    print(f'classes.txt file not found! Please create a classes.txt labelmap and move it to {path_to_classes_txt}')
    return
  with open(path_to_classes_txt, 'r') as f:
    classes = []
    for line in f.readlines():
      if len(line.strip()) == 0: continue
      classes.append(line.strip())
  number_of_classes = len(classes)

  # Create data dictionary
  data = {
      'path': '/content/data',
      'train': 'train/images',
      'val': 'validation/images',
      'nc': number_of_classes,
      'names': classes
  }

  # Write data to YAML file
  with open(path_to_data_yaml, 'w') as f:
    yaml.dump(data, f, sort_keys=False)
  print(f'Created config file at {path_to_data_yaml}')

  return

# Define path to classes.txt and run function
path_to_classes_txt = '/content/custom_data/classes.txt'
path_to_data_yaml = '/content/data.yaml'

create_data_yaml(path_to_classes_txt, path_to_data_yaml)

print('\nFile contents:\n')
!cat /content/data.yaml
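
A quick way to catch a wrong path before training is to check that the folders named in the config exist. The sketch below takes a dictionary shaped like the one `create_data_yaml` builds. The helper name `missing_dataset_dirs` is hypothetical, and it assumes `train` and `val` are relative to `path`, as they are in this notebook.

```python
import os

def missing_dataset_dirs(data):
    """Return the train/val image folders from a data.yaml-style dict that do not exist."""
    missing = []
    for key in ("train", "val"):
        folder = os.path.join(data["path"], data[key])
        if not os.path.isdir(folder):
            missing.append(folder)
    return missing

config = {
    "path": "/content/data",
    "train": "train/images",
    "val": "validation/images",
}
print(missing_dataset_dirs(config))  # an empty list means both folders were found
```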

## 4. Train the model
Choose a YOLO model size, the number of epochs, and the resolution (`imgsz`). <br>
<br>


**Model architecture & size (`model`)**  
There are several YOLO11 model sizes available to train, including `yolo11n.pt`, `yolo11s.pt`, `yolo11m.pt`, `yolo11l.pt`, and `yolo11x.pt`. Larger models are more accurate but run slower, while smaller models run faster but are less accurate.
This notebook uses `yolo11s.pt` by default as it is a good starting point.

You can also train YOLOv8 or YOLOv5 models by replacing `yolo11` with `yolov8` or `yolov5`.
<br><br>
**Number of epochs (`epochs`)**

In machine learning, one “epoch” is a single pass through the full training dataset. The number of epochs determines how long the model trains. The best number of epochs depends on the size of the dataset and the model architecture. If your dataset has fewer than 200 images, a good starting point is 60 epochs. If it has more than 200 images, a good starting point is 40 epochs.
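
To make the rule of thumb concrete, here is a small sketch that picks a starting epoch count and estimates how many training batches a run will process. The function names are hypothetical, treating a dataset of exactly 200 images as "more than 200" is an assumption, and the batch size of 16 is assumed to match the Ultralytics default.

```python
import math

def suggest_epochs(num_images):
    """Starting point from the text: 60 epochs under 200 images, otherwise 40."""
    return 60 if num_images < 200 else 40

def total_batches(num_train_images, epochs, batch_size=16):
    """Batches processed in a run: batches per epoch times number of epochs."""
    return math.ceil(num_train_images / batch_size) * epochs

n = 150
epochs = suggest_epochs(n)
print(epochs, total_batches(n, epochs))  # 60 600
```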
<br><br>
**Resolution (`imgsz`)**

Resolution has a large impact on the speed and accuracy of the model: a lower-resolution model runs faster but is less accurate. YOLO models are typically trained and run at a 640x640 resolution. However, if you want your model to run faster, or you know you will be working with low-resolution images, try a lower resolution such as 480x480.
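
As a rough guide to why resolution matters, the sketch below compares the number of input pixels at a few sizes against the 640x640 baseline. It assumes per-image compute grows roughly with pixel count. Actual speedups depend on the model and hardware, so treat the ratios as an estimate only. The function name is hypothetical.

```python
def relative_pixel_count(imgsz, baseline=640):
    """Ratio of pixels in an imgsz x imgsz input to a baseline x baseline input."""
    return (imgsz * imgsz) / (baseline * baseline)

for size in (320, 480, 640):
    print(size, relative_pixel_count(size))
# 320 0.25
# 480 0.5625
# 640 1.0
```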
<br><br>
### 4.1 Run the training
Run the following code block to train your model. If you want to use a different model, number of epochs, or resolution, change `model`, `epochs`, or `imgsz`.

In [None]:
!yolo detect train data=/content/data.yaml model=yolo11s.pt epochs=60 imgsz=640
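
The same run can also be started from Python instead of the command line. The sketch below assumes the `ultralytics` package installed earlier and uses its `YOLO` class and `train` method. The helper names `build_train_args` and `train` are hypothetical wrappers added for illustration.

```python
def build_train_args(data="/content/data.yaml", epochs=60, imgsz=640):
    """Collect the same settings passed to the `yolo detect train` command."""
    return {"data": data, "epochs": epochs, "imgsz": imgsz}

def train(model_name="yolo11s.pt", **overrides):
    """Train with the Ultralytics Python API instead of the CLI."""
    from ultralytics import YOLO  # imported here so the helper above works without the package
    model = YOLO(model_name)      # loads pretrained weights, downloading them if needed
    return model.train(**build_train_args(**overrides))

print(build_train_args(epochs=40))
# {'data': '/content/data.yaml', 'epochs': 40, 'imgsz': 640}
```

Calling `train()` with no arguments would mirror the command above, and `train(epochs=40, imgsz=480)` would override two of the settings.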

The training algorithm will parse the images in the training and validation directories and then start training the model. At the end of each training epoch, the program runs the model on the validation dataset and reports the resulting mAP, precision, and recall. As training continues, the mAP should generally increase with each epoch. Training will end once it goes through the number of epochs specified by `epochs`.

> **NOTE:** Let training run to completion: when it finishes, Ultralytics strips the optimizer state from the saved weights, which makes the final model file smaller.

The best trained model weights will be saved in `/content/runs/detect/train/weights/best.pt`. Additional information about training is saved in the `/content/runs/detect/train` folder, including a `results.png` file that shows how loss, precision, recall, and mAP progressed over each epoch.
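
If you prefer numbers to the plot, the per-epoch metrics can be read programmatically. The sketch below assumes the run folder also contains a `results.csv` file with an `epoch` column and a `metrics/mAP50-95(B)` column. These names are assumptions that may differ between Ultralytics versions, so check the header of your own file. The function name `best_epoch` is hypothetical.

```python
import csv

def best_epoch(csv_path, metric="metrics/mAP50-95(B)"):
    """Return (epoch, value) for the row with the highest value of `metric`."""
    with open(csv_path, newline="") as f:
        # Strip header and cell whitespace, since some versions pad the columns
        rows = [{k.strip(): v.strip() for k, v in row.items()} for row in csv.DictReader(f)]
    if not rows:
        raise ValueError(f"no rows in {csv_path}")
    best = max(rows, key=lambda r: float(r[metric]))
    return int(float(best["epoch"])), float(best[metric])
```

In this notebook the call would be `best_epoch('/content/runs/detect/train/results.csv')`.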

## 5. Test the model
The model has now been trained and it is time to test it!

In [None]:
!yolo detect predict model=runs/detect/train/weights/best.pt source=data/validation/images save=True

In [None]:
import glob
from IPython.display import Image, display
for image_path in glob.glob('/content/runs/detect/predict/*.jpg')[:10]:
  display(Image(filename=image_path, height=400))
  print('\n')

The model should draw a box around each object of interest in each test image. If the results don't meet your expectations, you can:
- Increase the number of epochs used for training
- Use a larger model size (e.g. `yolo11l.pt`)
- Double check your dataset to make sure there are no labeling errors or conflicting examples
- Add more images to the training set

## 6. Deploy and download the model

Zip and download the trained model by running the code blocks below.

The code creates a folder named `my_model`, moves the model weights into it, and renames them from `best.pt` to `my_model.pt`. It also adds the training results in case you want to reference them later. It then zips the folder as `my_model.zip`.

In [None]:
# Create "my_model" folder to store model weights and train results
!mkdir -p /content/my_model
!cp /content/runs/detect/train/weights/best.pt /content/my_model/my_model.pt
!cp -r /content/runs/detect/train /content/my_model

# Zip into "my_model.zip"
%cd /content/my_model
!zip /content/my_model.zip my_model.pt
!zip -r /content/my_model.zip train
%cd /content
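
The shell steps above can also be sketched with Python's standard library, which avoids changing the working directory. This is an alternative illustration, not the original notebook's method, and the function name `package_model` is hypothetical.

```python
import shutil
from pathlib import Path

def package_model(train_dir, out_dir, name="my_model"):
    """Copy best.pt and the training results into out_dir/name, then zip that folder."""
    train_dir, out_dir = Path(train_dir), Path(out_dir)
    model_dir = out_dir / name
    model_dir.mkdir(parents=True, exist_ok=True)
    # Rename best.pt to <name>.pt while copying it
    shutil.copy2(train_dir / "weights" / "best.pt", model_dir / f"{name}.pt")
    # Keep the full training results alongside the weights
    shutil.copytree(train_dir, model_dir / "train", dirs_exist_ok=True)
    # make_archive appends ".zip" and returns the path of the archive it wrote
    return shutil.make_archive(str(out_dir / name), "zip", root_dir=model_dir)
```

In this notebook the call would be `package_model('/content/runs/detect/train', '/content')`, which writes `/content/my_model.zip`.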

In [None]:
# This download can be very slow; you can also download my_model.zip from the Files sidebar
from google.colab import files

files.download('/content/my_model.zip')

## 7. Conclusion
Congratulations! You have successfully trained and deployed a custom YOLO Object Detection Model!