# Thesis: Training an Adapter for Cruise

This notebook documents the workflow for training a YOLO-based adapter model tailored for cruise applications. The process includes dataset preparation, configuration file creation, model training, and result management.

## Install Required Libraries

The cells below install the `ultralytics` package, which provides the YOLO implementation used in this workflow, load the TensorBoard extension, and import the modules used throughout the notebook.

In [None]:
%%bash
ls /kaggle/input/train-thesis
mkdir -p dataset
cp -r /kaggle/input/*/* dataset/

In [None]:
!pip install -q ultralytics

In [None]:
# Load the TensorBoard notebook extension
%load_ext tensorboard
%tensorboard --logdir models

In [None]:
from ultralytics import YOLO
import os
import cv2
import torch
import matplotlib.pyplot as plt
from pathlib import Path
import warnings
import shutil
import random
warnings.filterwarnings("ignore", category=RuntimeWarning)


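## Copy Dataset from Kaggle Input

The `%%bash` cell at the top of the notebook copies the dataset from `/kaggle/input` into the local `dataset/` directory.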

## Create YAML Configuration for Training

This section describes how to automatically generate a `data.yaml` configuration file required for YOLO training. The script reads class names from `classes.txt`, sets up dataset paths, and writes the configuration in YAML format.

In [None]:
# Python function to automatically create data.yaml config file
# 1. Reads "classes.txt" file to get list of class names
# 2. Creates data dictionary with correct paths to folders, number of classes, and names of classes
# 3. Writes data in YAML format to data.yaml

import yaml
import os

def create_data_yaml(path_to_classes_txt, path_to_data_yaml):

  # Read classes.txt to get class names
  if not os.path.exists(path_to_classes_txt):
    print(f'classes.txt file not found! Please create a classes.txt labelmap and move it to {path_to_classes_txt}')
    return
  with open(path_to_classes_txt, 'r') as f:
    # Keep non-empty lines only; each remaining line is one class name
    classes = [line.strip() for line in f if line.strip()]
  number_of_classes = len(classes)

  # Create data dictionary
  data = {
      'path': 'dataset',
      'train': 'train/images',
      'val': 'valid/images',
      'test': 'test/images',
      'nc': number_of_classes,
      'names': classes
  }

  # Write data to YAML file
  with open(path_to_data_yaml, 'w') as f:
    yaml.dump(data, f, sort_keys=False)
  print(f'Created config file at {path_to_data_yaml}')

  return

# Define path to classes.txt and run function
path_to_classes_txt = 'dataset/classes.txt'
path_to_data_yaml = 'data.yaml'

create_data_yaml(path_to_classes_txt, path_to_data_yaml)

print('\nFile contents:\n')
!cat data.yaml
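
Before training, it can help to confirm that the folders named in `data.yaml` actually exist. The following is a minimal sketch rather than part of the original workflow: the helper name `check_dataset_layout` is hypothetical, and it assumes the `path`, `train`, `val`, and `test` keys written by `create_data_yaml` above.

```python
import os

def check_dataset_layout(cfg):
    """Report which split folders from a data.yaml-style dict exist on disk.

    cfg is assumed to be the dictionary written by create_data_yaml
    (keys: 'path', 'train', 'val', 'test').
    """
    root = cfg.get('path', '.')
    status = {}
    for split in ('train', 'val', 'test'):
        rel = cfg.get(split)
        # A split counts as present only if its key is set and the folder exists
        status[split] = rel is not None and os.path.isdir(os.path.join(root, rel))
    return status

# Possible usage in the notebook (assumes PyYAML is installed):
# import yaml
# with open('data.yaml') as f:
#     print(check_dataset_layout(yaml.safe_load(f)))
```

A split reported as `False` would point to a missing or misnamed folder, for example `val` instead of `valid`.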

## Data Visualization

This section visualizes the training data with bounding boxes. For each of up to six randomly chosen training images, the code will:

1. Load the image from the training dataset
2. Read its corresponding label file
3. Draw bounding boxes and class labels on the image
4. Display the annotated image in a 2x3 matplotlib grid

This visualization helps verify that:
- Images are loading correctly
- Label files are properly formatted
- Bounding box coordinates are accurate
- Class IDs are valid

The visualization uses:
- OpenCV for image processing and drawing
- Matplotlib for display
- Green bounding boxes with class labels
- RGB color format for proper display

Run the cells below to see random training examples with their annotations.
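
The checks listed above (label format, coordinate range, class IDs) can also be run programmatically. The sketch below is illustrative only: `validate_label_line` is a hypothetical helper, and it assumes the standard YOLO detection format of five whitespace-separated fields per line (`class x_center y_center width height`, all coordinates normalized to [0, 1]).

```python
def validate_label_line(line, num_classes):
    """Return a list of problems found in one YOLO detection label line.

    An empty list means the line passed every check.
    """
    parts = line.split()
    if len(parts) != 5:
        return [f'expected 5 fields, found {len(parts)}']
    try:
        values = [float(v) for v in parts]
    except ValueError:
        return ['non-numeric field']

    problems = []
    class_value, coords = values[0], values[1:]
    # The class ID must be a whole number within the known class range
    if not class_value.is_integer() or not 0 <= class_value < num_classes:
        problems.append(f'class id {parts[0]} outside 0..{num_classes - 1}')
    # Normalized coordinates must stay inside the unit interval
    if any(not 0.0 <= v <= 1.0 for v in coords):
        problems.append('coordinate outside [0, 1]')
    return problems
```

Running this over every line of every file in `dataset/train/labels` would flag malformed annotations before they reach training.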


In [None]:
def load_classes(classes_path):
    """
    Load class names from classes.txt file.
    
    Args:
        classes_path (str): Path to the classes.txt file
    
    Returns:
        list: List of class names
    """
    if not os.path.exists(classes_path):
        print(f"Classes file {classes_path} not found!")
        return []
    
    with open(classes_path, 'r') as f:
        # Skip blank lines so class indices match those written to data.yaml
        classes = [line.strip() for line in f if line.strip()]
    return classes

In [None]:
def visualize_random_samples(data_path, num_samples=6):
    """
    Visualize random samples from the dataset with bounding boxes in a 2x3 subplot.

    Args:
        data_path (str): Path to the dataset directory
        num_samples (int): Number of random samples to visualize (default=6 for 2x3 grid)
    """
    # Load class names
    classes_path = os.path.join(data_path, 'classes.txt')
    class_names = load_classes(classes_path)
    if not class_names:
        print("No class names loaded, using default class IDs.")
    
    # Load the images and labels
    images_path = os.path.join(data_path, 'train', 'images')
    labels_path = os.path.join(data_path, 'train', 'labels')

    # Get list of image files
    image_files = [f for f in os.listdir(images_path) if f.endswith(('.jpg', '.jpeg', '.png'))]

    # Select random images (capped at 6, the number of slots in the 2x3 grid)
    selected_images = random.sample(image_files, min(num_samples, len(image_files), 6))

    # Create a 2x3 subplot grid
    fig, axes = plt.subplots(2, 3, figsize=(15, 10))
    axes = axes.ravel()  # Flatten the 2x3 array for easier iteration

    for idx, random_image in enumerate(selected_images):
        image_path = os.path.join(images_path, random_image)
        # The label file shares the image's base name, with a .txt extension
        label_path = os.path.join(labels_path, os.path.splitext(random_image)[0] + '.txt')

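        # Read the image (OpenCV loads BGR) and convert to RGB for matplotlib
        img = cv2.imread(image_path)
        if img is None:
            print(f'Could not read image: {image_path}')
            axes[idx].axis('off')
            continue
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)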

        # Read labels and draw boxes
        num_objects = 0
        if os.path.exists(label_path):
            with open(label_path, 'r') as f:
                lines = f.readlines()

            height, width = img.shape[:2]
            for line in lines:
                parts = line.split()
                if len(parts) != 5:
                    continue  # Skip blank lines and non-detection entries
                class_id, x_center, y_center, w, h = map(float, parts)
                class_id = int(class_id)
                
                # Get class name or fallback to class ID
                class_label = class_names[class_id] if class_id < len(class_names) else f'Class {class_id}'

                # Convert normalized coordinates to pixel coordinates
                x1 = int((x_center - w/2) * width)
                y1 = int((y_center - h/2) * height)
                x2 = int((x_center + w/2) * width)
                y2 = int((y_center + h/2) * height)

                # Draw rectangle
                cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)

                # Add class label
                cv2.putText(img, class_label, (x1, y1-10),
                           cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
                num_objects += 1

        # Display image in subplot
        axes[idx].imshow(img)
        axes[idx].set_title(f'Image: {random_image}\nObjects: {num_objects}')
        axes[idx].axis('off')

        # Print image details
        print(f'Image {idx+1}:')
        print(f'Image shape: {img.shape}')
        print(f'Image path: {image_path}')
        print(f'Number of annotated objects: {num_objects}\n')

    # Hide empty subplots if fewer images than num_samples
    for idx in range(len(selected_images), 6):
        axes[idx].axis('off')
        axes[idx].set_visible(False)

    # Adjust layout to prevent overlap
    plt.tight_layout()
    plt.show()

In [None]:
data_path = 'dataset'
visualize_random_samples(data_path, num_samples=6)
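
The normalized-to-pixel conversion used inside `visualize_random_samples` can be isolated and checked by hand. The sketch below mirrors that arithmetic; the function name `yolo_to_pixel_box` is hypothetical, and the clamping to the image bounds is an addition that the original cell does not perform.

```python
def yolo_to_pixel_box(x_center, y_center, w, h, img_width, img_height):
    """Convert a normalized YOLO box (center, size) to pixel corners (x1, y1, x2, y2)."""
    x1 = int((x_center - w / 2) * img_width)
    y1 = int((y_center - h / 2) * img_height)
    x2 = int((x_center + w / 2) * img_width)
    y2 = int((y_center + h / 2) * img_height)
    # Clamp to the image so boxes touching the border stay drawable (added step)
    x1, x2 = max(0, x1), min(img_width, x2)
    y1, y2 = max(0, y1), min(img_height, y2)
    return x1, y1, x2, y2

# A centered box covering a quarter of the width and half of the height
print(yolo_to_pixel_box(0.5, 0.5, 0.25, 0.5, 640, 480))  # (240, 120, 400, 360)
```

For the 640x480 example, the box center sits at pixel (320, 240) and the box extends 80 pixels left and right and 120 pixels up and down from it.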

## Start YOLO model training

The model and training parameters are defined in the cell below.
Please run the next cell to begin training.

In [None]:
# Load pretrained weights (a better starting point than training from scratch)
model = YOLO("yolo11n.pt")  # YOLO11 nano checkpoint

In [None]:
# Train the model
model.train(
    data="data.yaml",       # Path to dataset config file
    epochs=250,             # Number of training epochs
    imgsz=640,              # Input image size (square: 640x640)
    batch=16,               # Batch size (adjust based on GPU memory)
    device=[0, 1],          # Use GPU 0 and GPU 1
    patience=50,            # Stop early if no improvement in 50 epochs
    project="models",       # Results are saved under models/<run name>
    optimizer="auto",       # Let YOLO choose the best optimizer
    lr0=0.005               # Initial learning rate (Ultralytics may override this when optimizer="auto")
)
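
The `device=[0, 1]` argument above assumes that two GPUs are visible to the process. A minimal sketch of choosing the value from the number of GPUs actually available is shown below; `select_devices` is a hypothetical helper, and the commented usage assumes `torch.cuda.device_count()` reports the visible GPUs.

```python
def select_devices(gpu_count, max_gpus=2):
    """Map a GPU count to a value suitable for the `device` training argument."""
    if gpu_count <= 0:
        return 'cpu'  # No GPU visible: fall back to CPU training
    # Use at most max_gpus devices, numbered from 0
    return list(range(min(gpu_count, max_gpus)))

# Possible usage in the notebook:
# import torch
# device = select_devices(torch.cuda.device_count())
# model.train(data="data.yaml", device=device, ...)
```

With this helper the same notebook could run on a single-GPU or CPU-only session without editing the training cell, although CPU training for 250 epochs would be very slow.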


## Training Metrics Analysis

After training, we can analyze the model's performance metrics to understand its effectiveness. The following metrics are particularly important:

### Key Performance Indicators
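- **mAP (mean Average Precision)**: Average precision averaged over classes, reported at an IoU threshold of 0.5 (mAP@0.5) and averaged over IoU thresholds from 0.5 to 0.95 (mAP@0.5:0.95)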
- **Precision**: Ratio of true positives to all detections
- **Recall**: Ratio of true positives to all ground truth objects
- **F1-Score**: Harmonic mean of precision and recall

### Training Progress
- **Loss Curves**: Monitor training and validation loss
- **Learning Rate**: Track learning rate adjustments
- **Confusion Matrix**: Analyze detection errors

### Model Efficiency
- **Inference Speed**: Frames per second (FPS)
- **Model Size**: Memory footprint
- **FLOPs**: Computational complexity

The next cell runs validation on the trained model and prints the accuracy metrics and the inference speed.
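
As a worked example of the definitions above, precision, recall, and F1 can be computed directly from counts of true positives (TP), false positives (FP), and false negatives (FN). The counts used here are made up for illustration and are not results from this model.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts: 6 correct detections, 2 false alarms, 2 missed objects
p, r, f1 = precision_recall_f1(tp=6, fp=2, fn=2)
print(f"Precision: {p:.2f}, Recall: {r:.2f}, F1: {f1:.2f}")
```

With these counts, 6 of 8 detections are correct and 6 of 8 ground-truth objects are found, so precision, recall, and F1 all equal 0.75.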


In [None]:
# Run validation and get detailed metrics
metrics = model.val()

# Extract and print key performance metrics
print("\n=== Model Performance Metrics ===")
print(f"mAP@0.5:        {metrics.box.map50:.4f}")
print(f"mAP@0.5:0.95:   {metrics.box.map:.4f}")
print(f"Precision (mp): {metrics.box.mp:.4f}")
print(f"Recall (mr):    {metrics.box.mr:.4f}")
print(f"F1-Score (avg): {sum(metrics.box.f1) / len(metrics.box.f1):.4f}")

# Print per-class metrics safely
# metrics.box.p / r / f1 are ordered by metrics.box.ap_class_index
# (the classes that have validation results), not by raw class ID
print("\n=== Per-Class Metrics ===")
class_positions = {int(c): pos for pos, c in enumerate(metrics.box.ap_class_index)}
for i, cls_name in model.names.items():
    if i not in class_positions:
        print(f"{cls_name}: (no validation results)")
        continue
    pos = class_positions[i]
    precision = metrics.box.p[pos]
    recall = metrics.box.r[pos]
    f1_score = metrics.box.f1[pos]
    print(f"{cls_name}:")
    print(f"  Precision: {precision:.4f}")
    print(f"  Recall:    {recall:.4f}")
    print(f"  F1-Score:  {f1_score:.4f}")

# Calculate and print inference speed
print("\n=== Inference Speed ===")
inference_time = metrics.speed['inference']  # milliseconds per image
print(f"Average inference time: {inference_time:.2f} ms")
print(f"FPS: {1000 / inference_time:.1f}")


## Copy Training Results to a Remote Server

This section copies the `models` directory, which contains the training results, to a remote storage server with `scp`. This backs up the experiment outputs and keeps them accessible for further analysis or sharing.

In [None]:
!pip install -q gdown
!gdown 'https://drive.google.com/uc?id=1nQ0_w3uG8McFgPxt-kVS1RPW1aKpb2YS'
!chmod 400 /kaggle/working/gcp-key
# !ssh -i /kaggle/working/gcp-key -o StrictHostKeyChecking=no trung@34.142.148.134 "rm -rf /home/trung/runs"
!scp -i /kaggle/working/gcp-key -o StrictHostKeyChecking=no -r models trung@34.142.148.134:/home/trung/models
!echo "Done!"

## Test Results Analysis

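### Test Images Directory Structure

The next cell expects the test images under `dataset/test/images` and the trained weights at `models/train/weights/best.pt`. It plots predictions for up to six test images and then counts the detections per class over the whole test set.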



In [None]:
from pathlib import Path
import cv2
import matplotlib.pyplot as plt
from ultralytics import YOLO

# Load model
model = YOLO('models/train/weights/best.pt')

# Define test images directory
test_dir = Path('dataset/test/images')
if not test_dir.exists():
    # Raise instead of calling exit(), which would shut down the notebook kernel
    raise FileNotFoundError(f"Test directory {test_dir} not found!")

# Get all image files
image_files = []
for ext in ['*.jpg', '*.jpeg', '*.png']:
    image_files.extend(list(test_dir.glob(ext)))

# Select up to 6 images for visualization
n_images_to_show = min(6, len(image_files))
selected_images = image_files[:n_images_to_show]

# Create a 2x3 subplot grid
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

# Process each selected image
for idx, img_path in enumerate(selected_images):
    # Read the image; OpenCV loads BGR, which Ultralytics expects for array input
    img = cv2.imread(str(img_path))

    # Run inference with batch=1
    results = model.predict(source=img, batch=1, verbose=False)
    result = results[0]

    # Draw boxes (plot() returns a BGR array) and convert to RGB for matplotlib
    annotated_img = cv2.cvtColor(result.plot(), cv2.COLOR_BGR2RGB)

    # Display image
    axes[idx].imshow(annotated_img)
    axes[idx].set_title(f'{img_path.name}\nDetections: {len(result.boxes)}')
    axes[idx].axis('off')

# Hide any unused subplots
for idx in range(len(selected_images), 6):
    axes[idx].axis('off')

plt.tight_layout()
plt.show()

# === Detection Statistics for all test images ===
print("\n=== Detection Statistics ===")
total_detections = 0
class_counts = {}

for img_path in image_files:
    results = model.predict(source=str(img_path), batch=1, verbose=False)
    result = results[0]

    for box in result.boxes:
        cls = int(box.cls[0].item())  # get class index
        cls_name = model.names.get(cls, f'class_{cls}')
        class_counts[cls_name] = class_counts.get(cls_name, 0) + 1
        total_detections += 1

print(f"Total images processed: {len(image_files)}")
print(f"Total detections: {total_detections}\n")

print("Detections per class:")
for cls_name, count in class_counts.items():
    print(f"  {cls_name}: {count}")


In [None]:
!rm -rf models.zip
!zip -r models.zip models