<a href="https://colab.research.google.com/github/vijju-rasala/Bits-Capstone/blob/main/Capstone_Project_Helmet_Violation_Detection_Group_latest2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Helmet Violation Detection from Indian CCTV Video

**Problem statement:**
Detect and flag two-wheeler helmet violations (helmetless riding) from traffic camera frames in Indian cities in real time.

**Description:**
Create a computer vision system using YOLOv8 and object tracking to detect two-wheeler riders and classify helmet usage. Optionally perform license plate OCR for enforcement.

---

## 1. Setup and Data Loading

This section covers mounting Google Drive to access the dataset and extracting the data from the zip file.

### Mount Google Drive

Mount your Google Drive to access files stored there persistently. This is necessary because files uploaded directly to Colab's temporary storage (`/content/sample_data/`) are deleted when the runtime ends.

In [None]:
from google.colab import drive

# Mount Google Drive at /content/drive
# This will prompt you to authorize Colab to access your Google Drive files.
drive.mount('/content/drive')

# Note: Ensure your Google Drive path does not contain spaces if mounting to a custom directory.
# Mounting to the default '/content/drive' is generally recommended.

Mounted at /content/drive


### Extract Dataset from Zip File

Extract the contents of the dataset zip file stored in your Google Drive to a designated folder within your Drive. This ensures the unzipped data is also persistent.

In [None]:
import zipfile
import os

# Define the path to your zip file in Google Drive
# Ensure this path correctly points to where you uploaded archive2.zip in your Drive.
zip_file_path = '/content/drive/My Drive/capstone_helmet_detection/archive2.zip'

# Define the directory within Google Drive where the contents will be extracted
# This folder will be created if it doesn't exist.
extraction_path = '/content/drive/My Drive/capstone_helmet_detection/unzipped_archive'

# Create the extraction directory if it doesn't exist
os.makedirs(extraction_path, exist_ok=True)

# Extract the zip file
try:
    with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
        # Extract all contents to the specified extraction_path
        zip_ref.extractall(extraction_path)
    print(f"Successfully extracted {zip_file_path} to {extraction_path}")
except FileNotFoundError:
    print(f"Error: Zip file not found at {zip_file_path}")
    print("Please ensure the zip file is in the correct location in your Google Drive.")
except zipfile.BadZipFile:
    print(f"Error: The file at {zip_file_path} is not a valid zip file.")
except Exception as e:
    print(f"An error occurred during extraction: {e}")

# Define the base directory for the extracted data based on the internal structure of the zip.
# If the zip extracts into a different subfolder than 'archive' within unzipped_archive, adjust this path.
# Based on previous successful runs, the data sits under unzipped_archive/archive/train and unzipped_archive/archive/valid
extracted_data_base_dir = os.path.join(extraction_path, 'archive')

Successfully extracted /content/drive/My Drive/capstone_helmet_detection/archive2.zip to /content/drive/My Drive/capstone_helmet_detection/unzipped_archive
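
Because `extracted_data_base_dir` depends on how the archive is laid out internally, it can help to list the archive's top-level entries before hard-coding the `'archive'` subfolder. The following is a minimal sketch using only the standard library; the helper name `top_level_entries` is illustrative and not part of the original notebook.

```python
import zipfile


def top_level_entries(zip_path):
    """Return the sorted top-level names (files or folders) inside a zip archive."""
    with zipfile.ZipFile(zip_path, "r") as zf:
        # Member names look like 'archive/train/images/x.jpg'; keep the first path component.
        return sorted({name.split("/", 1)[0] for name in zf.namelist() if name})


# Example usage with the path defined above (uncomment in the notebook):
# print(top_level_entries(zip_file_path))
```

If the result is a single folder name, that name is the one to join onto `extraction_path`.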


## 2. Data Exploration and Analysis

This section explores the dataset's characteristics, including file counts, image sizes, class distribution, and bounding box quality.

### Analyze Class Distribution

Analyze the distribution of object classes within the dataset's annotations. This helps identify class imbalance, which might require specific handling during model training.

In [None]:
import os

# Directory containing the YOLO-format annotation files for the training split
annotations_dir = os.path.join(extracted_data_base_dir, 'train', 'labels')

# Dictionary to store class counts
class_counts = {}

# Iterate through each annotation file
if os.path.exists(annotations_dir):
    for filename in os.listdir(annotations_dir):
        # Process only .txt files (assuming YOLO format annotations)
        if filename.lower().endswith('.txt'):
            annotation_path = os.path.join(annotations_dir, filename)
            try:
                # Read each line in the annotation file
                with open(annotation_path, 'r') as f:
                    for line in f:
                        try:
                            # The first value in each line is the class_id (YOLO format)
                            class_id = int(line.split()[0])
                            # Increment the count for this class_id
                            class_counts[class_id] = class_counts.get(class_id, 0) + 1
                        except (ValueError, IndexError):
                            # Handle lines that might not be in the expected format
                            print(f"Skipping malformed line in {filename}: {line.strip()}")
            except Exception as e:
                print(f"Could not read annotation file {filename}: {e}")
else:
    print(f"Error: Annotations directory not found at {annotations_dir}")


# Print the class distribution
print("Class distribution across all annotations:")
# Sort by class ID for consistent output
for class_id in sorted(class_counts.keys()):
    print(f"Class {class_id}: {class_counts[class_id]}")

# Store class counts for later use if needed
# class_counts_dict = class_counts

Class distribution across all annotations:
Class 0: 1327
Class 1: 791
Class 2: 1787
Class 3: 293
Class 4: 1750
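
To quantify the imbalance visible in these counts, each class's share of all annotations and the ratio between the most and least frequent class can be computed directly. The sketch below hard-codes the counts printed above; in the notebook, the `class_counts` dictionary could be passed instead. The helper name `summarize_imbalance` is an illustrative assumption.

```python
def summarize_imbalance(counts):
    """Return (shares, ratio): per-class fraction of all boxes and the max/min count ratio."""
    total = sum(counts.values())
    shares = {cid: n / total for cid, n in counts.items()}
    ratio = max(counts.values()) / min(counts.values())
    return shares, ratio


# Counts copied from the output above.
counts = {0: 1327, 1: 791, 2: 1787, 3: 293, 4: 1750}
shares, ratio = summarize_imbalance(counts)
for cid in sorted(shares):
    print(f"Class {cid}: {shares[cid]:.1%} of annotations")
print(f"Most/least frequent ratio: {ratio:.1f}")
```

With these counts, class 3 accounts for roughly 5% of the boxes and the most frequent class is about six times as common, which is the kind of gap that oversampling or targeted augmentation is meant to address.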


### Define Class Names

Based on manual inspection or dataset documentation, define the mapping of class IDs to human-readable names. This is crucial for interpreting results and configuring YOLOv8.

In [None]:
# Define the class names based on the mapping you provided:
# 0 → Vehicle number
# 1 → Rider without helmet
# 2 → Rider with valid helmet
# 3 → Rider with invalid helmet
# 4 → Motorbike with a person

class_names = ['Vehicle number', 'Rider without helmet', 'Rider with valid helmet', 'Rider with invalid helmet', 'Motorbike with a person']
num_classes = len(class_names) # Number of classes

print("Defined class names:")
for i, name in enumerate(class_names):
    print(f"Class {i}: {name}")

# Store class names and number of classes for later use
# class_names_list = class_names
# num_classes_int = num_classes

Defined class names:
Class 0: Vehicle number
Class 1: Rider without helmet
Class 2: Rider with valid helmet
Class 3: Rider with invalid helmet
Class 4: Motorbike with a person
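
The same names have to appear in the dataset's `data.yaml` so that YOLOv8 reports them instead of numeric IDs. The sketch below renders such a file as text with the standard library only. The `train/images` and `valid/images` subfolders are assumptions about the archive layout, and `build_data_yaml` is a hypothetical helper; both should be adjusted to match the actual dataset.

```python
def build_data_yaml(dataset_root, class_names,
                    train_subdir="train/images", val_subdir="valid/images"):
    """Render a minimal Ultralytics-style data.yaml as a string."""
    lines = [
        f"path: {dataset_root}",
        f"train: {train_subdir}",
        f"val: {val_subdir}",
        f"nc: {len(class_names)}",
        "names:",
    ]
    # One 'id: name' entry per class, in class-ID order.
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


class_names = ['Vehicle number', 'Rider without helmet', 'Rider with valid helmet',
               'Rider with invalid helmet', 'Motorbike with a person']
# '/path/to/archive' is a placeholder for the extracted dataset root.
print(build_data_yaml("/path/to/archive", class_names))
```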


---

## 3. Data Augmentation

This section focuses on applying data augmentation techniques to the dataset to increase its size and variability, which can help improve model performance, especially for detecting small objects and addressing class imbalance.

### Install Augmentation Library

Install a suitable data augmentation library for object detection (e.g., Albumentations) and its dependencies.

In [None]:
# Install Albumentations and opencv-python-headless
# opencv-python-headless is used by Albumentations for image processing
%pip install albumentations opencv-python-headless

# Note: You might need to restart the runtime after installation if prompted.



---


Defined data.yaml path: /content/drive/MyDrive/capstone_helmet_detection/unzipped_archive/archive/data.yaml


In [None]:
from ultralytics import YOLO

# Load a YOLOv8 model. Smaller variants such as 'yolov8n.pt' (nano) or 'yolov8s.pt' (small)
# train faster and use fewer resources than 'yolov8m.pt' (medium) or larger models.
# The nano variant is loaded here for the quickest iteration.
model = YOLO('yolov8n.pt') # Using the nano version of YOLOv8

print(f"YOLO model loaded: {model.model.yaml['nc']} classes, {model.model.yaml['ch']} channels")

# Assuming data_yaml_path_final is defined from a previous cell pointing to your data.yaml
# You can optionally specify project and name for the training run to organize results
# Note: the run name below says 'yolov8s', but the model loaded above is the nano variant ('yolov8n.pt')
results = model.train(data=data_yaml_path_final, epochs=10, imgsz=416, batch=16, project='helmet_detection_training', name='yolov8s_augmented_data') # Reduced epochs for faster testing

YOLO model loaded: 80 classes, 3 channels
Ultralytics 8.3.203 🚀 Python-3.12.11 torch-2.8.0+cu126 CUDA:0 (Tesla T4, 15095MiB)
engine/trainer: agnostic_nms=False, amp=True, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, compile=False, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=/content/drive/MyDrive/capstone_helmet_detection/unzipped_archive/archive/data.yaml, degrees=0.0, deterministic=True, device=None, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=10, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=416, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8n.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=yolov8s_augmented_data4, nbs

In [None]:
# Install Ultralytics. Run this cell before any cell that imports `ultralytics`.
%pip install ultralytics



# Task -- 06/10 - 10 AM
Train the YOLOv8 model without augmentation using the configuration file "/content/drive/MyDrive/capstone_helmet_detection/unzipped_archive/archive/data.yaml" and evaluate its performance. Then, train the model with default augmentation using the same configuration and evaluate its performance. Finally, collect and compare the results from both training runs.

## Train with default YOLOv8 augmentation

### Subtask:
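Train the YOLOv8 model with the specified parameters and `augment=True`. Note that the trainer log of the earlier run, which used `augment=False`, already lists `mosaic=1.0`, `fliplr=0.5`, and non-zero `hsv_h`, `hsv_s`, and `hsv_v` values, so the default training augmentations appear to be governed by those hyperparameters rather than by the `augment` flag.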
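
For the no-augmentation baseline described in the task, one option is to set the individual augmentation hyperparameters to zero rather than rely on the `augment` flag. The sketch below only builds the keyword arguments. The keys are Ultralytics training hyperparameters (several of them appear in the trainer log above), while the name `NO_AUG_OVERRIDES` and the run name in the commented call are illustrative. Whether this set is exhaustive for a given Ultralytics release is an assumption worth checking against its documentation.

```python
# Training hyperparameters that control augmentation, all switched off.
NO_AUG_OVERRIDES = {
    "hsv_h": 0.0, "hsv_s": 0.0, "hsv_v": 0.0,        # colour jitter
    "degrees": 0.0, "translate": 0.0, "scale": 0.0,  # geometric jitter
    "shear": 0.0, "perspective": 0.0,
    "flipud": 0.0, "fliplr": 0.0,                    # flips
    "mosaic": 0.0, "mixup": 0.0, "copy_paste": 0.0,  # multi-image mixing
}

# Hypothetical usage in the notebook (needs a loaded model and data_yaml_path_final):
# results_no_aug = model.train(
#     data=data_yaml_path_final, epochs=10, imgsz=640, batch=16,
#     project='helmet_detection_training', name='yolov8n_no_augmentation',
#     **NO_AUG_OVERRIDES,
# )
print(sorted(NO_AUG_OVERRIDES))
```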


In [None]:
# Assuming data_yaml_path_final is defined from a previous cell pointing to your data.yaml
# Assuming class_names is defined from a previous cell
# The YOLO model is already loaded in the previous step

# Train the model with specified parameters and default augmentation enabled
results_with_aug = model.train(
    data=data_yaml_path_final,  # Path to your data.yaml configuration file
    epochs=10,                  # Number of training epochs
    imgsz=640,                  # Image size for training
    batch=16,                   # Batch size
    augment=True,               # Set as requested; mosaic, flips, and HSV jitter have their own hyperparameters
    project='helmet_detection_training', # Project name to group runs
    name='yolov8n_with_augmentation'      # Name for this specific run
)

print("Training with augmentation completed.")

Ultralytics 8.3.203 🚀 Python-3.12.11 torch-2.8.0+cu126 CUDA:0 (Tesla T4, 15095MiB)
engine/trainer: agnostic_nms=False, amp=True, augment=True, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, compile=False, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=/content/drive/MyDrive/capstone_helmet_detection/unzipped_archive/archive/data.yaml, degrees=0.0, deterministic=True, device=None, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=10, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=416, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8n.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=yolov8n_with_augmentation, nbs=64, nms=False, opset=None, optimize=Fals

### Evaluate model with default augmentation

**Subtask:** Evaluate the performance of the model trained with default augmentations using `model.val()` to get detailed metrics, including per-class results.

In [None]:
from ultralytics import YOLO
import os

# Define the path to the best weights of the model trained with default augmentation
# This path is based on the 'project' and 'name' parameters used during training in cell 5a24a196
# Ensure this path matches where your weights were saved
weights_path_with_aug = '/content/helmet_detection_training/yolov8n_with_augmentation/weights/best.pt'

# Check if the weights file exists
if os.path.exists(weights_path_with_aug):
    # Load the trained model
    model_with_aug = YOLO(weights_path_with_aug)
    print(f"Model trained with default augmentation loaded successfully from {weights_path_with_aug}")
else:
    print(f"Error: Trained weights not found at {weights_path_with_aug}")
    print("Please verify the path to the saved weights.")
    model_with_aug = None # Set to None if loading fails

Model trained with default augmentation loaded successfully from /content/helmet_detection_training/yolov8n_with_augmentation/weights/best.pt
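
The cell above only loads the saved weights. The evaluation itself, as described in the subtask, could be sketched as follows. The helper `per_class_map_table` is illustrative, and it assumes that the object returned by `model.val()` exposes `box.map50`, `box.map` (mAP50-95), and `box.maps` (one mAP50-95 value per class ID); these attribute names should be verified against the installed Ultralytics version.

```python
def per_class_map_table(metrics, class_names):
    """Format overall and per-class mAP from an Ultralytics-style validation result.

    Assumes `metrics.box` exposes `map50`, `map` (mAP50-95) and `maps`
    (one mAP50-95 value per class ID).
    """
    lines = [f"all: mAP50={metrics.box.map50:.3f}  mAP50-95={metrics.box.map:.3f}"]
    for class_id, class_map in enumerate(metrics.box.maps):
        lines.append(f"{class_names[class_id]}: mAP50-95={float(class_map):.3f}")
    return "\n".join(lines)


# Hypothetical usage in the notebook, guarded because loading may have failed:
# if model_with_aug is not None:
#     metrics = model_with_aug.val(data=data_yaml_path_final, imgsz=640, batch=16)
#     print(per_class_map_table(metrics, class_names))
```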


### Classifying images with the trained model

In [None]:
# TODO - modify the code according to the output executed earlier.


# from ultralytics import YOLO
# import cv2

# model = YOLO('runs/detect/helmet_exp/weights/best.pt')

# img_path = 'test.jpg'
# results = model.predict(img_path, imgsz=640, conf=0.25, iou=0.45)

# # results is a list with one entry per image; results[0].boxes holds xyxy, conf, and cls
# res = results[0]
# # Read the image once, before the loop, so that every box is drawn on the same copy
# img = cv2.imread(img_path)
# for box in res.boxes:
#     x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
#     cls = int(box.cls[0])
#     conf = float(box.conf[0])
#     label = model.names[cls]
#     # Draw the box and its label
#     cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
#     cv2.putText(img, f"{label} {conf:.2f}", (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
# cv2.imwrite('out.jpg', img)


## Choosing Augmentations Based on Data Variations

When selecting data augmentation techniques for this helmet violation detection project, it's important to consider the variations expected in real-world Indian CCTV traffic footage. Augmentations can help the model become more robust to these variations.

Here are some common variations and relevant Albumentations transforms to simulate them:

1.  **Varying Lighting Conditions:**
    *   **What you might see:** Bright sunlight, shadows, dusk/dawn, headlights at night, glare.
    *   **Relevant Augmentations:**
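        *   `A.RandomBrightnessContrast`: Randomly adjusts overall brightness and contrast.
        *   `A.CLAHE`: Applies contrast-limited adaptive histogram equalization, which boosts local contrast in dark or washed-out regions.
        *   `A.RandomGamma`: Applies gamma correction, brightening or darkening mid-tones non-linearly.
        *   `A.HueSaturationValue`: Randomly shifts hue, saturation, and value (brightness).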

2.  **Different Angles and Perspectives:**
    *   **What you might see:** Riders and vehicles viewed from different angles (side, front, back, slightly elevated or lowered camera).
    *   **Relevant Augmentations:**
        *   `A.ShiftScaleRotate`: Applies random shifts, scaling, and rotations.
        *   `A.Perspective`: Applies a random perspective transformation.

3.  **Blurriness:**
    *   **What you might see:** Motion blur, focus issues, camera shake.
    *   **Relevant Augmentations:**
        *   `A.Blur`, `A.MedianBlur`, `A.MotionBlur`.

4.  **Occlusions and Partial Visibility:**
    *   **What you might see:** Parts of the rider or vehicle being blocked.
    *   **Relevant Augmentations:**
        *   `A.CoarseDropout`: Randomly drops rectangular regions.

5.  **Noise:**
    *   **What you might see:** Electrical noise, compression artifacts.
    *   **Relevant Augmentations:**
        *   `A.GaussNoise`.

By choosing augmentations that mimic these real-world conditions, we can help the model generalize better to unseen data. You can experiment with different combinations and parameters of these augmentations to find what works best for your specific dataset and target performance.

### Train with custom augmentation (Focusing on Blur)

**Subtask:** Train the YOLOv8 model using the custom augmentation pipeline that focuses on blur transformations.

In [None]:
from ultralytics import YOLO

# Load a fresh YOLOv8 model for this training run
# Using 'yolov8n.pt' as before
model_with_custom_aug_blur = YOLO('yolov8n.pt')

# Assuming data_yaml_path_final is defined from a previous cell pointing to your data.yaml
# Assuming custom_augmentation_pipeline_blur is defined from cell 88a7490b

print("Starting training with custom augmentation focusing on blur...")

# Train the model with the specified parameters.
# Note: custom_augmentation_pipeline_blur is not passed to train() here, so this call does not use it.
# When Albumentations is installed, Ultralytics applies its own built-in Albumentations transforms instead.
results_train_custom_aug_blur = model_with_custom_aug_blur.train(
    data=data_yaml_path_final,        # Path to your data.yaml configuration file
    epochs=10,                        # Number of training epochs (can adjust as needed)
    imgsz=640,                        # Image size for training
    batch=16,                         # Batch size
    augment=True,                     # Does not attach the custom pipeline defined in this notebook
    project='helmet_detection_training', # Project name to group runs
    name='yolov8n_custom_augmentation_blur' # Name for this specific run
)

print("Training with custom augmentation focusing on blur completed.")

Starting training with custom augmentation focusing on blur...
Ultralytics 8.3.203 🚀 Python-3.12.11 torch-2.8.0+cu126 CUDA:0 (Tesla T4, 15095MiB)
engine/trainer: agnostic_nms=False, amp=True, augment=True, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, compile=False, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=/content/drive/MyDrive/capstone_helmet_detection/unzipped_archive/archive/data.yaml, degrees=0.0, deterministic=True, device=None, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=10, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=416, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8n.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=yolov8n_

In [None]:
# Assuming the model trained with custom augmentation focusing on blur is available as 'model_with_custom_aug_blur'
# and the data.yaml path is available as 'data_yaml_path_final'

print("Evaluating model trained with custom augmentation focusing on blur...")

# Run validation on the trained model
results_val_custom_aug_blur = model_with_custom_aug_blur.val(
    data=data_yaml_path_final,  # Path to your data.yaml configuration file
    imgsz=640,                  # Image size for validation (should match training size)
    batch=16,                   # Batch size for validation
)

print("Evaluation with custom augmentation focusing on blur completed.")

# Store the validation results for comparison
# results_val_custom_aug_blur_metrics = results_val_custom_aug_blur.results_dict # This gives overall metrics
# To get per-class metrics, we might need to access other attributes or parse the output

Evaluating model trained with custom augmentation focusing on blur...
Ultralytics 8.3.203 🚀 Python-3.12.11 torch-2.8.0+cu126 CUDA:0 (Tesla T4, 15095MiB)
Model summary (fused): 72 layers, 3,006,623 parameters, 0 gradients, 8.1 GFLOPs
val: Fast image access ✅ (ping: 0.5±0.2 ms, read: 40.7±10.2 MB/s, size: 68.3 KB)
val: Scanning /content/drive/MyDrive/capstone_helmet_detection/unzipped_archive/archive/valid/labels.cache... 142 images, 0 backgrounds, 0 corrupt: 100% ━━━━━━━━━━━━ 142/142 189.2Kit/s 0.0s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 9/9 2.7it/s 3.4s
                   all        142       1064      0.754      0.492      0.552      0.252
                     0        117        258      0.625      0.492      0.521      0.186
                     1         59        179      0.712      0.427      0.561      0.218
                     2        114        274      0.658      0.664      

### Define custom augmentation pipeline

**Subtask:** Create a custom data augmentation pipeline using Albumentations with specific hyperparameters.

In [None]:
import albumentations as A

# Define a custom augmentation pipeline with fewer transforms
# You can add, remove, or modify transforms and their parameters based on your experimentation
custom_augmentation_pipeline = A.Compose([
    A.RandomBrightnessContrast(p=0.3),  # Adjust brightness and contrast
    A.ShiftScaleRotate(shift_limit=0.06, scale_limit=0.06, rotate_limit=20, p=0.3), # Apply affine transformations
    A.HorizontalFlip(p=0.5), # Flip horizontally
    A.Blur(blur_limit=5, p=0.1), # Apply a bit of blur
], bbox_params=A.BboxParams(format='yolo', label_fields=['class_labels'])) # Important: Specify bbox_params for object detection

# label_fields=['class_labels'] means that each call to the pipeline must receive a
# `class_labels` list with exactly one entry per bounding box, alongside `image` and `bboxes`.
# The list below is only a placeholder and is not used by the pipeline itself;
# the real labels come from the class IDs in each image's annotation file.
dummy_class_labels = [0] * len(class_names) # Placeholder list (all zeros); assumes class_names is defined


print("Custom augmentation pipeline refined with fewer transforms.")
print("Current pipeline transforms:")
for transform in custom_augmentation_pipeline.transforms:
    print(f"- {transform.__class__.__name__} with parameters: {transform.get_params()}")

Custom augmentation pipeline refined with fewer transforms.
Current pipeline transforms:
- RandomBrightnessContrast with parameters: {}
- ShiftScaleRotate with parameters: {}
- HorizontalFlip with parameters: {}
- Blur with parameters: {'kernel': 5}
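
The `bbox_params` argument matters because geometric transforms must move the boxes together with the pixels, and each box must keep its class label. As a library-free illustration of what a horizontal flip does to YOLO-format boxes (normalized `x_center, y_center, width, height`), consider the sketch below. The function `hflip_yolo_boxes` is written for this explanation only and is not the Albumentations implementation.

```python
def hflip_yolo_boxes(boxes, class_labels):
    """Mirror normalized YOLO boxes left-to-right, keeping labels aligned."""
    if len(boxes) != len(class_labels):
        raise ValueError("need exactly one class label per bounding box")
    # Only the x-centre changes: a box centred at x moves to 1 - x.
    flipped = [(1.0 - xc, yc, w, h) for xc, yc, w, h in boxes]
    return flipped, list(class_labels)


boxes = [(0.25, 0.5, 0.1, 0.2), (0.75, 0.4, 0.2, 0.3)]
labels = [1, 4]  # class IDs, one per box
print(hflip_yolo_boxes(boxes, labels))
```

Width, height, and the vertical centre are unchanged, and the label list keeps the same order as the boxes.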




### Train with custom augmentation

**Subtask:** Train the YOLOv8 model using the defined custom augmentation pipeline.

### Custom Augmentation Pipeline: Focusing on Blur

Based on the observation that the dataset might contain a significant number of blurry images, this custom augmentation pipeline focuses on applying various blur transformations to improve the model's robustness to blur.

For this experiment, the following blur-related augmentations are tuned:

*   **`A.Blur`** (`blur_limit=7`, `p=0.5`): box blur with a kernel size of up to 7 pixels, applied to about half of the samples to simulate more significant blur.
*   **`A.MedianBlur`** (`blur_limit=7`, `p=0.5`): median filtering with a kernel size of up to 7 pixels.
*   **`A.MotionBlur`** (`blur_limit=(3, 7)`, `p=0.5`): directional blur with a kernel size between 3 and 7 pixels, resembling moving vehicles or camera shake.

I will be comparing the performance of the model trained with this blur-focused pipeline against the previous runs to see if specifically addressing blur improves detection accuracy, especially for objects that might be easily affected by blur.

In [None]:
import albumentations as A

# Define a custom augmentation pipeline focusing on blur
# Adjust parameters and add/remove transforms based on your experimentation
custom_augmentation_pipeline_blur = A.Compose([
    A.Blur(blur_limit=7, p=0.5),  # Apply Blur with higher probability and intensity
    A.MedianBlur(blur_limit=7, p=0.5), # Apply MedianBlur
    A.MotionBlur(blur_limit=(3, 7), p=0.5), # Apply MotionBlur
    A.RandomBrightnessContrast(p=0.2), # Keep some basic transforms
    A.HorizontalFlip(p=0.5),
], bbox_params=A.BboxParams(format='yolo', label_fields=['class_labels'])) # Important: Specify bbox_params for object detection

# Placeholder list only: BboxParams expects one `class_labels` entry per bounding box at call time
dummy_class_labels_blur = [0] * len(class_names) # Assuming class_names is defined

print("Custom augmentation pipeline focusing on blur defined.")
print("Current blur-focused pipeline transforms:")
for transform in custom_augmentation_pipeline_blur.transforms:
    print(f"- {transform.__class__.__name__} with parameters: {transform.get_params()}")

Custom augmentation pipeline focusing on blur defined.
Current blur-focused pipeline transforms:
- Blur with parameters: {'kernel': 7}
- MedianBlur with parameters: {'kernel': 3}
- MotionBlur with parameters: {'kernel': array([[          0,           0,           0,           0,           0,           0,           0],
       [          0,           0,           0,           0,           0,         0.2,         0.2],
       [          0,           0,         0.2,         0.2,         0.2,           0,           0],
       [          0,           0,           0,           0,           0,           0,           0],
       [          0,           0,           0,           0,           0,           0,           0],
       [          0,           0,           0,           0,           0,           0,           0],
       [          0,           0,           0,           0,           0,           0,           0]], dtype=float32)}
- RandomBrightnessContrast with parameters: {}
- HorizontalFlip
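
The printed `MotionBlur` kernel contains five entries of 0.2 along a line, so its weights sum to 1 and overall brightness is preserved while detail is smeared along that direction. A minimal pure-Python sketch of the same idea, restricted to a horizontal line for simplicity (the kernel printed above follows a slanted line), is shown below. `horizontal_motion_kernel` and `blur_row` are illustrative helpers, not library functions.

```python
def horizontal_motion_kernel(size):
    """Return a size x size kernel whose middle row averages `size` neighbouring pixels."""
    if size < 3 or size % 2 == 0:
        raise ValueError("size must be an odd integer >= 3")
    kernel = [[0.0] * size for _ in range(size)]
    kernel[size // 2] = [1.0 / size] * size
    return kernel


def blur_row(row, size):
    """Apply the kernel's middle row to a 1-D signal, repeating edge pixels at the borders."""
    half = size // 2
    padded = [row[0]] * half + list(row) + [row[-1]] * half
    return [sum(padded[i:i + size]) / size for i in range(len(row))]


# A single bright pixel gets smeared across `size` neighbouring positions.
print(blur_row([0, 0, 0, 10, 0, 0, 0], 5))
```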

In [None]:
from ultralytics import YOLO

# Load a fresh YOLOv8 model for this training run
# Using 'yolov8n.pt' as before
model_with_custom_aug = YOLO('yolov8n.pt')

# Assuming data_yaml_path_final is defined from a previous cell pointing to your data.yaml
# Assuming custom_augmentation_pipeline is defined from cell 92f2c85f

print("Starting training with custom augmentation...")

# Train the model with the specified parameters.
# Note: custom_augmentation_pipeline is not passed to train() here, so this call does not use it.
# When Albumentations is installed, Ultralytics applies its own built-in Albumentations transforms instead.
results_train_custom_aug = model_with_custom_aug.train(
    data=data_yaml_path_final,        # Path to your data.yaml configuration file
    epochs=10,                        # Number of training epochs (can adjust as needed)
    imgsz=416,                        # Image size for training
    batch=16,                         # Batch size
    augment=True,                     # Does not attach the custom pipeline defined in this notebook
    project='helmet_detection_training', # Project name to group runs
    name='yolov8n_custom_augmentation'    # Name for this specific run
)

print("Training with custom augmentation completed.")

Starting training with custom augmentation...
Ultralytics 8.3.203 🚀 Python-3.12.11 torch-2.8.0+cu126 CUDA:0 (Tesla T4, 15095MiB)
engine/trainer: agnostic_nms=False, amp=True, augment=True, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, compile=False, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=/content/drive/MyDrive/capstone_helmet_detection/unzipped_archive/archive/data.yaml, degrees=0.0, deterministic=True, device=None, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=10, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=416, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8n.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=yolov8n_custom_augmentati

### Evaluate model with custom augmentation

**Subtask:** Run `model.val()` on the model trained with custom augmentations to get detailed metrics.

In [None]:
# Assuming the model trained with custom augmentation is available as 'model_with_custom_aug'
# and the data.yaml path is available as 'data_yaml_path_final'

print("Evaluating model trained with custom augmentation...")

# Run validation on the trained model
results_val_custom_aug = model_with_custom_aug.val(
    data=data_yaml_path_final,  # Path to your data.yaml configuration file
    imgsz=416,                  # Image size for validation (should match training size)
    batch=16,                   # Batch size for validation
)

print("Evaluation with custom augmentation completed.")

# Store the validation results for comparison
# results_val_custom_aug_metrics = results_val_custom_aug.results_dict # This gives overall metrics
# To get per-class metrics, we might need to access other attributes or parse the output

Evaluating model trained with custom augmentation...
Ultralytics 8.3.203 🚀 Python-3.12.11 torch-2.8.0+cu126 CUDA:0 (Tesla T4, 15095MiB)
Model summary (fused): 72 layers, 3,006,623 parameters, 0 gradients, 8.1 GFLOPs
val: Fast image access ✅ (ping: 0.7±0.4 ms, read: 30.1±12.5 MB/s, size: 68.3 KB)
val: Scanning /content/drive/MyDrive/capstone_helmet_detection/unzipped_archive/archive/valid/labels.cache... 142 images, 0 backgrounds, 0 corrupt: 100% ━━━━━━━━━━━━ 142/142 184.6Kit/s 0.0s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 9/9 2.6it/s 3.5s
                   all        142       1064      0.754      0.492      0.552      0.252
                     0        117        258      0.625      0.492      0.521      0.186
                     1         59        179      0.712      0.427      0.561      0.218
                     2        114        274      0.658      0.664      0.687      0.311
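
The `all` row above is identical, digit for digit, to the `all` row reported for the blur-focused run, which suggests that the two training runs did not actually differ in their augmentation. A small check like the one below makes such coincidences explicit when comparing runs. The values are copied from the two validation outputs in this notebook; the helper `identical_metrics` is illustrative.

```python
def identical_metrics(run_a, run_b, tol=1e-9):
    """Return True when the runs share metrics and every shared one differs by at most `tol`."""
    shared = run_a.keys() & run_b.keys()
    return bool(shared) and all(abs(run_a[k] - run_b[k]) <= tol for k in shared)


# 'all' rows copied from the two validation outputs.
custom_aug = {"precision": 0.754, "recall": 0.492, "mAP50": 0.552, "mAP50-95": 0.252}
custom_aug_blur = {"precision": 0.754, "recall": 0.492, "mAP50": 0.552, "mAP50-95": 0.252}

if identical_metrics(custom_aug, custom_aug_blur):
    print("Runs report identical metrics; check that the augmentation settings really differed.")
```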


### Experimenting with CutMix Augmentation

Adding CutMix augmentation can help the model learn to combine information from different images and potentially improve robustness to partial visibility and object localization.
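
For detection, CutMix has to decide which boxes survive once a rectangular patch from a second image is pasted over the first. The following library-free sketch shows one simple rule set under stated assumptions: boxes are pixel `(x1, y1, x2, y2, class_id)` tuples with positive area, both images have the same size, a box from the base image is kept if at least `min_visible` of its area remains uncovered, and a box from the donor image is clipped to the patch and kept if at least `min_visible` of its area falls inside it. The function names and the threshold are illustrative and do not reproduce any particular library's implementation.

```python
def _area(x1, y1, x2, y2):
    """Area of a box; empty or inverted boxes count as zero."""
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def _intersection(box, patch):
    """Overlap rectangle of two (x1, y1, x2, y2) boxes (may be empty)."""
    return (max(box[0], patch[0]), max(box[1], patch[1]),
            min(box[2], patch[2]), min(box[3], patch[3]))


def cutmix_boxes(base_boxes, donor_boxes, patch, min_visible=0.5):
    """Merge labels after pasting the `patch` region of the donor image onto the base image."""
    merged = []
    for *coords, cls in base_boxes:
        covered = _area(*_intersection(coords, patch))
        if 1.0 - covered / _area(*coords) >= min_visible:
            merged.append((*coords, cls))    # still mostly visible: keep unchanged
    for *coords, cls in donor_boxes:
        clipped = _intersection(coords, patch)
        if _area(*clipped) / _area(*coords) >= min_visible:
            merged.append((*clipped, cls))   # keep only the part that was pasted
    return merged


base_boxes = [(10, 10, 50, 50, 2), (60, 60, 100, 100, 1)]
donor_boxes = [(50, 50, 110, 110, 4), (0, 0, 20, 20, 0)]
patch = (55, 55, 120, 120)
print(cutmix_boxes(base_boxes, donor_boxes, patch))
```

In this example the second base box lies entirely under the patch and is dropped, while the first donor box is clipped to the patch boundary.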

### Train with Custom Augmentation (Attempting CutMix)

**Subtask:** Train the YOLOv8 model using the custom augmentation pipeline that includes an attempt at CutMix.

In [None]:
import albumentations as A

# CutMix and related mixing augmentations (Mosaic, MixUp) combine several images, so they need
# access to more than one sample and must merge the bounding boxes and labels of all of them.
# For that reason they are usually applied at the data-loader level rather than inside a
# per-image A.Compose pipeline.
# The trainer arguments logged earlier in this notebook include `mosaic`, `mixup`, and `cutmix`,
# which suggests that this Ultralytics version exposes these augmentations as training hyperparameters.

# Placeholder label list (BboxParams expects one label per box at call time); assumes class_names is defined
dummy_class_labels = [0] * len(class_names)

# A more practical approach for Albumentations Compose for detection is to use transforms that modify individual images and their bboxes:
custom_augmentation_pipeline_with_cutmix_concept = A.Compose([
    A.RandomBrightnessContrast(p=0.2),
    A.ShiftScaleRotate(shift_limit=0.05, scale_limit=0.05, rotate_limit=15, p=0.2),
    A.HorizontalFlip(p=0.5),
    A.Blur(blur_limit=7, p=0.5),  # Apply Blur with higher probability and intensity
    A.MedianBlur(blur_limit=7, p=0.5), # Apply MedianBlur
    A.MotionBlur(blur_limit=(3, 7), p=0.5), # Apply MotionBlur

    # Optional occlusion-style transform (drops rectangular regions; not the same as CutMix):
    # A.CoarseDropout(max_holes=8, max_height=64, max_width=64, min_height=1, min_width=1, fill_value=0, p=0.2),

    # True CutMix/MixUp for detection is not included here; it usually has to be integrated
    # at the data-loader level so that the boxes from both images are merged correctly.
], bbox_params=A.BboxParams(format='yolo', label_fields=['class_labels']))


print("Example custom augmentation pipeline (without CutMix, due to complexity with bboxes in simple Compose) defined.")
print("To implement CutMix/Mixup effectively for object detection with YOLOv8, it usually requires modifications at the data loader level.")
print("Current pipeline transforms:")
for transform in custom_augmentation_pipeline_with_cutmix_concept.transforms:
    print(f"- {transform.__class__.__name__} with parameters: {transform.get_params()}")

# Note: this pipeline is only defined here; the training cell that follows never passes it to model.train().
# As seen before, the 'transforms' argument might not be supported by this YOLO version's train method.
# With Albumentations installed, Ultralytics applies its own built-in Albumentations transforms during training.
# For CutMix/Mixup in detection, check which hyperparameters the trainer exposes, or modify the dataset/dataloader.

Example custom augmentation pipeline (without CutMix, due to complexity with bboxes in simple Compose) defined.
To implement CutMix/Mixup effectively for object detection with YOLOv8, it usually requires modifications at the data loader level.
Current pipeline transforms:
- RandomBrightnessContrast with parameters: {}
- ShiftScaleRotate with parameters: {}
- HorizontalFlip with parameters: {}
- Blur with parameters: {'kernel': 7}
- MedianBlur with parameters: {'kernel': 3}
- MotionBlur with parameters: {'kernel': array([[          0,           0,           0,           0,           0,           0,           0],
       [          0,           0,           0,           0,           0,           0,           0],
       [          0,           0,         0.2,         0.2,         0.2,         0.2,         0.2],
       [          0,           0,           0,           0,           0,           0,           0],
       [          0,           0,           0,           0,           0,         
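
The comments in the cell above note that CutMix needs special handling of bounding boxes and labels. The sketch below illustrates only that label bookkeeping, in plain Python, and is not how Ultralytics or Albumentations implement it. It assumes both images have the same size, that the patch is copied from image B onto the same coordinates in image A, and that a box is kept only if at least `min_visible` of its area survives. The function names and the 0.5 threshold are hypothetical choices made for this illustration.

```python
from typing import List, Tuple

# YOLO label: (class_id, x_center, y_center, width, height), all normalized to [0, 1].
Label = Tuple[int, float, float, float, float]
# Rectangle: (x_min, y_min, x_max, y_max), normalized to [0, 1].
Rect = Tuple[float, float, float, float]


def yolo_to_rect(label: Label) -> Rect:
    """Convert a YOLO label to corner coordinates."""
    _, xc, yc, w, h = label
    return (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)


def rect_to_yolo(class_id: int, rect: Rect) -> Label:
    """Convert corner coordinates back to a YOLO label."""
    x0, y0, x1, y1 = rect
    return (class_id, (x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0)


def area(rect: Rect) -> float:
    """Area of a rectangle; zero if the rectangle is empty or inverted."""
    return max(0.0, rect[2] - rect[0]) * max(0.0, rect[3] - rect[1])


def intersect(a: Rect, b: Rect) -> Rect:
    """Overlap of two rectangles (may be empty, in which case area() is 0)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def cutmix_labels(labels_a: List[Label], labels_b: List[Label],
                  patch: Rect, min_visible: float = 0.5) -> List[Label]:
    """Labels for image A after the `patch` region is replaced by the same region of image B."""
    merged: List[Label] = []
    # Boxes from A survive unchanged if the patch hides less than (1 - min_visible) of them.
    for label in labels_a:
        rect = yolo_to_rect(label)
        if area(rect) == 0:
            continue
        hidden = area(intersect(rect, patch)) / area(rect)
        if 1.0 - hidden >= min_visible:
            merged.append(label)
    # Boxes from B are clipped to the patch and kept if enough of them falls inside it.
    for label in labels_b:
        rect = yolo_to_rect(label)
        if area(rect) == 0:
            continue
        clipped = intersect(rect, patch)
        if area(clipped) / area(rect) >= min_visible:
            merged.append(rect_to_yolo(label[0], clipped))
    return merged


# Example: paste the bottom-right quarter of image B onto image A.
example = cutmix_labels(
    labels_a=[(0, 0.25, 0.25, 0.2, 0.2), (1, 0.75, 0.75, 0.2, 0.2)],
    labels_b=[(1, 0.75, 0.75, 0.3, 0.3), (0, 0.2, 0.2, 0.1, 0.1)],
    patch=(0.5, 0.5, 1.0, 1.0),
)
print(example)
```

The pixel copy itself is a single array slice; the label merge is the part a single-image `A.Compose` cannot do, because it never sees the second image's boxes.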

In [None]:
from ultralytics import YOLO

# Load a fresh YOLOv8 model for this training run
# Using 'yolov8n.pt' as before
model_with_cutmix_attempt = YOLO('yolov8n.pt')

# Assumes data_yaml_path_final was defined in an earlier cell and points to your data.yaml.
# Note: custom_augmentation_pipeline_with_cutmix_concept is not passed to train() below,
# so it does not affect this run.

print("Starting training with custom augmentation including attempt at CutMix...")

# Train the model with the specified parameters.
# With Albumentations installed, Ultralytics applies its own built-in Albumentations transforms,
# not the pipeline defined above.
results_train_cutmix_attempt = model_with_cutmix_attempt.train(
    data=data_yaml_path_final,        # Path to your data.yaml configuration file
    epochs=50,                        # Number of training epochs (can adjust as needed)
    imgsz=416,                        # Image size for training
    batch=16,                         # Batch size
    augment=True,                     # Augmentation flag; it does not select the custom pipeline
    project='helmet_detection_training', # Project name to group runs
    name='yolov8n_cutmix_attempt'     # Name for this specific run
)

print("Training with custom augmentation including attempt at CutMix completed.")

Starting training with custom augmentation including attempt at CutMix...
Ultralytics 8.3.203 🚀 Python-3.12.11 torch-2.8.0+cu126 CUDA:0 (Tesla T4, 15095MiB)
[34m[1mengine/trainer: [0magnostic_nms=False, amp=True, augment=True, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, compile=False, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=/content/drive/MyDrive/capstone_helmet_detection/unzipped_archive/archive/data.yaml, degrees=0.0, deterministic=True, device=None, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=50, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=416, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8n.pt, momentum=0.937, mosaic=1.0, multi_scale=False, na

### Evaluate Model with Custom Augmentation (Attempting CutMix)

**Subtask:** Run `model.val()` on the model from the CutMix-attempt training run to get detailed validation metrics.

In [None]:
# Assuming the model trained with the CutMix attempt is available as 'model_with_cutmix_attempt'
# and the data.yaml path is available as 'data_yaml_path_final'

print("Evaluating model trained with custom augmentation (attempting CutMix)...")

# Run validation on the trained model
results_val_cutmix_attempt = model_with_cutmix_attempt.val(
    data=data_yaml_path_final,  # Path to your data.yaml configuration file
    imgsz=416,                  # Image size for validation (should match training size)
    batch=16,                   # Batch size for validation
)

print("Evaluation with custom augmentation (attempting CutMix) completed.")

# Store the overall validation metrics for later comparison
results_val_cutmix_attempt_metrics = results_val_cutmix_attempt.results_dict
# Per-class metrics are not in results_dict; they are exposed through results_val_cutmix_attempt.box (for example, .maps)
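
The closing comments above point out that per-class metrics need more than `results_dict`. One way to collect them is sketched below, assuming the object returned by `model.val()` exposes `names`, `box.ap_class_index`, and `box.class_result(i)` as in recent Ultralytics releases. The helper name `per_class_table` is hypothetical, and the stand-in object with its class names and numbers is placeholder data used only to show the expected shape, not a result from this project.

```python
from types import SimpleNamespace


def per_class_table(metrics):
    """Return one dict per evaluated class from a detection-metrics object.

    Assumes metrics.box.class_result(i) yields (precision, recall, AP50, AP50-95)
    for the i-th entry of metrics.box.ap_class_index.
    """
    rows = []
    for i, class_id in enumerate(metrics.box.ap_class_index):
        precision, recall, ap50, ap = metrics.box.class_result(i)
        rows.append({
            "class": metrics.names[int(class_id)],
            "precision": float(precision),
            "recall": float(recall),
            "mAP50": float(ap50),
            "mAP50-95": float(ap),
        })
    return rows


# Stand-in object with placeholder class names and numbers (not real results).
_placeholder_values = [(0.9, 0.8, 0.85, 0.6), (0.7, 0.6, 0.65, 0.4)]
demo_metrics = SimpleNamespace(
    names={0: "class_a", 1: "class_b"},
    box=SimpleNamespace(
        ap_class_index=[0, 1],
        class_result=lambda i: _placeholder_values[i],
    ),
)

for row in per_class_table(demo_metrics):
    print(row)
```

With the real validation output, the call would be `per_class_table(results_val_cutmix_attempt)`, and the resulting rows can be placed next to the baseline run's rows for a per-class comparison.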