# YOLOv8 Microplastics Detection

This notebook will guide you through training a YOLOv8 object detection model to detect microplastics using your dataset.

> **Troubleshooting DataLoader Errors**: If you encounter `DataLoader worker exited unexpectedly` errors, the training settings in this notebook are chosen to avoid them by:
> 1. Setting `workers=0` so batches are loaded in the main process, with no worker subprocesses
> 2. Reducing the batch size to prevent memory overflows
> 3. Using the GPU only when CUDA is available and falling back to the CPU otherwise
> 4. Disabling caching (`cache=False`) to prevent file access conflicts
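A minimal sketch, assuming you want to reuse the four settings above across several runs: the hypothetical helper `safe_loader_overrides` only bundles them into a dictionary that can be unpacked into `model.train()`. The keys are existing `model.train()` arguments, but the helper itself is not part of Ultralytics, and `batch=8` simply mirrors the value used in the training cell below.

```python
def safe_loader_overrides(gpu_available: bool, batch: int = 8) -> dict:
    """Bundle the DataLoader-related training settings used in this notebook.

    Hypothetical helper: the keys are real model.train() arguments, but the
    grouping and the default batch size are illustrative only.
    """
    return {
        "workers": 0,  # load batches in the main process, no worker subprocesses
        "batch": batch,  # smaller batches lower peak memory use
        "device": "0" if gpu_available else "cpu",  # first CUDA GPU, else CPU
        "cache": False,  # do not cache decoded images in RAM or on disk
    }


# Usage (requires torch and ultralytics):
# overrides = safe_loader_overrides(torch.cuda.is_available())
# model.train(data="data.yaml", epochs=10, **overrides)
```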

> **Note**: Your dataset is in detection format (bounding boxes), not segmentation format, so this notebook trains a YOLOv8 detection model rather than a segmentation model.

In [None]:
# 1. Install Required Libraries
%pip install ultralytics

# The following line will download the YOLOv8 detection model if it doesn't exist
# Uncomment if you need to download it
# !python -c "from ultralytics import YOLO; YOLO('yolov8n.pt')"

Note: you may need to restart the kernel to use updated packages.



## 2. Dataset Structure and Format

Your dataset should be structured as:
- data/train/images/, data/train/labels/
- data/valid/images/, data/valid/labels/
- data/test/images/, data/test/labels/
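The training cell reads these folders through a `data.yaml` file that has a `path` key plus `train`, `val` and `test` entries relative to it (the `valid/` folder is referenced by the `val` key). As a hedged sketch, the hypothetical helper `write_data_yaml` writes such a file using only the standard library. The class name `microplastic` is an assumption, so use the class names from your own dataset. The helper overwrites `out_file` if it already exists.

```python
from pathlib import Path


def write_data_yaml(dataset_root: str, class_names: list, out_file: str = "data.yaml") -> str:
    """Write a minimal Ultralytics-style dataset YAML and return its text.

    Hypothetical helper; split folders follow the layout listed above.
    """
    lines = [
        f"path: {Path(dataset_root).as_posix()}",  # base directory for the splits
        "train: data/train/images",
        "val: data/valid/images",  # the 'valid' folder is used for the 'val' split
        "test: data/test/images",
        f"nc: {len(class_names)}",  # number of classes
        "names:",
    ]
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    text = "\n".join(lines) + "\n"
    Path(out_file).write_text(text, encoding="utf-8")  # overwrites an existing file
    return text


# Usage (path taken from the training log in this notebook):
# write_data_yaml(r"C:\Users\blasi\CS-ML\FINAL_PROJ", ["microplastic"])
```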

### Label Format for Object Detection
YOLOv8 detection labels contain normalized bounding box coordinates for each object:
```
<class-id> <x_center> <y_center> <width> <height>
```
Where:
- `class-id`: The object class (0 for microplastics)
- `x_center, y_center`: Normalized center coordinates of the bounding box (0-1)
- `width, height`: Normalized width and height of the bounding box (0-1)

Each object in an image has its own line in the label file. The dataset already contains these labels.
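To make the format concrete, here is a minimal sketch that converts one label line to pixel corner coordinates `(x1, y1, x2, y2)` and back. Both function names are hypothetical, and the image width and height must be those of the image the label file belongs to.

```python
def yolo_line_to_xyxy(line: str, img_w: int, img_h: int):
    """Convert '<class-id> <x_center> <y_center> <width> <height>' to pixel corners."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields for a detection label, got {len(parts)}: {line!r}")
    cls = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:])
    x1 = (xc - w / 2) * img_w  # left edge
    y1 = (yc - h / 2) * img_h  # top edge
    x2 = (xc + w / 2) * img_w  # right edge
    y2 = (yc + h / 2) * img_h  # bottom edge
    return cls, x1, y1, x2, y2


def xyxy_to_yolo_line(cls: int, x1, y1, x2, y2, img_w: int, img_h: int) -> str:
    """Convert pixel corners back to a normalized YOLO label line."""
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{cls} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```

For example, the line `0 0.5 0.5 0.25 0.5` on a 640x480 image describes the box from (240, 120) to (400, 360).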

In [None]:
# 3. Training YOLOv8 Detection Model
import os
from pathlib import Path
import torch
from ultralytics import YOLO
import yaml

DATASET_YAML = 'data.yaml'

def verify_dataset(yaml_path):
    if not os.path.exists(yaml_path):
        print(f"ERROR: Dataset YAML file not found: {yaml_path}")
        return False
    try:
        with open(yaml_path, 'r') as f:
            data_cfg = yaml.safe_load(f)
    except Exception as e:
        print(f"ERROR: Failed to load YAML: {e}")
        return False
    if 'path' not in data_cfg:
        print("ERROR: 'path' key missing in YAML file.")
        return False
    base_path = Path(data_cfg['path'])
    print(f"Base path: {base_path}")
    all_ok = True
    for split in ['train', 'val', 'test']:
        if split in data_cfg:
            split_path = base_path / data_cfg[split]
            print(f"{split.capitalize()} path: {split_path}")
            if not split_path.exists():
                print(f"WARNING: {split} path does not exist: {split_path}")
                try:
                    os.makedirs(split_path, exist_ok=True)
                    print(f"Created directory: {split_path}")
                except Exception as e:
                    print(f"ERROR: Could not create directory {split_path}: {e}")
                    all_ok = False
            else:
                img_count = len(list(split_path.glob('*.jpg'))) + len(list(split_path.glob('*.png')))
                print(f"  Found {img_count} images in {split} folder")
                if img_count == 0:
                    print(f"WARNING: No images found in {split_path}")
    return all_ok

# Device selection
if torch.cuda.is_available():
    device = torch.device('cuda:0')
    print(f"CUDA available: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device('cpu')
    print("No GPU available, using CPU instead.")

print("Verifying dataset paths...")
dataset_ok = verify_dataset(DATASET_YAML)
model = None  # Defined up front so later cells can check it even if setup fails
if not dataset_ok:
    print("Dataset verification failed. Please check your dataset structure and YAML file.")
else:
    # Load the pretrained YOLOv8 nano detection model
    try:
        model = YOLO('yolov8n.pt')
    except Exception as e:
        print(f"ERROR: Could not load YOLOv8 model: {e}")
        model = None
    if model is not None:
        try:
            print("Starting training...")
            results = model.train(
                data=DATASET_YAML,
                epochs=10,             # Number of training epochs
                imgsz=640,             # Input image size
                batch=8,               # Reduced batch size to prevent memory issues
                workers=0,             # Load data in the main process to avoid DataLoader multiprocessing issues
                # Mosaic is switched off for the last close_mosaic epochs (default 10), so with
                # epochs=10 it never runs; the log line "Closing dataloader mosaic" confirms this
                mosaic=1.0,            # Mosaic augmentation
                scale=0.5,             # Scale augmentation
                perspective=0.0,       # No perspective augmentation for small objects
                flipud=0.5,            # Flip up-down augmentation
                fliplr=0.5,            # Flip left-right augmentation
                hsv_h=0.015,           # HSV hue augmentation (Ultralytics default)
                hsv_s=0.7,             # HSV saturation augmentation
                hsv_v=0.4,             # HSV value augmentation
                patience=50,           # Early stopping patience (exceeds epochs=10, so it never triggers)
                device=str(device),    # 'cuda:0' if a GPU is available, otherwise 'cpu'
                project='runs/detect', # Project directory
                name='train',          # Run name
                exist_ok=True,         # Reuse runs/detect/train instead of creating train2, train3, ...
                cache=False            # Disable image caching to prevent file access conflicts
            )
            print("Training completed successfully.")
        except Exception as e:
            print(f"ERROR during training: {e}")

CUDA available: NVIDIA GeForce RTX 3050 Ti Laptop GPU
Verifying dataset paths...
Base path: C:\Users\blasi\CS-ML\FINAL_PROJ
Train path: C:\Users\blasi\CS-ML\FINAL_PROJ\data\train\images
  Found 3226 images in train folder
Val path: C:\Users\blasi\CS-ML\FINAL_PROJ\data\valid\images
  Found 928 images in val folder
Test path: C:\Users\blasi\CS-ML\FINAL_PROJ\data\test\images
  Found 453 images in test folder
Starting training...
New https://pypi.org/project/ultralytics/8.3.134 available Update with 'pip install -U ultralytics'
Ultralytics 8.3.133 Python-3.11.7 torch-2.5.1+cu121 CUDA:0 (NVIDIA GeForce RTX 3050 Ti Laptop GPU, 4096MiB)
engine\trainer: agnostic_nms=False, amp=True, augment=False, auto_augment=randaugment, batch=8, bgr=0.0, box=7.5, ca

train: Scanning C:\Users\blasi\CS-ML\FINAL_PROJ\data\train\labels.cache... 3226 images, 0 backgrounds, 0 corrupt: 100%|██████████| 3226/3226 [00:00<?, ?it/s]

val: Fast image access (ping: 0.0±0.0 ms, read: 329.9±71.8 MB/s, size: 32.7 KB)

val: Scanning C:\Users\blasi\CS-ML\FINAL_PROJ\data\valid\labels.cache... 928 images, 0 backgrounds, 0 corrupt: 100%|██████████| 928/928 [00:00<?, ?it/s]

Plotting labels to runs\detect\train\labels.jpg... 
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
optimizer: AdamW(lr=0.002, momentum=0.9) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 0 dataloader workers
Logging results to runs\detect\train
Starting training for 10 epochs...
Closing dataloader mosaic

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size

       1/10      1.03G      2.146      3.641      1.584         47        640:   1%|          | 5/404 [00:01<02:39,  2.50it/s]
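Ultralytics finds each image's label file by swapping the `images` folder for the sibling `labels` folder and the image extension for `.txt`; images with a missing or empty label file are reported as backgrounds in the scan summary above. A hedged sketch for listing images that have no label file before training, assuming that sibling layout; the helper name `find_unlabeled_images` and the extension set are assumptions, not Ultralytics APIs.

```python
from pathlib import Path

# Assumed set of image extensions to check; extend it if your dataset uses others
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def find_unlabeled_images(images_dir: str) -> list:
    """Return names of images in images_dir whose labels/<stem>.txt file is missing.

    Hypothetical helper; assumes images/ and labels/ are sibling folders.
    """
    images = Path(images_dir)
    labels = images.parent / "labels"
    missing = []
    for img in sorted(images.iterdir()):
        if img.suffix.lower() not in IMAGE_EXTS:
            continue  # skip non-image files
        if not (labels / (img.stem + ".txt")).exists():
            missing.append(img.name)
    return missing


# Usage:
# for split in ("train", "valid", "test"):
#     print(split, find_unlabeled_images(f"data/{split}/images")[:10])
```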

In [None]:
# 3.1 Training Visualization
from IPython.display import display, Image
from pathlib import Path

# Function to display training plots during or after training
def show_training_plots(results_path=Path('runs/detect/train')):
    # Check if the directory exists first
    if not results_path.exists():
        print(f"Warning: Results directory not found at {results_path}")
        return

    # Candidate filenames for each plot; depending on the Ultralytics version,
    # the PR curve is saved as BoxPR_curve.png or PR_curve.png
    plots = {
        'Training Loss and Metrics': ['results.png'],
        'Normalized Confusion Matrix (val split)': ['confusion_matrix_normalized.png'],
        'PR Curve': ['BoxPR_curve.png', 'PR_curve.png'],
    }

    found_plots = False
    for title, candidates in plots.items():
        plot_path = next((results_path / name for name in candidates
                          if (results_path / name).exists()), None)
        if plot_path is None:
            print(f"\n{title} plot not found in {results_path} (tried: {', '.join(candidates)})")
            continue
        found_plots = True
        print(f"\n{title}:")
        try:
            display(Image(str(plot_path)))
        except Exception as e:
            print(f"Error displaying {title}: {e}")

    if not found_plots:
        print("No training plots found. Training may not have completed successfully.")

# Call this function after training completes
# Uncomment to run after training
# show_training_plots()
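Besides `results.png`, each training run writes a `results.csv` with one row per epoch. A minimal sketch for finding the best epoch from it, assuming the column names `epoch` and `metrics/mAP50-95(B)` used by recent Ultralytics versions (check the header of your own file); the helper name `best_epoch` is hypothetical. Header cells are stripped because they may be padded with spaces.

```python
import csv


def best_epoch(results_csv: str, metric: str = "metrics/mAP50-95(B)"):
    """Return (epoch, value) for the row with the highest value of `metric`."""
    with open(results_csv, newline="", encoding="utf-8") as f:
        # Strip whitespace around header names and values
        rows = [{k.strip(): v.strip() for k, v in row.items()} for row in csv.DictReader(f)]
    if not rows:
        raise ValueError(f"no rows in {results_csv}")
    best = max(rows, key=lambda r: float(r[metric]))
    return int(float(best["epoch"])), float(best[metric])


# Usage:
# print(best_epoch("runs/detect/train/results.csv"))
# print(best_epoch("runs/detect/train/results.csv", "metrics/mAP50(B)"))
```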

In [None]:
# 4. Evaluate on Test Set

# Function to safely perform validation
def safe_validation(model, yaml_path, split='test'):
    try:
        # Evaluate the model with error handling
        metrics = model.val(
            data=yaml_path,
            split=split,
            conf=0.25,           # Confidence threshold (the val default is 0.001; a higher value can lower the reported mAP)
            iou=0.5,             # IoU threshold for non-maximum suppression
            max_det=300,         # Maximum detections per image
            verbose=True         # Print detailed metrics
        )
        return metrics
    except Exception as e:
        print(f"Error during validation: {str(e)}")
        return None

# Run validation
metrics = safe_validation(model, DATASET_YAML)

# Print summary metrics if validation was successful
if metrics is not None:
    try:
        print(f"Precision: {metrics.box.mp:.4f}")
        print(f"Recall: {metrics.box.mr:.4f}")
        print(f"mAP@0.5: {metrics.box.map50:.4f}")
        print(f"mAP@0.5:0.95: {metrics.box.map:.4f}")
    except Exception as e:
        print(f"Error displaying metrics: {str(e)}")
else:
    print("Validation failed. Cannot display metrics.")

In [None]:
# 4.1 Visualize Evaluation Results
import matplotlib.pyplot as plt
from pathlib import Path

# Confusion matrix from the final validation during training (val split);
# plots from the test-split evaluation above are saved in a separate runs/detect/val* folder
conf_matrix = Path('runs/detect/train/confusion_matrix.png')
if conf_matrix.exists():
    plt.figure(figsize=(10, 10))
    img = plt.imread(conf_matrix)
    plt.imshow(img)
    plt.axis('off')
    plt.title('Confusion Matrix')
    plt.show()

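# PR curve: depending on the Ultralytics version it is saved as BoxPR_curve.png or PR_curve.png
pr_curve = next((p for p in (Path('runs/detect/train/BoxPR_curve.png'),
                             Path('runs/detect/train/PR_curve.png')) if p.exists()), None)
if pr_curve is not None: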
    plt.figure(figsize=(10, 6))
    img = plt.imread(pr_curve)
    plt.imshow(img)
    plt.axis('off')
    plt.title('Precision-Recall Curve')
    plt.show()

In [None]:
# 5. Inference Example
import glob
import os
import matplotlib.pyplot as plt

# Function to safely load and run inference on an image
def safe_inference(model, image_path, conf=0.25):
    try:
        if not os.path.exists(image_path):
            print(f"Warning: Image not found: {image_path}")
            return None

        # Run inference with error handling
        results = model(image_path, conf=conf)
        return results
    except Exception as e:
        print(f"Error during inference on {image_path}: {str(e)}")
        return None

# Run inference on a test image
test_image = 'data/test/images/1_jpg.rf.cde0320b040f0984f45350362147d2b2.jpg'
results = safe_inference(model, test_image)

# Only proceed if inference was successful
if results is not None:
    # Plot results
    try:
        fig, ax = plt.subplots(1, 2, figsize=(16, 8))

        # Original image without detections
        ax[0].imshow(plt.imread(test_image))
        ax[0].set_title('Original Image')
        ax[0].axis('off')

        # plot() returns a new BGR numpy array with the boxes drawn; reverse the channels for matplotlib (RGB)
        annotated = results[0].plot(boxes=True, labels=True)
        ax[1].imshow(annotated[..., ::-1])
        ax[1].set_title('Detected Microplastics')
        ax[1].axis('off')

        plt.tight_layout()
        plt.show()

        # Print detection statistics
        print(f"Detected {len(results[0].boxes)} microplastics with confidence >0.25")
    except Exception as e:
        print(f"Error plotting results: {str(e)}")

# Run inference on a few more images if available
try:
    # Up to 2 other test images (excluding the one above) to avoid memory issues
    other_images = [p for p in sorted(glob.glob('data/test/images/*.jpg'))
                    if os.path.abspath(p) != os.path.abspath(test_image)][:2]
    for img_path in other_images:
        img_results = safe_inference(model, img_path)
        if img_results is not None:
            try:
                img_results[0].show()
                print(f"Detected {len(img_results[0].boxes)} microplastics in {os.path.basename(img_path)}")
            except Exception as e:
                print(f"Error showing results for {img_path}: {str(e)}")
except Exception as e:
    print(f"Error processing additional images: {e}")

---

## Next Steps
- You can adjust the number of epochs, image size, or model variant (e.g., yolov8m.pt) as needed.
- Run the notebook cells to train and evaluate your model.

## Understanding the Results

### Metrics Explanation
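- **mAP (mean Average Precision)**: The primary metric for object detection. AP is the area under the precision-recall curve for one class, and mAP averages it over classes (with a single class, mAP equals AP). **mAP@0.5** counts a prediction as correct when its IoU with a ground-truth box is at least 0.5; **mAP@0.5:0.95** averages mAP over IoU thresholds from 0.5 to 0.95 in steps of 0.05.
- **Precision**: The fraction of predicted boxes that are correct, TP / (TP + FP)
- **Recall**: The fraction of ground-truth microplastics the model finds, TP / (TP + FN)
- **IoU (Intersection over Union)**: The area where a predicted box and a ground-truth box overlap, divided by the area of their union (0 means no overlap, 1 means identical boxes)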
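These definitions can be sketched in a few lines of plain Python. This is a simplified illustration, not Ultralytics' implementation: `average_precision` uses all-point interpolation over detections that are already sorted by confidence and already matched to ground truth (each ground-truth box matched at most once), whereas Ultralytics interpolates the precision-recall curve at 101 recall points, so its numbers can differ slightly. All function names are hypothetical.

```python
def iou_xyxy(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(is_tp, n_ground_truth):
    """All-point interpolated AP for a single class.

    is_tp: one flag per detection, sorted by descending confidence; True when the
    detection matched a not-yet-matched ground-truth box at the chosen IoU threshold.
    """
    if n_ground_truth == 0:
        return 0.0
    recalls, precisions = [0.0], [1.0]
    tp = fp = 0
    for hit in is_tp:
        if hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / n_ground_truth)
        precisions.append(tp / (tp + fp))
    # Precision envelope: each point takes the best precision at the same or higher recall
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the stepwise curve: precision times each recall increment
    return sum((recalls[i] - recalls[i - 1]) * precisions[i] for i in range(1, len(recalls)))
```

With one class, mAP@0.5 corresponds to `average_precision` computed with matches at IoU of at least 0.5; mAP@0.5:0.95 repeats the matching at IoU thresholds 0.50, 0.55, ..., 0.95 and averages the ten results.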

### Model Improvements

To improve model performance:

1. **Try larger models**: Replace `yolov8n.pt` with:
   - `yolov8s.pt` (small) 
   - `yolov8m.pt` (medium)
   - `yolov8l.pt` (large)
   - `yolov8x.pt` (extra large)

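2. **Data augmentation**: Add more augmentations to reduce overfitting. Note that `augment=True` enables test-time augmentation for prediction, not extra training augmentation, and `copy_paste` only has an effect on datasets with segmentation masks, so neither helps this bounding-box dataset:
   ```python
   model.train(
       # Other parameters
       mixup=0.1,       # MixUp: blend pairs of training images and their labels
       degrees=10.0     # Random rotation range in degrees (+/-)
   )
   ```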

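3. **Optimization**: Set the optimizer explicitly. With the default `optimizer='auto'`, Ultralytics ignores `lr0` and `momentum` and chooses them itself, as the `optimizer:` line in the training log shows: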
   ```python
   model.train(
       # Other parameters
       optimizer="AdamW",
       lr0=0.001
   )
   ```

4. **Export model**: Save the model for deployment:
   ```python
   model.export(format="onnx") # or "torchscript", "openvino", etc.
   ```