# 🚗 YOLOv8 Road Hazard Detection - Complete Colab Notebook
## AICTE Project: AI-Powered Road Monitoring System

**This notebook trains a YOLOv8 model to detect 8 types of road hazards:**
1. Pothole
2. Crack
3. Debris
4. Waterlogging
5. Damaged Sign
6. Vegetation Overgrowth
7. Construction
8. Accident Site

---

## 📦 Step 1: Set Up Environment

In [None]:
# Install required packages
!pip install -q ultralytics roboflow
!pip install -q opencv-python-headless

print("✅ All packages installed!")

In [None]:
# Import libraries
import os
from ultralytics import YOLO
import cv2
import matplotlib.pyplot as plt
from IPython.display import Image, display
import yaml
from google.colab import files
import zipfile
import shutil

print("✅ Libraries imported successfully!")
import ultralytics  # the version string lives on the package, not on the YOLO class
print(f"📦 Ultralytics version: {ultralytics.__version__}")

## 📂 Step 2: Upload Your Dataset

**Option A:** Upload your own labeled dataset (YOLO format)  
**Option B:** Use sample/demo dataset from Roboflow  
**Option C:** Download public road damage datasets

### Option A: Upload Your Own Dataset

In [None]:
# Upload dataset ZIP file
# Dataset should be in YOLO format:
# dataset.zip/
#   ├── images/
#   │   ├── train/
#   │   ├── val/
#   │   └── test/
#   ├── labels/
#   │   ├── train/
#   │   ├── val/
#   │   └── test/
#   └── data.yaml

print("📤 Upload your dataset ZIP file...")
uploaded = files.upload()

# Extract each uploaded ZIP into /content/dataset (the path used by later cells)
for filename in uploaded.keys():
    print(f"Extracting {filename}...")
    with zipfile.ZipFile(filename, 'r') as zip_ref:
        zip_ref.extractall('/content/dataset')
    print(f"✅ Extracted {filename} to '/content/dataset'")

# List contents
!ls -lh dataset/
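If your ZIP wraps everything in one top-level folder (for example `dataset.zip/road_data/images/...`), extraction nests the layout one level deeper than the `/content/dataset/images/...` paths used in later cells. Here is a minimal sketch for locating the real root. It assumes `images/` and `labels/` sit side by side as in the tree above. `find_dataset_root` is a hypothetical helper written for this notebook, not a library function.

```python
import os

def find_dataset_root(extract_dir):
    """Return the first folder under extract_dir (including itself) that contains
    both an 'images' and a 'labels' subfolder, or None if there is none."""
    for current, dirs, _ in os.walk(extract_dir):
        if 'images' in dirs and 'labels' in dirs:
            return current
    return None

# Hypothetical usage right after extraction
print("Dataset root:", find_dataset_root('/content/dataset'))
```

If the printed root is not `/content/dataset`, move its contents up into `/content/dataset` so the hard-coded paths in later cells resolve.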

### Option B: Use Roboflow Dataset (Recommended for Testing)

In [None]:
# Use a public road damage dataset from Roboflow
# Sign up at https://roboflow.com and get your API key

# Example: Pothole detection dataset
# !pip install roboflow
# from roboflow import Roboflow
# rf = Roboflow(api_key="YOUR_API_KEY")
# project = rf.workspace("YOUR_WORKSPACE").project("YOUR_PROJECT")
# dataset = project.version(1).download("yolov8")

print("ℹ️ If using Roboflow, uncomment and add your API key above")
print("ℹ️ Or use Option C to download public datasets")

### Option C: Download Public Datasets

In [None]:
# Example: download a public road damage dataset from Kaggle (requires a Kaggle API key).
# The dataset below is only a placeholder. Swap in the one you need (e.g. RDD2020,
# listed under Resources) and check that it comes with YOLO-format labels.
# !pip install kaggle
# !kaggle datasets download -d atulyakumar98/pothole-detection-dataset
# !unzip pothole-detection-dataset.zip -d dataset/

print("ℹ️ For Kaggle datasets:")
print("1. Go to https://www.kaggle.com/settings")
print("2. Create API token")
print("3. Upload kaggle.json to Colab")
print("4. Run: !mkdir -p ~/.kaggle && mv kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json")

## 🏗️ Step 3: Create Dataset Configuration

Create `data.yaml` if it is not already included in your dataset.

In [None]:
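# Create data.yaml only if the dataset does not already ship one.
# If it does, load it so that `data_yaml['names']` is available to later cells.
yaml_path = '/content/dataset/data.yaml'

if os.path.exists(yaml_path):
    with open(yaml_path) as f:
        data_yaml = yaml.safe_load(f)
    # Some exports store names as a list; normalise to {class_id: name}
    if isinstance(data_yaml['names'], list):
        data_yaml['names'] = dict(enumerate(data_yaml['names']))
    print("✅ Found existing data.yaml, using it as-is")
else:
    data_yaml = {
        'path': '/content/dataset',  # dataset root dir
        'train': 'images/train',     # train images (relative to 'path')
        'val': 'images/val',         # val images
        'test': 'images/test',       # test images (optional)
        'names': {
            0: 'pothole',
            1: 'crack',
            2: 'debris',
            3: 'waterlogging',
            4: 'damaged_sign',
            5: 'vegetation_overgrowth',
            6: 'construction',
            7: 'accident_site'
        }
    }
    os.makedirs(os.path.dirname(yaml_path), exist_ok=True)
    with open(yaml_path, 'w') as f:
        yaml.dump(data_yaml, f, default_flow_style=False)
    print("✅ data.yaml created!")

print("\nConfiguration:")
!cat /content/dataset/data.yaml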

## 📊 Step 4: Verify Dataset

In [None]:
# Check dataset structure
import os

dataset_path = '/content/dataset'

# Count images and labels
def count_files(path, extension):
    # Case-insensitive, so e.g. IMG_001.JPG is counted too
    return len([f for f in os.listdir(path) if f.lower().endswith(extension)]) if os.path.exists(path) else 0

print("📊 Dataset Statistics:\n")
print("="*60)

for split in ['train', 'val', 'test']:
    img_path = os.path.join(dataset_path, 'images', split)
    lbl_path = os.path.join(dataset_path, 'labels', split)
    
    n_images = count_files(img_path, ('.jpg', '.jpeg', '.png'))
    n_labels = count_files(lbl_path, '.txt')
    
    print(f"{split.upper():>8} set: {n_images:>5} images, {n_labels:>5} labels")

print("="*60)
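Matching file counts do not guarantee usable labels. This minimal sketch checks each line of a label file against the YOLO detection format, which has one `class_id x_center y_center width height` line per box. It assumes box labels rather than segmentation polygons. `check_label_file` is a hypothetical helper, not part of Ultralytics.

```python
import os

def check_label_file(path, num_classes):
    """Return a list of problems found in one YOLO detection label file.

    Each non-empty line should hold five values: class_id x_center y_center width height,
    with 0 <= class_id < num_classes and the four box values normalised to [0, 1].
    """
    problems = []
    with open(path) as f:
        for line_no, line in enumerate(f, start=1):
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            if len(parts) != 5:
                problems.append(f"line {line_no}: expected 5 values, got {len(parts)}")
                continue
            try:
                cls_id = int(parts[0])
                coords = [float(v) for v in parts[1:]]
            except ValueError:
                problems.append(f"line {line_no}: non-numeric value")
                continue
            if not 0 <= cls_id < num_classes:
                problems.append(f"line {line_no}: class id {cls_id} out of range")
            if any(not 0.0 <= v <= 1.0 for v in coords):
                problems.append(f"line {line_no}: box values not normalised to [0, 1]")
    return problems
```

To check a whole split, loop over the `.txt` files in `/content/dataset/labels/train` and print any problems returned by `check_label_file(path, len(data_yaml['names']))`.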

In [None]:
# Visualize sample images with labels
import random

def show_sample_images(n_samples=6):
    img_dir = '/content/dataset/images/train'
    lbl_dir = '/content/dataset/labels/train'
    
    if not os.path.exists(img_dir):
        print("⚠️ No training images found!")
        return
    
    images = [f for f in os.listdir(img_dir) if f.lower().endswith(('.jpg', '.jpeg', '.png'))]
    if not images:
        print("⚠️ Training image folder is empty!")
        return
    samples = random.sample(images, min(n_samples, len(images), 6))  # the 2x3 grid holds at most 6
    
    fig, axes = plt.subplots(2, 3, figsize=(15, 10))
    axes = axes.flatten()
    
    class_names = list(data_yaml['names'].values())
    
    for idx, img_name in enumerate(samples):
        # Read image (cv2.imread returns None if the file cannot be decoded)
        img_path = os.path.join(img_dir, img_name)
        img = cv2.imread(img_path)
        if img is None:
            continue
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        
        # Read labels: same file stem as the image, with a .txt extension
        lbl_path = os.path.join(lbl_dir, os.path.splitext(img_name)[0] + '.txt')
        
        if os.path.exists(lbl_path):
            with open(lbl_path, 'r') as f:
                labels = f.readlines()
            
            # Draw bounding boxes
            h, w = img.shape[:2]
            for label in labels:
                parts = label.strip().split()
                if len(parts) >= 5:
                    cls_id = int(parts[0])
                    x_center, y_center, width, height = map(float, parts[1:5])
                    
                    # Convert YOLO format to pixel coordinates
                    x1 = int((x_center - width/2) * w)
                    y1 = int((y_center - height/2) * h)
                    x2 = int((x_center + width/2) * w)
                    y2 = int((y_center + height/2) * h)
                    
                    # Draw rectangle
                    cv2.rectangle(img, (x1, y1), (x2, y2), (255, 0, 0), 2)
                    
                    # Add label
                    label_text = class_names[cls_id] if cls_id < len(class_names) else f"Class {cls_id}"
                    cv2.putText(img, label_text, (x1, y1-5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)
        
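        axes[idx].imshow(img)
        axes[idx].axis('off')
        axes[idx].set_title(img_name, fontsize=10)
    
    # Hide grid cells left empty when there are fewer than 6 samples
    for ax in axes[len(samples):]:
        ax.axis('off')
    
    plt.tight_layout()
    plt.show()
    print("✅ Sample images with annotations displayed")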

show_sample_images(6)
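The box-drawing loop above converts each normalised YOLO box (x_center, y_center, width, height) into pixel corners. Below, that conversion is pulled out as a standalone function. The clamping to image bounds is an added assumption, since the notebook's loop does not clamp. `yolo_to_xyxy` is a hypothetical helper name.

```python
def yolo_to_xyxy(x_center, y_center, width, height, img_w, img_h):
    """Convert a normalised YOLO box to integer pixel corners (x1, y1, x2, y2).

    x1 = (xc - w/2) * W, y1 = (yc - h/2) * H, x2 = (xc + w/2) * W, y2 = (yc + h/2) * H,
    then clamped so the box stays inside the image.
    """
    x1 = int((x_center - width / 2) * img_w)
    y1 = int((y_center - height / 2) * img_h)
    x2 = int((x_center + width / 2) * img_w)
    y2 = int((y_center + height / 2) * img_h)
    # Clamp to valid pixel coordinates
    x1, x2 = max(0, x1), min(img_w - 1, x2)
    y1, y2 = max(0, y1), min(img_h - 1, y2)
    return x1, y1, x2, y2

# A box covering the central half of a 640x480 image
print(yolo_to_xyxy(0.5, 0.5, 0.5, 0.5, 640, 480))  # prints (160, 120, 480, 360)
```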

## 🚀 Step 5: Train YOLOv8 Model

In [None]:
# Initialize YOLOv8 model
# Options: yolov8n.pt (nano - fastest), yolov8s.pt (small), yolov8m.pt (medium), yolov8l.pt (large), yolov8x.pt (extra large)

model = YOLO('yolov8n.pt')  # Start with nano for faster training

print("✅ YOLOv8 model loaded!")
print("   Model: yolov8n.pt (nano - optimized for speed)")

In [None]:
# Train the model
print("🚀 Starting training...\n")
print("="*70)
print("Training Configuration:")
print("  • Model: YOLOv8n (nano)")
print("  • Epochs: 50")
print("  • Image Size: 640x640")
print("  • Batch Size: 16")
print("  • Device: GPU (if available)")
print("="*70 + "\n")

results = model.train(
    data='/content/dataset/data.yaml',
    epochs=50,
    imgsz=640,
    batch=16,
    name='road_hazard_detection',
    project='runs/detect',
    exist_ok=True,  # reuse this run folder on re-runs so later cells' hard-coded paths stay valid
    patience=10,    # early stopping: stop after 10 epochs with no validation improvement
    save=True,
    plots=True,
    verbose=True
)

print("\n✅ Training completed!")
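`patience=10` tells Ultralytics to end training early once validation performance has not improved for 10 consecutive epochs. The pure-Python sketch below illustrates that rule on a list of per-epoch scores. It is a simplified model written for this notebook (the `early_stop_epoch` name is hypothetical), not Ultralytics' own early-stopping code, which tracks a combined fitness score derived from the mAP metrics.

```python
def early_stop_epoch(scores, patience):
    """Return the 0-based epoch at which patience-based early stopping would halt,
    or None if training would run through every epoch in `scores`.

    Stops once `patience` epochs have passed since the best score so far.
    """
    best_score = float('-inf')
    best_epoch = 0
    for epoch, score in enumerate(scores):
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return None

# Made-up validation scores that peak at epoch 2
print(early_stop_epoch([0.10, 0.20, 0.30, 0.29, 0.28, 0.27], patience=3))  # prints 5
```

This is also why the run may finish before all 50 epochs. `best.pt` is the checkpoint from the best epoch, not necessarily the last one.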

## 📈 Step 6: View Training Results

In [None]:
# Display training curves
print("📊 Training Results:\n")

# Run directory created by model.train(project='runs/detect', name='road_hazard_detection')
run_dir = 'runs/detect/road_hazard_detection'

if os.path.exists(run_dir):
    # Display results
    result_img = os.path.join(run_dir, 'results.png')
    if os.path.exists(result_img):
        print("Training Curves:")
        display(Image(filename=result_img))
    
    # Display confusion matrix
    conf_matrix = os.path.join(run_dir, 'confusion_matrix.png')
    if os.path.exists(conf_matrix):
        print("\nConfusion Matrix:")
        display(Image(filename=conf_matrix))
    
    # Display sample predictions
    val_batch = os.path.join(run_dir, 'val_batch0_pred.jpg')
    if os.path.exists(val_batch):
        print("\nSample Predictions:")
        display(Image(filename=val_batch))
else:
    print("⚠️ Results directory not found")

## ✅ Step 7: Evaluate Model

In [None]:
# Load best model
best_model_path = 'runs/detect/road_hazard_detection/weights/best.pt'
best_model = YOLO(best_model_path)

print("✅ Best model loaded!")
print(f"   Path: {best_model_path}")

In [None]:
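# Evaluate on the validation split defined in data.yaml
# (pass split='test' instead to score the held-out test images)
print("🧪 Evaluating on validation set...\n")

metrics = best_model.val(data='/content/dataset/data.yaml', split='val')

print("\n" + "="*70)
print("📊 EVALUATION METRICS")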
print("="*70)
print(f"mAP@0.5:       {metrics.box.map50:.4f}")
print(f"mAP@0.5:0.95:  {metrics.box.map:.4f}")
print(f"Precision:     {metrics.box.mp:.4f}")
print(f"Recall:        {metrics.box.mr:.4f}")
print("="*70)
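mAP@0.5 counts a predicted box as a true positive only when its Intersection over Union (IoU) with a ground-truth box of the same class is at least 0.5. mAP@0.5:0.95 averages AP over IoU thresholds from 0.5 to 0.95 in steps of 0.05. The minimal IoU function below works on boxes in `(x1, y1, x2, y2)` pixel format and shows the quantity being thresholded. The helper name `box_iou_xyxy` is hypothetical.

```python
def box_iou_xyxy(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2) with x2 > x1, y2 > y1."""
    # Corners of the overlapping rectangle (may be empty)
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes offset by 5 px overlap in a 5x10 strip:
# IoU = 50 / (100 + 100 - 50) = 1/3, so this pair would NOT count at the 0.5 threshold.
print(box_iou_xyxy((0, 0, 10, 10), (5, 0, 15, 10)))
```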

## 🎯 Step 8: Test on Sample Images

In [None]:
# Upload test images
print("📤 Upload test images to detect road hazards...")
test_images = files.upload()

print(f"\n✅ Uploaded {len(test_images)} test images")

In [None]:
# Run inference on the uploaded test images
if not test_images:
    print("⚠️ No test images uploaded, skipping inference.")
else:
    fig, axes = plt.subplots(len(test_images), 1, figsize=(12, 6*len(test_images)))
    if len(test_images) == 1:
        axes = [axes]

    for idx, filename in enumerate(test_images):
        # Run detection (files.upload() saves uploads to the working directory)
        preds = best_model(filename, conf=0.5)
        
        # plot() returns the annotated image as a BGR NumPy array; convert for Matplotlib
        annotated_img = cv2.cvtColor(preds[0].plot(), cv2.COLOR_BGR2RGB)
        
        # Display
        axes[idx].imshow(annotated_img)
        axes[idx].axis('off')
        axes[idx].set_title(f'Detection Results: {filename}', fontsize=14, fontweight='bold')
        
        # Print detections, using the class names stored in the trained model
        print(f"\n📊 Detections for {filename}:")
        print("-" * 60)
        
        boxes = preds[0].boxes
        if len(boxes) > 0:
            for box in boxes:
                cls_id = int(box.cls[0])
                conf = float(box.conf[0])
                cls_name = preds[0].names[cls_id]
                print(f"  • {cls_name}: {conf:.2%} confidence")
        else:
            print("  No hazards detected")

    plt.tight_layout()
    plt.show()

    print("\n✅ Inference completed!")
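For a road-monitoring report, a count of hazards per class is often more useful than a list of every box. The small sketch below assumes detections have already been collected as `(class_name, confidence)` pairs, such as the `cls_name` and `conf` values printed in the loop above. `summarize_detections` is a hypothetical helper, and the example data is made up, not real model output.

```python
from collections import Counter

def summarize_detections(detections, min_conf=0.5):
    """Count detections per class, keeping only those at or above min_conf."""
    return Counter(name for name, conf in detections if conf >= min_conf)

# Made-up detections for illustration
example = [('pothole', 0.91), ('pothole', 0.62), ('crack', 0.48), ('debris', 0.77)]
for name, count in summarize_detections(example).most_common():
    print(f"{name}: {count}")
```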

## 💾 Step 9: Export Model

In [None]:
# Export to different formats
print("📦 Exporting model to various formats...\n")

# Export to ONNX (for deployment)
best_model.export(format='onnx')
print("✅ Exported to ONNX format")

# Export to TorchScript
best_model.export(format='torchscript')
print("✅ Exported to TorchScript format")

print("\n📁 Exported models are in: runs/detect/road_hazard_detection/weights/")

In [None]:
# Download trained model
print("📥 Downloading trained model...\n")

# Zip the weights folder
!zip -r trained_model.zip runs/detect/road_hazard_detection/weights/

print("\n✅ Model packaged!")
print("\nDownload 'trained_model.zip' to use in your deployment:")

files.download('trained_model.zip')

print("\n📌 Usage in deployment:")
print("   python yolov8_realtime_detection.py --model path/to/best.pt")

## 📊 Step 10: Generate Report

In [None]:
# Generate training report
print("="*70)
print("📊 TRAINING REPORT")
print("="*70)
print("\n🎯 Model Configuration:")
print("   Model Architecture: YOLOv8n (nano)")
print(f"   Number of Classes: {len(data_yaml['names'])}")
print(f"   Classes: {', '.join(data_yaml['names'].values())}")
print("   Image Size: 640x640")
print("   Epochs (max): 50 (early stopping may end training sooner)")
print("   Batch Size: 16")

print("\n📊 Performance Metrics:")
print(f"   mAP@0.5:       {metrics.box.map50:.4f} ({metrics.box.map50*100:.2f}%)")
print(f"   mAP@0.5:0.95:  {metrics.box.map:.4f} ({metrics.box.map*100:.2f}%)")
print(f"   Precision:     {metrics.box.mp:.4f}")
print(f"   Recall:        {metrics.box.mr:.4f}")

print("\n📁 Model Files:")
print("   Best Weights: runs/detect/road_hazard_detection/weights/best.pt")
print("   Last Weights: runs/detect/road_hazard_detection/weights/last.pt")

print("\n🚀 Deployment:")
print("   1. Download trained_model.zip")
print("   2. Extract best.pt")
print("   3. Use with: python yolov8_realtime_detection.py --model best.pt")

print("\n✅ Training Complete!")
print("="*70)
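The report prints precision and recall separately. A common single-number summary is their harmonic mean, the F1 score: F1 = 2PR / (P + R). The minimal sketch below uses `f1_score`, a plain function written for illustration and not an Ultralytics API. In this notebook you could call it as `f1_score(metrics.box.mp, metrics.box.mr)`. This gives an F1 of the *mean* precision and recall, which is only an approximation of averaging per-class F1 scores.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example values, not real results from this notebook
print(f"F1: {f1_score(0.8, 0.6):.4f}")
```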

## 🎓 Next Steps

1. **Download the trained model** (`trained_model.zip`)
2. **Extract** `best.pt` from the zip file
3. **Deploy** using the project scripts:
   ```bash
   python yolov8_realtime_detection.py --model best.pt --camera 0
   ```
4. **Integrate** with full system:
   ```bash
   python integrated_deployment.py --model best.pt
   ```

---

## 📚 Resources

- **YOLOv8 Documentation:** https://docs.ultralytics.com
- **Dataset Labeling:** Use LabelImg or Roboflow
- **Public Datasets:**
  - RDD2020: https://github.com/sekilab/RoadDamageDetector
  - Pothole Detection: https://www.kaggle.com/datasets/atulyakumar98/pothole-detection-dataset

---

**Project Status: Model Training Complete ✅**

All 4 AICTE objectives can now be demonstrated with this trained model!