# 🧱 LEGO Assembly Error Detection - Training Notebook

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/tanamujaya/lego_assembly_detection/blob/main/LEGO_Assembly_Error_Detection_Training.ipynb)

This notebook trains a YOLO-based computer vision model to detect assembly errors in LEGO models.

**Features:**
- YOLOv8 object detection (nano variant, the smallest and fastest YOLOv8 model, suited to a Raspberry Pi)
- Automatic train/val/test split (70/15/15)
- Complete evaluation metrics and visualizations
- Model export for Raspberry Pi deployment
- Download all results in a single package

**Author:** Tanaka Mujaya  
**Project:** Bachelor's Thesis - HS Rhein-Waal

---
## 1. Setup Environment

First, let's check whether we're running on a GPU, then install the required packages.

In [None]:
# Check GPU availability
!nvidia-smi

In [None]:
# Install required packages
!pip install ultralytics --quiet
!pip install kaggle --quiet
!pip install scikit-learn --quiet

print("✅ Packages installed successfully!")

In [None]:
# Import libraries
import os
import json
import shutil
import zipfile
import random
from pathlib import Path
from datetime import datetime

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image as PILImage
from IPython.display import Image, display

from ultralytics import YOLO
from sklearn.model_selection import train_test_split

print("✅ Libraries imported successfully!")

---
## 2. Download Dataset from Kaggle

The dataset is hosted on Kaggle. You'll need to upload your Kaggle API credentials.

In [None]:
# Upload your kaggle.json file
# Go to https://www.kaggle.com/settings -> API -> Create New Token
# This downloads a kaggle.json file

from google.colab import files

print("Please upload your kaggle.json file:")
uploaded = files.upload()
print(f"Uploaded files: {list(uploaded.keys())}")

In [None]:
# Setup Kaggle credentials
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/kaggle.json 2>/dev/null || cp Kaggle.json ~/.kaggle/kaggle.json 2>/dev/null
!chmod 600 ~/.kaggle/kaggle.json

# Verify credentials (list the file instead of printing it: kaggle.json contains your secret API key)
!echo "Checking credentials..."
!ls -l ~/.kaggle/kaggle.json
print("\n✅ Kaggle credentials configured!")
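Because `kaggle.json` holds a secret API key, it is safer to validate the file than to print it. The following is a minimal sketch, assuming the standard `{"username": ..., "key": ...}` layout of the file; `check_kaggle_credentials` is a hypothetical helper, not part of the Kaggle CLI.

```python
import json
import stat
from pathlib import Path


def check_kaggle_credentials(path="~/.kaggle/kaggle.json"):
    """Validate kaggle.json without printing the secret key (hypothetical helper)."""
    cred_path = Path(path).expanduser()
    if not cred_path.exists():
        raise FileNotFoundError(f"{cred_path} not found - upload kaggle.json first")

    with open(cred_path) as f:
        creds = json.load(f)

    missing = {"username", "key"} - set(creds)
    if missing:
        raise ValueError(f"kaggle.json is missing fields: {sorted(missing)}")

    # Warn if group/other users can read the file (the Kaggle CLI expects chmod 600)
    mode = stat.S_IMODE(cred_path.stat().st_mode)
    if mode & 0o077:
        print(f"Warning: permissions are {oct(mode)}; run chmod 600 on {cred_path}")

    # Show only the first four characters of the key
    masked_key = creds["key"][:4] + "*" * max(len(creds["key"]) - 4, 0)
    print(f"Kaggle user: {creds['username']}, key: {masked_key}")
    return creds["username"]
```

Calling `check_kaggle_credentials()` after the `chmod 600` step would print the username and a masked key instead of the full secret.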

In [None]:
# Download the dataset
KAGGLE_DATASET = "tanakamujaya/lad-dataset-5-0"

!kaggle datasets download -d {KAGGLE_DATASET} -p /content/data --unzip --force

print("✅ Dataset downloaded!")

In [None]:
# Check the dataset structure
print("=== Dataset Structure ===")
!find /content/data -type d

---
## 3. Configuration

Set up training parameters and paths.

In [None]:
# ============================================================================
# CONFIGURATION - Modify these settings as needed
# ============================================================================

# Paths
BASE_DIR = Path("/content")
DATA_DIR = BASE_DIR / "data"
MODELS_DIR = BASE_DIR / "models"
RESULTS_DIR = BASE_DIR / "results"
YOLO_DATASET_DIR = BASE_DIR / "yolo_dataset"
DOWNLOAD_DIR = BASE_DIR / "download_package"

# Create directories
MODELS_DIR.mkdir(exist_ok=True)
RESULTS_DIR.mkdir(exist_ok=True)
DOWNLOAD_DIR.mkdir(exist_ok=True)

# Dataset configuration
DATASET_CONFIG = {
    'train_split': 0.70,
    'val_split': 0.15,
    'test_split': 0.15,
    'image_size': 416,  # Can use 320 for faster training, 640 for better accuracy
    'random_seed': 42
}

# Model configuration
MODEL_CONFIG = {
    'variant': 'yolov8n',  # nano version (fastest, smallest)
    'pretrained': True,
    'num_classes': 2,  # correct_assembly, assembly_error
}

# Training configuration
TRAINING_CONFIG = {
    'epochs': 40,
    'batch_size': 16,
    'learning_rate': 0.001,
    'patience': 20,  # Early stopping: stop after 20 epochs without validation improvement
    'device': 0,  # GPU (use 'cpu' if no GPU)
    'workers': 2,
}

# Class names
CLASS_NAMES = ['correct_assembly', 'assembly_error']

print("✅ Configuration set!")
print(f"   Model: {MODEL_CONFIG['variant']}")
print(f"   Image size: {DATASET_CONFIG['image_size']}")
print(f"   Epochs: {TRAINING_CONFIG['epochs']}")
print(f"   Batch size: {TRAINING_CONFIG['batch_size']}")
# round() rather than int(): int(0.29 * 100) would truncate 28.999... to 28
print(f"   Train/Val/Test split: {round(DATASET_CONFIG['train_split']*100)}/{round(DATASET_CONFIG['val_split']*100)}/{round(DATASET_CONFIG['test_split']*100)}")

---
## 4. Prepare Dataset

Locate the dataset, split it into train/val/test sets, and create the YOLO-format directory structure.

In [None]:
def find_dataset(data_dir):
    """
    Find the images and labels folders in the dataset.
    Handles nested folder structures and case differences ("Images" vs "images").
    The labels folder must sit next to the images folder, so the two always belong together.
    """
    data_dir = Path(data_dir)
    
    print("🔍 Searching for dataset...")
    
    # Walk the tree in sorted (stable) order and return the first images folder
    # that has a sibling labels folder (both matched case-insensitively)
    for folder in sorted(data_dir.rglob('*')):
        if folder.is_dir() and folder.name.lower() == 'images':
            sibling_labels = [d for d in sorted(folder.parent.iterdir())
                              if d.is_dir() and d.name.lower() == 'labels']
            if sibling_labels:
                images_dir, labels_dir = folder, sibling_labels[0]
                print("   ✅ Found dataset!")
                print(f"   Images: {images_dir}")
                print(f"   Labels: {labels_dir}")
                return images_dir, labels_dir
    
    raise FileNotFoundError(f"Could not find sibling Images/Labels folders in {data_dir}")


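def get_image_label_pairs(images_dir, labels_dir):
    """
    Get matching (image, label) file pairs, sorted by filename.
    Sorting makes the seeded shuffle in the next cell reproducible, because
    directory listing order is not guaranteed. Images without a label file are skipped.
    """
    images_dir = Path(images_dir)
    labels_dir = Path(labels_dir)
    
    image_extensions = {'.jpg', '.jpeg', '.png', '.bmp'}
    pairs = []
    skipped = 0
    
    for img_path in sorted(images_dir.iterdir()):
        if img_path.suffix.lower() in image_extensions:
            label_path = labels_dir / f"{img_path.stem}.txt"
            if label_path.exists():
                pairs.append((img_path, label_path))
            else:
                skipped += 1
    
    if skipped:
        print(f"   ⚠️ Skipped {skipped} images without a matching label file")
    
    return pairs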


# Find the dataset
images_dir, labels_dir = find_dataset(DATA_DIR)

# Get all image-label pairs
all_pairs = get_image_label_pairs(images_dir, labels_dir)
print(f"\nüìä Found {len(all_pairs)} image-label pairs")

In [None]:
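def create_yolo_dataset(pairs, output_dir, train_ratio=0.7, val_ratio=0.15, test_ratio=0.15, seed=42):
    """
    Split dataset and create YOLO format directory structure.
    test_ratio is informational only: the test split receives every sample
    left over after the train and val splits (including rounding remainders).
    """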
    output_dir = Path(output_dir)
    
    # Remove old dataset if exists
    if output_dir.exists():
        shutil.rmtree(output_dir)
    
    # Create directory structure
    for split in ['train', 'val', 'test']:
        (output_dir / split / 'images').mkdir(parents=True, exist_ok=True)
        (output_dir / split / 'labels').mkdir(parents=True, exist_ok=True)
    
    # Split dataset
    random.seed(seed)
    shuffled_pairs = pairs.copy()
    random.shuffle(shuffled_pairs)
    
    n_total = len(shuffled_pairs)
    n_train = int(n_total * train_ratio)
    n_val = int(n_total * val_ratio)
    
    train_pairs = shuffled_pairs[:n_train]
    val_pairs = shuffled_pairs[n_train:n_train + n_val]
    test_pairs = shuffled_pairs[n_train + n_val:]
    
    splits = {
        'train': train_pairs,
        'val': val_pairs,
        'test': test_pairs
    }
    
    # Copy files to respective directories
    for split_name, split_pairs in splits.items():
        print(f"   Processing {split_name}: {len(split_pairs)} samples")
        for img_path, label_path in split_pairs:
            shutil.copy(img_path, output_dir / split_name / 'images' / img_path.name)
            shutil.copy(label_path, output_dir / split_name / 'labels' / label_path.name)
    
    # Create dataset.yaml
    yaml_content = f"""# LEGO Assembly Error Detection Dataset
# Auto-generated by training notebook

path: {output_dir}
train: train/images
val: val/images
test: test/images

# Classes
names:
  0: correct_assembly
  1: assembly_error

# Number of classes
nc: 2
"""
    
    yaml_path = output_dir / 'dataset.yaml'
    with open(yaml_path, 'w') as f:
        f.write(yaml_content)
    
    return yaml_path, splits


# Create YOLO dataset structure
print("üìÅ Creating YOLO dataset structure...")

yaml_path, splits = create_yolo_dataset(
    all_pairs,
    YOLO_DATASET_DIR,
    train_ratio=DATASET_CONFIG['train_split'],
    val_ratio=DATASET_CONFIG['val_split'],
    test_ratio=DATASET_CONFIG['test_split'],
    seed=DATASET_CONFIG['random_seed']
)

train_count = len(splits['train'])
val_count = len(splits['val'])
test_count = len(splits['test'])
total_count = train_count + val_count + test_count

print(f"\n✅ Dataset prepared!")
print(f"   Train: {train_count} images ({train_count/total_count*100:.1f}%)")
print(f"   Val:   {val_count} images ({val_count/total_count*100:.1f}%)")
print(f"   Test:  {test_count} images ({test_count/total_count*100:.1f}%)")
print(f"   Total: {total_count} images")
print(f"\n   YAML config: {yaml_path}")
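The split sizes come from truncating with `int()`, so any rounding remainder lands in the test set (for 101 images the result is 70/15/16). The sketch below restates that logic in isolation, with two assumptions beyond the notebook: the inputs are sorted first, so the seeded shuffle does not depend on directory listing order, and a local `random.Random` is used so the global random state is left alone. `split_items` is a hypothetical name.

```python
import random


def split_items(items, train_ratio=0.70, val_ratio=0.15, test_ratio=0.15, seed=42):
    """Deterministically split items into train/val/test lists (hypothetical helper)."""
    if abs(train_ratio + val_ratio + test_ratio - 1.0) > 1e-6:
        raise ValueError("split ratios must sum to 1")

    ordered = sorted(items)      # fixed input order -> reproducible shuffle
    rng = random.Random(seed)    # local RNG; the global random state is untouched
    rng.shuffle(ordered)

    n_train = int(len(ordered) * train_ratio)  # int() truncates
    n_val = int(len(ordered) * val_ratio)
    return {
        "train": ordered[:n_train],
        "val": ordered[n_train:n_train + n_val],
        "test": ordered[n_train + n_val:],     # remainder lands in the test split
    }


splits_demo = split_items([f"img_{i:03d}.jpg" for i in range(101)])
print({name: len(items) for name, items in splits_demo.items()})
# → {'train': 70, 'val': 15, 'test': 16}
```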

In [None]:
# Preview some training images
train_images_dir = YOLO_DATASET_DIR / 'train' / 'images'
all_images = list(train_images_dir.iterdir())
sample_images = random.sample(all_images, min(4, len(all_images)))

fig, axes = plt.subplots(1, len(sample_images), figsize=(16, 4))
if len(sample_images) == 1:
    axes = [axes]

for ax, img_path in zip(axes, sample_images):
    img = PILImage.open(img_path)
    ax.imshow(img)
    ax.set_title(img_path.name if len(img_path.name) <= 20 else img_path.name[:20] + '...')
    ax.axis('off')

plt.suptitle('Sample Training Images', fontsize=14)
plt.tight_layout()
plt.show()

---
## 5. Train the Model

Now let's train the YOLOv8 model on our dataset.

In [None]:
# Initialize model
print(f"üöÄ Loading {MODEL_CONFIG['variant']} model...")

model = YOLO(f"{MODEL_CONFIG['variant']}.pt")

print("✅ Model loaded!")

In [None]:
# Train the model
print("üèãÔ∏è Starting training...")
print(f"   This may take a while depending on your GPU.")
print(f"   Epochs: {TRAINING_CONFIG['epochs']}")
print(f"   Batch size: {TRAINING_CONFIG['batch_size']}")
print()

# Training arguments
train_args = {
    'data': str(yaml_path),
    'epochs': TRAINING_CONFIG['epochs'],
    'batch': TRAINING_CONFIG['batch_size'],
    'imgsz': DATASET_CONFIG['image_size'],
    'device': TRAINING_CONFIG['device'],
    'workers': TRAINING_CONFIG['workers'],
    'patience': TRAINING_CONFIG['patience'],
    'save': True,
    'project': str(RESULTS_DIR),
    'name': 'train',
    'exist_ok': True,
    'pretrained': MODEL_CONFIG['pretrained'],
    'optimizer': 'Adam',
    'lr0': TRAINING_CONFIG['learning_rate'],
    'verbose': True,
    'plots': True,
    # Augmentation
    'hsv_h': 0.015,
    'hsv_s': 0.7,
    'hsv_v': 0.4,
    'degrees': 10,
    'translate': 0.1,
    'scale': 0.5,
    'fliplr': 0.5,
    'mosaic': 1.0,
    'mixup': 0.1,
}

# Start training
results = model.train(**train_args)

print("\n✅ Training complete!")
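With `patience=20`, training stops once the validation fitness has not improved for 20 consecutive epochs, so with 40 epochs it can only stop early if the best fitness is reached within roughly the first half of training. The function below is a simplified model of that rule, not Ultralytics' own implementation; it assumes a tie with the best value so far counts as an improvement, and the fitness history is made up.

```python
def early_stop_epoch(fitness_per_epoch, patience):
    """Return the 0-based epoch at which patience-based early stopping fires, or None.

    Simplified model: an epoch "improves" if its fitness is >= the best so far.
    """
    best_fitness = float("-inf")
    best_epoch = 0
    for epoch, fitness in enumerate(fitness_per_epoch):
        if fitness >= best_fitness:
            best_fitness, best_epoch = fitness, epoch
        if epoch - best_epoch >= patience:
            return epoch  # no improvement for `patience` epochs
    return None  # ran all epochs without triggering


# Hypothetical validation fitness values, just to exercise the rule
history = [0.30, 0.42, 0.51, 0.51, 0.50, 0.49, 0.48]
print(early_stop_epoch(history, patience=3))  # → 6
```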

---
## 6. Evaluate Results

Let's look at all the training metrics and evaluation results.

In [None]:
# Define results directory
train_results_dir = RESULTS_DIR / 'train'

print("üìÅ Training outputs generated:")
print("=" * 50)
for item in sorted(train_results_dir.iterdir()):
    if item.is_file():
        size_kb = item.stat().st_size / 1024
        print(f"   {item.name:<40} ({size_kb:.1f} KB)")
    else:
        print(f"   {item.name}/ (folder)")
print("=" * 50)

In [None]:
# Display Training Results Summary (results.png)
print("\nüìà Training Results Summary")
print("-" * 50)

results_img = train_results_dir / 'results.png'
if results_img.exists():
    display(Image(filename=str(results_img), width=900))
else:
    print("results.png not found")

In [None]:
# Display Confusion Matrix
print("\nüìä Confusion Matrix")
print("-" * 50)

confusion_matrix_img = train_results_dir / 'confusion_matrix.png'
if confusion_matrix_img.exists():
    display(Image(filename=str(confusion_matrix_img), width=600))
else:
    print("confusion_matrix.png not found")

In [None]:
# Display Normalized Confusion Matrix
print("\nüìä Normalized Confusion Matrix")
print("-" * 50)

confusion_matrix_norm_img = train_results_dir / 'confusion_matrix_normalized.png'
if confusion_matrix_norm_img.exists():
    display(Image(filename=str(confusion_matrix_norm_img), width=600))
else:
    print("confusion_matrix_normalized.png not found")
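As far as we understand Ultralytics' plot, the confusion matrix has predicted classes as rows and true classes as columns, plus an extra `background` row (missed objects) and column (false detections), and the normalized version divides each column by its sum. The sketch below reproduces that normalization on made-up counts; the numbers are illustrative only.

```python
def normalize_columns(matrix):
    """Divide each column by its total so each true class sums to 1 (empty columns stay 0)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    col_sums = [sum(matrix[r][c] for r in range(n_rows)) for c in range(n_cols)]
    return [
        [matrix[r][c] / col_sums[c] if col_sums[c] else 0.0 for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Rows = predicted, columns = true: correct_assembly, assembly_error, background
counts = [
    [40, 5, 2],
    [6, 42, 3],
    [4, 3, 0],
]
for row in normalize_columns(counts):
    print(" ".join(f"{v:.2f}" for v in row))
```

Read column-wise, the diagonal is the fraction of each true class that was detected as that class; in this made-up example 0.84 of the `assembly_error` objects are predicted correctly.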

In [None]:
# Display F1 Curve
print("\nüìà F1-Confidence Curve (BoxF1_curve)")
print("-" * 50)

f1_curve_img = train_results_dir / 'BoxF1_curve.png'
if f1_curve_img.exists():
    display(Image(filename=str(f1_curve_img), width=600))
else:
    print("BoxF1_curve.png not found")
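The F1 curve plots the harmonic mean of precision and recall, F1 = 2PR / (P + R), at each confidence threshold; its peak is a reasonable starting point for the `conf` value used at inference time (the inference cell in Section 7 uses 0.5). A minimal sketch with hypothetical sample points; `best_confidence` is an illustrative helper, not an Ultralytics function.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (defined as 0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def best_confidence(thresholds, precisions, recalls):
    """Return (threshold, F1) for the sampled threshold with the highest F1."""
    scores = [f1_score(p, r) for p, r in zip(precisions, recalls)]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return thresholds[best], scores[best]


# Hypothetical points read off the precision and recall curves
thr, f1 = best_confidence([0.25, 0.50, 0.75], [0.60, 0.80, 0.95], [0.90, 0.75, 0.40])
print(f"best conf = {thr}, F1 = {f1:.3f}")
```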

In [None]:
# Display Precision Curve
print("\nüìà Precision-Confidence Curve (BoxP_curve)")
print("-" * 50)

p_curve_img = train_results_dir / 'BoxP_curve.png'
if p_curve_img.exists():
    display(Image(filename=str(p_curve_img), width=600))
else:
    print("BoxP_curve.png not found")

In [None]:
# Display Recall Curve
print("\nüìà Recall-Confidence Curve (BoxR_curve)")
print("-" * 50)

r_curve_img = train_results_dir / 'BoxR_curve.png'
if r_curve_img.exists():
    display(Image(filename=str(r_curve_img), width=600))
else:
    print("BoxR_curve.png not found")

In [None]:
# Display PR Curve
print("\nüìà Precision-Recall Curve (BoxPR_curve)")
print("-" * 50)

pr_curve_img = train_results_dir / 'BoxPR_curve.png'
if pr_curve_img.exists():
    display(Image(filename=str(pr_curve_img), width=600))
else:
    print("BoxPR_curve.png not found")
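AP for one class is the area under its precision-recall curve after replacing each precision value with the maximum precision at any higher recall (the precision envelope); mAP@0.5 averages this over classes at an IoU threshold of 0.5. The sketch below computes the exact step-wise area. To our understanding, Ultralytics by default samples the envelope at 101 recall points instead, so its values may differ slightly.

```python
def average_precision(recall, precision):
    """All-point interpolated AP: area under the monotonic precision envelope.

    `recall` must be sorted ascending, with `precision` measured at the same points.
    """
    mrec = [0.0] + list(recall) + [1.0]
    mpre = [1.0] + list(precision) + [0.0]
    # Precision envelope: make precision non-increasing from right to left
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas; steps where recall does not change add zero width
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


print(average_precision([0.5, 1.0], [1.0, 0.5]))  # → 0.75
```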

In [None]:
# Display Labels Distribution
print("\nüìä Labels Distribution")
print("-" * 50)

labels_img = train_results_dir / 'labels.jpg'
if labels_img.exists():
    display(Image(filename=str(labels_img), width=800))
else:
    print("labels.jpg not found")
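`labels.jpg` shows the class distribution visually; counting instances directly makes any imbalance between `correct_assembly` and `assembly_error` explicit. A minimal sketch; `count_instances` is a hypothetical helper that reads YOLO label text, where the first value on each line is the class id.

```python
from collections import Counter


def count_instances(label_texts, class_names):
    """Count labelled objects per class across the contents of YOLO label files."""
    counts = Counter()
    for text in label_texts:
        for line in text.splitlines():
            parts = line.split()
            if parts:  # skip blank lines
                counts[int(parts[0])] += 1
    return {name: counts.get(i, 0) for i, name in enumerate(class_names)}


sample_labels = ["0 0.5 0.5 0.2 0.2\n1 0.3 0.3 0.1 0.1", "1 0.6 0.6 0.2 0.2\n", ""]
print(count_instances(sample_labels, ["correct_assembly", "assembly_error"]))
```

For the training split it could be called as `count_instances((p.read_text() for p in (YOLO_DATASET_DIR / 'train' / 'labels').glob('*.txt')), CLASS_NAMES)`.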

In [None]:
# Display Training Batch Samples
print("\nüñºÔ∏è Training Batch Samples")
print("-" * 50)

for i in range(3):
    batch_img = train_results_dir / f'train_batch{i}.jpg'
    if batch_img.exists():
        print(f"\nTrain Batch {i}:")
        display(Image(filename=str(batch_img), width=800))

In [None]:
# Display Validation Batch Predictions
print("\nüñºÔ∏è Validation Predictions vs Labels")
print("-" * 50)

for i in range(3):
    val_labels_img = train_results_dir / f'val_batch{i}_labels.jpg'
    val_pred_img = train_results_dir / f'val_batch{i}_pred.jpg'
    
    if val_labels_img.exists() and val_pred_img.exists():
        print(f"\nValidation Batch {i} - Labels:")
        display(Image(filename=str(val_labels_img), width=800))
        print(f"Validation Batch {i} - Predictions:")
        display(Image(filename=str(val_pred_img), width=800))

In [None]:
# Display Training Results CSV (per epoch metrics)
print("\nüìã Training Metrics Per Epoch (results.csv)")
print("-" * 50)

import pandas as pd

results_csv = train_results_dir / 'results.csv'
if results_csv.exists():
    df = pd.read_csv(results_csv)
    # Clean up column names (remove leading spaces)
    df.columns = df.columns.str.strip()
    
    print(f"\nTotal epochs: {len(df)}")
    print(f"\nColumns available:")
    for col in df.columns:
        print(f"   - {col}")
    
    print("\nüìä Full Training History:")
    display(df)
else:
    print("results.csv not found")
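Ultralytics keeps `best.pt` from the epoch with the highest validation fitness, which for detection we understand to default to 0.1 × mAP@0.5 + 0.9 × mAP@0.5:0.95. The sketch below recomputes that from the text of `results.csv` under this assumption, using the `metrics/mAP50(B)` and `metrics/mAP50-95(B)` columns; `best_epoch` is a hypothetical helper, and it strips the padding spaces some versions add to column names and values.

```python
import csv
import io


def best_epoch(csv_text):
    """Return (epoch, fitness) for the row with the highest 0.1*mAP50 + 0.9*mAP50-95."""
    best = None
    for raw_row in csv.DictReader(io.StringIO(csv_text)):
        row = {k.strip(): v.strip() for k, v in raw_row.items()}  # drop padding spaces
        fitness = 0.1 * float(row["metrics/mAP50(B)"]) + 0.9 * float(row["metrics/mAP50-95(B)"])
        if best is None or fitness > best[1]:
            best = (int(float(row["epoch"])), fitness)
    return best


# Made-up excerpt in the padded style of older results.csv files
sample_csv = (
    "                  epoch,  metrics/mAP50(B),  metrics/mAP50-95(B)\n"
    "1, 0.50, 0.30\n"
    "2, 0.60, 0.35\n"
    "3, 0.58, 0.36\n"
)
print(best_epoch(sample_csv))
```

In the notebook, `best_epoch(results_csv.read_text())` should usually point at the epoch whose weights were saved as `best.pt`.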

In [None]:
# Display Training Arguments
print("\n⚙️ Training Arguments (args.yaml)")
print("-" * 50)

args_yaml = train_results_dir / 'args.yaml'
if args_yaml.exists():
    with open(args_yaml, 'r') as f:
        print(f.read())
else:
    print("args.yaml not found")

In [None]:
# Evaluate on test set
print("\n📊 Evaluating on Test Set...")
print("=" * 50)

# Load best model
best_model_path = train_results_dir / 'weights' / 'best.pt'
best_model = YOLO(str(best_model_path))

# Run evaluation on the held-out test split, at the same image size used for training
test_results = best_model.val(
    data=str(yaml_path),
    split='test',
    imgsz=DATASET_CONFIG['image_size'],
    verbose=True
)

print("\n" + "=" * 50)
print("📈 TEST SET RESULTS")
print("=" * 50)
print(f"Precision:    {test_results.results_dict['metrics/precision(B)']:.4f}")
print(f"Recall:       {test_results.results_dict['metrics/recall(B)']:.4f}")
print(f"mAP@0.5:      {test_results.results_dict['metrics/mAP50(B)']:.4f}")
print(f"mAP@0.5:0.95: {test_results.results_dict['metrics/mAP50-95(B)']:.4f}")
print("=" * 50)
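Both mAP figures rest on intersection over union (IoU) between predicted and ground-truth boxes: a prediction only counts as a true positive if its IoU with a ground-truth box of the same class reaches the threshold. mAP@0.5 uses a single threshold of 0.5, while mAP@0.5:0.95 averages AP over ten thresholds from 0.50 to 0.95 in steps of 0.05, so it rewards tighter boxes. A minimal sketch with boxes as `(x1, y1, x2, y2)` corners; the helper names are illustrative.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# mAP@0.5:0.95 averages AP over these ten IoU thresholds
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matches_at(iou, thresholds=IOU_THRESHOLDS):
    """Return the thresholds at which a prediction with this IoU counts as a match."""
    return [t for t in thresholds if iou >= t]


print(round(box_iou((0, 0, 2, 2), (1, 1, 3, 3)), 4))  # → 0.1429
print(matches_at(0.72))
```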

---
## 7. Test Inference

Let's run inference on some test images to see the model in action.

In [None]:
# Run inference on random test images
test_images_dir = YOLO_DATASET_DIR / 'test' / 'images'
test_images = sorted(test_images_dir.iterdir())
sample_test_images = random.sample(test_images, min(4, len(test_images)))

print("🔍 Running inference on sample test images...\n")

# Run prediction (annotated copies are also saved under results/test_predictions)
predictions = best_model.predict(
    source=sample_test_images,
    save=True,
    project=str(RESULTS_DIR),
    name='test_predictions',
    exist_ok=True,
    conf=0.5
)

# Display the results returned by predict() directly, so images left in the
# output folder by earlier runs are not mixed in; plot() returns a BGR array
if predictions:
    fig, axes = plt.subplots(1, len(predictions), figsize=(16, 4))
    if len(predictions) == 1:
        axes = [axes]

    for ax, result, img_path in zip(axes, predictions, sample_test_images):
        ax.imshow(result.plot()[:, :, ::-1])  # BGR -> RGB for matplotlib
        ax.set_title(img_path.name[:25])
        ax.axis('off')

    plt.suptitle('Model Predictions on Test Images', fontsize=14)
    plt.tight_layout()
    plt.show()
else:
    print("No predictions returned")
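On the Raspberry Pi, the per-box detections usually have to be reduced to one decision per captured image. A minimal sketch, assuming the rule that any `assembly_error` box at or above the confidence threshold flags the build; this policy and `assembly_verdict` are illustrative assumptions, not part of the original pipeline. With Ultralytics results, the `(class_id, confidence)` pairs could be built as `list(zip(result.boxes.cls.tolist(), result.boxes.conf.tolist()))`.

```python
def assembly_verdict(detections, class_names, error_class="assembly_error", conf_threshold=0.5):
    """Reduce (class_id, confidence) detections for one image to a single verdict.

    Returns "ERROR" if any confident box is the error class, "OK" if only
    confident non-error boxes were found, and "NO_DETECTION" otherwise.
    """
    error_id = class_names.index(error_class)
    confident = [int(cls) for cls, conf in detections if conf >= conf_threshold]
    if not confident:
        return "NO_DETECTION"
    return "ERROR" if error_id in confident else "OK"


# Hypothetical detections: one confident correct box, one weak error box
names = ["correct_assembly", "assembly_error"]
print(assembly_verdict([(0, 0.91), (1, 0.32)], names))  # → OK
```

Returning a separate `NO_DETECTION` state, rather than treating an empty frame as correct, keeps a missing or misplaced model from silently passing.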

---
## 8. Export and Download Everything

Save the trained model and download all results in a comprehensive package.

In [None]:
# Copy best model to models directory
final_model_path = MODELS_DIR / 'lego_detector_best.pt'
shutil.copy(best_model_path, final_model_path)

# Also copy last model
last_model_path = train_results_dir / 'weights' / 'last.pt'
if last_model_path.exists():
    shutil.copy(last_model_path, MODELS_DIR / 'lego_detector_last.pt')

print(f"✅ Best model saved to: {final_model_path}")
print(f"   Model size: {final_model_path.stat().st_size / 1024 / 1024:.2f} MB")

In [None]:
# Export to ONNX format (for optimized inference on Raspberry Pi)
print("üì¶ Exporting to ONNX format...")

onnx_path = best_model.export(
    format='onnx',
    imgsz=DATASET_CONFIG['image_size'],
    simplify=True
)

print(f"✅ ONNX model saved to: {onnx_path}")
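By default the ONNX export has a fixed input shape (here 416 × 416), so inference code on the Raspberry Pi has to resize frames the same way. Ultralytics' own inference pipeline letterboxes images: it scales them to fit while keeping the aspect ratio and pads the remainder. A minimal sketch of that geometry, assuming centered padding; the helper names are hypothetical, and the result should be checked against the actual inference script before relying on it.

```python
def letterbox_params(orig_w, orig_h, target=416):
    """Scale factor, resized size and per-side padding to fit an image into target x target."""
    scale = min(target / orig_w, target / orig_h)  # keep aspect ratio
    new_w, new_h = round(orig_w * scale), round(orig_h * scale)
    pad_x, pad_y = (target - new_w) / 2, (target - new_h) / 2  # centered padding
    return scale, (new_w, new_h), (pad_x, pad_y)


def to_original(x, y, scale, pad):
    """Map a point from the padded model input back to original image pixels."""
    return (x - pad[0]) / scale, (y - pad[1]) / scale


scale, new_size, pad = letterbox_params(640, 480, 416)
print(scale, new_size, pad)
```

Box coordinates predicted on the 416 × 416 input can then be mapped back with `to_original` before drawing them on the camera frame.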

In [None]:
# Create comprehensive download package with ALL results
print("üì¶ Creating comprehensive download package...")
print("=" * 50)

# Clear and recreate download directory
if DOWNLOAD_DIR.exists():
    shutil.rmtree(DOWNLOAD_DIR)
DOWNLOAD_DIR.mkdir(exist_ok=True)

# Create subdirectories
(DOWNLOAD_DIR / 'models').mkdir(exist_ok=True)
(DOWNLOAD_DIR / 'metrics').mkdir(exist_ok=True)
(DOWNLOAD_DIR / 'curves').mkdir(exist_ok=True)
(DOWNLOAD_DIR / 'visualizations').mkdir(exist_ok=True)
(DOWNLOAD_DIR / 'batch_samples').mkdir(exist_ok=True)

# Copy models
print("\nüìÅ Copying models...")
shutil.copy(final_model_path, DOWNLOAD_DIR / 'models' / 'best.pt')
if (MODELS_DIR / 'lego_detector_last.pt').exists():
    shutil.copy(MODELS_DIR / 'lego_detector_last.pt', DOWNLOAD_DIR / 'models' / 'last.pt')
if Path(onnx_path).exists():
    shutil.copy(onnx_path, DOWNLOAD_DIR / 'models' / Path(onnx_path).name)
print("   ✅ Models copied")

# Copy metrics files (CSV and YAML)
print("\nüìÅ Copying metrics files...")
metrics_files = ['results.csv', 'args.yaml']
for f in metrics_files:
    src = train_results_dir / f
    if src.exists():
        shutil.copy(src, DOWNLOAD_DIR / 'metrics' / f)
        print(f"   ✅ {f}")

# Copy curve images
print("\nüìÅ Copying curve plots...")
curve_files = [
    'BoxF1_curve.png',
    'BoxP_curve.png', 
    'BoxR_curve.png',
    'BoxPR_curve.png',
    'results.png'
]
for f in curve_files:
    src = train_results_dir / f
    if src.exists():
        shutil.copy(src, DOWNLOAD_DIR / 'curves' / f)
        print(f"   ✅ {f}")

# Copy visualization images
print("\nüìÅ Copying visualizations...")
viz_files = [
    'confusion_matrix.png',
    'confusion_matrix_normalized.png',
    'labels.jpg',
    'labels_correlogram.jpg'
]
for f in viz_files:
    src = train_results_dir / f
    if src.exists():
        shutil.copy(src, DOWNLOAD_DIR / 'visualizations' / f)
        print(f"   ✅ {f}")

# Copy batch samples
print("\nüìÅ Copying batch samples...")
for f in train_results_dir.iterdir():
    if 'batch' in f.name and f.suffix in ['.jpg', '.png']:
        shutil.copy(f, DOWNLOAD_DIR / 'batch_samples' / f.name)
        print(f"   ✅ {f.name}")

# Create a summary text file
print("\nüìÅ Creating summary file...")
summary_content = f"""LEGO Assembly Error Detection - Training Results Summary
{'=' * 60}
Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}

DATASET
{'-' * 40}
Total images: {total_count}
Train: {train_count} ({train_count/total_count*100:.1f}%)
Val: {val_count} ({val_count/total_count*100:.1f}%)
Test: {test_count} ({test_count/total_count*100:.1f}%)

MODEL
{'-' * 40}
Architecture: {MODEL_CONFIG['variant']}
Image size: {DATASET_CONFIG['image_size']}
Classes: {CLASS_NAMES}

TRAINING CONFIG
{'-' * 40}
Epochs: {TRAINING_CONFIG['epochs']}
Batch size: {TRAINING_CONFIG['batch_size']}
Learning rate: {TRAINING_CONFIG['learning_rate']}
Optimizer: Adam
Early stopping patience: {TRAINING_CONFIG['patience']}

TEST SET RESULTS
{'-' * 40}
Precision: {test_results.results_dict['metrics/precision(B)']:.4f}
Recall: {test_results.results_dict['metrics/recall(B)']:.4f}
mAP@0.5: {test_results.results_dict['metrics/mAP50(B)']:.4f}
mAP@0.5:0.95: {test_results.results_dict['metrics/mAP50-95(B)']:.4f}

PACKAGE CONTENTS
{'-' * 40}
models/
  - best.pt (Best model weights)
  - last.pt (Last epoch weights)
  - *.onnx (ONNX export for deployment)

metrics/
  - results.csv (Per-epoch training metrics)
  - args.yaml (Training configuration)

curves/
  - results.png (Training curves summary)
  - BoxF1_curve.png (F1-Confidence curve)
  - BoxP_curve.png (Precision-Confidence curve)
  - BoxR_curve.png (Recall-Confidence curve)
  - BoxPR_curve.png (Precision-Recall curve)

visualizations/
  - confusion_matrix.png
  - confusion_matrix_normalized.png
  - labels.jpg (Label distribution)
  - labels_correlogram.jpg (Label correlogram)

batch_samples/
  - train_batch*.jpg (Training batch visualizations)
  - val_batch*_labels.jpg (Validation ground truth)
  - val_batch*_pred.jpg (Validation predictions)

{'=' * 60}
GitHub: https://github.com/tanamujaya/lego_assembly_detection
Dataset: https://www.kaggle.com/datasets/tanakamujaya/lad-dataset-5-0
"""

with open(DOWNLOAD_DIR / 'SUMMARY.txt', 'w') as f:
    f.write(summary_content)
print("   ✅ SUMMARY.txt")

print("\n" + "=" * 50)
print("✅ Download package created!")

In [None]:
# Create zip file for download
print("📦 Creating zip archive...")

zip_filename = f"lego_detection_results_{datetime.now().strftime('%Y%m%d_%H%M%S')}.zip"
zip_path = BASE_DIR / zip_filename

# make_archive() appends ".zip" itself, so pass the path without its suffix
shutil.make_archive(
    str(zip_path.with_suffix('')),
    'zip',
    DOWNLOAD_DIR
)

zip_size_mb = zip_path.stat().st_size / 1024 / 1024
print(f"\n✅ Zip archive created: {zip_filename}")
print(f"   Size: {zip_size_mb:.2f} MB")
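Before downloading, it can be worth confirming that every folder made it into the archive. A minimal sketch using only the standard library; `summarize_zip` is a hypothetical helper that counts files per top-level folder, grouping files at the archive root under `"."`.

```python
import zipfile
from collections import Counter


def summarize_zip(zip_file):
    """Count files per top-level folder in a zip archive (path or file-like object)."""
    with zipfile.ZipFile(zip_file) as zf:
        # Skip directory entries, whose names end with "/"
        names = [n for n in zf.namelist() if not n.endswith("/")]
    return Counter(n.split("/", 1)[0] if "/" in n else "." for n in names)
```

In the notebook, `for folder, n in sorted(summarize_zip(zip_path).items()): print(f"{folder}: {n} files")` would list the package contents, so an empty `curves` or `models` entry is easy to spot.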

In [None]:
# Download the complete package
from google.colab import files

print("üì• Starting download...")
print("\nPackage contents:")
print("  üìÅ models/ - Trained model weights (best.pt, last.pt, ONNX)")
print("  üìÅ metrics/ - Training metrics CSV and configuration")
print("  üìÅ curves/ - All training curves and plots")
print("  üìÅ visualizations/ - Confusion matrices and label distributions")
print("  üìÅ batch_samples/ - Training and validation batch visualizations")
print("  üìÑ SUMMARY.txt - Results summary")
print()

files.download(str(zip_path))

---
## 9. Summary

Training complete! Here's what was accomplished:

In [None]:
# Print final summary
print("=" * 60)
print("üéâ TRAINING COMPLETE!")
print("=" * 60)

print(f"\nüìä Dataset:")
print(f"   Train: {train_count} images")
print(f"   Val:   {val_count} images")
print(f"   Test:  {test_count} images")
print(f"   Total: {total_count} images")

print(f"\nü§ñ Model:")
print(f"   Architecture: {MODEL_CONFIG['variant']}")
print(f"   Image size: {DATASET_CONFIG['image_size']}")
print(f"   Classes: {CLASS_NAMES}")

print(f"\nüìà Test Set Results:")
print(f"   Precision:    {test_results.results_dict['metrics/precision(B)']:.4f}")
print(f"   Recall:       {test_results.results_dict['metrics/recall(B)']:.4f}")
print(f"   mAP@0.5:      {test_results.results_dict['metrics/mAP50(B)']:.4f}")
print(f"   mAP@0.5:0.95: {test_results.results_dict['metrics/mAP50-95(B)']:.4f}")

print(f"\nüì¶ Download Package:")
print(f"   {zip_filename} ({zip_size_mb:.2f} MB)")

print("\n" + "=" * 60)
print("Next steps:")
print("1. Save the results package downloaded by the previous cell")
print("2. Extract and review all metrics")
print("3. Deploy best.pt to Raspberry Pi 4B")
print("4. Run inference using inference.py")
print("=" * 60)

---

## 📚 Resources

- **GitHub Repository:** [github.com/tanamujaya/lego_assembly_detection](https://github.com/tanamujaya/lego_assembly_detection)
- **Dataset:** [Kaggle - LAD Dataset 5.0](https://www.kaggle.com/datasets/tanakamujaya/lad-dataset-5-0)
- **YOLOv8 Documentation:** [docs.ultralytics.com](https://docs.ultralytics.com)

---

*Created as part of Bachelor's Thesis at HS Rhein-Waal University of Applied Sciences*