# 🔬 YOLOv11 Training - TBX11K Tuberculosis Detection
## CSE475 Machine Learning Lab Assignment

---

**Students:** Shahriar Khan, Rifah Tamannah, Khalid Mahmud Joy, Tanvir Rahman  
**Institution:** East West University  
**Model:** YOLOv11n (Nano)  
**Dataset:** TBX11K Small Dataset (800 images total)  
**Training Epochs:** 30  
**Optimization:** Configured for small dataset with aggressive augmentation

---

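### 📋 Notebook Overview

This notebook trains a **YOLOv11n** model for tuberculosis detection with:
- ✅ Aggressive augmentation (rotation, translation, shear, mosaic, mixup, copy-paste, random erasing)
- ✅ Larger image size (640×640)
- ✅ Smaller batch size (8)
- ✅ Conservative learning rate (AdamW, lr0 = 0.0005)
- ✅ Strong regularization (weight decay 0.001, label smoothing 0.1, dropout 0.3)
- ✅ 5-fold cross-validation with an ensemble of the fold models
- ✅ Comprehensive visualizations
- ✅ Training curves and metrics
- ✅ Confusion matrix analysis
- ✅ Sample predictions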

### ⚠️ Dataset
- **Training:** 600 images
- **Validation:** 200 images
- **Total:** 800 images
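The counts above are per image, but one image may carry several boxes, so class balance is better judged from instance counts. A minimal sketch, assuming standard YOLO label files (one `class x_center y_center width height` line per box, normalized coordinates); `count_instances` is a hypothetical helper, not part of Ultralytics.

```python
from collections import Counter
from pathlib import Path


def count_instances(label_dir):
    """Count boxes per class id across all YOLO-format .txt files in label_dir.

    Also returns how many label files contain no boxes (background images).
    """
    counts = Counter()
    empty_files = 0
    for label_file in sorted(Path(label_dir).glob('*.txt')):
        rows = [line.split() for line in label_file.read_text().splitlines() if line.strip()]
        if not rows:
            empty_files += 1
        for parts in rows:
            # The first field is the integer class id; the other four describe the box
            counts[int(parts[0])] += 1
    return counts, empty_files
```

For example, `count_instances(dataset_path / 'labels' / 'train')` returns a `Counter` keyed by class id plus the number of empty label files.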

## 📦 Section 1: Environment Setup

In [17]:
# Installation cell - Run ONCE, then RESTART the kernel
print("🔧 Installing compatible packages for Kaggle...")
print("=" * 80)

# Fix NumPy/Matplotlib compatibility (pin NumPy 1.x)
!pip install -q "numpy<2.0" --force-reinstall

# Fix OpenCV compatibility: remove every OpenCV variant, then install a single headless build
!pip uninstall -y opencv-python opencv-python-headless opencv-contrib-python 2>/dev/null
!pip install -q opencv-python-headless==4.8.1.78

# Install YOLO without its dependencies, then add the ones --no-deps skips (incl. ultralytics-thop)
!pip install -q --no-deps ultralytics
!pip install -q pillow tqdm pyyaml ultralytics-thop

print("=" * 80)
print("✅ Installation complete!")
print("⚠️  RESTART KERNEL NOW: Run → Restart Session")
print("=" * 80)

🔧 Installing compatible packages for Kaggle...
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
ultralytics 8.3.224 requires opencv-python>=4.6.0, which is not installed.
ultralytics 8.3.224 requires ultralytics-thop>=2.0.18, which is not installed.
dopamine-rl 4.1.2 requires opencv-python>=3.4.8.29, which is not installed.
bigframes 2.12.0 requires google-cloud-bigquery-storage<3.0.0,>=2.30.0, which is not installed.
gensim 4.3.3 requires scipy<1.14.0,>=1.7.0, but you have scipy 1.15.3 which is incompatible.
datasets 4.1.1 requires pyarrow>=21.0.0, but you have pyarrow 19.0.1 which is incompatible.
onnx 1.18.0 requires protobuf>=4.25.1, but you have protobuf 3.20.3 which is incompatible.
cesium 0.12.4 requires numpy<3.0,>=2.0, but you have numpy 1.26.4 which is incompatible.
google-colab 1.0.0 requires google-auth==2.38.0, but you have google-auth 2.40.
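The resolver messages above are pip's consistency report, not proof of what ended up installed. After the restart, one way to confirm the versions actually present is to query package metadata; a minimal sketch using the standard-library `importlib.metadata`, where the package list is an assumption taken from the install cell.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {distribution name: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distributions touched by the install cell above
for pkg, ver in installed_versions(
    ['numpy', 'opencv-python-headless', 'opencv-python', 'ultralytics', 'ultralytics-thop']
).items():
    print(f"{pkg:24s} {ver or 'NOT INSTALLED'}")
```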

In [18]:
# Core Libraries
import os
import sys
import json
import time
import random
import warnings
from pathlib import Path
from datetime import datetime

# Data Processing
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import seaborn as sns

# Computer Vision
import cv2
from PIL import Image

# Deep Learning
import torch
from ultralytics import YOLO

# Warnings
warnings.filterwarnings('ignore')

# Set random seeds
random.seed(42)
np.random.seed(42)
torch.manual_seed(42)
if torch.cuda.is_available():
    torch.cuda.manual_seed(42)

# Display settings
plt.style.use('default')
sns.set_palette("husl")

print("✅ All libraries imported successfully!")
print(f"📦 PyTorch version: {torch.__version__}")
print(f"🖥️  CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"🎮 GPU: {torch.cuda.get_device_name(0)}")
    print(f"💾 GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GB")

✅ All libraries imported successfully!
📦 PyTorch version: 2.6.0+cu124
🖥️  CUDA available: True
🎮 GPU: Tesla T4
💾 GPU Memory: 14.74 GB


## ⚙️ Section 2: Configuration

In [19]:
# Create data.yaml file (no external dependencies)
data_yaml_content = """# TBX11K Dataset Configuration
path: /kaggle/input/tbx11k-small/tbx11k-small-balanced
train: images/train
val: images/val

nc: 3

names:
  0: Active Tuberculosis
  1: Obsolete Pulmonary TB
  2: Pulmonary Tuberculosis
"""

# Write to data.yaml
with open('data.yaml', 'w') as f:
    f.write(data_yaml_content)

print("✅ data.yaml created successfully!")
print("\n📄 Content:")
with open('data.yaml', 'r') as f:
    print(f.read())

✅ data.yaml created successfully!

📄 Content:
# TBX11K Dataset Configuration
path: /kaggle/input/tbx11k-small/tbx11k-small-balanced
train: images/train
val: images/val

nc: 3

names:
  0: Active Tuberculosis
  1: Obsolete Pulmonary TB
  2: Pulmonary Tuberculosis
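The class names now live in two places (this YAML string and `CLASS_NAMES` in the config), which can drift apart. A minimal sketch that renders the file from a single dict instead; `render_data_yaml` is a hypothetical helper, and each name is written with `json.dumps`, relying on the fact that a JSON string is also a valid YAML double-quoted scalar.

```python
import json


def render_data_yaml(path, class_names, train='images/train', val='images/val'):
    """Build Ultralytics-style data.yaml text from a {class_id: name} dict."""
    lines = [
        f"path: {path}",
        f"train: {train}",
        f"val: {val}",
        "",
        f"nc: {len(class_names)}",
        "names:",
    ]
    # Sort by class id so the YAML order always matches the label ids
    for class_id in sorted(class_names):
        lines.append(f"  {class_id}: {json.dumps(class_names[class_id])}")
    return "\n".join(lines) + "\n"


CLASS_NAMES = {
    0: 'Active Tuberculosis',
    1: 'Obsolete Pulmonary TB',
    2: 'Pulmonary Tuberculosis',
}
print(render_data_yaml('/kaggle/input/tbx11k-small/tbx11k-small-balanced', CLASS_NAMES))
```

Quoting matters only if a class name ever contains YAML-significant characters such as `:` or `#`; for the three current names the plain form used above is also valid.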



In [20]:
class YOLOv11Config:
    """Configuration for YOLOv11 training on TBX11K dataset"""
    
    # ========== DATASET PATHS (KAGGLE OPTIMIZED) ==========
    # Update this to match your Kaggle dataset name after upload
    DATASET_NAME = 'tbx11k-small/tbx11k-small-balanced'  # Change this to your uploaded dataset name
    DATASET_PATH = f'/kaggle/input/{DATASET_NAME}'
    DATA_YAML = '/kaggle/working/data.yaml'
    
    # ========== OUTPUT PATHS ==========
    OUTPUT_DIR = Path('/kaggle/working')
    MODEL_DIR = OUTPUT_DIR / 'yolov11_model'
    PLOTS_DIR = OUTPUT_DIR / 'yolov11_plots'
    RESULTS_DIR = OUTPUT_DIR / 'yolov11_results'
    
    # Create directories
    for directory in [MODEL_DIR, PLOTS_DIR, RESULTS_DIR]:
        directory.mkdir(parents=True, exist_ok=True)
    
    # ========== MODEL CONFIGURATION ==========
    MODEL_NAME = 'YOLOv11n'
    MODEL_WEIGHTS = 'yolo11n.pt'
    
    # ========== TRAINING HYPERPARAMETERS ==========
    IMG_SIZE = 640
    BATCH_SIZE = 8
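    EPOCHS = 1  # quick-test value used for the logged run; set to 30 for the full training described in the header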
    PATIENCE = 15
    WORKERS = 0
    DEVICE = 0
    
    # ========== OPTIMIZER SETTINGS ==========
    OPTIMIZER = 'AdamW'
    LR0 = 0.0005
    LRF = 0.005
    MOMENTUM = 0.937
    WEIGHT_DECAY = 0.001
    WARMUP_EPOCHS = 5
    WARMUP_MOMENTUM = 0.8
    WARMUP_BIAS_LR = 0.1
    
    # ========== LOSS WEIGHTS ==========
    BOX = 7.5
    CLS = 1.5
    DFL = 1.5
    
    # ========== AUGMENTATION ==========
    DEGREES = 25.0
    TRANSLATE = 0.2
    SCALE = 0.5
    SHEAR = 10.0
    PERSPECTIVE = 0.001
    FLIPUD = 0.0
    FLIPLR = 0.5
    MOSAIC = 1.0
    MIXUP = 0.3
    COPY_PASTE = 0.3  # Ultralytics documents copy_paste for segmentation; likely no effect with box-only labels
    HSV_H = 0.0
    HSV_S = 0.0
    HSV_V = 0.6
    ERASING = 0.5
    
    # ========== REGULARIZATION ==========
    DROPOUT = 0.3  # Ultralytics applies dropout to classification models; likely ignored for detection
    LABEL_SMOOTHING = 0.1
    
    # ========== INFERENCE ==========
    CONF_THRESHOLD = 0.20
    IOU_THRESHOLD = 0.45
    
    # ========== DATASET INFO ==========
    NUM_CLASSES = 3
    CLASS_NAMES = {
        0: 'Active Tuberculosis',
        1: 'Obsolete Pulmonary TB',
        2: 'Pulmonary Tuberculosis'
    }
    
    # ========== VISUALIZATION ==========
    DPI = 150
    FIGSIZE = (15, 10)

config = YOLOv11Config()

print("=" * 80)
print("⚙️  YOLOv11 CONFIGURATION")
print("=" * 80)
print(f"📁 Dataset: {config.DATASET_PATH}")
print(f"📄 Data YAML: {config.DATA_YAML}")
print(f"🤖 Model: {config.MODEL_NAME}")
print(f"🖼️  Image Size: {config.IMG_SIZE}x{config.IMG_SIZE}")
print(f"📦 Batch Size: {config.BATCH_SIZE}")
print(f"🔄 Epochs: {config.EPOCHS}")
print(f"⏱️  Patience: {config.PATIENCE}")
print(f"🎯 Classes: {config.NUM_CLASSES}")
print(f"💾 Output: {config.OUTPUT_DIR}")
print("=" * 80)
print("⚠️  IMPORTANT: Update DATASET_NAME in config to match your Kaggle dataset!")
print("=" * 80)

⚙️  YOLOv11 CONFIGURATION
📁 Dataset: /kaggle/input/tbx11k-small/tbx11k-small-balanced
📄 Data YAML: /kaggle/working/data.yaml
🤖 Model: YOLOv11n
🖼️  Image Size: 640x640
📦 Batch Size: 8
🔄 Epochs: 1
⏱️  Patience: 15
🎯 Classes: 3
💾 Output: /kaggle/working
⚠️  IMPORTANT: Update DATASET_NAME in config to match your Kaggle dataset!
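With `LR0 = 0.0005` and `LRF = 0.005`, the learning rate ends far below its starting value. As far as we understand Ultralytics' default (non-cosine) schedule, `lr0` is multiplied by a factor that falls linearly from 1 toward `lrf`; the sketch below reproduces that factor per epoch and ignores the warmup phase (`WARMUP_EPOCHS`), so treat it as an approximation rather than the trainer's exact values.

```python
def linear_lr_schedule(lr0, lrf, epochs):
    """Approximate per-epoch learning rate under a linear decay from lr0 toward lr0 * lrf.

    Evaluates the factor max(1 - x / epochs, 0) * (1 - lrf) + lrf at
    epochs x = 0 .. epochs - 1 (warmup is ignored).
    """
    return [lr0 * (max(1 - x / epochs, 0) * (1.0 - lrf) + lrf) for x in range(epochs)]


schedule = linear_lr_schedule(lr0=0.0005, lrf=0.005, epochs=30)
print(f"epoch 0: {schedule[0]:.2e}, epoch 15: {schedule[15]:.2e}, last epoch: {schedule[-1]:.2e}")
print(f"floor lr0 * lrf = {0.0005 * 0.005:.2e}")
```

With the current `EPOCHS = 1`, only the first value applies (and that single epoch likely falls entirely inside the 5-epoch warmup), so the decay only matters for the full 30-epoch run.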


## 📊 Section 3: Dataset Verification

In [21]:
# Verify dataset structure
print("🔍 Verifying dataset structure...\n")

dataset_path = Path(config.DATASET_PATH)

# Check main directories
train_img_dir = dataset_path / 'images' / 'train'
train_lbl_dir = dataset_path / 'labels' / 'train'
val_img_dir = dataset_path / 'images' / 'val'
val_lbl_dir = dataset_path / 'labels' / 'val'

# Count files
train_images = list(train_img_dir.glob('*.png')) if train_img_dir.exists() else []
train_labels = list(train_lbl_dir.glob('*.txt')) if train_lbl_dir.exists() else []
val_images = list(val_img_dir.glob('*.png')) if val_img_dir.exists() else []
val_labels = list(val_lbl_dir.glob('*.txt')) if val_lbl_dir.exists() else []

print("📂 Dataset Structure:")
print(f"  ├─ Training Images: {len(train_images)}")
print(f"  ├─ Training Labels: {len(train_labels)}")
print(f"  ├─ Validation Images: {len(val_images)}")
print(f"  └─ Validation Labels: {len(val_labels)}")

# Check data.yaml
data_yaml_path = Path(config.DATA_YAML)
if data_yaml_path.exists():
    print(f"\n✅ data.yaml found: {data_yaml_path}")
    with open(data_yaml_path, 'r') as f:
        print("\n📄 data.yaml content:")
        print(f.read())
else:
    print(f"\n⚠️  data.yaml not found at: {data_yaml_path}")

🔍 Verifying dataset structure...

📂 Dataset Structure:
  ├─ Training Images: 600
  ├─ Training Labels: 600
  ├─ Validation Images: 200
  └─ Validation Labels: 200

✅ data.yaml found: /kaggle/working/data.yaml

📄 data.yaml content:
# TBX11K Dataset Configuration
path: /kaggle/input/tbx11k-small/tbx11k-small-balanced
train: images/train
val: images/val

nc: 3

names:
  0: Active Tuberculosis
  1: Obsolete Pulmonary TB
  2: Pulmonary Tuberculosis
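Equal image and label counts do not guarantee that every image has a matching label file. A stricter check compares file stems in both directions; `check_pairs` is a hypothetical helper that assumes `.png` images and `.txt` labels, as in the cell above.

```python
from pathlib import Path


def check_pairs(img_dir, lbl_dir, img_ext='.png'):
    """Return (images without a label, labels without an image) as sorted lists of stems."""
    img_stems = {p.stem for p in Path(img_dir).glob(f'*{img_ext}')}
    lbl_stems = {p.stem for p in Path(lbl_dir).glob('*.txt')}
    return sorted(img_stems - lbl_stems), sorted(lbl_stems - img_stems)


root = Path('/kaggle/input/tbx11k-small/tbx11k-small-balanced')
for split in ['train', 'val']:
    missing_labels, orphan_labels = check_pairs(root / 'images' / split, root / 'labels' / split)
    print(f"{split}: {len(missing_labels)} images without labels, "
          f"{len(orphan_labels)} labels without images")
```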



## 🔄 Section 3.5: K-Fold Cross-Validation Setup

To make the most of the small dataset (800 images), we use **5-Fold Cross-Validation**:
- Each fold trains on ~640 images and validates on ~160 images
- Every image is used for validation exactly once and for training in the other four folds
- Provides a more robust performance estimate (mean ± std across folds) than a single split
- Enables ensemble predictions from 5 models

In [None]:
from sklearn.model_selection import KFold
import shutil

# K-Fold Configuration
N_FOLDS = 5
KFOLD_DIR = config.OUTPUT_DIR / 'kfold_splits'
KFOLD_DIR.mkdir(parents=True, exist_ok=True)

# Get all images (combine train + val for K-Fold).
# Sorting makes the KFold split reproducible: glob order depends on the filesystem.
dataset_path = Path(config.DATASET_PATH)
train_imgs = sorted((dataset_path / 'images' / 'train').glob('*.png'))
val_imgs = sorted((dataset_path / 'images' / 'val').glob('*.png'))
all_images = train_imgs + val_imgs

print("🔄 K-FOLD CROSS-VALIDATION SETUP")
print(f"{'='*80}")
print(f"  • Number of folds: {N_FOLDS}")
print(f"  • Total images: {len(all_images)}")
print(f"  • Images per fold (train): ~{len(all_images) * (N_FOLDS-1) / N_FOLDS:.0f}")
print(f"  • Images per fold (val): ~{len(all_images) / N_FOLDS:.0f}")
print(f"  • Output directory: {KFOLD_DIR}")
print(f"{'='*80}")

In [None]:
# Create K-Fold splits
kfold = KFold(n_splits=N_FOLDS, shuffle=True, random_state=42)


def copy_to_fold(indices, fold_dir, split):
    """Copy the selected images and their YOLO labels into fold_dir/{images,labels}/{split}."""
    for idx in indices:
        img_path = all_images[idx]
        shutil.copy2(img_path, fold_dir / 'images' / split / img_path.name)

        # The label sits under the same original split ('train' or 'val') as its image
        lbl_path = dataset_path / 'labels' / img_path.parent.name / f"{img_path.stem}.txt"
        if lbl_path.exists():
            shutil.copy2(lbl_path, fold_dir / 'labels' / split / lbl_path.name)


print(f"\n📁 Creating {N_FOLDS} fold directories...\n")

for fold_idx, (train_indices, val_indices) in enumerate(kfold.split(all_images), 1):
    print(f"Processing Fold {fold_idx}/{N_FOLDS}...")

    # Recreate the fold directory so files from an earlier run cannot leak between train and val
    fold_dir = KFOLD_DIR / f'fold_{fold_idx}'
    shutil.rmtree(fold_dir, ignore_errors=True)
    for sub in ['images/train', 'images/val', 'labels/train', 'labels/val']:
        (fold_dir / sub).mkdir(parents=True, exist_ok=True)

    # Copy training and validation images and labels
    copy_to_fold(train_indices, fold_dir, 'train')
    copy_to_fold(val_indices, fold_dir, 'val')

    # Create data.yaml for this fold (class names taken from the config)
    names_block = "\n".join(f"  {i}: {name}" for i, name in sorted(config.CLASS_NAMES.items()))
    data_yaml_content = f"""# YOLOv11 K-Fold Data Configuration - Fold {fold_idx}
path: {fold_dir}
train: images/train
val: images/val

nc: {config.NUM_CLASSES}
names:
{names_block}
"""

    with open(fold_dir / 'data.yaml', 'w') as f:
        f.write(data_yaml_content)

    print(f"  ✓ Fold {fold_idx}: {len(train_indices)} train, {len(val_indices)} val images")

print(f"\n{'='*80}")
print("✅ K-Fold splits created successfully!")
print(f"{'='*80}")

In [None]:
print("=" * 80)
print(f"🚀 STARTING {N_FOLDS}-FOLD CROSS-VALIDATION TRAINING")
print("=" * 80)

# Storage for fold results
fold_results = []
fold_models = []
total_training_time = 0

for fold_idx in range(1, N_FOLDS + 1):
    print(f"\n{'='*80}")
    print(f"📊 FOLD {fold_idx}/{N_FOLDS}")
    print(f"{'='*80}\n")
    
    # Get fold data
    fold_dir = KFOLD_DIR / f'fold_{fold_idx}'
    fold_data_yaml = str(fold_dir / 'data.yaml')
    
    # Initialize model
    model = YOLO(config.MODEL_WEIGHTS)
    
    # Train
    fold_start = time.time()
    try:
        results = model.train(
            data=fold_data_yaml,
            epochs=config.EPOCHS,
            imgsz=config.IMG_SIZE,
            batch=config.BATCH_SIZE,
            device=config.DEVICE,
            workers=config.WORKERS,
            patience=config.PATIENCE,
            project=str(config.MODEL_DIR),
            name=f'fold_{fold_idx}',
            exist_ok=True,
            optimizer=config.OPTIMIZER,
            lr0=config.LR0,
            lrf=config.LRF,
            momentum=config.MOMENTUM,
            weight_decay=config.WEIGHT_DECAY,
            warmup_epochs=config.WARMUP_EPOCHS,
            warmup_momentum=config.WARMUP_MOMENTUM,
            warmup_bias_lr=config.WARMUP_BIAS_LR,
            box=config.BOX,
            cls=config.CLS,
            dfl=config.DFL,
            dropout=config.DROPOUT,
            label_smoothing=config.LABEL_SMOOTHING,
            degrees=config.DEGREES,
            translate=config.TRANSLATE,
            scale=config.SCALE,
            shear=config.SHEAR,
            perspective=config.PERSPECTIVE,
            mosaic=config.MOSAIC,
            mixup=config.MIXUP,
            copy_paste=config.COPY_PASTE,
            fliplr=config.FLIPLR,
            flipud=config.FLIPUD,
            hsv_h=config.HSV_H,
            hsv_s=config.HSV_S,
            hsv_v=config.HSV_V,
            erasing=config.ERASING,
            amp=False,
            plots=True,
            verbose=False
        )
        
        fold_time = time.time() - fold_start
        total_training_time += fold_time
        
        # Load best model and validate
        best_model_path = config.MODEL_DIR / f'fold_{fold_idx}' / 'weights' / 'best.pt'
        fold_model = YOLO(str(best_model_path))
        
        val_results = fold_model.val(
            data=fold_data_yaml,
            split='val',
            imgsz=config.IMG_SIZE,
            batch=config.BATCH_SIZE,
            device=config.DEVICE,
            verbose=False
        )
        
        # Store results
        fold_result = {
            'fold': fold_idx,
            'mAP50': float(val_results.box.map50),
            'mAP50_95': float(val_results.box.map),
            'precision': float(val_results.box.mp),
            'recall': float(val_results.box.mr),
            'f1': float(2 * val_results.box.mp * val_results.box.mr / (val_results.box.mp + val_results.box.mr + 1e-6)),
            'time_min': fold_time / 60,
            'model_path': str(best_model_path)
        }
        
        fold_results.append(fold_result)
        fold_models.append(fold_model)
        
        print(f"\n✅ Fold {fold_idx} Complete!")
        print(f"   • mAP@0.5: {fold_result['mAP50']:.4f}")
        print(f"   • mAP@0.5:0.95: {fold_result['mAP50_95']:.4f}")
        print(f"   • Precision: {fold_result['precision']:.4f}")
        print(f"   • Recall: {fold_result['recall']:.4f}")
        print(f"   • F1 Score: {fold_result['f1']:.4f}")
        print(f"   • Time: {fold_result['time_min']:.1f} min")
        
    except Exception as e:
        print(f"❌ Error in Fold {fold_idx}: {str(e)}")
        continue

training_time = total_training_time
print(f"\n{'='*80}")
print("🎉 K-FOLD TRAINING COMPLETE!")
print(f"{'='*80}")
print(f"⏱️  Total training time: {training_time/60:.1f} minutes")
print(f"📊 Trained {len(fold_results)} models successfully")

## 📊 Section 3.6: K-Fold Results Analysis
Analyze cross-validation metrics across all folds

In [None]:
# Create results DataFrame
results_df = pd.DataFrame(fold_results)
print("\n" + "="*80)
print("📊 K-FOLD CROSS-VALIDATION RESULTS")
print("="*80 + "\n")
print(results_df.to_string(index=False))

# Calculate statistics
print(f"\n{'='*80}")
print("📈 AGGREGATE STATISTICS")
print(f"{'='*80}")
metrics = ['mAP50', 'mAP50_95', 'precision', 'recall', 'f1']
for metric in metrics:
    mean_val = results_df[metric].mean()
    std_val = results_df[metric].std()
    print(f"{metric:12s}: {mean_val:.4f} ± {std_val:.4f}")

# Find best fold: idxmax() gives the row index; the 'fold' column holds the fold number
best_row = results_df['mAP50'].idxmax()
best_fold_num = int(results_df.loc[best_row, 'fold'])
best_mAP = results_df.loc[best_row, 'mAP50']
print(f"\n🏆 Best Fold: Fold {best_fold_num} (mAP@0.5 = {best_mAP:.4f})")

# Visualize results
fig, axes = plt.subplots(2, 3, figsize=(18, 10))
fig.suptitle('YOLOv11 K-Fold Cross-Validation Results', fontsize=16, fontweight='bold')

# Plot each metric
for idx, metric in enumerate(metrics):
    ax = axes[idx // 3, idx % 3]
    bars = ax.bar(results_df['fold'], results_df[metric], color='skyblue', edgecolor='navy', alpha=0.7)
    ax.axhline(results_df[metric].mean(), color='red', linestyle='--', linewidth=2, label='Mean')
    ax.set_xlabel('Fold', fontweight='bold')
    ax.set_ylabel(metric.upper(), fontweight='bold')
    ax.set_title(f'{metric.upper()}: {results_df[metric].mean():.4f} ± {results_df[metric].std():.4f}')
    ax.legend()
    ax.grid(alpha=0.3)
    
    # Highlight best fold
    best_idx = results_df[metric].idxmax()
    bars[best_idx].set_color('gold')
    bars[best_idx].set_edgecolor('darkgoldenrod')
    bars[best_idx].set_linewidth(2)

# Training time
ax = axes[1, 2]
ax.bar(results_df['fold'], results_df['time_min'], color='lightcoral', edgecolor='darkred', alpha=0.7)
ax.set_xlabel('Fold', fontweight='bold')
ax.set_ylabel('Training Time (min)', fontweight='bold')
ax.set_title(f'Training Time per Fold (Total: {results_df["time_min"].sum():.1f} min)')
ax.grid(alpha=0.3)

plt.tight_layout()
plt.savefig(config.PLOTS_DIR / 'kfold_results.png', dpi=300, bbox_inches='tight')
plt.show()

# Save results
results_df.to_csv(config.RESULTS_DIR / 'kfold_results.csv', index=False)
print(f"\n💾 Results saved to {config.RESULTS_DIR / 'kfold_results.csv'}")

## 🔮 Section 3.7: K-Fold Ensemble Predictions
Combine predictions from all fold models, which may improve accuracy over any single fold model

In [None]:
from torchvision.ops import batched_nms


def ensemble_predict(models, image_path, conf_threshold=0.25, iou_threshold=0.45):
    """
    Ensemble prediction: pool the boxes predicted by every model and apply
    class-aware NMS to the pooled set (boxes are not averaged or re-weighted).
    
    Args:
        models: List of YOLO models
        image_path: Path to input image
        conf_threshold: Confidence threshold for predictions
        iou_threshold: IoU threshold for per-model NMS and for the ensemble NMS
    
    Returns:
        (boxes, scores, classes) as NumPy arrays, or (None, None, None) if no model detects anything
    """
    all_boxes = []
    all_scores = []
    all_classes = []
    
    # Get predictions from each model
    for model in models:
        results = model.predict(
            source=image_path,
            conf=conf_threshold,
            iou=iou_threshold,
            verbose=False
        )[0]
        
        if len(results.boxes) > 0:
            all_boxes.append(results.boxes.xyxy.cpu().numpy())
            all_scores.append(results.boxes.conf.cpu().numpy())
            all_classes.append(results.boxes.cls.cpu().numpy())
    
    if len(all_boxes) == 0:
        return None, None, None
    
    # Concatenate all predictions
    all_boxes = np.concatenate(all_boxes, axis=0)
    all_scores = np.concatenate(all_scores, axis=0)
    all_classes = np.concatenate(all_classes, axis=0)
    
    # Class-aware NMS: boxes of different classes never suppress each other
    keep_indices = batched_nms(
        torch.from_numpy(all_boxes).float(),
        torch.from_numpy(all_scores).float(),
        torch.from_numpy(all_classes).long(),
        iou_threshold,
    ).cpu().numpy()
    
    return all_boxes[keep_indices], all_scores[keep_indices], all_classes[keep_indices]


# Test ensemble on sample images
print("="*80)
print("🔮 TESTING K-FOLD ENSEMBLE")
print("="*80 + "\n")

# Sample validation images from fold 1.
# Caveat: every fold-1 validation image is in the training set of the other four fold models,
# so these outputs are a qualitative sanity check, not an unbiased evaluation.
test_images = sorted((KFOLD_DIR / 'fold_1' / 'images' / 'val').glob('*.png'))[:3]

for img_path in test_images:
    print(f"\n📸 Processing: {img_path.name}")
    
    # Get ensemble predictions
    boxes, scores, classes = ensemble_predict(
        fold_models, 
        str(img_path),
        conf_threshold=0.25,
        iou_threshold=0.45
    )
    
    if boxes is not None:
        print(f"   • Detected {len(boxes)} objects")
        print(f"   • Avg confidence: {scores.mean():.3f}")
        print(f"   • Max confidence: {scores.max():.3f}")
    else:
        print("   • No detections")

# Compare best single model vs ensemble
print(f"\n{'='*80}")
print("📊 ENSEMBLE vs BEST SINGLE MODEL COMPARISON")
print(f"{'='*80}\n")

# Get best model (row index into results_df and fold_models; 'fold' holds the fold number)
best_fold_idx = results_df['mAP50'].idxmax()
best_model = fold_models[best_fold_idx]
best_fold_num = int(results_df.loc[best_fold_idx, 'fold'])

print(f"Best Single Model: Fold {best_fold_num}")
print(f"   • mAP@0.5: {results_df.loc[best_fold_idx, 'mAP50']:.4f}")
print(f"   • Precision: {results_df.loc[best_fold_idx, 'precision']:.4f}")
print(f"   • Recall: {results_df.loc[best_fold_idx, 'recall']:.4f}")
print(f"   • F1: {results_df.loc[best_fold_idx, 'f1']:.4f}")

print(f"\n💡 Ensemble combines predictions from all {len(fold_models)} models")
print("   Its mAP is not measured in this notebook; evaluate it on held-out images before claiming a gain")

print(f"\n{'='*80}")
print("✅ K-FOLD CROSS-VALIDATION SETUP COMPLETE!")
print(f"{'='*80}")
print("\n📌 Summary:")
print(f"   • Trained {len(fold_models)} models successfully")
print(f"   • Mean mAP@0.5: {results_df['mAP50'].mean():.4f} ± {results_df['mAP50'].std():.4f}")
print(f"   • Best single model: Fold {best_fold_num} ({results_df.loc[best_fold_idx, 'mAP50']:.4f})")
print("   • All models saved for ensemble predictions")
print(f"   • Total training time: {results_df['time_min'].sum():.1f} minutes")
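`ensemble_predict` pools the fold models' boxes and suppresses overlaps with NMS. To make the suppression rule explicit, here is a dependency-free sketch of greedy, class-aware NMS (a box can only be suppressed by a higher-scoring box of the same class), which is the behaviour `torchvision.ops.batched_nms` provides. `iou_xyxy` and `class_aware_nms` are illustrative helpers, far slower than the library version.

```python
def iou_xyxy(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def class_aware_nms(boxes, scores, classes, iou_threshold=0.45):
    """Greedy NMS applied independently per class; returns kept indices, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i unless an already-kept (higher-scoring) box of the same class overlaps it too much
        if all(classes[j] != classes[i] or iou_xyxy(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two models find the same class-0 lesion; a third, overlapping box has class 1
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (11, 11, 49, 49)]
scores = [0.80, 0.65, 0.70]
classes = [0, 0, 1]
print(class_aware_nms(boxes, scores, classes))  # [0, 2]
```

A class-agnostic NMS (plain `torchvision.ops.nms`) would also drop the class-1 box here, because it overlaps the top class-0 box; whether that is desirable depends on whether the TB classes can legitimately overlap.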

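## 📈 Section 4: Model Selection & Validation
Use the best-performing fold model for validation and inference. Because this fold was selected by its own validation mAP, re-validating it on that split gives an optimistic number; the K-fold mean ± std from Section 3.6 is the fairer estimate of generalization.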

In [None]:
# Use best fold model from K-Fold CV (row index into results_df / fold_models)
best_fold_idx = results_df['mAP50'].idxmax()
best_model = fold_models[best_fold_idx]
best_fold_num = int(results_df.loc[best_fold_idx, 'fold'])  # actual fold number, robust to skipped folds

print("=" * 80)
print("📊 VALIDATING BEST FOLD MODEL")
print("=" * 80)

print(f"\n🏆 Using Best Fold Model: Fold {best_fold_num}")
print(f"   • mAP@0.5: {results_df.loc[best_fold_idx, 'mAP50']:.4f}")
print(f"   • Model Path: {results_df.loc[best_fold_idx, 'model_path']}")

# Validate on fold's validation set
fold_data_yaml = str(KFOLD_DIR / f'fold_{best_fold_num}' / 'data.yaml')

print("\n‚è≥ Running validation...\n")
val_results = best_model.val(
    data=fold_data_yaml,
    split='val',
    imgsz=config.IMG_SIZE,
    batch=config.BATCH_SIZE,
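    # conf omitted on purpose: mAP is computed over the full PR curve at Ultralytics' low validation
    # default, matching the per-fold numbers (CONF_THRESHOLD = 0.20 is meant for inference)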
    iou=config.IOU_THRESHOLD,
    device=config.DEVICE,
    workers=config.WORKERS,
    plots=True,
    save_json=True,
    project=str(config.RESULTS_DIR),
    name='validation',
    exist_ok=True
)

# Extract metrics
print("\n" + "=" * 80)
print("📈 VALIDATION RESULTS")
print("=" * 80)
print(f"  • mAP@0.5:     {val_results.box.map50:.4f}")
print(f"  • mAP@0.5:0.95: {val_results.box.map:.4f}")
print(f"  • Precision:    {val_results.box.mp:.4f}")
print(f"  • Recall:       {val_results.box.mr:.4f}")
print(f"  • Fitness:      {val_results.fitness:.4f}")
print("=" * 80)

# Save metrics to JSON
metrics = {
    'model': config.MODEL_NAME,
    'epochs': config.EPOCHS,
    'training_time_minutes': training_time / 60,
    'best_fold': int(results_df.loc[best_fold_idx, 'fold']),
    'mAP50': float(val_results.box.map50),
    'mAP50_95': float(val_results.box.map),
    'precision': float(val_results.box.mp),
    'recall': float(val_results.box.mr),
    'fitness': float(val_results.fitness),
    'kfold_mean_mAP50': float(results_df['mAP50'].mean()),
    'kfold_std_mAP50': float(results_df['mAP50'].std()),
}

metrics_file = config.RESULTS_DIR / 'yolov11_metrics.json'
with open(metrics_file, 'w') as f:
    json.dump(metrics, f, indent=2)

print(f"\n💾 Metrics saved to: {metrics_file}")
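The `Fitness` value above is the scalar Ultralytics uses to pick `best.pt`. To our understanding of its detection metrics, fitness weights mAP@0.5 by 0.1 and mAP@0.5:0.95 by 0.9, with zero weight on precision and recall. The sketch below recomputes it alongside the F1 formula from the K-fold cell so the printed numbers can be cross-checked; the input values are hypothetical, not results from this notebook.

```python
def detection_fitness(map50, map50_95):
    """Fitness as we understand Ultralytics computes it for detection: 0.1 * mAP50 + 0.9 * mAP50-95."""
    return 0.1 * map50 + 0.9 * map50_95


def f1_score(precision, recall, eps=1e-6):
    """Harmonic mean of mean precision and mean recall (same formula as the K-fold cell)."""
    return 2 * precision * recall / (precision + recall + eps)


# Hypothetical inputs for illustration only
print(f"fitness = {detection_fitness(0.60, 0.30):.4f}")
print(f"F1      = {f1_score(0.70, 0.50):.4f}")
```

This F1 is computed from the class-averaged precision and recall, which is not the same as averaging per-class F1 scores.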

## 📊 Section 5: Training Curves (Best Fold Model)

In [None]:
# Read training results from best fold
best_fold_num = int(results_df.loc[best_fold_idx, 'fold'])  # stored fold number, not row index + 1
results_csv = config.MODEL_DIR / f'fold_{best_fold_num}' / 'results.csv'

if results_csv.exists():
    df = pd.read_csv(results_csv)
    df.columns = df.columns.str.strip()
    
    # Create comprehensive training curves
    fig, axes = plt.subplots(3, 3, figsize=(20, 15))
    fig.suptitle(f'YOLOv11 Training Curves - Best Fold (Fold {best_fold_num})', fontsize=18, fontweight='bold')
    
    epochs = df['epoch'] if 'epoch' in df.columns else range(len(df))
    
    # Plot 1: mAP@0.5
    if 'metrics/mAP50(B)' in df.columns:
        axes[0, 0].plot(epochs, df['metrics/mAP50(B)'], linewidth=2.5, color='blue', label='mAP@0.5')
        axes[0, 0].fill_between(epochs, df['metrics/mAP50(B)'], alpha=0.3, color='blue')
        axes[0, 0].set_title('mAP@0.5', fontsize=14, fontweight='bold')
        axes[0, 0].set_xlabel('Epoch')
        axes[0, 0].set_ylabel('mAP@0.5')
        axes[0, 0].grid(True, alpha=0.3)
        axes[0, 0].legend()
    
    # Plot 2: mAP@0.5:0.95
    if 'metrics/mAP50-95(B)' in df.columns:
        axes[0, 1].plot(epochs, df['metrics/mAP50-95(B)'], linewidth=2.5, color='green', label='mAP@0.5:0.95')
        axes[0, 1].fill_between(epochs, df['metrics/mAP50-95(B)'], alpha=0.3, color='green')
        axes[0, 1].set_title('mAP@0.5:0.95', fontsize=14, fontweight='bold')
        axes[0, 1].set_xlabel('Epoch')
        axes[0, 1].set_ylabel('mAP@0.5:0.95')
        axes[0, 1].grid(True, alpha=0.3)
        axes[0, 1].legend()
    
    # Plot 3: Precision
    if 'metrics/precision(B)' in df.columns:
        axes[0, 2].plot(epochs, df['metrics/precision(B)'], linewidth=2.5, color='orange', label='Precision')
        axes[0, 2].fill_between(epochs, df['metrics/precision(B)'], alpha=0.3, color='orange')
        axes[0, 2].set_title('Precision', fontsize=14, fontweight='bold')
        axes[0, 2].set_xlabel('Epoch')
        axes[0, 2].set_ylabel('Precision')
        axes[0, 2].grid(True, alpha=0.3)
        axes[0, 2].legend()
    
    # Plot 4: Recall
    if 'metrics/recall(B)' in df.columns:
        axes[1, 0].plot(epochs, df['metrics/recall(B)'], linewidth=2.5, color='red', label='Recall')
        axes[1, 0].fill_between(epochs, df['metrics/recall(B)'], alpha=0.3, color='red')
        axes[1, 0].set_title('Recall', fontsize=14, fontweight='bold')
        axes[1, 0].set_xlabel('Epoch')
        axes[1, 0].set_ylabel('Recall')
        axes[1, 0].grid(True, alpha=0.3)
        axes[1, 0].legend()
    
    # Plot 5: Box Loss
    if 'train/box_loss' in df.columns and 'val/box_loss' in df.columns:
        axes[1, 1].plot(epochs, df['train/box_loss'], linewidth=2, label='Train', color='blue')
        axes[1, 1].plot(epochs, df['val/box_loss'], linewidth=2, label='Val', color='red')
        axes[1, 1].set_title('Box Loss', fontsize=14, fontweight='bold')
        axes[1, 1].set_xlabel('Epoch')
        axes[1, 1].set_ylabel('Loss')
        axes[1, 1].grid(True, alpha=0.3)
        axes[1, 1].legend()
    
    # Plot 6: Class Loss
    if 'train/cls_loss' in df.columns and 'val/cls_loss' in df.columns:
        axes[1, 2].plot(epochs, df['train/cls_loss'], linewidth=2, label='Train', color='blue')
        axes[1, 2].plot(epochs, df['val/cls_loss'], linewidth=2, label='Val', color='red')
        axes[1, 2].set_title('Classification Loss', fontsize=14, fontweight='bold')
        axes[1, 2].set_xlabel('Epoch')
        axes[1, 2].set_ylabel('Loss')
        axes[1, 2].grid(True, alpha=0.3)
        axes[1, 2].legend()
    
    # Plot 7: DFL Loss
    if 'train/dfl_loss' in df.columns and 'val/dfl_loss' in df.columns:
        axes[2, 0].plot(epochs, df['train/dfl_loss'], linewidth=2, label='Train', color='blue')
        axes[2, 0].plot(epochs, df['val/dfl_loss'], linewidth=2, label='Val', color='red')
        axes[2, 0].set_title('DFL Loss', fontsize=14, fontweight='bold')
        axes[2, 0].set_xlabel('Epoch')
        axes[2, 0].set_ylabel('Loss')
        axes[2, 0].grid(True, alpha=0.3)
        axes[2, 0].legend()
    
    # Plot 8: F1 Score (calculated)
    if 'metrics/precision(B)' in df.columns and 'metrics/recall(B)' in df.columns:
        precision = df['metrics/precision(B)']
        recall = df['metrics/recall(B)']
        f1 = 2 * (precision * recall) / (precision + recall + 1e-6)
        axes[2, 1].plot(epochs, f1, linewidth=2.5, color='purple', label='F1 Score')
        axes[2, 1].fill_between(epochs, f1, alpha=0.3, color='purple')
        axes[2, 1].set_title('F1 Score', fontsize=14, fontweight='bold')
        axes[2, 1].set_xlabel('Epoch')
        axes[2, 1].set_ylabel('F1 Score')
        axes[2, 1].grid(True, alpha=0.3)
        axes[2, 1].legend()
    
    # Plot 9: Learning Rate
    if 'lr/pg0' in df.columns:
        axes[2, 2].plot(epochs, df['lr/pg0'], linewidth=2, color='brown', label='Learning Rate')
        axes[2, 2].set_title('Learning Rate Schedule', fontsize=14, fontweight='bold')
        axes[2, 2].set_xlabel('Epoch')
        axes[2, 2].set_ylabel('Learning Rate')
        axes[2, 2].grid(True, alpha=0.3)
        axes[2, 2].legend()
    
    plt.tight_layout()
    save_path = config.PLOTS_DIR / 'training_curves_best_fold.png'
    plt.savefig(save_path, dpi=config.DPI, bbox_inches='tight')
    plt.show()
    
    print(f"✅ Training curves saved to: {save_path}")
else:
    print(f"⚠️  Results CSV not found at: {results_csv}")
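`best.pt` is the checkpoint from the epoch with the highest fitness. A stdlib-only sketch that locates that epoch in `results.csv`, stripping padded column names as the plotting cell does; it assumes the `metrics/mAP50(B)` and `metrics/mAP50-95(B)` columns read above and the 0.1/0.9 fitness weighting, either of which may differ between Ultralytics versions. `best_epoch` is a hypothetical helper and the sample rows are made up.

```python
import csv
import io


def best_epoch(csv_text):
    """Return (epoch, fitness) of the row maximizing 0.1 * mAP50 + 0.9 * mAP50-95."""
    best = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Column names (and values) may be padded with spaces, so strip both
        row = {k.strip(): v.strip() for k, v in row.items()}
        fitness = 0.1 * float(row['metrics/mAP50(B)']) + 0.9 * float(row['metrics/mAP50-95(B)'])
        if best is None or fitness > best[1]:
            best = (int(float(row['epoch'])), fitness)
    return best


# In the notebook: print(best_epoch(results_csv.read_text()))
sample = """epoch,metrics/mAP50(B),metrics/mAP50-95(B)
1,0.20,0.08
2,0.35,0.15
3,0.33,0.16
"""
print(best_epoch(sample))
```

In the made-up sample, epoch 2 has the higher mAP@0.5 but epoch 3 wins on fitness, which is why `best.pt` need not coincide with the peak of the mAP@0.5 curve.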

## 🎯 Section 6: Confusion Matrix (Best Fold)

In [None]:
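# Ultralytics saves both confusion_matrix.png and confusion_matrix_normalized.png;
# load the normalized one so the image matches the "(Normalized)" title below
confusion_matrix_path = config.RESULTS_DIR / 'validation' / 'confusion_matrix_normalized.png'

if confusion_matrix_path.exists():
    print("📊 Displaying Confusion Matrix\n")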
    
    img = Image.open(confusion_matrix_path)
    
    fig, ax = plt.subplots(1, 1, figsize=(12, 10))
    ax.imshow(img)
    ax.axis('off')
    ax.set_title('YOLOv11 - Confusion Matrix (Normalized)', fontsize=16, fontweight='bold', pad=20)
    
    plt.tight_layout()
    save_path = config.PLOTS_DIR / 'confusion_matrix_display.png'
    plt.savefig(save_path, dpi=config.DPI, bbox_inches='tight')
    plt.show()
    
    print(f"✅ Confusion matrix saved to: {save_path}")
else:
    print(f"⚠️  Confusion matrix not found at: {confusion_matrix_path}")
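Reading the normalized matrix correctly depends on which axis was normalized. To our understanding, Ultralytics puts true classes on the columns and predicted classes on the rows, adds an extra background row and column, and divides each column by its sum, so the diagonal approximates per-class recall at the matrix's confidence and IoU thresholds. The sketch below applies that column normalization to a raw count matrix; the counts are made up.

```python
def normalize_columns(matrix, eps=1e-9):
    """Divide every column of a square count matrix by its column sum."""
    n = len(matrix)
    col_sums = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    return [[matrix[r][c] / (col_sums[c] + eps) for c in range(n)] for r in range(n)]


# Rows = predicted, columns = true; the last index is "background" (made-up counts)
labels = ['Active TB', 'Obsolete TB', 'Pulmonary TB', 'background']
counts = [
    [30, 2, 5, 4],
    [1, 20, 2, 3],
    [4, 3, 25, 6],
    [5, 5, 8, 0],
]
norm = normalize_columns(counts)
for c, name in enumerate(labels[:-1]):
    print(f"{name:14s} recall ~ {norm[c][c]:.2f}")
```

The bottom (background) row then shows, per true class, the share of ground-truth boxes the model missed entirely.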

## 🖼️ Section 7: Sample Predictions (Best Fold Model)

In [None]:
print("=" * 80)
print("🖼️  GENERATING SAMPLE PREDICTIONS")
print("=" * 80)

# Get validation images from best fold (use the stored fold number, not the row index)
best_fold_num = int(results_df.loc[best_fold_idx, 'fold'])
val_img_dir = KFOLD_DIR / f'fold_{best_fold_num}' / 'images' / 'val'
val_lbl_dir = KFOLD_DIR / f'fold_{best_fold_num}' / 'labels' / 'val'

val_images_with_labels = []
for img_path in val_img_dir.glob('*.png'):
    label_path = val_lbl_dir / f"{img_path.stem}.txt"
    if label_path.exists() and label_path.stat().st_size > 0:
        val_images_with_labels.append(img_path)

# Select random samples
num_samples = min(9, len(val_images_with_labels))
selected_samples = random.sample(val_images_with_labels, num_samples)

print(f"\n📸 Generating predictions for {num_samples} samples from Fold {best_fold_num}...\n")

# Create prediction grid
fig, axes = plt.subplots(3, 3, figsize=(20, 20))
axes = axes.flatten()
fig.suptitle(f'YOLOv11 - Sample Predictions (Best Fold {best_fold_num})', fontsize=18, fontweight='bold')

for idx, img_path in enumerate(selected_samples):
    # Run prediction
    results = best_model.predict(
        source=str(img_path),
        conf=config.CONF_THRESHOLD,
        iou=config.IOU_THRESHOLD,
        imgsz=config.IMG_SIZE,
        device=config.DEVICE,
        verbose=False
    )
    
    # Get annotated image
    annotated = results[0].plot()
    annotated_rgb = cv2.cvtColor(annotated, cv2.COLOR_BGR2RGB)
    
    # Display
    axes[idx].imshow(annotated_rgb)
    num_detections = len(results[0].boxes)
    axes[idx].set_title(f'{img_path.stem}\n({num_detections} detections)', 
                       fontsize=11, fontweight='bold')
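    axes[idx].axis('off')

# Hide any unused panels when fewer than 9 samples are available
for ax in axes[num_samples:]:
    ax.axis('off')

plt.tight_layout()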
save_path = config.PLOTS_DIR / 'sample_predictions.png'
plt.savefig(save_path, dpi=config.DPI, bbox_inches='tight')
plt.show()

print(f"✅ Sample predictions saved to: {save_path}")
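
`random.sample` picks a different set of images on every run, so the saved grid changes each time the cell executes. Below is a minimal sketch of a reproducible selection using a dedicated, seeded `random.Random` instance. The `pick_samples` helper and the seed value 42 are illustrative assumptions, not part of the notebook's configuration:

```python
import random


def pick_samples(paths, k=9, seed=42):
    """Return up to k items from paths, chosen reproducibly.

    Sorting first makes the result independent of filesystem glob order,
    and a dedicated random.Random instance leaves the global RNG untouched.
    """
    ordered = sorted(paths)
    rng = random.Random(seed)
    return rng.sample(ordered, k=min(k, len(ordered)))


images = [f"img_{i:03d}.png" for i in range(20)]
print(pick_samples(images) == pick_samples(list(reversed(images))))  # True
print(len(pick_samples(images[:4])))  # 4
```

Using `pick_samples(val_images_with_labels)` in place of `random.sample(...)` would keep the grid stable across reruns. It also leaves the global random state used by other cells unchanged.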

## 🎨 Section 8: Enhanced Detection Visualizations (Best Fold)

In [None]:
# ============================================================================
# VISUALIZATION 1: Prediction Grid with ALL Images (Even Without Detections)
# ============================================================================
print("=" * 80)
print("🖼️  CREATING COMPREHENSIVE PREDICTION VISUALIZATIONS")
print("=" * 80)

# Use best fold's validation images
best_fold_num = best_fold_idx + 1
val_img_dir = KFOLD_DIR / f'fold_{best_fold_num}' / 'images' / 'val'
all_val_images = list(val_img_dir.glob('*.png'))

if len(all_val_images) == 0:
    print("❌ No validation images found!")
else:
    num_samples = min(9, len(all_val_images))
    selected_samples = random.sample(all_val_images, num_samples)
    
    print(f"\n📸 Generating predictions for {num_samples} validation images from Fold {best_fold_num}...\n")
    
    fig, axes = plt.subplots(3, 3, figsize=(20, 20))
    axes = axes.flatten()
    fig.suptitle(f'YOLOv11 - Sample Predictions (Fold {best_fold_num})', fontsize=18, fontweight='bold', y=0.995)
    
    for idx, img_path in enumerate(selected_samples):
        try:
            results = best_model.predict(
                source=str(img_path),
                conf=config.CONF_THRESHOLD,
                iou=config.IOU_THRESHOLD,
                imgsz=config.IMG_SIZE,
                device=config.DEVICE,
                verbose=False
            )
            
            annotated = results[0].plot()
            annotated_rgb = cv2.cvtColor(annotated, cv2.COLOR_BGR2RGB)
            
            axes[idx].imshow(annotated_rgb)
            num_detections = len(results[0].boxes)
            
            if num_detections > 0:
                title_color = 'green'
                title = f'{img_path.stem}\n✓ {num_detections} detection(s)'
            else:
                title_color = 'red'
                title = f'{img_path.stem}\n✗ No detections'
            
            axes[idx].set_title(title, fontsize=11, fontweight='bold', color=title_color)
            axes[idx].axis('off')
            
        except Exception as e:
            print(f"⚠️  Error processing {img_path.name}: {str(e)}")
            axes[idx].text(0.5, 0.5, f'Error: {img_path.stem}', ha='center', va='center')
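            axes[idx].axis('off')
    
    # Hide any unused panels when fewer than 9 images are available
    for ax in axes[num_samples:]:
        ax.axis('off')
    
    plt.tight_layout()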
    save_path = config.PLOTS_DIR / 'sample_predictions_enhanced.png'
    plt.savefig(save_path, dpi=150, bbox_inches='tight')
    plt.show()
    
    print(f"✅ Sample predictions saved to: {save_path}")
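
This grid shows at most nine images. To review every validation image, one option is to split the list into pages of nine and draw one 3×3 figure per page. Below is a minimal sketch of the paging step with `itertools.islice`; `chunks` is a hypothetical helper:

```python
from itertools import islice


def chunks(items, size=9):
    """Yield consecutive lists of at most `size` items (one 3x3 grid per list)."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


print([len(page) for page in chunks(range(20))])  # [9, 9, 2]
```

Each page can then go through the same plotting loop, e.g. `for page in chunks(all_val_images): ...`. On Python 3.12+, `itertools.batched(items, 9)` does the same job but yields tuples.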

In [None]:
# ============================================================================
# VISUALIZATION 2: Confidence Distribution Analysis
# ============================================================================
print("\n" + "=" * 80)
print("📊 CONFIDENCE SCORE ANALYSIS")
print("=" * 80)

# Use best fold validation images
best_fold_num = best_fold_idx + 1
all_val_images = list((KFOLD_DIR / f'fold_{best_fold_num}' / 'images' / 'val').glob('*.png'))

all_confidences = []
all_classes = []
detection_stats = {'with_detection': 0, 'without_detection': 0}

print(f"\n🔍 Analyzing {len(all_val_images)} validation images from Fold {best_fold_num}...\n")

for img_path in tqdm(all_val_images, desc="Processing"):
    results = best_model.predict(
        source=str(img_path),
        conf=config.CONF_THRESHOLD,
        iou=config.IOU_THRESHOLD,
        imgsz=config.IMG_SIZE,
        device=config.DEVICE,
        verbose=False
    )
    
    if len(results[0].boxes) > 0:
        detection_stats['with_detection'] += 1
        for box in results[0].boxes:
            all_confidences.append(float(box.conf.cpu()))
            all_classes.append(int(box.cls.cpu()))
    else:
        detection_stats['without_detection'] += 1

fig, axes = plt.subplots(2, 2, figsize=(16, 12))
fig.suptitle('YOLOv11 - Detection Confidence Analysis', fontsize=16, fontweight='bold')

if all_confidences:
    axes[0, 0].hist(all_confidences, bins=20, color='skyblue', edgecolor='black', alpha=0.7)
    axes[0, 0].axvline(config.CONF_THRESHOLD, color='red', linestyle='--', 
                       label=f'Threshold: {config.CONF_THRESHOLD}', linewidth=2)
    axes[0, 0].set_xlabel('Confidence Score', fontsize=12)
    axes[0, 0].set_ylabel('Frequency', fontsize=12)
    axes[0, 0].set_title(f'Confidence Score Distribution\n(Total Detections: {len(all_confidences)})', fontweight='bold')
    axes[0, 0].legend()
    axes[0, 0].grid(alpha=0.3)
else:
    axes[0, 0].text(0.5, 0.5, 'No Detections Found', ha='center', va='center', fontsize=14, color='red')
    axes[0, 0].set_title('Confidence Score Distribution', fontweight='bold')

categories = ['Images with\nDetections', 'Images without\nDetections']
values = [detection_stats['with_detection'], detection_stats['without_detection']]
colors = ['#4CAF50', '#f44336']

axes[0, 1].bar(categories, values, color=colors, edgecolor='black', linewidth=2)
axes[0, 1].set_ylabel('Number of Images', fontsize=12)
axes[0, 1].set_title('Detection Coverage Analysis', fontweight='bold')
for i, v in enumerate(values):
    axes[0, 1].text(i, v + max(values)*0.02, str(v), ha='center', va='bottom', 
                    fontsize=14, fontweight='bold')
axes[0, 1].grid(axis='y', alpha=0.3)

if all_classes:
    class_counts = pd.Series(all_classes).value_counts().sort_index()
    class_names = [config.CLASS_NAMES[i] for i in class_counts.index]
    
    axes[1, 0].barh(class_names, class_counts.values, color='coral', edgecolor='black')
    axes[1, 0].set_xlabel('Number of Detections', fontsize=12)
    axes[1, 0].set_title('Detections per Class', fontweight='bold')
    for i, v in enumerate(class_counts.values):
        axes[1, 0].text(v + max(class_counts.values)*0.02, i, str(v), 
                        va='center', fontweight='bold')
    axes[1, 0].grid(axis='x', alpha=0.3)
else:
    axes[1, 0].text(0.5, 0.5, 'No Detections Found', ha='center', va='center', fontsize=14, color='red')
    axes[1, 0].set_title('Detections per Class', fontweight='bold')

if all_confidences and all_classes:
    conf_by_class = pd.DataFrame({'Class': all_classes, 'Confidence': all_confidences})
    conf_by_class['Class_Name'] = conf_by_class['Class'].map(lambda x: config.CLASS_NAMES[x])
    
    class_names_unique = conf_by_class['Class_Name'].unique()
    box_data = [conf_by_class[conf_by_class['Class_Name'] == cn]['Confidence'].values 
                for cn in class_names_unique]
    
    bp = axes[1, 1].boxplot(box_data, labels=class_names_unique, patch_artist=True)
    for patch in bp['boxes']:
        patch.set_facecolor('lightgreen')
    axes[1, 1].set_ylabel('Confidence Score', fontsize=12)
    axes[1, 1].set_title('Confidence Distribution by Class', fontweight='bold')
    axes[1, 1].grid(axis='y', alpha=0.3)
    axes[1, 1].tick_params(axis='x', rotation=15)
else:
    axes[1, 1].text(0.5, 0.5, 'No Detections Found', ha='center', va='center', fontsize=14, color='red')
    axes[1, 1].set_title('Confidence Distribution by Class', fontweight='bold')

plt.tight_layout()
save_path = config.PLOTS_DIR / 'confidence_analysis.png'
plt.savefig(save_path, dpi=150, bbox_inches='tight')
plt.show()

print(f"\n✅ Confidence analysis saved to: {save_path}")
print(f"\n📈 Detection Summary:")
n_images = max(len(all_val_images), 1)  # guard against division by zero when no images are found
print(f"   • Images with detections: {detection_stats['with_detection']} ({detection_stats['with_detection']/n_images*100:.1f}%)")
print(f"   • Images without detections: {detection_stats['without_detection']} ({detection_stats['without_detection']/n_images*100:.1f}%)")
print(f"   • Total detections: {len(all_confidences)}")
if all_confidences:
    print(f"   • Average confidence: {np.mean(all_confidences):.4f}")
    print(f"   • Min confidence: {np.min(all_confidences):.4f}")
    print(f"   • Max confidence: {np.max(all_confidences):.4f}")
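
The box plot summarizes each class by its quartiles. To report the same numbers as text, the minimal standard-library sketch below groups confidences by class id. It uses `method="inclusive"` because that matches NumPy's default linear interpolation, which Matplotlib's `boxplot` uses for the box edges and median. The helper name is hypothetical:

```python
import statistics
from collections import defaultdict


def confidence_stats_by_class(classes, confidences):
    """Return {class_id: (count, q1, median, q3)} for detection confidences.

    method="inclusive" corresponds to NumPy's default linear interpolation,
    which Matplotlib's boxplot uses for the box edges and the median line.
    """
    groups = defaultdict(list)
    for class_id, conf in zip(classes, confidences):
        groups[class_id].append(conf)

    stats = {}
    for class_id, values in sorted(groups.items()):
        if len(values) == 1:
            # statistics.quantiles needs at least two data points on older Pythons
            v = values[0]
            stats[class_id] = (1, v, v, v)
            continue
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        stats[class_id] = (len(values), q1, median, q3)
    return stats


# Hypothetical class ids and confidences, for illustration only
print(confidence_stats_by_class([0, 0, 0, 0, 1], [0.2, 0.8, 0.4, 0.6, 0.9]))
```

In this notebook, `confidence_stats_by_class(all_classes, all_confidences)` would return one `(count, q1, median, q3)` tuple per detected class id.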

In [None]:
# ============================================================================
# VISUALIZATION 3: High-Confidence vs Low-Confidence Detections
# ============================================================================
print("\n" + "=" * 80)
print("🎯 HIGH vs LOW CONFIDENCE DETECTIONS")
print("=" * 80)

if all_confidences:
    high_conf_images = []
    low_conf_images = []
    
    threshold_high = 0.7
    threshold_low = 0.4
    
    for img_path in all_val_images:
        results = best_model.predict(
            source=str(img_path),
            conf=config.CONF_THRESHOLD,
            iou=config.IOU_THRESHOLD,  # same NMS IoU threshold as the earlier cells
            imgsz=config.IMG_SIZE,
            device=config.DEVICE,
            verbose=False
        )
        
        if len(results[0].boxes) > 0:
            max_conf = max([float(box.conf.cpu()) for box in results[0].boxes])
            if max_conf >= threshold_high:
                high_conf_images.append((img_path, results, max_conf))
            elif max_conf <= threshold_low:
                low_conf_images.append((img_path, results, max_conf))
    
    fig, axes = plt.subplots(2, 3, figsize=(18, 12))
    fig.suptitle('YOLOv11 - Confidence Comparison: High vs Low', fontsize=16, fontweight='bold')
    
    for idx in range(3):
        if idx < len(high_conf_images):
            img_path, results, conf = high_conf_images[idx]
            annotated = results[0].plot()
            annotated_rgb = cv2.cvtColor(annotated, cv2.COLOR_BGR2RGB)
            axes[0, idx].imshow(annotated_rgb)
            axes[0, idx].set_title(f'HIGH CONF: {conf:.3f}\n{img_path.stem}', 
                                  fontsize=10, fontweight='bold', color='green')
        else:
            axes[0, idx].text(0.5, 0.5, 'No High\nConfidence\nDetections', 
                            ha='center', va='center', fontsize=12)
        axes[0, idx].axis('off')
    
    for idx in range(3):
        if idx < len(low_conf_images):
            img_path, results, conf = low_conf_images[idx]
            annotated = results[0].plot()
            annotated_rgb = cv2.cvtColor(annotated, cv2.COLOR_BGR2RGB)
            axes[1, idx].imshow(annotated_rgb)
            axes[1, idx].set_title(f'LOW CONF: {conf:.3f}\n{img_path.stem}', 
                                  fontsize=10, fontweight='bold', color='orange')
        else:
            axes[1, idx].text(0.5, 0.5, 'No Low\nConfidence\nDetections', 
                            ha='center', va='center', fontsize=12)
        axes[1, idx].axis('off')
    
    plt.tight_layout()
    save_path = config.PLOTS_DIR / 'confidence_comparison.png'
    plt.savefig(save_path, dpi=150, bbox_inches='tight')
    plt.show()
    
    print(f"\n✅ Confidence comparison saved to: {save_path}")
    print(f"   • High confidence images (≥{threshold_high}): {len(high_conf_images)}")
    print(f"   • Low confidence images (≤{threshold_low}): {len(low_conf_images)}")
else:
    print("\n⚠️  No detections found to compare!")
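
This cell runs inference over every validation image a second time. There is an alternative if the earlier confidence-analysis loop also records each image's box confidences in a dict, for example `confs_by_image[img_path] = results[0].boxes.conf.tolist()` (a hypothetical addition). The high/low split can then be computed without re-running the model. As in the cell above, images whose best box falls strictly between the two thresholds are left out. Unlike the cell above, this sketch sorts each group so the most extreme examples come first:

```python
def split_by_max_confidence(confs_by_image, high=0.7, low=0.4):
    """Split images into high- and low-confidence groups by their best box.

    confs_by_image maps an image identifier to a list of box confidences.
    Images without boxes, or whose best box lies strictly between `low`
    and `high`, belong to neither group.
    """
    high_group, low_group = [], []
    for image_id, confs in confs_by_image.items():
        if not confs:
            continue
        best = max(confs)
        if best >= high:
            high_group.append((image_id, best))
        elif best <= low:
            low_group.append((image_id, best))
    # Most confident first in the high group, least confident first in the low group
    high_group.sort(key=lambda item: item[1], reverse=True)
    low_group.sort(key=lambda item: item[1])
    return high_group, low_group


# Hypothetical cached confidences, for illustration only
cached = {"a.png": [0.91, 0.30], "b.png": [0.35], "c.png": [0.55], "d.png": []}
high, low = split_by_max_confidence(cached)
print(high)  # [('a.png', 0.91)]
print(low)   # [('b.png', 0.35)]
```

With this approach, `best_model.predict` only needs to run again for the (at most six) images actually shown, to produce their annotated plots.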

## 📊 Section 9: Metrics Summary (K-Fold + Best Fold)

In [None]:
# Create comprehensive metrics summary
print("=" * 80)
print("📊 YOLOV11 K-FOLD FINAL METRICS SUMMARY")
print("=" * 80)

best_fold_num = best_fold_idx + 1

summary_data = {
    'Metric': [
        'Model',
        'K-Fold Configuration',
        'Best Fold',
        'Epochs per Fold',
        'Total Training Time (min)',
        'Best Fold mAP@0.5',
        'Best Fold mAP@0.5:0.95',
        'Best Fold Precision',
        'Best Fold Recall',
        'Best Fold F1 Score',
        'K-Fold Mean mAP@0.5',
        'K-Fold Std mAP@0.5',
        'Image Size',
        'Batch Size',
        'Optimizer',
        'Learning Rate',
    ],
    'Value': [
        config.MODEL_NAME,
        f'{N_FOLDS}-Fold CV',
        f'Fold {best_fold_num}',
        config.EPOCHS,
        f"{training_time/60:.2f}",
        f"{float(val_results.box.map50):.4f}",
        f"{float(val_results.box.map):.4f}",
        f"{float(val_results.box.mp):.4f}",
        f"{float(val_results.box.mr):.4f}",
        f"{2 * (float(val_results.box.mp) * float(val_results.box.mr)) / (float(val_results.box.mp) + float(val_results.box.mr) + 1e-6):.4f}",
        f"{results_df['mAP50'].mean():.4f}",
        f"{results_df['mAP50'].std():.4f}",
        f"{config.IMG_SIZE}x{config.IMG_SIZE}",
        config.BATCH_SIZE,
        config.OPTIMIZER,
        config.LR0,
    ]
}

summary_df = pd.DataFrame(summary_data)

# Display styled table
fig, ax = plt.subplots(1, 1, figsize=(12, 10))
ax.axis('tight')
ax.axis('off')

table = ax.table(cellText=summary_df.values, colLabels=summary_df.columns,
                cellLoc='left', loc='center',
                colWidths=[0.6, 0.4])

table.auto_set_font_size(False)
table.set_fontsize(11)
table.scale(1, 2.5)

# Style header
for i in range(len(summary_df.columns)):
    table[(0, i)].set_facecolor('#4CAF50')
    table[(0, i)].set_text_props(weight='bold', color='white')

# Alternate row colors
for i in range(1, len(summary_df) + 1):
    for j in range(len(summary_df.columns)):
        if i % 2 == 0:
            table[(i, j)].set_facecolor('#f0f0f0')

plt.title('YOLOv11 K-Fold Cross-Validation Metrics', fontsize=16, fontweight='bold', pad=20)
save_path = config.PLOTS_DIR / 'metrics_summary.png'
plt.savefig(save_path, dpi=config.DPI, bbox_inches='tight')
plt.show()

print(f"\n✅ Metrics summary saved to: {save_path}")

# Also save as CSV
csv_path = config.RESULTS_DIR / 'metrics_summary.csv'
summary_df.to_csv(csv_path, index=False)
print(f"✅ Metrics CSV saved to: {csv_path}")
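
The `K-Fold Std mAP@0.5` row uses `pandas.Series.std()`, which defaults to the sample standard deviation (`ddof=1`, dividing by N-1). With only a few folds, this is noticeably larger than the population value (`ddof=0`, NumPy's default). The standard-library equivalents are shown below on made-up fold scores:

```python
import statistics

# Hypothetical per-fold mAP@0.5 scores, for illustration only
fold_map50 = [0.61, 0.58, 0.66, 0.55, 0.60]

mean = statistics.mean(fold_map50)
sample_std = statistics.stdev(fold_map50)       # ddof=1, same as pandas Series.std()
population_std = statistics.pstdev(fold_map50)  # ddof=0, same as numpy.std()

print(f"mAP@0.5 = {mean:.4f} ± {sample_std:.4f} (sample std, N-1)")
print(f"population std (N) = {population_std:.4f}")
```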

## 📋 Section 10: Final Report (K-Fold Results)

In [None]:
# Generate comprehensive markdown report
report_path = config.RESULTS_DIR / 'yolov11_training_report.md'

best_fold_num = best_fold_idx + 1

report_content = f"""# YOLOv11 K-Fold Training Report - TBX11K Tuberculosis Detection

**Date:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}  
**Student:** Shahriar Khan, Rifah Tamannah, Khalid Mahmud Joy, Tanvir Rahman  
**Institution:** East West University  
**Course:** CSE475 - Machine Learning Lab

---

## 🎯 Model Information

- **Model:** {config.MODEL_NAME}
- **Architecture:** YOLOv11 Nano
- **Pretrained Weights:** {config.MODEL_WEIGHTS}
- **Task:** Object Detection (Tuberculosis in Chest X-rays)
- **Training Strategy:** {N_FOLDS}-Fold Cross-Validation

---

## 📊 Dataset

- **Dataset:** TBX11K Small (Balanced)
- **Total Images:** {len(all_images)}
- **Images per Fold (Train):** ~{len(all_images) * (N_FOLDS-1) / N_FOLDS:.0f}
- **Images per Fold (Val):** ~{len(all_images) / N_FOLDS:.0f}
- **Classes:** {config.NUM_CLASSES}
  - Class 0: {config.CLASS_NAMES[0]}
  - Class 1: {config.CLASS_NAMES[1]}
  - Class 2: {config.CLASS_NAMES[2]}
- **Image Size:** {config.IMG_SIZE}x{config.IMG_SIZE}

---

## ⚙️ Training Configuration

### K-Fold Cross-Validation
- **Number of Folds:** {N_FOLDS}
- **Random Seed:** 42
- **Shuffle:** True

### Hyperparameters
- **Epochs per Fold:** {config.EPOCHS}
- **Batch Size:** {config.BATCH_SIZE}
- **Optimizer:** {config.OPTIMIZER}
- **Initial Learning Rate:** {config.LR0}
- **Final LR Factor:** {config.LRF}
- **Momentum:** {config.MOMENTUM}
- **Weight Decay:** {config.WEIGHT_DECAY}
- **Warmup Epochs:** {config.WARMUP_EPOCHS}
- **Patience:** {config.PATIENCE}

### Loss Weights
- **Box Loss:** {config.BOX}
- **Class Loss:** {config.CLS}
- **DFL Loss:** {config.DFL}

### Data Augmentation
- **Rotation:** ±{config.DEGREES}°
- **Translation:** {config.TRANSLATE * 100:g}%
- **Scaling:** ±{config.SCALE * 100:g}%
- **Shearing:** ±{config.SHEAR}°
- **Horizontal Flip:** {config.FLIPLR * 100:g}%
- **Mosaic:** {config.MOSAIC * 100:g}%
- **MixUp:** {config.MIXUP * 100:g}%
- **Copy-Paste:** {config.COPY_PASTE * 100:g}%
- **Random Erasing:** {config.ERASING * 100:g}%
- **HSV Augmentation:** H={config.HSV_H}, S={config.HSV_S}, V={config.HSV_V}

---

## 📈 K-Fold Cross-Validation Results

### Per-Fold Performance

| Fold | mAP@0.5 | mAP@0.5:0.95 | Precision | Recall | F1 Score | Time (min) |
|------|---------|--------------|-----------|--------|----------|------------|
"""

for _, row in results_df.iterrows():
    report_content += f"| {int(row['fold'])} | {row['mAP50']:.4f} | {row['mAP50_95']:.4f} | {row['precision']:.4f} | {row['recall']:.4f} | {row['f1']:.4f} | {row['time_min']:.1f} |\n"

report_content += f"""

### Aggregate Statistics

| Metric | Mean ± Std |
|--------|------------|
| **mAP@0.5** | {results_df['mAP50'].mean():.4f} ± {results_df['mAP50'].std():.4f} |
| **mAP@0.5:0.95** | {results_df['mAP50_95'].mean():.4f} ± {results_df['mAP50_95'].std():.4f} |
| **Precision** | {results_df['precision'].mean():.4f} ± {results_df['precision'].std():.4f} |
| **Recall** | {results_df['recall'].mean():.4f} ± {results_df['recall'].std():.4f} |
| **F1 Score** | {results_df['f1'].mean():.4f} ± {results_df['f1'].std():.4f} |

---

## 🏆 Best Fold Model

- **Best Fold:** Fold {best_fold_num}
- **Best mAP@0.5:** {results_df.loc[best_fold_idx, 'mAP50']:.4f}
- **Model Path:** `{results_df.loc[best_fold_idx, 'model_path']}`

### Best Fold Validation Metrics

| Metric | Value |
|--------|-------|
| **mAP@0.5** | {float(val_results.box.map50):.4f} |
| **mAP@0.5:0.95** | {float(val_results.box.map):.4f} |
| **Precision** | {float(val_results.box.mp):.4f} |
| **Recall** | {float(val_results.box.mr):.4f} |
| **F1 Score** | {2 * (float(val_results.box.mp) * float(val_results.box.mr)) / (float(val_results.box.mp) + float(val_results.box.mr) + 1e-6):.4f} |
| **Fitness** | {float(val_results.fitness):.4f} |

---

## ⏱️ Training Time

- **Total Training Time:** {training_time/60:.2f} minutes ({training_time/3600:.2f} hours)
- **Average Time per Fold:** {training_time/60/N_FOLDS:.2f} minutes

---

## 📁 Output Files

### K-Fold Models
"""

for fold_num in range(1, N_FOLDS + 1):
    report_content += f"- Fold {fold_num}: `{config.MODEL_DIR / f'fold_{fold_num}' / 'weights' / 'best.pt'}`\n"

report_content += f"""

### Visualizations
- K-Fold results: `{config.PLOTS_DIR / 'kfold_results.png'}`
- Training curves (best fold): `{config.PLOTS_DIR / 'training_curves_best_fold.png'}`
- Confusion matrix: `{config.PLOTS_DIR / 'confusion_matrix_display.png'}`
- Sample predictions: `{config.PLOTS_DIR / 'sample_predictions.png'}`
- Enhanced sample predictions: `{config.PLOTS_DIR / 'sample_predictions_enhanced.png'}`
- Confidence analysis: `{config.PLOTS_DIR / 'confidence_analysis.png'}`
- Confidence comparison: `{config.PLOTS_DIR / 'confidence_comparison.png'}`
- Metrics summary: `{config.PLOTS_DIR / 'metrics_summary.png'}`

### Results
- K-Fold results CSV: `{config.RESULTS_DIR / 'kfold_results.csv'}`
- Metrics JSON: `{config.RESULTS_DIR / 'yolov11_metrics.json'}`
- Metrics CSV: `{config.RESULTS_DIR / 'metrics_summary.csv'}`

---

## ✅ Conclusion

YOLOv11 K-Fold cross-validation training completed with {N_FOLDS} folds. Results:
- **Mean mAP@0.5 of {results_df['mAP50'].mean():.4f} ± {results_df['mAP50'].std():.4f}** across all folds
- **Best single model mAP@0.5 of {results_df.loc[best_fold_idx, 'mAP50']:.4f}** (Fold {best_fold_num})
- **Total training time of {training_time/60:.2f} minutes** for {N_FOLDS} models

K-Fold cross-validation provides:
- A performance estimate (mean ± std across folds) that depends less on any single train/validation split
- Use of every image for both training and validation across folds, which matters with only {len(all_images)} images
- {N_FOLDS} trained models that could be combined into an ensemble

Since the best fold is selected by its validation score, its metrics are an optimistic estimate; the K-fold mean is the fairer summary of expected performance.

All {N_FOLDS} model checkpoints are saved for potential ensemble prediction; any mAP gain from ensembling was not measured in this notebook.

---

*Generated automatically by YOLOv11 K-Fold training notebook*
"""

with open(report_path, 'w', encoding='utf-8') as f:  # explicit UTF-8 so emoji and ± are written correctly
    f.write(report_content)

print("=" * 80)
print("📋 YOLOV11 K-FOLD TRAINING COMPLETE!")
print("=" * 80)
print(f"\n✅ Final report saved to: {report_path}")
print(f"\n📊 K-Fold Summary:")
print(f"  • Model: {config.MODEL_NAME}")
print(f"  • Folds: {N_FOLDS}")
print(f"  • Epochs per Fold: {config.EPOCHS}")
print(f"  • Total Training Time: {training_time/60:.2f} min")
print(f"  • Mean mAP@0.5: {results_df['mAP50'].mean():.4f} ± {results_df['mAP50'].std():.4f}")
print(f"  • Best Fold: {best_fold_num} (mAP@0.5: {results_df.loc[best_fold_idx, 'mAP50']:.4f})")
print(f"  • Best Fold Precision: {float(val_results.box.mp):.4f}")
print(f"  • Best Fold Recall: {float(val_results.box.mr):.4f}")
print(f"\n💾 All outputs saved to: {config.OUTPUT_DIR}")
print("=" * 80)