# 🥫 6th Place Solution: Soup Can Multi-instance Object Detection (0.989 mAP)

## 🧠 Approach Overview

This solution uses **YOLOv11** to detect and localize soup cans in images. By tuning the training configuration and enriching the dataset with synthetic images generated by **Falcon**, we achieved **0.989 mAP** in the Multi-instance Object Detection Challenge.

---

## 📁 1. Data Strategy

- **Dataset Structure**:
  - Used the full dataset (training + validation) provided by the competition.
  - Added 5 synthetic data folders: `new_data`, `new_data2`, `new_data3`, `new_data4`, `new_data5`.
  - Each folder contains 100 images (81 train, 19 val).
  - Manually excluded images that do **not contain Soup Cans** or have **unclear bounding boxes**.

- **Dataset Summary**:
  - **Total Images**: 544
  - **Training**: 466 images
  - **Validation**: 78 images

- **Test Set**:
  - Used the **test data provided** in the competition.

- **Dataset Link**: [Dataset](https://www.kaggle.com/datasets/younesbenalia/multi-instance-object-detection-challenge)

- **Class Handling**:
  - **Single-class detection** (Soup Cans).
  - YOLO-style annotations.

- **Augmentations**:
  - **Mosaic** (disabled in the last 20 epochs).
  - **HSV color space manipulations**:
    - Hue ±0.015
    - Saturation ±0.7
    - Value ±0.4
  - **Spatial transforms**:
    - Flip (50%)
    - Translation (10%)
    - Scaling (50%)
  - **Image resizing** to `1056 × 1056`.
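
To make the annotation format and the flip augmentation concrete, here is a minimal sketch. It assumes the common YOLO label layout `class cx cy w h` with coordinates normalized to `[0, 1]`; the helper names (`parse_yolo_line`, `hflip_box`, `to_pixel_xyxy`) are hypothetical and do not appear in the notebook, where the training framework handles all of this internally.

```python
def parse_yolo_line(line):
    """Parse one 'class cx cy w h' label line into (class_id, box)."""
    parts = line.split()
    class_id = int(parts[0])
    cx, cy, w, h = (float(v) for v in parts[1:5])
    return class_id, (cx, cy, w, h)


def hflip_box(box):
    """Mirror a normalized box horizontally: only the x-center changes."""
    cx, cy, w, h = box
    return (1.0 - cx, cy, w, h)


def to_pixel_xyxy(box, img_w, img_h):
    """Convert a normalized center-format box to pixel corner coordinates."""
    cx, cy, w, h = box
    x1 = (cx - w / 2) * img_w
    y1 = (cy - h / 2) * img_h
    x2 = (cx + w / 2) * img_w
    y2 = (cy + h / 2) * img_h
    return (x1, y1, x2, y2)


class_id, box = parse_yolo_line("0 0.25 0.5 0.1 0.2")
flipped = hflip_box(box)
print(class_id, box, flipped)
print(to_pixel_xyxy(box, 1000, 500))
```

Because the labels are normalized, resizing to `1056 × 1056` does not require touching the label files; only geometric transforms such as flips, translation, and scaling change the box values.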

---

## 🏗️ 2. Model Architecture

- **Model**: `YOLOv11x`

- **Optimization**:
  - Optimizer: **SGD** with momentum (0.937)
  - Scheduler: **Cosine Learning Rate Scheduler**
    - Initial learning rate: `0.001`
  - **Weight Decay**: `0.0005` for regularization
  - **Training Duration**: 300 epochs (without early stopping)
  - **Warmup Settings**:
    - `warmup_epochs = 3`
    - `warmup_momentum = 1`

- **Metric Target**:
  - Optimized for **mAP@0.5 IoU**
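
As a rough illustration of the cosine schedule listed above, the sketch below computes a per-epoch learning rate that decays from the initial value of `0.001` over 300 epochs. The final learning-rate fraction (`final_fraction=0.01`) is an assumption, since the write-up does not state it, and the warmup phase is deliberately ignored. The function name `cosine_lr` is hypothetical.

```python
import math


def cosine_lr(epoch, total_epochs=300, lr0=0.001, final_fraction=0.01):
    """Cosine decay from lr0 at epoch 0 to lr0 * final_fraction at the last epoch.

    final_fraction is an assumed value, not taken from the write-up.
    """
    lr_final = lr0 * final_fraction
    cosine = 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    return lr_final + (lr0 - lr_final) * cosine


for epoch in (0, 75, 150, 225, 300):
    print(epoch, f"{cosine_lr(epoch):.6f}")
```

The curve is flat near the start and the end and steepest in the middle, so most of the decay happens around the midpoint of training.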

---

## 🔎 3. Inference Enhancements

- **Model Selection**:
  - Used the checkpoint that performed best on the validation set: `best.pt`

- **Test-Time Augmentation (TTA)**:
  - Flipping and minor zooming during inference to improve predictions.

- **Detection Parameters**:
  - **IoU Threshold**: `0.4`
  - **Max Detections**: `600` per image
  - **Confidence Threshold**: `0` (favoring high recall)

- **Multiscale Inference**:
  - Images were evaluated at 5 different scales:
    - `640`, `1056`, `1440`, `1920`, `2560`
  - Predictions were captured separately for each size.

- **Post-Processing**:
  - **Invalid box filtering**
  - **Weighted Box Fusion (WBF)**:
    - IoU threshold: `0.5`
    - Skip box threshold: `0.01`
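
The fusion step can be pictured with a simplified, single-class sketch in the spirit of WBF. This is not the `ensemble-boxes` implementation used in the notebook: it only shows the core idea of discarding boxes below the skip threshold, clustering boxes whose IoU with a fused box exceeds the IoU threshold, and replacing each cluster with a confidence-weighted average. The function names are hypothetical, and the fused score here is a plain mean of the cluster's confidences.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse_cluster(cluster):
    """Confidence-weighted average of the boxes in one cluster."""
    total = sum(score for _, score in cluster)
    box = [sum(b[i] * s for b, s in cluster) / total for i in range(4)]
    return box, total / len(cluster)


def simple_wbf(boxes, scores, iou_thr=0.5, skip_box_thr=0.01):
    """Simplified single-class fusion; returns a list of (box, score)."""
    candidates = sorted(
        ((b, s) for b, s in zip(boxes, scores) if s >= skip_box_thr),
        key=lambda pair: pair[1],
        reverse=True,
    )
    clusters, fused = [], []
    for box, score in candidates:
        # Find the fused box that overlaps this candidate the most.
        best_idx, best_iou = -1, iou_thr
        for i, (fused_box, _) in enumerate(fused):
            overlap = iou(box, fused_box)
            if overlap > best_iou:
                best_idx, best_iou = i, overlap
        if best_idx == -1:
            clusters.append([(box, score)])
            fused.append((list(box), score))
        else:
            clusters[best_idx].append((box, score))
            fused[best_idx] = fuse_cluster(clusters[best_idx])
    return fused


boxes = [[0.10, 0.10, 0.50, 0.50], [0.12, 0.10, 0.52, 0.50],
         [0.70, 0.70, 0.90, 0.90], [0.00, 0.00, 0.05, 0.05]]
scores = [0.9, 0.6, 0.8, 0.005]
for fused_box, fused_score in simple_wbf(boxes, scores):
    print([round(v, 3) for v in fused_box], round(fused_score, 3))
```

In this toy input, the two overlapping boxes merge into one, the isolated box passes through unchanged, and the box scored `0.005` is dropped by the skip threshold.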

---

## 📤 4. Submission

- **Prediction Validation**:
  - Strict format checking to comply with submission requirements.
  - Automatic handling for images with **no detections**.

- **Submission Format**:
  - Final predictions are converted into the required format.
  - **Output File**: `submission_wbf_0.5.csv`
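
A minimal sketch of the conversion step, mirroring the string layout used in the notebook code further down: each detection becomes `class score cx cy w h` with normalized coordinates, and an image without detections gets the literal string `no boxes`. The function name `to_prediction_string` is hypothetical; the notebook builds the string inline.

```python
def to_prediction_string(boxes, scores, labels):
    """Format normalized [x1, y1, x2, y2] boxes as 'cls score cx cy w h' tokens."""
    if not boxes:
        return "no boxes"
    return " ".join(
        f"{int(lbl)} {score:.6f} "
        f"{(b[0] + b[2]) / 2:.6f} {(b[1] + b[3]) / 2:.6f} "
        f"{(b[2] - b[0]):.6f} {(b[3] - b[1]):.6f}"
        for b, score, lbl in zip(boxes, scores, labels)
    )


print(to_prediction_string([[0.1, 0.2, 0.3, 0.6]], [0.9], [0.0]))
print(to_prediction_string([], [], []))
```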

---

## ♻️ Reproducibility

- Fixed random seeds across the entire training pipeline.
- **Hardware** used: `NVIDIA Tesla T4 GPU`.

---

## 📝 Additional Notes

- Synthetic data proved critical for improving generalization due to limited original training images.
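- Running inference with a **zero confidence threshold** passed every candidate box to **WBF**, which fused overlapping predictions across scales and discarded boxes scoring below its skip threshold (`0.01`); this combination significantly boosted mAP.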
- The pipeline was designed to be **robust to overfitting** through heavy augmentations and proper validation splits.

---

## 📇 Contact Information

If you'd like to connect, collaborate, or ask questions, feel free to reach out:

- **👨‍💻 Author**: Younes Benalia  
- **📧 Email**: [younes.benalia.dz@gmail.com]  
- **🔗 LinkedIn**: [younesbenalia](https://www.linkedin.com/in/younesbenalia)  
- **📊 Kaggle**: [younesbenalia](https://www.kaggle.com/younesbenalia)  
- **🐙 GitHub**: [younesBenalia](https://github.com/younesBenalia)  
- **🐙 Hugging Face**: [younesbenalia](https://huggingface.co/younesbenalia)  
- **📊 Kaggle notebook**: [notebook](https://www.kaggle.com/code/younesbenalia/soup-can-detection-6th-place-solution)  

---



# Imports

In [None]:
from IPython.display import clear_output
!pip install ultralytics
!pip install ensemble-boxes
clear_output()

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from ultralytics import YOLO
from pathlib import Path
import csv
import os
import random
import torch
# Set random seeds for reproducibility
np.random.seed(42)
random.seed(42)
torch.manual_seed(42)

# Dataset filtering

In [None]:
# === Dataset base path ===
base_path = Path("/kaggle/input/d/younesbenalia/multi-instance-object-detection-challenge/Starter_Dataset")

# === Image folders for train and val (from your YAML)
train_folders = [
    "clutter/train/images",
    "couch_far_10/train/images",
    "far_10_half_clutter/train/images",
    "film_grain_10_half_clutter/train/images",
    "large_plant_10/train/images",
    "new_data/train/images",
    "new_data2/train/images",
    "new_data3/train/images",
    "new_data4/train/images",
    "new_data5/train/images",
    "no_clutter_10/train/images",
    "table_close_10/train/images"
]

val_folders = [
    "clutter/val/images",
    "couch_far_10/val/images",
    "far_10_half_clutter/val/images",
    "film_grain_10_half_clutter/val/images",
    "large_plant_10/val/images",
    "new_data/val/images",
    "new_data2/val/images",
    "new_data3/val/images",
    "new_data4/val/images",
    "new_data5/val/images",
    "no_clutter_10/val/images",
    "table_close_10/val/images"
]

# === Images to exclude (full paths)
excluded_images = {
    str(base_path / "couch_far_10/train/images/000000004.png"),
    str(base_path / "far_10_half_clutter/train/images/000000004.png"),
    str(base_path / "new_data/train/images/000000018.png"),
    str(base_path / "new_data/train/images/000000049.png"),
    str(base_path / "new_data/train/images/000000051.png"),
    str(base_path / "new_data/train/images/000000069.png"),
    str(base_path / "new_data/train/images/000000073.png"),
    # str(base_path / "new_data/train/images/000000076.png"),
    str(base_path / "new_data/val/images/000000003.png"),
    str(base_path / "new_data2/train/images/000000012.png"),
    str(base_path / "new_data2/train/images/000000029.png"),
    str(base_path / "new_data2/train/images/000000030.png"),
    str(base_path / "new_data2/train/images/000000039.png"),
    str(base_path / "new_data2/train/images/000000062.png"),
    str(base_path / "new_data2/train/images/000000067.png"),
    str(base_path / "new_data2/train/images/000000079.png"),
    str(base_path / "new_data3/train/images/000000013.png"),
    str(base_path / "new_data3/train/images/000000036.png"),
    str(base_path / "new_data3/train/images/000000055.png"),
    str(base_path / "new_data3/train/images/000000059.png"),
    str(base_path / "new_data5/train/images/000000018.png"),
    str(base_path / "new_data2/val/images/000000001.png"),
    str(base_path / "new_data2/val/images/000000087.png"),
    str(base_path / "new_data2/val/images/000000090.png"),
    str(base_path / "new_data3/val/images/000000002.png"),
    str(base_path / "new_data4/val/images/000000008.png"),
    str(base_path / "new_data5/val/images/000000001.png"),
  
}
 

def build_image_list(folders, split_name, output_txt_path):
    image_paths = []

    for folder in folders:
        full_dir = base_path / folder
        for img_path in full_dir.glob("*"):
            if img_path.suffix.lower() in ['.jpg', '.jpeg', '.png']:
                if str(img_path) not in excluded_images:
                    image_paths.append(str(img_path))

    # Save to txt
    output_path = Path("/kaggle/working") / output_txt_path
    with open(output_path, "w") as f:
        f.write("\n".join(image_paths))

    print(f"[✅] Saved {len(image_paths)} image paths to {output_path}")

build_image_list(train_folders, "train", "train_filtered.txt")
build_image_list(val_folders, "val", "val_filtered.txt")


data_yaml = """
train: /kaggle/working/train_filtered.txt
val: /kaggle/working/val_filtered.txt  # Optional - same process for val

nc: 1
names: ['Soup']
"""
with open('custom_data.yaml', 'w') as file:
    file.write(data_yaml)

# Training

In [None]:
model = YOLO("yolo11x.pt")
# data_yaml = "/kaggle/input/d/younesbenalia/multi-instance-object-detection-challenge/Starter_Dataset/yolo_params.yaml"
data_yaml = "/kaggle/working/custom_data.yaml"


model.train(
    data=data_yaml,
    epochs=300,                
    batch=4,                   
    imgsz=1056,
    patience=300,               
    optimizer='SGD',
    momentum=0.937,          
    lr0=0.001,                
    weight_decay=0.0005,       
    cos_lr=True,               
    save_period=5,             
    workers=8,
    # Augmentations
    close_mosaic=20,
    hsv_h=0.015,
    hsv_s=0.7,
    hsv_v=0.4,
    flipud=0,
    fliplr=0.5,
    translate=0.1,
    scale=0.5,
    shear=0,
    warmup_epochs= 3,
    warmup_momentum= 1,
)


# Inference & Post-Processing

In [None]:
def filter_invalid_boxes(boxes, scores, labels):
    filtered_boxes, filtered_scores, filtered_labels = [], [], []
    for b, s, l in zip(boxes, scores, labels):
        if abs(b[2] - b[0]) > 1e-6 and abs(b[3] - b[1]) > 1e-6:
            filtered_boxes.append(b)
            filtered_scores.append(s)
            filtered_labels.append(l)
    return filtered_boxes, filtered_scores, filtered_labels
    
def run_inference(models, image_sizes, test_images_path):
    image_paths = [p for p in Path(test_images_path).glob("*") if p.suffix.lower() in [".jpg", ".jpeg", ".png"]]
    predictions = {}

    for model_idx, model in enumerate(models):
        model.eval()
        model.training = False
        predictions[model_idx] = {}
        for size in image_sizes:
            predictions[model_idx][size] = {}
            pred = []
            for img_path in image_paths:
                image_id = img_path.stem
                image = Image.open(img_path)
                img_width, img_height = image.size

                results = model.predict(source=str(img_path), conf=conf, iou=0.4, max_det=600, augment=True, imgsz=size, verbose=False)
                # Reset per-image containers so an image without detections
                # never reuses boxes from the previous image.
                boxes, scores, labels, norm_boxes = [], [], [], []

                for result in results:
                    if result.boxes is None:
                        continue
                    boxes = result.boxes.xyxy.cpu().numpy().tolist()
                    scores = result.boxes.conf.cpu().numpy().tolist()
                    labels = result.boxes.cls.cpu().numpy().tolist()

                    # Normalize pixel xyxy coordinates to [0, 1], as WBF expects.
                    norm_boxes = [
                        [x1 / img_width, y1 / img_height, x2 / img_width, y2 / img_height]
                        for x1, y1, x2, y2 in boxes
                    ]
                    # Drop degenerate (zero-width or zero-height) boxes.
                    norm_boxes, scores, labels = filter_invalid_boxes(norm_boxes, scores, labels)

                predictions[model_idx][size][image_id] = {
                    "boxes": norm_boxes,
                    "scores": scores,
                    "labels": labels
                }
                
                # Check the filtered list: filtering may have removed every box.
                if norm_boxes:
                    prediction_string = " ".join(
                        f"{int(lbl)} {score:.6f} {(b[0]+b[2])/2:.6f} {(b[1]+b[3])/2:.6f} {(b[2]-b[0]):.6f} {(b[3]-b[1]):.6f}"
                        for b, score, lbl in zip(norm_boxes, scores, labels)
                    )
                else:
                    prediction_string = "no boxes"

                pred.append({
                    "image_id": image_id,
                    "prediction_string": prediction_string
                })

            # Save CSV per model and size
            df = pd.DataFrame(pred)
            csv_path = f"submission_{model_idx}_{size}.csv"
            df.to_csv(csv_path, index=False, quoting=csv.QUOTE_MINIMAL)
            print(f"[saved] {csv_path}")
            print(df.head(10))

    return predictions

def apply_wbf_and_save_final_submission(predictions, image_ids, output_path="submission_wbf.csv"):
    wbf_results = []

    for image_id in image_ids:
        all_boxes, all_scores, all_labels = [], [], []

        for model_preds in predictions.values():
            for size_preds in model_preds.values():
                if image_id not in size_preds:
                    continue
                pred = size_preds[image_id]
                if not pred["boxes"]:
                    continue
                all_boxes.append(pred["boxes"])
                all_scores.append(pred["scores"])
                all_labels.append(pred["labels"])

        if not all_boxes:
            pred_str = "no boxes"
        else:
            fused_boxes, fused_scores, fused_labels = weighted_boxes_fusion(
                all_boxes, all_scores, all_labels, iou_thr=iou_thr, skip_box_thr=skip_box_thr
            )

            pred_str = " ".join(
                f"{int(lbl)} {score:.6f} {(b[0]+b[2])/2:.6f} {(b[1]+b[3])/2:.6f} {(b[2]-b[0]):.6f} {(b[3]-b[1]):.6f}"
                for b, score, lbl in zip(fused_boxes, fused_scores, fused_labels)
            )

        wbf_results.append({
            "image_id": image_id,
            "prediction_string": pred_str
        })

    wbf_df = pd.DataFrame(wbf_results)
    wbf_df.to_csv(output_path, index=False, quoting=csv.QUOTE_MINIMAL)
    print(f"[notice] ✅ WBF submission saved to {output_path}")
    print(wbf_df.head(10))



# Submission

In [None]:
import os
from pathlib import Path
import pandas as pd
import csv
from ultralytics import YOLO
from ensemble_boxes import weighted_boxes_fusion
from PIL import Image

model_paths = [
    "/kaggle/working/runs/detect/train/weights/best.pt",
]

test_images_path = "/kaggle/input/multi-instance-object-detection-challenge/Starter_Dataset/TestImages/images"
output_dir = "/kaggle/working/predictions/labels"
conf = 0
iou_thr = 0.5
skip_box_thr = 0.01
image_sizes = [640, 1056, 1440, 1920, 2560]

models = [YOLO(path) for path in model_paths]
predictions = run_inference(models, image_sizes, test_images_path)

image_ids = list(next(iter(next(iter(predictions.values())).values())).keys())

apply_wbf_and_save_final_submission(predictions, image_ids, output_path=f"submission_wbf_{iou_thr}.csv")

