# 2025 DL Lab5: Object Detection on Pascal VOC

Before we start, please put **your name** and **SID** in the following format: <br>
Hi I'm 陸仁賈, 314831000.

**Your Answer:**    
Hi I'm XXX, XXXXXXXXX

## Overview

This project focuses on object detection using the Pascal VOC dataset. 

The goal is to identify and locate various objects within images by training and evaluating detection models.
 
The dataset provides annotated images across multiple categories, making it a standard benchmark for evaluating object detection performance.


## Kaggle Competition
Kaggle is an online community of data scientists and machine learning practitioners. Kaggle allows users to find and publish datasets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.

This assignment uses Kaggle to calculate your grade.  
Please use this [**LINK**](https://www.kaggle.com/t/3fd493e454a744bdacc7f2918f9a2605) to join the competition.

## Unzip Data

Unzip `dataset.zip`. It contains:

+ `vocall_train.txt` : list for the training set
+ `vocall_val.txt` : list for the validation set
+ `vocall_test.txt` : list for the test set
+ `image/` : all images


The train set contains 8,218 images, the val set contains 3,823 images, and the test set contains 8,920 images.
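
The format of the `vocall_*.txt` lists is not documented here. Below is a minimal sketch of a parser that **assumes** each line holds an image name followed by groups of five numbers `x1 y1 x2 y2 class_id`, one group per box. The function name `parse_annotation_line` and the sample line are hypothetical. The real parsing lives inside `VocDetectorDataset`.

```python
from typing import List, Tuple

# (x1, y1, x2, y2, class_id) in pixel coordinates (assumed layout)
Box = Tuple[float, float, float, float, int]


def parse_annotation_line(line: str) -> Tuple[str, List[Box]]:
    """Parse one line of an annotation list such as vocall_train.txt.

    Assumed (hypothetical) layout:
        <image_name> x1 y1 x2 y2 cls [x1 y1 x2 y2 cls ...]
    """
    parts = line.split()
    if not parts:
        raise ValueError("empty annotation line")
    image_name, values = parts[0], parts[1:]
    if len(values) % 5 != 0:
        raise ValueError(f"expected groups of 5 numbers, got {len(values)} values")
    boxes: List[Box] = []
    for k in range(0, len(values), 5):
        x1, y1, x2, y2 = (float(v) for v in values[k:k + 4])
        boxes.append((x1, y1, x2, y2, int(values[k + 4])))
    return image_name, boxes


# Illustrative line with two boxes of class 8 (made-up values)
name, boxes = parse_annotation_line("example.jpg 263 211 324 339 8 165 264 253 372 8")
print(name, len(boxes), boxes[0])
```

If the actual files use a different layout, such as normalized coordinates or the class first, only the slicing inside the loop needs to change.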


#### You are allowed to use **backbone models**, but only those available from the **timm package** (https://huggingface.co/timm/models).

# Import packages

In [None]:
import os
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
import numpy as np
from torch.amp import autocast, GradScaler
from src.yolo import getODmodel
from yolo_loss import YOLOv3Loss
from src.dataset import VocDetectorDataset, train_data_pipelines, test_data_pipelines, collate_fn
from src.eval_voc import evaluate
from src.config import GRID_SIZES, ANCHORS
from torch.optim.lr_scheduler import CosineAnnealingLR

In [None]:
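##### Hyperparameters #####
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
num_epochs = 50
batch_size = 40
learning_rate = 1e-3
# Weights of the individual loss terms passed to YOLOv3Loss
lambda_coord = 5.0   # box coordinate regression
lambda_obj = 1.0     # objectness where an object is present
lambda_noobj = 0.5   # objectness for background predictions
lambda_class = 1.0   # classification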
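
The internals of `YOLOv3Loss` are not shown in this notebook. A common arrangement, **assumed here rather than confirmed**, is that each `lambda_*` scales one loss term and the total is their weighted sum. The training loop below only relies on the `'total'` key. The helper name and the other keys in this sketch are hypothetical.

```python
def combine_yolo_losses(
    coord: float, obj: float, noobj: float, cls: float,
    lambda_coord: float = 5.0, lambda_obj: float = 1.0,
    lambda_noobj: float = 0.5, lambda_class: float = 1.0,
) -> dict:
    """Hypothetical helper: weight the four YOLO loss terms and sum them."""
    terms = {
        "coord": lambda_coord * coord,
        "obj": lambda_obj * obj,
        "noobj": lambda_noobj * noobj,
        "class": lambda_class * cls,
    }
    # The right-hand side is evaluated before "total" is inserted
    terms["total"] = sum(terms.values())
    return terms


losses = combine_yolo_losses(coord=0.2, obj=0.5, noobj=1.0, cls=0.3)
print(losses)
```

A large `lambda_coord` makes localization errors count more. A small `lambda_noobj` stops the many background predictions from dominating the objectness loss.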

In [None]:
# Data paths
file_root_train = './dataset/image/'
annotation_file_train = './dataset/vocall_train.txt'
file_root_val = './dataset/image/'
annotation_file_val = './dataset/vocall_val.txt'

# Create datasets
print('Loading datasets...')
train_dataset = VocDetectorDataset(
    root_img_dir=file_root_train,
    dataset_file=annotation_file_train,
    train=True,
    transform=train_data_pipelines,
    grid_sizes=GRID_SIZES,
    encode_target=True
)
train_loader = DataLoader(
    train_dataset,
    batch_size=batch_size,
    collate_fn=collate_fn,
    shuffle=True,
    num_workers=4,
)
print(f'Loaded {len(train_dataset)} train images')

val_dataset = VocDetectorDataset(
    root_img_dir=file_root_val,
    dataset_file=annotation_file_val,
    train=False,
    transform=test_data_pipelines,
    grid_sizes=GRID_SIZES,
    encode_target=True,
)
val_loader = DataLoader(
    val_dataset,
    batch_size=batch_size,
    collate_fn=collate_fn,
    shuffle=False,
    num_workers=4,
)
# Un-encoded targets (encode_target=False) for computing val mAP with evaluate()
eval_dataset = VocDetectorDataset(
    root_img_dir=file_root_val,
    dataset_file=annotation_file_val,
    train=False,
    transform=test_data_pipelines,
    grid_sizes=GRID_SIZES,
    encode_target=False,
)
eval_loader = DataLoader(
    eval_dataset,
    batch_size=batch_size,
    collate_fn=collate_fn,
    shuffle=False,
    num_workers=4
)
print(f'Loaded {len(val_dataset)} val images')
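
With `encode_target=True`, the dataset turns each ground-truth box into per-scale targets using `GRID_SIZES` and (presumably) `ANCHORS` from `src.config`. The actual encoding is not shown. Below is a minimal sketch of YOLOv3-style assignment for one scale. It **assumes** boxes are normalized `(cx, cy, w, h)` in `[0, 1]` and anchors are normalized `(w, h)`. The grid size 13 and the anchor values are hypothetical examples, not the values in `src.config`.

```python
from typing import List, Tuple


def wh_iou(wh1: Tuple[float, float], wh2: Tuple[float, float]) -> float:
    """IoU of two boxes that share the same center (only width/height matter)."""
    inter = min(wh1[0], wh2[0]) * min(wh1[1], wh2[1])
    union = wh1[0] * wh1[1] + wh2[0] * wh2[1] - inter
    return inter / union


def encode_box(
    box: Tuple[float, float, float, float],
    grid_size: int,
    anchors: List[Tuple[float, float]],
) -> Tuple[int, int, int, float, float]:
    """Map a normalized (cx, cy, w, h) box to (row, col, anchor_idx, x_off, y_off)."""
    cx, cy, w, h = box
    # Cell containing the box center; clamp so cx == 1.0 stays in the last cell
    col = min(int(cx * grid_size), grid_size - 1)
    row = min(int(cy * grid_size), grid_size - 1)
    # Position of the center inside its cell, in cell units
    x_off = cx * grid_size - col
    y_off = cy * grid_size - row
    # The anchor whose shape best matches the box is responsible for it
    ious = [wh_iou((w, h), a) for a in anchors]
    anchor_idx = max(range(len(anchors)), key=ious.__getitem__)
    return row, col, anchor_idx, x_off, y_off


# Hypothetical 13x13 grid with three anchors
print(encode_box((0.5, 0.3, 0.2, 0.4), 13, [(0.1, 0.1), (0.2, 0.4), (0.6, 0.6)]))
```

Matching by shape alone (width/height IoU) ignores position, because the cell already fixes the location. Some implementations instead assign a box to every anchor above an IoU threshold.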

## Initialization

### Only backbone models from timm are acceptable (https://huggingface.co/timm/models).
### You can change the backbone model name in the YOLO class (`src/yolo.py`).

In [None]:
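load_network_path = None  # e.g. 'checkpoints/best_detector.pth' to resume from a saved checkpoint
pretrained = True
model = getODmodel(pretrained=pretrained).to(device)
# Optionally restore weights saved by the training loop below
if load_network_path is not None:
    model.load_state_dict(torch.load(load_network_path, map_location=device))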

### Training utilities (mixed precision is enabled when CUDA is available)

In [None]:
# Create loss and optimizer
criterion = YOLOv3Loss(lambda_coord, lambda_obj, lambda_noobj, lambda_class, ANCHORS).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate, weight_decay=5e-4)
lr_scheduler = CosineAnnealingLR(optimizer, T_max=num_epochs, eta_min=1e-6)
use_amp = torch.cuda.is_available()
scaler = GradScaler(enabled=use_amp)
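
`CosineAnnealingLR` follows the closed form given in the PyTorch documentation, $\eta_t = \eta_{\min} + \tfrac{1}{2}(\eta_{\max} - \eta_{\min})\bigl(1 + \cos(\pi t / T_{\max})\bigr)$. The sketch below evaluates it for this notebook's settings (`learning_rate = 1e-3`, `eta_min = 1e-6`, `T_max = num_epochs = 50`). PyTorch computes the schedule recursively, so its values may differ from this by floating-point rounding.

```python
import math


def cosine_annealing_lr(epoch: int, base_lr: float = 1e-3,
                        eta_min: float = 1e-6, t_max: int = 50) -> float:
    """Closed-form cosine-annealing learning rate after `epoch` scheduler steps."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2


for step in (0, 25, 49, 50):
    print(step, f"{cosine_annealing_lr(step):.2e}")
```

The training loop calls `lr_scheduler.step()` at the end of every epoch, so 1-based epoch `k` trains with `cosine_annealing_lr(k - 1)`. The last epoch therefore runs slightly above `eta_min`.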

### Training Loop

In [None]:
# Training loop
print('\nStarting training...')
os.makedirs('checkpoints', exist_ok=True)  # all checkpoint saves below write here
torch.cuda.empty_cache()
best_val_loss = np.inf
for epoch in range(num_epochs):
    model.train()
    print(f'\n\nStarting epoch {epoch + 1} / {num_epochs}')
    for i, (images, target) in enumerate(train_loader):
        # Move to device
        images = images.to(device)
        target = [t.to(device) for t in target]
        # Forward pass
        optimizer.zero_grad()
        with autocast("cuda", enabled=use_amp):
            pred = model(images)
            # pred and target are lists of each scales
            loss_dict = criterion(pred, target)
        # Backward pass with mixed precision support
        scaler.scale(loss_dict['total']).backward()
        scaler.step(optimizer)
        scaler.update()
        # Print progress
        if i % 50 == 0:
            outstring = f'Epoch [{epoch+1}/{num_epochs}], Iter [{i+1}/{len(train_loader)}], Loss: '
            outstring += ', '.join(f"{key}={val :.3f}" for key, val in loss_dict.items())
            print(outstring)
    lr_scheduler.step()
    # step() has already advanced the schedule, so this is the LR for the next epoch
    learning_rate = lr_scheduler.get_last_lr()[0]
    print(f'Learning rate for next epoch: {learning_rate:.2e}')
    # Validation
    with torch.no_grad():
        val_loss = 0.0
        model.eval()
        for i, (images, target) in enumerate(val_loader):
            # Move to device
            images = images.to(device)
            target = [t.to(device) for t in target]
            # Forward pass
            pred = model(images)
            loss_dict = criterion(pred, target)
            val_loss += loss_dict['total'].item()

        val_loss /= len(val_loader)
        print(f'Validation Loss: {val_loss:.4f}')

    # Save best model
    if best_val_loss > val_loss:
        best_val_loss = val_loss
        print(f'Updating best val loss: {best_val_loss:.5f}')
        os.makedirs('checkpoints', exist_ok=True)
        torch.save(model.state_dict(), 'checkpoints/best_detector.pth')

    # Save checkpoint
    if (epoch + 1) in [5, 10, 20, 30, 40]:
        torch.save(model.state_dict(), f'checkpoints/detector_epoch_{epoch+1}.pth')

    torch.save(model.state_dict(), 'checkpoints/detector.pth')

    # Evaluate on val set
    if (epoch + 1) % 5 == 0:
        print('\nEvaluating on validation set...')
        val_aps = evaluate(model, eval_loader)
        print(f'Epoch {epoch + 1}, mAP: {np.mean(val_aps):.4f}')
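
`evaluate` returns one AP per class, and the loop reports their mean as mAP. `src.eval_voc` is not shown, so the sketch below is **not** its implementation. It computes VOC-style all-point interpolated AP for a single class. It assumes each detection has already been marked true or false positive by matching it to an unmatched ground-truth box at IoU ≥ 0.5. VOC2007-era code uses 11-point interpolation instead, which gives slightly different numbers.

```python
from typing import List


def voc_average_precision(scores: List[float], is_tp: List[bool], num_gt: int) -> float:
    """All-point interpolated AP for one class.

    scores: confidence of each detection; is_tp: whether it matched a ground truth;
    num_gt: number of ground-truth boxes of this class.
    """
    if num_gt == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for k in order:  # walk detections from most to least confident
        if is_tp[k]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Sentinels, then make precision monotonically non-increasing from the right
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for k in range(len(mpre) - 2, -1, -1):
        mpre[k] = max(mpre[k], mpre[k + 1])
    # Area under the step-shaped precision/recall curve
    return sum((mrec[k + 1] - mrec[k]) * mpre[k + 1] for k in range(len(mrec) - 1))


print(voc_average_precision([0.9, 0.8, 0.7], [True, False, True], num_gt=2))
```

Averaging this value over the 20 VOC classes gives the mAP printed above.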

# Kaggle submission

### Predict Result

Predict results on the test set, then upload them to [Kaggle](https://www.kaggle.com/t/3fd493e454a744bdacc7f2918f9a2605).
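
`predict_test.py` is not shown. YOLO-style detectors commonly decode the boxes from every scale and then apply non-maximum suppression (NMS), usually per class, before writing results. The pure-Python greedy NMS sketch below illustrates that step. The IoU threshold of 0.5 is an assumption, and the script may use different post-processing.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep a box unless it overlaps an already-kept, higher-scoring box."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep: List[int] = []
    for k in order:
        if all(iou(boxes[k], boxes[j]) <= iou_threshold for j in keep):
            keep.append(k)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(nms(boxes, [0.9, 0.8, 0.7]))
```

On GPU tensors, `torchvision.ops.nms` performs the same greedy suppression far faster than this loop.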

**How to upload**

1. Click the folder icon on the left-hand side of Colab.
2. Right-click "result.csv" and select "Download".
3. Go to Kaggle and click "Submit Predictions".
4. Upload result.csv.
5. The system will automatically score your predictions on 50% of the test set and publish the result on the public leaderboard.


In [None]:
!python predict_test.py