# MODULE 2: Fine-tune on EY Challenge 2024 dataset

In this module, we take the pretrained model from Module 1 and fine-tune it on the EY Challenge 2024 dataset.

The pipeline for Module 2 is as follows:
1. A **fine-grained dataset** is prepared. Specifically, we go through the post-event dataset and select images that are suitable for training, keeping potential class imbalance in mind while collecting them. We then annotate the dataset.
2. **Transfer Learning**: Fine-tune the pretrained YOLOv8 (from Module 1) on the fine-grained dataset.
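
To make the class-imbalance check in step 1 concrete, here is a minimal sketch that counts annotated boxes per class. It assumes the annotations are YOLO-format `.txt` label files (one box per line, class index first); the function names and the example folder path are illustrative and not part of the original pipeline.

```python
from collections import Counter
from pathlib import Path


def count_class_instances(labels_dir):
    """Count bounding boxes per class index across YOLO-format label files."""
    counts = Counter()
    for label_file in Path(labels_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            parts = line.split()
            if parts:  # skip empty lines
                counts[int(parts[0])] += 1
    return counts


def imbalance_ratio(counts):
    """Ratio between the most and least frequent class (1.0 means balanced)."""
    return max(counts.values()) / min(counts.values())
```

For example, `count_class_instances("extra-fine-grained/train/labels")` would return the per-class box counts, assuming the labels live in that folder. A large `imbalance_ratio` suggests collecting more images of the rare classes before annotating further.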

Note that without transfer learning from Module 1's YOLOv8, there is a high risk of overfitting, which shows up on the actual test data provided by the EY Challenge 2024: you might get a better training mAP, but it would not reflect the mAP on the test set.

## Dependencies

In [1]:
# Install YOLOv8
!pip install ultralytics==8.0.196

# Clear the installation output from the cell
from IPython import display
display.clear_output()

## Unzip all datasets

In [2]:
import os
import zipfile

def unzip_folder(zip_filepath, dest_dir):
    with zipfile.ZipFile(zip_filepath, 'r') as zip_ref:
        zip_ref.extractall(dest_dir)
    print(f'The zip file {zip_filepath} has been extracted to the directory {dest_dir}')

In [4]:
# Coarse dataset
zip_file = './coarse-dataset.zip'
unzip_directory = './coarse-dataset'
if not os.path.isdir(unzip_directory):
    unzip_folder(zip_file, unzip_directory)

The zip file ./coarse-dataset.zip has been extracted to the directory ./coarse-dataset


In [5]:
# Extra-fine-grained dataset
zip_file = './extra-fine-grained.zip'
unzip_directory = './extra-fine-grained'
if not os.path.isdir(unzip_directory):
    unzip_folder(zip_file, unzip_directory)

The zip file ./extra-fine-grained.zip has been extracted to the directory ./extra-fine-grained


In [6]:
# Submission dataset
submission_zip = './challenge_1_submission_images.zip'
submission_directory = './challenge_1_submission_images'
if not os.path.isdir(submission_directory):
    unzip_folder(submission_zip, submission_directory)

## Custom Training

In [7]:
import os
import shutil

# If you are running multiple experiments, delete the outputs of previous runs first

try:
    shutil.rmtree('/home/tham/Desktop/delete/EY/runs')
except Exception as e:
    print(e)

try:
    os.remove('/home/tham/Desktop/delete/EY/submission.zip')
except Exception as e:
    print(e)

[Errno 2] No such file or directory: '/home/tham/Desktop/delete/EY/runs'
[Errno 2] No such file or directory: '/home/tham/Desktop/delete/EY/submission.zip'


In [8]:
# Optional: download the dataset from Roboflow instead of using the local zip file
# !pip install roboflow

# from roboflow import Roboflow
# rf = Roboflow(api_key="YOUR_API_KEY")
# project = rf.workspace("edge-4k8xw").project("extra-fine-grained")
# version = project.version(2)
# dataset = version.download("yolov8")

# display.clear_output()

In [9]:
from ultralytics import YOLO

# yaml file of the training dataset
yaml_file = "coarse-dataset/data.yaml"

# use COCO pretrained YOLOv8 models from Module 1 for transfer learning
model = YOLO("/home/tham/Desktop/delete/EY/pretrained/detect/train/weights/best.pt")
model.train(data=yaml_file, epochs=50, imgsz=512, plots=True)

New https://pypi.org/project/ultralytics/8.1.24 available 😃 Update with 'pip install -U ultralytics'
Ultralytics YOLOv8.0.196 🚀 Python-3.8.10 torch-2.2.1+cu121 CUDA:0 (NVIDIA GeForce RTX 2070 Super, 7974MiB)
[34m[1mengine/trainer: [0mtask=detect, mode=train, model=/home/tham/Desktop/delete/EY/pretrained/detect/train/weights/best.pt, data=coarse-dataset/data.yaml, epochs=50, patience=50, batch=16, imgsz=512, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=None, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True,

ultralytics.utils.metrics.DetMetrics object with attributes:

ap_class_index: array([0, 1, 2, 3])
box: ultralytics.utils.metrics.Metric object
confusion_matrix: <ultralytics.utils.metrics.ConfusionMatrix object at 0x7f291c026c70>
fitness: 0.3513725514503218
keys: ['metrics/precision(B)', 'metrics/recall(B)', 'metrics/mAP50(B)', 'metrics/mAP50-95(B)']
maps: array([     0.4892,      0.3992,     0.22253,     0.21205])
names: {0: '0', 1: '1', 2: '2', 3: '3'}
plot: True
results_dict: {'metrics/precision(B)': 0.5059302947682462, 'metrics/recall(B)': 0.5993382587132587, 'metrics/mAP50(B)': 0.5370082715449616, 'metrics/mAP50-95(B)': 0.3307463603286952, 'fitness': 0.3513725514503218}
save_dir: PosixPath('runs/detect/train')
speed: {'preprocess': 0.110169251759847, 'inference': 0.962217648824056, 'loss': 0.0010530153910319011, 'postprocess': 1.1388659477233887}
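
The numbers printed above are consistent with `fitness` being a weighted sum of 10% mAP50 and 90% mAP50-95, and with the overall mAP50-95 being the mean of the per-class `maps` array. The short check below recomputes both from the printed values. The weighting is inferred from these numbers, so treat it as an assumption and confirm it against the Ultralytics source for your version.

```python
# Values copied from the results_dict printed above
results = {
    "metrics/mAP50(B)": 0.5370082715449616,
    "metrics/mAP50-95(B)": 0.3307463603286952,
    "fitness": 0.3513725514503218,
}
# Per-class mAP50-95 values copied from the `maps` array above (rounded in the log)
maps = [0.4892, 0.3992, 0.22253, 0.21205]

# Assumed weighting: 10% mAP50 + 90% mAP50-95
fitness = 0.1 * results["metrics/mAP50(B)"] + 0.9 * results["metrics/mAP50-95(B)"]
mean_map = sum(maps) / len(maps)

print(f"recomputed fitness: {fitness:.6f} (reported: {results['fitness']:.6f})")
print(f"mean of maps:       {mean_map:.6f} (reported: {results['metrics/mAP50-95(B)']:.6f})")
```

Because mAP50-95 dominates the fitness score, the `best.pt` checkpoint is chosen mostly on the stricter metric rather than on mAP50.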

In [10]:
# Rename the coarse-stage output folder so the next stage can write to runs/detect/train again
import os
os.rename('runs', 'runs (coarse)')

In [11]:
from ultralytics import YOLO

# yaml file of the training dataset
# yaml_file = f"{dataset.location}/data.yaml"
yaml_file = "extra-fine-grained/data.yaml"

# continue transfer learning from the weights fine-tuned on the coarse dataset in the previous stage
model = YOLO("/home/tham/Desktop/delete/EY/runs (coarse)/detect/train/weights/best.pt")
model.train(data=yaml_file, epochs=10, imgsz=512, plots=True)

New https://pypi.org/project/ultralytics/8.1.24 available 😃 Update with 'pip install -U ultralytics'
Ultralytics YOLOv8.0.196 🚀 Python-3.8.10 torch-2.2.1+cu121 CUDA:0 (NVIDIA GeForce RTX 2070 Super, 7974MiB)
[34m[1mengine/trainer: [0mtask=detect, mode=train, model=/home/tham/Desktop/delete/EY/runs (coarse)/detect/train/weights/best.pt, data=extra-fine-grained/data.yaml, epochs=10, patience=50, batch=16, imgsz=512, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=None, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_con

ultralytics.utils.metrics.DetMetrics object with attributes:

ap_class_index: array([0, 1, 2, 3])
box: ultralytics.utils.metrics.Metric object
confusion_matrix: <ultralytics.utils.metrics.ConfusionMatrix object at 0x7f29444a6160>
fitness: 0.2200792007977212
keys: ['metrics/precision(B)', 'metrics/recall(B)', 'metrics/mAP50(B)', 'metrics/mAP50-95(B)']
maps: array([     0.3858,    0.045482,     0.17868,     0.19688])
names: {0: '0', 1: '1', 2: '2', 3: '3'}
plot: True
results_dict: {'metrics/precision(B)': 0.3085945346819713, 'metrics/recall(B)': 0.6096801844318783, 'metrics/mAP50(B)': 0.38537524197395795, 'metrics/mAP50-95(B)': 0.20171297400036153, 'fitness': 0.2200792007977212}
save_dir: PosixPath('runs/detect/train')
speed: {'preprocess': 0.1718282699584961, 'inference': 1.0261774063110352, 'loss': 0.001049041748046875, 'postprocess': 1.3030767440795898}
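
The per-class `maps` array above is easier to read with class names attached. The sketch below pairs each value with a name, assuming the class indices follow the same order as the `decoding_of_predictions` mapping used in the submission section of this notebook, and reports the weakest class. The two training stages were validated against different `data.yaml` files, so their scores are probably not directly comparable.

```python
# Class names, assumed to follow the index order of the dataset .yaml file
class_names = {
    0: "undamagedresidentialbuilding",
    1: "undamagedcommercialbuilding",
    2: "damagedresidentialbuilding",
    3: "damagedcommercialbuilding",
}
# Per-class mAP50-95 copied from the `maps` array of the fine-grained stage
maps_fine = [0.3858, 0.045482, 0.17868, 0.19688]

per_class = {class_names[i]: value for i, value in enumerate(maps_fine)}
weakest = min(per_class, key=per_class.get)

# Print classes from weakest to strongest
for name, value in sorted(per_class.items(), key=lambda item: item[1]):
    print(f"{name:32s} {value:.3f}")
print("weakest class:", weakest)
```

A class that scores far below the others is a candidate for collecting and annotating more examples, which ties back to the class-imbalance concern in the pipeline description.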

## Results and Visualization

In [None]:
from IPython.display import Image

Image(filename='runs/detect/train/confusion_matrix.png', width=600)

In [None]:
Image(filename='runs/detect/train/results.png', width=600)

In [None]:
Image(filename='runs/detect/train/val_batch0_pred.jpg', width=600)

In [None]:
# Check the metrics provided
import pandas as pd

df = pd.read_csv('runs/detect/train/results.csv')
df.tail()

In [None]:
import matplotlib.pyplot as plt

# Column names in results.csv are padded with spaces, so strip them first
df.columns = df.columns.str.strip()

epochs = df['epoch']
mAP50_B = df['metrics/mAP50(B)']
mAP50_95_B = df['metrics/mAP50-95(B)']

fig, ax = plt.subplots(figsize=(5, 4))
ax.plot(epochs, mAP50_B, label='mAP50(B)')
ax.plot(epochs, mAP50_95_B, label='mAP50-95(B)')
ax.set_ylabel('mAP')
ax.set_xlabel('Epoch')
ax.legend()
fig.suptitle('mAP50(B) and mAP50-95(B) vs Epoch')
plt.show()
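
Besides plotting, it can help to read off the epoch with the highest validation mAP50-95. The sketch below does this with the standard library only. It assumes `results.csv` has an `epoch` column and a `metrics/mAP50-95(B)` column whose names may be padded with spaces. The helper name `best_epoch` and the sample rows are hypothetical and are not results from this notebook.

```python
import csv
import io


def best_epoch(csv_file, metric="metrics/mAP50-95(B)"):
    """Return (epoch, value) for the row with the highest value of `metric`."""
    # Strip the space padding from both column names and values
    rows = [
        {key.strip(): value.strip() for key, value in row.items()}
        for row in csv.DictReader(csv_file)
    ]
    best = max(rows, key=lambda row: float(row[metric]))
    return int(float(best["epoch"])), float(best[metric])


# Hypothetical sample mimicking the padded layout of results.csv
sample = io.StringIO(
    "                  epoch,       metrics/mAP50(B),    metrics/mAP50-95(B)\n"
    "                      1,                0.21000,                0.10000\n"
    "                      2,                0.35000,                0.19000\n"
    "                      3,                0.33000,                0.18000\n"
)
print(best_epoch(sample))
```

On a real run, pass an open file instead: `with open('runs/detect/train/results.csv') as f: print(best_epoch(f))`.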

## Phase 1 Submission

In [12]:
import os
import zipfile

def unzip_folder(zip_filepath, dest_dir):
    with zipfile.ZipFile(zip_filepath, 'r') as zip_ref:
        zip_ref.extractall(dest_dir)
    print(f'The zip file {zip_filepath} has been extracted to the directory {dest_dir}')

# Unzip the submission images (skipped if they are already extracted)
submission_zip = './challenge_1_submission_images.zip'
submission_directory = './challenge_1_submission_images'
if not os.path.isdir(submission_directory):
    unzip_folder(submission_zip, submission_directory)

In [13]:
# Load the Model
model = YOLO('runs/detect/train/weights/best.pt')

In [14]:
# Map class indices to class names, in the order given in the dataset .yaml file
decoding_of_predictions = {
    0: 'undamagedresidentialbuilding',
    1: 'undamagedcommercialbuilding',
    2: 'damagedresidentialbuilding',
    3: 'damagedcommercialbuilding',
}

# Directory with the images to predict on
# directory = 'challenge_1_submission_images/Validation_Data_JPEG'  # alternative folder name
directory = 'challenge_1_submission_images/Submission data'
# Directory to store the output .txt files
results_directory = 'Validation_Data_Results'

# Create the results directory if it doesn't exist
os.makedirs(results_directory, exist_ok=True)

# Loop through each file in the directory
for filename in os.listdir(directory):
    file_path = os.path.join(directory, filename)
    # Skip anything that is not a .jpg file
    if not (os.path.isfile(file_path) and filename.lower().endswith('.jpg')):
        continue
    print(file_path)
    print("Making a prediction on ", filename)
    results = model.predict(file_path, save=True, iou=0.5, save_txt=True, conf=0.25)

    # A single input image yields a single Results object
    r = results[0]
    bounding_boxes = r.boxes.xyxy.cpu().numpy()
    confidences = r.boxes.conf.cpu().numpy().tolist()
    class_names = [decoding_of_predictions[int(c)] for c in r.boxes.cls.cpu().numpy().tolist()]

    # Check that bounding boxes, confidences and class names match
    if len(bounding_boxes) != len(confidences) or len(bounding_boxes) != len(class_names):
        print("Error: Number of bounding boxes, confidences, and class names should be the same.")
        continue
    text_file_name = os.path.splitext(filename)[0]
    # Create a .txt file for each image in results_directory
    with open(os.path.join(results_directory, f"{text_file_name}.txt"), "w") as file:
        for i in range(len(bounding_boxes)):
            # Get coordinates of each bounding box
            left, top, right, bottom = bounding_boxes[i]
            # One detection per line: class confidence left top right bottom
            file.write(f"{class_names[i]} {confidences[i]} {left} {top} {right} {bottom}\n")
    print("Output files generated successfully.")


image 1/1 /home/tham/Desktop/delete/EY/challenge_1_submission_images/Submission data/Validation_Post_Event_006.jpg: 512x512 32 0s, 1 1, 6 2s, 5.3ms
Speed: 0.6ms preprocess, 5.3ms inference, 1.3ms postprocess per image at shape (1, 3, 512, 512)
Results saved to runs/detect/predict
1 label saved to runs/detect/predict/labels



challenge_1_submission_images/Submission data/Validation_Post_Event_006.jpg
Making a prediction on  Validation_Post_Event_006.jpg
Output files generated successfully.
challenge_1_submission_images/Submission data/Validation_Post_Event_012.jpg
Making a prediction on  Validation_Post_Event_012.jpg


image 1/1 /home/tham/Desktop/delete/EY/challenge_1_submission_images/Submission data/Validation_Post_Event_012.jpg: 512x512 23 0s, 2 1s, 5 2s, 9.8ms
Speed: 0.7ms preprocess, 9.8ms inference, 2.9ms postprocess per image at shape (1, 3, 512, 512)
Results saved to runs/detect/predict
2 labels saved to runs/detect/predict/labels

image 1/1 /home/tham/Desktop/delete/EY/challenge_1_submission_images/Submission data/Validation_Post_Event_005.jpg: 512x512 32 0s, 2 2s, 8.5ms
Speed: 0.8ms preprocess, 8.5ms inference, 2.7ms postprocess per image at shape (1, 3, 512, 512)
Results saved to runs/detect/predict
3 labels saved to runs/detect/predict/labels

image 1/1 /home/tham/Desktop/delete/EY/challenge_1_submission_images/Submission data/Validation_Post_Event_002.jpg: 512x512 6 0s, 4 2s, 1 3, 8.3ms
Speed: 0.7ms preprocess, 8.3ms inference, 2.3ms postprocess per image at shape (1, 3, 512, 512)
Results saved to runs/detect/predict
4 labels saved to runs/detect/predict/labels



Output files generated successfully.
challenge_1_submission_images/Submission data/Validation_Post_Event_005.jpg
Making a prediction on  Validation_Post_Event_005.jpg
Output files generated successfully.
challenge_1_submission_images/Submission data/Validation_Post_Event_002.jpg
Making a prediction on  Validation_Post_Event_002.jpg
Output files generated successfully.
challenge_1_submission_images/Submission data/Validation_Post_Event_004.jpg
Making a prediction on  Validation_Post_Event_004.jpg
Output files generated successfully.
challenge_1_submission_images/Submission data/Validation_Post_Event_007.jpg
Making a prediction on  Validation_Post_Event_007.jpg
Output files generated successfully.
challenge_1_submission_images/Submission data/Validation_Post_Event_009.jpg
Making a prediction on  Validation_Post_Event_009.jpg


image 1/1 /home/tham/Desktop/delete/EY/challenge_1_submission_images/Submission data/Validation_Post_Event_009.jpg: 512x512 18 0s, 1 1, 3 2s, 3 3s, 6.8ms
Speed: 0.8ms preprocess, 6.8ms inference, 1.9ms postprocess per image at shape (1, 3, 512, 512)
Results saved to runs/detect/predict
7 labels saved to runs/detect/predict/labels

image 1/1 /home/tham/Desktop/delete/EY/challenge_1_submission_images/Submission data/Validation_Post_Event_010.jpg: 512x512 11 0s, 3 1s, 2 2s, 2 3s, 7.2ms
Speed: 0.7ms preprocess, 7.2ms inference, 2.7ms postprocess per image at shape (1, 3, 512, 512)
Results saved to runs/detect/predict
8 labels saved to runs/detect/predict/labels

image 1/1 /home/tham/Desktop/delete/EY/challenge_1_submission_images/Submission data/Validation_Post_Event_011.jpg: 512x512 7 0s, 2 1s, 2 2s, 6.8ms
Speed: 0.7ms preprocess, 6.8ms inference, 1.7ms postprocess per image at shape (1, 3, 512, 512)
Results saved to runs/detect/predict
9 labels saved to runs/detec

Output files generated successfully.
challenge_1_submission_images/Submission data/Validation_Post_Event_010.jpg
Making a prediction on  Validation_Post_Event_010.jpg
Output files generated successfully.
challenge_1_submission_images/Submission data/Validation_Post_Event_011.jpg
Making a prediction on  Validation_Post_Event_011.jpg
Output files generated successfully.
challenge_1_submission_images/Submission data/Validation_Post_Event_003.jpg
Making a prediction on  Validation_Post_Event_003.jpg
Output files generated successfully.
challenge_1_submission_images/Submission data/Validation_Post_Event_008.jpg
Making a prediction on  Validation_Post_Event_008.jpg
Output files generated successfully.
challenge_1_submission_images/Submission data/Validation_Post_Event_001.jpg
Making a prediction on  Validation_Post_Event_001.jpg


image 1/1 /home/tham/Desktop/delete/EY/challenge_1_submission_images/Submission data/Validation_Post_Event_001.jpg: 512x512 23 0s, 14 2s, 6.2ms
Speed: 0.7ms preprocess, 6.2ms inference, 1.5ms postprocess per image at shape (1, 3, 512, 512)
Results saved to runs/detect/predict
12 labels saved to runs/detect/predict/labels


Output files generated successfully.
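
Each line written by the loop above has the form `class confidence left top right bottom`. Before zipping, it may be worth sanity-checking the generated files. The sketch below parses one such line and flags obvious problems. The helper names and the example line are hypothetical, and the checks are illustrative rather than the official challenge validation.

```python
VALID_CLASSES = {
    "undamagedresidentialbuilding",
    "undamagedcommercialbuilding",
    "damagedresidentialbuilding",
    "damagedcommercialbuilding",
}


def parse_detection_line(line):
    """Parse 'class confidence left top right bottom' into a dict."""
    name, conf, left, top, right, bottom = line.split()
    return {
        "class_name": name,
        "confidence": float(conf),
        "box": (float(left), float(top), float(right), float(bottom)),
    }


def check_detection(det):
    """Return a list of problems found in one parsed detection (empty if none)."""
    problems = []
    if det["class_name"] not in VALID_CLASSES:
        problems.append("unknown class name")
    if not 0.0 <= det["confidence"] <= 1.0:
        problems.append("confidence outside [0, 1]")
    left, top, right, bottom = det["box"]
    if not (left < right and top < bottom):
        problems.append("degenerate bounding box")
    return problems


# Hypothetical example line in the format written above
example = "damagedresidentialbuilding 0.71 10.5 20.0 110.25 95.0"
print(check_detection(parse_detection_line(example)))
```

Running `check_detection` over every line of every file in `Validation_Data_Results` gives a quick check that nothing malformed ends up in the submission.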


In [15]:
import shutil

# Define your source directory and the destination where the zip file will be created
source_dir = results_directory
destination_zip = 'submission'

# Create a zip file from the directory
shutil.make_archive(destination_zip, 'zip', source_dir)

print(f"Directory {source_dir} has been successfully zipped into {destination_zip}.")

Directory Validation_Data_Results has been successfully zipped into submission.
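
`shutil.make_archive` appends the `.zip` extension itself, so the archive created above is `submission.zip`. As a final check, the sketch below lists the `.txt` files stored in an archive so their count can be compared with the number of submission images. The function name is illustrative.

```python
import zipfile


def list_submission_files(zip_path):
    """Return the sorted names of the .txt files stored in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(name for name in archive.namelist() if name.endswith(".txt"))
```

For example, `len(list_submission_files("submission.zip"))` should equal the number of images that were predicted on.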
