# Download datasets

## 2018 Dataset

If the next two cells fail, you can download the dataset manually [here](https://mycityreport.s3-ap-northeast-1.amazonaws.com/02_RoadDamageDataset/public_data/Japan/CACAIE2018/RoadDamageDataset.tar.gz) and extract it yourself.

In [None]:
import urllib.request

print('Downloading the 2018 GRDDC Dataset (1.7 GB) ...')

url = 'https://mycityreport.s3-ap-northeast-1.amazonaws.com/02_RoadDamageDataset/public_data/Japan/CACAIE2018/RoadDamageDataset.tar.gz'
urllib.request.urlretrieve(url, 'RDD_Dataset_2018.tar.gz')

In [None]:
print('Extracting file ...\n')
!tar -zxf 'RDD_Dataset_2018.tar.gz'
!mv RoadDamageDataset RDD_Dataset_2018
print('Extraction done !\n')
!rm RDD_Dataset_2018.tar.gz
print('RDD_Dataset_2018.tar.gz removed.')
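
The extraction cell above relies on the shell commands `tar` and `mv`, which may not be available on every platform. As a rough alternative, the sketch below does the same extract-and-rename step with Python's standard `tarfile` module. The helper name `extract_and_rename` and its parameters are illustrative and not part of the original notebook. Use it only on archives you trust, since extracting a tarball can write arbitrary paths.

```python
import os
import tarfile


def extract_and_rename(archive_path, extracted_name, target_name, dest_dir="."):
    """Extract a .tar.gz archive into dest_dir and rename its top-level folder.

    Hypothetical helper: mirrors `tar -zxf` followed by `mv`.
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest_dir)
    src = os.path.join(dest_dir, extracted_name)
    dst = os.path.join(dest_dir, target_name)
    os.rename(src, dst)
    return dst


# Example call for the 2018 archive (commented out so the sketch has no side effects):
# extract_and_rename('RDD_Dataset_2018.tar.gz', 'RoadDamageDataset', 'RDD_Dataset_2018')
```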

## 2020 Dataset

If the next two cells fail, you can download the dataset manually [here](https://md-datasets-public-files-prod.s3.eu-west-1.amazonaws.com/7b38c0a4-9c9a-4aa7-8c45-290b70c36262) and extract it yourself.

In [None]:
import urllib.request

print('Downloading the 2020 GRDDC Dataset (1.1 GB)')

url = 'https://md-datasets-public-files-prod.s3.eu-west-1.amazonaws.com/7b38c0a4-9c9a-4aa7-8c45-290b70c36262'
urllib.request.urlretrieve(url, 'RDD_Dataset_2020.tar.gz')

In [None]:
print('Extracting file ...\n')
!tar -zxf 'RDD_Dataset_2020.tar.gz'
!mv train RDD_Dataset_2020
print('Extraction done !\n')
!rm RDD_Dataset_2020.tar.gz
print('RDD_Dataset_2020.tar.gz removed.')

# Convert the directory structure and label format of the datasets to be compatible with YoloV5

Imports

In [None]:
from xml.etree import ElementTree
from xml.dom import minidom
import collections
import os

import matplotlib.pyplot as plt
import matplotlib as matplot
import seaborn as sns
%matplotlib inline

import torch
from IPython.display import Image  # for displaying images

print('torch %s %s' % (torch.__version__, torch.cuda.get_device_properties(0) if torch.cuda.is_available() else 'CPU'))

## 2018 Dataset

In [None]:
base_path = "RDD_Dataset_2018/"
damageTypes=["D00", "D01", "D10", "D11", "D20", "D40", "D43", "D44"]
damageType_to_class = {"D00":0,"D01":1, "D10":2, "D11":3, "D20":4, "D40":5, "D43":6, "D44":7}
# govs lists the municipality names (one sub-folder per municipality).
govs = ["Adachi", "Chiba", "Ichihara", "Muroran", "Nagakute", "Numazu", "Sumida"]

In [None]:
!mkdir RDD_Dataset_2018_Yolo
!mkdir RDD_Dataset_2018_Yolo/images
!mkdir RDD_Dataset_2018_Yolo/labels
!mkdir RDD_Dataset_2018_Yolo/images/train
!mkdir RDD_Dataset_2018_Yolo/images/val
!mkdir RDD_Dataset_2018_Yolo/labels/train
!mkdir RDD_Dataset_2018_Yolo/labels/val
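
Plain `mkdir` reports an error when a directory already exists, for example when the cell is run a second time. A possible Python equivalent that can be re-run safely is sketched below. The helper name `make_yolo_dirs` is an assumption for illustration and does not appear in the original notebook.

```python
import os


def make_yolo_dirs(root):
    """Create root/{images,labels}/{train,val}; safe to call repeatedly."""
    created = []
    for kind in ("images", "labels"):
        for phase in ("train", "val"):
            path = os.path.join(root, kind, phase)
            # exist_ok=True makes the call a no-op if the folder is already there
            os.makedirs(path, exist_ok=True)
            created.append(path)
    return created


# Example call (commented out so the sketch has no side effects):
# make_yolo_dirs("RDD_Dataset_2018_Yolo")
```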

In [None]:
PATH_IMGS = "RDD_Dataset_2018_Yolo/images/"
PATH_LABELS = "RDD_Dataset_2018_Yolo/labels/"

Copy the train and val images into the directory layout expected by YoloV5

In [None]:
from shutil import copy

for gov in govs:
    for phase in ['train', 'val']:
        # Each line of the split file is an image name without its extension.
        with open(base_path + gov + "/ImageSets/Main/" + phase + ".txt") as file:
            for line in file:
                img_name = line.rstrip('\n')
                img_path = base_path + gov + "/JPEGImages/" + img_name + ".jpg"
                copy(img_path, PATH_IMGS + phase + "/")

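Convert the labels to the YoloV5 format: one text file per image, with one `class x_center y_center width height` line per bounding box and all coordinates normalised by the image size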

In [None]:
for gov in govs:
    
    for phase in ['train','val'] :

        file_list = []

        file_for_names = open(base_path + gov + "/ImageSets/Main/{}.txt".format(phase))
        for line in file_for_names :
            img_name = line.rstrip('\n')
            file_list.append(img_name)
        file_for_names.close()

        for file in file_list:
            if file =='.DS_Store':
                pass
            else:
                infile_xml = open(base_path + gov + '/Annotations/' + file+'.xml')
                tree = ElementTree.parse(infile_xml)
                root = tree.getroot()
                # 'w' rather than 'a', so that re-running the cell does not duplicate boxes
                file_txt = open(PATH_LABELS+phase+'/'+file+'.txt','w')
                for obj in root.iter('object'):
                    cls_name = obj.find('name').text

                    if cls_name == 'D30' :
                        pass
                    else :

                        xmlbox = obj.find('bndbox')
                        xmin = int(xmlbox.find('xmin').text)
                        xmax = int(xmlbox.find('xmax').text)
                        ymin = int(xmlbox.find('ymin').text)
                        ymax = int(xmlbox.find('ymax').text)

                        x_center = 0.5*(xmin + xmax)
                        y_center = 0.5*(ymin + ymax)
                        width = xmax - xmin
                        height = ymax - ymin
                        
                        # Normalise to [0, 1]; the divisor assumes 600x600 pixel images
                        x_center, y_center, width, height = round(x_center/600,5), round(y_center/600,5), round(width/600,5), round(height/600,5)

                        class_number = damageType_to_class[cls_name]

                        file_txt.write(str(class_number)+' '+str(x_center)+' '+str(y_center)+' '+str(width)+' '+str(height)+'\n')
                file_txt.close()
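
To make the conversion above concrete, here is a small worked example of the same arithmetic, pulled out into a standalone function. The name `voc_to_yolo` and the sample box are made up for illustration. The image size is passed explicitly rather than hard-coded to 600.

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, img_width, img_height):
    """Convert a corner-style box (pixels) to normalised centre/size values."""
    x_center = 0.5 * (xmin + xmax) / img_width
    y_center = 0.5 * (ymin + ymax) / img_height
    width = (xmax - xmin) / img_width
    height = (ymax - ymin) / img_height
    # Same 5-decimal rounding as in the notebook cell
    return tuple(round(v, 5) for v in (x_center, y_center, width, height))


# A 200x200 pixel box with corners (100, 200) and (300, 400) in a 600x600 image
print(voc_to_yolo(100, 200, 300, 400, 600, 600))
# → (0.33333, 0.5, 0.33333, 0.33333)
```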

## 2020 Dataset

Imports

In [None]:
import os
import random
from shutil import copy

In [None]:
countries = ['Czech','India','Japan']
base_path = "RDD_Dataset_2020/"
damageType_to_class = {"D00":0,"D10":1, "D20":2, "D40":3}
damageTypes = ['D00','D10','D20','D40']
class_dict = {'D00':0,'D10':0,'D20':0,'D40':0}

Since labels are only available for the train dataset, we have to split it into two sets: a train set and a validation set.
The split is done at random, separately for each country.

In [None]:
file_list = []
for country in countries :
    file_list_country = os.listdir('RDD_Dataset_2020/{}/images'.format(country))
    random.shuffle(file_list_country)
    file_list.append(file_list_country)
    print("Number of images in "+country+" : "+str(len(file_list_country)))

In [None]:
!mkdir RDD_Dataset_2020_Yolo
!mkdir RDD_Dataset_2020_Yolo/images
!mkdir RDD_Dataset_2020_Yolo/labels
!mkdir RDD_Dataset_2020_Yolo/images/train
!mkdir RDD_Dataset_2020_Yolo/images/val
!mkdir RDD_Dataset_2020_Yolo/labels/train
!mkdir RDD_Dataset_2020_Yolo/labels/val

In [None]:
PROPORTION_TRAIN = 0.9 # Proportion of the images used for training
PATH_IMGS = "RDD_Dataset_2020_Yolo/images/"
PATH_LABELS = "RDD_Dataset_2020_Yolo/labels/"

In [None]:
file_list_train = []
file_list_val = []
for i in range(len(countries)) :
    file_list_train.append(file_list[i][:int(PROPORTION_TRAIN*len(file_list[i]))])
    file_list_val.append(file_list[i][int(PROPORTION_TRAIN*len(file_list[i])):])
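
Because the shuffle above is unseeded, every run of the notebook produces a different train/val split. If a reproducible split is wanted, one option is to shuffle with a seeded `random.Random` instance, as in the sketch below. The function name `split_train_val` and the default seed are assumptions for illustration and are not part of the original notebook.

```python
import random


def split_train_val(files, proportion_train=0.9, seed=0):
    """Shuffle files deterministically and cut them into (train, val) lists."""
    rng = random.Random(seed)   # local generator, leaves the global state alone
    shuffled = sorted(files)    # sort first so the listing order does not matter
    rng.shuffle(shuffled)
    cut = int(proportion_train * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Example with made-up file names
train, val = split_train_val([f"img_{i}.jpg" for i in range(10)])
print(len(train), len(val))
# → 9 1
```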

In [None]:
phases = ['train','val']
file_list_train_val = [file_list_train,file_list_val]
for (j,phase) in enumerate(phases) :
    file_list_phase = file_list_train_val[j]
    for (i,country) in enumerate(countries) :
        file_list_country = file_list_phase[i]

        ################################### FOR THE LABELS ###################################
        for file in file_list_country:
            file_name = os.path.splitext(file)[0]  # strip only the extension, even if the name contains other dots
            infile_xml = open(base_path + country + '/annotations/xmls/' + file_name +'.xml')
            tree = ElementTree.parse(infile_xml)
            root = tree.getroot()
            file_txt = open(PATH_LABELS+phase+'/'+file_name+'.txt','w')

            for obj in root.iter('size'):
                img_height = int(obj.find('height').text)
                img_width = int(obj.find('width').text)

            nb_boxes_img = 0
            for obj in root.iter('object'):
                cls_name = obj.find('name').text
                if cls_name not in damageTypes :
                    pass
                else :
                    class_dict[cls_name]+=1
                    nb_boxes_img += 1
                    xmlbox = obj.find('bndbox')
                    xmin = int(xmlbox.find('xmin').text)
                    xmax = int(xmlbox.find('xmax').text)
                    ymin = int(xmlbox.find('ymin').text)
                    ymax = int(xmlbox.find('ymax').text)

                    x_center = 0.5*(xmin + xmax)
                    y_center = 0.5*(ymin + ymax)
                    width = xmax - xmin
                    height = ymax - ymin
                    
                    x_center, y_center, width, height = round(x_center/img_width,5), round(y_center/img_height,5), round(width/img_width,5), round(height/img_height,5)

                    class_number = damageType_to_class[cls_name]

                    file_txt.write(str(class_number)+' '+str(x_center)+' '+str(y_center)+' '+str(width)+' '+str(height)+'\n')
            file_txt.close()
            ################################ FOR THE IMAGES ########################################
            img_path = base_path + country + '/images/' + file
            phase_path = PATH_IMGS+phase+'/'
            copy(img_path,phase_path)
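
After the conversion it can be worth checking that the generated label files are well formed before starting a long training run. The sketch below validates a single label line: five fields, an integer class index within range, and four coordinates in [0, 1]. The function name `check_yolo_label_line` is hypothetical, and the check is a minimal illustration rather than a full validator.

```python
def check_yolo_label_line(line, num_classes):
    """Return True if `line` looks like 'class x_center y_center width height'."""
    parts = line.split()
    if len(parts) != 5:
        return False
    try:
        class_id = int(parts[0])
        coords = [float(p) for p in parts[1:]]
    except ValueError:
        return False
    # Class index must be valid and every coordinate must be normalised
    return 0 <= class_id < num_classes and all(0.0 <= c <= 1.0 for c in coords)


print(check_yolo_label_line("2 0.5 0.5 0.1 0.2", num_classes=4))
# → True
print(check_yolo_label_line("4 0.5 0.5 0.1 0.2", num_classes=4))
# → False
```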

# Clone YoloV5 GitHub repository to start training

In [None]:
!git clone https://github.com/ultralytics/yolov5  # clone repository
%cd yolov5
# Uncomment the next line if a newer version of YoloV5 breaks the code in this notebook
# !git reset --hard '79af114'
%pip install -qr requirements.txt  # install dependencies
import torch
from IPython.display import Image, clear_output  # to display images

clear_output()
print(f"Setup complete. Using torch {torch.__version__} ({torch.cuda.get_device_properties(0).name if torch.cuda.is_available() else 'CPU'})")

In [None]:
!mv ../rdd2018.yaml data/
!mv ../rdd2020.yaml data/
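
The contents of `rdd2018.yaml` and `rdd2020.yaml` are not shown in this notebook. As a rough guide, a YoloV5 dataset file typically lists the train and val image folders, the number of classes and the class names. The sketch below builds such a file as text. The helper `build_dataset_yaml`, the relative path `../RDD_Dataset_2018_Yolo` and the exact set of keys are assumptions, and the accepted keys can differ between YoloV5 versions.

```python
def build_dataset_yaml(dataset_root, class_names):
    """Return the text of a minimal dataset file (assumed layout, see above)."""
    lines = [
        f"train: {dataset_root}/images/train/",
        f"val: {dataset_root}/images/val/",
        f"nc: {len(class_names)}",          # number of classes
        f"names: {list(class_names)}",      # class names, in class-index order
    ]
    return "\n".join(lines) + "\n"


# Class order taken from damageType_to_class for the 2018 dataset
classes_2018 = ["D00", "D01", "D10", "D11", "D20", "D40", "D43", "D44"]
print(build_dataset_yaml("../RDD_Dataset_2018_Yolo", classes_2018))
```

The returned text can be written to `data/rdd2018.yaml` with an ordinary `open(..., 'w')` call if the file is missing.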

In [None]:
# Tensorboard  (optional)

# %load_ext tensorboard
# %tensorboard --logdir runs/train

In [None]:
# Weights & Biases  (optional)

# %pip install -q wandb
# import wandb
# wandb.login(key='xxx') # After registering with WandB you will have access to your key, which you can put in place of xxx

If you run into problems when training or testing with YoloV5, please refer to the [YoloV5 wiki](https://github.com/ultralytics/yolov5/wiki).

## Train

Every train run is saved in yolov5/runs/train/

Train with default hyperparameters and predefined weights

In [None]:
!python train.py --img 608 --batch 16 --epochs 15 --data rdd2018.yaml --weights 'yolov5x.pt'

Train from scratch using yolov5 architecture and randomly initialised weights

In [None]:
!python train.py --img 608 --batch 16 --epochs 15 --data rdd2018.yaml --weights '' --cfg 'yolov5x.yaml'

Use hyperparameter evolution to optimise the training hyperparameters

In [None]:
!python train.py --img 608 --batch 16 --epochs 15 --data rdd2018.yaml --weights 'yolov5x.pt' --evolve 50

## Test

Every test run launched with `val.py` is saved in yolov5/runs/val/

Test a neural network on a test dataset

In [None]:
!python val.py --weights 'best_2018_608.pt' --data rdd2018.yaml --img 608

Test a neural network on a test dataset with Test-Time Augmentation

In [None]:
!python val.py --weights 'best_2018_608.pt' --data rdd2018.yaml --img 608 --augment

Use model ensembling to make predictions on a test dataset

In [None]:
!python val.py --weights 'best_2018_608.pt' 'best_2018_448.pt' --data rdd2018.yaml --img 608 --augment

## Detect

This section produces images with the predicted bounding boxes drawn on them

Every detect run is saved in yolov5/runs/detect/

In [None]:
!python detect.py --weights 'best_2018_608.pt' --data rdd2018.yaml --img 608 --augment
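
Each detect run writes to a new sub-folder of `runs/detect/`, so it can be handy to locate the most recent one before displaying its images. The sketch below picks the most recently modified run folder. The helper name `latest_run_dir` is hypothetical, and the `exp*` folder naming is an assumption about YoloV5's default run names that may not hold if a custom run name is used.

```python
import glob
import os


def latest_run_dir(runs_root="runs/detect"):
    """Return the most recently modified run folder under runs_root, or None."""
    candidates = [d for d in glob.glob(os.path.join(runs_root, "exp*"))
                  if os.path.isdir(d)]
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


# Example use inside the notebook (commented out, needs an existing run):
# run_dir = latest_run_dir()
# print(os.listdir(run_dir))
```

An image from that folder can then be shown with the `Image` class already imported from `IPython.display`.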