### **Traffic Sign Detection - Project**

- **Author**: Uyen Nguyen
- **Date**: 2023/09/29
- **Course**: AI Vietnam - Course 2023
- **Module**: Machine Learning

#### **I. Introduction**

***Traffic Sign Detection*** is a problem that applies Object Detection algorithms to detect traffic signs on the road. A Traffic Sign Detection program typically has two parts: locating the signs and recognizing (classifying) each sign. A high-accuracy program therefore needs both parts to be well built.

In this project, we will build a Traffic Sign Detection program using a Support Vector Machine (SVM). The input and output of the program are defined as follows:
- **Input**: A picture of a traffic sign
- **Output**: The location (bounding-box coordinates) and class of the sign in the picture

#### **II. Program Implementation**

In this part, we build the program that detects different types of traffic signs. To keep the construction easy to follow, this part is split into two sections, one for each main module: "*Building a classification model using SVM*" and "*Building an Object Detection model using the sliding window technique*".

##### **1. Traffic Sign Classification Model Using SVM**

##### **a. Import necessary libraries**

In [2]:
# Import necessary libraries
import time
import os
import cv2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import xml.etree.ElementTree as ET

from skimage.transform import resize
from skimage import feature
from sklearn.svm import SVC
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

##### **b. Data Loading**

We begin by reading the picture files and their associated labels into two lists, corresponding to X and y in our traffic sign classification problem. The data folder contains two subfolders:
- **images**: Folder with the pictures
- **annotations**: Folder with the .xml label files, which contain the coordinates and class of each object in the corresponding picture

First, we declare the (relative) paths to the two folders and create two empty lists to store the pictures and labels as we go:

In [3]:
# Declare the paths to the annotations and images folders
annotations_dir = "../data/traffic_sign_detection/annotations"
img_dir         = "../data/traffic_sign_detection/images"

# Create two empty lists to store images and labels
label_lst = []
img_lst  = []

The following code chunk consists of several steps that fill the `img_lst` and `label_lst` lists above:


1. First, we iterate over each .xml file in the **annotations** folder. To list every file name in a folder, we use the `os.listdir()` function. To build the complete path to an .xml file, we use `os.path.join(path1, path2)` to join the annotations folder and the file name.

2. Examining a sample .xml file in the annotations folder, we can see the following information:

    ```
    <annotation>
        <folder>images</folder>
        <filename>road0.png</filename>
        <size>
            <width>267</width>
            <height>400</height>
            <depth>3</depth>
        </size>
        <segmented>0</segmented>
        <object>
            <name>trafficlight</name>
            <pose>Unspecified</pose>
            <truncated>0</truncated>
            <occluded>0</occluded>
            <difficult>0</difficult>
            <bndbox>
                <xmin>98</xmin>
                <ymin>62</ymin>
                <xmax>208</xmax>
                <ymax>232</ymax>
            </bndbox>
        </object>
    </annotation>
    ```
    An .xml file holds several pieces of information about the picture. For this project, the most important are the coordinates and the class name of each object (in this case, the traffic sign), so we focus on the <**name**> and <**bndbox**> entities. Inside an <*object*> entity, <*name*> gives its class and <*bndbox*> gives its location (coordinates) in the picture. To read the content of an .xml file in Python, we can use the `xml.etree.ElementTree` module (imported as `ET`), as demonstrated in the code.

3. The xml module also allows us to work with the individual entities in an .xml file. After getting the root, we can search for child entities of the root and extract their information. For example, we can read the <*filename*> entity to get the name of the image file.

4. Similarly, we can read the <*name*> of each <*object*>. Because a single picture can contain several objects, we use a loop to go through every object.

5. Finally, we read the coordinates in <*bndbox*>, crop the object out of the picture, and store the crop in `img_lst`. We also save the `classname` in `label_lst`.

In [16]:
for xml_file in os.listdir(annotations_dir):
    # Step 1: Connect folder annotations and file name together
    xml_filepath = os.path.join(annotations_dir, xml_file)

    # Step 2: Implement the xml module to read the content of the .xml file
    tree = ET.parse(xml_filepath)
    root = tree.getroot()

    # Step 3: Use xml module to get the image file name and create a path to the corresponding image file
    folder = root.find("folder").text
    img_filename = root.find("filename").text
    img_filepath = os.path.join(img_dir, img_filename)
    img = cv2.imread(img_filepath)

    # Step 4: Get the class name of each object in the picture
    #         Because we only care about traffic signs, we skip the class "trafficlight" if encountered in the dataset
    for obj in root.findall("object"):
        classname = obj.find("name").text
        if classname == "trafficlight":
            continue
        
        # Step 5: Get information about the coordinate of the boundary box
        xmin = int(obj.find("bndbox/xmin").text)
        ymin = int(obj.find("bndbox/ymin").text)
        xmax = int(obj.find("bndbox/xmax").text)
        ymax = int(obj.find("bndbox/ymax").text)

        # Extract the object inside the picture
        object_img = img[ymin:ymax, xmin:xmax]

        # Append the cropped object to img_lst
        img_lst.append(object_img)

        # Save the corresponding classname
        label_lst.append(classname)

With `xmin`, `ymin`, `xmax`, and `ymax`, we can extract the object from the original picture by array slicing. Note that the rows (y) are sliced first and the columns (x) second, because an image array is indexed as `img[row, column]`. Finally, we store the cropped object in `img_lst` and its class name in `label_lst`, as in the code above.
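
The parsing and cropping steps can be tried in isolation on the sample annotation shown earlier. The following is a minimal sketch, assuming the XML is held in a string and using a blank NumPy array as a stand-in for the real `road0.png` (the names `sample_xml` and `fake_img` are illustrative and not part of the project code):

```python
import xml.etree.ElementTree as ET
import numpy as np

# Abbreviated copy of the sample annotation (only the fields used here)
sample_xml = """
<annotation>
    <filename>road0.png</filename>
    <size><width>267</width><height>400</height><depth>3</depth></size>
    <object>
        <name>trafficlight</name>
        <bndbox>
            <xmin>98</xmin><ymin>62</ymin><xmax>208</xmax><ymax>232</ymax>
        </bndbox>
    </object>
</annotation>
"""

root = ET.fromstring(sample_xml)

# Stand-in image with the annotated size: (height, width, channels)
height = int(root.find("size/height").text)
width = int(root.find("size/width").text)
fake_img = np.zeros((height, width, 3), dtype=np.uint8)

obj = root.find("object")
classname = obj.find("name").text
xmin, ymin, xmax, ymax = (
    int(obj.find(f"bndbox/{tag}").text)
    for tag in ("xmin", "ymin", "xmax", "ymax")
)

# Rows (y) come first, columns (x) second
crop = fake_img[ymin:ymax, xmin:xmax]

print(classname)   # trafficlight
print(crop.shape)  # (170, 110, 3)
```

The crop is 170 rows high (`ymax - ymin`) and 110 columns wide (`xmax - xmin`), which matches the bounding box in the annotation.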

As a sanity check, we print the number of objects and the names of the classes in the dataset used to train the traffic sign detection algorithm:

In [5]:
print("Number of objects:", len(img_lst))
print("Class names", list(set(label_lst)))

Number of objects: 1074
Class names ['crosswalk', 'speedlimit', 'stop']


Across all pictures in the dataset, 1074 objects were extracted (after skipping `trafficlight`). They belong to three classes: `crosswalk`, `speedlimit`, and `stop`.
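
Beyond the set of class names, it can be useful to see how many objects each class has, since a strongly imbalanced dataset makes plain accuracy harder to interpret. A minimal sketch using `collections.Counter`, with a short made-up list standing in for `label_lst` (the counts below are illustrative, not the real dataset's):

```python
from collections import Counter

# Hypothetical stand-in for label_lst; the real list has 1074 entries
sample_labels = ["speedlimit", "stop", "speedlimit", "crosswalk", "speedlimit"]

class_counts = Counter(sample_labels)
for name, count in class_counts.most_common():
    print(f"{name}: {count}")
```

Running the same two lines on the real `label_lst` would give the per-class totals for the dataset.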

##### **c. Image Preprocessing Function**

To help the SVM model achieve better accuracy, we build a preprocessing function that turns each input picture into a more useful representation. Specifically, we use the **HOG (Histogram of Oriented Gradients)** feature for this problem.

To compute the HOG feature, we use the `feature.hog()` function from the `skimage` library. The preprocessing function can be implemented as follows:

In [6]:
def preprocess_img(img):
    # Convert color (BGR) images to grayscale; grayscale inputs pass through
    if len(img.shape) > 2:
        img = cv2.cvtColor(
            img,
            cv2.COLOR_BGR2GRAY
        )
    img = img.astype(np.float32)

    # Resize to a fixed size so every HOG vector has the same length
    resized_img = resize(
        img,
        output_shape  = (32, 32),
        anti_aliasing = True
    )

    # Compute the HOG descriptor as a flat feature vector
    hog_feature = feature.hog(
        resized_img,
        orientations = 9,
        pixels_per_cell = (8, 8),
        cells_per_block = (2, 2),
        transform_sqrt = True,
        block_norm = "L2",
        feature_vector = True
    )

    return hog_feature

Besides computing HOG, the function converts the picture to grayscale and resizes it to (32, 32) beforehand. Because the cropped objects have different sizes, resizing them to a common size is necessary so that the HOG feature vectors of all pictures have the same shape.

##### **d. Preprocessing Inputs**

After defining `preprocess_img()`, we apply it to all of the input pictures as follows:

In [7]:
img_features_lst = []

for img in img_lst:
    hog_feature = preprocess_img(img)
    img_features_lst.append(hog_feature)

img_features = np.array(img_features_lst)

As a sanity check, we compare the shape of the first picture in the list before and after preprocessing:

In [8]:
print("Shape of the first image before preprocessing:", img_lst[0].shape)
print("Shape of the first image after preprocessing:", img_features[0].shape)

Shape of the first image before preprocessing: (42, 41, 3)
Shape of the first image after preprocessing: (324,)


Before preprocessing, the picture is a 42 x 41 pixel color image with 3 channels. After preprocessing, it has been transformed into a single feature vector with 324 elements.
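
The length 324 follows directly from the HOG parameters. A 32 x 32 image with 8 x 8 pixel cells has 4 x 4 cells; sliding a 2 x 2 cell block one cell at a time gives 3 x 3 block positions; each block contributes 2 x 2 x 9 = 36 values, for 9 x 36 = 324 in total. The small helper below (a hypothetical function written for illustration, not part of `skimage`) reproduces this arithmetic:

```python
def hog_feature_length(image_size, pixels_per_cell, cells_per_block, orientations):
    """Length of a flattened HOG vector, assuming blocks slide one cell at a time."""
    length = orientations
    for size, cell, block in zip(image_size, pixels_per_cell, cells_per_block):
        n_cells = size // cell           # cells along this axis
        n_blocks = n_cells - block + 1   # block positions along this axis
        length *= n_blocks * block
    return length

print(hog_feature_length((32, 32), (8, 8), (2, 2), 9))  # 324
```

The same helper shows how quickly the vector grows with the input size: a 64 x 64 input with the same settings would give 1764 values.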

##### **e. Encode Labels**

At this point, the labels are strings ("stop", "crosswalk", "speedlimit"). We need to convert them to numeric values for model training. Here, we use `LabelEncoder()` to map each class name to a corresponding integer 0, 1, or 2:

In [19]:
label_encoder = LabelEncoder()
encoded_labels = label_encoder.fit_transform(label_lst)
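
`LabelEncoder` assigns integers in the sorted order of the class names, and the fitted encoder can map predictions back to names with `inverse_transform()`. A small sketch on a made-up label list (the list `sample_labels` is illustrative):

```python
from sklearn.preprocessing import LabelEncoder

sample_labels = ["stop", "crosswalk", "speedlimit", "stop"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(sample_labels)

print(encoder.classes_.tolist())  # ['crosswalk', 'speedlimit', 'stop']
print(encoded.tolist())           # [2, 0, 1, 2]

# Map integer predictions back to class names
decoded = encoder.inverse_transform([0, 1, 2]).tolist()
print(decoded)                    # ['crosswalk', 'speedlimit', 'stop']
```

Keeping the fitted encoder around is what makes it possible to report class names, rather than integers, when the classifier is used for prediction later.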

##### **f. Dataset Splitting**

With the HOG feature vectors (X) and their corresponding labels (y), we now split the dataset into two sets, `train` for training and `val` for validation, with a ratio of 7:3.

In [11]:
random_state = 0
test_size = 0.3
is_shuffle = True

X_train, X_val, y_train, y_val = train_test_split(
    img_features, encoded_labels,
    test_size = test_size,
    random_state = random_state,
    shuffle = is_shuffle
)
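
With 1074 samples and `test_size = 0.3`, `train_test_split` rounds the validation size up, giving 323 validation and 751 training samples. The sketch below checks this on zero-filled placeholder arrays with the same shape as the real features (the arrays and dummy labels are stand-ins, not the project data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same shape as the real HOG features
X = np.zeros((1074, 324))
y = np.arange(1074) % 3  # dummy labels 0, 1, 2

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0, shuffle=True
)

print(X_train.shape, X_val.shape)  # (751, 324) (323, 324)
```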

##### **g. Data Normalization**

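To put all features on a comparable scale, which SVMs are sensitive to, we standardize the HOG feature vectors using `StandardScaler()`. The scaler is fitted on the `train` set only and then applied to the `val` set, so no information from the validation data leaks into training.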

In [12]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
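
`StandardScaler` subtracts each feature's mean and divides by its standard deviation, both estimated from the data passed to `fit`. The sketch below, on small random placeholder arrays, shows that the training set ends up with zero mean and unit variance per feature, while the validation set is transformed with the training statistics and so is only approximately standardized:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train_demo = rng.normal(loc=5.0, scale=2.0, size=(200, 4))
X_val_demo = rng.normal(loc=5.0, scale=2.0, size=(50, 4))

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_demo)  # learns mean_ and scale_
X_val_scaled = scaler.transform(X_val_demo)          # reuses training statistics

# Equivalent manual computation for the validation set
manual = (X_val_demo - scaler.mean_) / scaler.scale_

print(np.allclose(X_train_scaled.mean(axis=0), 0.0))  # True
print(np.allclose(X_train_scaled.std(axis=0), 1.0))   # True
print(np.allclose(X_val_scaled, manual))              # True
```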

##### **h. Model Training**

With the datasets prepared, we now train the SVM model on the `train` set:

In [20]:
# Initialize a Support Vector Machine (SVM) Classifier
clf = SVC(
    kernel = "rbf",
    random_state = random_state,
    probability = True,
    C = 0.5
)

# Train the SVM Classifier
clf.fit(X_train, y_train)

##### **i. Model Evaluation**

We now evaluate the accuracy of the model on the `val` set:

In [21]:
y_pred = clf.predict(X_val)
score = accuracy_score(y_val, y_pred)

print("Evaluation results on val set")
print("Accuracy:", score)

Evaluation results on val set
Accuracy: 0.978328173374613
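
Accuracy is a single number; a confusion matrix shows which classes are confused with which. The following is a hedged sketch using `sklearn.metrics.confusion_matrix` on made-up label arrays (these are not the project's actual predictions, and this step is not part of the original notebook):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up true and predicted labels (0=crosswalk, 1=speedlimit, 2=stop)
y_true_demo = np.array([0, 0, 1, 1, 1, 2, 2, 2])
y_pred_demo = np.array([0, 1, 1, 1, 1, 2, 2, 0])

cm = confusion_matrix(y_true_demo, y_pred_demo)
print(cm)
# Rows are true classes, columns are predicted classes:
# [[1 1 0]
#  [0 3 0]
#  [1 0 2]]

demo_accuracy = accuracy_score(y_true_demo, y_pred_demo)
print(demo_accuracy)  # 0.75
```

Applied to the real `y_val` and `y_pred`, the same call would show whether the few validation errors are concentrated in one class.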
