### Comparing the performance of the following machine learning models on the German Traffic Sign Recognition Benchmark (GTSRB) classification task
1. k-Nearest Neighbor
2. Linear Discriminant Analysis
3. Logistic Regression
4. Linear Support Vector Machine
5. RBF Support Vector Machine
6. Random Forest
7. Convolutional Neural Network (CNN)

In [1]:
import os
import numpy as np
import cv2
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler

In [2]:
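# Load the image data: each subfolder of GTSRB holds the images of one class
y = []            # integer class labels
img_vectors = []  # flattened images
folder_path = 'GTSRB'
width = 32
height = 32
# sorted() makes the folder-to-label mapping deterministic (os.listdir order is arbitrary)
for idx, folder_name in enumerate(sorted(os.listdir(folder_path))):
    for filename in sorted(os.listdir(os.path.join(folder_path, folder_name))):
        img = cv2.imread(os.path.join(folder_path, folder_name, filename))
        if img is None:  # cv2.imread returns None for files it cannot decode
            continue
        img = cv2.resize(img, (width, height))  # 32 x 32 pixels, 3 channels (BGR)
        img_vectors.append(np.ravel(img))       # flatten to a 3072-element vector
        y.append(idx)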

In [3]:
len(y)

8910

In [4]:
y = np.asarray(y)

In [5]:
y.shape

(8910,)

In [6]:
len(img_vectors)

8910

In [7]:
X = np.asarray(img_vectors)

In [8]:
X.shape

(8910, 3072)

In [9]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
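
The split above is purely random, so a rare sign class can end up under-represented in the test set. A minimal sketch, assuming stratification is wanted (the notebook itself does not use it): passing `stratify` to `train_test_split` keeps the class proportions the same in both parts. `X_demo` and `y_demo` are hypothetical stand-ins for the image data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced stand-in: 90 samples of class 0, 10 samples of class 1
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([0] * 90 + [1] * 10)

# stratify=y_demo asks for the same class ratio in the train and test parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

# Per-class sample counts in each part
print(np.bincount(y_tr), np.bincount(y_te))
```

Using this in the notebook would change which samples land in the test set, so the accuracies reported below would not carry over unchanged.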

In [10]:
X_train.shape

(7128, 3072)

In [11]:
X_test.shape

(1782, 3072)

In [12]:
y_train.shape

(7128,)

In [13]:
y_test.shape

(1782,)

In [14]:
scaler = MinMaxScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
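
The scaler above is fitted on the training set only and then applied to both sets, which keeps test-set statistics out of the preprocessing. A small sketch of what that implies, using hypothetical random 8-bit data (`X_demo`) instead of the real images: the training features span exactly [0, 1], while test features are mapped with the training minima and maxima and may fall slightly outside that range. Dividing by 255 is shown as a fixed-range alternative for 8-bit pixels; it is an assumption here, not what the notebook does.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
# Hypothetical stand-in for flattened 8-bit images: 100 samples, 12 "pixels"
X_demo = rng.integers(0, 256, size=(100, 12)).astype(np.float64)
X_tr, X_te = X_demo[:80], X_demo[80:]

# Fit on the training part only, then reuse the same per-feature min/max for the test part
scaler_demo = MinMaxScaler().fit(X_tr)
X_tr_scaled = scaler_demo.transform(X_tr)
X_te_scaled = scaler_demo.transform(X_te)

print('train range:', X_tr_scaled.min(), X_tr_scaled.max())
print('test range: ', X_te_scaled.min(), X_te_scaled.max())

# Fixed-range alternative for 8-bit pixels: no fitting step needed
X_fixed = X_demo / 255.0
print('fixed range:', X_fixed.min(), X_fixed.max())
```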

In [15]:
X_train.shape

(7128, 3072)

In [16]:
X_test.shape

(1782, 3072)

In [17]:
from sklearn.metrics import accuracy_score
import time
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

In [18]:
# put the classifiers in a list
classifiers = [(KNeighborsClassifier(n_neighbors=3, metric='euclidean'), '3-Nearest Neighbor'),
              (LinearDiscriminantAnalysis(), 'Linear Discriminant Analysis'), 
              (LogisticRegression(random_state=0), 'Logistic Regression'),  
              (SVC(kernel='linear', random_state=0), 'Linear SVM'), 
              (SVC(kernel='rbf', random_state=0), 'RBF SVM'), 
              (RandomForestClassifier(n_estimators=20, random_state=0), 'Random Forest')]

In [19]:
for clf, name in classifiers:
    
    train_start_time = time.time()
    clf.fit(X_train, y_train) # train the classifier
    train_elapsed_time = time.time() - train_start_time # training time
    
    test_start_time = time.time()
    y_pred = clf.predict(X_test) # prediction
    test_elapsed_time = time.time() - test_start_time # prediction time
    
    # prediction accuracy
    accuracy = accuracy_score(y_test, y_pred)
    
    # classification summary
    print('Results for', name,':')
    print('------------------------------------')
    print('Accuracy is {:.2f} %'.format(accuracy*100))
    print('Training time {:.2f}'.format(train_elapsed_time))
    print('Test time {:.2f}'.format(test_elapsed_time))
    

Results for 3-Nearest Neighbor :
------------------------------------
Accuracy is 90.97 %
Training time 17.32
Test time 106.48
Results for Linear Discriminant Analysis :
------------------------------------
Accuracy is 76.21 %
Training time 34.59
Test time 0.02


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


Results for Logistic Regression :
------------------------------------
Accuracy is 91.92 %
Training time 7.35
Test time 0.02
Results for Linear SVM :
------------------------------------
Accuracy is 94.11 %
Training time 144.84
Test time 40.80
Results for RBF SVM :
------------------------------------
Accuracy is 79.97 %
Training time 388.51
Test time 90.64
Results for Random Forest :
------------------------------------
Accuracy is 94.05 %
Training time 8.24
Test time 0.08
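
The warning in the output above means the logistic regression solver stopped at its iteration limit before converging, so the 91.92 % figure comes from a model that was not fully optimized. A minimal sketch of how this could be checked, on hypothetical synthetic data (`X_demo`, `y_demo`) and with a hypothetical helper `fit_and_check`: fit with a given `max_iter`, record whether a `ConvergenceWarning` was raised, and read the iteration count from `n_iter_`.

```python
import warnings
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in data, scaled to [0, 1] like the notebook's features
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=10, n_classes=3, random_state=0
)
X_demo = MinMaxScaler().fit_transform(X_demo)

def fit_and_check(max_iter):
    """Fit a logistic regression and report whether the iteration limit was hit."""
    clf = LogisticRegression(max_iter=max_iter, random_state=0)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')  # record every warning raised during fit
        clf.fit(X_demo, y_demo)
    hit_limit = any(issubclass(w.category, ConvergenceWarning) for w in caught)
    return clf, hit_limit

for max_iter in (5, 5000):
    clf, hit_limit = fit_and_check(max_iter)
    print('max_iter={}: iterations used={}, hit limit={}'.format(
        max_iter, int(clf.n_iter_.max()), hit_limit))
```

Raising `max_iter` in the notebook would likely lengthen the training time and could change the accuracy, so the numbers reported here would need to be measured again.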


|Model                            |Accuracy      |Training time (s)|Prediction time (s)|
|---------------------------------|--------------|-----------------|-------------------|
|3-Nearest Neighbor               |90.97 %       |17.32        |106.48         |
|Linear Discriminant Analysis     |76.21 %       |34.59        |0.02           |
|Logistic Regression              |91.92 %       |7.35         |0.02           |
|SVM (linear kernel)              |94.11 %       |144.84       |40.80          |
|SVM (RBF kernel)                 |79.97 %       |388.51       |90.64          |
|Random Forest (n_estimators=20)  |94.05 %       |8.24         |0.08           |
|CNN                              |97.87 %       |163.98       |1.14           |
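
The scikit-learn rows of the table can be produced directly from the timing loop instead of being copied by hand. The following is a sketch only, on hypothetical synthetic data and with a hypothetical helper `benchmark`; it uses `time.perf_counter`, which is intended for measuring short intervals, where the notebook uses `time.time`.

```python
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data so the sketch runs without the GTSRB images
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=10, n_classes=3, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

def benchmark(clf, name):
    """Train and evaluate one classifier, returning one table row as a dict."""
    start = time.perf_counter()
    clf.fit(X_tr, y_tr)
    train_time = time.perf_counter() - start

    start = time.perf_counter()
    y_pred = clf.predict(X_te)
    test_time = time.perf_counter() - start

    return {'model': name, 'accuracy': accuracy_score(y_te, y_pred),
            'train_s': train_time, 'test_s': test_time}

demo_classifiers = [
    (LogisticRegression(max_iter=1000, random_state=0), 'Logistic Regression'),
    (RandomForestClassifier(n_estimators=20, random_state=0), 'Random Forest'),
]
rows = [benchmark(clf, name) for clf, name in demo_classifiers]

# Print the rows as a markdown table
print('|Model|Accuracy|Training time (s)|Prediction time (s)|')
print('|---|---|---|---|')
for r in rows:
    print('|{}|{:.2f} %|{:.2f}|{:.2f}|'.format(
        r['model'], r['accuracy'] * 100, r['train_s'], r['test_s']))
```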

### #4: Cross-validation of the two best classifiers

In [20]:
from sklearn.model_selection import cross_val_score, ShuffleSplit
classifiers_2_best = [(SVC(kernel='linear', random_state=0), 'Linear SVM'), 
                     (RandomForestClassifier(n_estimators=20, random_state=0), 'Random Forest')]
print('Results for cross validation')
for clf, name in classifiers_2_best:
    cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)  # 5 random 80/20 splits of the training set
    accuracies = cross_val_score(estimator=clf, X=X_train, y=y_train, cv=cv) # cross validation accuracies
    print('Results for', name,':')
    print('----------------------------------------------')
    print('Accuracy scores: ', accuracies)
    print('Accuracy: {:.2f} %'.format(accuracies.mean()*100))
    print('Standard deviation: {:.2f}'.format(accuracies.std()))

Results for cross validation
Results for Linear SVM :
----------------------------------------------
Accuracy scores:  [0.95371669 0.92987377 0.93969144 0.93548387 0.93338008]
Accuracy: 93.84 %
Standard deviation: 0.01
Results for Random Forest :
----------------------------------------------
Accuracy scores:  [0.91865358 0.92706872 0.92075736 0.91514727 0.93478261]
Accuracy: 92.33 %
Standard deviation: 0.01
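
Two details of the output above are easy to misread. The mean is printed as a percentage, but the standard deviation is printed as a fraction, so 0.01 corresponds to about one percentage point. Also, `ShuffleSplit` draws each split independently, so validation sets can overlap and class proportions are not controlled. A minimal sketch of one alternative, assuming non-overlapping stratified folds are preferred (this is not what the notebook ran), on hypothetical synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical stand-in data so the sketch runs without the GTSRB images
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=10, n_classes=3, random_state=0
)

# Five disjoint folds, each preserving the class proportions
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(
    RandomForestClassifier(n_estimators=20, random_state=0), X_demo, y_demo, cv=cv
)

# Report mean and standard deviation in the same unit (percent)
print('Accuracy: {:.2f} % +/- {:.2f} %'.format(scores.mean() * 100, scores.std() * 100))
```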
