### W281 Final Project Supplemental Notebook
### Learned Features Approach to MAVOC Vehicle Classification

This notebook tries a "black box" approach to classifying electro-optical (EO) images of 10 vehicle classes from the MAVOC dataset. We first generate image embeddings using a pre-trained ResNet-152 model. We then run an AutoML process on the learned embeddings to quickly fit and tune a number of classical (non-deep-learning) classifiers. We find that the pre-trained ResNet-152 embeddings plus an ensemble of KNN and SVM classifiers outperform both our hand-engineered features + single linear classifier approach and our basic CNN approach.

In [1]:
import numpy as np
import pandas as pd
import pprint
from eo_learned_features import get_eo_ndarray, get_embedding

import autosklearn.classification
from autosklearn.experimental.askl2 import AutoSklearn2Classifier
import sklearn.model_selection
from sklearn.metrics import accuracy_score, classification_report

We first split the data into train and test sets, in the same way as in our main project notebook, and load the images into arrays of shape (n_samples, height, width). We omit the dev split because the AutoML process we use later holds out part of the training set as a validation set for hyperparameter tuning.
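The internal validation split that replaces our dev set can be sketched with scikit-learn. This is a minimal sketch that assumes auto-sklearn's default `holdout` resampling strategy with a 67/33 train/validation split. That is our reading of its documentation, so check it against your installed version. We stratify by class in this sketch. The random arrays stand in for the real training features and labels, and `X_fit`/`X_val` are hypothetical names.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data: 100 samples of 2048-d features, 10 balanced classes.
rng = np.random.default_rng(281)
X = rng.normal(size=(100, 2048))
y = np.repeat(np.arange(10), 10)

# Stratified 67/33 holdout, approximating the validation split that
# auto-sklearn's default "holdout" strategy carves out of the training data.
X_fit, X_val, y_fit, y_val = train_test_split(
    X, y, train_size=0.67, stratify=y, random_state=281
)

print(X_fit.shape, X_val.shape)  # (67, 2048) (33, 2048)
```

Only the training partition goes to the AutoML process, so the test set stays untouched until the final evaluation.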

In [2]:
files = pd.read_csv('mavoc_partition_scheme.csv')
# Drop the leading 9-character prefix from each EO image path
files['eo_img'] = files['eo_img'].str[9:]

train = files[files['partition'] == 'train']
dev = files[files['partition'] == 'dev']
test = files[files['partition'] == 'test']

# Load each partition as an (n_samples, height, width) image array plus labels.
# The dev split is skipped: auto-sklearn holds out its own validation data.
train_features_arr, train_labels_arr = get_eo_ndarray(train)
# dev_features_arr, dev_labels_arr = get_eo_ndarray(dev)
test_features_arr, test_labels_arr = get_eo_ndarray(test)

100%|██████████| 10/10 [02:18<00:00, 13.86s/it]
100%|██████████| 10/10 [00:16<00:00,  1.67s/it]


In [11]:
print(f"Train array: {train_features_arr.shape} Train labels:{train_labels_arr.shape}")
# print(f"Dev array: {dev_features_arr.shape} Dev labels:{dev_labels_arr.shape}")
print(f"Test array: {test_features_arr.shape} Test labels:{test_labels_arr.shape}")

Train array: (4992, 32, 32) Train labels:(4992,)
Test array: (624, 32, 32) Test labels:(624,)


Next, we convert the pixel features to a CNN-based embedding. We use the output of the last hidden layer of ResNet-152, the global-average-pooled layer just before the classification head, which is a one-dimensional vector of size 2048. The resulting feature matrices have shape (n_samples, 2048).

In [3]:
# Compute ResNet-152 embeddings for each partition
train_learned_features = get_embedding(train_features_arr)
# dev_learned_features = get_embedding(dev_features_arr)
test_learned_features = get_embedding(test_features_arr)

100%|██████████| 4992/4992 [13:52<00:00,  6.00it/s]  
100%|██████████| 624/624 [01:39<00:00,  6.27it/s]


In [12]:
print(f"Train embedding: {train_learned_features.shape}")
# print(f"Dev embedding: {dev_learned_features.shape}")
print(f"Test embedding: {test_learned_features.shape}")

Train embedding: (4992, 2048)
Test embedding: (624, 2048)


We then use [auto-sklearn](https://automl.github.io/auto-sklearn/master/index.html) to fit and tune multiple classifiers with a 1-hour time budget. The final model is an ensemble of the top-performing models. Here that ensemble is a mix of support vector machines and k-nearest neighbors, and it achieves an overall test accuracy of 0.90, almost 3 percentage points better than our non-linear CNN! Like the CNN, the ensemble has difficulty predicting sedans in our test set (F1 of 0.61).
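auto-sklearn searches over many SVM and KNN configurations and weights them automatically. As a rough, hand-built analogue, a soft-voting ensemble of one RBF SVM and one KNN classifier can be sketched with scikit-learn. This is not the model auto-sklearn produced. The synthetic 64-feature data stands in for the 2048-d embeddings, and the hyperparameters and equal `weights` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the (n_samples, 2048) embeddings; 64-d keeps it fast.
X, y = make_classification(n_samples=600, n_features=64, n_informative=20,
                           n_classes=10, n_clusters_per_class=1, random_state=281)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=281)

# Soft voting averages the two models' predicted class probabilities.
ensemble = VotingClassifier(
    estimators=[
        ("svm", make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True, random_state=281))),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ],
    voting="soft",
    weights=[0.5, 0.5],  # hypothetical equal weights
)
ensemble.fit(X_tr, y_tr)
acc = ensemble.score(X_te, y_te)
print(f"test accuracy: {acc:.3f}")
```

The SVM and KNN tend to err in different ways, one through a smooth kernel boundary and the other through local neighborhoods. That complementarity is the usual motivation for combining them.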

In [None]:
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=3600,  # 1-hour budget in seconds (also the library default)
    n_jobs=-1,
    memory_limit=None,
    seed=281,
)
automl.fit(train_learned_features, train_labels_arr)

In [5]:
test_preds_arr = automl.predict(test_learned_features)

class_names = ['sedan', 'suv', 'pickup truck', 'van', 'box truck', 'motorcycle',
               'flatbed truck', 'bus', 'pickup truck with trailer',
               'flatbed truck with trailer']

print(classification_report(test_labels_arr, test_preds_arr, target_names=class_names))

                            precision    recall  f1-score   support

                     sedan       0.59      0.63      0.61        63
                       suv       0.71      0.85      0.77        62
              pickup truck       0.92      0.89      0.90        62
                       van       0.91      0.84      0.87        62
                 box truck       1.00      1.00      1.00        62
                motorcycle       1.00      0.85      0.92        62
             flatbed truck       1.00      1.00      1.00        63
                       bus       1.00      0.95      0.98        62
 pickup truck with trailer       1.00      1.00      1.00        63
flatbed truck with trailer       0.95      0.97      0.96        63

                  accuracy                           0.90       624
                 macro avg       0.91      0.90      0.90       624
              weighted avg       0.91      0.90      0.90       624
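As a sanity check on the report, F1 is the harmonic mean of precision and recall, and the macro average is the unweighted mean over the 10 classes. Recomputing both from the rounded values printed above reproduces the sedan F1 and the macro-average F1. Every class has 62 or 63 test images, so the macro and weighted averages nearly coincide.

```python
# Per-class values copied from the classification report above (rounded to 2 d.p.).
sedan_precision, sedan_recall = 0.59, 0.63
sedan_f1 = 2 * sedan_precision * sedan_recall / (sedan_precision + sedan_recall)
print(round(sedan_f1, 2))  # 0.61

# F1 for each class, in report order; the macro average is their plain mean.
f1_per_class = [0.61, 0.77, 0.90, 0.87, 1.00, 0.92, 1.00, 0.98, 1.00, 0.96]
macro_f1 = sum(f1_per_class) / len(f1_per_class)
print(round(macro_f1, 2))  # 0.9
```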



SVM and k-nearest neighbors were the best-performing individual models, as the leaderboard below shows (`cost` is the validation loss, so lower is better).

In [6]:
print(automl.leaderboard())

print("\n========= Final ensemble model ==========\n\n")
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(automl.show_models())

          rank  ensemble_weight                 type      cost    duration
model_id                                                                  
93           1             0.02           libsvm_svc  0.132282   76.824723
82           2             0.02           libsvm_svc  0.139563   55.465609
128          3             0.02           libsvm_svc  0.140777   66.210677
7            4             0.22  k_nearest_neighbors  0.144417    7.496868
139          5             0.02           libsvm_svc  0.145024   78.565759
97           6             0.02           libsvm_svc  0.147451   53.675237
106          7             0.06           libsvm_svc  0.151092  121.528774
34           8             0.02           libsvm_svc  0.157767   46.682972
13           9             0.02           libsvm_svc  0.160194   26.277497
74          10             0.02           libsvm_svc  0.165655   34.997453
36          11             0.04           libsvm_svc  0.181432   24.600532
108         12           
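auto-sklearn's documentation says its final ensemble is built with ensemble selection (Caruana et al., 2004). Models are added greedily, with replacement, to whichever combination most reduces validation loss. Each model's weight is then the fraction of rounds in which it was picked. The `ensemble_weight` values above are all multiples of 0.02, which is consistent with 50 selection rounds. The sketch below is a simplified NumPy version that uses plain accuracy loss on averaged class probabilities. `ensemble_selection` is a hypothetical helper, not auto-sklearn's API.

```python
import numpy as np


def ensemble_selection(val_probs, y_val, ensemble_size=50):
    """Greedy forward selection with replacement (Caruana et al., 2004).

    val_probs: (n_models, n_samples, n_classes) validation-set probabilities.
    y_val: (n_samples,) integer labels.
    Returns one weight per model: selection counts / ensemble_size.
    """
    n_models = val_probs.shape[0]
    counts = np.zeros(n_models, dtype=int)
    running_sum = np.zeros_like(val_probs[0])
    for step in range(1, ensemble_size + 1):
        # Loss (1 - accuracy) of the averaged ensemble if each candidate were added next.
        losses = []
        for m in range(n_models):
            avg = (running_sum + val_probs[m]) / step
            losses.append(1.0 - np.mean(avg.argmax(axis=1) == y_val))
        best = int(np.argmin(losses))  # ties go to the lowest model index
        counts[best] += 1
        running_sum += val_probs[best]
    return counts / ensemble_size


# Toy check: model 0 is perfect, models 1-2 output random probabilities.
y_val = np.array([0, 1, 2, 0, 1, 2])
rng = np.random.default_rng(0)
perfect = np.eye(3)[y_val]                           # (6, 3) one-hot
noisy = rng.dirichlet(np.ones(3), size=(2, 6))       # (2, 6, 3)
val_probs = np.concatenate([perfect[None], noisy])   # (3, 6, 3)
weights = ensemble_selection(val_probs, y_val)
print(weights)  # [1. 0. 0.]
```

Because selection is with replacement, a strong model such as the KNN (rank 4, weight 0.22) can accumulate a large weight even when several SVMs rank above it individually.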

Out of curiosity, we also tried the experimental auto-sklearn 2.0 (`AutoSklearn2Classifier`), which claims better accuracy through meta-learning. However, the ensemble from this run is less accurate on the test set than the one produced by version 1 (0.838 vs. 0.90). Interestingly, version 2 produces a different mix of classifiers, such as multilayer perceptrons and gradient boosting. One might expect these to outperform a simple method like KNN.
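To put the two runs on the same footing, the top leaderboard rows can be converted from cost to validation accuracy. This sketch assumes both runs used the default accuracy metric, in which case cost is 1 minus the validation score. The rows are copied from the version 1 leaderboard above and the version 2 leaderboard below.

```python
import pandas as pd

# Top-ranked rows copied from the two leaderboards in this notebook.
rows = [
    ("v1", 93, "libsvm_svc", 0.132282),
    ("v1", 7, "k_nearest_neighbors", 0.144417),
    ("v2", 23, "mlp", 0.209335),
    ("v2", 4, "gradient_boosting", 0.277644),
]
lb = pd.DataFrame(rows, columns=["run", "model_id", "type", "cost"])

# Assumption: cost = 1 - validation score under the default accuracy metric.
lb["val_accuracy"] = 1.0 - lb["cost"]

# Best single-model validation accuracy per run and model type.
best = lb.groupby(["run", "type"])["val_accuracy"].max().round(3)
print(best)
```

Under this assumption, the best version 1 model (an SVM, about 0.868) also beats the best version 2 model (an MLP, about 0.791) on validation data. That ordering matches the test-set results.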

In [None]:
automl2 = AutoSklearn2Classifier(n_jobs=-1, memory_limit=None, seed=281)
automl2.fit(train_learned_features, train_labels_arr)

test_preds_arr2 = automl2.predict(test_learned_features)

In [13]:
print("Ensemble accuracy score (AutoML 2.0) \n", accuracy_score(test_labels_arr, test_preds_arr2))
print(automl2.leaderboard())

Ensemble accuracy score (AutoML 2.0) 
 0.8381410256410257
          rank  ensemble_weight               type      cost     duration
model_id                                                                 
23           1             0.02                mlp  0.209335    97.039469
39           2             0.02                mlp  0.255008    77.509783
49           3             0.02                mlp  0.260016    79.052327
40           4             0.02                mlp  0.261819    56.425754
30           5             0.02                mlp  0.262620    61.239457
38           6             0.02                mlp  0.273438   119.480307
41           7             0.02                mlp  0.274639    33.196541
4            8             0.02  gradient_boosting  0.277644  1799.048124
6            9             0.02  gradient_boosting  0.300280   558.060165
24          10             0.02  gradient_boosting  0.304687  1598.268912
31          11             0.02      random_forest  0.