# Cars dataset

Here I make the [cars dataset](http://ai.stanford.edu/~jkrause/cars/car_dataset.html) usable by pairing each image file with a clean class label.
The dataset has 16,185 images classified into 196 classes of car. Each class is a specific make, model, and year.

### Reading the label file

In [37]:
from scipy.io import loadmat
import os
import pickle

In [38]:
labels = loadmat(os.path.join('data', 'cars_dataset', 'cars_annos.mat'))
labels.keys()

dict_keys(['__header__', '__version__', '__globals__', 'annotations', 'class_names'])

In [39]:
annotations = labels['annotations'].flatten()
class_names = labels['class_names'].flatten()
print('annotations shape: {}'.format(annotations.shape))
print('class_names shape: {}'.format(class_names.shape))

annotations shape: (16185,)
class_names shape: (196,)


### Understanding the components of the labels
Each annotation holds the image name at index 0 and the class label at index 5.
Indices 1-4 hold the bounding box coordinates.
Index 6 appears to be a train/test split flag, but I have not verified this and it is not used here.

In [40]:
annotations[0]

(array(['car_ims/000001.jpg'], 
      dtype='<U18'), array([[112]], dtype=uint8), array([[7]], dtype=uint8), array([[853]], dtype=uint16), array([[717]], dtype=uint16), array([[1]], dtype=uint8), array([[0]], dtype=uint8))
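
Indexing by position is easy to get wrong, so a small named wrapper can help. The sketch below is only an illustration: the field names, the bounding box order (x1, y1, x2, y2), and the meaning of the last field are assumptions, and the record is written with plain lists standing in for the nested NumPy arrays shown above.

```python
from typing import NamedTuple


class CarAnnotation(NamedTuple):
    image_path: str
    bbox_x1: int       # assumed order of the box fields: x1, y1, x2, y2
    bbox_y1: int
    bbox_x2: int
    bbox_y2: int
    class_label: int   # 1-indexed class label
    is_test: bool      # assumed meaning of the 7th field


def unwrap(value):
    """Peel singleton nesting such as [[112]] or ['a.jpg'] down to a scalar."""
    while isinstance(value, (list, tuple)) and len(value) == 1:
        value = value[0]
    return value


def to_annotation(record):
    """Convert one 7-field record into a CarAnnotation."""
    path, x1, y1, x2, y2, label, test = (unwrap(field) for field in record)
    return CarAnnotation(path, x1, y1, x2, y2, label, bool(test))


# Stand-in for annotations[0], using lists instead of NumPy arrays.
record = (['car_ims/000001.jpg'], [[112]], [[7]], [[853]], [[717]], [[1]], [[0]])
first = to_annotation(record)
print(first.image_path, first.class_label, first.is_test)
```

With the real arrays returned by `loadmat`, calling `.item()` on each field should do the same job as `unwrap`.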

In [41]:
print('Image name: {}, class label: {}'.format(annotations[0][0], annotations[0][5]))
print('Image name: {}, class label: {}'.format(annotations[88][0], annotations[88][5]))
print('Image name: {}, class label: {}'.format(annotations[89][0], annotations[89][5]))

Image name: ['car_ims/000001.jpg'], class label: [[1]]
Image name: ['car_ims/000089.jpg'], class label: [[1]]
Image name: ['car_ims/000090.jpg'], class label: [[2]]


From visual inspection it looks like the class labels correspond to positions in class_names.
__The class labels are 1-indexed__, so label 1 maps to class_names[0].

In [42]:
class_names[0]

array(['AM General Hummer SUV 2000'], 
      dtype='<U26')

In [43]:
class_names[1]

array(['Acura RL Sedan 2012'], 
      dtype='<U19')

In [44]:
# convert class_names into list for ease of indexing
label_full_list = [name[0] for name in class_names]
label_full_list[0:3]

['AM General Hummer SUV 2000', 'Acura RL Sedan 2012', 'Acura TL Sedan 2012']
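
Because the labels start at 1 and Python lists start at 0, every lookup needs an offset of one. A minimal sketch of that lookup follows; `class_name_for` is a hypothetical helper, not part of the notebook. It rejects out-of-range labels because a label of 0 would otherwise silently wrap around to the last class.

```python
def class_name_for(label, names):
    """Map a 1-indexed class label to its entry in a 0-indexed list."""
    if not 1 <= label <= len(names):
        raise ValueError('label {} is outside 1..{}'.format(label, len(names)))
    return names[label - 1]


# First three class names, as shown in the output above.
names = ['AM General Hummer SUV 2000', 'Acura RL Sedan 2012', 'Acura TL Sedan 2012']
print(class_name_for(1, names))
print(class_name_for(2, names))
```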

### Cleaning the labels
1) Provide naming that cleanly separates make, model & year.

2) Pair up each image name with its corresponding label.

In [45]:
label_tuples = []
for name in label_full_list:
    words = name.split(' ')
    make = words[0]
    if make in ['Aston', 'Land']:
        make = '_'.join(words[:2])
        model = '_'.join(words[2:-1])
    else:
        model = '_'.join(words[1:-1])
    year = words[-1]
    # replace hyphens so that '-' can later separate make, model and year
    make = make.replace('-', '_')
    model = model.replace('-', '_')
    tup = (make, model, year)
    label_tuples.append(tup)

label_tuples[0:10]

[('AM', 'General_Hummer_SUV', '2000'),
 ('Acura', 'RL_Sedan', '2012'),
 ('Acura', 'TL_Sedan', '2012'),
 ('Acura', 'TL_Type_S', '2008'),
 ('Acura', 'TSX_Sedan', '2012'),
 ('Acura', 'Integra_Type_R', '2001'),
 ('Acura', 'ZDX_Hatchback', '2012'),
 ('Aston_Martin', 'V8_Vantage_Convertible', '2012'),
 ('Aston_Martin', 'V8_Vantage_Coupe', '2012'),
 ('Aston_Martin', 'Virage_Convertible', '2012')]
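
Note that the first tuple has the make `AM` with `General` pushed into the model, because only `Aston` and `Land` are treated as two-word makes. If that matters downstream, one possible variant is to match against a list of known multi-word makes. This is a sketch, not the code used in this notebook; `MULTI_WORD_MAKES` and `split_label` are hypothetical names, and the list of makes is an assumption that would need checking against all 196 classes.

```python
# Assumed list of makes that span more than one word.
MULTI_WORD_MAKES = ('AM General', 'Aston Martin', 'Land Rover')


def split_label(name, multi_word_makes=MULTI_WORD_MAKES):
    """Split 'Make Model ... Year' into (make, model, year) with '_' joins."""
    words = name.split(' ')
    year = words[-1]
    make_len = 1  # default: the make is the first word only
    for candidate in multi_word_makes:
        if name.startswith(candidate + ' '):
            make_len = len(candidate.split(' '))
            break
    # Hyphens become underscores so '-' stays free as a field separator.
    make = '_'.join(words[:make_len]).replace('-', '_')
    model = '_'.join(words[make_len:-1]).replace('-', '_')
    return make, model, year


print(split_label('AM General Hummer SUV 2000'))
print(split_label('Acura RL Sedan 2012'))
```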

In [46]:
label_list = ['-'.join(tup) for tup in label_tuples]
label_list[0:10]

['AM-General_Hummer_SUV-2000',
 'Acura-RL_Sedan-2012',
 'Acura-TL_Sedan-2012',
 'Acura-TL_Type_S-2008',
 'Acura-TSX_Sedan-2012',
 'Acura-Integra_Type_R-2001',
 'Acura-ZDX_Hatchback-2012',
 'Aston_Martin-V8_Vantage_Convertible-2012',
 'Aston_Martin-V8_Vantage_Coupe-2012',
 'Aston_Martin-Virage_Convertible-2012']

In [47]:
clean_pairs = [(annotation[0][0], label_list[annotation[5][0][0]-1]) 
               for annotation in annotations]
# keep only the file name; str.strip removes a set of characters, not a prefix
clean_pairs = [(os.path.basename(tup[0]), tup[1]) for tup in clean_pairs]

In [48]:
clean_pairs[0], clean_pairs[88], clean_pairs[89], clean_pairs[-1]

(('000001.jpg', 'AM-General_Hummer_SUV-2000'),
 ('000089.jpg', 'AM-General_Hummer_SUV-2000'),
 ('000090.jpg', 'Acura-RL_Sedan-2012'),
 ('016185.jpg', 'smart-fortwo_Convertible-2012'))
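
A caution about removing the `car_ims/` directory prefix: `str.strip` takes a set of characters and removes any of them from both ends, so it only gives the right answer when the file name itself does not start or end with one of those characters. The sketch below shows the difference on a made-up file name; `image_file_name` is a hypothetical helper built on `os.path.basename`.

```python
import os


def image_file_name(relative_path):
    """Return only the file name part of a path like 'car_ims/000001.jpg'."""
    return os.path.basename(relative_path)


# The dataset's numeric names survive either approach.
print(image_file_name('car_ims/000001.jpg'))

# A made-up name starting with letters from 'car_ims/' shows the pitfall:
# strip also eats the leading 'scar' because each letter is in the set.
hypothetical = 'car_ims/scar.jpg'
print(hypothetical.strip('car_ims/'))
print(image_file_name(hypothetical))
```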

### Saving the data

In [55]:
import cars_utils

save_dir = os.path.join('data', 'notebooks', '1_make_dataset_usable')
dicto_to_save = {
    'clean_pairs': clean_pairs,
    'label_tuples': label_tuples
}

cars_utils.pickle_variable_to_path(dicto_to_save, 'pair_dicto.pkl', save_dir)

file already exists
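
The `cars_utils` module is not shown in this notebook. Judging only from the call and the `file already exists` message, the helper might look roughly like the sketch below. This is a guess at its behaviour, not the actual implementation, and `load_pickle` is a hypothetical companion for reading the result back.

```python
import os
import pickle


def pickle_variable_to_path(variable, file_name, directory):
    """Pickle `variable` to directory/file_name unless the file exists."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, file_name)
    if os.path.exists(path):
        # Assumed behaviour: never overwrite an existing file.
        print('file already exists')
        return path
    with open(path, 'wb') as handle:
        pickle.dump(variable, handle)
    return path


def load_pickle(path):
    """Read back an object written by pickle_variable_to_path."""
    with open(path, 'rb') as handle:
        return pickle.load(handle)
```

Under these assumptions, re-running the cell leaves the earlier file untouched, so an outdated `pair_dicto.pkl` would have to be deleted by hand before regenerating it.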
