# Model CIFAR 10 Data in Python

In this notebook, we'll model the CIFAR-10 image recognition dataset in Python. The input pipeline is set up for you, but it's up to you to build the network.

First we'll import all the libraries we'll need. 

In [1]:
import numpy as np
import pandas as pd
import os
import tarfile
import keras 

Using TensorFlow backend.


# IMPORTANT IMPORTANT IMPORTANT

Change the line below to point to the cifar-10-python.tar datafile that you downloaded in the zip with this example notebook.

In [2]:
wdir = 'Users/tsehay/Desktop/SVHM'

# IMPORTANT IMPORTANT IMPORTANT

## Import the data.

This is all boilerplate that imports the data. It should run in one go as long as you've set the directory above correctly and, on Windows, written the path with double backslashes ("\\").
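
Before running the import cell, it can help to confirm that the archive is where you think it is. The sketch below is only an illustration: `archive_path` is a hypothetical variable that is not used elsewhere in this notebook, and the folder string is copied from the cell above, so replace it with your own. Note that the import cell opens `'cifar-10-python.tar'` relative to the current working directory.

```python
import os

# Assumption: wdir is the folder that holds the downloaded archive.
# The value is copied from the cell above; replace it with your own folder.
wdir = 'Users/tsehay/Desktop/SVHM'

# Hypothetical helper variable: full path to the archive.
archive_path = os.path.join(wdir, 'cifar-10-python.tar')

if os.path.isfile(archive_path):
    print('Found archive at ' + archive_path)
else:
    print('No archive at ' + archive_path + '; check wdir before importing the data')
```

If the file is reported missing, either start the notebook from the folder that contains the archive or pass the full path to `tarfile.open`.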

In [3]:
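def unpickle(file):
    # Load one pickled CIFAR batch; the keys come back as bytes (e.g. b'data')
    import pickle
    with open(file, 'rb') as fo:
        batch = pickle.load(fo, encoding='bytes')  # avoid shadowing the built-in dict
    return batch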

tarFile = tarfile.open('cifar-10-python.tar')

tarFile.getnames()

tarFile.extractall()

train_x = []
train_y = []

for name in tarFile.getnames():
    if name == 'cifar-10-batches-py/readme.html':
        continue
    elif name == 'cifar-10-batches-py/batches.meta':
        continue
    elif name == 'cifar-10-batches-py':
        continue
    elif name == 'cifar-10-batches-py/test_batch':
        testData = unpickle(name)
        test_x = testData[b'data']
        test_y = testData[b'labels']
    else:
        tempDict = unpickle(name)
        train_x.extend(tempDict[b'data'])
        train_y.extend(tempDict[b'labels'])

# Each image is a flat row of 3072 values: three 32*32 matrices (1024 values each), one per color channel

# The training sets are train_x (data) and train_y (labels)
# The test sets are test_x (data) and test_y (labels)

# The last step is to convert these into arrays so we can use them to train our model.
train_x = np.array(train_x)
train_y = np.array(train_y)
train_y = keras.utils.to_categorical(train_y, 10)

test_x = np.array(test_x)
test_y = np.array(test_y)
test_y = keras.utils.to_categorical(test_y, 10)


# Rearrange the flat items into a matrix so we can use a CNN to predict the values. 

# Each flat row of 3072 values becomes a (3, 32, 32) array: channel, row, column
train_x_v2 = train_x.reshape(-1, 3, 32, 32)

test_x_v2 = test_x.reshape(-1, 3, 32, 32)
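
The import cell turns the integer labels into one-hot rows with `keras.utils.to_categorical`. As a rough illustration of what that encoding looks like, the sketch below uses a plain NumPy stand-in (`np.eye`) on three made-up labels. It is not the Keras implementation, only an equivalent result for this small case.

```python
import numpy as np

# Made-up example labels, not real CIFAR data.
labels = np.array([3, 0, 9])

# NumPy stand-in for to_categorical(labels, 10): row i has a 1 at column labels[i].
one_hot = np.eye(10, dtype='float32')[labels]

print(one_hot.shape)
print(one_hot[0])

# argmax along each row recovers the original integer labels.
recovered = one_hot.argmax(axis=1)
print(recovered)
```

This is why the final `Dense` layer needs 10 units: one output per column of the one-hot labels.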


Now import all the Keras parts you need, and start building your network. We've included some for you. 

In [4]:
train_x.shape

(50000, 3072)

In [5]:
train_x_v2.shape

(50000, 3, 32, 32)
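
The two shapes above describe the same data. Each 3072-value row stores 1024 red values, then 1024 green, then 1024 blue, so a single reshape to `(3, 32, 32)` gives the same result as reshaping the three slices separately and stacking them. The sketch below checks this on a made-up row and also shows a channels-last view, which is the layout most image-plotting functions expect.

```python
import numpy as np

# Made-up stand-in for one image row: the values 0..3071.
flat = np.arange(3072)

# Direct reshape: (channel, row, column).
image_chw = flat.reshape(3, 32, 32)

# The same array built slice by slice, one 32x32 matrix per color channel.
stacked = np.concatenate((flat[:1024].reshape(1, -1, 32),
                          flat[1024:2048].reshape(1, -1, 32),
                          flat[2048:].reshape(1, -1, 32)), axis=0)
print(np.array_equal(image_chw, stacked))

# Channels-last view (row, column, channel), e.g. for plotting.
image_hwc = image_chw.transpose(1, 2, 0)
print(image_chw.shape, image_hwc.shape)
```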

In [6]:
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten, Conv2D, Dropout  
from keras.layers import AveragePooling2D, MaxPooling2D, BatchNormalization


Now build your network! Here are some tips: 
- The input shape is (3, 32, 32)
- You'll want to use 2D layers (Conv2D, MaxPooling2D, etc.)
- The kernel_size argument for 2D layers is a tuple like this (5, 5)

So instead of kernel_size = 5 (in the example), you'll say kernel_size = (5, 5). This input is still tunable.
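
When you tune kernel and pool sizes, it helps to know how each layer shrinks the image. The sketch below uses a hypothetical helper, `conv_output_size`, and assumes the Keras defaults: no padding (`padding = 'valid'`), a convolution stride of 1, and a pooling stride equal to the pool size.

```python
def conv_output_size(size, kernel, stride=1):
    """Output length along one axis for a window with no padding."""
    return (size - kernel) // stride + 1

side = 32                                            # CIFAR images are 32 x 32
side = conv_output_size(side, kernel=5)              # Conv2D with kernel_size = (5, 5)
print(side)                                          # 28
side = conv_output_size(side, kernel=2, stride=2)    # MaxPooling2D with pool_size = (2, 2)
print(side)                                          # 14
```

If the result ever reaches zero or below, the stack of layers is too deep for a 32 x 32 input at those settings.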

And finally - here is your canvas! 

### To start you off: 

<p>Here is a simple network with a single set of hidden layers and all the basics you need to get started. You can experiment by extending this network, changing the values of the parameters, adding (or removing) layers, etc. For a good grade, you WILL need to get creative and record the impact your changes have on accuracy, on training time, etc.</p>
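
One possible way to keep track of your experiments is a small log that times each run and stores its accuracy. Everything in this sketch (`results`, `record_run`, `dummy_run`) is a hypothetical name made up for illustration. The placeholder function returns a made-up number so the sketch runs on its own.

```python
import time

results = []  # one dict per experiment

def record_run(name, train_and_score):
    """Time one run and store its accuracy.

    train_and_score is any zero-argument function that trains a model
    and returns its test accuracy.
    """
    start = time.perf_counter()
    accuracy = train_and_score()
    seconds = time.perf_counter() - start
    results.append({'name': name, 'accuracy': accuracy, 'seconds': round(seconds, 2)})
    return results[-1]

# Placeholder: swap in a function that calls model.fit(...) and
# returns model.evaluate(...)[1].
def dummy_run():
    return 0.5  # made-up accuracy, not a real result

record_run('baseline (placeholder)', dummy_run)
print(results)
```

Because pandas is already imported, `pd.DataFrame(results)` would turn the log into a table for comparing runs.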

In [7]:
model = Sequential()
model.add(Conv2D(filters = 64, kernel_size = (2, 2),  input_shape = (3, 32, 32), data_format = 'channels_first'))
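# Pool over the two spatial axes; by default the layer assumes the channels come last
model.add(MaxPooling2D(pool_size = (2, 2), data_format = 'channels_first'))
model.add(Dropout(0.2))
# axis = 1 normalizes each channel separately for channels_first data
model.add(BatchNormalization(axis = 1))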

model.add(Flatten())
model.add(Dense(units = 10, activation = 'softmax'))

In [8]:
model.compile(loss = 'categorical_crossentropy', 
              optimizer = keras.optimizers.Adadelta(), 
              metrics = ['accuracy'])

In [None]:
model.fit(train_x_v2, train_y, epochs = 10, batch_size = 128)

Epoch 1/10
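
The arrays passed to `fit` hold raw pixel values from 0 to 255. A common preprocessing step, which is not part of this notebook, is to scale them to the range 0 to 1 before training. The sketch below shows the idea on a tiny made-up array. In the notebook you would apply the same scaling to both `train_x_v2` and `test_x_v2`.

```python
import numpy as np

# Made-up stand-in for a few pixel values; the real arrays are train_x_v2 and test_x_v2.
pixels = np.array([[0, 128, 255]], dtype='uint8')

# Convert to float first so the division is not done in integer arithmetic.
scaled = pixels.astype('float32') / 255.0

print(scaled.min(), scaled.max())
```

Whether this improves accuracy for your network is something to measure rather than assume.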

In [None]:
score = model.evaluate(test_x_v2, test_y)
print('\nloss is: ' + str(round(score[0], 4)))
print('accuracy is: ' + str(score[1]))
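
Overall accuracy can hide classes the network handles badly. As a sketch, the code below computes per-class accuracy from predicted probabilities and one-hot labels. The arrays are made-up stand-ins with three classes. In the notebook they would be `model.predict(test_x_v2)` and `test_y`.

```python
import numpy as np

# Made-up stand-ins for predicted probabilities and one-hot labels.
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])
truth_one_hot = np.array([[0, 1, 0],
                          [1, 0, 0],
                          [0, 1, 0],
                          [1, 0, 0]])

predicted = probs.argmax(axis=1)         # most likely class per row
actual = truth_one_hot.argmax(axis=1)    # integer label per row
overall = (predicted == actual).mean()
print('overall accuracy: ' + str(overall))

# Accuracy restricted to the rows of each true class.
per_class = {}
for label in np.unique(actual):
    mask = actual == label
    per_class[int(label)] = float((predicted[mask] == label).mean())
print(per_class)
```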