# Image Classification

**Challenges.** Since recognizing a visual concept (e.g. cat) is relatively trivial for a human, it is worth considering the challenges involved from the perspective of a Computer Vision algorithm. As we present a (non-exhaustive) list of challenges below, keep in mind the raw representation of images as a 3-D array of brightness values:

* **Viewpoint variation.** A single instance of an object can be oriented in many ways with respect to the camera.
* **Scale variation.** Visual classes often exhibit variation in their size (size in the real world, not only in terms of their extent in the image).
* **Deformation.** Many objects of interest are not rigid bodies and can be deformed in extreme ways.
* **Occlusion.** The objects of interest can be occluded. Sometimes only a small portion of an object (as little as a few pixels) is visible.
* **Illumination conditions.** The effects of illumination are drastic at the pixel level.
* **Background clutter.** The objects of interest may blend into their environment, making them hard to identify.
* **Intra-class variation.** The classes of interest can often be relatively broad, such as *chair*. There are many different types of these objects, each with its own appearance.

A good image classification model must be invariant to the cross product of all these variations, while simultaneously retaining sensitivity to the inter-class variations.
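
To make the pixel-level sensitivity concrete, here is a small sketch using a synthetic image (an assumption for illustration, not a CIFAR-10 sample). It suggests how a uniform brightness change or a one-pixel shift, neither of which changes what a human would recognize, still produces a large pixel-wise difference. The helper name `l1_difference` is made up for this example.

```python
import numpy as np

# Hypothetical 8x8 RGB "image": a bright square on a dark background.
# Stored as height x width x channels, values in [0, 255].
image = np.full((8, 8, 3), 20, dtype=np.uint8)
image[2:6, 2:6, :] = 200

def l1_difference(a, b):
    """Sum of absolute pixel differences, computed in a signed type."""
    return int(np.sum(np.abs(a.astype(np.int64) - b.astype(np.int64))))

# Illumination change: every pixel becomes 40 units brighter.
brighter = (image.astype(np.int64) + 40).clip(0, 255).astype(np.uint8)

# Crude viewpoint change: shift the whole image one pixel to the right.
shifted = np.roll(image, shift=1, axis=1)

print(image.shape)                      # (8, 8, 3)
print(l1_difference(image, image))      # 0
print(l1_difference(image, brighter))   # 7680
print(l1_difference(image, shifted))    # 4320
```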

In [26]:
import numpy as np

In [1]:
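def unpickle(file):
    """Load one CIFAR-10 batch file and return its contents as a dictionary."""
    import pickle
    with open(file, 'rb') as fo:
        # encoding='bytes' keeps the keys as bytes, e.g. b'data' and b'labels'
        batch = pickle.load(fo, encoding='bytes')
    return batch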

In [75]:
train_1 = unpickle('./cifar-10-batches-py/data_batch_1')
train_2 = unpickle('./cifar-10-batches-py/data_batch_2')
train_3 = unpickle('./cifar-10-batches-py/data_batch_3')
train_4 = unpickle('./cifar-10-batches-py/data_batch_4')
train_5 = unpickle('./cifar-10-batches-py/data_batch_5')

In [95]:
# Collect the five training batches loaded above
batches = [train_1, train_2, train_3, train_4, train_5]

# Stack the image rows of all batches along the first axis
train_data = np.concatenate([batch[b'data'] for batch in batches], axis=0)

#print(train_data.shape) (50000,3072)

# Labels are stored as Python lists; join them into one 1-D array
train_labels = np.concatenate([batch[b'labels'] for batch in batches], axis=0)

#print(train_labels.shape) (50000,)
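
Each row of `train_data` holds one flattened image. According to the CIFAR-10 dataset description, the 3072 values are the 1024 red values, then the 1024 green values, then the 1024 blue values, each channel in row-major order. Under that assumption, a row can be restored to a 32 x 32 x 3 array roughly as follows. The helper name `row_to_image` is made up for illustration, and a synthetic row stands in for real data.

```python
import numpy as np

def row_to_image(row):
    """Turn one flattened CIFAR-10 row (3072 values) into a 32x32x3 array.

    Assumes channel-major layout: all red values, then green, then blue.
    """
    row = np.asarray(row)
    if row.shape != (3072,):
        raise ValueError("expected a row of 3072 values, got shape %s" % (row.shape,))
    # (3, 32, 32) is channels x height x width; move channels to the last axis
    return row.reshape(3, 32, 32).transpose(1, 2, 0)

# Synthetic stand-in for a real row: red plane all 10, green 20, blue 30
fake_row = np.repeat(np.array([10, 20, 30], dtype=np.uint8), 1024)
img = row_to_image(fake_row)
print(img.shape)   # (32, 32, 3)
print(img[0, 0])   # [10 20 30]
```

With the real data, `row_to_image(train_data[0])` would give an array suitable for an image viewer that expects height x width x channels.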

In [22]:
test = unpickle('./cifar-10-batches-py/test_batch')

In [72]:
test_data = test[b'data']
test_labels = test[b'labels']

In [100]:
class NearestNeighbor(object):
  def __init__(self):
    pass

  def train(self, X, y):
    """ X is N x D where each row is an example. y is a 1-D array of length N """
    # the nearest neighbor classifier simply remembers all the training data
    self.Xtr = X
    self.ytr = y

  def predict(self, X):
    """ X is N x D where each row is an example we wish to predict label for """
    num_test = X.shape[0]
    # let's make sure that the output dtype matches the dtype of the training labels
    Ypred = np.zeros(num_test, dtype = self.ytr.dtype)

    # loop over all test rows
    for i in range(num_test):
      # find the nearest training image to the i'th test image
      # using the L1 distance (sum of absolute value differences)
      distances = np.sum(np.abs(self.Xtr - X[i,:]), axis = 1)
      min_index = np.argmin(distances) # get the index with smallest distance
      Ypred[i] = self.ytr[min_index] # predict the label of the nearest example

    return Ypred
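
The prediction loop can be traced by hand on a toy problem. The sketch below reimplements the same L1 nearest-neighbor rule as a standalone function. The name `nearest_neighbor_predict` and the toy arrays are illustrative assumptions, and the inputs are cast to a signed integer type before subtracting.

```python
import numpy as np

def nearest_neighbor_predict(Xtr, ytr, X):
    """Label each row of X with the label of its L1-nearest row in Xtr."""
    Xtr = np.asarray(Xtr, dtype=np.int64)  # signed, so differences can be negative
    X = np.asarray(X, dtype=np.int64)
    ytr = np.asarray(ytr)
    Ypred = np.zeros(X.shape[0], dtype=ytr.dtype)
    for i in range(X.shape[0]):
        # L1 distance from test row i to every training row
        distances = np.sum(np.abs(Xtr - X[i, :]), axis=1)
        Ypred[i] = ytr[np.argmin(distances)]
    return Ypred

# Three "training images" with four pixels each, and their labels
Xtr = np.array([[0, 0, 0, 0],
                [10, 10, 10, 10],
                [200, 200, 200, 200]])
ytr = np.array([0, 1, 2])

# Two "test images"
Xte = np.array([[12, 9, 11, 10],        # L1 distances: 42, 4, 758
                [150, 160, 170, 180]])  # L1 distances: 660, 620, 140

print(nearest_neighbor_predict(Xtr, ytr, Xte))  # [1 2]
```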

In [106]:
nn = NearestNeighbor() # create a Nearest Neighbor classifier instance
nn.train(train_data, train_labels) # train the classifier on the training images and labels
Yte_predict = nn.predict(test_data) # predict labels on the test images
# and now print the classification accuracy, which is the fraction
# of examples that are correctly predicted (i.e. label matches)
print('accuracy: %f' % ( np.mean(Yte_predict == test_labels) ))

accuracy: 0.249200
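
One detail worth checking when reproducing this number: the CIFAR-10 Python batches store pixels as `uint8`, and NumPy subtraction on unsigned arrays wraps around modulo 256 instead of going negative. If `train_data` and `test_data` are still `uint8` when passed to `predict`, the L1 distances are affected by this. A minimal illustration, with made-up pixel values:

```python
import numpy as np

a = np.array([10, 200], dtype=np.uint8)
b = np.array([20, 100], dtype=np.uint8)

# Unsigned subtraction wraps: 10 - 20 becomes 246 rather than -10
wrapped = np.sum(np.abs(a - b))

# Casting to a signed type first gives the intended |10-20| + |200-100|
intended = np.sum(np.abs(a.astype(np.int32) - b.astype(np.int32)))

print(wrapped)   # 346
print(intended)  # 110
```

Casting both arrays once before calling `train` and `predict`, for example with `astype(np.int32)`, avoids the wraparound at the cost of four times the memory.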


> *Evaluate on the test set only a single time, at the very end.*

> *Split your training set into a training set and a validation set. Use the validation set to tune all hyperparameters. At the end, run a single time on the test set and report performance.*
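
A minimal sketch of such a split, assuming NumPy arrays shaped like `train_data` and `train_labels` above. The helper name `split_train_val`, the validation size, and the seed are illustrative choices, and synthetic arrays stand in for the real data.

```python
import numpy as np

def split_train_val(X, y, num_val, seed=0):
    """Shuffle the examples, then hold out num_val of them for validation."""
    X = np.asarray(X)
    y = np.asarray(y)
    if not 0 < num_val < X.shape[0]:
        raise ValueError("num_val must be between 1 and N - 1")
    # A fixed seed keeps the split reproducible across runs
    order = np.random.default_rng(seed).permutation(X.shape[0])
    val_idx, train_idx = order[:num_val], order[num_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

# Synthetic stand-in: 100 examples with 3072 features each
X = np.arange(100 * 3072).reshape(100, 3072) % 256
y = np.arange(100) % 10

Xtr, ytr, Xval, yval = split_train_val(X, y, num_val=20)
print(Xtr.shape, Xval.shape)  # (80, 3072) (20, 3072)
```

Hyperparameters are then compared by their accuracy on `Xval` and `yval`, and the test set is used only once, with the final choice.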