## Computer Vision

Let's do some very basic computer vision. We're going to import the MNIST handwritten digits data and use $k$NN to predict values (i.e. "see/read").

1. To load the data, run the following code in a chunk:
```
from keras.datasets import mnist
df = mnist.load_data('minst.db')
train,test = df
X_train, y_train = train
X_test, y_test = test
```
The `y_train` and `y_test` vectors tell you, for each index `i`, what number is written in the corresponding `X_train[i]` or `X_test[i]`. Each `X_train[i]` and `X_test[i]`, however, is a 28$\times$28 array whose entries contain values between 0 and 255. Each element of the matrix is essentially a "pixel" and the matrix encodes a representation of a number. To visualize this, run the following code to see the first five numbers in the test set:
```
import matplotlib.pyplot as plt
import numpy as np
np.set_printoptions(edgeitems=30, linewidth=100000)
for i in range(5): 
    print(y_test[i],'\n') # Print the label
    print(X_test[i],'\n') # Print the matrix of values
    plt.contourf(np.rot90(X_test[i].transpose())) # Make a contour plot of the matrix values
    plt.show()
```
OK, those are the data: Labels attached to handwritten digits encoded as a matrix.

In [None]:
from keras.datasets import mnist
df = mnist.load_data('minst.db')
train,test = df
X_train, y_train = train
X_test, y_test = test

import matplotlib.pyplot as plt
import numpy as np
np.set_printoptions(edgeitems=30, linewidth=100000)
for i in range(5): 
    print(y_test[i],'\n') # Print the label
    print(X_test[i],'\n') # Print the matrix of values
    plt.contourf(np.rot90(X_test[i].transpose())) # Make a contour plot of the matrix values
    plt.show()
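
The `np.rot90(X_test[i].transpose())` step in the plotting code can look mysterious. `plt.contourf` draws row 0 of a matrix at the bottom of the axes, so an image stored with its top row first would appear upside down. The small check below, using a made-up 2-by-3 array named `A` (not part of the lab data), suggests that transposing and then rotating by 90 degrees is the same as a vertical flip. An alternative worth trying is `plt.imshow`, which by default places row 0 at the top.

```python
import numpy as np

# A tiny stand-in "image" (hypothetical, for illustration only)
A = np.arange(6).reshape(2, 3)

# The transformation used in the plotting loop
flipped = np.rot90(A.transpose())

print(flipped)
# Compare with a plain vertical flip of the original array
print(np.array_equal(flipped, np.flipud(A)))  # True
```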

2. What is the shape of `X_train` and `X_test`? What is the shape of `X_train[i]` and `X_test[i]` for each index `i`? What is the shape of `y_train` and `y_test`?

In [None]:
print(X_train.shape)
print(X_test.shape)
print(X_train[0].shape)
print(X_test[0].shape)
print(y_train.shape)
print(y_test.shape)

> There are 60,000 matrices of size 28 by 28 in the training set, and 10,000 matrices of size 28 by 28 in the test set. The `y_train` vector has 60,000 numeral assignments, and the `y_test` vector has 10,000 numeral assignments. Basically, each `X_train[i]` is a two-dimensional matrix of pixel values, associated with a numeral in `y_train[i]`.

3. Use NumPy's `.reshape()` method to convert the training and testing data from a matrix into a vector of features. So, `X_test[index].reshape((1,784))` will convert the $index$-th element of `X_test` into a $28\times 28=784$-length row vector of values, rather than a matrix. Turn `X_train` into an $N \times 784$ matrix $X$ that is suitable for scikit-learn's $k$NN classifier, where $N$ is the number of observations and $784=28 \times 28$ (you could use, for example, a `for` loop).

In [None]:
import pandas as pd
import numpy as np

# To save on reloading cost, I save the reshaped data and reload it rather than run the
# code that loops over appending the rows 

reload = 0 # Control the way data loads: 1 rebuilds and saves the CSVs, anything else loads them

if reload == 1:  # If reload is 1, do the reshaping process
    Z_train = []
    for i in range(len(y_train)):
        row = X_train[i].reshape((1,784)) # Turn the matrix for i into a row vector of features
        Z_train.append(row[0]) # Append the row vector to the list
    Z_train = pd.DataFrame(Z_train)
    # index=False keeps the saved file at exactly 784 feature columns
    Z_train.to_csv('/Users/vaibhavjha/labsvj/03_computer_vision/Z_train.csv', index=False)
    #
    Z_test = []
    for i in range(len(y_test)):
        row = X_test[i].reshape((1,784)) # Turn the matrix for i into a row vector of features
        Z_test.append(row[0]) # Append the row vector to the list
    Z_test = pd.DataFrame(Z_test)
    # Save the test features to their own file so they do not overwrite the training features
    Z_test.to_csv('/Users/vaibhavjha/labsvj/03_computer_vision/Z_test.csv', index=False)
else: # If reload is not 1, just load the reshaped data (requires an earlier run with reload = 1)
    Z_train = pd.read_csv('/Users/vaibhavjha/labsvj/03_computer_vision/Z_train.csv')
    Z_test = pd.read_csv('/Users/vaibhavjha/labsvj/03_computer_vision/Z_test.csv')
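
The row-by-row loop can also be written as a single vectorized `.reshape()` call. The sketch below compares the two approaches on a small random array, `X_demo`, which is a hypothetical stand-in for `X_train` so that the example runs without downloading MNIST. The names `Z_loop` and `Z_vec` are likewise illustrative.

```python
import numpy as np

# Hypothetical stand-in for X_train: 5 random 28x28 "images" with values in 0..255
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Loop version, mirroring the cell above: flatten one image at a time
rows = [X_demo[i].reshape((1, 784))[0] for i in range(len(X_demo))]
Z_loop = np.array(rows)

# Vectorized version: keep the first axis, flatten everything else
Z_vec = X_demo.reshape(len(X_demo), -1)

print(Z_vec.shape)                    # (5, 784)
print(np.array_equal(Z_loop, Z_vec))  # True
```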

4. Use the reshaped `X_train` and `y_train` data to create a $k$-nearest neighbor classifier of digits, and evaluate it on the reshaped `X_test` and `y_test`. What is the optimal number of neighbors $k$? If you can't determine this, play around with different values of $k$ for your classifier.

In [None]:
from sklearn.neighbors import KNeighborsClassifier

# Determine the optimal k:
k_bar = 50
k_grid = np.arange(1,k_bar+1) # The range of k's to consider: 1,...,k_bar
accuracy = np.zeros(len(k_grid)) # accuracy[i] holds the test accuracy for k_grid[i]

for i, k in enumerate(k_grid):
    knn = KNeighborsClassifier(n_neighbors=int(k))
    predictor = knn.fit(Z_train.values,y_train) 
    accuracy[i] = predictor.score(Z_test.values,y_test) # Use .values so fit and score both see plain arrays

accuracy_max = np.max(accuracy) # Highest recorded accuracy
max_index = np.where(accuracy==accuracy_max) # Every index that attains the maximum (ties are possible)
k_star = k_grid[max_index] # Find the optimal value(s) of k
print(k_star)

plt.plot(k_grid,accuracy) # Plot accuracy by k
plt.xlabel("k")
plt.title("optimal k:"+str(k_star))
plt.ylabel('Accuracy')
plt.show()
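
The search above picks $k$ by its accuracy on the test set, which means the test set is also being used for tuning. One common alternative, sketched here as an assumption rather than as the lab's intended method, is to choose $k$ by cross-validation on the training data and only then score once on the test set. To keep the example small and self-contained it uses scikit-learn's bundled 8-by-8 digits data (`load_digits`) as a stand-in for MNIST, and a shorter grid of $k$ values. The split sizes and `random_state` are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): scikit-learn's small 8x8 digits, not MNIST
digits = load_digits()
X = digits.images.reshape(len(digits.images), -1)  # flatten each 8x8 image to 64 features
y = digits.target

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Mean 5-fold cross-validated accuracy on the training data for each k
k_grid = np.arange(1, 16)
cv_accuracy = np.array([
    cross_val_score(KNeighborsClassifier(n_neighbors=int(k)), X_tr, y_tr, cv=5).mean()
    for k in k_grid
])
k_star = int(k_grid[np.argmax(cv_accuracy)])  # argmax returns the first maximum if there are ties

# Refit with the chosen k and score once on the held-out test data
final = KNeighborsClassifier(n_neighbors=k_star).fit(X_tr, y_tr)
test_accuracy = final.score(X_te, y_te)
print("k_star:", k_star, "test accuracy:", round(float(test_accuracy), 3))
```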


5. For the optimal number of neighbors, how well does your predictor perform on the test set?

In [None]:
knn = KNeighborsClassifier(n_neighbors=3) # Use k=3, the value discussed in the answer below
predictor = knn.fit(Z_train.values,y_train) 
y_hat = predictor.predict(Z_test.values) 

accuracy = knn.score(Z_test.values,y_test) # Bug in sklearn requires .values
print('Accuracy: ', accuracy)

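pd.crosstab(y_test, y_hat, rownames=['Actual'], colnames=['Predicted']) # Confusion matrix: rows are true digits, columns are predictions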


> With k=3, the rule is 90% accurate on the test set. When it does make mistakes, it tends to be things like confusing 4 for 9 or 8 for 3 or 7 for 1, which is understandable. It is remarkable that a simple algorithm like kNN does this well at classifying such complex data.

6. For your confusion matrix, which mistakes are most likely? Do you find any interesting patterns?

> The most common mistakes are mistaking a 4 for a 9 (81), a 9 for a 4 (51), an 8 for a 3 (50), an 8 for a 5 (45), a 9 for a 7 (40), and a 7 for a 1 (39). These pairs of digits are all visually similar, so it makes sense that a computer would confuse them. Even humans make these mistakes sometimes, especially when the handwriting isn't very legible.
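
Reading the largest off-diagonal cells out of a 10-by-10 table by eye is error-prone, so it can help to rank them in code. The sketch below is one possible approach. The helper `top_confusions` is a hypothetical name, and the labels are made-up toy values rather than MNIST results.

```python
import numpy as np

def top_confusions(y_true, y_pred, n_classes=10, top=3):
    """Return (actual, predicted, count) for the largest off-diagonal cells."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # Count each (actual, predicted) pair: rows are actual digits, columns are predictions
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    np.fill_diagonal(cm, 0)  # ignore correct predictions
    order = np.argsort(cm, axis=None)[::-1][:top]  # flattened indices, largest counts first
    rows, cols = np.unravel_index(order, cm.shape)
    return [(int(r), int(c), int(cm[r, c])) for r, c in zip(rows, cols) if cm[r, c] > 0]

# Toy labels (made up for illustration)
y_true_demo = [4, 4, 4, 9, 9, 7, 7, 1]
y_pred_demo = [9, 9, 4, 4, 9, 1, 7, 1]

mistakes = top_confusions(y_true_demo, y_pred_demo)
print(mistakes[0])  # (4, 9, 2): a true 4 was predicted as 9 twice
```

With the lab's own arrays, the call would be `top_confusions(y_test, y_hat)`.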

7. So, this is how computers "see." They convert an image into a matrix of values, that matrix becomes a vector in a dataset, and then we deploy ML tools on it as if it were any other kind of tabular data. To make sure you follow this, invent a way to represent a color photo in matrix form, and then describe how you could convert it into tabular data. (Hint: RGB color codes provide a method of encoding a numeric value that represents a color.)

> The current data include an "intensity" for each pixel in the 28$\times$28 grid. To add color, we could have three $28 \times 28$ matrices that capture the Red, Green, and Blue color intensities, respectively. Then we would reshape each of the three matrices into a row and put them side by side into one long row of $3 \times 784 = 2352$ features to create tabular data, like we did above.
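
As a minimal sketch of that idea, the code below builds a random 28-by-28 RGB array, `photo` (made-up data, not a real image), and flattens it into one row in two ways: by reshaping each color channel and concatenating them, and with a single transpose-and-reshape. It assumes the common height-by-width-by-channel layout. Other orderings work equally well for $k$NN as long as every image is flattened the same way.

```python
import numpy as np

# Made-up RGB "photo": height x width x 3 channels, values in 0..255
rng = np.random.default_rng(0)
photo = rng.integers(0, 256, size=(28, 28, 3), dtype=np.uint8)

# Split into the three color matrices
red, green, blue = photo[:, :, 0], photo[:, :, 1], photo[:, :, 2]

# Flatten each matrix and place the three rows side by side
row_by_channel = np.concatenate([red.reshape(-1), green.reshape(-1), blue.reshape(-1)])

# Same result in one step: move the channel axis first, then flatten
row_vectorized = photo.transpose(2, 0, 1).reshape(-1)

print(row_by_channel.shape)                             # (2352,)
print(np.array_equal(row_by_channel, row_vectorized))   # True
```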