# HW2.1 Image classification using logistic regression on MNIST data in the TensorFlow Keras framework

In this homework, we will load the MNIST handwritten digits dataset and build a logistic regression model with the TensorFlow Keras framework to classify the digits.

You may run this notebook on your own laptop or upload it to Google Colab. Either way, you will still submit your homework through GitHub. If you use your own laptop, you need to install TensorFlow first. See the course resources page: https://canvas.tufts.edu/courses/54565/pages/resources

If you need a refresher on Keras, refer to the documentation:
- https://keras.io/guides/sequential_model/
- https://keras.io/guides/training_with_built_in_methods/


## 1. Get and inspect the data

### loading data

In [1]:
from tensorflow.keras.datasets import mnist
(training_dataset_x, training_dataset_y), (test_dataset_x, test_dataset_y) = mnist.load_data()

In [None]:
# show some images
import matplotlib.pyplot as plt

figure = plt.gcf()
figure.set_size_inches(10, 10)
for i in range(9):
  plt.subplot(3, 3, i+1)
  axis = plt.gca()
  axis.set_title(str(training_dataset_y[i]))
  plt.imshow(training_dataset_x[i].reshape(28, 28), cmap='gray')
plt.show()

### write code to inspect the data and answer the following questions:

1. What are the data types of `training_dataset_x` and `training_dataset_y`?
2. What are the dimensions of `training_dataset_x` and `training_dataset_y`? How about `test_dataset_x` and `test_dataset_y`? Explain why the data and the labels have these shapes. This is just for you to understand the dataset and the labels/classes.
3. What are the min and max values of the pixels of the images? (Hint: you can inspect several elements of `training_dataset_x`.) You can assume all images in this dataset have the same range of values.
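
The kind of inspection asked for above can be sketched on a small made-up array. The names `toy_images` and `toy_labels` and all values are hypothetical stand-ins, not MNIST; the same attributes and methods apply to any NumPy array.

```python
import numpy as np

# Hypothetical stand-in for an image dataset: 5 images of 4x4 pixels,
# stored as unsigned 8-bit integers (values are made up for illustration).
rng = np.random.default_rng(0)
toy_images = rng.integers(0, 200, size=(5, 4, 4), dtype=np.uint8)
toy_labels = np.array([3, 0, 7, 1, 3], dtype=np.uint8)

print(type(toy_images))                     # container type
print(toy_images.dtype)                     # element data type
print(toy_images.shape)                     # (num_images, height, width)
print(toy_labels.shape)                     # (num_images,)
print(toy_images.min(), toy_images.max())   # pixel value range
print(np.unique(toy_labels))                # distinct classes present
```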

In [None]:
# YOUR CODE HERE

### massage the data

We need to do the following data manipulation in order to prepare the data to feed into the model. Write code for: 

1. <b>Flatten each image into a 1-d array</b>. The images come as 2-d arrays (x by y, where `x==y==28`), but logistic regression needs a 1-d input. Let's flatten each image into a 1-d array of length z, where `z=x*y`, using the reshape function of numpy. 
2. <b>Normalization</b>. The images' pixel values come in a large range, as you saw above. ML models usually train better if the input values are constrained to a small range with a nice distribution. In this case, let's divide all the pixel values by the max value you found above. What is the resulting range of values for this data? 
3. <b>Convert labels to categorical data</b>. You have seen above that the labels (`training_dataset_y` and `test_dataset_y`) come in as integers, but we want to convert them to one-hot encodings. Let's use this function from tensorflow:

```python
from tensorflow.keras.utils import to_categorical
training_dataset_y = to_categorical(training_dataset_y)
```

You can apply it to the test labels too. After you have done both, inspect the shapes again. What do you see? 
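
The three steps above can be sketched on toy data. Everything here (`toy_x`, `toy_y`, the 4x4 image size, the three classes) is a made-up assumption for illustration, and the `np.eye` indexing is a NumPy stand-in that should produce the same layout as `to_categorical` for integer labels starting at 0.

```python
import numpy as np

# Hypothetical toy data: 6 "images" of 4x4 pixels with values in 0..190
# (made-up numbers, not MNIST), and integer labels for 3 classes.
toy_x = np.arange(6 * 4 * 4, dtype=np.uint8).reshape(6, 4, 4) * 2
toy_y = np.array([0, 2, 1, 2, 0, 1])

# 1. Flatten: keep the sample axis, merge the two pixel axes into one.
flat_x = toy_x.reshape(toy_x.shape[0], -1)               # shape (6, 16)

# 2. Normalize: cast to float first, then divide by the largest pixel value.
norm_x = flat_x.astype("float32") / flat_x.max()         # values in [0, 1]

# 3. One-hot encode: row i of the identity matrix encodes class i.
num_classes = int(toy_y.max()) + 1
onehot_y = np.eye(num_classes, dtype="float32")[toy_y]   # shape (6, 3)

print(flat_x.shape, norm_x.min(), norm_x.max(), onehot_y.shape)
```

Casting to a float type before dividing matters: the raw pixels are integers, and the normalized values need fractional precision.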

In [None]:
# YOUR CODE HERE

## 2. Build a logistic regression model

Recall that the multinomial logistic regression shown in the following figure consists of two layers: an input layer and an output layer, connected with a weight matrix W. 

<img src="logistic.png" alt="logisticRegression" width="400"/>

In our current image classification task, the input features are just the pixel values. Write the code to build such a logistic regression model in Keras with two layers. First create a `Sequential` model. The input layer can be omitted, so all you need to do is add a `Dense` layer (the output layer) with the number of nodes equal to the number of classes. When adding this layer, specify the number of output classes, the input dimension of one flattened image, and the output activation function shown in the figure above. Note that this model has no hidden layers, so a hidden-layer non-linearity such as "relu" does not belong here. Something like this:

```python
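# pass input_dim and activation as keyword arguments;
# the second positional argument of Dense is the activation, not the input size
Dense(num_output, input_dim=input_dim, activation=activation)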
```
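
To make the computation concrete, here is a minimal NumPy sketch of what a single dense output layer does in the forward pass. It assumes the output activation in the figure is softmax, as is standard for multinomial logistic regression; the weights and inputs are random placeholders, not a trained model.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    shifted = z - z.max(axis=1, keepdims=True)
    exp_z = np.exp(shifted)
    return exp_z / exp_z.sum(axis=1, keepdims=True)

input_dim, num_classes = 784, 10          # 28*28 pixels, 10 digit classes
rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(input_dim, num_classes))  # weight matrix W
b = np.zeros(num_classes)                                   # bias vector

x = rng.random((4, input_dim))            # 4 made-up flattened "images"
probs = softmax(x @ W + b)                # shape (4, 10), each row sums to 1
predicted = probs.argmax(axis=1)          # most probable class per image

num_params = W.size + b.size              # 784*10 weights + 10 biases
print(probs.shape, probs.sum(axis=1), num_params)
```

The parameter count computed here (weights plus biases) is the number a model summary should report for a layer of this size.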

In [18]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow import keras

model = Sequential()
# add dense layer (see above comments)
# YOUR CODE HERE

# compile model with adam optimizer, categorical_crossentropy loss, and add 'categorical_accuracy' to the metrics
# YOUR CODE HERE

# print out a model summary
# YOUR CODE HERE

# train model
hist = model.fit(training_dataset_x, training_dataset_y, epochs=10, batch_size=64,
                 validation_split=0.2)
# YOUR CODE HERE

### Add TensorBoard to keep track of your training

Documentation: https://keras.io/api/callbacks/tensorboard/

You can add TensorBoard, a visualization tool for monitoring your training. 

Add this and train again: 

```python
tb_callback = keras.callbacks.TensorBoard('./logs', update_freq=1)

hist = model.fit(training_dataset_x, training_dataset_y, epochs=10, batch_size=64,
                 validation_split=0.2, callbacks=[tb_callback])
```
What happens here is a callback: an action that the training performs at a certain point. In this case, TensorFlow writes the training record to the directory `./logs`. With `update_freq=1`, the losses and metrics are logged after every batch; pass `update_freq='epoch'` to log once per epoch instead. 
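
The callback idea itself can be illustrated without Keras. The sketch below is a toy, not the Keras implementation: `RecordingCallback` and `toy_fit` are hypothetical names, and the "loss" is a made-up placeholder. Only the hook name `on_epoch_end` mirrors the method of the same name on Keras callbacks.

```python
class RecordingCallback:
    """Hypothetical callback that stores the metrics it is handed."""

    def __init__(self):
        self.records = []

    def on_epoch_end(self, epoch, logs):
        # Copy the dict so later changes by the loop do not alter the record.
        self.records.append((epoch, dict(logs)))


def toy_fit(epochs, callbacks):
    """Stand-in for a training loop; the 'loss' is made up, not learned."""
    for epoch in range(epochs):
        logs = {"loss": 1.0 / (epoch + 1)}   # placeholder metric
        for cb in callbacks:
            cb.on_epoch_end(epoch, logs)     # the hook fires here


recorder = RecordingCallback()
toy_fit(epochs=3, callbacks=[recorder])
print(recorder.records)
```

The TensorBoard callback follows the same pattern, except that it writes the metrics to log files. To view them, run `tensorboard --logdir ./logs` from a terminal.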

In [None]:
# YOUR CODE HERE

## 3. Evaluate the trained model

Use the following code as a reference for testing your model on the test data. 

In [None]:
eval_result = model.evaluate(test_dataset_x, test_dataset_y)
for i in range(len(eval_result)):
  print(f'{model.metrics_names[i]} ---> {eval_result[i]}')
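
The two numbers reported by the evaluation are the loss and the accuracy metric. As a rough sketch of what they measure, the following NumPy code computes categorical cross-entropy and categorical accuracy by hand on made-up predictions. The arrays and the `eps` guard are assumptions for illustration; Keras handles numerical details differently internally.

```python
import numpy as np

# Made-up predicted probabilities for 3 samples over 3 classes (rows sum to 1).
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6],
                   [0.5, 0.4, 0.1]])
# One-hot true labels: classes 0, 2, 1.
y_true = np.array([[1, 0, 0],
                   [0, 0, 1],
                   [0, 1, 0]])

# Categorical cross-entropy: mean of -log(probability given to the true class).
eps = 1e-7  # guards against log(0)
loss = -np.mean(np.sum(y_true * np.log(y_prob + eps), axis=1))

# Categorical accuracy: fraction of rows where the predicted argmax
# matches the argmax of the one-hot label.
accuracy = np.mean(y_prob.argmax(axis=1) == y_true.argmax(axis=1))
print(loss, accuracy)
```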

## 4. Turn this model into a neural network

In the above implementation of logistic regression, the one `Dense` layer you added can be considered the output layer of a neural network. Let's turn this into an actual neural network by adding another "hidden" `Dense` layer before the output layer: 

- You should specify the input dimension in this hidden layer and remove `input_dim` from the output layer;
- The hidden layer should have a non-linear activation function, such as "relu";
- You can set the dimension (number of units) of this hidden layer to 256 (but you can pick any number you like);
- Then you can keep the output layer similar to the logistic regression `Dense` layer above (remember it should not have the argument `input_dim`).
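
The structure described in the list above can be sketched as a NumPy forward pass. This is an illustration under assumptions, not the Keras model: the weights are random placeholders, the hidden layer uses ReLU with 256 units as suggested, and the output activation is assumed to be softmax.

```python
import numpy as np

def relu(z):
    """Element-wise non-linearity: negative values become 0."""
    return np.maximum(z, 0.0)

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    shifted = z - z.max(axis=1, keepdims=True)
    exp_z = np.exp(shifted)
    return exp_z / exp_z.sum(axis=1, keepdims=True)

input_dim, hidden_units, num_classes = 784, 256, 10
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.01, size=(input_dim, hidden_units))    # input -> hidden
b1 = np.zeros(hidden_units)
W2 = rng.normal(scale=0.01, size=(hidden_units, num_classes))  # hidden -> output
b2 = np.zeros(num_classes)

x = rng.random((4, input_dim))        # 4 made-up flattened "images"
hidden = relu(x @ W1 + b1)            # shape (4, 256), non-negative
probs = softmax(hidden @ W2 + b2)     # shape (4, 10), rows sum to 1

# Only the first layer sees the input dimension; the second layer's
# input size is fixed by the hidden layer's output size.
num_params = W1.size + b1.size + W2.size + b2.size
print(hidden.shape, probs.shape, num_params)
```

This is why `input_dim` moves to the hidden layer: each later layer infers its input size from the layer before it.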

In [None]:
# add hidden dense layer (non-linear layer)
# YOUR CODE HERE

This is basically a neural network model now. Print a model summary. What do you see? Train the model to see how well it performs. 

Then, try adding another hidden layer, retrain the model, and report the results. 

In [19]:
# YOUR CODE HERE