# Project 1: How to Train Your Model

In Jupyter Notebook 2: Matrices, we saw that images are stored digitally as multidimensional arrays. Another name for a multidimensional array is a *tensor*. This project introduces TensorFlow, a machine learning library created by Google and used by many companies to build and train models. It’s named after tensors, the primary objects you work with in TensorFlow code.
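
To make the idea of a tensor concrete, here is a minimal sketch that builds tensors with zero to three dimensions and prints their shapes. The variable names and the `(28, 28, 3)` image size are made up for illustration.

```python
import tensorflow as tf

# Tensors with an increasing number of dimensions (the "rank").
scalar = tf.constant(3.0)                       # rank 0, shape ()
vector = tf.constant([1.0, 2.0, 3.0])           # rank 1, shape (3,)
matrix = tf.constant([[1.0, 2.0], [3.0, 4.0]])  # rank 2, shape (2, 2)
image = tf.zeros((28, 28, 3))                   # rank 3: height, width, color channels

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("image", image)]:
    print(name, "rank:", len(t.shape), "shape:", tuple(t.shape))
```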

Let's check whether you have TensorFlow installed, and install it if necessary.

In [None]:
try:
    import tensorflow as tf
    print("\nTensorFlow is installed.")
except ImportError:
    print("TensorFlow is not installed. Please install it using 'pip install tensorflow'.")

In [None]:
# Only run this cell if you need to install TensorFlow.
%pip install tensorflow

### **Problem 1**
It’s important to appreciate the computational efficiency of programming with tensors. Let’s work through an example that illustrates this by computing the element-wise product of two vectors in two ways. The first code cell below provides code that uses a TensorFlow command and measures its execution time. In the code cell below that, perform the same computation using a for loop and the multiplication operation `*`, so that you can compare the execution times. *Hint:* You should use the command `product = np.zeros(size)` to initialize a vector for your for loop.

In [None]:
import numpy as np
import tensorflow as tf
import time

size = (50000,)
u = tf.random.uniform(size)
v = tf.random.uniform(size)

start_time = time.perf_counter()
tf.math.multiply(u, v)
end_time = time.perf_counter()
execution_time = end_time - start_time

print(f"Execution time: {execution_time} seconds")

## TensorFlow

Many of the array operations and functions in TensorFlow should feel familiar if you have used NumPy; however, some basic operations require TensorFlow functions.
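
As a rough illustration of what carries over from NumPy and what does not, the sketch below uses a small made-up array. Operators and slicing behave as in NumPy, while reductions use TensorFlow's own function names and tensors do not allow item assignment.

```python
import numpy as np
import tensorflow as tf

a_np = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
a_tf = tf.constant(a_np)

# Familiar from NumPy: arithmetic operators and slicing work on tensors.
doubled = a_tf * 2.0
first_column = a_tf[:, 0]

# Different from NumPy: reductions have their own names ...
total_np = np.sum(a_np)
total_tf = tf.reduce_sum(a_tf)

# ... and tensors are immutable, so item assignment raises a TypeError.
try:
    a_tf[0, 0] = 10.0
    assignment_worked = True
except TypeError:
    assignment_worked = False

print(total_np, total_tf.numpy(), assignment_worked)
```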

### **Problem 2**

Use the commands `tf.constant`, `tf.matmul`, `tf.scalar_mul`, and `tf.add` to define two $2\times 2$ matrices $A$ and $B$ and compute $AB+3B$.

### **Problem 3**

Print the entry in the first row and second column of the tensor $A$ you defined in the previous problem. Describe the output in a sentence in a comment or Markdown cell below your code.

Sometimes it's convenient to convert tensor objects to NumPy arrays and vice versa. Let's see how this is done below.

In [None]:
print(A.numpy()) # convert tensor to numpy array

A_r1 = A.numpy()[0] # first row of A as numpy array
A_r1 = tf.constant(A_r1) # convert numpy array to tensor

print(A_r1) # print the resulting tensor
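
One detail worth knowing when converting back and forth is that the data type travels with the data. The sketch below uses a made-up array `x_np` (not the matrix $A$ from Problem 2) to show that a default NumPy float array becomes a `float64` tensor unless you cast it.

```python
import numpy as np
import tensorflow as tf

x_np = np.array([1.5, 2.5, 3.5])      # NumPy floats default to float64
x_tf = tf.convert_to_tensor(x_np)     # the tensor keeps dtype float64
x_tf32 = tf.cast(x_tf, tf.float32)    # cast explicitly to float32
x_back = x_tf32.numpy()               # back to a NumPy array (float32)

print(x_tf.dtype, x_tf32.dtype, x_back.dtype)
```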


### **Problem 4**

Print out the tensor $B$ as a NumPy array.

## MNIST Dataset

Now let's load the MNIST dataset, which contains 70,000 samples of handwritten digits. This data is conveniently included in TensorFlow.

In [7]:
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data() # load dataset and split into training and testing sets

### **Problem 5**

Display the first image. What is the image? Use Python and `train_labels` to print a message stating the digit in the image. *Tip: Refer to Jupyter Notebook 2 if you need to review how to display an image.*

### Preparing the Data

Computer processors comprise many transistors, which are essentially on/off switches. When we say computers work in binary, we are really referring to these on/off states. For the most part, in all of our computations we will store float values as `float32`, which means each number is represented by 32 bits (binary digits), each of which is either 0 or 1.
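
To see those 32 bits directly, here is a small sketch using Python's standard `struct` module. The helper `float32_bits` is a hypothetical name introduced only for this illustration; it prints the bit pattern split into the sign, exponent, and fraction fields of the IEEE 754 single-precision format.

```python
import struct

def float32_bits(x):
    """Return the 32-bit pattern of x as 'sign exponent fraction'."""
    # Pack x as a big-endian float32, then reinterpret the 4 bytes as an integer.
    (as_int,) = struct.unpack(">I", struct.pack(">f", x))
    bits = f"{as_int:032b}"
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"

for value in (1.0, -2.0, 0.5):
    print(value, float32_bits(value))
```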

Machine learning computations generally behave better with float values between 0 and 1, in part because of how floating-point numbers are represented. Let's explore why.

### **Problem 6**

1. By either computing it yourself or doing an internet search, determine how many floating-point numbers there are in a 32-bit system. What about a normalized 32-bit system?

2. By either computing it yourself or doing an internet search, determine the decimal precision of a 32-bit floating-point number and of a 32-bit floating-point number in the range $0$ to $1$.

Write your answers in the Markdown cell below.

### **Problem 7**

Normalize the training and testing datasets and use the `tf.reshape` command to reshape the images to be arrays of size `(784,)`.
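
As a small illustration of these two operations on made-up data (not MNIST), the sketch below scales two $2\times 3$ toy images and flattens each into a vector of length 6. Dividing by the largest possible pixel value, 255 for 8-bit images, is assumed here as one common way to normalize.

```python
import numpy as np
import tensorflow as tf

# Two made-up 2x3 grayscale "images" with 8-bit pixel values (0 to 255).
toy_images = np.array([[[0, 51, 102], [153, 204, 255]],
                       [[255, 255, 0], [0, 0, 255]]], dtype=np.uint8)

scaled = tf.cast(toy_images, tf.float32) / 255.0   # values now lie in [0, 1]
flat = tf.reshape(scaled, (-1, 6))                 # -1 lets TensorFlow infer the batch size

print(flat.shape)
print(flat.numpy())
```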

### **Problem 8**

As we discussed in class, we need to convert the labels to categorical data. Fortunately, there is a command for this: `tf.keras.utils.to_categorical`. 

Convert the training and testing labels to categorical data.
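
To see what the conversion produces, here is a minimal sketch on four made-up labels with three classes. Each label becomes a row with a 1 in the position of its class and 0 elsewhere.

```python
import numpy as np
import tensorflow as tf

toy_labels = np.array([0, 2, 1, 2])   # made-up integer class labels
one_hot = tf.keras.utils.to_categorical(toy_labels, num_classes=3)

print(one_hot)

# Taking the argmax of each row recovers the original labels.
recovered = np.argmax(one_hot, axis=1)
print(recovered)
```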

### Initializing and Training Our Model

Defining a feedforward neural network using TensorFlow takes just a few lines of code.

In [None]:
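model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),                    # each image is a flat vector of 784 pixels
    tf.keras.layers.Dense(100, activation='relu'),   # hidden layer with 100 units
    tf.keras.layers.Dense(10, activation='softmax')  # output layer: one probability per digit
])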
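
It can help to know how many trainable parameters this network has. The sketch below counts them by hand, assuming each dense layer has one weight per input-output pair plus one bias per output unit. You can compare the total against the output of `model.summary()`.

```python
layer_sizes = [784, 100, 10]   # input size, hidden units, output units

params_per_layer = [
    n_in * n_out + n_out       # weight matrix plus one bias per output unit
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(params_per_layer)

print(params_per_layer, total_params)
```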

Now we should specify our loss function and *optimizer*. The optimizer is our method for training; we'll discuss more methods in the future. For now, let's use stochastic gradient descent with `learning_rate=0.01`.

In [34]:
loss_function = tf.keras.losses.MSE
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

model.compile(optimizer=optimizer, loss=loss_function, metrics=['accuracy']) # configure the model for training with this optimizer, loss, and metric
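
To get a feel for what the learning rate does, here is a minimal sketch of the gradient descent update rule on a made-up one-variable objective. This is plain gradient descent on a toy function, not the mini-batch procedure Keras runs. The point is only that each step moves the parameter a small distance, `learning_rate` times the gradient, toward lower loss.

```python
# Toy objective f(w) = (w - 3)^2, whose minimum is at w = 3.
def gradient(w):
    return 2.0 * (w - 3.0)

learning_rate = 0.01   # same value as the optimizer above
w = 0.0                # arbitrary starting point

for step in range(1000):
    w = w - learning_rate * gradient(w)   # gradient descent update

print(w)
```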


### **Problem 9**

Apply the model to the training data by running `model(train_images)` and store the result as `pred_labels`. Then compute and print `loss_function(train_labels, pred_labels)`.

Now let's train our model:

In [None]:
model.fit(train_images,
          train_labels, 
          epochs=5, 
          batch_size=32) # train model on training data for 5 epochs with batch size of 32
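
The progress bar printed during training counts batches, not images. As a quick sanity check, the sketch below computes how many weight updates to expect, assuming the standard MNIST training split of 60,000 images.

```python
import math

num_train = 60000   # assumed size of the MNIST training split
batch_size = 32
epochs = 5

steps_per_epoch = math.ceil(num_train / batch_size)   # batches (weight updates) per epoch
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, total_updates)
```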

### **Problem 10**
Complete the following tasks:

1. Load the Fashion MNIST dataset using the command `tf.keras.datasets.fashion_mnist.load_data()`
2. Display the first 5 images and print their labels.
3. Prepare the training and testing data. That is, normalize and reshape the images and convert the labels to categorical data.
4. Define and train a model with $784\times 128\times 64\times 32\times 10$ architecture with the ReLU activation function in every layer except the last. For the final layer, use softmax for the activation function. Use stochastic gradient descent to train the model and mean squared error for its loss function. Train until your model has at least $90\%$ accuracy on the training data. 