# TensorFlow - Unit 10 - Image Classification

## <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%202%20-%20Unit%20Objective.png"> Unit Objectives

* Fit a convolutional neural network for a classification task using an image dataset



---

## <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%204%20-%20Import%20Package%20for%20Learning.png"> Import Package for Learning

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import seaborn as sns
sns.set_style('white')

---

## <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Unit 10 - Image Classification: Toy Datasets

### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Workflow

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Challenge%20test.png"> We will follow the typical supervised learning process we are already familiar with, now with a few tweaks:

* Split the dataset into train, validation and test set
* Preprocess the image data
* Create the neural network
* Fit the model to the train and validation set
* Evaluate the model
* Prediction

---

### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Load and split the data

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Let's first load the data. We are using the MNIST dataset, available through TensorFlow. It contains images of handwritten digits from 0 to 9.
* Here, we are interested in predicting the digit shown in each handwritten image
* This is a toy dataset, where all images are provided in a single standardized format and arranged in a NumPy array.
  * This is useful for learning purposes. However, real image datasets rarely have all images in the same size. In the first walkthrough project, we will handle a dataset whose images have different sizes. For now, we focus on the workflow for managing an image dataset


import os
# set the log level before importing TensorFlow, so it takes effect when the library loads
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf
from tensorflow.keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()
print(X_train.shape, X_test.shape)

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> We have already explored this dataset in a previous unit notebook; you may remember that you choose an index to reveal a digit
* We are using `plt.imshow()` to display the NumPy array as an image

pointer = 88

print(f"array pointer = {pointer}")
print(f"X_train[{pointer}] shape: {X_train[pointer].shape}")
print(f"label: {y_train[pointer]}")

plt.imshow(X_train[pointer],cmap='gray')
plt.show()

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> The data was loaded already split into train and test sets, since TensorFlow provides it in this format.
* From the train set, we split off a validation set, sized at 20% of the train set
* Have a look at the print statement, which shows the amount of data in each set (train, validation and test)

from sklearn.model_selection import train_test_split
X_train, X_val, y_train, y_val = train_test_split(
                                    X_train,
                                    y_train,
                                    test_size=0.2,
                                    random_state=0
                                    )

print("* Train set:", X_train.shape, y_train.shape)
print("* Validation set:",  X_val.shape, y_val.shape)
print("* Test set:",   X_test.shape, y_test.shape)
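
As a quick sanity check on the shapes printed above, the expected split sizes can be worked out by hand. This is a minimal sketch, assuming the MNIST train set starts with 60,000 images; `split_sizes` is a hypothetical helper written for illustration and is not part of scikit-learn.

```python
def split_sizes(n_samples, val_percent):
    """Return (n_train, n_val) when val_percent % of the samples are held out."""
    n_val = n_samples * val_percent // 100
    return n_samples - n_val, n_val


# 20% of the original train set goes to validation
n_train, n_val = split_sizes(n_samples=60000, val_percent=20)
print(n_train, n_val)
```

The first number should match the first dimension of `X_train.shape` after the split, and the second the first dimension of `X_val.shape`.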

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> First, we get the unique values from the target variable; we will use them when evaluating the model performance

target_classes= np.unique(y_train)
target_classes

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Next, we get the number of unique values from the target variable; we will use it here and when creating the model

n_labels = len(np.unique(y_train))
n_labels

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Next, let's inspect the first 5 items from `y_train`

y_train[:5,]

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> As in the previous notebook, we use the `to_categorical()` function to one hot encode the target in the required format. We pass the data to `to_categorical()` and set the number of classes.
* Let's inspect the first 5 items from `y_train` again after the transformation.

from tensorflow.keras.utils import to_categorical
# the labels are passed positionally, since the name of this argument differs between Keras versions
y_train = to_categorical(y_train, num_classes=n_labels)
y_val = to_categorical(y_val, num_classes=n_labels)
y_test = to_categorical(y_test, num_classes=n_labels)

y_train[:5,]
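
To make the one hot encoding step more concrete, here is a minimal NumPy sketch of the same idea. The `one_hot` helper is hypothetical and written only for illustration; it is not how Keras implements `to_categorical()`.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    # row i of the identity matrix is the one hot vector for class i
    return np.eye(num_classes, dtype=np.float32)[labels]


encoded = one_hot([5, 0, 9], num_classes=10)
print(encoded)

# np.argmax recovers the original integer labels
print(np.argmax(encoded, axis=1))
```

Each row has a 1 at the position of its class and 0 everywhere else, which is the format the output layer and the loss function expect.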

---

### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Data processing

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> We first need to preprocess the data.
* In this exercise, we will check whether we need to scale the data and reshape the arrays.

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Let's evaluate if we need to scale the data.
* Scaling is important since the algorithm learns best when the data is in a small range; for images, we typically use a range of 0 to 1
* Since X_train is a NumPy array, we can check its max() value; if it is greater than 1, scaling would be needed.
  * We note the max value is 255. A pixel value of 255 means maximum light (white), whereas 0 means minimum light (black)

`X_train.max() = 255`

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> As a result, we will scale the data by dividing the NumPy arrays (X_train, X_val and X_test) by 255.

`X_train = X_train / 255`
`X_val = X_val / 255`
`X_test = X_test / 255`
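
The division by 255 can be tried on a tiny array first. This is a minimal sketch, assuming the images are stored as unsigned 8-bit integers; `fake_images` is a made-up stand-in and is not taken from the dataset.

```python
import numpy as np

# hypothetical stand-in for image data: darkest, mid and brightest pixel values
fake_images = np.array([[0, 128, 255]], dtype=np.uint8)

# convert to float before dividing so the result keeps the decimal places
scaled = fake_images.astype("float32") / 255.0

print(scaled.min(), scaled.max())
```

After scaling, the minimum stays at 0 and the maximum becomes 1, which is the range described above.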

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Let's again check the max value. The data is now in the proper format to feed the neural network

`X_train.max() = 1.0`

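However, in this notebook the scaling commands above are shown for reference only and are not run; the model is fed the original pixel values (0 to 255). In your own projects, consider scaling the pixel values, since having balanced classes does not change the pixel range.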

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Next, we will look at the image shape
* We note it has 3 dimensions
  * the first is the number of images; in this case, X_train has 48k samples (or images)
  * the next 2 are the image size: 28x28
  * However, it is missing one last dimension, the channel. In this case, the image is grayscale, so there is one channel. If the image were colored, it would have 3 channels (RGB)

X_train.shape

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> We will simply reshape the array, where we essentially add a **1** to the last dimension. In that way, we add the channel dimension to the data
* We reshape all sets (X_train, X_val and X_test)
* We reshape with its current 1st, 2nd and 3rd dimension, and we force the last dimension to be 1

X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], X_train.shape[2], 1)
X_val = X_val.reshape(X_val.shape[0], X_val.shape[1], X_val.shape[2], 1)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], X_test.shape[2], 1)

X_train.shape
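
If you want to experiment with the channel dimension without touching the real data, the sketch below uses a small array of zeros (a made-up stand-in for 5 grayscale images). It also shows `np.expand_dims()` as an alternative way to get the same result.

```python
import numpy as np

# hypothetical batch of 5 grayscale images, 28x28 pixels each
batch = np.zeros((5, 28, 28), dtype=np.uint8)

# option 1: reshape, keeping the current dimensions and forcing the last one to be 1
with_reshape = batch.reshape(batch.shape[0], batch.shape[1], batch.shape[2], 1)

# option 2: ask NumPy to add a new axis at the end
with_expand = np.expand_dims(batch, axis=-1)

print(with_reshape.shape, with_expand.shape)
```

Both options only change how the array is indexed; the pixel values stay the same.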

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%206%20-%20Warning.png"> Note: we go through these steps, scaling the data and reshaping it to include the channel dimension, because the data was provided as a raw NumPy array
* When you get image datasets in a NumPy format, check these items and, if required, process them.
* However, when dealing with real image files, the preprocessing tasks are done in another way, which we will cover in walkthrough project 1

---

### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Create Deep Learning Network

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> We will create a TensorFlow model
* We create a function that builds a sequential model, compiles it and returns it. The function needs the input shape (image size and channels) as well as the number of neurons in the last layer


<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> The network has 2 pairs of convolution + pooling layers. We know in advance that 1 pair would be enough for this dataset; however, we want to showcase multiple pairs of convolution + pooling layers.
* Quick recap: convolution layers use filters to extract the dominant features (such as edges and shapes) from the images. Pooling layers reduce the image size by keeping only the dominant values
* The first pair has a convolution layer with 16 filters and a 3x3 kernel. We pass the input shape and set relu as the activation function. The MaxPool layer has a pool size of 2x2
* The next pair has the same setup as the previous pair, except that it does not need the input shape



<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Next, there is a Flatten layer
* The Flatten layer is used to flatten the matrix into a vector, which means a single list of all values. Then that is fed into a dense layer.

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Next, there is a Dense layer with 128 neurons.
* Typically, you set the number of neurons in dense layers as a power of 2 (for example 64, 128 or 256), and the number of layers depends on the data complexity after the Flatten layer.
  * We will see in `.summary()` or `plot_model()` that the output of the Flatten layer has 400 values, so it makes sense to reduce from 400 to 128 neurons. Naturally, you only know the Flatten output size after creating a model and checking the summary/plot_model.
  * If the output from the Flatten layer were much higher, like 5k, you would consider 2 or more dense layers to reduce the number of connections progressively.
  * The value 128 is a good starting point. If you notice the CNN is not learning, you may add more dense layers and adjust the number of neurons in them
* After that, we have a dropout layer with a rate of 25% to reduce the chance of overfitting.


<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> The output layer should reflect a multiclass classification.
  * We set a dense layer, where the number of neurons equals the number of classes in the target variable. This information is stored in a previously created variable - `n_labels`. 
  * For multiclass classification, we set the activation function to softmax, and we compile the model with adam as the optimizer and categorical_crossentropy as the loss function. We also monitor the accuracy metric.


from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPool2D, Flatten, Dropout

def create_tf_model(input_shape, n_labels):
  model = Sequential()
  
  model.add(Conv2D(filters=16, kernel_size=(3,3),input_shape=input_shape, activation='relu',))
  model.add(MaxPool2D(pool_size=(2, 2)))

  model.add(Conv2D(filters=16, kernel_size=(3,3), activation='relu',))
  model.add(MaxPool2D(pool_size=(2, 2)))

  model.add(Flatten())
  
  model.add(Dense(128, activation='relu'))
  model.add(Dropout(0.25))
  
  model.add(Dense(n_labels, activation='softmax'))
  model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy']) 

  return model
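
To build some intuition for the output layer and the loss, the sketch below computes softmax and categorical cross-entropy by hand with NumPy. It is a simplified illustration, not the Keras implementation; the `logits` values are made up and use only 3 classes to keep the numbers readable.

```python
import numpy as np


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 for each row."""
    # subtracting the row maximum avoids overflow and does not change the result
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Cross-entropy between one hot targets and predicted probabilities."""
    # clipping avoids taking the log of 0
    y_prob = np.clip(y_prob, eps, 1.0)
    return -np.sum(y_true * np.log(y_prob), axis=-1)


logits = np.array([[2.0, 1.0, 0.1]])   # made-up raw scores for 3 classes
y_true = np.array([[1.0, 0.0, 0.0]])   # the actual class is the first one

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)
print(probs, loss)
```

The loss gets smaller as the probability assigned to the actual class gets closer to 1, which is what the optimizer pushes the network towards during training.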


<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Let's visualize the network structure
* The network has just over 55k parameters. We will study the layer's input/output in the next cell 

model = create_tf_model(input_shape=X_train.shape[1:], n_labels=n_labels )
model.summary()

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Once again, we can use `plot_model()`, also from Keras utils, for a more graphical view
* Note in the first convolution, the input is 28x28, but it reduces to 26x26, since a 3x3 kernel without padding loses 1 pixel at each edge (2 pixels in total per direction). Then it goes to a pooling layer, and the image is halved (since the pool size is 2x2): 13 x 13
* In the second convolution, the same dynamic happens: 2 pixels per direction are lost, giving 11 x 11, and the pooling layer halves the image, rounding down, to 5 x 5
* The Flatten layer transforms the pooled image into a single vector, whose length is the product of all dimensions of the pooled image (5 x 5 x 16 = 400)
* Next, there is a dense layer of 128 and finally an output layer with 10 neurons (where each represents a number from 0 to 9)

from tensorflow.keras.utils import plot_model
plot_model(model, show_shapes=True)
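
The shapes and the parameter count reported by the summary can also be reproduced with plain arithmetic. This is a minimal sketch, assuming the Keras defaults used in `create_tf_model` (no padding and a stride of 1 in the convolutions, and a pooling stride equal to the pool size); the helper names are hypothetical.

```python
def conv_out(size, kernel):
    """Output size of a convolution with no padding and stride 1."""
    return size - kernel + 1


def pool_out(size, pool):
    """Output size of a pooling layer; odd sizes are rounded down."""
    return size // pool


size, channels = 28, 1
params = 0

# two pairs of convolution (16 filters, 3x3 kernel) + pooling (2x2)
for filters in (16, 16):
    # each filter has 3*3*channels weights plus 1 bias
    params += (3 * 3 * channels + 1) * filters
    size = pool_out(conv_out(size, 3), 2)
    channels = filters

flatten_size = size * size * channels

# dense layer with 128 neurons, then output layer with 10 neurons
params += (flatten_size + 1) * 128
params += (128 + 1) * 10

print(flatten_size, params)
```

The two printed numbers should agree with the Flatten output size and the total parameter count shown by `model.summary()`.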

---

### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Fit the model

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Early stopping allows us to stop the training when a monitored metric has stopped improving. This is useful to avoid overfitting the model to the data.
* We will monitor the validation loss
  * We set patience to 1; patience is the number of epochs with no improvement after which training will be stopped. There is no fixed rule to set patience; if you feel your model was still learning when training stopped, you may increase the value and train again. Here we want the training process to be quick, so we set patience to 1, since the idea is to provide you with a "look and feel" learning experience.
  * We set the mode to min, since we want the model to stop training when the validation loss stops improving, and for a loss, improving means decreasing

from tensorflow.keras.callbacks import EarlyStopping
early_stop = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=1)
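
To see how patience affects when training stops, here is a simplified simulation in plain Python. It only mimics the general idea (stop after `patience` epochs without a new best validation loss) and is not the Keras implementation; the `stopping_epoch` helper and the loss values are made up for illustration.

```python
def stopping_epoch(val_losses, patience=1):
    """Return the epoch (starting at 1) at which training would stop."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # new best validation loss: reset the counter
            best = loss
            wait = 0
        else:
            # no improvement in this epoch
            wait += 1
            if wait >= patience:
                return epoch
    # the loss kept improving, so all epochs were used
    return len(val_losses)


made_up_losses = [0.30, 0.20, 0.25, 0.10]
print(stopping_epoch(made_up_losses, patience=1))
print(stopping_epoch(made_up_losses, patience=2))
```

With these made-up values, a patience of 1 stops at the third epoch and misses the later improvement, while a patience of 2 lets training reach the fourth epoch. This is the trade-off between training speed and giving the model a chance to recover.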

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Finally, we fit the model
* We create the model object and use `.fit()`, as usual
  * We pass the train set
  * The epochs are set to 4. We know in advance that this amount is fine to learn the patterns considering the dataset and the network structure
  * We pass the validation data as a tuple.
  * Verbose is set to 1 so we can see the current epoch and the training and validation loss.
  * Finally, we pass our callbacks as a list containing the early_stop object we created earlier.

* For each epoch, note the training and validation loss and accuracy. Are they increasing? Decreasing? Static?
  * Ideally, the loss should decrease as the epochs progress, a practical sign that the network is learning. The accuracy should increase over the epochs.
  * Note, the model will take a bit longer to train now

model = create_tf_model(input_shape= X_train.shape[1:], n_labels=n_labels )

model.fit(x=X_train, 
          y=y_train, 
          epochs=4,
          validation_data=(X_val, y_val),
          verbose=1,
          callbacks=[early_stop]
          )

---

### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Model evaluation

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Now we will evaluate the model performance by analyzing the train and validation loss and accuracy recorded during the training process.
* In deep learning, we use the model history to assess if the model learned, using the train and validation sets. We also evaluate separately how the model generalizes on unseen data (the test set)
* The training history is stored in the model's `.history.history` attribute.
* **Note it shows loss and accuracy for train and validation**

history = pd.DataFrame(model.history.history)
history.head()

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> We are plotting each loss and accuracy in a line plot, where the y-axis has the loss/accuracy value, the x-axis is the epoch number and the lines are colored by train or validation
* We use `.plot(style='.-')` for this task
  * Note the loss plot for training and validation data follow a similar path and are close to each other. So it looks like the network learned the patterns.
  * Note in the accuracy plot that both train and validation accuracies keep increasing. The training stops when the performance "saturates" for validation, as we set in the early stopping object.

sns.set_style("whitegrid")
history[['loss','val_loss']].plot(style='.-')
plt.title("Loss")
plt.show()

print("\n")
history[['accuracy','val_accuracy']].plot(style='.-')
plt.title("Accuracy")
plt.show()

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Next, we will evaluate the model performance on the test set, using `.evaluate()` and passing the test set. Note the values are not much different from the loss and accuracy on the train and validation sets.
* Note the loss is low and accuracy is high. It looks like the model has learned the relationships between the features and the target.

model.evaluate(X_test,y_test)

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> When evaluating a deep learning model, you typically inspect the loss plot and evaluate the test set; however, **as an optional additional step**, you can run an evaluation similar to the one we did in conventional ML.
* In classification, you would analyze the confusion matrix and classification report, using the custom function we have seen over the course.
* One difference is that we adapted the function to also evaluate the validation set, but that is a minor change in the code; the overall logic is the same

from sklearn.metrics import classification_report, confusion_matrix

def confusion_matrix_and_report(X,y,pipeline,label_map):
  # the prediction comes as an array with one probability per class
  prediction = pipeline.predict(X)
  # so we take the index from the highest probability, which is the "winner" or predicted class
  prediction = np.argmax(prediction, axis=1)
  
  # the actual values are one hot encoded, so we also take the index of the 1 in each row
  y = np.argmax(y, axis=1)

  print('---  Confusion Matrix  ---')
  # the arguments are swapped on purpose: scikit-learn puts y_true on the rows,
  # so passing the predictions there makes rows show predictions and columns show actual values
  print(pd.DataFrame(confusion_matrix(y_true=prediction, y_pred=y),
        columns=[ ["Actual " + sub for sub in label_map] ], 
        index= [ ["Prediction " + sub for sub in label_map ]]
        ))
  print("\n")

  print('---  Classification Report  ---')
  print(classification_report(y, prediction, target_names=label_map),"\n")


def clf_performance(X_train,y_train,X_test,y_test,X_val, y_val,pipeline,label_map):

  print("#### Train Set #### \n")
  confusion_matrix_and_report(X_train,y_train,pipeline,label_map)

  print("#### Validation Set #### \n")
  confusion_matrix_and_report(X_val,y_val,pipeline,label_map)

  print("#### Test Set ####\n")
  confusion_matrix_and_report(X_test,y_test,pipeline,label_map)
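
To see what the confusion matrix counts, the sketch below builds one by hand with NumPy on a tiny made-up example with 3 classes. The `confusion_counts` helper is hypothetical; it follows the same layout as the function above (rows are predictions, columns are actual values) and is not a replacement for scikit-learn.

```python
import numpy as np


def confusion_counts(y_onehot, y_proba):
    """Count samples per (predicted class, actual class) pair."""
    actual = np.argmax(y_onehot, axis=1)
    predicted = np.argmax(y_proba, axis=1)
    n_classes = y_onehot.shape[1]
    counts = np.zeros((n_classes, n_classes), dtype=int)
    # add 1 to the cell [predicted, actual] for every sample
    np.add.at(counts, (predicted, actual), 1)
    return counts


# made-up data: 4 samples, actual classes 0, 1, 2 and 1
y_true_demo = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
y_proba_demo = np.array([[0.8, 0.1, 0.1],
                         [0.2, 0.7, 0.1],
                         [0.1, 0.6, 0.3],
                         [0.3, 0.6, 0.1]])

counts = confusion_counts(y_true_demo, y_proba_demo)
print(counts)
```

Values on the diagonal are correct predictions. The single value off the diagonal is the third sample, which is actually class 2 but was predicted as class 1.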

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> We need label_map to be a list with the class names in a string format.
* We have target_classes, a NumPy array that holds the class values; however, they are integers
* We will convert the integers to a list of strings using a list comprehension.

[str(x) for x in target_classes]

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Let's pass the values as usual.
* Note the model is capable of separating the classes, including in the test set

clf_performance(X_train, y_train,
                X_test,y_test,
                X_val, y_val,
                model,
                label_map= [str(x) for x in target_classes]
                )

---

### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Prediction

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Let's take a sample from the test set and use it as if it were live data. We will consider 1 sample

index = 102
my_number = X_test[index]
print(my_number.shape)
print(y_test[index])

sns.set_style('white')
plt.imshow(my_number.reshape(28,28), cmap='gray')
plt.show()

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> We first investigate the shape of our live data. It has 3 dimensions, as we would expect from an image: the image size (28 x 28) and the channel information (1, since it is a grayscale image)

my_number.shape

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> However, when interacting with the model, we need the data in 4 dimensions, where the first dimension is the number of images, the next 2 are the image size and the last is the number of color channels
* In our case, we need to add the first dimension with a value of 1, so the final shape is (**1** ,28 ,28 ,1 )
* We use the command `np.expand_dims()` for this task. The documentation link is [here](https://numpy.org/doc/stable/reference/generated/numpy.expand_dims.html).

live_data = np.expand_dims(my_number, axis=0)
print(live_data.shape)

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> We use `.predict()` and pass the data. Note the result is a probability for each class.

prediction_proba = model.predict(live_data)
prediction_proba

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> So we take the index from the highest probability, which is the "winner" or predicted class

prediction_class = np.argmax(prediction_proba, axis=1) 
prediction_class
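
Besides the single "winner", it can be useful to look at the runner-up classes too. This is a minimal sketch that ranks the classes by probability with NumPy; the `made_up_proba` values are invented for illustration and are not an output of the model.

```python
import numpy as np

# made-up probabilities for the 10 digits, in the same shape that .predict() returns
made_up_proba = np.array([[0.01, 0.02, 0.05, 0.70, 0.02,
                           0.10, 0.03, 0.03, 0.02, 0.02]])

# sort the class indexes from the highest to the lowest probability and keep the first 3
top3 = np.argsort(made_up_proba[0])[::-1][:3]

for digit in top3:
    print(f"digit {digit}: probability {made_up_proba[0][digit]:.2f}")
```

The first item is the same class that `np.argmax()` returns; the others show which digits the model would consider the next most likely.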

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Let's plot the probabilistic result, so you can check the predictions in a more visual fashion
* Read the code comments
* We take `prediction_proba` to get the probability associated with each class. Then we plot it in a bar plot using Plotly

# create a dataframe that shows the probability per class
# the probabilities come from prediction_proba
prob_per_class= pd.DataFrame(data=prediction_proba[0],
                             columns=['Probability']
                             )

# we round the values to 3 decimal points, for better visualization
prob_per_class = prob_per_class.round(3)

# we add a column to prob_per_class that shows the meaning of each class
# in this case, the digit stored in target_classes
prob_per_class['Results'] = target_classes

prob_per_class

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> We will use a bar plot, where the x-axis shows the Result and the y-axis the associated probability for a given Result
* I encourage you to go to the first cell of the Prediction section and change the index variable to take a different sample. Then run all cells from there down to the plot in the cell below
* You may change the index to any integer from 0 up to the size of the test set minus 1

import plotly.express as px
fig = px.bar(
        prob_per_class,
        x = 'Results',
        y = 'Probability',
        range_y=[0,1],
        width=600, height=400,template='seaborn')
fig.update_xaxes(type='category')
fig.show()

---