# TensorFlow - Unit 09 - Multiclass Classification

## <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%202%20-%20Unit%20Objective.png"> Unit Objectives

* Fit a deep learning neural network for a multiclass classification task




---

## <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%204%20-%20Import%20Package%20for%20Learning.png"> Import Package for Learning

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import seaborn as sns
sns.set_style('whitegrid')

---

## <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png">  Unit 09 - Multiclass Classification

### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Workflow

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Challenge%20test.png"> We will follow the typical supervised learning process that we are familiar with, but now with a few tweaks:

* Split the dataset into train, validation and test set
* Create a pipeline to handle data cleaning, feature engineering and feature scaling
* Create the neural network
* Fit the pipeline to the train set and transform the other sets
* Fit the model to the train set, using the validation set to monitor training
* Evaluate the model
* Prediction

---

### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Load and split the data

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Let's first load the data. We are using the penguins dataset from seaborn. It has records for 3 different species of penguins, collected from 3 islands in the Palmer Archipelago, Antarctica.
* Here, we are interested in predicting the penguin species based on the penguin's characteristics.

df = sns.load_dataset('penguins')
print(df.shape)
df.head()

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png">  When you want to create a TensorFlow model for multiclass classification, your target variable needs to be encoded numerically, since TensorFlow works with numbers.
* As a result, we create a dictionary that maps the target classes to numbers, then use it to replace the class names in the target variable.
* It is better to do that before splitting the data; otherwise, you would have to do it 3 times, once for each target set (y_train, y_val, y_test)

target_map = {'Adelie':0, 'Chinstrap':1, 'Gentoo':2}
df['species'] = df['species'].map(target_map)
df.head()

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> As part of our workflow, we split the data, but now we will split it into train, validation, and test sets.
* First, we split into train and test sets

from sklearn.model_selection import train_test_split
X_train, X_test,y_train, y_test = train_test_split(
                                    df.drop(['species'],axis=1),
                                    df['species'],
                                    test_size=0.2,
                                    random_state=0
                                    )

print("* Train set:", X_train.shape, y_train.shape, "\n* Test set:",  X_test.shape, y_test.shape)

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Then, we split a validation set off from the train set. We set the validation set as 20% of the train set.
* Have a look at the print statement, showing the amount of data we have in each set (train, validation and test)

X_train, X_val,y_train, y_val = train_test_split(
                                    X_train,
                                    y_train,
                                    test_size=0.2,
                                    random_state=0
                                    )

print("* Train set:", X_train.shape, y_train.shape)
print("* Validation set:",  X_val.shape, y_val.shape)
print("* Test set:",   X_test.shape, y_test.shape)
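
Because the second split takes 20% of what remained after the first one, the validation set is not 20% of the full dataset. The sketch below works out the resulting shares with exact fractions, assuming both splits use `test_size=0.2` as in the cells above. The variable names are illustrative only, and actual row counts can differ slightly from these shares because row counts are rounded to whole numbers.

```python
from fractions import Fraction

# Assumption: both splits use test_size=0.2, as in the cells above
test_size = Fraction(1, 5)
val_size = Fraction(1, 5)

test_share = test_size                           # share of the full dataset
val_share = (1 - test_size) * val_size           # 20% of the remaining 80%
train_share = (1 - test_size) * (1 - val_size)   # what is left for training

for name, share in [("train", train_share),
                    ("validation", val_share),
                    ("test", test_share)]:
    print(f"{name}: {float(share):.0%}")
```

So the model is trained on roughly 64% of the rows, validated on 16% and tested on 20%.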

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png">  When you create a TensorFlow model for multiclass classification, you choose the loss function when compiling the model. The "go-to" option is `categorical_crossentropy`, which we will use over the course.
* The target variable should be one hot encoded when using this loss function.
* One hot encoding converts each class of the target into its own binary column, holding the value 1 or 0. The number of binary columns is the same as the number of classes in the target variable.
* For a given row, the binary column is 1 when it corresponds to the row's original class, and 0 otherwise. Let's see the example and learn from that.
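
To see why this loss expects a one hot encoded target, here is a minimal NumPy sketch of categorical cross-entropy. It is a simplified illustration, not the Keras implementation, and the function name, the `eps` clipping value and the example probabilities are assumptions made for the demo. The one hot row selects the predicted probability of the true class, and the loss is the negative log of that probability.

```python
import numpy as np

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean cross-entropy between one hot targets and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0)  # avoid taking log(0)
    per_sample = -np.sum(y_true * np.log(y_prob), axis=1)
    return float(np.mean(per_sample))

# One sample whose true class is 1, with two hypothetical predictions
y_true = np.array([[0.0, 1.0, 0.0]])
confident = np.array([[0.1, 0.8, 0.1]])
unsure = np.array([[0.4, 0.3, 0.3]])

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

The prediction that puts more probability on the true class gets the lower loss.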


<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> First, let's inspect the first 5 rows from y_train

y_train.head()

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> We first get the number of unique values in the target variable; we will use it here and when creating the model

* We use the `to_categorical()` function to one hot encode the target in the format we require. We pass the data to `to_categorical()` and set the number of classes.
* Let's again inspect the first 5 items from y_train. Note we had 3 classes (0, 1 and 2)
* 3 binary columns were created, each representing one of the possible classes (0, 1 or 2). For example, a row whose class was 1 now has the value 1 in the second binary column (which represents class 1), while the other binary columns are zero.

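import os
# set the log level before importing TensorFlow, so that it takes effect
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf
from tensorflow.keras.utils import to_categorical

n_labels = y_train.nunique()

# the data is passed positionally, since the name of the first argument differs between Keras versions
y_train = to_categorical(y_train, num_classes=n_labels)
y_val = to_categorical(y_val, num_classes=n_labels)
y_test = to_categorical(y_test, num_classes=n_labels)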

y_train[:5,]
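
If you are curious about what the encoding does under the hood, the same transformation can be sketched in plain NumPy. This is only an illustration with a hypothetical helper name (`one_hot`), not how Keras implements `to_categorical()`: each label picks the matching row of an identity matrix.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return one row per label, with a 1 in the column of that label's class."""
    labels = np.asarray(labels, dtype=int)
    return np.eye(num_classes, dtype="float32")[labels]

example_labels = [1, 0, 2]
encoded = one_hot(example_labels, num_classes=3)
print(encoded)

# np.argmax() reverses the encoding and recovers the original labels
decoded = np.argmax(encoded, axis=1)
print(decoded)
```

We will use the `np.argmax()` trick again later, when converting predictions back to classes.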

---


### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Pipeline for data processing

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png">  We first create a pipeline for preprocessing the data. We will list the steps here, but in a real project, you would have used your knowledge to explore the data and decide on the data cleaning and feature engineering steps. In this case, the steps are:
* Impute missing data with the median for all numerical variables (when you don't pass the `variables` argument, the imputer selects all numerical variables. This trick saves you time)
* Impute the most frequent level in the categorical variables. We again didn't pass the `variables` argument, so it includes all categorical variables, and we didn't have to pass ['island', 'sex']
  * Note: you shouldn't do this for all datasets. We had studied the dataset before and concluded we could use this imputer for island and sex, which happen to be the categorical variables.
* Encode all categorical variables (following the same rationale as the previous bullet for not passing the `variables` argument explicitly)
* Feature scaling

from sklearn.pipeline import Pipeline
# Feature engineering
from feature_engine.imputation import MeanMedianImputer
from feature_engine.imputation import CategoricalImputer
from feature_engine.encoding import OrdinalEncoder
# Feature scaling
from sklearn.preprocessing import StandardScaler

def pipeline_pre_processing():
  pipeline_base = Pipeline([
      ('median', MeanMedianImputer(imputation_method='median')),
      ('categorical_imputer', CategoricalImputer(imputation_method='frequent')),
      ('ordinal', OrdinalEncoder(encoding_method='arbitrary')),
      ('feat_scaling', StandardScaler()),
  ])

  return pipeline_base



<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png">  Next, we fit the pipeline to the train set and apply the transformations to the validation and test sets
* This way, the pipeline learns the transformations (the imputation values, the categorical encoding and the scaling parameters) from the train set only, and applies them to the other sets.
* Let's visualize the first rows from the scaled data. Note it is a 2D NumPy array

pipeline = pipeline_pre_processing()
X_train = pipeline.fit_transform(X_train)
X_val= pipeline.transform(X_val)
X_test = pipeline.transform(X_test)

X_train[:2,]
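
The reason we call `.fit_transform()` on the train set but only `.transform()` on the other sets can be illustrated with the scaling step alone. The sketch below uses made-up random data and plain NumPy rather than the pipeline above: the mean and standard deviation are learned from the train data only, then reused on the validation data.

```python
import numpy as np

# Hypothetical data standing in for two numerical features
rng = np.random.default_rng(0)
train = rng.normal(loc=50.0, scale=5.0, size=(100, 2))
val = rng.normal(loc=50.0, scale=5.0, size=(20, 2))

# "fit": learn the scaling parameters from the train set only
train_mean = train.mean(axis=0)
train_std = train.std(axis=0)

# "transform": apply the same parameters to every set
train_scaled = (train - train_mean) / train_std
val_scaled = (val - train_mean) / train_std

print(train_scaled.mean(axis=0).round(3), train_scaled.std(axis=0).round(3))
print(val_scaled.mean(axis=0).round(3), val_scaled.std(axis=0).round(3))
```

The scaled train set has mean 0 and standard deviation 1 by construction, while the validation set only gets close to those values. That is expected: the validation and test sets must not influence the parameters, otherwise information from them would leak into training.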

---

### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Create Deep Learning Network

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png">  We will create a TensorFlow model
* We create a function that creates a sequential model, compiles it and returns it. The function needs the number of features in the data and the number of neurons in the last layer
* Let's define the network architecture
  * We noted the data has 6 features. First, we will create a simple network just for a learning experience.
  * The network is built using Dense layers - fully connected layers
  * The input layer has the same number of neurons as the number of columns in the data. The activation function is relu. We pass the input_shape as a tuple.
  * We are using 3 hidden layers, the first with 20 neurons, then 10 neurons, and the last with 5 neurons. All three use relu as the activation function. This approach is the "expansive-shrink" option we mentioned in a previous notebook related to model architecture.
  * After the input layer and each hidden layer, we have a dropout layer with a rate of 25% to reduce the chance of overfitting.
* The output layer should reflect a multiclass classification.
  * We set a dense layer, where the number of neurons is the same as the number of classes in the target variable. This information is stored in a previously created variable - `n_labels`.
  * For multiclass classification, we set the activation function to softmax, and we compile the model with adam as the optimizer and categorical_crossentropy as the loss function. We also monitor the accuracy metric.




from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense,Dropout

def create_tf_model(n_features, n_labels):

  model = Sequential()
  model.add(Dense(units=n_features,activation='relu', input_shape=(n_features,)))
  model.add(Dropout(0.25))

  model.add(Dense(units=20,activation='relu'))
  model.add(Dropout(0.25))

  model.add(Dense(units=10,activation='relu'))
  model.add(Dropout(0.25))

  model.add(Dense(units=5,activation='relu'))
  model.add(Dropout(0.25))

  model.add(Dense(n_labels, activation='softmax'))
  model.compile(optimizer='adam', loss='categorical_crossentropy',metrics=['accuracy'])

  return model
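
To build some intuition for the softmax activation in the output layer, here is a small NumPy sketch. It is an illustration under our own assumptions (the function name and the raw scores are made up), not the Keras implementation: softmax turns the raw scores of the last layer into positive values that add up to 1, so they can be read as one probability per class.

```python
import numpy as np

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1 along the last axis."""
    # subtracting the maximum does not change the result, but avoids overflow
    shifted = scores - np.max(scores, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical raw scores for one sample and 3 classes
example_scores = np.array([[2.0, 1.0, 0.1]])
example_probs = softmax(example_scores)

print(example_probs.round(3))
print(example_probs.sum(axis=1))
```

The class with the highest raw score also gets the highest probability.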


<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Let's visualize the network structure

model = create_tf_model(n_features=X_train.shape[1], n_labels=n_labels )
model.summary()
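
The parameter counts in the summary can be checked by hand. A Dense layer has one weight per input-neuron pair plus one bias per neuron, while Dropout layers add no trainable parameters. The sketch below assumes the architecture described above (6 features, then 6, 20, 10 and 5 neurons, and 3 output classes); the helper name and variable names are illustrative only.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Assumption: 6 features and 3 classes, as in this notebook
example_n_features = 6
example_n_labels = 3
layer_units = [example_n_features, 20, 10, 5, example_n_labels]

total_params = 0
n_inputs = example_n_features
for units in layer_units:
    params = dense_params(n_inputs, units)
    print(f"Dense({units}): {params} parameters")
    total_params += params
    n_inputs = units  # the output of this layer feeds the next one

print("Total trainable parameters:", total_params)
```

If your summary shows different numbers, check whether the number of features or classes in your data differs from the assumption above.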

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Once again, we can use `plot_model()`, also from `tensorflow.keras.utils`, for a more graphical approach (it requires `pydot` and Graphviz to be installed)

from tensorflow.keras.utils import plot_model
plot_model(model, show_shapes=True)

---

### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Fit the model

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Early stopping allows us to stop training when a monitored metric has stopped improving; this is useful to avoid overfitting the model to the data.
* We will monitor the validation loss
  * We set patience to 10, which is the number of epochs with no improvement after which training will be stopped. Although there is no fixed rule for setting patience, if you feel that your model was still learning when it stopped, you may increase the value and train again.
  * We set the mode to min, since we want the model to stop training when the loss no longer improves, and for a loss, improving means decreasing

from tensorflow.keras.callbacks import EarlyStopping
early_stop = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=10)
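
The patience logic can be sketched in a few lines of plain Python. This is a simplified illustration of the idea, not the Keras implementation, and the function name and the loss values are made up: we count the epochs since the last improvement and stop once that count reaches the patience.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the (0-based) epoch where training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:       # improvement: remember it and reset the counter
            best = loss
            wait = 0
        else:                 # no improvement: one more epoch of waiting
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses over 5 epochs
example_losses = [1.0, 0.9, 0.95, 0.96, 0.8]
print(early_stopping_epoch(example_losses, patience=2))
print(early_stopping_epoch(example_losses, patience=3))
```

With a patience of 2, training stops before reaching the last epoch, which would have been the best one. With a patience of 3, training continues long enough to find it. This is the trade-off behind choosing the patience value.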

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Finally, we fit the model
* We create the model object and use `.fit()`, as usual
  * We pass the train set
  * The epochs are set to 100. In theory, you may set a high value since we add an early stop, which stops the training process when the validation loss stops improving.
  * We pass the validation data as a tuple.
  * Verbose is set to 1 so we can see which epoch we are in, along with the training and validation loss.
  * Finally, we pass the early_stop object we created earlier as a callback.

* For each epoch, note the training and validation loss and accuracy. Are they increasing? Decreasing? Static?
  * Ideally, the loss should decrease as the epochs progress, a practical sign that the network is learning. The accuracy should increase over the epochs.

model = create_tf_model(n_features=X_train.shape[1],  n_labels=n_labels)

model.fit(x=X_train, 
          y=y_train, 
          epochs=100,
          validation_data=(X_val, y_val),
          verbose=1,
          callbacks=[early_stop]
          )

---

### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Model evaluation

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png">  Now we will evaluate the model performance by analyzing the train and validation losses and accuracy recorded during the training process.
* In deep learning we use the model history to assess whether the model learned, using the train and validation sets. We also evaluate separately how the model generalizes to unseen data (the test set)
* The model training history is stored in the model's `.history.history` attribute.
* **Note it shows loss and accuracy for train and validation**

history = pd.DataFrame(model.history.history)
history.head()

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> We are plotting each loss and accuracy in a line plot, where the y-axis has the loss/accuracy value, the x-axis is the epoch number and the lines are colored by train or validation
* We use `.plot(style='.-')` for this task
  * Note the loss plot for training and validation data follow a similar path and are close to each other. It looks like the network learned the patterns.
  * Note in the accuracy plot that both train and validation accuracies keep increasing. When the performance "saturates" for validation, the training stops, as we set the early stopping object.

sns.set_style("whitegrid")
history[['loss','val_loss']].plot(style='.-')
plt.title("Loss")
plt.show()

print("\n")
history[['accuracy','val_accuracy']].plot(style='.-')
plt.title("Accuracy")
plt.show()

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Next, we will evaluate the model performance on the test set, using `.evaluate()` and passing the test set. Note the values are not much different from the loss and accuracy on the train and validation sets.
* Note the loss is low and the accuracy is high. It looks like the model learned the relationship between the features and the target, considering all features.

model.evaluate(X_test,y_test)
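
The accuracy reported by `.evaluate()` can be reproduced conceptually with NumPy. The sketch below uses made-up one hot targets and predicted probabilities, not the notebook's data: a prediction counts as correct when the class with the highest probability matches the class marked with 1 in the target.

```python
import numpy as np

# Hypothetical one hot targets and predicted probabilities for 4 samples
example_y_true = np.array([[1, 0, 0],
                           [0, 1, 0],
                           [0, 0, 1],
                           [0, 1, 0]])
example_y_prob = np.array([[0.7, 0.2, 0.1],
                           [0.2, 0.5, 0.3],
                           [0.1, 0.3, 0.6],
                           [0.6, 0.3, 0.1]])

predicted_classes = np.argmax(example_y_prob, axis=1)
actual_classes = np.argmax(example_y_true, axis=1)

example_accuracy = np.mean(predicted_classes == actual_classes)
print(example_accuracy)
```

Here 3 of the 4 samples are classified correctly, so the accuracy is 0.75.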

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> When evaluating a deep learning model, you typically cover the loss plot and evaluate on the test set; however, **as an additional step, you can run an evaluation** similar to the one we did in conventional ML.
* In classification, you would analyze the confusion matrix and classification report, using the custom function we have seen over the course.
* One difference is that we adapted the function to also evaluate the validation set, but that is a minor change in the code; the overall logic is the same

from sklearn.metrics import classification_report, confusion_matrix

def confusion_matrix_and_report(X,y,pipeline,label_map):
  # here "pipeline" is the fitted model; the prediction comes as one probability per class
  prediction = pipeline.predict(X)
  # so we take the index from the highest probability, which is the "winner" or predicted class
  prediction = np.argmax(prediction, axis=1)
  
  # the actual values are one hot encoded, so we take the index of the 1 to get the class
  y = np.argmax(y, axis=1)
  
  # sklearn puts y_true on the rows and y_pred on the columns; the arguments are swapped
  # on purpose, so the rows show the predictions and the columns show the actual values
  print('---  Confusion Matrix  ---')
  print(pd.DataFrame(confusion_matrix(y_true=prediction, y_pred=y),
        columns=["Actual " + sub for sub in label_map],
        index=["Prediction " + sub for sub in label_map]
        ))
  print("\n")

  print('---  Classification Report  ---')
  print(classification_report(y, prediction, target_names=list(label_map)),"\n")


def clf_performance(X_train,y_train,X_test,y_test,X_val, y_val,pipeline,label_map):

  print("#### Train Set #### \n")
  confusion_matrix_and_report(X_train,y_train,pipeline,label_map)

  print("#### Validation Set #### \n")
  confusion_matrix_and_report(X_val,y_val,pipeline,label_map)

  print("#### Test Set ####\n")
  confusion_matrix_and_report(X_test,y_test,pipeline,label_map)

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Let's pass the values as usual.
* Note the model is capable of separating the classes, including in the test set

clf_performance(X_train, y_train,
                X_test,y_test,
                X_val, y_val,
                model,
                label_map= target_map.keys()
                )
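
If you want to see how a confusion matrix is counted, here is a minimal NumPy sketch with made-up class labels. The function name is hypothetical and this is not the scikit-learn implementation. Note this sketch puts the actual classes on the rows and the predicted classes on the columns, which is the transpose of the layout printed by the custom function above.

```python
import numpy as np

def confusion_counts(actual, predicted, n_classes):
    """Count how often each (actual, predicted) pair of classes occurs."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(matrix, (actual, predicted), 1)  # add 1 at each (row, column) pair
    return matrix

# Hypothetical class labels for 6 samples
example_actual = np.array([0, 0, 1, 1, 2, 2])
example_predicted = np.array([0, 0, 1, 2, 2, 2])

example_matrix = confusion_counts(example_actual, example_predicted, n_classes=3)
print(example_matrix)

# recall per class: correct predictions divided by the actual count of that class
example_recall = example_matrix.diagonal() / example_matrix.sum(axis=1)
print(example_recall)
```

The diagonal holds the correct predictions; everything off the diagonal is a misclassification.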

---

### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Prediction

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Let's take a sample from the test set and use it as if it were live data. We will consider 1 sample

index = 1
live_data = X_test[index-1:index,]
live_data

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> We use `.predict()` and pass the data. Note the result is not a direct 0, 1 or 2, but instead a probabilistic result for each class.

prediction_proba = model.predict(live_data)
prediction_proba

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> So we take the index from the highest probability, which is the "winner" or predicted class

prediction_class = np.argmax(prediction_proba, axis=1) 
prediction_class
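
The predicted class is still a number. To report the species name, you can invert the mapping we used to encode the target. The sketch below is self-contained, so it redefines `target_map` with the same content as earlier and uses a made-up probability array; `inverse_target_map` and the `example_` variables are names introduced here for illustration.

```python
import numpy as np

# same mapping as defined earlier in this notebook
target_map = {'Adelie': 0, 'Chinstrap': 1, 'Gentoo': 2}

# invert it, so that each number points back to its species name
inverse_target_map = {code: name for name, code in target_map.items()}

# hypothetical probabilities for 1 sample, standing in for prediction_proba
example_proba = np.array([[0.05, 0.15, 0.80]])
example_class = np.argmax(example_proba, axis=1)

example_species = [inverse_target_map[int(code)] for code in example_class]
print(example_species)
```

In the notebook, you would pass `prediction_class` instead of `example_class`.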

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Let's plot the probabilistic result, so you can check the predictions in a more visual fashion
* Read the code comments
* In the end, you use `prediction_proba` to get the associated probability for each class. Then you plot it in a bar plot using Plotly

# create a dataframe that shows the probability per class,
# using the probabilities from prediction_proba
prob_per_class= pd.DataFrame(data=prediction_proba[0],
                             columns=['Probability']
                             )

# we round the values to 3 decimal points, for better visualization
prob_per_class = prob_per_class.round(3)

# we add a column to prob_per_class that shows the meaning of each class
# in this case, the species names that are the keys of the target_map dict
prob_per_class['Results'] = list(target_map.keys())

prob_per_class

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> We will use a bar plot, where the x-axis shows the Result and the y-axis the associated probability for a given Result
* I encourage you to go to the first cell of the Prediction section and change the index variable to take a different sample. Then run all the cells from there down to the plot in the cell below
* You may change the index to another positive integer, up to the number of rows in the test set

import plotly.express as px
fig = px.bar(
        prob_per_class,
        x = 'Results',
        y = 'Probability',
        range_y=[0,1],
        width=400, height=400,template='seaborn')
fig.show()

---

