# Introduction to neural network classification with TensorFlow

In this notebook we're going to learn how to write neural networks for classification problems.

A classification problem is one where you try to classify something as one thing or another.

A few types of classification problems include:

* Binary classification
* Multiclass classification
* Multilabel classification
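
To make the three problem types concrete, here's a minimal sketch of what the labels might look like for each one. The arrays are made-up illustrations (assumptions for this example), not data from this notebook.

```python
import numpy as np

# Hypothetical labels for four samples (illustrative values only)

# Binary classification: each sample belongs to one of two classes (0 or 1)
binary_labels = np.array([0, 1, 1, 0])

# Multiclass classification: each sample belongs to exactly one of several classes
multiclass_labels = np.array([2, 0, 1, 2])  # three classes: 0, 1, 2

# Multilabel classification: each sample can carry several labels at once,
# so every row is a vector of independent 0/1 flags (one column per label)
multilabel_labels = np.array([[1, 0, 1],
                              [0, 0, 1],
                              [1, 1, 0],
                              [0, 0, 0]])

print(binary_labels.shape, multiclass_labels.shape, multilabel_labels.shape)
```

Notice that only the multilabel case needs a 2D label array, because a single sample can have more than one label switched on.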


# Creating data to view and fit

In [None]:
from sklearn.datasets import make_circles

# Make 1000 examples
n_samples = 1000

# Create circles
X, y = make_circles(n_samples,
                    noise=0.03,
                    random_state=42)


In [None]:
# Check out the features
X

In [None]:
# Check the labels
y[:10]

Our data is a little hard to understand right now, so let's visualize it.

In [None]:
import pandas as pd
circles = pd.DataFrame({"X0":X[:,0], "X1":X[:, 1], "label":y}) 
circles

In [None]:
circles["label"].value_counts()

In [None]:
# Visualize with a plot 
import matplotlib.pyplot as plt
plt.scatter(X[:,0], X[:, 1], c=y, cmap=plt.cm.RdYlBu)

In [None]:
# Check the input and output shapes of our features and labels
X.shape, y.shape

In [None]:
# Let's check how many samples we're working with
len(X), len(y)

In [None]:
# View the first example of features and labels
X[0], y[0]

## Steps in modeling 

The steps in modeling with TensorFlow are typically:

1. Create or import a model
2. Compile the model
3. Fit the model
4. Evaluate the model
5. Tweak the model
6. Evaluate...

In [None]:
import tensorflow as tf
# Set the random seed 
tf.random.set_seed(42)

# 1. Create the model using the sequential API
model_1 = tf.keras.Sequential([
  tf.keras.layers.Dense(1)
])

# 2. Compile the model
model_1.compile(loss=tf.keras.losses.BinaryCrossentropy(),
                optimizer=tf.keras.optimizers.SGD(),
                metrics=["accuracy"])

# 3. Fit the model (X already has shape (1000, 2), so no extra dimension is needed)
model_1.fit(X, y, epochs=5)


In [None]:
# Let's try to improve our model by training for longer
model_1.fit(X, y, epochs=200, verbose=0)
model_1.evaluate(X, y)

Since we're working on a binary classification problem and our model is getting around 50% accuracy, it's performing as if it's guessing.
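
To see why roughly 50% accuracy means "guessing", here's a small sketch that scores purely random predictions against balanced binary labels. The labels are a hypothetical stand-in for our circles data, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen arbitrarily for reproducibility

# Hypothetical balanced binary labels (500 of each class)
labels = np.array([0, 1] * 500)

# "Predictions" that ignore the features entirely
random_guesses = rng.integers(0, 2, size=labels.shape[0])

# Fraction of guesses that happen to match the labels
baseline_accuracy = (random_guesses == labels).mean()
print(f"Accuracy of random guessing: {baseline_accuracy:.2%}")
```

Any model worth keeping on a balanced two-class problem should clearly beat this baseline.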

So, let's step things up a notch and add an extra layer.

In [None]:
# Set the random seed 
tf.random.set_seed(42)

# 1. Create a model, this time with 2 layers
model_2 = tf.keras.Sequential([
  tf.keras.layers.Dense(1),
  tf.keras.layers.Dense(1)                             
])

# 2. Compile the model
model_2.compile(loss=tf.keras.losses.BinaryCrossentropy(),
                optimizer=tf.keras.optimizers.SGD(),
                metrics=["accuracy"])

# 3. Fit the model (no need to expand X, it already has a feature dimension)
model_2.fit(X, y, epochs=100, verbose=0)

In [None]:
# 4. Evaluate the model
model_2.evaluate(X, y)

# Improving our model

Let's look into our bag of tricks to see how we can improve our model.

1. Creating a model - we might want to add more layers or increase the number of hidden units within a layer.
2. Compiling a model - here we might want to choose a different optimization function, such as Adam instead of SGD.
3. Fitting a model - perhaps we might fit our model for more epochs (leave it training for longer).

In [None]:
# Set the random seed
tf.random.set_seed(42)

# 1.  Create the model (this time 3 layers)
model_3 = tf.keras.Sequential([
  tf.keras.layers.Dense(100), # add 100 dense neurons
  tf.keras.layers.Dense(10),  # add another layer with 10 neurons
  tf.keras.layers.Dense(1)   
])

# 2. Compile the model
model_3.compile(loss=tf.keras.losses.BinaryCrossentropy(),
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])

# 3. Fit the model (no need to expand X, it already has a feature dimension)
model_3.fit(X, y, epochs=100)


In [None]:
model_3.evaluate(X, y)

In [None]:
model_3.predict(X)

To visualize our model's predictions, let's create a function plot_decision_boundary(). This function will:

* Take in a trained model, features (X) and labels (y)
* Create a meshgrid of the different X values
* Make predictions across the meshgrid
* Plot the predictions as well as a line between zones (where each unique class falls)
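
The meshgrid step can be hard to picture, so here's a minimal sketch on a tiny grid. The grid size and the fake_preds rule are assumptions made up for illustration; the real function uses a 100 x 100 grid and a trained model.

```python
import numpy as np

# A tiny grid instead of the 100 x 100 grid used for plotting
xx, yy = np.meshgrid(np.linspace(0, 1, 3),   # 3 values along the first feature
                     np.linspace(0, 1, 2))   # 2 values along the second feature

# Flatten both grids and pair them up: one row per grid point, one column per feature
grid_points = np.c_[xx.ravel(), yy.ravel()]

print(xx.shape)           # (2, 3)
print(grid_points.shape)  # (6, 2)

# Stand-in for model predictions: one value per grid point (not a real model)
fake_preds = (grid_points[:, 0] > 0.5).astype(int)

# Reshape back to the grid shape so a contour plot could draw it
print(fake_preds.reshape(xx.shape))
```

The key idea is the round trip: grid to a list of points for prediction, then predictions back to the grid shape for plotting.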

In [None]:
import numpy as np

In [None]:
def plot_decision_boundary(model, X, y):
  """
  Plots the decision boundary created by a model predicting on X.
  This function has been adapted from two phenomenal resources:
   1. CS231n - https://cs231n.github.io/neural-networks-case-study/
   2. Made with ML basics - https://github.com/GokuMohandas/MadeWithML/blob/main/notebooks/08_Neural_Networks.ipynb
  """
  # Define the axis boundaries of the plot and create a meshgrid
  x_min, x_max = X[:, 0].min() - 0.1, X[:, 0].max() + 0.1
  y_min, y_max = X[:, 1].min() - 0.1, X[:, 1].max() + 0.1
  xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                       np.linspace(y_min, y_max, 100))
  
  # Create X values (we're going to predict on all of these)
  x_in = np.c_[xx.ravel(), yy.ravel()] # stack 2D arrays together: https://numpy.org/devdocs/reference/generated/numpy.c_.html
  
  # Make predictions using the trained model
  y_pred = model.predict(x_in)

  # Check for multi-class
  if model.output_shape[-1] > 1: # checks the final dimension of the model's output shape, if this is > (greater than) 1, it's multi-class 
    print("doing multiclass classification...")
    # We have to reshape our predictions to get them ready for plotting
    y_pred = np.argmax(y_pred, axis=1).reshape(xx.shape)
  else:
    print("doing binary classification...")
    y_pred = np.round(np.max(y_pred, axis=1)).reshape(xx.shape)
  
  # Plot decision boundary
  plt.contourf(xx, yy, y_pred, cmap=plt.cm.RdYlBu, alpha=0.7)
  plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.RdYlBu)
  plt.xlim(xx.min(), xx.max())
  plt.ylim(yy.min(), yy.max())


In [None]:
# Check out the predictions our model is making
plot_decision_boundary(model=model_3,
                       X=X,
                       y=y)

In [None]:
# Visualize the variables of our plot functions
x_min, x_max = X[:, 0].min() - 0.1, X[:, 0].max() + 0.1
y_min, y_max = X[:, 1].min() - 0.1, X[:, 1].max() + 0.1

In [None]:
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                       np.linspace(y_min, y_max, 100))

In [None]:
model_2.summary()

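Some of the code above may not run as written because of changes in newer TensorFlow versions (check yours with tf.__version__).

The following cells show how to build and fit the regression model in TensorFlow 2.7.0 and above, where the inputs need an explicit feature dimension.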
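
As a quick shape check, here's a minimal sketch of what expanding the input dimensions does. It uses NumPy's np.expand_dims as a stand-in for tf.expand_dims, and X_example is a made-up array rather than one from this notebook.

```python
import numpy as np

# A flat vector of 4 samples, like the regression inputs before expanding
X_example = np.arange(0, 20, 5)
print(X_example.shape)   # (4,)

# Add a trailing axis so each sample becomes a row with one feature
X_expanded = np.expand_dims(X_example, axis=-1)
print(X_expanded.shape)  # (4, 1)
```

Data that already has a feature axis, such as our circles X with shape (1000, 2), doesn't need this step.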

In [None]:
# Set random seed
tf.random.set_seed(42)
     
# 1. Create the model (this time 3 layers)
model_3 = tf.keras.Sequential([
  ## Before TensorFlow 2.7.0
  # tf.keras.layers.Dense(100), # add 100 dense neurons
     
  ## After TensorFlow 2.7.0
  tf.keras.layers.Dense(100, input_shape=(None, 1)), # <- define input_shape here
  tf.keras.layers.Dense(10), # add another layer with 10 neurons
  tf.keras.layers.Dense(1)
    ])
     
# 2. Compile the model
model_3.compile(loss=tf.keras.losses.BinaryCrossentropy(),
                optimizer=tf.keras.optimizers.Adam(), # use Adam instead of SGD
                metrics=['accuracy'])



In [None]:
# Set random seed
tf.random.set_seed(42)
     
# Create some regression data
X_regression = np.arange(0, 1000, 5)
y_regression = np.arange(100, 1100, 5)
     
# Split it into training and test sets
X_reg_train = X_regression[:150]
X_reg_test = X_regression[150:]
y_reg_train = y_regression[:150]
y_reg_test = y_regression[150:]
     
# Fit our model to the data
     
    ## Note: Before TensorFlow 2.7.0, this line would work
    # model_3.fit(X_reg_train, y_reg_train, epochs=100) # <- this will error in TensorFlow 2.7.0+
     
## After TensorFlow 2.7.0
model_3.fit(tf.expand_dims(X_reg_train, axis=-1), # <- expand input dimensions
            y_reg_train,
            epochs=100)

In [None]:
# Set random seed
tf.random.set_seed(42)

# 1. Create the model
model_4 = tf.keras.Sequential([
  tf.keras.layers.Dense(100),
  tf.keras.layers.Dense(10),
  tf.keras.layers.Dense(1)                             
])

# 2. Compile the model, this time with a regression-specific loss function
model_4.compile(loss=tf.keras.losses.mae,
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["mae"])

# 3. Fit the model
model_4.fit(tf.expand_dims(X_reg_train, axis=-1), y_reg_train, epochs=100)

In [None]:
# Make predictions with our trained model (expand the inputs the same way as for training)
y_reg_preds = model_4.predict(tf.expand_dims(X_reg_test, axis=-1))

# Plot the model's predictions against our regression data
plt.figure(figsize=(10, 7))
plt.scatter(X_reg_train, y_reg_train, c="b", label="Training data")
plt.scatter(X_reg_test, y_reg_test, c="g", label="Test data")
plt.scatter(X_reg_test, y_reg_preds, c="r", label="predictions")
plt.legend();

## The missing piece: Non-linearity

Non-linearity is one of the most important concepts in neural networks.

In [None]:
# Set random seed
tf.random.set_seed(42)

model_5 = tf.keras.Sequential([
  tf.keras.layers.Dense(1, activation=tf.keras.activations.linear)                             
])

# 2. Compile the model
model_5.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), # "lr" is deprecated in favor of "learning_rate"
                metrics=['accuracy'])

# 3. Fit the model
history = model_5.fit(X, y, epochs=100)


In [None]:
# Check out our data
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdYlBu)

In [None]:
# Check the decision boundary for our latest model
plot_decision_boundary(model=model_5,
                       X=X,
                       y=y)

Let's try building our first neural network with a non-linear activation function.


In [None]:
# Set random seed
tf.random.set_seed(42)

# 1. Build the model
model_6 = tf.keras.Sequential([
  tf.keras.layers.Dense(1, activation=tf.keras.activations.relu)                             
])

# 2. Compile the model
model_6.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), # "lr" is deprecated in favor of "learning_rate"
                metrics=["accuracy"])

# 3. Fit the model
history = model_6.fit(X, y, epochs=100)

In [None]:
# Time to replicate the multi-layer neural network from TensorFlow Playground

# Set the random seed
tf.random.set_seed(42)

# Create the model
model_7 = tf.keras.Sequential([
   tf.keras.layers.Dense(4, activation="relu"),
   tf.keras.layers.Dense(4, activation="relu"),
   tf.keras.layers.Dense(1)                         
])

# Compile the model
model_7.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), # "lr" is deprecated in favor of "learning_rate"
                metrics=["accuracy"])

# Fit the model
history = model_7.fit(X, y, epochs=100)

In [None]:
# Evaluate the model
model_7.evaluate(X,y)

In [None]:
# How do our model_7 predictions look?
plot_decision_boundary(model_7, X, y)

In [None]:
# Build model_8 with an output layer that has a sigmoid activation function
# Note: X already has shape (n_samples, 2), so there's no need to expand its dimensions (doing so causes a shape error).

# Set random seed
tf.random.set_seed(42)

# Create a model
model_8 = tf.keras.Sequential([
  tf.keras.layers.Dense(4, activation=tf.keras.activations.relu), # hidden layer 1, ReLU activation
  tf.keras.layers.Dense(4, activation=tf.keras.activations.relu), # hidden layer 2, ReLU activation
  tf.keras.layers.Dense(1, activation=tf.keras.activations.sigmoid) # output layer, sigmoid activation
])

# Compile the model
model_8.compile(loss=tf.keras.losses.binary_crossentropy,
                optimizer=tf.keras.optimizers.Adam(),
                metrics=['accuracy'])

# Fit the model
history = model_8.fit(X, y, epochs=100, verbose=1)



In [None]:
X.shape

In [None]:
model_8.summary()

What is wrong with our model? Why is it returning a TypeError saying 'NoneType' object is not callable? Are we really evaluating our model correctly? What data did the model learn on, and what data did it predict on?


**Note:** The combination of **linear (straight lines) and non-linear (non-straight lines) functions** is one of the key fundamentals of neural networks.



In [None]:
# Set random seed
tf.random.set_seed(42) # For reproducibility

# 1. Create the model (this time 3 layers)
model_3 = tf.keras.Sequential([
  tf.keras.layers.Dense(100), # add 100 dense neurons
  tf.keras.layers.Dense(10), # add another layer with 10 neurons
  tf.keras.layers.Dense(1)
])

# 2. Compile the model
model_3.compile(loss=tf.keras.losses.BinaryCrossentropy(),
                optimizer=tf.keras.optimizers.Adam(), # use Adam instead of SGD
                metrics=['accuracy'])

## Create data
# Set random seed
tf.random.set_seed(42)

# Create some regression data
X_regression = np.arange(0, 1000, 5)
y_regression = np.arange(100, 1100, 5)

# Split it into training and test sets
X_reg_train = X_regression[:150]
X_reg_test = X_regression[150:]
y_reg_train = y_regression[:150]
y_reg_test = y_regression[150:]

# Fit our model to the data
# -> Note: TensorFlow 2.7.0+ needs the input dimensions expanded, as done here <-
model_3.fit(tf.expand_dims(X_reg_train, axis=-1), y_reg_train, epochs=100)

https://github.com/mrdbourke/tensorflow-deep-learning/discussions/278

The combination of linear and non-linear functions is one of the key fundamentals of neural networks.

In [None]:
# Create a toy tensor (similar to the data we pass into the model)
A = tf.cast(tf.range(-10, 10), tf.float32)
A

In [None]:
plt.plot(A)

In [None]:
# Sigmoid - https://www.tensorflow.org/api_docs/python/tf/keras/activations/sigmoid

def sigmoid(x):
  return 1 / (1 + tf.exp(-x))

# Use the sigmoid function on our tensor
sigmoid(A)

In [None]:
# Plot sigmoid modified tensor
plt.plot(sigmoid(A))

In [None]:
# ReLU - https://www.tensorflow.org/api_docs/python/tf/keras/activations/relu

def relu(x):
  return tf.maximum(0, x)

# Pass toy tensor through ReLU function
relu(A)

In [None]:
plt.plot(relu(A))

In [None]:
# Linear - https://www.tensorflow.org/api_docs/python/tf/keras/activations/linear (returns input unmodified)
tf.keras.activations.linear(A)

In [None]:
A == tf.keras.activations.linear(A)

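The linear activation function returns its input unmodified, so a model that uses only linear activations can't learn non-linear patterns: its decision boundary stays a straight line.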
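
Here's a small NumPy sketch of why stacking layers doesn't help when every activation is linear. The weights are random and purely illustrative (not taken from any model in this notebook): two linear layers collapse into a single linear layer, while putting a ReLU between them breaks that equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed

# Two hypothetical "layers" with linear activations (random weights and biases)
W1, b1 = rng.normal(size=(2, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)

x = rng.normal(size=(5, 2))  # 5 samples, 2 features

# Passing the data through both linear layers...
two_layer_output = (x @ W1 + b1) @ W2 + b2

# ...gives the same result as one linear layer with combined weights
W_combined = W1 @ W2
b_combined = b1 @ W2 + b2
one_layer_output = x @ W_combined + b_combined

print(np.allclose(two_layer_output, one_layer_output))  # True

# A ReLU between the layers means no single linear layer matches any more
relu_output = np.maximum(0, x @ W1 + b1) @ W2 + b2
print(np.allclose(relu_output, one_layer_output))
```

This is the reason the non-linear activations (ReLU, sigmoid) are what let the later models bend their decision boundaries around the circles.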

# Evaluating and improving our classification model

We've been training and evaluating our model on the same data. Now, let's split our dataset into a training set and a test set.
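
The split in this notebook is done by simple slicing. As an alternative, here's a hedged sketch of a shuffled 80/20 split in plain NumPy. X_demo and y_demo are made-up stand-ins for X and y, and all names ending in _demo are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed

# Hypothetical stand-ins for the notebook's X and y (1000 samples, 2 features)
X_demo = rng.normal(size=(1000, 2))
y_demo = rng.integers(0, 2, size=1000)

# Shuffle the indices so the split does not depend on the original ordering
shuffled_indices = rng.permutation(len(X_demo))
split_point = int(0.8 * len(X_demo))  # 80% train, 20% test

train_idx, test_idx = shuffled_indices[:split_point], shuffled_indices[split_point:]
X_train_demo, y_train_demo = X_demo[train_idx], y_demo[train_idx]
X_test_demo, y_test_demo = X_demo[test_idx], y_demo[test_idx]

print(X_train_demo.shape, X_test_demo.shape)  # (800, 2) (200, 2)
```

Shuffling matters most when the data arrives in some order (for example sorted by label); slicing is fine when the samples are already in random order.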

In [None]:
# First, let's check the total number of examples we have
len(X)

In [None]:
# Split data into train and test sets
X_train, y_train = X[:800], y[:800] # 80% of the data for the training set
X_test, y_test = X[800:], y[800:] # 20% of the data for the test set

# Check the shape of the data
X_train.shape, X_test.shape # 800 examples in the training set, 200 examples in the test set

In [None]:
# Excellent, we've split our dataset into train and test sets. Now let's see how the model performs when evaluated on the test set

In [None]:
# Create a model with 2 hidden layers of 4 neurons each and a sigmoid output layer
tf.random.set_seed(42)

# Create the model
model_9 = tf.keras.Sequential([
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(1, activation="sigmoid")
])

# Compile the model
model_9.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
                metrics=['accuracy'])

# Fit the model (keep the returned history so we can plot the training curves later)
history = model_9.fit(X_train, y_train, epochs=25)

In [None]:
# Evaluate our model on the test set
loss, accuracy = model_9.evaluate(X_test, y_test)
print(f"Model loss on the test set: {loss}")
print(f"Model accuracy on the test set: {100*accuracy:.2f}%")

In [None]:
# Plot decision boundaries for the training and test sets
plt.figure(figsize=(12,6))
plt.subplot(1, 2, 1)
plt.title("Train")
plot_decision_boundary(model_9, X=X_train, y=y_train)
plt.subplot(1, 2, 2)
plt.title("Test")
plot_decision_boundary(model_9, X=X_test, y=y_test)
plt.show()

The history variable returned by model.fit() contains information on how our model learned.

In [None]:
# You can access the information in the history variable using the .history attribute
pd.DataFrame(history.history)

In [None]:
# Plot the loss curves
pd.DataFrame(history.history).plot()
plt.title("Model_9 training curve")

The ideal plot we're looking for when dealing with a classification problem is:
* Loss going down
* Accuracy going up

When the loss decreases, it means the model is improving (the predictions it's making are getting closer to the ground truth labels).
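
To make "loss going down means predictions getting closer to the labels" concrete, here's a minimal sketch that computes binary cross-entropy by hand for two made-up sets of predictions. The binary_crossentropy helper and the probability values are assumptions for illustration, not the Keras implementation.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; clipping by eps avoids log(0)."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

y_true = np.array([1, 0, 1, 0])

good_probs = np.array([0.9, 0.1, 0.8, 0.2])  # close to the true labels
poor_probs = np.array([0.6, 0.5, 0.4, 0.5])  # hesitant or wrong

good_loss = binary_crossentropy(y_true, good_probs)
poor_loss = binary_crossentropy(y_true, poor_probs)

print(f"Loss for confident, correct predictions: {good_loss:.4f}")
print(f"Loss for hesitant or wrong predictions:  {poor_loss:.4f}")
```

The closer the predicted probabilities sit to the true labels, the smaller the loss, which is exactly the trend we hope to see across epochs.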

# Finding the best learning rate

Aside from the architecture (the layers, number of neurons, activations, etc.), the most important hyperparameter you can tune for your neural network models is the **learning rate**.

**Learning rate callback**
Think of a callback as an extra piece of functionality you can add to your model while it's training.

It's good practice to try the default learning rate first before tweaking it.
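
The learning rate scheduler used in this section applies the formula 1e-4 * 10**(epoch/20). Here's a small sketch of what that schedule produces at a few epochs. The lr_schedule function name is just a label for this example.

```python
def lr_schedule(epoch):
    """Start at 1e-4 and multiply the learning rate by 10 every 20 epochs."""
    return 1e-4 * 10 ** (epoch / 20)

# Print the learning rate at a handful of epochs
for epoch in [0, 20, 40, 60, 80]:
    print(f"epoch {epoch:>2}: learning rate = {lr_schedule(epoch):.4f}")
```

So over 100 epochs the learning rate sweeps from very small to very large, which lets us see which range makes the loss drop fastest.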

In [None]:
# Set random seed
tf.random.set_seed(42)

# Create the model 
model_10 = tf.keras.Sequential([
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(1, activation="sigmoid")
])

# Compile the model 
model_10.compile(loss="binary_crossentropy",
                 optimizer="Adam",
                 metrics=['accuracy'])

# Create a learning rate scheduler callback
lr_scheduler = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-4 * 10**(epoch/20))

# Fit the model
history = model_10.fit(X_train,
                       y_train,
                       epochs=100,
                       callbacks=[lr_scheduler])

In [None]:
pd.DataFrame(history.history).plot(figsize=(10,7), xlabel="epochs");

In [None]:
# Plot the learning rate versus the loss
lrs = 1e-4 * (10 ** (np.arange(100)/20))
plt.figure(figsize=(10, 7))
plt.semilogx(lrs, history.history["loss"])
plt.xlabel("Learning Rate")
plt.ylabel("Loss")
plt.title("Learning rate vs loss");

To figure out the ideal value of the learning rate (at least the ideal value to begin training our model), the rule of thumb is to take the learning rate value where the loss is still decreasing but has not quite flattened out (usually about 10x smaller than the bottom of the curve).

In other words, the ideal learning rate to start model training is somewhere just before the loss curve bottoms out (a value where the loss is still decreasing).
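
Here's a hedged sketch of applying that rule of thumb in code. The loss values are synthetic (a made-up bowl shape, not results from any model in this notebook), and the names lrs_demo and loss_demo are hypothetical.

```python
import numpy as np

# Learning rates on the same exponential schedule: 1e-4 * 10**(epoch/20)
lrs_demo = 1e-4 * 10 ** (np.arange(100) / 20)

# Synthetic loss curve (made up for illustration): a bowl with its bottom at lr = 0.1
loss_demo = 0.2 + 0.1 * (np.log10(lrs_demo) + 1.0) ** 2

# Learning rate at the bottom of the curve
lr_at_bottom = lrs_demo[np.argmin(loss_demo)]

# Rule of thumb: start roughly 10x below the bottom of the curve
suggested_lr = lr_at_bottom / 10

print(f"Bottom of the curve: {lr_at_bottom:.3f}, suggested starting point: {suggested_lr:.3f}")
```

Real loss curves are noisier than this, so treat the result as a starting point to try rather than an exact answer.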

In [None]:
# Examples of other typical learning rate values
10**0, 10**-1, 10**-2, 10**-3, 1e-4

In [None]:
# Let's create a new model with a learning rate of 0.02

# Set random seed 
tf.random.set_seed(42)

# Create the model 
model_11 = tf.keras.Sequential([
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(1, activation="sigmoid")                                       
])

# Compile the model
model_11.compile(loss="binary_crossentropy",
                 optimizer=tf.keras.optimizers.Adam(learning_rate=0.02),
                 metrics=["accuracy"])

# Fit the model
history = model_11.fit(X_train, 
             y_train,
             epochs=20)

In [None]:
# Plot the decision boundary for the training and test set 
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title("Train")
plot_decision_boundary(model_11, X=X_train, y=y_train)
plt.subplot(1, 2, 2)
plt.title("Test")
plot_decision_boundary(model_11, X=X_test, y=y_test)
plt.show()

In [None]:
# Let's check the accuracy of our model
loss, accuracy = model_11.evaluate(X_test, y_test)
print(f"model loss test set: {loss}")
print(f"model accuracy on test set: {(accuracy*100):.2f}%")

In [None]:
# We can make a confusion matrix using Scikit-Learn's confusion_matrix function
from sklearn.metrics import confusion_matrix

# Make a prediction
y_preds = model_11.predict(X_test)

# Create a confusion matrix (this raises a ValueError because y_preds holds probabilities, not labels)
confusion_matrix(y_test, y_preds)

Oops, our predictions are not in the format they need to be.


In [None]:
# Let's view the first 10 predictions
y_preds[:10]

Our y_preds are in prediction probability format. One of the steps you'll often see after making predictions with a neural network is converting the prediction probabilities into labels.
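
As a minimal sketch of that conversion, here are two equivalent ways to turn sigmoid outputs into 0/1 labels with NumPy. The pred_probs values are made up for illustration.

```python
import numpy as np

# Hypothetical sigmoid outputs with shape (n_samples, 1), like a binary model's predictions
pred_probs = np.array([[0.98], [0.51], [0.49], [0.03]])

# Option 1: round to the nearest integer (an implicit threshold of 0.5)
labels_rounded = np.round(pred_probs).astype(int).squeeze()

# Option 2: an explicit threshold, which is easier to change later
threshold = 0.5
labels_thresholded = (pred_probs > threshold).astype(int).squeeze()

print(labels_rounded)      # [1 1 0 0]
print(labels_thresholded)  # [1 1 0 0]
```

An explicit threshold is handy when false positives and false negatives have different costs and 0.5 isn't the best cut-off.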



In [None]:
# Let's view the first 10 labels
y_test[:10]

In [None]:
# Convert the prediction probabilities into labels using tf.round() and view the first 10 rows
tf.round(y_preds)[:10]

In [None]:
# Now let's re-create our confusion matrix
confusion_matrix(y_test, tf.round(y_preds))

In [None]:
import itertools

figsize = (10, 10)

# Create the confusion matrix
cm = confusion_matrix(y_test, tf.round(y_preds))
cm_norm = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
n_classes = cm.shape[0]

# Let's prettify it
fig, ax = plt.subplots(figsize=figsize)
# Create a matrix plot 
cax = ax.matshow(cm, cmap=plt.cm.Blues)
fig.colorbar(cax)

# Create classes
classes = False

if classes:
  labels = classes
else:
  labels = np.arange(cm.shape[0])

# Label the axes
ax.set(title="Confusion Matrix",
       xlabel="Predicted label",
       ylabel="True label",
       xticks=np.arange(n_classes),
       yticks=np.arange(n_classes),
       xticklabels=labels,
       yticklabels=labels)

# Set x-axis label to bottom
ax.xaxis.set_label_position("bottom")
ax.xaxis.tick_bottom()

# Adjust label size 
ax.xaxis.label.set_size(20)
ax.yaxis.label.set_size(20)
ax.title.set_size(20)

# Set threshold for different colors
threshold = (cm.max() + cm.min()) / 2

# Plot the text on each cell
for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
  plt.text(j, i, f"{cm[i, j]} ({cm_norm[i, j]*100:.1f}%)",
           horizontalalignment="center",
           color="white" if cm[i, j] > threshold else "black",
           size=15)

In [None]:
# What does itertools.product do? It yields every combination of items from the two ranges
import itertools
for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
  print(i, j)

# Multiclass Classification
Multiclass classification means predicting one class out of more than two possible classes for each example.

Everything we've learnt so far is applicable to multiclass classification.
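
One thing that does change is how predictions are read: a multiclass model typically outputs one probability per class (via softmax), and the predicted class is the index of the largest one. Here's a minimal NumPy sketch of that idea. The softmax helper and the logits values are assumptions for illustration.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)

# Hypothetical raw scores for 2 samples and 3 classes
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])

probs = softmax(logits)

# The predicted class is the index of the highest probability in each row
predicted_classes = probs.argmax(axis=1)

print(probs.sum(axis=1))   # each row sums to 1
print(predicted_classes)   # [0 1]
```

This is the multiclass counterpart of rounding a sigmoid output in the binary case.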

Let's import some data from the TensorFlow datasets module (tf.keras.datasets).


In [None]:
import tensorflow as tf 
from tensorflow.keras.datasets import fashion_mnist

# The data has already been sorted into training and test set for us 
(train_data, train_labels), (test_data, test_labels) = fashion_mnist.load_data()

In [None]:
# Show the first training example
print(f"Training sample:\n{train_data[0]}\n")
print(f"Training label: {train_labels[0]}")

In [None]:
# Check the shape of our data
train_data.shape, train_labels.shape, test_data.shape, test_labels.shape

In [None]:
# Plot a single example
import matplotlib.pyplot as plt
plt.imshow(train_data[7])

In [None]:
# Check our sample label
train_labels[7]

In [None]:
# Let's create a small list of class names
class_name = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
              'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# How many classes are there (this'll be our output shape)?
len(class_name)

In [None]:
# Plot an example image and its label
plt.imshow(train_data[17], cmap=plt.cm.binary) # change the color to black and white
plt.title(class_name[train_labels[17]]) 

In [None]:
# Plot multiple random images of fashion MNIST
import random
plt.figure(figsize=(7, 7))
for i in range(4):
  ax = plt.subplot(2, 2, i + 1)
  rand_index = random.choice(range(len(train_data)))
  plt.imshow(train_data[rand_index], cmap=plt.cm.binary)
  plt.title(class_name[train_labels[rand_index]])
  plt.axis(False)

Let's build a model to figure out the relationship between the pixel values and their labels.

In [None]:
# Set random seed 
tf.random.set_seed(42)

# Create the model
model_14 = tf.keras.Sequential([
  tf.keras.layers.Flatten(input_shape=(28,28)),
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(10, activation="softmax")
])

# Compile the model
model_14.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                 optimizer=tf.keras.optimizers.Adam(),
                 metrics=['accuracy'])

# Fit the model
non_norm_history = model_14.fit(train_data,
                                train_labels,
                                epochs=10,
                                validation_data=(test_data, test_labels))

In [None]:
model_14.summary()

### Our model did not perform well probably because we did not normalize our dataset 

In [None]:
# Let's check the min and max value of our dataset
train_data.min(), train_data.max()

### Scaling/normalizing our data

We can scale (normalize) our data by simply dividing the entire array by its maximum value, which brings every pixel value into the range 0 to 1.
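
Here's a tiny sketch of that scaling on a made-up 2 x 2 "image". The pixels array is hypothetical; the general min-max formula is included for cases where the minimum isn't zero.

```python
import numpy as np

# Hypothetical 2 x 2 "image" with pixel values in the 0-255 range
pixels = np.array([[0, 64],
                   [128, 255]], dtype=np.uint8)

# Dividing by the maximum works because the minimum is already 0
scaled = pixels / pixels.max()

# General min-max scaling, for data whose minimum is not 0
min_max_scaled = (pixels - pixels.min()) / (pixels.max() - pixels.min())

print(scaled.min(), scaled.max())  # 0.0 1.0
print(np.allclose(scaled, min_max_scaled))  # True
```

Neural networks tend to train more smoothly when inputs are on a small, consistent scale like this.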

In [None]:
# Divide train and test images by maximum value (normalize it)
train_data = train_data / 255.0
test_data = test_data / 255.0

# Check the min and max values of the training data
train_data.min(), train_data.max()

In [None]:
# Now, let's retrain our model with the normalized train and test data

# Set random seed 
tf.random.set_seed(42)

# Create the model
model_15 = tf.keras.Sequential([
  tf.keras.layers.Flatten(input_shape=(28,28)),
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(10, activation="softmax")                                
])

# Compile the model
model_15.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                 optimizer=tf.keras.optimizers.Adam(),
                 metrics=['accuracy'])

# Fit the model (to the normalized data)
norm_history = model_15.fit(train_data,
                                train_labels,
                                epochs=10,
                                validation_data=(test_data, test_labels))

When comparing models, make sure you're comparing them on the same criteria (e.g. same architecture but different data, or same data but different architecture).

In [None]:
# Let's plot each model's history (loss curves)
# non-normalized data loss curve
import pandas as pd
pd.DataFrame(non_norm_history.history).plot(title="Non-normalized Data")
# Plot normalized data loss curves
pd.DataFrame(norm_history.history).plot(title="Normalized data");

In [None]:
# Set random seed 
tf.random.set_seed(42)

# Create the model
model_16 = tf.keras.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(10, activation="softmax")                                
])

# Compile the model
model_16.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                 optimizer=tf.keras.optimizers.Adam(),
                 metrics=["accuracy"])

# Create the learning rate callback
lr_scheduler = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-3 * 10**(epoch/20))

# Fit the model
find_lr_history = model_16.fit(train_data,
                               train_labels,
                               epochs=40,
                               validation_data=(test_data, test_labels),
                               callbacks=[lr_scheduler])


In [None]:
# Plot the learning rate versus the loss
import numpy as np
import matplotlib.pyplot as plt
lrs = 1e-3 * (10**(np.arange(40)/20))
plt.semilogx(lrs, find_lr_history.history["loss"])
plt.xlabel("Learning Rate")
plt.ylabel("loss")
plt.title("Finding the ideal learning rate");

In [None]:
10**-2


In [None]:
# Let's refit our model using the ideal learning rate
tf.random.set_seed(42)

# Create the model
model_17 = tf.keras.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(10, activation="softmax")                               
])

# Compile the model
model_17.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                 optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
                 metrics=["accuracy"])

# Fit the model
history = model_17.fit(train_data,
                       train_labels,
                       epochs=20,
                       validation_data=(test_data, test_labels))

In [None]:


# Note: The following confusion matrix code is a remix of Scikit-Learn's 
# plot_confusion_matrix function - https://scikit-learn.org/stable/modules/generated/sklearn.metrics.plot_confusion_matrix.html
# and Made with ML's introductory notebook - https://github.com/GokuMohandas/MadeWithML/blob/main/notebooks/08_Neural_Networks.ipynb
import itertools
from sklearn.metrics import confusion_matrix

# Our function needs a different name to sklearn's plot_confusion_matrix
def make_confusion_matrix(y_true, y_pred, classes=None, figsize=(10, 10), text_size=15): 
  """Makes a labelled confusion matrix comparing predictions and ground truth labels.

  If classes is passed, confusion matrix will be labelled, if not, integer class values
  will be used.

  Args:
    y_true: Array of truth labels (must be same shape as y_pred).
    y_pred: Array of predicted labels (must be same shape as y_true).
    classes: Array of class labels (e.g. string form). If `None`, integer labels are used.
    figsize: Size of output figure (default=(10, 10)).
    text_size: Size of output figure text (default=15).
  
  Returns:
    A labelled confusion matrix plot comparing y_true and y_pred.

  Example usage:
    make_confusion_matrix(y_true=test_labels, # ground truth test labels
                          y_pred=y_preds, # predicted labels
                          classes=class_names, # array of class label names
                          figsize=(15, 15),
                          text_size=10)
  """  
  # Create the confusion matrix
  cm = confusion_matrix(y_true, y_pred)
  cm_norm = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis] # normalize it
  n_classes = cm.shape[0] # find the number of classes we're dealing with

  # Plot the figure and make it pretty
  fig, ax = plt.subplots(figsize=figsize)
  cax = ax.matshow(cm, cmap=plt.cm.Blues) # colors will represent how 'correct' a class is, darker == better
  fig.colorbar(cax)

  # Is there a list of classes?
  if classes:
    labels = classes
  else:
    labels = np.arange(cm.shape[0])
  
  # Label the axes
  ax.set(title="Confusion Matrix",
         xlabel="Predicted label",
         ylabel="True label",
         xticks=np.arange(n_classes), # create enough axis slots for each class
         yticks=np.arange(n_classes), 
         xticklabels=labels, # axes will be labeled with class names (if they exist) or ints
         yticklabels=labels)
  
  # Make x-axis labels appear on bottom
  ax.xaxis.set_label_position("bottom")
  ax.xaxis.tick_bottom()

  # Set the threshold for different colors
  threshold = (cm.max() + cm.min()) / 2.

  # Plot the text on each cell
  for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
    plt.text(j, i, f"{cm[i, j]} ({cm_norm[i, j]*100:.1f}%)",
             horizontalalignment="center",
             color="white" if cm[i, j] > threshold else "black",
             size=text_size)


In [None]:
# Make predictions with the most recent model
y_probs = model_17.predict(test_data)

# View the first 5 predictions
y_probs[:5]

In [None]:
# See the predicted class number and label for the first example
y_probs[0].argmax(), class_name[y_probs[0].argmax()]

In [None]:
# Convert all of the predictions from probabilities to labels
y_preds = y_probs.argmax(axis=1)

# View the first 10 prediction labels
y_preds[:10]

In [None]:
# Check out the non-prettified confusion matrix
from sklearn.metrics import confusion_matrix
confusion_matrix(y_true=test_labels,
                  y_pred=y_preds)

In [None]:


# Make a prettier confusion matrix
make_confusion_matrix(y_true=test_labels, 
                      y_pred=y_preds,
                      classes=class_name,
                      figsize=(15, 15),
                      text_size=10)


In [None]:
import random

# Create a function for plotting a random image along with its prediction
def plot_random_image(model, images, true_labels, classes):
  """Picks a random image, plots it and labels it with a predicted and truth label.

  Args:
    model: a trained model (trained on data similar to what's in images).
    images: a set of random images (in tensor form).
    true_labels: array of ground truth labels for images.
    classes: array of class names for images.
  
  Returns:
    A plot of a random image from `images` with a predicted class label from `model`
    as well as the truth class label from `true_labels`.
  """ 
  # Set up a random integer (randint includes both endpoints, so stop at len(images) - 1)
  i = random.randint(0, len(images) - 1)
  
  # Create predictions and targets
  target_image = images[i]
  pred_probs = model.predict(target_image.reshape(1, 28, 28)) # have to reshape to get into right size for model
  pred_label = classes[pred_probs.argmax()]
  true_label = classes[true_labels[i]]

  # Plot the target image
  plt.imshow(target_image, cmap=plt.cm.binary)

  # Change the color of the titles depending on if the prediction is right or wrong
  if pred_label == true_label:
    color = "green"
  else:
    color = "red"

  # Add xlabel information (prediction/true label)
  plt.xlabel("Pred: {} {:2.0f}% (True: {})".format(pred_label,
                                                   100*tf.reduce_max(pred_probs),
                                                   true_label),
             color=color) # set the color to green or red

In [None]:
# Check out a random image as well as its prediction
plot_random_image(model=model_17, 
                  images=test_data, 
                  true_labels=test_labels, 
                  classes=class_name)

In [None]:
# Find the layers of our most recent model
model_17.layers

In [None]:
# We can extract a particular layer by indexing
model_17.layers[1]

In [None]:


# Get the patterns of a layer in our network
weights, biases = model_17.layers[1].get_weights()

# Shape = (784, 4): one weight per input pixel (28 x 28 = 784) for each of the 4 neurons
weights, weights.shape


In [None]:
model_17.summary()

In [None]:
from tensorflow.keras.utils import plot_model

# See the inputs and outputs of each layer
plot_model(model_17, show_shapes=True)