
In [None]:
%matplotlib inline

# Introduction to Deep Learning with Keras and TensorFlow
**MATH498: Foundations of ML (University of Michigan)**

**Modified notebook from Daniel Moser (UT Southwestern Medical Center), updated to use Keras v2.8.0 and extended with some further exploration**

## Prerequisite Python Modules

First, some software needs to be loaded into the Python environment.

In [None]:
import numpy as np                   # advanced math library
import matplotlib.pyplot as plt      # MATLAB-like plotting routines
import random                        # for generating random numbers

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

from keras.models import Sequential  # Model type to be used
from keras.layers.core import Dense, Dropout, Activation # Types of layers to be used in our model
from keras.utils import np_utils                         # NumPy related tools

## Loading Training Data

In [None]:
# The Wisconsin Breast Cancer dataset (https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(diagnostic))
X, y = load_breast_cancer(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

In [None]:
nb_classes = 2 # number of classes (0 = malignant, 1 = benign)

Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
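
As a rough illustration of what `to_categorical` produces, the sketch below builds the same kind of one-hot matrix with plain NumPy. The helper name `one_hot` and the sample labels are made up for this example; it is not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype="float32")[labels]

sample_labels = [0, 1, 1, 0]          # hypothetical labels, not taken from the dataset
encoded = one_hot(sample_labels, 2)
print(encoded)
```

Each row has a 1 in the column of its class and 0 elsewhere, which is the target format that the categorical cross-entropy loss expects.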

In [None]:
# The Sequential model is a linear stack of layers and is very common.
model = Sequential()

## The first hidden layer

In [None]:
# The first hidden layer is a set of 64 nodes (artificial neurons).
# Each node receives every element of the input vector, multiplies each by a weight, and adds a bias.

# An "activation" is a non-linear function applied to the output of the layer.
# It checks the new value of the node, and decides whether that artificial neuron has fired.
# The Rectified Linear Unit (ReLU) sets all negative values to zero before they are passed to the next layer.
# Those nodes are then considered not to have fired.
# Positive values of a node are unchanged.

model.add(Dense(64, input_shape=(30,), activation='relu'))
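
To make the comments above concrete, here is a minimal NumPy sketch of what a dense layer with a ReLU activation computes. The names `relu` and `dense_forward`, the random weights, and the five random input rows are assumptions for illustration only; Keras initializes and stores its own weights.

```python
import numpy as np

def relu(z):
    # Negative entries become 0, positive entries pass through unchanged.
    return np.maximum(z, 0.0)

def dense_forward(x, W, b):
    # x: (batch, n_inputs), W: (n_inputs, n_units), b: (n_units,)
    return relu(x @ W + b)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 30))                 # 5 hypothetical samples with 30 features
W = rng.normal(scale=0.1, size=(30, 64))     # hypothetical weights
b = np.zeros(64)
hidden = dense_forward(x, W, b)
print(hidden.shape)
```

Every one of the 64 outputs depends on all 30 inputs, which is what "fully connected" means.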

In [None]:
# Dropout zeroes a selection of random outputs (i.e., disables their activation)
# Dropout helps protect the model from memorizing or "overfitting" the training data.
model.add(Dropout(0.2))
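
The following sketch shows one common way to implement dropout during training ("inverted" dropout): units are zeroed at random and the survivors are scaled up by `1 / (1 - rate)`. The function name `dropout_train` and the all-ones input are made up for this example.

```python
import numpy as np

def dropout_train(activations, rate, rng):
    # Keep each unit with probability (1 - rate)...
    keep_mask = rng.random(activations.shape) >= rate
    # ...and rescale the kept units so the expected sum is unchanged.
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((1000, 64))            # hypothetical layer outputs
dropped = dropout_train(activations, rate=0.2, rng=rng)
zero_fraction = np.mean(dropped == 0.0)
print(round(float(zero_fraction), 2))
```

About 20% of the entries end up as zero. At prediction time dropout is switched off and the layer passes its input through unchanged.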

## Adding the second hidden layer

In [None]:
# The second hidden layer looks identical to the first.
# However, instead of each of the 64 nodes receiving 30 inputs from the input data,
# they receive 64 inputs from the output of the first 64-node layer.
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.2))

## The Final Output Layer

In [None]:
# The final layer of 2 neurons is fully connected to the previous 64-node layer.
# The number of neurons in the final layer of an FCN should equal the number of desired classes (2 in this case).

# The "softmax" activation represents a probability distribution over K different possible outcomes.
# Its values are all non-negative and sum to 1.
model.add(Dense(nb_classes, activation='softmax'))
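
A small NumPy sketch of the softmax function described in the comments above. The `logits` values are hypothetical raw outputs, chosen only to show that each row of the result is non-negative and sums to 1.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability; the result is unchanged.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

logits = np.array([[2.0, 0.0], [-1.0, 3.0]])   # hypothetical raw outputs for 2 samples
probs = softmax(logits)
print(probs)
print(probs.sum(axis=1))
```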

In [None]:
# Summarize the built model

model.summary()
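
The parameter counts reported by `model.summary()` can be checked by hand: a dense layer has one weight per (input, unit) pair plus one bias per unit, and dropout layers add no parameters. The helper `dense_params` below is a name made up for this check.

```python
def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair plus one bias per unit.
    return n_inputs * n_units + n_units

layer_sizes = [30, 64, 64, 2]   # input features, two hidden layers, output classes
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
total_params = sum(per_layer)
print(per_layer, total_params)   # [1984, 4160, 130] 6274
```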

## Compiling the model

Keras runs on top of TensorFlow (early versions of Keras could also use Theano as a backend). These packages allow you to define a *computation graph* in Python, which then compiles and runs efficiently on the CPU or GPU without the overhead of the Python interpreter.

Our predictions are probability distributions across the two possible outcomes (e.g. "we're 80% confident this is 1, 20% sure it's a 0"), and the target is a probability distribution with 100% for the correct category, and 0 for everything else. The cross-entropy is a measure of how different your predicted distribution is from the target distribution. [More detail at Wikipedia](https://en.wikipedia.org/wiki/Cross_entropy)
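
As a worked example of the cross-entropy between a one-hot target and a predicted distribution, the sketch below compares three hypothetical predictions for a sample whose true class is 1. The function name and the clipping constant `eps` are choices made for this illustration, not the exact Keras implementation.

```python
import numpy as np

def categorical_cross_entropy(target, predicted, eps=1e-7):
    # Clip to avoid log(0); eps is an arbitrary small constant for this sketch.
    predicted = np.clip(predicted, eps, 1.0)
    return -np.sum(target * np.log(predicted), axis=-1)

target = np.array([0.0, 1.0])            # true class is 1
confident = np.array([0.2, 0.8])         # "80% confident this is 1"
unsure = np.array([0.5, 0.5])
wrong = np.array([0.9, 0.1])

losses = [categorical_cross_entropy(target, p) for p in (confident, unsure, wrong)]
print([round(float(l), 3) for l in losses])   # [0.223, 0.693, 2.303]
```

The loss grows as less probability is assigned to the correct class, so confident wrong answers are penalized the most.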

In [None]:
# Let's use the Adam optimizer for learning
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

## Train the model!
This is the fun part! 

The batch size determines how much data is used per step to compute the loss function, its gradients, and the backpropagation update. Large batch sizes allow the network to complete its training faster; however, there are other factors beyond training speed to consider.

Too large a batch size averages away the noise in the gradient, so the optimizer can settle into a local minimum as if it had found the global minimum.

Too small a batch size creates a very noisy loss function, and the optimizer may never find the global minimum.

So a good batch size may take some trial and error to find!
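
To see how the batch size translates into the number of update steps per epoch, here is a small sketch that cuts a training set into consecutive mini-batches. The helper `minibatch_slices` and the sample count of 383 are assumptions chosen for illustration.

```python
import math

def minibatch_slices(n_samples, batch_size):
    # Yield consecutive slices that together cover all samples once.
    for start in range(0, n_samples, batch_size):
        yield slice(start, min(start + batch_size, n_samples))

n_samples = 383      # hypothetical training-set size, chosen for illustration
steps_per_epoch = {}
for batch_size in (8, 32, 128):
    batches = list(minibatch_slices(n_samples, batch_size))
    steps_per_epoch[batch_size] = len(batches)
    print(batch_size, len(batches), math.ceil(n_samples / batch_size))
```

Smaller batches mean more (and noisier) updates per epoch; larger batches mean fewer, smoother updates.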

In [None]:
history = model.fit(X_train, Y_train,
          batch_size=32, epochs=20, validation_split=0.1,
          verbose=1)

In [None]:
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.show()

For each epoch, the training log printed by `fit` reports the value of the loss function and the accuracy of the network on the training data, followed by the same two quantities on the held-out validation split. But how does it do on data it did not train on?

## Evaluate Model's Accuracy on Test Data

In [None]:
score = model.evaluate(X_test, Y_test)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
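
The accuracy reported by `evaluate` can be reproduced by comparing the most probable predicted class with the true class. The sketch below uses small hypothetical arrays; with the real model, `model.predict(X_test)` would supply the predicted probabilities and `Y_test` the one-hot targets.

```python
import numpy as np

# Hypothetical softmax outputs for four samples and their one-hot targets.
predicted_probs = np.array([[0.9, 0.1], [0.3, 0.7], [0.6, 0.4], [0.2, 0.8]])
one_hot_targets = np.array([[1, 0], [0, 1], [0, 1], [0, 1]])

predicted_classes = predicted_probs.argmax(axis=1)   # most probable class per sample
true_classes = one_hot_targets.argmax(axis=1)        # position of the 1 in each target
manual_accuracy = np.mean(predicted_classes == true_classes)
print(manual_accuracy)   # 0.75
```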

## Experimenting with different settings:
- number of epochs
- different architectures (i.e. number of hidden layers, activation function, width of layers)
- different initialisation
- different optimisers
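
One simple way to organize such experiments is to enumerate the combinations of settings up front and train one model per combination. The sketch below only builds the list of configurations; the dictionary keys and candidate values are hypothetical and should be adapted to your own experiments.

```python
from itertools import product

# Hypothetical candidate values; pick ranges that suit your own experiments.
search_space = {
    "epochs": [20, 50],
    "hidden_layers": [1, 2],
    "width": [32, 64],
    "activation": ["relu", "tanh"],
    "optimizer": ["adam", "sgd"],
}

# Every combination of one value per setting.
configs = [dict(zip(search_space, values)) for values in product(*search_space.values())]
print(len(configs))   # 32
print(configs[0])
```

Each entry in `configs` could then be used to build, compile, and fit a fresh `Sequential` model, recording the validation accuracy for comparison.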

In [None]:
# The Sequential model is a linear stack of layers and is very common.
# Define the architecture

# https://keras.io/api/layers/core_layers/dense/

new_model = Sequential()
new_model.add(Dense(64, input_shape=(30,), activation='relu', kernel_initializer='glorot_uniform',
    bias_initializer='zeros')) # input layer must have the right input shape
new_model.add(Dropout(0.2))

new_model.add(Dense(64, activation='relu', kernel_initializer='glorot_uniform',
    bias_initializer='zeros'))
new_model.add(Dropout(0.2))

new_model.add(Dense(nb_classes, activation='softmax', kernel_initializer='glorot_uniform',
    bias_initializer='zeros')) # output layer must have the right output shape

# Choose how to optimize the model
new_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Stopping criteria (e.g. a keras.callbacks.EarlyStopping callback) can be added via the `callbacks` argument of fit()

# Choose how to train the model
new_model.fit(X_train, Y_train,
          batch_size=32, epochs=20, validation_split=0.1,
          verbose=1)
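
For reference, the `glorot_uniform` initializer used above draws each weight from a uniform distribution on `[-limit, limit]` with `limit = sqrt(6 / (fan_in + fan_out))`. The NumPy sketch below mimics that rule for the first layer (30 inputs, 64 units); the function name and random seed are arbitrary choices for this illustration.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    # Sample from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)).
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
W1 = glorot_uniform(30, 64, rng)          # same shape as the first layer's kernel
limit = np.sqrt(6.0 / (30 + 64))
print(W1.shape, round(float(limit), 3))
```

Trying a different initialisation amounts to swapping this sampling rule for another one, for example by passing a different `kernel_initializer` string to `Dense`.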
