<img src="graphics/header.png" width="75%"/>

# **Deep Learning**
---

<center><img src="graphics/artificial_neural_network.png" width = "30%"/></center>

In this module, we will introduce the principles and fundamental mechanics of **deep learning**, the subfield of AI built on artificial neural networks (ANNs) that is driving the current generative AI boom.


After this module, learners will be able to:
1. Describe the biological inspiration for artificial neural networks (ANNs).
2. Explain at a basic level how the Perceptron model (one of the simplest ANNs) processes input data.
3. Identify the important role of activation functions in ANNs.
4. Use the Keras API to create a simple neural network.

### **🚀 Let's get started.**

# Biological Neurons
The architecture of artificial neural networks was originally inspired by the structural anatomy of a biological neuron.  In a living brain, billions of neurons are connected together, forming a dense and complex network.

<figure><img src="graphics/nn.png" />
    <figcaption style="text-align:center;font-weight:bold;">
        (a) Biological neurons and (b) their artificial counterparts.
    </figcaption>
</figure>

### **Biological neurons...**
* Use electrical and chemical signals to pass information between different regions of the brain.
* Receive information through their dendrites and cell body.
* Transmit electrical signals down their axons, which trigger the release of neurotransmitters at their axon terminals.

In ANNs, these processes are roughly analogous to the flow of data and error signals through the network when data samples are passed to its input layer.

One key difference: the electrical signals (action potentials) sent down biological axons do **not** vary in magnitude; a neuron either fires at full strength or does not fire at all.

# Perceptron (Artificial Neuron)

A perceptron is a type of artificial neuron, or computational unit, used in artificial neural networks. A perceptron (also known as a *node* in deep learning) contains two functions: a net input function that sums its weighted inputs (plus a bias term) and an **activation** function.

### **Perceptrons...**
* Are one of the simplest ANNs.
* Were inspired by biological neurons.
* Receive information through the lines (weights) displayed in the figure below.
* Apply an activation function to their received signal and output the result.

The numerical signals that are output from ANNs **do** vary in magnitude!

<center><img src="graphics/perceptron.png" alt='perceptron'/></center>
<center><a href="https://wiki.pathmind.com/neural-network">Image source.</a></center>
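The two functions described above can be sketched in a few lines of plain Python. This is a minimal, illustrative perceptron that assumes a classic step activation (output 1 when the net input is at least 0, otherwise 0). The names `step` and `perceptron`, and the hand-picked weights and bias, are hypothetical choices for this example, picked so the neuron behaves like a logical AND gate.

```python
def step(z):
    """Step activation: fire (1) if the net input is non-negative, otherwise 0."""
    return 1 if z >= 0 else 0


def perceptron(inputs, weights, bias):
    """Net input function (weighted sum plus bias) followed by the activation."""
    net_input = sum(x * w for x, w in zip(inputs, weights)) + bias
    return step(net_input)


# Hand-picked (not learned) parameters that make the perceptron act like logical AND
weights = [1.0, 1.0]
bias = -1.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(x, weights, bias))
```

Only the input `(1, 1)` produces a non-negative net input (1 + 1 - 1.5 = 0.5), so it is the only one that outputs 1. In a trained network, the weights and bias are learned from data rather than chosen by hand.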

# Activation Functions
The choice of activation function is important because it can have a major impact on how well your model trains. In addition, the nonlinear transformations applied by activation functions allow networks to model a wide variety of nonlinear problems.

### **Activation functions...**
* Are simple functions that transform input values.
* Are used in ANNs to map neuron inputs to neuron outputs (also fittingly referred to as neuron *activations*).

<center><img width="500" src="graphics/activation_functions.webp"/></center>
<center><a href="https://machine-learning.paperspace.com/wiki/activation-function">Image source.</a></center>
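To make the idea concrete without giving away the exercises below, here is a sketch of two other common activation functions, sigmoid and tanh, using Python's standard `math` module. The function name `sigmoid` is our own; `math.tanh` is the standard library's hyperbolic tangent.

```python
import math


def sigmoid(x):
    """Logistic sigmoid: maps any real number into the range (0, 1)."""
    # Two algebraically equivalent branches avoid overflow in math.exp for large |x|
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


for x in [-5.0, -1.0, 0.0, 1.0, 5.0]:
    # tanh maps inputs into the range (-1, 1) and is centered on zero
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.4f}  tanh={math.tanh(x):+.4f}")
```

Both functions are nonlinear and flatten out (saturate) for large positive or negative inputs. They are also closely related: tanh(x) = 2 * sigmoid(2x) - 1.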

> **✏️ Exercise:** Given a single neuron with input **x**, write a simple **linear activation** function using two lines of code.

```python
# Write code to replicate a simple linear activation function (technically requires just two lines of code).
```


> **✏️ Exercise:** Given a single neuron with input **x**, write a simple **ReLU activation** function using two lines of code. **Hint:** we can use the built-in Python function `max`. When you pass two numbers `a` and `b` into the max function as `max(a,b)`, the function will return the larger value. *Examples:* <code>max(-2, 3) = 3</code> or <code>max(1.5, 1.1) = 1.5</code>

```python
# Write your code for a simple ReLU activation function in this code block.
```


> **🤔 Question:** Is the ReLU a **linear** or **nonlinear** activation function?

```python
# Explain why the ReLU activation function is either linear or nonlinear.
```


# Artificial Neural Networks (ANNs)
An artificial neural network (ANN) is composed of artificial neurons connected together in layers.

ANNs are...
* A class of machine learning models that can be used to model a wide range of problems.
* Composed of a collection of interconnected nodes that accept an input and produce an output.
* A fundamental component of deep learning.

<center><img src="graphics/neural_network_im.jpg" width="55%" alt="Neural Network" style="background:white;"/></center>

<center><a href="https://www.analyticsvidhya.com/blog/2016/08/evolution-core-concepts-deep-learning-neural-networks">Image source</a></center>
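Organizing neurons into layers means a whole layer can be computed at once as a matrix multiplication followed by an activation. The NumPy sketch below passes a small batch through one hidden layer and one output neuron. The sizes (4 inputs, 5 hidden neurons), the random weights, and the use of tanh in the hidden layer are arbitrary assumptions for illustration, not values from our model.

```python
import numpy as np

rng = np.random.default_rng(0)

# A batch of 3 samples, each with 4 input features
X = rng.normal(size=(3, 4))

# Hidden layer: 4 inputs -> 5 neurons (one weight per connection, one bias per neuron)
W1 = rng.normal(size=(4, 5))
b1 = np.zeros(5)

# Output layer: 5 hidden activations -> 1 neuron
W2 = rng.normal(size=(5, 1))
b2 = np.zeros(1)

# Each layer: weighted sum of the previous layer's outputs, then an activation
hidden = np.tanh(X @ W1 + b1)                        # shape (3, 5)
output = 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))    # sigmoid, shape (3, 1)

print(hidden.shape, output.shape)
```

Every row of `output` is one sample's prediction; stacking more `W`/`b` pairs in the same way is what makes a network "deeper".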

# Training a Neural Network

Training a neural network is the process of teaching the model to perform a specific task or learn a particular pattern from a given set of training data. It involves adjusting the model's internal parameters or weights to minimize the difference between its predictions and the desired outputs.

Assuming that training data has been prepared, the training process includes the following steps:

1. **Forward Propagation:** The training data is fed into the model, and its input features are multiplied by the weights and passed through activation functions in each layer. 
2. **Loss (Error) Calculation:** The loss function calculates the network's total error, the difference between the model's predictions and the actual target values.
3. **Backpropagation:** The gradients of the loss function with respect to the model's parameters are computed using the chain rule of calculus. 
4. **Parameter / Weight Update:**  The model's parameters are updated using optimization algorithms like stochastic gradient descent (SGD) or its variants.
5. **Iterate:** Repeat steps 1 to 4, typically on small batches of training data, until the model has seen the entire training dataset. Each full pass through the training dataset is called an *epoch*, and training usually runs for several epochs.
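The five steps can be traced end to end in a minimal NumPy sketch that trains a single sigmoid neuron (equivalent to logistic regression) on a small synthetic dataset. To keep it short, this sketch uses full-batch gradient descent (one update per epoch) rather than the mini-batch SGD variants Keras uses; the toy data, learning rate, and function names are hypothetical choices for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_single_neuron(X, y, lr=0.5, epochs=100):
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=X.shape[1])  # small random initial weights
    b = 0.0
    losses = []
    for epoch in range(epochs):
        # 1. Forward propagation: weighted sum, then sigmoid activation
        p = sigmoid(X @ w + b)
        # 2. Loss calculation: mean binary cross-entropy (clipped to avoid log(0))
        p_safe = np.clip(p, 1e-7, 1 - 1e-7)
        loss = -np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe))
        losses.append(loss)
        # 3. Backpropagation: for sigmoid + mean BCE, the chain rule gives dLoss/dz = (p - y) / n
        dz = (p - y) / len(y)
        grad_w = X.T @ dz
        grad_b = dz.sum()
        # 4. Parameter update: step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
        # 5. Iterate: the loop repeats steps 1-4 for the next epoch
    return w, b, losses


# Toy data: the label is 1 when the two features sum to a positive number
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w, b, losses = train_single_neuron(X, y)
print(f"loss before training: {losses[0]:.3f}, loss at final epoch: {losses[-1]:.3f}")
```

A deep learning library automates exactly these steps, computing the gradients for every layer with backpropagation instead of the hand-derived formula used here.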

# Creating an Artificial Neural Network (Multi-Layer Perceptron)
* An important part of an AI research project is evaluating different models to identify which works best for a given dataset and task.
* Let's train a deep learning model on the dataset from our machine learning notebook.
* This time, we'll be using a deep learning (DL) model that uses an artificial neural network called the multi-layer perceptron (MLP).
* We will be using the user-friendly Python library `Keras` to build our MLP model.

A full description of DL or the MLP is beyond the scope of this lesson. We refer interested learners to [Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press](https://www.deeplearningbook.org).

> **🎗️Knowledge check:**
> * Artificial neural networks were originally inspired by biological neurons, but modern ANNs only loosely resemble the human brain.
> * A DL model like the MLP is often defined in terms of **layers** (one *input layer*, one or more *hidden layers*, and one *output layer*). The more hidden layers that are added to a model, the "deeper" it gets.
> * Each hidden layer is designed to transform the data from the layer before.
> * The final layer learns to predict an outcome based on all of these transformations.

<center><img src="graphics/ann.jpg" alt="ANN" width="400" height="400"></center>

First, let's load our dataset and split it into training and testing sets. (You likely saw this in a previous module, so we'll skip the explanation.)

Just to change things up a bit, let's use **70% of the dataset to train** our model and **30% of the dataset to test** our model. (Last time we used *80%* for training and *20%* for testing.)

We will accomplish this by setting the `test_size` parameter of the `train_test_split` function to `0.3`, representing **30%**.

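```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the preprocessed dataset
df = pd.read_csv('https://www.dropbox.com/scl/fi/zzsy0dlbwn8vqk9vaqaar/data_processed.csv?rlkey=05u63ywmemb3ubu9kday4p23g&dl=1')

# Features: every column except the first and the last; target: the last column
X = df.iloc[:, 1:-1].values
y = df.iloc[:, -1].values

# Hold out 30% of the samples for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
```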

Let's create our first MLP model with the following specification:
* An input layer with `47` units (one per input variable)
* A first hidden layer with `128` units
* A second hidden layer with `64` units
* An output layer with `1` unit (predicting the probability of our binary AKI outcome, 0 or 1)
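Each `Dense` unit has one weight per incoming connection plus one bias, so we can count this architecture's trainable parameters by hand. The helper below is a hypothetical sketch, not a Keras function; for the specification above, its total should agree with the one reported by Keras's `model.summary()` once the model is built.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases in a stack of fully connected (Dense) layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        layer_params = n_in * n_out + n_out  # weight matrix + one bias per unit
        print(f"{n_in:>4} -> {n_out:<4} {layer_params:>6} parameters")
        total += layer_params
    return total


# Input size, then the number of units in each Dense layer, from the specification above
total = dense_param_count([47, 128, 64, 1])
print("Total trainable parameters:", total)  # 6144 + 8256 + 65 = 14465
```

Most of the parameters sit in the two hidden layers, which is why widening or adding hidden layers quickly increases model size.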

```python
from tensorflow.keras import Sequential, layers

model = Sequential()
# Declare the input layer: each sample is a vector of 47 input variables
model.add(layers.Input(shape=(47,)))

# Add the first hidden layer with 128 neurons
model.add(layers.Dense(units=128, activation='relu'))

# Add the second hidden layer with 64 neurons
model.add(layers.Dense(units=64, activation='relu'))

# Add the output layer: 1 neuron with a sigmoid activation for binary classification
model.add(layers.Dense(units=1, activation='sigmoid'))

# Compile the model for a binary classification problem such as ours
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```



### 🕵️ A Deeper Dive

Each layer of our neural network will be created using the **Dense** class from Keras. For each layer, we must define the number of units (also known as neurons). There are several optional arguments we may also pass, which are listed on the [Keras documentation page](https://keras.io/api/layers/core_layers/dense/). We add layers to our deep learning model using the `.add()` method of the `Sequential` class. You can think of a `Sequential` model as an ordered stack of layers, where each layer passes its output to the next.

For the first layer of our neural network, we must tell Keras how many variables to expect in each input vector. From our previous data exploration, we know that each patient is defined by `47` different variables, so the input dimension to our network is `47`.

One reason why deep learning models are so powerful is their ability to model complex variable interactions through nonlinear activation functions. We have several choices of activation function. In our example, we will use the commonly chosen rectified linear unit (ReLU) activation.
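Why do nonlinear activations matter so much? Without them, stacking layers adds no expressive power: two linear layers (Dense layers with no activation) in a row collapse into a single linear layer. The NumPy sketch below checks this numerically with arbitrary random weights; the layer sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))  # 5 samples, 3 features

# Two layers with no activation function
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)
two_layers = (x @ W1 + b1) @ W2 + b2

# An equivalent single layer: W = W1 W2, b = b1 W2 + b2
W = W1 @ W2
b = b1 @ W2 + b2
one_layer = x @ W + b

print(np.allclose(two_layers, one_layer))  # True
```

Inserting a nonlinear activation between the two layers breaks this equivalence, which is what lets deeper networks represent more complex functions.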

Once we are satisfied with the hidden layers of our model, we need to add an output layer for generating class predictions. Our output layer will also be a Dense layer, but it will have only a single unit. Instead of ReLU, we will use a sigmoid activation function, which is typically chosen for binary classification problems such as ours. The sigmoid squashes the output into the range 0 to 1, so we can interpret it as a prediction probability: the probability that a given input vector belongs to class 1.

Now that we have defined the architecture of our neural network, we will use the `.compile()` method to configure it for training. In our example, we are defining a few arguments associated with training our model:
* We are using a binary cross-entropy loss. This is an appropriate choice for binary classification.
* We will be using the Adam optimizer, which is a popular variant of stochastic gradient descent (SGD).
* For this example, we are interested in our model's prediction accuracy, so we'll tell Keras to track the "accuracy" metric.
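For intuition about the loss we chose, here is a NumPy sketch of binary cross-entropy averaged over a batch. It is our own illustrative implementation, not Keras's internal code; the clipping constant `eps` is an assumption to avoid `log(0)`, and the labels and probabilities are made up.

```python
import numpy as np


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1 - y)*log(1 - p)] over all samples."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # keep probabilities away from exactly 0 and 1
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))


y_true = np.array([1, 0, 1, 0])
good_preds = np.array([0.9, 0.1, 0.8, 0.2])  # confident and correct
bad_preds = np.array([0.1, 0.9, 0.2, 0.8])   # confident and wrong

print(f"loss for good predictions: {binary_crossentropy(y_true, good_preds):.4f}")
print(f"loss for bad predictions:  {binary_crossentropy(y_true, bad_preds):.4f}")
```

Confident, correct predictions give a small loss, while confident, wrong ones are penalized heavily; the optimizer's job is to adjust the weights to push this number down.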

# Training our neural network

Now it's time to train our prediction model! We will train (or "fit") the model using the training set we split off above from the dataset developed in a previous module.
* We will use the `.fit()` method to train our entire deep learning model in one line of code.
* We will specify some additional parameters to be used during the training process:
    * We will tell Keras to train the model for `10` epochs.
    * We will use a batch size of `64` samples. During each epoch, the model processes `64` samples at a time and updates its weights after each batch.
    * We will hold out `30%` of the training dataset as our **validation set** (different from the test set) for computing metrics while training. Keras takes these samples from the *end* of the training arrays, before shuffling; because `train_test_split` already shuffled our data, this holdout is effectively random.
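The epoch and batch-size settings determine how many weight updates happen during training. The arithmetic below is a sketch that assumes a hypothetical 700 samples left for fitting after the validation holdout (substitute your own count); it also assumes, as Keras does by default, that a final, smaller batch is processed when the samples do not divide evenly.

```python
import math


def batches_per_epoch(n_samples, batch_size):
    """Number of batches (and weight updates) in one pass over the data."""
    return math.ceil(n_samples / batch_size)


n_fit = 700        # hypothetical number of samples used for fitting
batch_size = 64
epochs = 10

steps = batches_per_epoch(n_fit, batch_size)
last_batch = n_fit - batch_size * (steps - 1)
print(f"{steps} batches per epoch (last batch has {last_batch} samples)")
print(f"{steps * epochs} weight updates over {epochs} epochs")
```

Smaller batches mean more (but noisier) updates per epoch; larger batches mean fewer, smoother updates.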

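```python
# Train for 10 epochs with batches of 64, holding out 30% of the training data for validation
history = model.fit(X_train, y_train, epochs=10, batch_size=64, validation_split=0.3)
```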


### **Done!🎉**
Before we celebrate too much, let's check the performance of our trained model on the test set that we already set aside.

**The model has never seen this particular data**, so our trained model's performance on the test set can provide us with an idea of how well the model might perform in the future (i.e., ***generalizability to unseen data***, one of the fundamental goals of machine learning).

We will use our model's `.evaluate()` method to compute the loss, as well as any metrics that we defined when compiling our model.

Since we told Keras to use `accuracy` when we compiled the model, we will also see the model's accuracy on the test data.

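```python
# Returns [test loss, test accuracy] because we compiled with metrics=['accuracy']
scores = model.evaluate(X_test, y_test)
print(f"Test loss: {scores[0]:.4f}, test accuracy: {scores[1]:.4f}")
```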
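For a sigmoid output, accuracy comes from turning each predicted probability into a class label with a 0.5 threshold and counting matches with the true labels. The sketch below reproduces that calculation in NumPy with made-up probabilities. With the real model, `model.predict(X_test)` would return the probabilities as an `(n, 1)` array, which you would flatten (for example with `.ravel()`) before comparing it to `y_test`.

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0])
y_prob = np.array([0.92, 0.30, 0.45, 0.61, 0.05])  # made-up predicted probabilities

# Probabilities above 0.5 become class 1, the rest become class 0
y_pred = (y_prob > 0.5).astype(int)

# Fraction of predictions that match the true labels
accuracy = np.mean(y_pred == y_true)
print("Predicted labels:", y_pred)
print("Accuracy:", accuracy)
```

Here four of the five predictions match, so the accuracy is 0.8. The third sample (true label 1, probability 0.45) falls just below the threshold and is counted as an error.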



> **✏️ Exercise:** Develop an MLP with `3` hidden layers (instead of 2), with the following number of neurons in each hidden layer:
> * Hidden layer 1: `512` neurons
> * Hidden layer 2: `256` neurons
> * Hidden layer 3: `128` neurons

> Create, compile, train, and evaluate this new MLP model.

```python
# Code it! (Feel free to reuse most of the code from before.)
```


> **✏️ Exercise:** Create an MLP with `8` hidden layers (instead of 3), with **any number** of neurons in each hidden layer (get creative!). Code, compile, train, and evaluate this new MLP model.

```python
# Code your own personal multilayer perceptron (MLP) in this code block.
```


### **🏆 Bonus: TensorFlow Neural Network Playground**
<center><img src="graphics/tensorflow_logo.png" /></center>

Interactively design, visualize, and analyze a custom deep learning network inside your own browser with the [TensorFlow Neural Network Playground](https://playground.tensorflow.org/).

---
> 🔗 Portions of this module are based on concepts from the University of Florida [Practicum AI](https://practicumai.org) course *Fundamentals of Deep Learning*.
---