# Getting started with Deep Learning

Tutors: Fabian Eitel (Fabian.Eitel@charite.de) and Talia Kimber

# 1. Aims of this session

Get a rough idea of how artificial neural networks (ANNs) work, what an implementation in Keras looks like, and how suitable ANNs are for tabular data.

# Learning goals


## Theory

* Building blocks of ANNs
* Model training

## Practical

* Learn to understand the basics using the TensorFlow Playground
* Learn to read a model definition in Python using Keras
* Run a pipeline of an ANN on the ADNI tabular data
* Investigate what filters learn at different layers

# References

* Stanford Course on Deep Learning http://cs231n.github.io/

## Theory


### Building blocks of artificial neural networks
This section lists some of the building blocks that can be used when training neural networks, along with some widely used examples.

__Layer types__
* Fully connected/linear/dense layers
* Convolutional layers
* Pooling layers and other down/upsampling layers
* Utility layers like input and output layers
* Batch normalization

__Activation types__
* Sigmoid
* Linear
* Tanh
* ReLU
* Leaky ReLU and other variants
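
As a rough illustration of the activation types listed above, the following NumPy sketch evaluates each of them on a few sample values. The function names and the leak factor `alpha=0.01` are choices made for this example and are not taken from Keras.

```python
import numpy as np


def sigmoid(z):
    """Squash any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def relu(z):
    """Keep positive values, set negative values to zero."""
    return np.maximum(0.0, z)


def leaky_relu(z, alpha=0.01):
    """Like ReLU, but negative values are scaled by alpha instead of zeroed."""
    return np.where(z > 0, z, alpha * z)


z = np.array([-2.0, 0.0, 2.0])
activations = {
    "sigmoid": sigmoid(z),
    "linear": z,  # the linear activation returns its input unchanged
    "tanh": np.tanh(z),
    "relu": relu(z),
    "leaky_relu": leaky_relu(z),
}
for name, values in activations.items():
    print(name, np.round(values, 3))
```

Comparing the printed rows shows how each function treats negative inputs differently, which is the main practical difference between them.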

__Regularizers__
* L1 regularization (used in LASSO)
* L2 regularization, which is closely related to weight decay (used in Ridge regression)
* Dropout
* Early stopping
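
To make the first two regularizers more concrete, here is a minimal sketch of the penalty terms that are added to the loss. It assumes one common convention (the rate multiplied by the sum of absolute or squared weights); the weight values and the rate are made up for illustration.

```python
import numpy as np


def l1_penalty(weights, rate):
    """L1 penalty: rate times the sum of absolute weight values."""
    return rate * np.sum(np.abs(weights))


def l2_penalty(weights, rate):
    """L2 penalty: rate times the sum of squared weight values."""
    return rate * np.sum(weights ** 2)


weights = np.array([0.5, -1.0, 2.0])  # hypothetical weights of one layer
rate = 0.1                            # hypothetical regularization rate

print("L1 penalty:", l1_penalty(weights, rate))
print("L2 penalty:", l2_penalty(weights, rate))
```

Because the L2 term squares each weight, large weights are penalized much more strongly than small ones, whereas the L1 term grows linearly with each weight.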

__Data functions__
* Normalization (e.g. using mean and standard deviation)
* Data augmentation
* Feature reduction (e.g. Principal Component Analysis [PCA])

__Cost functions__
Cost functions depend on your type of analysis, i.e. regression, binary classification, multi-class classification etc.
* Softmax loss (softmax activation followed by cross-entropy)
* Cross-entropy
* Binary cross-entropy
* Kullback-Leibler Divergence
* Smooth losses
* Mean-squared error

For more information on each topic, see the course linked in the references.
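
As a small worked example of two of the cost functions listed above, the sketch below computes binary cross-entropy and mean-squared error with NumPy. The clipping constant `eps` and the sample values are assumptions made for this illustration.

```python
import numpy as np


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Average negative log-likelihood for labels in {0, 1}."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return np.mean((y_true - y_pred) ** 2)


labels = np.array([1.0, 0.0])
unsure = np.array([0.5, 0.5])     # model that cannot decide
confident = np.array([0.9, 0.1])  # model that is confident and correct

print("BCE, unsure model:   ", binary_cross_entropy(labels, unsure))
print("BCE, confident model:", binary_cross_entropy(labels, confident))
print("MSE example:         ", mean_squared_error(np.array([1.0, 2.0]), np.array([1.0, 4.0])))
```

The confident and correct model receives a lower cross-entropy than the undecided one, which is the behaviour a classification loss should have.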

# 2. Playground exercises

__Introduction__


https://playground.tensorflow.org

TensorFlow Playground is a small neural network environment that runs in your browser. Despite its name, it is not built on the popular TensorFlow library. It helps you build some intuition for how neural networks work.

Note: The links provided here always contain the right settings for the exercise. Always use the corresponding link.

__2.1 Exercise__

Use the XOR dataset with 1 hidden layer and try out different activation functions:


https://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=xor&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=4&seed=0.82689&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false

__2.2 Exercise__

What happens if you add more features one by one?
Start with X₁². You can also try adding extra layers and neurons.

https://playground.tensorflow.org/#activation=relu&batchSize=10&dataset=xor&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=4&seed=0.82689&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=true&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false

__2.3 Exercise__

Let's try a different dataset. Investigate the effects of the learning rate on the training results:

https://playground.tensorflow.org/#activation=relu&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.001&regularizationRate=0&noise=0&networkShape=4,2&seed=0.19504&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false
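
To get a feeling for what the learning rate controls, here is a toy sketch that is independent of the Playground: plain gradient descent on the one-dimensional function f(w) = w². The function, the starting point and the number of steps are assumptions chosen for illustration only; the Playground trains a full network with mini-batches.

```python
def gradient_descent(learning_rate, steps=20, w=1.0):
    """Minimise f(w) = w**2 with plain gradient descent (the gradient is 2 * w)."""
    for _ in range(steps):
        w = w - learning_rate * 2 * w
    return w


# The minimum is at w = 0; compare how close each learning rate gets in 20 steps
for lr in (0.001, 0.03, 0.3, 1.1):
    print("learning rate %.3f -> w after 20 steps: %.6f" % (lr, gradient_descent(lr)))
```

In this toy setting a very small learning rate barely moves away from the starting point, while a learning rate that is too large overshoots the minimum on every step and moves further away from it.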

Real data is never this clean; it is usually very noisy. Now use the same model as above and add some noise to the data distribution (middle slider on the bottom left). How does it affect the data (shown on the right) and your model's performance?

__2.4 Exercise__

After you have added the noise, try out L1 and L2 regularization. What effect do they have?

__2.5 Exercise__

In this example the network fluctuates a lot for some time. What would you change to reduce that effect?

https://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=xor&regDataset=reg-plane&learningRate=0.3&regularizationRate=0&noise=25&networkShape=4,2&seed=0.84469&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false

__2.6 Exercise__

Use everything you have learned so far on the more challenging spiral data:

https://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=spiral&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=25&networkShape=4,2&seed=0.07992&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false

Here is an example:

https://playground.tensorflow.org/#activation=relu&regularization=L2&batchSize=10&dataset=spiral&regDataset=reg-plane&learningRate=0.03&regularizationRate=0.03&noise=25&networkShape=5,4,2&seed=0.16124&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=true&xSquared=true&ySquared=true&cosX=false&sinX=true&cosY=false&sinY=true&collectStats=false&problem=classification&initZero=false&hideText=false

***First break***

# 3. Practical part

### Preparation

In [None]:
!pip install --upgrade scikit-learn

In [None]:
# Import required packages
import numpy as np
import pandas as pd

from sklearn.svm import SVC, LinearSVC
from sklearn.metrics import balanced_accuracy_score

In [None]:
import sklearn
sklearn.__version__

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
# Load data table
df = pd.read_csv("data/alzheimers_disease_rand.csv")
# Print first 5 rows
df.head()

In [None]:
# Keep only the 12-month follow-up visit (VISCODE "m12") and remove all MCI subjects
df = df[df.VISCODE == "m12"]
df = df[df.DX != "MCI"]
# Remove rows whose diagnosis label is an erroneous numeric code
df = df[~df.DX.isin(["1600371", "1846260"])]

In [None]:
df["DX"].value_counts()

In [None]:
# Remove rows containing N/A
df = df.dropna(subset=["Hippocampus", "DX", "Ventricles", "WholeBrain", "MMSE", "AGE", "PTGENDER"]) 

### Data splitting

In [None]:
# Get an array with the number of samples
indices = np.arange(len(df))
print("Order before shuffling: %s" % indices[:5])

# Shuffle that array
np.random.seed(42) # fix a seed so each random event can be repeated
np.random.shuffle(indices)
print("Order after shuffling: %s"  % indices[:5])

In [None]:
# Take first 80% as a training set
len_training = int(len(indices) * 0.8) # use int() function to remove decimals
print("Number of samples for training set: %i" % len_training)

# Select the first 80% indices
train_idx = indices[0:len_training] # pick 0 to the value of len_training from the indices array

In [None]:
# Take the remaining data and split it 50/50
remaining_samples = len(indices) - len_training
len_validation = int(np.ceil(remaining_samples/2)) # round up once
len_test = int(np.floor(remaining_samples/2)) # round down once

# Select from the indices array the individual groups
validation_idx = indices[len_training:len_training+len_validation]
test_idx = indices[len_training+len_validation:len(indices)]

In [None]:
print("Number of training samples: %i" % len(train_idx))
print("Number of validation samples: %i" % len(validation_idx))
print("Number of test samples: %i" % len(test_idx))
print("Total number of samples: %i" % (len(train_idx) + len(validation_idx) + len(test_idx)))
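The manual 80/10/10 split above can also be sketched with scikit-learn's `train_test_split`. This is an alternative, not the method used in this notebook; the sample count of 100 and the `demo_` variable names are hypothetical so that the variables defined above are not overwritten.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 100  # hypothetical number of samples
demo_indices = np.arange(n_samples)

# First split off 20% of the indices, then halve that part into validation and test
demo_train, demo_rest = train_test_split(demo_indices, test_size=0.2, random_state=42)
demo_validation, demo_test = train_test_split(demo_rest, test_size=0.5, random_state=42)

print("Training samples:   %i" % len(demo_train))
print("Validation samples: %i" % len(demo_validation))
print("Test samples:       %i" % len(demo_test))
```

Both approaches shuffle the indices with a fixed seed, so the split stays the same each time the cell is run.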

### Feature selection
We have to select relevant features for our model to use. Features that are clinically relevant should improve the performance of the model. Nevertheless, some features might decrease the performance of the model.

Our pre-selected list of possible features is: ["Hippocampus", "Ventricles", "WholeBrain", "MMSE", "AGE", "PTGENDER"]
You can always come back to this section and try some different ones. Below is some code to use three of them.

In [None]:
X = df[["Hippocampus", "AGE", "Ventricles"]] # You can add or remove features here
#X.insert(column="SEX", value=(df["PTGENDER"]=="Male"), loc=2) # Categorical features like sex need to be translated into numerical values. You can uncomment this line to try it.

y = pd.get_dummies(df["DX"])["Dementia"]

X = X.reset_index(drop=True)
y = y.reset_index(drop=True)

# Normalize the data so that all features lie in the same range. We subtract the mean of each column and divide by the standard deviation.
scaler = StandardScaler()
scaler.fit(X.loc[train_idx])
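To illustrate what the fitted scaler computes, the following sketch compares `StandardScaler` with the manual formula on a tiny made-up array. The `toy_` names and values are hypothetical. Note that the mean and standard deviation come from the training rows only, just as the cell above fits the scaler on `X.loc[train_idx]`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

toy_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])  # hypothetical training rows
toy_new = np.array([[4.0, 40.0]])                              # hypothetical unseen row

# Learn the column means and standard deviations from the training rows only
toy_scaler = StandardScaler().fit(toy_train)

# The same transformation written out by hand
manual = (toy_new - toy_train.mean(axis=0)) / toy_train.std(axis=0)

print(toy_scaler.transform(toy_new))
print(manual)
```

Reusing the training statistics for the validation and test rows keeps information about those rows from leaking into the preprocessing step.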

### Baseline

In [None]:
# We will start off with a simple Support Vector Machine/Classifier (SVM/SVC)
classifier = SVC(C=1, kernel='linear')

In [None]:
classifier.fit(X=scaler.transform(X.loc[train_idx]), y=y.loc[train_idx])
#classifier.fit(X=X.loc[train_idx], y=y.loc[train_idx])

In [None]:
# Training prediction
y_pred = classifier.predict(X=scaler.transform(X.loc[train_idx]))
print(balanced_accuracy_score(y_true=y.loc[train_idx], y_pred=y_pred))

# Validation prediction
y_pred = classifier.predict(X=scaler.transform(X.loc[validation_idx]))
print(balanced_accuracy_score(y_true=y.loc[validation_idx], y_pred=y_pred))

This is the baseline performance that we will compare against. Normally you would first try to optimize the baseline as well, but that is not the goal of today's lesson.
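
The scores above use balanced accuracy rather than plain accuracy. As a brief sketch of the difference, the example below scores a classifier that always predicts the majority class on a made-up, imbalanced label set. The labels are hypothetical and not taken from the ADNI data.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, recall_score

toy_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])  # 8 controls, 2 patients (made up)
toy_pred = np.zeros(10, dtype=int)                   # always predict the majority class

plain_accuracy = np.mean(toy_true == toy_pred)
balanced = balanced_accuracy_score(y_true=toy_true, y_pred=toy_pred)
# Balanced accuracy is the average of the per-class recalls
per_class_recall_mean = recall_score(toy_true, toy_pred, average="macro")

print("Plain accuracy:    %.2f" % plain_accuracy)
print("Balanced accuracy: %.2f" % balanced)
print("Mean class recall: %.2f" % per_class_recall_mean)
```

Plain accuracy looks good here only because one class dominates; balanced accuracy exposes that the minority class is never detected.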

### Artificial Neural Network

In [None]:
# Importing required packages for neural networks
from keras.models import Sequential
from keras.layers import Dense, Conv2D, MaxPooling2D, Dropout
from keras.regularizers import l2
from keras.optimizers import Adam
from keras.models import load_model
from keras.callbacks import EarlyStopping

In [None]:
# Create a simple 2 layer neural network
model = Sequential()
model.add(Dense(units=8, activation='relu', input_dim=X.shape[1])) # Hidden layer
model.add(Dense(units=1, activation='sigmoid')) # Output layer
# Print the model 
model.summary()
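The parameter counts reported by `model.summary()` can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The sketch below assumes the three features selected above; `dense_param_count` is a hypothetical helper and not a Keras function.

```python
def dense_param_count(n_inputs, n_units):
    """Number of trainable parameters in a dense layer: weights plus biases."""
    return n_inputs * n_units + n_units


n_features = 3  # assumption: the three features selected above

hidden_params = dense_param_count(n_features, 8)  # hidden layer with 8 units
output_params = dense_param_count(8, 1)           # output layer with 1 unit

print("Hidden layer:", hidden_params)
print("Output layer:", output_params)
print("Total:       ", hidden_params + output_params)
```

If you add or remove features, or change the number of hidden units in the later exercises, the same formula tells you how the model size changes.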

### 3.1 Exercise
We want to classify between healthy controls and Alzheimer's patients. This makes two possible output classes i.e. labels 0 (HC) and 1 (AD). What kind of loss function would be suitable for this task?
In part 1 you can find different examples and in the Keras documentation you can find how to use them:
https://keras.io/losses/

Note that the program will not fail at this step if you choose the wrong loss function; the error will only appear later.

In [None]:
# Compile the model with an optimizer, a metric and a loss function
learning_rate = 0.03
opti = Adam(learning_rate=learning_rate) # recent Keras versions no longer accept the old `lr` argument
model.compile(optimizer=opti, loss='@@@ TO DO @@@', metrics=['accuracy']) # replace the TO DO with the correct loss function

### 3.2 Exercise
Apply the neural network on our Alzheimer's data. Here we will train for only 5 epochs.

In [None]:
model.fit(x=scaler.transform(X.loc[train_idx]),
          y=y.loc[train_idx],
          epochs=5)

The result we obtained above is the training performance, next we will look at the validation performance.

In [None]:
# Predict on the validation set
y_pred_list = []
for idx, row in X.loc[validation_idx].iterrows():
    x = scaler.transform(row.values.reshape(1, -1)) # scale the data
    y_pred = model.predict(x)[0][0] # forward pass through the neural network
    y_pred_list.append(y_pred >= 0.5) # create a list and change the format
print(balanced_accuracy_score(y_true=y.loc[validation_idx], y_pred=y_pred_list)) # calculate the performance
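The loop above runs one forward pass per row. As a possible alternative, all rows can be scaled and predicted in a single call. The helper `predict_labels` below is a hypothetical sketch; the two small stand-in classes only exist so that the example runs without Keras or the ADNI data.

```python
import numpy as np


def predict_labels(model, scaler, features, threshold=0.5):
    """Scale all rows at once, run a single forward pass and threshold the probabilities."""
    probabilities = model.predict(scaler.transform(features))
    return np.asarray(probabilities).reshape(-1) >= threshold


# Hypothetical stand-ins for the fitted scaler and the trained network
class IdentityScaler:
    def transform(self, features):
        return np.asarray(features, dtype=float)


class MeanModel:
    def predict(self, x):
        # Mimics the (n_samples, 1) output shape of a single sigmoid unit
        return x.mean(axis=1, keepdims=True)


toy_features = np.array([[0.9, 0.7], [0.1, 0.3]])
toy_labels = predict_labels(MeanModel(), IdentityScaler(), toy_features)
print(toy_labels)
```

In this notebook the call would look like `predict_labels(model, scaler, X.loc[validation_idx])`, and its result can be passed to `balanced_accuracy_score` as `y_pred`.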

***Second break***

As an intermission let's have a look at what the parameters in neural networks actually learn:

https://www.youtube.com/watch?v=AgkfIQ4IGaM

https://scs.ryerson.ca/~aharley/vis/conv/

### 3.3 Exercise

Now that we got everything running, let's try to optimize our model!

Let's train the network for more epochs and see how it changes our results.

First create a new instance of the model with un-trained parameters.

In [None]:
# Create a simple 2 layer neural network
model = Sequential()
model.add(Dense(units=8, activation='relu', input_dim=X.shape[1])) # Hidden layer
model.add(Dense(units=1, activation='sigmoid')) # Output layer
# Print the model 
model.summary()
# Compile the model with an optimizer, a metric and a loss function
learning_rate = 0.03
opti = Adam(learning_rate=learning_rate) # recent Keras versions no longer accept the old `lr` argument
model.compile(optimizer=opti, loss='binary_crossentropy', metrics=['accuracy'])

In [None]:
model.fit(x=scaler.transform(X.loc[train_idx]),
          y=y.loc[train_idx],
          epochs=@@@ TODO @@@)

In [None]:
# Predict on the validation set
y_pred_list = []
for idx, row in X.loc[validation_idx].iterrows():
    x = scaler.transform(row.values.reshape(1, -1)) # scale the data
    y_pred = model.predict(x)[0][0] # forward pass through the neural network
    y_pred_list.append(y_pred >= 0.5) # create a list and change the format
print(balanced_accuracy_score(y_true=y.loc[validation_idx], y_pred=y_pred_list)) # calculate the performance

You can run the above 3 cells again each time you change the number of epochs. This way you can find a good hyperparameter without writing new lines of code.

What happens when you run it for a very long time, like 100 epochs? Run the above code again to find out.

### 3.4 Exercise

Make the network wider by increasing the number of units in the hidden layer. You will need to re-compile the network.

In [None]:
# Create a simple 2 layer neural network
model = Sequential()
model.add(Dense(units=@@@ TO DO @@@, activation='relu', input_dim=X.shape[1])) # Hidden layer
model.add(Dense(units=1, activation='sigmoid')) # Output layer
# Print the model 
model.summary()
# Compile the model with an optimizer, a metric and a loss function
learning_rate = 0.03
opti = Adam(learning_rate=learning_rate) # recent Keras versions no longer accept the old `lr` argument
model.compile(optimizer=opti, loss='binary_crossentropy', metrics=['accuracy'])

In [None]:
model.fit(x=scaler.transform(X.loc[train_idx]),
          y=y.loc[train_idx],
          epochs=20)

In [None]:
# Predict on the validation set
y_pred_list = []
for idx, row in X.loc[validation_idx].iterrows():
    x = scaler.transform(row.values.reshape(1, -1)) # scale the data
    y_pred = model.predict(x)[0][0] # forward pass through the neural network
    y_pred_list.append(y_pred >= 0.5) # create a list and change the format
print(balanced_accuracy_score(y_true=y.loc[validation_idx], y_pred=y_pred_list)) # calculate the performance

Again you should re-run the 3 cells several times to try out different settings.

### 3.5 Exercise

This time, let's add more depth to our network by adding more layers.

In [None]:
# Create a simple 2 layer neural network
model = Sequential()
model.add(Dense(units=8, activation='relu', input_dim=X.shape[1])) # Note: input_dim needs to be set on the first layer only
@@@ TODO @@@ # replace this with another dense layer, which activation do you choose and how many units? You can also add more than one extra layer!
model.add(Dense(units=1, activation='sigmoid')) # Output layer
# Print the model 
model.summary()
# Compile the model with an optimizer, a metric and a loss function
learning_rate = 0.03
opti = Adam(learning_rate=learning_rate) # recent Keras versions no longer accept the old `lr` argument
model.compile(optimizer=opti, loss='binary_crossentropy', metrics=['accuracy'])

In [None]:
model.fit(x=scaler.transform(X.loc[train_idx]),
          y=y.loc[train_idx],
          epochs=20)

In [None]:
# Predict on the validation set
y_pred_list = []
for idx, row in X.loc[validation_idx].iterrows():
    x = scaler.transform(row.values.reshape(1, -1)) # scale the data
    y_pred = model.predict(x)[0][0] # forward pass through the neural network
    y_pred_list.append(y_pred >= 0.5) # create a list and change the format
print(balanced_accuracy_score(y_true=y.loc[validation_idx], y_pred=y_pred_list)) # calculate the performance

Again you should re-run the 3 cells several times to try out different settings.

### 3.6 Extra exercise

Open exercise: What else could you change? The learning rate? Can you add some regularization? Different layers with different activations? Try what comes to your mind and get some inspiration from the documentation.

Hint: for regularization, look at https://keras.io/regularizers/
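
Early stopping, listed among the regularizers in part 1, is another option. The sketch below shows a simplified version of the underlying idea in plain Python: stop once the validation loss has not improved for a given number of epochs. The function name, the `patience` value and the loss values are made up; the `EarlyStopping` callback imported earlier offers more options than this.

```python
def early_stopping_epoch(validation_losses, patience=3):
    """Return the index of the epoch at which training would be stopped."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            # New best validation loss: reset the counter
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # The losses never stalled for `patience` epochs, so training runs to the end
    return len(validation_losses) - 1


toy_losses = [0.70, 0.55, 0.48, 0.45, 0.46, 0.47, 0.49, 0.52]  # hypothetical values
stop_epoch = early_stopping_epoch(toy_losses)
print("Training would stop at epoch index", stop_epoch)
```

Here the best loss is reached at epoch index 3, and training stops three epochs later because no further improvement was seen.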

In [None]:
# Create a simple 2 layer neural network
model = Sequential()
model.add(Dense(units=8, activation='relu', input_dim=X.shape[1])) # Hidden layer, input_dim needs to be set on the first layer only
model.add(Dense(units=1, activation='sigmoid')) # Output layer
# Print the model 
model.summary()
# Compile the model with an optimizer, a metric and a loss function
learning_rate = 0.03
opti = Adam(learning_rate=learning_rate) # recent Keras versions no longer accept the old `lr` argument
model.compile(optimizer=opti, loss='binary_crossentropy', metrics=['accuracy'])

In [None]:
model.fit(x=scaler.transform(X.loc[train_idx]),
          y=y.loc[train_idx],
          epochs=20)

In [None]:
# Predict on the validation set
y_pred_list = []
for idx, row in X.loc[validation_idx].iterrows():
    x = scaler.transform(row.values.reshape(1, -1)) # scale the data
    y_pred = model.predict(x)[0][0] # forward pass through the neural network
    y_pred_list.append(y_pred >= 0.5) # create a list and change the format
print(balanced_accuracy_score(y_true=y.loc[validation_idx], y_pred=y_pred_list)) # calculate the performance

## Before we go..

How does the model perform on the test dataset? You perform this computation only once, at the very end, with the best model you found using the training and validation sets. Re-run the cells that gave you the best model so that the _model_ variable holds that model.

In [None]:
# Predict on the test set
y_pred_list = []
for idx, row in X.loc[test_idx].iterrows():
    x = scaler.transform(row.values.reshape(1, -1)) # scale the data
    y_pred = model.predict(x)[0][0] # forward pass through the neural network
    y_pred_list.append(y_pred >= 0.5) # create a list and change the format
print(balanced_accuracy_score(y_true=y.loc[test_idx], y_pred=y_pred_list)) # calculate the performance