# Deep Learning with Keras I (intro)
- Part II: [Deep Learning w Keras II (mnist data, mlp)](https://github.com/tm1611/Deep-Learning/blob/master/Deep%20Learning%20w%20Keras%20II%20(mnist%20data%2C%20mlp).ipynb)
- Part III: [Deep Learning w Keras III (student admissions, mlp)](https://github.com/tm1611/Deep-Learning/blob/master/Deep%20Learning%20w%20Keras%20III%20(student%20admissions%2C%20mlp)%20.ipynb)
- Part IV: [Deep Learning w Keras IV (imdb, mlp)](https://github.com/tm1611/Deep-Learning/blob/master/Deep%20Learning%20w%20Keras%20IV%20(imdb%2C%20mlp).ipynb)



Some resources:
- Deep learning and neural networks:
 - [Neural network playlist (3Blue1Brown)](https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi)
 - [Book: Deep Learning (Goodfellow, Bengio, Courville)](https://www.deeplearningbook.org/) or as [pdf](https://github.com/janishar/mit-deep-learning-book-pdf)
- Keras
 - [Keras examples](https://github.com/keras-team/keras/tree/master/examples)
 - [Getting Started with Keras Sequential Model](https://keras.io/getting-started/sequential-model-guide/)
 
More links at the relevant point in the notebook.

# 1. Introduction
This notebook demonstrates the use of the Deep Learning framework **Keras**, followed by an example of how to apply this to a dataset.

### `Sequential` model: 
- [Sequential](https://keras.io/models/sequential/) 
- A linear stack of layers. A sequential model is created by passing a list of layer instances to the `Sequential` constructor.

Example: 
```python
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential([
    Dense(32, input_shape=(784,)),
    Activation('relu'),
    Dense(10),
    Activation('softmax'),
]) 
```
Here, we define a model with 784 inputs $x_1, x_2, ..., x_{784}$, leading to a hidden layer of 32 nodes with a `relu` activation function. The output layer consists of 10 nodes (e.g. for ten categories) with a `softmax` activation function. The input shape has to be specified only for the first layer; each following layer infers its input shape from the output of the previous one. The output shape of the model is determined by the number of units in the last layer (here 10). An equivalent way to specify the model is to use the `.add()` method, where layers are added one at a time, starting with `model.add(Dense(32, input_dim = 784))` and so forth.
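
To make the shapes concrete, here is a minimal NumPy sketch of the computation such a 784 → 32 → 10 stack performs. It is not Keras code: the names `W1`, `b1`, `W2`, `b2` and the random initialisation are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # Element-wise max(z, 0)
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row maximum for numerical stability
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical, randomly initialised parameters with the shapes that
# a Dense(32) -> Dense(10) stack on 784 inputs would have.
W1, b1 = rng.normal(scale=0.05, size=(784, 32)), np.zeros(32)
W2, b2 = rng.normal(scale=0.05, size=(32, 10)), np.zeros(10)

x = rng.random((5, 784))             # batch of 5 dummy inputs
hidden = relu(x @ W1 + b1)           # shape (5, 32)
probs = softmax(hidden @ W2 + b2)    # shape (5, 10), each row sums to 1

# Weights plus biases of both layers: 784*32 + 32 + 32*10 + 10
n_params = W1.size + b1.size + W2.size + b2.size
print(probs.shape, n_params)         # (5, 10) 25450
```

The parameter count of 25450 should match what `model.summary()` reports for the Keras model above.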

### Compilation using `.compile()`
Before training a model, we need to configure how the model is supposed to learn. `.compile()` takes three main arguments for this, namely an `optimizer`, a `loss function`, and a list of `metrics`:
1. The `optimizer` (e.g. SGD, RMSprop, Adagrad)
 - Optimization algorithms minimize the objective function (loss function) and thereby learn the parameters of the model.
 - [Keras optimizers](https://keras.io/optimizers)
 - [How to train NN faster with optimizers?](https://towardsdatascience.com/how-to-train-neural-network-faster-with-optimizers-d297730b3713)
 - [Types of Optimization Algorithms used in NN ...](https://towardsdatascience.com/types-of-optimization-algorithms-used-in-neural-networks-and-ways-to-optimize-gradient-95ae5d39529f)
 
```python
# Import optimizers
from keras import optimizers

# specify keras model
model = ...

# define SGD optimizer
sgd = optimizers.SGD(lr = 0.01, decay = 1e-6, momentum = 0.9, nesterov = True)

# compile model
model.compile(loss = "mean_squared_error", optimizer = sgd)
```

2. A `loss function` is the function we want to minimize (an objective that should be maximized is turned into a loss by negating it). It is used to evaluate a candidate solution (e.g. the current weights of the model). The optimizer tweaks the model parameters to reduce the value of this function, ideally arriving at the parameters associated with the lowest loss. It is important that the chosen loss function represents our actual goal.
 - [Keras loss functions](https://keras.io/losses)
 - [ML-Mastery: Loss and Loss Functions](https://machinelearningmastery.com/loss-and-loss-functions-for-training-deep-learning-neural-networks/)

```python
model.compile(loss = "mean_squared_error", optimizer = "sgd")

from keras import losses
model.compile(loss = losses.mean_squared_error, optimizer = "sgd")
```

3. A `metric` to judge the performance of the model. The loss function reduces all good and bad aspects of a complex system to a single number. This number is suitable for optimization, but it is often hard to interpret as a measure of actual performance. Hence, we'd like to include a measure, such as accuracy, that tells us more directly how good the model actually is. Metrics are only reported; they are not used to train the model.
 - [Keras metrics](https://keras.io/metrics/)
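
The difference between a loss and a metric can be sketched in plain NumPy. The helper functions `binary_crossentropy` and `accuracy` below are simplified stand-ins written for this example (not the Keras implementations), and the prediction vectors are made up.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0); this mirrors common practice, not Keras internals
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(losses))

def accuracy(y_true, y_pred, threshold=0.5):
    # Fraction of samples whose thresholded prediction matches the label
    return float(np.mean((y_pred > threshold) == (y_true == 1)))

y_true = np.array([1, 0, 1, 0])
confident = np.array([0.9, 0.1, 0.8, 0.2])   # made-up predictions
hesitant = np.array([0.6, 0.4, 0.6, 0.4])    # made-up predictions

loss_confident = binary_crossentropy(y_true, confident)
loss_hesitant = binary_crossentropy(y_true, hesitant)
acc_confident = accuracy(y_true, confident)
acc_hesitant = accuracy(y_true, hesitant)

print(loss_confident, acc_confident)
print(loss_hesitant, acc_hesitant)
```

Both prediction vectors classify every sample correctly, so the accuracy is identical, while the loss is clearly lower for the confident predictions. The loss gives the optimizer a smooth signal to improve on; the metric is the number we read.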

#### Examples: `.compile()`
```python
# Multi-class classification problem
model.compile(optimizer = "rmsprop",
              loss = "categorical_crossentropy",
              metrics = ["accuracy"])

# Binary classification problem
model.compile(optimizer = "rmsprop",
              loss = "binary_crossentropy",
              metrics = ["accuracy"])

# Mean squared error regression problem
model.compile(optimizer = "rmsprop",
              loss = "mse")
    
# Custom metrics #
# Import
import keras.backend as K

# Define own metric 
def mean_pred(y_true, y_pred):
    return K.mean(y_pred)

# Compile model
model.compile(loss = "mean_squared_error",
              optimizer = "sgd",
              metrics = ["accuracy", mean_pred])
```
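
To illustrate what an optimizer such as the SGD configuration with `momentum = 0.9` shown earlier does at each step, here is a minimal sketch of the classical momentum update on a toy one-dimensional objective. The objective $f(w) = (w - 3)^2$ and the variable names are assumptions for illustration; the `decay` and `nesterov` options are left out, and the actual Keras implementation may differ in detail.

```python
def grad(w):
    # Gradient of the toy objective f(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

lr, momentum = 0.01, 0.9
w, velocity = 0.0, 0.0      # start far away from the minimum at w = 3

for step in range(300):
    # The velocity accumulates past gradients; the weight moves along it
    velocity = momentum * velocity - lr * grad(w)
    w = w + velocity

print(w)   # close to 3.0
```

With `momentum = 0.0` the same loop reduces to plain gradient descent, which needs considerably more steps at this learning rate.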



### Training model with `.fit()`
The last step is to fit the model to the data. Keras models are trained on NumPy arrays of input data and labels (the outcome variable y), typically with the `.fit()` function. This function trains the model for a given number of epochs (iterations over the dataset). The documentation can be found [here](https://keras.io/models/sequential/). The function is very flexible, but its main arguments are:
- `x`: the training data
- `y`: the target data
- `batch_size`: number of samples per gradient update. Default is 32.
- `epochs`: number of epochs to train the model. One epoch is one pass over the entire x and y data.
- `shuffle`: Boolean indicating whether to shuffle the training data before each epoch (important if the data are stored in some particular order).
- Other arguments cover more specific cases, such as changing the verbosity mode or including validation data.
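
How `batch_size`, `epochs`, and `shuffle` interact can be sketched without Keras. The loop below only mimics the bookkeeping of mini-batch training under the assumption of 1000 samples; it is not what `.fit()` executes internally, and no weights are updated.

```python
import math
import numpy as np

n_samples, batch_size, epochs = 1000, 32, 5

# Number of gradient updates per epoch; the last batch may be smaller
steps_per_epoch = math.ceil(n_samples / batch_size)

rng = np.random.default_rng(0)
batch_sizes = []
for epoch in range(epochs):
    order = rng.permutation(n_samples)        # shuffle: new order every epoch
    for start in range(0, n_samples, batch_size):
        batch_idx = order[start:start + batch_size]
        batch_sizes.append(len(batch_idx))    # a real loop would update weights here

# Steps per epoch, total updates, size of the last batch in the first epoch
print(steps_per_epoch, len(batch_sizes), batch_sizes[steps_per_epoch - 1])  # 32 160 8
```

With 1000 samples and a batch size of 32, each epoch consists of 31 full batches plus one batch of 8 samples, so five epochs amount to 160 gradient updates.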



### Example Model I: Binary classification with dummy data.

In [1]:
# Imports
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras import utils

Using TensorFlow backend.


In [2]:
# Model
model = Sequential([
    Dense(32, input_dim = 100),
    Activation("relu"),
    Dense(1),
    Activation("sigmoid")
])

# Compile
model.compile(optimizer="rmsprop", 
              loss = "binary_crossentropy",
              metrics = ["accuracy"])

# Generate dummy data
data = np.random.random((1000, 100))
y = np.random.randint(2, size=(1000, 1))

# Train the model
model.fit(data, y, epochs = 5, batch_size = 32, shuffle = True)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x23b62fb3a90>

### Example Model II: Categorical classification with dummy data. 

In [3]:
# Model
model = Sequential([
    Dense(32, activation = "relu", input_dim = 100),
    Dense(10, activation = "softmax")
])

# Compile
model.compile(optimizer = "rmsprop",
             loss = "categorical_crossentropy",
             metrics = ["accuracy", "mse"])

# Dummy data
data = np.random.random((1000, 100))
y = np.random.randint(10, size = (1000, 1))

# y to categorical matrix
cat_y = utils.to_categorical(y, num_classes = 10)

# Fitting the model to data
model.fit(data, cat_y, epochs=10, batch_size=64, shuffle = True, verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x23b642e0eb8>
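
The `utils.to_categorical` call in the example above turns integer class labels into one-hot rows, which is the format `categorical_crossentropy` expects. A minimal NumPy sketch of that conversion, assuming labels in the range 0 to 9 (the indexing trick with `np.eye` is one common way to do this, not the Keras implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.integers(10, size=(1000, 1))     # integer labels 0..9, like the dummy data

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(10)[y.ravel()]

print(one_hot.shape)                     # (1000, 10)
print(y[0, 0], one_hot[0])               # label and its one-hot row
```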

**Notes**

Both models can be fit to the data and reach a certain training accuracy if we let them run for many epochs. For example, after around 100 epochs we reach an accuracy of more than 50%.

However, this is only because the model "memorizes" the random training data. The model may fit the training data well, but this does not generalize in any way, since we are dealing with random data.

That's why it is important to include test data and cross-validation: they ensure that we assess the fit of the model on data which the model did not see before (i.e. data that were not used for training the model).
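
A hold-out split of this kind can be sketched in NumPy as follows. The 80/20 ratio, the variable names, and the majority-class baseline are assumptions for illustration. In Keras itself, `.fit()` accepts `validation_split` or `validation_data`, and `model.evaluate(x_test, y_test)` reports the loss and metrics on held-out data.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((1000, 100))           # dummy features, as in the examples
y = rng.integers(2, size=(1000, 1))      # random binary labels

# Shuffle the indices once, then hold out 20% as test data
idx = rng.permutation(len(data))
n_test = int(0.2 * len(data))
test_idx, train_idx = idx[:n_test], idx[n_test:]

x_train, y_train = data[train_idx], y[train_idx]
x_test, y_test = data[test_idx], y[test_idx]

# Reference point: always predict the most frequent training label
majority = int(y_train.mean() >= 0.5)
baseline_acc = float(np.mean(y_test == majority))

print(x_train.shape, x_test.shape, baseline_acc)
```

On random labels, a model trained on `x_train` should not be expected to beat this baseline on `x_test` by much, however high its training accuracy is.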

In [4]:
print("### The End ###")

### The End ###
