# Keras

## Validation

As mentioned in the previous lesson, it is good practice to set aside a validation set for use during hyperparameter tuning. Once you have decided on a final model, the test set can then be used to obtain an unbiased estimate of the model's performance.

In [2]:
from sklearn.model_selection import train_test_split

# Hold out 1,000 samples from the training data as a validation set
X_train_final, X_val, y_train_final, y_val = train_test_split(X_train, y_train, test_size=1000, random_state=42)

In [None]:
from keras.utils import to_categorical
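
As a rough illustration of what `to_categorical` produces, the following pure-Python sketch one-hot encodes a list of integer class labels. The helper `one_hot` is hypothetical and is not part of Keras, which returns a NumPy array rather than nested lists.

```python
def one_hot(labels, num_classes=None):
    """One-hot encode integer class labels as a list of rows."""
    if num_classes is None:
        # Assume classes are numbered 0..max(labels)
        num_classes = max(labels) + 1
    return [[1.0 if k == label else 0.0 for k in range(num_classes)]
            for label in labels]


# Three samples belonging to classes 0, 2 and 1
print(one_hot([0, 2, 1]))
# → [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```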

In [None]:
from keras import models
from keras import layers
from keras import optimizers

model = models.Sequential()

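# units, activation, and input_shape are placeholders for your own values;
# input_shape must be passed by keyword
model.add(layers.Dense(units, activation=activation, input_shape=input_shape))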

The `Dense` class creates a fully connected layer.

The `input_shape` parameter is needed only for the first layer. For each successive layer, Keras infers the input shape from the output shape of the previous layer.
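
To make the shape inference concrete, here is a minimal pure-Python sketch (not Keras code) that chains fully connected layers and counts their trainable parameters. The helper `dense_param_counts` and the layer sizes are hypothetical; the point is only that each layer's input size comes from the layer before it.

```python
def dense_param_counts(input_dim, layer_units):
    """Return the number of trainable parameters of each fully connected layer.

    Only the first layer needs an explicit input size; every later layer
    takes its input size from the number of units in the layer before it.
    """
    counts = []
    fan_in = input_dim
    for units in layer_units:
        # One weight per (input, unit) pair plus one bias per unit
        counts.append(fan_in * units + units)
        fan_in = units  # the next layer's input size is inferred here
    return counts


# Hypothetical network: 500 input features, then layers of 5, 5 and 1 units
print(dense_param_counts(500, [5, 5, 1]))
# → [2505, 30, 6]
```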

In [None]:
# Note: newer Keras releases name this argument `learning_rate` instead of `lr`
model.compile(optimizer=optimizers.RMSprop(lr=0.001),
              loss='mse',
              metrics=['accuracy'])

In [None]:
history = model.fit(X_train_final,
                    y_train_final,
                    epochs=20,
                    batch_size=512,
                    validation_data=(X_val, y_val))

After training, you can retrieve further information about how training progressed from epoch to epoch. To do this, access the `.history` attribute of the returned object. Given the variable naming above, this is:


In [None]:

history.history

This returns a dictionary holding one list of per-epoch values for the loss and for each metric indicated when compiling the model. The loss is always included. In our example, the dictionary therefore has one key for the loss and one for the accuracy, plus a `val_`-prefixed counterpart of each because validation data was passed to `fit`. If you wish to plot learning curves for the loss or accuracy versus the epochs, you can simply retrieve these lists. For example:

history.history['loss']
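
Because these are plain Python lists, they can be inspected without any plotting. The sketch below finds the epoch with the lowest value for a given key; the helper `best_epoch` is hypothetical, and the numbers are made up to stand in for the contents of `history.history`.

```python
def best_epoch(history_dict, key="val_loss"):
    """Return the 1-based epoch with the lowest value for the given key."""
    values = history_dict[key]
    return min(range(len(values)), key=values.__getitem__) + 1


# Made-up numbers standing in for the contents of history.history
example_history = {
    "loss": [0.90, 0.60, 0.45, 0.38, 0.33],
    "val_loss": [0.95, 0.70, 0.58, 0.61, 0.66],
}

print(best_epoch(example_history))          # → 3 (lowest validation loss)
print(best_epoch(example_history, "loss"))  # → 5 (lowest training loss)
```

In this made-up run the training loss keeps falling after epoch 3 while the validation loss rises, which is the typical signature of overfitting.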

In [None]:
y_hat = model.predict(x)

## Visualizing loss 

In [None]:
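import matplotlib.pyplot as plt

# The accuracy keys depend on the Keras version: replace 'acc' and 'val_acc'
# with 'accuracy' and 'val_accuracy' if that is what history.history contains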

def visualize_training_results(results):
    history = results.history
    plt.figure()
    plt.plot(history['val_loss'])
    plt.plot(history['loss'])
    plt.legend(['val_loss', 'loss'])
    plt.title('Loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.show()
    
    plt.figure()
    plt.plot(history['val_acc'])
    plt.plot(history['acc'])
    plt.legend(['val_acc', 'acc'])
    plt.title('Accuracy')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.show()

## Early Stopping

Overfitting is one of the main problems to guard against when training a neural network. However, you cannot know in advance how many epochs a model needs, and rerunning training with different numbers of epochs may be helpful but is time-consuming. Early stopping addresses this by halting training once the monitored metric stops improving.

- Import `EarlyStopping` and `ModelCheckpoint` from `keras.callbacks`
- Define a list, `early_stopping`:
  - Monitor `'val_loss'` and stop training once it has not improved for 10 consecutive epochs (`patience=10`)
  - Save the best model seen so far, as measured by `'val_loss'`

> If you need help, consult the [documentation](https://keras.io/callbacks/).

In [3]:
from keras.callbacks import EarlyStopping, ModelCheckpoint

# Define the callbacks
early_stopping = [EarlyStopping(monitor='val_loss', patience=10), 
                  ModelCheckpoint(filepath='best_model.h5', monitor='val_loss', save_best_only=True)]

Using TensorFlow backend.
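
The effect of the `patience` setting can be sketched in pure Python. This is an illustration of the stopping rule only, not the Keras implementation; the function `early_stopping_epoch` is hypothetical and the validation losses are made up.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-based.

    Training stops once `patience` consecutive epochs pass without the
    validation loss dropping below the best value seen so far.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Patience never ran out: training used every epoch
    return len(val_losses), best_epoch


# Made-up validation losses: the minimum is reached at epoch 3
losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.7, 0.8]
print(early_stopping_epoch(losses, patience=2))   # → (5, 3)
print(early_stopping_epoch(losses, patience=10))  # → (7, 3)
```

Note that the weights at the stopping epoch are not the best ones, which is why the callback list above pairs `EarlyStopping` with a `ModelCheckpoint` that saves the best model.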


# Tuning

## Rules of Thumb Regarding Bias / Variance 


| High bias? (check training performance) | High variance? (check validation performance) |
|---------------|-------------|
| Use a bigger network | Get more data |
| Train longer | Use regularization |
| Look for other existing NN architectures | Look for other existing NN architectures |
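
One way to read the table is as a two-step check: first look at the training error, then at the gap between validation and training error. The sketch below encodes that reading; the function `diagnose`, both thresholds, and the error rates are hypothetical and would need to be chosen for your own problem.

```python
def diagnose(train_error, val_error, acceptable_error, acceptable_gap):
    """Return the list of problems suggested by the two error rates."""
    problems = []
    if train_error > acceptable_error:
        problems.append("high bias")      # the model underfits the training set
    if val_error - train_error > acceptable_gap:
        problems.append("high variance")  # the model does not generalize well
    return problems


# Made-up error rates
print(diagnose(0.15, 0.17, acceptable_error=0.05, acceptable_gap=0.05))
# → ['high bias']
print(diagnose(0.01, 0.12, acceptable_error=0.05, acceptable_gap=0.05))
# → ['high variance']
```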

## Regularization 

Use regularization when the model overfits the training data, that is, when validation performance is clearly worse than training performance.

### Dropout

In Keras, you specify dropout with the `Dropout` layer, which can be placed after the input or after any hidden layer. The `Dropout` layer requires one argument, `rate`, which specifies the fraction of units to drop, usually between 0.2 and 0.5.

In [None]:
model = models.Sequential()
model.add(layers.Dense(5, activation='relu', input_shape=(500,)))
# Dropout applied to the output of the first hidden layer
model.add(layers.Dropout(0.3))
model.add(layers.Dense(5, activation='relu'))
# Dropout applied to the output of the second hidden layer
model.add(layers.Dropout(0.3))
model.add(layers.Dense(1, activation='sigmoid'))
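
To show what a dropout layer does to its inputs during training, here is a minimal pure-Python sketch of inverted dropout, in which the surviving units are scaled up by `1 / (1 - rate)` so that the expected activation is unchanged. The function `dropout` is a hypothetical stand-in, not the Keras layer, and the activations are made up.

```python
import random


def dropout(activations, rate, rng):
    """Apply inverted dropout to a list of activations (training-time behaviour)."""
    keep_prob = 1.0 - rate
    output = []
    for a in activations:
        if rng.random() < rate:
            output.append(0.0)            # unit dropped
        else:
            output.append(a / keep_prob)  # survivor scaled up
    return output


rng = random.Random(0)
activations = [1.0] * 10_000
dropped = dropout(activations, rate=0.3, rng=rng)

zero_fraction = dropped.count(0.0) / len(dropped)
mean_activation = sum(dropped) / len(dropped)
print(f"fraction dropped: {zero_fraction:.2f}, mean after dropout: {mean_activation:.2f}")
```

Roughly 30% of the units should be zeroed while the mean stays close to 1. At prediction time dropout is switched off and the activations pass through unchanged.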

### L2

To apply L2 regularization in Keras, pass `kernel_regularizer=regularizers.l2(lambda_coeff)` to any layer that has weights. The `lambda_coeff` argument sets the strength of the penalty: larger values push the weights harder toward zero.

In [None]:
from keras import regularizers

L2_model = models.Sequential()

# Add the input and first hidden layer
L2_model.add(layers.Dense(50, activation='relu', kernel_regularizer=regularizers.l2(0.005), input_shape=(2000,)))

# Add another hidden layer
L2_model.add(layers.Dense(25, kernel_regularizer=regularizers.l2(0.005), activation='relu'))

### L1

In [None]:
L1_model = models.Sequential()

# Add the input and first hidden layer
L1_model.add(layers.Dense(50, activation='relu', kernel_regularizer=regularizers.l1(0.005), input_shape=(2000,)))

# Add a hidden layer
L1_model.add(layers.Dense(25, kernel_regularizer=regularizers.l1(0.005), activation='relu'))
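
The penalty terms that these regularizers add to the loss can be sketched in a few lines of pure Python. The functions `l1_penalty` and `l2_penalty` are hypothetical helpers and the weights are made up; the coefficient 0.005 matches the value used in the cells above.

```python
def l1_penalty(weights, coeff):
    """L1 penalty: coefficient times the sum of absolute weights."""
    return coeff * sum(abs(w) for w in weights)


def l2_penalty(weights, coeff):
    """L2 penalty: coefficient times the sum of squared weights."""
    return coeff * sum(w * w for w in weights)


# Made-up weights of a single layer
weights = [0.5, -1.0, 2.0]
print("L1 penalty:", l1_penalty(weights, 0.005))
print("L2 penalty:", l2_penalty(weights, 0.005))
```

Because the L2 term squares each weight, it penalizes large weights much more heavily than small ones, whereas the L1 term penalizes all weights in proportion to their size and tends to drive some of them to exactly zero.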

## Initialization Strategies

### He Normal

In the cell below, specify the following in the first hidden layer:

- 100 units
- `'relu'` activation
- `input_shape`
- `kernel_initializer='he_normal'`

In [None]:
# n_features is assumed to hold the number of input columns
he_model = models.Sequential()
he_model.add(
    layers.Dense(
        100, 
        kernel_initializer='he_normal', 
        activation='relu', 
        input_shape=(n_features,)))

### LeCun Normal

Specify the following in the first hidden layer:

- 100 units
- `'relu'` activation
- `input_shape`
- `kernel_initializer='lecun_normal'`
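
In Keras, this specification corresponds to a layer such as `layers.Dense(100, kernel_initializer='lecun_normal', activation='relu', input_shape=(n_features,))`. The two initializers differ only in the spread of the initial weights: He normal targets a standard deviation of `sqrt(2 / fan_in)` and LeCun normal targets `sqrt(1 / fan_in)`, where `fan_in` is the number of inputs to the layer. The sketch below compares them in pure Python. The helper names and the layer size are hypothetical, and a plain normal distribution is used for simplicity where Keras draws from a truncated normal.

```python
import math
import random


def init_stddev(fan_in, scheme):
    """Target standard deviation of the weights for a layer with fan_in inputs."""
    if scheme == "he_normal":
        return math.sqrt(2.0 / fan_in)
    if scheme == "lecun_normal":
        return math.sqrt(1.0 / fan_in)
    raise ValueError(f"unknown scheme: {scheme}")


def sample_weights(fan_in, units, scheme, rng):
    """Draw a fan_in x units weight matrix from a zero-mean normal distribution."""
    std = init_stddev(fan_in, scheme)
    return [[rng.gauss(0.0, std) for _ in range(units)] for _ in range(fan_in)]


rng = random.Random(0)
fan_in, units = 200, 100  # hypothetical layer size
for scheme in ("he_normal", "lecun_normal"):
    weights = sample_weights(fan_in, units, scheme, rng)
    flat = [w for row in weights for w in row]
    empirical_std = math.sqrt(sum(w * w for w in flat) / len(flat))
    print(f"{scheme}: target std {init_stddev(fan_in, scheme):.4f}, "
          f"sampled std {empirical_std:.4f}")
```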

## Optimization Strategies

### RMSprop

Compile the `rmsprop_model` with:

- `'rmsprop'` as the optimizer
- `'mse'` as both the loss and the tracked metric
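
In Keras these settings amount to a call such as `rmsprop_model.compile(optimizer='rmsprop', loss='mse', metrics=['mse'])`. To show what the optimizer does with each gradient, here is a minimal pure-Python sketch of the standard RMSprop update for a single weight. The function `rmsprop_step`, its default values, and the toy objective are assumptions made for illustration, not the Keras implementation.

```python
import math


def rmsprop_step(w, grad, avg_sq, lr=0.001, rho=0.9, eps=1e-7):
    """One RMSprop update for a single weight.

    avg_sq is a running average of squared gradients; dividing by its
    square root gives each weight its own effective step size.
    """
    avg_sq = rho * avg_sq + (1.0 - rho) * grad * grad
    w = w - lr * grad / (math.sqrt(avg_sq) + eps)
    return w, avg_sq


# Toy objective f(w) = w**2, whose gradient is 2*w and whose minimum is at 0
w, avg_sq = 5.0, 0.0
for _ in range(2000):
    w, avg_sq = rmsprop_step(w, 2.0 * w, avg_sq, lr=0.01)
print(f"w after 2000 steps: {w:.4f}")
```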

### Adam

Compile the `adam_model` with:

- `'Adam'` as the optimizer
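
For intuition about what passing `'Adam'` as the optimizer selects, here is a minimal pure-Python sketch of the standard Adam update for a single weight. The function `adam_step`, its default values, and the toy objective are assumptions made for illustration, not the Keras implementation.

```python
import math


def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a single weight at (1-based) step t."""
    m = beta1 * m + (1.0 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)               # bias correction for the zero start
    v_hat = v / (1.0 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


# Toy objective f(w) = w**2, whose gradient is 2*w and whose minimum is at 0
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 21):
    w, m, v = adam_step(w, 2.0 * w, m, v, t, lr=0.1)
print(f"w after 20 steps: {w:.4f}")
```

Thanks to the bias correction, the very first step has a size close to `lr` regardless of how large the gradient is.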

# Keras Wrapper

In [None]:
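from keras.wrappers.scikit_learn import KerasRegressor

# create_regularized_model is assumed to be a function that builds and returns a compiled model
regressor = KerasRegressor(build_fn=create_regularized_model,
                           epochs=150,
                           batch_size=256,
                           verbose=0)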