# Python

## Introduction to Deep Learning in Python

#### 4. Fine-tuning Keras models

#### Changing optimization parameters

- Import SGD from tensorflow.keras.optimizers.
- Create a list of learning rates to try optimizing with called lr_to_test. The learning rates in it should be 0.000001, 0.01, and 1.
- Use a for loop to iterate over lr_to_test:
    - Use the get_new_model() function to build a new, unoptimized model.
    - Create an optimizer called my_optimizer using the SGD() constructor with keyword argument learning_rate=lr (older Keras releases spelled this lr=lr).
    - Compile your model. Set the optimizer parameter to be the SGD object you created above, and because this is a classification problem, use 'categorical_crossentropy' for the loss parameter.
    - Fit your model using the predictors and target.

In [None]:
# Import the SGD optimizer
from tensorflow.keras.optimizers import SGD

# Create list of learning rates: lr_to_test
lr_to_test = [0.000001, 0.01, 1]

# Loop over learning rates
for lr in lr_to_test:
    print("\n\nTesting model with learning rate: %f\n" % lr)

    # Build new model to test, unaffected by previous models
    model = get_new_model()

    # Create SGD optimizer with specified learning rate: my_optimizer
    # (learning_rate replaces the deprecated lr keyword)
    my_optimizer = SGD(learning_rate=lr)

    # Compile the model
    model.compile(optimizer=my_optimizer, loss="categorical_crossentropy")

    # Fit the model
    model.fit(predictors, target)
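
To see why these three learning rates behave so differently, it can help to strip the problem down to a single weight. The sketch below is not the course model: it assumes a toy loss f(w) = w**2 and a hypothetical helper `gradient_descent`, and applies the plain gradient descent update with each learning rate from lr_to_test.

```python
def gradient_descent(lr, steps=100, w=1.0):
    """Minimize the toy loss f(w) = w**2 and return the final loss."""
    for _ in range(steps):
        grad = 2 * w        # derivative of w**2 with respect to w
        w = w - lr * grad   # gradient descent update
    return w ** 2


# Same learning rates as lr_to_test in the exercise
final_loss = {lr: gradient_descent(lr) for lr in [0.000001, 0.01, 1]}

for lr, loss in final_loss.items():
    print(f"lr={lr:<8g} final loss={loss:.6f}")
```

On this toy problem the smallest rate barely moves the weight, 0.01 makes steady progress, and a rate of 1 flips the weight between 1 and -1 without ever reducing the loss. Real networks are messier, but the same three regimes (too slow, reasonable, too large) tend to show up.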

#### Evaluating model accuracy on validation dataset

- Compile your model using 'adam' as the optimizer and 'categorical_crossentropy' for the loss. To see what fraction of predictions are correct (the accuracy) in each epoch, specify the additional keyword argument metrics=['accuracy'] in model.compile().
- Fit the model using the predictors and target. Create a validation split of 30% (or 0.3). The loss and accuracy on this held-out data will be reported after each epoch.

In [None]:
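# Import the model and layer classes used below
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# Save the number of columns in predictors: n_cols
n_cols = predictors.shape[1]
input_shape = (n_cols,)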

# Specify the model
model = Sequential()
model.add(Dense(100, activation="relu", input_shape=input_shape))
model.add(Dense(100, activation="relu"))
model.add(Dense(2, activation="softmax"))

# Compile the model
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Fit the model
hist = model.fit(predictors, target, validation_split=0.3)
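
When validation_split is used, Keras holds out the last fraction of the rows, taken before any shuffling. The sketch below imitates that behaviour on a plain list; `split_for_validation` is a hypothetical helper, and the exact rounding Keras applies may differ from the `round()` used here.

```python
def split_for_validation(samples, validation_split):
    """Hold out the last fraction of the samples, without shuffling first."""
    n_val = round(len(samples) * validation_split)  # rounding is an assumption
    split_at = len(samples) - n_val
    return samples[:split_at], samples[split_at:]


rows = list(range(10))  # stand-in for the rows of predictors
train_rows, val_rows = split_for_validation(rows, 0.3)

print("train:", train_rows)
print("validation:", val_rows)
```

Because the held-out rows come from the end of the data, a dataset sorted by class or by time may produce an unrepresentative validation set, so shuffling the rows beforehand is often worthwhile.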

#### Early stopping: Optimizing the optimization

- Import EarlyStopping from tensorflow.keras.callbacks.
- Compile the model, once again using 'adam' as the optimizer, 'categorical_crossentropy' as the loss function, and metrics=['accuracy'] to see the accuracy at each epoch.
- Create an EarlyStopping object called early_stopping_monitor. Stop optimization when the validation loss hasn't improved for 2 epochs by specifying the patience parameter of EarlyStopping() to be 2.
- Fit the model using the predictors and target. Specify the number of epochs to be 30 and use a validation split of 0.3. In addition, pass [early_stopping_monitor] to the callbacks parameter.

In [None]:
# Import EarlyStopping
from tensorflow.keras.callbacks import EarlyStopping

# Save the number of columns in predictors: n_cols
n_cols = predictors.shape[1]
input_shape = (n_cols,)

# Specify the model
model = Sequential()
model.add(Dense(100, activation="relu", input_shape=input_shape))
model.add(Dense(100, activation="relu"))
model.add(Dense(2, activation="softmax"))

# Compile the model
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Define early_stopping_monitor
early_stopping_monitor = EarlyStopping(patience=2)

# Fit the model
model.fit(predictors, target, epochs=30, validation_split=0.3, callbacks=[early_stopping_monitor])
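
The patience rule can be traced by hand on a list of validation losses. The following is a simplified sketch of the idea rather than the Keras implementation: `epochs_before_stop` is a hypothetical helper, the loss values are made up, and options such as min_delta are ignored.

```python
def epochs_before_stop(val_losses, patience):
    """Return how many epochs run before patience-based stopping halts training."""
    best = float("inf")
    wait = 0  # epochs since the validation loss last improved
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)  # never triggered: every epoch ran


# Made-up validation losses, one per epoch
hypothetical_val_losses = [0.90, 0.70, 0.60, 0.65, 0.62, 0.50]

stopped_after = epochs_before_stop(hypothetical_val_losses, patience=2)
print("epochs run with patience=2:", stopped_after)
```

With these numbers, training stops after epoch 5 because epochs 4 and 5 both fail to beat the loss from epoch 3, so the lower loss at epoch 6 is never reached. A larger patience would have caught it, at the cost of more training time.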

#### Experimenting with wider networks

- Create model_2 to replicate model_1, but use 100 nodes instead of 10 for the first two Dense layers you add with the 'relu' activation. Use 2 nodes for the Dense output layer with 'softmax' as the activation.
- Compile model_2 as you have done with previous models, using 'adam' as the optimizer, 'categorical_crossentropy' for the loss, and metrics=['accuracy'].
- Fit both models and plot their validation loss to see which one gives better results. Notice the keyword argument verbose=False in model.fit(): it prints fewer updates, since you'll be evaluating the models graphically instead of through text.


In [None]:
# Define early_stopping_monitor
early_stopping_monitor = EarlyStopping(patience=2)

# Create the new model: model_2
model_2 = Sequential()

# Add the first and second layers
model_2.add(Dense(100, activation="relu", input_shape=input_shape))
model_2.add(Dense(100, activation="relu"))

# Add the output layer
model_2.add(Dense(2, activation="softmax"))

# Compile model_2
model_2.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Fit model_1
model_1_training = model_1.fit(
    predictors,
    target,
    epochs=15,
    validation_split=0.2,
    callbacks=[early_stopping_monitor],
    verbose=False,
)

# Fit model_2
model_2_training = model_2.fit(
    predictors,
    target,
    epochs=15,
    validation_split=0.2,
    callbacks=[early_stopping_monitor],
    verbose=False,
)

# Create the plot (model_1 in red, model_2 in blue)
import matplotlib.pyplot as plt

plt.plot(model_1_training.history["val_loss"], "r", model_2_training.history["val_loss"], "b")
plt.xlabel("Epochs")
plt.ylabel("Validation loss")
plt.show()
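
Widening the hidden layers increases the number of trainable parameters sharply. As a rough illustration, the sketch below counts the weights and biases of a stack of Dense layers; `count_dense_params` is a hypothetical helper, and n_cols = 10 is a placeholder since the real number of predictor columns is not given here.

```python
def count_dense_params(n_inputs, layer_sizes):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    fan_in = n_inputs
    for units in layer_sizes:
        total += (fan_in + 1) * units  # one weight per input plus one bias, per unit
        fan_in = units
    return total


n_cols = 10  # placeholder: substitute predictors.shape[1]

narrow = count_dense_params(n_cols, [10, 10, 2])    # layout of model_1
wide = count_dense_params(n_cols, [100, 100, 2])    # layout of model_2

print("model_1 parameters:", narrow)
print("model_2 parameters:", wide)
```

The extra capacity lets the wider model fit more complex patterns, but it also gives it more room to overfit, which is why the comparison is made on validation loss rather than training loss.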

#### Adding layers to a network

- Specify a model called model_2 that is like model_1, but which has 3 hidden layers of 10 units instead of only 1 hidden layer.
    - Use input_shape to specify the input shape in the first hidden layer.
    - Use 'relu' activation for the 3 hidden layers and 'softmax' for the output layer, which should have 2 units.
- Compile model_2 as you have done with previous models, using 'adam' as the optimizer, 'categorical_crossentropy' for the loss, and metrics=['accuracy'].
- Fit both models and plot their validation loss to see which one gives better results.

In [None]:
# The input shape to use in the first hidden layer
input_shape = (n_cols,)

# Create the new model: model_2
model_2 = Sequential()

# Add the first, second, and third hidden layers
model_2.add(Dense(10, activation="relu", input_shape=input_shape))
model_2.add(Dense(10, activation="relu"))
model_2.add(Dense(10, activation="relu"))

# Add the output layer
model_2.add(Dense(2, activation="softmax"))

# Compile model_2
model_2.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Fit model 1
model_1_training = model_1.fit(predictors, target, epochs=15, validation_split=0.4, verbose=False)

# Fit model 2
model_2_training = model_2.fit(predictors, target, epochs=15, validation_split=0.4, verbose=False)

# Create the plot
plt.plot(model_1_training.history["val_loss"], "r", model_2_training.history["val_loss"], "b")
plt.xlabel("Epochs")
plt.ylabel("Validation loss")
plt.show()
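
The plot can be complemented with a numeric summary of each run. The sketch below assumes two lists shaped like the "val_loss" entries of a fit history; the numbers are made up for illustration and `best_epoch` is a hypothetical helper.

```python
def best_epoch(val_losses):
    """Return (1-based epoch, loss) for the lowest validation loss."""
    best_loss = min(val_losses)
    return val_losses.index(best_loss) + 1, best_loss


# Made-up values standing in for model_X_training.history["val_loss"]
model_1_val_loss = [0.70, 0.66, 0.64, 0.63, 0.63]
model_2_val_loss = [0.68, 0.61, 0.58, 0.60, 0.62]

for name, losses in [("model_1", model_1_val_loss), ("model_2", model_2_val_loss)]:
    epoch, loss = best_epoch(losses)
    print(f"{name}: best validation loss {loss:.2f} at epoch {epoch}")
```

Comparing the minimum rather than the final value matters when a deeper model starts to overfit: its validation loss can rise again after reaching a lower point than the shallower model ever does.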

#### Building your own digit recognition model

- Create a Sequential object to start your model. Call this model.
- Add the first Dense hidden layer of 50 units to your model with 'relu' activation. For this data, the input_shape is (784,).
- Add a second Dense hidden layer with 50 units and a 'relu' activation function.
- Add the output layer. Your activation function should be 'softmax', and the number of nodes in this layer should be the same as the number of possible outputs in this case: 10.
- Compile model as you have done with previous models, using 'adam' as the optimizer, 'categorical_crossentropy' for the loss, and metrics=['accuracy'].
- Fit the model on X and y with a validation_split of 0.3 and 10 epochs.

In [None]:
# Create the model: model
model = Sequential()

# Add the first hidden layer
model.add(Dense(50, activation="relu", input_shape=(784,)))

# Add the second hidden layer
model.add(Dense(50, activation="relu"))

# Add the output layer
model.add(Dense(10, activation="softmax"))

# Compile the model
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Fit the model
model.fit(X, y, epochs=10, validation_split=0.3)
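
The 'categorical_crossentropy' loss expects each target to be a one-hot vector with one entry per class, which is why the output layer has 10 units. The sketch below shows the encoding and the loss for a single example in plain Python; the helper names and the probability values are illustrative assumptions, not part of the exercise data.

```python
import math


def one_hot(label, num_classes=10):
    """Encode an integer class label as a one-hot list."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def categorical_crossentropy(target, predicted):
    """Cross-entropy between a one-hot target and predicted class probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


target = one_hot(3)  # the true digit is 3

# Two made-up softmax outputs: a uniform guess and a confident, correct one
uniform = [0.1] * 10
confident = [0.01] * 10
confident[3] = 0.91

uniform_loss = categorical_crossentropy(target, uniform)
confident_loss = categorical_crossentropy(target, confident)

# The predicted digit is the index with the highest probability
predicted_digit = max(range(10), key=lambda i: confident[i])

print("loss for uniform guess:", uniform_loss)
print("loss for confident guess:", confident_loss)
print("predicted digit:", predicted_digit)
```

A uniform guess over 10 classes gives a loss of log(10), which is a useful baseline: a model whose validation loss stays near that value has not learned to separate the digits.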