# Neural Net Analysis Notebook
## W207 Final Project
### T. P. Goter
### July 6, 2019

This workbook is used to assess various models created as part of the Facial Keypoint Detection project for W207.

In [14]:
# Import the packages we need
import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline
import numpy as np
import ipywidgets as widgets
from ipywidgets import interact, interact_manual

In [9]:
# Load the pickled dataframe for the baseline single-layer neural net
bl_sl_df = pd.read_pickle("OutputData/single_layer_df.pkl")
bl_sl_df.sample(10)

Unnamed: 0,loss,mean_squared_error,val_loss,val_mean_squared_error,epoch,RMSE,val_RMSE,times,hunits,activation,optimizer,lrate
396,77.049095,77.049088,111.200488,111.200485,396,8.777761,10.545164,0.474865,150,relu,adam,0.001
76,2525.411537,2525.411377,2520.734831,2520.734863,76,50.253471,50.20692,0.320276,100,tanh,sgd,0.01
305,2525.412289,2525.412354,2520.73562,2520.735596,305,50.253481,50.206928,0.527155,100,tanh,adagrad,0.01
244,2525.41177,2525.411377,2520.735104,2520.735107,244,50.253471,50.206923,0.319467,100,sigmoid,sgd,0.01
306,2427.31786,2427.318359,2424.645512,2424.645752,306,49.267823,49.240692,0.448901,200,relu,sgd,0.01
261,174.564472,174.564453,198.644428,198.64444,261,13.212284,14.094128,0.673629,150,relu,adagrad,0.01
100,2525.412145,2525.411865,2520.735503,2520.735596,100,50.253476,50.206928,0.316765,100,sigmoid,sgd,0.01
238,2619.589233,2619.589111,2614.870316,2614.870605,238,51.181922,51.135806,0.294078,100,relu,sgd,0.01
186,2525.411534,2525.411865,2520.734893,2520.735107,186,50.253476,50.206923,0.355427,100,tanh,sgd,0.01
101,89.210516,89.210503,117.918552,117.918556,101,9.445131,10.859031,0.446759,150,relu,adam,0.001
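
The `RMSE` and `val_RMSE` columns in the sample above appear to be the square roots of the corresponding mean squared error columns. That relationship is an assumption here, since the code that built the dataframe is not shown. A minimal check, using values copied from the sampled row with index 396:

```python
import numpy as np

# Values copied from the sampled row with index 396 above
mse, val_mse = 77.049088, 111.200485
reported_rmse, reported_val_rmse = 8.777761, 10.545164

# Assumption: the RMSE columns are the square roots of the MSE columns
rmse = np.sqrt(mse)
val_rmse = np.sqrt(val_mse)

# Compare against the reported values, allowing for rounding in the display
print(np.isclose(rmse, reported_rmse, atol=1e-5))
print(np.isclose(val_rmse, reported_val_rmse, atol=1e-5))
```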


In [28]:
# Create a plotting function to pass to the interact widget function
def plot_validation_loss(optimizer = bl_sl_df.optimizer.unique(), 
                    activation = bl_sl_df.activation.unique()):
    
    # Subset the baseline df by the specified optimizer and activation
    # (exact equality avoids regex prefix matches, e.g. 'adam' also matching 'adamax')
    sub_df = bl_sl_df[(bl_sl_df.optimizer == optimizer) &
                      (bl_sl_df.activation == activation)]

    # Group the neural net data by number of hidden units
    groups = sub_df.groupby('hunits')
    fig, axes = plt.subplots(1, 2, figsize=(15, 10))

    # Loop over the grouped data and plot validation loss and epoch timing data
    for name, group in groups:
        axes[0].scatter(group.epoch, group.val_RMSE, label=str(name)+' Validation Loss')
        # axes[0].scatter(group.epoch, group.RMSE, label=str(name)+' Training Loss')
        axes[1].scatter(group.epoch, group.times*1000, label=str(name)+' Fit Time')

    # Label and scale the axes once, after all groups have been plotted
    axes[0].set_xlabel('Epoch')
    axes[0].set_ylabel('Root Mean Square Error')
    axes[1].set_xlabel('Epoch')
    axes[1].set_ylabel('Fit Time (milliseconds)')
    axes[1].set_ylim([0, 1000])
    axes[0].set_title("{} Optimizer and {} Activation".format(optimizer, activation))
    if not sub_df.empty:
        # Skip these when nothing matched, since max() would be NaN and the legend empty
        axes[0].set_ylim([0, sub_df.val_RMSE.max()])
        axes[0].legend()
        axes[1].legend()
    
    # Adjust the spacing of the subplots
    fig.subplots_adjust(left=0.03, right=0.97, hspace=0.1, wspace=0.15)

    # Add an overarching title for these plots
    fig.suptitle("Performance Comparison for Single Layer, Fully Connected Neural Nets",
                 fontsize=18, y=0.93)

#     # Print out the table of data for viewing
#     print(sub_df)
interact_manual(plot_validation_loss)
print()


### Assessment of Baseline Results
1. The Adam and Adagrad optimizers both work well.
2. Adam is faster and works well with 200 hidden units.
3. Adagrad is slower but works best with 100 hidden units.
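
One way to back an assessment like this with numbers is to take the lowest validation RMSE for each optimizer, activation, and hidden-unit combination. The sketch below uses only four rows copied from the `bl_sl_df.sample(10)` output shown earlier, so it illustrates the mechanics and is not a summary of the full results. The names `sample_df` and `best` are illustrative; with the real data, `bl_sl_df` would take the place of `sample_df`.

```python
import pandas as pd

# Four rows copied from the sampled output above (illustrative subset only)
sample_df = pd.DataFrame({
    "epoch": [396, 101, 261, 306],
    "val_RMSE": [10.545164, 10.859031, 14.094128, 49.240692],
    "hunits": [150, 150, 150, 200],
    "activation": ["relu", "relu", "relu", "relu"],
    "optimizer": ["adam", "adam", "adagrad", "sgd"],
})

# Lowest validation RMSE for each configuration, best configuration first
best = (sample_df
        .groupby(["optimizer", "activation", "hunits"])["val_RMSE"]
        .min()
        .sort_values())
print(best)
```

Sorting the per-configuration minima puts the strongest combinations at the top, which makes it easier to compare optimizers without reading values off the scatter plots.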

In the evaluation above, both the hidden layer and the output layer used the activation function specified by the user. For the study below, the output layer's activation function was set to softmax, which generalizes the sigmoid function to multiple classes. The plots below help assess whether the choice of output-layer activation function significantly alters our perception of which activation functions and optimizers work well for our neural network.
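
To make the relationship between softmax and sigmoid concrete, the sketch below implements both with NumPy. The function names are illustrative and are not taken from the project code. It checks two properties: for the two-element input `[z, 0]`, the first softmax output equals `sigmoid(z)`, and softmax outputs always sum to one.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    shifted = z - np.max(z)   # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()

def sigmoid(z):
    """Logistic sigmoid of a scalar or array."""
    return 1.0 / (1.0 + np.exp(-z))

# Two-class softmax over [z, 0] reduces to sigmoid(z)
z = 1.5
two_class = softmax(np.array([z, 0.0]))
print(np.isclose(two_class[0], sigmoid(z)))

# Softmax outputs are all in (0, 1) and sum to one
scores = np.array([2.0, -1.0, 0.5, 0.0])
probs = softmax(scores)
print(np.isclose(probs.sum(), 1.0))
```

Because each softmax output lies between 0 and 1 and the outputs are coupled through the shared normalization, an output layer using softmax behaves differently from one that applies an activation to each unit independently.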