## Multilayer Perceptron: Fit and evaluate a model

In this section, we will fit and evaluate a simple Multilayer Perceptron model.

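A multilayer perceptron (MLP) is a classic feed-forward artificial neural network and a core component of deep learning. Put another way, it is a connected series of nodes arranged in layers, where each node represents a function or a model that transforms its inputs and passes the result to the next layer.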
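To make the "connected series of nodes" idea concrete, here is a minimal NumPy sketch of a forward pass through one hidden layer. The layer sizes, random weights, and the helper names `relu` and `mlp_forward` are assumptions chosen for illustration; this is not how scikit-learn implements it internally.

```python
import numpy as np

def relu(z):
    # Element-wise nonlinearity applied at each hidden node
    return np.maximum(0.0, z)

def mlp_forward(x, weights, biases):
    """Forward pass: ReLU on hidden layers, identity on the output layer."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)
    return a @ weights[-1] + biases[-1]

rng = np.random.default_rng(0)
# Hypothetical network: 4 input features, 3 hidden nodes, 1 output
weights = [rng.normal(size=(4, 3)), rng.normal(size=(3, 1))]
biases = [np.zeros(3), np.zeros(1)]

x = rng.normal(size=(5, 4))  # 5 samples, 4 features each
out = mlp_forward(x, weights, biases)
print(out.shape)
```

Each hidden node computes a weighted sum of its inputs and applies the activation; training consists of adjusting `weights` and `biases` to reduce a loss.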

When to use it?
- the target variable is categorical or continuous
- the relationships are very complex, or predictive performance is the only thing that matters
- control over the training process is very important (there are a lot of parameters to tune)

When not to use it?
- image recognition (use a CNN instead), time series, etc.
- transparency is important, or you are interested in the significance of predictors
- need a quick benchmark model
- limited data available

In [5]:
from sklearn.neural_network import MLPRegressor, MLPClassifier
print(MLPRegressor().get_params())
print(MLPClassifier().get_params())

{'activation': 'relu', 'alpha': 0.0001, 'batch_size': 'auto', 'beta_1': 0.9, 'beta_2': 0.999, 'early_stopping': False, 'epsilon': 1e-08, 'hidden_layer_sizes': (100,), 'learning_rate': 'constant', 'learning_rate_init': 0.001, 'max_fun': 15000, 'max_iter': 200, 'momentum': 0.9, 'n_iter_no_change': 10, 'nesterovs_momentum': True, 'power_t': 0.5, 'random_state': None, 'shuffle': True, 'solver': 'adam', 'tol': 0.0001, 'validation_fraction': 0.1, 'verbose': False, 'warm_start': False}
{'activation': 'relu', 'alpha': 0.0001, 'batch_size': 'auto', 'beta_1': 0.9, 'beta_2': 0.999, 'early_stopping': False, 'epsilon': 1e-08, 'hidden_layer_sizes': (100,), 'learning_rate': 'constant', 'learning_rate_init': 0.001, 'max_fun': 15000, 'max_iter': 200, 'momentum': 0.9, 'n_iter_no_change': 10, 'nesterovs_momentum': True, 'power_t': 0.5, 'random_state': None, 'shuffle': True, 'solver': 'adam', 'tol': 0.0001, 'validation_fraction': 0.1, 'verbose': False, 'warm_start': False}


`hidden_layer_sizes` determines how many hidden layers there are and how many nodes each layer contains. It controls the complexity of the relationships the model can capture: larger networks can fit more complex patterns, at the cost of a higher risk of overfitting and longer training time.
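As a quick check of how `hidden_layer_sizes` maps to network shape, the sketch below fits a few small models and prints the shape of each weight matrix in `coefs_`. The synthetic dataset from `make_classification`, the `max_iter=50` cap, and the extra `(10, 10)` setting are assumptions for illustration and are not part of the grid used later.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Short training runs are enough here; silence the resulting convergence warnings
warnings.filterwarnings('ignore', category=ConvergenceWarning)

# Synthetic binary classification data with 8 features (illustrative only)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

layer_shapes = {}
for sizes in [(10,), (50,), (10, 10)]:
    model = MLPClassifier(hidden_layer_sizes=sizes, max_iter=50, random_state=0)
    model.fit(X, y)
    # One weight matrix per pair of adjacent layers: (nodes in, nodes out)
    layer_shapes[sizes] = [w.shape for w in model.coefs_]
    print(sizes, '->', layer_shapes[sizes])
```

The length of the tuple is the number of hidden layers and each entry is the node count of that layer, so `(10, 10)` means two hidden layers of 10 nodes each.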

The activation function (`activation`) dictates the type of nonlinearity that is introduced to the model.
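For reference, the three activations tried in the grid search can be written in a few lines of NumPy. This is only a sketch of the mathematical definitions, evaluated on a small hypothetical input vector `z`.

```python
import numpy as np

z = np.array([-2.0, 0.0, 2.0])  # hypothetical pre-activation values

relu = np.maximum(0.0, z)            # 'relu': zero for negatives, identity otherwise
tanh = np.tanh(z)                    # 'tanh': squashes values into (-1, 1)
logistic = 1.0 / (1.0 + np.exp(-z))  # 'logistic': squashes values into (0, 1)

print('relu    ', relu)
print('tanh    ', tanh)
print('logistic', logistic)
```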

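The learning rate affects both how quickly the algorithm converges and whether it finds the optimal solution at all. In scikit-learn, the `learning_rate` parameter selects the schedule (`constant`, `invscaling`, or `adaptive`) and only takes effect when `solver='sgd'`, while `learning_rate_init` sets the initial step size.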
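As a rough illustration of the `invscaling` schedule, scikit-learn documents the effective rate at time step `t` as `learning_rate_init / pow(t, power_t)`. The sketch below evaluates that formula with the default `learning_rate_init` and `power_t` values printed above; the helper name `invscaling_rate` and the chosen time steps are illustrative only.

```python
learning_rate_init = 0.001  # scikit-learn default
power_t = 0.5               # scikit-learn default

def invscaling_rate(t):
    # Effective learning rate shrinks as the time step t grows
    return learning_rate_init / (t ** power_t)

steps = (1, 4, 100)
rates = [invscaling_rate(t) for t in steps]
for t, rate in zip(steps, rates):
    print('t={:>3}: {:.6f}'.format(t, rate))
```

With `power_t = 0.5` the rate falls with the square root of the time step, so it drops quickly at first and more slowly later.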

### Read in Data

In [1]:
import joblib
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
import warnings
warnings.filterwarnings('ignore', category=FutureWarning)
warnings.filterwarnings('ignore', category=DeprecationWarning)

In [2]:
tr_features = pd.read_csv('data/train_features.csv')
tr_labels = pd.read_csv('data/train_labels.csv')

### Hyperparameter tuning

![hidden layer](image/hidden_layers.png)

In [6]:
def print_results(results):
    """Print the best parameters, then the mean test score (+/- 2 std) for each combination."""
    print('BEST PARAMS: {}\n'.format(results.best_params_))

    means = results.cv_results_['mean_test_score']
    stds = results.cv_results_['std_test_score']
    for mean, std, params in zip(means, stds, results.cv_results_['params']):
        # Report two standard deviations across the CV folds as the spread
        print('{} (+/-{}) for {}'.format(round(mean, 3), round(std * 2, 3), params))

In [7]:
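mlp = MLPClassifier()
parameters = {
    # Each tuple lists the node count per hidden layer, so (10,) is one hidden layer of 10 nodes
    'hidden_layer_sizes': [(10,), (50,), (100,)],
    'activation': ['relu', 'tanh', 'logistic'],
    # 'invscaling' gradually decreases the learning rate; 'adaptive' keeps it constant
    # until the training loss stops decreasing. This parameter is only used when
    # solver='sgd', so it has no effect with the default solver='adam'.
    'learning_rate': ['constant', 'invscaling', 'adaptive']
}
# 5-fold cross-validation over every combination in the grid
cv = GridSearchCV(mlp, parameters, cv=5)
cv.fit(tr_features, tr_labels.values.ravel())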

print_results(cv)

BEST PARAMS: {'activation': 'relu', 'hidden_layer_sizes': (100,), 'learning_rate': 'adaptive'}

0.717 (+/-0.08) for {'activation': 'relu', 'hidden_layer_sizes': (10,), 'learning_rate': 'constant'}
0.729 (+/-0.108) for {'activation': 'relu', 'hidden_layer_sizes': (10,), 'learning_rate': 'invscaling'}
0.717 (+/-0.072) for {'activation': 'relu', 'hidden_layer_sizes': (10,), 'learning_rate': 'adaptive'}
0.796 (+/-0.127) for {'activation': 'relu', 'hidden_layer_sizes': (50,), 'learning_rate': 'constant'}
0.783 (+/-0.109) for {'activation': 'relu', 'hidden_layer_sizes': (50,), 'learning_rate': 'invscaling'}
0.772 (+/-0.152) for {'activation': 'relu', 'hidden_layer_sizes': (50,), 'learning_rate': 'adaptive'}
0.792 (+/-0.103) for {'activation': 'relu', 'hidden_layer_sizes': (100,), 'learning_rate': 'constant'}
0.789 (+/-0.118) for {'activation': 'relu', 'hidden_layer_sizes': (100,), 'learning_rate': 'invscaling'}
0.803 (+/-0.116) for {'activation': 'relu', 'hidden_layer_sizes': (100,), 'learni



In [8]:
cv.best_estimator_

MLPClassifier(learning_rate='adaptive')
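Scanning printed lines gets tedious as the grid grows. One option, sketched below, is to load `cv_results_` into a pandas DataFrame and sort by `rank_test_score`. The synthetic data, the reduced grid, and the variable names `search` and `ranked` are assumptions so that the example runs on its own; in this notebook the same idea would apply to the fitted `cv` object.

```python
import warnings

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Short training runs are enough here; silence the resulting convergence warnings
warnings.filterwarnings('ignore', category=ConvergenceWarning)

# Synthetic stand-in for the training data (illustrative only)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

grid = {'hidden_layer_sizes': [(10,), (50,)], 'activation': ['relu', 'tanh']}
search = GridSearchCV(MLPClassifier(max_iter=100, random_state=0), grid, cv=3)
search.fit(X, y)

# One row per parameter combination, best-ranked first
results = pd.DataFrame(search.cv_results_)
columns = ['params', 'mean_test_score', 'std_test_score', 'rank_test_score']
ranked = results[columns].sort_values('rank_test_score')
print(ranked.to_string(index=False))
```

Looking at `std_test_score` next to the mean helps judge whether the gap between the top candidates is larger than the fold-to-fold variation.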

### Write out pickled model

In [9]:
joblib.dump(cv.best_estimator_, 'model/MLP_model.pkl')

['MLP_model.pkl']
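A saved model can later be restored with `joblib.load`. The sketch below shows a round trip on a small synthetic dataset and a temporary directory, both of which are assumptions made so the example is self-contained; in this notebook the path would be the one passed to `joblib.dump` above.

```python
import os
import tempfile
import warnings

import joblib
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Short training runs are enough here; silence the resulting convergence warnings
warnings.filterwarnings('ignore', category=ConvergenceWarning)

# Synthetic stand-in for the training data (illustrative only)
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
model = MLPClassifier(max_iter=50, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'MLP_model.pkl')
    joblib.dump(model, path)       # write the fitted model to disk
    restored = joblib.load(path)   # read it back as a new object

# The restored model should reproduce the original predictions
same_predictions = bool((restored.predict(X) == model.predict(X)).all())
print(same_predictions)
```

Pickled models are generally tied to the library version that produced them, so it is safest to load them with the same scikit-learn version used for training.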