# Fit feedforward Neural Network model with Dask
This notebook takes the "Fit feedforward Neural Network model" notebook and parallelizes its processes using Dask.  
It skips explanations of code unrelated to Dask; refer to the "Fit feedforward Neural Network model" notebook for those details.

First, initialize the Dask client, which starts a local scheduler and workers:

In [1]:
from dask.distributed import Client
client = Client()
client

| Client | Cluster |
|---|---|
| Scheduler: tcp://127.0.0.1:58921 | Workers: 4 |
| Dashboard: http://127.0.0.1:8787/status | Cores: 4 |
| | Memory: 8.30 GB |


Non-Dask imports for the notebook:

In [2]:
import pandas as pd 
import matplotlib.pyplot as plt
import time
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV

from besos import eppy_funcs as ef
import besos.sampling as sampling
from besos.problem import EPProblem
from besos.evaluator import EvaluatorEP
from besos.evaluator import EvaluatorSR
from besos.parameters import wwr, RangeParameter, CategoryParameter, FieldSelector, FilterSelector, GenericSelector, Parameter, expand_plist

from parameter_sets import parameter_set

The evaluator can be parallelized by passing in `multi=True`:

In [3]:
parameters = parameter_set(7)
problem = EPProblem(parameters, ['Electricity:Facility'])
building = ef.get_building()
evaluator = EvaluatorEP(problem, building, multi=True)

Attempting to fix automatically
  f"Duplicate names found. (duplicate, repetitions): "


When `df_apply` is called, the DataFrame is processed concurrently. The `processes` parameter sets the number of partitions the DataFrame is divided into.  
If you are running this notebook locally, you can open the Dask dashboard using the link shown by the `client` object (see the first cell, where we initialized `Client`). The dashboard shows which tasks are currently running.
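BESOS's internals for `df_apply` are not shown in this notebook, so as a rough mental model only, the sketch below uses Python's standard-library `concurrent.futures` (not Dask) to split rows into contiguous partitions, process each partition on a worker, and reassemble the results in input order. `split_rows` and `apply_in_partitions` are hypothetical helpers; threads keep the sketch short, whereas real EnergyPlus runs would more likely go to separate processes or Dask workers.

```python
from concurrent.futures import ThreadPoolExecutor
import math


def split_rows(rows, n_parts):
    """Split rows into at most n_parts contiguous chunks, keeping their order."""
    size = max(1, math.ceil(len(rows) / n_parts))
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def apply_in_partitions(func, rows, processes=4):
    """Apply func to every row, one worker per chunk, returning results in input order."""
    chunks = split_rows(rows, processes)
    with ThreadPoolExecutor(max_workers=processes) as pool:
        # pool.map yields chunk results in submission order, not completion order
        chunk_results = pool.map(lambda chunk: [func(row) for row in chunk], chunks)
        return [result for chunk in chunk_results for result in chunk]


# Hypothetical stand-in for one simulation per row: sum the row's parameter values
rows = [(1, 2), (3, 4), (5, 6)]
results = apply_in_partitions(sum, rows, processes=4)
print(results)  # [3, 7, 11]
```

Because `pool.map` preserves submission order, the output rows line up with the input rows even if a later partition finishes first.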

In [4]:
%%time
inputs = sampling.dist_sampler(sampling.lhs, problem, 50)
outputs = evaluator.df_apply(inputs, processes=4)
inputs

Wall time: 1min 23s


| | Wall conductivity | Attic thickness | U-Factor | Solar Heat Gain Coefficient | Watts per Zone Floor Area_0 | Watts per Zone Floor Area_1 | Window to Wall Ratio |
|---|---|---|---|---|---|---|---|
| 0 | 0.051132 | 0.266778 | 2.809561 | 0.683247 | 14.828226 | 11.360826 | 0.301161 |
| 1 | 0.035927 | 0.292057 | 2.253828 | 0.818023 | 14.270736 | 10.817628 | 0.544483 |
| 2 | 0.094331 | 0.116558 | 0.584823 | 0.560659 | 11.45804 | 10.611235 | 0.767403 |
| 3 | 0.193546 | 0.270409 | 1.926207 | 0.022743 | 12.939408 | 14.44563 | 0.845932 |
| 4 | 0.026092 | 0.182738 | 3.817963 | 0.128653 | 11.62891 | 10.778318 | 0.645282 |
| 5 | 0.033387 | 0.204 | 0.707997 | 0.190904 | 12.329613 | 12.932895 | 0.622546 |
| 6 | 0.062989 | 0.25021 | 3.264538 | 0.544098 | 10.722829 | 14.120385 | 0.665856 |
| 7 | 0.116989 | 0.185351 | 1.723034 | 0.765261 | 10.48975 | 13.097458 | 0.177529 |
| 8 | 0.183523 | 0.1455 | 4.103736 | 0.850848 | 14.507152 | 14.336161 | 0.790558 |
| 9 | 0.070074 | 0.21524 | 3.983072 | 0.798461 | 11.106142 | 14.219954 | 0.119119 |


## Set up model parameters
In this cell, we set up the model. More detail can be found in the "Fit feedforward Neural Network model" notebook.

In [5]:
train_in, test_in, train_out, test_out = train_test_split(inputs, outputs, test_size=0.2)

scaler = StandardScaler()
inputs = scaler.fit_transform(X=train_in)

scaler_out = StandardScaler()
outputs = scaler_out.fit_transform(X=train_out)

hyperparameters = {'hidden_layer_sizes':((len(parameters)*16,),(len(parameters)*16, len(parameters)*16)), 
              'alpha':[1, 10, 10**3]}

neural_net = MLPRegressor(max_iter=1000, early_stopping=False)
folds = 3
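`StandardScaler` rescales each column to z = (x - μ) / σ, where μ and σ (the population standard deviation) are estimated when `fit_transform` is called on the training split. The test split is later scaled with those same training statistics via `scaler.transform`, and predictions are mapped back with `inverse_transform`. A minimal pure-Python sketch of that round trip for a single column, using made-up data and hypothetical helper names:

```python
import statistics


def fit_standardizer(values):
    """Return the mean and population standard deviation (ddof=0) of one column."""
    return statistics.fmean(values), statistics.pstdev(values)


def transform(values, mean, std):
    """Map each value to z = (x - mean) / std."""
    return [(v - mean) / std for v in values]


def inverse_transform(scaled, mean, std):
    """Map each z back to x = z * std + mean."""
    return [z * std + mean for z in scaled]


# Hypothetical single-column training and test data
train = [2.0, 4.0, 6.0]
test = [3.0, 5.0]

mean, std = fit_standardizer(train)  # statistics come from the training split only
train_z = transform(train, mean, std)
test_z = transform(test, mean, std)  # the test split reuses the training statistics
print(train_z, test_z)
```

Fitting the scaler on the training split only keeps information from the test split out of the model, so the test score remains an honest estimate.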

## Model fitting with Dask

Here, we use the neural network model from scikit-learn.  
In a [different example](FitNNTF.ipynb) we use TensorFlow (with and without the Keras wrapper).

Below we parallelize the model fit.  
Normally, scikit-learn uses joblib to parallelize model fitting. By setting joblib's parallel backend to Dask, joblib dispatches its work to the Dask scheduler instead.  
For this example, using Dask may not be any faster, because joblib can already parallelize across the cores of a single machine.  
This approach is more useful when Dask runs on a distributed cluster with access to more cores than a single machine provides.
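`GridSearchCV` fits one model per hyperparameter combination per cross-validation fold, and these fits are independent of one another, which is what joblib (and therefore Dask) can run in parallel. The sketch below only enumerates those tasks with the standard library; it trains nothing. With the grid above (7 input parameters, so 7 × 16 = 112 neurons per hidden layer) there are 2 × 3 = 6 candidates and 6 × 3 = 18 cross-validation fits; with scikit-learn's default `refit=True`, one more fit retrains the best candidate on the whole training set.

```python
from itertools import product

n_params = 7  # the seven input columns sampled in this notebook
hyperparameters = {
    "hidden_layer_sizes": [(n_params * 16,), (n_params * 16, n_params * 16)],
    "alpha": [1, 10, 10**3],
}
folds = 3

# One candidate per combination of hyperparameter values
keys = sorted(hyperparameters)
candidates = [dict(zip(keys, combo)) for combo in product(*(hyperparameters[k] for k in keys))]

# Every (candidate, fold) pair is an independent fit that could run on its own worker
tasks = [(candidate, fold) for candidate in candidates for fold in range(folds)]
print(len(candidates), len(tasks))  # 6 18
```

With only 18 short fits on a small dataset, scheduling overhead can outweigh the gain, which is consistent with Dask not being noticeably faster here.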

In [6]:
%%time
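import joblib
# Route joblib's work (used internally by GridSearchCV) to the Dask client created above
with joblib.parallel_backend('dask'):
    # The `iid` argument was removed in scikit-learn 0.24, so it is omitted here
    clf = GridSearchCV(neural_net, hyperparameters, cv=folds)
    clf.fit(inputs, outputs.ravel())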

print(f'Best mean cross-validated $R^2$ score on the training set: {clf.best_score_}')
print(f'Best hyperparameters: {clf.best_params_}')
print(f'Best model $R^2$ score on a separate test set: {clf.best_estimator_.score(scaler.transform(test_in), scaler_out.transform(test_out))}')

Best mean cross-validated $R^2$ score on the training set: 0.9550875295895773
Best hyperparameters: {'alpha': 1, 'hidden_layer_sizes': (112,)}
Best model $R^2$ score on a separate test set: 0.9941036624393594
Wall time: 8.71 s


## Surrogate Modelling Evaluator object
We can wrap the fitted model in a BESOS `Evaluator`.  
It can be used in the same way as the original EnergyPlus Evaluator object.

To parallelize the surrogate model evaluator, we again pass in `multi=True`.
The parallelization happens when `df_apply` is called.

In [7]:
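def evaluation_func(ind, scaler=scaler):
    # Scale the design parameters the same way as the training inputs
    ind = scaler.transform(X=[ind])
    # Predict in scaled output space; reshape to 2D because StandardScaler expects 2D input
    prediction = clf.predict(ind).reshape(-1, 1)
    # Undo the output scaling; return a 1-tuple of objectives plus an empty tuple
    return ((scaler_out.inverse_transform(prediction)[0, 0],), ())

NN_SM = EvaluatorSR(evaluation_func, problem, multi=True)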
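To show the data flow inside `evaluation_func` without scikit-learn, the sketch below replaces `scaler`, `scaler_out` and `clf` with hypothetical hand-written constants and a linear stand-in model. The steps mirror the function above: scale the inputs, predict in scaled space, undo the output scaling, and return the value in the same nested-tuple shape (a tuple of objective values followed by an empty tuple, which we assume is where constraint values would go; this problem defines none).

```python
from typing import Sequence, Tuple

# Hypothetical per-column statistics standing in for the fitted `scaler` and `scaler_out`
IN_MEAN, IN_STD = [0.1, 0.2], [0.05, 0.1]
OUT_MEAN, OUT_STD = 2.0e9, 1.0e8
# Hypothetical linear model in scaled space, standing in for `clf`
WEIGHTS = [0.5, -0.25]


def evaluation_func(ind: Sequence[float]) -> Tuple[Tuple[float, ...], Tuple[float, ...]]:
    # 1. Scale the raw design parameters the same way as the training inputs
    z_in = [(x - m) / s for x, m, s in zip(ind, IN_MEAN, IN_STD)]
    # 2. Predict in scaled output space
    z_out = sum(w * z for w, z in zip(WEIGHTS, z_in))
    # 3. Undo the output scaling to return to physical units
    y = z_out * OUT_STD + OUT_MEAN
    # Objectives tuple, then an empty tuple (assumed: no constraints)
    return ((y,), ())


print(evaluation_func([0.1, 0.2]))  # ((2000000000.0,), ())
```

Because each call handles one row independently, `df_apply` is free to evaluate different partitions on different workers.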

## Running a large surrogate evaluation
Here we increase the sample count to 50,000 and split the data into 4 partitions. If you have more cores available, try increasing `processes`.
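The notebook does not show exactly how `df_apply` divides the rows, so as an assumption, suppose it uses near-equal contiguous partitions (the same sizing scheme as NumPy's `array_split`). Then 50,000 samples over 4 processes gives 12,500 surrogate evaluations per partition, and a count that does not divide evenly gives the first few partitions one extra row. `partition_sizes` is a hypothetical helper for this arithmetic:

```python
def partition_sizes(n_rows: int, n_parts: int) -> list:
    """Near-equal partition sizes; the first `extra` partitions get one more row."""
    base, extra = divmod(n_rows, n_parts)
    return [base + 1 if i < extra else base for i in range(n_parts)]


print(partition_sizes(50_000, 4))  # [12500, 12500, 12500, 12500]
print(partition_sizes(50_000, 6))  # [8334, 8334, 8333, 8333, 8333, 8333]
```

Keeping partitions within one row of each other means no single worker is left with a noticeably larger share of the work.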

In [8]:
%%time
inputs = sampling.dist_sampler(sampling.lhs, problem, 50000)
outputs = NN_SM.df_apply(inputs, processes=4)
results = inputs.join(outputs)
results.head()

Wall time: 24.8 s


| | Wall conductivity | Attic thickness | U-Factor | Solar Heat Gain Coefficient | Watts per Zone Floor Area_0 | Watts per Zone Floor Area_1 | Window to Wall Ratio | Electricity:Facility |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.098669 | 0.23958 | 2.084795 | 0.54236 | 13.784494 | 12.896825 | 0.936752 | 2148340000.0 |
| 1 | 0.093888 | 0.280632 | 2.244941 | 0.43161 | 11.399425 | 10.238392 | 0.527872 | 1839201000.0 |
| 2 | 0.036105 | 0.254011 | 1.42762 | 0.510177 | 12.025844 | 13.875442 | 0.768495 | 2076042000.0 |
| 3 | 0.07453 | 0.211882 | 4.729027 | 0.234969 | 13.72065 | 14.188977 | 0.355187 | 2191315000.0 |
| 4 | 0.105957 | 0.128902 | 0.380137 | 0.563954 | 11.695128 | 13.097326 | 0.805476 | 2027911000.0 |
