# Keras + TensorFlow and Hyperopt Python tutorial
Made by Ties van der Heijden, TU Delft

In this exercise we will continue with Dutch day-ahead market (DAM) price forecasting. This time we will give a detailed specification of our neural network, and we will optimize its hyperparameters using Hyperopt.

To do this, the following packages are necessary:
- NumPy
- pandas
- Matplotlib
- TensorFlow
- Hyperopt

And some specific functions are handy:
- scikit-learn: KFold, StandardScaler
- pathlib: Path


PS: Be sure to create a new environment for TensorFlow + Keras + Hyperopt. Mixing pip and Anaconda installs in one environment can lead to dependency conflicts later on, so it is better to have any conflicts occur in a fresh Python environment than in one you rely on.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import tensorflow
from tensorflow import keras
import tensorflow.keras.backend as K

from hyperopt import hp, fmin, tpe, STATUS_OK, Trials, space_eval

from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from pathlib import Path

import pickle

from tensorflow.keras.layers import LeakyReLU

## Keras model creation - MLP

Start the function with the following command:
```python
tensorflow.keras.backend.clear_session()
```
Otherwise Keras keeps every trained model in RAM, which causes the memory to slowly fill up during the search.<br>

First we will build a function that returns a keras MLP as a function of its hyperparameters. Use the following hyperparameters:
- Hidden nodes in layer 1
- Hidden nodes in layer 2
- Activation function of the hidden layers
- Loss function (see keras.losses)
- Dropout rate
- Weight initialization (see keras.initializers)

We will fix some things in the model:
- Use the SGD algorithm to train the model. The optimizer's parameters can be included as variables for the function (learning rate, momentum and nesterov), see keras.optimizers.SGD.
- For kernel regularization we will use an L2 regularizer with a 1e-4 penalty term (see keras.regularizers). This shrinks the weights toward zero, which helps against overfitting; unlike an L1 penalty, it does not make the solution sparse.

Build the following Keras sequential model<br>
Layer 1: Hidden layer 1 (see keras.layers.Dense)<br>
Layer 2: Dropout layer (see keras.layers.Dropout)<br>
Layer 3: Hidden layer 2<br>
Layer 4: Output layer - think about the activation function to be used in the output layer.<br>
**Compile the model with the specified loss function and optimizer, and return it!**<br>

Make sure that the function returns the model, so that the following code would work:<br>
`model = model_build_function(params)`<br>
`fit = model.fit(x = ..., y = ..., batch_size = ..., epochs = ...)`<br>

<ins>Handy link:</ins><br>
https://keras.io/api/
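
As a rough illustration of the structure described above, the sketch below builds and compiles such a model without training it. The function name `build_mlp`, the `n_features` and `n_outputs` arguments, the parameter keys and the example values are assumptions made for this example, not a prescribed solution. It uses a linear output layer, which is one reasonable answer to the output-activation question for a regression target.

```python
import numpy as np
from tensorflow import keras


def build_mlp(params, n_features, n_outputs=24):
    """Return a compiled two-hidden-layer MLP (hypothetical helper)."""
    # Drop models left over from earlier calls so memory does not fill up
    keras.backend.clear_session()

    def hidden(units):
        # Every hidden layer shares the same activation, initializer and L2 penalty
        return keras.layers.Dense(
            int(units),
            activation=params["activation"],
            kernel_initializer=params["weight_init"],
            kernel_regularizer=keras.regularizers.l2(1e-4),
        )

    model = keras.Sequential([
        keras.layers.Input(shape=(n_features,)),
        hidden(params["units1"]),
        keras.layers.Dropout(params["dropout"]),
        hidden(params["units2"]),
        # Linear output: a regression target should not be squashed or clipped
        keras.layers.Dense(n_outputs, activation="linear"),
    ])

    optimizer = keras.optimizers.SGD(
        learning_rate=params["learning_rate"],
        momentum=params["momentum"],
        nesterov=params["nesterov"],
    )
    model.compile(loss=params["loss"], optimizer=optimizer)
    return model


# Illustrative values only
example_params = {
    "units1": 150, "units2": 50, "activation": "relu", "dropout": 0.2,
    "weight_init": "random_normal", "loss": "mae",
    "learning_rate": 0.01, "momentum": 0.1, "nesterov": True,
}
model = build_mlp(example_params, n_features=10)
predictions = model.predict(np.zeros((4, 10)), verbose=0)
print(predictions.shape)
```

Keeping model construction separate from training makes it straightforward to call `model.fit` on a different data split each time.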

In [2]:
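def root_mean_squared_error(y_true, y_pred):
    # Keras has no built-in 'rmse' loss string, so define the loss explicitly
    return tensorflow.sqrt(tensorflow.reduce_mean(tensorflow.square(y_pred - y_true)))


def hidden_activation(activation):
    # Strings such as 'relu' are wrapped in an Activation layer; layer objects such as
    # LeakyReLU are cloned so that every hidden layer gets its own instance
    if isinstance(activation, str):
        return keras.layers.Activation(activation)
    return activation.__class__.from_config(activation.get_config())


def neural_net(params):
    # Free the memory held by models from previous evaluations
    keras.backend.clear_session()
    print('Params testing: ', params)

    # hp.quniform returns floats, so cast the integer-valued hyperparameters
    units1, units2 = int(params['units1']), int(params['units2'])
    epochs, batch_size = int(params['nb_epochs']), int(params['batch_size'])
    loss = root_mean_squared_error if params['loss'] == 'rmse' else params['loss']

    model = keras.Sequential()
    model.add(keras.layers.Input(shape=(x_train_array.shape[1],)))
    for units, rate in ((units1, params['dropout1']), (units2, params['dropout2'])):
        model.add(keras.layers.Dense(units, kernel_initializer=params['weight_init'],
                                     kernel_regularizer=keras.regularizers.l2(1e-4)))
        model.add(hidden_activation(params['activation']))
        model.add(keras.layers.Dropout(rate))

    # Linear output layer with 24 outputs, so the predicted prices are unbounded
    model.add(keras.layers.Dense(24, activation='linear'))

    # The fixed decay=1e-7 was dropped: recent Keras releases no longer accept it
    sgd_optimizer = keras.optimizers.SGD(learning_rate=params['learning_rate'] / 1000,
                                         momentum=params['momentum'], nesterov=params['nesterov'])
    model.compile(loss=loss, optimizer=sgd_optimizer, metrics=['mae'])

    model.fit(x_train_array, y_train_array, epochs=epochs, batch_size=batch_size,
              verbose=1, validation_data=(x_val_array, y_val_array))

    preds = model.predict(x_val_array, batch_size=batch_size, verbose=1)
    mae = float(np.mean(np.abs(y_val_array - preds)))
    print('MAE:', mae, flush=True)
    # Hyperopt minimizes 'loss', so return the MAE itself (not its negative)
    return {'loss': mae, 'status': STATUS_OK}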

## Define the hyperparameter search space

(1) Define the hyperparameters:
- Hidden nodes in layers 1 and 2, which must take integer values only. The available parameter types are listed in the Hyperopt FMin wiki. A quantized uniform distribution could be used here. To limit the search space, the domain can be divided in steps of 5 nodes. For hidden layer one, search between 150 and 300 nodes. For hidden layer two, search between 50 and 200 nodes.
- Dropout rate, which takes continuous values lower than 1. A uniform distribution can be used; search between 0 and 0.5.
- Activation function for the hidden layers. This is a clear case for the 'choice' function in Hyperopt. Try 'ReLU' and 'LeakyReLU'. Optional: add a nested uniform distribution for the alpha parameter of the LeakyReLU.
- Loss function, another 'choice' parameter. Try the RMSE and MAE.
- Weight initialization. Use the choice-type parameter to try both 'RandomNormal' and 'RandomUniform' (see Keras Initializers doc).
- Learning rate of the SGD. To make things easy, use a quantized uniform distribution between 1 and 20 in steps of 1 and divide the sampled value by 1000 in your loop.
- Momentum, make this a uniform distribution between 0 and 0.5.
- Nesterov, which is a Boolean that can be described using the choice function.
- Epochs, which is an integer value between 100 and 300. Steps of 10 can be used.
- Batch size, which can take an integer value from 50 to 200. Note: if the optimization crashes due to memory issues, reduce the batch size.
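
One practical detail for the loss-function choice: Keras ships RMSE as a metric, but to my knowledge not as a loss identifier, so a string such as 'rmse' cannot be passed to `model.compile` directly. A minimal sketch of a custom loss is given below; the function name `root_mean_squared_error` and the `loss_functions` lookup are assumptions made for this example.

```python
import tensorflow as tf


def root_mean_squared_error(y_true, y_pred):
    # Square root of the mean squared difference over the whole batch
    y_true = tf.cast(y_true, y_pred.dtype)
    return tf.sqrt(tf.reduce_mean(tf.square(y_pred - y_true)))


# Map the names used in the search space to something Keras can compile
loss_functions = {"mae": "mae", "rmse": root_mean_squared_error}

y_true = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y_pred = tf.constant([[1.0, 4.0], [3.0, 2.0]])
rmse_value = float(root_mean_squared_error(y_true, y_pred))
print(rmse_value)
```

With such a lookup, the search space can keep plain strings while the training function passes `loss_functions[name]` to `model.compile`.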

(2) Define the search space:<br>
In Hyperopt, a search space is defined as a Python dictionary with the hyperparameters, as in the following example:
```python
    n1 = hp.quniform('Hidden nodes layer 1', 150, 300, 5)
    n2 = hp.quniform('Hidden nodes layer 2', 50, 200, 5)
    
    search_space = {
        'Hidden nodes layer 1': n1,
        'Hidden nodes layer 2': n2
    }
```



<ins>Handy link:</ins><br>
http://hyperopt.github.io/hyperopt/#documentation <br>
https://github.com/hyperopt/hyperopt/wiki/FMin <br>
https://keras.io/api/

In [3]:
from hyperopt.pyll import scope

space = {
    # scope.int casts the floats returned by hp.quniform to integers
    'units1': scope.int(hp.quniform('units1', 150, 300, 5)),
    'units2': scope.int(hp.quniform('units2', 50, 200, 5)),

    'dropout1': hp.uniform('dropout1', 0, 0.5),
    'dropout2': hp.uniform('dropout2', 0, 0.5),

    # Fixed batch size; widen the list if memory allows
    'batch_size': hp.choice('batch_size', [100]),

    # 100 to 300 epochs in steps of 10
    'nb_epochs': scope.int(hp.quniform('nb_epochs', 100, 300, 10)),

    'activation': hp.choice('activation', ['relu', LeakyReLU(alpha=0.05)]),

    'loss': hp.choice('loss', ['mae', 'rmse']),

    'weight_init': hp.choice('weight_init', ['random_normal', 'random_uniform']),

    'momentum': hp.uniform('momentum', 0, 0.5),

    # Divided by 1000 inside the objective, giving 0.001 to 0.020
    'learning_rate': hp.quniform('learning_rate', 1, 20, 1),

    'nesterov': hp.choice('nesterov', [True, False]),
}


## Build your train function

Now you can build the last piece of the puzzle needed to optimize hyperparameters of your MLP.<br>

Define a function that takes a dictionary of hyperparameters as input. Make sure to redefine integer values as such, since hyperopt returns floats from quantized distributions.

The function should:<br>
(1) Read the hyperparameters.<br>
(2) Loop over a 5-fold cross-validation (see the scikit-learn KFold function) in which:
- A Keras model is declared with the given hyperparameters.
- The input features are scaled using the scikit-learn StandardScaler. Fit the scaler on the training set of the fold and scale the test set with those same scaling factors. This has to be done inside the KFold loop to prevent information leakage.
- The model is trained on the training set of the given fold.
- The trained model is evaluated on the test set of the given fold, using the <ins>Mean Absolute Error!</ins>

Note: you can read in your data before calling the function, which saves a lot of runtime.<br>
(3) Make a Python list with the MAE of the five folds (for example called 'losses') and return a dictionary in the following format:
```python
    {'loss': np.mean(losses), 'status': STATUS_OK, 'losses': losses}
```


## Ready to loop

(1) Read in your data, no need to scale them. Just make sure to have an input features array (X) and a target array (y).<br>
(2) Declare a Hyperopt Trials object.<br>
(3) Run the search! Let's use the Tree-structured Parzen Estimator (TPE) algorithm. Use the fmin function:
```python 
def train(hyperparameters):
    ...
    return dict

search_space = {...}
trials = Trials()
X, y = load_data()

best = fmin(fn = train, 
            space = search_space, 
            algo = tpe.suggest, 
            max_evals = 500, 
            trials = trials, 
            show_progressbar = True
           )
```
(4) Save your trials object! It can be stored as a pickle. Be careful with pickles: loading one can execute arbitrary code, so only open pickle files that you created yourself or that come from a trusted source. Here is an example of proper pickle usage:
```python
save_trials_path = Path(path_to_folder)
with open(save_trials_path / 'trials.pickle', 'wb') as pickle_file:
    pickle.dump(trials, pickle_file)

# ... rest of code
```

Note: run it once with max_evals = 1 first to check that everything works. If the search takes ages, you can reduce the search space by fixing some hyperparameters (for example by only using the MAE loss, keeping the SGD parameters at their defaults, or using fixed epochs and/or batch_size), which allows for fewer evaluations. This assignment shows what is possible on a big computer; a full search might not be feasible on your own PC.

In [5]:
import sys

from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Import Keras through TensorFlow so that a single Keras installation is used throughout
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation

# (1) Load data

feat_X = pd.read_pickle("./variables/feat_X.pkl")
feat_y = pd.read_pickle("./variables/feat_y.pkl")

perc_test = 0.1

# shuffle=False keeps the original row order: the last 10% of the samples form the validation set
X_train, X_val, y_train, y_val = train_test_split(feat_X, feat_y, test_size=perc_test, shuffle=False)
print('Number of samples in the training set:', X_train.shape[0])
print('Number of samples in the validation set:', X_val.shape[0])

x_train_array = np.array(X_train, dtype=float)
y_train_array = np.array(y_train)
x_val_array = np.array(X_val, dtype=float)
y_val_array = np.array(y_val)

print(x_train_array.shape)
print(x_val_array.shape)
print(y_train_array.shape)
print(y_val_array.shape)

# (2) Trials object

trials = Trials()

# (3) Run

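best = fmin(neural_net, space, algo=tpe.suggest, max_evals=1, trials=trials)
# fmin reports hp.choice parameters as list indices; space_eval maps them back to actual values
print('best: ')
print(space_eval(space, best))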

# (4) Save

save_trials_path = Path('./')
with open(save_trials_path / 'trials.pickle', 'wb') as pickle_file:
    pickle.dump(trials, pickle_file)

...
 - 0s 18ms/step - loss: 6.3831 - mae: 6.3831 - val_loss: 4.5722 - val_mae: 4.5722
Epoch 250/300
 - 0s 13ms/step - loss: 6.4177 - mae: 6.4177 - val_loss: 5.6303 - val_mae: 5.6303
Epoch 251/300
 - 0s 13ms/step - loss: 6.4082 - mae: 6.4082 - val_loss: 5.1943 - val_mae: 5.1943
Epoch 252/300
 - 0s 11ms/step - loss: 6.4217 - mae: 6.4217 - val_loss: 4.5485 - val_mae: 4.5485
...

In [6]:
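# Note: PRD (the forecast series) and dam2 (a DataFrame with the realised prices in a
# 'Price' column) are not defined anywhere in this notebook, hence the NameError below.
fig, ax = plt.subplots(figsize=(20, 5))

ax.plot(PRD.index, PRD, label='Forecast')
ax.plot(dam2.index, dam2['Price'], label='Actual')

ax.set_title('Price forecast and realisation for test data')
ax.set_xlabel('Date')
ax.set_ylabel('Price')
ax.legend()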

NameError: name 'PRD' is not defined

In [None]:
# keras.backend.clear_session()

# def create_model():
#     # initialize model
#     model = keras.models.Sequential()

#     # add input layer
#     model.add(keras.layers.Dense(
#         units=50,
#         # input_dim=X_train_centered.shape[1],
#         input_dim=50,
#         kernel_initializer='glorot_uniform',
#         bias_initializer='zeros',
#         activation='tanh',
#         kernel_regularizer=keras.regularizers.l2(1e-4)
#     ))

#     model.add(keras.layers.Dropout(0.2))

#     # add hidden layer
#     model.add(
#         keras.layers.Dense(
#             units=50,
#             input_dim=50,
#             kernel_initializer='glorot_uniform',
#             bias_initializer='zeros',
#             activation='tanh',
#             kernel_regularizer=keras.regularizers.l2(1e-4)
#         ))

#     # add output layer
#     model.add(
#         keras.layers.Dense(
#             # units=y_train_onehot.shape[1],
#             units=5,
#             input_dim=50,
#             kernel_initializer='glorot_uniform',
#             bias_initializer='zeros',
#             activation='softmax',
#             kernel_regularizer=keras.regularizers.l2(1e-4)
#         ))

#     # define SGD optimizer
#     sgd_optimizer = keras.optimizers.SGD(
#         lr=0.001, decay=1e-7, momentum=0.9
#     )
#     # compile model
#     model.compile(
#         optimizer=sgd_optimizer,
#         loss='categorical_crossentropy'
#     )

#     return model

In [None]:
# ### HYPEROPT ###

# from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
# import sys

# import pandas as pd
# import numpy as np

# np.random.seed(6669)

# from sklearn.model_selection import KFold, StratifiedKFold
# from sklearn.metrics import mean_absolute_error
# from sklearn.model_selection import train_test_split

# from keras.wrappers.scikit_learn import KerasRegressor
# from sklearn.model_selection import GridSearchCV

# from keras.models import Sequential
# from keras.layers.core import Dense, Dropout, Activation
# from keras.optimizers import SGD, Adam
# from keras.utils import np_utils
# from keras.layers.advanced_activations import LeakyReLU, PReLU
# from keras.layers.normalization import BatchNormalization
# from keras.regularizers import l1, l2, l1_l2

# import tensorflow as tf
# tf.python.control_flow_ops = tf

# # Based on Faron's stacker. Thanks!

# ID = 'id'
# TARGET = 'loss'
# NFOLDS = 5
# SEED = 669
# NROWS = None
# DATA_DIR = "../../"

# TRAIN_FILE = "./train.csv"
# TEST_FILE = "./test.csv"
# SUBMISSION_FILE = "./sample_submission.csv"

# train = pd.read_csv(TRAIN_FILE, nrows=NROWS)
# test = pd.read_csv(TEST_FILE, nrows=NROWS)

# train_indices = train[ID]
# test_indices = test[ID]

# y_train_full = train["loss"]
# y_train_ravel = train[TARGET].ravel()

# train.drop([ID, TARGET], axis=1, inplace=True)
# test.drop([ID], axis=1, inplace=True)

# print("{},{}".format(train.shape, test.shape))

# ntrain = train.shape[0]
# ntest = test.shape[0]
# train_test = pd.concat((train, test)).reset_index(drop=True)

# features = train.columns

# cats = [feat for feat in features if 'cat' in feat]
# for feat in cats:
#     train_test[feat] = pd.factorize(train_test[feat], sort=True)[0]
    
# train = train_test.iloc[:ntrain, :]

# # Using train_test_split in order to create random split for Keras,
# # otherwise it'll use last part of data when
# # validation_split is provided in the model parameters.

# X_train, X_val, y_train, y_val = train_test_split(train, y_train_full, test_size = 0.15)

# feat_X = pd.read_pickle(f"./variables/feat_X.pkl")
# feat_y = pd.read_pickle(f"./variables/feat_y.pkl")

# from sklearn.model_selection import train_test_split

# perc_test=0.1

# X_train, X_val, y_train, y_val = train_test_split(feat_X, feat_y, test_size=perc_test,shuffle=False, random_state=1236548)
# print('Number of samples in the training set:', X_train.shape[0])
# print('Number of samples in the test set:', X_val.shape[0])

# print(X_train.shape)
# print(X_val.shape)
# print(y_train.shape)
# print(y_val.shape)

# x_train_array = np.array(X_train, dtype = float)
# y_train_array = np.array(y_train)
# x_val_array = np.array(X_val, dtype = float)
# y_val_array = np.array(y_val)

# print(x_train_array.shape)
# print(x_val_array.shape)
# print(y_train_array.shape)
# print(y_val_array.shape)

# # Unfortunately, I didn't manage to implement proper KFold when using Hyperopt.
# # This can be done easily using GridSearch.
# # Code for 5-fold CV in further section.


# # Parameters search space, can be adjusted according to your needs.

# space = { 'choice': hp.choice('layers_number',
#                              [{'layers': 'two'},
#                              {'layers': 'three',
#                              'units3': hp.choice('units3', [32, 64, 256]),
#                              'dropout3': hp.choice('dropout3', np.linspace(0.1, 0.3, 3, dtype=float))
#                              }]),

#             'units1': hp.choice('units1', [512, 768, 1024]),
#             'units2': hp.choice('units2', [128, 256, 512]),
#             #'units3': hp.choice('units3', [32, 64, 256]), 

#             'dropout1': hp.choice('dropout1', np.linspace(0.3, 0.5, 3, dtype=float)),
#             'dropout2': hp.choice('dropout2', np.linspace(0.1, 0.3, 3, dtype=float)),
#             #'dropout3': hp.choice('dropout3', np.linspace(0.1, 0.3, 3, dtype=float)),

#             'batch_size' : hp.choice('batch_size', [128, 256, 512]),

#             'nb_epochs' :  hp.choice('nb_epochs', [30, 50, 100]),
            
#         }


# # Architecture of NN loosely based on Danijel Kivaranovic Keras script. Thanks!

# def neural_net(params):   

#     print ('Params testing: ', params)
#     model = Sequential()
#     model.add(Dense(params['units1'], input_dim = x_train_array.shape[1]))
#     model.add(PReLU())
#     model.add(Dropout(0.4))

#     model.add(Dense(params['units2']))
#     model.add(PReLU())
#     model.add(Dropout(params['dropout2']))

#     if params['choice']['layers'] == 'three':
#         model.add(Dense(params['choice']['units3'])) 
#         model.add(PReLU())
#         model.add(Dropout(params['choice']['dropout3']))    

#     model.add(Dense(24))
#     model.add(Activation('linear'))
#     model.compile(loss = 'mae', optimizer = 'adam', metrics = ["mae"])
    

#     model.fit(x_train_array, y_train_array, epochs=params['nb_epochs'],
#               batch_size=params['batch_size'], verbose = 1, validation_data = (x_val_array, y_val_array))

#     preds  = model.predict(x_val_array, batch_size = params['batch_size'], verbose = 1)
#     acc = mean_absolute_error(y_val_array, preds)
#     print('MAE:', acc)
#     sys.stdout.flush() 
#     return {'loss': -acc, 'status': STATUS_OK}

# trials = Trials()
# best = fmin(neural_net, space, algo=tpe.suggest, max_evals = 1, trials=trials)
# print('best: ')
# print(best)

In [None]:
# space = {'choice':


# hp.choice('num_layers',
#     [
#                     {'layers':'two',
                     
                                                    
#                     },
        
#                      {'layers':'three',
                      
                      
#                       'units3': hp.choice('units3', [64, 128, 256, 512]),
#                       'dropout3': hp.choice('dropout3', [0.25,0.5,0.75])
                                
#                     }
        
    
#     ]),
    
#     'units1': hp.choice('units1', [64, 128, 256, 512]),
#     'units2': hp.choice('units2', [64, 128, 256, 512]),
                 
#     'dropout1': hp.choice('dropout1', [0.25,0.5,0.75]),
#     'dropout2': hp.choice('dropout2', [0.25,0.5,0.75]),
    
#     'batch_size' : hp.choice('batch_size', [28,64,128]),

#     'nb_epochs' :  100,
#     'optimizer': 'adadelta',
#     'activation': 'relu'
    
    
#     }


In [None]:
# #Objective function that hyperopt will minimize

# def root_mean_squared_error(y_true, y_pred):
#     return K.sqrt(K.mean(K.square(y_pred - y_true)))

# def objective(params):
    
#     # import ml_metrics
    
#     from keras.models import Sequential
#     from keras.layers.core import Dense, Dropout, Activation
#     from keras.optimizers import Adadelta
#     from keras.layers.normalization import BatchNormalization
#     from keras.callbacks import Callback

#     from sklearn.metrics import mean_absolute_error as MAE
#     from sklearn.metrics import mean_squared_error as MSE
    
#     print ('Params testing: ', params)
#     print ('\n ')
    
#     model = Sequential()
#     model.add(Dense(params['units1'], input_dim = X_train.shape[1]))
#     model.add(Activation(params['activation']))
#     model.add(Dropout(params['dropout1']))
#     model.add(BatchNormalization())
    
#     model.add(Dense(params['units2']))
#     model.add(Activation(params['activation']))
#     model.add(Dropout(params['dropout2']))
#     model.add(BatchNormalization())
    
#     if params['choice']['layers']== 'three':
#         model.add(Dense(params['choice']['units3'])) 
#         model.add(Activation(params['activation']))
#         model.add(Dropout(params['choice']['dropout3']))
#         model.add(BatchNormalization())
#         patience=25
#     else:
#         patience=15
     
#     model.add(Dense(1))    #End in a single output node for regression style output
#     # model.compile(loss=root_mean_squared_error, optimizer=params['optimizer'])

#     model.compile(
#         optimizer=params['optimizer'],
#         loss='sparse_categorical_crossentropy'
#     )
    
#     #object of class for call back early stopping 
#     # val_call = clsvalidation_kappa(validation_data=(X_val, y_val), patience=patience, filepath='"../input/best.h5') #instantiate object

#     #includes the call back object
#     model.fit(X_train, y_train, epochs=params['nb_epochs'], batch_size=params['batch_size'], verbose = 0)
     
#     #predict the test set
#     preds=model.predict(X_val, batch_size = 5000, verbose = 0)
    
#     predClipped = np.clip(np.round(preds.astype(int).ravel()), 1, 8) #simple rounding of predictionto int
#     # score=ml_metrics.quadratic_weighted_kappa(y_test.values.ravel(),predClipped)

#     score = MAE(y_val, preds)
 
#     return {'loss': score, 'status': STATUS_OK, 'rounds': val_call.best_rounds}

# trials = Trials()

# best = fmin(objective, space, algo=tpe.suggest, trials=trials, max_evals=100)

# print (best)
# print (trials.best_trial)


In [None]:
# # we install the necessary packages
# !pip install networkx==1.11 # needed to install hyperopt correctly; otherwise it raises errors
# !pip install hyperopt
# # necessary imports
# import sys
# import time
# import numpy as np
# from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
# from keras.models import Sequential
# from keras.layers import Dense, Dropout, Activation, Flatten
# from keras.layers import Conv2D, MaxPooling2D
# from keras.constraints import max_norm
# from keras.optimizers import Adam
# from sklearn.model_selection import train_test_split
# from keras.utils import to_categorical
# from keras.callbacks import EarlyStopping
# from keras.datasets import cifar10
# SEED = 42
# (X_train, y_train), (X_test, y_test) = cifar10.load_data()
# validation_split = 0.1
# X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=validation_split, random_state=SEED)
# # Let's convert the data to float and then divide it by 255 to normalize it
# # Due to image characteristics they can only get values from 0 to 255
# X_train = X_train.astype('float32') / 255.
# X_val = X_val.astype('float32') / 255.
# X_test = X_test.astype('float32') / 255.
# # let's convert the labels with one-hot encoding
# n_classes = 10
# y_train = to_categorical(y_train, n_classes)
# y_val = to_categorical(y_val, n_classes)
# y_test = to_categorical(y_test, n_classes)
# # we define the search space
# # we'll vary:
# # - the number of filters in our conv layers
# # - the dropout percentage
# # - the number of neurons in the dense layer
# space = {
#     'n_filters_conv': hp.choice('n_filters_conv', [32, 64, 128]),
#     'dropout': hp.uniform('dropout', 0.0, 0.5),
#     'neurons_dense': hp.choice('neurons_dense', [256, 512, 1024]), 
# }
# def get_callbacks(pars):
#   callbacks = [EarlyStopping(monitor='val_loss', min_delta=0.0001, patience=2, verbose=0, mode='auto')]
#   return callbacks
# def mi_cnn(pars):
#   print ('Parameters: ', pars)
#   model = Sequential()
 
#   # First convolutional block
#   model.add(Conv2D(pars['n_filters_conv'], kernel_size=(3, 3), activation='relu', input_shape=(32, 32, 3)))
#   model.add(MaxPooling2D(pool_size=(2, 2)))
#   model.add(Dropout(pars['dropout']))
# # second convolutional block
#   model.add(Conv2D(pars['n_filters_conv'], kernel_size=(3, 3), activation='relu'))
#   model.add(MaxPooling2D(pool_size=(2, 2)))
#   model.add(Dropout(pars['dropout']))
# # third convolutional block
#   model.add(Conv2D(pars['n_filters_conv'], kernel_size=(3, 3), activation='relu'))
#   model.add(MaxPooling2D(pool_size=(2, 2)))
#   model.add(Dropout(pars['dropout']))
# # Classifier block
#   model.add(Flatten())
#   model.add(Dense(pars['neurons_dense'], activation='relu', kernel_constraint=max_norm(3.)))
#   model.add(Dropout(pars['dropout']))
#   model.add(Dense(10, activation='softmax'))
# # We compile the model
#   model.compile(loss='categorical_crossentropy',
#                 optimizer=Adam(lr=0.0001, decay=1e-6),
#                 metrics=['accuracy'])
# # We train the model
#   history = model.fit(X_train, 
#                       y_train,
#                       batch_size=128,
#                       shuffle=True,
#                       epochs=5,
#                       validation_data=(X_val, y_val),
#                       verbose = 0,
#                       callbacks = get_callbacks(pars))
# best_epoch_loss = np.argmin(history.history['val_loss'])
# best_val_loss = np.min(history.history['val_loss'])
# best_val_acc = np.max(history.history['val_acc'])

# print('Epoch {} - val acc: {} - val loss: {}'.format(best_epoch_loss, best_val_acc, best_val_loss))
# sys.stdout.flush()

# return {'loss': best_val_loss, 'best_epoch': best_epoch_loss, 'eval_time': time.time(), 'status': STATUS_OK, 'model': model, 'history': history}
# trials = Trials()
# best = fmin(mi_cnn, space, algo=tpe.suggest, max_evals=10, trials=trials)
# print(best)