## Hyperas tuning on patent-text cosine similarities for citation counts

The model architecture starts from the number of input features (31) and then tunes both width and depth. The search also:

- optionally adds one or two hidden layers of lower or similar width
- chooses between two activation functions, PReLU (which learns its negative slope) and ELU, both of which can pass a gradient for negative inputs and so help avoid dead neurons
- reports the validation loss (mean squared error) at the end of every trial, which, unlike accuracy, is meaningful for a continuous target such as a citation count
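The dead-neuron argument above comes down to the gradient for negative inputs. The NumPy sketch below compares the derivatives of ReLU, PReLU and ELU. The PReLU slope `a=0.25` is a hypothetical value chosen for illustration (in Keras it is a learned weight), and `alpha=1.0` is the Keras default for ELU.

```python
import numpy as np

def relu_grad(x):
    # ReLU: slope 1 for x > 0, exactly 0 otherwise
    return np.where(x > 0, 1.0, 0.0)

def prelu_grad(x, a=0.25):
    # PReLU: slope 1 for x > 0, slope a otherwise (a is learned in practice;
    # 0.25 is only an illustrative value)
    return np.where(x > 0, 1.0, a)

def elu_grad(x, alpha=1.0):
    # ELU: f(x) = alpha * (exp(x) - 1) for x <= 0, so f'(x) = alpha * exp(x)
    return np.where(x > 0, 1.0, alpha * np.exp(x))

x = np.array([-3.0, -1.0, -0.1, 0.5, 2.0])
for name, grad in [('ReLU', relu_grad), ('PReLU', prelu_grad), ('ELU', elu_grad)]:
    print(name, np.round(grad(x), 3))
```

A unit whose inputs are all negative gets zero gradient under ReLU and stops updating; under PReLU and ELU it still receives a signal. ELU's gradient does shrink toward zero for large negative inputs, so it eases the problem rather than removing it.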

Once this preliminary tuning run has chosen the hyperparameters, the final model is trained with them using the Adam optimizer.
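To put the 100-trial budget used below in perspective, this sketch counts the distinct networks the search space can produce, using the candidate widths from the `model()` cell. It assumes, as in that cell, that the sixth layer can only exist when the fifth one does and that one activation is shared by all layers.

```python
from math import prod

# Candidate widths copied from the model() cell; the counting itself is an
# illustrative sketch, not part of the tuning run
activations = ['PReLU', 'ELU']
fixed_hidden = [[22, 24, 26], [20, 22, 24], [18, 20, 22]]  # layers 2-4
fifth = [12, 14, 16, 18]                                   # optional layer 5
sixth = [10, 12, 14, 16]                                   # optional layer 6 (needs layer 5)
tail = [[6, 8, 12, 14], [4, 6, 8, 12]]                     # last two hidden layers

# depth options: no extra layer, a fifth layer only, or fifth + sixth
depth_options = 1 + len(fifth) * (1 + len(sixth))

n_architectures = (len(activations)
                   * prod(len(w) for w in fixed_hidden)
                   * depth_options
                   * prod(len(w) for w in tail))
coverage = 100 / n_architectures   # share of distinct networks 100 trials could visit

print(n_architectures)             # 18144
print(f"{coverage:.2%}")           # 0.55%
```

hyperopt itself samples every `choice` on each trial, including the switch and width of layers that end up unused, so its raw space is larger (55,296 combinations) even though many of those build the same network.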

In [1]:
import os

import subprocess
import platform
import datetime as dt
import pickle
import joblib

import numpy as np
from pandas import read_csv

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

from hyperas import optim
from hyperas.distributions import choice, uniform
from hyperopt import (Trials, 
                      STATUS_OK, 
                      tpe)

import keras
from keras.layers.core import (Dense,
                               Dropout, 
                               Activation)
from keras.layers import (BatchNormalization, 
                          PReLU, 
                          ELU) 
from keras.models import Sequential
from keras.utils import np_utils

Using TensorFlow backend.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
  np_resource = np.dtype([("resource", np.ubyte, 1)])


In [2]:
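# Intended to use an AMD GPU with PlaidML and Metal. Keras reads KERAS_BACKEND
# only when it is first imported, so this assignment takes effect only if it
# runs before `import keras` in the first cell; run here, after that import,
# Keras stays on the TensorFlow backend.
os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"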

In [3]:
# Reproducibility: the sheer number of parameters in the network makes exact
# reproducibility nearly impossible, so basic information about each run is
# saved instead
# from: https://making.dia.com/abc-always-be-benchmarking-5854252fc6b1
import sys

# save the Python environment (installed packages) used for this run;
# calling pip through sys.executable lists the kernel's own packages
with open('hyperas_cosine_similarities_citation_tuning_environment.txt', 'wb') as f:
    f.write(subprocess.check_output([sys.executable, '-m', 'pip', 'list']))
# underlying platform
system = platform.uname()
# python version
python_version = platform.python_version()
# date and time of run
date = dt.datetime.today().strftime("%Y-%m-%d %H:%M:%S")

config = {
    'system': system,
    'python_version': python_version,
    'date': date
}

with open('hyperas_cosine_similarities_citation_tuning_config.p', 'wb') as f:
    pickle.dump(config, f)
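The saved config only helps if it is read back. A minimal sketch, assuming the pickle written by the cell above, that reloads it on a later rerun and flags a Python-version mismatch; `check_run_config` is a hypothetical helper, not part of the original notebook.

```python
import pickle
import platform

def check_run_config(path='hyperas_cosine_similarities_citation_tuning_config.p'):
    """Hypothetical helper: compare a saved run config with the current interpreter."""
    with open(path, 'rb') as f:
        saved = pickle.load(f)
    current = platform.python_version()
    same_python = saved['python_version'] == current
    if not same_python:
        print(f"Python version changed: saved {saved['python_version']}, now {current}")
    # also return the saved timestamp so the original run can be identified
    return same_python, saved['date']
```

The same comparison could be extended to the `pip list` snapshot, for example by diffing it against a fresh listing.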

In [4]:
def data():
    """Load the data, hold out a 5% test split, and min-max scale the
    features to [0, 1] for the activation functions.

    hyperas copies only the body of this function into its generated
    script, so the function takes no arguments and the file paths are
    defined inside it.
    """
    features = read_csv('Patent_text_independent_variables-11-13-19.csv', header=0)
    target = read_csv('Citation_count_y-11-13-19.csv', header=0)
    train_x, test_x, train_y, test_y = train_test_split(features,
                                                        target,
                                                        test_size=0.05,
                                                        random_state=1)
    # fit the scaler on the training split only, then apply it to both splits
    x_scaler = MinMaxScaler().fit(train_x)
    train_x = x_scaler.transform(train_x)
    test_x = x_scaler.transform(test_x)
    # keep the fitted scaler so new data can be transformed the same way later
    joblib.dump(x_scaler, "Patent_text_cosine_similarity_data_MinMaxScaler.save")
    
    # these variable names must match the parameter names of model()
    return train_x, train_y, test_x, test_y

In [5]:
def model(train_x, 
          train_y, 
          test_x, 
          test_y):
    """Build, train and score one candidate network (one hyperas trial)"""
    
    '''
    Activation choices: PReLU or ELU. Unlike ReLU, both can pass a
    gradient for negative inputs, which helps avoid dead neurons, a
    concern with this data since there are many dichotomous (binary)
    variables. PReLU also learns its negative slope.
    '''
    def make_activation(name):
        # build a fresh activation layer for the chosen name
        if name == 'PReLU':
            return keras.layers.PReLU()
        if name == 'ELU':
            return keras.layers.ELU()
        raise ValueError('Unknown activation: ' + name)
    
    activationChoice = {{choice(['PReLU', 'ELU'])}}
        
    model = Sequential() 
    # first layer matches the 31 input features; the following widths are tuned
    model.add(Dense(31, input_shape=(31,), kernel_initializer='he_normal'))
    model.add(make_activation(activationChoice))
    model.add(Dense({{choice([22, 24, 26])}}, kernel_initializer='he_normal'))
    model.add(make_activation(activationChoice))
    model.add(Dense({{choice([20, 22, 24])}}, kernel_initializer='he_normal'))
    model.add(make_activation(activationChoice))
    model.add(Dense({{choice([18, 20, 22])}}, kernel_initializer='he_normal'))
    model.add(make_activation(activationChoice))
    
    # Add a fifth layer?
    if {{choice(['none', 'fifth_layer'])}} == 'fifth_layer':
        model.add(Dense({{choice([12, 14, 16, 18])}}, kernel_initializer='he_normal'))
        model.add(make_activation(activationChoice))
        
        # if we add a fifth layer, maybe add a (possibly narrower) sixth layer?
        if {{choice(['none', 'sixth_layer'])}} == 'sixth_layer':
            model.add(Dense({{choice([10, 12, 14, 16])}}, kernel_initializer='he_normal'))
            model.add(make_activation(activationChoice))
    
    model.add(Dense({{choice([6, 8, 12, 14])}}, kernel_initializer='he_normal'))
    model.add(make_activation(activationChoice))
    model.add(Dense({{choice([4, 6, 8, 12])}}, kernel_initializer='he_normal'))
    model.add(make_activation(activationChoice))
    # single linear output unit for the citation count
    model.add(Dense(1, kernel_initializer='he_normal'))
    
    # regression: mean squared error loss, mean absolute error reported alongside
    model.compile(loss='mean_squared_error', 
                  metrics=['mae'], 
                  optimizer='adam')

    result = model.fit(train_x, train_y,
                       batch_size=16,
                       epochs=5,
                       verbose=2,
                       validation_split=0.05)
    
    # lowest validation loss over this trial's epochs; hyperopt minimizes this value
    validation_loss = np.amin(result.history['val_loss']) 
    print('Best loss of epoch:', validation_loss)
    
    return {'loss': validation_loss, 'status': STATUS_OK, 'model': model}

In [6]:
train_x, train_y, test_x, test_y = data()
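In `data()`, `MinMaxScaler` is fitted on the training split only and then applied to the test split, which keeps test-set information out of the scaling. One consequence is that only the training features are guaranteed to lie in [0, 1]. A NumPy sketch of the same transform (the scaler's default `feature_range=(0, 1)`) on made-up numbers, not the patent data:

```python
import numpy as np

def fit_min_max(train):
    # per-column minimum and range learned from the training split only
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0   # constant column: avoid division by zero
    return col_min, col_range

def transform_min_max(x, col_min, col_range):
    return (x - col_min) / col_range

# made-up values: two features, e.g. a cosine similarity and a binary flag
train = np.array([[0.10, 0.0], [0.50, 1.0], [0.90, 1.0]])
test = np.array([[0.95, 1.0], [0.05, 0.0]])

col_min, col_range = fit_min_max(train)
train_scaled = transform_min_max(train, col_min, col_range)
test_scaled = transform_min_max(test, col_min, col_range)
print(train_scaled)
print(test_scaled)   # first column: about 1.06 and -0.06, outside [0, 1]
```

With 5% of several million rows held out, such out-of-range test values should be small, but they are possible for features whose extremes are rare.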

In [7]:
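# hyperopt's TPE (Tree-structured Parzen Estimator) focuses later trials on
# regions of the search space that gave low validation loss, so poorly
# performing choices become less likely (though not impossible) to be sampled
# again; 100 trials is assumed to be enough for that. The optimizer (Adam) is
# fixed rather than tuned.
best_run, best_model = optim.minimize(model=model,
                                      data=data,
                                      algo=tpe.suggest,
                                      max_evals=100,
                                      trials=Trials(),
                                      eval_space=True,  # report chosen values, not choice indices
                                      notebook_name='Hyperas_tuning_for_citation_counts')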

>>> Imports:
#coding=utf-8

try:
    import os
except:
    pass

try:
    import numpy as np
except:
    pass

try:
    from pandas import read_csv
except:
    pass

try:
    import subprocess
except:
    pass

try:
    import platform
except:
    pass

try:
    import datetime as dt
except:
    pass

try:
    import pickle
except:
    pass

try:
    import joblib
except:
    pass

try:
    from sklearn.model_selection import train_test_split
except:
    pass

try:
    from sklearn.preprocessing import MinMaxScaler
except:
    pass

try:
    from hyperas import optim
except:
    pass

try:
    from hyperas.distributions import choice, uniform
except:
    pass

try:
    from hyperopt import Trials, STATUS_OK, tpe
except:
    pass

try:
    import keras
except:
    pass

try:
    from keras.layers.core import Dense, Dropout, Activation
except:
    pass

try:
    from keras.layers import BatchNormalization, PReLU, ELU
except:
    pass

try:
    from keras.models import Sequential
except:


 - 584s - loss: 0.8343 - mean_absolute_error: 0.7165 - val_loss: 0.8288 - val_mean_absolute_error: 0.7135

Best loss of epoch:                                                                  
0.8288487806110523                                                                   
Train on 4507301 samples, validate on 237227 samples                                 
Epoch 1/5                                                                            
 - 450s - loss: 0.9229 - mean_absolute_error: 0.7635 - val_loss: 0.8752 - val_mean_absolute_error: 0.7357

Epoch 2/5                                                                            
 - 439s - loss: 0.8639 - mean_absolute_error: 0.7329 - val_loss: 0.8542 - val_mean_absolute_error: 0.7313

Epoch 3/5                                                                            
 - 433s - loss: 0.8539 - mean_absolute_error: 0.7272 - val_loss: 0.8478 - val_mean_absolute_error: 0.7221

Epoch 4/5                                               

0.8270472767148265                                                                     
Train on 4507301 samples, validate on 237227 samples                                   
Epoch 1/5                                                                              
 - 642s - loss: 0.9172 - mean_absolute_error: 0.7621 - val_loss: 0.8573 - val_mean_absolute_error: 0.7299

Epoch 2/5                                                                              
 - 638s - loss: 0.8593 - mean_absolute_error: 0.7315 - val_loss: 0.8479 - val_mean_absolute_error: 0.7267

Epoch 3/5                                                                              
 - 640s - loss: 0.8467 - mean_absolute_error: 0.7241 - val_loss: 0.8369 - val_mean_absolute_error: 0.7183

Epoch 4/5                                                                              
 - 645s - loss: 0.8408 - mean_absolute_error: 0.7204 - val_loss: 0.8314 - val_mean_absolute_error: 0.7134

Epoch 5/5                                   

Epoch 4/5                                                                              
 - 859s - loss: 0.8411 - mean_absolute_error: 0.7206 - val_loss: 0.8365 - val_mean_absolute_error: 0.7150

Epoch 5/5                                                                              
 - 863s - loss: 0.8376 - mean_absolute_error: 0.7185 - val_loss: 0.8628 - val_mean_absolute_error: 0.7301

Best loss of epoch:                                                                    
0.8365258524598952                                                                     
Train on 4507301 samples, validate on 237227 samples                                   
Epoch 1/5                                                                              
 - 990s - loss: 0.9129 - mean_absolute_error: 0.7591 - val_loss: 0.8634 - val_mean_absolute_error: 0.7338

Epoch 2/5                                                                              
 - 804s - loss: 0.8561 - mean_absolute_error: 0.7292 - val_loss

Epoch 5/5                                                                              
 - 801s - loss: 0.8355 - mean_absolute_error: 0.7172 - val_loss: 0.8393 - val_mean_absolute_error: 0.7201

Best loss of epoch:                                                                    
0.8351743343182192                                                                     
Train on 4507301 samples, validate on 237227 samples                                   
Epoch 1/5                                                                              
 - 804s - loss: 0.9123 - mean_absolute_error: 0.7590 - val_loss: 0.8621 - val_mean_absolute_error: 0.7309

Epoch 2/5                                                                              
 - 795s - loss: 0.8560 - mean_absolute_error: 0.7289 - val_loss: 0.8451 - val_mean_absolute_error: 0.7256

Epoch 3/5                                                                              
 - 814s - loss: 0.8446 - mean_absolute_error: 0.7226 - val_loss

Best loss of epoch:                                                                    
0.8247945574935982                                                                     
Train on 4507301 samples, validate on 237227 samples                                   
Epoch 1/5                                                                              
 - 1087s - loss: 0.9239 - mean_absolute_error: 0.7659 - val_loss: 0.8686 - val_mean_absolute_error: 0.7372

Epoch 2/5                                                                              
 - 941s - loss: 0.8605 - mean_absolute_error: 0.7322 - val_loss: 0.8497 - val_mean_absolute_error: 0.7285

Epoch 3/5                                                                              
 - 859s - loss: 0.8485 - mean_absolute_error: 0.7252 - val_loss: 0.8344 - val_mean_absolute_error: 0.7203

Epoch 4/5                                                                              
 - 854s - loss: 0.8422 - mean_absolute_error: 0.7215 - val_los

0.8287092217097966                                                                     
Train on 4507301 samples, validate on 237227 samples                                   
Epoch 1/5                                                                              
 - 1171s - loss: 0.9173 - mean_absolute_error: 0.7616 - val_loss: 0.8591 - val_mean_absolute_error: 0.7330

Epoch 2/5                                                                              
 - 1150s - loss: 0.8581 - mean_absolute_error: 0.7309 - val_loss: 0.8424 - val_mean_absolute_error: 0.7218

Epoch 3/5                                                                              
 - 1162s - loss: 0.8470 - mean_absolute_error: 0.7242 - val_loss: 0.8435 - val_mean_absolute_error: 0.7240

Epoch 4/5                                                                              
 - 1232s - loss: 0.8417 - mean_absolute_error: 0.7208 - val_loss: 0.8354 - val_mean_absolute_error: 0.7170

Epoch 5/5                               

Train on 4507301 samples, validate on 237227 samples                                    
Epoch 1/5                                                                               
 - 1670s - loss: 0.9180 - mean_absolute_error: 0.7621 - val_loss: 0.8663 - val_mean_absolute_error: 0.7343

Epoch 2/5                                                                               
 - 3292s - loss: 0.8615 - mean_absolute_error: 0.7322 - val_loss: 0.8501 - val_mean_absolute_error: 0.7231

Epoch 3/5                                                                               
 - 3597s - loss: 0.8487 - mean_absolute_error: 0.7249 - val_loss: 0.8459 - val_mean_absolute_error: 0.7237

Epoch 4/5                                                                               
 - 3473s - loss: 0.8424 - mean_absolute_error: 0.7211 - val_loss: 0.8371 - val_mean_absolute_error: 0.7166

Epoch 5/5                                                                               
 - 3453s - loss: 0.8385 - mean_abs

Epoch 4/5                                                                               
 - 1644s - loss: 0.8375 - mean_absolute_error: 0.7177 - val_loss: 0.8373 - val_mean_absolute_error: 0.7169

Epoch 5/5                                                                               
 - 1641s - loss: 0.8340 - mean_absolute_error: 0.7154 - val_loss: 0.8323 - val_mean_absolute_error: 0.7091

Best loss of epoch:                                                                     
0.8322584835010653                                                                      
Train on 4507301 samples, validate on 237227 samples                                    
Epoch 1/5                                                                               
 - 1514s - loss: 0.9196 - mean_absolute_error: 0.7637 - val_loss: 0.8803 - val_mean_absolute_error: 0.7413

Epoch 2/5                                                                               
 - 1507s - loss: 0.8632 - mean_absolute_error: 0.7344
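The score `model()` returns is the lowest validation loss over a trial's epochs, not the last one. One trial in the log above shows the difference: its epochs 4 and 5 report `val_loss` 0.8365 and 0.8628, and its `Best loss of epoch` is 0.8365258524598952, i.e. epoch 4. A minimal sketch with those two logged (rounded) values; the earlier epochs of that trial are cut off in the log.

```python
import numpy as np

# val_loss for epochs 4 and 5 of one trial, as printed (rounded) in the log
val_loss = [0.8365, 0.8628]

objective = np.amin(val_loss)   # what model() hands to hyperopt as 'loss'
final = val_loss[-1]            # score of the weights actually returned as 'model'
print(objective, final)
```

Because `fit` runs all five epochs, the `model` returned with this score holds the epoch-5 weights, which scored worse. Keeping the best-epoch weights would need something like `keras.callbacks.ModelCheckpoint(filepath, monitor='val_loss', save_best_only=True)`, a step this notebook does not include.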

In [8]:
print("Hyper-parameters of best model in order:")

print(best_run)

Hyper-parameters of best model in order:
{'Dense': 22, 'Dense_1': 24, 'Dense_2': 20, 'Dense_3': 'none', 'Dense_4': 18, 'Dense_5': 'sixth_layer', 'Dense_6': 16, 'Dense_7': 6, 'Dense_8': 8, 'activationChoice': 'PReLU'}
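Judging by the printed keys, hyperas names each `choice` after the identifier written just before it (`Dense(...)` or `activationChoice = ...`), numbers repeats in order of appearance, and gives a choice with no such identifier, like the two `if` switches, the previous name. That is why the layer switches show up as `Dense_3` and `Dense_5`. The sketch below maps the keys back to layer widths under that assumption; `architecture_from_best_run` is a hypothetical helper.

```python
def architecture_from_best_run(run, n_features=31):
    """Hypothetical helper: rebuild the layer widths chosen by hyperas.

    Assumes the best_run keys follow the order of the choices in model():
    Dense..Dense_2 = layers 2-4, Dense_3/Dense_5 = fifth/sixth-layer switches,
    Dense_4/Dense_6 = their widths, Dense_7/Dense_8 = last two hidden layers.
    """
    widths = [n_features, run['Dense'], run['Dense_1'], run['Dense_2']]
    if run['Dense_3'] == 'fifth_layer':
        widths.append(run['Dense_4'])
        # the sixth layer only exists inside the fifth-layer branch
        if run['Dense_5'] == 'sixth_layer':
            widths.append(run['Dense_6'])
    widths += [run['Dense_7'], run['Dense_8'], 1]   # 1 = linear output unit
    return widths, run['activationChoice']

best_run = {'Dense': 22, 'Dense_1': 24, 'Dense_2': 20, 'Dense_3': 'none',
            'Dense_4': 18, 'Dense_5': 'sixth_layer', 'Dense_6': 16,
            'Dense_7': 6, 'Dense_8': 8, 'activationChoice': 'PReLU'}
print(architecture_from_best_run(best_run))
```

Here the fifth-layer switch is `'none'`, so the sampled `Dense_4`, `Dense_5` and `Dense_6` values have no effect and the selected network is 31 → 22 → 24 → 20 → 6 → 8 → 1 with PReLU activations.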
