## Hyperas tuning on patent text cosine similarities

The model architecture starts from the number of features (31); tuning then searches over both width and depth. The search also:
    - optionally adds one or two layers, each with a width lower than or similar to the layer before it
    - offers a choice of two activation functions, PReLU (whose negative slope is learned) and ELU (whose negative-side scale is fixed), both of which keep a nonzero gradient for negative inputs and so avoid dead neurons
    - reports the validation loss at the end of every trial, a more reliable metric than accuracy for a regression target

After this preliminary tuning script chooses the hyperparameters, training will use them with the Adam optimizer.
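
To make the dead-neuron point concrete, here is a minimal sketch in plain Python (not the Keras layers themselves) comparing the gradients of ReLU, PReLU and ELU for a negative input. The function names and the slope value `alpha=0.25` are illustrative assumptions; in Keras, PReLU learns its `alpha` during training, while ELU uses a fixed `alpha` (1.0 by default).

```python
import math

def relu_grad(x):
    """Gradient of ReLU: exactly zero for negative inputs."""
    return 1.0 if x > 0 else 0.0

def prelu_grad(x, alpha):
    """Gradient of PReLU with respect to x: the slope alpha for negative inputs."""
    return 1.0 if x > 0 else alpha

def elu_grad(x, alpha=1.0):
    """Gradient of ELU: alpha * exp(x) for negative inputs, small but never zero."""
    return 1.0 if x > 0 else alpha * math.exp(x)

x = -2.0
grads = {
    'ReLU': relu_grad(x),
    'PReLU': prelu_grad(x, alpha=0.25),  # 0.25 is an arbitrary illustrative slope
    'ELU': elu_grad(x),
}
for name, grad in grads.items():
    print(f'{name}: gradient at x={x} is {grad:.4f}')
```

A unit whose inputs are always negative gets no weight updates under ReLU, whereas the other two still pass some gradient back.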

In [1]:
import os

import subprocess
import platform
import datetime as dt
import pickle
import joblib

import numpy as np
from pandas import read_csv

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

from hyperas import optim
from hyperas.distributions import choice, uniform
from hyperopt import (Trials, 
                      STATUS_OK, 
                      tpe)

import keras
from keras.layers.core import (Dense, 
                               Dropout, 
                               Activation)
from keras.layers import (BatchNormalization, 
                          PReLU, 
                          ELU) 
from keras.models import Sequential
from keras.utils import np_utils

Using TensorFlow backend.

In [None]:
# Use an AMD GPU through PlaidML and Metal. KERAS_BACKEND must be set before keras is
# first imported for it to take effect.
# (PlaidML stopped working for me after updating macOS :( )
os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"

In [2]:
# Reproducibility code - only basic information is stored, since the sheer number of
# parameters in a neural network makes true reproducibility nearly impossible
# from: https://making.dia.com/abc-always-be-benchmarking-5854252fc6b1

# save python environment when run
with open('hyperas_cosine_similarities_patent_tuning_environment.txt', 'wb') as f:
    # pass the command as a list without shell=True so that 'list' actually reaches pip
    f.write(subprocess.check_output(['pip', 'list']))
# underlying platform
system = platform.uname()
# python version
python_version = platform.python_version()
# date and time of run
date = dt.datetime.today().strftime("%Y-%m-%d %H:%M:%S")

config = {
    'system': system,
    'python_version': python_version,
    'date': date
}

# use a context manager so the file is flushed and closed
with open('hyperas_cosine_similarities_patent_tuning_config.p', 'wb') as f:
    pickle.dump(config, f)
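
Reading the saved configuration back is the mirror image of the cell above. The sketch below is only one way this might be done: it writes a small config dict of plain strings to a temporary file (the file name `example_tuning_config.p` and the `machine` key are hypothetical) and loads it again.

```python
import datetime as dt
import os
import pickle
import platform
import tempfile

# Plain strings only, so loading does not depend on any custom classes
config = {
    'system': platform.system(),
    'machine': platform.machine(),
    'python_version': platform.python_version(),
    'date': dt.datetime.today().strftime("%Y-%m-%d %H:%M:%S"),
}

# Hypothetical file name, written to a temporary directory for this sketch
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example_tuning_config.p')
    with open(path, 'wb') as f:
        pickle.dump(config, f)
    with open(path, 'rb') as f:
        restored = pickle.load(f)

print(restored['python_version'], restored['date'])
```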

In [3]:
FEATURES = 'Patent_text_independent_variables-11-13-19.csv'
TARGET = 'Patent_count_y-11-13-19.csv'

def data(features: str=None,
         target: str=None):
    """Import data, split it into train and test sets, and scale the features to [0, 1]"""
    features = read_csv(features, header=0)
    target = read_csv(target, header=0)
    train_x, test_x, train_y, test_y = train_test_split(features,
                                                        target,
                                                        test_size=0.05,
                                                        random_state=1)
    x_scaler = MinMaxScaler().fit(train_x)
    train_x = x_scaler.transform(train_x)
    test_x = x_scaler.transform(test_x)
    joblib.dump(x_scaler, "Patent_text_cosine_similarity_data_MinMaxScaler.save")
    
    return train_x, train_y, test_x, test_y
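
Because the scaler is fit on the training split only, transformed test values are not guaranteed to lie in [0, 1]. Below is a minimal pure-Python sketch of the same fit-then-transform idea. The helper names and toy numbers are made up for illustration; the notebook itself uses scikit-learn's `MinMaxScaler`.

```python
def fit_min_max(rows):
    """Return per-column minima and maxima of the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def transform(rows, mins, maxs):
    """Scale each column with the minima and maxima learned from training data."""
    scaled = []
    for row in rows:
        scaled.append([(v - lo) / ((hi - lo) or 1.0)  # guard against constant columns
                       for v, lo, hi in zip(row, mins, maxs)])
    return scaled

train = [[1.0, 0.0], [3.0, 1.0], [2.0, 0.0]]
test = [[4.0, 1.0]]  # first feature lies above the training maximum

mins, maxs = fit_min_max(train)
train_scaled = transform(train, mins, maxs)
test_scaled = transform(test, mins, maxs)
print(train_scaled)
print(test_scaled)
```

Fitting on the training rows alone keeps information about the test split out of preprocessing, at the cost of occasional out-of-range test values.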

In [4]:
def model(train_x, 
          train_y, 
          test_x, 
          test_y):
    """Define model"""
    
    '''
    Activation choices - PReLU or ELU. PReLU learns its 
    negative slope and ELU uses a fixed one; both keep a 
    nonzero gradient for negative inputs, which helps 
    avoid dead neurons--a problem with ReLU and this 
    data since there are many binary variables
    '''
    def Activation(name):
        """helper needed to use advanced activation layers with Hyperas"""
        if name == 'PReLU':
            return keras.layers.PReLU()
        if name == 'ELU':
            return keras.layers.ELU()
    
    activation_choice = {{choice(['PReLU', 'ELU'])}}
        
    model = Sequential() 
    model.add(Dense(31, input_shape=(31,), kernel_initializer='he_normal'))
    model.add(Activation(activation_choice))
    model.add(Dense({{choice([22, 24, 26])}}, kernel_initializer='he_normal'))
    model.add(Activation(activation_choice))
    model.add(Dense({{choice([20, 22, 24])}}, kernel_initializer='he_normal'))
    model.add(Activation(activation_choice))
    model.add(Dense({{choice([18, 20, 22])}}, kernel_initializer='he_normal'))
    model.add(Activation(activation_choice))
    
    # Add a fifth layer?
    if {{choice(['none', 'fifth_layer'])}} == 'fifth_layer':
        model.add(Dense({{choice([12, 14, 16, 18])}}, kernel_initializer='he_normal'))
        model.add(Activation(activation_choice))                
        
        # if we add a fifth layer, maybe add a less dense sixth layer?
        if {{choice(['none', 'sixth_layer'])}} == 'sixth_layer':
            model.add(Dense({{choice([10, 12, 14, 16])}}, kernel_initializer='he_normal'))
            model.add(Activation(activation_choice))    
    
    model.add(Dense({{choice([6, 8, 12, 14])}}, kernel_initializer='he_normal'))
    model.add(Activation(activation_choice))
    model.add(Dense({{choice([4, 6, 8, 12])}}, kernel_initializer='he_normal'))
    model.add(Activation(activation_choice))
    model.add(Dense(1, kernel_initializer='he_normal'))
    
    model.compile(loss='mean_squared_error', 
                  metrics=['mae'], 
                  optimizer='adam')

    result = model.fit(train_x, 
                       train_y,
                       batch_size=16,
                       epochs=5,
                       verbose=2,
                       validation_split=0.05)
    
    # get the lowest validation loss across the training epochs of this trial
    validation_loss = np.amin(result.history['val_loss']) 
    print('Best loss of epoch:', validation_loss)
    
    return {'loss': validation_loss, 'status': STATUS_OK, 'model': model}
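
Note that the returned `loss` is the minimum over epochs, while the returned `model` holds the weights after the final epoch, so the two need not correspond. The sketch below uses the validation losses of epochs 2 to 5 from one trial in the output further down (epoch 1 of that trial is cut off in the log); the variable names are illustrative.

```python
# Validation losses by epoch, copied from one trial's log (epoch 1 is truncated there)
val_loss_by_epoch = {2: 0.0648, 3: 0.0656, 4: 0.0645, 5: 0.0653}

best_epoch = min(val_loss_by_epoch, key=val_loss_by_epoch.get)
best_loss = val_loss_by_epoch[best_epoch]
final_epoch = max(val_loss_by_epoch)
final_loss = val_loss_by_epoch[final_epoch]

print(f'reported to Hyperas: {best_loss} (epoch {best_epoch})')
print(f'weights in returned model: epoch {final_epoch}, val_loss {final_loss}')
```

If the returned model should match the reported loss, one option would be to report the final epoch's loss instead, or to checkpoint the best epoch during training.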

In [5]:
train_x, train_y, test_x, test_y = data(FEATURES, TARGET)

In [6]:
# 100 trials should be enough since a Tree-structured Parzen Estimator (TPE) is used to
# choose hyperparameters -- choices that gave a poor loss are sampled less often in
# later trials. The optimizer itself is not tuned.
best_run, best_model = optim.minimize(model=model,
                                      data=data,
                                      algo=tpe.suggest,
                                      max_evals=100,
                                      trials=Trials(),
                                      eval_space=True,
                                      notebook_name='Hyperas_tuning_for_patent_counts')
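
For a sense of scale, the number of distinct architectures this search space can produce can be counted directly from the `choice` lists in `model`. This is a back-of-the-envelope sketch; the variable names are mine.

```python
activation_options = 2            # PReLU or ELU
fixed_layer_options = 3 * 3 * 3   # three tuned hidden layers, three widths each
fifth_widths = 4
sixth_widths = 4
final_layer_options = 4 * 4       # last two tuned hidden layers, four widths each

# no extra layer, or a fifth layer alone, or a fifth layer followed by a sixth
optional_layer_options = 1 + fifth_widths * (1 + sixth_widths)

distinct_architectures = (activation_options * fixed_layer_options
                          * optional_layer_options * final_layer_options)
max_evals = 100
print(distinct_architectures)
print(f'{max_evals / distinct_architectures:.2%} of the space is sampled')
```

The 100 trials therefore cover only a small fraction of the space, which is why a guided search such as TPE is used rather than an exhaustive grid.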

>>> Imports:
#coding=utf-8

try:
    import os
except:
    pass

try:
    import numpy as np
except:
    pass

try:
    from pandas import read_csv
except:
    pass

try:
    import subprocess
except:
    pass

try:
    import platform
except:
    pass

try:
    import datetime as dt
except:
    pass

try:
    import pickle
except:
    pass

try:
    import joblib
except:
    pass

try:
    from sklearn.model_selection import train_test_split
except:
    pass

try:
    from sklearn.preprocessing import MinMaxScaler
except:
    pass

try:
    from hyperas import optim
except:
    pass

try:
    from hyperas.distributions import choice, uniform
except:
    pass

try:
    from hyperopt import Trials, STATUS_OK, tpe
except:
    pass

try:
    import keras
except:
    pass

try:
    from keras.layers.core import Dense, Dropout, Activation
except:
    pass

try:
    from keras.layers import BatchNormalization, PReLU, ELU
except:
    pass

try:
    from keras.models import Sequential
except:
    pass
...


 - 1002s - loss: 0.0659 - mean_absolute_error: 0.1824 - val_loss: 0.0654 - val_mean_absolute_error: 0.1850

Epoch 5/5                                                                                
 - 1060s - loss: 0.0656 - mean_absolute_error: 0.1818 - val_loss: 0.0651 - val_mean_absolute_error: 0.1750

Best loss of epoch:                                                                      
0.06495177080754577                                                                      
Train on 4507301 samples, validate on 237227 samples                                     
Epoch 1/5                                                                                
 - 561s - loss: 0.0780 - mean_absolute_error: 0.2045 - val_loss: 0.0676 - val_mean_absolute_error: 0.1866

Epoch 2/5                                                                                
 - 536s - loss: 0.0669 - mean_absolute_error: 0.1836 - val_loss: 0.0658 - val_mean_absolute_error: 0.1832

Epoch 3/5
...

Epoch 3/5                                                                                  
 - 749s - loss: 0.0669 - mean_absolute_error: 0.1851 - val_loss: 0.0658 - val_mean_absolute_error: 0.1758

Epoch 4/5                                                                                  
 - 749s - loss: 0.0664 - mean_absolute_error: 0.1840 - val_loss: 0.0671 - val_mean_absolute_error: 0.1800

Epoch 5/5                                                                                  
 - 757s - loss: 0.0659 - mean_absolute_error: 0.1828 - val_loss: 0.0647 - val_mean_absolute_error: 0.1787

Best loss of epoch:                                                                        
0.06467627899550073                                                                        
Train on 4507301 samples, validate on 237227 samples                                       
Epoch 1/5                                                                                  
 - 766s - loss: 0.0786 - mean_absol
...

 - 906s - loss: 0.0788 - mean_absolute_error: 0.2046 - val_loss: 0.0745 - val_mean_absolute_error: 0.1915

Epoch 2/5                                                                                  
 - 901s - loss: 0.0687 - mean_absolute_error: 0.1879 - val_loss: 0.0665 - val_mean_absolute_error: 0.1802

Epoch 3/5                                                                                  
 - 904s - loss: 0.0669 - mean_absolute_error: 0.1845 - val_loss: 0.0660 - val_mean_absolute_error: 0.1861

Epoch 4/5                                                                                  
 - 905s - loss: 0.0661 - mean_absolute_error: 0.1829 - val_loss: 0.0677 - val_mean_absolute_error: 0.1907

Epoch 5/5                                                                                  
 - 995s - loss: 0.0658 - mean_absolute_error: 0.1823 - val_loss: 0.0646 - val_mean_absolute_error: 0.1839

Best loss of epoch:                                                                        
0.064
...

0.06461547854240132                                                                        
Train on 4507301 samples, validate on 237227 samples                                       
Epoch 1/5                                                                                  
 - 817s - loss: 0.0783 - mean_absolute_error: 0.2037 - val_loss: 0.0679 - val_mean_absolute_error: 0.1868

Epoch 2/5                                                                                  
 - 813s - loss: 0.0673 - mean_absolute_error: 0.1844 - val_loss: 0.0653 - val_mean_absolute_error: 0.1790

Epoch 3/5                                                                                  
 - 810s - loss: 0.0659 - mean_absolute_error: 0.1815 - val_loss: 0.0648 - val_mean_absolute_error: 0.1761

Epoch 4/5                                                                                  
 - 813s - loss: 0.0653 - mean_absolute_error: 0.1804 - val_loss: 0.0646 - val_mean_absolute_error: 0.1784

Epoch 5/5
...

Epoch 5/5                                                                                  
 - 864s - loss: 0.0651 - mean_absolute_error: 0.1797 - val_loss: 0.0644 - val_mean_absolute_error: 0.1727

Best loss of epoch:                                                                        
0.06440287490519031                                                                        
Train on 4507301 samples, validate on 237227 samples                                       
Epoch 1/5                                                                                  
 - 875s - loss: 0.0791 - mean_absolute_error: 0.2067 - val_loss: 0.0690 - val_mean_absolute_error: 0.1905

Epoch 2/5                                                                                  
 - 866s - loss: 0.0681 - mean_absolute_error: 0.1879 - val_loss: 0.0689 - val_mean_absolute_error: 0.1919

Epoch 3/5                                                                                  
 - 867s - loss: 0.0667 - mean_absol
...

 - 981s - loss: 0.0666 - mean_absolute_error: 0.1839 - val_loss: 0.0686 - val_mean_absolute_error: 0.1853

Epoch 4/5                                                                                  
 - 979s - loss: 0.0659 - mean_absolute_error: 0.1823 - val_loss: 0.0654 - val_mean_absolute_error: 0.1854

Epoch 5/5                                                                                  
 - 979s - loss: 0.0655 - mean_absolute_error: 0.1813 - val_loss: 0.0639 - val_mean_absolute_error: 0.1769

Best loss of epoch:                                                                        
0.06392346696624854                                                                        
Train on 4507301 samples, validate on 237227 samples                                       
Epoch 1/5                                                                                  
 - 988s - loss: 0.0794 - mean_absolute_error: 0.2068 - val_loss: 0.0694 - val_mean_absolute_error: 0.1932

Epoch 2/5
...

Epoch 2/5                                                                                   
 - 1137s - loss: 0.0675 - mean_absolute_error: 0.1863 - val_loss: 0.0648 - val_mean_absolute_error: 0.1792

Epoch 3/5                                                                                   
 - 1158s - loss: 0.0662 - mean_absolute_error: 0.1834 - val_loss: 0.0656 - val_mean_absolute_error: 0.1834

Epoch 4/5                                                                                   
 - 2812s - loss: 0.0656 - mean_absolute_error: 0.1818 - val_loss: 0.0645 - val_mean_absolute_error: 0.1812

Epoch 5/5                                                                                   
 - 1564s - loss: 0.0653 - mean_absolute_error: 0.1810 - val_loss: 0.0653 - val_mean_absolute_error: 0.1748

Best loss of epoch:                                                                         
0.0644506305758485                                                                          
Train on 4
...

0.06535769468491341                                                                        
Train on 4507301 samples, validate on 237227 samples                                       
Epoch 1/5                                                                                  
 - 1409s - loss: 0.0769 - mean_absolute_error: 0.2025 - val_loss: 0.0669 - val_mean_absolute_error: 0.1826

Epoch 2/5                                                                                  
 - 1434s - loss: 0.0683 - mean_absolute_error: 0.1877 - val_loss: 0.0666 - val_mean_absolute_error: 0.1814

Epoch 3/5                                                                                  
 - 1365s - loss: 0.0676 - mean_absolute_error: 0.1857 - val_loss: 0.0673 - val_mean_absolute_error: 0.1849

Epoch 4/5                                                                                  
 - 1371s - loss: 0.0675 - mean_absolute_error: 0.1849 - val_loss: 0.0659 - val_mean_absolute_error: 0.1849

Epoch 5/5
...

In [8]:
print("Hyper-parameters of best model in order:")

print(best_run)

Hyper-parameters of best model in order:
{'Dense': 22, 'Dense_1': 24, 'Dense_2': 20, 'Dense_3': 'fifth_layer', 'Dense_4': 18, 'Dense_5': 'sixth_layer', 'Dense_6': 16, 'Dense_7': 6, 'Dense_8': 8, 'activationChoice': 'PReLU'}
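
The keys in `best_run` follow the order in which the `{{choice(...)}}` templates appear in `model`. Here is a sketch of turning that dictionary into the list of layer widths, assuming that `Dense_3` and `Dense_5` are the fifth- and sixth-layer switches (which matches their values above). The parameter count covers the Dense layers only and leaves out the PReLU slopes.

```python
best_run = {'Dense': 22, 'Dense_1': 24, 'Dense_2': 20, 'Dense_3': 'fifth_layer',
            'Dense_4': 18, 'Dense_5': 'sixth_layer', 'Dense_6': 16, 'Dense_7': 6,
            'Dense_8': 8, 'activationChoice': 'PReLU'}

widths = [31]  # first hidden layer is fixed at the number of features
widths += [best_run['Dense'], best_run['Dense_1'], best_run['Dense_2']]
if best_run['Dense_3'] == 'fifth_layer':
    widths.append(best_run['Dense_4'])
    if best_run['Dense_5'] == 'sixth_layer':
        widths.append(best_run['Dense_6'])
widths += [best_run['Dense_7'], best_run['Dense_8'], 1]  # 1 = output unit

# weights plus biases of each Dense layer; the first layer takes 31 inputs
inputs = [31] + widths[:-1]
dense_params = sum(n_in * n_out + n_out for n_in, n_out in zip(inputs, widths))
print(widths)
print(dense_params)
```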
