# Final Step: Hyperparameter Tuning

Fine-tuning the model on the 4square (Foursquare) dataset.

## Imports

In [1]:
import collections
import functools
import os
import time

import numpy as np
import tensorflow as tf
import pandas as pd

from tensorflow import feature_column
from tensorflow.keras import layers
import keras_tuner as kt
from sklearn.model_selection import train_test_split
from tqdm import tqdm
import import_ipynb

In [2]:
from model_helper import ModelHelper

importing Jupyter notebook from model_helper.ipynb


## DataSet

This time, only the zones used in the physical test with flowers are used for training.

In [3]:
df = pd.read_csv("./4square/processed_transformed.csv")
df.head(10)

Unnamed: 0,cat_id,user_id,clock_sin,clock_cos,day_sin,day_cos,month_sin,month_cos,week_day_sin,week_day_cos
0,0,470,-1.0,0.000654,0.587785,0.809017,0.866025,-0.5,0.781831,0.62349
1,1,979,-0.999998,0.001818,0.587785,0.809017,0.866025,-0.5,0.781831,0.62349
2,2,69,-0.999945,0.010472,0.587785,0.809017,0.866025,-0.5,0.781831,0.62349
3,3,395,-0.999931,0.011708,0.587785,0.809017,0.866025,-0.5,0.781831,0.62349
4,4,87,-0.999914,0.01309,0.587785,0.809017,0.866025,-0.5,0.781831,0.62349
5,5,484,-0.999848,0.017452,0.587785,0.809017,0.866025,-0.5,0.781831,0.62349
6,6,642,-0.999796,0.020215,0.587785,0.809017,0.866025,-0.5,0.781831,0.62349
7,7,292,-0.99979,0.020506,0.587785,0.809017,0.866025,-0.5,0.781831,0.62349
8,2,428,-0.999622,0.027485,0.587785,0.809017,0.866025,-0.5,0.781831,0.62349
9,8,877,-0.99962,0.027558,0.587785,0.809017,0.866025,-0.5,0.781831,0.62349
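
The `*_sin` / `*_cos` column pairs appear to be cyclical encodings of time features: a periodic value is mapped onto the unit circle so that, for example, December and January end up close together. The sketch below is a minimal illustration, not the preprocessing code of this project; the function name `cyclical_encode` and the periods 12 and 7 are assumptions. With month 4 and weekday index 1 it reproduces the `month_*` and `week_day_*` values shown in the first rows above.

```python
import math

def cyclical_encode(value, period):
    """Map a periodic value onto the unit circle as a (sin, cos) pair."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

# Assumed periods: 12 months and 7 weekdays (not taken from the preprocessing code).
month_sin, month_cos = cyclical_encode(4, 12)
week_day_sin, week_day_cos = cyclical_encode(1, 7)

print(round(month_sin, 6), round(month_cos, 6))        # 0.866025 -0.5
print(round(week_day_sin, 6), round(week_day_cos, 6))  # 0.781831 0.62349
```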


In [4]:
# the number of different locations defines the vocabulary size
categories = df.cat_id
vocab_size = categories.nunique()

print('vocabulary size:', vocab_size)

vocabulary size: 27


Initialize the ModelHelper and set the required parameters, such as the column names and the vocabulary size.

In [5]:
# 17 is presumably the total window length N; the model inputs below have N-1 = 16 steps
mh = ModelHelper(df, 17)

In [6]:
mh.set_target_column_name('cat_id')
mh.set_vocab_size(vocab_size)

numerical_column_names = ['clock_sin', 'clock_cos', 'day_sin', 'day_cos', 'month_sin', 'month_cos', 'week_day_sin', 'week_day_cos']
column_names = ['cat_id'] + numerical_column_names
mh.set_column_names(column_names)
mh.set_numerical_column_names(numerical_column_names)

mh.set_client_column_name('user_id')
mh.set_client_column_ids()

In [7]:
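count = df.user_id.value_counts()

# Keep the 100 most active users (value_counts sorts by count, descending).
# This is a top-100 selection, not a "count >= 100" threshold.
# Note: this only rebinds `df`; `mh` was already created from the unfiltered frame.
idx = count.index[:100]
df = df.loc[df.user_id.isin(idx)]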
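
Slicing `count.index[:100]` selects the 100 most active users, which is not the same as keeping every user with at least 100 check-ins. The sketch below shows the difference on a toy frame; the user ids and counts are made up for illustration.

```python
import pandas as pd

# Toy check-in log; the user ids and counts are invented for illustration.
toy = pd.DataFrame({"user_id": [1] * 5 + [2] * 3 + [3] * 2 + [4]})
counts = toy.user_id.value_counts()  # sorted by count, descending

top_2_users = counts.index[:2]              # the 2 most active users
frequent_users = counts[counts >= 2].index  # every user with at least 2 check-ins

top_2 = toy.loc[toy.user_id.isin(top_2_users)]
frequent = toy.loc[toy.user_id.isin(frequent_users)]

print(sorted(top_2.user_id.unique()))     # users 1 and 2
print(sorted(frequent.user_id.unique()))  # users 1, 2 and 3
```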

An array is created containing all visited locations for every user.
The original data is sorted by time (ascending).
Thus, the array contains a sequence of visited locations by user.

In [8]:
mh.create_users_locations_from_df()

100%|██████████| 1083/1083 [00:00<00:00, 2276.45it/s]


[       cat_id  user_id  clock_sin  clock_cos       day_sin   day_cos  \
 0           0      470  -1.000000   0.000654  5.877853e-01  0.809017   
 626        11      470  -0.102069   0.994777  5.877853e-01  0.809017   
 650        11      470  -0.057709   0.998333  5.877853e-01  0.809017   
 652        11      470  -0.055531   0.998457  5.877853e-01  0.809017   
 654        11      470  -0.053135   0.998587  5.877853e-01  0.809017   
 ...       ...      ...        ...        ...           ...       ...   
 60782       0      470  -1.000000   0.000800  7.431448e-01 -0.669131   
 64836      22      470  -0.966170  -0.257906  4.067366e-01 -0.913545   
 64980       0      470  -1.000000   0.000873  4.067366e-01 -0.913545   
 67196       0      470  -1.000000   0.000800  2.079117e-01 -0.978148   
 69288       0      470  -1.000000   0.000945  5.665539e-16 -1.000000   
 
        month_sin  month_cos  week_day_sin  week_day_cos  
 0       0.866025  -0.500000      0.781831      0.623490  
 626

The data must first be split into training, validation, and test sets separately for each user.
The per-user splits are then merged again.
This keeps each user's sequence intact instead of splitting its rows at random.
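
The idea can be sketched as follows. This is not the code of `concat_split_users_locations`, whose body is not shown here; the helper `split_user`, the toy frame, and the split fractions are assumptions chosen for illustration.

```python
import pandas as pd

def split_user(user_df, train_frac=0.64, val_frac=0.16):
    """Cut one user's time-ordered rows into contiguous train/val/test parts."""
    n = len(user_df)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return user_df.iloc[:train_end], user_df.iloc[train_end:val_end], user_df.iloc[val_end:]

# Toy data: two users with ten time-ordered visits each (hypothetical values).
toy = pd.DataFrame({
    "user_id": [1] * 10 + [2] * 10,
    "cat_id": list(range(10)) + list(range(10, 20)),
})

# Split every user on their own, then merge the parts across users.
parts = [split_user(group) for _, group in toy.groupby("user_id", sort=False)]
df_train = pd.concat([p[0] for p in parts])
df_val = pd.concat([p[1] for p in parts])
df_test = pd.concat([p[2] for p in parts])

print(len(df_train), len(df_val), len(df_test))  # 12 2 6
```

Because each part is a contiguous slice, the order of visits within a user is preserved in all three sets.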

In [9]:
mh.concat_split_users_locations()

In [10]:
print(len(mh.df_train), 'train examples')
print(len(mh.df_val), 'validation examples')
print(len(mh.df_test), 'test examples')

144763 train examples
36729 validation examples
45936 test examples


Split the data and create the batch datasets.

In [11]:
#mh.split_data_sliding()
mh.split_data()
print(len(mh.list_test))
mh.list_test[0]

2703


Unnamed: 0,cat_id,clock_sin,clock_cos,day_sin,day_cos,month_sin,month_cos,week_day_sin,week_day_cos
31429,23,0.27032,0.96277,-0.994522,0.104528,0.866025,-0.5,0.0,1.0
31480,23,0.439613,0.898187,-0.994522,0.104528,0.866025,-0.5,0.0,1.0
31728,22,0.953323,-0.301954,-0.994522,0.104528,0.866025,-0.5,0.0,1.0
31729,22,0.952971,-0.303063,-0.994522,0.104528,0.866025,-0.5,0.0,1.0
31916,22,0.057782,-0.998329,-0.994522,0.104528,0.866025,-0.5,0.0,1.0
32041,22,-0.153561,-0.988139,-0.994522,0.104528,0.866025,-0.5,0.0,1.0
32044,22,-0.158086,-0.987425,-0.994522,0.104528,0.866025,-0.5,0.0,1.0
32045,22,-0.164976,-0.986298,-0.994522,0.104528,0.866025,-0.5,0.0,1.0
32052,22,-0.170066,-0.985433,-0.994522,0.104528,0.866025,-0.5,0.0,1.0
32055,22,-0.173147,-0.984896,-0.994522,0.104528,0.866025,-0.5,0.0,1.0


In [12]:
mh.set_batch_size(16)
mh.create_and_batch_datasets(multi_target=False)
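
The windowing and batching steps can be pictured roughly like this. The sketch is NumPy-only and makes several assumptions: that `split_data` cuts disjoint windows while `split_data_sliding` cuts overlapping ones, that the last item of each window is the prediction target, and that incomplete batches are dropped (a stateful GRU needs a fixed batch size). The function `make_windows` is hypothetical and not part of `ModelHelper`.

```python
import numpy as np

def make_windows(sequence, window_length, sliding=False):
    """Cut a 1-D sequence into windows of `window_length` items."""
    step = 1 if sliding else window_length
    last_start = len(sequence) - window_length
    return np.array([sequence[i:i + window_length]
                     for i in range(0, last_start + 1, step)])

seq = np.arange(40)                                    # stand-in for one user's visits
windows = make_windows(seq, window_length=17)          # disjoint windows
sliding_windows = make_windows(seq, 17, sliding=True)  # overlapping windows

# First 16 steps are the input, the 17th is the single target.
inputs, targets = windows[:, :-1], windows[:, -1]

# Keep only full batches of 16 windows.
batch_size = 16
n_full = (len(sliding_windows) // batch_size) * batch_size
batches = sliding_windows[:n_full].reshape(-1, batch_size, 17)

print(windows.shape, sliding_windows.shape, batches.shape)
```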

For Keras Tuner to work, the model-building function has to change.
It now receives `hp`, a container for the hyperparameter search spaces of Keras Tuner.
Through it, new search spaces can be declared, which Keras Tuner then explores.
Three spaces are defined in the `create_keras_model` function: one for the size of the embedding layer and one for the number of units in each of the two GRU layers.

In [13]:
# Create a model
def create_keras_model(hp):
  N = mh.total_window_length
  batch_size = mh.batch_size
  number_of_places = mh.vocab_size

  # Shortcut to the layers package
  l = tf.keras.layers

  # List of numeric feature columns
  # NOTE: this list is built but not used below; the per-feature loop further down creates the actual inputs
  numeric_feature_columns = []

  # Handling numerical columns
  for header in numerical_column_names:
    # Append all the numerical columns defined into the list
    numeric_feature_columns.append(feature_column.numeric_column(header, shape=N-1))

  feature_inputs={}
  for c_name in numerical_column_names:
    feature_inputs[c_name] = tf.keras.Input((N-1,), batch_size=batch_size, name=c_name)

  # We cannot use an array of features as always because we have sequences
  # We have to do one by one in order to match the shape
  num_features = []
  for c_name in numerical_column_names:
    f =  feature_column.numeric_column(c_name, shape=(N-1))
    feature = l.DenseFeatures(f)(feature_inputs)
    feature = tf.expand_dims(feature, -1)
    num_features.append(feature)

  # Declare the dictionary for the locations sequence as before
  sequence_input = {
      'cat_id': tf.keras.Input((N-1,), batch_size=batch_size, dtype=tf.dtypes.int32, name='cat_id') # add batch_size=batch_size in case of stateful GRU
  }

  # Handling the categorical feature sequence using one-hot
  cat_one_hot = feature_column.sequence_categorical_column_with_vocabulary_list(
      'cat_id', [i for i in range(number_of_places)])

  # Embed the categorical sequence; the embedding dimension is a tunable hyperparameter
  hp_embedding = hp.Int('embedding', min_value=16, max_value=256, step=48)
  cat_feature = feature_column.embedding_column(cat_one_hot, hp_embedding)

  # With an input sequence we can't use the DenseFeature layer, we need to use the SequenceFeatures
  sequence_features, sequence_length = tf.keras.experimental.SequenceFeatures(cat_feature)(sequence_input)


  input_sequence = l.Concatenate(axis=2)([sequence_features] + num_features)

  # Rnn
  hp_rnn_units1 = hp.Int('rnn_units1', min_value=32, max_value=512, step=32)
  recurrent = l.GRU(hp_rnn_units1,
                    batch_size=batch_size, #in case of stateful
                    return_sequences=True,
                    stateful=True,
                    recurrent_initializer='glorot_uniform')(input_sequence)

  hp_rnn_units2 = hp.Int('rnn_units2', min_value=32, max_value=512, step=32)
  recurrent_2 = l.GRU(hp_rnn_units2,
                      batch_size=batch_size, #in case of stateful
                      stateful=True,
                      recurrent_initializer='glorot_uniform')(recurrent)


  # Softmax output layer
  # Last layer with one output per place
  output = layers.Dense(number_of_places, activation='softmax')(recurrent_2)


  # To return the Model, we need to define its inputs and outputs
  # In our case, we need to list all the input layers we have defined
  inputs = list(feature_inputs.values()) + list(sequence_input.values())

  # Return the Model
  return tf.keras.Model(inputs=inputs, outputs=output)

Before compiling the model, two more hyperparameter spaces are added: the learning rate and the optimizer.

In [14]:
def compile_keras_model(hp, model):
    # Tunable learning rate and optimizer
    hp_lr = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])
    hp_optimizer = hp.Choice('optimizer', values=["adam", "sgd", "adamax"])
    # Build the optimizer from its name, then overwrite its default learning rate
    optimizer = tf.keras.optimizers.get(hp_optimizer)
    optimizer.learning_rate.assign(hp_lr)
    # Compile the model
    model.compile(optimizer=optimizer,
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                  metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
    return model
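
To get a feeling for how large the combined search space is, the ranges declared in `create_keras_model` and `compile_keras_model` can be enumerated in plain Python. The sketch assumes that `hp.Int` treats `max_value` as inclusive; the helper `int_range` is hypothetical and only mimics that behaviour.

```python
def int_range(min_value, max_value, step):
    """Values of an integer range with an inclusive upper bound."""
    return list(range(min_value, max_value + 1, step))

# Same names and bounds as the hp.Int / hp.Choice calls above.
search_space = {
    "embedding": int_range(16, 256, 48),
    "rnn_units1": int_range(32, 512, 32),
    "rnn_units2": int_range(32, 512, 32),
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "optimizer": ["adam", "sgd", "adamax"],
}

# Size of the full grid: product of the number of values per hyperparameter.
n_combinations = 1
for values in search_space.values():
    n_combinations *= len(values)

print(search_space["embedding"])  # [16, 64, 112, 160, 208, 256]
print(n_combinations)             # 13824
```

A grid of this size is too large to train exhaustively, which is why a tuner samples or prunes configurations instead of trying every combination.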

In [15]:
mh.create_tuner(create_keras_model, compile_keras_model)

INFO:tensorflow:Reloading Oracle from existing project keras_tuner\mobility_prediction\oracle.json
INFO:tensorflow:Reloading Tuner from keras_tuner\mobility_prediction\tuner0.json


In [23]:
mh.tuner_search()

Trial 42 Complete [00h 01m 35s]
val_sparse_categorical_accuracy: 0.18287037312984467

Best val_sparse_categorical_accuracy So Far: 0.2606481611728668
Total elapsed time: 00h 24m 12s
INFO:tensorflow:Oracle triggered exit


In [17]:
print(f"""
The hyperparameter search is complete.
The optimal number of rnn units in the first GRU layer is {mh.best_hps.get('rnn_units1')}.
The optimal number of rnn units in the second GRU layer is {mh.best_hps.get('rnn_units2')}.
The optimal optimizer is {mh.best_hps.get('optimizer')}.
The optimal learning rate for the optimizer is {mh.best_hps.get('learning_rate')}.
The optimal embedding dimensionality is {mh.best_hps.get('embedding')}.
""")


The hyperparameter search is complete.
The optimal number of rnn units in the first GRU layer is 224.
The optimal number of rnn units in the second GRU layer is 128.
The optimal optimizer is sgd.
The optimal learning rate for the optimizer is 0.01.
The optimal embedding dimensionality is 64.



In [18]:
mh.tuner_find_best_epoch()

Epoch 1/20
...
Epoch 20/20
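
The body of `tuner_find_best_epoch` is not shown. Judging by its name and by the 19-epoch run in the next cell, it presumably trains for the maximum number of epochs and keeps the epoch with the best validation metric. A minimal sketch of that selection, with a hypothetical `best_epoch` helper and made-up accuracies:

```python
def best_epoch(val_accuracy_per_epoch):
    """Return the 1-based epoch with the highest validation accuracy."""
    indices = range(len(val_accuracy_per_epoch))
    best_index = max(indices, key=val_accuracy_per_epoch.__getitem__)
    return best_index + 1

# Made-up validation accuracies, NOT results from this notebook.
history = [0.10, 0.14, 0.19, 0.18, 0.17]
print(best_epoch(history))  # 3
```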


In [19]:
mh.tuner_eval_model()

Epoch 1/19
...
Epoch 19/19


In [20]:
mh.model.summary()

Model: "functional_5"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
clock_cos (InputLayer)          [(16, 16)]           0                                            
__________________________________________________________________________________________________
clock_sin (InputLayer)          [(16, 16)]           0                                            
__________________________________________________________________________________________________
day_cos (InputLayer)            [(16, 16)]           0                                            
__________________________________________________________________________________________________
day_sin (InputLayer)            [(16, 16)]           0                                            
_______________________________________________________________________________________

In [21]:
mh.set_num_epochs(15)
mh.fit_model()

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 00007: early stopping


In [22]:
mh.evaluate_model()



After training the model with the tuned hyperparameters, the accuracy is very similar to that of the earlier models, which reused the hyperparameters from the preceding work.
Since those values were themselves the result of hyperparameter tuning, it can be assumed that further fine-tuning will not yield a substantial improvement.