2 Model Architectures

Hyperband Parameters

Maximum Number of Epochs: 30
Hyperband Iterations: 1
Stopping Strategy: Early stopping monitoring validation loss.
                   Minimum delta: 0.0001.
                   Patience: 5 epochs.
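
As a reference, here is a minimal sketch of how these settings map onto a Hyperband search. It assumes Keras Tuner and TensorFlow, which this page does not name explicitly; `build_model`, `x_train`, `y_train`, `x_val`, and `y_val` are placeholders rather than objects defined on this page:

```python
import keras_tuner as kt
import tensorflow as tf

# build_model(hp) is a placeholder hypermodel; the real one would assemble
# the architectures documented below from the sampled hyperparameters.
tuner = kt.Hyperband(
    build_model,
    objective="val_loss",
    max_epochs=30,           # Maximum Number of Epochs
    hyperband_iterations=1,  # Hyperband Iterations
)

# Stopping strategy: early stopping monitoring validation loss.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    min_delta=0.0001,  # minimum delta
    patience=5,        # patience in epochs
)

tuner.search(x_train, y_train,
             validation_data=(x_val, y_val),
             callbacks=[early_stop])
```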

Best Performing Models

Time-Distributed ElasticNet Regression

Hyperparameters

embedding_area_output: 100
embedding_context_output: 150
embedding_days_week_output: 175
embedding_days_year_output: 150
embedding_hours_output: 75

churn_reg_l1_l2: 0.016780656610256604
delta_reg_l1_l2: 2.1613433724429403e-05
surv_sess_reg_l1_l2: 0.00014899005111328035
surv_time_reg_l1_l2: 0.003653678972814141

use_batch_norm: True
dropout_rate: 0.5
use_dropout_spatial: False

optimizer: Adam
batch_size: 256
adam_learning_rate: 0.005197428515718439
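
A minimal sketch of how these hyperparameters could be assembled, assuming Keras/TensorFlow; the input shape and losses are placeholders, and it is an assumption that each single `*_reg_l1_l2` value sets both the L1 and L2 penalties of its output head:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Hypothetical sequence length and feature width; the real input is the
# concatenation of the embeddings listed above with the other features.
seq_len, n_features = 30, 512
inputs = tf.keras.Input(shape=(seq_len, n_features))

x = layers.BatchNormalization()(inputs)  # use_batch_norm: True
x = layers.Dropout(0.5)(x)               # dropout_rate: 0.5, non-spatial

def elasticnet_head(name, reg):
    # Assumption: the single *_reg_l1_l2 value sets both the L1 and the L2
    # penalty of a linear (ElasticNet-style) regression layer.
    dense = layers.Dense(1, kernel_regularizer=regularizers.L1L2(l1=reg, l2=reg))
    return layers.TimeDistributed(dense, name=name)(x)

outputs = [
    elasticnet_head("churn", 0.016780656610256604),
    elasticnet_head("delta", 2.1613433724429403e-05),
    elasticnet_head("surv_sess", 0.00014899005111328035),
    elasticnet_head("surv_time", 0.003653678972814141),
]

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(0.005197428515718439),
              loss="mse")  # placeholder loss; batch_size 256 is passed to fit()
```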

Final Architecture


Time-Distributed MultilayerPerceptron

Hyperparameters

embedding_area_output: 175
embedding_context_output: 50
embedding_days_week_output: 175
embedding_days_year_output: 175
embedding_hours_output: 100

layers_global_features: 2

units_layer_0_global_features: 128
units_layer_1_global_features: 128

activation_layer_0_global_features: relu
activation_layer_1_global_features: elu

use_batch_norm: False
dropout_rate: 0.2
use_dropout_spatial: False

optimizer: Adam
batch_size: 256
adam_learning_rate: 0.001570498124718636
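
A minimal sketch of the global-features branch implied by these hyperparameters, again assuming Keras/TensorFlow; the input shape and loss are placeholders and the output heads are omitted:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical input shape; the real input concatenates the embeddings
# listed above with the continuous global features.
seq_len, n_features = 30, 512
inputs = tf.keras.Input(shape=(seq_len, n_features))

# layers_global_features: 2, with the tuned units and activations per layer.
x = inputs
for units, activation in [(128, "relu"), (128, "elu")]:
    x = layers.TimeDistributed(layers.Dense(units, activation=activation))(x)
    x = layers.Dropout(0.2)(x)  # dropout_rate: 0.2, non-spatial, no batch norm

model = tf.keras.Model(inputs, x)
model.compile(optimizer=tf.keras.optimizers.Adam(0.001570498124718636),
              loss="mse")  # placeholder loss
```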

Final Architecture


Melchior

All recurrent layers use long short-term memory (LSTM) cells.

Hyperparameters

embedding_area_output: 175
embedding_context_output: 175
embedding_days_week_output: 50
embedding_days_year_output: 125
embedding_hours_output: 125

lstm_layers_days_week: 1
lstm_layers_days_year: 1
lstm_layers_env: 1
lstm_layers_features: 1
lstm_layers_hours: 1
lstm_layers_shared: 1

lstm_units_layer_0_days_week: 50
lstm_units_layer_0_days_year: 125
lstm_units_layer_0_env: 125
lstm_units_layer_0_features: 100
lstm_units_layer_0_hours: 75
lstm_units_layer_0_shared: 100

td_layers_churn: 1
td_layers_days_week_cont: 1
td_layers_days_year_cont: 1
td_layers_delta: 1
td_layers_feat_cont: 1
td_layers_hour_cont: 1
td_layers_shared: 1
td_layers_survival_sess: 1
td_layers_survival_time: 1

td_units_layer_0_churn: 160
td_units_layer_0_days_week_cont: 224
td_units_layer_0_days_year_cont: 160
td_units_layer_0_delta: 128
td_units_layer_0_feat_cont: 192
td_units_layer_0_hour_cont: 64
td_units_layer_0_shared: 128
td_units_layer_0_survival_sess: 192
td_units_layer_0_survival_time: 160

td_activation_layer_0_churn: relu
td_activation_layer_0_days_week_cont: elu
td_activation_layer_0_days_year_cont: lelu
td_activation_layer_0_delta: lelu
td_activation_layer_0_feat_cont: elu
td_activation_layer_0_hour_cont: lelu
td_activation_layer_0_shared: elu
td_activation_layer_0_survival_sess: elu
td_activation_layer_0_survival_time: lelu

use_batch_norm: False
dropout_rate: 0.45
use_dropout_spatial: True

optimizer: Adam
batch_size: 256
adam_learning_rate: 0.0013588361708218421
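
To make the branch structure concrete, a minimal sketch assuming Keras/TensorFlow: per-branch input widths, losses, and the omitted `*_cont` continuation heads are placeholders, and reading "lelu" as a leaky ReLU is an assumption:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical per-branch input widths; the real ones come from the
# embedding outputs listed above. Branch name -> (input width, LSTM units).
seq_len = 30
branch_specs = {
    "days_week": (50, 50),
    "days_year": (125, 125),
    "env":       (175, 125),
    "features":  (100, 100),
    "hours":     (125, 75),
}

inputs, branches = [], []
for name, (width, units) in branch_specs.items():
    inp = tf.keras.Input(shape=(seq_len, width), name=f"{name}_in")
    inputs.append(inp)
    # One LSTM layer per branch (lstm_layers_*: 1); sequences are kept so
    # the downstream heads can stay time-distributed.
    branches.append(layers.LSTM(units, return_sequences=True)(inp))

merged = layers.Concatenate()(branches)
shared = layers.LSTM(100, return_sequences=True)(merged)  # lstm_units_layer_0_shared
shared = layers.TimeDistributed(layers.Dense(128, activation="elu"))(shared)  # td shared
shared = layers.SpatialDropout1D(0.45)(shared)  # spatial dropout, rate 0.45

def head(name, units, activation):
    # Assumption: "lelu" denotes a leaky ReLU, approximated here with a
    # LeakyReLU layer after a linear time-distributed Dense.
    x = layers.TimeDistributed(layers.Dense(units))(shared)
    x = layers.LeakyReLU()(x) if activation == "lelu" else layers.Activation(activation)(x)
    return layers.TimeDistributed(layers.Dense(1), name=name)(x)

outputs = [
    head("churn", 160, "relu"),
    head("delta", 128, "lelu"),
    head("survival_sess", 192, "elu"),
    head("survival_time", 160, "lelu"),
]
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(0.0013588361708218421),
              loss="mse")  # placeholder losses
```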

Final Architecture


Model Characteristics

N.B.

Each model's hyperparameters are those that achieved the lowest loss on the validation set.

This means that even though Melchior has many more parameters than the Time-Distributed Multilayer Perceptron, the tuning process showed that increasing the size of the latter would not have had a beneficial effect on its performance.