# 2 Models Architectures
Valerio Bonometti edited this page Apr 12, 2020
- Maximum number of epochs: 30
- Hyperband iterations: 1
- Stopping strategy: early stopping monitoring the validation loss
  - Minimum delta: 0.0001
  - Patience: 5 epochs
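The stopping rule above can be sketched in a few lines of plain Python (a sketch of the rule only, not the training code used for these models):

```python
class EarlyStopping:
    """Early stopping on validation loss with the settings above:
    min_delta = 0.0001, patience = 5 epochs."""

    def __init__(self, min_delta=1e-4, patience=5):
        self.min_delta = min_delta
        self.patience = patience
        self.best = float("inf")
        self.wait = 0

    def should_stop(self, val_loss):
        # An epoch counts as an improvement only if the validation
        # loss drops by more than min_delta below the best seen so far.
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience
```

Training halts once five consecutive epochs fail to improve the validation loss by more than the minimum delta.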
## Time-Distributed ElasticNet Regression

### Hyperparameters
- `embedding_area_output`: 100
- `embedding_context_output`: 150
- `embedding_days_week_output`: 175
- `embedding_days_year_output`: 150
- `embedding_hours_output`: 75
- `churn_reg_l1_l2`: 0.016780656610256604
- `delta_reg_l1_l2`: 2.1613433724429403e-05
- `surv_sess_reg_l1_l2`: 0.00014899005111328035
- `surv_time_reg_l1_l2`: 0.003653678972814141
- `use_batch_norm`: True
- `dropout_rate`: 0.5
- `use_dropout_spatial`: False
- `optimizer`: Adam
- `batch_size`: 256
- `adam_learning_rate`: 0.005197428515718439
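The `*_reg_l1_l2` values above give one regularization coefficient per output head. Assuming the single coefficient is applied to both the L1 and L2 terms (as in Keras's `regularizers.l1_l2` when both factors are set equal), the penalty added to the loss would look like this sketch:

```python
def l1_l2_penalty(weights, factor):
    """Elastic-net penalty with one shared factor for both terms:
    factor * (sum |w| + sum w^2).

    `weights` is a flat list of a head's weights; `factor` is the
    tuned *_reg_l1_l2 value for that head."""
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return factor * (l1 + l2)
```

For example, the churn head would use `factor = 0.016780656610256604`.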
### Final Architecture
## Time-Distributed Multilayer Perceptron

### Hyperparameters
- `embedding_area_output`: 175
- `embedding_context_output`: 50
- `embedding_days_week_output`: 175
- `embedding_days_year_output`: 175
- `embedding_hours_output`: 100
- `layers_global_features`: 2
- `units_layer_0_global_features`: 128
- `units_layer_1_global_features`: 128
- `activation_layer_0_global_features`: relu
- `activation_layer_1_global_features`: elu
- `use_batch_norm`: False
- `dropout_rate`: 0.2
- `use_dropout_spatial`: False
- `optimizer`: Adam
- `batch_size`: 256
- `adam_learning_rate`: 0.001570498124718636
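The global-feature stack above is two dense layers of 128 units, the first with ReLU and the second with ELU, applied independently at each timestep ("time-distributed"). A minimal sketch of one such step, with plain lists standing in for the real tensors and weights (illustrative only, not the actual implementation):

```python
import math

def relu(x):
    # activation_layer_0_global_features
    return max(0.0, x)

def elu(x, alpha=1.0):
    # activation_layer_1_global_features
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W @ x + b).
    `weights` is a list of rows, one row per output unit."""
    return [activation(sum(w * xi for w, xi in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]
```

A full forward pass would call `dense` twice per timestep (128 units each), sharing the same weights across timesteps.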
### Final Architecture
## Melchior

All the recurrent layers use long short-term memory (LSTM) cells.

### Hyperparameters
- `embedding_area_output`: 175
- `embedding_context_output`: 175
- `embedding_days_week_output`: 50
- `embedding_days_year_output`: 125
- `embedding_hours_output`: 125
- `lstm_layers_days_week`: 1
- `lstm_layers_days_year`: 1
- `lstm_layers_env`: 1
- `lstm_layers_features`: 1
- `lstm_layers_hours`: 1
- `lstm_layers_shared`: 1
- `lstm_units_layer_0_days_week`: 50
- `lstm_units_layer_0_days_year`: 125
- `lstm_units_layer_0_env`: 125
- `lstm_units_layer_0_features`: 100
- `lstm_units_0_hours`: 75
- `lstm_units_layer_0_shared`: 100
- `td_layers_churn`: 1
- `td_layers_days_week_cont`: 1
- `td_layers_days_year_cont`: 1
- `td_layers_delta`: 1
- `td_layers_feat_cont`: 1
- `td_layers_hour_cont`: 1
- `td_layers_shared`: 1
- `td_layers_survival_sess`: 1
- `td_layers_survival_time`: 1
- `td_units_layer_0_churn`: 160
- `td_units_layer_0_days_week_cont`: 224
- `td_units_layer_0_days_year_cont`: 160
- `td_units_layer_0_delta`: 128
- `td_units_layer_0_feat_cont`: 192
- `td_units_layer_0_hour_cont`: 64
- `td_units_layer_0_shared`: 128
- `td_units_layer_0_survival_sess`: 192
- `td_units_layer_0_survival_time`: 160
- `td_activation_layer_0_churn`: relu
- `td_activation_layer_0_days_week_cont`: elu
- `td_activation_layer_0_days_year_cont`: lelu
- `td_activation_layer_0_delta`: lelu
- `td_activation_layer_0_feat_cont`: elu
- `td_activation_layer_0_hour_cont`: lelu
- `td_activation_layer_0_shared`: elu
- `td_activation_layer_0_survival_sess`: elu
- `td_activation_layer_0_survival_time`: lelu
- `use_batch_norm`: False
- `dropout_rate`: 0.45
- `use_dropout_spatial`: True
- `optimizer`: Adam
- `batch_size`: 256
- `adam_learning_rate`: 0.0013588361708218421
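For scripting against these values, Melchior's hyperparameters can be restated as plain Python dicts. The grouping into LSTM branches and time-distributed heads below is our reading of the key names above, not a statement of the exact wiring between them:

```python
# Units of the single LSTM layer in each branch
# (from the lstm_units_* keys above).
LSTM_BRANCHES = {
    "days_week": 50,
    "days_year": 125,
    "env": 125,
    "features": 100,
    "hours": 75,
    "shared": 100,
}

# (units, activation) of the single time-distributed dense layer
# in each head (from the td_units_* and td_activation_* keys above).
TD_HEADS = {
    "churn": (160, "relu"),
    "days_week_cont": (224, "elu"),
    "days_year_cont": (160, "lelu"),
    "delta": (128, "lelu"),
    "feat_cont": (192, "elu"),
    "hour_cont": (64, "lelu"),
    "shared": (128, "elu"),
    "survival_sess": (192, "elu"),
    "survival_time": (160, "lelu"),
}
```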
### Final Architecture
**N.B.** Each model's hyperparameters are the best found with respect to the loss on the validation set. This means that even though Melchior has many more parameters than the Time-Distributed Multilayer Perceptron, the tuning process showed that increasing the size of the latter would not have had a beneficial effect on its performance.