# Machine Learning with H2O - Tutorial 3c: Regression Models (Ensembles)

<hr>

**Objective**:

- This tutorial shows how to build a stacked ensemble of regression models to improve out-of-sample (hold-out) performance.

<hr>

**Wine Quality Dataset:**

- Source: https://archive.ics.uci.edu/ml/datasets/Wine+Quality
- CSV (https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv)

<hr>
    
**Steps**:

1. Build GBM models using random grid search and extract the best one.
2. Build DRF models using random grid search and extract the best one. 
3. Build DNN models using random grid search and extract the best one.
4. Use model stacking to combine the best models.


<hr>

**Full Technical Reference:**

- http://docs.h2o.ai/h2o/latest-stable/h2o-docs/booklets/RBooklet.pdf
- http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/stacked-ensembles.html

<br>


In [1]:
# Start and connect to a local H2O cluster
suppressPackageStartupMessages(library(h2o))
h2o.init(nthreads = -1)


H2O is not running yet, starting it now...

Note:  In case of errors look at the following log files:
    /tmp/RtmpddUCFh/h2o_joe_started_from_r.out
    /tmp/RtmpddUCFh/h2o_joe_started_from_r.err


Starting H2O JVM and connecting: .. Connection successful!

R is connected to the H2O cluster: 
    H2O cluster uptime:         1 seconds 905 milliseconds 
    H2O cluster version:        3.10.3.5 
    H2O cluster version age:    10 days  
    H2O cluster name:           H2O_started_from_R_joe_qbs574 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   5.21 GB 
    H2O cluster total cores:    8 
    H2O cluster allowed cores:  8 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    R Version:                  R version 3.3.2 (2016-10-31) 



<br>

In [2]:
# Import wine quality data from a local CSV file
wine = h2o.importFile("winequality-white.csv")
head(wine, 5)



fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
7.0,0.27,0.36,20.7,0.045,45,170,1.001,3.0,0.45,8.8,6
6.3,0.3,0.34,1.6,0.049,14,132,0.994,3.3,0.49,9.5,6
8.1,0.28,0.4,6.9,0.05,30,97,0.9951,3.26,0.44,10.1,6
7.2,0.23,0.32,8.5,0.058,47,186,0.9956,3.19,0.4,9.9,6
7.2,0.23,0.32,8.5,0.058,47,186,0.9956,3.19,0.4,9.9,6


In [3]:
# Define features (or predictors)
features = colnames(wine)  # we want to use all the information
features = setdiff(features, 'quality')    # we need to exclude the target 'quality'
features

In [4]:
# Split the H2O data frame into training/test sets
# so we can evaluate out-of-sample performance
wine_split = h2o.splitFrame(wine, ratios = 0.8, seed = 1234)

wine_train = wine_split[[1]] # roughly 80% of rows for training
wine_test = wine_split[[2]]  # the remaining ~20% for out-of-sample evaluation
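
The split sizes are approximate rather than exact, because `h2o.splitFrame` assigns rows to splits probabilistically. The base-R sketch below illustrates that idea without H2O. It is only an analogy under stated assumptions: the row count of 4898 is the size of the white wine dataset, and the variable names are hypothetical. It will not reproduce the rows H2O actually selects.

```r
# Illustrative sketch of a probabilistic 80/20 split (not H2O's own code)
set.seed(1234)
n_rows <- 4898                      # rows in winequality-white.csv

# Each row independently lands in the training set with probability 0.8
in_train  <- runif(n_rows) < 0.8
train_idx <- which(in_train)
test_idx  <- which(!in_train)

# The proportions are close to 80/20 but not exact
cat("train rows:", length(train_idx), " test rows:", length(test_idx), "\n")
cat("train share:", round(length(train_idx) / n_rows, 3), "\n")
```

This is why it is worth checking `dim(wine_train)` and `dim(wine_test)` after splitting.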

In [5]:
dim(wine_train)

In [6]:
dim(wine_test)

<br>

## Define Search Criteria for Random Grid Search

In [7]:
# define the criteria for random grid search
search_criteria = list(strategy = "RandomDiscrete",
                       max_models = 9,
                       seed = 1234)

<br>

## Step 1: Build GBM Models using Random Grid Search and Extract the Best Model

In [8]:
# define the range of hyper-parameters for GBM grid search
# 27 combinations in total
hyper_params <- list(
    sample_rate = c(0.7, 0.8, 0.9),
    col_sample_rate = c(0.7, 0.8, 0.9),
    max_depth = c(3, 5, 7)
)
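
To make the "27 combinations" figure concrete, the sketch below enumerates the full grid in base R and draws 9 combinations at random, mirroring `max_models = 9` in the search criteria. It is only an illustration: the combinations drawn here will not match the ones H2O's `RandomDiscrete` strategy picks, even with the same seed.

```r
# Enumerate the GBM grid and sample from it (illustrative, base R only)
hyper_params <- list(
    sample_rate = c(0.7, 0.8, 0.9),
    col_sample_rate = c(0.7, 0.8, 0.9),
    max_depth = c(3, 5, 7)
)

# Full Cartesian grid: 3 x 3 x 3 combinations
all_combos <- expand.grid(hyper_params)
n_combos <- nrow(all_combos)

# Random discrete search: evaluate only a random subset of the grid
set.seed(1234)
sampled <- all_combos[sample(n_combos, 9), ]
print(sampled)

cat("Combinations in full grid:", n_combos, "\n")
cat("Share of grid explored   :", round(100 * 9 / n_combos, 1), "%\n")
```

With 9 of 27 combinations, the random search covers a third of the grid at a third of the training cost.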

In [9]:
# Set up GBM grid search
# Add a seed for reproducibility
gbm_rand_grid <- h2o.grid(
  
    # Core parameters for model training
    x = features,
    y = 'quality',
    training_frame = wine_train,
    ntrees = 10000,
    nfolds = 5,
    seed = 1234,

    # Parameters for grid search
    grid_id = "gbm_rand_grid",
    hyper_params = hyper_params,
    algorithm = "gbm",
    search_criteria = search_criteria,

    # Parameters for early stopping
    stopping_metric = "MSE",
    stopping_rounds = 15,
    score_tree_interval = 1,
    
    # Parameters required for stacked ensembles
    fold_assignment = "Modulo",
    keep_cross_validation_predictions = TRUE
  
)
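
The two stacking-related parameters work together: every base model must be cross-validated on the same folds, and its holdout predictions must be kept so the metalearner can train on them. A deterministic scheme such as `Modulo` guarantees identical folds across the GBM, DRF and DNN grids. The sketch below shows the idea in base R, assuming row `i` (zero-based) is assigned to fold `i mod nfolds`. The toy row count is hypothetical.

```r
# Illustrative sketch of modulo fold assignment (base R, not H2O's code)
n_rows <- 20        # hypothetical toy size
nfolds <- 5

row_index <- seq_len(n_rows) - 1    # zero-based row index
fold_id   <- row_index %% nfolds    # fold = row index modulo number of folds

print(fold_id)
print(table(fold_id))               # every fold receives the same number of rows
```

Because the assignment depends only on the row position, it is identical for every model trained on `wine_train`, regardless of algorithm or seed.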



In [10]:
# Sort and show the grid search results
gbm_rand_grid <- h2o.getGrid(grid_id = "gbm_rand_grid", sort_by = "mse", decreasing = FALSE)
print(gbm_rand_grid)

H2O Grid Details

Grid ID: gbm_rand_grid 
Used hyper parameters: 
  -  col_sample_rate 
  -  max_depth 
  -  sample_rate 
Number of models: 9 
Number of failed models: 0 

Hyper-Parameter Search Summary: ordered by increasing mse
  col_sample_rate max_depth sample_rate             model_ids
1             0.9         7         0.9 gbm_rand_grid_model_5
2             0.8         7         0.7 gbm_rand_grid_model_4
3             0.7         7         0.7 gbm_rand_grid_model_1
4             0.9         7         0.7 gbm_rand_grid_model_6
5             0.7         5         0.8 gbm_rand_grid_model_0
6             0.8         3         0.9 gbm_rand_grid_model_7
7             0.7         3         0.7 gbm_rand_grid_model_8
8             0.9         3         0.9 gbm_rand_grid_model_2
9             0.8         3         0.8 gbm_rand_grid_model_3
                  mse
1 0.41467703216892454
2  0.4188744246328386
3 0.42294704197026883
4  0.4285238866231086
5 0.44601214899796604
6 0.46338551281728

In [11]:
# Extract the best model from random grid search
best_gbm_model_id <- gbm_rand_grid@model_ids[[1]] # top of the list
best_gbm_from_rand_grid <- h2o.getModel(best_gbm_model_id)
summary(best_gbm_from_rand_grid)

Model Details:

H2ORegressionModel: gbm
Model Key:  gbm_rand_grid_model_5 
Model Summary: 
  number_of_trees number_of_internal_trees model_size_in_bytes min_depth
1             168                      168              103536         7
  max_depth mean_depth min_leaves max_leaves mean_leaves
1         7    7.00000         13         82    43.80953

H2ORegressionMetrics: gbm
** Reported on training data. **

MSE:  0.09975218
RMSE:  0.3158357
MAE:  0.2350127
RMSLE:  0.04701275
Mean Residual Deviance :  0.09975218



H2ORegressionMetrics: gbm
** Reported on cross-validation data. **
** 5-fold cross-validation on training data (Metrics computed for combined holdout predictions) **

MSE:  0.414677
RMSE:  0.6439542
MAE:  0.4747976
RMSLE:  0.09641845
Mean Residual Deviance :  0.414677


Cross-Validation Metrics Summary: 
                        mean           sd cv_1_valid  cv_2_valid cv_3_valid
mae               0.47480187  0.014146665 0.49126652  0.44133535  0.4956603
mse               0.4

<br>

## Step 2: Build DRF Models using Random Grid Search and Extract the Best Model

In [12]:
# define the range of hyper-parameters for DRF grid search
# 27 combinations in total
hyper_params <- list(
    sample_rate = c(0.5, 0.6, 0.7),
    col_sample_rate_per_tree = c(0.7, 0.8, 0.9),
    max_depth = c(3, 5, 7)
)

In [13]:
# Set up DRF grid search
# Add a seed for reproducibility
drf_rand_grid <- h2o.grid(
  
    # Core parameters for model training
    x = features,
    y = 'quality',
    training_frame = wine_train,
    ntrees = 200,
    nfolds = 5,
    seed = 1234,

    # Parameters for grid search
    grid_id = "drf_rand_grid",
    hyper_params = hyper_params,
    algorithm = "randomForest",
    search_criteria = search_criteria,
    
    # Parameters required for stacked ensembles
    fold_assignment = "Modulo",
    keep_cross_validation_predictions = TRUE
  
)



In [14]:
# Sort and show the grid search results
drf_rand_grid <- h2o.getGrid(grid_id = "drf_rand_grid", sort_by = "mse", decreasing = FALSE)
print(drf_rand_grid)

H2O Grid Details

Grid ID: drf_rand_grid 
Used hyper parameters: 
  -  col_sample_rate_per_tree 
  -  max_depth 
  -  sample_rate 
Number of models: 9 
Number of failed models: 0 

Hyper-Parameter Search Summary: ordered by increasing mse
  col_sample_rate_per_tree max_depth sample_rate             model_ids
1                      0.9         7         0.7 drf_rand_grid_model_5
2                      0.9         7         0.5 drf_rand_grid_model_6
3                      0.8         7         0.5 drf_rand_grid_model_4
4                      0.7         7         0.5 drf_rand_grid_model_1
5                      0.7         5         0.6 drf_rand_grid_model_0
6                      0.9         3         0.7 drf_rand_grid_model_2
7                      0.8         3         0.6 drf_rand_grid_model_3
8                      0.8         3         0.7 drf_rand_grid_model_7
9                      0.7         3         0.5 drf_rand_grid_model_8
                  mse
1 0.48533899185762636
2   0.4

In [15]:
# Extract the best model from random grid search
best_drf_model_id <- drf_rand_grid@model_ids[[1]] # top of the list
best_drf_from_rand_grid <- h2o.getModel(best_drf_model_id)
summary(best_drf_from_rand_grid)

Model Details:

H2ORegressionModel: drf
Model Key:  drf_rand_grid_model_5 
Model Summary: 
  number_of_trees number_of_internal_trees model_size_in_bytes min_depth
1             200                      200              239751         7
  max_depth mean_depth min_leaves max_leaves mean_leaves
1         7    7.00000         70        111    90.26500

H2ORegressionMetrics: drf
** Reported on training data. **
** Metrics reported on Out-Of-Bag training samples **

MSE:  0.4881925
RMSE:  0.6987078
MAE:  0.55672
RMSLE:  0.1038554
Mean Residual Deviance :  0.4881925



H2ORegressionMetrics: drf
** Reported on cross-validation data. **
** 5-fold cross-validation on training data (Metrics computed for combined holdout predictions) **

MSE:  0.485339
RMSE:  0.6966628
MAE:  0.5534049
RMSLE:  0.1036737
Mean Residual Deviance :  0.485339


Cross-Validation Metrics Summary: 
                        mean           sd cv_1_valid cv_2_valid cv_3_valid
mae               0.55340695  0.005552894  0.55971

<br>

## Step 3: Build DNN Models using Random Grid Search and Extract the Best Model

In [16]:
# define the range of hyper-parameters for DNN grid search
# 81 combinations in total
hyper_params <- list(
    activation = c('tanh', 'rectifier', 'maxout'),
    hidden = list(c(50), c(50,50), c(50,50,50)),
    l1 = c(0, 1e-3, 1e-5),
    l2 = c(0, 1e-3, 1e-5)
)

In [17]:
# Set up DNN grid search
# Add a seed for reproducibility
dnn_rand_grid <- h2o.grid(
  
    # Core parameters for model training
    x = features,
    y = 'quality',
    training_frame = wine_train,
    epochs = 20,
    nfolds = 5,
    seed = 1234,

    # Parameters for grid search
    grid_id = "dnn_rand_grid",
    hyper_params = hyper_params,
    algorithm = "deeplearning",
    search_criteria = search_criteria,
    
    # Parameters required for stacked ensembles
    fold_assignment = "Modulo",
    keep_cross_validation_predictions = TRUE
  
)



In [18]:
# Sort and show the grid search results
dnn_rand_grid <- h2o.getGrid(grid_id = "dnn_rand_grid", sort_by = "mse", decreasing = FALSE)
print(dnn_rand_grid)

H2O Grid Details

Grid ID: dnn_rand_grid 
Used hyper parameters: 
  -  activation 
  -  hidden 
  -  l1 
  -  l2 
Number of models: 9 
Number of failed models: 0 

Hyper-Parameter Search Summary: ordered by increasing mse
  activation       hidden     l1     l2             model_ids
1     Maxout [50, 50, 50] 1.0E-5 1.0E-5 dnn_rand_grid_model_3
2  Rectifier     [50, 50] 1.0E-5    0.0 dnn_rand_grid_model_2
3     Maxout     [50, 50]    0.0 1.0E-5 dnn_rand_grid_model_8
4       Tanh [50, 50, 50] 1.0E-5 1.0E-5 dnn_rand_grid_model_7
5     Maxout         [50] 1.0E-5  0.001 dnn_rand_grid_model_6
6       Tanh [50, 50, 50]    0.0 1.0E-5 dnn_rand_grid_model_0
7     Maxout         [50]    0.0    0.0 dnn_rand_grid_model_5
8     Maxout         [50]  0.001    0.0 dnn_rand_grid_model_4
9       Tanh [50, 50, 50]  0.001    0.0 dnn_rand_grid_model_1
                 mse
1 0.5132317444689928
2 0.5147930385440149
3 0.5231170352359251
4 0.5243904925311967
5 0.5257152424817406
6 0.5276392369040392
7 0.5300169

In [19]:
# Extract the best model from random grid search
best_dnn_model_id <- dnn_rand_grid@model_ids[[1]] # top of the list
best_dnn_from_rand_grid <- h2o.getModel(best_dnn_model_id)
summary(best_dnn_from_rand_grid)

Model Details:

H2ORegressionModel: deeplearning
Model Key:  dnn_rand_grid_model_3 
Status of Neuron Layers: predicting quality, regression, gaussian distribution, Quadratic loss, 11,451 weights/biases, 143.8 KB, 81,920 training samples, mini-batch size 1
  layer units   type dropout       l1       l2 mean_rate rate_rms momentum
1     1    11  Input  0.00 %                                              
2     2    50 Maxout  0.00 % 0.000010 0.000010  0.001362 0.000463 0.000000
3     3    50 Maxout  0.00 % 0.000010 0.000010  0.002507 0.000914 0.000000
4     4    50 Maxout  0.00 % 0.000010 0.000010  0.035343 0.053778 0.000000
5     5     1 Linear         0.000010 0.000010  0.000370 0.000208 0.000000
  mean_weight weight_rms mean_bias bias_rms
1                                          
2   -0.002575   0.198772  0.427465 0.066787
3   -0.031066   0.149890  0.957756 0.031819
4   -0.021807   0.144457  0.817316 0.199981
5    0.000514   0.119923  0.022593 0.000000

H2ORegressionMetrics: deeplea
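
As a sanity check on the layer summary above, the reported 11,451 weights/biases can be reproduced by hand. The sketch below assumes each Maxout unit has two linear channels. That assumption is not stated in the output, but it reproduces the reported total for this 11-50-50-50-1 network.

```r
# Count weights and biases for the best DNN (illustrative arithmetic)
n_inputs  <- 11             # predictors
hidden    <- c(50, 50, 50)  # hidden layer sizes
n_outputs <- 1              # single regression output
channels  <- 2              # assumption: two channels per Maxout unit

# Number of inputs feeding each hidden layer: 11, 50, 50
layer_in <- c(n_inputs, hidden[-length(hidden)])

hidden_weights <- sum(layer_in * hidden * channels)
hidden_biases  <- sum(hidden * channels)
output_weights <- hidden[length(hidden)] * n_outputs
output_biases  <- n_outputs

total <- hidden_weights + hidden_biases + output_weights + output_biases
cat("Total weights and biases:", total, "\n")
```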

<br>

## Step 4: Use Model Stacking to Combine the Best Models

In [20]:
# Define a list of models to be stacked
# i.e. best model from each grid
all_ids = list(best_gbm_model_id, best_drf_model_id, best_dnn_model_id)

In [21]:
# Stack models
# GLM as the default metalearner
ensemble = h2o.stackedEnsemble(x = features,
                               y = 'quality',
                               training_frame = wine_train,
                               model_id = "my_ensemble",
                               base_models = all_ids)
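
To show what stacking does conceptually, here is a toy version in base R. It is a sketch under stated assumptions, not H2O's implementation: it uses the built-in `mtcars` data, two hypothetical linear base learners, and a plain `lm` as the metalearner (H2O's default metalearner is a GLM). The key point is that the metalearner is trained on cross-validated holdout predictions, which is why the grids above keep those predictions.

```r
# Toy stacking sketch in base R (illustrative only)
data(mtcars)
n      <- nrow(mtcars)
nfolds <- 5
fold_id <- (seq_len(n) - 1) %% nfolds   # same folds for every base learner

# Two hypothetical base learners
base_formulas <- list(
    m1 = mpg ~ wt,
    m2 = mpg ~ hp + qsec
)

# Level-one data: holdout predictions from each base learner
cv_preds <- sapply(base_formulas, function(f) {
    preds <- numeric(n)
    for (k in 0:(nfolds - 1)) {
        holdout <- fold_id == k
        fit <- lm(f, data = mtcars[!holdout, ])
        preds[holdout] <- predict(fit, newdata = mtcars[holdout, ])
    }
    preds
})
level_one <- data.frame(cv_preds, mpg = mtcars$mpg)

# Metalearner: learns how to weight the base learners' predictions
metalearner <- lm(mpg ~ m1 + m2, data = level_one)
print(coef(metalearner))
```

The fitted coefficients indicate how much each base learner contributes to the ensemble's final prediction.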



<br>

## Comparison of Model Performance on Test Data

In [22]:
cat('Best GBM model from Grid (MSE) : ', h2o.performance(best_gbm_from_rand_grid, wine_test)@metrics$MSE, "\n")
cat('Best DRF model from Grid (MSE) : ', h2o.performance(best_drf_from_rand_grid, wine_test)@metrics$MSE, "\n")
cat('Best DNN model from Grid (MSE) : ', h2o.performance(best_dnn_from_rand_grid, wine_test)@metrics$MSE, "\n")
cat('Stacked Ensembles        (MSE) : ', h2o.performance(ensemble, wine_test)@metrics$MSE, "\n")

Best GBM model from Grid (MSE) :  0.4013943 
Best DRF model from Grid (MSE) :  0.4781568 
Best DNN model from Grid (MSE) :  0.5543555 
Stacked Ensembles        (MSE) :  0.3989076 
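
The gain from stacking is easier to judge in relative terms. The sketch below takes the four test-set MSE values printed above (copied by hand, so it runs without an H2O cluster), converts them to RMSE, and computes the ensemble's relative MSE reduction over the best single model.

```r
# Compare test-set errors reported above (values copied from the output)
test_mse <- c(GBM = 0.4013943, DRF = 0.4781568,
              DNN = 0.5543555, Ensemble = 0.3989076)

# RMSE is on the same scale as the 'quality' score
test_rmse <- sqrt(test_mse)
print(round(test_rmse, 4))

# Relative MSE reduction of the ensemble over the best single model
best_single <- min(test_mse[c("GBM", "DRF", "DNN")])
rel_gain <- (best_single - test_mse[["Ensemble"]]) / best_single
cat(sprintf("Relative MSE reduction vs best single model: %.2f%%\n",
            100 * rel_gain))
```

On this split the ensemble beats the best single model (the GBM) by roughly 0.6% in MSE, so the improvement is real but modest.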


<br>

<br>