Error in explain function with H2O GBM regression model - Error in if (r2 > max) { : missing value where TRUE/FALSE needed #47

andresrcs · 2017-11-10T16:28:49Z

Hi, can you please look into issue #46 again?
Just out of curiosity I tried dropping the month.lbl variable, and now I don't get the warning message, but I still get the same error message even though my training data covers the full feature space.

library(tidyverse)
library(h2o)
library(lime)

dataset_url <- "https://www.dropbox.com/s/t3o1zvzq0t7emz4/sales.RDS?raw=1"
sales_aug <- readRDS(gzcon(url(dataset_url)))

sales_aug <- sales_aug %>% select(-month.lbl) # Drop the factor variable that does not cover the full feature range

train <- sales_aug %>% filter(month <= 8)
valid <- sales_aug %>% filter(month == 9)
test <- sales_aug %>% filter(month >= 10)

h2o.init()
h2o.no_progress()
train <- as.h2o(train)
valid <- as.h2o(valid)
test <- as.h2o(test)

y <- "amount"
x <- setdiff(names(train), y)

leaderboard <- h2o.automl(x, y, training_frame = train, validation_frame = valid, leaderboard_frame = test, max_runtime_secs = 30, stopping_metric = "MSE", seed = 12345)
gbm_model <- leaderboard@leader

explainer <- lime(as.data.frame(train), gbm_model, bin_continuous = FALSE)
explanation <- explain(as.data.frame(test[1:5,]), explainer, n_features = 5)
#> Error in if (r2 > max) {: missing value where TRUE/FALSE needed


thomasp85 · 2017-11-14T13:19:47Z

Can I get you to try it with the latest version of lime from GitHub?

andresrcs · 2017-11-14T14:26:33Z

The previous error message is gone, but now there is a new one:

explanation <- explain(as.data.frame(test[1:5,]), explainer, n_features = 5)
#> Error in glmnet(x[, c(features, j), drop = FALSE], y, weights = weights,  : x should be a matrix with 2 or more columns

thomasp85 · 2017-11-14T20:11:39Z

OK, so the reason for that error is quite specific to your dataset. You have a single column (index.num) whose range is so extreme that, because you do not bin continuous variables, it completely dominates your dataset when calculating the similarity of the permutations. As a result, essentially all permutations get weighted with 0, which causes errors in the model fit.

Based on the name and the values, I would throw that column out unless you have very good reasons to keep it. If you really need it, then either play with the kernel_size parameter or use bin_continuous = TRUE (the latter will give more interpretable explanations anyway).
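To make the mechanism concrete, here is a minimal base R sketch of how one wide-range column can push every similarity weight to zero. It is only an illustration under stated assumptions, not lime's internal code: the Euclidean distance, the exponential kernel, the kernel width value, and the simulated columns `small` and `huge` are all hypothetical stand-ins.

```r
# Illustrative sketch only: this is NOT lime's internal implementation.
# Assumptions: Euclidean distance, an exponential kernel, made-up data.
set.seed(1)
n <- 1000

# Hypothetical permutations: one unit-scale feature and one feature with
# an extreme range (similar in spirit to a numeric time index).
perms <- cbind(
  small = rnorm(n, mean = 0, sd = 1),
  huge  = rnorm(n, mean = 1.5e9, sd = 1e7)
)
obs <- c(small = 0, huge = 1.5e9)  # the observation being explained

# Hypothetical kernel width, chosen for illustration only.
kernel_width <- 0.75 * sqrt(ncol(perms))

# Distance of every permutation to the observation, then kernel weights.
similarity <- function(p, o, width) {
  d <- sqrt(rowSums(sweep(p, 2, o)^2))
  exp(-(d^2) / (width^2))
}

# On the raw scale the huge column dominates the distance, so the
# exponential kernel underflows to exactly zero for almost every row.
weights_raw <- similarity(perms, obs, kernel_width)
mean(weights_raw == 0)

# After putting both columns on a comparable scale the weights are usable.
perms_scaled <- scale(perms)
obs_scaled <- (obs - attr(perms_scaled, "scaled:center")) /
  attr(perms_scaled, "scaled:scale")
weights_scaled <- similarity(perms_scaled, obs_scaled, kernel_width)
mean(weights_scaled > 0)
```

With all weights at zero, a weighted local model has nothing to fit, which is consistent with the glmnet failure reported above. Dropping the column, widening the kernel, or binning the continuous variables each avoids the degenerate distances.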

thomasp85 · 2017-11-14T22:09:21Z

I've added a meaningful error message for cases like yours where the similarity of the permutations to the original observation is zero and a local model cannot be created.

thomasp85 added a commit that referenced this issue Nov 14, 2017

Fix #45 and #47

fbfdbad

thomasp85 closed this as completed Nov 14, 2017
