Error during explain with H2O GBM/XGB models - "NA/NaN/Inf in 'x'" #45

dkincaid · 2017-11-01T03:11:25Z

I'm trying to use the package with an H2O XGBoost model (I've also tried it with GBM and get the same thing). The error is:

Error in glm.fit(x = x_fit, y = y, weights = weights, family = gaussian()) : 
  NA/NaN/Inf in 'x'

Here is the code I'm running:

explainer <- lime::lime(as.data.frame(wellnessTrain), mdl)

explanation <- lime::explain(as.data.frame(wellnessTest),
                       explainer, n_labels = 1, n_features = 2)

This is caused by having some NA values in the data frame, but I thought that this had already been fixed in issue #8. I verified this by removing the three columns that have NA values as a test. These NA values are meaningful, and H2O's GBM and XGBoost handle them by creating a category for the missing value after binning the non-missing feature values. Is there an easy fix here?
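For context, the error message itself can be reproduced without lime or H2O. The following is a minimal base R sketch, assuming a made-up two-column design matrix (`x_fit`, `y`, and `weights` here are illustrative values, not the wellness data): `glm.fit()` stops as soon as the design matrix contains a non-finite value, and succeeds once the affected row is dropped.

```r
# Toy design matrix with one missing feature value (illustrative data only)
x_fit <- cbind(Intercept = 1, feature = c(0.5, NA, 1.5, 2.0))
y <- c(1.0, 2.0, 3.0, 4.0)
weights <- rep(1, length(y))

# Capture the error message instead of aborting the script
msg <- tryCatch(
  {
    glm.fit(x = x_fit, y = y, weights = weights, family = gaussian())
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# The same call succeeds when rows containing NA are removed first
ok <- complete.cases(x_fit)
fit <- glm.fit(x = x_fit[ok, , drop = FALSE], y = y[ok],
               weights = weights[ok], family = gaussian())
print(fit$coefficients)
```

Dropping rows is only a way to show where the failure comes from; it would discard exactly the missing-value information that is meaningful in this use case.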


dkincaid · 2017-11-02T02:41:46Z

I have found that feature_select="tree" works, but none of the other feature_select options do.

thomasp85 · 2017-11-14T13:22:07Z

Where do you get the wellness data from, and how is your model made? I would need both to reproduce your example...

dkincaid · 2017-11-14T20:17:09Z

Unfortunately I can't share that data, but let me try to put together a reproducible example with some publicly available data. I was hoping that something I was doing would look wrong.

thomasp85 · 2017-11-14T20:18:03Z

Not obviously so - I’ll look into it if you can make a reprex

dkincaid · 2017-11-14T20:53:18Z

Here is a reproducible example using the Iris data. I'm showing a successful run with the full iris data frame and then running the same thing against a data frame in which I randomly set some values to NA. Hopefully this gives you something to work with. I appreciate you taking a look at it.

# Create a data frame from the Iris data and randomly set some values to NA
myIris <- purrr::map_df(iris[,-5], function(x) {x[sample(c(TRUE, NA), prob = c(0.8, 0.2), size = length(x), replace = TRUE)]})

myIris <- cbind(myIris, Species=iris$Species)

library(h2o)
h2o.init()

# First show that it's successful without any missing data
full_iris_frame <- as.h2o(iris)
full_mdl <- h2o.gbm(training_frame = full_iris_frame, y = "Species")

full_explainer <- lime::lime(dplyr::select(as.data.frame(full_iris_frame), -Species), full_mdl)

full_explanation <- lime::explain(dplyr::select(as.data.frame(full_iris_frame)[1:4,], -Species),
                             full_explainer, n_labels = 3, n_features = 3)

# Now try to run it on the data that has some missing values
iris_frame <- as.h2o(myIris)
mdl <- h2o.gbm(training_frame = iris_frame,
               y = "Species")

explainer <- lime::lime(dplyr::select(as.data.frame(iris_frame), -Species), mdl)

explanation <- lime::explain(dplyr::select(as.data.frame(iris_frame)[1:4,], -Species),
                             explainer, n_labels = 3, n_features = 3)
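The masking step at the top of the example relies on a base R indexing rule: subsetting a vector with a logical index that contains NA returns NA at those positions. A minimal sketch of the same idea without purrr follows; `mask_values` and `p_missing` are hypothetical names introduced here for illustration, and the seed is only there to make the run repeatable.

```r
set.seed(42)

# Hypothetical helper: replace roughly p_missing of the values in x with NA
mask_values <- function(x, p_missing = 0.2) {
  keep <- sample(c(TRUE, NA), size = length(x), replace = TRUE,
                 prob = c(1 - p_missing, p_missing))
  # TRUE keeps the value, NA in the index yields NA in the result
  x[keep]
}

my_iris <- as.data.frame(lapply(iris[, -5], mask_values))
my_iris$Species <- iris$Species

# Count missing values per column to see which features are affected
print(colSums(is.na(my_iris)))
```

Checking `colSums(is.na(...))` on the data passed to `explain()` is a quick way to see which columns would put NA values into the local model fit.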

thomasp85 · 2017-11-14T22:07:55Z

So, the NA support implemented earlier only considered NA values in the training data, not NA values in new observations to explain. I've just pushed an update that ignores NA columns in new observations so that you don't get the error.
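The idea of ignoring NA columns for a given observation can be sketched in base R. This is an illustration only, not the code from the commit: `drop_na_features`, the perturbed data, and the stand-in predictions `y` are all assumptions made up for the example.

```r
# Hypothetical helper: keep only the features that are present (non-NA)
# in the observation being explained
drop_na_features <- function(observation, perturbed) {
  keep <- !is.na(unlist(observation[1, , drop = FALSE]))
  perturbed[, keep, drop = FALSE]
}

# One observation with a missing Sepal.Width
obs <- data.frame(Sepal.Length = 5.1, Sepal.Width = NA_real_, Petal.Length = 1.4)

# Made-up perturbations around the observation; the NA feature stays NA
set.seed(1)
perturbed <- data.frame(
  Sepal.Length = rnorm(20, mean = 5.8, sd = 0.8),
  Sepal.Width  = NA_real_,
  Petal.Length = rnorm(20, mean = 3.7, sd = 1.7)
)
y <- rnorm(20)  # stand-in for the model's predictions on the perturbations

# Fit the local linear model on the remaining features only
x_fit <- as.matrix(drop_na_features(obs, perturbed))
fit <- glm.fit(x = cbind(Intercept = 1, x_fit), y = y, family = gaussian())
print(names(fit$coefficients))
```

With the all-NA column removed before the fit, `glm.fit()` no longer sees non-finite values, at the cost of that feature never appearing in the explanation for this observation.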

dkincaid · 2017-11-14T22:14:01Z

That is fantastic! Thanks for such a quick fix. I really love this package.

thomasp85 closed this as completed in fbfdbad Nov 14, 2017

dkincaid mentioned this issue Dec 1, 2017

Problems with missing values (NA) with caret created GBM model #58

Closed

j-ghatak mentioned this issue Dec 11, 2019

Exactly with same data GBM&Lime explain() works fine, but failing for RandomForest and CRF with NA/NaN/Inf in 'y' #172

Open
