Error during explain with H2O GBM/XGB models - "NA/NaN/Inf in 'x'" #45
Comments
I have found that I can use |
Where do you get wellness data from? and how is your model made? I would need it to reproduce your example... |
Unfortunately I can't share that data, but let me try to put together a reproducible example with some publicly available data. I was hoping that something I was doing would look wrong. |
Not obviously so - I’ll look into it if you can make a reprex |
Here is a reproducible example using the Iris data. I show a successful run with the full iris data frame, then run the same thing against a data frame where I randomly set some values to NA. Hopefully this gives you something to work with. I appreciate you taking a look at it.
# Create a data frame from the Iris data and randomly set some values to NA
myIris <- purrr::map_df(iris[,-5], function(x) {x[sample(c(TRUE, NA), prob = c(0.8, 0.2), size = length(x), replace = TRUE)]})
myIris <- cbind(myIris, Species=iris$Species)
library(h2o)
h2o.init()
# First show that it's successful without any missing data
full_iris_frame <- as.h2o(iris)
full_mdl <- h2o.gbm(training_frame = full_iris_frame, y = "Species")
full_explainer <- lime::lime(dplyr::select(as.data.frame(full_iris_frame), -Species), full_mdl)
full_explanation <- lime::explain(dplyr::select(as.data.frame(full_iris_frame)[1:4,], -Species),
full_explainer, n_labels = 3 , n_features = 3)
# Now try to run it on the data that has some missing values
iris_frame <- as.h2o(myIris)
mdl <- h2o.gbm(training_frame = iris_frame,
y = "Species")
explainer <- lime::lime(dplyr::select(as.data.frame(iris_frame), -Species), mdl)
explanation <- lime::explain(dplyr::select(as.data.frame(iris_frame)[1:4,], -Species),
explainer, n_labels = 3 , n_features = 3) |
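For anyone rerunning the example above: the NA injection is random and unseeded, so each run produces a different data frame. A minimal sketch of a seeded, base-R variant follows. The helper `inject_na`, the seed value, and the name `my_iris_seeded` are illustrative assumptions, not part of the original report.

```r
# Sketch only: seeded NA injection so the reprex gives the same frame each run.
# `inject_na` is a hypothetical helper, not a lime or h2o function.
set.seed(42)

inject_na <- function(x, p = 0.2) {
  # Replace each value with NA independently with probability p
  x[runif(length(x)) < p] <- NA
  x
}

my_iris_seeded <- iris
my_iris_seeded[1:4] <- lapply(iris[1:4], inject_na)

# Count the missing values per column to confirm NAs were injected
na_per_column <- colSums(is.na(my_iris_seeded))
print(na_per_column)
```

The Species column is left untouched, as in the original example, so only the four numeric predictors contain missing values.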
So, the support for |
That is fantastic! Thanks for such a quick fix. I really love this package. |
I'm trying to use the package with an H2O XGBoost model (I've also tried it with GBM and get the same thing). The error is:
Here is the code I'm running:
This is caused by having some NA values in the data frame, but I thought that this had already been fixed in issue #8. I verified the cause by removing the three columns that have NA values as a test. These NA values are meaningful, and H2O's GBM and XGBoost handle them by creating a category for the missing value after binning the non-missing feature values. Is there any easy fix here?
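The check described above (find the columns with NA values, then drop them to confirm they trigger the error) can be sketched in base R. The data frame `df` here is a made-up stand-in, since the original data was not shared, and the variable names are assumptions for illustration.

```r
# Sketch only: `df` is a hypothetical stand-in for the private data set.
df <- data.frame(
  a = c(1, NA, 3),
  b = c("x", "y", NA),
  c = 1:3
)

# Count missing values per column and keep the names of columns with any NA
na_counts <- colSums(is.na(df))
na_cols <- names(na_counts)[na_counts > 0]

# Diagnostic copy with the NA-containing columns removed
df_no_na_cols <- df[, setdiff(names(df), na_cols), drop = FALSE]

print(na_cols)
print(names(df_no_na_cols))
```

This is a diagnostic step only. Because the missing values are meaningful to the model, dropping those columns is not a substitute for a fix.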