Integrate h2o with lime #40

Closed
mdancho84 opened this issue Sep 28, 2017 · 12 comments

Comments

@mdancho84
Contributor

@thomasp85
Thanks for your work getting lime set up in R. I've already had several use cases where I used h2o and lime together, and the two work really well. If you're OK with it, I'll work on an integration to get most of the classes (at least the major h2o models) into lime.
-Matt

@thomasp85
Owner

That sounds great. Please do

mdancho84 added a commit to mdancho84/lime that referenced this issue Sep 29, 2017
@mdancho84
Contributor Author

mdancho84 commented Sep 30, 2017

PR submitted. It's relatively straightforward with simple predict_model and model_type functions for H2OModel class. Added tests and codecov is at 85%. I did not change any docs or the vignette.
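For readers unfamiliar with how lime picks up a new model class, the two methods mentioned above might look roughly like the sketch below. This is an illustration, not the code from the PR. It assumes that the first element of `class(x)` names the concrete H2O model class, and that H2O classification predictions come back with the predicted label in the first column followed by one probability column per class.

```r
# Sketch of S3 methods for lime's model_type() and predict_model() generics.
# Assumption: class(x)[[1]] is the concrete H2O model class.

model_type.H2OModel <- function(x, ...) {
  h2o_class <- class(x)[[1]]
  if (h2o_class %in% c("H2OBinomialModel", "H2OMultinomialModel")) {
    return("classification")
  }
  if (h2o_class == "H2ORegressionModel") {
    return("regression")
  }
  stop("Unsupported h2o model class: ", h2o_class)
}

predict_model.H2OModel <- function(x, newdata, type, ...) {
  if (!requireNamespace("h2o", quietly = TRUE)) {
    stop("The h2o package is required to predict with h2o models")
  }
  # Score the permuted data on the H2O cluster and pull the result back into R
  pred <- as.data.frame(h2o::h2o.predict(x, h2o::as.h2o(newdata)))
  if (model_type.H2OModel(x) == "classification") {
    # Assumption: column 1 is the predicted label, the rest are class probabilities
    pred[, -1, drop = FALSE]
  } else {
    # Regression: keep the single column of numeric predictions
    pred[, 1, drop = FALSE]
  }
}
```

With lime loaded, these would be dispatched automatically for any object inheriting from `H2OModel`, so no per-algorithm method is needed.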

@thomasp85
Owner

It’s supposed to be straightforward so that’s perfect🙂

mdancho84 added a commit to mdancho84/lime that referenced this issue Oct 4, 2017
@mdancho84
Contributor Author

Hey Thomas, everything should be ready to go with PR42. Let me know if anything else needs to be changed.

@mdancho84
Contributor Author

The merge conflict with NEWS.md has been resolved. Let me know if anything else is needed. Thanks.

thomasp85 pushed a commit that referenced this issue Oct 16, 2017
@dkincaid

dkincaid commented Nov 2, 2017

Did you try this with the H2O XGBoost model? I'm getting a weird error when I try to use it with one. It's working fine for gbm, but not xgboost. I simply changed from calling h2o.gbm to h2o.xgboost. This is H2O version 3.14.0.7. The error seems to be happening inside the H2O code, but I've been unable to decipher what it means.

Here is the error I'm seeing:

java.lang.IllegalArgumentException: Domain must have 2 class labels, but is [] for binomial metrics.
	at hex.ModelMetricsBinomial.make(ModelMetricsBinomial.java:92)
	at hex.ModelMetricsBinomial.make(ModelMetricsBinomial.java:71)
	at hex.tree.xgboost.XGBoostModel.makePreds(XGBoostModel.java:351)
	at hex.tree.xgboost.XGBoostModel.makeMetrics(XGBoostModel.java:301)
	at hex.tree.xgboost.XGBoostModel.score(XGBoostModel.java:462)
	at water.api.ModelMetricsHandler$1.compute2(ModelMetricsHandler.java:352)
	at water.H2O$H2OCountedCompleter.compute(H2O.java:1263)
	at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
	at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
	at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
	at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
	at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

Error: java.lang.IllegalArgumentException: Domain must have 2 class labels, but is [] for binomial metrics.

@mdancho84
Contributor Author

Can you provide a reproducible example? Also, maybe open it as a new issue.

@dkincaid

dkincaid commented Nov 2, 2017

I will do that. Seems like h2o's xgboost implementation is doing something odd with the response domain. My response variable is TRUE/FALSE, but it's showing up as 1/0 when using predict(). I'll try to put together a small reproducible example and open a new issue.
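One thing that may be worth checking while building the reproducible example is whether the response reaches H2O as a two-level categorical column. The sketch below is only a guess at something to try, not a confirmed fix for the error above. The data frame, the column names, and the "no"/"yes" labels are all made up for illustration.

```r
# Hypothetical training data with a logical (TRUE/FALSE) response column
train <- data.frame(
  x1 = c(0.2, 1.5, 3.1, 0.7),
  churned = c(TRUE, FALSE, FALSE, TRUE)
)

# Recode the logical response as a factor with two explicit string labels,
# so the class labels are unambiguous before the frame is converted for H2O
train$churned <- factor(train$churned,
                        levels = c(FALSE, TRUE),
                        labels = c("no", "yes"))

# Inspect the resulting response domain
print(levels(train$churned))
```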

@NkululekoThangelane

@mdancho84 Were you able to integrate H2O with Lime? Do you know of anyone working on H2O Python integration with Lime?

@mdancho84
Contributor Author

Yes, h2o and lime are integrated into Thomas's R lime package so you can directly use h2o model output without creating custom predict_model and model_type functions.
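As a rough illustration of what "directly" means here, a minimal sketch might look like the following. The `iris` data, the choice of `h2o.gbm`, and the rows being explained are arbitrary choices for the example, and the block only runs when both packages (and a Java runtime for H2O) are available.

```r
# Only runs when both packages are installed; h2o also needs a Java runtime.
if (requireNamespace("h2o", quietly = TRUE) &&
    requireNamespace("lime", quietly = TRUE)) {
  library(h2o)
  library(lime)
  h2o.init()

  features <- names(iris)[1:4]
  model <- h2o.gbm(x = features, y = "Species", training_frame = as.h2o(iris))

  # No custom predict_model() or model_type() methods are defined here;
  # the H2O model object is passed to lime() as-is.
  explainer <- lime(iris[, features], model)
  explanation <- explain(iris[c(1, 51, 101), features], explainer,
                         n_labels = 1, n_features = 2)
  print(head(explanation))
}
```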

I don't know whether @marcotcr's Python lime library has a direct integration, but I do know that it can be used with H2O. See here: https://marcotcr.github.io/lime/tutorials/Tutorial_H2O_continuous_and_cat.html

@123saga

123saga commented Mar 15, 2018

Hi @thomasp85, I tried lime::explain() on an H2O logistic regression model with 5 features and nrow(test_data) ~ 10000, and it's taking forever. How can I optimize the execution? Should I loop through 10-100 records at a time? Thanks!

My example:
explanation <- explain(as.data.frame(valid_cv_h2o), explainer, n_labels = 1, n_features = 5, kernel_width = 0.5)

@mdancho84
Contributor Author

Explaining 10K observations is a lot. Remember, lime is local, not global. Maybe try 40 to get a small sample of what’s going on locally via lime explanation. I recommend a simple correlation analysis for 10K observations to get global relationships.
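A minimal sketch of that split between a local and a global view, using only base R, could look like this. The simulated `valid_df`, its column names, and the sample size of 40 are stand-ins for illustration, and the `lime::explain()` call is left commented out because it needs an `explainer` built from a real model.

```r
set.seed(1)

# Hypothetical stand-in for the ~10,000-row validation data
n <- 10000
valid_df <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = rnorm(n)
)
valid_df$target <- as.integer(valid_df$x1 + rnorm(n) > 0)
features <- c("x1", "x2", "x3")

# Local view: explain a small random sample instead of every row
sample_rows <- valid_df[sample(nrow(valid_df), 40), features]
# explanation <- lime::explain(sample_rows, explainer, n_labels = 1, n_features = 3)

# Global view: correlation of each feature with the response over all rows
global_cor <- cor(valid_df[, features], valid_df$target)
print(round(global_cor, 2))
```

The cost of `explain()` grows with the number of rows explained, since each row gets its own set of permutations and its own local fit, so sampling is the simplest way to keep the run time manageable.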


5 participants