Using lime() on xgboost object #1
Thanks for the report – I'll take a look. Keep in mind that …
Two things: currently lime only works with classifiers, not regressors. Also, on my system I get an error when running …
Thanks for taking a look at this!
```r
xgbDMatrix.obj <- xgb.DMatrix(data = x, label = y)
mod = xgboost(data = x, label = y, nrounds = 100, objective = "binary:logistic") # Variant 2 using a regular matrix + vector as data input
```
Sorry for letting this hang – I was pulled away by higher priority stuff. Several problems in your example:

- lime takes data.frame input in order to ensure that variables are named. I might add a default coercion to data.frame, which would make matrices work, but I'm still unsure whether this is a good idea.
- The predict function does not return a data.frame containing the probabilities of each class, which is the required predict output.

I would generally recommend using xgboost through the caret package, as this is the target API lime is coded up against (soon to be joined by mlr).
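The caret route suggested above could look roughly like this. This is a minimal sketch, assuming the CRAN lime API with `lime()`/`explain()` and caret's `xgbTree` method; the toy data, factor labels, and tuning values are placeholders, not taken from the thread:

```r
library(caret)
library(lime)

# Toy classification data as a data.frame, since lime requires named columns
set.seed(1)
df <- data.frame(matrix(rnorm(100 * 10), ncol = 10))
y  <- factor(sample(c("yes", "no"), 100, replace = TRUE))

# Fitting xgboost through caret means predict(fit, ..., type = "prob")
# returns a data.frame of class probabilities, which is what lime expects.
# The single-row tuneGrid is required when trainControl(method = "none").
fit <- train(x = df, y = y, method = "xgbTree",
             trControl = trainControl(method = "none"),
             tuneGrid = data.frame(nrounds = 50, max_depth = 3, eta = 0.3,
                                   gamma = 0, colsample_bytree = 1,
                                   min_child_weight = 1, subsample = 1))

explainer   <- lime(df, fit)
explanation <- explain(df[1:2, ], explainer, n_labels = 1, n_features = 3)
```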
Thank you for getting back to me. So I tried running this from caret instead, and I got something up and running. However, when moving to real data I ran into some problems:

1. It seems that lime(x=x,...) cannot handle missing data in x. It throws the error: `Error in quantile.default(x[[i]], seq(0, 1, length.out = n_bins + 1)) :`. A workaround is to remove the missing data before passing it to lime, but as there is indeed information in the missing data which was used during model training, that is not an optimal solution.
2. Even without missing data, I get an error when using bin_continuous=T: `Error in cut.default(x[[i]], unique(bin_cuts[[i]]), labels = FALSE, include.lowest = TRUE) :`. Everything works fine if I put bin_continuous=F with no missing data.
3. When re-running the explain function which is the output from lime(), I get slightly different results every time. I guess this is due to the randomness in the permutations. Is it possible to control the number of permutations used somewhere? I guess that could resolve the issue, at the expense of an increase in running time.
4. Finally, if you could point out how I might create my own prediction based on native xgboost, that would be excellent (the caret wrapper does not support all of xgboost's features). Do I need to overwrite the prediction function of the xgboost class?

Thank you so much for your support so far!
```r
library(lime)
set.seed(123)
fitControl <- trainControl(method = "none",
caretFit.glm <- train(y = y.factor,
explain <- lime(x = x, model = caretFit.glm, bin_continuous = T) # Throws error
```
Thank you! Regarding 3, I wasn't aware of the n_permutations parameter. That is exactly what I was looking for :) What is the default value?
5000. I'll try to improve the docs on this in the future. In general R does not have good support for documenting functions created by other functions...
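For reference, the permutation count can be set explicitly when producing explanations. A minimal sketch, assuming the CRAN lime API where `explain()` takes `n_permutations` (the `explainer` and `test_cases` objects stand in for a fitted explainer and a data.frame of cases to explain):

```r
# More permutations give more stable explanations at the cost of runtime;
# the default mentioned above is 5000.
explanation <- explain(test_cases, explainer,
                       n_labels = 1, n_features = 4,
                       n_permutations = 10000)
```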
Thanks!
Hi, thank you for the LIME package, it is very interesting to me.
@martinju Hi, I have the same issues 1 and 2 with an XGB model.
@pommedeterresautee The intercept issue is the one that throws the following error:

`Error in cut.default(x[[i]], unique(bin_cuts[[i]]), labels = FALSE, include.lowest = TRUE) :`

I figured out, just like @martinju, that this is an intercept problem, meaning that it appears whenever one of the columns of your data is constant. I also have an NA issue: the lime() function does not work if your data has NAs, which is problematic for real data. I don't know how I'll get around that; probably create as many "VarXIsNa" variables as needed. Do you think lime will be able to handle NAs soon? Thanks!
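The "VarXIsNa" workaround mentioned above could be sketched in base R like this. A hedged example: the helper name `add_na_indicators` and the median imputation are illustrative choices, not part of lime or this thread:

```r
# For every column that contains NAs, add a 0/1 indicator column and
# impute the NAs in the original column (here with the column median),
# so tools that reject NAs still see the missingness as a feature.
add_na_indicators <- function(df) {
  for (col in names(df)) {
    if (anyNA(df[[col]])) {
      df[[paste0(col, "IsNa")]] <- as.integer(is.na(df[[col]]))
      df[[col]][is.na(df[[col]])] <- median(df[[col]], na.rm = TRUE)
    }
  }
  df
}

d  <- data.frame(x1 = c(1, NA, 3), x2 = c(4, 5, 6))
d2 <- add_na_indicators(d)
# d2 gains a column x1IsNa = c(0, 1, 0) and the NA in x1 becomes 2
```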
@belariow Can you open a separate issue for the NA support as it is hidden within this xgboost thread |
What about something like this within the
For example:
Just my $0.02 since I'm new to this repo and work. Thanks for doing this though!
For instance, on a multiclass XGBoost classification it wouldn't work, as you first need to coerce the output to a matrix (one column per category); in that case there are some parameters to add when you call predict.
Can I get you to confirm that the CRAN release works fine for you? |
Thanks for the work on this. I really think the new predict_model and model_type functions would be very useful. Reading the help for these new functions (and the source code), it seems xgboost should work out of the box, as predict_model.xgb.Booster is already defined. However, I don't understand how I should use it. Passing anything other than a data.frame (or character vector) to the lime() function just throws an error. Examples:
Do you mind correcting my very simple example? |
You would convert the matrix to a data.frame and then (if needed) convert it back inside predict_model.
We might add direct support for matrices, but most models expect data.frame input, so I'm unsure whether we can do it gracefully.
XGBoost expects a matrix in its own format (xgb.DMatrix).
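The conversion described above could be sketched as custom methods for lime's generics. A hedged sketch, assuming lime's `predict_model(x, newdata, type, ...)` and `model_type(x, ...)` generics and a binary xgboost classifier; the class labels "No"/"Yes" are illustrative:

```r
library(lime)
library(xgboost)

# lime hands us a data.frame; rebuild the xgb.DMatrix that
# xgboost actually requires before calling predict().
predict_model.xgb.Booster <- function(x, newdata, type, ...) {
  pred <- predict(x, xgb.DMatrix(as.matrix(newdata)))
  # For a binary classifier, lime requires a data.frame of
  # class probabilities, one column per class.
  data.frame(No = 1 - pred, Yes = pred)
}

model_type.xgb.Booster <- function(x, ...) "classification"
```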
Ok, thank you for the pointers. I got the lime function to pass without error, but the explainer still does not work as xgboost requires a matrix (or xgb.DMatrix) to perform predictions. I added a pull request to fix this (in addition to handling NAs in the training data). See #37 |
Fixed in a74a2ac |
Setting the seed value eliminates the randomness, but if I run it with different seed values then the result changes. So now the problem is: which result should I trust? As for the R-squared, most of the time it is nearly the same.
Hi, and thank you for an excellent package!
I am trying to apply the lime package to a model fitted with xgboost (using the original xgboost package), but the lime function does not seem to accept the input format, even though the predict function works fine.
Example using both xgbDMatrix and a regular matrix:

```r
library(xgboost)
library(lime)

x = matrix(rnorm(100 * 10), ncol = 10)
y = rnorm(100)

xgbDMatrix.obj <- xgb.DMatrix(data = x, label = y)
mod = xgb.train(data = xgbDMatrix.obj, nrounds = 100) # Variant 1, using xgbDMatrix format of data input
#predict(mod, x, type = "prob") # works fine
explain <- lime(x = x, model = mod) # Throws error

mod = xgboost(data = x, label = y, nrounds = 100) # Variant 2, using a regular matrix + vector as data input
#predict(mod, x, type = "prob") # works fine
explain <- lime(x = x, model = mod) # Throws error
```
In the readme you mention manually building a predict function.
If that is the solution here, could you please provide some guidelines on how to do that?