-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add text support #6
Conversation
documentation of data
refactoring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a bunch. Overall it looks good!
Can I get you to address the comments as well as making sure it passes travis?
DESCRIPTION
Outdated
hrbrthemes, | ||
magrittr, | ||
purrr, | ||
e1071 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see where e1071 is used? Why move it from suggest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was in relation with some code I have not pushed. But I am not ready at all, I will move it back to suggest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DONE
R/character.R
Outdated
labels = labels, n_labels = n_labels, n_features = number_features_explain, | ||
feature_method = feature_selection_method) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lime method should return a new function rather than do the explanation directly. It does not make much sense for text data but do for e.g. tabular data, and we need to keep the interface consistent. The arguments of the returned function should be the same across methods...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the consistency requirement.
Can you tell me why you do that for data.frame? Is it to memorize some pre treatments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep - It needs to be trained on the training data in order to make sensible permutations. Also it makes it possible to do some one-off computations once and not every time an explanation is required...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not super comfortable with the factory pattern on R, much more a Java stuff :-)
Would it be a possible solution to use something like https://cran.r-project.org/web/packages/memoise/ ? Like you do the costly pre treatment, cache the result of the pre treatment, and return the final result. The user recall the function which requires the same pre treatment, this time, there is no costly computation, just retrieve stuff from cache? It is just a suggestion, I have not worked on data.frame part at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would work fine for the time consuming steps, but would not solve the need to pre-train the explainer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not super exited about that approach as it also adds changes to the API (instead of different calling conventions it is different calling progressions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively would be to have an explicit create_explainer
method that returns a new lime
object that wraps both the model and any preprocessing. This object is then passed on to the lime
method..? I could definitely see the benefits of this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the best approach right now is to make the text version compliant with the current approach and then I'll try to come up with an alternative approach in a new branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DONE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally, I moved all the parameters in the main function, and as you can see in the demo, I use currying to avoid double call ! Will need to put stuff in documentation, but I think it's quite ok that way
R/character.R
Outdated
predict(model, data, type = "prob") | ||
) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the infrastructure around the calling predict a bit, which makes this redundant, but lets merge it in and we can update the the text implementation to match tabular after that...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it an external function?
it s a way to make people able to use any algo and not just carret. (even if there is no predict function)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep (will push later today). Basically I export a predict_model
generic that people can define for their model class (making it possible to support models with non-conforming predict
methods. The default simply calls predict with the right type argument, but I've also included a method for mlr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTHING TO DO (FOR NOW)
R/permute_cases.R
Outdated
set_names(d, nm = .)}) | ||
|
||
dict_size <- length(tokens) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personal preference is to avoid %<>%
inside packages as it is a bit less clear whats going on (it is not a clear assignment and R itself doesn't understand it as such). Can we keep it to using the %>%
pipe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep you are right, there is no big gain in doing that way. Will refactor that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DONE
R/permute_cases.R
Outdated
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you walk me through the new implementation of the permutation? Is it just a rewrite doing the same thing or is there any change in the fundamentals of the approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far it s a rewrite in a way I can easily manage for a future refactoring. When I try crazy number of permultations (500K) this part is the slowest by far (40s on my i7), I want to remake it using Rcpp, that's why I divided the code that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTHING TO DO (RIGHT NOW)
Regarding Travis, I will refactor until there is no more Notes / Warning. Nothing crazy anyway Edit: DONE |
fix warning + notes
FYI, I use lintr::lint_package() to clean the code style. |
Introduce RCPP
Add tests
improve BoW
Replace stringdist function by custom Rcpp one. Better horizontal (matrix col) scalability
update gitignore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give me a heads up when you feel like you're done with the PR, then I'll give it a proper review
R/character.R
Outdated
expect_true(feature_selection_method %in% feature_selection_method()) | ||
expect_gte(number_features_explain, 1) | ||
expect_gte(n_permutations, 1) | ||
expect_gte(kernel_width, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use assertthat for defensive programming. testthat is for unit tests only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in the last commit
R/lime.R
Outdated
@@ -46,7 +46,7 @@ | |||
#' | |||
#' @export | |||
lime <- function(x, model, ...) { | |||
UseMethod('lime') | |||
UseMethod("lime") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't change my coding style :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, you are right, I followed the advises of linter...
Do you want me to put single quotes everywhere?
@thomasp85 I think it s ready for a review. In details : Text explanations returned are less rich than those for data.frame. I will add info when perf optimizations will be finished. Even if light, they are usable IRL. Finally, I realized that trees bring something interesting : a limit on the number of features to use without requiring multiple steps or choosing just the biggest weights => better perf for long text with many permutations! I have some code at the end of the demo file. Just waiting for the new API you told me you are going to introduce in the package. |
If it is ok with you I'll merge it in and any improvements and changes will happen in a new branch - ok? |
Yes I think it's good ! |
Thanks. Will begin to do some small refactoring to make it fit into my local branch and then push |
Add text support
Text data
Demo for text explanation