lime function with date columns #39

Closed
ArunSharma93 opened this issue Sep 27, 2017 · 10 comments

@ArunSharma93

ArunSharma93 commented Sep 27, 2017

The explain function errors whenever I have a date column in my dataset. This is a minor issue but I thought I should flag it anyways.

@thomasp85
Owner

That is intentional - I have no idea how dates should be sampled in any meaningful way for the permutations

@ArunSharma93
Author

Understood. I guess I have the issue as I have a time series model, which I understand was not what LIME was designed for, but I agree with your reasoning.

Thank you for supporting the package btw!

@thomasp85
Owner

thomasp85 commented Sep 28, 2017

Hmm - it might make sense to just hold time constant across permutations so it will give insight into why, at this time, the model behaves as it does..?

I’ll reopen and give it some more thought.

@thomasp85 thomasp85 reopened this Sep 28, 2017
@ArunSharma93 ArunSharma93 changed the title from "lime function with non-numeric columns" to "lime function with date columns" Sep 28, 2017
@ArunSharma93
Author

I'd personally recommend converting the date column to numeric. A date class is a feature that represents time, and time is better explained by a numeric space, which preserves the intrinsic relationship from one point to the next, than by categories, which do not hold this information. Techniques like RandomForest, XGBoost, and even linear regression convert dates to numeric; if the user wanted to convey date as a category, it should already be classed as a character/factor.
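
As a rough illustration of the numeric conversion described above, the sketch below maps dates to a count of days since a fixed origin, so that the spacing between points is preserved. It is written in Python using only the standard library; the helper name `date_to_numeric` and the choice of 1970-01-01 as the origin are assumptions made for this example, not anything taken from lime.

```python
from datetime import date

def date_to_numeric(d, origin=date(1970, 1, 1)):
    """Return the number of whole days between `origin` and `d`.

    Hypothetical helper for illustration only; the origin is an assumption.
    """
    return (d - origin).days

# Dates taken from this thread, used purely as sample input.
dates = [date(2017, 9, 27), date(2017, 9, 28), date(2017, 11, 21)]
numeric = [date_to_numeric(d) for d in dates]

# Differences between consecutive values are the gaps in days,
# which a factor/character encoding would not retain.
gaps = [b - a for a, b in zip(numeric, numeric[1:])]
```

Unlike a categorical encoding, the numeric values keep both the ordering and the distance between observations.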

However, on the topic of time series, are there any consequences of using LIME with a time series model? I understand LIME was built with stationary models in mind (for instance, decision trees), but could LIME's sampling technique produce misleading results?

@thomasp85
Owner

What I'm suggesting is to hold the Date column constant, not to convert it to something else. My rationale is that you're often not interested in knowing that your model is time-dependent; that is implicit in time series. Instead, you are more interested in knowing how the different variables, as they are at this specific time, have contributed.
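
The idea of holding the date fixed while the other variables are perturbed could be sketched roughly as follows. This is not lime's actual implementation (lime is an R package, and its sampling scheme is not shown in this thread). It is a minimal Python illustration that assumes Gaussian noise around each numeric value; the function `perturb`, the column names, and the standard deviations are all hypothetical.

```python
import random
from datetime import date

def perturb(row, numeric_sd, n_samples, seed=0):
    """Draw perturbed copies of `row`, leaving date values untouched.

    Hypothetical sketch: numeric values get Gaussian noise (an assumption),
    while any `date` value is copied as-is into every sample.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        sample = {}
        for name, value in row.items():
            if isinstance(value, date):
                # Hold time constant across all permutations.
                sample[name] = value
            else:
                # Sample around the observed value for this feature.
                sample[name] = rng.gauss(value, numeric_sd[name])
        samples.append(sample)
    return samples

# Made-up observation and spreads, for illustration only.
row = {"date": date(2017, 9, 28), "temperature": 14.2, "demand": 310.0}
sds = {"temperature": 3.0, "demand": 25.0}
samples = perturb(row, sds, n_samples=5)
```

Every sample shares the original date, so any explanation fitted to these samples would reflect how the other variables contribute at that point in time rather than the effect of time itself.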

@ArunSharma93
Author

Ah yes, I understand what you mean now. This makes perfect sense, as LIME samples around the variables for a given epoch, so it makes the most sense to keep time static when doing so. Nice idea!

Any idea of when this could be implemented?

@thomasp85
Owner

I won't make any promises but it could probably be included in the next update, due within the next couple of months

@thomasp85
Owner

Do you have a dummy model and data I can play with? Don't really have any real-life timeseries data to validate with...

@thomasp85
Owner

FYI the feature is being implemented in the date-support branch

@ArunSharma93
Author

ArunSharma93 commented Nov 21, 2017 via email
