
Tabular perturbation is inaccurate when using discretization #26

Open
TobiasGoerke opened this issue Aug 1, 2019 · 10 comments
Labels: bug Something isn't working

Comments

@TobiasGoerke (Contributor)

The default tabular perturbation function currently takes a random instance and replaces the perturbed instance's values with the non-fixed feature values of that other instance.
The fixed values remain unchanged.
This is inaccurate when using discretization: even the fixed values should change randomly within their discretized class.
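To illustrate the current behavior, here is a minimal sketch (hypothetical names and types, not the project's actual API):

```java
import java.util.Random;

// Minimal sketch of the current behavior (hypothetical API): non-fixed
// features are copied from a random training instance, while fixed
// features keep their exact original value.
final class TabularPerturbation {
    private final double[][] trainingData; // rows = instances, columns = features
    private final Random rng = new Random();

    TabularPerturbation(double[][] trainingData) {
        this.trainingData = trainingData;
    }

    double[] perturb(double[] instance, boolean[] fixed) {
        double[] donor = trainingData[rng.nextInt(trainingData.length)];
        double[] result = instance.clone();
        for (int i = 0; i < result.length; i++) {
            if (!fixed[i]) {
                result[i] = donor[i]; // non-fixed: replaced by the donor's value
            }
            // fixed: stays exactly at instance[i], although the anchor only
            // constrains it to a discretization bin, not to this exact value.
        }
        return result;
    }
}
```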

@fkoehne (Contributor) commented Aug 2, 2019

I have not fully thought this through, but yes: it appears there is more precision to be gained here.

@TobiasGoerke (Contributor, Author)

This issue can cause real problems rather than just inaccuracies. The following situation just occurred to me:
Assume there is a decision tree which splits feature A at value 5. The instance to be explained has a value of 6, and feature A is discretized into a bin ranging from 4 to 6. Now, each time feature A is fixed, the value 6 is passed to the model. Hence, we'll measure a high precision even though we are ignoring a decision boundary.
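A toy demonstration of this scenario (hypothetical model, not project code):

```java
import java.util.Random;

// Toy demonstration: a tree that splits feature A at 5, an instance with
// A = 6, and a discretization bin [4, 6].
public class BoundaryDemo {
    static boolean predict(double a) { return a > 5; } // decision boundary at 5

    public static void main(String[] args) {
        Random rng = new Random(42);
        int agree = 0, n = 10_000;
        for (int i = 0; i < n; i++) {
            // Current behavior would always pass 6; sampling within the
            // discretized bin [4, 6] exposes the boundary instead.
            double a = 4 + 2 * rng.nextDouble();
            if (predict(a) == predict(6)) agree++;
        }
        // Prints roughly 0.5, not the 1.0 measured when A is pinned to 6.
        System.out.println((double) agree / n);
    }
}
```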

@MagdalenaLang1

Just a first thought on this example: bad discretization...
The sequential definition of discretization and anchors is a downfall of the approach - in an ideal world, we would optimize bin borders and anchors simultaneously.

@MagdalenaLang1

But to be more productive: I can change the perturbation, but how?
1. Use the value of a random observation that falls into the same class: this requires the training set (see issue #39), and a few values could be reused often. A sketch of this option follows below.
2. Use a random value within the class: this requires the distribution within the class (assume uniform?).
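A minimal sketch of option 1, assuming a hypothetical WithinBinSampler (in practice the bin bounds would come from the discretizer):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of option 1 (hypothetical names): for a fixed feature, pick the
// value of a random training observation that falls into the same bin.
final class WithinBinSampler {
    private final double[][] trainingData;
    private final Random rng = new Random();

    WithinBinSampler(double[][] trainingData) {
        this.trainingData = trainingData;
    }

    double sample(int feature, double lower, double upper) {
        List<Double> candidates = new ArrayList<>();
        for (double[] row : trainingData) {
            double v = row[feature];
            if (v >= lower && v <= upper) candidates.add(v); // same discretized class
        }
        // Only values that actually occur in the training set are returned,
        // so no distributional assumption (e.g. uniform) is needed.
        return candidates.get(rng.nextInt(candidates.size()));
    }
}
```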

MagdalenaLang1 self-assigned this Aug 5, 2019
@TobiasGoerke (Contributor, Author)

I'd like to see 1.) implemented. The advantage of the current perturbation approach is that only values which actually exist in the training set get passed to the model. If possible, we should keep it that way, so that we are not dependent on assumptions about the type of distribution.
However, as of now, two perturbation functions exist: one for original values and one for discretized values. Their results would have to be interconnected for approach 1); see the sketch below.
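One conceivable way to interconnect the two functions (a hedged sketch; Discretizer and the reuse of WithinBinSampler from the sketch above are assumptions, not the project's API):

```java
// Hypothetical helper mapping values to bins and bins back to value ranges.
interface Discretizer {
    int binOf(int feature, double value);
    double lowerBound(int feature, int bin);
    double upperBound(int feature, int bin);
}

final class CoupledPerturbation {
    private final Discretizer discretizer;
    private final WithinBinSampler sampler; // from the option 1 sketch above

    CoupledPerturbation(Discretizer discretizer, WithinBinSampler sampler) {
        this.discretizer = discretizer;
        this.sampler = sampler;
    }

    // Fixed features are resampled from training values within their own
    // bin; non-fixed features would already have been handled by the
    // existing donor-based perturbation.
    double[] resampleFixed(double[] instance, boolean[] fixed) {
        double[] result = instance.clone();
        for (int i = 0; i < result.length; i++) {
            if (fixed[i]) {
                int bin = discretizer.binOf(i, instance[i]);
                result[i] = sampler.sample(i,
                        discretizer.lowerBound(i, bin),
                        discretizer.upperBound(i, bin));
            }
        }
        return result;
    }
}
```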

@fkoehne (Contributor) commented Aug 5, 2019

Re: "in an ideal world" - this is similar to what @NoItAll does with the "Magie" approach, right?

@fkoehne (Contributor) commented Aug 5, 2019

Which milestone are we heading towards here?

@MagdalenaLang1

Alternative 1 is good - the simpler one ;) I implemented it for review.

It does not work without a training set, though, which matters with issue #39 in mind (but I guess that has lower priority).

@NoItAll commented Aug 5, 2019

Re: "in an ideal world" - this is similar to what @NoItAll does with the "Magie" approach, right?

Hi,
currently, MAGIE does not contain this functionality - I rather suggested it as an outlook on possible additions. This ad-hoc discretization using a genetic algorithm was suggested in a paper I read on classification rule mining.
It would also imply the creation of a new index structure, e.g. a k-d tree, which can deal with continuous data. The question is whether the k-d tree is adequately fast, as the major speed-up of the rule mining in MAGIE stemmed from using roaring bitmaps.
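For context, the bitmap-based coverage counting mentioned here works roughly like this (a sketch using the org.roaringbitmap library; the bins and row indices are made up):

```java
import org.roaringbitmap.RoaringBitmap;

// Sketch: why roaring bitmaps speed up rule mining over discretized data.
// Each bin condition stores the set of matching row indices as a bitmap,
// so rule coverage is a fast bitmap intersection instead of a data scan.
public class CoverageDemo {
    public static void main(String[] args) {
        RoaringBitmap matchesAgeBin = RoaringBitmap.bitmapOf(1, 3, 5, 7, 9);
        RoaringBitmap matchesIncomeBin = RoaringBitmap.bitmapOf(3, 4, 5, 6);

        // Rows matching "age in bin AND income in bin"
        RoaringBitmap coverage = RoaringBitmap.and(matchesAgeBin, matchesIncomeBin);
        System.out.println(coverage.getCardinality()); // 2 (rows 3 and 5)
    }
}
```

With truly continuous data there is no finite set of bin conditions to precompute bitmaps for, which is why an index structure such as a k-d tree would be needed instead.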

@cjuers commented Aug 20, 2019

This issue has been fixed in the AutoTuning branch.

cjuers pushed a commit that referenced this issue Sep 30, 2019