
Tabular perturbation is inaccurate when using discretization #26

Open
TobiasGoerke opened this issue Aug 1, 2019 · 10 comments
Labels: bug Something isn't working

Comments

@TobiasGoerke (Contributor)

The default tabular perturbation function currently takes a random instance and replaces the perturbed instance's values with the non-fixed feature values of that other instance.
The fixed values remain unchanged.
This is inaccurate when using discretization: even the fixed values should change randomly within their discretized class.
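To illustrate the current behavior, here is a minimal sketch (hypothetical names and types, not the project's actual API):

```java
import java.util.Random;

// Minimal sketch of the current behavior (hypothetical API): non-fixed
// features are copied from a random training instance, while fixed
// features keep their exact original value.
final class TabularPerturbation {
    private final double[][] trainingData; // rows = instances, columns = features
    private final Random rng = new Random();

    TabularPerturbation(double[][] trainingData) {
        this.trainingData = trainingData;
    }

    double[] perturb(double[] instance, boolean[] fixed) {
        double[] donor = trainingData[rng.nextInt(trainingData.length)];
        double[] result = instance.clone();
        for (int i = 0; i < result.length; i++) {
            if (!fixed[i]) {
                result[i] = donor[i]; // non-fixed: replaced by the donor's value
            }
            // fixed: stays exactly at instance[i], although the anchor only
            // constrains it to a discretization bin, not to this exact value.
        }
        return result;
    }
}
```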

@fkoehne (Contributor) commented Aug 2, 2019

I have not fully thought this through, but yes: it appears there is more precision to be gained here.

@TobiasGoerke (Contributor, Author)

This issue can cause real problems rather than just inaccuracies. The following situation just occurred to me:
Assume there is a decision tree which splits feature A at value 5. The instance to be explained has a value of 6, and feature A is discretized into a bin ranging from 4 to 6. Now, each time feature A is fixed, the value 6 is passed to the model. Hence, we'll measure a high precision even though we are ignoring a decision boundary.
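A toy demonstration of this scenario (hypothetical model, not project code):

```java
import java.util.Random;

// Toy demonstration: a tree that splits feature A at 5, an instance with
// A = 6, and a discretization bin [4, 6].
public class BoundaryDemo {
    static boolean predict(double a) { return a > 5; } // decision boundary at 5

    public static void main(String[] args) {
        Random rng = new Random(42);
        int agree = 0, n = 10_000;
        for (int i = 0; i < n; i++) {
            // Current behavior would always pass 6; sampling within the
            // discretized bin [4, 6] exposes the boundary instead.
            double a = 4 + 2 * rng.nextDouble();
            if (predict(a) == predict(6)) agree++;
        }
        // Prints roughly 0.5, not the 1.0 measured when A is pinned to 6.
        System.out.println((double) agree / n);
    }
}
```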

@MagdalenaLang1

Just a first thought on this example: bad discretization...
The sequential definition of discretization and anchors is a downfall of the approach - in an ideal world, we would optimize bin borders and anchors simultaneously.

@MagdalenaLang1

But to be more productive: I can change the perturbation, but how?
1. Use the value of a random observation that falls into the same class: this requires the training set (see issue #39), and a few values could be reused often. A sketch of this option follows below.
2. Use a random value within the class: this requires the distribution within the class (assume uniform?).
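A minimal sketch of option 1, assuming a hypothetical WithinBinSampler (in practice the bin bounds would come from the discretizer):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of option 1 (hypothetical names): for a fixed feature, pick the
// value of a random training observation that falls into the same bin.
final class WithinBinSampler {
    private final double[][] trainingData;
    private final Random rng = new Random();

    WithinBinSampler(double[][] trainingData) {
        this.trainingData = trainingData;
    }

    double sample(int feature, double lower, double upper) {
        List<Double> candidates = new ArrayList<>();
        for (double[] row : trainingData) {
            double v = row[feature];
            if (v >= lower && v <= upper) candidates.add(v); // same discretized class
        }
        // Only values that actually occur in the training set are returned,
        // so no distributional assumption (e.g. uniform) is needed.
        return candidates.get(rng.nextInt(candidates.size()));
    }
}
```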

MagdalenaLang1 self-assigned this Aug 5, 2019
@TobiasGoerke (Contributor, Author)

I'd like to see 1.) implemented. The advantage of the current perturbation approach is that only values which actually exist in the training set get passed to the model. If possible, we should keep it that way, so that we are not dependent on assumptions about the type of distribution.
However, as of now, two perturbation functions exist: one for original values and one for discretized values. Their results would have to be interconnected for approach 1); see the sketch below.
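One conceivable way to interconnect the two functions (a hedged sketch; Discretizer and the reuse of WithinBinSampler from the sketch above are assumptions, not the project's API):

```java
// Hypothetical helper mapping values to bins and bins back to value ranges.
interface Discretizer {
    int binOf(int feature, double value);
    double lowerBound(int feature, int bin);
    double upperBound(int feature, int bin);
}

final class CoupledPerturbation {
    private final Discretizer discretizer;
    private final WithinBinSampler sampler; // from the option 1 sketch above

    CoupledPerturbation(Discretizer discretizer, WithinBinSampler sampler) {
        this.discretizer = discretizer;
        this.sampler = sampler;
    }

    // Fixed features are resampled from training values within their own
    // bin; non-fixed features would already have been handled by the
    // existing donor-based perturbation.
    double[] resampleFixed(double[] instance, boolean[] fixed) {
        double[] result = instance.clone();
        for (int i = 0; i < result.length; i++) {
            if (fixed[i]) {
                int bin = discretizer.binOf(i, instance[i]);
                result[i] = sampler.sample(i,
                        discretizer.lowerBound(i, bin),
                        discretizer.upperBound(i, bin));
            }
        }
        return result;
    }
}
```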

@fkoehne (Contributor) commented Aug 5, 2019

Re: "in an ideal world" - this is similar to what @NoItAll does with the "Magie" approach, right?

@fkoehne (Contributor) commented Aug 5, 2019

Which milestone are we heading towards here?

@MagdalenaLang1

Alternative 1 is good - the simpler one ;) I implemented it for review.

It does not work without a training set, though, which matters with issue #39 in mind (but I guess that has lower priority).

@NoItAll commented Aug 5, 2019

Re: "in an ideal world" - this is similar to what @NoItAll does with the "Magie" approach, right?

Hi,
currently, MAGIE does not contain this functionality - I rather suggested it as an outlook on possible additions. This ad-hoc discretization using a genetic algorithm was suggested in a paper I read on classification rule mining.
It would also imply the creation of a new index structure, e.g. a k-d tree, which can deal with continuous data. The question is whether the k-d tree is adequately fast, as the major speed-up of the rule mining in MAGIE stemmed from using roaring bitmaps.
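For context, the bitmap-based coverage counting mentioned here works roughly like this (a sketch using the org.roaringbitmap library; the bins and row indices are made up):

```java
import org.roaringbitmap.RoaringBitmap;

// Sketch: why roaring bitmaps speed up rule mining over discretized data.
// Each bin condition stores the set of matching row indices as a bitmap,
// so rule coverage is a fast bitmap intersection instead of a data scan.
public class CoverageDemo {
    public static void main(String[] args) {
        RoaringBitmap matchesAgeBin = RoaringBitmap.bitmapOf(1, 3, 5, 7, 9);
        RoaringBitmap matchesIncomeBin = RoaringBitmap.bitmapOf(3, 4, 5, 6);

        // Rows matching "age in bin AND income in bin"
        RoaringBitmap coverage = RoaringBitmap.and(matchesAgeBin, matchesIncomeBin);
        System.out.println(coverage.getCardinality()); // 2 (rows 3 and 5)
    }
}
```

With truly continuous data there is no finite set of bin conditions to precompute bitmaps for, which is why an index structure such as a k-d tree would be needed instead.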

@cjuers commented Aug 20, 2019

This issue has been fixed in the AutoTuning branch.

cjuers pushed a commit that referenced this issue Sep 30, 2019