Add ability to initialise FoldingBase objects with external parser #77

Closed
wants to merge 2 commits into from

Conversation

mschlupp

If you would like to run rep with, e.g., a StratifiedKFold instead of a normal KFold, this pull request makes that possible. If no external folding object is passed, the default KFold algorithm is used.

@jonas-eschle

Good idea, thank you! Hope it will be pulled in soon.

Btw, it seems somewhat neglected... are you still actively maintaining this repo? I personally find it very useful and make a lot of use of it.

@arogozhnikov
Contributor

@mayou36

are you still actively maintaining this repo?

rarely doing something. As the things we use work fine, there is no reason to spoil the situation :)

@mschlupp

Max, thanks for the idea, but this patch isn't going to work (even if the code were correct).
The reason is that FoldingClassifier should not only work with the training dataset, but also be able to predict any arbitrary dataset (which may, e.g., have a different size; that isn't going to work in your case).

So, it's more complicated. I'll think about how the case of stratified folding can be added.

@mschlupp
Author

I don't quite understand why this should be an issue. You can predict arbitrary datasets and you can validate via CV. The only change is that you replace the default KFold with a folder of your choice. What am I missing here?

@arogozhnikov
Contributor

arogozhnikov commented Jul 19, 2016

@mschlupp

There are several basic things expected from FoldingClassifier:

  1. fitting on an arbitrary dataset
  2. correctly predicting the same dataset (meaning that the fold ids used here and in training should be the same)
  3. correctly predicting any other dataset with the same columns, but maybe a different length
  4. the ability to work as part of a more complex classifier (e.g. Bagging over FoldingClassifier)

At the same time, we have labels for fitting, but not for predicting. Currently this is handled by fixing an internal random state, which is later used together with the dataset length to regenerate the same fold indices.
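
The mechanism described here can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not rep's actual code: the function name `generate_fold_ids` and its parameters are hypothetical, and it uses Python's standard `random` module rather than whatever rep uses internally. The point is only that fold ids depend on nothing but the length and the fixed seed, so no labels are needed.

```python
import random


def generate_fold_ids(length, n_folds, random_state):
    """Assign each of `length` rows to one of `n_folds` folds.

    Hypothetical helper: the result depends only on
    (length, n_folds, random_state), so it needs no labels.
    """
    rng = random.Random(random_state)
    # Balanced assignment 0, 1, ..., n_folds - 1, 0, 1, ... then shuffled.
    fold_ids = [i % n_folds for i in range(length)]
    rng.shuffle(fold_ids)
    return fold_ids


# Fold ids used while fitting on a dataset with 10 rows.
train_ids = generate_fold_ids(length=10, n_folds=3, random_state=42)
# Predicting the same dataset: same length and seed reproduce the same folds.
same_ids = generate_fold_ids(length=10, n_folds=3, random_state=42)
# Predicting a dataset of a different length also works without labels.
other_ids = generate_fold_ids(length=7, n_folds=3, random_state=42)
```

Because the assignment is regenerated rather than stored, the same call works at fit time and at predict time.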

When you pass a StratifiedKFold, you can't fulfill points 3 and 4, since you need a different folding for a new test dataset.
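
To illustrate why stratification is harder to fit into this scheme, here is a rough sketch of a stratified fold assignment. It is an assumption-laden toy, not scikit-learn's StratifiedKFold and not rep's code; `stratified_fold_ids` is a hypothetical name. Note that it takes `labels` as a required input, which is exactly what is unavailable when predicting a new dataset.

```python
import random
from collections import defaultdict


def stratified_fold_ids(labels, n_folds, random_state):
    """Assign fold ids so each class is spread evenly over the folds.

    Hypothetical helper: unlike a length-based scheme, this cannot be
    computed without the labels.
    """
    rng = random.Random(random_state)
    # Group row indices by class label.
    rows_by_class = defaultdict(list)
    for row, label in enumerate(labels):
        rows_by_class[label].append(row)

    fold_ids = [None] * len(labels)
    for label in sorted(rows_by_class):
        rows = rows_by_class[label]
        rng.shuffle(rows)
        # Deal the shuffled rows of this class round-robin into the folds.
        for position, row in enumerate(rows):
            fold_ids[row] = position % n_folds
    return fold_ids


labels = [0, 0, 0, 0, 1, 1, 1, 1]
fold_ids = stratified_fold_ids(labels, n_folds=2, random_state=0)
```

With these toy labels, each fold ends up with two rows of each class, but only because the labels were supplied.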

@arogozhnikov
Contributor

closed in favor of #92 (stratified folding)
