Possibility to do stratification with multi-output multi-class (multi-target) data #26

Closed
bundit786 opened this issue Nov 29, 2022 · 3 comments

Comments

@bundit786

Hi,
I have a multi-output multi-class (multi-target) dataset and would like to stratify the data before applying a learning algorithm. Using iterative_train_test_split from the skmultilearn library:
```python
from skmultilearn.model_selection import iterative_train_test_split
x_train, y_train, x_test, y_test = iterative_train_test_split(x, y, test_size=0.1)
```

Thank you.
@trent-b
Owner

trent-b commented Dec 1, 2022

Hi, unfortunately this package is not currently designed to handle multiclass-multioutput classification. I wonder though if you could maybe one-hot encode your targets, feed them into the stratifier, keep track of the indices of the instances, and then, once they are split into folds, convert them back into the original target values since you've retained the indices. Just a thought. I'm sorry I couldn't be of more help.

@bablf

bablf commented Dec 14, 2022

This does work but I needed to use the MultiLabelBinarizer

@dyhan316

dyhan316 commented Feb 10, 2023

Thank you @trent-b and @bablf! Could you share some sample code that uses https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html ?
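
For reference, here is a minimal sketch of the workaround suggested in this thread: binarize the targets, stratify on row indices, then map the indices back to the original data. It is not the code @bablf ran. The helper names `encode_targets` and `stratified_split` and the `splitter` parameter are hypothetical, and the sketch assumes `y` is a 2-D array of shape (n_samples, n_outputs) and that `splitter` has the same signature and return order as `iterative_train_test_split` in the first post. Tagging each value with its column number is also an assumption, made so that equal values in different outputs are treated as different labels.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer


def encode_targets(y):
    """Turn a (n_samples, n_outputs) multiclass array into a binary indicator matrix."""
    y = np.asarray(y)
    # Prefix every value with its column number so that, for example, class 1 in
    # output 0 and class 1 in output 1 end up in separate indicator columns.
    tagged = [[f"{col}={val}" for col, val in enumerate(row)] for row in y.tolist()]
    return MultiLabelBinarizer().fit_transform(tagged)


def stratified_split(x, y, splitter, test_size=0.1):
    """Split x and y using a multilabel splitter applied to row indices.

    `splitter` is assumed to behave like skmultilearn's iterative_train_test_split:
    splitter(X, y, test_size) -> X_train, y_train, X_test, y_test.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    indicators = encode_targets(y)
    # Pass row indices in place of the features so the original rows can be recovered.
    idx = np.arange(len(y)).reshape(-1, 1)
    train_idx, _, test_idx, _ = splitter(idx, indicators, test_size=test_size)
    train_idx = np.asarray(train_idx).ravel()
    test_idx = np.asarray(test_idx).ravel()
    # Index the untouched inputs, so the targets keep their original values.
    return x[train_idx], y[train_idx], x[test_idx], y[test_idx]


# Possible usage, assuming scikit-multilearn is installed:
# from skmultilearn.model_selection import iterative_train_test_split
# x_train, y_train, x_test, y_test = stratified_split(
#     x, y, iterative_train_test_split, test_size=0.1
# )
```

Because only the indices go through the splitter, there is no need to invert the binarization afterwards; the returned `y_train` and `y_test` are slices of the original target array.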
