
Do we need X for split #23

Closed
sachinruk opened this issue Nov 11, 2021 · 2 comments
@sachinruk

Forgive me if this is a dumb question, but if I understand this library correctly, the main aim is to look at correlations between the y's and somehow account for that when stratifying. Is there any reason why we need the X variable when splitting? e.g., to get the indices we always have to do: train_index, test_index = next(iter(msss.split(X, y))).

Thanks in advance.

@trent-b
Owner

trent-b commented Nov 11, 2021

For non-stratified splitters, scikit-learn requires X in order to determine the number of samples. For stratified splitters, scikit-learn still requires X for the sample count, mainly for API compatibility with the non-stratified splitters, even though, as far as I can see, the required y could be used to get the number of samples instead. For compatibility with scikit-learn, this iterative-stratification package adopts the same practice of requiring X.
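
To illustrate the point above, here is a minimal sketch (not from the thread itself): since split only uses X for its number of samples, a placeholder array of matching length appears to work in practice, although passing the real X is the supported usage.

```python
# Minimal sketch: the splitter only inspects X for its number of samples,
# so a placeholder of matching length works in practice.
import numpy as np
from iterstrat.ml_stratifiers import MultilabelStratifiedShuffleSplit

y = np.array([[0, 1], [1, 0], [1, 1], [0, 1], [1, 0], [1, 1]])  # multilabel targets
X = np.zeros((len(y), 1))  # placeholder; only its sample count is used

msss = MultilabelStratifiedShuffleSplit(n_splits=1, test_size=0.5, random_state=0)
train_index, test_index = next(msss.split(X, y))
print(train_index, test_index)
```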

@trent-b
Owner

trent-b commented Jul 29, 2022

Closing since there has not been additional activity.

@trent-b closed this as not planned on Jul 29, 2022