
Do we need X for split #23

Closed
sachinruk opened this issue Nov 11, 2021 · 2 comments
@sachinruk

Forgive me if this is a dumb question, but if I understand this library correctly, the main aim is to look at correlations between the y's and somehow account for that when stratifying. Is there any reason why we need the X variable when splitting? e.g., to get the indices we always have to do: train_index, test_index = next(iter(msss.split(X, y))).

Thanks in advance.

@trent-b
Owner

trent-b commented Nov 11, 2021

For non-stratified splitters, scikit-learn requires X in order to determine the number of samples. For stratified splitters, scikit-learn still requires X for the sample count, mainly for API compatibility with the non-stratified splitters, even though, as far as I can see, the required y could be used to get the number of samples instead. For compatibility with scikit-learn, this iterative-stratification package adopts the same practice of requiring X.
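
To illustrate the point above, here is a minimal sketch (not from the thread itself): since split only uses X for its number of samples, a placeholder array of matching length appears to work in practice, although passing the real X is the supported usage.

```python
# Minimal sketch: the splitter only inspects X for its number of samples,
# so a placeholder of matching length works in practice.
import numpy as np
from iterstrat.ml_stratifiers import MultilabelStratifiedShuffleSplit

y = np.array([[0, 1], [1, 0], [1, 1], [0, 1], [1, 0], [1, 1]])  # multilabel targets
X = np.zeros((len(y), 1))  # placeholder; only its sample count is used

msss = MultilabelStratifiedShuffleSplit(n_splits=1, test_size=0.5, random_state=0)
train_index, test_index = next(msss.split(X, y))
print(train_index, test_index)
```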

@trent-b
Owner

trent-b commented Jul 29, 2022

Closing since there has not been additional activity.

@trent-b closed this as not planned on Jul 29, 2022