Forgive me if this is a dumb question, but if I understand this library correctly, its main aim is to look at correlations between the y's and account for them when stratifying. Is there any reason why we need the X variable when splitting? E.g., to get the indices we always have to do: train_index, test_index = next(iter(msss.split(X, y))).
Thanks in advance.
sachinruk changed the title to "Do we need X for split" on Nov 11, 2021
For non-stratified splitters, scikit-learn requires X in order to determine the number of samples. For stratified splitters, scikit-learn still requires X, purely for API compatibility with the non-stratified splitters, even though the required y could just as well supply the number of samples. For compatibility with scikit-learn, this iterative-stratification package adopts the same practice of requiring X.
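Since X is only consulted for the number of samples, a placeholder array with the right length is enough. A minimal sketch using scikit-learn's own StratifiedShuffleSplit (the splitters in this package follow the same `split(X, y)` convention, so the same trick should apply there):

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

y = np.array([0, 0, 0, 1, 1, 1, 0, 1])

# X is only used for its first dimension (number of samples),
# so an all-zeros placeholder of matching length works.
X_placeholder = np.zeros((len(y), 1))

sss = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
train_index, test_index = next(iter(sss.split(X_placeholder, y)))

print(len(train_index), len(test_index))  # 6 train, 2 test indices
```

The stratification itself is driven entirely by y; the placeholder X never influences which indices land in which fold.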