Adjusting test_size doesn't actually change test_size #3

Closed
tyler-lanigan-hs opened this issue Jul 28, 2018 · 9 comments

Comments

@tyler-lanigan-hs

Hello!
I'm trying to use this code for a project, but I don't want my test size to be 0.5. When I try to adjust it, the split sizes don't change:

from iterstrat.ml_stratifiers import MultilabelStratifiedShuffleSplit
import numpy as np

X = np.array([[1,2], [3,4], [1,2], [3,4], [1,2], [3,4], [1,2], [3,4]])
y = np.array([[0,0], [0,0], [0,1], [0,1], [1,1], [1,1], [1,0], [1,0]])
msss = MultilabelStratifiedShuffleSplit(n_splits=3, test_size=0.25, random_state=42)

for train_index, test_index in msss.split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    print(len(train_index))
    print(len(test_index))
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

outputs:

('TRAIN:', array([1, 2, 4, 7]), 'TEST:', array([0, 3, 5, 6]))
4
4
('TRAIN:', array([2, 3, 6, 7]), 'TEST:', array([0, 1, 4, 5]))
4
4
('TRAIN:', array([0, 2, 4, 6]), 'TEST:', array([1, 3, 5, 7]))
4
4
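
For reference, the sizes one would expect from this call can be worked out by hand. The sketch below is only an illustration, not code from the package: the helper name `expected_split_sizes` is hypothetical, and it assumes the scikit-learn convention of rounding a fractional `test_size` up to a whole number of samples.

```python
import math


def expected_split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test count is ceil(test_size * n_samples) and the
    remaining samples go to the training set.
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be a fraction strictly between 0 and 1")
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


# 8 samples with test_size=0.25, as in the snippet above
print(expected_split_sizes(8, 0.25))
```

Under that assumption the 8 samples should split 6/2, whereas the 4/4 split shown above is what `test_size=0.5` would produce.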

Kudos on putting this out there!

@trent-b
Owner

trent-b commented Aug 12, 2018

Thank you for reporting this. However, I am unable to reproduce it. I believe I may have had some local changes that I had not committed to the repo, so I created a new version (0.1.6). I updated the files for pip and conda installation as well. Please try the updated version and let me know how it goes. Here is the output that I get:

TRAIN: [0 1 2 4 5 7] TEST: [3 6]
6
2
TRAIN: [0 1 3 4 5 7] TEST: [2 6]
6
2
TRAIN: [0 1 3 4 5 6] TEST: [2 7]
6
2

@tyler-lanigan-hs
Author

Thanks Trent. It's all good for me now :)

@trent-b trent-b closed this as completed Aug 13, 2018
@mosheliv

I still get the same behavior in 0.1.6:

Collecting iterative-stratification
Collecting scipy (from iterative-stratification)
  Using cached https://files.pythonhosted.org/packages/45/d1/7c2b33a5daee3d67752d043fe7e1476c4465788b0b6e59367fd71fdf684a/scipy-1.2.0-cp27-cp27mu-manylinux1_x86_64.whl
Collecting numpy (from iterative-stratification)
  Using cached https://files.pythonhosted.org/packages/de/37/fe7db552f4507f379d81dcb78e58e05030a8941757b1f664517d581b5553/numpy-1.15.4-cp27-cp27mu-manylinux1_x86_64.whl
Collecting scikit-learn (from iterative-stratification)
  Using cached https://files.pythonhosted.org/packages/9e/29/bbf3414ba3d03cf1f8d8516e56d69e44ec0ad3fc79a3713b1c6809070e7d/scikit_learn-0.20.2-cp27-cp27mu-manylinux1_x86_64.whl
Installing collected packages: numpy, scipy, scikit-learn, iterative-stratification
Successfully installed iterative-stratification-0.1.6 numpy-1.14.5 scikit-learn-0.20.2 scipy-1.1.0
You are using pip version 8.1.1, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
m@dl4:~/hpa$ python t.py 
('TRAIN:', array([1, 2, 4, 7]), 'TEST:', array([0, 3, 5, 6]))
4
4
('TRAIN:', array([2, 3, 6, 7]), 'TEST:', array([0, 1, 4, 5]))
4
4
('TRAIN:', array([0, 2, 4, 6]), 'TEST:', array([1, 3, 5, 7]))
4
4

@trent-b
Owner

trent-b commented Dec 22, 2018

My intuition is that an earlier version of iterative-stratification is being used (even though I see that pip said that it installed 0.1.6). Try the following to confirm which version is being used:

import iterstrat
print(iterstrat.__version__)

Fix suggestions:

  1. I suggest upgrading pip with pip install --upgrade pip and then installing.

  2. If the upgrade does not resolve the issue, then try pip install iterative-stratification --no-cache-dir.

  3. If the --no-cache-dir option does not work, then consider manually deleting the cache files as described here and then installing.

Please let me know if this works for you.
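
A complementary check, sketched here under assumptions, is to ask the running interpreter which distribution version it sees and which Python executable is in use, since `pip` and `python` can point at different installations. The helper names `installed_version` and `describe_environment` are hypothetical, and `importlib.metadata` requires Python 3.8 or later, so this will not run on the older interpreters mentioned in this thread.

```python
import sys
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def describe_environment(dist_name="iterative-stratification"):
    """Collect the interpreter details that matter when diagnosing a stale install."""
    return {
        "python": sys.version.split()[0],   # e.g. "3.6.8"
        "executable": sys.executable,       # path of the interpreter actually running
        "version": installed_version(dist_name),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(key, "=", value)
```

If the reported interpreter is not the one that `pip` installed into, running `python -m pip install iterative-stratification` with that same interpreter keeps the two in sync.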

@mosheliv

I found when it happens: Python 2.7 versus Python 3. Could you please give it a try in 2.7? I'm not sure what needs to be changed for it to work in 2.7, but it is probably one of the default casts when dividing, which is the most common problem.
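
To illustrate the kind of division problem suspected here, the sketch below contrasts true division with floor division on the train/test counts. It is not taken from the package and has not been confirmed as the actual cause; the function names are hypothetical. In Python 2, `/` between two integers behaves like `//` does in Python 3 unless `from __future__ import division` is in effect.

```python
def fold_ratios_true_division(n_train, n_test):
    """Ratios as Python 3 computes them with `/` (true division)."""
    total = n_train + n_test
    return [n_train / total, n_test / total]


def fold_ratios_floor_division(n_train, n_test):
    """Ratios as Python 2 would compute them with `/` on integers (floor division)."""
    total = n_train + n_test
    return [n_train // total, n_test // total]


# 6 training and 2 test samples, i.e. test_size=0.25 on 8 samples
print(fold_ratios_true_division(6, 2))
print(fold_ratios_floor_division(6, 2))
```

With floor division both ratios collapse to zero, so any logic that relies on them to apportion samples would lose the requested 0.25 proportion.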

@trent-b
Owner

trent-b commented Dec 22, 2018

The package doesn't currently support running under Python 2.7, only 3.4, 3.5, and 3.6 (see the Requirements section of README.md). I'll take a look though, but it may be a few weeks.

@trent-b trent-b reopened this Dec 22, 2018
@mosheliv

mosheliv commented Dec 22, 2018 via email

@trent-b
Owner

trent-b commented Dec 22, 2018

I'm glad you were able to find a workaround for 2.7.

@trent-b
Owner

trent-b commented Jan 13, 2019

I've decided to stay with this package's claim that it only supports 3.X.

@trent-b trent-b closed this as completed Jan 13, 2019