
Replace implicit imports with explicit imports #135

Merged
25 commits merged into tslearn-team:master on Aug 30, 2019

Conversation

johannfaouzi
Contributor

johannfaouzi commented Aug 28, 2019

Fixes #134

As the title says, the implicit imports are replaced with explicit imports in test_estimators.py.
It was a bit hard to find some of them in scikit-learn. Let's see if it improves code coverage.
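
For illustration, the change is of this kind (a minimal sketch; the names below are stand-ins, not necessarily the actual imports in test_estimators.py):

    # Before: names pulled in implicitly via a star import
    # from sklearn.utils.estimator_checks import *

    # After: every name actually used is imported explicitly
    from sklearn.utils.estimator_checks import check_estimator
    from numpy.testing import assert_array_equal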

@johannfaouzi
Contributor Author

Well, it does not. I will try to replace them in sklearn_patches.py too.

@rtavenar
Member

For some reason, test_estimators.py is not listed on the coverage report, which probably explains why sklearn_patches.py is almost not covered. It is unclear to me why test_estimators is ignored.

@rtavenar
Member

Well, it is explicitly ignored in Travis. I'm not sure why, though. Could you check in this direction?

@johannfaouzi
Contributor Author

That may be a better explanation. I will try to remove it and see how things go.

@johannfaouzi
Contributor Author

johannfaouzi commented Aug 28, 2019

With the current Travis configuration file, the doctests are not run, because it is specified to run only the tests in the tests folder: pytest -v tslearn tslearn/tests/

It seems like unintended behavior, imho.

Edit: I had not noticed the doctest option further along in the command.

@johannfaouzi
Contributor Author

The clustering algorithms do not pass the tests. I am a bit tired, so I may be saying something stupid, but predict returns the argmin of the cross-similarity matrix:
https://github.com/rtavenar/tslearn/blob/eb37c4fbebc2e575c16a955c4bb895cf016faf29/tslearn/clustering.py#L1102

So it returns an array of non-negative integers, which means the smallest label can never be -1, and may be 1 when there is noise?
https://github.com/rtavenar/tslearn/blob/eb37c4fbebc2e575c16a955c4bb895cf016faf29/tslearn/tests/sklearn_patches.py#L117-L118
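
A minimal sketch of the argmin point, with a dummy cross-similarity matrix rather than tslearn's actual code:

    import numpy as np

    dists = np.random.rand(5, 3)     # shape (n_samples, n_clusters)
    labels = dists.argmin(axis=1)    # index of the closest cluster per sample
    assert labels.min() >= 0         # argmin returns indices, so never -1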

@johannfaouzi
Contributor Author

Never mind, it seems like the tests pass for some configurations and lead to a more reasonable code coverage. A bit weird...

@johannfaouzi
Contributor Author

test_all_estimators should be refactored with pytest.mark.parametrize: it would make the output cleaner and avoid the runtime issues.
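
A minimal sketch of what this could look like (get_estimators() is a hypothetical stand-in for whatever collects tslearn's estimators):

    import pytest
    from sklearn.utils.estimator_checks import check_estimator

    @pytest.mark.parametrize("name, Estimator", get_estimators())
    def test_all_estimators(name, Estimator):
        # pytest reports one separate test case per estimator
        check_estimator(Estimator())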

@rtavenar
Member

OK, so the failing tests are related to clustering algorithms that do not assign data to every possible cluster. Maybe adding a few more samples to the dataset could make this more stable?

And I agree with your remark on pytest.mark.parametrize.

I can try to implement fixes, but I am not sure I can push to your master. Let me know what is best for you.

@johannfaouzi
Contributor Author

johannfaouzi commented Aug 29, 2019

I will try to fix this and get back to you later on. I should have created another branch, but I often forget to do so :(

@rtavenar
Member

Locally, if I just set the number of time series per class (for the clustering tests only) to 15, all tests pass.

@johannfaouzi
Contributor Author

Locally, if I just set the number of time series per class (for the clustering tests only) to 15, all tests pass.

Did you change the number of time series in _create_small_ts_dataset()?

@johannfaouzi
Contributor Author

johannfaouzi commented Aug 29, 2019

My code with pytest.mark.parametrize doesn't seem to work on Python 2.7...

One issue I faced during this PR: many of the functions used in sklearn_patches.py and test_estimators.py are not part of scikit-learn's public API. I have the master branch installed, and some functions have been removed. I copy-pasted them, but overall I don't think it will be easy to maintain them, because these functions can be removed without any deprecation warning.

Edit: Following this post on StackOverflow, I tried to swap the decorators; let's see if it works.
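
For context, decorators apply bottom-up, so the two orders below are genuinely different; this is a sketch only (using sklearn's ignore_warnings), and which order fixes Python 2.7 is exactly what the swap tests:

    import pytest
    from sklearn.utils.testing import ignore_warnings

    # Order A: ignore_warnings wraps the test first, parametrize sees the wrapper
    @pytest.mark.parametrize("x", [0, 1])
    @ignore_warnings()
    def test_order_a(x):
        assert x >= 0

    # Order B: parametrize is applied to the original function first,
    # then ignore_warnings wraps the result
    @ignore_warnings()
    @pytest.mark.parametrize("x", [0, 1])
    def test_order_b(x):
        assert x >= 0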

@rtavenar
Member

Locally, if I just set the number of time series per class (for the clustering tests only) to 15, all tests pass.

Did you change the number of time series in _create_small_ts_dataset()?

I did the following:

def _create_small_ts_dataset(n_ts_per_class=5):
    # 3 blobs (classes) of short random walks, n_ts_per_class series per blob
    return random_walk_blobs(n_ts_per_blob=n_ts_per_class, n_blobs=3,
                             random_state=1, sz=10, noise_level=0.025)

@johannfaouzi
Contributor Author

johannfaouzi commented Aug 29, 2019

So I removed the @ignore_warnings() decorator, which enables the code to run on Python 2. The KShape test fails on Python 2 only when numba is disabled, with the predicted clusters being [0, 2] instead of [0, 1, 2]...

https://travis-ci.org/rtavenar/tslearn/jobs/578251283#L822-L836

@rtavenar
Member

One issue I faced during this PR: many of the functions used in sklearn_patches.py and test_estimators.py are not part of scikit-learn's public API. I have the master branch installed, and some functions have been removed. I copy-pasted them, but overall I don't think it will be easy to maintain them, because these functions can be removed without any deprecation warning.

Argh, this is an important point indeed. Not sure how to deal with it in the future...

@rtavenar
Member

Seems like we have an issue here: there are failing tests, but the Travis indicator for those jobs is green :/

@johannfaouzi
Contributor Author

My code for pyts is slightly different; I don't know if that would solve this:
https://github.com/johannfaouzi/pyts/blob/5b4310f581af774eb9d4575c0e40251ef707b5c8/.travis.yml#L32-L39

@johannfaouzi
Contributor Author

I know nothing about bash, but the if/else construct seems weird, doesn't it?
https://travis-ci.org/rtavenar/tslearn/jobs/578251283#L1029

A shell script exits with the status of the last command it runs, so if there are several commands and the last one succeeds, it doesn't matter that earlier commands failed.

@johannfaouzi
Contributor Author

Do you agree with the way the noise is added, @rtavenar?
https://github.com/rtavenar/tslearn/blob/eb37c4fbebc2e575c16a955c4bb895cf016faf29/tslearn/tests/sklearn_patches.py#L78

Right now it adds new points that are noise, instead of adding noise to existing points.

@rtavenar
Member

Do you agree with the way the noise is added, @rtavenar?

https://github.com/rtavenar/tslearn/blob/eb37c4fbebc2e575c16a955c4bb895cf016faf29/tslearn/tests/sklearn_patches.py#L78

Right now it adds new points that are noise, instead of adding noise to existing points.

Hum, this is not noise imo; this is a new dataset that includes the previous one. I am not sure why we should do that instead of adding white noise to the original dataset.
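
In code, the distinction is roughly the following (a sketch with made-up shapes, not the actual patch):

    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.randn(15, 10, 1)    # existing dataset, shape (n_ts, sz, d)

    # What the patch currently does: append brand-new random series as "noise"
    X_extended = np.vstack([X, rng.randn(5, 10, 1)])

    # The alternative discussed here: add white noise to the existing series
    X_noisy = X + 0.025 * rng.randn(*X.shape)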

@rtavenar
Member

My code for pyts is slightly different; I don't know if that would solve this:
https://github.com/johannfaouzi/pyts/blob/5b4310f581af774eb9d4575c0e40251ef707b5c8/.travis.yml#L32-L39

I agree that doing it this way is probably better (and much more readable), and I guess the exit code we get is the one from the command that sends the coverage report rather than the one that corresponds to the tests themselves.

.travis.yml (outdated)
else python -m pytest -v tslearn tslearn/tests/ --doctest-modules --ignore tslearn/docs --ignore tslearn/deprecated.py $KERAS_IGNORE;

after_success:
- if [ "$NUMBA_DISABLE_JIT" == 1 ]; then codeclimate-test-reporter
Member

Isn't there a fi; missing here?

@rtavenar
Member

By the way, @johannfaouzi: all your commits are assigned to johann.faouzi (unknown GitHub user), and hence you do not appear as a tslearn contributor, which is a pity.

@johannfaouzi
Contributor Author

By the way, @johannfaouzi: all your commits are assigned to johann.faouzi (unknown GitHub user), and hence you do not appear as a tslearn contributor, which is a pity.

Thanks! I was wondering why. The issue is that I was using another ID (because I once worked on GitLab and had to use the email from the institute I work at) and never changed it back to my GitHub ID...

@johannfaouzi
Contributor Author

I tried to ignore warnings by modifying the setup.cfg file, but it didn't work. All the tests are passing (I think), but the output is ugly. I had to remove the @ignore_warnings decorator because it wasn't working well with @pytest.mark.parametrize on Python 2.7.

Let me know what you think about the current state of the PR.

@rtavenar
Member

Have you tried the options described there?

For example, would it be a problem to disable warnings while running the tests using --disable-warnings? Or to use the dedicated decorator @pytest.mark.filterwarnings?
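
For example, the decorator variant is simply (standard pytest usage):

    import pytest

    @pytest.mark.filterwarnings("ignore::DeprecationWarning")
    def test_something():
        ...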

@johannfaouzi
Contributor Author

Thanks for the links @rtavenar! I think that it works quite well right now. I disabled:

  • DeprecationWarning,
  • PendingDeprecationWarning,
  • sklearn.exceptions.SkipTestWarning, and
  • UserWarning

There are a few RuntimeWarnings for Python 3.5 that could be worth looking at (though out of scope for this PR), and I am not sure why they happen for this version only.

@@ -114,8 +170,8 @@ def check_clustering(name, clusterer_orig, readonly_memmap=False):
     assert_array_equal(labels_sorted, np.arange(labels_sorted[0],
                                                 labels_sorted[-1] + 1))

-    # Labels are expected to start at 0 (no noise) or -1 (if noise)
-    assert labels_sorted[0] in [0, -1]
+    # Labels are expected to start at 0 (no noise) or 1 (if noise)
Member

I don't quite understand why you changed this statement, and I have to admit I also don't understand why assigned clusters could be equal to -1. Could you give a bit of insight on this?

Contributor Author

I agree that a predicted label can never be -1 (since the predicted label is the argmin). I just changed what I thought was a typo, but even with noise I think we should expect at least one sample in each original cluster.

I would remove this line and replace
https://github.com/rtavenar/tslearn/blob/eb37c4fbebc2e575c16a955c4bb895cf016faf29/tslearn/tests/sklearn_patches.py#L114-L115

with

assert_array_equal(labels_sorted, np.arange(0, 3))

Member

Makes sense to me (though maybe for some clustering methods, labels could be computed without resorting to an argmin, but I still wouldn't understand why it could be -1).

Contributor

Not all clustering algorithms assign each point to a cluster (and such algorithms are more robust to noise), e.g. DBSCAN. Non-assigned samples get the label -1.
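
For example, with scikit-learn's DBSCAN (a sketch with made-up data):

    import numpy as np
    from sklearn.cluster import DBSCAN

    rng = np.random.RandomState(0)
    X = np.vstack([rng.rand(20, 2), [[10.0, 10.0]]])   # 20 points plus one far outlier
    labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)
    assert labels[-1] == -1    # the isolated point is left unassigned: label -1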

Contributor Author

I didn't know that! However, I don't think any of the clustering algorithms currently available in this package do that, so the changes are acceptable imho. A good thing to keep in mind for the future though :)

@rtavenar rtavenar merged commit e3e8451 into tslearn-team:master Aug 30, 2019

Successfully merging this pull request may close these issues:

Very low code coverage for sklearn_patches.py (#134)