Replace implicit imports with explicit imports #135
Conversation
Well, it does not. I will try to replace them.
For some reason, test_estimators.py is not listed in the coverage report, which probably explains why sklearn_patches.py is almost not covered. It is unclear to me why test_estimators.py is ignored.
Well, it is explicitly ignored in Travis. I'm not sure why, though. Could you check in this direction?
That may be a better explanation. I will try to remove it and see how things go.
I did not see the doctest option until after.
The clustering algorithms do not pass the tests. I am a bit tired, so I may say something stupid, but: an array of non-negative integers. So the smallest label can never be -1, and may be 1 when there is noise?
Never mind, it seems like the tests pass for some configurations, and lead to a more reasonable code coverage. A bit weird...
OK, so the failing tests are related to clustering algorithms that do not assign data to every possible cluster. Maybe adding a few more samples to the dataset could make this more stable? And I agree with your remark. I can try to implement fixes, but I am not sure I can push to your master. Let me know what is best for you.
I will try to fix this and get back to you later. I should have created another branch, but I often forget to do so :(
Locally, if I just set the number of time series per class (for the clustering tests only) to 15, all tests pass. |
Did you change the number of time series in the tests?
One issue I faced during this PR: a lot of the functions used are decorated. Edit: following this post on StackOverflow, I tried to swap the decorators; let's see if it works.
I did the following:

    def _create_small_ts_dataset(n_ts_per_class=5):
        return random_walk_blobs(n_ts_per_blob=n_ts_per_class, n_blobs=3,
                                 random_state=1, sz=10, noise_level=0.025)
So I removed it: https://travis-ci.org/rtavenar/tslearn/jobs/578251283#L822-L836
Argh, this is an important point indeed. Not sure how to deal with it in the future... |
Seems like we have an issue here: There are failing tests but the Travis indicator for those jobs is green :/ |
My code for
I know nothing about bash, but a Unix pipeline always exits with the status of its last command, so if there are several commands in the pipeline and the last one succeeds, it does not matter that earlier commands failed.
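This behaviour is easy to check directly; a minimal sketch (assuming bash is available at /bin/bash, as it is on Travis's Linux workers):

```python
import subprocess

# By default, a pipeline's exit status is that of its LAST command:
# `false | true` exits 0 even though `false` failed.
default = subprocess.run("false | true", shell=True,
                         executable="/bin/bash").returncode

# With `set -o pipefail`, a failure anywhere in the pipeline
# makes the whole pipeline exit with a non-zero status.
pipefail = subprocess.run("set -o pipefail; false | true", shell=True,
                          executable="/bin/bash").returncode

print(default, pipefail)
```

Adding `set -o pipefail` (or splitting the commands onto separate script lines) would let Travis see the real test exit code.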
Do you agree with the way the noise is added @rtavenar? Right now it adds new points that are noise, instead of adding noise to existing points.
Hum, this is not noise imo, this is a new dataset that includes the previous one. Not sure why we should do that instead of adding white noise to the original dataset. |
I agree that doing it this way is probably better (and it is much more readable). I guess the exit code we get is the one from the command that sends the coverage report, rather than the one from the tests themselves.
.travis.yml
Outdated
else python -m pytest -v tslearn tslearn/tests/ --doctest-modules --ignore tslearn/docs --ignore tslearn/deprecated.py $KERAS_IGNORE;
after_success:
- if [ "$NUMBA_DISABLE_JIT" == 1 ]; then codeclimate-test-reporter
Isn't there a fi; missing here?
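If so, the one-liner would presumably look like this (a sketch only; variable and command names are taken from the snippet above):

```yaml
after_success:
  - if [ "$NUMBA_DISABLE_JIT" == 1 ]; then codeclimate-test-reporter; fi
```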
By the way, @johannfaouzi: all your commits are assigned to another user.
Thanks! I was wondering why. The issue is that I was using another ID (because I once worked on GitLab and had to use my email from the institute I work at) and never changed it back to my GitHub ID...
I tried to ignore warnings by modifying the configuration. Let me know what you think about the current state of the PR.
Have you tried the options described there? Like, for example, would it be a problem to disable warnings while running the tests?
Thanks for the links @rtavenar! I think that it works quite well right now. I disabled:
There are a few warnings left.
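For reference, pytest can silence warning categories from its config file; a hypothetical fragment (the actual filters used in this PR are not shown above):

```ini
[tool:pytest]
filterwarnings =
    ignore::DeprecationWarning
    ignore::FutureWarning
```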
tslearn/tests/sklearn_patches.py
Outdated
@@ -114,8 +170,8 @@ def check_clustering(name, clusterer_orig, readonly_memmap=False):
    assert_array_equal(labels_sorted, np.arange(labels_sorted[0],
                                                labels_sorted[-1] + 1))

-   # Labels are expected to start at 0 (no noise) or -1 (if noise)
-   assert labels_sorted[0] in [0, -1]
+   # Labels are expected to start at 0 (no noise) or 1 (if noise)
I don't quite understand why you changed this statement, and I have to admit I also don't understand why assigned clusters could be equal to -1. Could you give a bit of insight on this?
I agree that a predicted label can never be -1 (since the predicted label is the argmin). I just changed what I thought was a typo, but even with noise I think we should expect at least one sample for each original cluster.
I would remove this line and replace
https://github.com/rtavenar/tslearn/blob/eb37c4fbebc2e575c16a955c4bb895cf016faf29/tslearn/tests/sklearn_patches.py#L114-L115
with
assert_array_equal(labels_sorted, np.arange(0, 3))
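The suggested check can be sketched in isolation; a minimal example, where `labels` is a hypothetical stand-in for the clusterer's predictions on the 3-blob test dataset:

```python
import numpy as np

# Hypothetical predictions over a dataset built from 3 blobs
labels = np.array([2, 0, 1, 0, 2, 1, 0])
labels_sorted = np.unique(labels)  # unique labels, sorted ascending

# Every one of the 3 blobs should receive at least one sample,
# so the sorted unique labels must be exactly [0, 1, 2]
np.testing.assert_array_equal(labels_sorted, np.arange(0, 3))
```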
Makes sense to me (though maybe for some clustering methods, labels could be computed without resorting to an argmin, but I still wouldn't understand why it could be -1).
Not all clustering algorithms assign every point to a cluster (some are more robust to noise), e.g. DBSCAN. The non-assigned samples get the label -1.
I didn't know that! However, I don't think any of the clustering algorithms currently available in this package do that, so the changes are acceptable imho. A good thing to keep in mind for the future though :)
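For illustration, scikit-learn's DBSCAN behaves exactly this way; a small sketch (the toy 1-D data and the eps/min_samples values are chosen just for the example):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense groups plus one isolated point
X = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [50.0]])

# Points in dense regions get cluster ids 0, 1, ...;
# the isolated point is marked as noise with the special label -1
labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)
print(labels)  # [ 0  0  0  1  1 -1]
```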
Fixes #134
As the title says, the implicit imports are replaced with explicit imports in test_estimators.py. It was a bit hard to find some of them in scikit-learn. Let's see if it improves code coverage.