Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disregard NaNs in TimeSeriesScalerMeanVariance #175

Closed
wants to merge 27 commits into from

Conversation

eliwoods
Copy link

@eliwoods eliwoods commented Jan 7, 2020

If a time series with any NaN values is passed to TimeSeriesScalerMeanVariance.fit_transform() the transformed time series returns as all NaN

E.g:

In [1]: from tslearn.preprocessing import TimeSeriesScalerMeanVariance                                                                

In [2]: from numpy import nan                                                                                                         

In [3]: TimeSeriesScalerMeanVariance().fit_transform([0, nan, 6])                                                                     
Out[3]: 
array([[[nan],
        [nan],
        [nan]]])

This could be fixed by using numpy.nanmean() and numpy.nanstd() in TimeSeriesScalerMeanVariance.transform instead of the current implementation. This also makes tslearn in line with sklearn in terms of NaN's and preprocessing: scikit-learn/scikit-learn#10404

@codecov-io
Copy link

codecov-io commented Jan 7, 2020

Codecov Report

Merging #175 into dev will decrease coverage by 0.28%.
The diff coverage is 76.19%.

Impacted file tree graph

@@            Coverage Diff             @@
##              dev     #175      +/-   ##
==========================================
- Coverage   93.97%   93.69%   -0.29%     
==========================================
  Files          22       22              
  Lines        3039     2585     -454     
==========================================
- Hits         2856     2422     -434     
+ Misses        183      163      -20
Impacted Files Coverage Δ
tslearn/__init__.py 75% <100%> (ø) ⬆️
tslearn/preprocessing.py 91.93% <100%> (ø) ⬆️
tslearn/tests/test_shapelets.py 100% <100%> (ø) ⬆️
tslearn/tests/test_estimators.py 89.28% <58.33%> (-5.59%) ⬇️
tslearn/tests/sklearn_patches.py 88.6% <0%> (-2.85%) ⬇️
tslearn/shapelets.py 91.2% <0%> (-2.58%) ⬇️
tslearn/tests/test_metrics.py 100% <0%> (ø) ⬆️
tslearn/tests/test_barycenters.py 100% <0%> (ø) ⬆️
tslearn/svm.py 98.43% <0%> (ø) ⬆️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 40cf0c9...2996607. Read the comment docs.

@rtavenar
Copy link
Member

rtavenar commented Jan 7, 2020

Hi @eliwoods

Thanks for the suggestion, this makes a lot of sense.

I would suggest the following improvements:

  • Do the same for TimeSeriesScalerMinMax
  • Add doctest examples to make the behaviour explicit for end-users
  • Change the destination branch to dev such that your work can be included in the next tslearn version to come

Once all this is done, one last thing would be to document your contribution in CHANGELOG.md.

@eliwoods eliwoods changed the base branch from master to dev January 7, 2020 21:51
@pep8speaks
Copy link

Hello @eliwoods! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 114:80: E501 line too long (87 > 79 characters)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants