Feat/improve timeseries #2196

Merged
merged 13 commits into from
Feb 2, 2024

Conversation

dennisbader
Collaborator

@dennisbader dennisbader commented Jan 29, 2024

Summary

Refactor time series constructor and methods for performance boosts

  • avoids checking whether all time steps are available when the index freq is already well defined
  • avoids the raise_if_* helpers, since they always compute the (formatted) message strings, even when nothing is raised
  • improves slicing with integers
  • improves slicing with Timestamps
  • improves from_group_dataframe() by performing some operations once on the full DataFrame instead of in every group iteration
  • adds the parameter drop_group_cols to TimeSeries.from_group_dataframe(), which prevents selected group_cols from being added to the static covariates (addresses #2183, "[INFO] Fit multiple time series using RegressionModels with static and past covariates.")
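
The point about raise_if_* can be illustrated with a minimal sketch. The helper and class names below are hypothetical and are not the darts implementation. The sketch only shows that an f-string passed as an argument is formatted on every call, even when the condition is false, while a plain `if` builds the message only when the check fails.

```python
class Expensive:
    """Hypothetical stand-in for an object whose string formatting is costly."""

    def __init__(self):
        self.format_calls = 0

    def __str__(self):
        self.format_calls += 1
        return "<expensive repr>"


def raise_if(condition: bool, message: str) -> None:
    """Hypothetical eager helper: `message` is already built when we get here."""
    if condition:
        raise ValueError(message)


def check_eager(obj: Expensive, n: int, invalid: bool = False) -> None:
    for _ in range(n):
        # The f-string is evaluated before raise_if is called, on every iteration.
        raise_if(invalid, f"bad value: {obj}")


def check_lazy(obj: Expensive, n: int, invalid: bool = False) -> None:
    for _ in range(n):
        # The condition is tested first; the message is built only on failure.
        if invalid:
            raise ValueError(f"bad value: {obj}")
```

With a passing condition, the eager variant formats the message once per call and the lazy variant never formats it, which matters when such checks sit inside a hot constructor path.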

Results

  • large performance boosts, especially for time series indexed with "special" date offsets (e.g. "W-MON"). These frequencies previously resulted in much longer TimeSeries creation times than "normal" frequencies (see below). Now they are comparably fast (up to >100x faster).
  • good performance boosts for time series with "normal" date offsets (e.g. "D") (up to 2x faster)

Boosts for "special" frequencies (e.g. "W-MON"): Series creation time boost by method and series length

  • re-creating the time index was the major bottleneck in the old time series creation, which is why the boost grows with increasing series length
  • all other boosts also carry over to the normal frequencies shown below
image
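
Why index re-creation scales with series length can be sketched in plain Python. This is only an illustration under stated assumptions: the actual change works on pandas indexes inside darts, and the function names and the `freq_is_known` flag here are hypothetical. The weekly Monday-anchored step is loosely analogous to a "W-MON" frequency.

```python
from datetime import datetime, timedelta

regenerations = 0  # counts how often the full index is rebuilt


def regenerate_index(start, step, n):
    """Rebuild all n time steps from the start point (O(n) time and memory)."""
    global regenerations
    regenerations += 1
    return [start + i * step for i in range(n)]


def validate_index(index, step, freq_is_known=False):
    """Raise if steps are missing; skip the costly rebuild when the frequency is trusted."""
    if freq_is_known:
        return  # regular by construction, nothing to recheck
    if index != regenerate_index(index[0], step, len(index)):
        raise ValueError("time index has missing or misaligned steps")


# Weekly, Monday-anchored example index (hypothetical data).
start = datetime(2024, 1, 1)  # a Monday
step = timedelta(weeks=1)
weekly = regenerate_index(start, step, 52)
regenerations = 0  # reset the counter after building the example data

validate_index(weekly, step)                      # old-style path: rebuilds the index
validate_index(weekly, step, freq_is_known=True)  # fast path: no rebuild
```

The rebuild cost grows linearly with the number of time steps, so skipping it when the frequency is already known saves more the longer the series is.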

Boosts for "normal" frequencies (e.g. "D"): Series creation time boost by method and series length

  • the boosts here are relatively constant over the series length, but we still see speedups for all methods
image

And here are the actual time series creation times for all experiments:
results_TimeSeries_D.csv
results_TimeSeries_W-MON.csv
results_TimeSeriesOld_D.csv
results_TimeSeriesOld_W-MON.csv

@madtoinou madtoinou self-requested a review January 29, 2024 14:17
@codecov-commenter

codecov-commenter commented Jan 29, 2024

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparison is base (0b4dcf0) 93.92% compared to head (3e391b8) 93.87%.

Files Patch % Lines
darts/timeseries.py 89.65% 9 Missing ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2196      +/-   ##
==========================================
- Coverage   93.92%   93.87%   -0.06%     
==========================================
  Files         135      135              
  Lines       13394    13411      +17     
==========================================
+ Hits        12580    12589       +9     
- Misses        814      822       +8     


Collaborator

@madtoinou madtoinou left a comment

LGTM, we should maybe open a separate PR to refactor a bit the tests and leverage pytest.mark.parametrize as we did for other tests.

@VascoSch92

> LGTM, we should maybe open a separate PR to refactor a bit the tests and leverage pytest.mark.parametrize as we did for other tests.

This is a good idea... sorry if I was picky about that, but having good tests is a must :-)

@dennisbader
Collaborator Author

Thanks @VascoSch92 and @madtoinou for the reviews. I applied the suggestion. For the tests, I separated exceptions from behavior. Regarding parametrization, we can do that in a separate PR as @madtoinou suggested.

We recently migrated from unittest to pytest, which is why there are still some relics of un-parametrized tests around.

@dennisbader dennisbader merged commit 8cb04f6 into master Feb 2, 2024
9 checks passed
@dennisbader dennisbader deleted the feat/improve_timeseries branch February 2, 2024 10:12