Feat/improve timeseries #2196

dennisbader · 2024-01-29T13:53:46Z

Summary

Refactor time series constructor and methods for performance boosts

avoids checking whether all time steps are available, as the freq is already well defined
avoids raise_if_* calls, since they always compute the (formatted) error strings, even when nothing is raised
improves slicing with integers
improves slicing with Timestamps
improves from_group_dataframe() to perform some operations once on the full DataFrame instead of in every group iteration
adds the parameter drop_group_cols to TimeSeries.from_group_dataframe(), which prevents selected group_cols from being added to the static covariates (addresses #2183, "[INFO] Fit multiple time series using RegressionModels with static and past covariates.")
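
The point about raise_if_* comes down to Python evaluating call arguments before the call itself, so a formatted message is built even when the check passes. A minimal sketch of that effect, assuming a simplified stand-in `raise_if` helper and a hypothetical `CountingProbe` class that counts how often it is formatted (neither is the darts implementation):

```python
class CountingProbe:
    """Hypothetical object that counts how often it is turned into a string."""

    def __init__(self):
        self.format_calls = 0

    def __str__(self):
        self.format_calls += 1
        return "<probe>"


def raise_if(condition, message):
    """Simplified stand-in for a raise_if-style helper (not the darts code)."""
    if condition:
        raise ValueError(message)


def check_eager(values, probe):
    # The f-string is built before raise_if is even called,
    # so the formatting cost is paid on every call.
    raise_if(len(values) == 0, f"empty input, context: {probe}")


def check_lazy(values, probe):
    # The f-string is only built on the failure path.
    if len(values) == 0:
        raise ValueError(f"empty input, context: {probe}")


probe = CountingProbe()
check_eager([1, 2, 3], probe)
eager_calls = probe.format_calls  # formatted once, although nothing was raised

check_lazy([1, 2, 3], probe)
lazy_calls = probe.format_calls - eager_calls  # no additional formatting
```

In a constructor that runs many such checks per series, the eager variant pays the formatting cost every time, which is the overhead the plain `if ...: raise` pattern avoids.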

Results

large performance boosts, especially for time series indexed with "special" date offsets (e.g. "W-MON"). These frequencies resulted in much longer TimeSeries creation times than "normal" frequencies (see below). Now they are comparably fast (up to >100x faster).
good performance boosts for time series with "normal" date offsets (e.g. "D") (up to 2x faster)

Boosts for "special" frequencies (e.g. "W-MON"): Series creation time boost by method and series length

recreation of the time index was the major bottleneck in the old time series creation. Therefore, the boost increases with series length
all other boosts also carry over to the normal frequencies shown below
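
Why trusting an already defined frequency pays off more for longer series can be sketched with the standard library alone. This is an illustration under assumptions, not the darts code: `resolve_step` is a hypothetical helper, and plain `datetime` lists stand in for a pandas index.

```python
from datetime import datetime, timedelta


def resolve_step(index, declared_step=None):
    """Return the step of a time index (hypothetical helper).

    If the caller already knows the step, trust it (constant time).
    Otherwise verify every gap, which costs time linear in len(index).
    """
    if declared_step is not None:
        return declared_step
    gaps = {later - earlier for earlier, later in zip(index, index[1:])}
    if len(gaps) != 1:
        raise ValueError("index needs at least two points and a single constant step")
    return gaps.pop()


week = timedelta(weeks=1)
# 2024-01-01 is a Monday, so this mimics a "W-MON"-style index.
mondays = [datetime(2024, 1, 1) + i * week for i in range(1000)]

fast = resolve_step(mondays, declared_step=week)  # no scan of the index
slow = resolve_step(mondays)                      # scans all 999 gaps
```

The linear scan is the part that grows with series length, which is consistent with the larger boosts reported above for longer series.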

Boosts for "normal" frequencies (e.g. "D"): Series creation time boost by method and series length

the boosts here are relatively constant over the series length; still, we see speedups for all methods

And here are the actual time series creation times for all experiments:
results_TimeSeries_D.csv
results_TimeSeries_W-MON.csv
results_TimeSeriesOld_D.csv
results_TimeSeriesOld_W-MON.csv

codecov-commenter · 2024-01-29T14:23:34Z

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparison is base (0b4dcf0) 93.92% compared to head (3e391b8) 93.87%.

Files	Patch %	Lines
darts/timeseries.py	89.65%	9 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2196      +/-   ##
==========================================
- Coverage   93.92%   93.87%   -0.06%     
==========================================
  Files         135      135              
  Lines       13394    13411      +17     
==========================================
+ Hits        12580    12589       +9     
- Misses        814      822       +8


darts/tests/test_timeseries_static_covariates.py

darts/timeseries.py

madtoinou

LGTM, we should maybe open a separate PR to refactor a bit the tests and leverage pytest.mark.parametrize as we did for other tests.

CHANGELOG.md

VascoSch92 · 2024-02-01T13:33:55Z

> LGTM, we should maybe open a separate PR to refactor a bit the tests and leverage pytest.mark.parametrize as we did for other tests.

This is a good idea... sorry if I was picky about that, but having good tests is a must :-)

dennisbader · 2024-02-02T08:59:09Z

Thanks @VascoSch92 and @madtoinou for the reviews. I applied the suggestions. For the tests, I separated exceptions from behavior. Regarding parametrization, we can do that in a separate PR, as @madtoinou suggested.

We recently migrated from unittest to pytest, which is why there are still some relics of un-parametrized tests around.

dennisbader added 11 commits January 26, 2024 10:20

found major performance boost for time series creation

787f2a7

first boosted time series version

44f8975

improve slicing with integers

c670eeb

improve slicing with time stamps

dfabbf5

improve slicing with time stamps

9ff23be

update from_xarray

d9003e1

improve from_group_dataframe()

8a04d1d

remove test time series

0e563b4

remove old time series

d4ee955

add option to drop group columns from from_group_dataframe

396c9a5

update changelog

94c7b70

madtoinou self-requested a review January 29, 2024 14:17

Merge branch 'master' into feat/improve_timeseries

e70d237

dennisbader mentioned this pull request Jan 29, 2024

[INFO] Fit multiple time series using RegressionModels with static and past covariates. #2183

Closed

VascoSch92 reviewed Jan 29, 2024

madtoinou approved these changes Feb 1, 2024

apply suggestions from PR review

3e391b8

dennisbader merged commit 8cb04f6 into master Feb 2, 2024
9 checks passed

dennisbader deleted the feat/improve_timeseries branch February 2, 2024 10:12
