Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] TimeSeries slicing silently changes the TimeSeries' frequency #2089

Closed
DavidKleindienst opened this issue Nov 23, 2023 · 3 comments · Fixed by #2152
Closed

[BUG] TimeSeries slicing silently changes the TimeSeries' frequency #2089

DavidKleindienst opened this issue Nov 23, 2023 · 3 comments · Fixed by #2152
Labels
bug Something isn't working
Projects

Comments

@DavidKleindienst
Copy link
Contributor

DavidKleindienst commented Nov 23, 2023

Describe the bug
When using BusinessDay as frequency, selecting a slice e.g. ts[1:5] can result in the slice having a different frequency (Day).
This is problematic because subsequent concatenations can fail because the two TimeSeries won't be contiguous at the altered frequency (but would be contiguous at the original frequency)

To Reproduce

from darts import TimeSeries, concatenate
import pandas as pd
import numpy as np

offset = pd.offsets.BusinessDay()   # Closed on Saturdays & Sundays
dates1 = pd.date_range("2023-11-01","2023-11-26", freq=offset)   # the 26th is a sunday, last date is the 24th
values1 = np.ones(len(dates1))
ts1  = TimeSeries.from_times_and_values(dates1,values1)
dates2 = pd.date_range("2023-11-26","2023-12-06", freq=offset)  # First date is the 27th
values2 = np.ones(len(dates2))
ts2 = TimeSeries.from_times_and_values(dates2, values2)

concatenate((ts1,ts2)) # works
print(ts1.freq) # <BusinessDay>
ts_subset = ts1[-4:]
print(ts_subset.freq) # <Day>
concatenate((ts_subset, ts2)) # Fails 

Expected behavior
TimeSeries slicing should not silently alter the frequency

System (please complete the following information):

  • Python version: 3.10
  • darts version: Tested on master and 0.25
@DavidKleindienst DavidKleindienst added bug Something isn't working triage Issue waiting for triaging labels Nov 23, 2023
@DavidKleindienst
Copy link
Contributor Author

The problematic piece in the TimeSeries.__getitem__ code seems to be this one:

def _set_freq_in_xa(xa_: xr.DataArray):
        # mutates the DataArray to make sure it contains the freq
        if isinstance(xa_.get_index(self._time_dim), pd.DatetimeIndex):
            inferred_freq = xa_.get_index(self._time_dim).inferred_freq
            if inferred_freq is not None:
                xa_.get_index(self._time_dim).freq = to_offset(inferred_freq)
            else:
                xa_.get_index(self._time_dim).freq = self._freq

Always setting xa_.get_index(self._time_dim).freq = self._freq without the inference would fix this problem, but I'm not sure if it would cause other issues (e.g. when indexing a TimeSeries by a DateTimeIndex)

@madtoinou madtoinou removed the triage Issue waiting for triaging label Nov 24, 2023
@madtoinou madtoinou added this to To do in darts via automation Nov 24, 2023
@dennisbader
Copy link
Collaborator

Hi @DavidKleindienst and thanks for raising this issue.

I think we need to investigate a bit more on how to solve this, as we cannot set the frequency to the original one fore all cases of slicing:

from darts import TimeSeries
import pandas as pd
import numpy as np

# daily frequency
dates1 = pd.date_range("2023-11-01", "2023-11-26", freq="D")
values1 = np.ones(len(dates1))
ts1 = TimeSeries.from_times_and_values(dates1, values1)
ts_subset = ts1[1::2]
print(ts_subset.freq)  # <2 * Days>

Taking every second element from a daily frequency results in a 2 days frequency which is correct.

@DavidKleindienst
Copy link
Contributor Author

DavidKleindienst commented Nov 24, 2023

Thank you @dennisbader for the quick response and explaing of the use case of the frequency inference.
Unfortunately examples like the one you described are apparently not covered by unit tests.
When I do the naive approach of just removing the frequency inference, all TimeSeries unit tests still pass.

I'd be happy to contribute to fix this problem, but I'm not well aware of which use-cases require the inference.
If you (or somebody else) could write some unit tests covering the cases which need the frequency inference, I'd be happy to work on a PR for this issue.

darts automation moved this from To do to Done Jan 15, 2024
@dennisbader dennisbader moved this from Done to Released in darts Apr 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
darts
Released
Development

Successfully merging a pull request may close this issue.

3 participants