Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/ Allow empty TimeSeries #1359

Merged
merged 6 commits into from
Nov 18, 2022
Merged

Fix/ Allow empty TimeSeries #1359

merged 6 commits into from
Nov 18, 2022

Conversation

madtoinou
Copy link
Collaborator

@madtoinou madtoinou commented Nov 14, 2022

Fixes #1279.

Summary

Remove the assert preventing to create an empty TimeSeries in the constructor and replaced it by a warning to limit the risk of unexpected behavior for the users. Edited the corresponding test to catch the ValueError about missing dimensions in input xarray.DataArray instead of its emptiness.

###Other Information
The currents tests pass with these modifications but we might want to add tests or exception in fit methods to better handle the empty TimeSeries as they can now occur?

…ciated test to catch the ValueError about the inconrrect dimension of the input DataArray
@codecov-commenter
Copy link

codecov-commenter commented Nov 14, 2022

Codecov Report

Base: 93.88% // Head: 93.84% // Decreases project coverage by -0.04% ⚠️

Coverage data is based on head (3805b90) compared to base (ec78594).
Patch coverage: 0.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1359      +/-   ##
==========================================
- Coverage   93.88%   93.84%   -0.05%     
==========================================
  Files          78       78              
  Lines        8523     8513      -10     
==========================================
- Hits         8002     7989      -13     
- Misses        521      524       +3     
Impacted Files Coverage Δ
darts/timeseries.py 92.16% <0.00%> (-0.33%) ⬇️
...arts/models/forecasting/torch_forecasting_model.py 87.08% <0.00%> (-0.06%) ⬇️
darts/models/forecasting/block_rnn_model.py 98.24% <0.00%> (-0.04%) ⬇️
darts/models/forecasting/nhits.py 99.27% <0.00%> (-0.01%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Contributor

@hrzn hrzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, but:

  • I would actually remove the warning (let me know if you think it's a terrible idea)
  • We need to slightly adapt the introductory docstring at the top of TimeSeries saying one of the guarantees is that it's non-empty (maybe also in the user guide - to be checked)
  • Could you try and see what happens when you print the series, or when you plot it? I expect plot() to crash... we should perhaps support creating an empty figure.

darts/timeseries.py Outdated Show resolved Hide resolved
…ed support for plotting empty timeseries, removed the bullet-point about non-emptiness of TimeSeries in the docstring
@madtoinou
Copy link
Collaborator Author

* I would actually remove the warning (let me know if you think it's a terrible idea)

Done, I was probably being too pessimistic about the users capabilities to debug their code.

* We need to slightly adapt the introductory docstring at the top of TimeSeries saying one of the guarantees is that it's non-empty (maybe also in the user guide - to be checked)

I looked at the User Guide, could not find any mention of the non-emptiness.

* Could you try and see what happens when you print the series, or when you plot it? I expect plot() to crash... we should perhaps support creating an empty figure.

It was indeed crashing, I added support for plotting empty TimeSeries and made it so that the x-axis is labeled correctly (using the time_index attribute) and the x-axis range is between 0 and 5, making it obvious that the TimeSeries is empty. We could also add limits to the y-axis to make it even more obvious.

darts/timeseries.py Outdated Show resolved Hide resolved
darts/timeseries.py Outdated Show resolved Hide resolved
@hrzn
Copy link
Contributor

hrzn commented Nov 17, 2022

* I would actually remove the warning (let me know if you think it's a terrible idea)

Done, I was probably being too pessimistic about the users capabilities to debug their code.

* We need to slightly adapt the introductory docstring at the top of TimeSeries saying one of the guarantees is that it's non-empty (maybe also in the user guide - to be checked)

I looked at the User Guide, could not find any mention of the non-emptiness.

* Could you try and see what happens when you print the series, or when you plot it? I expect plot() to crash... we should perhaps support creating an empty figure.

It was indeed crashing, I added support for plotting empty TimeSeries and made it so that the x-axis is labeled correctly (using the time_index attribute) and the x-axis range is between 0 and 5, making it obvious that the TimeSeries is empty. We could also add limits to the y-axis to make it even more obvious.

For the non-emptiness reference, I was referring to this line.

@madtoinou
Copy link
Collaborator Author

madtoinou commented Nov 17, 2022

For the non-emptiness reference, I was referring to this line.

Oh I already removed this line based on your previous comment, my comment was just to say that I could not find any reference about the non-emptiness in the user-guide :)

@hrzn
Copy link
Contributor

hrzn commented Nov 18, 2022

For the non-emptiness reference, I was referring to this line.

Oh I already removed this line based on your previous comment, my comment was just to say that I could not find any reference about the non-emptiness in the user-guide :)

My bad, I missed it. Thanks!

Copy link
Contributor

@hrzn hrzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@madtoinou madtoinou merged commit 3877047 into master Nov 18, 2022
@madtoinou madtoinou deleted the fix/allow-empty-ts branch November 18, 2022 12:56
@connesy
Copy link
Contributor

connesy commented Jan 2, 2023

Maybe I'm misunderstanding the purpose of this PR, but as far as I can tell this doesn't fix the original issue:

import darts
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5]}, index=pd.date_range(start="2022-01-01", end="2022-01-05"))
timeseries = darts.TimeSeries.from_dataframe(df)

print(timestamp[5:6])

...
ValueError: Neither `start` nor `end` can be NaT
Full traceback
File ~/miniconda3/envs/test2/lib/python3.10/site-packages/darts/timeseries.py:4717, in TimeSeries.__getitem__(self, key)
   4713     xa_ = self._xa.isel({self._time_dim: key})
   4714     _set_freq_in_xa(
   4715         xa_
   4716     )  # indexing may discard the freq so we restore it...
-> 4717     return self.__class__(xa_)
   4718 elif isinstance(key.start, pd.Timestamp) or isinstance(
   4719     key.stop, pd.Timestamp
   4720 ):
   4721     _check_dt()

File ~/miniconda3/envs/test2/lib/python3.10/site-packages/darts/timeseries.py:191, in TimeSeries.__init__(self, xa)
    184     self._xa.get_index(self._time_dim).freq = self._freq
    186     # We have to check manually if the index is complete. Another way could be to rely
    187     # on `inferred_freq` being present, but this fails for series of length < 3.
    189     is_index_complete = (
    190         len(
--> 191             pd.date_range(
    192                 self._time_index.min(), self._time_index.max(), freq=self._freq
    193             ).difference(self._time_index)
    194         )
    195         == 0
    196     )
    198     raise_if_not(
    199         is_index_complete,
    200         "Not all timestamps seem to be present in the time index. Does "
   (...)
    204         logger,
    205     )
    206 else:

File ~/miniconda3/envs/test2/lib/python3.10/site-packages/pandas/core/indexes/datetimes.py:1125, in date_range(start, end, periods, freq, tz, normalize, name, closed, inclusive, **kwargs)
   1122 if freq is None and com.any_none(periods, start, end):
   1123     freq = "D"
-> 1125 dtarr = DatetimeArray._generate_range(
   1126     start=start,
   1127     end=end,
   1128     periods=periods,
   1129     freq=freq,
   1130     tz=tz,
   1131     normalize=normalize,
   1132     inclusive=inclusive,
   1133     **kwargs,
   1134 )
   1135 return DatetimeIndex._simple_new(dtarr, name=name)

File ~/miniconda3/envs/test2/lib/python3.10/site-packages/pandas/core/arrays/datetimes.py:367, in DatetimeArray._generate_range(cls, start, end, periods, freq, tz, normalize, ambiguous, nonexistent, inclusive)
    364     end = Timestamp(end)
    366 if start is NaT or end is NaT:
--> 367     raise ValueError("Neither `start` nor `end` can be NaT")
    369 left_inclusive, right_inclusive = validate_inclusive(inclusive)
    370 start, end, _normalized = _maybe_normalize_endpoints(start, end, normalize)
ValueError: Neither `start` nor `end` can be NaT

Same thing happens with

print(timeseries[pd.Timestamp("2022-01-06"):pd.Timestamp("2022-01-08")])

...
ValueError: Neither `start` nor `end` can be NaT

@madtoinou madtoinou restored the fix/allow-empty-ts branch January 18, 2023 16:27
@madtoinou madtoinou deleted the fix/allow-empty-ts branch January 18, 2023 16:36
@madtoinou
Copy link
Collaborator Author

Hi!

I got a bit too excited and forgot to check if the fix was also working for TimeSeries with DatetimeIndex... Thank you for reporting the bug, I am working on a fix and open a PR by the end of the week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Empty time series allowed
4 participants