Feat/historical_forecasts accept negative integer as start value #1866

Merged
merged 27 commits into from
Aug 15, 2023
Changes from 12 commits
9dbe81f
feat: historical_foreacst accept negative integer as start value
madtoinou Jun 29, 2023
344e929
fix: improved the negative start unit test
madtoinou Jun 29, 2023
50a2b1b
fix: simplified the logic around exception raising
madtoinou Jun 29, 2023
a58c89f
Merge branch 'master' into feat/historical_forecast_neg_int_start
madtoinou Jun 29, 2023
df176eb
Merge branch 'master' into feat/historical_forecast_neg_int_start
dennisbader Jul 4, 2023
b3ee491
merging master and using dict type to convey index from the end of th…
madtoinou Aug 9, 2023
db17f94
Merge branch 'master' into feat/historical_forecast_neg_int_start
madtoinou Aug 9, 2023
eee099e
fix: instead of adding capabilities to get_index_at_point, use a new …
madtoinou Aug 11, 2023
c4fcd58
test: udpated tests accordingly
madtoinou Aug 11, 2023
5fda99e
Merge branch 'master' into feat/historical_forecast_neg_int_start
madtoinou Aug 11, 2023
eef4089
doc: updated changelog
madtoinou Aug 11, 2023
a727152
test: added test for historical forecast on ts using a rangeindex sta…
madtoinou Aug 11, 2023
c1cccc3
Apply suggestions from code review
madtoinou Aug 11, 2023
7beb2a6
fix: changed the literal to 'positional_index' and 'value_index'
madtoinou Aug 11, 2023
4e40275
feat: making the error messages more informative, adapted the tests a…
madtoinou Aug 11, 2023
90b2e62
feat: extending the new argument to backtest and gridsearch
madtoinou Aug 11, 2023
ce4b669
fix: import of Literal for python 3.8
madtoinou Aug 14, 2023
292af54
doc: updated changelog
madtoinou Aug 14, 2023
86c3b84
fix: shortened the literal for start_format, updated tests accordingly
madtoinou Aug 14, 2023
b3d5929
doc: updated start docstring
madtoinou Aug 14, 2023
94842b0
test: limited the dependency on unittest in anticipation of the refac…
madtoinou Aug 14, 2023
f6f95bd
doc: updated changelog
madtoinou Aug 14, 2023
35bf096
fix: fixed typo
madtoinou Aug 14, 2023
ba13934
fix: fixed typo
madtoinou Aug 14, 2023
6a91897
doc: copy start and start_format docstring from hist_fct to backtest …
madtoinou Aug 15, 2023
462c51a
Apply suggestions from code review
madtoinou Aug 15, 2023
a8ac1f8
Merge branch 'master' into feat/historical_forecast_neg_int_start
madtoinou Aug 15, 2023
4 changes: 4 additions & 0 deletions CHANGELOG.md
Expand Up @@ -10,6 +10,10 @@ but cannot always guarantee backwards compatibility. Changes that may **break co

### For users of the library:

**Improvement**
- `TimeSeries` with a `RangeIndex` starting at a negative value are now supported by `historical_forecasts`. [#1866](https://github.com/unit8co/darts/pull/1866) by [Antoine Madrona](https://github.com/madtoinou).
- Added a new argument `start_format` to `historical_forecasts`; `start` can now be provided as an absolute index (positive or negative) instead of a point of the time index. [#1866](https://github.com/unit8co/darts/pull/1866) by [Antoine Madrona](https://github.com/madtoinou).

**Fixed**
- Fixed a bug in `TimeSeries.from_dataframe()` when using a pandas.DataFrame with `df.columns.name != None`. [#1938](https://github.com/unit8co/darts/pull/1938) by [Antoine Madrona](https://github.com/madtoinou).
- Fixed a bug in `RegressionEnsembleModel.extreme_lags` when the forecasting models have only covariates lags. [#1942](https://github.com/unit8co/darts/pull/1942) by [Antoine Madrona](https://github.com/madtoinou).
Expand Down
22 changes: 20 additions & 2 deletions darts/models/forecasting/forecasting_model.py
Expand Up @@ -22,7 +22,18 @@
from collections import OrderedDict
from itertools import product
from random import sample
from typing import Any, BinaryIO, Callable, Dict, List, Optional, Sequence, Tuple, Union
from typing import (
Any,
BinaryIO,
Callable,
Dict,
List,
Literal,
Optional,
Sequence,
Tuple,
Union,
)

import numpy as np
import pandas as pd
Expand Down Expand Up @@ -560,6 +571,7 @@ def historical_forecasts(
num_samples: int = 1,
train_length: Optional[int] = None,
start: Optional[Union[pd.Timestamp, float, int]] = None,
start_format: Literal["point", "index"] = "point",
forecast_horizon: int = 1,
stride: int = 1,
retrain: Union[bool, int, Callable[..., bool]] = True,
Expand Down Expand Up @@ -610,7 +622,7 @@ def historical_forecasts(
`min_train_series_length`.
start
Optionally, the first point in time at which a prediction is computed for a future time.
This parameter supports: ``float``, ``int`` and ``pandas.Timestamp``, and ``None``.
This parameter supports: ``float``, ``int``, ``pandas.Timestamp``, and ``None``.
If a ``float``, the parameter will be treated as the proportion of the time series
that should lie before the first prediction point.
If an ``int``, the parameter will be treated as an integer index to the time index of
Expand All @@ -628,6 +640,9 @@ def historical_forecasts(
Note: Raises a ValueError if `start` yields a time outside the time index of `series`.
Note: If `start` is outside the possible historical forecasting times, will ignore the parameter
(default behavior with ``None``) and start at the first trainable/predictable point.
start_format
If set to 'index', `start` must be an integer and corresponds to the absolute position of the first point
in time at which the prediction is generated. Default: ``'point'``.
forecast_horizon
The forecast horizon for the predictions.
stride
Expand Down Expand Up @@ -798,6 +813,7 @@ def retrain_func(
future_covariates=future_covariates,
num_samples=num_samples,
start=start,
start_format=start_format,
forecast_horizon=forecast_horizon,
stride=stride,
overlap_end=overlap_end,
Expand Down Expand Up @@ -876,6 +892,7 @@ def retrain_func(
forecast_horizon=forecast_horizon,
overlap_end=overlap_end,
start=start,
start_format=start_format,
show_warnings=show_warnings,
)

Expand Down Expand Up @@ -1893,6 +1910,7 @@ def _optimized_historical_forecasts(
future_covariates: Optional[Sequence[TimeSeries]] = None,
num_samples: int = 1,
start: Optional[Union[pd.Timestamp, float, int]] = None,
start_format: Literal["point", "index"] = "point",
forecast_horizon: int = 1,
stride: int = 1,
overlap_end: bool = False,
Expand Down
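The `Literal["point", "index"]` annotation on `start_format` documents the accepted values, but Python does not enforce type hints at runtime. As a minimal sketch, the allowed values can be read back from the annotation with `typing.get_args`. The alias `StartFormat` and the helper `check_start_format` are hypothetical names used for illustration; this diff does not show whether darts validates the argument this way.

```python
from typing import Literal, get_args

# Hypothetical alias mirroring the annotation used in the signatures above.
StartFormat = Literal["point", "index"]


def check_start_format(start_format: str) -> str:
    """Reject values that the Literal annotation does not list (illustrative only)."""
    allowed = get_args(StartFormat)  # the tuple of literal values
    if start_format not in allowed:
        raise ValueError(
            f"`start_format` must be one of {allowed}, got {start_format!r}"
        )
    return start_format
```

`typing.Literal` and `typing.get_args` are both available from Python 3.8 onwards.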
5 changes: 4 additions & 1 deletion darts/models/forecasting/regression_model.py
Expand Up @@ -27,7 +27,7 @@
if their static covariates do not have the same size, the shorter ones are padded with 0 valued features.
"""
from collections import OrderedDict
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple, Union
from typing import Any, Callable, Dict, List, Literal, Optional, Sequence, Tuple, Union

import numpy as np
import pandas as pd
Expand Down Expand Up @@ -897,6 +897,7 @@ def _optimized_historical_forecasts(
future_covariates: Optional[Sequence[TimeSeries]] = None,
num_samples: int = 1,
start: Optional[Union[pd.Timestamp, float, int]] = None,
start_format: Literal["point", "index"] = "point",
forecast_horizon: int = 1,
stride: int = 1,
overlap_end: bool = False,
Expand Down Expand Up @@ -949,6 +950,7 @@ def _optimized_historical_forecasts(
future_covariates=future_covariates,
num_samples=num_samples,
start=start,
start_format=start_format,
forecast_horizon=forecast_horizon,
stride=stride,
overlap_end=overlap_end,
Expand All @@ -963,6 +965,7 @@ def _optimized_historical_forecasts(
future_covariates=future_covariates,
num_samples=num_samples,
start=start,
start_format=start_format,
forecast_horizon=forecast_horizon,
stride=stride,
overlap_end=overlap_end,
Expand Down
112 changes: 103 additions & 9 deletions darts/tests/models/forecasting/test_historical_forecasts.py
Expand Up @@ -374,6 +374,48 @@ def test_historical_forecasts_local_models(self):
"LocalForecastingModel does not support historical forecasting with `retrain` set to `False`"
)

def test_historical_forecasts_index_start(self):
series = tg.sine_timeseries(length=10)

model = LinearRegressionModel(lags=2)
model.fit(series[:8])

# negative index
forecasts = model.historical_forecasts(
series=series, start=-2, start_format="index", retrain=False
)
self.assertEqual(len(forecasts), 2)
self.assertTrue((series.time_index[-2:] == forecasts.time_index).all())

# positive index
forecasts = model.historical_forecasts(
series=series, start=5, start_format="index", retrain=False
)
self.assertEqual(len(forecasts), 5)
self.assertTrue((series.time_index[5:] == forecasts.time_index).all())

def test_historical_forecasts_negative_rangeindex(self):
series = TimeSeries.from_times_and_values(
times=pd.RangeIndex(start=-5, stop=5, step=1), values=np.arange(10)
)

model = LinearRegressionModel(lags=2)
model.fit(series[:8])

# start as point
forecasts = model.historical_forecasts(
series=series, start=-2, start_format="point", retrain=False
)
self.assertEqual(len(forecasts), 7)
self.assertTrue((series.time_index[-7:] == forecasts.time_index).all())

# start as index
forecasts = model.historical_forecasts(
series=series, start=-2, start_format="index", retrain=False
)
self.assertEqual(len(forecasts), 2)
self.assertTrue((series.time_index[-2:] == forecasts.time_index).all())
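
The lengths asserted in the two tests above follow from simple arithmetic, assuming (as the assertions imply for `retrain=False` and the default `forecast_horizon=1`) that one forecast is produced for every position from the resolved start to the end of the series. The sketch below uses a plain Python `range` as a stand-in for the time index; `expected_num_forecasts` is a hypothetical helper, not part of darts.

```python
def expected_num_forecasts(time_index: range, start: int, start_format: str) -> int:
    """Count positions from the resolved start to the end (illustrative only)."""
    n = len(time_index)
    if start_format == "point":
        # `start` is a value of the time index: look up its position.
        position = time_index.index(start)
    else:
        # `start` is a position; negative values count from the end.
        position = start if start >= 0 else n + start
    return n - position


# Any length-10 series: positional lookup ignores the index values.
neg_position = expected_num_forecasts(range(10), -2, "index")
pos_position = expected_num_forecasts(range(10), 5, "index")

# Stand-in for pd.RangeIndex(start=-5, stop=5, step=1).
as_point = expected_num_forecasts(range(-5, 5), -2, "point")
as_index = expected_num_forecasts(range(-5, 5), -2, "index")
```

With the negative `RangeIndex`, the same `start=-2` resolves to position 3 as a point and position 8 as an index, which is why the tests expect 7 and 2 forecasts respectively.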

def test_historical_forecasts(self):
train_length = 10
forecast_horizon = 8
Expand Down Expand Up @@ -551,7 +593,7 @@ def test_sanity_check_invalid_start(self):
rangeidx_step1 = tg.linear_timeseries(start=0, length=10, freq=1)
rangeidx_step2 = tg.linear_timeseries(start=0, length=10, freq=2)

# index too large
# point (int) too large
with pytest.raises(ValueError) as msg:
LinearRegressionModel(lags=1).historical_forecasts(timeidx_, start=11)
assert str(msg.value).startswith("`start` index `11` is out of bounds")
Expand All @@ -562,26 +604,32 @@ def test_sanity_check_invalid_start(self):
LinearRegressionModel(lags=1).historical_forecasts(rangeidx_step2, start=11)
assert str(msg.value).startswith("The provided point is not a valid index")

# value too low
# point (int) too low
with pytest.raises(ValueError) as msg:
LinearRegressionModel(lags=1).historical_forecasts(
timeidx_, start=timeidx_.start_time() - timeidx_.freq
rangeidx_step1, start=rangeidx_step1.start_time() - rangeidx_step1.freq
)
assert str(msg.value).startswith(
"`start` time `1999-12-31 00:00:00` is before the first timestamp `2000-01-01 00:00:00`"
"The index corresponding to the provided point ("
)
with pytest.raises(ValueError) as msg:
LinearRegressionModel(lags=1).historical_forecasts(
rangeidx_step1, start=rangeidx_step1.start_time() - rangeidx_step1.freq
rangeidx_step2, start=rangeidx_step2.start_time() - rangeidx_step2.freq
)
assert str(msg.value).startswith("if `start` is an integer, must be `>= 0`")
assert str(msg.value).startswith(
"The index corresponding to the provided point ("
)

# point (timestamp) too low
with pytest.raises(ValueError) as msg:
LinearRegressionModel(lags=1).historical_forecasts(
rangeidx_step2, start=rangeidx_step2.start_time() - rangeidx_step2.freq
timeidx_, start=timeidx_.start_time() - timeidx_.freq
)
assert str(msg.value).startswith("if `start` is an integer, must be `>= 0`")
assert str(msg.value).startswith(
"`start` time `1999-12-31 00:00:00` is before the first timestamp `2000-01-01 00:00:00`"
)

# value too high
# point (timestamp) too high
with pytest.raises(ValueError) as msg:
LinearRegressionModel(lags=1).historical_forecasts(
timeidx_, start=timeidx_.end_time() + timeidx_.freq
Expand All @@ -602,6 +650,52 @@ def test_sanity_check_invalid_start(self):
"`start` index `20` is larger than the last index `18`"
)

# index too high when start_format = 'index'
with pytest.raises(ValueError) as msg:
LinearRegressionModel(lags=1).historical_forecasts(
timeidx_, start=11, start_format="index"
)
assert str(msg.value).startswith(
"`start` index `11` is out of bounds for series of length 10"
)
with pytest.raises(ValueError) as msg:
LinearRegressionModel(lags=1).historical_forecasts(
rangeidx_step1, start=11, start_format="index"
)
assert str(msg.value).startswith(
"`start` index `11` is out of bounds for series of length 10"
)
with pytest.raises(ValueError) as msg:
LinearRegressionModel(lags=1).historical_forecasts(
rangeidx_step2, start=11, start_format="index"
)
assert str(msg.value).startswith(
"`start` index `11` is out of bounds for series of length 10"
)

# index too low (negative, out of bounds) when start_format = 'index'
with pytest.raises(ValueError) as msg:
LinearRegressionModel(lags=1).historical_forecasts(
timeidx_, start=-11, start_format="index"
)
assert str(msg.value).startswith(
"`start` index `-11` is out of bounds for series of length 10"
)
with pytest.raises(ValueError) as msg:
LinearRegressionModel(lags=1).historical_forecasts(
rangeidx_step1, start=-11, start_format="index"
)
assert str(msg.value).startswith(
"`start` index `-11` is out of bounds for series of length 10"
)
with pytest.raises(ValueError) as msg:
LinearRegressionModel(lags=1).historical_forecasts(
rangeidx_step2, start=-11, start_format="index"
)
assert str(msg.value).startswith(
"`start` index `-11` is out of bounds for series of length 10"
)
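
The six positional cases above all expect the same message regardless of the index type, since a positional `start` only depends on the series length. A minimal sketch of such a check follows; `check_positional_start` is a hypothetical helper, and the exact boundary (here Python's sequence convention, `-length <= start < length`) is an assumption, since the tests only cover `11` and `-11` on series of length 10.

```python
def check_positional_start(start: int, series_length: int) -> int:
    """Hypothetical bounds check for `start` when `start_format="index"`."""
    # Assumption: valid positions follow Python sequence indexing.
    if not -series_length <= start < series_length:
        raise ValueError(
            f"`start` index `{start}` is out of bounds "
            f"for series of length {series_length}"
        )
    return start


messages = []
for bad_start in (11, -11):
    try:
        check_positional_start(bad_start, 10)
    except ValueError as err:
        messages.append(str(err))
```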

def test_regression_auto_start_multiple_no_cov(self):
train_length = 15
forecast_horizon = 10
Expand Down
10 changes: 5 additions & 5 deletions darts/timeseries.py
Expand Up @@ -216,7 +216,7 @@ def __init__(self, xa: xr.DataArray):
logger,
)
else:
self._freq = self._time_index.step
self._freq: int = self._time_index.step
self._freq_str = None

# check static covariates
Expand Down Expand Up @@ -2064,7 +2064,7 @@ def get_index_at_point(
Parameters
----------
point
This parameter supports 3 different data types: ``pd.Timestamp``, ``float`` and ``int``.
This parameter supports 4 different data types: ``pd.Timestamp``, ``float``, ``int`` and ``dict``.

``pd.Timestamp`` work only on series that are indexed with a ``pd.DatetimeIndex``. In such cases, the
returned point will be the index of this timestamp if it is present in the series time index.
Expand Down Expand Up @@ -2103,7 +2103,7 @@ def get_index_at_point(
)
raise_if_not(
0 <= point_index < len(self),
"point (int) should be a valid index in series",
f"The index corresponding to the provided point ({point}) should be a valid index in series",
logger,
)
elif isinstance(point, pd.Timestamp):
Expand Down Expand Up @@ -2142,8 +2142,8 @@ def get_timestamp_at_point(
This parameter supports 3 different data types: `float`, `int` and `pandas.Timestamp`.
In case of a `float`, the parameter will be treated as the proportion of the time series
that should lie before the point.
In the case of `int`, the parameter will be treated as an integer index to the time index of
`series`. Will raise a ValueError if not a valid index in `series`
In case of `int`, the parameter will be treated as an integer index to the time index of
`series`. Will raise a ValueError if not a valid index in `series`.
In case of a `pandas.Timestamp`, point will be returned as is provided that the timestamp
is present in the series time index, otherwise will raise a ValueError.
"""
Expand Down
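
For an integer `point` on a range-like index, the two error messages exercised by the tests correspond to two distinct failures: the point falls between two steps, or the point maps to a position outside the series. The sketch below illustrates that distinction under stated assumptions; `position_of_point` is a hypothetical function that takes the index start, step and length explicitly, and it is not the body of `get_index_at_point`.

```python
def position_of_point(start: int, step: int, length: int, point: int) -> int:
    """Map an index value to its position on a range-like index (illustrative only).

    Assumes a positive `step`.
    """
    offset = point - start
    # A point that falls between two steps does not belong to the index.
    if offset % step != 0:
        raise ValueError("The provided point is not a valid index")
    position = offset // step
    if not 0 <= position < length:
        raise ValueError(
            f"The index corresponding to the provided point ({point}) "
            "should be a valid index in series"
        )
    return position


# Stand-in for a length-10 integer index starting at 0 with a step of 2.
valid_position = position_of_point(start=0, step=2, length=10, point=4)
```

On this index, `point=11` lies between two steps, whereas `point=-2` is aligned with the step but maps to position `-1`, which is before the start of the series.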
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
from typing import List, Optional, Sequence, Union
from typing import List, Literal, Optional, Sequence, Union

import numpy as np
import pandas as pd
Expand All @@ -20,6 +20,7 @@ def _optimized_historical_forecasts_regression_last_points_only(
future_covariates: Optional[Sequence[TimeSeries]] = None,
num_samples: int = 1,
start: Optional[Union[pd.Timestamp, float, int]] = None,
start_format: Literal["point", "index"] = "point",
forecast_horizon: int = 1,
stride: int = 1,
overlap_end: bool = False,
Expand Down Expand Up @@ -63,6 +64,7 @@ def _optimized_historical_forecasts_regression_last_points_only(
past_covariates=past_covariates_,
future_covariates=future_covariates_,
start=start,
start_format=start_format,
forecast_horizon=forecast_horizon,
overlap_end=overlap_end,
freq=freq,
Expand Down Expand Up @@ -156,6 +158,7 @@ def _optimized_historical_forecasts_regression_all_points(
future_covariates: Optional[Sequence[TimeSeries]] = None,
num_samples: int = 1,
start: Optional[Union[pd.Timestamp, float, int]] = None,
start_format: Literal["point", "index"] = "point",
forecast_horizon: int = 1,
stride: int = 1,
overlap_end: bool = False,
Expand Down Expand Up @@ -199,6 +202,7 @@ def _optimized_historical_forecasts_regression_all_points(
past_covariates=past_covariates_,
future_covariates=future_covariates_,
start=start,
start_format=start_format,
forecast_horizon=forecast_horizon,
overlap_end=overlap_end,
freq=freq,
Expand Down