make baseline global #2016
Conversation
Thanks for this @SimTheGreat, it looks like a very good start to make the baseline models global 🚀

I have some notes about the general idea behind some of the model methods:

- Baseline models don't support past or future covariates. The Darts encoders generate covariates that can be used by the models, so for the baseline models we can simply return `(None, None, ...)` in `_model_encoder_settings`.
- The `extreme_lags` property gives the expected input boundaries for the target series and covariates. Since baseline models don't support covariates, we can simply use the same logic from `LocalForecastingModel.extreme_lags` for all models.
From here on I have a couple of suggestions on how to further enhance this:

- We could write a new parent `BaselineModel` class that all baseline models inherit from. This class takes care of:
  - fit logic that is shared across models
  - predict logic that is shared across models
  - encoder handling
  - extreme lags handling

I have written an example below of such a parent `BaselineModel` class and applied it to the `NaiveMean` model.
Could you try to adapt the other models (except `NaiveEnsemble`) in a similar way?
```python
from abc import ABC, abstractmethod
from typing import List, Optional, Sequence, Tuple, Union

import numpy as np

from darts.logging import get_logger, raise_log
from darts.models.forecasting.forecasting_model import GlobalForecastingModel
from darts.timeseries import TimeSeries
from darts.utils.utils import seq2series, series2seq

logger = get_logger(__name__)


class BaselineModel(GlobalForecastingModel, ABC):
    def __init__(self):
        super().__init__(add_encoders=None)

    def fit(self, series: Union[TimeSeries, Sequence[TimeSeries]]) -> "BaselineModel":
        """Fit/train the model on a (or potentially multiple) series.

        This method is only implemented for naive baseline models to provide a unified fit/predict API with
        other forecasting models.

        The models are not really trained on the input, but they store the training `series` in case only a
        single `TimeSeries` was passed. This allows to call `predict()` without having to pass the single
        `series`. All baseline models compute the forecasts for each series directly when calling `predict()`.

        Parameters
        ----------
        series
            One or several target time series. The model will be trained to forecast these time series.
            The series may or may not be multivariate, but if multiple series are provided
            they must have the same number of components.

        Returns
        -------
        self
            Fitted model.
        """
        series = seq2series(series)
        super().fit(series=series)
        self._fit_model(series=series)
        return self

    @abstractmethod
    def _fit_model(self, series: Union[TimeSeries, Sequence[TimeSeries]]) -> None:
        """Must implement the fit logic and checks for each sub model."""
        pass

    def predict(
        self,
        n: int,
        series: Optional[Union[TimeSeries, Sequence[TimeSeries]]] = None,
        num_samples: int = 1,
    ) -> Union[TimeSeries, Sequence[TimeSeries]]:
        """Forecasts values for `n` time steps after the end of the series.

        If :func:`fit()` has been called with only one ``TimeSeries`` as argument, then the `series`
        argument of this function is optional, and it will simply produce the next `n` time steps forecast.

        If :func:`fit()` has been called with `series` specified as a ``Sequence[TimeSeries]`` (i.e., the
        model has been trained on multiple time series), the `series` argument must be specified.

        When the `series` argument is specified, this function will compute the next `n` time steps
        forecasts for the single series (or for each series in the sequence) given by `series`.

        Parameters
        ----------
        n
            Forecast horizon - the number of time steps after the end of the series for which to produce
            predictions.
        series
            The series whose future values will be predicted.
        num_samples
            Number of times a prediction is sampled from a probabilistic model. Must be `1` for these
            deterministic baseline models.

        Returns
        -------
        Union[TimeSeries, Sequence[TimeSeries]]
            If `series` is not specified, this function returns a single time series containing the `n`
            next points after the end of the training series.
            If `series` is given and is a simple ``TimeSeries``, this function returns the `n` next points
            after the end of `series`.
            If `series` is given and is a sequence of several time series, this function returns
            a sequence where each element contains the corresponding `n` points forecasts.
        """
        if series is None:
            # then there must be a single TS, and that was saved in super().fit as self.training_series
            if self.training_series is None:
                raise_log(
                    ValueError(
                        "Input `series` must be provided. This is the result either from fitting on "
                        "multiple series, or from not having fit the model yet."
                    ),
                    logger,
                )
            series = self.training_series

        called_with_single_series = isinstance(series, TimeSeries)
        series = series2seq(series)
        super().predict(n=n, series=series, num_samples=num_samples)
        predictions = self._predict(n=n, series=series, num_samples=num_samples)
        return predictions[0] if called_with_single_series else predictions

    @abstractmethod
    def _predict(
        self,
        n: int,
        series: Optional[Sequence[TimeSeries]] = None,
        num_samples: int = 1,
    ) -> Sequence[TimeSeries]:
        """Must implement the prediction logic for each sub model."""
        pass

    @property
    def extreme_lags(
        self,
    ) -> Tuple[
        Optional[int],
        Optional[int],
        Optional[int],
        Optional[int],
        Optional[int],
        Optional[int],
    ]:
        return -self.min_train_series_length, -1, None, None, None, None

    @property
    def _model_encoder_settings(
        self,
    ) -> Tuple[
        Optional[int],
        Optional[int],
        bool,
        bool,
        Optional[List[int]],
        Optional[List[int]],
    ]:
        """Baseline models do not support covariates and therefore also no encoders."""
        return None, None, False, False, None, None

    @property
    def supports_multivariate(self) -> bool:
        return True


class NaiveMean(BaselineModel):
    def __init__(self):
        """Naive Mean Model

        This model has no parameter, and always predicts the
        mean value of the training series.

        Examples
        --------
        >>> from darts.datasets import AirPassengersDataset
        >>> from darts.models import NaiveMean
        >>> series = AirPassengersDataset().load()
        >>> model = NaiveMean()
        >>> model.fit(series)
        >>> pred = model.predict(6)
        >>> pred.values()
        array([[280.29861111],
               [280.29861111],
               [280.29861111],
               [280.29861111],
               [280.29861111],
               [280.29861111]])
        """
        super().__init__()

    def _fit_model(self, series: Union[TimeSeries, Sequence[TimeSeries]]) -> None:
        super()._fit_model(series)

    def _predict(
        self,
        n: int,
        series: Optional[Sequence[TimeSeries]] = None,
        num_samples: int = 1,
    ) -> Sequence[TimeSeries]:
        predictions = []
        for series_ in series:
            mean_val = np.mean(series_.values(copy=False), axis=0)
            predictions.append(
                self._build_forecast_series(
                    points_preds=np.tile(mean_val, (n, 1)),
                    input_series=series_,
                )
            )
        return predictions
```
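The core of `NaiveMean`'s prediction is just tiling the per-component mean; a minimal numpy sketch of that math, independent of darts (the function name `naive_mean_forecast` is made up for illustration):

```python
import numpy as np

def naive_mean_forecast(values: np.ndarray, n: int) -> np.ndarray:
    """Repeat the per-component mean of a (time, components) array n times."""
    mean_val = values.mean(axis=0)    # shape: (components,)
    return np.tile(mean_val, (n, 1))  # shape: (n, components)

# bivariate toy series: two components over four time steps
series = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
forecast = naive_mean_forecast(series, n=3)
# every forecast row equals the column means [2.5, 25.0]
```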
Very nice, thanks a lot @SimTheGreat, this looks great!
Only added some minor suggestions.

Before merging, we should:

- add the `_fit_wrapper` and `_predict_wrapper` methods without past/future covariates support to the `BaselineModel` class to fix the current errors
- add this PR/improvement to the `CHANGELOG.md` file
- `README.md`:
  - change the `LocalForecastingModel` link to `GlobalForecastingModel` for the naive baselines in the forecasting model table
  - add multivariate and multiple target series support for baseline models
- `darts/docs/userguide/covariates.md`:
  - update the text for global forecasting models with baseline models
  - move the baseline models in the model table to the global forecasting models
- add some tests for the naive models now being global. I think these tests, adapted for the naive models, should be fine:
  - `darts.tests.models.forecasting.test_global_forecasting_model.test_single_ts()`
  - `darts.tests.models.forecasting.test_global_forecasting_model.test_multi_ts()`
  - `darts.tests.models.forecasting.test_global_forecasting_model.test_prediction_with_different_n()`

Let me know if you need help with anything
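The single- vs multi-series contract those tests should pin down can be sketched without darts at all. `TinyNaiveMean` below is a hypothetical stand-in, not the darts class, and plain numpy arrays stand in for `TimeSeries`:

```python
import numpy as np

class TinyNaiveMean:
    """Stand-in for a global naive model; mimics the fit/predict API shape only."""

    def fit(self, series):
        # store whatever was passed (single array or list of arrays)
        self.training_series = series
        return self

    def predict(self, n, series=None):
        if series is None:
            series = self.training_series
        single = not isinstance(series, list)
        seqs = [series] if single else series
        preds = [np.tile(s.mean(axis=0), (n, 1)) for s in seqs]
        # single input -> single output, sequence input -> sequence output
        return preds[0] if single else preds

def test_single_ts():
    pred = TinyNaiveMean().fit(np.ones((5, 1))).predict(n=3)
    assert pred.shape == (3, 1)

def test_multi_ts():
    preds = TinyNaiveMean().fit([np.ones((5, 1)), np.full((4, 1), 2.0)]).predict(n=3)
    assert len(preds) == 2 and np.allclose(preds[1], 2.0)
```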
```diff
- return self._build_forecast_series(forecast)
+ predictions = []
+ for series_ in series:
+     last_k_vals = series_.values(copy=False)[-self.K :, :]
```
We should check that each series has at least `K` points, as done previously in `fit()`.
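A minimal numpy sketch of such a length check before slicing the last `K` values (the function name and error message are made up; only the `[-k:, :]` slice follows the quoted code):

```python
import numpy as np

def last_k_values(values: np.ndarray, k: int) -> np.ndarray:
    """Return the last k rows of a (time, components) array, validating the length first."""
    if values.shape[0] < k:
        raise ValueError(f"series needs at least K={k} points, got {values.shape[0]}")
    return values[-k:, :]

series = np.arange(10.0).reshape(5, 2)  # 5 time steps, 2 components
tail = last_k_values(series, k=3)       # last 3 time steps
```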
```python
series = series2seq(series)
super().predict(n=n, series=series, num_samples=num_samples)
predictions = self._predict(n=n, series=series, num_samples=num_samples)
```
Nice :) It looks like all models now use the same for loop to go over all input series.
We can put the for loop here into the base class:

```diff
- predictions = self._predict(n=n, series=series, num_samples=num_samples)
+ predictions = []
+ for series_ in series:
+     predictions.append(
+         self._build_forecast_series(
+             points_preds=self._predict(
+                 n=n, series=series_, num_samples=num_samples
+             ),
+             input_series=series_,
+         )
+     )
```

And then `_predict()` from all models can be adapted to act on a single `TimeSeries` and return a single `np.ndarray` (see comment below).
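Reduced to a plain-numpy sketch, the proposed split looks like this (class and method names here are stand-ins, not darts code): the parent owns the per-series loop, and each subclass only produces one forecast array per series.

```python
from abc import ABC, abstractmethod
import numpy as np

class BaselineSketch(ABC):
    """Parent class owns the loop over input series (illustrative stand-in)."""

    def predict_all(self, n, series_list):
        # one (n, components) forecast array per input series, from the subclass hook
        return [self._predict(n, s) for s in series_list]

    @abstractmethod
    def _predict(self, n: int, series: np.ndarray) -> np.ndarray:
        """Forecast a single series as an (n, components) array."""

class MeanSketch(BaselineSketch):
    def _predict(self, n, series):
        return np.tile(series.mean(axis=0), (n, 1))

preds = MeanSketch().predict_all(2, [np.ones((3, 1)), np.zeros((4, 1))])
```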
```python
@abstractmethod
def _predict(
    self, n: int, series: Sequence[TimeSeries] = None, num_samples: int = 1
) -> Sequence[TimeSeries]:
```
```diff
 @abstractmethod
 def _predict(
-    self, n: int, series: Sequence[TimeSeries] = None, num_samples: int = 1
-) -> Sequence[TimeSeries]:
+    self, n: int, series: TimeSeries = None, num_samples: int = 1
+) -> np.ndarray:
```
""" | ||
series = seq2series(series) | ||
super().fit(series=series) | ||
self._fit_model(series=series) |
It looks like none of the models actually use the `_fit_model` method, so we can remove the method and save some lines :)

```diff
- self._fit_model(series=series)
```
```python
@abstractmethod
def _fit_model(self, series: Union[TimeSeries, Sequence[TimeSeries]]) -> None:
    """Must implement the fit logic and checks for each sub model."""
    pass
```
```diff
- @abstractmethod
- def _fit_model(self, series: Union[TimeSeries, Sequence[TimeSeries]]) -> None:
-     """Must implement the fit logic and checks for each sub model."""
-     pass
```
```python
@abstractmethod
def _fit_model(self, series: Union[TimeSeries, Sequence[TimeSeries]]) -> None:
    """Must implement the fit logic and checks for each sub model."""
    pass
```
I think you don't have to write the `pass` (I'm sure for 3.9+ but check for 3.8 :-) )
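Indeed: a docstring is itself a valid function body on every Python 3 version (3.8 included), so the `pass` is redundant. A quick self-contained check:

```python
from abc import ABC, abstractmethod

class Model(ABC):
    @abstractmethod
    def _fit_model(self, series):
        """Must implement the fit logic for each sub model."""
        # no `pass` needed: the docstring alone is a valid body

class Concrete(Model):
    def _fit_model(self, series):
        return series

fitted = Concrete()._fit_model([1, 2])
```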
```python
@abstractmethod
def _predict(
    self, n: int, series: Sequence[TimeSeries] = None, num_samples: int = 1
```
Put a comma at the end; in this way the method looks like the overridden ones.
```python
for series_ in series:
    first, last = (
```
`first, last = series_.first_values(), series_.last_values()`
```python
chunk_length = self.input_chunk_length
for i in range(chunk_length, chunk_length + n):
    prediction = rolling_sum / chunk_length
```
This is a constant, so you can put it outside of the for loop. Or is it an error?

Closing this one, since #2261 was merged.
Fixes #2002

Summary

Makes all baseline models global.