make baseline global #2016

Closed

SimTheGreat
Contributor

Fixes #2002

Summary

Makes all baseline models global.

Collaborator

Thanks for this @SimTheGreat, it looks like a very good start to make the baseline models global 🚀

I have some notes about the general idea behind some of the model methods:

  • Baseline models don't support past or future covariates. The Darts encoders generate covariates that can be used by the models. So for the baseline models we can simply return (None, None, ...) in _model_encoder_settings.
  • the extreme_lags() property gives the expected input boundaries for the target series and covariates. Since baseline models don't support covariates, we can simply use the same logic from LocalForecastingModel.extreme_lags() for all models.

From here on I have a couple of suggestions on how to further enhance this:

  • we could write a new parent BaselineModel class that all baseline models inherit from. This class takes care of:
    • fit logic that is shared across models
    • predict logic that is shared across models
    • encoder handling
    • extreme lags handling

I have written an example below of such a parent BaselineModel class and applied it to the NaiveMean model.

Could you try to adapt the other models (except NaiveEnsemble) in a similar way?

from abc import ABC, abstractmethod
from typing import List, Optional, Sequence, Tuple, Union

import numpy as np

from darts.logging import get_logger, raise_if, raise_if_not, raise_log
from darts.models.forecasting.ensemble_model import EnsembleModel
from darts.timeseries import TimeSeries
from darts.models.forecasting.forecasting_model import (
    ForecastingModel,
    GlobalForecastingModel,
)
from darts.utils.utils import seq2series, series2seq

logger = get_logger(__name__)


class BaselineModel(GlobalForecastingModel, ABC):
    def __init__(self):
        super().__init__(add_encoders=None)

    def fit(self, series: Union[TimeSeries, Sequence[TimeSeries]]) -> "BaselineModel":
        """Fit/train the model on a (or potentially multiple) series.
        This method is only implemented for naive baseline models to provide a unified fit/predict API with other
        forecasting models.

        The models are not really trained on the input, but they store the training `series` in case only a single
        `TimeSeries` was passed. This allows calling `predict()` without having to pass the single `series` again.

        All baseline models compute the forecasts for each series directly when calling `predict()`.

        Parameters
        ----------
        series
            One or several target time series. The model will be trained to forecast these time series.
            The series may or may not be multivariate, but if multiple series are provided
            they must have the same number of components.

        Returns
        -------
        self
            Fitted model.
        """
        series = seq2series(series)
        super().fit(series=series)
        self._fit_model(series=series)
        return self

    @abstractmethod
    def _fit_model(self, series: Union[TimeSeries, Sequence[TimeSeries]]) -> None:
        """Must implement the fit logic and checks for each sub model."""
        pass

    def predict(
        self,
        n: int,
        series: Optional[Union[TimeSeries, Sequence[TimeSeries]]] = None,
        num_samples: int = 1
    ) -> Union[TimeSeries, Sequence[TimeSeries]]:
        """Forecasts values for `n` time steps after the end of the series.

        If :func:`fit()` has been called with only one ``TimeSeries`` as argument, then the `series` argument of
        this function is optional, and it will simply produce the next `n` time steps forecast.

        If :func:`fit()` has been called with `series` specified as a ``Sequence[TimeSeries]`` (i.e., the model
        has been trained on multiple time series), the `series` argument must be specified.

        When the `series` argument is specified, this function will compute the next `n` time steps forecasts
        for the single series (or for each series in the sequence) given by `series`.

        Parameters
        ----------
        n
            Forecast horizon - the number of time steps after the end of the series for which to produce predictions.
        series
            The series whose future values will be predicted.

        Returns
        -------
        Union[TimeSeries, Sequence[TimeSeries]]
            If `series` is not specified, this function returns a single time series containing the `n`
            next points after the end of the training series.
            If `series` is given and is a single ``TimeSeries``, this function returns the `n` next points
            after the end of `series`.
            If `series` is given and is a sequence of several time series, this function returns
            a sequence where each element contains the corresponding `n` points forecasts.
        """
        if series is None:
            # then there must be a single TS, and that was saved in super().fit as self.training_series
            if self.training_series is None:
                raise_log(
                    ValueError(
                        "Input `series` must be provided. This is the result either from fitting on multiple series, "
                        "or from not having fit the model yet."
                    ),
                    logger,
                )
            series = self.training_series

        called_with_single_series = isinstance(series, TimeSeries)

        series = series2seq(series)
        super().predict(n=n, series=series, num_samples=num_samples)
        predictions = self._predict(n=n, series=series, num_samples=num_samples)
        return predictions[0] if called_with_single_series else predictions

    @abstractmethod
    def _predict(
        self,
        n: int,
        series: Sequence[TimeSeries] = None,
        num_samples: int = 1
    ) -> Sequence[TimeSeries]:
        """Must implement the prediction logic for each sub model."""
        pass

    @property
    def extreme_lags(
        self,
    ) -> Tuple[
        Optional[int],
        Optional[int],
        Optional[int],
        Optional[int],
        Optional[int],
        Optional[int],
    ]:
        return -self.min_train_series_length, -1, None, None, None, None

    @property
    def _model_encoder_settings(
        self,
    ) -> Tuple[
        Optional[int],
        Optional[int],
        bool,
        bool,
        Optional[List[int]],
        Optional[List[int]],
    ]:
        """Baseline models do not support covariates and therefore also no encoders."""
        return None, None, False, False, None, None

    @property
    def supports_multivariate(self) -> bool:
        return True


class NaiveMean(BaselineModel):
    def __init__(self):
        """Naive Mean Model

        This model has no parameter, and always predicts the
        mean value of the training series.

        Examples
        --------
        >>> from darts.datasets import AirPassengersDataset
        >>> from darts.models import NaiveMean
        >>> series = AirPassengersDataset().load()
        >>> model = NaiveMean()
        >>> model.fit(series)
        >>> pred = model.predict(6)
        >>> pred.values()
        array([[280.29861111],
              [280.29861111],
              [280.29861111],
              [280.29861111],
              [280.29861111],
              [280.29861111]])
        """
        super().__init__()

    def _fit_model(self, series: Union[TimeSeries, Sequence[TimeSeries]]) -> None:
        super()._fit_model(series)

    def _predict(
        self,
        n: int,
        series: Sequence[TimeSeries] = None,
        num_samples: int = 1
    ) -> Sequence[TimeSeries]:
        predictions = []
        for series_ in series:
            mean_val = np.mean(series_.values(copy=False), axis=0)
            predictions.append(
                self._build_forecast_series(
                    points_preds=np.tile(mean_val, (n, 1)),
                    input_series=series_,
                )
            )
        return predictions
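
The single-series versus multi-series dispatch described in the `predict()` docstring can be tried out without Darts. This is only a sketch under stated assumptions: plain lists of floats stand in for `TimeSeries`, and `ToyMeanBaseline` is a hypothetical class, not part of the library.

```python
from typing import List, Optional, Sequence, Union

Series = List[float]  # stand-in for a univariate TimeSeries (assumption)


class ToyMeanBaseline:
    """Hypothetical toy model mirroring the fit/predict dispatch above."""

    def __init__(self) -> None:
        self.training_series: Optional[Series] = None

    def fit(self, series: Union[Series, Sequence[Series]]) -> "ToyMeanBaseline":
        # Only remember the series when a single one was passed.
        single = bool(series) and isinstance(series[0], (int, float))
        self.training_series = list(series) if single else None
        return self

    def predict(self, n: int, series=None):
        if series is None:
            # Fall back to the stored training series, if there is one.
            if self.training_series is None:
                raise ValueError("Input `series` must be provided.")
            series = self.training_series
        single = isinstance(series[0], (int, float))
        batch = [series] if single else list(series)
        # Mean forecast: repeat the mean of each series n times.
        preds = [[sum(s) / len(s)] * n for s in batch]
        # Return the same "shape" the caller passed in.
        return preds[0] if single else preds
```

Fitting on one series lets `predict(n)` be called without arguments; fitting on several series forces the caller to pass `series` explicitly.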

Collaborator

@dennisbader dennisbader left a comment

Very nice, thanks a lot @SimTheGreat, this looks great!

Only added some minor suggestions.

Before merging, we should:

  • add the _fit_wrapper and _predict_wrapper methods without past/future covariates support to the BaselineModel class to fix the current errors.
  • add this PR/improvement to the CHANGELOG.md file
  • README.md:
    • change the LocalForecastingModel link to GlobalForecastingModel for the naive baselines in the forecasting model table
    • add multivariate and multiple target series support for baseline models
  • darts/docs/userguide/covariates.md:
    • update the text for Global forecasting models with baseline models
    • move the baseline models in the model table to the global forecasting models
  • add some tests for the naive models now being global. I think these tests adapted for the naive models should be fine:
    • darts.tests.models.forecasting.test_global_forecasting_model.test_single_ts()
    • darts.tests.models.forecasting.test_global_forecasting_model.test_multi_ts()
    • darts.tests.models.forecasting.test_global_forecasting_model.test_prediction_with_different_n()

Let me know if you need help with anything

return self._build_forecast_series(forecast)
predictions = []
for series_ in series:
    last_k_vals = series_.values(copy=False)[-self.K :, :]
Collaborator

We should check that each series has at least K points, as was done previously in fit().
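
A minimal sketch of such a check, assuming plain lists as stand-ins for the series values. The helper names `check_min_length` and `last_k_values` are hypothetical and not part of Darts. The second helper shows why the check matters: slicing the last K values of a shorter series does not fail, it silently returns fewer values.

```python
from typing import List, Sequence


def check_min_length(series_seq: Sequence[List[float]], k: int) -> None:
    """Raise if any series is shorter than the seasonal period `k`."""
    for idx, values in enumerate(series_seq):
        if len(values) < k:
            raise ValueError(
                f"Series at index {idx} has {len(values)} points, "
                f"but at least K={k} are required."
            )


def last_k_values(values: List[float], k: int) -> List[float]:
    # Without the check, a too-short input silently yields fewer than k values.
    return values[-k:]
```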


series = series2seq(series)
super().predict(n=n, series=series, num_samples=num_samples)
predictions = self._predict(n=n, series=series, num_samples=num_samples)
Collaborator

Nice :) It looks like all models now use the same for loop to go over all input series.
We can move that for loop into the base class:

Suggested change
- predictions = self._predict(n=n, series=series, num_samples=num_samples)
+ predictions = []
+ for series_ in series:
+     predictions.append(
+         self._build_forecast_series(
+             points_preds=self._predict(
+                 n=n, series=series_, num_samples=num_samples
+             ),
+             input_series=series_,
+         )
+     )

And then _predict() from all models can be adapted to act on a single TimeSeries and return a single np.ndarray (see comment below)
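
The effect of this refactoring can be sketched without Darts. The classes below are hypothetical stand-ins: lists of floats replace `TimeSeries`, and the wrapping done by `_build_forecast_series` is left out. The base class owns the loop, and each sub model only maps one series to `n` forecast values.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence

Series = List[float]  # stand-in for TimeSeries values (assumption)


class ToyBaseline(ABC):
    """Hypothetical base class that owns the loop over the input series."""

    def predict(self, n: int, series: Sequence[Series]) -> List[Series]:
        # The loop lives here once; sub models never see a sequence.
        return [self._predict(n, series_) for series_ in series]

    @abstractmethod
    def _predict(self, n: int, series: Series) -> Series:
        """Return the `n` forecast values for a single series."""


class ToyMean(ToyBaseline):
    def _predict(self, n: int, series: Series) -> Series:
        # Repeat the mean of the series n times.
        return [sum(series) / len(series)] * n


class ToyLast(ToyBaseline):
    def _predict(self, n: int, series: Series) -> Series:
        # Repeat the last observed value n times.
        return [series[-1]] * n
```

Adding another baseline then only requires a new `_predict()` for a single series.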

Comment on lines +112 to +115
@abstractmethod
def _predict(
    self, n: int, series: Sequence[TimeSeries] = None, num_samples: int = 1
) -> Sequence[TimeSeries]:
Collaborator

Suggested change
- @abstractmethod
- def _predict(
-     self, n: int, series: Sequence[TimeSeries] = None, num_samples: int = 1
- ) -> Sequence[TimeSeries]:
+ @abstractmethod
+ def _predict(
+     self, n: int, series: TimeSeries = None, num_samples: int = 1
+ ) -> np.ndarray:

"""
series = seq2series(series)
super().fit(series=series)
self._fit_model(series=series)
Collaborator

It looks like none of the models actually use the _fit_model method, so we can remove the method and save some lines :)

Suggested change
- self._fit_model(series=series)

Comment on lines +54 to +57
@abstractmethod
def _fit_model(self, series: Union[TimeSeries, Sequence[TimeSeries]]) -> None:
    """Must implement the fit logic and checks for each sub model."""
    pass
Collaborator

Suggested change
- @abstractmethod
- def _fit_model(self, series: Union[TimeSeries, Sequence[TimeSeries]]) -> None:
-     """Must implement the fit logic and checks for each sub model."""
-     pass

@abstractmethod
def _fit_model(self, series: Union[TimeSeries, Sequence[TimeSeries]]) -> None:
    """Must implement the fit logic and checks for each sub model."""
    pass

I think you don't have to write the pass, since the docstring already counts as the method body (I'm sure for 3.9+, but check for 3.8 :-) )
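
A docstring is a valid function body on its own, including on Python 3.8, so the point can be checked with a small standalone example. The class names `Base` and `Child` are placeholders for illustration only.

```python
from abc import ABC, abstractmethod


class Base(ABC):
    @abstractmethod
    def _fit_model(self, series) -> None:
        """Must implement the fit logic and checks for each sub model."""
        # No `pass` needed: the docstring is the whole body.


class Child(Base):
    def _fit_model(self, series) -> None:
        # Concrete override, so Child can be instantiated.
        self.fitted = True
```

`Base` still cannot be instantiated directly, because `_fit_model` remains abstract with or without the `pass`.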


@abstractmethod
def _predict(
    self, n: int, series: Sequence[TimeSeries] = None, num_samples: int = 1

Put a comma at the end; that way the signature is formatted like the overriding methods.


for series_ in series:
    first, last = (

first, last = series_.first_values(), series_.last_values(),


chunk_length = self.input_chunk_length
for i in range(chunk_length, chunk_length + n):
    prediction = rolling_sum / chunk_length

This is a constant. You can move it out of the for loop.

Or is it an error?
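
Whether the expression is constant depends on whether `rolling_sum` is updated inside the loop, which the excerpt does not show. A minimal sketch, assuming an autoregressive moving average in which each prediction is fed back into the window. The function `moving_average_forecast` is hypothetical and works on a plain list of floats, not on the PR's actual code.

```python
from typing import List


def moving_average_forecast(values: List[float], window: int, n: int) -> List[float]:
    """Forecast n steps, feeding each prediction back into the window."""
    if len(values) < window:
        raise ValueError("`values` must contain at least `window` points.")
    buffer = list(values[-window:])
    rolling_sum = sum(buffer)
    forecast = []
    for i in range(n):
        # Not a constant: rolling_sum changes at the end of every iteration.
        prediction = rolling_sum / window
        forecast.append(prediction)
        # Slide the window: drop the oldest value, add the new prediction.
        rolling_sum += prediction - buffer[i]
        buffer.append(prediction)
    return forecast
```

If `rolling_sum` were never updated, every step would return the same value and the division could indeed be hoisted out of the loop.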

@dennisbader
Collaborator

Closing this one, since #2261 was merged.

@dennisbader dennisbader closed this Mar 6, 2024
Development

Successfully merging this pull request may close these issues.

Make baseline models global
3 participants