Extend CLI for estimation of the number of folds #1284

Merged
merged 9 commits into from
Jun 13, 2023
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -11,6 +11,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Exogenous variables shift transform `ExogShiftTransform`([#1254](https://github.com/tinkoff-ai/etna/pull/1254))
- Parameter `start_timestamp` to forecast CLI command ([#1265](https://github.com/tinkoff-ai/etna/pull/1265))
- Function `estimate_max_n_folds` for folds number estimation ([#1279](https://github.com/tinkoff-ai/etna/pull/1279))
- Parameters `estimate_n_folds` and `context_size` to forecast and backtest CLI commands ([#1284](https://github.com/tinkoff-ai/etna/pull/1284))
-
### Changed
- Set the default value of `final_model` to `LinearRegression(positive=True)` in the constructor of `StackingEnsemble` ([#1238](https://github.com/tinkoff-ai/etna/pull/1238))
73 changes: 68 additions & 5 deletions docs/source/commands.rst
@@ -23,14 +23,20 @@ Basic ``forecast`` usage:

**Forecast config parameters**

* :code:`prediction_interval` - whether to estimate prediction interval for forecast.
* :code:`quantiles` - levels of prediction distribution. By default 2.5% and 97.5% are taken to form a 95% prediction interval.
* :code:`n_folds` - number of folds to use in the backtest for prediction interval estimation. Defaults to 3.
* :code:`return_components` - whether to estimate forecast components.
* :code:`start_timestamp` - timestamp with the starting point of forecast.
* :code:`estimate_n_folds` - whether to estimate the number of folds from data. Works only when prediction intervals are enabled. Requires :code:`context_size` parameter set in pipeline config.

Other parameters that can be set in the configuration file are described in the :meth:`~etna.pipeline.pipeline.Pipeline.forecast` method documentation.

Setting these parameters is optional.
Further information on arguments could be found in the documentation of :meth:`~etna.pipeline.pipeline.Pipeline.forecast` method.


**Pipeline config parameters**

* :code:`context_size` - minimum number of points in the history that is required by pipeline to produce a forecast.

Further information on pipeline parameters can be found in the :class:`~etna.pipeline.pipeline.Pipeline` class documentation.


**How to create config?**

@@ -63,6 +69,26 @@ Parameter :code:`start_timestamp` could be set similarly:
    quantiles: [0.025, 0.975]
    start_timestamp: "2020-01-12"

Example of a pair of configs for number of folds estimation:

.. code-block:: yaml

    _target_: etna.pipeline.Pipeline
    horizon: 4
    model:
      _target_: etna.models.CatBoostMultiSegmentModel
    transforms:
      - _target_: etna.transforms.LinearTrendTransform
        in_column: target
      - _target_: etna.transforms.SegmentEncoderTransform
    context_size: 1

.. code-block:: yaml

    prediction_interval: true
    quantiles: [0.025, 0.975]
    estimate_n_folds: true
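
The way the forecast command treats `estimate_n_folds` can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from the PR: `resolve_forecast_args` is a hypothetical helper, `max_n_folds` stands in for the value returned by `estimate_max_n_folds`, and the `context_size` check is omitted for brevity.

```python
import warnings
from typing import Any, Dict

# Keys consumed by the CLI itself and never passed to Pipeline.forecast (taken from the diff).
ADDITIONAL_FORECAST_PARAMETERS = {"start_timestamp", "estimate_n_folds"}


def resolve_forecast_args(forecast_params: Dict[str, Any], max_n_folds: int) -> Dict[str, Any]:
    """Hypothetical helper mirroring the n_folds handling of the forecast command.

    ``max_n_folds`` is an assumption: it stands in for the result of ``estimate_max_n_folds``.
    """
    params = dict(forecast_params)  # work on a copy, leave the caller's dict untouched
    if params.get("estimate_n_folds", False):
        if params.get("prediction_interval", False):
            # 3 is the fallback used when n_folds is absent from the forecast config
            params["n_folds"] = min(max_n_folds, params.get("n_folds", 3))
        else:
            # folds are only used for prediction intervals, so the flag has no effect
            warnings.warn("Number of folds estimation is ignored: prediction intervals are disabled.")
    return {k: v for k, v in params.items() if k not in ADDITIONAL_FORECAST_PARAMETERS}


config = {"prediction_interval": True, "quantiles": [0.025, 0.975], "estimate_n_folds": True}

# Assume the estimate says the data supports at most 2 folds (made-up number).
call_args = resolve_forecast_args(config, max_n_folds=2)
print(call_args)  # {'prediction_interval': True, 'quantiles': [0.025, 0.975], 'n_folds': 2}
```

The estimate acts only as an upper bound: an explicit `n_folds` smaller than the estimate is kept as is.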

**How to prepare data?**

Example of dataset with data to forecast:
@@ -114,6 +140,20 @@ Basic ``backtest`` usage:
[EXOG_PATH] path to csv with exog data
[KNOWN_FUTURE] list of all known_future columns (regressor columns). If not specified then all exog_columns considered known_future [default: None]

**Backtest config parameters**

* :code:`estimate_n_folds` - whether to estimate the number of folds from data. Requires :code:`context_size` parameter set in pipeline config.

Other parameters that can be set in the configuration file are described in the :meth:`~etna.pipeline.base.BasePipeline.backtest` method documentation.

Setting these parameters is optional.


**Pipeline config parameters**

* :code:`context_size` - minimum number of points in the history that is required by pipeline to produce a forecast.

Further information on pipeline parameters can be found in the :class:`~etna.pipeline.pipeline.Pipeline` class documentation.

**How to create configs?**

Expand Down Expand Up @@ -142,6 +182,29 @@ Example of backtest run config:
- _target_: etna.metrics.MAPE
- _target_: etna.metrics.SMAPE

Example of a pair of configs for number of folds estimation for backtest:

.. code-block:: yaml

_target_: etna.pipeline.Pipeline
horizon: 4
model:
_target_: etna.models.CatBoostMultiSegmentModel
transforms:
- _target_: etna.transforms.LinearTrendTransform
in_column: target
- _target_: etna.transforms.SegmentEncoderTransform
context_size: 1

.. code-block:: yaml

n_folds: 200
n_jobs: 4
metrics:
- _target_: etna.metrics.MAE
- _target_: etna.metrics.SMAPE
estimate_n_folds: true
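
With the pair of configs above, the requested `n_folds: 200` is treated as an upper wish that gets capped by the estimate. The sketch below illustrates that rule in isolation. It is not the PR's code: `resolve_backtest_args` is a hypothetical helper, `max_n_folds` is a stand-in for the result of `estimate_max_n_folds`, and the `metrics` entry is left out of the sample config to keep it short.

```python
from typing import Any, Dict

# CLI-only key that must not reach Pipeline.backtest (taken from the diff).
ADDITIONAL_BACKTEST_PARAMETERS = {"estimate_n_folds"}


def resolve_backtest_args(
    pipeline_config: Dict[str, Any], backtest_config: Dict[str, Any], max_n_folds: int
) -> Dict[str, Any]:
    """Hypothetical helper: cap n_folds the way the backtest command does."""
    params = dict(backtest_config)  # work on a copy
    if params.get("estimate_n_folds", False):
        if "context_size" not in pipeline_config:
            raise ValueError("Parameter `context_size` must be set if number of folds estimation enabled!")
        # 5 is the fallback used when n_folds is absent from the backtest config
        params["n_folds"] = min(max_n_folds, params.get("n_folds", 5))
    return {k: v for k, v in params.items() if k not in ADDITIONAL_BACKTEST_PARAMETERS}


pipeline_config = {"horizon": 4, "context_size": 1}
backtest_config = {"n_folds": 200, "n_jobs": 4, "estimate_n_folds": True}

# Assume the estimate says the data supports at most 17 folds (made-up number).
backtest_args = resolve_backtest_args(pipeline_config, backtest_config, max_n_folds=17)
print(backtest_args)  # {'n_folds': 17, 'n_jobs': 4}
```

Dropping `context_size` from the pipeline config while keeping `estimate_n_folds: true` raises the `ValueError` instead of silently falling back to the requested number of folds.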


**How to prepare data?**

36 changes: 34 additions & 2 deletions etna/commands/backtest_command.py
@@ -5,16 +5,22 @@
from typing import Optional
from typing import Sequence
from typing import Union
from typing import cast

import hydra_slayer
import pandas as pd
import typer
from omegaconf import OmegaConf
from typing_extensions import Literal

from etna.commands.utils import estimate_max_n_folds
from etna.commands.utils import remove_params
from etna.datasets import TSDataset
from etna.pipeline import Pipeline

ADDITIONAL_BACKTEST_PARAMETERS = {"estimate_n_folds"}
ADDITIONAL_PIPELINE_PARAMETERS = {"context_size"}


def backtest(
    config_path: Path = typer.Argument(..., help="path to yaml config with desired pipeline"),
@@ -63,6 +69,8 @@ def backtest(
    ============= =========== =============== ===============
    """
    pipeline_configs = OmegaConf.to_object(OmegaConf.load(config_path))
    pipeline_configs = cast(Dict[str, Any], pipeline_configs)

    backtest_configs = OmegaConf.to_object(OmegaConf.load(backtest_config_path))

    df_timeseries = pd.read_csv(target_path, parse_dates=["timestamp"])
@@ -78,10 +86,34 @@ def backtest(

    tsdataset = TSDataset(df=df_timeseries, freq=freq, df_exog=df_exog, known_future=k_f)

    pipeline: Pipeline = hydra_slayer.get_from_params(**pipeline_configs)
    pipeline_args = remove_params(params=pipeline_configs, to_remove=ADDITIONAL_PIPELINE_PARAMETERS)
    pipeline: Pipeline = hydra_slayer.get_from_params(**pipeline_args)
    backtest_configs_hydra_slayer: Dict[str, Any] = hydra_slayer.get_from_params(**backtest_configs)

    metrics, forecast, info = pipeline.backtest(ts=tsdataset, **backtest_configs_hydra_slayer)
    # estimate number of folds if parameters set
    if backtest_configs_hydra_slayer.get("estimate_n_folds", False):
        if "context_size" not in pipeline_configs:
            raise ValueError("Parameter `context_size` must be set if number of folds estimation enabled!")

        context_size = pipeline_configs["context_size"]

        max_n_folds = estimate_max_n_folds(
            ts=tsdataset,
            pipeline=pipeline,
            method_name="backtest",
            context_size=context_size,
            **backtest_configs_hydra_slayer,
        )

        n_folds = min(
            max_n_folds, backtest_configs_hydra_slayer.get("n_folds", 5)
        )  # use default value of folds if parameter not set

        backtest_configs_hydra_slayer["n_folds"] = n_folds

    backtest_call_args = remove_params(params=backtest_configs_hydra_slayer, to_remove=ADDITIONAL_BACKTEST_PARAMETERS)

    metrics, forecast, info = pipeline.backtest(ts=tsdataset, **backtest_call_args)

    (metrics.to_csv(output_path / "metrics.csv", index=False))
    (TSDataset.to_flatten(forecast).to_csv(output_path / "forecast.csv", index=False))
42 changes: 33 additions & 9 deletions etna/commands/forecast_command.py
@@ -1,27 +1,27 @@
import warnings
from pathlib import Path
from typing import Any
from typing import Dict
from typing import List
from typing import Optional
from typing import Sequence
from typing import Union
from typing import cast

import hydra_slayer
import pandas as pd
import typer
from omegaconf import OmegaConf
from typing_extensions import Literal

from etna.commands.utils import estimate_max_n_folds
from etna.commands.utils import remove_params
from etna.datasets import TSDataset
from etna.models.utils import determine_num_steps
from etna.pipeline import Pipeline

ADDITIONAL_FORECAST_PARAMETERS = {"start_timestamp"}


def get_forecast_call_params(forecast_params: Dict[str, Any]) -> Dict[str, Any]:
    """Select `forecast` arguments from params."""
    return {k: v for k, v in forecast_params.items() if k not in ADDITIONAL_FORECAST_PARAMETERS}
ADDITIONAL_FORECAST_PARAMETERS = {"start_timestamp", "estimate_n_folds"}
ADDITIONAL_PIPELINE_PARAMETERS = {"context_size"}


def compute_horizon(horizon: int, forecast_params: Dict[str, Any], tsdataset: TSDataset) -> int:
@@ -101,6 +101,8 @@ def forecast(
    ============= =========== =============== ===============
    """
    pipeline_configs = OmegaConf.to_object(OmegaConf.load(config_path))
    pipeline_configs = cast(Dict[str, Any], pipeline_configs)

    if forecast_config_path:
        forecast_params_config = OmegaConf.to_object(OmegaConf.load(forecast_config_path))
    else:
@@ -124,10 +126,32 @@ def forecast(
    horizon = compute_horizon(horizon=horizon, forecast_params=forecast_params, tsdataset=tsdataset)
    pipeline_configs["horizon"] = horizon  # type: ignore

    forecast_call_args = get_forecast_call_params(forecast_params)

    pipeline: Pipeline = hydra_slayer.get_from_params(**pipeline_configs)
    pipeline_args = remove_params(params=pipeline_configs, to_remove=ADDITIONAL_PIPELINE_PARAMETERS)
    pipeline: Pipeline = hydra_slayer.get_from_params(**pipeline_args)
    pipeline.fit(tsdataset)

    # estimate number of folds if parameters set
    if forecast_params.get("estimate_n_folds", False):
        if forecast_params.get("prediction_interval", False):
            if "context_size" not in pipeline_configs:
                raise ValueError("Parameter `context_size` must be set if number of folds estimation enabled!")

            context_size = pipeline_configs["context_size"]

            max_n_folds = estimate_max_n_folds(
                pipeline=pipeline, method_name="forecast", context_size=context_size, **forecast_params
            )

            n_folds = min(
                max_n_folds, forecast_params.get("n_folds", 3)
            )  # use default value of folds if parameter not set
            forecast_params["n_folds"] = n_folds

        else:
            warnings.warn("Number of folds estimation would be ignored as the current forecast call doesn't use folds!")

    forecast_call_args = remove_params(params=forecast_params, to_remove=ADDITIONAL_FORECAST_PARAMETERS)

    forecast = pipeline.forecast(**forecast_call_args)

    forecast = filter_forecast(forecast_ts=forecast, forecast_params=forecast_params)
8 changes: 8 additions & 0 deletions etna/commands/utils.py
@@ -1,13 +1,21 @@
from enum import Enum
from math import floor
from typing import Any
from typing import Dict
from typing import Literal
from typing import Optional
from typing import Set
from typing import Union

from etna.datasets import TSDataset
from etna.pipeline import Pipeline


def remove_params(params: Dict[str, Any], to_remove: Set[str]) -> Dict[str, Any]:
    """Return a copy of params without the keys listed in `to_remove`."""
    return {k: v for k, v in params.items() if k not in to_remove}
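
A short usage illustration for `remove_params`, with the function body copied from the diff so the snippet runs on its own. The sample config dict is made up for the example; only the `context_size` key and the set of keys to remove come from the PR.

```python
from typing import Any, Dict, Set


def remove_params(params: Dict[str, Any], to_remove: Set[str]) -> Dict[str, Any]:
    """Return a copy of params without the keys listed in `to_remove`."""
    return {k: v for k, v in params.items() if k not in to_remove}


# Illustrative pipeline config: context_size is a CLI-only key,
# so it has to be stripped before the pipeline object is instantiated.
pipeline_config = {"_target_": "etna.pipeline.Pipeline", "horizon": 4, "context_size": 1}
pipeline_args = remove_params(params=pipeline_config, to_remove={"context_size"})

print(pipeline_args)  # {'_target_': 'etna.pipeline.Pipeline', 'horizon': 4}
print("context_size" in pipeline_config)  # True, the input dict is not modified
```

Because the function builds a new dict, the command can still read `context_size` from the original config after the filtered copy has been passed on.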


class MethodsWithFolds(str, Enum):
    """Enum for methods that use `n_folds` argument."""

52 changes: 21 additions & 31 deletions tests/test_commands/conftest.py
@@ -29,6 +29,27 @@ def base_pipeline_yaml_path():
    tmp.close()


@pytest.fixture
def base_pipeline_with_context_size_yaml_path():
    tmp = NamedTemporaryFile("w")
    tmp.write(
        """
        _target_: etna.pipeline.Pipeline
        horizon: 4
        model:
          _target_: etna.models.CatBoostMultiSegmentModel
        transforms:
          - _target_: etna.transforms.LinearTrendTransform
            in_column: target
          - _target_: etna.transforms.SegmentEncoderTransform
        context_size: 1
        """
    )
    tmp.flush()
    yield Path(tmp.name)
    tmp.close()


@pytest.fixture
def elementary_linear_model_pipeline():
tmp = NamedTemporaryFile("w")
@@ -143,37 +164,6 @@ def base_timeseries_exog_path():
    tmp.close()


@pytest.fixture
def base_forecast_omegaconf_path():
    tmp = NamedTemporaryFile("w")
    tmp.write(
        """
        prediction_interval: true
        quantiles: [0.025, 0.975]
        n_folds: 3
        """
    )
    tmp.flush()
    yield Path(tmp.name)
    tmp.close()


@pytest.fixture
def start_timestamp_forecast_omegaconf_path():
    tmp = NamedTemporaryFile("w")
    tmp.write(
        """
        prediction_interval: true
        quantiles: [0.025, 0.975]
        n_folds: 3
        start_timestamp: "2021-09-10"
        """
    )
    tmp.flush()
    yield Path(tmp.name)
    tmp.close()

@pytest.fixture
def empty_ts():
    df = pd.DataFrame({"segment": [], "timestamp": [], "target": []})