Refactor/metrics #2284

Merged: dennisbader merged 40 commits into master from refactor/metrics on Apr 4, 2024

Conversation

@dennisbader (Collaborator) commented Mar 14, 2024

Checklist before merging this PR:

  • Mentioned all issues that this PR fixes or addresses.
  • Summarized the updates of this PR under Summary.
  • Added an entry under Unreleased in the Changelog.

Fixes #2249, fixes #2233, closes #2031.

Summary

  • 🚀🚀🚀 Improvements to metrics, historical forecasts, backtest, and residuals through a major refactor. The refactor optimizes multiple processes and improves consistency, reliability, and the documentation. Some of these necessary changes come at the cost of breaking changes.
    • Metrics:
      • Optimized all metrics, which now run >20 times faster than before for univariate series, and >>20 times for multivariate series. This boosts direct metric computations as well as backtesting and residuals computation!
      • Added new metrics:
        • Time aggregated metric merr() (Mean Error)
        • Time aggregated scaled metrics rmsse() and msse(): the (Root) Mean Squared Scaled Error.
        • "Per time step" metrics that return a metric score per time step: err() (Error), ae() (Absolute Error), se() (Squared Error), sle() (Squared Log Error), ase() (Absolute Scaled Error), sse() (Squared Scaled Error), ape() (Absolute Percentage Error), sape() (Symmetric Absolute Percentage Error), arre() (Absolute Ranged Relative Error), ql() (Quantile Loss)
      • All scaled metrics now accept insample series that may overlap with pred_series (previously, they had to end exactly one step before pred_series starts). Darts handles the correct time extraction for you.
      • Improvements to the documentation:
        • Added a summary list of all metrics to the metrics documentation page
        • Standardized the documentation of each metric (added formula, improved return documentation, ...)
      • 🔴 Improved metric output consistency based on the type of input series and the applied reductions (a short usage sketch follows this summary list):
        • float: A single metric score for:
          • single univariate series
          • single multivariate series with component_reduction
          • sequence (list) of uni/multivariate series with series_reduction and component_reduction (and time_reduction for "per time step" metrics)
        • np.ndarray: A numpy array of metric scores. The array has shape (n time steps, n components) without time and component reductions. The time dimension is only available for "per time step" metrics. For:
          • single multivariate series and at least component_reduction=None for time aggregated metrics.
          • single uni/multivariate series and at least time_reduction=None for "per time step" metrics
          • sequence of uni/multivariate series including series_reduction and at least one of component_reduction=None or time_reduction=None for "per time step" metrics
        • List[float]: Same as for type float but for a sequence of series
        • List[np.ndarray]: Same as for type np.ndarray but for a sequence of series
      • 🔴 Other breaking changes:
        • quantile_loss():
          • renamed to mql() (Mean Quantile Loss)
          • renamed quantile parameter tau to q
          • the metric is now multiplied by a factor of 2 to make the loss more interpretable (e.g. for q=0.5 it is identical to the MAE)
        • rho_risk():
          • renamed to qr() (Quantile Risk)
          • renamed quantile parameter rho to q
        • Renamed metric parameter reduction to series_reduction
        • Renamed metric parameter inter_reduction to component_reduction
        • Scaled metrics no longer allow seasonality inference with m=None.
        • Custom metrics using decorators multi_ts_support and multivariate_support must now act on multivariate series (possibly containing missing values) instead of univariate series.
    • ForecastingModel.historical_forecasts():
      • 🔴 Improved historical forecasts output consistency based on the type of input series (see the sketch after this summary list): If series is a sequence, historical forecasts will always return a sequence/list of the same length (instead of trying to reduce to a TimeSeries object).
        • TimeSeries: A single historical forecast for a single series and last_points_only=True: it contains only the predictions at step forecast_horizon from all historical forecasts.
        • List[TimeSeries]: A list of historical forecasts for:
          • a sequence (list) of series and last_points_only=True: for each series, it contains only the predictions at step forecast_horizon from all historical forecasts.
          • a single series and last_points_only=False: for each historical forecast, it contains the entire horizon forecast_horizon.
        • List[List[TimeSeries]]: A list of lists of historical forecasts for a sequence of series and last_points_only=False. For each series and historical forecast, it contains the entire horizon forecast_horizon. The outer list is over the series provided in the input sequence, and the inner lists contain the historical forecasts for each series.
    • ForecastingModel.backtest():
      • Metrics are now computed in a single call over all series and historical_forecasts, significantly speeding things up when using a large number of series.
      • Added support for scaled metrics as metric (such as ase, mase, ...). No extra code required; backtest extracts the correct insample series for you.
      • Added support for passing additional metric arguments with parameter metric_kwargs. This allows, for example, parallelizing the metric computation with n_jobs, customizing the metric reduction with *_reduction, or specifying the seasonality m for scaled metrics (see the backtest sketch after this summary list).
      • 🔴 Improved backtest output consistency based on the type of input series, historical_forecast, and the applied backtest reduction:
        • float: A single backtest score for single uni/multivariate series, a single metric function and:
          • historical_forecasts generated with last_points_only=True
          • historical_forecasts generated with last_points_only=False and using a backtest reduction
        • np.ndarray: A numpy array of backtest scores. For single series and one of:
          • a single metric function, historical_forecasts generated with last_points_only=False and backtest reduction=None. The output has shape (n forecasts,).
          • multiple metric functions and historical_forecasts generated with last_points_only=False. The output has shape (n metrics,) when using a backtest reduction, and (n metrics, n forecasts) when reduction=None.
          • multiple uni/multivariate series including series_reduction and at least one of component_reduction=None or time_reduction=None for "per time step" metrics
        • List[float]: Same as for type float but for a sequence of series. The returned metric list has length len(series) with the float metric for each input series.
        • List[np.ndarray]: Same as for type np.ndarray but for a sequence of series. The returned metric list has length len(series) with the np.ndarray metrics for each input series.
      • 🔴 Other breaking changes:
        • reduction callable now acts on axis=1 rather than axis=0 to aggregate the metrics per series.
        • backtest will now raise an error when user-supplied historical_forecasts don't have the expected format based on the input series and the last_points_only value.
    • ForecastingModel.residuals(): While the default behavior of residuals() remains identical, the method is now very similar to backtest(), except that it computes a "per time step" metric on historical_forecasts (see the residuals sketch after this summary list):
      • Added support for multivariate series.
      • Added support for all historical_forecasts() parameters to generate the historical forecasts for the residuals computation.
      • Added support for pre-computed historical forecasts with parameter historical_forecasts.
      • Added support for computing the residuals with any of Darts' "per time step" metrics with parameter metric (e.g. err(), ae(), ape(), ...). By default, it uses err() (Error).
      • Added support for parallelizing the metric computation across historical forecasts with parameter n_jobs.
      • 🔴 Improved residuals output and consistency based on the type of input series and historical_forecast:
        • TimeSeries: Residual TimeSeries for a single series and historical_forecasts generated with last_points_only=True.
        • List[TimeSeries]: A list of residual TimeSeries for a sequence (list) of series with last_points_only=True. The residual list has length len(series).
        • List[List[TimeSeries]]: A list of lists of residual TimeSeries for a sequence of series with last_points_only=False. The outer residual list has length len(series). The inner lists consist of the residuals from all possible series-specific historical forecasts.
  • Improvements to TimeSeries (see the sketch after this summary list):
    • Performance boost for methods: slice_intersect(), has_same_time_as()
    • New method slice_intersect_values(), which returns the values of a series sliced to the time span it shares with another series.
  • 🔴 Moved utils functions to clearly separate Darts-specific from non-Darts-specific logic:
    • Moved function generate_index() from darts.utils.timeseries_generation to darts.utils.utils
    • Moved functions retain_period_common_to_all(), series2seq(), seq2series(), get_single_series() from darts.utils.utils to darts.utils.ts_utils.
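
To make the new metric output conventions concrete, here is a minimal sketch of the reduction parameters on random data. It assumes the post-refactor darts.metrics API described above (mae(), ae(), component_reduction, series_reduction, time_reduction); the shapes in the comments follow the summary and are illustrative rather than exhaustive.

```python
import numpy as np

from darts import TimeSeries
from darts.metrics import ae, mae

# two multivariate series with 100 time steps and 3 components each
actual = [TimeSeries.from_values(np.random.rand(100, 3)) for _ in range(2)]
pred = [s + 0.1 for s in actual]

# single multivariate series, default component reduction -> float
mae(actual[0], pred[0])

# single multivariate series, no component reduction -> np.ndarray of shape (3,)
mae(actual[0], pred[0], component_reduction=None)

# sequence of series -> List[float], one score per series
mae(actual, pred)

# "per time step" metric without reductions -> np.ndarray of shape (100, 3)
ae(actual[0], pred[0], component_reduction=None, time_reduction=None)
```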
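
A similar sketch for the historical_forecasts() return types, assuming the Darts example datasets and a simple baseline model; the types in the comments follow the list above.

```python
from darts.datasets import AirPassengersDataset, MonthlyMilkDataset
from darts.models import NaiveSeasonal

air = AirPassengersDataset().load()
milk = MonthlyMilkDataset().load()
model = NaiveSeasonal(K=12)

# single series, last_points_only=True -> a single TimeSeries
hfc = model.historical_forecasts(air, start=0.8, forecast_horizon=3)

# sequence of series, last_points_only=False -> List[List[TimeSeries]]
hfc_nested = model.historical_forecasts(
    [air, milk], start=0.8, forecast_horizon=3, last_points_only=False
)
assert len(hfc_nested) == 2  # one inner list of forecasts per input series
```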
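
A minimal sketch of the new backtest() options (scaled metric, metric_kwargs, list of metrics), assuming the parameter names from this PR; the seasonality value is only an example.

```python
from darts.datasets import AirPassengersDataset
from darts.metrics import mae, mase, rmse
from darts.models import NaiveSeasonal

series = AirPassengersDataset().load()
model = NaiveSeasonal(K=12)

# scaled metric: backtest extracts the correct insample series itself,
# and the seasonality m must now be passed explicitly (no m=None inference)
score = model.backtest(
    series,
    start=0.75,
    forecast_horizon=12,
    metric=mase,
    metric_kwargs={"m": 12},
)

# several metrics at once on a single series -> np.ndarray with one score per metric
scores = model.backtest(
    series,
    start=0.75,
    forecast_horizon=12,
    metric=[mae, rmse],
)
```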
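
A sketch of the extended residuals() API with pre-computed historical forecasts and a configurable "per time step" metric; parameter names follow the PR summary and should be checked against the released API docs.

```python
from darts.datasets import AirPassengersDataset
from darts.metrics import ae
from darts.models import NaiveSeasonal

series = AirPassengersDataset().load()
model = NaiveSeasonal(K=12)

# reuse pre-computed historical forecasts instead of regenerating them
hfc = model.historical_forecasts(series, start=0.75, forecast_horizon=1)

# absolute-error residuals instead of the default err() (actual - predicted)
res = model.residuals(series, historical_forecasts=hfc, metric=ae)
```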
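
Finally, a short sketch of the new TimeSeries.slice_intersect_values() helper together with the updated import paths of the moved utility functions; the exact return shape of slice_intersect_values() is described only loosely here.

```python
import numpy as np
import pandas as pd

from darts import TimeSeries
from darts.utils.utils import generate_index  # moved from darts.utils.timeseries_generation
from darts.utils.ts_utils import get_single_series, series2seq  # moved from darts.utils.utils

idx_a = generate_index(start=pd.Timestamp("2020-01-01"), length=10, freq="D")
idx_b = generate_index(start=pd.Timestamp("2020-01-05"), length=10, freq="D")

series_a = TimeSeries.from_times_and_values(idx_a, np.arange(10, dtype=float))
series_b = TimeSeries.from_times_and_values(idx_b, np.arange(10, dtype=float))

# values of series_a over the time span it shares with series_b (here 6 time steps)
overlap_vals = series_a.slice_intersect_values(series_b)

# the ts_utils helpers behave as before, just imported from the new module
assert get_single_series(series2seq(series_a)) is series_a
```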

Here are the speedups for different series lengths and numbers of components:

  • All were evaluated on a single thread over 10k series.
  • Speedup is the per-series computation time of the original metrics implementation divided by that of the refactored implementation (t_old / t_new).
  • series_per_second is for the refactored metrics (a rough timing sketch follows the table).
ts_length  ts_components  speedup [t_old / t_new]  series_per_second
       10              1                    20.18              13673
      100              1                    23.04              13929
     1000              1                    21.59              13394
    10000              1                    22.58              10642
       10             10                   194.21              14023
      100             10                   202.35              13216
     1000             10                   135.92               9362
    10000             10                    41.68               2053
       10            100                  1818.85              13333
      100            100                  1497.29               9600
     1000            100                   402.60               2708
    10000            100                    42.34                201
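
The numbers above come from the author's benchmark. As an illustration only (not the exact script used for the table), per-series throughput for a batch of series could be measured along these lines, assuming the batched metric API described in the summary:

```python
import time

import numpy as np

from darts import TimeSeries
from darts.metrics import mae

n_series, ts_length, ts_components = 1_000, 100, 10
actual = [TimeSeries.from_values(np.random.rand(ts_length, ts_components)) for _ in range(n_series)]
pred = [s + 0.1 for s in actual]

start = time.perf_counter()
mae(actual, pred)  # one call over the whole batch, single threaded by default
elapsed = time.perf_counter() - start
print(f"{n_series / elapsed:,.0f} series per second")
```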

@codecov-commenter commented Mar 14, 2024

Codecov Report

Attention: Patch coverage is 94.77612%, with 14 lines in your changes missing coverage. Please review.

Project coverage is 93.91%. Comparing base (91c7087) to head (545c86d).

Files                                           Patch %   Lines
darts/utils/ts_utils.py                          91.66%   7 Missing ⚠️
darts/models/forecasting/forecasting_model.py    93.87%   6 Missing ⚠️
darts/timeseries.py                              96.87%   1 Missing ⚠️


Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2284      +/-   ##
==========================================
- Coverage   93.95%   93.91%   -0.04%     
==========================================
  Files         136      137       +1     
  Lines       13687    13962     +275     
==========================================
+ Hits        12860    13113     +253     
- Misses        827      849      +22     

@madtoinou (Collaborator) left a comment

The metrics look so much tidier now!

Some minor comments, mostly about documentation.

@dennisbader merged commit 5c97c9b into master on Apr 4, 2024
8 of 9 checks passed
@dennisbader deleted the refactor/metrics branch on April 4, 2024 at 14:09