Refactor/metrics #2284

Merged: dennisbader merged 40 commits into master from refactor/metrics on Apr 4, 2024

Conversation

@dennisbader (Collaborator) commented Mar 14, 2024

Checklist before merging this PR:

  • Mentioned all issues that this PR fixes or addresses.
  • Summarized the updates of this PR under Summary.
  • Added an entry under Unreleased in the Changelog.

Fixes #2249, fixes #2233, closes #2031.

Summary

  • 🚀🚀🚀 Improvements to metrics, historical forecasts, backtest, and residuals through a major refactor. The refactor optimizes multiple processes and improves consistency, reliability, and the documentation. Some of these necessary changes come at the cost of breaking changes.
    • Metrics:
      • Optimized all metrics, which now run >20 times faster than before for univariate series, and >>20 times for multivariate series. This boosts direct metric computations as well as backtesting and residuals computation!
      • Added new metrics:
        • Time aggregated metric merr() (Mean Error)
        • Time aggregated scaled metrics rmsse() and msse(): the (Root) Mean Squared Scaled Error.
        • "Per time step" metrics that return a metric score per time step: err() (Error), ae() (Absolute Error), se() (Squared Error), sle() (Squared Log Error), ase() (Absolute Scaled Error), sse() (Squared Scaled Error), ape() (Absolute Percentage Error), sape() (Symmetric Absolute Percentage Error), arre() (Absolute Ranged Relative Error), ql() (Quantile Loss)
      • All scaled metrics now accept insample series that may overlap with pred_series (previously, they had to end exactly one step before pred_series starts). Darts handles the correct time extraction for you.
      • Improvements to the documentation:
        • Added a summary list of all metrics to the metrics documentation page
        • Standardized the documentation of each metric (added formula, improved return documentation, ...)
      • 🔴 Improved metric output consistency based on the type of input series and the applied reductions (a short usage sketch follows this summary list):
        • float: A single metric score for:
          • single univariate series
          • single multivariate series with component_reduction
          • sequence (list) of uni/multivariate series with series_reduction and component_reduction (and time_reduction for "per time step" metrics)
        • np.ndarray: A numpy array of metric scores. The array has shape (n time steps, n components) without time and component reductions. The time dimension is only available for "per time step" metrics. For:
          • single multivariate series and at least component_reduction=None for time aggregated metrics.
          • single uni/multivariate series and at least time_reduction=None for "per time step" metrics
          • sequence of uni/multivariate series including series_reduction and at least one of component_reduction=None or time_reduction=None for "per time step" metrics
        • List[float]: Same as for type float but for a sequence of series
        • List[np.ndarray]: Same as for type np.ndarray but for a sequence of series
      • 🔴 Other breaking changes:
        • quantile_loss():
          • renamed to mql() (Mean Quantile Loss)
          • renamed quantile parameter tau to q
          • the metric is now multiplied by a factor of 2 to make the loss more interpretable (e.g. for q=0.5 it is identical to the MAE)
        • rho_risk():
          • renamed to qr() (Quantile Risk)
          • renamed quantile parameter rho to q
        • Renamed metric parameter reduction to series_reduction
        • Renamed metric parameter inter_reduction to component_reduction
        • Scaled metrics no longer allow seasonality inference with m=None.
        • Custom metrics using decorators multi_ts_support and multivariate_support must now act on multivariate series (possibly containing missing values) instead of univariate series.
    • ForecastingModel.historical_forecasts():
      • 🔴 Improved historical forecasts output consistency based on the type of input series (see the sketch after this summary list): If series is a sequence, historical forecasts will always return a sequence/list of the same length (instead of trying to reduce to a TimeSeries object).
        • TimeSeries: A single historical forecast for a single series and last_points_only=True: it contains only the predictions at step forecast_horizon from all historical forecasts.
        • List[TimeSeries]: A list of historical forecasts for:
          • a sequence (list) of series and last_points_only=True: for each series, it contains only the predictions at step forecast_horizon from all historical forecasts.
          • a single series and last_points_only=False: for each historical forecast, it contains the entire horizon forecast_horizon.
        • List[List[TimeSeries]]: A list of lists of historical forecasts for a sequence of series and last_points_only=False. For each series and historical forecast, it contains the entire horizon forecast_horizon. The outer list is over the series provided in the input sequence, and the inner lists contain the historical forecasts for each series.
    • ForecastingModel.backtest():
      • Metrics are now computed in a single call over all series and historical_forecasts, significantly speeding things up when using a large number of series.
      • Added support for scaled metrics as metric (such as ase, mase, ...). No extra code required; backtest extracts the correct insample series for you.
      • Added support for passing additional metric arguments with parameter metric_kwargs. This allows, for example, parallelizing the metric computation with n_jobs, customizing the metric reduction with *_reduction, or specifying the seasonality m for scaled metrics (see the backtest sketch after this summary list).
      • 🔴 Improved backtest output consistency based on the type of input series, historical_forecast, and the applied backtest reduction:
        • float: A single backtest score for single uni/multivariate series, a single metric function and:
          • historical_forecasts generated with last_points_only=True
          • historical_forecasts generated with last_points_only=False and using a backtest reduction
        • np.ndarray: A numpy array of backtest scores. For single series and one of:
          • a single metric function, historical_forecasts generated with last_points_only=False and backtest reduction=None. The output has shape (n forecasts,).
          • multiple metric functions and historical_forecasts generated with last_points_only=False. The output has shape (n metrics,) when using a backtest reduction, and (n metrics, n forecasts) when reduction=None.
          • multiple uni/multivariate series including series_reduction and at least one of component_reduction=None or time_reduction=None for "per time step" metrics
        • List[float]: Same as for type float but for a sequence of series. The returned metric list has length len(series) with the float metric for each input series.
        • List[np.ndarray]: Same as for type np.ndarray but for a sequence of series. The returned metric list has length len(series) with the np.ndarray metrics for each input series.
      • 🔴 Other breaking changes:
        • reduction callable now acts on axis=1 rather than axis=0 to aggregate the metrics per series.
        • backtest will now raise an error when user-supplied historical_forecasts don't have the expected format based on the input series and the last_points_only value.
    • ForecastingModel.residuals(): While the default behavior of residuals() remains identical, the method is now very similar to backtest(), except that it computes a "per time step" metric on historical_forecasts (see the residuals sketch after this summary list):
      • Added support for multivariate series.
      • Added support for all historical_forecasts() parameters to generate the historical forecasts for the residuals computation.
      • Added support for pre-computed historical forecasts with parameter historical_forecasts.
      • Added support for computing the residuals with any of Darts' "per time step" metrics with parameter metric (e.g. err(), ae(), ape(), ...). By default, it uses err() (Error).
      • Added support for parallelizing the metric computation across historical forecasts with parameter n_jobs.
      • 🔴 Improved residuals output and consistency based on the type of input series and historical_forecast:
        • TimeSeries: Residual TimeSeries for a single series and historical_forecasts generated with last_points_only=True.
        • List[TimeSeries]: A list of residual TimeSeries for a sequence (list) of series with last_points_only=True. The residual list has length len(series).
        • List[List[TimeSeries]]: A list of lists of residual TimeSeries for a sequence of series with last_points_only=False. The outer residual list has length len(series). The inner lists consist of the residuals from all possible series-specific historical forecasts.
  • Improvements to TimeSeries (see the sketch after this summary list):
    • Performance boost for methods: slice_intersect(), has_same_time_as()
    • New method slice_intersect_values(), which returns the values of a series sliced to the time span it shares with another series.
  • 🔴 Moved utils functions to clearly separate Darts-specific from non-Darts-specific logic:
    • Moved function generate_index() from darts.utils.timeseries_generation to darts.utils.utils
    • Moved functions retain_period_common_to_all(), series2seq(), seq2series(), get_single_series() from darts.utils.utils to darts.utils.ts_utils.
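
To make the new metric output conventions concrete, here is a minimal sketch of the reduction parameters on random data. It assumes the post-refactor darts.metrics API described above (mae(), ae(), component_reduction, series_reduction, time_reduction); the shapes in the comments follow the summary and are illustrative rather than exhaustive.

```python
import numpy as np

from darts import TimeSeries
from darts.metrics import ae, mae

# two multivariate series with 100 time steps and 3 components each
actual = [TimeSeries.from_values(np.random.rand(100, 3)) for _ in range(2)]
pred = [s + 0.1 for s in actual]

# single multivariate series, default component reduction -> float
mae(actual[0], pred[0])

# single multivariate series, no component reduction -> np.ndarray of shape (3,)
mae(actual[0], pred[0], component_reduction=None)

# sequence of series -> List[float], one score per series
mae(actual, pred)

# "per time step" metric without reductions -> np.ndarray of shape (100, 3)
ae(actual[0], pred[0], component_reduction=None, time_reduction=None)
```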
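
A similar sketch for the historical_forecasts() return types, assuming the Darts example datasets and a simple baseline model; the types in the comments follow the list above.

```python
from darts.datasets import AirPassengersDataset, MonthlyMilkDataset
from darts.models import NaiveSeasonal

air = AirPassengersDataset().load()
milk = MonthlyMilkDataset().load()
model = NaiveSeasonal(K=12)

# single series, last_points_only=True -> a single TimeSeries
hfc = model.historical_forecasts(air, start=0.8, forecast_horizon=3)

# sequence of series, last_points_only=False -> List[List[TimeSeries]]
hfc_nested = model.historical_forecasts(
    [air, milk], start=0.8, forecast_horizon=3, last_points_only=False
)
assert len(hfc_nested) == 2  # one inner list of forecasts per input series
```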
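
A minimal sketch of the new backtest() options (scaled metric, metric_kwargs, list of metrics), assuming the parameter names from this PR; the seasonality value is only an example.

```python
from darts.datasets import AirPassengersDataset
from darts.metrics import mae, mase, rmse
from darts.models import NaiveSeasonal

series = AirPassengersDataset().load()
model = NaiveSeasonal(K=12)

# scaled metric: backtest extracts the correct insample series itself,
# and the seasonality m must now be passed explicitly (no m=None inference)
score = model.backtest(
    series,
    start=0.75,
    forecast_horizon=12,
    metric=mase,
    metric_kwargs={"m": 12},
)

# several metrics at once on a single series -> np.ndarray with one score per metric
scores = model.backtest(
    series,
    start=0.75,
    forecast_horizon=12,
    metric=[mae, rmse],
)
```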
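
A sketch of the extended residuals() API with pre-computed historical forecasts and a configurable "per time step" metric; parameter names follow the PR summary and should be checked against the released API docs.

```python
from darts.datasets import AirPassengersDataset
from darts.metrics import ae
from darts.models import NaiveSeasonal

series = AirPassengersDataset().load()
model = NaiveSeasonal(K=12)

# reuse pre-computed historical forecasts instead of regenerating them
hfc = model.historical_forecasts(series, start=0.75, forecast_horizon=1)

# absolute-error residuals instead of the default err() (actual - predicted)
res = model.residuals(series, historical_forecasts=hfc, metric=ae)
```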
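
Finally, a short sketch of the new TimeSeries.slice_intersect_values() helper together with the updated import paths of the moved utility functions; the exact return shape of slice_intersect_values() is described only loosely here.

```python
import numpy as np
import pandas as pd

from darts import TimeSeries
from darts.utils.utils import generate_index  # moved from darts.utils.timeseries_generation
from darts.utils.ts_utils import get_single_series, series2seq  # moved from darts.utils.utils

idx_a = generate_index(start=pd.Timestamp("2020-01-01"), length=10, freq="D")
idx_b = generate_index(start=pd.Timestamp("2020-01-05"), length=10, freq="D")

series_a = TimeSeries.from_times_and_values(idx_a, np.arange(10, dtype=float))
series_b = TimeSeries.from_times_and_values(idx_b, np.arange(10, dtype=float))

# values of series_a over the time span it shares with series_b (here 6 time steps)
overlap_vals = series_a.slice_intersect_values(series_b)

# the ts_utils helpers behave as before, just imported from the new module
assert get_single_series(series2seq(series_a)) is series_a
```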

Here are the speedups for different series lengths and numbers of components:

  • All were evaluated on a single thread over 10k series.
  • Speedup is the per-series computation time of the original metrics implementation divided by that of the refactored implementation (t_old / t_new).
  • series_per_second is for the refactored metrics (a rough timing sketch follows the table).
ts_length  ts_components  speedup [t_old / t_new]  series_per_second
       10              1                    20.18              13673
      100              1                    23.04              13929
     1000              1                    21.59              13394
    10000              1                    22.58              10642
       10             10                   194.21              14023
      100             10                   202.35              13216
     1000             10                   135.92               9362
    10000             10                    41.68               2053
       10            100                  1818.85              13333
      100            100                  1497.29               9600
     1000            100                   402.60               2708
    10000            100                    42.34                201
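
The numbers above come from the author's benchmark. As an illustration only (not the exact script used for the table), per-series throughput for a batch of series could be measured along these lines, assuming the batched metric API described in the summary:

```python
import time

import numpy as np

from darts import TimeSeries
from darts.metrics import mae

n_series, ts_length, ts_components = 1_000, 100, 10
actual = [TimeSeries.from_values(np.random.rand(ts_length, ts_components)) for _ in range(n_series)]
pred = [s + 0.1 for s in actual]

start = time.perf_counter()
mae(actual, pred)  # one call over the whole batch, single threaded by default
elapsed = time.perf_counter() - start
print(f"{n_series / elapsed:,.0f} series per second")
```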

@codecov-commenter commented Mar 14, 2024

Codecov Report

Attention: Patch coverage is 94.77612%, with 14 lines in your changes missing coverage. Please review.

Project coverage is 93.91%. Comparing base (91c7087) to head (545c86d).

Files                                           Patch %   Lines
darts/utils/ts_utils.py                          91.66%   7 Missing ⚠️
darts/models/forecasting/forecasting_model.py    93.87%   6 Missing ⚠️
darts/timeseries.py                              96.87%   1 Missing ⚠️


Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2284      +/-   ##
==========================================
- Coverage   93.95%   93.91%   -0.04%     
==========================================
  Files         136      137       +1     
  Lines       13687    13962     +275     
==========================================
+ Hits        12860    13113     +253     
- Misses        827      849      +22     

@madtoinou (Collaborator) left a comment

The metrics look so much tidier now!

Some minor comments, mostly about documentation.

@dennisbader merged commit 5c97c9b into master on Apr 4, 2024
8 of 9 checks passed
@dennisbader deleted the refactor/metrics branch on April 4, 2024 at 14:09