[FEATURE] Scaler with rolling/expanding window to eliminate look ahead bias #1540
Comments
Another way would be to allow users to pass some (of our existing) transformers for the target and covariates to historical_forecasts.
Thanks for raising this excellent point @tuomijal. I personally like the solution of @dennisbader as it would remove the need for users to use the windowing exactly right - they wouldn't have to worry about it, only specify which kind of scaling they want when calling historical forecasts.
I agree, the solution proposed by @dennisbader is the most elegant one 👍🏼
Hi @dennisbader @madtoinou ! The PR looks cool, could I pick it up?
Hi @JanFidor,
Sure thing. If there's a chance to avoid major merge conflicts, I'll happily take it. I'll keep my eyes peeled for the new release!
The refactoring of historical forecasts has just been merged on the main branch. If you still have time to work on this, you can go ahead! The logic is now found in two different places:
Hi @madtoinou, thanks for reminding me, this issue totally slipped my mind! Small heads up, I might have slightly less time going forward, but I'll happily give it a go. I already browsed the
Indeed, we decided to optimize this method step by step and the "retrain" logic was a bit harder to support directly. I think that the main source of data leakage is the processing/transformation of the entire input series instead of just the part available/used for the latest historical forecast (for both retraining and/or inference). So the logic should be contained in
Any news on this? Lack of this feature means that if backcast or other more involved testing procedures are needed, Darts is quite unusable, since those transforms are really necessary (e.g. differencing, scaling).
Yeah, I'm also quite keen on this being part of Darts. I was hoping the pipeline class would function more like sklearn's Pipeline, where transformations and models can be bundled together. Then, if the pipeline class had backtest / historical_forecast, we could be sure of no data leakage during backtesting.
+1, this is crucial to be able to use Darts' backtest functionality.
Problem description
Currently, Scaler transforms an input series "globally", meaning that all values of the input series are considered when fitting.
This is not a problem if the data is manually split into train and test sets and the scaler is fitted on the training set only.
However, if the globally scaled series is then used as input to historical forecasts, we risk introducing look-ahead bias into the analysis (at least when performing normalization). To eliminate this bias, a rolling-window approach is sometimes used (https://arxiv.org/abs/1907.09452).
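To make the bias concrete, here is a minimal sketch in plain Python (deliberately not using Darts, so nothing below reflects the actual Scaler implementation). The helper names `minmax_fit` and `minmax_apply` and the toy series are assumptions made purely for illustration. It compares min-max scaling fitted on the whole series with scaling that only uses values observed up to each time step.

```python
def minmax_fit(values):
    """Return (lo, hi) bounds computed from the given values only."""
    return min(values), max(values)


def minmax_apply(value, lo, hi):
    """Scale one value with previously fitted bounds (0.0 if bounds are equal)."""
    return (value - lo) / (hi - lo) if hi != lo else 0.0


# Toy data (assumed): the last point is a large value that lies "in the future"
# from the point of view of every earlier time step.
series = [1.0, 2.0, 3.0, 4.0, 10.0]

# Global scaling: the bounds come from the whole series, including the future.
lo, hi = minmax_fit(series)
global_scaled = [minmax_apply(v, lo, hi) for v in series]

# Expanding-window scaling: the value at time t is scaled with bounds fitted
# on series[0..t] only, so no future information is used.
expanding_scaled = []
for t, v in enumerate(series):
    lo_t, hi_t = minmax_fit(series[: t + 1])
    expanding_scaled.append(minmax_apply(v, lo_t, hi_t))

print(global_scaled)
print(expanding_scaled)
```

With global scaling, the fourth value (4.0) is compressed to roughly a third of the range because the scaler has already "seen" the future 10.0; with the expanding window it is scaled only relative to values known at that time.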
Describe proposed solution
One solution would be to add rolling/expanding window parameters to Scaler.
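As a rough illustration of what such parameters could look like, here is a hypothetical sketch in plain Python. The class `WindowedMinMaxScaler` and its `mode` and `window` arguments are invented for this example; they are not part of the Darts API and are not necessarily the signature originally proposed in this issue.

```python
class WindowedMinMaxScaler:
    """Hypothetical scaler; `mode` and `window` are illustrative names only."""

    def __init__(self, mode="expanding", window=None):
        if mode not in ("expanding", "rolling"):
            raise ValueError("mode must be 'expanding' or 'rolling'")
        if mode == "rolling" and (window is None or window < 1):
            raise ValueError("rolling mode needs window >= 1")
        self.mode = mode
        self.window = window

    def transform(self, values):
        """Scale each value using only the values inside its own past window."""
        scaled = []
        for t, v in enumerate(values):
            if self.mode == "expanding":
                start = 0  # use everything observed so far
            else:
                start = max(0, t + 1 - self.window)  # last `window` points
            past = values[start : t + 1]
            lo, hi = min(past), max(past)
            scaled.append((v - lo) / (hi - lo) if hi != lo else 0.0)
        return scaled


values = [5.0, 1.0, 3.0, 2.0]
expanding = WindowedMinMaxScaler(mode="expanding").transform(values)
rolling = WindowedMinMaxScaler(mode="rolling", window=2).transform(values)
print(expanding)
print(rolling)
```

In both modes, the scaled value at time t depends only on values at or before t, which is the property needed to avoid look-ahead bias.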
Describe potential alternatives
Another option is to integrate this functionality into the historical_forecasts and backtest functions. This might be convenient because the parameters above could be inferred from the desired backtesting setup.
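The idea of tying the scaling to the backtesting loop can be sketched as follows. This is plain Python under stated assumptions, not how Darts implements historical_forecasts: the function `leak_free_historical_forecasts`, the naive last-value model, and the toy series are all hypothetical stand-ins. The point is only that the scaler is re-fitted on the data available before each forecast.

```python
def fit_minmax(values):
    """Fit min-max bounds on the given history only."""
    return min(values), max(values)


def scale(v, lo, hi):
    return (v - lo) / (hi - lo) if hi != lo else 0.0


def unscale(s, lo, hi):
    return s * (hi - lo) + lo


def naive_forecast(scaled_history):
    """Stand-in model (assumed): predict the last observed scaled value."""
    return scaled_history[-1]


def leak_free_historical_forecasts(series, start):
    """One-step-ahead forecasts; the scaler is re-fitted for every forecast."""
    forecasts = []
    for t in range(start, len(series)):
        history = series[:t]          # only data available before time t
        lo, hi = fit_minmax(history)  # scaler never sees series[t:] 
        scaled_history = [scale(v, lo, hi) for v in history]
        pred_scaled = naive_forecast(scaled_history)
        forecasts.append(unscale(pred_scaled, lo, hi))  # back to original units
    return forecasts


series = [1.0, 2.0, 4.0, 3.0, 8.0]
forecasts = leak_free_historical_forecasts(series, start=2)
print(forecasts)
```

Because the window used for fitting is derived from the forecast position itself, the user does not have to configure the windowing separately, which is the convenience this alternative is after.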
Additional context
Thank you again for excellent software!