
backtest(): Retrain every n steps #135

Closed
aschl opened this issue Jul 10, 2020 · 11 comments
Labels
feature request (Use this label to request a new feature), good first issue (Good for newcomers)

Comments


aschl commented Jul 10, 2020

Recently found your library and it is really great and handy!
The evaluation of forecasting performance is best done with a cross-validation approach. The backtest_forecasting() function does this, although it currently iterates and retrains the model at every single time step. In my application I am training tens of thousands of different time series, so retraining at every time step becomes computationally infeasible. Another approach, already implemented in scikit-learn, is TimeSeriesSplit(), which generates a k-fold cross-validation split designed for time series.
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html

One solution would be to add support for the different cross-validation techniques implemented in scikit-learn (such as TimeSeriesSplit()). Another solution I see would be to add a stride parameter to backtest_forecasting() that specifies how many time steps to skip while backtesting.
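For reference, a minimal sketch of what scikit-learn's TimeSeriesSplit() produces: k train/test splits in temporal order, with an expanding training window, so a model would be retrained only k times (toy data, not darts code):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# A toy univariate series of 12 observations.
series = np.arange(12)

# TimeSeriesSplit yields n_splits expanding train/test splits in
# temporal order; the test sets never precede their training sets.
tscv = TimeSeriesSplit(n_splits=3)
splits = [(train_idx, test_idx) for train_idx, test_idx in tscv.split(series)]

for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
# With 12 samples and 3 splits, each test fold has 12 // (3 + 1) = 3 points:
# train [0..2] / test [3..5], train [0..5] / test [6..8], train [0..8] / test [9..11]
```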

@aschl added the feature request and triage labels Jul 10, 2020

hrzn commented Jul 11, 2020

Thanks for pointing this out @aschl. We are currently reworking the backtesting functionality quite substantially, and indeed our plan is to introduce a stride parameter like you suggest. We will also introduce moving windows for the training set (instead of expanding only, as now). These should come in a release sometime soon.


aschl commented Jul 11, 2020

Awesome! Looking forward to seeing this feature. The idea of incorporating the moving window is also great and very useful.
When, approximately, do you plan to release this?


guillaumeraille commented Jul 13, 2020

Hello @aschl. While we don't have a date yet for the release of this feature, if this is something you would like to try as soon as possible, the stride parameter has already been added to the backtest_forecasting method on the develop branch. Keep in mind that we are currently working on a fairly heavy refactor of the whole backtesting.py file, so the backtesting API might change in the upcoming release.


aschl commented Jul 16, 2020

I checked backtesting.py in the develop branch. If I understand it correctly, the stride parameter determines the distance between two consecutive predictions. This can be helpful if, e.g., you only want to predict January values (and skip the Feb-Dec forecasts). But I think there are quite a few use cases where you need all prediction points and would like to retrain the model only every x time steps. This would increase efficiency quite a lot without necessarily losing predictive power (adding a few more samples to a series of a few hundred observations doesn't substantially improve the model if the distribution remains similar).

The idea behind scikit-learn's TimeSeriesSplit() is exactly this. You split the time series into k cross-validation pieces and retrain the model only k times. This way you obtain a relatively unbiased estimate of the forecasting error with reasonable computational effort.

(I see that you implemented a retrain argument, which goes in this direction. However, it only allows either retraining for every prediction or training only once, on the first sample.)
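To illustrate the idea, here is a minimal sketch of a walk-forward backtest that produces a forecast at every time step but refits the model only every n steps. All names here (backtest_retrain_every, fit, predict) are hypothetical illustrations, not the darts API:

```python
import numpy as np

def backtest_retrain_every(series, fit, predict, start, retrain_every):
    """Walk-forward backtest: one forecast per step, but the model is
    refit on the expanding history only every `retrain_every` steps."""
    model_state = None
    forecasts = []
    for i, t in enumerate(range(start, len(series))):
        if model_state is None or i % retrain_every == 0:
            model_state = fit(series[:t])       # retrain on expanding window
        forecasts.append(predict(model_state))  # one-step-ahead forecast
    return np.array(forecasts)

# Toy "naive" model: fit() stores the last observed value, predict() repeats it.
series = np.arange(10, dtype=float)
preds = backtest_retrain_every(
    series,
    fit=lambda history: history[-1],
    predict=lambda state: state,
    start=5,
    retrain_every=2,
)
# Forecasts for steps 5..9; the model is refit at steps 5, 7 and 9 only:
# preds == [4., 4., 6., 6., 8.]
```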

@LeoTafti removed the triage label Oct 4, 2020

LeoTafti commented Oct 16, 2020

Hi @aschl, and sorry for the delay in answering your last comment. If I understand correctly, your suggestion would be to replace retrain: bool = True with something like retrain_every_n: int = 1, is that correct?

If that's the case, I think you make a good point. Feel free to contribute if you can! Otherwise I'll put it on our radar.

@LeoTafti changed the title from "Backtesting vs. sklearn's TimeSeriesSplit" to "backtest(): Retrain every n steps" Oct 16, 2020

aschl commented Oct 16, 2020

Yes. Exactly.
I am currently busy with some other projects. Not sure when I will find time to come back to this. Please keep me updated.

@LeoTafti added the good first issue label Oct 16, 2020
pravarmahajan commented

Can I take this one up if it is still open?

@hrzn
Copy link
Contributor

hrzn commented Jul 17, 2021

@pravarmahajan sorry for the late reply. We would be very happy to receive a PR to introduce a "retrain every N steps" feature to historical forecasting / backtesting!

deltamacht commented

Just looping back to this. As I understand it, #1139 addressed retraining every n steps via the retrain behavior in backtest(), but that parameter isn't exposed in the gridsearch method. Regarding the discussion above about behavior similar to sklearn's TimeSeriesSplit: am I correct in thinking that this type of cross-validation can't easily be specified in the gridsearch method at present?


hrzn commented Nov 2, 2022

You're right @deltamacht. Gridsearch is still pretty basic. There is, however, a stride parameter in gridsearch() which may go some way toward doing what you're looking for.
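Conceptually, stride and an integer retrain setting are orthogonal: stride selects which time steps receive a forecast at all, while "retrain every n" refits the model only at every n-th of those forecast points. A tiny sketch of these index semantics (the helper name is hypothetical, not part of darts or its gridsearch() signature):

```python
def forecast_and_retrain_points(start, end, stride, retrain_every):
    """Hypothetical illustration: which steps get a forecast, and at
    which of those steps the model would actually be refit."""
    forecast_points = list(range(start, end, stride))  # one forecast per stride
    retrain_points = forecast_points[::retrain_every]  # refit every n-th forecast
    return forecast_points, retrain_points

fc, rt = forecast_and_retrain_points(start=100, end=120, stride=2, retrain_every=5)
# fc has 10 forecast points (100, 102, ..., 118); refits happen at 100 and 110.
```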


hrzn commented Jan 5, 2023

Implemented in v0.22.0 🚀

@hrzn closed this as completed Jan 5, 2023