Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#1101 Implemented min_train_series_length for Theta and FourTheta #1111

Merged
merged 8 commits into from
Aug 9, 2022

Conversation

rijkvandermeulen
Copy link
Contributor

This PR implements the min_train_series_length() method for the FourTheta and Theta models that overwrites the minimum default of 3 training samples by 2*seasonal_period when appropriate.

Summary

When seasonality is used in the Theta and FourTheta models the minimum number of training samples should be 2*seasonal_period. Since the min_train_series_length() method wasn't implemented for the theta models, however, it always defaulted to 3 (for seasonal as well as non-seasonal models).

Other Information

Note that there is one case that the newly implemented min_train_series_length() method does not address: namely when season_mode != SeasonalityMode.NONE and seasonality_period==None. In this case, for us to be able to determine the minimum number of training samples, we need to first retrieve the seasonality_period from the time series. We can actually do this (for example by using the check_seasonality() function in utils.statistics). However, I wasn't entirely sure whether this would be desirable. The issue that I saw was that you lose your strict decoupling of models and time series. I.e., you cannot retrieve the min_train_series_length property from the model alone anymore - you'd have to pass the time series itself as well. Another possibility would be to make the seasonal_period a mandatory argument (probably not desired though I guess).

Looking forward to hearing your thoughts and feedback

Best,
Rijk

Copy link
Contributor

@hrzn hrzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!
Concerning the adaptation when the seasonality is not known beforehand, I agree with your observation that we should leave "data" (time series) and models decoupled. So I think we can live with that for now.

@@ -76,3 +76,51 @@ def test_best_model(self):
self.assertTrue(
mape(val_series, forecast_best) <= mape(val_series, forecast_random)
)

def test_min_train_series_length_with_seasonality(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding unit tests :)

@codecov-commenter
Copy link

codecov-commenter commented Aug 7, 2022

Codecov Report

Merging #1111 (c9914e3) into master (5ceac68) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head c9914e3 differs from pull request most recent head fe9fbc4. Consider uploading reports for the commit fe9fbc4 to get more accurate results

@@            Coverage Diff             @@
##           master    #1111      +/-   ##
==========================================
- Coverage   93.76%   93.75%   -0.01%     
==========================================
  Files          80       80              
  Lines        8207     8204       -3     
==========================================
- Hits         7695     7692       -3     
  Misses        512      512              
Impacted Files Coverage Δ
darts/models/forecasting/theta.py 89.72% <100.00%> (+0.58%) ⬆️
darts/timeseries.py 92.23% <0.00%> (-0.07%) ⬇️
...arts/models/forecasting/torch_forecasting_model.py 89.54% <0.00%> (-0.05%) ⬇️
darts/models/forecasting/block_rnn_model.py 98.24% <0.00%> (-0.04%) ⬇️
darts/models/forecasting/nhits.py 98.55% <0.00%> (-0.02%) ⬇️
darts/datasets/__init__.py 100.00% <0.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

CHANGELOG.md Outdated
@@ -34,6 +34,7 @@ Darts is still in an early development phase and we cannot always guarantee back
- An issue with arguments being reverted for the `metric` function of gridsearch and backtest [#989](https://github.com/unit8co/darts/pull/989) by [Clara Grotehans](https://github.com/ClaraGrthns).
- An error checking whether `fit()` has been called in global models [#944](https://github.com/unit8co/darts/pull/944) by [Julien Herzen](https://github.com/hrzn).
- An error in Gaussian Process filter happening with newer versions of sklearn [#963](https://github.com/unit8co/darts/pull/963) by [Julien Herzen](https://github.com/hrzn).
- Implemented the min_train_series_length method for the FourTheta and Theta models that overwrites the minimum default of 3 training samples by 2*seasonal_period when appropriate [#1101](https://github.com/unit8co/darts/pull/1101) by [Rijk van der Meulen](https://github.com/rijkvandermeulen)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rijkvandermeulen here too, could you move this to the Unreleased section? Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jup - done!

@hrzn hrzn merged commit 5ef13ba into unit8co:master Aug 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants