
Feature/four theta #123

Merged
merged 49 commits into develop from feat/FourTheta on Jul 20, 2020

Conversation


@Droxef Droxef commented Jul 3, 2020

Implement 4Theta method from M4 competition

Summary

Naive implementation of the 4Theta method from the organizers of the M4 competition.
Modify the backtesting gridsearch to allow an automatic search for the best hyperparameters.

@Droxef Droxef requested review from hrzn and TheMP as code owners July 3, 2020 12:49

# Linear Regression part of the decomposition. We select the degree one coefficient.
b_theta = np.polyfit(np.arange(self.length), (1.0 - self.theta) * new_ts.values(), 1)[0]

# Normalization of the coefficient b_theta.
self.coef = b_theta / (2.0 - self.theta)
self.coef = b_theta / (2.0 - self.theta) # change to b_theta / (-self.theta) if classical theta
Contributor

Did you manage to figure out why the formula is written like this?

Contributor Author

Mathematically, I'm still not quite sure about the rest, but I am sure it is correct for theta = 0 and 1.
Also, the definitions of the theta lines are not quite the same, hence the confusing theta.

But comparing with the classical theta (4theta with default parameters), we get the exact same results if we change the normalization to -1/theta (because there is a symmetry: 2 - theta_1 = theta_2).
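The symmetry referred to here can be checked numerically (a sketch; `theta_line` is an illustrative helper, not the darts API): theta-lines built with parameters theta and 2 - theta average back to the original series.

```python
import numpy as np

# Illustrative sketch of the 2 - theta_1 = theta_2 symmetry: the theta-line
# theta * y + (1 - theta) * trend and its mirror with parameter 2 - theta
# average back to the original series y.
y = np.array([1.0, 3.0, 2.0, 5.0])
t = np.arange(len(y))
trend = np.polyval(np.polyfit(t, y, 1), t)

def theta_line(theta):
    return theta * y + (1 - theta) * trend

for theta in (0.5, 2.0, -1.0):
    assert np.allclose((theta_line(theta) + theta_line(2 - theta)) / 2, y)
```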

Contributor

Could we unify the meaning of theta? E.g. by changing to b_theta / (-self.theta) and adapting the default value to theta=2 ?

linreg = new_ts.values()
elif self.trend_mode == 'exponential':
linreg = np.log(new_ts.values())
else:
Contributor
It will be less code, and more coherent with other implementations, if we check that the mode is either linear or exponential and throw an exception if neither value is selected.

theta_t = self.theta * new_ts.values() + (1 - self.theta) * theta0_in
elif self.model_mode == 'multiplicative' and (theta0_in > 0).all():
theta_t = (new_ts.values() ** self.theta) * (theta0_in ** (1 - self.theta))
else:
Contributor

Similar comment as before: we can assert at the beginning and throw an exception.

Contributor Author

For a first draft, I blindly replicated the steps of the original algorithm, so I will change it.
But we still need to check that all values are positive when a 'multiplicative' model is chosen, or else it will fail.
I can either raise an error or simply keep the fallback as it is.

Contributor Author

There should be a more optimized implementation, like our Theta, but we need to do the math to find it.


replicated_seasonality = np.tile(self.seasonality.pd_series()[-self.season_period:],
math.ceil(n / self.season_period))[:n]
if self.season_mode in ['multiplicative', 'mul']:
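The seasonality replication above can be checked on a small example (the values here are illustrative, not from the PR):

```python
import math
import numpy as np

# Repeat the last full seasonal cycle until n points are covered, then
# truncate - the same np.tile pattern as in the snippet above.
season = np.array([1.0, 2.0, 3.0])  # one seasonal period
n, season_period = 7, 3
replicated = np.tile(season, math.ceil(n / season_period))[:n]
assert replicated.tolist() == [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0]
```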
Contributor

For simplicity I would also pick one of them; also, an enum that we pass here might be helpful to avoid typos: https://docs.python.org/3/library/enum.html

Contributor Author

To be sure I understand correctly: the different model arguments will accept an Enum value?
I can change the Theta method as well in the same manner; the Enum classes will then be declared outside the class.

Contributor

Yes, this comment actually applies to both theta methods. Instead of retyping the "multiplicative"/"additive" strings, we can have an enum with two values, MULTIPLICATIVE and ADDITIVE; if you make a typo, the IDE will tell you that something does not match (not the case for raw strings).
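A minimal sketch of the suggested enum (the name `ModelMode` follows the naming discussed later in this thread):

```python
from enum import Enum

class ModelMode(Enum):
    ADDITIVE = "additive"
    MULTIPLICATIVE = "multiplicative"

# A typo in an enum member fails loudly (AttributeError), unlike a
# misspelled raw string, which would silently fall into an else branch.
mode = ModelMode.MULTIPLICATIVE
assert mode is ModelMode("multiplicative")
```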

@@ -313,15 +313,16 @@ def backtest_gridsearch(model_class: type,

Parameters
----------
model
model_class
Contributor

I let this slip through, thanks!

elif val_series == 'train':
model.fit(train_series)
# Use ndarray because casting to TimeSeries takes too much time
error = metric(model.fitted_values, train_series.univariate_values())
Contributor

As far as I can tell, FourTheta is the only model that possesses the attribute fitted_values. So it would be good to have a check here to test whether the given model supports this functionality, right?

Contributor Author

At least all ExponentialSmoothing models have a fitted attribute that can be retrieved (so the current Theta model could have one too, I think). If you approve this functionality, it might be interesting to add this attribute to other models as well.
But yes, a check is necessary.

@@ -88,13 +89,13 @@ def fit(self, series: TimeSeries, component_index: Optional[int] = None):
new_ts = remove_seasonality(ts, self.season_period, model=self.mode)

# SES part of the decomposition.
self.model = hw.SimpleExpSmoothing(new_ts.values()).fit()
self.model = hw.SimpleExpSmoothing(new_ts.values()).fit(initial_level=0.2)
Contributor

What is the effect of this change?

Contributor Author

The problem with SES in statsmodels is that alpha is not constrained to [0.1, 0.99] as it should be, which gives NaN values when alpha is 0.
Setting the initial_level to 0.2 (or anything else) avoided all encountered cases where the optimization gave alpha = 0.
It is more of a hotfix than an actual solution, until statsmodels corrects the problem.

Contributor Author

It seems like it leads to different results. I will add it only in the case where alpha is 0, and recompute the fit.
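The failure mode being worked around can be seen with a hand-rolled SES (a sketch, independent of statsmodels): with alpha = 0 the fitted values never move off the initial level, so the fit carries no information about the series.

```python
import numpy as np

def ses_fitted(values, alpha, initial_level=None):
    # Simple exponential smoothing: l_t = alpha * y_t + (1 - alpha) * l_{t-1}
    level = values[0] if initial_level is None else initial_level
    fitted = [level]
    for y in values[1:]:
        level = alpha * y + (1 - alpha) * level
        fitted.append(level)
    return np.array(fitted)

# alpha = 0 freezes the fit at the initial level, regardless of the data:
assert ses_fitted(np.array([1.0, 5.0, 9.0]), 0.0).tolist() == [1.0, 1.0, 1.0]
```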

Contributor

@hrzn hrzn left a comment

I have a few requested changes, but overall it looks good @Droxef.
Thanks for the work!

@@ -364,17 +365,23 @@ def backtest_gridsearch(model_class: type,
For every hyperparameter combination, the model is trained on `train_series` and
evaluated on `val_series`.

Comparison with fitted values (activated when `use_fitted_values` is passed):
For every hyperparameter combination, the model is trained on `train_series` and the resulting
Contributor

Small proposed modification:

For every hyperparameter combination, the model is trained on `train_series` and evaluated on the resulting
fitted values. Not all models have fitted values, and this method raises an error if `model.fitted_values` doesn't exist.
The fitted values are the result of the fit of the model on the training series. Comparing with the fitted values
can be a quick way to assess the model, but one cannot see if the model overfits or underfits.

@@ -389,6 +396,9 @@ def backtest_gridsearch(model_class: type,
as argument to the predict method of `model`.
num_predictions:
The number of train/prediction cycles performed in one iteration of expanding window mode.
use_fitted_values
If `True`, it will activates the comparison with the fitted values, if `fitted_values` is an attribute of
Contributor

If True, uses the comparison with the fitted values. Raises an error if fitted_values is not an attribute of model_class.

raise_if_not(train_series.width == val_series.width, "Training and validation series require the same"
" number of components.", logger)

raise_if_not((fcast_horizon_n is None) ^ (val_series is None),
"Please pass exactly one of the arguments 'forecast_horizon_n' or 'val_series'.", logger)
raise_if_not(bool((fcast_horizon_n is None) ^ (val_series is None) ^ use_fitted_values
Contributor

I find this a bit hard to read. How about

raise_if_not(((fcast_horizon_n is not None) + (val_series is not None) + use_fitted_values) == 1)

Contributor

Also I would perform this check at the top of the method.
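The suggested form works because Python booleans sum as integers; a sketch of the exactly-one check (the helper name is illustrative):

```python
def exactly_one(*options):
    # bool sums as int, so this counts how many of the mutually
    # exclusive options are set.
    return sum(bool(o) for o in options) == 1

fcast_horizon_n, val_series, use_fitted_values = 10, None, False
assert exactly_one(fcast_horizon_n is not None, val_series is not None, use_fitted_values)
assert not exactly_one(True, True, False)
```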


fit_kwargs, predict_kwargs = _create_parameter_dicts(model_class(), target_indices, component_index,
use_full_output_length)

if val_series is None:
if (val_series is None) & (not use_fitted_values):
Contributor

Could you use logical operators (and) instead of binary ones when possible?
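The distinction matters beyond style: `&` is a bitwise operator that binds more tightly than comparison operators, so mixing it with `==`/`is` can silently change the parse. A small illustration:

```python
# `and` short-circuits and works on any truthy values; `&` is bitwise
# and has higher precedence than comparison operators.
assert (1 == 1) and (2 == 2)      # True, as intended
assert not (1 == 1 & 2 == 2)      # parsed as 1 == (1 & 2) == 2 -> False
val_series, use_fitted_values = None, False
assert (val_series is None) and (not use_fitted_values)
```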

from enum import Enum


class Season(Enum):
Contributor

Could we call these SeasonalityMode, TrendMode and ModelMode ? wdyt?

self.fitted_values *= self.mean
# Takes too much time to create a TimeSeries
# Overhead: 30% ± 10 (2-10 ms in average)
self.fitted_values = TimeSeries.from_times_and_values(ts.time_index(), self.fitted_values)
Contributor

You could build the TimeSeries only in the backtesting method, this way the time overhead is paid only for backtesting and not for simply fitting the model.

@@ -37,6 +36,8 @@ class AutoregressionModelsTestCase(unittest.TestCase):
(Theta(), 11.3),
(Theta(1), 20.2),
(Theta(3), 9.8),
(FourTheta(1), 20.2),
(FourTheta(-1), 9.8),
Contributor

Could you add testing for a few more modes?

def select_best_model(ts: TimeSeries, thetas: Optional[List[int]] = None,
m: Optional[int] = None, normalization: bool = True) -> 'FourTheta':
"""
Performs a grid search over all hyper parameters to select the best model.
Contributor

Perhaps mention that it is using the fitted values on the training series here.

model.fit(train_series)
best_model.fit(train_series)
forecast_random = model.predict(10)
forecast_best = model.predict(10)
Contributor

Suggested change
forecast_best = model.predict(10)
forecast_best = best_model.predict(10)

return self._build_forecast_series(forecast)

@staticmethod
def select_best_model(ts: TimeSeries, thetas: Optional[List[int]] = None,
Contributor

We can keep it temporarily, but eventually I think each model (where possible) should be given a reasonable default set of hyperparameters, and then the user can call backtest_gridsearch(params='default', ...) or something like this.
Ping @grll

Contributor

@hrzn hrzn left a comment

Thanks @Droxef, nice addition!

Type of seasonality. Either "additive" or "multiplicative".
season_mode
Type of seasonality.
Either SeasonalityMode.MULTIPLICATIVE, SeasonalityMode.ADDITIVE or SeasonalityMode.NONE.
Contributor

Thanks :)

or an inferred seasonality period.

When called with `theta = 2 - X`, `model_mode = Model.ADDITIVE` and `trend_mode = Trend.LINEAR`,
this model is equivalent to calling `Theta(theta=X)`.
Contributor

Solved after our discussion: the original Theta implementation is faster.

# will lead to fitted_values similar to ts. But one cannot see if it overfits.
if self.normalization:
self.fitted_values *= self.mean
# Takes too much time to create a TimeSeries
Contributor

I think you can now remove this comment (or move to the corresponding backtesting function)

@Droxef Droxef merged commit 55e4e42 into develop Jul 20, 2020
@Droxef Droxef deleted the feat/FourTheta branch July 20, 2020 07:24
@Droxef Droxef mentioned this pull request Jul 20, 2020