# Regression with Orbit - Basic

This demo shows how the arguments of the model classes (LGT or DLT) can be used to set up regressors in different ways. These options are useful in practice when tuning models.

In [1]:
import pandas as pd
import numpy as np

from orbit.models.lgt import LGTMAP, LGTAggregated, LGTFull
from orbit.models.dlt import DLTMAP, DLTAggregated, DLTFull
from orbit.diagnostics.plot import plot_predicted_data, plot_predicted_components
from orbit.utils.dataset import load_iclaims

## Load data

In [2]:
raw_df = load_iclaims()
raw_df.dtypes

week              datetime64[ns]
claims                   float64
trend.unemploy           float64
trend.filling            float64
trend.job                float64
sp500                    float64
vix                      float64
dtype: object

In [3]:
raw_df.head()

| | week | claims | trend.unemploy | trend.filling | trend.job | sp500 | vix |
|---|---|---|---|---|---|---|---|
| 0 | 2010-01-03 | 13.386595 | 0.219882 | -0.318452 | 0.1175 | -0.417633 | 0.122654 |
| 1 | 2010-01-10 | 13.624218 | 0.219882 | -0.194838 | 0.168794 | -0.42548 | 0.110445 |
| 2 | 2010-01-17 | 13.398741 | 0.236143 | -0.292477 | 0.1175 | -0.465229 | 0.532339 |
| 3 | 2010-01-24 | 13.137549 | 0.203353 | -0.194838 | 0.106918 | -0.481751 | 0.428645 |
| 4 | 2010-01-31 | 13.19676 | 0.13436 | -0.242466 | 0.074483 | -0.488929 | 0.487404 |


In [4]:
df=raw_df.copy()

## Use regressors and specify their signs

The regressor columns are supplied via the argument `regressor_col`. Their signs are specified via `regressor_sign`, whose values are either '=' (regular, no restriction) or '+' (positive). The two lists must have the same length. By default, every entry of `regressor_sign` is '='.
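A mismatch between the two lists is easy to catch before fitting. The helper below is a hypothetical pre-check and not part of Orbit; it assumes only the two sign values named above and fills in the all-'=' default when no signs are given.

```python
def check_regressor_args(regressor_col, regressor_sign=None):
    """Hypothetical pre-check (not an Orbit function) for the two lists."""
    allowed = {"=", "+"}  # the two sign values described in this section
    if regressor_sign is None:
        # Mirror the documented default: every regressor is unrestricted.
        regressor_sign = ["="] * len(regressor_col)
    if len(regressor_sign) != len(regressor_col):
        raise ValueError(
            f"got {len(regressor_col)} regressors but {len(regressor_sign)} signs"
        )
    unknown = sorted(set(regressor_sign) - allowed)
    if unknown:
        raise ValueError(f"unsupported sign values: {unknown}")
    # Pair each regressor with its sign for easy inspection.
    return dict(zip(regressor_col, regressor_sign))


signs = check_regressor_args(
    ["trend.unemploy", "trend.filling", "trend.job"], ["+", "+", "="]
)
print(signs)
```

Running a check like this before constructing the model surfaces a length mismatch as a plain `ValueError` in Python rather than during model setup.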

Also note that regressions generally perform better with `infer_method=mcmc`, because the parameter distributions can be high dimensional. The following examples use `mcmc` as the sampling method.

In [5]:
DATE_COL = "week"
RESPONSE_COL = "claims"
REGRESSOR_COL = ["trend.unemploy", "trend.filling", "trend.job"]

In [6]:
dlt_mod=DLTAggregated(response_col=RESPONSE_COL,
                      date_col=DATE_COL,
                      regressor_col=REGRESSOR_COL,
                      regressor_sign=["+", '+', '='], 
                      seasonality=52,
                      seed=1)

dlt_mod.fit(df=df)




The estimated regressor coefficients can be retrieved with `get_regression_coefs()`.

In [7]:
dlt_mod.get_regression_coefs()

| | regressor | regressor_sign | coefficient |
|---|---|---|---|
| 0 | trend.unemploy | Positive | 0.054019 |
| 1 | trend.filling | Positive | 0.075139 |
| 2 | trend.job | Regular | -0.058975 |


## Adjust priors for regressor beta and regressor standard deviation

The model assumes $$\beta \sim \text{Gaussian}(\beta_{\text{prior}}, \sigma_{\text{prior}})$$

The default values of $\beta_{\text{prior}}$ and $\sigma_{\text{prior}}$ are 0 and 1, respectively.

Users can adjust them via the arguments `regressor_beta_prior` and `regressor_sigma_prior`. Both lists must have the same length as `regressor_col`.
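To get a feel for how much a smaller prior standard deviation narrows the plausible range of a coefficient, the sketch below computes the central interval (roughly 95%) of the Gaussian prior for each regressor. It is plain Python rather than Orbit code, `prior_interval` is a hypothetical helper, and it ignores any sign restriction on the coefficient. The second call uses the prior values from the cell that follows.

```python
def prior_interval(beta_prior, sigma_prior, z=1.96):
    """Central ~95% interval of a Gaussian(beta_prior, sigma_prior) prior."""
    if len(beta_prior) != len(sigma_prior):
        raise ValueError("beta_prior and sigma_prior must have the same length")
    # mean +/- z standard deviations, one (low, high) pair per regressor
    return [(b - z * s, b + z * s) for b, s in zip(beta_prior, sigma_prior)]


regressors = ["trend.unemploy", "trend.filling", "trend.job"]
default_prior = prior_interval([0, 0, 0], [1, 1, 1])
custom_prior = prior_interval([0.05, 0.05, 0], [0.1, 0.1, 0.1])

for name, wide, narrow in zip(regressors, default_prior, custom_prior):
    print(
        f"{name:15s} default=({wide[0]:+.3f}, {wide[1]:+.3f}) "
        f"custom=({narrow[0]:+.3f}, {narrow[1]:+.3f})"
    )
```

With a standard deviation of 0.1 the interval is ten times narrower than under the default of 1, so the prior pulls the estimate much more strongly toward its mean.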

In [8]:
dlt_mod=DLTAggregated(response_col=RESPONSE_COL,
                      date_col=DATE_COL,
                      regressor_col=REGRESSOR_COL,
                      regressor_sign=["+", '+', '='], 
                      regressor_beta_prior=[0.05, 0.05, 0],
                      regressor_sigma_prior=[0.1, 0.1, 0.1],
                      seasonality=52,
                      seed=1)

In [9]:
dlt_mod.fit(df=df)




The estimated coefficients change under the new priors. The largest shift is in `trend.job`, which moves from -0.058975 to -0.040119, while the two positive coefficients change only slightly.

In [10]:
dlt_mod.get_regression_coefs()

| | regressor | regressor_sign | coefficient |
|---|---|---|---|
| 0 | trend.unemploy | Positive | 0.052978 |
| 1 | trend.filling | Positive | 0.07133 |
| 2 | trend.job | Regular | -0.040119 |
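To quantify how far the priors moved the estimates, the two coefficient tables can be compared directly. The sketch below hard-codes the values printed in this section rather than calling Orbit, and the variable names are illustrative.

```python
# Coefficients copied from the two tables in this section.
default_fit = {
    "trend.unemploy": 0.054019,
    "trend.filling": 0.075139,
    "trend.job": -0.058975,
}
custom_fit = {
    "trend.unemploy": 0.052978,
    "trend.filling": 0.07133,
    "trend.job": -0.040119,
}

# Change in each coefficient after switching to the custom priors.
changes = {name: custom_fit[name] - default_fit[name] for name in default_fit}
largest = max(changes, key=lambda name: abs(changes[name]))

for name, delta in changes.items():
    print(f"{name:15s} {default_fit[name]:+.6f} -> {custom_fit[name]:+.6f} ({delta:+.6f})")
print("largest shift:", largest)
```

The `trend.job` coefficient moves toward its prior mean of 0, which is what one would expect when the prior standard deviation is tightened from 1 to 0.1.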


## Use a data-driven sigma for each coefficient

Instead of using fixed standard deviations for regressors, a hyperprior can be assigned to them, i.e.
$$\sigma_\beta \sim \text{Half-Cauchy}(0, \text{ridge_scale})$$

This can be done by setting `regression_penalty="auto_ridge"`. The scale of the hyperprior is controlled by `auto_ridge_scale`, which defaults to `0.5`.
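To see what this hyperprior implies for the coefficient standard deviations, the sketch below draws from a Half-Cauchy with scale 0.5 using only the Python standard library. It illustrates the distribution itself and is not how Orbit or Stan samples internally; `sample_half_cauchy` is a hypothetical helper.

```python
import math
import random
import statistics


def sample_half_cauchy(scale, size, seed=1):
    """Draw from Half-Cauchy(0, scale) by inverse-transform sampling."""
    rng = random.Random(seed)
    # tan(pi * (u - 0.5)) is a standard Cauchy draw for u ~ Uniform(0, 1);
    # the absolute value folds it onto the non-negative half-line.
    return [
        scale * abs(math.tan(math.pi * (rng.random() - 0.5)))
        for _ in range(size)
    ]


sigmas = sample_half_cauchy(scale=0.5, size=100_000)
median = statistics.median(sigmas)
share_above_one = sum(s > 1.0 for s in sigmas) / len(sigmas)

print(f"median sigma: {median:.3f}")
print(f"share of draws above 1.0: {share_above_one:.3f}")
```

The median of a Half-Cauchy equals its scale, so half of the prior mass sits below 0.5, while the heavy tail still leaves room for larger standard deviations when the data support them.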

In [11]:
dlt_mod=DLTAggregated(response_col=RESPONSE_COL,
                      date_col=DATE_COL,
                      regressor_col=REGRESSOR_COL,
                      regressor_sign=["+", '+', '='],
                      seasonality=52,
                      seed=1,
                      regression_penalty="auto_ridge",
                      auto_ridge_scale=0.5)


dlt_mod.fit(df=df)




In [12]:
dlt_mod.get_regression_coefs()

| | regressor | regressor_sign | coefficient |
|---|---|---|---|
| 0 | trend.unemploy | Positive | 0.048211 |
| 1 | trend.filling | Positive | 0.057213 |
| 2 | trend.job | Regular | -0.048793 |
