# Regression with Orbit - Basic

In this demo, we demonstrate how to use the arguments of the model classes (LGT or DLT) to configure the regressors in different ways. These options can be very useful in practice when tuning the models.

In [3]:
import pandas as pd
import numpy as np

from orbit.models import LGT, DLT
from orbit.diagnostics.plot import plot_predicted_data, plot_predicted_components
from orbit.utils.dataset import load_iclaims

## Load data

In [4]:
raw_df = load_iclaims()
raw_df.dtypes

week              datetime64[ns]
claims                   float64
trend.unemploy           float64
trend.filling            float64
trend.job                float64
sp500                    float64
vix                      float64
dtype: object

In [5]:
raw_df.head()

| | week | claims | trend.unemploy | trend.filling | trend.job | sp500 | vix |
|---|---|---|---|---|---|---|---|
| 0 | 2010-01-03 | 13.386595 | 0.219882 | -0.318452 | 0.1175 | -0.417633 | 0.122654 |
| 1 | 2010-01-10 | 13.624218 | 0.219882 | -0.194838 | 0.168794 | -0.42548 | 0.110445 |
| 2 | 2010-01-17 | 13.398741 | 0.236143 | -0.292477 | 0.1175 | -0.465229 | 0.532339 |
| 3 | 2010-01-24 | 13.137549 | 0.203353 | -0.194838 | 0.106918 | -0.481751 | 0.428645 |
| 4 | 2010-01-31 | 13.19676 | 0.13436 | -0.242466 | 0.074483 | -0.488929 | 0.487404 |


In [6]:
df = raw_df.copy()

## Use regressors and specify their signs

The regressor columns are supplied via the argument `regressor_col`. Their signs are specified via `regressor_sign`, where each value is either '=' (regular, no restriction) or '+' (positive). The two lists must have the same length. By default, every entry of `regressor_sign` is '='.

Also, note that regressions generally perform better with `infer_method=mcmc`, because the parameter distributions can be high dimensional. We use `mcmc` as the sampling method in the following examples.
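The pairing rule between `regressor_col` and `regressor_sign` can be sketched in a few lines of plain Python. The helper `resolve_regressor_signs` below is hypothetical and is not part of Orbit; it only illustrates the same-length requirement and the all-'=' default described above, under the assumption that '=' and '+' are the only accepted values.

```python
def resolve_regressor_signs(regressor_col, regressor_sign=None):
    """Hypothetical helper (not an Orbit function): pair each regressor with a sign."""
    allowed = {"=", "+"}  # '=' regular (no restriction), '+' positive
    if regressor_sign is None:
        # Mirror the documented default: every regressor is unrestricted.
        regressor_sign = ["="] * len(regressor_col)
    if len(regressor_sign) != len(regressor_col):
        raise ValueError(
            f"expected {len(regressor_col)} signs, got {len(regressor_sign)}"
        )
    unknown = [sign for sign in regressor_sign if sign not in allowed]
    if unknown:
        raise ValueError(f"unsupported signs: {unknown}")
    return dict(zip(regressor_col, regressor_sign))


signs = resolve_regressor_signs(
    ["trend.unemploy", "trend.filling", "trend.job"],
    ["+", "+", "="],
)
print(signs)
```

Checking the two lists before constructing a model makes a length mismatch fail early with a readable message.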

In [7]:
DATE_COL = "week"
RESPONSE_COL = "claims"
REGRESSOR_COL = ['trend.unemploy', 'trend.filling', 'trend.job']

In [9]:
dlt_mod=DLT(response_col=RESPONSE_COL,
            date_col=DATE_COL,
            regressor_col=REGRESSOR_COL,
            regressor_sign=['+', '+', '='],
            seasonality=52,
            seed=1)

dlt_mod.fit(df=df, point_method='median')

To run all diagnostics call pystan.check_hmc_diagnostics(fit)


The estimated regressor coefficients can be retrieved via `get_regression_coefs()`.

In [10]:
dlt_mod.get_regression_coefs()

| | regressor | regressor_sign | coefficient |
|---|---|---|---|
| 0 | trend.unemploy | Positive | 0.054465 |
| 1 | trend.filling | Positive | 0.065368 |
| 2 | trend.job | Regular | -0.074599 |


## Adjust priors for regressor beta and regressor standard deviation

The model assumes $$\beta \sim \text{Gaussian}(\beta_{prior}, \sigma_{prior})$$

The default values for $\beta_{prior}$ and $\sigma_{prior}$ are 0 and 1, respectively.

Users can adjust them via the arguments `regressor_beta_prior` and `regressor_sigma_prior`. Both lists must have the same length as `regressor_col`.
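To build intuition for how these two arguments act, the sketch below works through a deliberately simplified case: a single regressor, Gaussian noise with a known standard deviation, and the Gaussian prior above. This is not how Orbit fits DLT (the model has more components and is estimated by sampling); the function `posterior_beta`, the toy data, and `sigma_obs` are assumptions made purely for illustration.

```python
def posterior_beta(x, y, beta_prior, sigma_prior, sigma_obs):
    """Posterior mean and std of beta for y_i = beta * x_i + noise.

    Assumes Gaussian noise with known std `sigma_obs` and a
    Gaussian(beta_prior, sigma_prior) prior, so the posterior is Gaussian too.
    """
    prior_precision = 1.0 / sigma_prior ** 2
    data_precision = sum(xi * xi for xi in x) / sigma_obs ** 2
    precision = prior_precision + data_precision
    # Precision-weighted average of the prior mean and the data evidence.
    weighted_sum = (
        beta_prior * prior_precision
        + sum(xi * yi for xi, yi in zip(x, y)) / sigma_obs ** 2
    )
    return weighted_sum / precision, precision ** -0.5


x = [1.0, 2.0, 3.0, 4.0]
y = [0.1, 0.2, 0.3, 0.4]  # toy data whose least-squares slope is 0.1

loose_mean, _ = posterior_beta(x, y, beta_prior=0.05, sigma_prior=1.0, sigma_obs=0.1)
tight_mean, _ = posterior_beta(x, y, beta_prior=0.05, sigma_prior=0.001, sigma_obs=0.1)
print(f"loose prior: {loose_mean:.4f}, tight prior: {tight_mean:.4f}")
```

With a wide prior the estimate stays close to the slope suggested by the data, while a very narrow prior pulls it toward `beta_prior`.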

In [8]:
dlt_mod=DLT(response_col=RESPONSE_COL,
            date_col=DATE_COL,
            regressor_col=REGRESSOR_COL,
            regressor_sign=['+', '+', '='],
            regressor_beta_prior=[0.05, 0.05, 0],
            regressor_sigma_prior=[0.1, 0.1, 0.1],
            seasonality=52,
            seed=1)

In [11]:
dlt_mod.fit(df=df, point_method='median')

To run all diagnostics call pystan.check_hmc_diagnostics(fit)


Comparing with the previous fit, the estimated coefficients shift under the new priors, most visibly for `trend.job`.

In [12]:
dlt_mod.get_regression_coefs()

| | regressor | regressor_sign | coefficient |
|---|---|---|---|
| 0 | trend.unemploy | Positive | 0.052746 |
| 1 | trend.filling | Positive | 0.067916 |
| 2 | trend.job | Regular | -0.059771 |
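The size of the change can be made explicit by subtracting the coefficients of the default-prior fit from those of the custom-prior fit. The snippet below is a small sketch that copies the values from the two output tables by hand; the dictionary names are illustrative only.

```python
# Coefficients copied from the two get_regression_coefs() outputs above.
default_prior = {
    "trend.unemploy": 0.054465,
    "trend.filling": 0.065368,
    "trend.job": -0.074599,
}
custom_prior = {
    "trend.unemploy": 0.052746,
    "trend.filling": 0.067916,
    "trend.job": -0.059771,
}

# Change in each coefficient after switching to the custom priors.
shift = {
    name: round(custom_prior[name] - default_prior[name], 6)
    for name in default_prior
}
for name, delta in shift.items():
    print(f"{name:>15}: {delta:+.6f}")
```

The `trend.job` coefficient moves toward zero, which is the direction its prior mean of 0 with the tighter standard deviation of 0.1 would suggest.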


## Use data-driven sigma for each coefficient

Instead of using fixed standard deviations for regressors, a hyperprior can be assigned to them, i.e.
$$\sigma_\beta \sim \text{Half-Cauchy}(0, \text{ridge_scale})$$

This can be done by setting `regression_penalty="auto_ridge"`. The scale of the hyperprior can be tuned via `auto_ridge_scale`, which defaults to `0.5`.
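To get a feel for what a Half-Cauchy hyperprior with scale 0.5 implies, the sketch below draws samples from it using only the standard library. Orbit performs the actual sampling inside the fitted model; `sample_half_cauchy` is a hypothetical helper written for this illustration and is not an Orbit function.

```python
import math
import random
import statistics


def sample_half_cauchy(scale, size, seed=0):
    """Draw `size` samples from Half-Cauchy(0, scale) by inverse-CDF sampling."""
    rng = random.Random(seed)
    # For x >= 0 the CDF is F(x) = (2 / pi) * atan(x / scale),
    # so x = scale * tan(pi * u / 2) for u uniform on [0, 1).
    return [scale * math.tan(math.pi * rng.random() / 2.0) for _ in range(size)]


draws = sample_half_cauchy(scale=0.5, size=100_000)
median = statistics.median(draws)
share_above_5 = sum(d > 5.0 for d in draws) / len(draws)
print(f"median: {median:.3f}, share of draws above 5: {share_above_5:.3f}")
```

The median of a Half-Cauchy equals its scale, so half of the prior mass for each $\sigma_\beta$ sits below `auto_ridge_scale`, while the heavy tail still leaves room for an occasional large standard deviation.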

In [13]:
dlt_mod=DLT(response_col=RESPONSE_COL,
            date_col=DATE_COL,
            regressor_col=REGRESSOR_COL,
            regressor_sign=['+', '+', '='],
            seasonality=52,
            seed=1,
            regression_penalty="auto_ridge",
            auto_ridge_scale=0.5)


dlt_mod.fit(df=df, point_method='median')

To run all diagnostics call pystan.check_hmc_diagnostics(fit)


In [14]:
dlt_mod.get_regression_coefs()

| | regressor | regressor_sign | coefficient |
|---|---|---|---|
| 0 | trend.unemploy | Positive | 0.048657 |
| 1 | trend.filling | Positive | 0.064906 |
| 2 | trend.job | Regular | -0.040714 |
