# Regression with Orbit

In this demo, we show how to use the different arguments of the model classes (LGT or DLT) to configure the regressors in different ways. These options are useful in practice when tuning the models.

```python
import pandas as pd
import numpy as np
from orbit.models.lgt import LGTMAP, LGTAggregated, LGTFull
from orbit.models.dlt import DLTMAP, DLTAggregated, DLTFull
from orbit.diagnostics.plot import plot_predicted_data
from orbit.diagnostics.plot import plot_predicted_components
from orbit.utils.dataset import load_iclaims
```

## Load data

```python
df = load_iclaims()
# Log-transform the response and the regressors.
df[['claims', 'trend.unemploy', 'trend.filling', 'trend.job']] = \
    np.log(df[['claims', 'trend.unemploy', 'trend.filling', 'trend.job']])
```

## Use regressors and specify their signs

The regressor columns are supplied via the argument `regressor_col`. Their signs can be specified via `regressor_sign`, with each value either '=' (regular, no restriction) or '+' (positive). The two lists must have the same length. By default, every entry of `regressor_sign` is '='.
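The rules above (matching lengths, allowed values '=' and '+', and a default of all '=') can be expressed as a small validation step. The sketch below uses a hypothetical helper, `resolve_regressor_signs`, written only to illustrate those rules; it is not part of Orbit's API, and Orbit's own checks may behave differently.

```python
from typing import List, Optional

ALLOWED_SIGNS = {"=", "+"}  # '=' regular (unrestricted), '+' positive


def resolve_regressor_signs(
    regressor_col: List[str], regressor_sign: Optional[List[str]] = None
) -> List[str]:
    """Hypothetical helper (not Orbit API) mirroring the documented sign rules."""
    if regressor_sign is None:
        # Default: no sign restriction on any regressor.
        return ["="] * len(regressor_col)
    if len(regressor_sign) != len(regressor_col):
        raise ValueError(
            f"regressor_sign has {len(regressor_sign)} entries, "
            f"but regressor_col has {len(regressor_col)}"
        )
    invalid = [s for s in regressor_sign if s not in ALLOWED_SIGNS]
    if invalid:
        raise ValueError(f"unsupported signs: {invalid}")
    return list(regressor_sign)


cols = ["trend.unemploy", "trend.filling", "trend.job"]
print(resolve_regressor_signs(cols, ["+", "+", "="]))
print(resolve_regressor_signs(cols))
```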

```python
DATE_COL = "week"
RESPONSE_COL = "claims"
REGRESSOR_COL = ['trend.unemploy', 'trend.filling', 'trend.job']
```

```python
# The first two regressors are constrained to be positive; the third is unrestricted.
lgt_mod = LGTAggregated(
    response_col=RESPONSE_COL,
    date_col=DATE_COL,
    regressor_col=REGRESSOR_COL,
    regressor_sign=["+", "+", "="],
    seasonality=52,
    seed=1,
)
lgt_mod.fit(df=df)
```


The estimated regressor coefficients can be retrieved via `.get_regression_coefs()`.

```python
lgt_mod.get_regression_coefs()
```

| regressor | regressor_sign | coefficient |
|---|---|---|
| trend.unemploy | Positive | 0.053995 |
| trend.filling | Positive | 0.076244 |
| trend.job | Regular | -0.070255 |
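
In this output, 'Positive' corresponds to a '+' entry in `regressor_sign` and 'Regular' to '='. A quick sanity check is that no 'Positive' regressor ends up with a negative coefficient. The sketch below runs that check on values copied from the table; with the DataFrame returned by `get_regression_coefs()`, the same logic would apply to its `regressor_sign` and `coefficient` columns. The helper name `sign_violations` is illustrative only.

```python
# Rows copied from the coefficient table above: (regressor, regressor_sign, coefficient)
rows = [
    ("trend.unemploy", "Positive", 0.053995),
    ("trend.filling", "Positive", 0.076244),
    ("trend.job", "Regular", -0.070255),
]


def sign_violations(rows):
    """Return the regressors labeled 'Positive' whose coefficient is negative."""
    return [name for name, sign, coef in rows if sign == "Positive" and coef < 0]


print(sign_violations(rows))  # []
```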


## Regression Types

Orbit offers different prior types for the regression coefficients:

* Fixed Ridge
* Lasso
* Auto Ridge

In **Fixed Ridge**, it is assumed that $$\beta \sim \text{Gaussian}(\beta_{prior}, \sigma_{prior})$$

In **Lasso**, it is assumed that $$\beta \sim \text{Laplace}(\beta_{prior}, \sigma_{prior})$$

In **Auto Ridge**, it is assumed that $$\beta \sim \text{Gaussian}(\beta_{prior}, \sigma_{\beta}), \quad \sigma_\beta \sim \text{Half-Cauchy}(0, \text{ridge\_scale})$$
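
One way to read these priors is through their negative log-densities, which act as penalty terms on $\beta$: the Gaussian prior contributes a quadratic (ridge-style) penalty and the Laplace prior an absolute-value (lasso-style) penalty. The sketch below evaluates them with the standard library. It treats $\sigma_{prior}$ as the Laplace scale parameter in the lasso case and `ridge_scale` as the Half-Cauchy scale; these are assumptions about the parameterization, not statements about Orbit's internals.

```python
import math


def neg_log_gaussian(beta, mu, sigma):
    """-log N(beta | mu, sigma^2): quadratic (ridge-style) in beta."""
    return 0.5 * ((beta - mu) / sigma) ** 2 + math.log(sigma * math.sqrt(2 * math.pi))


def neg_log_laplace(beta, mu, b):
    """-log Laplace(beta | mu, b): absolute-value (lasso-style) in beta."""
    return abs(beta - mu) / b + math.log(2 * b)


def neg_log_half_cauchy(s, scale):
    """-log Half-Cauchy(s | 0, scale), defined for s >= 0."""
    if s < 0:
        return math.inf
    return math.log(math.pi * scale / 2) + math.log1p((s / scale) ** 2)


def neg_log_auto_ridge(beta, sigma_beta, mu, ridge_scale):
    """Joint -log p(beta, sigma_beta) for the hierarchical auto-ridge prior."""
    return neg_log_gaussian(beta, mu, sigma_beta) + neg_log_half_cauchy(sigma_beta, ridge_scale)


# Penalty relative to beta = mu, with sigma_prior = 1 for both priors.
for beta in (0.5, 1.0, 2.0):
    ridge = neg_log_gaussian(beta, 0.0, 1.0) - neg_log_gaussian(0.0, 0.0, 1.0)
    lasso = neg_log_laplace(beta, 0.0, 1.0) - neg_log_laplace(0.0, 0.0, 1.0)
    print(f"beta={beta}: ridge penalty={ridge:.3f}, lasso penalty={lasso:.3f}")
```

With unit scales, the Laplace prior penalizes coefficients smaller than 2 in magnitude more heavily than the Gaussian prior does, and larger ones less heavily.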

### Fixed Ridge

```python
lgt_mod = LGTAggregated(
    response_col=RESPONSE_COL,
    date_col=DATE_COL,
    regressor_col=REGRESSOR_COL,
    regressor_sign=["+", "+", "="],
    seasonality=52,
    seed=1,
    regression_penalty="fixed_ridge",
)
lgt_mod.fit(df=df)
```


```python
lgt_mod.get_regression_coefs()
```

| regressor | regressor_sign | coefficient |
|---|---|---|
| trend.unemploy | Positive | 0.053995 |
| trend.filling | Positive | 0.076244 |
| trend.job | Regular | -0.070255 |


### Lasso

```python
lgt_mod = LGTAggregated(
    response_col=RESPONSE_COL,
    date_col=DATE_COL,
    regressor_col=REGRESSOR_COL,
    regressor_sign=["+", "+", "="],
    seasonality=52,
    seed=1,
    regression_penalty="lasso",
)
lgt_mod.fit(df=df)
```


```python
lgt_mod.get_regression_coefs()
```

| regressor | regressor_sign | coefficient |
|---|---|---|
| trend.unemploy | Positive | 0.049352 |
| trend.filling | Positive | 0.068068 |
| trend.job | Regular | -0.058273 |
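
The ridge and lasso priors shrink coefficients in different ways. A common way to see this is a one-coefficient toy problem: suppose the data alone suggest an estimate $\hat\beta$ with standard error $s$, and combine that Gaussian likelihood with a zero-centered prior. The posterior mode under a Gaussian prior scales $\hat\beta$ toward zero by a constant factor, while under a Laplace prior it soft-thresholds $\hat\beta$, setting small estimates exactly to zero. This sketch illustrates the two penalties only; Orbit fits the full time-series model, so these functions do not reproduce the coefficients above.

```python
import math


def ridge_mode(beta_hat, s, sigma_prior):
    """Posterior mode for likelihood N(beta_hat | beta, s^2) and prior N(0, sigma_prior^2)."""
    # Setting the derivative of the log posterior to zero gives a constant shrinkage factor.
    return beta_hat * sigma_prior**2 / (sigma_prior**2 + s**2)


def lasso_mode(beta_hat, s, b):
    """Posterior mode for likelihood N(beta_hat | beta, s^2) and prior Laplace(0, b)."""
    # Soft-thresholding: move beta_hat toward zero by s^2 / b, stopping at zero.
    threshold = s**2 / b
    return math.copysign(max(abs(beta_hat) - threshold, 0.0), beta_hat)


for beta_hat in (0.02, 0.2, -0.2):
    print(beta_hat, ridge_mode(beta_hat, s=0.1, sigma_prior=0.1), lasso_mode(beta_hat, s=0.1, b=0.1))
```

With these arbitrary illustrative numbers ($s = \sigma_{prior} = b = 0.1$), the Gaussian prior halves every estimate, while the Laplace prior moves each estimate 0.1 toward zero and sets the small one to exactly zero.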


### Auto Ridge
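
Because $\sigma_\beta$ is itself random under the auto-ridge prior, the implied prior on $\beta$ has heavier tails than a Gaussian with a fixed scale. The sketch below draws from the hierarchy in the formula given earlier using only the standard library, sampling the Half-Cauchy by inverting the Cauchy CDF and taking the absolute value. It assumes that `ridge_scale` in the formula corresponds to the `auto_ridge_scale` argument used in the cell below; the helper `sample_auto_ridge_prior` is hypothetical.

```python
import math
import random


def sample_auto_ridge_prior(beta_prior, ridge_scale, n, seed=1):
    """Hypothetical helper: draw (sigma_beta, beta) pairs from
    sigma_beta ~ Half-Cauchy(0, ridge_scale), beta ~ Gaussian(beta_prior, sigma_beta)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        u = rng.random()
        # Inverse CDF of Cauchy(0, ridge_scale); the absolute value gives a Half-Cauchy draw.
        sigma_beta = abs(ridge_scale * math.tan(math.pi * (u - 0.5)))
        beta = rng.gauss(beta_prior, sigma_beta)
        draws.append((sigma_beta, beta))
    return draws


draws = sample_auto_ridge_prior(beta_prior=0.0, ridge_scale=0.5, n=20001)
sigmas = sorted(s for s, _ in draws)
# The median of Half-Cauchy(0, scale) equals scale, so this should be close to 0.5.
print("median sigma_beta:", sigmas[len(sigmas) // 2])
```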

```python
lgt_mod = LGTAggregated(
    response_col=RESPONSE_COL,
    date_col=DATE_COL,
    regressor_col=REGRESSOR_COL,
    regressor_sign=["+", "+", "="],
    seasonality=52,
    seed=1,
    regression_penalty="auto_ridge",
    auto_ridge_scale=0.5,
)
lgt_mod.fit(df=df)
```


```python
lgt_mod.get_regression_coefs()
```

| regressor | regressor_sign | coefficient |
|---|---|---|
| trend.unemploy | Positive | 0.044648 |
| trend.filling | Positive | 0.052548 |
| trend.job | Regular | -0.050318 |
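
To compare the three penalties side by side, the coefficient tables from the Fixed Ridge, Lasso, and Auto Ridge runs can be collected and measured against the fixed ridge baseline. The sketch below copies the reported values and computes the percentage change in each coefficient's magnitude. In these particular runs (single fits with `seed=1`), both lasso and auto ridge give smaller-magnitude coefficients than fixed ridge for all three regressors. The helper `magnitude_change_pct` is illustrative only.

```python
# Coefficients copied from the Fixed Ridge, Lasso, and Auto Ridge tables.
coefs = {
    "fixed_ridge": {"trend.unemploy": 0.053995, "trend.filling": 0.076244, "trend.job": -0.070255},
    "lasso": {"trend.unemploy": 0.049352, "trend.filling": 0.068068, "trend.job": -0.058273},
    "auto_ridge": {"trend.unemploy": 0.044648, "trend.filling": 0.052548, "trend.job": -0.050318},
}


def magnitude_change_pct(base, other):
    """Percent change in |coefficient| from `base` to `other`, per regressor."""
    return {name: 100.0 * (abs(other[name]) - abs(base[name])) / abs(base[name]) for name in base}


for penalty in ("lasso", "auto_ridge"):
    changes = magnitude_change_pct(coefs["fixed_ridge"], coefs[penalty])
    print(penalty, {name: round(pct, 1) for name, pct in changes.items()})
```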


## Adjust priors for regressor beta and regressor standard deviation

Under the default fixed ridge penalty, the model assumes that $$\beta \sim \text{Gaussian}(\beta_{prior}, \sigma_{prior})$$

The default values for $\beta_{prior}$ and $\sigma_{prior}$ are 0 and 1, respectively.

Users can adjust them via the arguments `regressor_beta_prior` and `regressor_sigma_prior`. Both lists must have the same length as `regressor_col`.
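
To build intuition for how these two arguments interact with the data, consider a toy conjugate setting (an assumption for illustration, not how Orbit estimates the model): if the data alone suggest $\hat\beta$ with standard error $s$, a prior $\text{Gaussian}(\beta_{prior}, \sigma_{prior})$ gives a posterior mean that is a precision-weighted average of $\hat\beta$ and $\beta_{prior}$. A smaller `regressor_sigma_prior` means a larger prior weight, pulling the estimate toward `regressor_beta_prior`.

```python
def posterior_mean(beta_hat, s, beta_prior, sigma_prior):
    """Toy conjugate update: likelihood N(beta_hat | beta, s^2), prior N(beta_prior, sigma_prior^2)."""
    w_data = 1.0 / s**2             # precision of the data-only estimate
    w_prior = 1.0 / sigma_prior**2  # precision of the prior
    return (w_data * beta_hat + w_prior * beta_prior) / (w_data + w_prior)


# Hypothetical data-only estimate, not taken from the iclaims fit.
beta_hat, s = 0.2, 0.1
print(posterior_mean(beta_hat, s, beta_prior=0.0, sigma_prior=1.0))   # default prior
print(posterior_mean(beta_hat, s, beta_prior=0.05, sigma_prior=0.1))  # tighter custom prior
```

With these numbers, the default prior (weight 1 versus 100 for the data) leaves the estimate near 0.198, while the tighter prior (equal weights) moves it to 0.125, halfway between the data estimate and the prior mean.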

```python
lgt_mod = LGTAggregated(
    response_col=RESPONSE_COL,
    date_col=DATE_COL,
    regressor_col=REGRESSOR_COL,
    regressor_sign=["+", "+", "="],
    regressor_beta_prior=[0.05, 0.05, 0],
    regressor_sigma_prior=[0.1, 0.1, 0.1],
    seasonality=52,
    seed=1,
)
```

```python
lgt_mod.regression_penalty
```

'fixed_ridge'

```python
lgt_mod.fit(df=df)
```


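After fitting, the coefficients can be retrieved with `lgt_mod.get_regression_coefs()`. Comparing them with the earlier estimates, one can notice significant changes in the estimated coefficients caused by the different priors.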