# Regression with Orbit

In this demo, we demonstrate how to use the arguments of the model classes (LGT or DLT) to configure regressors in different ways. These options are useful in practice when tuning models.

In [2]:
import pandas as pd
import numpy as np
from orbit.lgt import LGT
from orbit.utils.plot import plot_predicted_data
from orbit.utils.plot import plot_predicted_components

In [3]:
print(np.__version__)

1.18.2


## Simulate Regressions

In [20]:
SERIES_LEN = 300
CYCLE_LEN = 52

In [21]:
COEFS_MU, COEFS_SIGMA = 0.01, 0.03
NUM_COEFS = 10

In [22]:
# Draw the true coefficients from Normal(COEFS_MU, COEFS_SIGMA)
COEFS = np.random.default_rng().normal(COEFS_MU, COEFS_SIGMA, NUM_COEFS)

In [29]:
OBS_REG_PROB = .5
OBS_REG_LOG_LOC = np.log(100)
OBS_REG_LOG_SCALE = np.log(10)
OBS_LOG_SCALE = np.log(100)

In [34]:
# Log-normal regressor values, each switched on with probability OBS_REG_PROB
X = np.exp(np.random.default_rng().normal(
    OBS_REG_LOG_LOC, OBS_REG_LOG_SCALE, SERIES_LEN * NUM_COEFS)).reshape(SERIES_LEN, -1)
Z = np.random.default_rng().binomial(1, OBS_REG_PROB, SERIES_LEN * NUM_COEFS).reshape(SERIES_LEN, -1)
X = X * Z
NOISE = np.random.default_rng().normal(0, OBS_LOG_SCALE, SERIES_LEN)
Y = np.matmul(X, COEFS) + NOISE

In [41]:
print(X.shape, COEFS.shape, NOISE.shape, Y.shape) 

(300, 10) (10,) (300,) (300,)
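
The cells above draw from a fresh, unseeded generator on every call, so the simulated data change from run to run. As a hedged sketch (the seed value and the names `rng`, `coefs`, `coefs_hat`, and `max_abs_error` are illustrative assumptions, not part of the original notebook), the same simulation can be made reproducible with one seeded generator. An ordinary least-squares fit then gives a quick sanity check that the true coefficients are recoverable from the simulated data:

```python
import numpy as np

SERIES_LEN, NUM_COEFS = 300, 10
COEFS_MU, COEFS_SIGMA = 0.01, 0.03
OBS_REG_PROB = 0.5
OBS_REG_LOG_LOC, OBS_REG_LOG_SCALE = np.log(100), np.log(10)
OBS_LOG_SCALE = np.log(100)

# One seeded generator for every draw (the seed itself is an arbitrary choice).
rng = np.random.default_rng(2020)

coefs = rng.normal(COEFS_MU, COEFS_SIGMA, NUM_COEFS)
# Log-normal regressor values, each switched on with probability OBS_REG_PROB.
x = np.exp(rng.normal(OBS_REG_LOG_LOC, OBS_REG_LOG_SCALE, (SERIES_LEN, NUM_COEFS)))
x = x * rng.binomial(1, OBS_REG_PROB, (SERIES_LEN, NUM_COEFS))
noise = rng.normal(0, OBS_LOG_SCALE, SERIES_LEN)
y = x @ coefs + noise

# Least-squares fit without an intercept, matching how y was generated.
coefs_hat, *_ = np.linalg.lstsq(x, y, rcond=None)
max_abs_error = np.max(np.abs(coefs_hat - coefs))
print(x.shape, y.shape, max_abs_error)
```

Passing a shape tuple to the generator avoids the separate `reshape` step used in the cells above.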


## Simulate Random Walk

## Load data

In [5]:
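# NOTE: simulate_seasonal_term is neither defined nor imported in this notebook; it is assumed to exist in the session.
x = simulate_seasonal_term(periodicity=52, total_cycles=3, noise_std=.1)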

array([ -53.10050505,   70.7361681 ,    7.62218668,   80.02691934,
         -1.81383886,   44.9931839 ,   49.6354467 ,   62.20835997,
        -34.04515083,   78.40813572,  -73.58858178,  -63.60792255,
         62.48874301,  -27.03546529,  -13.56171563,   33.85175525,
        -40.02128493,   31.8835865 ,  -33.50208007,  -37.76527625,
         15.25614042,  114.43526237,  -38.82027586,  -91.83970707,
         77.52998744,   20.45353617,   58.14408148,   50.5528106 ,
        -31.21139065,    9.58691162,   35.4563511 ,   38.32069734,
         38.89042428,  -42.36190623,  -31.52009024,   -8.81342001,
        -29.5024374 ,    6.09869338,  -34.75919007,   -1.97169924,
        -52.32229734,    5.31323622,  -37.02783143,   21.19296602,
        -37.29383493,   66.0971052 ,   75.07076214,   12.84289229,
         46.09243304, -180.13246653, -172.06303607,  -42.94273364,
        -47.32622845,   73.17509569,    7.51016724,   76.75832875,
          1.17580032,   52.09741581,   43.6349905 ,   63.98079

In [2]:
DATA_FILE = "./data/iclaims.example.csv"

raw_df = pd.read_csv(DATA_FILE, parse_dates=['week'])

raw_df.dtypes

week              datetime64[ns]
claims                     int64
trend.unemploy           float64
trend.filling            float64
trend.job                float64
dtype: object

In [3]:
raw_df.head(5)

|   | week | claims | trend.unemploy | trend.filling | trend.job |
|---|------|--------|----------------|---------------|-----------|
| 0 | 2010-01-03 | 651215 | 1.183973 | 0.72014 | 1.119669 |
| 1 | 2010-01-10 | 825891 | 1.183973 | 0.814896 | 1.178599 |
| 2 | 2010-01-17 | 659173 | 1.203382 | 0.739091 | 1.119669 |
| 3 | 2010-01-24 | 507651 | 1.164564 | 0.814896 | 1.107883 |
| 4 | 2010-01-31 | 538617 | 1.086926 | 0.776993 | 1.072525 |


In [4]:
df = raw_df.copy()
# Hold out the last 52 weeks as the test set
test_size = 52
train_df = df[:-test_size]
test_df = df[-test_size:]

## Use regressors and specify their signs

The regressor columns are supplied via the argument `regressor_col`. Their signs are specified via `regressor_sign`, whose entries are either '=' (regular, no restriction) or '+' (positive). The two lists must have the same length. By default, every entry of `regressor_sign` is '='.
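
The length and value rules described above can be checked before a model is constructed. The following is a minimal sketch; `check_regressor_args` and `VALID_SIGNS` are hypothetical names introduced for illustration and are not part of Orbit:

```python
VALID_SIGNS = {"=", "+"}


def check_regressor_args(regressor_col, regressor_sign=None):
    """Return a sign list aligned with regressor_col, defaulting to '=' everywhere."""
    if regressor_sign is None:
        # Mirror the documented default: no sign restriction on any regressor.
        regressor_sign = ["="] * len(regressor_col)
    if len(regressor_sign) != len(regressor_col):
        raise ValueError(
            f"got {len(regressor_col)} regressors but {len(regressor_sign)} signs"
        )
    invalid = [sign for sign in regressor_sign if sign not in VALID_SIGNS]
    if invalid:
        raise ValueError(f"unsupported sign(s): {invalid}")
    return list(regressor_sign)


regressors = ["trend.unemploy", "trend.filling", "trend.job"]
print(check_regressor_args(regressors))                   # ['=', '=', '=']
print(check_regressor_args(regressors, ["+", "+", "="]))  # ['+', '+', '=']
```

Failing early with a clear message is usually easier to debug than a shape error raised deep inside model fitting.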

In [5]:
DATE_COL="week"
RESPONSE_COL="claims"
REGRESSOR_COL=['trend.unemploy', 'trend.filling', 'trend.job']

In [6]:
lgt_map=LGT(
    response_col=RESPONSE_COL,
    date_col=DATE_COL,
    regressor_col=REGRESSOR_COL,
    regressor_sign=["+", '+', '='], 
    seasonality=52,
    seed=1,
    predict_method='map',
    sample_method='map',
    is_multiplicative=True,
    auto_scale=False
)

In [7]:
lgt_map.fit(df=train_df)
_ = lgt_map.predict(df=test_df)

The estimated regressor coefficients can be retrieved via `get_regression_coefs`; the raw values are also stored in `.aggregated_posteriors`.

In [12]:
lgt_map.get_regression_coefs(aggregation_method='map')

|   | regressor | regressor_sign | coefficient |
|---|-----------|----------------|-------------|
| 0 | trend.unemploy | Positive | 0.041533 |
| 1 | trend.filling | Positive | 1.1e-05 |
| 2 | trend.job | Regular | -0.070758 |


## Adjust priors for regressor beta and regressor standard deviation

The model assumes $$\beta \sim \text{Gaussian}(\beta_{prior}, \sigma_{prior})$$

The default values for $\beta_{prior}$ and $\sigma_{prior}$ are 0 and 1, respectively.

Users can adjust them via the arguments `regressor_beta_prior` and `regressor_sigma_prior`. Both lists must have the same length as `regressor_col`.
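
To build intuition for how these two arguments act, consider a deliberately simplified setting: a single regressor, no trend or seasonality, and a known noise standard deviation. This is not Orbit's model, and `map_beta` and `noise_sd` are hypothetical names used only for this sketch. In that setting the posterior mode under a Gaussian prior has a closed form, a precision-weighted average of the data estimate and the prior mean:

```python
def map_beta(x, y, beta_prior=0.0, sigma_prior=1.0, noise_sd=1.0):
    """MAP slope for y = beta * x + noise with a Gaussian(beta_prior, sigma_prior) prior."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    data_precision = sxx / noise_sd ** 2
    prior_precision = 1.0 / sigma_prior ** 2
    # Setting the derivative of the log posterior to zero gives this ratio.
    return (sxy / noise_sd ** 2 + beta_prior * prior_precision) / (
        data_precision + prior_precision
    )


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]  # noiseless data with slope 2

# A tighter prior (smaller sigma_prior) pulls the estimate toward beta_prior = 0.
for sigma_prior in (10.0, 1.0, 0.1):
    print(sigma_prior, round(map_beta(x, y, sigma_prior=sigma_prior), 4))
```

With a wide prior the estimate stays close to the least-squares slope of 2, while a narrow prior shrinks it toward the prior mean.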

In [11]:
lgt_map=LGT(
    response_col=RESPONSE_COL,
    date_col=DATE_COL,
    regressor_col=REGRESSOR_COL,
    regressor_sign=["+", '+', '='], 
    regressor_beta_prior=[0.05, 0.05, 0],
    regressor_sigma_prior=[0.1, 0.1, 0.1],
    seasonality=52,
    seed=1,
    predict_method='map',
    is_multiplicative=True
)

In [12]:
lgt_map.fit(df=train_df)
_ = lgt_map.predict(df=test_df)

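Using different priors changes the estimated coefficients noticeably. Compared with the first fit, the positive-sign coefficients move from about (0.042, 1.1e-05) to (0.045, 0.068), and the `trend.job` coefficient moves from about -0.071 to -0.037.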

In [13]:
lgt_map.aggregated_posteriors['map']['pr_beta']

array([[0.04529929, 0.06795404]])

In [14]:
lgt_map.aggregated_posteriors['map']['rr_beta']

array([-0.03710713])

## Use a Cauchy prior on the regressor standard deviation instead of a fixed value

Instead of using fixed standard deviations for regressors, a hyperprior can be assigned to them, i.e.
$$\sigma_\beta \sim \text{Half-Cauchy}(\sigma_{\beta_{prior}}, \sigma_{sd})$$

This can be done by setting `fix_regression_coef_sd = 0`, and $\sigma_{sd}$ can be changed via `regressor_sigma_sd`.
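
To get a feel for what such a hyperprior implies, the sketch below draws samples by inverse-CDF sampling. It assumes that the two-parameter Half-Cauchy above means a Cauchy distribution with the given location and scale, truncated to positive values; that reading is an assumption and is not confirmed by the text. The location 1.0 is the default $\sigma_{prior}$ stated earlier, the scale 0.5 matches the `regressor_sigma_sd=0.5` used in this section, and `sample_truncated_cauchy` is a hypothetical helper:

```python
import math
import random
import statistics


def sample_truncated_cauchy(loc, scale, size, rng):
    """Draw from Cauchy(loc, scale) restricted to positive values via the inverse CDF."""
    # CDF of the untruncated Cauchy at zero; probability mass below it is excluded.
    cdf_at_zero = 0.5 + math.atan((0.0 - loc) / scale) / math.pi
    draws = []
    for _ in range(size):
        u = rng.uniform(cdf_at_zero, 1.0)
        draws.append(loc + scale * math.tan(math.pi * (u - 0.5)))
    return draws


rng = random.Random(0)  # fixed seed so the sketch is reproducible
draws = sample_truncated_cauchy(loc=1.0, scale=0.5, size=10_000, rng=rng)

median = statistics.median(draws)
share_above_3 = sum(d > 3.0 for d in draws) / len(draws)
print("median:", round(median, 3))
print("share above 3:", round(share_above_3, 3))
```

Most of the mass sits near the location, but the heavy tail leaves a non-negligible share of draws far above it, which is what lets individual regressors escape strong shrinkage.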

In [15]:
lgt_map=LGT(
    response_col=RESPONSE_COL,
    date_col=DATE_COL,
    regressor_col=REGRESSOR_COL,
    regressor_sign=["+", '+', '='],
    fix_regression_coef_sd=0,
    regressor_sigma_sd=0.5,
    seasonality=52,
    seed=1,
    predict_method='map',
    is_multiplicative=True
)

In [16]:
lgt_map.fit(df=train_df)
_ = lgt_map.predict(df=test_df)

In [17]:
lgt_map.aggregated_posteriors['map']['pr_beta']

array([[4.14571092e-02, 1.00000000e-15]])

In [18]:
lgt_map.aggregated_posteriors['map']['rr_beta']

array([-0.07121057])

## Set a maximum threshold for regressor coefficients

In some cases, users may want to tune the maximum threshold on the regressor coefficients. This is done via `regression_coef_max`, whose default value is 1.

In [19]:
lgt_map=LGT(
    response_col=RESPONSE_COL,
    date_col=DATE_COL,
    regressor_col=REGRESSOR_COL,
    regressor_sign=["+", '+', '='],
    regression_coef_max=0.03,
    seasonality=52,
    seed=1,
    predict_method='map',
    is_multiplicative=True
)


lgt_map.fit(df=train_df)
_ = lgt_map.predict(df=test_df)

The impact on the estimated coefficients is visible in the outputs: all three now have a magnitude just under the 0.03 threshold.

In [20]:
lgt_map.aggregated_posteriors['map']['pr_beta']

array([[0.02995537, 0.02953037]])

In [21]:
lgt_map.aggregated_posteriors['map']['rr_beta']

array([-0.02876044])
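
As a quick check, the values printed above can be compared against the threshold. Treating `regression_coef_max` as a cap on the absolute value is an inference from these outputs (the negative coefficient also stays within 0.03 in magnitude) and is not stated explicitly in the text:

```python
REGRESSION_COEF_MAX = 0.03

# MAP estimates copied from the outputs above.
pr_beta = [0.02995537, 0.02953037]  # positive-sign regressors
rr_beta = [-0.02876044]             # regular (unrestricted-sign) regressor

all_within = True
for name, values in (("pr_beta", pr_beta), ("rr_beta", rr_beta)):
    for value in values:
        within = abs(value) <= REGRESSION_COEF_MAX
        all_within = all_within and within
        print(f"{name}: {value:+.8f} within bound: {within}")

print("all within bound:", all_within)
```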