# The preps

This notebook shows why you should add a constant (intercept) column when fitting OLS with `statsmodels.api` (aka `sm`) or `statsmodels.formula.api` (aka `smf`).

The short answer is:

**Always add a constant to the design matrix before fitting the model.**

In [21]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [20]:
import numpy as np
import pandas as pd
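# import datetools to avoid a statsmodels deprecation warning
# (a workaround for the older pandas/statsmodels releases used here; it may be unnecessary or unavailable on newer ones)
from pandas.core import datetools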
import statsmodels.api as sm
import statsmodels.formula.api as smf
# The random seed is set in the data-generating cell below to get consistent random results.

# The right way of using sm or smf

## 1. the sm

Regressing the y generated below on X in Excel gives the following coefficients:

`Intercept       1.497761024
X Variable 1    0.012073045
X Variable 2    0.623936056`

In [102]:
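nobs = 25  # number of observations
np.random.seed(123)  # fix the seed to get consistent results
X = np.random.random((nobs, 2))
X_with_ones = sm.add_constant(X)  # prepend a column of ones for the intercept
beta = [1, .1, .5]  # true coefficients: intercept, X1, X2
# The noise is uniform on [0, 1) with mean 0.5, so the fitted intercept lands near 1.5 rather than 1
e = np.random.random(nobs)
y = np.dot(X_with_ones, beta) + e
# Fit with the constant column
results = sm.OLS(y, X_with_ones).fit()
print(results.summary())
# Fit without the constant column (the regression is forced through the origin)
results = sm.OLS(y, X).fit()
print(results.summary())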

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.247
Model:                            OLS   Adj. R-squared:                  0.179
Method:                 Least Squares   F-statistic:                     3.609
Date:                Sat, 24 Mar 2018   Prob (F-statistic):             0.0441
Time:                        23:01:20   Log-Likelihood:                -1.8015
No. Observations:                  25   AIC:                             9.603
Df Residuals:                      22   BIC:                             13.26
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.4978      0.178      8.427      0.0

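As the output of the cell above shows:

1. The fit with the constant matches the result from Excel.
2. The fit *WITHOUT* the constant is **WRONG**: with no column of ones, `sm.OLS` forces the regression through the origin, so the slope estimates have to absorb the missing intercept.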
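To make the difference concrete without relying on the summary tables, here is a small NumPy-only sketch that regenerates the same data and solves both least-squares problems with `np.linalg.lstsq`. The variable names (`beta_with`, `beta_without`, and so on) are chosen for this illustration only. It relies on two properties of least squares: with an intercept column the residuals average to zero, and the fit with the intercept can never have a larger residual sum of squares than the fit without it.

```python
import numpy as np

nobs = 25
np.random.seed(123)
X = np.random.random((nobs, 2))
e = np.random.random(nobs)  # uniform noise on [0, 1), mean 0.5
y = 1 + X @ np.array([0.1, 0.5]) + e

# Design matrix with an explicit column of ones for the intercept
X1 = np.column_stack([np.ones(nobs), X])

# Least-squares solutions with and without the intercept column
beta_with, *_ = np.linalg.lstsq(X1, y, rcond=None)
beta_without, *_ = np.linalg.lstsq(X, y, rcond=None)

resid_with = y - X1 @ beta_with
resid_without = y - X @ beta_without

print("coefficients with intercept:   ", beta_with)
print("coefficients without intercept:", beta_without)
print("mean residual with intercept:   ", resid_with.mean())
print("mean residual without intercept:", resid_without.mean())
print("SSR with intercept:   ", (resid_with ** 2).sum())
print("SSR without intercept:", (resid_without ** 2).sum())
```

Without the intercept, the two slopes are the only free parameters, so they are pushed away from their true values to compensate for the level of `y`.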

## 2. the smf

In [103]:
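# smf.OLS is called with arrays here, just like sm.OLS, so the constant must still be added by hand
# Fit with the constant column
results = smf.OLS(y, X_with_ones).fit()
print(results.summary())
# Fit without the constant column
results = smf.OLS(y, X).fit()
print(results.summary())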

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.247
Model:                            OLS   Adj. R-squared:                  0.179
Method:                 Least Squares   F-statistic:                     3.609
Date:                Sat, 24 Mar 2018   Prob (F-statistic):             0.0441
Time:                        23:04:49   Log-Likelihood:                -1.8015
No. Observations:                  25   AIC:                             9.603
Df Residuals:                      22   BIC:                             13.26
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.4978      0.178      8.427      0.0

As the output of the cell above shows, the result is consistent with the `sm` model.