# Chapter 3 Applied Labs

## Q11 (investigating the t-statistic)

In [1]:
import statsmodels.api as sm
import numpy as np


In [2]:
x = np.random.normal(size=100)
y = 2*x + np.random.normal(size=100)

### (a) regress y onto x, without an intercept.

In [3]:
model = sm.OLS(y, x)
results = model.fit()
print(results.summary())


                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.810
Model:                            OLS   Adj. R-squared:                  0.808
Method:                 Least Squares   F-statistic:                     420.9
Date:                Fri, 17 May 2019   Prob (F-statistic):           1.97e-37
Time:                        11:48:59   Log-Likelihood:                -136.43
No. Observations:                 100   AIC:                             274.9
Df Residuals:                      99   BIC:                             277.5
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             1.9839      0.097     20.515      0.0

Comment on the results:
- The estimated coefficient is 1.98 ± 0.10 (standard error), close to the true coefficient of 2.
- The t-statistic is 20.5 and the p-value is effectively 0, so the null hypothesis $\beta = 0$ can be rejected.
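The summary rounds the p-value, so it prints as zero. The sketch below recovers the exact two-sided p-value from the t-statistic. It assumes SciPy is installed, which this notebook does not otherwise import. With a single regressor the F-statistic equals $t^2$, so the F-test p-value should be the same number. That number is what the "Prob (F-statistic)" line of the summary reports.

```python
from scipy import stats

t_stat = 20.515196451178866   # t-statistic reported by the fit above
df_resid = 99                 # n - 1 residual degrees of freedom (no intercept, n = 100)

# two-sided p-value for H0: beta = 0
p_value = 2 * stats.t.sf(abs(t_stat), df=df_resid)

# with one regressor F = t^2, so the F-test gives the same p-value
p_from_f = stats.f.sf(t_stat**2, 1, df_resid)

print(p_value, p_from_f)
```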

### (b) regress x onto y, without an intercept

In [4]:
model = sm.OLS(x,y)
results = model.fit()
print(results.summary())


                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.810
Model:                            OLS   Adj. R-squared:                  0.808
Method:                 Least Squares   F-statistic:                     420.9
Date:                Fri, 17 May 2019   Prob (F-statistic):           1.97e-37
Time:                        11:48:59   Log-Likelihood:                -57.355
No. Observations:                 100   AIC:                             116.7
Df Residuals:                      99   BIC:                             119.3
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.4081      0.020     20.515      0.0

Comment:
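- The t-statistic and p-value are identical to those in (a), but the coefficient is now 0.41 ± 0.02.
- Naively inverting y = 2x suggests a true value of 0.5. Because the noise inflates the variance of y, the population slope for a no-intercept regression of x onto y is $E[xy]/E[y^2] = 2/(4+1) = 0.4$, and the estimate is consistent with that.
- statsmodels labels the variables `y` and `x1` by default when given plain arrays, so the summary header does not reflect the swap.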
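This quick simulation illustrates the point about the x-onto-y slope. It assumes the same model, with standard normal x and noise; the seed and sample size are arbitrary choices. With a large sample the slope should settle near 0.4 rather than 0.5. The code also checks that the two no-intercept slopes multiply to the uncentered $R^2$, which is what statsmodels reports when no constant is included.

```python
import numpy as np

# large simulated sample from y = 2x + e, with x and e ~ N(0, 1)
rng = np.random.default_rng(42)
n = 1_000_000
x_sim = rng.normal(size=n)
y_sim = 2 * x_sim + rng.normal(size=n)

# no-intercept least-squares slopes in both directions
b_yx = np.sum(x_sim * y_sim) / np.sum(x_sim**2)   # y onto x, should approach 2
b_xy = np.sum(x_sim * y_sim) / np.sum(y_sim**2)   # x onto y, should approach 2/5 = 0.4

# uncentered R^2: (sum xy)^2 / (sum x^2 * sum y^2)
r2 = np.sum(x_sim * y_sim) ** 2 / (np.sum(x_sim**2) * np.sum(y_sim**2))

print(b_yx, b_xy)
print(np.isclose(b_yx * b_xy, r2))   # the slopes are not reciprocals; their product is R^2
```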

### (d) For the regression of y onto x without an intercept, the t-statistic for $H_0: \beta = 0$ takes the form $\hat{\beta}/SE(\hat{\beta})$, where:

$$ SE(\hat{\beta}) = \sqrt{ \frac{ \sum_i (y_i - x_i \hat{\beta})^2 }{(n-1)\sum_i x_i^2 } } $$

and:

$$ \hat{\beta} = \frac{\sum_i x_i y_i}{\sum_i x_i^2} $$

Show algebraically that the t-statistic can be written as follows, and confirm the result numerically.


$$ t = \frac{\sqrt{n-1}\,\sum_i x_i y_i}{\sqrt{\sum_i x_i^2 \sum_i y_i^2 - \left(\sum_i x_i y_i\right)^2}} $$
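One way to reach this form is to expand the residual sum of squares and substitute $\hat{\beta} = \sum_i x_i y_i / \sum_i x_i^2$. The squared $\sum_i x_i^2$ in $SE(\hat{\beta})^2$ then cancels against the denominator of $\hat{\beta}$:

$$
\begin{aligned}
\sum_i (y_i - x_i\hat{\beta})^2 &= \sum_i y_i^2 - 2\hat{\beta}\sum_i x_i y_i + \hat{\beta}^2 \sum_i x_i^2 = \sum_i y_i^2 - \frac{\left(\sum_i x_i y_i\right)^2}{\sum_i x_i^2}, \\
SE(\hat{\beta})^2 &= \frac{\sum_i x_i^2 \sum_i y_i^2 - \left(\sum_i x_i y_i\right)^2}{(n-1)\left(\sum_i x_i^2\right)^2}, \\
t = \frac{\hat{\beta}}{SE(\hat{\beta})} &= \frac{\sum_i x_i y_i}{\sum_i x_i^2} \cdot \frac{\sqrt{n-1}\,\sum_i x_i^2}{\sqrt{\sum_i x_i^2 \sum_i y_i^2 - \left(\sum_i x_i y_i\right)^2}} = \frac{\sqrt{n-1}\,\sum_i x_i y_i}{\sqrt{\sum_i x_i^2 \sum_i y_i^2 - \left(\sum_i x_i y_i\right)^2}}.
\end{aligned}
$$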

In [5]:
# evaluate the closed-form expression for t numerically
n = len(x)
t = np.sqrt(n - 1) * np.sum(x * y) / np.sqrt(np.sum(x**2) * np.sum(y**2) - np.sum(x * y)**2)

print(t)

20.515196451178866


This agrees with the value computed by statsmodels (20.515 in both summaries). The expression is symmetric in x and y, so swapping them leaves t unchanged: the t-statistic for regressing y onto x must equal the one for regressing x onto y.
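The symmetry can also be checked without the closed form. The sketch below builds t directly from the definitions of $\hat{\beta}$ and $SE(\hat{\beta})$ in (d), then swaps the arguments. The helper name `t_no_intercept` is illustrative. The data are freshly simulated with an arbitrary seed under new names, so the notebook's `x` and `y` are left untouched.

```python
import numpy as np

def t_no_intercept(x, y):
    """t-statistic for H0: beta = 0 in y = beta*x + e (no intercept),
    computed from the definitions of beta-hat and SE(beta-hat)."""
    n = len(x)
    beta_hat = np.sum(x * y) / np.sum(x**2)
    rss = np.sum((y - x * beta_hat) ** 2)
    se = np.sqrt(rss / ((n - 1) * np.sum(x**2)))
    return beta_hat / se

# fresh data from the same model, seeded so the check is repeatable
rng = np.random.default_rng(0)
x_sim = rng.normal(size=100)
y_sim = 2 * x_sim + rng.normal(size=100)

t_yx = t_no_intercept(x_sim, y_sim)   # regress y onto x
t_xy = t_no_intercept(y_sim, x_sim)   # regress x onto y
print(t_yx, t_xy, np.isclose(t_yx, t_xy))
```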

### (f) Show that when the regression is performed *with* an intercept, the t-statistic for $\beta_1$ is the same for y onto x and for x onto y.

In [6]:
# y onto x, with an intercept: tvalues[0] is the constant, tvalues[1] the slope
model = sm.OLS(y, sm.add_constant(x))
results = model.fit()
t_B1_1 = results.tvalues[1]

# x onto y, with an intercept
model = sm.OLS(x, sm.add_constant(y))
results = model.fit()
t_B1_2 = results.tvalues[1]

# exact equality is unreliable for floats; np.isclose uses rtol=1e-05 and atol=1e-08 by default
print(np.isclose(t_B1_1, t_B1_2))

True
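The equality is not a coincidence. With an intercept, the slope t-statistic depends on the data only through $n$ and the sample correlation $r$: $t = r\sqrt{n-2}/\sqrt{1-r^2}$. This expression is symmetric in x and y. The NumPy-only sketch below checks this against the usual OLS formulas for the slope and its standard error. It uses freshly simulated data with an arbitrary seed, and its helper names are illustrative.

```python
import numpy as np

def slope_t_with_intercept(x, y):
    """t-statistic for b1 in y = b0 + b1*x + e, from the usual OLS formulas."""
    n = len(x)
    xc = x - x.mean()
    yc = y - y.mean()
    b1 = np.sum(xc * yc) / np.sum(xc**2)
    resid = yc - b1 * xc                   # equals y - b0 - b1*x, since b0 = ybar - b1*xbar
    sigma2 = np.sum(resid**2) / (n - 2)    # two fitted parameters
    se_b1 = np.sqrt(sigma2 / np.sum(xc**2))
    return b1 / se_b1

def t_from_correlation(x, y):
    """Same statistic written in terms of the sample correlation r."""
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    return r * np.sqrt(n - 2) / np.sqrt(1 - r**2)

rng = np.random.default_rng(1)
x_sim = rng.normal(size=100)
y_sim = 2 * x_sim + rng.normal(size=100)

t_yx = slope_t_with_intercept(x_sim, y_sim)
t_xy = slope_t_with_intercept(y_sim, x_sim)
t_r = t_from_correlation(x_sim, y_sim)
print(t_yx, t_xy, t_r)
```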
