# Least Squares - Inference

In [1]:
%%capture
import stata_setup, os
if os.name == 'nt':
    stata_setup.config('C:/Program Files/Stata17/','mp')
else:
    stata_setup.config('/usr/local/stata17','mp')

We load the data, rename the outcome variable, generate indicator variables for ```year``` and ```cluster```, and define two Stata local macros, ```journals``` and ```jel_imp```, which collect the relevant journal and JEL indicators.

In [2]:
%%stata -qui

use "../data/data", clear
rename log_flesch_kincaid_grade_level FKG
quietly tabulate year, generate(y_)
quietly tabulate cluster, generate(c_)

local journals  ecm jpe qje res  // AER is the base category

local jel_imp a_imp b_imp c_imp  e_imp f_imp g_imp h_imp i_imp j_imp k_imp /// 
		l_imp m_imp n_imp o_imp p_imp q_imp r_imp y_imp z_imp // JEL code D is the base category




We run the OLS regression of $\mathbf{Y}$ on $\mathbf{X}$ in ```Stata``` with standard errors clustered on ```cluster```, then save a sub-vector of the estimated $\widehat{\beta}$ and the corresponding submatrix of $\widehat{V}_{\beta}$. Row and column names are not always carried over to the extracted matrices, so they are reassigned for the covariance submatrix:

In [3]:
%%stata -qui
#delimit ;
reg FKG log_num_authors log_num_pages both_genders prop_women
			`journals' `jel_imp' y_2-y_20  c_2-c_215  jel_flag, vce(cluster cluster);
matrix b = e(b)[1,"log_num_authors"],e(b)[1,"log_num_pages"],
                e(b)[1,"both_genders"],e(b)[1,"prop_women"],e(b)[1,"_cons"];
matrix V = (e(V)[1,1], e(V)[1,2], e(V)[1,3], e(V)[1,4], e(V)[1,262] \ 
            e(V)[2,1], e(V)[2,2], e(V)[2,3], e(V)[2,4], e(V)[2,262] \ 
            e(V)[3,1], e(V)[3,2], e(V)[3,3], e(V)[3,4], e(V)[3,262] \ 
            e(V)[4,1], e(V)[4,2], e(V)[4,3], e(V)[4,4], e(V)[4,262] \ 
            e(V)[262,1], e(V)[262,2], e(V)[262,3], e(V)[262,4], e(V)[262,262]);
matrix rownames V = log_num_authors log_num_pages both_genders prop_women _cons;
matrix colnames V = log_num_authors log_num_pages both_genders prop_women _cons;
#delimit cr
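The hard-coded index 262 in the cell above is the position of `_cons` in `e(b)` and `e(V)`. As a sanity check, the count can be reproduced in a short Python sketch. It assumes that Stata stores the coefficients in the order in which the regressors appear in the `reg` command, with the constant last; the helper `submatrix` is hypothetical and only illustrates the row and column selection that the Stata code does by hand.

```python
# Regressors in the order they appear in the `reg` command.
main = ["log_num_authors", "log_num_pages", "both_genders", "prop_women"]
journals = ["ecm", "jpe", "qje", "res"]
jel_imp = [f"{c}_imp" for c in "abcefghijklmnopqryz"]  # no d_imp in the list
years = [f"y_{i}" for i in range(2, 21)]      # y_2 ... y_20
clusters = [f"c_{i}" for i in range(2, 216)]  # c_2 ... c_215

names = main + journals + jel_imp + years + clusters + ["jel_flag", "_cons"]

# 1-based position of the constant, as used in e(V)[262, ...].
cons_position = names.index("_cons") + 1
print(cons_position)  # 262


def submatrix(matrix, positions):
    """Select rows and columns of a square matrix by 1-based positions.

    Hypothetical helper, not part of Stata or of the original notebook.
    """
    idx = [p - 1 for p in positions]
    return [[matrix[i][j] for j in idx] for i in idx]


# Toy 3x3 symmetric matrix: keep rows/columns 1 and 3.
toy = [[1, 2, 3], [2, 4, 5], [3, 5, 6]]
print(submatrix(toy, [1, 3]))  # [[1, 3], [3, 6]]
```

If the regressor list changes, the position of `_cons` changes with it, which is why selecting by name (as done for `b`) is less fragile than selecting by number.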




In [4]:
%stata matrix list b
%stata matrix list V


b[1,5]
    log_num_au~s  log_num_pa~s  both_genders    prop_women         _cons
y1    -.00397377     .01915903     .00059809    -.01889331     2.7023992

symmetric V[5,5]
              log_num_au~s  log_num_pa~s  both_genders    prop_women
log_num_au~s     9.062e-06
log_num_pa~s    -8.521e-06     .00002404
both_genders     2.477e-06    -6.824e-06     .00001387
  prop_women     .00001121    -.00001846     4.847e-06     .00003053
       _cons     7.022e-06    -.00003725    -.00002375     5.047e-06

                     _cons
       _cons     .00025911


## t-Statistics \& _p_-Values

We post the extracted coefficient vector and covariance matrix as estimation results and print them for this subset of coefficients, with 90% confidence intervals:

In [5]:
%stata ereturn post b V
%stata ereturn display, l(90)

------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [90% conf. interval]
-------------+----------------------------------------------------------------
log_num_au~s |  -.0039738   .0030103    -1.32   0.187    -.0089253    .0009778
log_num_pa~s |    .019159   .0049032     3.91   0.000     .0110941     .027224
both_genders |   .0005981   .0037246     0.16   0.872    -.0055284    .0067246
  prop_women |  -.0188933   .0055253    -3.42   0.001    -.0279816    -.009805
       _cons |   2.702399    .016097   167.88   0.000     2.675922    2.728876
------------------------------------------------------------------------------
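The standard errors in this table are the square roots of the diagonal entries of $\widehat{V}_{\beta}$. A minimal Python sketch of that relationship follows; it uses the rounded variances printed by `matrix list V`, so the results should agree with the table only up to rounding.

```python
import math

# Diagonal of V as printed by `matrix list V` (rounded display values).
variances = {
    "log_num_authors": 9.062e-06,
    "log_num_pages": 0.00002404,
    "both_genders": 0.00001387,
    "prop_women": 0.00003053,
    "_cons": 0.00025911,
}

# Standard error = square root of the corresponding variance.
std_errors = {name: math.sqrt(v) for name, v in variances.items()}

for name, se in std_errors.items():
    print(f"{name:>16s}  {se:.5f}")
```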


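We test the null hypothesis $\mathbb{H}_0: \beta_{\text{prop\_women}}=0$ against the two-sided alternative $\mathbb{H}_1:\beta_{\text{prop\_women}}\neq 0$, using the statistic $T=\widehat{\beta}_{\text{prop\_women}} / s\left(\widehat{\beta}_{\text{prop\_women}}\right)$ and the standard normal distribution to compute the _p_-value: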

In [6]:
%%stata
capture scalar drop T
scalar T = _b[prop_women]/_se[prop_women]
di _n "T(prop_women) = " T
di _n "Prob > |T| = " 2*(1-normal(abs(T)))


. capture scalar drop T

. scalar T = _b[prop_women]/_se[prop_women]

. di _n "T(prop_women) = " T

T(prop_women) = -3.4194288

. di _n "Prob > |T| = " 2*(1-normal(abs(T)))

Prob > |T| = .00062753

. 
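The calculation can be cross-checked outside Stata. The sketch below is a minimal Python version that assumes the rounded coefficient and standard error from the table above as inputs, so its results match the Stata output only up to rounding; `NormalDist` comes from Python's standard library.

```python
from statistics import NormalDist

b = -0.0188933   # coefficient on prop_women (rounded, from the table)
se = 0.0055253   # its clustered standard error (rounded, from the table)

# t-statistic for H0: beta = 0.
T = b / se

# Two-sided p-value under the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(T)))

print(f"T(prop_women) = {T:.4f}")
print(f"Prob > |T|    = {p_value:.6f}")
```

Since the _p_-value is far below 0.01, the null hypothesis is rejected at the usual significance levels.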


## Confidence Interval

We re-run the regression and print the estimation results for the same subset of coefficients:

In [7]:
%%stata -qui
#delimit ;
reg FKG log_num_authors log_num_pages both_genders prop_women
			`journals' `jel_imp' y_2-y_20  c_2-c_215  jel_flag, vce(cluster cluster);
matrix b = e(b)[1,"log_num_authors"],e(b)[1,"log_num_pages"],
                e(b)[1,"both_genders"],e(b)[1,"prop_women"],e(b)[1,"_cons"];
matrix V = (e(V)[1,1], e(V)[1,2], e(V)[1,3], e(V)[1,4], e(V)[1,262] \ 
            e(V)[2,1], e(V)[2,2], e(V)[2,3], e(V)[2,4], e(V)[2,262] \ 
            e(V)[3,1], e(V)[3,2], e(V)[3,3], e(V)[3,4], e(V)[3,262] \ 
            e(V)[4,1], e(V)[4,2], e(V)[4,3], e(V)[4,4], e(V)[4,262] \ 
            e(V)[262,1], e(V)[262,2], e(V)[262,3], e(V)[262,4], e(V)[262,262]);
matrix rownames V = log_num_authors log_num_pages both_genders prop_women _cons;
matrix colnames V = log_num_authors log_num_pages both_genders prop_women _cons;
#delimit cr




In [8]:
%stata ereturn post b V
%stata ereturn display, l(90)

------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [90% conf. interval]
-------------+----------------------------------------------------------------
log_num_au~s |  -.0039738   .0030103    -1.32   0.187    -.0089253    .0009778
log_num_pa~s |    .019159   .0049032     3.91   0.000     .0110941     .027224
both_genders |   .0005981   .0037246     0.16   0.872    -.0055284    .0067246
  prop_women |  -.0188933   .0055253    -3.42   0.001    -.0279816    -.009805
       _cons |   2.702399    .016097   167.88   0.000     2.675922    2.728876
------------------------------------------------------------------------------


We compute the $90\%$ confidence interval (that is, $(1-\alpha) \times 100\%$ with $\alpha=0.10$) for $\beta_{\text{prop\_women}}$ manually as $\widehat{C}=\left[\widehat{\beta}_{\text{prop\_women}}-c_\alpha \cdot s\left(\widehat{\beta}_{\text{prop\_women}}\right), \widehat{\beta}_{\text{prop\_women}}+c_\alpha \cdot s\left(\widehat{\beta}_{\text{prop\_women}}\right)\right]$, where $c_\alpha=F^{-1}(1-\alpha / 2)$ and $F(\cdot)$ is the cumulative distribution function of the standard normal distribution. The code obtains the lower bound by adding $F^{-1}(\alpha/2) \cdot s\left(\widehat{\beta}_{\text{prop\_women}}\right)$, which is equivalent because $F^{-1}(\alpha/2)=-c_\alpha$.

In [9]:
%%stata
scalar c_min=_b[prop_women] + invnormal(0.05)*_se[prop_women]
scalar c_max=_b[prop_women] + invnormal(0.95)*_se[prop_women]
display _n "90% C.I. for b[prop_women]: (" c_min ", " c_max ")"


. scalar c_min=_b[prop_women] + invnormal(0.05)*_se[prop_women]

. scalar c_max=_b[prop_women] + invnormal(0.95)*_se[prop_women]

. display _n "90% C.I. for b[prop_women]: (" c_min ", " c_max ")"

90% C.I. for b[prop_women]: (-.0279816, -.00980503)

. 
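The interval can also be reproduced outside Stata. The following Python sketch assumes the rounded coefficient and standard error from the table as inputs, so the bounds agree with the Stata output only up to rounding.

```python
from statistics import NormalDist

alpha = 0.10     # 90% confidence level
b = -0.0188933   # coefficient on prop_women (rounded, from the table)
se = 0.0055253   # its clustered standard error (rounded, from the table)

# Critical value c_alpha = F^{-1}(1 - alpha/2) of the standard normal.
c_alpha = NormalDist().inv_cdf(1 - alpha / 2)

c_min = b - c_alpha * se
c_max = b + c_alpha * se

print(f"c_alpha = {c_alpha:.4f}")
print(f"90% C.I. for b[prop_women]: ({c_min:.7f}, {c_max:.7f})")
```

The interval excludes zero, consistent with the rejection of $\mathbb{H}_0$ at the 10% level.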
