# Setting up PyStata

In [None]:
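import sys
# Make the pystata package that ships with Stata importable
# (default install location on macOS; adjust the path on other systems)
sys.path.append('/Applications/Stata/utilities')
from pystata import config
# Start Stata/SE in this Python session; in Jupyter this also enables the %%stata magic
config.init('se')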

## Introduction

This is a Jupyter notebook for Session 1 of DRE 7006. Let us first check that Stata runs from the notebook:

In [None]:
%%stata
display 2+2

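That was not very Stata-like. Here is something more typical: draw 100 observations from a uniform distribution and plot their histogram.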

In [None]:
%%stata
clear
set obs 100
gen x = runiform()
histogram x

You can tell that this is Stata from the beautiful colors!

# The central limit theorem and an introduction to `simulate`

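If you sum or average $n$ independent draws from the same distribution with finite variance, the distribution of the sample mean approaches a normal distribution as $n$ grows (after centering and scaling by $\sqrt{n}$, this is the central limit theorem). The program `makemean` below draws `n` observations from a uniform distribution and runs `summarize`, which stores the sample mean in `r(mean)`; `simulate` then repeats the program many times and collects `r(mean)` from every repetition.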
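For uniform draws the relevant moments can be computed by hand, which gives a benchmark for the `sum` output of the `simulate` runs below. These are standard results for i.i.d. $U(0,1)$ draws:

$$
\begin{aligned}
E[X_i] &= \tfrac{1}{2}, \qquad \operatorname{Var}(X_i) = \tfrac{1}{12}, \qquad X_i \sim U(0,1) \text{ i.i.d.} \\
E[\bar X_n] &= \tfrac{1}{2}, \qquad \operatorname{Var}(\bar X_n) = \frac{1}{12n} \\
\sqrt{n}\,\bigl(\bar X_n - \tfrac{1}{2}\bigr) &\xrightarrow{d} N\bigl(0, \tfrac{1}{12}\bigr) \\
\operatorname{sd}(\bar X_1) &= \sqrt{1/12} \approx 0.289, \qquad \operatorname{sd}(\bar X_{30}) = \sqrt{1/360} \approx 0.053
\end{aligned}
$$

With $n = 1$ the histogram simply reproduces the flat uniform density; with $n = 30$ it should look close to a bell curve centered at 0.5.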

In [None]:
%%stata
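program makemean
    * Draw n observations from U(0,1) and summarize them;
    * summarize leaves the sample mean in r(mean)
    args n
    drop _all
    set obs `n'
    gen x = runiform()
    summarize x
end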

In [None]:
%%stata
clear
makemean 5

In [None]:
%%stata
clear
simulate r(mean), reps(10000) nodots: makemean 1
sum
histogram _sim_1


In [None]:
%%stata
clear
simulate r(mean), reps(10000) nodots: makemean 30
sum
histogram _sim_1
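The two `simulate` runs above can be mirrored in plain Python as a cross-check. This is a minimal sketch using only the standard library; `make_mean`, `simulate_means`, and `share_within` are hypothetical helpers, not part of PyStata. Besides the mean and standard deviation of the simulated means, it reports the share of them lying within 1.96 theoretical standard deviations of 0.5, which should be close to 0.95 once the normal approximation is good:

```python
import random
import statistics


def make_mean(n, rng):
    """Hypothetical analogue of the Stata program makemean:
    draw n values from U(0,1) and return their sample mean."""
    return statistics.fmean(rng.random() for _ in range(n))


def simulate_means(n, reps=10_000, seed=7006):
    """Collect `reps` sample means, like `simulate r(mean), reps(10000)`."""
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    return [make_mean(n, rng) for _ in range(reps)]


def share_within(means, n, z=1.96):
    """Share of sample means within z theoretical sds of 1/2 (sd = sqrt(1/(12n)))."""
    sd = (1 / (12 * n)) ** 0.5
    return sum(abs(m - 0.5) <= z * sd for m in means) / len(means)


for n in (1, 30):
    means = simulate_means(n)
    print(f"n={n:2d}: mean={statistics.fmean(means):.4f}, "
          f"sd={statistics.stdev(means):.4f}, "
          f"theory sd={(1 / (12 * n)) ** 0.5:.4f}, "
          f"share within 1.96 sd={share_within(means, n):.3f}")
```

For $n = 1$ every draw lies within 0.5 of the center, so the share is 1 rather than 0.95: a single uniform draw is not normal. For $n = 30$ the share should be close to 0.95.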

In [None]:
%%stata
program drop makemean

# Assessing standard errors

We define two programs that each generate a sample of 942 observations and run a linear regression of y on x and z. In the first, `linregsim`, the error term is homoskedastic; in the second, `linregsim2`, it is heteroskedastic.

In [None]:
%%stata
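capture program drop linregsim
program define linregsim
    * One sample of 942 observations with homoskedastic N(0,1) errors
    drop _all
    set obs 942
    gen z = exp(rnormal())
    gen x = rnormal()
    gen epsi = rnormal()
    gen y = 1 + 2*x + 3*z + epsi
    regress y x z
end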


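Stata will not redefine a program that already exists. To change `linregsim` and rerun its definition, first remove the old version with `program drop linregsim`. Alternatively, start the definition cell with `capture program drop linregsim`: it drops any existing version and, because of `capture`, does not stop with an error when no such program exists yet. Either way, the program must still be defined when we run it below.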

In [None]:
%%stata
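capture program drop linregsim2
program define linregsim2
    * Same design, but the error is (1 + x^2)*epsi, so its variance depends on x
    drop _all
    set obs 942
    gen z = exp(rnormal())
    gen x = rnormal()
    gen epsi = rnormal()
    gen y = 1 + 2*x + 3*z + (1 + x^2)*epsi
    regress y x z
end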

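In `linregsim2` the error term is $u = \varepsilon + x^2\varepsilon = (1+x^2)\,\varepsilon$, where $\varepsilon$ (`epsi` in the code) is standard normal and independent of $x$. Its conditional mean $E[u \mid x] = 0$ does not depend on $x$, so OLS is still unbiased, but its conditional variance $\operatorname{Var}(u \mid x) = (1+x^2)^2$ does: the errors are heteroskedastic.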
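Because $x$ and $\varepsilon$ are independent standard normals, the variance of the composite error can be worked out by hand. Averaging the conditional variance over $x$ (using $E[x^2] = 1$ and $E[x^4] = 3$ for a standard normal) shows that the error variance is on average six times larger than in `linregsim`:

$$
\begin{aligned}
\operatorname{Var}(u \mid x) &= (1+x^2)^2 \operatorname{Var}(\varepsilon) = (1+x^2)^2 \\
\operatorname{Var}(u) &= E\bigl[\operatorname{Var}(u \mid x)\bigr] + \operatorname{Var}\bigl(E[u \mid x]\bigr) = E\bigl[(1+x^2)^2\bigr] + 0 \\
&= 1 + 2E[x^2] + E[x^4] = 1 + 2 + 3 = 6
\end{aligned}
$$

At $x = 0$ the error variance is 1, while at $x = \pm 2$ it is $(1+4)^2 = 25$.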


Once the programs are defined, we can run them. Each call draws a new sample and prints the regression output:

In [None]:
%%stata
clear
linregsim

# Simulation

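We can draw 1,000 samples, run the regression on each, and look at the dispersion of the estimated coefficients across samples. This repeated-sampling thought experiment is what we have in mind when we talk about "standard errors" and statistical inference in general: the standard error of a coefficient is an estimate of the standard deviation of that coefficient across such repeated samples. `simulate _b` saves the coefficients from every run, and `summarize` reports their mean and standard deviation.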
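The same thought experiment can be sketched in plain Python. To keep it short, the sketch makes simplifying assumptions that differ from the Stata cells: a single regressor (no `z`), $n = 500$ instead of 942, and 400 replications. `draw_sample`, `ols_slope`, and `monte_carlo` are hypothetical helpers. For each design it compares the standard deviation of the slope estimates across samples with the average conventional standard error that a single regression would report:

```python
import math
import random
import statistics


def draw_sample(n, rng, heteroskedastic):
    """Hypothetical one-regressor version of linregsim / linregsim2:
    y = 1 + 2x + u, with u = eps (homoskedastic) or u = (1 + x^2) * eps."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    eps = [rng.gauss(0, 1) for _ in range(n)]
    u = [(1 + xi * xi) * e if heteroskedastic else e for xi, e in zip(x, eps)]
    y = [1 + 2 * xi + ui for xi, ui in zip(x, u)]
    return x, y


def ols_slope(x, y):
    """OLS of y on a constant and x: returns the slope and its conventional SE,
    sqrt(s^2 / Sxx), which assumes homoskedastic errors."""
    n = len(x)
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    s2 = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return b, math.sqrt(s2 / sxx)


def monte_carlo(heteroskedastic, n=500, reps=400, seed=7006):
    """Repeat draw-and-estimate `reps` times; return the mean and sd of the
    slope estimates and the average conventional SE."""
    rng = random.Random(seed)
    slopes, ses = [], []
    for _ in range(reps):
        b, se = ols_slope(*draw_sample(n, rng, heteroskedastic))
        slopes.append(b)
        ses.append(se)
    return statistics.fmean(slopes), statistics.stdev(slopes), statistics.fmean(ses)


for het in (False, True):
    mean_b, sd_b, avg_se = monte_carlo(het)
    print(f"heteroskedastic={het}: mean slope={mean_b:.3f}, "
          f"sd of slopes={sd_b:.4f}, average conventional SE={avg_se:.4f}")
```

Under homoskedasticity the two numbers should roughly agree. With the error $(1+x^2)\varepsilon$, a back-of-the-envelope calculation gives $\operatorname{Var}(\hat b) \approx E[x^2(1+x^2)^2]/n = 22/n$, while the conventional formula targets roughly $6/n$, so in this sketch the conventional standard error should understate the true dispersion by a factor of about $\sqrt{22/6} \approx 1.9$.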

In [None]:
%%stata
simulate _b, reps(1000): linregsim
summarize

In [None]:
%%stata
linregsim
regress y x z, robust
regress y x z, vce(bootstrap, reps(1000))
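The three regressions in this cell produce the same coefficient estimates $\hat\beta = (X'X)^{-1}X'y$ and differ only in how they estimate its variance. With $k$ regressors (including the constant), residuals $e_i$, and rows $x_i'$ of $X$, the conventional formula assumes a constant error variance, while the heteroskedasticity-robust (sandwich) formula reported by `regress ..., robust`, including its small-sample factor $n/(n-k)$, does not. The bootstrap standard error is the usual one: the spread of the estimates across $R$ resamples of the data:

$$
\begin{aligned}
\widehat{\operatorname{Var}}_{\text{conv}}(\hat\beta) &= s^2 (X'X)^{-1}, \qquad s^2 = \frac{1}{n-k}\sum_{i=1}^n e_i^2 \\
\widehat{\operatorname{Var}}_{\text{robust}}(\hat\beta) &= \frac{n}{n-k}\,(X'X)^{-1}\Bigl(\sum_{i=1}^n e_i^2\, x_i x_i'\Bigr)(X'X)^{-1} \\
\widehat{\operatorname{Var}}_{\text{boot}}(\hat\beta) &= \frac{1}{R-1}\sum_{r=1}^{R}\bigl(\hat\beta^{*}_r - \bar\beta^{*}\bigr)\bigl(\hat\beta^{*}_r - \bar\beta^{*}\bigr)'
\end{aligned}
$$

Here $\hat\beta^{*}_r$ is the estimate from the $r$-th bootstrap sample of $n$ observations drawn with replacement, $\bar\beta^{*}$ is their average, and $R$ is the number set by `reps(1000)`. Under homoskedasticity all three should give similar standard errors in large samples.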

In [None]:
%%stata
simulate _b, reps(1000): linregsim2
summarize

In [None]:
%%stata
linregsim2
regress y x z, robust
regress y x z, vce(bootstrap, reps(1000))
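A one-regressor Python sketch of the last cell, using only the standard library: one heteroskedastic sample of $n = 2000$ in the spirit of `linregsim2` but without `z` (a simplifying assumption), with the conventional, robust (HC1), and pairs-bootstrap standard errors of the slope. `slope_and_ses` and `bootstrap_se` are hypothetical helpers; the sketch illustrates the formulas, not Stata's own implementation:

```python
import math
import random
import statistics


def slope_and_ses(x, y):
    """OLS of y on a constant and x. Returns the slope, its conventional SE,
    and its robust SE with the HC1 factor n/(n - k), k = 2."""
    n = len(x)
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    dx = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dx)
    b = sum(d * (yi - ybar) for d, yi in zip(dx, y)) / sxx
    a = ybar - b * xbar
    e = [yi - a - b * xi for xi, yi in zip(x, y)]
    conventional = math.sqrt(sum(ei * ei for ei in e) / (n - 2) / sxx)
    # Sandwich for the slope: sum(dx_i^2 * e_i^2) / Sxx^2, scaled by n/(n-2)
    robust = math.sqrt(n / (n - 2) * sum((d * ei) ** 2 for d, ei in zip(dx, e))) / sxx
    return b, conventional, robust


def bootstrap_se(x, y, reps=200, seed=1):
    """Pairs bootstrap: resample (x_i, y_i) rows with replacement, re-estimate
    the slope, and return the standard deviation of the re-estimates."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        b, _, _ = slope_and_ses([x[i] for i in idx], [y[i] for i in idx])
        slopes.append(b)
    return statistics.stdev(slopes)


# One heteroskedastic sample: y = 1 + 2x + (1 + x^2) * eps
rng = random.Random(2024)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [1 + 2 * xi + (1 + xi * xi) * rng.gauss(0, 1) for xi in x]

b, se_conv, se_rob = slope_and_ses(x, y)
se_boot = bootstrap_se(x, y)
print(f"slope={b:.3f}  conventional SE={se_conv:.4f}  "
      f"robust SE={se_rob:.4f}  bootstrap SE={se_boot:.4f}")
```

The pairs bootstrap resamples whole observations, which keeps each $x_i$ together with its own error and therefore respects the heteroskedasticity. That is why its standard error should be close to the robust one and, in this design, clearly larger than the conventional one.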