### Problem 2

This question asks you to do a Monte Carlo simulation using the setup from Problem 1. The idea is to compare finite sample properties of the OLS estimator $\hat{\beta}_1$ and the alternative estimator $\tilde{\beta}_1$.

Please attach your code to the solutions. In addition to providing the code, please write down separate answers for each part of the problem. For example, to answer part (a), write down what the approximate means, standard deviations, and mean squared errors of the estimators are. We should be able to check the correctness of your solutions without looking at the code.

Suppose $e_i , v_i , w_i$ , and $x_i^*$ are mutually independent. Also suppose that $x_i^* \sim N(1, 1)$ and $e_i , v_i , w_i \sim N (0, 0.5)$. Let $\beta_1 = 2$ and $\beta_2 = 3$. The sample size is $n = 200$. Now do the following at least 1.000 times.

***

#### *Remark on implementation:*

In most situations it suffices to loop over simulation runs. Here I don't loop over simulations but include the simulations in the array creation directly. This is usually faster and often times easier to read; however, it can also lead to overlooked errors, as the dimensionality of the arrays can easily grow to 3 or 4.

***

   1. Use the distributions and parameters given above as well as the model in Problem 1 to generate a random sample $\{y_i , x_i , z_i \}_{i=1}^n$.

In [1]:
import numpy as np

In [2]:
beta_1 = 2

def outcome(x_star, e, beta_1=beta_1, beta_2=3):
    y = beta_1 * x_star + beta_2 + e
    return y

In [3]:
def generate_random_sample(n_samples, n_simulations, seed=0):
    np.random.seed(seed)
    
    x_star = np.random.normal(1, 1, size=(n_samples, n_simulations))
    e, v, w = (np.random.normal(0, 0.5, size=(n_samples, n_simulations)) for _ in range(3))
    x = x_star + v
    z = x_star + w
    y = outcome(x_star, e)
    
    return y, x, x_star, z, e, v, w

   2. Obtain $\hat{\beta}$ and $\tilde{\beta}$.

In [4]:
def compute_beta_1_hat(y, x):
    x_mean = x.mean(axis=0)
    y_mean = y.mean(axis=0)
    xy_mean = (x * y).mean(axis=0)
    xx_mean = (x ** 2).mean(axis=0)    
    
    beta = (xy_mean - x_mean * y_mean) / (xx_mean - x_mean ** 2)
    return beta

In [5]:
def compute_beta_1_tilde(y, x, z):
    z_centered = z - z.mean(axis=0)
    
    beta = (z_centered * y).mean(axis=0) / (z_centered * x).mean(axis=0)
    return beta

   3. Construct the usual 95% confidence interval for $\beta_1$ based on the OLS estimator.
   

For this I need functions which estimate the error variance, which in turn needs calculation of the predicted values.

In [6]:
def compute_beta_2(y, x, beta_1):
    beta_2 = y.mean(axis=0) - beta_1 * x.mean(axis=0)
    return beta_2

In [7]:
def compute_ols_residuals(y, x, beta_1, beta_2):
    residuals = y - beta_1 * x - beta_2
    return residuals

In [8]:
def compute_ols_standard_error(residuals, x):
    error_estimate = (residuals ** 2).mean(axis=0)
    se = np.sqrt(error_estimate * np.var(x, axis=0))
    return se

In [9]:
def confidence_interval(beta_1, se, n_samples, crit_value=.95):
    const = 1.96 * se / np.sqrt(n_samples)
    left = beta_1 - const
    right = beta_1 + const
    ci = np.c_[left, right]
    return ci

   4. Construct a 95% confidence interval for $\beta_1$ based on $\tilde{\beta_1}$ and the results in part (e) of Problem 1.

In [10]:
def compute_tilde_standard_error(residuals, x, z):
    z_mean = z.mean(axis=0)
    x_mean = x.mean(axis=0)
    
    cov_xz = (x * z).mean(axis=0) - z_mean * x_mean
    
    tmp = (z - z_mean) ** 2 * (residuals ** 2)
    
    se = np.sqrt(tmp.mean(axis=0)) / cov_xz
    return se

### Actual Computation:

In [11]:
n_samples = 200
n_simulations = 100_000

y, x, x_star, z, e, v, w = generate_random_sample(n_samples, n_simulations)

beta_1_hat = compute_beta_1_hat(y, x)
beta_2_hat = compute_beta_2(y, x, beta_1_hat)
residuals_hat = compute_ols_residuals(y, x, beta_1_hat, beta_2_hat)
se_hat = compute_ols_standard_error(residuals_hat, x)

beta_1_tilde = compute_beta_1_tilde(y, x, z)
beta_2_tilde = compute_beta_2(y, x, beta_1_tilde)
residuals_tilde = compute_ols_residuals(y, x, beta_1_tilde, beta_2_tilde)
se_tilde = compute_tilde_standard_error(residuals_tilde, x, z)

ci_hat = confidence_interval(beta_1_hat, se_hat, n_samples)
ci_tilde = confidence_interval(beta_1_tilde, se_tilde, n_samples)

5. Now answer the following questions:

    1. What are the approximate means, standard deviations, and mean squared errors of the estimators $\hat{\beta_1}$ and $\tilde{\beta_1}$? Which estimator do you prefer?

    2. What is the coverage probability of the confidence interval based on the OLS estimator $\hat{\beta_1}$  ? In particular, how often is ${\beta_1}$  in the confidence interval?

    3. What is the coverage probability of the confidence interval based on the estimator $\tilde{\beta_1}$ ? In particular, how often is ${\beta_1}$ in the confidence interval?

    4. Discuss if the simulation results are in line with the theoretical properties you derived in Problem 1.

## A:

#### Hat Estimator:

In [12]:
beta_1_hat.mean()

1.5999493691160271

In [13]:
beta_1_hat.std()

0.06522993965920468

In [14]:
((beta_1_hat - beta_1) ** 2).mean()

0.16429545229860842

#### Tilde Estimator:

In [15]:
beta_1_tilde.mean()

2.0034079524442654

In [16]:
beta_1_tilde.std()

0.08966781895365132

In [17]:
((beta_1_tilde - beta_1) ** 2).mean()

0.008051931895767169

## B:

#### Hat Estimator:

In [18]:
((ci_hat[:, 0] < beta_1) & (beta_1 < ci_hat[:, 1])).mean()

0.00015

## C:
#### Tilde Estimator:

In [19]:
((ci_tilde[:, 0] < beta_1) & (beta_1 < ci_tilde[:, 1])).mean()

0.94479