# **Notebook 19.5: Control variates**

This notebook investigates the method of control variates as described in figure 19.16.


Work through the cells below, running each cell in turn. In various places you will see the words "TO DO". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.

Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions.

In [16]:
import numpy as np
import matplotlib.pyplot as plt

Generate samples from our two variables, $a$ and $b$. We are interested in estimating the mean of $a$, but we can use $b$ to improve our estimates if it is correlated with $a$.

In [17]:
# Sample from two variables with mean zero, standard deviation one, and a given correlation coefficient
def get_samples(n_samples, correlation_coeff=0.8):
  a = np.random.normal(size=(1,n_samples))
  temp = np.random.normal(size=(1, n_samples))
  b = correlation_coeff * a + np.sqrt(1-correlation_coeff * correlation_coeff) * temp
  return a, b

In [18]:
N = 10000000
a, b = get_samples(N)

# Verify that these two variables have zero mean and unit standard deviation
print("Mean of a = %3.3f,  Std of a = %3.3f"%(np.mean(a),np.std(a)))
print("Mean of b = %3.3f,  Std of b = %3.3f"%(np.mean(b),np.std(b)))

Mean of a = -0.000,  Std of a = 0.999
Mean of b = -0.000,  Std of b = 0.999
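
The check above confirms the means and standard deviations, but not the correlation that the method relies on. A minimal sketch of that extra check follows. It repeats the notebook's `get_samples` so that it runs on its own; the fixed seed and the names `a_check`, `b_check`, and `empirical_corr` are assumptions added for illustration.

```python
import numpy as np

# Copy of the notebook's get_samples so that this cell is self-contained
def get_samples(n_samples, correlation_coeff=0.8):
  a = np.random.normal(size=(1, n_samples))
  temp = np.random.normal(size=(1, n_samples))
  b = correlation_coeff * a + np.sqrt(1 - correlation_coeff * correlation_coeff) * temp
  return a, b

np.random.seed(0)  # assumption: fixed seed, only so the check is repeatable
a_check, b_check = get_samples(1000000)

# The samples have shape (1, n), so flatten them before computing the
# 2x2 correlation matrix; the off-diagonal entry is the correlation of a and b
empirical_corr = np.corrcoef(a_check.ravel(), b_check.ravel())[0, 1]
print("Empirical correlation of a and b = %3.3f" % empirical_corr)
```

The printed value should be close to the `correlation_coeff` argument (0.8 by default).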


Now let's sample $N=5$ examples from $a$ and estimate their mean $\mathbb{E}[a]$.  We'll do this 1000000 times and then compute the standard deviation of those estimates.

In [21]:
n_estimate = 1000000

N = 5

# TODO -- sample N examples of variable $a$
# Compute the mean of each set of N examples
# Compute the mean and standard deviation of these estimates of the mean
# Replace this line
mean_of_estimator_1 = -1; std_of_estimator_1 = -1

# BEGIN_ANSWER
mean_a_estimates_1 = np.zeros((n_estimate))
for i in range(n_estimate):
    a,b = get_samples(N)
    mean_a_estimates_1[i] = np.mean(a)
mean_of_estimator_1 = np.mean(mean_a_estimates_1)
std_of_estimator_1 = np.std(mean_a_estimates_1)
# END_ANSWER


print("Standard estimator mean = %3.3f, Standard estimator standard deviation = %3.3f"%(mean_of_estimator_1, std_of_estimator_1))

Standard estimator mean = -0.000, Standard estimator standard deviation = 0.448


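Now let's estimate the mean $\mathbb{E}[a]$ of $a$ by computing the mean of $a-b$, where $b$ is correlated with $a$. This works because we know that $\mathbb{E}[b]=0$, so $\mathbb{E}[a-b]=\mathbb{E}[a]$.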

In [22]:
n_estimate = 1000000

N = 5

# TODO -- sample N examples of variables $a$ and $b$
# Compute $c=a-b$ for each and then compute the mean of $c$
# Compute the mean and standard deviation of these estimates of the mean of $c$
# Replace this line
mean_of_estimator_2 = -1; std_of_estimator_2 = -1

# BEGIN_ANSWER
mean_a_estimates_2 = np.zeros((n_estimate))
for i in range(n_estimate):
    a,b = get_samples(N)
    mean_a_estimates_2[i] = np.mean(a-b)
mean_of_estimator_2 = np.mean(mean_a_estimates_2)
std_of_estimator_2 = np.std(mean_a_estimates_2)
# END_ANSWER


print("Control variate estimator mean = %3.3f, Control variate estimator standard deviation = %3.3f"%(mean_of_estimator_2, std_of_estimator_2))

Control variate estimator mean = -0.000, Control variate estimator standard deviation = 0.283


Note that they both have a very similar mean, but the second estimator has a lower variance.   
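
The two printed spreads can be checked by hand. As a short worked derivation, assuming independent draws with unit variances and correlation $\rho$ as in `get_samples` (the symbol $\rho$ is introduced here for the correlation coefficient):

$$
\mathrm{Var}\left[\frac{1}{N}\sum_{i=1}^{N} a_i\right] = \frac{1}{N},
\qquad
\mathrm{Var}\left[\frac{1}{N}\sum_{i=1}^{N} (a_i - b_i)\right] = \frac{\mathrm{Var}[a] + \mathrm{Var}[b] - 2\,\mathrm{Cov}[a,b]}{N} = \frac{2 - 2\rho}{N}
$$

With $N=5$ and $\rho=0.8$ this gives standard deviations of $\sqrt{0.2}\approx 0.447$ and $\sqrt{0.08}\approx 0.283$, in line with the values printed above. It also shows that subtracting $b$ directly only helps when $\rho > 0.5$; for weaker correlations the second estimator is noisier than the first.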

TODO -- Experiment with different sample sizes $N$ and correlation coefficients.
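
One possible way to run this experiment is sketched below. It is not the notebook's own solution: `estimator_stds` is a hypothetical helper, it draws all trials at once with `np.random.default_rng` instead of looping over `get_samples`, and it uses fewer trials (100000) to keep the run short. The values of $N$ and the correlation coefficients in the loop are arbitrary choices.

```python
import numpy as np

def estimator_stds(n, correlation_coeff, n_estimate=100000, seed=0):
  """Return (std of mean(a), std of mean(a - b)) over n_estimate repeated trials."""
  rng = np.random.default_rng(seed)
  # Each row is one trial of n samples; b is built as in get_samples
  a = rng.normal(size=(n_estimate, n))
  temp = rng.normal(size=(n_estimate, n))
  b = correlation_coeff * a + np.sqrt(1 - correlation_coeff**2) * temp
  std_standard = np.std(np.mean(a, axis=1))
  std_control = np.std(np.mean(a - b, axis=1))
  return std_standard, std_control

for n in [5, 20]:
  for rho in [0.2, 0.5, 0.8, 0.95]:
    std_1, std_2 = estimator_stds(n, rho)
    print("N = %2d, correlation = %3.2f: standard = %3.3f, control variate = %3.3f"%(n, rho, std_1, std_2))
```

Comparing the two columns for each setting shows where subtracting $b$ helps and where it hurts.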