- A population (of size $N$) is the entire set of observations you care about; a sample (of size $n$) is a subset drawn from it

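- The sample mean $\bar{x}$ is itself a random variable: repeated samples give different values of $\bar{x}$, and their distribution (the sampling distribution) is centered on the population mean $\mu_X$
    - The larger the sample size $n$, the more tightly the sample means $\bar{x}$ cluster around $\mu_X$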
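To see the clustering numerically, here is a small simulation sketch. It reuses the fair 3-sided die from the example below (population variance $\frac{2}{3}$); the seed, the sample sizes and the `spreads` name are arbitrary choices for illustration. The standard deviation of the sample means should shrink roughly like $\sqrt{\frac{2/3}{n}}$.

```python
import numpy as np

# Illustrative setup: the fair 3-sided die used in the variance example
rng = np.random.default_rng(0)  # seeded so the run is reproducible
outcomes = np.array([1, 2, 3])
num_samples = 20_000  # how many independent samples we draw for each sample size n

spreads = {}
for n in (2, 10, 50):
    # Each row is one sample of n rolls; take the mean of each row
    samples = rng.choice(outcomes, size=(num_samples, n))
    sample_means = samples.mean(axis=1)
    spreads[n] = sample_means.std()
    print(f"n={n:>2}: average of sample means = {sample_means.mean():.3f}, "
          f"std of sample means = {spreads[n]:.3f} (theory {np.sqrt((2 / 3) / n):.3f})")
```

The average of the sample means stays near 2 for every $n$; only the spread changes.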

- Estimating the population variance from the sample variance does not work the same way as estimating the population mean from the sample mean. A simple simulation below demonstrates this
    - Suppose we have a fair 3-sided die, and let $X$ be the outcome of a single roll, so the population of possible observations is $(1, 2, 3)$
        - $E[X] = 2$
        - $Var[X] = \frac{(1-2)^2 + (2-2)^2 + (3-2)^2}{3} = \frac{2}{3}$
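    - We take repeated samples of 2 rolls from this die and, from each sample, estimate the population mean and variance using the sample mean and sample variance
    - The population-variance formula (dividing by $n$) applied to a sample of 2 rolls underestimates $Var[X]$: on average it gives about $\frac{1}{3}$ instead of $\frac{2}{3}$. Dividing by $n - 1$ instead (Bessel's correction) gives an unbiased estimate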
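Why dividing by $n$ falls short can be derived directly. Assuming $x_1, \dots, x_n$ are independent draws with mean $\mu_X$ and variance $\sigma_X^2$ (so that $Var[\bar{x}] = \frac{\sigma_X^2}{n}$), the sum of squared deviations from $\bar{x}$ loses exactly one $\sigma_X^2$ in expectation:

$$
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})^2 &= \sum_{i=1}^{n} (x_i - \mu_X)^2 - n(\bar{x} - \mu_X)^2 \\
E\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right] &= n\sigma_X^2 - n \cdot \frac{\sigma_X^2}{n} = (n-1)\,\sigma_X^2 \\
E\left[\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2\right] &= \frac{n-1}{n}\,\sigma_X^2,
\qquad
E\left[\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2\right] = \sigma_X^2
\end{aligned}
$$

For the die with $n = 2$ and $\sigma_X^2 = \frac{2}{3}$, the divide-by-$n$ estimator therefore averages $\frac{1}{2} \cdot \frac{2}{3} = \frac{1}{3}$.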

```python
import numpy as np

# Population: the faces of a fair 3-sided die, each equally likely
dice_outcomes = np.array([1, 2, 3])
population_mean = np.mean(dice_outcomes)
population_var = np.var(dice_outcomes)  # divides by N (ddof=0): the true population variance

num_samples = 10000  # number of repeated samples, each made of 2 rolls
roll_1 = [np.random.choice(dice_outcomes) for _ in range(num_samples)]
roll_2 = [np.random.choice(dice_outcomes) for _ in range(num_samples)]
pairs = np.column_stack([roll_1, roll_2])  # shape (num_samples, 2): one sample of 2 rolls per row

# Compute each estimator per sample, then average over all samples to approximate its expected value
sample_mean = np.mean(pairs.mean(axis=1))
sample_var_wrong = np.mean(pairs.var(axis=1, ddof=0))  # divides by 2: biased low
sample_var_right = np.mean(pairs.var(axis=1, ddof=1))  # divides by 2 - 1 (Bessel's correction): unbiased

print(f'Population Mean: {population_mean} || Population Variance: {population_var}')
print(f'Sample Mean: {sample_mean} || Sample Variance Wrong: {sample_var_wrong} || Sample Variance Right: {sample_var_right}')
```

```text
Population Mean: 2.0 || Population Variance: 0.6666666666666666
Sample Mean: 1.9973 || Sample Variance Wrong: 0.33325 || Sample Variance Right: 0.6665
```
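The simulated averages can be checked exactly. Two rolls have only 9 equally likely ordered outcomes, so the expected value of each variance estimator is a plain average over them. This sketch uses exact fractions (the helper `sum_sq_dev` and the other names are illustrative) and should reproduce the $\frac{1}{3}$ and $\frac{2}{3}$ that the simulation approaches.

```python
from fractions import Fraction
from itertools import product

faces = [1, 2, 3]
all_pairs = list(product(faces, repeat=2))  # all 9 equally likely ordered outcomes of 2 rolls


def sum_sq_dev(sample):
    """Sum of squared deviations from the sample's own mean, as an exact Fraction."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample)


sample_size = 2
# Expected value of each estimator = average over the equally likely pairs
expected_biased = sum(sum_sq_dev(p) / sample_size for p in all_pairs) / len(all_pairs)
expected_unbiased = sum(sum_sq_dev(p) / (sample_size - 1) for p in all_pairs) / len(all_pairs)
print(expected_biased, expected_unbiased)  # 1/3 2/3
```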


- Law of large numbers
    - If the observations are drawn randomly and independently from a population with finite mean $\mu_X$, the sample mean converges to the population mean: $\bar{x} \rightarrow \mu_X$ as $n \rightarrow \infty$
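A quick way to watch the law of large numbers at work, sketched with the same fair 3-sided die ($\mu_X = 2$); the seed and the checkpoints printed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded so the run is reproducible
rolls = rng.integers(1, 4, size=100_000)  # i.i.d. rolls of the fair 3-sided die (values 1, 2, 3)

# Running sample mean after the first k rolls, for k = 1..len(rolls)
running_mean = np.cumsum(rolls) / np.arange(1, len(rolls) + 1)

for k in (10, 100, 1_000, 100_000):
    print(f"after {k:>6} rolls: sample mean = {running_mean[k - 1]:.4f}")
```

Early values can wander noticeably; by the last checkpoint the running mean should sit very close to 2.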

- Central limit theorem
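    - Regardless of the shape of the underlying distribution, if the $n$ observations are independent draws from a population with finite variance $\sigma_X^2$, the sampling distribution of $\bar{x}$ has mean $\mu_{\bar{x}} = \mu_X$ and variance $\sigma_{\bar{x}}^2 = \frac{\sigma_X^2}{n}$ for any $n$ (separately, by the law of large numbers, the sample variance $s_x^2$ converges to $\sigma_X^2$)
    - For sufficiently large $n$, this sampling distribution is approximately normal: $\bar{x} \sim N\left(\mu_X, \frac{\sigma_X^2}{n}\right)$ approximately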
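The die is symmetric, so it cannot show the "regardless of the underlying distribution" part. The sketch below instead assumes a strongly right-skewed population, an exponential distribution with mean 1 and variance 1 (an illustrative choice, not from these notes), and checks that the sample means have mean near $\mu_X = 1$, variance near $\frac{1}{n}$, and skewness that fades toward 0 (the value for a normal distribution) as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(7)  # seeded so the run is reproducible
num_samples = 10_000  # number of independent samples drawn per sample size


def skewness(values):
    """Sample skewness: third central moment divided by std**3 (0 for a symmetric distribution)."""
    centered = values - values.mean()
    return np.mean(centered**3) / np.std(values) ** 3


results = {}
for n in (2, 20, 200):
    # Assumed population: Exponential(scale=1), right-skewed with mean 1 and variance 1
    sample_means = rng.exponential(scale=1.0, size=(num_samples, n)).mean(axis=1)
    results[n] = (sample_means.mean(), sample_means.var(), skewness(sample_means))
    print(f"n={n:>3}: mean={results[n][0]:.3f} (theory 1), "
          f"var={results[n][1]:.5f} (theory {1 / n:.5f}), skew={results[n][2]:.3f}")
```

The exponential population itself has skewness 2; the shrinking skewness of $\bar{x}$ is the move toward normality that the theorem describes.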