# Confidence intervals for the difference in means

Assume we draw two samples, of sizes $n_1$ and $n_2$, from two different populations.

Compute the two sample means, $\hat{\mu_1}$ and $\hat{\mu_2}$, and the two sample standard deviations, $\hat{\sigma_1}$ and $\hat{\sigma_2}$ (using the $n-1$ denominator, i.e. `ddof=1` in NumPy).

Define the estimated standard error of $\hat{\mu_1}-\hat{\mu_2}$:
$$SE=\sqrt{\frac{\hat{\sigma_1}^2}{n_1} +\frac{\hat{\sigma_2}^2}{n_2}}$$
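As a quick arithmetic check of the formula, here is a minimal sketch. The helper name `standard_error` is hypothetical, and the inputs are assumed values that mirror the simulation settings used further down (both groups with standard deviation 2 and size 1000).

```python
import numpy as np

def standard_error(s1, n1, s2, n2):
    """Hypothetical helper: SE of the difference of two independent sample means."""
    return np.sqrt(s1**2 / n1 + s2**2 / n2)

# Assumed illustrative values: sigma_hat = 2 and n = 1000 in both groups.
se = standard_error(2.0, 1000, 2.0, 1000)
print(round(se, 5))  # 0.08944, i.e. sqrt(4/1000 + 4/1000) = sqrt(0.008)
```

Because the variances add, the SE of the difference is larger than the SE of either mean on its own (here $2/\sqrt{1000} \approx 0.063$ each).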

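If $n_1$ and $n_2$ are large enough and the observations are independent (within each group and between the groups), the central limit theorem gives, with approximate confidence $1-\alpha$:
$$\mu_1-\mu_2 \in (\hat{\mu_1}-\hat{\mu_2}) \pm SE\,\Phi^{-1}\left(1-\frac{\alpha}{2}\right)$$
where $\Phi^{-1}$ is the quantile function of the standard normal distribution ($\Phi^{-1}(0.975) \approx 1.96$ for $\alpha = 0.05$).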
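The interval can be packaged as a small function. This is a sketch, not a library API: `diff_means_ci` is a hypothetical name, and the two tiny arrays are chosen only so the arithmetic is easy to verify by hand (each has sample variance 2.5, so $SE = 1$); samples of size 5 are far too small for the normal approximation in practice.

```python
import numpy as np
from scipy.stats import norm

def diff_means_ci(x1, x2, alpha=0.05):
    """Hypothetical helper: normal-approximation CI for mu1 - mu2."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    diff = x1.mean() - x2.mean()
    # Sample variances with the n - 1 denominator (ddof=1)
    se = np.sqrt(x1.var(ddof=1) / x1.size + x2.var(ddof=1) / x2.size)
    z = norm.ppf(1 - alpha / 2)  # Phi^{-1}(1 - alpha/2)
    return diff - z * se, diff + z * se

lo, hi = diff_means_ci([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(round(lo, 3), round(hi, 3))  # -3.96 -0.04
```

Here $\hat{\mu_1}-\hat{\mu_2} = -2$, so the interval is $-2 \pm 1.96$.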


```python
from scipy.stats import norm
import numpy as np
import statsmodels.stats.api as sms

# Simulation: empirical coverage of the interval for a difference of two means.
# D: true difference mu2 - mu1
# a: significance level alpha (nominal coverage is 1 - a)
# R: number of repetitions

D = 5
a = 0.05
R = 1000
n1, mu1, std1 = 1000, 10, 2
n2, mu2, std2 = 1000, mu1 + D, 2
z = norm.ppf(1 - a / 2)  # Phi^{-1}(1 - alpha/2)
c = 0   # number of manual intervals that contain D
c1 = 0  # number of statsmodels intervals that contain D
for _ in range(R):
    # generate data
    x1 = norm.rvs(mu1, std1, n1)
    x2 = norm.rvs(mu2, std2, n2)
    # manual calculation (normal approximation)
    m1, s1 = x1.mean(), x1.std(ddof=1)
    m2, s2 = x2.mean(), x2.std(ddof=1)
    SE = np.sqrt(s1**2 / n1 + s2**2 / n2)
    conf_int = (m2 - m1 - SE * z, m2 - m1 + SE * z)
    c += conf_int[0] < D < conf_int[1]
    # using statsmodels: t-based interval for mean(x2) - mean(x1), unequal variances
    comp = sms.CompareMeans(sms.DescrStatsW(x2), sms.DescrStatsW(x1))
    conf_int1 = comp.tconfint_diff(alpha=a, usevar='unequal')
    c1 += conf_int1[0] < D < conf_int1[1]
print(a, c / R)
print(a, c1 / R)
```

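```
0.05 0.942
0.05 0.942
```

Both intervals contain the true difference $D$ in 94.2% of the $R = 1000$ repetitions, close to the nominal coverage $1-\alpha = 95\%$.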
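The "large enough" condition does real work. A hedged sketch of what happens with small samples: the settings below ($n_1 = n_2 = 5$, seed 0, 2000 repetitions) are assumptions chosen for illustration, and the comparison swaps in the Welch t interval, which replaces $\Phi^{-1}$ with a Student t quantile using the Welch-Satterthwaite degrees of freedom. That variant is not part of the formula above. Because the t quantile always exceeds the normal quantile, the t interval contains the normal one, so its coverage can only be higher; the normal-quantile interval should fall noticeably short of $1-\alpha$ at this sample size.

```python
import numpy as np
from scipy.stats import norm, t

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
alpha, R = 0.05, 2000
n1, n2 = 5, 5                   # deliberately small samples (assumed for illustration)
mu1, mu2, sd1, sd2 = 10.0, 15.0, 2.0, 2.0
true_diff = mu1 - mu2

hits_z = hits_t = 0
for _ in range(R):
    x1 = rng.normal(mu1, sd1, n1)
    x2 = rng.normal(mu2, sd2, n2)
    v1, v2 = x1.var(ddof=1) / n1, x2.var(ddof=1) / n2
    se = np.sqrt(v1 + v2)
    diff = x1.mean() - x2.mean()
    # Normal quantile, as in the large-sample formula
    z = norm.ppf(1 - alpha / 2)
    # Welch-Satterthwaite degrees of freedom and the matching t quantile
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    q = t.ppf(1 - alpha / 2, df)
    hits_z += abs(diff - true_diff) < z * se
    hits_t += abs(diff - true_diff) < q * se

coverage_z, coverage_t = hits_z / R, hits_t / R
print(coverage_z, coverage_t)
```

With $n_1 = n_2 = 1000$, as in the simulation above, the two quantiles are practically equal, which is why the manual and statsmodels results agree there.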
