# Test 8: t-test for two population means (variances unknown but equal)

## Objective

- To investigate the significance of the difference between the means of two populations.

## Assumptions

- The variances of both populations are unknown but assumed to be equal
- If the variances are known, use **Test 2: Z-test for two population means**
- The test is accurate only if the populations are normally distributed. 
    - If the populations are not normal, the test will give an approximate guide

## Method

- Assume we have 2 populations with means $\mu_1$ and $\mu_2$
- We take a sample from each population, of sizes $n_1$ and $n_2$
- From these, we compute the sample means $\bar{x_1}$ and $\bar{x_2}$
- For each sample, we compute the sample variance (the sum of squared differences from the sample mean, divided by $n - 1$)
$$\begin{aligned}
    s_1^2 &= \frac{1}{n_1 - 1} \sum_{i=1}^{n_1} (x_{1i} - \bar{x_1})^2 \\
    s_2^2 &= \frac{1}{n_2 - 1} \sum_{i=1}^{n_2} (x_{2i} - \bar{x_2})^2 \\
\end{aligned}$$

- Using these, we compute a pooled estimate of population variance
$$\begin{aligned}
    s^2 &= \frac{[(n_1 - 1) \cdot s_1^2 + (n_2 - 1) \cdot s_2^2]}{n_1 + n_2 - 2}
\end{aligned}$$

- Finally, we compute the test statistic, which is compared against Student's t-distribution with $n_1 + n_2 - 2$ degrees of freedom
$$\begin{aligned}
    t &= \frac{(\bar{x_1} - \bar{x_2}) - (\mu_1 - \mu_2)}{s \cdot \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}
\end{aligned}$$
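The steps above can be sketched in Python for raw samples rather than summary statistics. This is a minimal sketch, not a definitive implementation: the function name `pooled_t_statistic`, the parameter `mu_diff` (the hypothesised value of $\mu_1 - \mu_2$) and the sample values are all made up for illustration. It relies on `statistics.variance`, which uses the $n - 1$ divisor that the $(n_1 - 1)$ and $(n_2 - 1)$ weights in the pooled formula expect.

```python
import math
import statistics


def pooled_t_statistic(sample1, sample2, mu_diff=0.0):
    """Return (t, degrees_of_freedom) for the equal-variance two-sample t-test.

    mu_diff is the hypothesised value of mu_1 - mu_2 (a hypothetical parameter name).
    """
    n1, n2 = len(sample1), len(sample2)
    xbar1, xbar2 = statistics.mean(sample1), statistics.mean(sample2)
    # statistics.variance divides by n - 1 (sample variance)
    s1_sq, s2_sq = statistics.variance(sample1), statistics.variance(sample2)
    # Pooled estimate of the common population variance
    s_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    t = ((xbar1 - xbar2) - mu_diff) / math.sqrt(s_sq * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Made-up measurements, for illustration only
sample_a = [1.0, 2.0, 3.0]
sample_b = [2.0, 3.0, 4.0]
t, df = pooled_t_statistic(sample_a, sample_b)
print(f"t = {t:.4f}, df = {df}")
```

Returning the degrees of freedom together with the statistic keeps the two values that are needed for the table lookup in one place.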

## Example

- Problem
    - Two snack foods are made and sold in 30 g packets.
    - Samples of size $n_1 = 12$ and $n_2 = 12$ are taken from the two production lines, one per snack food, and the sample means $\bar{x_1}, \bar{x_2}$ and sample variances $s_1^2, s_2^2$ are computed (see formulas above)
    - From these, it is found that 
        - $\bar{x_1} = 31.75$
        - $s_1^2 = 112.25$ 
        - $\bar{x_2} = 28.67$
        - $s_2^2 = 66.64$ 

- Question
    - Do the 2 processes produce outputs of the same weight?
    - Is process 1 producing heavier/lighter products?
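
One way to formalise the two questions, offered as a sketch: the first reads as a two-sided test and the second as a one-sided test. The labels $H_0$ and $H_1$ are not in the original, and the direction shown for the one-sided alternative (process 1 heavier) is an assumption; the "lighter" case reverses the inequality.

$$\begin{aligned}
    \text{Two-sided:} \quad & H_0: \mu_1 - \mu_2 = 0 \quad \text{vs} \quad H_1: \mu_1 - \mu_2 \neq 0 \\
    \text{One-sided:} \quad & H_0: \mu_1 - \mu_2 = 0 \quad \text{vs} \quad H_1: \mu_1 - \mu_2 > 0 \\
\end{aligned}$$

Under $H_0$ the term $(\mu_1 - \mu_2)$ in the test statistic is zero, so only the hypothesised difference matters, not the individual values of $\mu_1$ and $\mu_2$.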

In [7]:
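# Sample sizes and summary statistics from the problem statement
n1 = n2 = 12
# Under the null hypothesis the population means are equal, so mu1 - mu2 = 0
mu1 = mu2 = 30
xbar1 = 31.75
xbar2 = 28.67
s1_sq = 112.25  # sample variance of sample 1
s2_sq = 66.64   # sample variance of sample 2
degrees_of_freedom = n1 + n2 - 2

def compute_pooled_variance(n1, n2, s1_sq, s2_sq):
    """Pooled estimate of the common population variance."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

s_sq = compute_pooled_variance(n1, n2, s1_sq, s2_sq)

def compute_test_statistic(xbar1, xbar2, mu1, mu2, n1, n2, s_sq):
    """t statistic for the difference between two means, using the pooled variance."""
    return ((xbar1 - xbar2) - (mu1 - mu2)) / (s_sq * (1/n1 + 1/n2))**0.5

print(f'{degrees_of_freedom=}')
compute_test_statistic(xbar1, xbar2, mu1, mu2, n1, n2, s_sq)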

degrees_of_freedom=22


0.7977160084385088
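
To turn the statistic into an answer to the first question, it can be compared against the t-distribution with 22 degrees of freedom. The sketch below does this with the standard library only, integrating the t density numerically with Simpson's rule to get a two-sided p-value. The helper names `t_pdf` and `two_sided_p_value` are hypothetical, and the 5% significance level is an assumption, since the example does not state one.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat, df, steps=10_000):
    """P(|T| >= |t_stat|), using Simpson's rule on the density over [0, |t_stat|].

    steps must be even for Simpson's rule.
    """
    b = abs(t_stat)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central_mass = 2 * (h / 3) * total  # P(-b <= T <= b), by symmetry
    return 1 - central_mass


t_stat = 0.7977160084385088  # value printed in the example above
df = 22
alpha = 0.05  # assumed significance level; not stated in the example

p = two_sided_p_value(t_stat, df)
print(f"two-sided p-value = {p:.3f}")
print("reject H0" if p < alpha else "fail to reject H0")
```

The same conclusion follows from a table lookup: the two-sided 5% critical value for 22 degrees of freedom is about 2.07, and the statistic is well below it, so at that level the data give no evidence that the two processes differ in mean weight.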