# Test 9: t-test for two population means (variances unknown and unequal)


## Objective

- To investigate the significance of the difference between the means of two populations

## Assumptions

- The test requires no knowledge of the variances of the two populations
    - If the population variances are known, use **Test 3: Z-test for two population means** instead

- The test is approximate; it applies if the populations are normally distributed or if the sample sizes are sufficiently large

- The test **can only be used as a two-tailed test**

## Method

- You have two populations with means $\mu_1$ and $\mu_2$
- Take a sample of size $n_1$ from the first population and a sample of size $n_2$ from the second
- Use these samples to compute the sample means $\bar{x_1}$ and $\bar{x_2}$
- Use them to compute the sample variances, where $x_{1i}$ and $x_{2i}$ denote the observations in the first and second sample
$$\begin{aligned}
    s_1^2 &= \frac{\sum_{i=1}^{n_1} (x_{1i} - \bar{x_1})^2}{n_1 - 1} \\
    s_2^2 &= \frac{\sum_{i=1}^{n_2} (x_{2i} - \bar{x_2})^2}{n_2 - 1} \\
\end{aligned}$$
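
The sample statistics above can be computed from raw observations roughly as follows. This is a minimal sketch: the helper name `sample_mean_and_variance` and the data values are hypothetical, chosen only for illustration. The variance uses the $n - 1$ denominator from the formula, which is also what the standard library's `statistics.variance` uses.

```python
import math
import statistics

def sample_mean_and_variance(xs):
    """Return (sample mean, sample variance with n - 1 denominator)."""
    n = len(xs)
    xbar = sum(xs) / n
    s_sq = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return xbar, s_sq

# Hypothetical observations, not taken from the example in this document
sample = [2.0, 4.0, 4.0, 6.0]
xbar, s_sq = sample_mean_and_variance(sample)

# Cross-check against the standard library implementation
agrees_with_stdlib = math.isclose(s_sq, statistics.variance(sample))
print(xbar, s_sq, agrees_with_stdlib)
```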

- Finally, compute the test statistic using

$$\begin{aligned}
    t &= \frac{(\bar{x_1} - \bar{x_2}) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}
\end{aligned}$$
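
When the hypothesis being tested is that the two population means are equal, the hypothesised difference $\mu_1 - \mu_2$ is zero, and the statistic reduces to the observed difference in sample means divided by its estimated standard error:

$$\begin{aligned}
    t &= \frac{\bar{x_1} - \bar{x_2}}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}
\end{aligned}$$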

- The test statistic approximately follows Student's t-distribution with degrees of freedom $v$ given by:

$$\begin{aligned}
    v &= \frac{(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2})^2}{\frac{s_1^4}{n_1^2 \cdot (n_1 -1)} + \frac{s_2^4}{n_2^2 \cdot (n_2 - 1)}}
\end{aligned}$$
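
A quick sanity check on this formula (often called the Welch-Satterthwaite approximation) is that $v$ always lies between $\min(n_1 - 1, n_2 - 1)$ and $n_1 + n_2 - 2$. The sketch below illustrates both ends of that range; the function name and the input values are hypothetical and serve only as an illustration.

```python
import math

def welch_degrees_of_freedom(s1_sq, s2_sq, n1, n2):
    """Degrees of freedom from the formula above."""
    a = s1_sq / n1  # estimated variance of the first sample mean
    b = s2_sq / n2  # estimated variance of the second sample mean
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Equal variances and equal sample sizes: v reaches the upper bound n1 + n2 - 2
v_equal = welch_degrees_of_freedom(10.0, 10.0, 8, 8)

# One variance dominates: v approaches that sample's own n - 1 (here n2 - 1 = 11)
v_skewed = welch_degrees_of_freedom(1e-6, 100.0, 5, 12)

print(v_equal, v_skewed)
```

Because $v$ is generally not an integer, it is usually rounded before looking up a critical value in a table.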

## Example

- Problem
    - Two financial organizations are about to merge and, as part of the rationalization process, service duplication is to be reviewed
    - Two sales teams responsible for identical products are compared by selecting a sample from each and reviewing their respective profit contribution levels per employee over two weeks
    - Under the null hypothesis, the population means are assumed equal: $\mu_1 = \mu_2$
    - The sample sizes are $n_1 = 4$ and $n_2 = 9$
    - The sample means are $\bar{x_1} = 3166$ and $\bar{x_2} = 2240$
    - The sample variances are $s_1^2 = 6328.27$ and $s_2^2 = 221661.3$

- Question
    - Are the means significantly different?
    - Remember that only a two-tailed test is possible
    

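```python
# Under the null hypothesis mu1 == mu2, so only their difference (zero) matters
mu1 = mu2 = 0
n1 = 4
n2 = 9
xbar1 = 3166
xbar2 = 2240
s1_sq = 6328.27
s2_sq = 221661.3

def compute_test_statistic(xbar1, xbar2, mu1, mu2, s1_sq, s2_sq, n1, n2):
    """t statistic: difference in sample means over its estimated standard error."""
    return ((xbar1 - xbar2) - (mu1 - mu2)) / ((s1_sq / n1 + s2_sq / n2) ** 0.5)

# Approximately 5.72 (assigned, so the notebook does not display it)
t_stat = compute_test_statistic(xbar1, xbar2, mu1, mu2, s1_sq, s2_sq, n1, n2)

def compute_degrees_of_freedom(s1_sq, s2_sq, n1, n2):
    """Approximate degrees of freedom for the unequal-variance t-test."""
    numerator = (s1_sq / n1 + s2_sq / n2) ** 2
    denom1 = s1_sq ** 2 / (n1 ** 2 * (n1 - 1))
    denom2 = s2_sq ** 2 / (n2 ** 2 * (n2 - 1))
    return numerator / (denom1 + denom2)

# Last expression in the cell; its value is the output shown below
compute_degrees_of_freedom(s1_sq, s2_sq, n1, n2)
```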

8.96217028679
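
To finish the example, the test statistic can be compared with a two-tailed critical value. The sketch below makes two assumptions that are not stated in the problem: a 5% significance level, and rounding the degrees of freedom to 9. The critical value 2.262 is the standard table entry for $t_{9;\,0.025}$, hard-coded here so that no statistics library is needed.

```python
# Values from the example above
xbar1, xbar2 = 3166, 2240
s1_sq, s2_sq = 6328.27, 221661.3
n1, n2 = 4, 9

# Test statistic under the null hypothesis mu1 == mu2
t_stat = (xbar1 - xbar2) / (s1_sq / n1 + s2_sq / n2) ** 0.5

dof = 9          # assumption: 8.96 rounded to the nearest integer
t_crit = 2.262   # assumption: 5% level, two-tailed; table value for 9 d.o.f.

# Two-tailed decision: reject if |t| exceeds the critical value
reject_null = abs(t_stat) > t_crit
print(round(t_stat, 2), reject_null)
```

Under these assumptions $|t| \approx 5.72$ exceeds 2.262, so the hypothesis of equal means would be rejected and the two teams' mean profit contributions would be judged significantly different.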