# 1. Comparing the variances of two populations (the F ratio)

## 1.1 Assumptions
- Independent simple random samples.
- Normally distributed populations. (This assumption matters even when the samples are very large.)
    - When the normality assumption is violated, use Levene's test or the Brown-Forsythe test instead.
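
As a rough sketch of the fallback mentioned above: `scipy.stats.levene` runs Levene's test with `center="mean"` and the Brown-Forsythe variant with `center="median"`. The two samples here are simulated, skewed data invented for illustration only; they are not from the study discussed later.

```python
import numpy as np
from scipy.stats import levene

# Hypothetical skewed samples (simulated, not real study data)
rng = np.random.default_rng(0)
group_a = rng.exponential(scale=1.0, size=120)
group_b = rng.exponential(scale=1.0, size=140)

# Levene's test: absolute deviations from each group's mean
levene_stat, levene_p = levene(group_a, group_b, center="mean")

# Brown-Forsythe test: absolute deviations from each group's median
bf_stat, bf_p = levene(group_a, group_b, center="median")

print("Levene:        ", levene_stat, levene_p)
print("Brown-Forsythe:", bf_stat, bf_p)
```

Both tests take the raw observations rather than summary variances, so they cannot be applied when only the sample variances and sizes are reported.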

## 1.2 Formulas

$$F = \frac {s_1^2} {s_2^2}$$

$$d.f. = n_1 - 1, n_2 - 1$$
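
When raw observations are available, the two formulas above might be computed as in the following sketch. The samples are simulated placeholders, and the names `sample_1`, `sample_2`, and `F_stat` are illustrative assumptions; `ddof=1` is needed so that NumPy returns the sample variance $s^2$ rather than the population variance.

```python
import numpy as np

# Hypothetical normal samples (simulated for illustration only)
rng = np.random.default_rng(1)
sample_1 = rng.normal(loc=0.0, scale=1.5, size=30)
sample_2 = rng.normal(loc=0.0, scale=1.0, size=25)

# Sample variances s_1^2 and s_2^2 (ddof=1 divides by n - 1)
s1_sq = np.var(sample_1, ddof=1)
s2_sq = np.var(sample_2, ddof=1)

# F statistic and its numerator / denominator degrees of freedom
F_stat = s1_sq / s2_sq
dfn, dfd = len(sample_1) - 1, len(sample_2) - 1

print(F_stat, (dfn, dfd))
```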

## 1.3 Question

- **The ratio of the length of the second digit (index finger) to that of the fourth digit (ring finger), known as the 2D:4D ratio, is often studied by researchers. A study investigated possible differences in the 2D:4D ratio in children born to parents who were first cousins, and children born to parents who were not related.**
- **For children born to parents who were first cousins, the sample variance of the 2D:4D ratio is 0.00072 and the sample size is 122. For children born to parents who were not related, the sample variance of the 2D:4D ratio is 0.00038 and the sample size is 142.**

- **Test the null hypothesis that the two variances are equal.**

$$H_0: \sigma_1^2 = \sigma_2^2$$

$$H_1: \sigma_1^2 \neq \sigma_2^2$$

- **Construct a 95% confidence interval for the ratio of the two variances.**

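$$C.I. = (\frac {1}{F_{1-\frac{\alpha}{2}}} \frac {s_1^2}{s_2^2}, \frac {1}{F_\frac{\alpha}{2}} \frac {s_1^2}{s_2^2})$$

Here $F_p$ denotes the $p$-th quantile of the $F$ distribution with $(n_1 - 1, n_2 - 1)$ degrees of freedom, i.e. the value returned by `f.ppf(p, n1-1, n2-1)`, and the bounds are listed as (lower, upper).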
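
Printed F tables often list only upper-tail critical values, so the lower quantile is commonly obtained from the reciprocal identity $F_{p;\,\nu_1,\nu_2} = 1 / F_{1-p;\,\nu_2,\nu_1}$ (note the swapped degrees of freedom). A minimal numerical check of that identity with `scipy.stats.f`, using the degrees of freedom from this question; the variable names are illustrative:

```python
import math
from scipy.stats import f

alpha = 0.05
dfn, dfd = 122 - 1, 142 - 1

# Lower alpha/2 quantile computed directly
lower_q = f.ppf(alpha / 2, dfn, dfd)

# Same quantile via the reciprocal of the upper quantile
# with the degrees of freedom swapped
lower_q_via_reciprocal = 1 / f.ppf(1 - alpha / 2, dfd, dfn)

print(lower_q, lower_q_via_reciprocal)
print(math.isclose(lower_q, lower_q_via_reciprocal, rel_tol=1e-6))
```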

In [1]:
# Reference: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f.html
from scipy.stats import f

F = 0.00072 / 0.00038
p_value = (1 - f.cdf(F, 122-1, 142-1)) * 2
p_value

0.0002695226754223512

In [2]:
f.ppf(0.05/2, 122-1, 142-1), f.ppf(1 - (0.05/2), 122-1, 142-1)

(0.7065917265888972, 1.4087498360325232)

In [3]:
# 95% C.I. as (lower, upper): the larger quantile gives the lower bound
(1/1.4087498360325232)*F, (1 / 0.7065917265888972)*F

(1.3449775067526752, 2.681515747788598)

In [4]:
# Wrap all the desired outputs into a function
def f_test_two_p_variance(var1, var2, n1, n2, alpha_level):
    '''
    Perform an F-test comparing the variances of two populations.
    Outputs: F ratio, two-sided p-value, and the (1 - alpha_level)
    confidence interval for the ratio of the two variances.
    var1: sample variance from population 1
    var2: sample variance from population 2
    n1: size of the sample from population 1
    n2: size of the sample from population 2
    alpha_level: significance level, e.g. 0.01, 0.05, or 0.1
    '''
    # F test statistic
    F_ratio = var1 / var2

    # Two-sided p-value (the usual interest is whether the two variances are equal).
    # Doubling the smaller tail also handles the case F_ratio < 1.
    lower_tail = f.cdf(F_ratio, n1-1, n2-1)
    p_value = min(lower_tail, 1 - lower_tail) * 2

    # (1 - alpha_level) C.I. for the variance ratio
    upper = 1 / (f.ppf(alpha_level/2, n1-1, n2-1)) * F_ratio
    lower = 1 / (f.ppf(1 - (alpha_level/2), n1-1, n2-1)) * F_ratio

    return F_ratio, p_value, (lower, upper)

In [5]:
# Hypothesis test: reject the null hypothesis that the two variances are equal (p-value < 0.05)
# The lower bound of the 95% C.I. is above 1, so the interval excludes a ratio of 1 (consistent with rejecting the null hypothesis)
f_test_two_p_variance(var1 = 0.00072, var2 = 0.00038,
                      n1 = 122, n2 = 142,
                      alpha_level = 0.05)

(1.8947368421052633,
 0.0002695226754223512,
 (1.3449775067526752, 2.681515747788598))