# 1. The difference in two population proportions

## 1.1 Assumptions
- Independent simple random samples.
- The sample sizes are large enough for the normal approximation to be reasonable.

## 1.2 Formulas

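$$Z = \frac{\hat{p}_1 - \hat{p}_2}{SE_0(\hat{p}_1 - \hat{p}_2)}$$

$$SE_0(\hat{p}_1 - \hat{p}_2) = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}$$

$$\hat{p} = \frac{X_1+X_2}{n_1+n_2}$$

Here $X_i$ is the number of successes in sample $i$, $n_i$ is its size, $\hat{p}_i = X_i / n_i$, and $\hat{p}$ is the pooled proportion, which estimates the common value of $p_1$ and $p_2$ under the null hypothesis.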

## 1.3 Question

- **A study investigated a possible effect of a magnetic pulse on the ability of homing pigeons to navigate back to the home loft. Pigeons were randomly divided into a magnetic pulse group and a control group. The pigeons were then released from a location 106 km from the home loft.**
- **21 of the 39 pigeons that received a magnetic pulse returned to the home loft. 22 of the 38 control group pigeons returned to the home loft.**
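
Before running the test, the large-sample assumption from Section 1.1 can be checked against these counts. The sketch below uses a common rule of thumb (at least 10 successes and 10 failures in each group). The threshold of 10 and the helper name `large_enough` are assumptions made for illustration; some textbooks use 5 instead.

```python
def large_enough(successes, n, threshold=10):
    """Rule-of-thumb check: at least `threshold` successes and failures."""
    failures = n - successes
    return successes >= threshold and failures >= threshold

# Counts from the question: (pigeons that returned, group size)
groups = {"magnetic pulse": (21, 39), "control": (22, 38)}

checks = {name: large_enough(x, n) for name, (x, n) in groups.items()}
print(checks)  # {'magnetic pulse': True, 'control': True}
```

Both groups pass this check, so the normal approximation looks reasonable here under that rule of thumb.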

In [1]:
from statsmodels.stats.proportion import proportions_ztest
import numpy as np
from scipy.stats import norm

In [2]:
def z_test_two_p_proportions(count, nobs, value, alternative, alpha_level):
    """Return the z statistic, p-value, and a (1 - alpha_level) C.I. for p1 - p2."""
    # Hypothesis test: the statistic uses the pooled standard error SE_0
    z_test_statistic, p_value = proportions_ztest(count=count, nobs=nobs,
                                                  value=value, alternative=alternative)

    # Two-sided C.I. for p1 - p2, built from the unpooled standard error
    hat_p1 = count[0] / nobs[0]
    hat_p2 = count[1] / nobs[1]
    estimator = hat_p1 - hat_p2
    standard_error = np.sqrt(hat_p1 * (1 - hat_p1) / nobs[0] + hat_p2 * (1 - hat_p2) / nobs[1])
    # norm.ppf(alpha_level / 2) is negative, so take abs() to get the critical value
    margin_error = abs(standard_error * norm.ppf(alpha_level / 2))

    upper = estimator + margin_error
    lower = estimator - margin_error

    # Final outputs
    return z_test_statistic, p_value, (lower, upper)

- **Test the null hypothesis that the population proportions are equal. (The magnetic pulse has no effect.)**

$$H_0: p_1 = p_2$$

$$H_a: p_1 \neq p_2$$

- **Construct a 95% confidence interval for the difference in population proportions, $p_1 - p_2$.**

$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
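
For reference, the interval itself combines this standard error with a standard normal critical value. This is the usual large-sample (Wald) form, and it matches what `z_test_two_p_proportions` computes:

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \, SE(\hat{p}_1 - \hat{p}_2), \qquad z_{0.025} \approx 1.96 \text{ for a 95\% interval}$$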

In [3]:
# Hypothesis test: with a p-value of 0.7206, we fail to reject the null hypothesis that the
# population proportions are equal (i.e., that the magnetic pulse has no effect).
# The 95% C.I. also contains 0, which further supports not rejecting the null hypothesis.
z_test_two_p_proportions(count = [21,22], nobs = [39,38], value = 0, 
                         alternative = 'two-sided', alpha_level = 0.05)

(-0.3576834327152399,
 0.7205802327368019,
 (-0.2621200200972479, 0.18114836017821948))
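
As a cross-check, the z statistic and two-sided p-value above can be reproduced directly from the formulas in Section 1.2 using only the Python standard library. This is a minimal sketch; the helper name `pooled_z_test` is hypothetical and is not part of statsmodels.

```python
from math import sqrt
from statistics import NormalDist

def pooled_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 = p2 using the pooled standard error SE_0."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)          # pooled proportion under H0
    se0 = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided tail area
    return z, p_value

# Pigeon data: 21 of 39 (magnetic pulse) and 22 of 38 (control) returned
z, p_value = pooled_z_test(21, 39, 22, 38)
print(round(z, 4), round(p_value, 4))  # -0.3577 0.7206
```

The agreement with the statsmodels output indicates that the reported statistic is based on the pooled standard error, while the confidence interval uses the unpooled one.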