In [2]:
import pandas as pd
import numpy as np
import scipy.stats as st

## Topic 2: Hypothesis Test
Is there strong evidence that the population mean $\mu$ is different from some hypothesized value $\mu_0$ that is of interest to us?

The null hypothesis is $H_0: \mu = \mu_0$  
Possible alternative hypotheses:
- $H_a: \mu < \mu_0$ (one-sided/tailed test)
- $H_a: \mu > \mu_0$ (one-sided/tailed)
- $H_a: \mu \neq \mu_0$ (two-sided/tailed)


There are two scenarios:
- $\sigma$ is known, Z-test
- $\sigma$ is unknown, T-test

### 1. Z-test & T-test

_Suppose we have a simple random sample of n observations from a normally distributed population where $\sigma$ is known. The normality assumption (that we are sampling from a normally distributed population) is very important when the sample size is small. As the sample size grows, it becomes less and less important because of the central limit theorem._

To test $H_0: \mu = \mu_0$, we draw a sample from the population (sample size n, sample mean $\bar{X}$).

Because of the central limit theorem, for large n the sample mean is approximately normal (exactly normal if the population itself is normal): $$ \bar{X} \sim N(\mu, \frac{\sigma^2}{n}) $$ 

#### Case 1: if we know the population standard deviation, then we can use the Z-test:
$$Zscore = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1) $$
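As a small illustration of the Z statistic, here is a minimal sketch. The data values, `mu_0 = 5.0`, and `sigma = 0.3` are made up for the example, and `z_statistic` is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
import numpy as np

def z_statistic(sample, mu_0, sigma):
    """Z statistic for H_0: mu = mu_0 when the population sigma is known."""
    sample = np.asarray(sample, dtype=float)
    return (sample.mean() - mu_0) / (sigma / np.sqrt(sample.size))

# made-up sample of n = 9 observations
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4, 5.0, 5.5, 5.2]
z = z_statistic(sample, mu_0=5.0, sigma=0.3)
print(f"z = {z:.3f}")
```

Under $H_0$ this value is compared against the standard normal distribution.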

#### Case 2: if we don't know the population standard deviation, then we can use the T statistic:
\begin{equation}
Tscore = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim T(n-1\ degrees\ of\ freedom)\\
where:\ \sigma_{\bar{X}} = \frac{s}{\sqrt{n}}\ and\ s = \sqrt{\frac{\Sigma(x_i - \bar{x})^2}{n - 1}} 
\end{equation}
Note:
- We use the sample standard deviation $s$ to estimate the population standard deviation $\sigma$. We use $\frac{s}{\sqrt{n}}$, which is called the standard error of $\bar{X}$, to estimate the standard deviation ($\frac{\sigma}{\sqrt{n}}$) of the sampling distribution of the sample mean.
- The T distribution is very similar to the standard normal distribution, except that it has a lower peak and heavier tails. As the degrees of freedom increase, the T distribution tends toward the standard normal distribution, because $s$ estimates $\sigma$ better and better.
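The two notes above can be checked numerically. The sketch below uses made-up data and an assumed `mu_0 = 5.0`. It computes the T statistic by hand, compares it with `scipy.stats.ttest_1samp`, and then shows how the 97.5th percentile of the T distribution approaches the normal one as the degrees of freedom grow.

```python
import numpy as np
import scipy.stats as st

sample = np.array([5.1, 4.9, 5.6, 5.8, 5.2, 5.4, 5.0, 5.5, 5.2])  # made-up data
mu_0 = 5.0                                                        # assumed hypothesized mean

n = sample.size
s = sample.std(ddof=1)              # sample standard deviation (n - 1 in the denominator)
standard_error = s / np.sqrt(n)     # estimates sigma / sqrt(n)
t_manual = (sample.mean() - mu_0) / standard_error

# scipy computes the same statistic (its p-value is two-sided by default)
t_scipy, p_two_sided = st.ttest_1samp(sample, mu_0)

# 97.5th percentiles: the T quantile shrinks toward the normal quantile as df grows
t_quantiles = {df: st.t.ppf(0.975, df) for df in (5, 30, 1000)}
z_quantile = st.norm.ppf(0.975)

print(t_manual, t_scipy)
print(t_quantiles, z_quantile)
```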

### 2. Rejection Region Approach
1. Choose a value for $\alpha$, the significance level of the test.  
$\alpha$ is the probability of rejecting the null hypothesis when it is true.
2. Find the appropriate rejection region.
3. Reject the null hypothesis if the test statistic falls in the rejection region.
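The three steps can be sketched for a Z-test as follows. The function names and the `alternative` labels (`"less"`, `"greater"`, `"two-sided"`) are choices made for this example only.

```python
import scipy.stats as st

def rejection_region_z(alpha, alternative):
    """Return (lower, upper): reject H_0 if z <= lower or z >= upper."""
    if alternative == "less":
        return st.norm.ppf(alpha), float("inf")
    if alternative == "greater":
        return float("-inf"), st.norm.ppf(1 - alpha)
    if alternative == "two-sided":
        return st.norm.ppf(alpha / 2), st.norm.ppf(1 - alpha / 2)
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")

def reject_null(z, alpha, alternative):
    """Step 3: check whether the test statistic falls in the rejection region."""
    lower, upper = rejection_region_z(alpha, alternative)
    return bool(z <= lower or z >= upper)

lower, upper = rejection_region_z(0.05, "two-sided")
print(lower, upper)
print(reject_null(2.1, 0.05, "two-sided"))
```

The same test statistic can lead to different decisions under a one-sided and a two-sided alternative, because the rejection regions differ.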

### 3. P-value  
The p-value measures the strength of the evidence against the null hypothesis.

__The definition__:  
The p-value is the probability of getting the observed value of the test statistic, or a value with even greater evidence against the null hypothesis, if the null hypothesis is true.  

- The smaller the p-value, the greater the evidence against the null hypothesis.  
- If we have a given significance level $\alpha$, then: reject $H_0$ if p-value <= $\alpha$.  
(if p-value <= $\alpha$, the evidence against $H_0$ is significant at the $\alpha$ level of significance)
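For a Z statistic, the definition translates into tail areas of the standard normal distribution. This is a minimal sketch; `z_test_p_value` and its `alternative` labels are names chosen for the example.

```python
import scipy.stats as st

def z_test_p_value(z, alternative):
    """p-value of an observed Z statistic under H_0."""
    if alternative == "less":        # H_a: mu < mu_0, small z is evidence against H_0
        return st.norm.cdf(z)
    if alternative == "greater":     # H_a: mu > mu_0, large z is evidence against H_0
        return st.norm.sf(z)
    if alternative == "two-sided":   # H_a: mu != mu_0, large |z| is evidence against H_0
        return 2 * st.norm.sf(abs(z))
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")

alpha = 0.05
z_observed = 1.8   # made-up value
for alternative in ("less", "greater", "two-sided"):
    p = z_test_p_value(z_observed, alternative)
    print(alternative, round(p, 4), "reject H_0" if p <= alpha else "fail to reject H_0")
```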



### 4. Type I errors, Type II errors, Power of the Test.

- A Type I error is rejecting $H_0$ when, in reality, it is true.  
P(Type I error | $H_0$ is true) = $\alpha$  
- A Type II error is failing to reject $H_0$ when, in reality, it is false.  
P(Type II error | $H_0$ is false) = $\beta$  
$\beta$ depends on a number of factors, including the choice of $\alpha$, the sample size, and the true value of the parameter.  
- Power is the probability of rejecting the null hypothesis, given that it is false.  
Power = 1 - P(Type II error) = 1 - $\beta$

|    |            |__Underlying reality__ (unknown)|                |
|----|------------|----------------|----------------|
|    |            |$H_0$ is false  |$H_0$ is true   |
|__Conclusion from test__ (known)|Reject $H_0$|Correct decision|Type I error|
|    |Fail to reject $H_0$|Type II error   |Correct decision|

If we choose a very small $\alpha$, we make it very hard to reject $H_0$. We then have a small chance of making a Type I error, but a higher chance of making a Type II error, so the power of the test is low.

If we choose a larger value of $\alpha$, the reverse holds: a Type I error becomes more likely, a Type II error becomes less likely, and the power of the test is higher.

| | | |Relationship between parameters| | |
|-|-|-|-|-|-|
|$\alpha$ up|P(reject) up|P(Type I) up|P(Type II) down|$\beta$ down|Power up|
|$\alpha$ down|P(reject) down|P(Type I) down|P(Type II) up|$\beta$ up|Power down|
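The trade-off in the table can be made concrete for a one-sided Z-test of $H_0: \mu = \mu_0$ against $H_a: \mu > \mu_0$. When the true mean is $\mu$, the Z statistic is normal with mean $(\mu - \mu_0)\sqrt{n}/\sigma$ and standard deviation 1, so the power is an upper-tail area. The numbers below (`true_mu = 0.3`, `sigma = 1`, `n = 50`) are assumptions made for this sketch.

```python
import numpy as np
import scipy.stats as st

def power_one_sided_z(alpha, true_mu, mu_0, sigma, n):
    """P(reject H_0) for H_a: mu > mu_0 when the true mean is true_mu."""
    z_crit = st.norm.ppf(1 - alpha)                  # reject when Z >= z_crit
    shift = (true_mu - mu_0) * np.sqrt(n) / sigma    # mean of Z under the true mean
    return st.norm.sf(z_crit - shift)

for alpha in (0.01, 0.05, 0.10):
    power = power_one_sided_z(alpha, true_mu=0.3, mu_0=0.0, sigma=1.0, n=50)
    print(f"alpha = {alpha:.2f}  power = {power:.3f}  beta = {1 - power:.3f}")
```

When `true_mu` equals `mu_0`, the function returns $\alpha$ itself, which is the Type I error probability.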

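### 5. Confidence Interval
A 95% confidence interval is a range of values, computed from the sample, produced by a procedure that captures the true population mean in 95% of repeated samples. Informally, we are 95% confident that the interval contains the true mean. This is not the same as a range that contains 95% of the values. 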

Z-test (two-sided): we would reject $H_0$ at $\alpha$ = 0.05 if
$$
 \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \le -1.96\ \text{or}\ \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \ge 1.96
$$
Isolating $\mu_0$, we would reject $H_0$ if:
$$
\mu_0 \ge \bar{X} + 1.96 \cdot \frac{\sigma}{\sqrt{n}}\ \text{or}\ \mu_0 \le \bar{X} - 1.96 \cdot \frac{\sigma}{\sqrt{n}}
$$

The 95% confidence interval consists of the values of $\mu_0$ that would not be rejected:
- The upper bound of the 95% confidence interval is $\bar{X} + 1.96 \cdot \frac{\sigma}{\sqrt{n}}$  
- The lower bound of the 95% confidence interval is $\bar{X} - 1.96 \cdot \frac{\sigma}{\sqrt{n}}$
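The link between the interval and the two-sided Z-test can be checked with a short sketch. The sample mean 5.3, `sigma = 0.3`, and `n = 9` are made-up inputs, and both function names are hypothetical.

```python
import numpy as np
import scipy.stats as st

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for mu when the population sigma is known."""
    z_star = st.norm.ppf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    margin = z_star * sigma / np.sqrt(n)
    return sample_mean - margin, sample_mean + margin

def two_sided_z_rejects(sample_mean, mu_0, sigma, n, alpha=0.05):
    """True if the two-sided Z-test rejects H_0: mu = mu_0 at level alpha."""
    z = (sample_mean - mu_0) / (sigma / np.sqrt(n))
    return bool(abs(z) >= st.norm.ppf(1 - alpha / 2))

lower, upper = z_confidence_interval(sample_mean=5.3, sigma=0.3, n=9)
print(lower, upper)

# a hypothesized value is rejected exactly when it lies outside the interval
for mu_0 in (5.0, 5.2, 5.45, 5.6):
    print(mu_0, two_sided_z_rejects(5.3, mu_0, 0.3, 9), not (lower < mu_0 < upper))
```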

### 6. Distribution of P-value

$H_0$: $\mu$ = 0  
$H_a$: $\mu$ > 0  
where the population standard deviation is known: $\sigma$ = 1.

I am testing 3 scenarios:
- The population mean is 0, which means $H_0$ is true
- The population mean is 1, which means $H_0$ is false
- The population mean is 2, which means $H_0$ is false  

I draw 10000 samples of size 50 from each scenario. After calculating the p-value of each sample, we can plot a histogram of the p-values for each scenario.

In [7]:
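sample_size = 50
num_sample = 10000
mu_0 = 0  # hypothesized mean under H_0

# true population parameters for the three scenarios (sigma is known)
mu1, sigma1 = 0, 1  # population_1: H_0 is true
mu2, sigma2 = 1, 1  # population_2: H_0 is false
mu3, sigma3 = 2, 1  # population_3: H_0 is false

def p_value_greater(sample, sigma):
    # helper added here: one-sided Z-test of H_0: mu = mu_0 against H_a: mu > mu_0
    z = (sample.mean() - mu_0) / (sigma / np.sqrt(len(sample)))
    return st.norm.sf(z)

# one list of p-values per scenario
p_values1, p_values2, p_values3 = [], [], []

for i in range(num_sample):
    sample1 = np.random.normal(mu1, sigma1, sample_size)
    p_values1.append(p_value_greater(sample1, sigma1))

    sample2 = np.random.normal(mu2, sigma2, sample_size)
    p_values2.append(p_value_greater(sample2, sigma2))

    sample3 = np.random.normal(mu3, sigma3, sample_size)
    p_values3.append(p_value_greater(sample3, sigma3))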
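A vectorized sketch of the simulation described in this section is shown below. It assumes the one-sided Z-test with known $\sigma$ = 1 stated above. The seed, the helper name `simulate_p_values`, and the choice of 10 histogram bins are arbitrary. Instead of plotting, it summarizes each distribution with bin counts and the share of p-values at or below 0.05.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
sample_size, num_sample = 50, 10000
sigma, mu_0 = 1.0, 0.0

def simulate_p_values(true_mu):
    """p-values of num_sample one-sided Z-tests on samples drawn with mean true_mu."""
    samples = rng.normal(true_mu, sigma, size=(num_sample, sample_size))
    z = (samples.mean(axis=1) - mu_0) / (sigma / np.sqrt(sample_size))
    return st.norm.sf(z)

p_values = {true_mu: simulate_p_values(true_mu) for true_mu in (0, 1, 2)}

# histogram counts over [0, 1] and the share of p-values at or below 0.05
bin_counts = {mu: np.histogram(p, bins=10, range=(0, 1))[0] for mu, p in p_values.items()}
rejection_rates = {mu: float(np.mean(p <= 0.05)) for mu, p in p_values.items()}

for mu in p_values:
    print(mu, bin_counts[mu], rejection_rates[mu])
```

When $H_0$ is true, the p-values should be spread roughly evenly over [0, 1], so about 5% of them fall at or below 0.05. When $H_0$ is false, they pile up near 0.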

In [6]:
sample1 = np.random.normal(mu1, sigma1, sample_size1)

array([-0.44128516, -0.60979832, -0.34385109,  1.30108423, -0.71732452,
       -1.91477831, -0.21408047,  1.27536415,  1.06191106, -1.24144945,
       -0.76279453, -1.0733659 ,  0.40119336,  0.53604004,  1.1890227 ,
       -1.91696102, -1.64305129, -0.4724886 , -0.98144517,  0.75864234])

### 7. Two Means Test

#### Case 1: 
- independent simple random samples.
- Normally distributed populations (not very important for large sample sizes).
- $\sigma_1$ and $\sigma_2$ are known.


The sampling distribution of $\bar{X_1} - \bar{X_2}$:
- Has a mean of $\mu_1 - \mu_2$
- Has a standard deviation of $\sigma_{\bar{X_1} - \bar{X_2}} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
- Is normally distributed


We want to test $H_0$: $\mu_1 = \mu_2$
\begin{equation}
Zscore = \frac{\bar{X_1} - \bar{X_2}}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0,1)
\end{equation}

Note: If we don't feel comfortable assuming normality, there are other options:
- Mann-Whitney U test (also known as the Wilcoxon rank-sum test).
- Bootstrap methods and permutation tests.
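The Case 1 statistic can be sketched as follows. The two samples and the known standard deviations (`sigma1 = sigma2 = 2`) are made up for the example, and `two_sample_z_test` is a hypothetical helper name.

```python
import numpy as np
import scipy.stats as st

def two_sample_z_test(x1, x2, sigma1, sigma2):
    """Z statistic and two-sided p-value for H_0: mu_1 = mu_2 with known sigmas."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    # standard deviation of the sampling distribution of the difference in means
    sd_diff = np.sqrt(sigma1**2 / x1.size + sigma2**2 / x2.size)
    z = (x1.mean() - x2.mean()) / sd_diff
    return z, 2 * st.norm.sf(abs(z))

x1 = [10, 12, 11, 13]   # made-up sample 1
x2 = [9, 10, 8, 9]      # made-up sample 2
z, p_two_sided = two_sample_z_test(x1, x2, sigma1=2.0, sigma2=2.0)
print(z, p_two_sided)
```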

#### Case 2 (pooled variance T-test):
- Everything is the same as case 1 except we don't know $\sigma_1$ and $\sigma_2$.
- Equal population variances ($\sigma_1^2 = \sigma_2^2 = \sigma^2$).

Step 1: Estimate the common population variance $\sigma^2$ with the pooled sample variance  
        $s_p^2 = \frac{(n_1 -1)s_1^2 + (n_2 -1)s_2^2}{n_1+n_2-2}$  
Step 2: Estimate the standard error of the difference in sample means:  
        $SE(\bar{X_1} - \bar{X_2}) = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$  
SE estimates the true standard deviation of the sampling distribution of the difference in sample means.  
Step 3: The standardized difference follows a T distribution:  
$\frac{(\bar{X_1} - \bar{X_2}) - (\mu_1 - \mu_2)}{SE(\bar{X_1} - \bar{X_2})} \sim T(n_1+n_2-2\ degrees\ of\ freedom)$  
Step 4: A $(1-\alpha) \cdot 100\%$ confidence interval for $\mu_1 - \mu_2$ is __estimator $\pm$ margin of error__:  
$\bar{X_1} - \bar{X_2} \pm t_{\alpha/2} \cdot SE(\bar{X_1} - \bar{X_2})$
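Steps 1 to 4 can be sketched in code and cross-checked against `scipy.stats.ttest_ind` with `equal_var=True`, which performs the pooled-variance test. The data are made up, and `pooled_t_test` is a hypothetical helper name.

```python
import numpy as np
import scipy.stats as st

def pooled_t_test(x1, x2, alpha=0.05):
    """Pooled-variance t statistic, two-sided p-value and CI for mu_1 - mu_2."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, n2 = x1.size, x2.size
    df = n1 + n2 - 2
    # Step 1: pooled sample variance
    sp2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / df
    # Step 2: standard error of the difference in sample means
    se = np.sqrt(sp2) * np.sqrt(1 / n1 + 1 / n2)
    # Step 3: t statistic under H_0: mu_1 = mu_2
    diff = x1.mean() - x2.mean()
    t = diff / se
    p = 2 * st.t.sf(abs(t), df)
    # Step 4: confidence interval, estimator +/- margin of error
    margin = st.t.ppf(1 - alpha / 2, df) * se
    return t, p, (diff - margin, diff + margin)

x1 = [10, 12, 11, 13, 12]     # made-up sample 1
x2 = [9, 10, 8, 9, 11, 10]    # made-up sample 2
t_manual, p_manual, ci = pooled_t_test(x1, x2)
t_scipy, p_scipy = st.ttest_ind(x1, x2, equal_var=True)
print(t_manual, t_scipy, p_manual, p_scipy, ci)
```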

Advantages:
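- It is an exact procedure when its assumptions hold.
- It is consistent with other common statistical procedures. This means the conclusion will be the same if we use the pooled-variance T-test or, for example, regression or ANOVA on the same two groups.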

Disadvantages:
- It requires an additional assumption (equal population variances).
- It performs very poorly for some violations of the equal variance assumption.
- The problem is worse when the sample sizes are very different, and large sample sizes do not save us.

#### Case 3 (unpooled variance T-test, Welch T-test):
- Everything is the same as case 1 except we don't know $\sigma_1$ and $\sigma_2$, and we do not assume they are equal.
- The Welch procedure is an approximate (not exact) procedure.

Step 1: Calculate the standard error of the difference in sample means:  
$SE_W(\bar{X_1} - \bar{X_2}) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$  

Step 2: A $(1-\alpha) \cdot 100\%$ confidence interval for $\mu_1 - \mu_2$ is  
$\bar{X_1} - \bar{X_2} \pm t_{\alpha/2} \cdot SE_W(\bar{X_1} - \bar{X_2})$

The degrees of freedom for $t_{\alpha/2}$ come from the Welch-Satterthwaite approximation:  
$df = \frac{(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2})^2}{\frac{1}{n_1-1}(\frac{s_1^2}{n_1})^2 + \frac{1}{n_2-1}(\frac{s_2^2}{n_2})^2}$
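The Welch standard error and the degrees-of-freedom formula can be sketched and cross-checked against `scipy.stats.ttest_ind` with `equal_var=False`, which performs Welch's test. The data are made up, and `welch_t_test` is a hypothetical helper name.

```python
import numpy as np
import scipy.stats as st

def welch_t_test(x1, x2):
    """Welch t statistic, approximate degrees of freedom and two-sided p-value."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, n2 = x1.size, x2.size
    v1 = x1.var(ddof=1) / n1    # s_1^2 / n_1
    v2 = x2.var(ddof=1) / n2    # s_2^2 / n_2
    se_w = np.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    t = (x1.mean() - x2.mean()) / se_w
    return t, df, 2 * st.t.sf(abs(t), df)

x1 = [10, 12, 11, 13, 12]     # made-up sample 1
x2 = [9, 10, 8, 9, 11, 10]    # made-up sample 2
t_manual, df_welch, p_manual = welch_t_test(x1, x2)
t_scipy, p_scipy = st.ttest_ind(x1, x2, equal_var=False)
print(t_manual, t_scipy, df_welch, p_manual, p_scipy)
```

The approximate degrees of freedom are usually not an integer and lie between $\min(n_1, n_2) - 1$ and $n_1 + n_2 - 2$.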

Advantages:
- It does not require the assumption of equal population variances.
- It works better than the pooled-variance T-test in many situations.  

Disadvantages:
- It is only an approximate (not exact) procedure.
- The degrees of freedom formula is a little complicated.
- It is not consistent with other common statistical techniques, like regression and ANOVA.

### 8. Paired Difference t Procedure
When we take two measurements on the same individual, or on a pair of related people such as twins, the samples are not independent: they are dependent and paired. So we can't use the pooled variance T procedure. What we can do is calculate the difference within each pair and do a paired difference T procedure.  

Assumptions:
- The sample differences are a simple random sample from the population of differences.
- The population of differences is normally distributed.

After we compute the sample differences, it becomes an ordinary one-sample T procedure on those differences:  
$\frac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim T(n-1\ degrees\ of\ freedom)$, where $\bar{X}$ and $s$ are the mean and standard deviation of the $n$ differences.

Note: if we don't feel it is reasonable to assume normality, we can use a nonparametric (distribution-free) procedure (e.g. the sign test or the signed-rank test).
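The reduction to a one-sample procedure can be checked with a short sketch. The `before` and `after` measurements are made up. `scipy.stats.ttest_rel` is the paired test, and it should agree with `scipy.stats.ttest_1samp` applied to the differences with $\mu_0 = 0$.

```python
import numpy as np
import scipy.stats as st

# made-up paired measurements on the same six individuals
before = np.array([72, 75, 68, 80, 77, 74], dtype=float)
after = np.array([70, 71, 69, 75, 74, 70], dtype=float)

differences = after - before
n = differences.size

# one-sample t statistic on the differences, computed by hand
t_manual = differences.mean() / (differences.std(ddof=1) / np.sqrt(n))

# the paired test and the one-sample test on the differences give the same result
t_paired, p_paired = st.ttest_rel(after, before)
t_one_sample, p_one_sample = st.ttest_1samp(differences, 0.0)

print(t_manual, t_paired, t_one_sample)
print(p_paired, p_one_sample)
```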