# Hypothesis testing

## Outline

- Hypothesis testing in Python
- P-values
- Python examples
- Wrap up

## Introduction to Hypothesis testing

A central task in inferential statistics is to test a **hypothesis**. Hypothesis testing is used to determine whether there is enough statistical evidence in favor of a specific hypothesis. A hypothesis test evaluates two **mutually exclusive** statements about a population to determine which statement is best supported by the sample data.

The statement that is assumed to be true by default is called the **null hypothesis**; it usually states that there is no effect or no difference. The “alternative” (or antithesis) to the null hypothesis is, naturally, called the **alternative hypothesis**.

Hypothesis testing allows us to draw conclusions about an entire population based on a representative sample. 

You gain tremendous benefits by working with a sample, because in most cases, it is simply impossible to observe the entire population to understand its properties.

For example, we may want to determine if a new drug is effective in curing a certain disease. A sample of patients is randomly selected. Half of them are given the drug while the other half are given a placebo. The conditions of the patients are then measured and compared. The **null hypothesis** in this case is that the new drug has no effect, that is, patients on the drug do no better than patients on the placebo. The alternative hypothesis is that the drug is effective.

### Statement of the hypotheses
We use the following nomenclature to describe the two hypotheses:

- The null hypothesis is represented by $H_0$ 
- The alternative hypothesis is represented by $H_A$

### The **p-value**
In the context of hypothesis testing, the **p-value** is the probability of observing data at least as favorable to the alternative hypothesis as the data actually observed, under the assumption that the null hypothesis is true.

A small p-value is evidence **against** the null hypothesis and **for** the alternative hypothesis.

- A p-value < alpha-value indicates sufficient evidence **against** the null hypothesis, so it is rejected
- A p-value >= alpha-value indicates insufficient evidence against the null hypothesis, so we fail to reject it; this is not proof that the null hypothesis is true

Graphically, the **p-value** is the area in the tail (or tails) of a probability distribution beyond the observed test statistic. The p-value is calculated using the sampling distribution of the test statistic under the null hypothesis.
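The tail-area picture can be made concrete with a short sketch. It assumes the test statistic follows a standard normal distribution; the helper name `tail_areas` and the example value 1.96 are illustrative choices, not part of the original notebook.

```python
from scipy.stats import norm

def tail_areas(z):
    """Return (lower, upper, two_sided) tail areas of the standard normal at z."""
    lower = norm.cdf(z)                # P(Z <= z)
    upper = norm.sf(z)                 # P(Z >= z); sf is the survival function, 1 - cdf
    two_sided = 2 * norm.sf(abs(z))    # area in both tails, using symmetry
    return lower, upper, two_sided

lower, upper, two_sided = tail_areas(1.96)
print(f"lower={lower:.4f} upper={upper:.4f} two-sided={two_sided:.4f}")
```

For a Z value of 1.96 the two-sided area comes out very close to 0.05, which is why 1.96 is the familiar cutoff for a 95% confidence level.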

### Test Statistic

A test statistic is the quantity used in a hypothesis test to **decide whether to reject the null hypothesis**. It takes the data from your sample and measures how far the result is from what you would expect under the null hypothesis, in units of standard error. We will explore two types of test statistic, the **Z-score and the t-score**, which are compared against a normal distribution and a t-distribution respectively.
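As a minimal sketch, both the Z-score and the t-score for a sample mean use the same formula: the difference between the sample mean and the hypothesized mean, divided by the standard error. The function name `standardized_statistic` and the numbers in the usage line are hypothetical.

```python
import math

def standardized_statistic(sample_mean, hypothesized_mean, sample_std, n):
    """Distance of the sample mean from the hypothesized mean, in standard errors."""
    standard_error = sample_std / math.sqrt(n)   # standard error of the mean
    return (sample_mean - hypothesized_mean) / standard_error

# Hypothetical numbers: mean 52 from 100 observations with standard deviation 10,
# tested against a hypothesized mean of 50.
stat = standardized_statistic(52.0, 50.0, 10.0, 100)
print(stat)
```

What differs between the two tests is the reference distribution used to turn this number into a p-value, not the arithmetic of the statistic itself.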

### The **alpha-value**
Before you run any statistical test, you must first determine your alpha level, which is also called the **“significance level.”** By definition, **the alpha value is the probability of rejecting the null hypothesis when the null hypothesis is true.** Translation: **It's the probability of a false positive, also called a Type I error.**
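One way to see what alpha means is a simulation in which the null hypothesis is true by construction. This is only a sketch: the sample size, number of trials, seed, and the function name `simulated_type_i_error_rate` are arbitrary assumptions, and the population standard deviation is treated as known so that a Z test applies.

```python
import numpy as np
from scipy.stats import norm

def simulated_type_i_error_rate(alpha=0.05, n=30, trials=10_000, seed=0):
    """Fraction of simulated samples that wrongly reject a true null hypothesis."""
    rng = np.random.default_rng(seed)
    # Draw `trials` samples of size n from N(0, 1), so H_0: mu = 0 is true.
    samples = rng.normal(loc=0.0, scale=1.0, size=(trials, n))
    # Z statistic with known sigma = 1: (mean - 0) / (1 / sqrt(n)).
    z = samples.mean(axis=1) * np.sqrt(n)
    # Two-sided rejection rule at significance level alpha.
    critical = norm.ppf(1 - alpha / 2)
    return float(np.mean(np.abs(z) > critical))

rate = simulated_type_i_error_rate()
print(f"observed false-rejection rate: {rate:.3f}")
```

The observed rate should land near the chosen alpha, because every rejection in this setup is a Type I error.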

Under most circumstances, an alpha value of 0.05 (a 95% confidence level) is used, which keeps the probability of wrongly rejecting a true null hypothesis small. Keep in mind that if you are analyzing something critical, like airplane engine failures, you may want to use a smaller alpha, such as 0.01 (a 99% confidence level).
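The choice of alpha translates directly into a cutoff for the test statistic. The sketch below assumes a two-sided Z test; `two_sided_critical_z` is a hypothetical helper name.

```python
from scipy.stats import norm

def two_sided_critical_z(alpha):
    """Z value that leaves alpha/2 in each tail of the standard normal."""
    return norm.ppf(1 - alpha / 2)   # ppf is the inverse of the cdf

for alpha in (0.05, 0.01):
    z_crit = two_sided_critical_z(alpha)
    print(f"alpha={alpha:.2f}  confidence={1 - alpha:.2f}  critical z={z_crit:.3f}")
```

A smaller alpha pushes the cutoff further into the tails (roughly 1.96 for alpha = 0.05 versus roughly 2.58 for alpha = 0.01), so stronger evidence is needed before the null hypothesis is rejected.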

### What type of test to run?

As discussed, the **p-value** is the tail area of the sampling distribution beyond the observed test statistic. Alpha is the tail area that lies outside the confidence interval, and it marks the rejection region.

Because the normal and t-distributions are symmetrical, the rejection region can sit in either tail or be split between both. The condition expressed in the alternative hypothesis determines which type of test we apply to decide whether the null hypothesis is rejected.

A hypothesis test can be **one-tailed or two-tailed**. If the inequality in the alternative hypothesis is < or >, the test is one-tailed. If the inequality is ≠, the test is two-tailed.

![](images/hyp_test_types1.png)
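The three cases can be sketched as one function that picks the tail according to the alternative hypothesis. The function name `p_value_from_z` and the labels `"less"`, `"greater"`, and `"two-sided"` are assumptions made for this illustration, and a standard normal reference distribution is assumed.

```python
from scipy.stats import norm

def p_value_from_z(z, alternative="two-sided"):
    """P-value of a Z statistic for the given form of the alternative hypothesis."""
    if alternative == "two-sided":   # H_A uses "not equal": both tails
        return 2 * norm.sf(abs(z))
    if alternative == "less":        # H_A uses "<": lower tail only
        return norm.cdf(z)
    if alternative == "greater":     # H_A uses ">": upper tail only
        return norm.sf(z)
    raise ValueError("alternative must be 'two-sided', 'less', or 'greater'")

z = -2.0  # hypothetical test statistic
for alt in ("two-sided", "less", "greater"):
    print(f"{alt:>9}: {p_value_from_z(z, alt):.4f}")
```

For the same statistic, the two-sided p-value is twice the smaller one-tailed value, so the choice of test has to be made from the hypotheses before looking at the data.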

### Now we have the details to be able to interpret the results of our hypothesis tests

#### First: Confidence level + alpha = 1
   If the alpha equals 0.05, then the confidence level is 0.95
#### Second: If the p-value is low, the null hypothesis is rejected
   If the p-value is less than the alpha, data this extreme would be unlikely if the null hypothesis were true, so you reject the null hypothesis. Otherwise, you fail to reject it.
#### Third: The confidence interval and the p-value will lead to the same conclusion
   If the p-value is less than the alpha, then the **confidence interval** will NOT contain the hypothesized mean.
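The decision rule in the three points above is small enough to write down directly. This is a sketch with a hypothetical function name, `decide`, and made-up p-values.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H_0 only when p_value < alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    return "reject H0" if p_value < alpha else "fail to reject H0"

print(decide(0.003))              # p below the default alpha of 0.05
print(decide(0.20))               # p above alpha
print(decide(0.03, alpha=0.01))   # the same p can fail a stricter alpha
```

Note that the outcome is always phrased in terms of the null hypothesis: it is either rejected or not rejected, never "accepted".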

## Standard calculation of the p-value with Python
In this case we will use a set of male body temperatures for our analysis. The commonly cited average body temperature is 98.6 °F, which we take as the hypothesized mean. We will use a 95% confidence level.

We will set the hypotheses as follows:
$$
H_0: \mu = 98.6\\
H_A: \mu \neq 98.6
$$

Since the alternative hypothesis has a $\neq$ condition, we will use a **two-sided test**.

### Set up our environment

In [22]:
# set up our environment
import pandas as pd
import numpy as np
from scipy.stats import norm
import scipy as sp

# Read the data from `bodytemp.csv`
body = pd.read_csv('data/bodytemp.csv')

In [23]:
# View our data
# 0 means male and 1 means female
body.head()

| Unnamed: 0 | temp | sex | bpm |
|---|---|---|---|
| 0 | 96.3 | 0 | 70 |
| 1 | 96.7 | 0 | 71 |
| 2 | 96.9 | 0 | 74 |
| 3 | 97.0 | 0 | 80 |
| 4 | 97.1 | 0 | 73 |


In [24]:
# 65 male samples in our data set
m = body[body['sex'] == 0].temp.mean()
std_dev = body[body['sex'] == 0].temp.std()
std_err = std_dev/np.sqrt(65)
[m,std_dev,std_err]

[98.1046153846154, 0.6987557623265904, 0.08666998552285868]
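The cell above hard-codes the sample size of 65. A possible variation, sketched here on a few made-up rows rather than on `bodytemp.csv`, takes the count from the filtered data and cross-checks the standard error against the pandas `Series.sem` method. The frame `toy` and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; these are NOT from bodytemp.csv.
toy = pd.DataFrame({
    "temp": [97.9, 98.4, 98.1, 98.8, 98.0, 98.6],
    "sex":  [0, 0, 0, 1, 0, 1],
})

male_temp = toy.loc[toy["sex"] == 0, "temp"]
n = male_temp.count()              # number of male rows, instead of a literal
mean = male_temp.mean()
std_dev = male_temp.std()          # sample standard deviation (ddof=1)
std_err = std_dev / np.sqrt(n)     # standard error of the mean

print(n, mean, std_dev, std_err)
print("matches Series.sem():", np.isclose(std_err, male_temp.sem()))
```

Deriving `n` from the data keeps the standard error correct if the filter or the file changes.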

Our mean value is off by about 0.5 degrees. **Is this sufficient to reject the null hypothesis that the mean is 98.6?**

To examine this question, we compute the probability of obtaining a sample mean at least as far from 98.6 as the one we observed, under the assumption that the null hypothesis is true. Put another way, we need the area in both tails of the normal curve for the sample mean, which has mean 98.6 and a standard deviation equal to the standard error, 0.0867.
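Written out with the rounded summary statistics printed above (sample mean 98.1046, sample standard deviation 0.6988, and 65 observations), the standardized distance of the sample mean from the hypothesized mean is approximately:

$$
Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{98.1046 - 98.6}{0.6988/\sqrt{65}} \approx \frac{-0.4954}{0.0867} \approx -5.72
$$

Here $\bar{x}$ is the sample mean, $\mu_0$ the hypothesized mean, $s$ the sample standard deviation, and $n$ the sample size; these symbols are introduced for this worked step only.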

This area can be computed by looking up the appropriate Z-score in a table, or it can be computed as follows:

## Calculate the test statistic using the Z-score (normal)

In [26]:
# calculate the test statistic (Z score)
test_stat = (m - 98.6)/std_err
test_stat

-5.7157574493183665

In [30]:
# calculate the p-value
# two-sided test: double the upper-tail area given by the survival function
p_value = norm.sf(abs(test_stat))*2
print('two sided: %.11f' % p_value)

two sided: 0.00000001092


### Using the P-value to determine acceptance or rejection of the null hypothesis

At the 95% confidence level (alpha = 0.05), the p-value is far below alpha, so the null hypothesis IS rejected: the data provide strong evidence that the mean male body temperature is not 98.6. Equivalently, 98.6 lies outside the 95% confidence interval for the mean.
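The agreement between the p-value and the confidence interval can be checked with the summary statistics printed earlier. This sketch copies the mean and standard error from the notebook output and assumes a normal-based interval; the variable names are illustrative.

```python
from scipy.stats import norm

# Summary statistics copied from the notebook output above.
sample_mean = 98.1046153846154
std_err = 0.08666998552285868
hypothesized_mean = 98.6
alpha = 0.05

z_crit = norm.ppf(1 - alpha / 2)          # about 1.96 for a 95% interval
lower = sample_mean - z_crit * std_err
upper = sample_mean + z_crit * std_err

print(f"95% CI: ({lower:.3f}, {upper:.3f})")
print("contains 98.6:", lower <= hypothesized_mean <= upper)
```

The interval runs from roughly 97.93 to 98.27, so it excludes 98.6, which matches the rejection reached through the p-value.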

## Calculate the test statistic using the t-score (t-distribution)

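We now repeat the calculation with the t-distribution. The test statistic is computed the same way, but the p-value comes from the scipy function **t.sf** with n - 1 = 64 degrees of freedom. This is the one-sample t-test, which is appropriate when the population standard deviation is estimated from the sample.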


In [38]:
# calculate sample mean and standard deviation
import statistics as stats
# sm = sample mean
# sd = sample standard deviation
sm = stats.mean(body[body['sex'] == 0].temp)
sd = stats.stdev(body[body['sex'] == 0].temp)
print('sample mean: ', sm, 'sample standard deviation: ', sd)

sample mean:  98.10461538461539 sample standard deviation:  0.698755762326591


In [39]:
# calculate the t-statistic and the p-value
tt = (sm-98.6)/(sd/np.sqrt(65))  # t-statistic: divide by the standard error sd/sqrt(n)
pval = sp.stats.t.sf(np.abs(tt), 65-1)*2  # two-sided p-value = Prob(|T| > |tt|) with 64 degrees of freedom
print('t-statistic = %.4f, p-value = %.2e' % (tt, pval))

t-statistic = -5.7158, p-value = 3.08e-07


### Using the P-value to determine acceptance or rejection of the null hypothesis

The t-test leads to the same conclusion. At the 95% confidence level (alpha = 0.05), the p-value is again far below alpha, so the null hypothesis IS rejected, and 98.6 is not a plausible value for the mean male body temperature given this sample.
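The manual t-test can be cross-checked against `scipy.stats.ttest_1samp`, which performs a two-sided one-sample t-test by default. The sketch below uses a small made-up sample, not the `bodytemp.csv` data, so the numbers it prints say nothing about the real dataset.

```python
import numpy as np
from scipy import stats

# Hypothetical temperatures for illustration only; NOT from bodytemp.csv.
sample = np.array([97.8, 98.2, 98.0, 98.6, 97.9, 98.4, 98.1, 98.3])
mu0 = 98.6
n = sample.size

# Manual calculation: statistic = (mean - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom.
t_manual = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# The same test through scipy's built-in function.
result = stats.ttest_1samp(sample, popmean=mu0)
t_scipy, p_scipy = result.statistic, result.pvalue

print("manual:", t_manual, p_manual)
print("scipy: ", t_scipy, p_scipy)
```

If the two lines agree, the hand-written formula divides by the standard error correctly and uses the right degrees of freedom.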

# Wrap up

- Hypothesis testing in Python
- P-values
- Python examples
- Wrap up