# Statistical hypotheses and tests

A _statistical hypothesis_ is a statement about one or more population parameters, while a _statistical test_ is a procedure for assessing that statement from sample data. A _statistical hypothesis test_ evaluates whether the data provide enough evidence against the null hypothesis in favor of the alternative hypothesis. The decision to reject or fail to reject the null hypothesis is based on the result of the _statistical test_.

## Hypothesis

**H0** (null hypothesis): =  
**H1** (alternative hypothesis): <, >, or !=
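
The three forms of the alternative hypothesis correspond to one-sided and two-sided tests. As a minimal sketch, assuming SciPy 1.6 or later (where `ttest_1samp` accepts the `alternative` argument), a synthetic sample, and a hypothetical reference value `mu0`, the forms can be compared like this:

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)                     # fixed seed for reproducibility
sample = rng.normal(loc=7.2, scale=0.4, size=40)   # synthetic data (assumption)
mu0 = 7.0                                          # hypothetical reference value under H0

# H0 is "mean = mu0" in every case; only H1 changes.
# "two-sided" -> H1: !=, "less" -> H1: <, "greater" -> H1: >
p_values = {
    alt: ttest_1samp(sample, popmean=mu0, alternative=alt).pvalue
    for alt in ("two-sided", "less", "greater")
}

for alt, p in p_values.items():
    print(f"H1 {alt:>9}: p-value = {p:.4f}")
```

The two one-sided p-values sum to 1, and the two-sided p-value is twice the smaller of them, which is why the direction of H1 should be fixed before looking at the data.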

H0 is ... | True | False
--- | --- | ---
Rejected | **Type I error** <br> False positive <br> P = $\alpha$ | Correct decision <br> True positive <br> P = 1 - $\beta$
Not Rejected | Correct decision <br> True negative <br> P = 1 - $\alpha$ | **Type II error** <br> False negative <br> P = $\beta$

![image](./images/Type-I-and-II-errors.jpg)

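$\alpha$ = 0.05 is the usual significance level (typical range 0.01 to 0.1; a smaller $\alpha$ gives fewer false positives but, for a fixed sample size, lower power)  
$\beta$ = 0.2 is a common target, which corresponds to a power of 1 - $\beta$ = 0.8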

If p-value < $\alpha$, the result of the test is _statistically significant_ and H0 is rejected; otherwise the result is _not statistically significant_ and H0 is not rejected.
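
The error probabilities in the table above can be checked empirically. The sketch below is a small simulation under assumed settings (normal data, a one-sample t-test, 5000 repetitions, sample size 30, true shift 0.5); none of these numbers come from the original notes.

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(42)
alpha = 0.05
n_sim, n = 5000, 30   # hypothetical simulation settings

# Case 1: H0 is true (population mean is 0), so every rejection is a Type I error.
null_samples = rng.normal(loc=0.0, scale=1.0, size=(n_sim, n))
p_null = ttest_1samp(null_samples, popmean=0.0, axis=1).pvalue
type_1_rate = np.mean(p_null < alpha)

# Case 2: H0 is false (population mean is 0.5), so every non-rejection is a Type II error.
alt_samples = rng.normal(loc=0.5, scale=1.0, size=(n_sim, n))
p_alt = ttest_1samp(alt_samples, popmean=0.0, axis=1).pvalue
power = np.mean(p_alt < alpha)   # estimate of 1 - beta
type_2_rate = 1 - power          # estimate of beta

print(f"Estimated Type I error rate: {type_1_rate:.3f}")
print(f"Estimated power (1 - beta):  {power:.3f}")
print(f"Estimated Type II error rate: {type_2_rate:.3f}")
```

The estimated Type I error rate should land close to $\alpha$, while the power depends on the sample size and on how far the true mean is from the value stated in H0.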

## Tests (criteria)

Flowcharts for choosing a test:
- [StatsTest.com](https://www.statstest.com/)
- [Statsflowchart.co.uk](https://www.statsflowchart.co.uk/)

## Parametric test

Only for normally distributed data

![image](./images/ParametricTestMeansFlowchart.jpg)
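
As an illustration of a parametric test for means, the sketch below runs Welch's t-test (`scipy.stats.ttest_ind` with `equal_var=False`) on two synthetic samples. The data, sample sizes, and variable names are assumptions made for the example, not values from `pizzas.csv`.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
# Two synthetic, normally distributed samples (assumption).
unit_1 = rng.normal(loc=7.0, scale=0.4, size=30)
unit_2 = rng.normal(loc=7.3, scale=0.5, size=30)
alpha = 0.05

# H0: both populations have the same mean; H1: the means differ.
# equal_var=False selects Welch's t-test, which does not assume equal variances.
result = ttest_ind(unit_1, unit_2, equal_var=False)
p_value = result.pvalue
print(f"t = {result.statistic:.4f}, p = {p_value:.4f}")

if p_value < alpha:
    print('H0 is rejected: the means differ')
else:
    print('H0 is not rejected')
```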

### <a name="normal_dist_test"></a> Normality test

In [9]:
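# Shapiro–Wilk test
# H0: the sample comes from a normal distribution

import pandas as pd
from scipy.stats import shapiro

df = pd.read_csv('./data/pizzas.csv')
display(df.head())  # display() is available in Jupyter/IPython
alpha = 0.05  # significance level

# Caution: by default shapiro() flattens its input, so this call pools every
# column of df (including the 'Unnamed: 0' row index) into a single sample.
_, p_value = shapiro(df)
print(round(p_value, 4))

# p > alpha: H0 is not rejected (no evidence against normality)
if p_value > alpha:
    print('The data is distributed normally')
else:
    print('The data is not distributed normally (we reject H0)')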

Unnamed: 0 | Making Unit 1 | Making Unit 2
--- | --- | ---
0 | 6.809 | 6.7703
1 | 6.4376 | 7.5093
2 | 6.9157 | 6.73
3 | 7.3012 | 6.7878
4 | 7.4488 | 7.1522


0.2045
The data is distributed normally
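
`shapiro` flattens its input by default, so passing a whole DataFrame pools all columns into one sample. A per-column check may be closer to the intent. The sketch below uses a synthetic stand-in for `pizzas.csv` (the column names are taken from the output above, the values are made up); with the real file, `pd.read_csv('./data/pizzas.csv', index_col=0)` would presumably keep the saved row index out of the test.

```python
import numpy as np
import pandas as pd
from scipy.stats import shapiro

rng = np.random.default_rng(1)
# Synthetic stand-in for pizzas.csv (assumption): two numeric columns.
df = pd.DataFrame({
    'Making Unit 1': rng.normal(7.0, 0.4, size=25),
    'Making Unit 2': rng.normal(7.0, 0.4, size=25),
})
alpha = 0.05

# Run the Shapiro-Wilk test separately on each column.
results = {}
for column in df.columns:
    stat, p_value = shapiro(df[column])
    results[column] = p_value
    if p_value > alpha:
        verdict = 'no evidence against normality'
    else:
        verdict = 'normality rejected'
    print(f'{column}: W = {stat:.4f}, p = {p_value:.4f} -> {verdict}')
```

A p-value above $\alpha$ only means that H0 was not rejected; it does not prove that the data are normal.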


In [7]:
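# D'Agostino's K^2 test
# H0: the sample comes from a normal distribution

import pandas as pd
from scipy.stats import normaltest

df = pd.read_csv('./data/pizzas.csv')
alpha = 0.05  # significance level

# normaltest() works column by column (axis=0) and returns one p-value per column.
# p_value[0] belongs to the first column, which in this file is 'Unnamed: 0'
# (the saved row index); p_value[1] and p_value[2] belong to the two making units.
_, p_value = normaltest(df)
print(round(p_value[0], 4))

# Compare the p-value with alpha itself: this is an omnibus test, so alpha is not halved.
if p_value[0] > alpha:
    print('The data is distributed normally')
else:
    print('The data is not distributed normally (we reject H0)')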

0.2514
The data is distributed normally
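
`normaltest` runs along `axis=0` by default, so a 2-D input yields one p-value per column. The sketch below pairs each p-value with its column name instead of indexing by position. The DataFrame is a synthetic stand-in (an assumption), sized at 50 rows because the underlying kurtosis test is only recommended for roughly 20 or more observations.

```python
import numpy as np
import pandas as pd
from scipy.stats import normaltest

rng = np.random.default_rng(2)
# Synthetic stand-in for pizzas.csv (assumption): two numeric columns.
df = pd.DataFrame({
    'Making Unit 1': rng.normal(7.0, 0.4, size=50),
    'Making Unit 2': rng.normal(7.0, 0.4, size=50),
})
alpha = 0.05

# One D'Agostino K^2 test per column; the results come back as arrays.
stats, p_values = normaltest(df.to_numpy(), axis=0)
p_by_column = dict(zip(df.columns, p_values))

for column, p in p_by_column.items():
    decision = 'H0 not rejected' if p > alpha else 'H0 rejected'
    print(f'{column}: p = {p:.4f} -> {decision}')
```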


## Non-parametric test

For both normally and non-normally distributed data

![image](./images/NonParametricFlowchartMedians.jpg)
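
As one example of a non-parametric test, the sketch below applies the Mann–Whitney U test (`scipy.stats.mannwhitneyu`) to two skewed synthetic samples. The exponential data and the variable names are assumptions chosen only to show a case where normality clearly does not hold.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(4)
# Two skewed (exponential) synthetic samples (assumption).
group_a = rng.exponential(scale=1.0, size=40)
group_b = rng.exponential(scale=1.5, size=40)
alpha = 0.05

# H0: the two samples come from the same distribution.
# H1 (two-sided): values in one group tend to be larger than in the other.
result = mannwhitneyu(group_a, group_b, alternative='two-sided')
p_value = result.pvalue
print(f"U = {result.statistic:.1f}, p = {p_value:.4f}")

if p_value < alpha:
    print('H0 is rejected')
else:
    print('H0 is not rejected')
```

The test works on ranks rather than raw values, so it does not rely on the normality assumption that the parametric tests need.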