# Effect Size 


"After four decades of severe criticism, the ritual of null hypothesis significance testing---mechanical dichotomous decisions around a sacred .05 criterion---still persist. This article reviews the problems with this practice..." ... "What's wrong with [null hypothesis significance testing]? Well, among many other things, it does not tell us what we want to know, and we so much want to know what we want to know that, out of desperation, we nevertheless believe that it does!" (Cohen 1994)

### Effect Size

Q: OK, the result is statistically significant, but is it also practically significant?

Let's try to explain what we mean.


__Scenario__: Are SAT-Math scores at one college greater than the known population mean of 500?

Data are collected from a random sample of 1,200 students at that college. The population standard deviation is known to be 100.
- Run a one-sample test for the mean and determine the $p$-value.
- Then determine whether the null hypothesis should be rejected ($\alpha = 0.05$).


Suppose the null hypothesis is

$H_{0}$: $\mu = 500$

Write the alternative hypothesis.

In [None]:
import numpy as np

In [None]:
np.random.seed(1800)
n1 = 1200     ## sample size
sigma = 100   ## known population standard deviation
sample = np.random.normal(loc = 506, scale = sigma, size = n1)

## population mean under the null hypothesis is mu = 500
mu = 500

## Sample mean is x1_bar
x1_bar = sample.mean()

## standard error of the sample mean
std_error = sigma / np.sqrt(n1)
## x1_bar is 506.0888
## Is this significantly different from the population mean of 500?

In [None]:
from scipy import stats

In [None]:
## z-statistic: distance of the sample mean from mu in standard-error units
z = (x1_bar - mu)/ std_error

## upper-tail p-value via the survival function: sf(z) = 1 - cdf(z)

## is this significant for alpha = 0.05?
stats.norm.sf(z)
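
The test steps above can be collected into one small helper that also makes the reject/do-not-reject decision explicit. This is only a sketch: the function name `one_sample_z_test` is hypothetical, the sample mean 506.0888 is the value quoted in the notebook comment, and the upper-tail probability is computed with the standard library's `math.erfc` rather than `stats.norm.sf` so that the block runs on its own.

```python
import math

def one_sample_z_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Upper-tailed one-sample z-test with a known population sd (hypothetical helper)."""
    std_error = sigma / math.sqrt(n)
    z = (x_bar - mu0) / std_error
    # Upper-tail probability P(Z >= z) for a standard normal;
    # mathematically the same quantity as stats.norm.sf(z).
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value, p_value < alpha

# 506.0888 is the sample mean quoted in the notebook comment above.
z_stat, p_value, reject = one_sample_z_test(x_bar=506.0888, mu0=500, sigma=100, n=1200)
print(f"z = {z_stat:.3f}, p = {p_value:.4f}, reject H0: {reject}")
```

Returning the decision alongside the p-value keeps the significance level in one place instead of comparing against 0.05 by eye.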

For some tests there are commonly used measures of effect size. For example, when comparing the difference between two means we often compute Cohen's $d$, which is the difference between the two observed sample means in standard deviation units.

$$ \begin{gather}
 d = \frac{\bar{x}_{1} - \bar{x}_{2}}{s_{p}}\\
\text{where} \qquad s_{p} = \sqrt{\frac{(n_{1}-1)s_{1}^{2} + (n_{2}-1)s_{2}^{2} }{n_{1} + n_{2} - 2}}
\end{gather}$$


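If we have only one sample, we use

$$d = \frac{\bar{x}_{1} - \mu}{s}$$


where $s$ is the standard deviation of the sample. When the population standard deviation is known, as in this scenario (100), it can be used in place of $s$.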
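
Both formulas translate directly into code. The sketch below uses only the standard library; the function names `cohens_d_two_sample` and `cohens_d_one_sample` are hypothetical, and the scores in `group_a` and `group_b` are made-up illustration data, not results from this notebook.

```python
import math
import statistics

def cohens_d_two_sample(sample1, sample2):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n1, n2 = len(sample1), len(sample2)
    var1 = statistics.variance(sample1)  # sample variance (divides by n - 1)
    var2 = statistics.variance(sample2)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.mean(sample1) - statistics.mean(sample2)) / pooled_sd

def cohens_d_one_sample(sample, mu):
    """Cohen's d for one sample against a reference mean mu."""
    return (statistics.mean(sample) - mu) / statistics.stdev(sample)

# Made-up scores, for illustration only.
group_a = [520, 480, 610, 450, 530, 570]
group_b = [500, 470, 560, 440, 510, 520]

print(cohens_d_two_sample(group_a, group_b))
print(cohens_d_one_sample(group_a, mu=500))
```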

In [None]:
## the population standard deviation (100) is known, so it is used in place of s
d = (x1_bar - mu)/ 100

d

Below are commonly used standards when interpreting Cohen's d:


<img src="cohens_d.png" alt="Cohen's d-table"
	title="Cohen's d-statistic" width="450" height="400" />
    
Image Source: [PennState Stat 200](https://newonlinecourses.science.psu.edu/stat200/lesson/6/6.4)
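
As a rough guide, such standards can be written as a small lookup. The cutoffs 0.2, 0.5 and 0.8 used here are Cohen's conventional benchmarks for small, medium and large effects; the label "negligible" for values below 0.2, the function name `interpret_cohens_d`, and the handling of values exactly on a cutoff are assumptions for illustration and may not match the wording of the table above.

```python
def interpret_cohens_d(d):
    """Label an effect size using Cohen's conventional 0.2 / 0.5 / 0.8 benchmarks."""
    size = abs(d)  # the sign only gives the direction of the difference
    if size < 0.2:
        return "negligible"  # assumed label for effects below the 'small' benchmark
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"

# Effect size from the SAT-Math scenario: (506.0888 - 500) / 100
d_example = (506.0888 - 500) / 100
print(d_example, interpret_cohens_d(d_example))
```

These labels are conventions, not rules; what counts as a practically important effect depends on the field.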

In [None]:
## in fact, a similar effect with far less data would look like:
n1 = 9
np.random.seed(1800)
sample = np.random.normal(loc = 506, scale = 100, size = n1)

## population mean is mu = 500
mu = 500

## Sample mean is x1_bar
x1_bar = sample.mean()

## standard error must use the same sample size as the sample above
std_error = 100/ np.sqrt(n1)
## Is this significantly different from the population mean of 500?

z = (x1_bar - mu)/ std_error

## recall that sf = 1 - cdf

## is this significant for alpha = 0.05 with a one-tailed (upper-tail) test?
stats.norm.sf(z)



Note that with the small sample we did not get a significant result, but with the very large sample we were able to show that the mean is significantly greater than the population mean. The takeaway: a p-value depends heavily on sample size, so we should supplement p-values with other statistics, such as effect size.
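
To isolate the role of sample size, the sketch below holds the sample mean fixed at a hypothetical 506 (so $d = 0.06$ for every $n$) and recomputes the upper-tail p-value for several sample sizes. The helper name `upper_tail_p` and the chosen sample sizes are assumptions for illustration; the p-value is computed with `math.erfc`, which gives the same quantity as `stats.norm.sf`.

```python
import math

mu0, sigma = 500, 100
x_bar = 506                   # hypothetical sample mean, held fixed for every n
d = (x_bar - mu0) / sigma     # effect size does not depend on n

def upper_tail_p(x_bar, mu0, sigma, n):
    """Upper-tail p-value of a one-sample z-test with known sigma."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    return 0.5 * math.erfc(z / math.sqrt(2))

for n in (9, 100, 1200, 10000):
    p = upper_tail_p(x_bar, mu0, sigma, n)
    print(f"n = {n:>5}  d = {d:.2f}  p = {p:.4g}  significant at 0.05: {p < 0.05}")
```

The effect size stays the same in every row while the p-value shrinks as $n$ grows, which is why a significant result alone says little about practical importance.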

## Resources

- Null Hypothesis Significance Testing: A Review of an Old and Continuing Controversy - RS Nickerson

- [Penn State Statistics Courses](https://newonlinecourses.science.psu.edu/stat200/lesson/6/6.4)

- [Statistics For Business and Economics - 9.6](https://www.amazon.com/Statistics-Business-Economics-Book-Only/dp/0324783256)