<a id='section5'></a>

## Chi-square test

Also called the "goodness-of-fit test".   
It is also used to test whether two distributions are consistent.   


The test statistic (CV) is computed as    

$\chi^2 = \sum_i (O_i - E_i)^2/E_i$

where $O_i$ and $E_i$ are observed and expected frequency counts. 

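After computing CV and the degrees of freedom ($= n - 1$, where $n$ is the number of categories), we use the chi-square calculator at http://stattrek.com/online-calculator/chi-square.aspx to get the cumulative probability P(chi2 $\leq$ CV); the P value of the test is the upper-tail probability 1 - P(chi2 $\leq$ CV).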

**Note** Each term of $\chi^2$ is a ratio of non-negative values and every $E_i$ must be strictly positive, so we need to shift our data to the positive scale before doing chi-square tests.
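
The procedure described above can be sketched as a small helper. This is only an illustration: the function name `chi2_goodness_of_fit` and the die-roll counts are hypothetical and not part of the original notebook, and the P value is taken as the upper tail via `stats.chi2.sf`.

```python
import numpy as np
from scipy import stats

def chi2_goodness_of_fit(observed, expected):
    """Return (statistic, degrees of freedom, P value) for a goodness-of-fit test."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    if np.any(expected <= 0):
        raise ValueError("expected counts must be strictly positive")
    # Sum of (O_i - E_i)^2 / E_i over all categories.
    statistic = np.sum((observed - expected) ** 2 / expected)
    dof = observed.size - 1
    # Upper-tail probability: 1 - P(chi2 <= statistic).
    p_value = stats.chi2.sf(statistic, dof)
    return statistic, dof, p_value

# Hypothetical data: 60 rolls of a die assumed to be fair (10 expected per face).
rolls = [8, 12, 9, 11, 10, 10]
statistic, dof, p_value = chi2_goodness_of_fit(rolls, [10] * 6)
print(statistic, dof, p_value)
```

A large P value, as in this made-up example, means the observed counts are compatible with the expected ones.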

### Toy exercise  

Let's reproduce the example at
http://stattrek.com/chi-square-test/goodness-of-fit.aspx?Tutorial=AP

In [1]:
import numpy as np
from scipy import stats

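E = np.array([30, 60, 10])  # expected counts
O = np.array([50, 45, 5])   # observed counts

cv = np.sum((O - E)**2 / E)  # chi-square test statistic
print(cv)
df = len(O) - 1  # degrees of freedom = number of categories - 1
# Critical value for 95% confidence (5% significance level)
crit = stats.chi2.ppf(q=0.95, df=df)


print("Critical value for 95% confidence", crit)

# P value = upper-tail probability 1 - P(chi2 <= cv)
p_value = float(stats.chi2.sf(cv, df=df))
print("P value %12.6f" % p_value)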

19.5833333333
Critical value for 95% confidence 5.99146454711
P value     0.000056
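
As a check by hand, the three terms of the sum for the counts used above work out as follows (values rounded to three decimals):

$\chi^2 = \frac{(50-30)^2}{30} + \frac{(45-60)^2}{60} + \frac{(5-10)^2}{10} \approx 13.333 + 3.750 + 2.500 = 19.583$

The first category contributes about two thirds of the total.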


Here we reproduce the result from the website: the statistic exceeds the critical value and the P value is small, so the null hypothesis is rejected at the 5% level.   
Let's test normality for our X distribution.   
This time we will compare X to a normal distribution with the same mean and standard deviation as X.
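
One way such a normality comparison could look is sketched here. Everything in it is an assumption made for illustration: `X` is replaced by a synthetic sample, the number of bins is arbitrary, and the bins are chosen to be equally probable under the fitted normal. Passing `ddof=2` follows the common convention of removing one degree of freedom per parameter estimated from the data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=1000)  # hypothetical stand-in for X

mu, sigma = X.mean(), X.std(ddof=1)
n_bins = 10  # arbitrary choice for this sketch

# Interior bin edges chosen so each bin has probability 1/n_bins under N(mu, sigma).
edges = stats.norm.ppf(np.arange(1, n_bins) / n_bins, loc=mu, scale=sigma)

# Count the sample points in each bin; the two outer bins are open-ended.
observed = np.bincount(np.searchsorted(edges, X), minlength=n_bins)
expected = np.full(n_bins, X.size / n_bins)

# ddof=2 because mu and sigma were estimated from X (df = n_bins - 1 - 2).
result = stats.chisquare(f_obs=observed, f_exp=expected, ddof=2)
print(result.statistic, result.pvalue)
```

Binning turns the continuous sample into non-negative counts, which is the form the $\chi^2$ statistic expects.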

### SciPy's $\chi^2$ test

Let's repeat the toy exercise using SciPy's implementation of the $\chi^2$ test.




In [2]:
stats.chisquare(f_obs = O,f_exp = E)


Power_divergenceResult(statistic=19.583333333333336, pvalue=5.5915626856371765e-05)
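
To confirm that the two routes agree, a short cross-check along these lines may help. The names `manual_stat` and `manual_p` are introduced here for illustration only. With its default `ddof=0`, `stats.chisquare` uses $n - 1$ degrees of freedom, which matches the manual calculation.

```python
import numpy as np
from scipy import stats

E = np.array([30, 60, 10])  # expected counts from the toy exercise
O = np.array([50, 45, 5])   # observed counts from the toy exercise

result = stats.chisquare(f_obs=O, f_exp=E)

# Manual statistic and upper-tail P value with n - 1 degrees of freedom.
manual_stat = np.sum((O - E) ** 2 / E)
manual_p = stats.chi2.sf(manual_stat, df=len(O) - 1)

print(np.isclose(result.statistic, manual_stat))
print(np.isclose(result.pvalue, manual_p))
```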