# ANOVA

**An**alysis **O**f **VA**riance  

A method that partitions the total variance into within-group and between-group components  

## ANOVA

Main test for one-way ANOVA  
- Do any of the group means differ from the global mean? (Null hypothesis: all group means are equal.)  

Assumes  
- homoskedasticity (equal variances)  
- Normal distribution of errors (Normality assumption)  

ANOVA is more robust to deviations from the normality assumption than to deviations from the homoskedasticity assumption  

## ANOVA

Based on dividing **Sum of Squares** into:  
- within groups  
- between groups  


### Example  

Read in the file "Rhizobium.csv"  

For each column (treatment $i$), we need to compute the following rows:
1. Sum: $\sum_{j} Y_{ij} = Y_{i\cdot}$
2. Sum of squares: $\sum_{j} Y_{ij}^2$  
3. Squared sum divided by $r$ (replicates): $\large\frac{(Y_{i\cdot})^2}{r}$
4. Sum of squared deviations: $\sum_{j}(Y_{ij} - \bar Y_{i\cdot})^2$  
5. Mean: $\bar Y_{i\cdot}$
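The five per-column quantities can be sketched in plain Python. The data below are a hypothetical toy set (3 treatments, 4 replicates), not the contents of "Rhizobium.csv", and the helper name `column_summary` is only illustrative.

```python
# Hypothetical toy data (NOT the Rhizobium values): one list per column/treatment.
data = {
    "A": [4.0, 5.0, 6.0, 5.0],
    "B": [7.0, 9.0, 8.0, 8.0],
    "C": [1.0, 3.0, 2.0, 2.0],
}

def column_summary(values):
    """Return rows 1 to 5 for a single column."""
    r = len(values)                              # number of replicates
    total = sum(values)                          # row 1: Y_i.
    sum_sq = sum(y * y for y in values)          # row 2: sum of Y_ij^2
    sq_sum_over_r = total ** 2 / r               # row 3: (Y_i.)^2 / r
    mean = total / r                             # row 5: mean of the column
    ss_dev = sum((y - mean) ** 2 for y in values)  # row 4: squared deviations
    return {
        "sum": total,
        "sum_sq": sum_sq,
        "sq_sum_over_r": sq_sum_over_r,
        "ss_dev": ss_dev,
        "mean": mean,
    }

summaries = {name: column_summary(values) for name, values in data.items()}
for name, summary in summaries.items():
    print(name, summary)
```

Note that for every column, row 4 equals row 2 minus row 3.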


### Example  

Calculate the row totals (summing across columns) for rows 1 to 4 above  
1. Sum of sums: $\sum_{i} Y_{i\cdot} = Y_{\cdot\cdot}$
2. Sum of sums of squares: $\sum_{i} \sum_{j} Y_{ij}^2$
3. ...


### Example  

$\large SS(total) = \sum_{i,j} Y_{ij}^2 - C$  

C is a "correction" term 

$\large C = \frac{Y_{\cdot\cdot}^2}{rt}$  

where  
- r = replicates
- t = treatments
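A short derivation may help show where the correction term comes from. It assumes the usual definition of SS(total) as the sum of squared deviations from the grand mean $\bar Y_{\cdot\cdot} = Y_{\cdot\cdot}/(rt)$, which is not stated explicitly above:

$$\sum_{i,j}(Y_{ij}-\bar Y_{\cdot\cdot})^2 = \sum_{i,j}Y_{ij}^2 - 2\,\bar Y_{\cdot\cdot}\,Y_{\cdot\cdot} + rt\,\bar Y_{\cdot\cdot}^2 = \sum_{i,j}Y_{ij}^2 - \frac{2\,Y_{\cdot\cdot}^2}{rt} + \frac{Y_{\cdot\cdot}^2}{rt} = \sum_{i,j}Y_{ij}^2 - C$$

So subtracting $C$ "corrects" the raw sum of squares for the overall mean.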




### Example  

$\large SS(treatment) = \frac{\sum\limits_{i=1}^{t} Y_{i\cdot}^2}{r} - C = \frac{Y_{1\cdot}^2 + Y_{2\cdot}^2+\cdots + Y_{t\cdot}^2}{r} - C$

$\large \frac{\sum\limits_{i=1}^{t} Y_{i\cdot}^2}{r}$ is the row total of row 3
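As a minimal sketch, the correction term, SS(total) and SS(treatment) can be computed directly from the formulas. The data are a hypothetical toy set with equal replication (not the Rhizobium values), and the variable names are illustrative.

```python
# Hypothetical toy data (NOT the Rhizobium values): t = 3 treatments, r = 4 replicates.
data = {
    "A": [4.0, 5.0, 6.0, 5.0],
    "B": [7.0, 9.0, 8.0, 8.0],
    "C": [1.0, 3.0, 2.0, 2.0],
}
t = len(data)                        # number of treatments
r = len(next(iter(data.values())))   # replicates per treatment (assumed equal)

grand_total = sum(sum(values) for values in data.values())   # Y..
correction = grand_total ** 2 / (r * t)                       # C = Y..^2 / (r t)

# SS(total): sum of all squared observations minus C
ss_total = sum(y * y for values in data.values() for y in values) - correction

# SS(treatment): sum of squared column totals divided by r, minus C
ss_treatment = sum(sum(values) ** 2 for values in data.values()) / r - correction

print(correction, ss_total, ss_treatment)  # prints: 300.0 78.0 72.0
```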

### Example  
$SS(error) = SS(total) - SS(treatment)$

Here:
$\large C=\frac{(596.6)^2}{(5)(6)} = 11,864.38$

$$\large SS(total)= 12,994.36 - 11,864.38 = 1,129.98$$

$$\large SS(treatment) = 12,711.43 - 11,864.38 = 847.05$$

$$\large SS(error) = 1,129.98 - 847.05= 282.93$$

Note that SS(error) also equals the row total of row 4 (the sum of squared deviations within each column)  




### Example  

SS(error) can also be computed as:

$$\large SS(error) = \sum\limits_{i}\left(\sum\limits_{j}Y_{ij}^2-\frac{Y_{i\cdot}^2}{r}\right)$$

For each column: row<sub>2</sub> - (row<sub>1</sub>)<sup>2</sup>/r  
Sum these differences over all columns  
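The different routes to SS(error) should agree. The sketch below checks this on a hypothetical toy data set (not the Rhizobium values), assuming equal replication; the helper `ss_dev` is illustrative.

```python
# Hypothetical toy data (NOT the Rhizobium values): t = 3 treatments, r = 4 replicates.
data = {
    "A": [4.0, 5.0, 6.0, 5.0],
    "B": [7.0, 9.0, 8.0, 8.0],
    "C": [1.0, 3.0, 2.0, 2.0],
}
t = len(data)
r = len(next(iter(data.values())))

def ss_dev(values):
    """Sum of squared deviations from the column mean (row 4)."""
    mean = sum(values) / len(values)
    return sum((y - mean) ** 2 for y in values)

# Route 1: per column, row 2 - (row 1)^2 / r, then summed over columns
ss_error_shortcut = sum(
    sum(y * y for y in values) - sum(values) ** 2 / r for values in data.values()
)

# Route 2: row total of row 4
ss_error_direct = sum(ss_dev(values) for values in data.values())

# Route 3: SS(total) - SS(treatment)
correction = sum(sum(values) for values in data.values()) ** 2 / (r * t)
ss_total = sum(y * y for values in data.values() for y in values) - correction
ss_treatment = sum(sum(values) ** 2 for values in data.values()) / r - correction
ss_error_subtraction = ss_total - ss_treatment

print(ss_error_shortcut, ss_error_direct, ss_error_subtraction)
```

With real (non-integer) data the three values may differ in the last few decimal places because of floating-point rounding.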

## ANOVA -- Putting it together

ANOVA of "Rhizobium" data:

|Source of variation|df|Sum of squares|Mean square|F|  
|----|--|---|---|---|   
|Among cultures |5|847.05|169.41|14.37|
|Within cultures|24|282.93|11.79| |
|Total|29|1,129.98| | |
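The df, mean square and F columns follow from the sums of squares by simple arithmetic. The sketch below assumes a balanced one-way design with t = 6 cultures and r = 5 replicates (the values used in the correction term); the function name `anova_table` is illustrative.

```python
def anova_table(ss_treatment, ss_error, t, r):
    """Degrees of freedom, mean squares and F for a balanced one-way ANOVA."""
    df_treatment = t - 1              # among groups
    df_error = t * (r - 1)            # within groups
    ms_treatment = ss_treatment / df_treatment
    ms_error = ss_error / df_error
    return {
        "df_treatment": df_treatment,
        "df_error": df_error,
        "df_total": df_treatment + df_error,
        "ms_treatment": ms_treatment,
        "ms_error": ms_error,
        "F": ms_treatment / ms_error,
    }

# Sums of squares taken from the table above.
table = anova_table(ss_treatment=847.05, ss_error=282.93, t=6, r=5)
for key, value in table.items():
    print(key, round(value, 2))
```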

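    import scipy.stats as st

    # st.f.sf(x, dfn, dfd): numerator df = 5 (among cultures), denominator df = 24 (within cultures)
    print("P value for F(5,24): %.2e" % st.f.sf(14.37, 5, 24))

The resulting p-value is on the order of $10^{-6}$, far below 0.001.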
