https://plot.ly/python/anova/



In [2]:
import plotly.plotly as py
import plotly.graph_objs as go
from plotly.tools import FigureFactory as FF

import numpy as np
import pandas as pd
import scipy

import statsmodels
import statsmodels.api as sm
from statsmodels.formula.api import ols

# One-way ANOVA

Analysis of variance (ANOVA) generalizes the two-sample t-test to more than two groups. The null hypothesis states that the populations from which the groups were sampled all have the same mean. More succinctly:

$$\mu_1 = \mu_2 = \cdots = \mu_n$$

for $n$ groups of data. The alternative hypothesis is that at least one of the equalities above fails, that is, at least one group mean differs from the others.
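To make the test statistic concrete, here is a minimal sketch of the one-way F statistic computed by hand: the between-group mean square divided by the within-group mean square. The three groups are made-up numbers chosen for easy arithmetic, and the helper name `one_way_f` is hypothetical; neither comes from the datasets used in this notebook.

```python
# Made-up measurements for three groups (illustrative only).
groups = {
    "a": [4.0, 5.0, 6.0],
    "b": [6.0, 7.0, 8.0],
    "c": [8.0, 9.0, 10.0],
}


def one_way_f(samples):
    """Return (F, df_between, df_within) for a list of groups."""
    all_values = [x for g in samples for x in g]
    grand_mean = sum(all_values) / len(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in samples
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - sum(g) / len(g)) ** 2 for g in samples for x in g
    )

    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_between, df_within = one_way_f(list(groups.values()))
print(f_stat, df_between, df_within)
```

A large F means the group means are spread out relative to the noise within groups. `scipy.stats.f_oneway(*groups.values())` should report the same statistic together with a p-value.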

In [None]:
moore = sm.datasets.get_rdataset("Moore", "car", cache=True)

data = moore.data
data = data.rename(columns={"partner.status": "partner_status"})  # make name pythonic

In [6]:
data.head()

   partner_status  conformity  fcategory  fscore
0             low           8        low      37
1             low           4       high      57
2             low           8       high      65
3             low           7        low      20
4             low          10        low      36


In [7]:
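# Fit a linear model with both factors and their interaction, using sum-to-zero (deviation) contrasts
moore_lm = ols('conformity ~ C(fcategory, Sum)*C(partner_status, Sum)', data=data).fit()
table = sm.stats.anova_lm(moore_lm, typ=2)  # Type II ANOVA table, returned as a DataFrame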

print(table)

                                            sum_sq    df        F  PR(>F)
C(fcategory, Sum)                          11.6147   2.0   0.2770  0.7596
C(partner_status, Sum)                    212.2138   1.0  10.1207  0.0029
C(fcategory, Sum):C(partner_status, Sum)  175.4889   2.0   4.1846  0.0226
Residual                                  817.7640  39.0      NaN     NaN
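The `F` column can be checked by hand: each effect's mean square (`sum_sq / df`) is divided by the residual mean square. The sketch below recomputes it from the numbers printed above. The values are copied from the table rather than read from `table`, so the block runs on its own, and the variable names are illustrative.

```python
# Rows copied from the printed ANOVA table: (sum_sq, df).
effects = {
    "C(fcategory, Sum)": (11.6147, 2.0),
    "C(partner_status, Sum)": (212.2138, 1.0),
    "C(fcategory, Sum):C(partner_status, Sum)": (175.4889, 2.0),
}
resid_ss, resid_df = 817.7640, 39.0

# Residual mean square: the noise estimate shared by every F ratio.
ms_resid = resid_ss / resid_df

# F = (effect mean square) / (residual mean square)
f_values = {
    name: (ss / df) / ms_resid for name, (ss, df) in effects.items()
}

for name, f in f_values.items():
    print(f"{name:42s} F = {f:.4f}")
```

The `PR(>F)` column is the upper-tail probability of each F value under an F distribution with `(df, resid_df)` degrees of freedom, which `scipy.stats.f.sf(F, df, resid_df)` can compute.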


# Two-way ANOVA
A two-way ANOVA considers two explanatory variables (factors) at once. Here the question is whether the response (tooth length, $len$) depends on the two factors $supp$ and $dose$, and on their interaction, as in the model:

$$len = supp+dose+supp\times dose$$
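The shorthand above can be spelled out as a linear model with one term per effect. This is a sketch in conventional effects notation; the symbols $\mu$, $\alpha_i$, $\beta_j$, $(\alpha\beta)_{ij}$ and $\varepsilon_{ijk}$ are assumptions introduced here for illustration:

$$len_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$

where $i$ indexes the levels of $supp$, $j$ the levels of $dose$, and $k$ the observations within each $(i, j)$ cell. Each row of a two-way ANOVA table then tests one null hypothesis: all $\alpha_i = 0$ (no main effect of $supp$), all $\beta_j = 0$ (no main effect of $dose$), or all $(\alpha\beta)_{ij} = 0$ (no interaction).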

In [8]:
data = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/tooth_growth_csv')
df = data[0:10]

table = FF.create_table(df)
py.iplot(table, filename='tooth-data-sample')


In [9]:
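# Main effects of supp and dose plus their interaction; C() treats each variable as categorical
formula = 'len ~ C(supp) + C(dose) + C(supp):C(dose)'
model = ols(formula, data=data).fit()
aov_table = sm.stats.anova_lm(model, typ=2)  # Type II ANOVA table
print(aov_table)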


                    sum_sq    df       F      PR(>F)
C(supp)           205.3500   1.0  15.572  2.3118e-04
C(dose)          2426.4343   2.0  92.000  4.0463e-18
C(supp):C(dose)   108.3190   2.0   4.107  2.1860e-02
Residual          712.1060  54.0     NaN         NaN
