<center> <img src="res/ds3000.png"> </center>

<center> <h1> Week 9 - Day 2 </h1> </center>

<center> <h2> Part 2: One-Way ANOVA</h2></center>

## Outline
1. <a href='#1'>The Dataset</a>
2. <a href='#2'>Exploratory Data Analysis</a>
3. <a href='#3'>One-Way ANOVA</a>
4. <a href='#4'>Assumption Checks</a>
5. <a href='#5'>Post-hoc Tests</a>
6. <a href='#6'>Reporting the Results</a>

<a id="1"></a>

## 1. The Dataset
* SciPy is a fundamental library for scientific computing in Python
    * https://docs.scipy.org/doc/scipy/reference/
* SciPy has a dedicated module, stats, for the statistical tests commonly used in data analysis
    * https://docs.scipy.org/doc/scipy/reference/tutorial/stats.html

In [None]:
from scipy import stats

In [None]:
import pandas as pd

### 1.1. Dataset from a Between-Subjects Experiment

In [None]:
data = pd.read_csv("res/wand_candles_three_data.csv")

In [None]:
data

<a id="2"></a>

## 2. Exploratory Data Analysis
* EDA means checking the descriptive statistics and visualizing the data before running the test

### 2.1. Descriptive Statistics

In [None]:
descriptives = data.groupby("Group").agg(["count", "mean", "std", "sem"])
descriptives
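The `sem` column above is the standard error of the mean. As a quick sanity check, here is a minimal sketch on hypothetical toy data (not the wand dataset) confirming that pandas computes it as the sample standard deviation divided by the square root of the group size.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (NOT the wand dataset), used only to illustrate the computation
toy = pd.DataFrame({
    "Group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Candles": [3, 5, 4, 6, 7, 9, 8, 10],
})

desc = toy.groupby("Group")["Candles"].agg(["count", "mean", "std", "sem"])

# pandas' std uses the sample standard deviation (ddof=1) by default,
# and sem is that std divided by the square root of the group size
manual_sem = desc["std"] / np.sqrt(desc["count"])

print(desc)
print(np.allclose(desc["sem"], manual_sem))
```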

### 2.2. Visualizing the Data

In [None]:
descriptives = descriptives["Candles"]
descriptives

In [None]:
# Turn the Group index back into a regular column so it can be used on the x-axis
descriptives = descriptives.reset_index()

In [None]:
import plotly.express as px

# Bar heights are the group means; vertical error bars show +/- 1 standard error of the mean
graph = px.bar(descriptives, x="Group", y="mean", error_y="sem", template="none", width=500,
               labels={"mean": "Number of Candles", "Group": "Wand Used"})

# Shade the first bar grey, leave the others white, and give every bar a thick black outline
graph.update_traces(marker_color=["#d3d3d3", "#FFF", "#FFF"])
graph.update_traces(marker=dict(line={"width": 3, "color": "#000000"}))

graph.update_xaxes(title_font={"size": 16}, tickfont={"size": 14, "color": "gray"})
graph.update_yaxes(title_font={"size": 16}, tickfont={"size": 14, "color": "gray"})

graph.show()

<a id="3"></a>

## 3. One-Way ANOVA
* Use the **f_oneway()** method available in SciPy's stats module
* **f_oneway()** accepts two or more sequence-like objects (lists, Series, etc) corresponding to the distribution of scores in each group being compared
    * **f_oneway(group_a, group_b, group_c)**
* **f_oneway()** returns a tuple containing the calculated F statistic and p-value
* https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html
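To see what **f_oneway()** computes, here is a minimal sketch on hypothetical toy scores: the F statistic is the between-groups mean square divided by the within-groups mean square. The hand-computed value and SciPy's value should match.

```python
import numpy as np
from scipy import stats

# Hypothetical toy scores for three groups (illustration only)
g1 = np.array([4.0, 5.0, 6.0, 5.0])
g2 = np.array([7.0, 8.0, 6.0, 7.0])
g3 = np.array([9.0, 10.0, 11.0, 10.0])
groups = [g1, g2, g3]

k = len(groups)                          # number of groups
n_total = sum(len(g) for g in groups)    # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-groups sum of squares: how far each group mean lies from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-groups sum of squares: spread of scores around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Each mean square is a sum of squares divided by its degrees of freedom
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_manual = ms_between / ms_within

f_scipy, p_scipy = stats.f_oneway(g1, g2, g3)
print(f_manual, f_scipy)
```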

In [None]:
stats.f_oneway?

In [None]:
elder_wand = data[data["Group"] == "Elder Wand"]["Candles"]
elder_wand

In [None]:
regular_wand = data[data["Group"] == "Regular Wand"]["Candles"]
regular_wand

In [None]:
personal_wand = data[data["Group"] == "Personal Wand"]["Candles"]
personal_wand

In [None]:
stats.f_oneway(elder_wand, regular_wand, personal_wand)
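Filtering each group by hand, as in the cells above, gets repetitive as the number of groups grows. A minimal sketch, on hypothetical toy data in the same long format (one row per participant, with `Group` and `Candles` columns), that builds the per-group samples with `groupby()` and passes them all to `f_oneway()` with the `*` unpacking operator:

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data in the same long format as the wand dataset (illustration only)
toy = pd.DataFrame({
    "Group": ["Elder Wand"] * 4 + ["Regular Wand"] * 4 + ["Personal Wand"] * 4,
    "Candles": [9, 10, 11, 10, 4, 5, 6, 5, 7, 8, 6, 7],
})

# One Series of scores per group, however many groups there are
samples = [scores for _, scores in toy.groupby("Group")["Candles"]]

# The * operator unpacks the list into separate positional arguments,
# equivalent to f_oneway(group_1, group_2, group_3)
result = stats.f_oneway(*samples)
print(result)
```

The order of the groups does not affect the F statistic, so the alphabetical order that `groupby()` produces is fine.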

### 3.1. F Test Results
* f_oneway() method returns a tuple containing the calculated F statistic and p-value
    * the first element of the tuple is the F statistic
    * the second element of the tuple is the p-value
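The object **f_oneway()** returns behaves like a tuple, but it also exposes the two values as named attributes, `statistic` and `pvalue`, which can read more clearly than `[0]` and `[1]`. A minimal sketch on hypothetical toy scores (the name `toy_results` is chosen so it does not overwrite this notebook's `results`):

```python
from scipy import stats

# Hypothetical toy scores (illustration only)
a = [4, 5, 6, 5]
b = [7, 8, 6, 7]
c = [9, 10, 11, 10]

toy_results = stats.f_oneway(a, b, c)

# Positional indexing, as used in this notebook
f_by_index, p_by_index = toy_results[0], toy_results[1]
# Named attributes
f_by_name, p_by_name = toy_results.statistic, toy_results.pvalue
# Or unpack both values in one line
f_value, p_value = toy_results

print(f"F = {f_value:.2f}, p = {p_value:.2e}")
```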

In [None]:
results = stats.f_oneway(elder_wand, regular_wand, personal_wand)

In [None]:
# F statistic
fstatistic = results[0]
fstatistic

In [None]:
# p-value (displayed in scientific notation)
pvalue = results[1]
pvalue

In [None]:
# Show the p-value as a plain decimal with 10 digits after the decimal point
format(pvalue, '.10f')

### 3.2. Degrees of Freedom
* Unfortunately, the f_oneway() method does not report the degrees of freedom (df).
* We can calculate them ourselves, though!
* For a one-way ANOVA, two df values are calculated:
    * df1 = k - 1, where k is the number of groups being compared (between-groups df)
    * df2 = (n1 - 1) + (n2 - 1) + (n3 - 1), where n1, n2, and n3 are the group sizes (within-groups df); this equals N - k, where N is the total number of observations

In [None]:
#number of rows in descriptives corresponds to number of groups we have in this dataset
df1 = len(descriptives) - 1
df1

In [None]:
df2 = (len(elder_wand) - 1) + (len(regular_wand) - 1) + (len(personal_wand) - 1)
df2
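The df values are not only for reporting: together with F they determine the p-value. A minimal sketch on hypothetical toy scores that recovers the p-value from F, df1, and df2 with `stats.f.sf`, the survival function (1 - CDF) of the F distribution. The `toy_` prefixes keep the cell from overwriting this notebook's `df1` and `df2`.

```python
from scipy import stats

# Hypothetical toy scores (illustration only)
toy_groups = [[4, 5, 6, 5], [7, 8, 6, 7], [9, 10, 11, 10]]

k = len(toy_groups)                              # number of groups
n_total = sum(len(g) for g in toy_groups)        # total sample size N
toy_df1 = k - 1                                  # between-groups df
toy_df2 = n_total - k                            # within-groups df
# N - k is the same as summing (n_i - 1) over the groups
assert toy_df2 == sum(len(g) - 1 for g in toy_groups)

toy_f, toy_p = stats.f_oneway(*toy_groups)

# The p-value is the upper-tail area of F(df1, df2) beyond the observed F
toy_p_from_dist = stats.f.sf(toy_f, toy_df1, toy_df2)
print(toy_df1, toy_df2, toy_p, toy_p_from_dist)
```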

<a id="4"></a>

## 4. Assumption Checks
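* One-way ANOVA makes two assumptions that we can check statistically:
    * Assumption of equality of variances (the groups have similar variances)
    * Assumption of normality (the scores within each group are approximately normally distributed)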


### 4.1. Checking for Equality of Variances
* Levene’s Test of Equality of Variances
    * Use the **levene()** method in SciPy's stats module
    * https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.levene.html#scipy.stats.levene
 
* **levene()** returns a tuple containing the test statistic and p-value
    * Its null hypothesis is that all groups have equal variances, so you want a non-significant result (p > .05)
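One detail worth knowing: SciPy's **levene()** centers each group on its median by default (`center="median"`, the Brown-Forsythe variant); passing `center="mean"` gives Levene's original test. A minimal sketch on hypothetical toy scores that runs both and applies the p > .05 rule:

```python
from scipy import stats

# Hypothetical toy scores (illustration only)
a = [4, 5, 6, 5, 4, 6]
b = [7, 8, 6, 7, 9, 5]
c = [9, 10, 11, 10, 12, 8]

median_based = stats.levene(a, b, c)                # default: center="median"
mean_based = stats.levene(a, b, c, center="mean")   # Levene's original version

for label, res in [("median", median_based), ("mean", mean_based)]:
    verdict = "no evidence of unequal variances" if res.pvalue > 0.05 else "variances may differ"
    print(f"center={label}: stat = {res.statistic:.3f}, p = {res.pvalue:.3f} -> {verdict}")
```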

In [None]:
levene_results = stats.levene(elder_wand, regular_wand, personal_wand)
levene_results

### 4.2. Checking for Normality
* Shapiro-Wilk Test of Normality
    * Use the **shapiro()** method in SciPy's stats module
    * https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.shapiro.html#scipy.stats.shapiro

* **shapiro()** returns a tuple containing the test statistic (W) and p-value
    * Its null hypothesis is that the scores come from a normal distribution, so you want a non-significant result (p > .05) for each group
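With more groups, calling **shapiro()** once per variable gets repetitive. A minimal sketch, on hypothetical toy data in long format, that runs the test for every group in a single loop over `groupby()`. Note that `shapiro()` needs at least 3 scores per group.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data in long format (illustration only)
toy = pd.DataFrame({
    "Group": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
    "Candles": [4, 5, 6, 5, 4, 7, 8, 6, 7, 9, 9, 10, 11, 10, 12],
})

# One Shapiro-Wilk result per group, keyed by group name
normality = {
    name: stats.shapiro(scores)
    for name, scores in toy.groupby("Group")["Candles"]
}

for name, res in normality.items():
    flag = "ok" if res.pvalue > 0.05 else "possible non-normality"
    print(f"{name}: W = {res.statistic:.3f}, p = {res.pvalue:.3f} ({flag})")
```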

In [None]:
shapiro_elder = stats.shapiro(elder_wand)
shapiro_elder

In [None]:
shapiro_regular = stats.shapiro(regular_wand)
shapiro_regular

In [None]:
shapiro_personal = stats.shapiro(personal_wand)
shapiro_personal

<a id="5"></a>

  
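## 5. Post-hoc Tests
* A significant ANOVA only tells us that at least one group mean differs from the others; follow it up with a post-hoc test to find out which pairs of groups differ
* The statsmodels library's **MultiComparison** class has a **tukeyhsd()** method that compares every pair of groups using Tukey's HSD (honestly significant difference) test, which controls the overall error rate across all the comparisons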
* http://www.statsmodels.org/devel/generated/statsmodels.sandbox.stats.multicomp.MultiComparison.html
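The next cell imports **pairwise_tukeyhsd** but only uses **MultiComparison**. `pairwise_tukeyhsd()` performs the same Tukey HSD comparisons in a single call. A minimal sketch on hypothetical toy data in the same long format as the wand dataset:

```python
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical toy data in long format (illustration only)
toy = pd.DataFrame({
    "Group": ["Elder Wand"] * 4 + ["Regular Wand"] * 4 + ["Personal Wand"] * 4,
    "Candles": [9, 10, 11, 10, 4, 5, 6, 5, 7, 8, 6, 7],
})

# alpha sets the family-wise error rate shared by all pairwise comparisons
tukey = pairwise_tukeyhsd(endog=toy["Candles"], groups=toy["Group"], alpha=0.05)
print(tukey)

# One True/False per pair of groups: True means that pair's means differ significantly
print(tukey.reject)
```

With three groups there are three pairs, so `tukey.reject` has three entries.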

In [None]:
from statsmodels.stats.multicomp import pairwise_tukeyhsd
from statsmodels.stats.multicomp import MultiComparison

In [None]:
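# Pass the scores first, then the group label for each score
mc = MultiComparison(data["Candles"], data["Group"])

# Run Tukey's HSD on every pair of groups and print the summary table
tukey_result = mc.tukeyhsd()
print(tukey_result)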

<a id="6"></a>

  
## 6. Reporting the Results
* Report
    * descriptives
    * assumption checks
    * F statistic, degrees of freedom (df1 and df2), and p-value
    * a bar graph
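When writing up the test, the F statistic, both df values, and the p-value are usually combined into a single string such as F(df1, df2) = F, p = p. A minimal sketch of a hypothetical helper, `format_anova()`, assuming APA-style conventions (two decimals for F, three for p, no leading zero on p, and "p < .001" for very small values); the numbers in the calls are illustrative, not results from this dataset.

```python
def format_anova(f_value, df1, df2, p_value):
    """Hypothetical helper: format a one-way ANOVA result as 'F(df1, df2) = F, p = p'."""
    if p_value < 0.001:
        p_text = "p < .001"
    else:
        # Drop the leading zero, as APA style does for p-values
        p_text = "p = " + f"{p_value:.3f}".lstrip("0")
    return f"F({df1}, {df2}) = {f_value:.2f}, {p_text}"


print(format_anova(5.0, 2, 27, 0.014))     # illustrative values only
print(format_anova(38.0, 2, 9, 0.00004))   # illustrative values only
```

In this notebook it could be called as `format_anova(fstatistic, df1, df2, pvalue)`.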

In [None]:
descriptives

In [None]:
fstatistic, pvalue, df1, df2

In [None]:
levene_results

In [None]:
graph.show()