# Challenge 2 - ANOVA

In statistics, **Analysis of Variance (ANOVA)** is used to analyze the differences among group means. The difference between a t-test and ANOVA is that the former is used to compare two groups, whereas the latter is used to compare three or more groups. [Read more about the difference between t-test and ANOVA](http://b.link/anova24).

From the ANOVA test, you receive two numbers. The first number is called the **F-value**. It is the ratio of the variance between groups to the variance within groups, and it indicates whether your null hypothesis can be rejected: the null hypothesis is rejected when the F-value exceeds a critical value. The critical F-value varies according to the number of total subjects and the number of subject groups in your experiment. In [this table](http://b.link/eda14) you can find the critical values of the F distribution. **If you are confused by the massive F-distribution table, don't worry. Skip the F-value for now and study it at a later time. In this challenge you only need to look at the p-value.**
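
To make the F-value less abstract, here is a minimal sketch that computes it by hand for three made-up groups and looks up the critical value with `scipy.stats.f.ppf` instead of a printed table. The toy numbers, the 5% significance level, and all variable names are assumptions for illustration; none of them come from the pokemon data.

```python
from scipy import stats

# Hypothetical toy data: three groups with three observations each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

k = len(groups)                        # number of groups
n_total = sum(len(g) for g in groups)  # total number of subjects
grand_mean = sum(sum(g) for g in groups) / n_total

# Between-group and within-group sums of squares.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

# Degrees of freedom depend on the number of groups and subjects.
df_between = k - 1
df_within = n_total - k

# F is the ratio of the two mean squares.
f_value = (ss_between / df_between) / (ss_within / df_within)

# Critical F at an assumed 5% significance level.
f_critical = stats.f.ppf(0.95, df_between, df_within)

print(f_value, f_critical, f_value > f_critical)
```

Because the degrees of freedom enter `stats.f.ppf` directly, the critical value changes whenever the number of groups or subjects changes, which is why the printed table is so large.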

The second number yielded by ANOVA is the **p-value**, which already takes the number of total subjects and the number of experiment groups into consideration. Here the null hypothesis is that all group means are equal. **Typically, if your p-value is less than 0.05, you can reject the null hypothesis.**
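
As a rough illustration of how the p-value absorbs the group and subject counts, the sketch below converts the same F-value into a p-value for two hypothetical experiments of different size, using the upper-tail function `scipy.stats.f.sf`. The F-value of 4.0 and both sample sizes are made up for this example.

```python
from scipy import stats

f_value = 4.0   # hypothetical F-value, identical in both experiments
k = 3           # hypothetical number of groups

# Small experiment: 9 subjects in total. Large experiment: 63 subjects.
p_small = stats.f.sf(f_value, k - 1, 9 - k)
p_large = stats.f.sf(f_value, k - 1, 63 - k)

alpha = 0.05
print(p_small, p_small < alpha)
print(p_large, p_large < alpha)
```

The same F-value falls on different sides of the 0.05 threshold depending on the sample size, so comparing the p-value with 0.05 spares you the table lookup.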

In this challenge, we want to understand whether the `Total` value differs significantly among the various pokemon types, e.g. Grass vs. Poison vs. Fire vs. Dragon. There are many pokemon types, which makes this a good use case for ANOVA.

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Load the data:
pokemon = pd.read_csv("pokemon.csv")

**To achieve our goal, we use three steps:**

1. **Extract the unique values of the pokemon types.**

1. **Select dataframes for each unique pokemon type.**

1. **Conduct ANOVA analysis across the pokemon types.**

#### First, let's obtain the unique values of the pokemon types. These values should be extracted from `Type 1` and `Type 2` combined. Assign the unique values to a variable called `unique_types`.

*Hint: the correct number of unique types is 19 including `NaN`. You can disregard `NaN` in the next step.*

In [12]:
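# Combine both type columns so that a type appearing in only one of them is not missed.
unique_types = pd.concat([pokemon["Type 2"], pokemon["Type 1"]]).unique()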

len(unique_types)

19
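
The effect of combining two columns and of the `NaN` entry can be seen on a tiny stand-in frame. This is only a sketch: the `toy` DataFrame and its values are hypothetical, not rows of `pokemon.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame that mimics the two type columns.
toy = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Water", "Grass"],
    "Type 2": ["Poison", np.nan, "Flying", np.nan],
})

# Stack both columns into one Series, then take the unique values.
combined = pd.concat([toy["Type 1"], toy["Type 2"]])
with_nan = combined.unique()              # NaN counts as one unique value
without_nan = combined.dropna().unique()  # NaN removed before counting

print(len(with_nan), len(without_nan))
```

`unique()` keeps `NaN` as a value of its own, which is why the count in the hint includes it.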

#### Second, we will create a list named `pokemon_totals` that contains the `Total` values for each unique pokemon type.

Why do we use a list instead of a dictionary to store the pokemon `Total` values? Because ANOVA only tells us whether there is a significant difference among the group means; it does not tell us which group(s) are significantly different. Therefore, we don't need to know which `Total` values belong to which pokemon type.

*Hints:*

* Loop through `unique_types` and append the selected type's `Total` values to `pokemon_totals`.
* Skip the `NaN` value in `unique_types`. `NaN` is a `float`, which you can find out by using `type()`. The valid pokemon type values are all of the `str` type.
* At the end, the length of your `pokemon_totals` should be 18.
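
The `NaN` hint can be checked in isolation. The short sketch below uses a hypothetical list of values rather than the real `unique_types`, and shows why a type check is a simple way to skip `NaN`.

```python
import math

# Hypothetical stand-in for a few entries of unique_types.
values = ["Grass", float("nan"), "Fire"]

# NaN reports itself as a float; real type names are strings.
print([type(v).__name__ for v in values])

# Keep only the string entries.
clean = [v for v in values if isinstance(v, str)]

# NaN is not equal to itself, so an equality test cannot be used to filter it.
nan = float("nan")
print(nan == nan, math.isnan(nan))
```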

In [14]:
pokemon_totals = []

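for pokemon_type in unique_types:
    # NaN is a float; every valid pokemon type is a str, so keep only strings.
    if isinstance(pokemon_type, str):
        # Select every pokemon that has this type in either type column.
        has_type = (pokemon["Type 1"] == pokemon_type) | (pokemon["Type 2"] == pokemon_type)
        pokemon_totals.append(list(pokemon.loc[has_type, "Total"]))

len(pokemon_totals)
pokemon_totals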

[[318,
  405,
  525,
  625,
  195,
  205,
  395,
  495,
  288,
  438,
  275,
  365,
  505,
  273,
  365,
  505,
  245,
  455,
  320,
  395,
  490,
  305,
  450,
  300,
  390,
  490,
  335,
  515,
  325,
  500,
  310,
  405,
  500,
  600,
  340,
  490,
  250,
  390,
  535,
  430,
  385,
  400,
  302,
  467,
  458,
  280,
  515,
  329,
  479,
  330,
  500,
  300,
  490,
  260,
  360,
  485,
  329,
  474,
  294,
  464,
  320,
  494],
 [534,
  634,
  395,
  251,
  349,
  479,
  579,
  262,
  442,
  245,
  455,
  352,
  310,
  460,
  500,
  600,
  540,
  515,
  615,
  580,
  580,
  580,
  600,
  262,
  442,
  265,
  390,
  535,
  405,
  320,
  470,
  250,
  340,
  460,
  390,
  405,
  430,
  330,
  465,
  465,
  680,
  680,
  395,
  270,
  430,
  270,
  430,
  414,
  456,
  310,
  490,
  460,
  600,
  700,
  680,
  780,
  245,
  340,
  485,
  424,
  244,
  474,
  348,
  498,
  505,
  411,
  345,
  545,
  515,
  510,
  520,
  600,
  264,
  358,
  488,
  313,
  425,
  490,
  401,
  567,
  305

#### Now we run ANOVA test on `pokemon_totals`.

*Hints:*

* To conduct ANOVA, you can use `scipy.stats.f_oneway()`. Here's the [reference](http://b.link/scipy44).

* What if `f_oneway` throws an error because it does not accept `pokemon_totals` as a list? The trick is to add a `*` in front of `pokemon_totals`, e.g. `stats.f_oneway(*pokemon_totals)`. This unpacks the list and passes each list item as a separate argument to `f_oneway`.
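
The `*` unpacking can be verified on a small example. In this sketch the three toy groups are hypothetical; the point is only that the unpacked call and the call with explicit arguments are the same thing.

```python
from scipy import stats

# Hypothetical toy groups standing in for pokemon_totals.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

# The * spreads the outer list into separate positional arguments...
unpacked = stats.f_oneway(*groups)

# ...which is equivalent to writing every group out by hand.
explicit = stats.f_oneway(groups[0], groups[1], groups[2])

print(unpacked.statistic == explicit.statistic)
print(unpacked.pvalue == explicit.pvalue)
```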

In [16]:
import scipy.stats as st

In [17]:
st.f_oneway(*pokemon_totals)

F_onewayResult(statistic=6.617538296005537, pvalue=2.6457458815984803e-15)

#### Interpret the ANOVA test result. Is the difference significant?

In [None]:
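# Yes. The p-value (about 2.6e-15) is far below 0.05, so we reject the null
# hypothesis that all pokemon types have the same mean `Total`.
# Note that ANOVA does not tell us which types differ from each other.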
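
The decision rule can also be written out explicitly. This is a minimal sketch: the statistic and p-value are copied from the `F_onewayResult` printed above, and the 0.05 threshold is the conventional level mentioned earlier, not something the test itself prescribes.

```python
alpha = 0.05  # conventional significance level (an assumption, not a fixed rule)

# Values copied from the F_onewayResult output above.
f_statistic = 6.617538296005537
p_value = 2.6457458815984803e-15

if p_value < alpha:
    conclusion = "reject the null hypothesis: at least one type mean differs"
else:
    conclusion = "fail to reject the null hypothesis"

print(conclusion)
```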