# Bonus Challenge 2 - ANOVA

In statistics, **Analysis of Variance (ANOVA)** is used to analyze the differences among group means. The difference between the t-test and ANOVA is that the former is used to compare two groups, whereas the latter is used to compare three or more groups. [Read more about the difference between t-test and ANOVA](http://b.link/anova24).

From the ANOVA test, you receive two numbers. The first is the **F-value**, the ratio of the variance between groups to the variance within groups. The null hypothesis (that all group means are equal) can be rejected when the F-value exceeds a critical value, which depends on the total number of subjects and the number of groups in your experiment. [This table](http://b.link/eda14) lists the critical values of the F-distribution. **If you are confused by the massive F-distribution table, don't worry. Skip the F-value for now and study it at a later time. In this challenge you only need to look at the p-value.**

The second number is the **p-value**, which already takes the total number of subjects and the number of groups into account. **Typically, if your p-value is less than 0.05, you can reject the null hypothesis.**
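As a minimal sketch of how the two numbers relate, the example below runs `scipy.stats.f_oneway` on three small made-up groups and compares the F-value with the critical value computed by `scipy.stats.f.ppf`. The data, the variable names, and the 0.05 threshold are illustrative assumptions and are not taken from the pokemon dataset.

```python
from scipy.stats import f, f_oneway

# Three small made-up groups (illustrative data only)
group_a = [10, 12, 11, 13, 12]
group_b = [20, 22, 21, 19, 23]
group_c = [11, 10, 12, 13, 11]

f_value, p_value = f_oneway(group_a, group_b, group_c)

# Degrees of freedom: (groups - 1) and (total subjects - groups)
k = 3
n = len(group_a) + len(group_b) + len(group_c)
alpha = 0.05  # conventional significance level (an assumption)
critical_f = f.ppf(1 - alpha, k - 1, n - k)

# Both decision rules agree: F above the critical value <=> p below alpha
reject_by_f = f_value > critical_f
reject_by_p = p_value < alpha
print(f_value, critical_f, p_value, reject_by_f, reject_by_p)
```

Because the p-value is derived from the same F-distribution, checking `p_value < alpha` saves you from looking up the critical value in a table.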

In this challenge, we want to understand whether there are significant differences in the `Total` value among the various pokemon types, i.e. Grass vs Poison vs Fire vs Dragon... There are many pokemon types, which makes this a perfect use case for ANOVA. Use Ironhack's database to load the pokemon data (db: pokemon, table: pokemon_stats).

In [9]:
# Import libraries
import pandas as pd
import numpy as np
from scipy.stats import f_oneway

In [2]:
# Load the data:
pokemon = pd.read_csv('Pokemon.csv')

**To achieve our goal, we use three steps:**

1. **Extract the unique values of the pokemon types.**

1. **Select dataframes for each unique pokemon type.**

1. **Conduct ANOVA analysis across the pokemon types.**

#### First let's obtain the unique values of the pokemon types. These values should be extracted from `Type 1` and `Type 2` combined. Assign the unique values to a variable called `unique_types`.

*Hint: the correct number of unique types is 19 including `NaN`. You can disregard `NaN` in the next step.*
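One possible way to combine the two columns is sketched below on a tiny hypothetical DataFrame (the rows and the names `toy` and `toy_unique_types` are assumptions for illustration, not the real dataset). It stacks both columns with `pd.concat` and calls `.unique()`, which keeps `NaN` as a single value.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the pokemon table (hypothetical rows)
toy = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Water", "Fire"],
    "Type 2": ["Poison", np.nan, np.nan, "Flying"],
})

# Stack both type columns and keep each value once; NaN counts as one value
toy_unique_types = pd.concat([toy["Type 1"], toy["Type 2"]]).unique()

# Count how many of the unique values are real type names (strings)
n_valid = sum(isinstance(t, str) for t in toy_unique_types)
print(len(toy_unique_types), n_valid)
```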

### Answer

We arrive at the same conclusion on our own.

In [3]:
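# Start from the unique values of 'Type 2' (this includes NaN)
lista = list(pokemon['Type 2'].unique())
# Append the sorted unique values of 'Type 1'
for i in np.sort(pokemon['Type 1'].unique()):
    lista.append(i)

# Count how often each type appears in the combined list;
# the dictionary keys are the unique types across both columns
unique_types = dict()
for i in lista:
    unique_types[i] = unique_types.get(i, 0) + 1

print(unique_types)
print(len(unique_types))
unique_types.keys()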

{'Poison': 2, nan: 1, 'Flying': 2, 'Dragon': 2, 'Ground': 2, 'Fairy': 2, 'Grass': 2, 'Fighting': 2, 'Psychic': 2, 'Steel': 2, 'Ice': 2, 'Rock': 2, 'Dark': 2, 'Water': 2, 'Electric': 2, 'Fire': 2, 'Ghost': 2, 'Bug': 2, 'Normal': 2}
19


dict_keys(['Poison', nan, 'Flying', 'Dragon', 'Ground', 'Fairy', 'Grass', 'Fighting', 'Psychic', 'Steel', 'Ice', 'Rock', 'Dark', 'Water', 'Electric', 'Fire', 'Ghost', 'Bug', 'Normal'])

However, I would standardize the types by combining `Type 1` and `Type 2` into a single label per pokemon:

In [4]:
pokemon['unique']=[i+'-'+x if x!="nan" else i for i,x in zip(pokemon["Type 1"].astype(str),pokemon["Type 2"].astype(str))]

In [5]:
len(pokemon['unique'].unique())

154

#### Second we will create a list named `pokemon_groups` to contain the `Total` values of each unique pokemon type.

Why do we use a list instead of a dictionary to store the pokemon `Total` values? Because ANOVA only tells us whether there is a significant difference among the group means; it does not tell us which group(s) are significantly different. Therefore, we don't need to know which `Total` belongs to which pokemon type.

*Hints:*

* Loop through `unique_types` and append the selected type's `Total` to `pokemon_groups`.
* Skip the `NaN` value in `unique_types`. `NaN` is a `float` value, which you can verify with `type()`. The valid pokemon type values are all of the `str` type.
* At the end, the length of your `pokemon_groups` should be 18.
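The hints above can be sketched on a small hypothetical DataFrame as follows. The rows and the names `toy`, `toy_types`, and `toy_groups` are assumptions for illustration. This sketch groups by `Type 1` only; also matching `Type 2` would be a different grouping choice.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the pokemon table (hypothetical rows)
toy = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Water", "Fire", "Grass"],
    "Type 2": ["Poison", np.nan, np.nan, "Flying", np.nan],
    "Total": [318, 309, 314, 534, 405],
})

# Unique primary types plus a NaN entry, mimicking `unique_types`
toy_types = list(toy["Type 1"].unique()) + [np.nan]

toy_groups = []
for t in toy_types:
    if isinstance(t, str):  # NaN is a float, so it is skipped here
        toy_groups.append(toy.loc[toy["Type 1"] == t, "Total"])

print(len(toy_groups))
```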

In [6]:
unique_types=list(unique_types.keys())

In [7]:
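pokemon_groups = []
for i in unique_types:
    # Skip NaN (a float); valid type names are strings
    if isinstance(i, str):
        # Select the `Total` values of pokemons whose primary type (Type 1) is i
        pokemon_groups.append(pokemon[pokemon['Type 1'] == i]['Total'])

len(pokemon_groups) # you should see 18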

18

#### Now we run the ANOVA test on `pokemon_groups`.

*Hints:*

* To conduct ANOVA, you can use `scipy.stats.f_oneway()`. Here's the [reference](http://b.link/scipy44).

* What if `f_oneway` throws an error because it does not accept `pokemon_groups` as a list? The trick is to add a `*` in front of `pokemon_groups`, e.g. `stats.f_oneway(*pokemon_groups)`. This unpacks the list and passes each item as a separate argument to `f_oneway`.
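To illustrate the unpacking trick, the short sketch below uses three made-up groups (the data and variable names are assumptions for illustration) and checks that `f_oneway(*groups)` gives the same result as passing each group explicitly.

```python
from scipy.stats import f_oneway

# Three made-up groups stored in one list (illustrative data only)
groups = [[1, 2, 3, 4], [2, 3, 4, 5], [6, 7, 8, 9]]

# `*groups` expands the list into three positional arguments
unpacked = f_oneway(*groups)
explicit = f_oneway(groups[0], groups[1], groups[2])

same_result = (unpacked.statistic == explicit.statistic) and (
    unpacked.pvalue == explicit.pvalue
)
print(unpacked.statistic, unpacked.pvalue, same_result)
```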

In [13]:
# Run one-way ANOVA across the 18 type groups
F, p = f_oneway(*pokemon_groups)
p

2.077215448842098e-09

#### Interpret the ANOVA test result. Is the difference significant?

### Answer

As the p-value is well below 0.05, we reject the null hypothesis that all pokemon types have the same mean `Total`. We therefore conclude that the mean `Total` of at least one type differs significantly from the others.