# Bonus Challenge 2 - ANOVA

In statistics, **Analysis of Variance (ANOVA)** is also used to analyze the differences among group means. The difference between a t-test and ANOVA is that the former is used to compare two groups, whereas the latter is used to compare three or more groups. [Read more about the difference between t-test and ANOVA](http://b.link/anova24).

From the ANOVA test, you receive two numbers. The first is the **F-value**, which you compare against a critical value to decide whether the null hypothesis can be rejected. The critical F-value varies according to the total number of subjects and the number of subject groups in your experiment. In [this table](http://b.link/eda14) you can find the critical values of the F distribution. **If you are confused by the massive F-distribution table, don't worry. Skip the F-value for now and study it at a later time. In this challenge you only need to look at the p-value.**
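
If you would rather compute a critical value than look it up in the table, the sketch below may help. It assumes a one-way design and a 0.05 significance level. The helper name `critical_f` and the sample sizes (30 subjects in 3 groups) are made up for illustration; `scipy.stats.f.ppf` is SciPy's inverse CDF of the F distribution.

```python
from scipy.stats import f


def critical_f(n_total, n_groups, alpha=0.05):
    """Critical F-value for a one-way ANOVA (hypothetical helper)."""
    dfn = n_groups - 1        # between-group degrees of freedom
    dfd = n_total - n_groups  # within-group degrees of freedom
    # The critical value cuts off the upper `alpha` tail of the F distribution
    return f.ppf(1 - alpha, dfn, dfd)


# Illustrative numbers only: 30 subjects split into 3 groups
f_crit = critical_f(n_total=30, n_groups=3)
print(round(f_crit, 2))
```

An F-value larger than `f_crit` would lead you to reject the null hypothesis at the chosen significance level.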

The p-value is the other number yielded by ANOVA. It already takes the total number of subjects and the number of experiment groups into consideration. **Typically, if your p-value is less than 0.05, you can reject the null hypothesis.**
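
As a minimal sketch of that decision rule, the example below runs `f_oneway` on three small made-up groups and compares the p-value with 0.05. The group values and the names `alpha` and `reject_null` are assumptions for illustration, not part of the challenge data.

```python
from scipy.stats import f_oneway

# Three made-up groups; the third has a clearly higher mean than the others
group_a = [10, 12, 11, 13, 12]
group_b = [11, 13, 12, 12, 11]
group_c = [20, 22, 21, 23, 22]

alpha = 0.05  # conventional significance threshold
result = f_oneway(group_a, group_b, group_c)

# Reject the null hypothesis (all group means equal) when p < alpha
reject_null = result.pvalue < alpha
print(f"F = {result.statistic:.2f}, p = {result.pvalue:.3g}, reject H0: {reject_null}")
```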

In this challenge, we want to understand whether there are significant differences in the `Total` value among the various pokemon types, i.e. Grass vs Poison vs Fire vs Dragon... There are many pokemon types, which makes this a perfect use case for ANOVA. Use Ironhack's database to load the pokemon data (db: pokemon, table: pokemon_stats).

In [30]:
# import numpy and pandas
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind
from scipy.stats import chi2_contingency
from scipy.stats import f_oneway

In [31]:
# Your code here:

pokemon = pd.read_csv('pokemon.csv')


display(pokemon.head())

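| Unnamed: 0 | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
| 4 | 4 | Charmander | Fire |  | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |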


**To achieve our goal, we use three steps:**

1. **Extract the unique values of the pokemon types.**

1. **Select dataframes for each unique pokemon type.**

1. **Conduct ANOVA analysis across the pokemon types.**

#### First let's obtain the unique values of the pokemon types. These values should be extracted from Type 1 and Type 2 aggregated. Assign the unique values to a variable called `unique_types`.

*Hint: the correct number of unique types is 19 including `NaN`. You can disregard `NaN` in the next step.*
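
One way to get the distinct values with `NaN` still included is to stack both columns and call `unique()` without dropping anything first. The sketch below uses a tiny stand-in dataframe named `toy` (an assumption, with illustrative values) rather than the real `pokemon` data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the pokemon dataframe (values are illustrative only)
toy = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Water", "Fire"],
    "Type 2": ["Poison", np.nan, np.nan, "Flying"],
})

# Stack both type columns into one Series, then take the distinct values.
# NaN is kept as one of the distinct values because nothing is dropped.
toy_unique_types = pd.concat([toy["Type 1"], toy["Type 2"]]).unique()
print(len(toy_unique_types))
```

Calling `.dropna()` before `.unique()` removes the `NaN` entry up front, which is why that variant yields one value fewer.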

In [32]:
#1  Extract the unique values of the Pokémon types.

# Combine 'Type 1' and 'Type 2' columns and drop NaN values
combined_types = pd.concat([pokemon['Type 1'], pokemon['Type 2']]).dropna()

# Get unique types
unique_types = combined_types.unique()

#Select dataframes for each unique Pokémon type & Conduct ANOVA analysis across the Pokémon types.
# Create a dictionary to store 'Total' stats for each type
type_totals = {}

# Iterate through each unique type and store its corresponding 'Total' stats
for pokemon_type in unique_types:
    # Filter the dataframe to include only rows where the Pokémon is of the current type
    type_data = pokemon[(pokemon['Type 1'] == pokemon_type) | (pokemon['Type 2'] == pokemon_type)]
    type_totals[pokemon_type] = type_data['Total']

# Perform ANOVA analysis for the 'Total' column across different Pokémon types
anova_results = f_oneway(*type_totals.values())

# Print the ANOVA results
display("ANOVA Results:")
display("F-statistic:", anova_results.statistic)
display("p-value:", anova_results.pvalue)


'ANOVA Results:'

'F-statistic:'

6.6175382960055344

'p-value:'

2.6457458815984803e-15

#### Second, we will create a list named `pokemon_totals` to contain the `Total` values for each unique pokemon type.

Why do we use a list instead of a dictionary to store the pokemon `Total` values? Because ANOVA only tells us whether there is a significant difference among the group means; it does not tell us which group(s) are significantly different. Therefore, we don't need to know which `Total` belongs to which pokemon type.

*Hints:*

* Loop through `unique_types` and append the selected type's `Total` to `pokemon_totals`.
* Skip the `NaN` value in `unique_types`. `NaN` is a `float` value, which you can find out by using `type()`. The valid pokemon type values are all of the `str` type.
* At the end, the length of your `pokemon_totals` should be 18.
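
The hints above can be sketched on a small stand-in dataframe. Here `toy`, `toy_unique_types`, and `toy_totals` are hypothetical names with illustrative values; with the real data you would loop over `unique_types` and fill `pokemon_totals` instead.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the pokemon dataframe (values are illustrative only)
toy = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Water", "Fire"],
    "Type 2": ["Poison", np.nan, np.nan, "Flying"],
    "Total": [318, 309, 314, 534],
})
toy_unique_types = pd.concat([toy["Type 1"], toy["Type 2"]]).unique()

toy_totals = []
for poke_type in toy_unique_types:
    # NaN is a float, so anything that is not a str is skipped
    if not isinstance(poke_type, str):
        continue
    # Keep rows where either type column matches the current type
    mask = (toy["Type 1"] == poke_type) | (toy["Type 2"] == poke_type)
    toy_totals.append(toy.loc[mask, "Total"].tolist())

print(len(toy_totals))
```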

In [55]:

# Combine 'Type 1' and 'Type 2' columns and drop NaN values
combined_types = pd.concat([pokemon['Type 1'], pokemon['Type 2']]).dropna()

# Get unique types (excluding NaN)
unique_types = combined_types.unique()
valid_types = [t for t in unique_types if isinstance(t, str)]

# Create a list of lists to store 'Total' stats for each type
pokemon_totals = []

# Iterate through each unique type and store its corresponding 'Total' stats as a list
for pokemon_type in valid_types:
    # Filter the dataframe to include only rows where the Pokémon is of the current type
    type_data = pokemon[(pokemon['Type 1'] == pokemon_type) | (pokemon['Type 2'] == pokemon_type)]
    # Append the 'Total' values of the current type as a list
    pokemon_totals.append(type_data['Total'].tolist())

# Print the 'Total' values for each type
print("Total values for each type:")
for pokemon_type, totals in zip(valid_types, pokemon_totals):
    print(pokemon_type, totals)

# Perform ANOVA analysis on 'pokemon_totals'
anova_results = f_oneway(*pokemon_totals)

# Print the ANOVA results
display("ANOVA Results:")
display("F-statistic:", anova_results.statistic)
display("p-value:", anova_results.pvalue)


Total values for each type:
Grass [318, 405, 525, 625, 320, 395, 490, 285, 405, 300, 390, 490, 325, 520, 435, 318, 405, 525, 490, 250, 340, 460, 180, 425, 600, 310, 405, 530, 630, 220, 340, 480, 220, 340, 480, 295, 460, 400, 335, 475, 355, 495, 460, 318, 405, 525, 280, 515, 424, 275, 450, 454, 334, 494, 594, 535, 525, 520, 600, 600, 308, 413, 528, 316, 498, 310, 380, 500, 280, 480, 280, 480, 461, 335, 475, 294, 464, 305, 489, 580, 313, 405, 530, 350, 531, 309, 474, 335, 335, 335, 335, 494, 494, 494, 494]
Fire [309, 405, 534, 634, 634, 299, 505, 350, 555, 410, 500, 495, 525, 580, 309, 405, 534, 250, 410, 330, 500, 600, 365, 580, 680, 310, 405, 530, 630, 305, 460, 560, 470, 770, 309, 405, 534, 540, 520, 600, 600, 308, 418, 528, 316, 498, 315, 480, 540, 275, 370, 520, 484, 360, 550, 680, 307, 409, 534, 382, 499, 369, 507, 600]
Water [314, 405, 530, 630, 320, 500, 300, 385, 510, 335, 515, 315, 490, 590, 325, 475, 305, 525, 325, 475, 295, 440, 320, 450, 340, 520, 200, 540, 640, 535, 525, 35

'ANOVA Results:'

'F-statistic:'

6.6175382960055344

'p-value:'

2.6457458815984803e-15

#### Now we run the ANOVA test on `pokemon_totals`.

*Hints:*

* To conduct ANOVA, you can use `scipy.stats.f_oneway()`. Here's the [reference](http://b.link/scipy44).

* What if `f_oneway` throws an error because it does not accept `pokemon_totals` as a single list? The trick is to add a `*` in front of `pokemon_totals`, e.g. `f_oneway(*pokemon_totals)`. This unpacks the list and passes each list item as a separate argument to `f_oneway`.
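
To see what the `*` does, the sketch below compares the unpacked call with one that passes each group explicitly. The three short groups reuse the first few `Total` values printed earlier for Grass, Fire, and Water; the variable names `groups`, `unpacked`, and `explicit` are assumptions for illustration.

```python
from scipy.stats import f_oneway

# Three small groups (the first four Totals of Grass, Fire and Water)
groups = [
    [318, 405, 525, 625],
    [309, 405, 534, 634],
    [314, 405, 530, 630],
]

# `*groups` unpacks the outer list into separate positional arguments
unpacked = f_oneway(*groups)

# Equivalent call with each group written out by hand
explicit = f_oneway(groups[0], groups[1], groups[2])

print(unpacked.pvalue == explicit.pvalue)
```

Unpacking is convenient here because the number of groups (18 pokemon types) is too large to write out by hand.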

In [None]:

# Combine 'Type 1' and 'Type 2' columns and drop NaN values
combined_types = pd.concat([pokemon['Type 1'], pokemon['Type 2']]).dropna()

# Get unique types (excluding NaN)
unique_types = combined_types.unique()
valid_types = [t for t in unique_types if isinstance(t, str)]

# Create a dictionary to store 'Total' stats for each type
type_totals = {}

# Iterate through each unique type and store its corresponding 'Total' stats
for pokemon_type in valid_types:
    # Filter the dataframe to include only rows where the Pokémon is of the current type
    type_data = pokemon[(pokemon['Type 1'] == pokemon_type) | (pokemon['Type 2'] == pokemon_type)]
    # Store the 'Total' values of the current type in the dictionary
    type_totals[pokemon_type] = type_data['Total']

# Collect the dictionary's values (one pandas Series of 'Total' per type) into a list
pokemon_totals = list(type_totals.values())

# Perform ANOVA analysis on 'pokemon_totals'
anova_results = f_oneway(*pokemon_totals)

# Print the ANOVA results
print("ANOVA Results:")
print("F-statistic:", anova_results.statistic)
print("p-value:", anova_results.pvalue)



ANOVA Results:
F-statistic: 6.6175382960055344
p-value: 2.6457458815984803e-15


#### Interpret the ANOVA test result. Is the difference significant?

In [None]:
#  The p-value (about 2.6e-15) is far below 0.05, so we reject the null hypothesis: the mean 'Total' differs significantly among Pokémon types.
#  ANOVA itself does not tell us which specific types have significantly different means.