#  ANOVA

In statistics, **Analysis of Variance (ANOVA)** is used to analyze the differences among group means. The difference between the t-test and ANOVA is that the former is used to compare two groups, whereas the latter is used to compare three or more groups. [Read more about the difference between t-test and ANOVA](http://b.link/anova24).

The ANOVA test returns two numbers. The first is the **F-value**, which you compare against a critical value to decide whether the null hypothesis can be rejected. The critical F-value depends on the total number of subjects and the number of subject groups in your experiment. In [this table](http://b.link/eda14) you can find the critical values of the F distribution. **If you are confused by the massive F-distribution table, don't worry. Skip the F-value for now and study it at a later time. In this challenge you only need to look at the p-value.**
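As an alternative to reading the table, the critical value can be looked up with `scipy.stats.f.ppf`. The sketch below assumes a hypothetical experiment with 3 groups and 30 subjects in total and a significance level of 0.05; these numbers are illustrative only and are not taken from the pokemon data.

```python
import scipy.stats as st

# Hypothetical experiment (illustrative values, not from the pokemon data).
n_groups = 3
n_total = 30
alpha = 0.05

dfn = n_groups - 1        # between-group degrees of freedom
dfd = n_total - n_groups  # within-group degrees of freedom

# Critical value: the F-value that leaves probability alpha in the upper tail.
f_critical = st.f.ppf(1 - alpha, dfn, dfd)
print(f"Critical F({dfn}, {dfd}) at alpha={alpha}: {f_critical:.2f}")
```

An F-value larger than `f_critical` would lead to rejecting the null hypothesis at that significance level.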

The second number is the **p-value**, which already takes the total number of subjects and the number of experiment groups into account. **Typically, if your p-value is less than 0.05, you can reject the null hypothesis.**
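To make the link between the two numbers concrete, here is a minimal sketch that computes the F-value by hand from between-group and within-group variability, converts it to a p-value with the F distribution, and compares both with `scipy.stats.f_oneway`. The three groups are made-up measurements used only for illustration.

```python
import numpy as np
import scipy.stats as st

# Hypothetical measurements for three groups (made-up numbers).
groups = [
    np.array([4.0, 5.0, 6.0, 5.5]),
    np.array([6.5, 7.0, 8.0, 7.5]),
    np.array([5.0, 5.5, 6.0, 6.5]),
]

n_total = sum(len(g) for g in groups)
n_groups = len(groups)
grand_mean = np.concatenate(groups).mean()

# Sum of squares between groups and within groups.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

dfn = n_groups - 1        # between-group degrees of freedom
dfd = n_total - n_groups  # within-group degrees of freedom

# F is the ratio of the two mean squares.
f_value = (ss_between / dfn) / (ss_within / dfd)

# The p-value is the upper-tail probability of F under the null hypothesis.
p_value = st.f.sf(f_value, dfn, dfd)

# Reference result from SciPy for the same data.
result = st.f_oneway(*groups)
print(f_value, p_value)
print(result.statistic, result.pvalue)
```

Because the degrees of freedom enter the p-value calculation, the same 0.05 threshold can be used regardless of how many groups or subjects there are.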

In this challenge, we want to understand whether there are significant differences in the `Total` value among the various pokemon types, i.e. Grass vs Poison vs Fire vs Dragon... There are many pokemon types, which makes this a good use case for ANOVA.

In [2]:
# Import libraries
import pandas as pd
import numpy as np
import scipy.stats as st

In [3]:
# Load the data:
pokemon = pd.read_csv("Pokemon_clean.csv")
pokemon

Unnamed: 0,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,Mega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...
795,Diancie,Rock,Fairy,600,50,100,150,100,150,50,6,True
796,Mega Diancie,Rock,Fairy,700,50,160,110,160,110,110,6,True
797,HoopaHoopa Confined,Psychic,Ghost,600,80,110,60,150,130,70,6,True
798,HoopaHoopa Unbound,Psychic,Dark,680,80,160,60,170,130,80,6,True


In [4]:
pokemon.Name.value_counts(dropna = False).index

Index(['Bulbasaur', 'Uxie', 'Mega Gallade', 'Probopass', 'Dusknoir',
       'Froslass', 'Rotom', 'RotomHeat Rotom', 'RotomWash Rotom',
       'RotomFrost Rotom',
       ...
       'Suicune', 'Larvitar', 'Pupitar', 'Tyranitar', 'Mega Tyranitar',
       'Lugia', 'Ho-oh', 'Celebi', 'Treecko', 'Volcanion'],
      dtype='object', length=800)

In [5]:
pokemon[pokemon["Name"] == "Rotom"]

Unnamed: 0,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
531,Rotom,Electric,Ghost,440,50,50,77,95,77,91,4,False


In [6]:
pokemon[pokemon["Name"] == "RotomHeat Rotom"]

Unnamed: 0,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
532,RotomHeat Rotom,Electric,Fire,520,50,65,107,105,107,86,4,False


In [7]:
pokemon[pokemon['Name'].str.len()>16]

Unnamed: 0,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
422,KyogrePrimal Kyogre,Water,,770,100,150,90,180,160,90,3,True
424,GroudonPrimal Groudon,Ground,Fire,770,100,180,160,150,90,90,3,True
428,DeoxysNormal Forme,Psychic,,600,50,150,50,150,50,150,3,True
429,DeoxysAttack Forme,Psychic,,600,50,180,20,180,20,150,3,True
430,DeoxysDefense Forme,Psychic,,600,50,70,160,70,160,90,3,True
431,DeoxysSpeed Forme,Psychic,,600,50,95,90,95,90,180,3,True
458,WormadamPlant Cloak,Bug,Grass,424,60,59,85,79,105,36,4,False
459,WormadamSandy Cloak,Bug,Ground,424,60,79,105,59,85,36,4,False
460,WormadamTrash Cloak,Bug,Steel,424,60,69,95,69,95,36,4,False
544,GiratinaAltered Forme,Ghost,Dragon,680,150,100,120,100,120,90,4,True


In [8]:
pokemon[pokemon['Name'].str.len()>16].shape

(38, 12)

In [9]:
pokemon["Name"].str.len().max()

24

**To achieve our goal, we use three steps:**

1. **Extract the unique values of the pokemon types.**

1. **Select dataframes for each unique pokemon type.**

1. **Conduct ANOVA analysis across the pokemon types.**

#### First let's obtain the unique values of the pokemon types. These values should be extracted from Type 1 and Type 2 aggregated. Assign the unique values to a variable called `unique_types`.

*Hint: the correct number of unique types is 19 including `NaN`. You can disregard `NaN` in the next step.*

In [10]:
# Your code here

unique_types = pd.unique(pokemon[['Type 1', 'Type 2']].values.ravel())
print(unique_types)
len(unique_types)

['Grass' 'Poison' 'Fire' nan 'Flying' 'Dragon' 'Water' 'Bug' 'Normal'
 'Electric' 'Ground' 'Fairy' 'Fighting' 'Psychic' 'Rock' 'Steel' 'Ice'
 'Ghost' 'Dark']


19

#### Second, we will create a list named `pokemon_totals` to contain the `Total` values of each unique pokemon type.

Why do we use a list instead of a dictionary to store the pokemon `Total` values? Because ANOVA only tells us whether there is a significant difference among the group means; it does not tell us which group(s) are significantly different. Therefore, we don't need to know which `Total` values belong to which pokemon type.

*Hints:*

* Loop through `unique_types` and append the selected type's `Total` values to `pokemon_totals`. Be sure to check BOTH `Type 1` and `Type 2` to cover all occurrences of each unique type.
* Skip the `NaN` value in `unique_types`. `NaN` is a `float` value, which you can find out by using `type()`. The valid pokemon type values are all of the `str` type.
* At the end, the length of your `pokemon_totals` should be 18.
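The hints above can be sketched on a tiny made-up table before applying them to the real data. The DataFrame `toy` and its values are hypothetical; only the column names mirror the challenge data. This sketch uses a single boolean mask over both type columns, which is one possible way to cover both columns and selects the same rows as combining the two per-column selections as long as no row lists the same type in both columns.

```python
import numpy as np
import pandas as pd

# Tiny made-up table with the same column names as the challenge data.
toy = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Water", "Grass"],
    "Type 2": ["Poison", np.nan, "Grass", np.nan],
    "Total": [318, 309, 400, 500],
})

# Unique types across both columns (includes NaN).
toy_types = pd.unique(toy[["Type 1", "Type 2"]].values.ravel())

toy_totals = []
for type_val in toy_types:
    # Skip the NaN entry: it is a float, not a type name.
    if pd.isna(type_val):
        continue
    # A row belongs to the group if either type column matches.
    mask = (toy["Type 1"] == type_val) | (toy["Type 2"] == type_val)
    toy_totals.append(toy.loc[mask, "Total"])

print(len(toy_totals))
```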

In [11]:
# Your code here

pokemon_totals = []

for type_val in unique_types:
    
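    # Skip NaN: valid type names are strings, while NaN is a float
    if isinstance(type_val, str):
        # Series.append was deprecated and later removed from pandas; pd.concat is the replacement
        total = pd.concat([
            pokemon[pokemon['Type 1'] == type_val]['Total'],
            pokemon[pokemon['Type 2'] == type_val]['Total'],
        ])
        pokemon_totals.append(total)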
        

print(pokemon_totals)


[0      318
1      405
2      525
3      625
48     320
      ... 
783    335
784    494
785    494
786    494
787    494
Name: Total, Length: 95, dtype: int64, 28     288
29     438
34     275
35     365
36     505
      ... 
603    260
604    360
605    485
651    294
652    464
Name: Total, Length: 62, dtype: int64, 4      309
5      405
6      534
7      634
8      634
      ... 
669    370
670    520
697    360
698    550
706    680
Name: Total, Length: 64, dtype: int64, 702    580
703    580
790    245
791    535
6      534
      ... 
730    382
731    499
734    411
771    500
793    680
Name: Total, Length: 101, dtype: int64, 159    300
160    420
161    600
365    490
366    590
406    300
407    420
408    600
409    700
417    600
418    700
419    600
420    700
425    680
426    780
491    300
492    410
493    600
494    700
671    320
672    410
673    540
682    485
706    680
707    680
710    660
711    700
712    700
774    300
775    452
776    600
794    600
7     

In [13]:
type(pokemon_totals)

list

In [14]:
pokemon[pokemon['Type 1'] == "Grass"]["Total"]

0      318
1      405
2      525
3      625
48     320
      ... 
718    313
719    405
720    530
740    350
741    531
Name: Total, Length: 70, dtype: int64

In [15]:
pokemon[pokemon['Type 2'] == "Poison"]["Total"]

0      318
1      405
2      525
3      625
16     195
17     205
18     395
19     495
48     320
49     395
50     490
53     305
54     450
75     300
76     390
77     490
78     335
79     515
99     310
100    405
101    500
102    600
181    250
182    390
227    430
292    385
344    400
451    280
452    515
603    260
604    360
605    485
651    294
652    464
Name: Total, dtype: int64

#### Now we run ANOVA test on `pokemon_totals`.

*Hints:*

* To conduct ANOVA, you can use `scipy.stats.f_oneway()`. Here's the [reference](http://b.link/scipy44).

* What if `f_oneway` throws an error because it does not accept `pokemon_totals` as a list? The trick is to add a `*` in front of `pokemon_totals`, e.g. `st.f_oneway(*pokemon_totals)`. This unpacks the list and passes each list item as a separate argument to `f_oneway`.
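A minimal sketch of the unpacking trick, using three made-up groups (`toy_groups` is a hypothetical name, not part of the challenge): calling `f_oneway` with `*` in front of the list is equivalent to passing each group by hand.

```python
import scipy.stats as st

# Three made-up groups of measurements (illustrative values only).
toy_groups = [
    [4.0, 5.0, 6.0],
    [6.5, 7.0, 8.0],
    [5.0, 5.5, 6.5],
]

# The * operator unpacks the list into separate positional arguments.
unpacked = st.f_oneway(*toy_groups)

# Equivalent call with every group written out explicitly.
explicit = st.f_oneway(toy_groups[0], toy_groups[1], toy_groups[2])

print(unpacked.statistic == explicit.statistic)
print(unpacked.pvalue == explicit.pvalue)
```

Unpacking is convenient here because the number of groups (18 pokemon types) is too large to write out by hand.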

In [16]:
# Your code here
# scipy.stats was already imported as `st` at the top of the notebook

# H0: The mean `Total` is the same for every pokemon type.
# H1: At least one pokemon type has a different mean `Total`.

st.f_oneway(*pokemon_totals)

F_onewayResult(statistic=6.617538296005533, pvalue=2.6457458815984803e-15)

#### Interpret the ANOVA test result. Is the difference significant?

In [17]:
# Your comment here

# The p-value (about 2.6e-15) is far below 0.05, so we reject the null hypothesis:
# the mean `Total` differs significantly for at least one pokemon type.