# Challenge 1

In this challenge you will be working on **Pokemon**. You will answer a series of questions in order to practice dataframe calculation, aggregation, and transformation.

![Pokemon](../images/pokemon.jpg)

Follow the instructions below and enter your code.

#### Import all required libraries.

In [1]:
# import libraries
import pandas as pd
import numpy as np
import missingno
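# the IPython.display version of this function is deprecated; matplotlib_inline provides the same function
from matplotlib_inline.backend_inline import set_matplotlib_formats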
%matplotlib inline
set_matplotlib_formats('svg')

#### Import data set.

Import data set `Pokemon` from Ironhack's database. Read the data into a dataframe called `pokemon`.

*Data set attributed to [Alberto Barradas](https://www.kaggle.com/abcsds/pokemon/)*

In [95]:
# read the data set from a local CSV copy (stored in `df_p` throughout this notebook)
path = './Pokemon.csv'
df_p = pd.read_csv(path, low_memory=False)


#### Print first 10 rows of `pokemon`.

In [156]:
# your code here
df_p.head(10)
#df_p.info(memory_usage='deep')

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,A/D Ratio,Combo Type
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False,1.0,Grass-Poison
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False,0.984127,Grass-Poison
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False,0.987952,Grass-Poison
3,3,Mega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False,0.813008,Grass-Poison
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False,1.209302,Fire
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False,1.103448,Fire
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False,1.076923,Fire-Flying
7,6,Mega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False,1.171171,Fire-Dragon
8,6,Mega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False,1.333333,Fire-Flying
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False,0.738462,Water


When you look at a data set, you often wonder what each column means. Some open-source data sets provide descriptions of the data set. In many cases, data descriptions are extremely useful for data analysts to perform work efficiently and successfully.

For the `Pokemon.csv` data set, fortunately, the owner provided descriptions which you can see [here](https://www.kaggle.com/abcsds/pokemon/home). For your convenience, we are including the descriptions below. Read the descriptions and understand what each column means. This knowledge is helpful in your work with the data.

| Column | Description |
| --- | --- |
| # | ID for each pokemon |
| Name | Name of each pokemon |
| Type 1 | Each pokemon has a type, which determines its weakness/resistance to attacks |
| Type 2 | Some pokemon are dual type and have a second type |
| Total | A general guide to how strong a pokemon is |
| HP | Hit points, or health, defines how much damage a pokemon can withstand before fainting |
| Attack | The base modifier for normal attacks (e.g. Scratch, Punch) |
| Defense | The base damage resistance against normal attacks |
| Sp. Atk | Special attack, the base modifier for special attacks (e.g. Fire Blast, Bubble Beam) |
| Sp. Def | The base damage resistance against special attacks |
| Speed | Determines which pokemon attacks first each round |
| Generation | Number of generation |
| Legendary | True if the pokemon is Legendary, False if not |
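
The description of `Total` is vague. In the rows printed above, `Total` matches the sum of the six base stats, which the minimal sketch below checks on three of those rows. The names `sample`, `stat_columns`, and `computed_total` are illustrative, and the check is an observation on these rows only, not something the description table states.

```python
import pandas as pd

# Three rows copied from the first rows of the data set printed above.
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Charmander"],
    "Total": [318, 405, 309],
    "HP": [45, 60, 39],
    "Attack": [49, 62, 52],
    "Defense": [49, 63, 43],
    "Sp. Atk": [65, 80, 60],
    "Sp. Def": [65, 80, 50],
    "Speed": [45, 60, 65],
})

stat_columns = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]

# Sum the six base stats row by row and compare with the Total column.
computed_total = sample[stat_columns].sum(axis=1)
totals_match = bool((computed_total == sample["Total"]).all())
print(totals_match)
```

Running the same comparison on the full `df_p` would show whether the pattern holds for every row.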

#### Obtain the distinct values across `Type 1` and `Type 2`.

Extract all the values in `Type 1` and `Type 2`. Then create an array containing the distinct values across both fields.

In [84]:
# your code here
df_p[['Type 1','Type 2']].nunique()

Type 1    18
Type 2    18
dtype: int64

In [85]:
df_p['Type 1'].value_counts()

Water       112
Normal       98
Grass        70
Bug          69
Psychic      57
Fire         52
Electric     44
Rock         44
Dragon       32
Ghost        32
Ground       32
Dark         31
Poison       28
Fighting     27
Steel        27
Ice          24
Fairy        17
Flying        4
Name: Type 1, dtype: int64

In [86]:
df_p['Type 2'].value_counts()

Flying      97
Ground      35
Poison      34
Psychic     33
Fighting    26
Grass       25
Fairy       23
Steel       22
Dark        20
Dragon      18
Rock        14
Ice         14
Water       14
Ghost       14
Fire        12
Electric     6
Normal       4
Bug          3
Name: Type 2, dtype: int64

In [87]:
df_p['Type 1'].unique()

array(['Grass', 'Fire', 'Water', 'Bug', 'Normal', 'Poison', 'Electric',
       'Ground', 'Fairy', 'Fighting', 'Psychic', 'Rock', 'Ghost', 'Ice',
       'Dragon', 'Dark', 'Steel', 'Flying'], dtype=object)

In [88]:
df_p['Type 2'].unique()

array(['Poison', nan, 'Flying', 'Dragon', 'Ground', 'Fairy', 'Grass',
       'Fighting', 'Psychic', 'Steel', 'Ice', 'Rock', 'Dark', 'Water',
       'Electric', 'Fire', 'Ghost', 'Bug', 'Normal'], dtype=object)

In [89]:
# union of the unique values of both columns (the NaN from `Type 2` is still included)
type1_and_type2 = set(df_p['Type 1'].unique()) | set(df_p['Type 2'].unique())

print(len(type1_and_type2))
print(type1_and_type2)


19
{nan, 'Ground', 'Fire', 'Electric', 'Normal', 'Ghost', 'Poison', 'Fighting', 'Fairy', 'Steel', 'Dark', 'Ice', 'Psychic', 'Bug', 'Flying', 'Grass', 'Water', 'Rock', 'Dragon'}
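
The count of 19 above includes `nan`, which is a missing value and not a type. One possible way to get only the real types is to stack both columns, drop the missing entries, and then deduplicate. The sketch below runs on a small stand-in frame; `demo`, `all_types`, and `distinct_types` are hypothetical names, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Small stand-in for the real data set.
demo = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Fire", "Water"],
    "Type 2": ["Poison", np.nan, "Flying", np.nan],
})

# Stack both columns into one Series, drop missing values, then deduplicate.
all_types = pd.concat([demo["Type 1"], demo["Type 2"]]).dropna()
distinct_types = all_types.unique()

print(len(distinct_types))
print(distinct_types)
```

Applied to `df_p`, the same three steps should leave out the `nan` entry, since `dropna()` removes it before `unique()` is called.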


#### Clean up `Name` values that contain "Mega".

If you have looked at the pokemon names carefully, you will have noticed that the names containing "Mega" carry junk text: the base name is glued to the front of the word "Mega". We want to clean up these names. For instance, "VenusaurMega Venusaur" should be "Mega Venusaur", and "CharizardMega Charizard X" should be "Mega Charizard X".

In [90]:
df_p.nunique()

#             721
Name          800
Type 1         18
Type 2         18
Total         200
HP             94
Attack        111
Defense       103
Sp. Atk       105
Sp. Def        92
Speed         108
Generation      6
Legendary       2
dtype: int64

In [96]:
df_p.loc[df_p['Name'].str.contains('Mega'),'Name']

3          VenusaurMega Venusaur
7      CharizardMega Charizard X
8      CharizardMega Charizard Y
12       BlastoiseMega Blastoise
19         BeedrillMega Beedrill
23           PidgeotMega Pidgeot
71         AlakazamMega Alakazam
87           SlowbroMega Slowbro
102            GengarMega Gengar
124    KangaskhanMega Kangaskhan
137            PinsirMega Pinsir
141        GyaradosMega Gyarados
154    AerodactylMega Aerodactyl
163          MewtwoMega Mewtwo X
164          MewtwoMega Mewtwo Y
168                     Meganium
196        AmpharosMega Ampharos
224          SteelixMega Steelix
229            ScizorMega Scizor
232      HeracrossMega Heracross
248        HoundoomMega Houndoom
268      TyranitarMega Tyranitar
275        SceptileMega Sceptile
279        BlazikenMega Blaziken
283        SwampertMega Swampert
306      GardevoirMega Gardevoir
327          SableyeMega Sableye
329            MawileMega Mawile
333            AggronMega Aggron
336        MedichamMega Medicham
339      M

In [97]:
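# your code here
# regex=True is needed on pandas >= 2.0, where str.replace treats the pattern as literal text by default
df_p['Name'] = df_p['Name'].str.replace(r'\w*Mega\b', 'Mega', regex=True)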



# test transformed data
#pokemon.head(10)

In [98]:
# I prefer this view instead of .head(10)

df_p.loc[df_p['Name'].str.contains('Mega'),'Name']

3         Mega Venusaur
7      Mega Charizard X
8      Mega Charizard Y
12       Mega Blastoise
19        Mega Beedrill
23         Mega Pidgeot
71        Mega Alakazam
87         Mega Slowbro
102         Mega Gengar
124     Mega Kangaskhan
137         Mega Pinsir
141       Mega Gyarados
154     Mega Aerodactyl
163       Mega Mewtwo X
164       Mega Mewtwo Y
168            Meganium
196       Mega Ampharos
224        Mega Steelix
229         Mega Scizor
232      Mega Heracross
248       Mega Houndoom
268      Mega Tyranitar
275       Mega Sceptile
279       Mega Blaziken
283       Mega Swampert
306      Mega Gardevoir
327        Mega Sableye
329         Mega Mawile
333         Mega Aggron
336       Mega Medicham
339      Mega Manectric
349       Mega Sharpedo
354       Mega Camerupt
366        Mega Altaria
387        Mega Banette
393          Mega Absol
397         Mega Glalie
409      Mega Salamence
413      Mega Metagross
418         Mega Latias
420         Mega Latios
426       Mega R
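
To see why the pattern leaves "Meganium" alone, it helps to run it on a handful of names. The sketch below is a small illustration on a hypothetical `names` Series, not the full data set.

```python
import pandas as pd

names = pd.Series([
    "VenusaurMega Venusaur",
    "CharizardMega Charizard X",
    "Meganium",
    "Bulbasaur",
])

# \w*Mega\b matches any word characters glued to the front of "Mega",
# but only when "Mega" ends at a word boundary, so "Meganium" is untouched.
cleaned = names.str.replace(r"\w*Mega\b", "Mega", regex=True)
print(cleaned.tolist())
```

The trailing `\b` is the important part: without it, "Meganium" would also match and be shortened to "Meganium" minus nothing visible here, but names with a prefix before an embedded "Mega" could be altered by mistake.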

#### Create a new column called `A/D Ratio` whose value equals `Attack` divided by `Defense`.

For instance, if a pokemon has the Attack score 49 and Defense score 49, the corresponding `A/D Ratio` is 49/49=1.
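
The same arithmetic can be sketched on a few rows before applying it to the whole data set. The frame `demo` below is a hypothetical stand-in; its Attack and Defense values are copied from rows printed elsewhere in this notebook.

```python
import pandas as pd

demo = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Carvanha"],
    "Attack": [49, 62, 90],
    "Defense": [49, 63, 20],
})

# Element-wise division of two columns produces a new float column.
demo["A/D Ratio"] = demo["Attack"] / demo["Defense"]
print(demo)
```

Because the division is vectorized, no explicit loop over the rows is needed.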

In [102]:
# your code here
df_p['A/D Ratio'] = df_p['Attack'] / df_p['Defense']

df_p.head(10)

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,A/D Ratio
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False,1.0
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False,0.984127
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False,0.987952
3,3,Mega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False,0.813008
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False,1.209302
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False,1.103448
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False,1.076923
7,6,Mega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False,1.171171
8,6,Mega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False,1.333333
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False,0.738462


#### Identify the pokemon with the highest `A/D Ratio`.

In [111]:
# your code here
df_p[['Name','A/D Ratio']].sort_values(by='A/D Ratio', ascending=False).head(10)

Unnamed: 0,Name,A/D Ratio
429,DeoxysAttack Forme,9.0
347,Carvanha,4.5
19,Mega Beedrill,3.75
453,Cranidos,3.125
348,Sharpedo,3.0
428,DeoxysNormal Forme,3.0
750,AegislashBlade Forme,3.0
454,Rampardos,2.75
798,HoopaHoopa Unbound,2.666667
186,Pichu,2.666667


#### Identify the pokemon with the lowest `A/D Ratio`.

In [112]:
# your code here
df_p[['Name','A/D Ratio']].sort_values(by='A/D Ratio', ascending=True).head(10)

Unnamed: 0,Name,A/D Ratio
230,Shuckle,0.043478
139,Magikarp,0.181818
484,Bronzor,0.27907
103,Onix,0.28125
616,DarmanitanZen Mode,0.285714
189,Togepi,0.307692
456,Bastiodon,0.309524
323,Nosepass,0.333333
751,AegislashShield Forme,0.333333
773,Carbink,0.333333
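
Sorting shows the ranking, but when only the single highest or lowest pokemon is needed, `idxmax` and `idxmin` are a possible shortcut. The sketch below uses a hypothetical three-row frame `demo` rather than the full data set.

```python
import pandas as pd

demo = pd.DataFrame({
    "Name": ["Bulbasaur", "Carvanha", "Mega Venusaur"],
    "Attack": [49, 90, 100],
    "Defense": [49, 20, 123],
})
demo["A/D Ratio"] = demo["Attack"] / demo["Defense"]

# idxmax / idxmin return the index label of the largest / smallest value.
highest = demo.loc[demo["A/D Ratio"].idxmax(), "Name"]
lowest = demo.loc[demo["A/D Ratio"].idxmin(), "Name"]
print(highest, lowest)
```

Both methods return the first matching label when several rows share the same value, so ties are not reported.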


#### Create a new column called `Combo Type` whose value combines `Type 1` and `Type 2`.

Rules:

* If both `Type 1` and `Type 2` have valid values, the `Combo Type` value should contain both values in the form of `<Type 1>-<Type 2>`. For example, if `Type 1` value is `Grass` and `Type 2` value is `Poison`, `Combo Type` will be `Grass-Poison`.

* If `Type 1` has a valid value but `Type 2` does not, `Combo Type` will be the same as `Type 1`. For example, if `Type 1` is `Fire` whereas `Type 2` is `NaN`, `Combo Type` will be `Fire`.

In [122]:
# your code here
df_p['Combo Type'] = np.where(df_p['Type 2'].isnull(), df_p['Type 1'], df_p['Type 1'] + '-' + df_p['Type 2'])

df_p.head(10)

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,A/D Ratio,Combo Type
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False,1.0,Grass-Poison
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False,0.984127,Grass-Poison
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False,0.987952,Grass-Poison
3,3,Mega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False,0.813008,Grass-Poison
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False,1.209302,Fire
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False,1.103448,Fire
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False,1.076923,Fire-Flying
7,6,Mega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False,1.171171,Fire-Dragon
8,6,Mega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False,1.333333,Fire-Flying
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False,0.738462,Water
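
The two rules can also be expressed without `np.where`, by building an optional suffix first. This is an alternative sketch, not the notebook's own solution; `demo` and `suffix` are hypothetical names.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Fire"],
    "Type 2": ["Poison", np.nan, "Flying"],
})

# "-" + NaN stays missing, so fillna("") turns an absent second type
# into an empty suffix and leaves `Type 1` unchanged.
suffix = ("-" + demo["Type 2"]).fillna("")
demo["Combo Type"] = demo["Type 1"] + suffix
print(demo["Combo Type"].tolist())
```

Either approach should give the same column; the `np.where` version states the condition explicitly, while this one relies on how missing values propagate through string concatenation.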


#### Identify the pokemon whose `A/D Ratio` is among the top 5.

In [126]:
# your code here
df_p[['Name','A/D Ratio']].sort_values(by='A/D Ratio', ascending=False).iloc[:5]

Unnamed: 0,Name,A/D Ratio
429,DeoxysAttack Forme,9.0
347,Carvanha,4.5
19,Mega Beedrill,3.75
453,Cranidos,3.125
348,Sharpedo,3.0


In [127]:
df_p.sort_values(by='A/D Ratio', ascending=False).iloc[:5]

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,A/D Ratio,Combo Type
429,386,DeoxysAttack Forme,Psychic,,600,50,180,20,180,20,150,3,True,9.0,Psychic
347,318,Carvanha,Water,Dark,305,45,90,20,65,20,65,3,False,4.5,Water-Dark
19,15,Mega Beedrill,Bug,Poison,495,65,150,40,15,80,145,1,False,3.75,Bug-Poison
453,408,Cranidos,Rock,,350,67,125,40,30,30,58,4,False,3.125,Rock
348,319,Sharpedo,Water,Dark,460,70,120,40,95,40,95,3,False,3.0,Water-Dark


#### For the 5 pokemon printed above, aggregate `Combo Type` and use a list to store the unique values.

Your end product is a list containing the distinct `Combo Type` values of the 5 pokemon with the highest `A/D Ratio`.

In [154]:
# your code here
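# column position of `Combo Type`, needed for positional indexing with .iloc
pos_Combo_Type = df_p.columns.get_loc('Combo Type')

print(pos_Combo_Type)
# distinct `Combo Type` values of the 5 rows with the highest `A/D Ratio`
Combo_Type_list = df_p.sort_values(by='A/D Ratio', ascending=False).iloc[:5, pos_Combo_Type].unique().tolist()
Combo_Type_list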

14


['Psychic', 'Water-Dark', 'Bug-Poison', 'Rock']
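
A label-based variant avoids looking up the column position. The sketch below assumes a small stand-in frame `demo` whose values are copied from the top-5 table above plus one extra row; `top5` and `combo_types` are hypothetical names.

```python
import pandas as pd

demo = pd.DataFrame({
    "Name": ["DeoxysAttack Forme", "Carvanha", "Mega Beedrill",
             "Cranidos", "Sharpedo", "Bulbasaur"],
    "A/D Ratio": [9.0, 4.5, 3.75, 3.125, 3.0, 1.0],
    "Combo Type": ["Psychic", "Water-Dark", "Bug-Poison",
                   "Rock", "Water-Dark", "Grass-Poison"],
})

# nlargest keeps the 5 rows with the highest ratio, already sorted.
top5 = demo.nlargest(5, "A/D Ratio")

# unique() preserves first-seen order; tolist() converts the array to a list.
combo_types = top5["Combo Type"].unique().tolist()
print(combo_types)
```

On the full data set several pokemon share a ratio of 3.0, so which of them lands in fifth place depends on how ties are broken; `nlargest` keeps the first occurrence by default.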

#### For each of the `Combo Type` values obtained from the previous question, calculate the mean scores of all numeric fields across all pokemon.

Your output should look like below:

![Aggregate](../images/aggregated-mean.png)

In [187]:
# your code here
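# version 01: row-wise membership test (the result is not stored)
df_p.loc[df_p['Combo Type'].apply(lambda x: x in Combo_Type_list)]
# version 02: vectorized membership test with isin (the result is stored)
df_p_combo_type = df_p.loc[df_p['Combo Type'].isin(Combo_Type_list)]
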

In [271]:
print(df_p_combo_type.dtypes)
# keep only the integer and float columns (the boolean `Legendary` column is excluded)
numerics = df_p_combo_type.select_dtypes(include='number').columns.tolist()

numerics

#               int64
Name           object
Type 1         object
Type 2         object
Total           int64
HP              int64
Attack          int64
Defense         int64
Sp. Atk         int64
Sp. Def         int64
Speed           int64
Generation      int64
Legendary        bool
A/D Ratio     float64
Combo Type     object
dtype: object


['#',
 'Total',
 'HP',
 'Attack',
 'Defense',
 'Sp. Atk',
 'Sp. Def',
 'Speed',
 'Generation',
 'A/D Ratio']

In [276]:
df_p_combo_type.groupby('Combo Type')[numerics].agg(['mean'])

Combo Type,#,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,A/D Ratio
,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean
Bug-Poison,199.166667,347.916667,53.75,68.333333,58.083333,42.5,59.333333,65.916667,2.333333,1.315989
Psychic,381.973684,464.552632,72.552632,64.947368,67.236842,98.552632,82.394737,78.868421,3.342105,1.164196
Rock,410.111111,409.444444,67.111111,103.333333,107.222222,40.555556,58.333333,32.888889,3.888889,1.260091
Water-Dark,347.666667,493.833333,69.166667,120.0,65.166667,88.833333,63.5,87.166667,3.166667,2.291949


In [278]:
df_p_combo_type.groupby('Combo Type')[numerics].mean()

Combo Type,#,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,A/D Ratio
Bug-Poison,199.166667,347.916667,53.75,68.333333,58.083333,42.5,59.333333,65.916667,2.333333,1.315989
Psychic,381.973684,464.552632,72.552632,64.947368,67.236842,98.552632,82.394737,78.868421,3.342105,1.164196
Rock,410.111111,409.444444,67.111111,103.333333,107.222222,40.555556,58.333333,32.888889,3.888889,1.260091
Water-Dark,347.666667,493.833333,69.166667,120.0,65.166667,88.833333,63.5,87.166667,3.166667,2.291949
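
The whole filter-then-aggregate step can be condensed into a short, self-contained sketch. The frame `demo` and the list `wanted` are hypothetical stand-ins for `df_p` and `Combo_Type_list`; the Attack and Defense values are copied from rows shown earlier.

```python
import pandas as pd

demo = pd.DataFrame({
    "Name": ["Carvanha", "Sharpedo", "Cranidos", "Bulbasaur"],
    "Combo Type": ["Water-Dark", "Water-Dark", "Rock", "Grass-Poison"],
    "Attack": [90, 120, 125, 49],
    "Defense": [20, 40, 40, 49],
    "Legendary": [False, False, False, False],
})
wanted = ["Water-Dark", "Rock"]

# Integer and float columns only; bool and text columns are left out.
numeric_cols = demo.select_dtypes(include="number").columns.tolist()

# Keep the wanted combo types, then average each numeric column per group.
means = (
    demo[demo["Combo Type"].isin(wanted)]
    .groupby("Combo Type")[numeric_cols]
    .mean()
)
print(means)
```

Calling `.mean()` directly gives a flat header, whereas `.agg(['mean'])` adds a second header level, which is the only difference between the two outputs above.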
