# Exercise 3

<img src="https://img.itch.zone/aW1nLzM0MzUxOTUuanBn/original/FuMxog.jpg" />

In this exercise, you will perform exploratory data analysis (EDA) on the popular Pokémon dataset, which contains detailed information about hundreds of Pokémon, including their stats, types, generation, and whether they are legendary.

The Pokémon dataset is ideal for practicing EDA because it includes a mix of numerical features (such as Attack, Defense, Speed, HP) and categorical features (such as Type 1, Type 2, and Generation). This combination allows you to apply a wide range of EDA techniques, including summary statistics, data visualizations, grouping and aggregation, correlation analysis, and comparisons across categories.

Throughout the exercise, you will explore questions such as:
- Which Pokémon tend to have the highest or lowest stats?
- How do different Pokémon types compare in terms of strength or defense?
- Do Legendary Pokémon differ significantly from non-Legendary ones?
- How are various stats distributed across the entire dataset?
- Are there relationships or trade-offs between certain attributes (e.g., Attack vs. Defense)?

In [1]:
import kagglehub
import os
import pandas as pd

In [2]:
# Download latest version
path = kagglehub.dataset_download("abcsds/pokemon")
print("Path to dataset files:", path)

Using Colab cache for faster access to the 'pokemon' dataset.
Path to dataset files: /kaggle/input/pokemon


In [3]:
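# Confirm the download path is a directory before listing it
if os.path.isdir(path):
  print(True)

# List the files in the dataset directory
contents = os.listdir(path)
contents

# Build the full path to the first file in that directory
mydataset = os.path.join(path, contents[0])
mydataset


# Load the CSV file into a DataFrame
df = pd.read_csv(mydataset)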

True



## 1. Data Understanding (4 pts)

1. Display the first 10 rows.

In [4]:
# put your answer here
df.head(10)

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False
7,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
8,6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False


2. Show dataset shape.

In [5]:
# put your answer here
df.shape

(800, 13)

3. Show all columns and their data types.

In [6]:
# put your answer here
df.dtypes

Unnamed: 0,0
#,int64
Name,object
Type 1,object
Type 2,object
Total,int64
HP,int64
Attack,int64
Defense,int64
Sp. Atk,int64
Sp. Def,int64


4. Identify which columns contain missing values.

In [7]:
# put your answer here
df.isnull().sum()

Unnamed: 0,0
#,0
Name,0
Type 1,0
Type 2,386
Total,0
HP,0
Attack,0
Defense,0
Sp. Atk,0
Sp. Def,0


In [8]:
df.isna().sum()

Unnamed: 0,0
#,0
Name,0
Type 1,0
Type 2,386
Total,0
HP,0
Attack,0
Defense,0
Sp. Atk,0
Sp. Def,0
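
The counts above include every column. To name only the columns that contain missing values, the counts can be filtered to the non-zero entries. A minimal sketch, assuming a small hypothetical frame `toy` (three rows copied from the output of `df.head(10)`) in place of `df`:

```python
import pandas as pd

# Hypothetical stand-in for df: only "Type 2" has missing values
toy = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle"],
    "Type 1": ["Grass", "Fire", "Water"],
    "Type 2": ["Poison", None, None],
})

# Count missing values per column, then keep only the non-zero counts
missing = toy.isna().sum()
cols_with_missing = missing[missing > 0]
print(cols_with_missing)
```

Applied to `df`, the same filter should leave only `Type 2`, based on the counts shown above.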


## 2. Summary Statistics (4 pts)

1. Generate `df.describe()`.

In [9]:
# put your answer here
df.describe()

Unnamed: 0,#,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation
count,800.0,800.0,800.0,800.0,800.0,800.0,800.0,800.0,800.0
mean,362.81375,435.1025,69.25875,79.00125,73.8425,72.82,71.9025,68.2775,3.32375
std,208.343798,119.96304,25.534669,32.457366,31.183501,32.722294,27.828916,29.060474,1.66129
min,1.0,180.0,1.0,5.0,5.0,10.0,20.0,5.0,1.0
25%,184.75,330.0,50.0,55.0,50.0,49.75,50.0,45.0,2.0
50%,364.5,450.0,65.0,75.0,70.0,65.0,70.0,65.0,3.0
75%,539.25,515.0,80.0,100.0,90.0,95.0,90.0,90.0,5.0
max,721.0,780.0,255.0,190.0,230.0,194.0,230.0,180.0,6.0


2. Get mean, median, and mode of Attack.

In [12]:
# put your answer here
print(df["Attack"].mean())
print(df["Attack"].mode())
print(df["Attack"].median())

79.00125
0    100
Name: Attack, dtype: int64
75.0


3. Compute 25th and 75th percentiles for HP.

In [14]:
# put your answer here
df["HP"].quantile(0.25)

np.float64(50.0)

In [15]:
df["HP"].quantile(0.75)

np.float64(80.0)
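
Both percentiles can also be requested in one call by passing a list to `quantile`. A small sketch, assuming a hypothetical Series `hp` built from the HP values of the first five rows shown earlier:

```python
import pandas as pd

# Hypothetical sample: HP of the first five rows of the dataset
hp = pd.Series([45, 60, 80, 80, 39], name="HP")

# A list of quantiles returns a Series indexed by the requested quantile
q = hp.quantile([0.25, 0.75])
iqr = q.loc[0.75] - q.loc[0.25]  # interquartile range
print(q)
print("IQR:", iqr)
```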

4. Compute standard deviation and variance of Speed.

In [16]:
# put your answer here
df["Speed"].std()

29.06047371716149
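
The cell above reports only the standard deviation; the question also asks for the variance, which `Series.var()` returns. A minimal sketch, assuming a hypothetical Series `speed` built from the Speed values of the first five rows shown earlier:

```python
import pandas as pd

# Hypothetical sample: Speed of the first five rows of the dataset
speed = pd.Series([45, 60, 80, 80, 65], name="Speed")

std = speed.std()  # sample standard deviation (ddof=1 by default)
var = speed.var()  # sample variance (ddof=1 by default)
print("std:", std)
print("var:", var)
```

With the same `ddof`, the variance is the square of the standard deviation, so `df["Speed"].var()` should equal `df["Speed"].std() ** 2` up to floating-point error.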

## 3. Filtering & Selection `(7 pts)`

Select all Pokémon with Attack > 100.

In [17]:
# put your answer here
df[df["Attack"] > 100]

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
7,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
8,6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False
12,9,BlastoiseMega Blastoise,Water,,630,79,103,120,135,115,78,1,False
19,15,BeedrillMega Beedrill,Bug,Poison,495,65,150,40,15,80,145,1,False
39,34,Nidoking,Poison,Ground,505,81,102,77,85,75,85,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...
793,717,Yveltal,Dark,Flying,680,126,131,95,131,98,99,6,True
796,719,DiancieMega Diancie,Rock,Fairy,700,50,160,110,160,110,110,6,True
797,720,HoopaHoopa Confined,Psychic,Ghost,600,80,110,60,150,130,70,6,True
798,720,HoopaHoopa Unbound,Psychic,Dark,680,80,160,60,170,130,80,6,True


Select all Pokémon whose primary type (Type 1) is "Fire".

In [18]:
# put your answer here
df[df["Type 1"] == "Fire"]

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False
7,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
8,6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False
42,37,Vulpix,Fire,,299,38,41,40,50,65,65,1,False
43,38,Ninetales,Fire,,505,73,76,75,81,100,100,1,False
63,58,Growlithe,Fire,,350,55,70,45,70,50,60,1,False
64,59,Arcanine,Fire,,555,90,110,80,100,80,95,1,False
83,77,Ponyta,Fire,,410,50,85,55,65,65,90,1,False


Select all Pokémon that are Legendary.

In [19]:
# put your answer here
df[df["Legendary"]]

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
156,144,Articuno,Ice,Flying,580,90,85,100,95,125,85,1,True
157,145,Zapdos,Electric,Flying,580,90,90,85,125,90,100,1,True
158,146,Moltres,Fire,Flying,580,90,100,90,125,85,90,1,True
162,150,Mewtwo,Psychic,,680,106,110,90,154,90,130,1,True
163,150,MewtwoMega Mewtwo X,Psychic,Fighting,780,106,190,100,154,100,130,1,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...
795,719,Diancie,Rock,Fairy,600,50,100,150,100,150,50,6,True
796,719,DiancieMega Diancie,Rock,Fairy,700,50,160,110,160,110,110,6,True
797,720,HoopaHoopa Confined,Psychic,Ghost,600,80,110,60,150,130,70,6,True
798,720,HoopaHoopa Unbound,Psychic,Dark,680,80,160,60,170,130,80,6,True


Select all Pokémon that are Generation 1 AND Legendary.

In [20]:
# put your answer here
df[df["Legendary"] & (df["Generation"] == 1)]

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
156,144,Articuno,Ice,Flying,580,90,85,100,95,125,85,1,True
157,145,Zapdos,Electric,Flying,580,90,90,85,125,90,100,1,True
158,146,Moltres,Fire,Flying,580,90,100,90,125,85,90,1,True
162,150,Mewtwo,Psychic,,680,106,110,90,154,90,130,1,True
163,150,MewtwoMega Mewtwo X,Psychic,Fighting,780,106,190,100,154,100,130,1,True
164,150,MewtwoMega Mewtwo Y,Psychic,,780,106,150,70,194,120,140,1,True


Select all Pokémon that are Water type OR Grass type.

In [33]:
# put your answer here
df[(df["Type 1"] == "Water") | (df["Type 1"] == "Grass") | (df["Type 2"] == "Water") | (df["Type 2"] == "Grass")]

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...
784,711,GourgeistAverage Size,Ghost,Grass,494,65,90,122,58,75,84,6,False
785,711,GourgeistSmall Size,Ghost,Grass,494,55,85,122,58,75,99,6,False
786,711,GourgeistLarge Size,Ghost,Grass,494,75,95,122,58,75,69,6,False
787,711,GourgeistSuper Size,Ghost,Grass,494,85,100,122,58,75,54,6,False


Select all Pokémon that are Fire type AND Attack > 120.

In [35]:
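# put your answer here
# Note: the Type 2 null check is an extra restriction that the question does not ask for.
# It limits the result to single-type Fire Pokémon and drops dual-type ones.
df[(df["Type 1"] == "Fire") & (df["Attack"] > 120) & (df["Type 2"].isnull())]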

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
147,136,Flareon,Fire,,525,65,130,60,95,110,65,1,False
615,555,DarmanitanStandard Mode,Fire,,480,105,140,55,30,55,95,5,False
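
The question as worded does not require a missing secondary type. A sketch of a broader reading, where a Pokémon counts as Fire type if either type column is "Fire", assuming a hypothetical frame `toy` built from rows shown in earlier outputs:

```python
import pandas as pd

# Hypothetical sample copied from rows shown in earlier outputs
toy = pd.DataFrame({
    "Name": ["Charizard", "CharizardMega Charizard X", "Flareon", "Arcanine", "Squirtle"],
    "Type 1": ["Fire", "Fire", "Fire", "Fire", "Water"],
    "Type 2": ["Flying", "Dragon", None, None, None],
    "Attack": [84, 130, 130, 110, 48],
})

# Fire in either type slot; comparisons against missing values evaluate to False
is_fire = (toy["Type 1"] == "Fire") | (toy["Type 2"] == "Fire")
strong_fire = toy[is_fire & (toy["Attack"] > 120)]
print(strong_fire)
```

Here the dual-type CharizardMega Charizard X is kept alongside Flareon, whereas the null check on `Type 2` would have excluded it.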


Select all Pokémon whose type is in this list:
`["Dragon", "Ghost", "Dark"]`.

In [32]:
# put your answer here
# This checks the primary type (Type 1) only
df[df["Type 1"].isin(["Dragon", "Ghost", "Dark"])]

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
99,92,Gastly,Ghost,Poison,310,30,35,30,100,35,80,1,False
100,93,Haunter,Ghost,Poison,405,45,50,45,115,55,95,1,False
101,94,Gengar,Ghost,Poison,500,60,65,60,130,75,110,1,False
102,94,GengarMega Gengar,Ghost,Poison,600,60,65,80,170,95,130,1,False
159,147,Dratini,Dragon,,300,41,64,45,50,50,50,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...
785,711,GourgeistSmall Size,Ghost,Grass,494,55,85,122,58,75,99,6,False
786,711,GourgeistLarge Size,Ghost,Grass,494,75,95,122,58,75,69,6,False
787,711,GourgeistSuper Size,Ghost,Grass,494,85,100,122,58,75,54,6,False
793,717,Yveltal,Dark,Flying,680,126,131,95,131,98,99,6,True
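
If "type" is read as either the primary or the secondary type, the two `isin` masks can be combined with `|`. A minimal sketch, assuming a hypothetical frame `toy` built from rows shown in earlier outputs:

```python
import pandas as pd

# Hypothetical sample copied from rows shown in earlier outputs
toy = pd.DataFrame({
    "Name": ["Gastly", "Dratini", "CharizardMega Charizard X", "Bulbasaur", "HoopaHoopa Unbound"],
    "Type 1": ["Ghost", "Dragon", "Fire", "Grass", "Psychic"],
    "Type 2": ["Poison", None, "Dragon", "Poison", "Dark"],
})

wanted = ["Dragon", "Ghost", "Dark"]

# isin returns False for missing values, so single-type rows are handled safely
mask = toy["Type 1"].isin(wanted) | toy["Type 2"].isin(wanted)
selected = toy[mask]
print(selected)
```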


## 4. Categorical Exploration `(9 pts)`

Find the number of Pokémon per primary type.

In [67]:
df["Type 1"].value_counts()

Unnamed: 0_level_0,count
Type 1,Unnamed: 1_level_1
Water,112
Normal,98
Grass,70
Bug,69
Psychic,57
Fire,52
Rock,44
Electric,44
Ground,32
Ghost,32


Find the number of Pokémon per generation.

In [102]:
# put your answer here
for i in df["Generation"].unique():
  print(i, len(df[df["Generation"] == i]))

df["Generation"].value_counts()

1 166
2 106
3 160
4 121
5 165
6 82


Unnamed: 0_level_0,count
Generation,Unnamed: 1_level_1
1,166
5,165
3,160
4,121
2,106
6,82


Which type appears the most? Which appears the least?

In [68]:
# put your answer here
s = df["Type 1"].value_counts()

l1 = s.idxmax()
l2 = s.idxmin()

print("Label: ", l1, "Value: ", s[l1])
print("Label: ", l2, "Value: ", s[l2])

Label:  Water Value:  112
Label:  Flying Value:  4


How many unique primary types (Type 1) exist?

In [70]:
# put your answer here
print("Unique:", *df['Type 1'].dropna().unique())
print("Counts: ", df['Type 1'].nunique())

Unique: Grass Fire Water Bug Normal Poison Electric Ground Fairy Fighting Psychic Rock Ghost Ice Dragon Dark Steel Flying
Counts:  18


How many unique secondary types (Type 2) exist?

In [101]:
# put your answer here
print("Unique:", df['Type 2'].dropna().unique())
print("Counts: ", df['Type 2'].nunique())

Unique: ['Poison' 'Flying' 'Dragon' 'Ground' 'Fairy' 'Grass' 'Fighting' 'Psychic'
 'Steel' 'Ice' 'Rock' 'Dark' 'Water' 'Electric' 'Fire' 'Ghost' 'Bug'
 'Normal']
Counts:  18



Which primary types have the most dual-type combinations?

In [73]:
# put your answer here
df.groupby('Type 1')['Type 2'].count().sort_values(ascending=False).head(1)

Unnamed: 0_level_0,Type 2
Type 1,Unnamed: 1_level_1
Water,53
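
Note that `count()` tallies how many Pokémon of each primary type have any secondary type. If the question is read as asking for the number of distinct Type 1 + Type 2 pairings, `nunique()` is the closer match. A sketch of the difference, assuming a hypothetical frame `toy` with made-up rows:

```python
import pandas as pd

# Hypothetical, made-up rows chosen so that the two measures disagree
toy = pd.DataFrame({
    "Type 1": ["Water", "Water", "Water", "Grass", "Grass"],
    "Type 2": ["Ground", "Ground", None, "Poison", "Dragon"],
})

# Number of dual-type Pokémon per primary type (non-null Type 2 values)
dual_counts = toy.groupby("Type 1")["Type 2"].count()

# Number of distinct secondary types paired with each primary type
distinct_combos = toy.groupby("Type 1")["Type 2"].nunique()

print(dual_counts)
print(distinct_combos)
```

In this toy data both types have two dual-type Pokémon, but Water has only one distinct combination while Grass has two.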


Which type has the highest mean Attack?

In [100]:
# put your answer here
print(df.groupby('Type 1')['Attack'].mean().sort_values(ascending=False).head(1))
print()
print(df.groupby('Type 2')['Attack'].mean().sort_values(ascending=False).head(1))


Type 1
Dragon    112.125
Name: Attack, dtype: float64

Type 2
Fighting    112.846154
Name: Attack, dtype: float64


Which type has the lowest mean Defense?

In [99]:
# put your answer here
print(df.groupby('Type 1')['Defense'].mean().sort_values(ascending=True).head(1))
print()
print(df.groupby('Type 2')['Defense'].mean().sort_values(ascending=True).head(1))

Type 1
Normal    59.846939
Name: Defense, dtype: float64

Type 2
Normal    53.75
Name: Defense, dtype: float64



Which generation has the highest average Speed?

In [76]:
# put your answer here
df.groupby("Generation")["Speed"].mean().sort_values(ascending=False).head(1)

Unnamed: 0_level_0,Speed
Generation,Unnamed: 1_level_1
1,72.584337


## 5. Groupby & Aggregation `(13 pts)`

Compute the average Attack per primary type.

In [98]:
# put your answer here
print(df.groupby("Type 1")["Attack"].mean())
print()
print(df.groupby("Type 2")["Attack"].mean())


Type 1
Bug          70.971014
Dark         88.387097
Dragon      112.125000
Electric     69.090909
Fairy        61.529412
Fighting     96.777778
Fire         84.769231
Flying       78.750000
Ghost        73.781250
Grass        73.214286
Ground       95.750000
Ice          72.750000
Normal       73.469388
Poison       74.678571
Psychic      71.456140
Rock         92.863636
Steel        92.703704
Water        74.151786
Name: Attack, dtype: float64

Type 2
Bug          90.000000
Dark        109.800000
Dragon       94.444444
Electric     72.666667
Fairy        61.608696
Fighting    112.846154
Fire         81.250000
Flying       80.288660
Ghost        84.142857
Grass        74.160000
Ground       89.857143
Ice          98.000000
Normal       52.750000
Poison       67.588235
Psychic      74.696970
Rock         84.000000
Steel        92.590909
Water        70.142857
Name: Attack, dtype: float64


Compute the maximum HP per generation.


In [78]:
# put your answer here
df.groupby("Generation")["HP"].max()

Unnamed: 0_level_0,HP
Generation,Unnamed: 1_level_1
1,250
2,255
3,170
4,150
5,165
6,126


Compute the total number of Pokémon per primary type.

In [97]:
# put your answer here
print(df.groupby("Type 1")["Name"].count())
print()
print(df.groupby("Type 2")["Name"].count())


Type 1
Bug          69
Dark         31
Dragon       32
Electric     44
Fairy        17
Fighting     27
Fire         52
Flying        4
Ghost        32
Grass        70
Ground       32
Ice          24
Normal       98
Poison       28
Psychic      57
Rock         44
Steel        27
Water       112
Name: Name, dtype: int64

Type 2
Bug          3
Dark        20
Dragon      18
Electric     6
Fairy       23
Fighting    26
Fire        12
Flying      97
Ghost       14
Grass       25
Ground      35
Ice         14
Normal       4
Poison      34
Psychic     33
Rock        14
Steel       22
Water       14
Name: Name, dtype: int64


For each type, compute:

- mean Attack

- mean Defense

- mean Speed

In [96]:
# put your answer here
print(df.groupby("Type 1")[["Attack", "Defense", "Speed"]].mean())
print()

print(df.groupby("Type 2")[["Attack", "Defense", "Speed"]].mean())

              Attack     Defense       Speed
Type 1                                      
Bug        70.971014   70.724638   61.681159
Dark       88.387097   70.225806   76.161290
Dragon    112.125000   86.375000   83.031250
Electric   69.090909   66.295455   84.500000
Fairy      61.529412   65.705882   48.588235
Fighting   96.777778   65.925926   66.074074
Fire       84.769231   67.769231   74.442308
Flying     78.750000   66.250000  102.500000
Ghost      73.781250   81.187500   64.343750
Grass      73.214286   70.800000   61.928571
Ground     95.750000   84.843750   63.906250
Ice        72.750000   71.416667   63.458333
Normal     73.469388   59.846939   71.551020
Poison     74.678571   68.821429   63.571429
Psychic    71.456140   67.684211   81.491228
Rock       92.863636  100.795455   55.909091
Steel      92.703704  126.370370   55.259259
Water      74.151786   72.946429   65.964286

              Attack     Defense      Speed
Type 2                                     
Bug        

For each generation, compute:

- count

- mean Total

- number of Legendary Pokémon (hint: use sum on Boolean)

In [83]:
# put your answer here
df.groupby('Generation').agg(
    Namecount=('Name', 'count'),
    Totalmean=('Total', 'mean'),
    Legendarycount=('Legendary', 'sum')
)

Unnamed: 0_level_0,Namecount,Totalmean,Legendarycount
Generation,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1,166,426.813253,6
2,106,418.283019,5
3,160,436.225,18
4,121,459.016529,13
5,165,434.987879,15
6,82,436.378049,8


Which type combination (Type 1 + Type 2) has the highest average Attack?

In [86]:
# put your answer here
df.groupby(['Type 1', 'Type 2'])['Attack'].mean().sort_values(ascending=False).head(1)

Unnamed: 0_level_0,Unnamed: 1_level_0,Attack
Type 1,Type 2,Unnamed: 2_level_1
Ground,Fire,180.0


Which generation has the highest proportion of Legendary Pokémon?

In [87]:
# put your answer here
df.groupby('Generation')['Legendary'].mean().sort_values(ascending=False).head(1)

Unnamed: 0_level_0,Legendary
Generation,Unnamed: 1_level_1
3,0.1125



Which primary type has the largest variance in HP?

In [95]:
# put your answer here
print(df.groupby('Type 1')['HP'].var().sort_values(ascending=False).head(1))
print()
print(df.groupby('Type 2')['HP'].var().sort_values(ascending=False).head(1))

Type 1
Normal    1312.861456
Name: HP, dtype: float64

Type 2
Ice    942.923077
Name: HP, dtype: float64


Which primary type has the highest median Speed?

In [89]:
# put your answer here
df.groupby('Type 1')['Speed'].median().sort_values(ascending=False).head(1)

Unnamed: 0_level_0,Speed
Type 1,Unnamed: 1_level_1
Flying,116.0


Group Pokémon by whether they are Legendary or not. Compare:

- mean Total

- mean Attack

- mean Defense

- mean Speed

In [None]:
# put your answer here
df.groupby('Legendary')[['Total', 'Attack', 'Defense', 'Speed']].mean()
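
The cell above has no recorded output. As a sketch of the shape of the result, assuming a hypothetical four-row frame `toy` copied from rows shown in earlier outputs (so the numbers are not the dataset-wide means):

```python
import pandas as pd

# Hypothetical sample: four rows copied from outputs shown earlier
toy = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Articuno", "Zapdos"],
    "Total": [318, 405, 580, 580],
    "Attack": [49, 62, 85, 90],
    "Defense": [49, 63, 100, 85],
    "Speed": [45, 60, 85, 100],
    "Legendary": [False, False, True, True],
})

# One row per group (False / True), one column per compared stat
comparison = toy.groupby("Legendary")[["Total", "Attack", "Defense", "Speed"]].mean()
print(comparison)
```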

Show the top 5 strongest types by mean Total.

In [90]:
# put your answer here
df.groupby('Type 1')['Total'].mean().sort_values(ascending=False).head(5)

Unnamed: 0_level_0,Total
Type 1,Unnamed: 1_level_1
Dragon,550.53125
Steel,487.703704
Flying,485.0
Psychic,475.947368
Fire,458.076923


Rank generations by their average Attack.

In [91]:
# put your answer here
df.groupby('Generation')['Attack'].mean().sort_values(ascending=False)

Unnamed: 0_level_0,Attack
Generation,Unnamed: 1_level_1
4,82.867769
5,82.066667
3,81.625
1,76.638554
6,75.804878
2,72.028302


Show the top 10 fastest Pokémon using `nlargest(10, "Speed")`.

In [92]:
# put your answer here
df.nlargest(10, "Speed")


Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
431,386,DeoxysSpeed Forme,Psychic,,600,50,95,90,95,90,180,3,True
315,291,Ninjask,Bug,Flying,456,61,90,45,50,50,160,3,False
71,65,AlakazamMega Alakazam,Psychic,,590,55,50,65,175,95,150,1,False
154,142,AerodactylMega Aerodactyl,Rock,Flying,615,80,135,85,70,95,150,1,False
428,386,DeoxysNormal Forme,Psychic,,600,50,150,50,150,50,150,3,True
429,386,DeoxysAttack Forme,Psychic,,600,50,180,20,180,20,150,3,True
19,15,BeedrillMega Beedrill,Bug,Poison,495,65,150,40,15,80,145,1,False
275,254,SceptileMega Sceptile,Grass,Dragon,630,70,110,75,145,85,145,3,False
678,617,Accelgor,Bug,,495,80,70,40,100,60,145,5,False
109,101,Electrode,Electric,,480,60,50,70,80,80,140,1,False
