# Summer Olympics Dataset


The dataset consists of the following columns:

- Year - Year of the Olympic event.
- City - Host city of the event.
- Sport - The sport category.
- Discipline - Specific discipline within the sport.
- Athlete - Name of the athlete.
- Country - Country the athlete represents.
- Gender - Gender of the athlete.
- Event - Specific event within the discipline.
- Medal - Medal awarded (Gold, Silver, Bronze).

### Questions
- Display the first five rows of the dataset.
- Display the data type of each column.
- Show all unique values in the "Sport" column.
- Filter the data to show only the rows where the athlete is from Indonesia.
- How many medals did athletes from Indonesia win?
- List all disciplines that included events where "Japan" won a Gold medal.
- For each country, list the year they first won a Gold medal.
- Find the top 5 athletes who won the most medals.
- Identify the athlete who has won medals in the most unique events.
- How many medals did athletes from each country win in "Badminton"?
- For each discipline, determine which athlete has the highest total number of medals won.

- Display the first five rows of the dataset.

In [3]:
import pandas as pd
df = pd.read_csv('summer.csv')
df.head()

,Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal
0,1896,Athens,Aquatics,Swimming,"HAJOS, Alfred",HUN,Men,100M Freestyle,Gold
1,1896,Athens,Aquatics,Swimming,"HERSCHMANN, Otto",AUT,Men,100M Freestyle,Silver
2,1896,Athens,Aquatics,Swimming,"DRIVAS, Dimitrios",GRE,Men,100M Freestyle For Sailors,Bronze
3,1896,Athens,Aquatics,Swimming,"MALOKINIS, Ioannis",GRE,Men,100M Freestyle For Sailors,Gold
4,1896,Athens,Aquatics,Swimming,"CHASAPIS, Spiridon",GRE,Men,100M Freestyle For Sailors,Silver


- Display the data type of each column.

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31165 entries, 0 to 31164
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Year        31165 non-null  int64 
 1   City        31165 non-null  object
 2   Sport       31165 non-null  object
 3   Discipline  31165 non-null  object
 4   Athlete     31165 non-null  object
 5   Country     31161 non-null  object
 6   Gender      31165 non-null  object
 7   Event       31165 non-null  object
 8   Medal       31165 non-null  object
dtypes: int64(1), object(8)
memory usage: 2.1+ MB
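
The `info()` summary above reports 31161 non-null values for `Country` against 31165 rows, so a few rows have no country. A minimal sketch of listing the dtypes on their own and locating such rows follows; the sample frame is hypothetical stand-in data, not rows from `summer.csv`.

```python
import pandas as pd

# Hypothetical sample rows standing in for summer.csv (not real data).
df = pd.DataFrame({
    "Year": [1896, 1896, 1900],
    "Athlete": ["A", "B", "C"],
    "Country": ["HUN", None, "GRE"],
    "Medal": ["Gold", "Silver", "Bronze"],
})

# dtypes returns one dtype per column as a Series.
column_types = df.dtypes

# Number of missing values in each column.
missing_per_column = df.isna().sum()

# Rows whose Country is missing, for manual inspection.
rows_missing_country = df[df["Country"].isna()]

print(column_types)
print(missing_per_column)
print(rows_missing_country)
```

Filters such as `df['Country'] == 'INA'` silently skip rows with a missing country, so it can be worth checking them before counting.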


- Show all unique values in the "Sport" column.

In [7]:
df['Sport'].unique()

array(['Aquatics', 'Athletics', 'Cycling', 'Fencing', 'Gymnastics',
       'Shooting', 'Tennis', 'Weightlifting', 'Wrestling', 'Archery',
       'Basque Pelota', 'Cricket', 'Croquet', 'Equestrian', 'Football',
       'Golf', 'Polo', 'Rowing', 'Rugby', 'Sailing', 'Tug of War',
       'Boxing', 'Lacrosse', 'Roque', 'Hockey', 'Jeu de paume', 'Rackets',
       'Skating', 'Water Motorsports', 'Modern Pentathlon', 'Ice Hockey',
       'Basketball', 'Canoe / Kayak', 'Handball', 'Judo', 'Volleyball',
       'Table Tennis', 'Badminton', 'Baseball', 'Softball', 'Taekwondo',
       'Triathlon', 'Canoe'], dtype=object)

- Filter the data to show only the rows where the athlete is from Indonesia.

In [10]:
df[df['Country']=='INA']

,Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal
18274,1988,Seoul,Archery,Archery,"HANDAYANI, Lilies",INA,Women,Teams Fita Round,Silver
18275,1988,Seoul,Archery,Archery,"SAIMAN, Nurfitriyana",INA,Women,Teams Fita Round,Silver
18276,1988,Seoul,Archery,Archery,"WARDHANI, Kusuma",INA,Women,Teams Fita Round,Silver
20033,1992,Barcelona,Badminton,Badminton,"GUNAWAN, Rudy",INA,Men,Doubles,Silver
20034,1992,Barcelona,Badminton,Badminton,"HARTONO, Eddy",INA,Men,Doubles,Silver
20044,1992,Barcelona,Badminton,Badminton,"SUSANTO, Hermawan",INA,Men,Singles,Bronze
20045,1992,Barcelona,Badminton,Badminton,"BUDI KUSUMA, Alan",INA,Men,Singles,Gold
20046,1992,Barcelona,Badminton,Badminton,"WIRANATA, Ardy Bernardus",INA,Men,Singles,Silver
20049,1992,Barcelona,Badminton,Badminton,"SUSANTI, Susi",INA,Women,Singles,Gold
21768,1996,Atlanta,Badminton,Badminton,"IRIANTO, Antonius",INA,Men,Doubles,Bronze


- How many medals did athletes from Indonesia win?

In [11]:
df[df['Country']=='INA']['Medal'].count()

38
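
The count above is a row count, and the dataset holds one row per medal-winning athlete, so each member of a team is counted separately. The sketch below, on hypothetical sample rows, shows the row count, a breakdown by medal type, and an approximate per-event count. The choice of `event_keys` is an assumption, and it would also merge two same-colour medals won by one country in a single event.

```python
import pandas as pd

# Hypothetical rows (not real data): a three-person team silver and one individual gold.
df = pd.DataFrame({
    "Year": [1988, 1988, 1988, 1992],
    "Discipline": ["Archery", "Archery", "Archery", "Badminton"],
    "Athlete": ["A", "B", "C", "D"],
    "Country": ["INA", "INA", "INA", "INA"],
    "Gender": ["Women", "Women", "Women", "Women"],
    "Event": ["Teams", "Teams", "Teams", "Singles"],
    "Medal": ["Silver", "Silver", "Silver", "Gold"],
})

ina = df[df["Country"] == "INA"]

# One row per medal-winning athlete, so team members are counted separately.
medal_rows = len(ina)

# Breakdown of those rows by medal type.
by_type = ina["Medal"].value_counts()

# Approximate per-event count: collapse rows sharing the same event and medal.
event_keys = ["Year", "Discipline", "Event", "Gender", "Medal"]
medal_events = len(ina.drop_duplicates(subset=event_keys))

print(medal_rows, medal_events)
print(by_type)
```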

- List all disciplines that included events where "Japan" won a Gold medal.

In [14]:
# Note: this filters on Country only, so it lists disciplines for any JPN medal, not just Gold.
df[df['Country']=='JPN']['Discipline'].unique()

array(['Tennis', 'Wrestling Free.', 'Swimming', 'Athletics', 'Jumping',
       'Hockey', 'Artistic G.', 'Boxing', 'Shooting', 'Weightlifting',
       'Judo', 'Volleyball', 'Wrestling Gre-R', 'Football', 'Archery',
       'Synchronized S.', 'Cycling Track', 'Baseball', 'Sailing',
       'Softball', 'Taekwondo', 'Fencing', 'Badminton',
       'Gymnastics Artistic', 'Table Tennis', 'Wrestling Freestyle'],
      dtype=object)
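
To restrict the discipline list to Gold medals, the country and medal conditions can be combined in a single boolean mask. A minimal sketch on hypothetical sample rows (not taken from `summer.csv`):

```python
import pandas as pd

# Hypothetical sample rows (not real data).
df = pd.DataFrame({
    "Country": ["JPN", "JPN", "JPN", "USA"],
    "Discipline": ["Judo", "Tennis", "Judo", "Swimming"],
    "Medal": ["Gold", "Silver", "Bronze", "Gold"],
})

# Both conditions must hold; each comparison needs its own parentheses around `&`.
is_japan_gold = (df["Country"] == "JPN") & (df["Medal"] == "Gold")

# Select the Discipline column for matching rows and drop repeats.
japan_gold_disciplines = df.loc[is_japan_gold, "Discipline"].unique()

print(japan_gold_disciplines)
```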

- For each country, list the year they first won a Gold medal.

In [16]:
df[df['Medal']=='Gold'].groupby('Country')[['Year']].min()

Country,Year
ALG,1992
ANZ,1908
ARG,1924
ARM,1996
AUS,1896
...,...
UZB,2000
VEN,1968
YUG,1924
ZIM,1980


- Find the top 5 athletes who won the most medals.

In [49]:
df.groupby('Athlete')['Medal'].count().sort_values(ascending=False).reset_index().head()

,Athlete,Medal
0,"PHELPS, Michael",22
1,"LATYNINA, Larisa",18
2,"ANDRIANOV, Nikolay",15
3,"MANGIAROTTI, Edoardo",13
4,"ONO, Takashi",13
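
Because `head()` cuts the list at exactly five rows, athletes tied with the last place are dropped without notice. One possible alternative, sketched on hypothetical data, is `Series.nlargest` with `keep='all'`, which retains every athlete tied at the cutoff. It uses `value_counts()` on `Athlete`, which matches the grouped count only when `Medal` has no missing values.

```python
import pandas as pd

# Hypothetical sample rows (not real data): A has 3 medals, B and C have 2, D has 1.
df = pd.DataFrame({
    "Athlete": ["A", "A", "A", "B", "B", "C", "C", "D"],
    "Medal": ["Gold", "Gold", "Silver", "Gold", "Bronze", "Silver", "Bronze", "Gold"],
})

# Number of rows (medals) per athlete.
medal_counts = df["Athlete"].value_counts()

# Top 2 by count; keep="all" also returns anyone tied with the last retained value.
top_with_ties = medal_counts.nlargest(2, keep="all")

print(top_with_ties)
```

Here a top-2 request returns three athletes, since B and C are tied for second place.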


- Exploratory check: list the unique events in the "Swimming" discipline.

In [53]:
df[df['Discipline']=='Swimming']['Event'].unique()

array(['100M Freestyle', '100M Freestyle For Sailors', '1200M Freestyle',
       '400M Freestyle', '1500M Freestyle', '200M Backstroke',
       '200M Freestyle', '200M Obstacle Event', '200M Team Swimming',
       '4000M Freestyle', 'Underwater Swimming', '100M Backstroke',
       '400M Breaststroke', '4X50Y Freestyle Relay',
       '50Y Freestyle (45.72M)', '880Y Freestyle (804.66M)',
       '200M Breaststroke', '4X200M Freestyle Relay',
       '4X100M Freestyle Relay', '100M Butterfly', '200M Butterfly',
       '4X100M Medley Relay', '400M Individual Medley',
       '100M Breaststroke', '200M Individual Medley', '800M Freestyle',
       '50M Freestyle', 'Marathon 10KM', '200M Medley', '400M Medley',
       '4X100M Freestyle', '4X100M Medley', '4X200M Freestyle'],
      dtype=object)

- Identify the athlete who has won medals in the most unique events.

In [None]:
df.groupby('Athlete')['Event'].nunique().sort_values(ascending=False).head(3)

Athlete
PHELPS, Michael          12
OSBURN, Carl Townsend    11
VAN INNIS, Hubert         9
Name: Event, dtype: int64
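
Event names can repeat across disciplines (for example, "Singles" appears under Badminton in the rows shown earlier and may appear elsewhere), so counting unique `Event` strings alone can undercount. A hedged sketch, on hypothetical rows, that counts unique (discipline, event) pairs per athlete instead:

```python
import pandas as pd

# Hypothetical sample rows (not real data).
df = pd.DataFrame({
    "Athlete": ["A", "A", "A", "B", "B"],
    "Discipline": ["Tennis", "Badminton", "Tennis", "Swimming", "Swimming"],
    "Event": ["Singles", "Singles", "Singles", "100M Freestyle", "200M Freestyle"],
})

# Counting by event name alone treats Tennis Singles and Badminton Singles as one event.
by_event_name = df.groupby("Athlete")["Event"].nunique()

# Keep one row per (athlete, discipline, event), then count rows per athlete.
unique_pairs = df.drop_duplicates(subset=["Athlete", "Discipline", "Event"])
by_discipline_event = unique_pairs.groupby("Athlete").size()

print(by_event_name)
print(by_discipline_event)
```

Whether gender or year should also be part of the key depends on what counts as a distinct event for the question at hand.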

- How many medals did athletes from each country win in "Badminton"?

In [59]:
df[df['Sport']=='Badminton'].groupby('Country')['Medal'].count().sort_values(ascending=False).reset_index()

,Country,Medal
0,CHN,59
1,KOR,33
2,INA,26
3,DEN,9
4,MAS,7
5,GBR,4
6,JPN,2
7,RUS,2
8,IND,1
9,NED,1


- For each discipline, determine which athlete has the highest total number of medals won.

In [61]:
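# Count medals for every (discipline, athlete) pair.
top_athlete_by_discipline = df.groupby(['Discipline','Athlete'])['Medal'].count().reset_index()

In [64]:
# Keep the row with the highest count in each discipline (idxmax returns only the first row on ties).
top_athlete_by_discipline = top_athlete_by_discipline.loc[top_athlete_by_discipline.groupby('Discipline')['Medal'].idxmax()]

In [66]:
# Sort by medal count and hide disciplines whose leader has a single medal.
top_athlete_by_discipline.sort_values(by='Medal',ascending=False).query('Medal > 1')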

,Discipline,Athlete,Medal
18897,Swimming,"PHELPS, Michael",22
792,Artistic G.,"LATYNINA, Larisa",18
8887,Fencing,"MANGIAROTTI, Edoardo",13
6348,Canoe / Kayak F,"FISCHER, Birgit",12
3190,Athletics,"NURMI, Paavo",12
17564,Shooting,"OSBURN, Carl Townsend",11
170,Archery,"VAN INNIS, Hubert",9
8140,Dressage,"VAN GRUNSVEN, Anky",9
14882,Rowing,"LIPA, Elisabeta",8
7943,Diving,"SAUTIN, Dmitry",8
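
`idxmax` returns a single row per group, so when several athletes share the top count in a discipline only the first is reported. A minimal sketch of a variant that keeps all tied leaders, using `transform('max')` on hypothetical sample rows:

```python
import pandas as pd

# Hypothetical sample rows (not real data): A and B tie in Swimming, C leads Judo.
df = pd.DataFrame({
    "Discipline": ["Swimming", "Swimming", "Swimming", "Swimming", "Judo"],
    "Athlete": ["A", "A", "B", "B", "C"],
    "Medal": ["Gold", "Silver", "Gold", "Bronze", "Gold"],
})

# Medal count per (discipline, athlete) pair.
counts = df.groupby(["Discipline", "Athlete"])["Medal"].count().reset_index()

# Broadcast each discipline's maximum count back onto every row of that discipline.
best_in_discipline = counts.groupby("Discipline")["Medal"].transform("max")

# Keep every athlete whose count equals the discipline maximum, ties included.
leaders = counts[counts["Medal"] == best_in_discipline]

print(leaders.sort_values(by="Medal", ascending=False))
```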
