## [mlcourse.ai](https://mlcourse.ai) – Open Machine Learning Course 
Author: Arina Lopukhova (@erynn). Edited by [Yury Kashnitskiy](https://yorko.github.io) (@yorko) and Vadim Shestopalov (@vchulski). This material is subject to the terms and conditions of the [Creative Commons CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license. Free use is permitted for any non-commercial purpose.

The dataset has the following features:

- __ID__ - Unique number for each athlete
- __Name__ - Athlete's name
- __Sex__ - M or F
- __Age__ - Integer
- __Height__ - In centimeters
- __Weight__ - In kilograms
- __Team__ - Team name
- __NOC__ - National Olympic Committee 3-letter code
- __Games__ - Year and season
- __Year__ - Integer
- __Season__ - Summer or Winter
- __City__ - Host city
- __Sport__ - Sport
- __Event__ - Event
- __Medal__ - Gold, Silver, Bronze, or NA

In [74]:
import pandas as pd
from functools import reduce
from math import isnan, sqrt

In [3]:
# Change the path to the dataset file if needed. 
PATH = 'athlete_events.csv'

In [4]:
data = pd.read_csv(PATH)
data.head()

| | ID | Name | Sex | Age | Height | Weight | Team | NOC | Games | Year | Season | City | Sport | Event | Medal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | A Dijiang | M | 24.0 | 180.0 | 80.0 | China | CHN | 1992 Summer | 1992 | Summer | Barcelona | Basketball | Basketball Men's Basketball | NaN |
| 1 | 2 | A Lamusi | M | 23.0 | 170.0 | 60.0 | China | CHN | 2012 Summer | 2012 | Summer | London | Judo | Judo Men's Extra-Lightweight | NaN |
| 2 | 3 | Gunnar Nielsen Aaby | M | 24.0 | NaN | NaN | Denmark | DEN | 1920 Summer | 1920 | Summer | Antwerpen | Football | Football Men's Football | NaN |
| 3 | 4 | Edgar Lindenau Aabye | M | 34.0 | NaN | NaN | Denmark/Sweden | DEN | 1900 Summer | 1900 | Summer | Paris | Tug-Of-War | Tug-Of-War Men's Tug-Of-War | Gold |
| 4 | 5 | Christine Jacoba Aaftink | F | 21.0 | 185.0 | 82.0 | Netherlands | NED | 1988 Winter | 1988 | Winter | Calgary | Speed Skating | Speed Skating Women's 500 metres | NaN |


__1. How old were the youngest male and female participants of the 1992 Olympics?__


- 16 and 15
- 14 and 13 
- 13 and 11
- 11 and 12 ✔

In [39]:
data_1992 = data[data['Year'] == 1992]
# Take the minimum of Age only: min() over all columns would combine values from different athletes in one row.
data_1992.groupby('Sex')['Age'].min().sort_values()

| Sex | Age |
|---|---|
| M | 11.0 |
| F | 12.0 |


__2. What was the percentage of male basketball players among all the male participants of the 2012 Olympics? Round the answer to the first decimal.__

*Hint:* drop duplicate athletes where necessary to count each athlete just once. This applies to other questions too. 

- 0.2
- 1.5 
- 2.5 ✔
- 7.7

In [61]:
males = data[(data['Year'] == 2012) & (data['Sex'] == 'M')]
basketball = males[(males['Sport'] == 'Basketball')]
percentage = round(basketball['ID'].nunique() * 100 / males['ID'].nunique(), 1)
print(percentage)

2.5
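
The hint about duplicates matters because an athlete who enters several events appears in several rows. A minimal sketch on a made-up table (the `toy` frame and its values are hypothetical, not taken from the dataset) shows how a share computed over rows can differ from a share computed over unique athlete IDs:

```python
import pandas as pd

# Hypothetical toy data: athlete 1 appears twice because of two events.
toy = pd.DataFrame({
    'ID': [1, 1, 2, 3],
    'Sex': ['M', 'M', 'M', 'M'],
    'Sport': ['Basketball', 'Basketball', 'Judo', 'Judo'],
})

# Share of rows that belong to basketball (counts athlete 1 twice).
rows_share = (toy['Sport'] == 'Basketball').mean() * 100

# Share of unique athletes who played basketball (counts each ID once).
basketball_ids = toy.loc[toy['Sport'] == 'Basketball', 'ID'].nunique()
athletes_share = basketball_ids * 100 / toy['ID'].nunique()

print(round(rows_share, 1))      # share of rows
print(round(athletes_share, 1))  # share of athletes
```

Using `nunique()` on `ID`, as in the cell above, is one way to count each athlete once; `drop_duplicates('ID')` before counting rows should give the same number.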


__3. What are the mean and standard deviation of height for female tennis players who participated in the 2000 Olympics? Round the answer to the first decimal.__

- 171.8 and 6.5 ✔
- 179.4 and 10
- 180.7 and 6.7
- 182.4 and 9.1 

In [76]:
# Female tennis players of the 2000 Olympics, one row per athlete
females = data[(data['Year'] == 2000) & (data['Sex'] == 'F') & (data['Sport'] == 'Tennis')].drop_duplicates('ID')
mean_height = females['Height'].mean()
print(round(mean_height, 1))

values = list(females['Height'])
count = len(values)  # note: this length also counts missing heights, if any
# Squared deviations from the mean, skipping missing heights
values = map(lambda x: (x - mean_height) ** 2, filter(lambda x: not isnan(x), values))
deviation = sqrt(reduce(lambda x, y: x + y, values) / count)
print(round(deviation, 1))

171.8
6.5
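
The manual computation can be cross-checked against pandas' built-in `Series.std`. The sketch below uses made-up heights (the `heights` series is hypothetical) and compares the population formula with `std(ddof=0)` and with the pandas default `ddof=1`. pandas skips missing values by default, so the divisor is based on the number of non-missing entries.

```python
import numpy as np
import pandas as pd

# Hypothetical heights in centimeters, with one missing value.
heights = pd.Series([170.0, 180.0, np.nan, 175.0])

valid = heights.dropna()
mean_height = valid.mean()

# Population formula: divide by the number of non-missing values.
manual_std = np.sqrt(((valid - mean_height) ** 2).sum() / len(valid))

population_std = heights.std(ddof=0)  # NaN is skipped by default
sample_std = heights.std()            # default ddof=1 divides by n - 1

print(round(mean_height, 1), round(manual_std, 1), round(sample_std, 1))
```

On small groups the two conventions can differ in the first decimal, so it is worth checking which one a question expects.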


__4. Find the heaviest athlete among 2006 Olympics participants. What sport did he or she do?__


- Judo
- Bobsleigh 
- Skeleton ✔
- Boxing

In [104]:
data_2006 = data[data['Year'] == 2006].drop_duplicates('ID')
# Maximum weight per sport; max() over all columns would combine values from different athletes in one row.
data_2006.groupby('Sport')['Weight'].max().nlargest(5)

| Sport | Weight |
|---|---|
| Skeleton | 127.0 |
| Bobsleigh | 116.0 |
| Ice Hockey | 116.0 |
| Curling | 105.0 |
| Alpine Skiing | 100.0 |


__5. How many times did John Aalberg participate in the Olympics held in different years?__


- 0
- 1 
- 2 ✔
- 3 

In [107]:
print(data[data['Name']=='John Aalberg']['Year'].nunique())

2


__6. How many gold medals in tennis did the Switzerland team win at the 2008 Olympics?__


- 0
- 1 
- 2
- 3 

In [111]:
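# Restrict to tennis: without the Sport filter, gold medalists from other sports are counted too.
print(data[(data['Year'] == 2008) & (data['Team'] == 'Switzerland') & (data['Sport'] == 'Tennis') & (data['Medal'] == 'Gold')]['Name'].nunique())

2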
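
Counting unique names counts medalists, not medals: in a doubles or team event one gold is shared by several athletes. A small sketch with hypothetical rows (the players and the `toy` frame are invented for illustration) contrasts the two ways of counting:

```python
import pandas as pd

# Hypothetical rows: a doubles pair shares one event, so one gold goes to two people.
toy = pd.DataFrame({
    'Name': ['Player A', 'Player B', 'Player C'],
    'Sport': ['Tennis', 'Tennis', 'Cycling'],
    'Event': ["Tennis Men's Doubles", "Tennis Men's Doubles",
              "Cycling Men's Time Trial"],
    'Medal': ['Gold', 'Gold', 'Gold'],
})

tennis_gold = toy[(toy['Sport'] == 'Tennis') & (toy['Medal'] == 'Gold')]

print(tennis_gold['Name'].nunique())   # number of gold medalists
print(tennis_gold['Event'].nunique())  # number of events won
```

Which count is appropriate depends on how the question defines a medal.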


__7. Is it true that Spain won fewer medals than Italy at the 2016 Olympics? Do not consider NaN values in _Medal_ column.__ 


- Yes ✔
- No

In [118]:
italy = data[(data['Year'] == 2016) & (data['Team'] == 'Italy') & (data['Medal'].notnull())]['Name'].nunique()
spain = data[(data['Year'] == 2016) & (data['Team'] == 'Spain') & (data['Medal'].notnull())]['Name'].nunique()
print(spain < italy)

True


__8. What are the most and least common age groups among the participants of the 2008 Olympics?__


- [45-55] and [25-35) respectively ✔ (most common: [25-35), least common: [45-55])
- [45-55] and [15-25) respectively
- [35-45) and [25-35) respectively
- [45-55] and [35-45) respectively

In [125]:
data_2008 = data[data['Year'] == 2008]

print('[15-25):', data_2008[(data_2008['Age'] >= 15) & (data_2008['Age'] < 25)]['ID'].nunique())
print('[25-35):', data_2008[(data_2008['Age'] >= 25) & (data_2008['Age'] < 35)]['ID'].nunique())
print('[35-45):', data_2008[(data_2008['Age'] >= 35) & (data_2008['Age'] < 45)]['ID'].nunique())
print('[45-55]:', data_2008[(data_2008['Age'] >= 45) & (data_2008['Age'] <= 55)]['ID'].nunique())

[15-25): 4786
[25-35): 5382
[35-45): 630
[45-55]: 78
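
The four filters can also be written as a single binning step with `pd.cut`. This is a sketch on made-up ages (the `ages` series is hypothetical), assuming ages are whole numbers so that the half-open bin [45, 56) covers the same values as [45-55]:

```python
import pandas as pd

# Hypothetical ages, one entry per athlete.
ages = pd.Series([16, 24, 25, 30, 34, 40, 55])

# right=False makes every bin left-closed: [15, 25), [25, 35), [35, 45), [45, 56).
bins = [15, 25, 35, 45, 56]
labels = ['[15-25)', '[25-35)', '[35-45)', '[45-55]']
groups = pd.cut(ages, bins=bins, right=False, labels=labels)

# Count athletes per age group, keeping the bins in their natural order.
counts = groups.value_counts().sort_index()
print(counts)
```

On the real data one would first drop duplicate IDs, since the cell above counts unique athletes rather than rows.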


__9. Is it true that there were Summer Olympics held in Atlanta? Is it true that there were Winter Olympics held in Squaw Valley?__


- Yes, Yes ✔
- Yes, No
- No, Yes 
- No, No 

In [136]:
print(not data[(data['Season'] == 'Summer') & (data['City'] == 'Atlanta')].empty)
print(not data[(data['Season'] == 'Winter') & (data['City'] == 'Squaw Valley')].empty)

True
True


__10. What is the absolute difference between the number of unique sports at the 1986 Olympics and 2002 Olympics?__


- 3 
- 10
- 15 ✔
- 27 

In [138]:
sports_1986 = data.loc[data['Year'] == 1986, 'Sport'].nunique()
sports_2002 = data.loc[data['Year'] == 2002, 'Sport'].nunique()
# The question asks for the absolute difference, so the order of subtraction must not matter.
print(abs(sports_2002 - sports_1986))

15
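
No Olympic Games were held in 1986, so the filter for that year presumably selects no rows and `nunique()` returns 0. The sketch below reproduces that behavior on a hypothetical three-row table; the sports listed are placeholders, not the actual 2002 program:

```python
import pandas as pd

# Hypothetical rows: only the year 2002 is present.
toy = pd.DataFrame({
    'Year': [2002, 2002, 2002],
    'Sport': ['Curling', 'Luge', 'Curling'],
})

# An empty selection has zero unique values.
sports_1986 = toy.loc[toy['Year'] == 1986, 'Sport'].nunique()
sports_2002 = toy.loc[toy['Year'] == 2002, 'Sport'].nunique()

print(sports_1986, sports_2002, abs(sports_2002 - sports_1986))
```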


That's it! Now go and do 30 push-ups! :)