## [mlcourse.ai](https://mlcourse.ai) – Open Machine Learning Course 
Author: Arina Lopukhova (@erynn). Edited by [Yury Kashnitskiy](https://yorko.github.io) (@yorko) and Vadim Shestopalov (@vchulski). This material is subject to the terms and conditions of the [Creative Commons CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license. Free use is permitted for any non-commercial purpose.

The dataset has the following features:

- __ID__ - Unique number for each athlete
- __Name__ - Athlete's name
- __Sex__ - M or F
- __Age__ - Integer
- __Height__ - In centimeters
- __Weight__ - In kilograms
- __Team__ - Team name
- __NOC__ - National Olympic Committee 3-letter code
- __Games__ - Year and season
- __Year__ - Integer
- __Season__ - Summer or Winter
- __City__ - Host city
- __Sport__ - Sport
- __Event__ - Event
- __Medal__ - Gold, Silver, Bronze, or NA

In [1]:
import pandas as pd

In [2]:
# Change the path to the dataset file if needed. 
PATH = 'athlete_events.csv'

In [3]:
data = pd.read_csv(PATH)
data.head(5)

| | ID | Name | Sex | Age | Height | Weight | Team | NOC | Games | Year | Season | City | Sport | Event | Medal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | A Dijiang | M | 24.0 | 180.0 | 80.0 | China | CHN | 1992 Summer | 1992 | Summer | Barcelona | Basketball | Basketball Men's Basketball | NaN |
| 1 | 2 | A Lamusi | M | 23.0 | 170.0 | 60.0 | China | CHN | 2012 Summer | 2012 | Summer | London | Judo | Judo Men's Extra-Lightweight | NaN |
| 2 | 3 | Gunnar Nielsen Aaby | M | 24.0 | NaN | NaN | Denmark | DEN | 1920 Summer | 1920 | Summer | Antwerpen | Football | Football Men's Football | NaN |
| 3 | 4 | Edgar Lindenau Aabye | M | 34.0 | NaN | NaN | Denmark/Sweden | DEN | 1900 Summer | 1900 | Summer | Paris | Tug-Of-War | Tug-Of-War Men's Tug-Of-War | Gold |
| 4 | 5 | Christine Jacoba Aaftink | F | 21.0 | 185.0 | 82.0 | Netherlands | NED | 1988 Winter | 1988 | Winter | Calgary | Speed Skating | Speed Skating Women's 500 metres | NaN |


__1. How old were the youngest male and female participants of the 1992 Olympics?__


- 16 and 15
- 14 and 13 
- 13 and 11
- 11 and 12 ✅

In [14]:
data_of_1992 = data[data['Year'] == 1992]
# Select Age before aggregating so that min() is not applied to the text columns.
data_of_1992.groupby('Sex')['Age'].min()

Sex
F    12.0
M    11.0
Name: Age, dtype: float64

__2. What was the percentage of male basketball players among all the male participants of the 2012 Olympics? Round the answer to the first decimal.__

*Hint:* drop duplicate athletes where necessary to count each athlete just once. This applies to other questions too. 

- 0.2
- 1.5 
- 2.5 ✅
- 7.7

In [14]:
# Keep each male participant of the 2012 Games once.
mp_of_2012 = data[(data['Year'] == 2012) & (data['Sex'] == 'M')].drop_duplicates('ID')
# mp_of_2012 is already deduplicated, so a plain filter is enough here.
bp_of_2012 = mp_of_2012[mp_of_2012['Sport'] == 'Basketball']

round(len(bp_of_2012) / len(mp_of_2012) * 100, 1)

2.5
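
The hint about dropping duplicates matters because the table has one row per athlete per event, so counting rows counts event entries rather than people. Here is a minimal sketch of the difference on a made-up frame (the `toy` data and its values are hypothetical, not taken from `athlete_events.csv`):

```python
import pandas as pd

# Hypothetical toy data: athlete 1 entered three swimming events.
toy = pd.DataFrame({
    'ID':    [1, 1, 1, 2, 3],
    'Sport': ['Swimming', 'Swimming', 'Swimming', 'Basketball', 'Judo'],
})

# Share of basketball among rows (event entries).
share_rows = (toy['Sport'] == 'Basketball').mean() * 100

# Share of basketball among athletes (each ID counted once).
athletes = toy.drop_duplicates('ID')
share_athletes = (athletes['Sport'] == 'Basketball').mean() * 100

print(round(share_rows, 1), round(share_athletes, 1))
```

In this toy case the two shares differ noticeably, which is why the solution deduplicates by `ID` before counting.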

__3. What are the mean and standard deviation of height for female tennis players who participated in the 2000 Olympics? Round the answer to the first decimal.__

- 171.8 and 6.5 ✅
- 179.4 and 10
- 180.7 and 6.7
- 182.4 and 9.1 

In [48]:
from math import isnan

def calculate_deviation(values, mean):
    # Drop missing heights first so the divisor matches the values being summed.
    # This is the population formula (divides by n).
    valid = [x for x in values if not isnan(x)]
    return (sum((x - mean) ** 2 for x in valid) / len(valid)) ** 0.5


tp_of_2000 = data[(data['Year'] == 2000) & (data['Sex'] == 'F') & (data['Sport'] == 'Tennis')].drop_duplicates('ID')
mean = tp_of_2000['Height'].mean()
deviation = calculate_deviation(list(tp_of_2000['Height']), mean)

print(round(mean, 1))
print(round(deviation, 1))

171.8
6.5
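
pandas can compute the same statistics directly. The sketch below uses a few made-up heights (hypothetical values, not from the dataset) to show how missing values are skipped and how the sample and population formulas differ:

```python
import math
import pandas as pd

# Hypothetical heights in cm, with one missing value.
heights = pd.Series([170.0, 180.0, None, 175.0])

mean = heights.mean()                 # NaN is skipped: (170 + 180 + 175) / 3
sample_std = heights.std()            # ddof=1 by default: divides by n - 1
population_std = heights.std(ddof=0)  # divides by n

# Manual population formula over the non-missing values only.
valid = heights.dropna()
manual = math.sqrt(((valid - mean) ** 2).sum() / len(valid))

print(mean, sample_std, round(population_std, 4), round(manual, 4))
```

With only a handful of values the two formulas differ visibly. On a larger group the gap shrinks, so both may round to the same first decimal.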


__4. Find the heaviest athlete among 2006 Olympics participants. What sport did he or she do?__


- Judo
- Bobsleigh 
- Skeleton ✅
- Boxing

In [51]:
athletes_of_2006 = data[data['Year'] == 2006].drop_duplicates('ID')
# Search within the 2006 subset only; the maximum over the whole dataset belongs to other Games.
athletes_of_2006.loc[athletes_of_2006['Weight'].idxmax(), 'Sport']

'Skeleton'
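
A point worth illustrating for this question is that the maximum has to be taken inside the filtered subset, not over the whole table. Here is a minimal sketch on made-up rows (the `toy` frame, its sports and its weights are hypothetical):

```python
import pandas as pd

# Hypothetical toy data, not taken from athlete_events.csv.
toy = pd.DataFrame({
    'ID':     [1, 2, 3, 4],
    'Year':   [2006, 2006, 2008, 2006],
    'Sport':  ['Luge', 'Curling', 'Judo', 'Biathlon'],
    'Weight': [90.0, 95.0, 150.0, None],
})

# Maximum over all years: picks a row from 2008.
overall_sport = toy.loc[toy['Weight'].idxmax(), 'Sport']

# Maximum within 2006 only; idxmax skips the missing weight.
subset = toy[toy['Year'] == 2006]
heaviest_2006_sport = subset.loc[subset['Weight'].idxmax(), 'Sport']

print(overall_sport, heaviest_2006_sport)
```

The two lookups give different sports here, which is the kind of mismatch to watch for when a filtered frame is created but the unfiltered one is queried.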

__5. How many times did John Aalberg participate in the Olympics held in different years?__


- 0
- 1 
- 2 ✅
- 3 

In [53]:
len(data[data['Name']=='John Aalberg'].drop_duplicates('Year'))

2
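
Counting distinct years can also be written with `nunique()`, which avoids building a deduplicated frame. A small sketch with placeholder names (`Athlete A` and `Athlete B` are hypothetical):

```python
import pandas as pd

# Hypothetical toy data: Athlete A entered two events in 1992 and one in 1994.
toy = pd.DataFrame({
    'Name': ['Athlete A', 'Athlete A', 'Athlete A', 'Athlete B'],
    'Year': [1992, 1992, 1994, 1992],
})

# Number of distinct Games years for one athlete.
games_count = toy.loc[toy['Name'] == 'Athlete A', 'Year'].nunique()
print(games_count)
```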

__6. How many gold medals in tennis did the Switzerland team win at the 2008 Olympics?__


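- 0
- 1 
- 2 ✅
- 3

In [54]:
# Restrict to tennis; without the Sport filter, gold medals from other sports are counted too.
len(data[(data['Year'] == 2008) & (data['Team'] == 'Switzerland') & (data['Sport'] == 'Tennis') & (data['Medal'] == 'Gold')])

2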

__7. Is it true that Spain won fewer medals than Italy at the 2016 Olympics? Do not consider NaN values in _Medal_ column.__ 


- Yes ✅
- No

In [57]:
italy_length = len(data[(data['Year'] == 2016) & (data['Team'] == 'Italy') & (data['Medal'].notnull())])
spain_length = len(data[(data['Year'] == 2016) & (data['Team'] == 'Spain') & (data['Medal'].notnull())])

print(spain_length < italy_length)

True
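
The same comparison can be sketched with a single `groupby`, relying on the fact that `count()` ignores missing values. The frame below is made up for illustration and does not reflect real medal tallies:

```python
import pandas as pd

# Hypothetical toy data, not real results.
toy = pd.DataFrame({
    'Year':  [2016, 2016, 2016, 2016, 2016, 2012],
    'Team':  ['Italy', 'Italy', 'Spain', 'Spain', 'Italy', 'Spain'],
    'Medal': ['Gold', None, 'Bronze', None, 'Silver', 'Gold'],
})

# count() skips missing medals, so only medal rows are counted per team.
medals = toy[toy['Year'] == 2016].groupby('Team')['Medal'].count()
spain_fewer = bool(medals['Spain'] < medals['Italy'])
print(spain_fewer)
```

As in the solution above, this counts medal rows, so a team event contributes one row per team member.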


__8. What are the least and most common age groups among the participants of the 2008 Olympics?__


- [45-55] and [25-35) correspondingly ✅
- [45-55] and [15-25) correspondingly
- [35-45) and [25-35) correspondingly
- [45-55] and [35-45) correspondingly

In [58]:
def count_of_age_group(age_from, age_to, year):
    # Half-open interval [age_from, age_to); each athlete is counted once.
    in_group = (data['Age'] >= age_from) & (data['Age'] < age_to) & (data['Year'] == year)
    return len(data[in_group].drop_duplicates('ID'))

print('[15-25):', count_of_age_group(15, 25, 2008))
print('[25-35):', count_of_age_group(25, 35, 2008))
print('[35-45):', count_of_age_group(35, 45, 2008))
# The upper bound 56 makes the last group include age 55.
print('[45-55]:', count_of_age_group(45, 56, 2008))

[15-25): 4786
[25-35): 5382
[35-45): 630
[45-55]: 78
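
The binning can also be sketched with `pd.cut`, which assigns every age to a group in one pass. The ages below are made up. On the real data, one would presumably filter by year and drop duplicate IDs first, as the function above does.

```python
import pandas as pd

# Hypothetical ages, not taken from the dataset.
ages = pd.Series([16, 18, 24, 25, 29, 30, 34, 36, 40, 55])

# right=False makes each bin closed on the left and open on the right.
# The last edge is 56 so that age 55 falls into the final bin.
bins = [15, 25, 35, 45, 56]
labels = ['[15-25)', '[25-35)', '[35-45)', '[45-55]']
groups = pd.cut(ages, bins=bins, right=False, labels=labels)

counts = groups.value_counts()
print(counts.idxmin(), counts.idxmax())
```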


__9. Is it true that there were Summer Olympics held in Atlanta? Is it true that there were Winter Olympics held in Squaw Valley?__


- Yes, Yes ✅
- Yes, No
- No, Yes 
- No, No 

In [63]:
print(not data[(data['Season'] == 'Summer') & (data['City'] == 'Atlanta')].empty)
print(not data[(data['Season'] == 'Winter') & (data['City'] == 'Squaw Valley')].empty)

True
True


__10. What is the absolute difference between the number of unique sports at the 1986 Olympics and 2002 Olympics?__


- 3 
- 10
- 15 ✅
- 27 

In [64]:
sp_in_2002 = data.loc[data['Year'] == 2002, 'Sport'].nunique()
sp_in_1986 = data.loc[data['Year'] == 1986, 'Sport'].nunique()
# The question asks for the absolute difference.
print(abs(sp_in_2002 - sp_in_1986))

15
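
A detail worth noting here is that filtering on a year with no rows gives an empty selection, whose number of unique sports is 0. Here is a small sketch with a hypothetical helper `n_sports` and made-up rows:

```python
import pandas as pd

# Hypothetical toy data: there are no rows for 1986.
toy = pd.DataFrame({
    'Year':  [2002, 2002, 2002, 2000],
    'Sport': ['Luge', 'Luge', 'Curling', 'Judo'],
})

def n_sports(frame, year):
    """Number of distinct sports in the given year (0 if the year is absent)."""
    return frame.loc[frame['Year'] == year, 'Sport'].nunique()

diff = abs(n_sports(toy, 1986) - n_sports(toy, 2002))
print(n_sports(toy, 1986), n_sports(toy, 2002), diff)
```

Wrapping the subtraction in `abs()` keeps the result independent of the order in which the two years are compared.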


That's it! Now go and do 30 push-ups! :)
