# Summer Olympics

## Scenario
The Summer Olympics recently concluded, and newspapers everywhere printed the medal rankings. Different countries seem to rank the table by different methods, so there is apparently some room for interpretation.

## Tasks
The sporting associations of the *** countries created a task force to investigate whether the medal ranking in use reflects their achievements. They supplied us with the data they found on the countries. Their main concerns are the following:

- How did the *** countries perform, and who has room for improvement and why?
- Are the official rankings "fair", or do you think that, e.g., the top 5 achievers should look different?

They ask these questions because, ultimately, they want to identify which country within the group is the one to go and learn from.

### How to
Use plots, tables, scoring functions, feature engineering: anything you think helps the task force. Work in teams or alone to answer the two questions.

Why? As is often the case in client projects, there is no single correct answer. But there may be ways to show our clients whether they did well or have room for improvement. In the end, we want to talk about the strategies that were employed: Which libraries did we use? How can we best visualize and communicate our suggested solution?

If you have questions, just ask. Hopefully you will have some fun!
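
As one possible starting point for the "fair ranking" question, the sketch below compares three common ways to order a medal table: gold first, total medals, and a weighted score. The data is made up, the country names `A`, `B`, `C` are placeholders, and the 3-2-1 weights are an arbitrary assumption, not an official scheme.

```python
import pandas as pd

# Made-up illustrative medal counts; placeholder teams, not real results.
toy = pd.DataFrame({
    'country': ['A', 'B', 'C'],
    'gold':    [3, 2, 0],
    'silver':  [0, 4, 3],
    'bronze':  [1, 3, 9],
})

# Method 1: gold first, then silver, then bronze (lexicographic sort)
rank_gold_first = toy.sort_values(
    ['gold', 'silver', 'bronze'], ascending=False
)['country'].tolist()

# Method 2: total number of medals
toy['total_medals'] = toy[['gold', 'silver', 'bronze']].sum(axis=1)
rank_total = toy.sort_values('total_medals', ascending=False)['country'].tolist()

# Method 3: weighted score; the 3-2-1 weights are an assumption for illustration
weights = {'gold': 3, 'silver': 2, 'bronze': 1}
toy['weighted_score'] = sum(toy[col] * w for col, w in weights.items())
rank_weighted = toy.sort_values('weighted_score', ascending=False)['country'].tolist()

print(rank_gold_first, rank_total, rank_weighted)
```

On this toy table, each method puts a different team on top, which is exactly the room for interpretation the task force is asking about.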

---

In [180]:
import pandas as pd

df_countries = pd.read_csv('data/countries_data.tsv', sep='\t', header=0)
df_delegations = pd.read_csv('data/delegations_data.tsv', sep='\t', header=0)
df_medals = pd.read_csv('data/tokio2021_medal_counts.tsv', sep='\t', header=0)

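# Data cleansing: strip only a leading 'The ' from country names.
# Anchoring with '^' avoids touching 'The ' elsewhere in a name.
df_medals['country'] = df_medals['country'].str.replace(r'^The ', '', regex=True)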

In [181]:
# Join datasets
df = df_medals.merge(df_delegations, on='country', how='left')
# print(f'Length after first merge: {len(df)}')

df = df.merge(df_countries, on='country', how='left')
# print(f'Length after second merge: {len(df)}')
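
A left join silently leaves `NaN` in the new columns when a country name does not match. The sketch below shows one way to check for that with the `indicator` and `validate` parameters of `DataFrame.merge`. The two miniature frames are hypothetical stand-ins, not the real data.

```python
import pandas as pd

# Hypothetical miniature frames; values are placeholders, not the real data.
medals = pd.DataFrame({'country': ['A', 'B', 'C'], 'gold': [1, 2, 3]})
delegations = pd.DataFrame({'country': ['A', 'B'], 'athletes': [10, 20]})

# indicator=True adds a '_merge' column that records where each row came from.
# validate='many_to_one' raises if the right table has duplicate country keys,
# which would otherwise silently duplicate rows in the result.
checked = medals.merge(
    delegations, on='country', how='left',
    indicator=True, validate='many_to_one',
)

# Countries that found no partner in the right table
unmatched = checked.loc[checked['_merge'] == 'left_only', 'country'].tolist()
print(unmatched)
```

If `unmatched` is not empty, the country names probably need further cleansing before the join.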

### How did ERNI perform?
Let's have a look at Switzerland, Germany, Spain, Romania, Slovakia, the Philippines and Singapore.

In [None]:
ernians = ['Switzerland', 'Germany', 'Spain', 'Romania', 'Slovakia', 'Philippines', 'Singapore']
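# .copy() makes df_erni an independent frame, so the column assignments below
# do not trigger pandas' SettingWithCopyWarning
df_erni = df[df.country.isin(ernians)].copy()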

# Assumption 1: all medalists are winners
df_erni['total_medals'] = df_erni[['gold', 'silver', 'bronze']].sum(axis=1)

# Assumption 2: people in wealthier countries have more spare time, and therefore more time for sports
# GDP_M_USD is presumably in millions of USD, so the result is in thousands of USD per capita
df_erni['GDP_K_per_capita'] = df_erni['GDP_M_USD'] / df_erni['population'] * 1000

# Assumption 3: top athletes are one in a million, so larger countries benefit
df_erni['athletes_ratio'] = df_erni.athletes / df_erni.population

df_erni.head()
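
Building on the assumptions above, one way to compare countries of very different sizes is to normalize the medal count, for example per million inhabitants or per athlete sent. The sketch below uses made-up numbers and placeholder countries. The column names `medals_per_million` and `medals_per_athlete` are hypothetical and not part of the supplied data.

```python
import pandas as pd

# Made-up numbers for two placeholder countries; not real results.
toy = pd.DataFrame({
    'country': ['A', 'B'],
    'total_medals': [10, 4],
    'population': [50_000_000, 5_000_000],
    'athletes': [200, 40],
})

# Medals per million inhabitants: corrects for the size of the talent pool
toy['medals_per_million'] = toy['total_medals'] / toy['population'] * 1_000_000

# Medals per athlete sent: a rough measure of delegation efficiency
toy['medals_per_athlete'] = toy['total_medals'] / toy['athletes']

ranking = toy.sort_values('medals_per_million', ascending=False)['country'].tolist()
print(toy[['country', 'medals_per_million', 'medals_per_athlete']])
```

Here the smaller country `B` ranks ahead of `A` on both normalized measures even though it won fewer medals in total.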