<h1><center><font size="6">Plotly tutorial - 120 years of Olympic games</font></center></h1>


# <a id='0'>Content</a>

This notebook has two tables of contents: the `Analysis` list follows the analysis of the dataset, and the `Plotly chart types and functions` list points to the Plotly features used along the way.


## Analysis

- <a href='#1'>Introduction</a>  
- <a href='#2'>The data</a>     
- <a href='#3'>Games and venues</a>
- <a href='#4'>Sports</a>
- <a href='#5'>Countries</a>  
- <a href='#6'>Athletes</a>
- <a href='#7'>Medals</a>   
- <a href='#8'>References</a>   
- <a href='#9'>Known issues</a>   

## Plotly chart types and functions

- <a href='#101'>Scatter</a>  
- <a href='#1011'>append_trace</a>  
- <a href='#102'>Bar</a>  
- <a href='#103'>create_table</a> 
- <a href='#104'>Box</a>  
- <a href='#105'>Heatmap</a>  
- <a href='#106'>Pie</a>  
- <a href='#107'>Choropleth</a>  
- <a href='#108'>create_distplot</a>  
- <a href='#109'>Slider (animation)</a>  


# <a id="1">Introduction</a>  

## Kernel objective 

The objective of this Kernel is to provide an introduction to using **Plotly** with Python for visualization.

## Data used

The data used to illustrate **Plotly** features covers 120 years of Olympic Games history. We explore this data primarily to illustrate the plotly features.



# <a id="2">The data</a>

## Load packages


Besides **pandas** and **numpy**, we load **matplotlib** and **seaborn**, and from **plotly.offline** we load **init_notebook_mode** and **iplot**, so that we can create powerful interactive graphics with **plotly** inside the notebook.

In [1]:
import pandas as pd 
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline 
#from bubbly.bubbly import bubbleplot 
#from __future__ import division
import plotly.graph_objs as go
import plotly.figure_factory as ff
from plotly import tools
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)

IS_LOCAL = False
import os
if(IS_LOCAL):
    PATH="../input/120-years-of-olympic-history-athlets-and-results"
else:
    PATH="../input"
print(os.listdir(PATH))

['noc_regions.csv', 'athlete_events.csv']


## Read the data


There are two data files.

In [2]:
athlete_events_df = pd.read_csv(PATH+"/athlete_events.csv")
noc_regions_df = pd.read_csv(PATH+"/noc_regions.csv")

## Check the data


First, we check the shape of the two data files.

In [3]:
print("Athletes and Events data -  rows:",athlete_events_df.shape[0]," columns:", athlete_events_df.shape[1])
print("NOC Regions data -  rows:",noc_regions_df.shape[0]," columns:", noc_regions_df.shape[1])

Athletes and Events data -  rows: 271116  columns: 15
NOC Regions data -  rows: 230  columns: 3


Let's inspect the two datasets. 

In [4]:
athlete_events_df.head(5)

|  | ID | Name | Sex | Age | Height | Weight | Team | NOC | Games | Year | Season | City | Sport | Event | Medal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | A Dijiang | M | 24.0 | 180.0 | 80.0 | China | CHN | 1992 Summer | 1992 | Summer | Barcelona | Basketball | Basketball Men's Basketball | NaN |
| 1 | 2 | A Lamusi | M | 23.0 | 170.0 | 60.0 | China | CHN | 2012 Summer | 2012 | Summer | London | Judo | Judo Men's Extra-Lightweight | NaN |
| 2 | 3 | Gunnar Nielsen Aaby | M | 24.0 | NaN | NaN | Denmark | DEN | 1920 Summer | 1920 | Summer | Antwerpen | Football | Football Men's Football | NaN |
| 3 | 4 | Edgar Lindenau Aabye | M | 34.0 | NaN | NaN | Denmark/Sweden | DEN | 1900 Summer | 1900 | Summer | Paris | Tug-Of-War | Tug-Of-War Men's Tug-Of-War | Gold |
| 4 | 5 | Christine Jacoba Aaftink | F | 21.0 | 185.0 | 82.0 | Netherlands | NED | 1988 Winter | 1988 | Winter | Calgary | Speed Skating | Speed Skating Women's 500 metres | NaN |


In [5]:
noc_regions_df.head(5)

|  | NOC | region | notes |
|---|---|---|---|
| 0 | AFG | Afghanistan | NaN |
| 1 | AHO | Curacao | Netherlands Antilles |
| 2 | ALB | Albania | NaN |
| 3 | ALG | Algeria | NaN |
| 4 | AND | Andorra | NaN |


Let's also check if there is missing data.

In [6]:
def missing_data(data):
    total = data.isnull().sum().sort_values(ascending = False)
    percent = (data.isnull().sum()/data.isnull().count()*100).sort_values(ascending = False)
    return pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
missing_data(athlete_events_df)

| Column | Total | Percent |
|---|---|---|
| Medal | 231333 | 85.326207 |
| Weight | 62875 | 23.19118 |
| Height | 60171 | 22.193821 |
| Age | 9474 | 3.494445 |
| Event | 0 | 0.0 |
| Sport | 0 | 0.0 |
| City | 0 | 0.0 |
| Season | 0 | 0.0 |
| Year | 0 | 0.0 |
| Games | 0 | 0.0 |


Only some of the athletes have medals, which is what we expected. There is also missing information about the body measurements of the athletes (Weight for 23.2% and Height for 22.2% of the rows) and about their age (3.5%).

In [7]:
missing_data(noc_regions_df)

| Column | Total | Percent |
|---|---|---|
| notes | 209 | 90.869565 |
| region | 3 | 1.304348 |
| NOC | 0 | 0.0 |


Most of the notes are missing (about 91%), and a few of the region names are missing as well. We will check that later on.
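As a quick sanity check of the `missing_data` helper, here is a minimal sketch on a small made-up frame (the `toy` data is hypothetical and not taken from the dataset). It computes the percentage through `isnull().mean()`, which should be equivalent to dividing the sum by the count, and shows how the rows with a missing value could be listed.

```python
import numpy as np
import pandas as pd

def missing_data(data):
    """Return count and percentage of missing values per column, largest first."""
    total = data.isnull().sum()
    percent = data.isnull().mean() * 100  # mean of booleans = fraction missing
    out = pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
    return out.sort_values('Total', ascending=False)

# Hypothetical toy data: 4 rows, 'Medal' mostly missing, 'NOC' complete.
toy = pd.DataFrame({
    'NOC': ['CHN', 'DEN', 'NED', 'USA'],
    'Age': [24.0, np.nan, 21.0, 30.0],
    'Medal': [np.nan, 'Gold', np.nan, np.nan],
})
report = missing_data(toy)
print(report)

# Rows where a given column is missing can be listed with a boolean mask.
print(toy[toy['Age'].isnull()])
```

The same boolean mask, applied to the `region` column of `noc_regions_df`, would list the NOC codes without a region name.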


<a href="#0"><font size="1" color="red">Go to top</font></a>

# <a id="3">Venues and events</a>  

Let's check in which years the Olympic events were held and what the venues were. 

First, let's look at the year and the season of each event. We have both Summer and Winter Olympics. We group by `Year` and `City` and count the rows per `Season`, which gives the number of athlete entries per event (an athlete who competes in several events is counted once for each of them). We use a Scatter plot for this.

In [8]:
tmp = athlete_events_df.groupby(['Year', 'City'])['Season'].value_counts()
df = pd.DataFrame(data={'Athlets': tmp.values}, index=tmp.index).reset_index()

Let's check the structure of the resulting `df` DataFrame. 

In [9]:
df.head(3)

|  | Year | City | Season | Athlets |
|---|---|---|---|---|
| 0 | 1896 | Athina | Summer | 380 |
| 1 | 1900 | Paris | Summer | 1936 |
| 2 | 1904 | St. Louis | Summer | 1301 |
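Note that `value_counts` counts rows, so an athlete entered in several events is counted several times. The following minimal sketch, on hypothetical toy data, contrasts the number of entries with the number of distinct athletes (using the `ID` column); the column names `Entries` and `Athletes` are chosen here for illustration only.

```python
import pandas as pd

# Hypothetical toy data: athlete 1 competes in two events at the same Games.
toy = pd.DataFrame({
    'ID':     [1, 1, 2, 3],
    'Year':   [1896, 1896, 1896, 1900],
    'City':   ['Athina', 'Athina', 'Athina', 'Paris'],
    'Season': ['Summer', 'Summer', 'Summer', 'Summer'],
})

keys = ['Year', 'City', 'Season']

# Number of rows (athlete-event entries) per Games.
entries = toy.groupby(keys).size().reset_index(name='Entries')

# Number of distinct athletes per Games.
athletes = toy.groupby(keys)['ID'].nunique().reset_index(name='Athletes')

summary = entries.merge(athletes, on=keys)
print(summary)
```

Which of the two counts is more appropriate depends on the question asked; the rest of this notebook works with the number of entries.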



<a href="#0"><font size="1" color="red">Go to top</font></a>


## <a id="101">Scatter</a>

We prepare the Scatter plot using `Scatter`. 

We specify the following attributes for the `trace`:  

* x - the points coordinates on x axis;  
* y - the points coordinates on y axis;  
* name - the name associated with the trace; it appears in the legend and in the hover popups; 
* marker - the styling of the points (for example color, size and outline); 
* mode - the drawing mode of the scatter graph; here we will use `markers`, but `lines` and `markers+lines` are also frequently used;  

Multiple traces can be specified; they are added to the `data` list, which is then displayed in a figure (`fig`) using a `layout`. For the layout, the following attributes are specified:  
* title - the title displayed for the chart;
* xaxis - the title and attributes of the x axis;
* yaxis - the title and attributes of the y axis;
* hovermode - how the popups are displayed when hovering over the points: a single popup for the closest point, or one popup per trace;

In [10]:
trace = go.Scatter(
    x = df['Year'],
    y = df['Athlets'],
    name="Athlets per Olympic game",
    marker=dict(
        color="Blue",
    ),
    mode = "markers"
)
data = [trace]
layout = dict(title = 'Athlets per Olympic game',
          xaxis = dict(title = 'Year', showticklabels=True), 
          yaxis = dict(title = 'Number of athlets'),
          hovermode = 'closest'
         )
fig = dict(data=data, layout=layout)
iplot(fig, filename='events-athlets1')

Let's inspect the graph obtained.

We specified the graph to plot using `trace`. More than one `trace` can be included in a plot. For this, we group them in the `data` collection. 

The presentation of the scatter plot is specified in the `layout`.

To illustrate some of the features of plotly, we included options for `xaxis` and `yaxis` in the layout.

The figure is assembled in a dictionary with `data` and `layout` and is displayed using iplot.


On the x-axis we have the years of the Olympic games and on the y-axis the number of athletes per game. 

Above the title we have several controls allowing us to control the plot. 

A group of controls allows various visualization control functions: we can zoom, pan, select a window to zoom, use `lasso` selection, zoom in, zoom out, reset the zoom.

When we hover over the plot, the y-value is displayed in a small popup over the closest point. The toolbar lets us toggle between showing only the value of the closest point and comparing the values of all traces at the hovered x position.

We can also set this behaviour when we build the graph, through the `hovermode` layout option: `'closest'` for the former and `'x'` for the latter.
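As a minimal sketch of the two hover behaviours, the layout can be built by a small helper (`make_layout` is a hypothetical name used only here) that differs only in the `hovermode` value. The figure is kept as a plain dictionary, as in the cell above.

```python
def make_layout(hovermode):
    """Build a layout dict like the one used above, with a chosen hover behaviour."""
    return dict(
        title='Athletes per Olympic game',
        xaxis=dict(title='Year', showticklabels=True),
        yaxis=dict(title='Number of athletes'),
        hovermode=hovermode,
    )

# 'closest' shows a single popup for the nearest point;
# 'x' shows one popup per trace for the hovered x value.
layout_closest = make_layout('closest')
layout_compare = make_layout('x')

# Either dict can be used as the `layout` of a figure, for example:
# fig = dict(data=data, layout=layout_compare)
```

With a single trace the difference is barely visible; it becomes useful once several traces share the same x values.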


We observe that in some years there are two events (Summer and Winter). Indeed, the Olympics started with Summer events only; then, for a while, the Summer and Winter games were held in the same year, and later they were scheduled in different years. 

Let's draw the scatter plot again, now showing the `Summer` and `Winter` games in different colors on the same plot. For this, we will create two different traces. 

We will also add lines to the scatter plot, by adding `lines` to the `mode`.

In [11]:
dfS = df[df['Season']=='Summer']; dfW = df[df['Season']=='Winter']

traceS = go.Scatter(
    x = dfS['Year'],y = dfS['Athlets'],
    name="Summer Games",
    marker=dict(color="Red"),
    mode = "markers+lines"
)
traceW = go.Scatter(
    x = dfW['Year'],y = dfW['Athlets'],
    name="Winter Games",
    marker=dict(color="Blue"),
    mode = "markers+lines"
)

data = [traceS, traceW]
layout = dict(title = 'Athlets per Olympic game',
          xaxis = dict(title = 'Year', showticklabels=True), 
          yaxis = dict(title = 'Number of athlets'),
          hovermode = 'closest'
         )
fig = dict(data=data, layout=layout)
iplot(fig, filename='events-athlets2')

Now we show the number of athletes for each game, in each year. 

Each type of game (`Summer` or `Winter`) is shown in a different color. When we hover over the points, plotly displays a small popup with the y-value, with the name of the trace displayed beside it. The legend also shows the name of each trace, as we defined it when we specified the traces.

We can observe that from 1896 to 1920 there were only `Summer` games. From 1924 to 1992 the `Summer` and `Winter` events took place every 4 years, in the same year, with an interruption between 1936 and 1948 due to World War II.

The scatter plot lets us see patterns in the participation at the games. 

One notable event that can be spotted is that in `1956` the Summer Olympics were held in 2 different cities, `Melbourne` and `Stockholm`. Due to the strict quarantine regulations of Australia, horses could not be brought into the country, so the equestrian competitions were held several months earlier in Stockholm.

As another example, there was a sharp drop in participation in `1980`, when many Western countries boycotted the Moscow Olympic games. Let's see if we can show the name of the Olympic venue next to each scatter plot point.  
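One way to draw a label next to each point is the `markers+text` mode together with the `text` and `textposition` attributes of a scatter trace. The sketch below uses the first three rows of `df` shown earlier and keeps the trace as a plain dictionary; it is an illustration, not the approach used in the next cells, which put the city name in the hover popup instead.

```python
# First three rows of `df` (year, number of entries, city), copied by hand.
years = [1896, 1900, 1904]
athletes = [380, 1936, 1301]
cities = ['Athina', 'Paris', 'St. Louis']

trace = dict(
    type='scatter',
    x=years,
    y=athletes,
    text=cities,                 # one label per point
    mode='markers+text',         # draw the label next to each marker
    textposition='top center',   # where the label sits relative to the marker
    name='Summer Games',
)
fig = dict(data=[trace], layout=dict(title='Athletes per Olympic game'))
# In the notebook this could be rendered with: iplot(fig)
```

With many points the labels tend to overlap, which is one reason to prefer hover text for the full dataset.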



<a href="#0"><font size="1" color="red">Go to top</font></a>

## <a id="1011">append_trace</a>  

Let's show how we can create subplots with Plotly. We will display the two previous traces side by side, in two columns.

In [12]:
traceS = go.Scatter(
    x = dfS['Year'],y = dfS['Athlets'],
    name="Summer Games",
    marker=dict(color="Red"),
    mode = "markers+lines",
    text=dfS['City'],
)
traceW = go.Scatter(
    x = dfW['Year'],y = dfW['Athlets'],
    name="Winter Games",
    marker=dict(color="Blue"),
    mode = "markers+lines",
    text=dfW['City']
)

data = [traceS, traceW]

fig = tools.make_subplots(rows=1, cols=2, subplot_titles=('Number athlets: Summer Games', 'Number athlets: Winter Games'))
fig.append_trace(traceS, 1, 1)
fig.append_trace(traceW, 1, 2)

iplot(fig, filename='events-athlets2')

This is the format of your plot grid:
[ (1,1) x1,y1 ]  [ (1,2) x2,y2 ]
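The grid printed above says that the first cell uses the axis pair (x1, y1) and the second one (x2, y2). To make this concrete, here is a minimal sketch that builds a comparable 1x2 figure by hand as a plain dictionary. The helper name `side_by_side`, the `gap` parameter and the sample values are assumptions made for illustration; the exact domains chosen by `make_subplots` may differ.

```python
def side_by_side(trace_left, trace_right, gap=0.1):
    """Place two traces in a 1x2 grid by assigning each its own axis pair."""
    left = dict(trace_left, xaxis='x', yaxis='y')      # cell (1,1): x1, y1
    right = dict(trace_right, xaxis='x2', yaxis='y2')  # cell (1,2): x2, y2
    half = (1 - gap) / 2
    layout = dict(
        xaxis=dict(domain=[0, half], anchor='y'),        # left part of the width
        yaxis=dict(anchor='x'),
        xaxis2=dict(domain=[1 - half, 1], anchor='y2'),  # right part of the width
        yaxis2=dict(anchor='x2'),
    )
    return dict(data=[left, right], layout=layout)

# Made-up sample values, for illustration only.
summer = dict(type='scatter', x=[1896, 1900], y=[10, 20], name='Summer Games')
winter = dict(type='scatter', x=[1924, 1928], y=[5, 8], name='Winter Games')
fig = side_by_side(summer, winter)
```

Each trace is tied to a cell only through its `xaxis` and `yaxis` attributes; the layout then decides which part of the figure each axis occupies.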



<a href="#0"><font size="1">Go to top</font></a>


## <a id="102">Bar</a>

Let's display the number of athletes per Olympic Game using a bar plot (`go.Bar`).   

We will also prepare the dataset for visualization by adding the City name.

In [13]:
tmp = athlete_events_df.groupby('Year')['City'].value_counts()
df2 = pd.DataFrame(data={'Athlets': tmp.values}, index=tmp.index).reset_index()
df2 = df2.merge(df)

<a href="#0"><font size="1" color="red">Go to top</font></a>

## <a id="103">create_table</a>

Let's also show how we can display tables with Plotly.

In [14]:
iplot(ff.create_table(df2.head(3)), filename='jupyter-table2')

In [15]:
dfS = df2[df2['Season']=='Summer']; dfW = df2[df2['Season']=='Winter']

traceS = go.Bar(
    x = dfS['Year'],y = dfS['Athlets'],
    name="Summer Games",
    marker=dict(color="Red"),
    text=dfS['City']
)
traceW = go.Bar(
    x = dfW['Year'],y = dfW['Athlets'],
    name="Winter Games",
    marker=dict(color="Blue"),
    text=dfW['City']
)

data = [traceS, traceW]
layout = dict(title = 'Athlets per Olympic game',
          xaxis = dict(title = 'Year', showticklabels=True), 
          yaxis = dict(title = 'Number of athlets'),
          hovermode = 'closest'
         )
fig = dict(data=data, layout=layout)
iplot(fig, filename='events-athlets3')

With this view it is easier to see how the Summer and Winter events were held in the same year from 1924 and in separate years starting from 1994.

We also display the city name, next to the `Season`, in the small popup that appears when hovering over the bars. 

In the following graph we play a bit more with the display options for `go.Bar`. 

We will use the options to define the color, opacity and outline of each bar. 

We also change the layout to display the bars stacked, so that we can see the total number of athletes per year when there are two different games in the same year.

In [16]:
traceS = go.Bar(
    x = dfS['Year'],y = dfS['Athlets'],
    name="Summer Games",
     marker=dict(
                color='rgb(238,23,11)',
                line=dict(
                    color='black',
                    width=0.75),
                opacity=0.7,
            ),
    text=dfS['City'],
    
)
traceW = go.Bar(
    x = dfW['Year'],y = dfW['Athlets'],
    name="Winter Games",
    marker=dict(
                color='rgb(11,23,245)',
                line=dict(
                    color='black',
                    width=0.75),
                opacity=0.7,
            ),
    text=dfW['City']
)

data = [traceS, traceW]
layout = dict(title = 'Athlets per Olympic game',
          xaxis = dict(title = 'Year', showticklabels=True), 
          yaxis = dict(title = 'Number of athlets'),
          hovermode = 'closest',
          barmode='stack'
         )
fig = dict(data=data, layout=layout)
iplot(fig, filename='events-athlets4')
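With `barmode='stack'`, bars that share the same x value are drawn on top of each other, so the height of the stack is the sum of the individual values. The following minimal sketch, with made-up counts and a hypothetical helper `stacked_totals`, computes the totals that such a stack represents.

```python
def stacked_totals(x_a, y_a, x_b, y_b):
    """Total bar height per x position when two bar traces are stacked."""
    totals = {}
    for x, y in list(zip(x_a, y_a)) + list(zip(x_b, y_b)):
        totals[x] = totals.get(x, 0) + y
    return totals

# Made-up counts: both Games in 1924, only Summer Games in 1920.
summer = dict(type='bar', x=[1920, 1924], y=[100, 120], name='Summer Games')
winter = dict(type='bar', x=[1924], y=[30], name='Winter Games')

fig = dict(data=[summer, winter], layout=dict(barmode='stack'))
totals = stacked_totals(summer['x'], summer['y'], winter['x'], winter['y'])
print(totals)
```

Other `barmode` values include `'group'` (bars side by side) and `'overlay'` (bars drawn over each other, where opacity helps).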

<a href="#0"><font size="1" color="red">Go to top</font></a>



## <a id="104">Box</a>

Let's show the distribution of the number of athletes across the Olympic games editions, grouped by `Season`. 


We display the boxes with transparency (`rgba(238,23,11,0.5)`) and with horizontal orientation.

We change the layout to rotate the y-axis tick labels so that the `Season` labels are easier to read.



In [17]:
traceS = go.Box(
    x = dfS['Athlets'],
    name="Summer Games",
    
     marker=dict(
                color='rgba(238,23,11,0.5)',
                line=dict(
                    color='red',
                    width=1.2),
            ),
    text=dfS['City'],
    orientation='h',
    
)
traceW = go.Box(
    x = dfW['Athlets'],
    name="Winter Games",
    marker=dict(
                color='rgba(11,23,245,0.5)',
                line=dict(
                    color='blue',
                    width=1.2),
            ),
    text=dfW['City'],
    orientation='h',
)

data = [traceS, traceW]
layout = dict(title = 'Athlets per Olympic game',
          xaxis = dict(title = 'Number of athlets',showticklabels=True),
          yaxis = dict(title = 'Season', showticklabels=True, tickangle=-90), 
          hovermode = 'closest',
         )
fig = dict(data=data, layout=layout)
iplot(fig, filename='events-athlets5')

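A plot of type `go.Box` summarizes the data distribution, in this case the number of athletes grouped per Season: the box spans the 1st to the 3rd quartile, the line inside it marks the median, and the whiskers extend towards the minimum and the maximum (by default, points far away from the box are drawn individually as outliers). 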
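To make the quantities behind a box plot concrete, here is a minimal sketch that computes a five-number summary with the standard library. The sample values are made up, and the quartile interpolation used here (`method='inclusive'`) is an assumption: plotly may compute the quartiles slightly differently.

```python
import statistics

def five_number_summary(values):
    """Min, first quartile, median, third quartile and max of a sample."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method='inclusive')
    return dict(min=data[0], q1=q1, median=median, q3=q3, max=data[-1])

# Made-up counts for nine Games.
summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```

The difference `q3 - q1` is the interquartile range, which is commonly used to decide which points are shown as outliers.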


<a href="#0"><font size="1" color="red">Go to top</font></a>

# <a id="4">Sports</a>  

Let's display information about the sports using the functions that we already explored: `go.Scatter`, `go.Bar` and `go.Box`.  

First, let's count how many different sports were played at each Olympics. We will use `go.Bar` to represent the number of sports per edition.

In [18]:
tmp = athlete_events_df.groupby(['Year', 'City','Season'])['Sport'].nunique()
df = pd.DataFrame(data={'Sports': tmp.values}, index=tmp.index).reset_index()

In [19]:
df.head(3)

|  | Year | City | Season | Sports |
|---|---|---|---|---|
| 0 | 1896 | Athina | Summer | 9 |
| 1 | 1900 | Paris | Summer | 20 |
| 2 | 1904 | St. Louis | Summer | 18 |


In [20]:
dfS = df[df['Season']=='Summer']; dfW = df[df['Season']=='Winter']

traceS = go.Bar(
    x = dfS['Year'],y = dfS['Sports'],
    name="Summer Games",
     marker=dict(
                color='rgb(238,23,11)',
                line=dict(
                    color='red',
                    width=1),
                opacity=0.5,
            ),
    text= dfS['City'],
)
traceW = go.Bar(
    x = dfW['Year'],y = dfW['Sports'],
    name="Winter Games",
    marker=dict(
                color='rgb(11,23,245)',
                line=dict(
                    color='blue',
                    width=1),
                opacity=0.5,
            ),
    text=dfW['City']
)

data = [traceS, traceW]
layout = dict(title = 'Sports per Olympic edition',
          xaxis = dict(title = 'Year', showticklabels=True), 
          yaxis = dict(title = 'Number of sports'),
          hovermode = 'closest',
          barmode='stack'
         )
fig = dict(data=data, layout=layout)
iplot(fig, filename='events-sports1')

Let's now show the number of athletes per sport for each year. 

For each sport and each year, one point is plotted.

In [21]:
tmp = athlete_events_df.groupby(['Year', 'City','Season'])['Sport'].value_counts()
df = pd.DataFrame(data={'Athlets': tmp.values}, index=tmp.index).reset_index()
df.head()

|  | Year | City | Season | Sport | Athlets |
|---|---|---|---|---|---|
| 0 | 1896 | Athina | Summer | Athletics | 106 |
| 1 | 1896 | Athina | Summer | Gymnastics | 97 |
| 2 | 1896 | Athina | Summer | Shooting | 65 |
| 3 | 1896 | Athina | Summer | Cycling | 41 |
| 4 | 1896 | Athina | Summer | Tennis | 23 |


In [22]:
dfS = df[df['Season']=='Summer']; dfW = df[df['Season']=='Winter']


traceS = go.Scatter(
    x = dfS['Year'],y = dfS['Athlets'],
    name="Summer Games",
     marker=dict(
                color='rgb(238,23,11)',
                line=dict(
                    color='red',
                    width=1),
                opacity=0.5,
            ),
    text= "City:"+dfS['City']+" Sport:"+dfS['Sport'],
    mode = "markers"
)
traceW = go.Scatter(
    x = dfW['Year'],y = dfW['Athlets'],
    name="Winter Games",
    marker=dict(
                color='rgb(11,23,245)',
                line=dict(
                    color='blue',
                    width=1),
                opacity=0.5,
            ),
   text= "City:"+dfW['City']+" Sport:"+dfW['Sport'],
    mode = "markers"
)

data = [traceS, traceW]
layout = dict(title = 'Number of athlets per sport for each Olympic edition',
          xaxis = dict(title = 'Year', showticklabels=True), 
          yaxis = dict(title = 'Number of athlets per sport'),
          hovermode='closest'
         )
fig = dict(data=data, layout=layout)
iplot(fig, filename='events-sports1')

The hover popup shows the sport and the city, as well as the number of athletes per sport per edition.

Let's also show the distribution of the number of athletes per sport. For this we group by `Year`, `City` and `Season` and count the athletes for each sport.

In [23]:
tmp = athlete_events_df.groupby(['Year', 'City','Season'])['Sport'].value_counts()
df = pd.DataFrame(data={'Athlets': tmp.values}, index=tmp.index).reset_index()
df.head(3)

|  | Year | City | Season | Sport | Athlets |
|---|---|---|---|---|---|
| 0 | 1896 | Athina | Summer | Athletics | 106 |
| 1 | 1896 | Athina | Summer | Gymnastics | 97 |
| 2 | 1896 | Athina | Summer | Shooting | 65 |


Let's define a list with all the Sports.

In [24]:
# sorted list of the distinct sport names
sports = sorted(athlete_events_df['Sport'].unique())

We will create one function that builds a `trace` and one function that displays the set of traces.  

We will also filter the Games by Summer and Winter.

In [25]:
def draw_trace(dataset, sport):
    dfS = dataset[dataset['Sport']==sport];
    trace = go.Box(
        x = dfS['Athlets'],
        name=sport,
         marker=dict(
                    line=dict(
                        color='black',
                        width=0.8),
                ),
        text=dfS['City'], 
        orientation = 'h'
    )
    return trace


def draw_group(dataset, title,height=800):
    data = list()
    for sport in sports:
        data.append(draw_trace(dataset, sport))


    layout = dict(title = title,
              xaxis = dict(title = 'Number of athlets',showticklabels=True),
              yaxis = dict(title = 'Sport', showticklabels=True, tickfont=dict(
                family='Old Standard TT, serif',
                size=8,
                color='black'),), 
              hovermode = 'closest',
              showlegend=False,
                  width=800,
                  height=height,
             )
    fig = dict(data=data, layout=layout)
    iplot(fig, filename='events-sports1')

# select only Summer Olympics
df_S = df[df['Season']=='Summer']
# draw the boxplots for the Summer Olympics
draw_group(df_S, "Athlets per Sport (Summer Olympics)")

Let's now use the same function defined above to plot the sports in Winter Olympics.

In [26]:
# select only Winter Olympics
df_W = df[df['Season']=='Winter']
# draw the boxplots for the Winter Olympics
draw_group(df_W, "Athlets per Sport (Winter Olympics)",600)


<a href="#0"><font size="1" color="red">Go to top</font></a>



## <a id="105">Heatmap</a>  


Let's also use a `Heatmap` to show the number of athletes per Games edition and per Sport. 


We will process here only the Summer Olympics data.  

We first create a matrix with one row per `Year` and one column per `Sport`, whose values are the number of athletes for that year and sport.


In [27]:
piv = pd.pivot_table(df_S, values="Athlets",index=["Year"], columns=["Sport"], fill_value=0)
m = piv.values
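The reshaping from the long table to the matrix can be illustrated on a tiny, partly made-up example. Note that `pivot_table` aggregates with the mean by default; with one row per (`Year`, `Sport`) pair the mean and the sum coincide, and the sketch below passes `aggfunc='sum'` explicitly, which is an assumption about the intent rather than what the cell above does.

```python
import pandas as pd

# Toy long-format counts: one row per (Year, Sport); the 1900 value is made up.
long_df = pd.DataFrame({
    'Year':    [1896, 1896, 1900],
    'Sport':   ['Athletics', 'Tennis', 'Athletics'],
    'Athlets': [106, 23, 117],
})

# Rows become years, columns become sports; combinations absent from the
# data (Tennis in 1900 here) are filled with 0 instead of NaN.
piv = pd.pivot_table(long_df, values='Athlets', index='Year',
                     columns='Sport', aggfunc='sum', fill_value=0)
print(piv)

m = piv.values  # 2-D array: m[i][j] is the count for year i and sport j
```

The row and column labels of `piv` are what the heatmap later uses for its `y` and `x` attributes.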

We prepare the `Heatmap`.

The attributes we use are:
* z - the matrix with values to be displayed;
* x - the columns names;
* y - the rows names;  
* colorscale - the color scale to be used for display; 

In [28]:
trace = go.Heatmap(z = m, y= list(piv.index), x=list(piv.columns),colorscale='Reds',reversescale=False)
data=[trace]
layout = dict(title = "Number of athlets per year and sport (Summer Olympics)",
              xaxis = dict(title = 'Sport',
                        showticklabels=True,
                           tickangle = 45,
                        tickfont=dict(
                                size=10,
                                color='black'),
                          ),
              yaxis = dict(title = 'Year', 
                        showticklabels=True, 
                        tickfont=dict(
                            size=10,
                            color='black'),
                      ), 
              hovermode = 'closest',
              showlegend=False,
                  width=1000,
                  height=800,
             )
fig = dict(data=data, layout=layout)
iplot(fig, filename='labelled-heatmap')

Let's show also the corresponding heatmap plot for Winter Olympics.

In [29]:
piv = pd.pivot_table(df_W, values="Athlets",index=["Year"], columns=["Sport"], fill_value=0)
m = piv.values

In [30]:
trace = go.Heatmap(z = m, y= list(piv.index), x=list(piv.columns),colorscale='Blues',reversescale=True)
data=[trace]
layout = dict(title = "Number of athlets per year and sport (Winter Olympics)",
              xaxis = dict(title = 'Sport',
                        showticklabels=True,
                           tickangle = 30,
                        tickfont=dict(
                                size=8,
                                color='black'),
                          ),
              yaxis = dict(title = 'Year', 
                        showticklabels=True, 
                        tickfont=dict(
                            size=10,
                            color='black'),
                      ), 
              hovermode = 'closest',
              showlegend=False,
                  width=800,
                  height=800,
             )
fig = dict(data=data, layout=layout)
iplot(fig, filename='labelled-heatmap')

<a href="#0"><font size="1" color="red">Go to top</font></a>


## <a id="106">Pie</a>  


We do not recommend using `Pie` for visualization (see also references [9], [10]). Instead, we recommend using `Bar` plots.   

There is a joke about the usage of `Pie` charts :-) that we present here:

In [31]:
labels = ['Sunny side of pyramid','Shady side of pyramid','Sky']
values = [300,150,1200]
colors = ['gold', 'brown', 'lightblue']

BOTTOM_OF_THE_PYRAMID_ACCORDING_TO_NEWTON_LAWS = 220

trace = go.Pie(labels=labels, values=values,
               hoverinfo='label', textinfo='none', 
               textfont=dict(size=20),
               rotation=BOTTOM_OF_THE_PYRAMID_ACCORDING_TO_NEWTON_LAWS,
               marker=dict(colors=colors, 
                           line=dict(color='#000000', width=1)))
iplot([trace], filename='styled_pie_chart')

We used `rotation` to align the base part of the pyramid to the ground. 

We used `textinfo` = `none` to remove percent or label text from the pie slices.


Let's use `Pie` here to show the proportion of athletes per sport, separately for Summer and Winter Olympics.

In [32]:
tmp = athlete_events_df.groupby(['Season'])['Sport'].value_counts()
df = pd.DataFrame(data={'Athlets': tmp.values}, index=tmp.index).reset_index()
df.head(3)

|  | Season | Sport | Athlets |
|---|---|---|---|
| 0 | Summer | Athletics | 38624 |
| 1 | Summer | Gymnastics | 26707 |
| 2 | Summer | Swimming | 23195 |


In [33]:
df_S = df[df['Season']=='Summer']

trace = go.Pie(labels=df_S['Sport'], 
               values=df_S['Athlets'],
               hoverinfo='label+value+percent', 
               textinfo='value+percent', 
               textfont=dict(size=8),
               rotation=180,
               marker=dict(colors=colors, 

                           line=dict(color='#000000', width=1)
                        )
            )

data = [trace]
layout = dict(title = "Number of athlets per sport (Summer Olympics)",
                  width=800,
                  height=1200,
              legend=dict(orientation="h")
             )
fig = dict(data=data,layout=layout)
iplot(fig, filename='styled_pie_chart')

In [34]:
# select only the Winter Olympics
df_W = df[df['Season']=='Winter']

trace = go.Pie(labels=df_W['Sport'], 
               values=df_W['Athlets'],
               hoverinfo='label+value+percent', 
               textinfo='value+percent', 
               textfont=dict(size=8),
               rotation=180,
               marker=dict(colors=colors, 

                           line=dict(color='#000000', width=1)
                        )
            )

data = [trace]
layout = dict(title = "Number of athlets per sport (Winter Olympics)",
                  width=800,
                  height=800,
              legend=dict(orientation="h")
             )
fig = dict(data=data,layout=layout)
iplot(fig, filename='styled_pie_chart')

<a href="#0"><font size="1" color="red">Go to top</font></a>


# <a id="5">Countries</a> 

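Let's first merge the `noc_regions_df` with the `athlete_events_df` dataset. The merge uses the common `NOC` column and, being an inner join by default, keeps only the rows whose NOC code appears in both datasets.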

In [35]:
olympics_df = athlete_events_df.merge(noc_regions_df)

In [36]:
print("All Olympics data -  rows:",olympics_df.shape[0]," columns:", olympics_df.shape[1])

All Olympics data -  rows: 270767  columns: 17
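The merged dataset has fewer rows than `athlete_events_df` (270767 versus 271116), which is what a default inner merge produces when some codes have no match. A minimal sketch on hypothetical toy frames shows how a left merge with `indicator=True` could be used to find the unmatched codes; `'XXX'` is a made-up code.

```python
import pandas as pd

# Hypothetical toy frames; 'XXX' is a made-up code with no region entry.
events = pd.DataFrame({'Name': ['A', 'B', 'C'], 'NOC': ['CHN', 'DEN', 'XXX']})
regions = pd.DataFrame({'NOC': ['CHN', 'DEN'], 'region': ['China', 'Denmark']})

# Default merge is an inner join on the shared column: unmatched rows vanish.
inner = events.merge(regions, on='NOC')

# A left join keeps every event row; `indicator=True` adds a `_merge` column
# that tells which rows found no partner in `regions`.
left = events.merge(regions, on='NOC', how='left', indicator=True)
unmatched = left.loc[left['_merge'] == 'left_only', 'NOC'].unique()

print(len(events), len(inner), len(left))
print(unmatched)
```

Applied to the real data, the same pattern would list the NOC codes responsible for the dropped rows.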


In [37]:
olympics_df.head(3)

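|  | ID | Name | Sex | Age | Height | Weight | Team | NOC | Games | Year | Season | City | Sport | Event | Medal | region | notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | A Dijiang | M | 24.0 | 180.0 | 80.0 | China | CHN | 1992 Summer | 1992 | Summer | Barcelona | Basketball | Basketball Men's Basketball | NaN | China | NaN |
| 1 | 2 | A Lamusi | M | 23.0 | 170.0 | 60.0 | China | CHN | 2012 Summer | 2012 | Summer | London | Judo | Judo Men's Extra-Lightweight | NaN | China | NaN |
| 2 | 602 | Abudoureheman | M | 22.0 | 182.0 | 75.0 | China | CHN | 2000 Summer | 2000 | Summer | Sydney | Boxing | Boxing Men's Middleweight | NaN | China | NaN |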


First, let's rename the `region` column to `Country`.   


Then, let's show how many editions each Country took part in.  

<a href="#0"><font size="1">Go to top</font></a>


## <a id="107">Choropleth</a>

We will use a `Choropleth` map to represent the countries.

We specify the following attributes:  
* locations - these are the countries;  
* locationmode - the mode used for specifying the locations; in our case, we will use the `country names` option; other options are `ISO-3` or `USA-states`; 
* z - the value displayed; 
* text - the text shown in the popup on hover;  
* colorscale - the color scale used for the areas on the map; 
* marker - the marker used for the border of map areas specified in locations;

These are only some of the important attributes of a `Choropleth` map.

For the layout, there is an attribute `geo` with the following parameters:  
* showframe - whether a frame is drawn around the map;  
* showcoastlines - whether coastlines are drawn;  
* showlakes - whether lakes are drawn;  
* projection - the map projection; there are multiple options, frequently used ones being `mercator`, `orthographic`, `natural earth` and `albers usa` (for maps of the USA).


In [38]:
olympics_df=olympics_df.rename(columns = {'region':'Country'})

In [39]:
tmp = olympics_df.groupby(['Country'])['Year'].nunique()
df = pd.DataFrame(data={'Editions': tmp.values}, index=tmp.index).reset_index()
df.head(2)

|  | Country | Editions |
|---|---|---|
| 0 | Afghanistan | 14 |
| 1 | Albania | 11 |
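Counting distinct values of `Year` counts a year with both Summer and Winter Games only once. If the intent is to count editions, the `Games` column (for example `1992 Summer`) could be used instead. The sketch below illustrates the difference on hypothetical toy data.

```python
import pandas as pd

# Hypothetical toy data: one country attending both Games held in 1992.
toy = pd.DataFrame({
    'Country': ['Norway', 'Norway', 'Norway'],
    'Year':    [1992, 1992, 1994],
    'Games':   ['1992 Summer', '1992 Winter', '1994 Winter'],
})

by_year = toy.groupby('Country')['Year'].nunique()    # distinct years
by_games = toy.groupby('Country')['Games'].nunique()  # distinct editions

print(by_year['Norway'], by_games['Norway'])
```

The two counts agree for countries that never attended two Games in the same year, and once the data is split by `Season` they agree for every country.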


In [40]:
trace = go.Choropleth(
            locations = df['Country'],
            locationmode='country names',
            z = df['Editions'],
            text = df['Country'],
            autocolorscale =False,
            reversescale = True,
            colorscale = 'rainbow',
            marker = dict(
                line = dict(
                    color = 'rgb(0,0,0)',
                    width = 0.5)
            ),
            colorbar = dict(
                title = 'Editions',
                tickprefix = '')
        )

data = [trace]
layout = go.Layout(
    title = 'Olympic countries',
    geo = dict(
        showframe = True,
        showlakes = False,
        showcoastlines = True,
        projection = dict(
            type = 'natural earth'
        )
    )
)

fig = dict( data=data, layout=layout )
iplot(fig)

Let's show separately the number of editions per country for Summer and Winter events. We will wrap the map-drawing code in a function so that it can be reused for both.

In [41]:
tmp = olympics_df.groupby(['Country', 'Season'])['Year'].nunique()
df = pd.DataFrame(data={'Editions': tmp.values}, index=tmp.index).reset_index()
df.head(2)

|  | Country | Season | Editions |
|---|---|---|---|
| 0 | Afghanistan | Summer | 14 |
| 1 | Albania | Summer | 8 |


In [42]:
dfS = df[df['Season']=='Summer']; dfW = df[df['Season']=='Winter']

def draw_map(dataset, title, colorscale, reversescale=False):
    trace = go.Choropleth(
                locations = dataset['Country'],
                locationmode='country names',
                z = dataset['Editions'],
                text = dataset['Country'],
                autocolorscale =False,
                reversescale = reversescale,
                colorscale = colorscale,
                marker = dict(
                    line = dict(
                        color = 'rgb(0,0,0)',
                        width = 0.5)
                ),
                colorbar = dict(
                    title = 'Editions',
                    tickprefix = '')
            )

    data = [trace]
    layout = go.Layout(
        title = title,
        geo = dict(
            showframe = True,
            showlakes = False,
            showcoastlines = True,
            projection = dict(
                type = 'orthographic'
            )
        )
    )
    fig = dict( data=data, layout=layout )
    iplot(fig)
    
draw_map(dfS, 'Olympic countries (Summer games)', "Reds")

In [43]:
draw_map(dfW, 'Olympic countries (Winter games)', "Blues", True)

Let's show the variation over time of the number of athletes for each country.

In [44]:
tmp = olympics_df.groupby(['Year','Sport'])['Country'].value_counts()
dataset = pd.DataFrame(data={'Athlets': tmp.values}, index=tmp.index).reset_index()
dataset.head()

Unnamed: 0,Year,Sport,Country,Athlets
0,1896,Athletics,Greece,36
1,1896,Athletics,USA,21
2,1896,Athletics,Germany,14
3,1896,Athletics,France,12
4,1896,Athletics,UK,7


<a href="#0"><font size="1" color="red">Go to top</font></a>


# <a id="6">Athletes</a> 

Let's show the distributions of the athletes' age, height and weight using a `distplot` chart.

We will group the data by sex.

<a href="#0"><font size="1" color="red">Go to top</font></a>


## <a id="108">create_distplot</a>   


Let's first show the height distribution for athletes, grouped by sex.

In [45]:
female_h = olympics_df[olympics_df['Sex']=='F']['Height'].dropna()
male_h = olympics_df[olympics_df['Sex']=='M']['Height'].dropna()

hist_data = [female_h, male_h]
group_labels = ['Female Height', 'Male Height']

fig = ff.create_distplot(hist_data, group_labels, show_hist=False, show_rug=False)
fig['layout'].update(title='Athletes Height distribution plot')
iplot(fig, filename='dist_only')
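
With `show_hist=False` and `show_rug=False`, only the density curve remains. By default `create_distplot` draws a kernel density estimate (`curve_type='kde'`). The sketch below is not the library's implementation; it only illustrates the idea in plain Python, assuming a Gaussian kernel and Scott's rule of thumb for the bandwidth. The sample heights are hypothetical.

```python
import math
import statistics

def gaussian_kde(samples, grid):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    n = len(samples)
    # Scott's rule of thumb for the bandwidth (an assumption of this sketch).
    bandwidth = statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]

# Hypothetical heights in cm, for illustration only.
heights = [160, 165, 170, 175, 180]
step = 0.5
grid = [100 + step * i for i in range(281)]   # 100 cm .. 240 cm
density = gaussian_kde(heights, grid)

# A density should integrate to roughly 1 over a wide enough grid.
area = sum(density) * step
print(round(area, 3))
```

Each observation contributes one small bell curve, and the plotted line is their normalized sum.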

Let's show the weight distribution for athletes, grouped by sex.

In [46]:
female_w = olympics_df[olympics_df['Sex']=='F']['Weight'].dropna()
male_w = olympics_df[olympics_df['Sex']=='M']['Weight'].dropna()

hist_data = [female_w, male_w]
group_labels = ['Female Weight', 'Male Weight']

fig = ff.create_distplot(hist_data, group_labels, show_hist=False, show_rug=False)
fig['layout'].update(title='Athletes Weight distribution plot')
iplot(fig, filename='dist_only')

Let's also show the age distribution for athletes, grouped by sex.

In [47]:
female_a = olympics_df[olympics_df['Sex']=='F']['Age'].dropna()
male_a = olympics_df[olympics_df['Sex']=='M']['Age'].dropna()

hist_data = [female_a, male_a]
group_labels = ['Female Age', 'Male Age']

fig = ff.create_distplot(hist_data, group_labels, show_hist=False, show_rug=False)
fig['layout'].update(title='Athletes Age distribution plot')
iplot(fig, filename='dist_only')

Let's plot, for each sport, the average height (x axis) against the average weight (y axis) of its athletes.  

We will use a scatter plot in which each sport's marker has an area proportional to the number of athlete entries for that sport.

In [48]:
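# Select several columns with a list (double brackets); a bare tuple is rejected by recent pandas versions.
tmp = olympics_df.groupby(['Sport'])[['Height', 'Weight']].agg('mean').dropna()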
df1 = pd.DataFrame(tmp).reset_index()
tmp2 = olympics_df.groupby(['Sport'])['ID'].count()
df2 = pd.DataFrame(tmp2).reset_index()
dataset = df1.merge(df2)
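
The two `groupby` calls followed by a `merge` can also be written as a single aggregation. The sketch below assumes pandas 0.25 or later (named aggregation) and uses a hypothetical `toy` frame with the same column names as `olympics_df`:

```python
import pandas as pd

# Hypothetical toy rows with the columns used above.
toy = pd.DataFrame({
    'ID':     [1, 2, 3, 4],
    'Sport':  ['Archery', 'Archery', 'Judo', 'Judo'],
    'Height': [170.0, 180.0, 175.0, None],
    'Weight': [60.0, 80.0, 90.0, None],
})

# One pass: mean height, mean weight and number of entries per sport.
summary = (toy.groupby('Sport')
              .agg(Height=('Height', 'mean'),
                   Weight=('Weight', 'mean'),
                   ID=('ID', 'count'))
              .dropna()
              .reset_index())
print(summary)
```

As in the cell above, the means ignore missing values, while `count` on `ID` counts every entry of the sport.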

Let's define the hover text.

In [49]:
hover_text = []
for index, row in dataset.iterrows():
    hover_text.append(('Sport: {}<br>'+
                      'Number of athletes: {}<br>'+
                      'Mean Height: {}<br>'+
                      'Mean Weight: {}<br>').format(row['Sport'],
                                            row['ID'],
                                            round(row['Height'],2),
                                            round(row['Weight'],2)))
dataset['hover_text'] = hover_text
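
The same hover-text loop is repeated several times in this notebook with slightly different fields. A small helper could factor it out; `build_hover_text` and its `fields` argument are hypothetical names introduced for this sketch only:

```python
def build_hover_text(record, fields, digits=2):
    """Return an HTML hover string with one 'label: value<br>' part per field.

    `fields` is a list of (label, key) pairs; float values are rounded.
    """
    parts = []
    for label, key in fields:
        value = record[key]
        if isinstance(value, float):
            value = round(value, digits)
        parts.append('{}: {}<br>'.format(label, value))
    return ''.join(parts)

fields = [('Sport', 'Sport'), ('Number of athletes', 'ID'),
          ('Mean Height', 'Height'), ('Mean Weight', 'Weight')]

# Hypothetical record; a DataFrame row can be passed the same way.
row = {'Sport': 'Archery', 'ID': 2334, 'Height': 173.4567, 'Weight': 70.5}
text = build_hover_text(row, fields)
print(text)
```

With a DataFrame, something like `dataset.apply(lambda r: build_hover_text(r, fields), axis=1)` should produce the whole column in one call.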

Let's now create the bubble scatter plot.

In [50]:
data = []
for sport in dataset['Sport']:
    ds = dataset[dataset['Sport']==sport]
    trace = go.Scatter(
        x = ds['Height'],
        y = ds['Weight'],
        name = sport,
        marker=dict(
            symbol='circle',
            sizemode='area',
            sizeref=10,
            size=ds['ID'],
            line=dict(
                width=2
            ),),
        text = ds['hover_text']
    )
    data.append(trace)
                         
layout = go.Layout(
    title='Athletes height and weight mean - grouped by sport',
    xaxis=dict(
        title='Height [cm]',
        gridcolor='rgb(128, 128, 128)',
        zerolinewidth=1,
        ticklen=1,
        gridwidth=0.5,
    ),
    yaxis=dict(
        title='Weight [kg]',
        gridcolor='rgb(128, 128, 128)',
        zerolinewidth=1,
        ticklen=1,
        gridwidth=0.5,
    ),
    paper_bgcolor='rgb(255,255,255)',
    plot_bgcolor='rgb(254, 254, 254)',
    showlegend=False,
)


fig = dict(data = data, layout = layout)

iplot(fig, filename='athlets_body_measures')
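
The value `sizeref=10` used above was picked by hand. Plotly's bubble chart documentation suggests deriving it from the data when `sizemode='area'`, so that the largest bubble gets a chosen diameter in pixels. A minimal sketch of that rule follows; the helper name `area_sizeref` is hypothetical, and the counts are sample values only:

```python
def area_sizeref(sizes, max_marker_px=40):
    """Return a sizeref that makes the largest bubble about `max_marker_px`
    pixels wide when marker.sizemode is 'area'."""
    return 2.0 * max(sizes) / (max_marker_px ** 2)

# Sample entry counts, for illustration only.
counts = [103, 360, 378, 5431]
sizeref = area_sizeref(counts, max_marker_px=40)
print(sizeref)
```

The result would then be passed as `sizeref=sizeref` in the `marker` dictionary instead of the fixed constant.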
                         

<a href="#0"><font size="1">Go to top</font></a>


## <a id="109">Slider (animation)</a>


Let's represent the athletes' body measurements plot, grouped not only by `Sport` but also by `Year`. Each `Year` becomes one slider step, and the evolution over time is shown as an animation. Play and Pause buttons are also added to the chart, to allow easy browsing through the frames.


In [51]:
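# Select several columns with a list (double brackets); a bare tuple is rejected by recent pandas versions.
tmp = olympics_df.groupby(['Sport', 'Year'])[['Height', 'Weight']].agg('mean').dropna()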
df1 = pd.DataFrame(tmp).reset_index()
tmp2 = olympics_df.groupby(['Sport', 'Year'])['ID'].count()
df2 = pd.DataFrame(tmp2).reset_index()
dataset = df1.merge(df2)

Let's glimpse the resulting dataset.

In [52]:
dataset.head(3)

Unnamed: 0,Sport,Year,Height,Weight,ID
0,Alpine Skiing,1936,169.25,61.0,103
1,Alpine Skiing,1948,170.116279,64.666667,360
2,Alpine Skiing,1952,173.387755,66.93617,378


We create the hover text, by adding the `Year` as well in this case.

In [53]:
hover_text = []
for index, row in dataset.iterrows():
    hover_text.append(('Year: {}<br>'+
                       'Sport: {}<br>'+
                      'Number of athletes: {}<br>'+
                      'Mean Height: {}<br>'+
                      'Mean Weight: {}<br>').format(row['Year'], 
                                            row['Sport'],
                                            row['ID'],
                                            round(row['Height'],2),
                                            round(row['Weight'],2)))
dataset['hover_text'] = hover_text

We now create the animation.

In [54]:
years = (olympics_df.groupby(['Year'])['Year'].nunique()).index
sports = (olympics_df.groupby(['Sport'])['Sport'].nunique()).index
# make figure
figure = {
    'data': [],
    'layout': {},
    'frames': []
}

# fill in most of layout
figure['layout']['xaxis'] = {'range': [140, 200], 'title': 'Height'}
figure['layout']['yaxis'] = {'range': [20, 200],'title': 'Weight'}
figure['layout']['hovermode'] = 'closest'
figure['layout']['showlegend'] = False
# The slider is built in `sliders_dict` below and attached to the layout
# just before plotting, so no placeholder is needed here.

figure['layout']['updatemenus'] = [
    {
        'buttons': [
            {
                'args': [None, {'frame': {'duration': 500, 'redraw': False},
                         'fromcurrent': True, 'transition': {'duration': 300, 'easing': 'quadratic-in-out'}}],
                'label': 'Play',
                'method': 'animate'
            },
            {
                'args': [[None], {'frame': {'duration': 0, 'redraw': False}, 'mode': 'immediate',
                'transition': {'duration': 0}}],
                'label': 'Pause',
                'method': 'animate'
            }
        ],
        'direction': 'left',
        'pad': {'r': 10, 't': 87},
        'showactive': False,
        'type': 'buttons',
        'x': 0.1,
        'xanchor': 'right',
        'y': 0,
        'yanchor': 'top'
    }
]
sliders_dict = {
    'active': 0,
    'yanchor': 'top',
    'xanchor': 'left',
    'currentvalue': {
        'font': {'size': 20},
        'prefix': 'Year:',
        'visible': True,
        'xanchor': 'right'
    },
    'transition': {'duration': 300, 'easing': 'cubic-in-out'},
    'pad': {'b': 10, 't': 50},
    'len': 0.9,
    'x': 0.1,
    'y': 0,
    'steps': []
}
# make data
year = 1896
for sport in sports:
    dataset_by_year = dataset[dataset['Year'] == year]
    dataset_by_year_and_season = dataset_by_year[dataset_by_year['Sport'] == sport]

    data_dict = {
        'x': list(dataset_by_year_and_season['Height']),
        'y': list(dataset_by_year_and_season['Weight']),
        'mode': 'markers',
        'text': list(dataset_by_year_and_season['hover_text']),
        'marker': {
            'sizemode': 'area',
            'sizeref': 1,
            'size': list(dataset_by_year_and_season['ID'])
        },
        'name': sport
    }
    figure['data'].append(data_dict)
# make frames
for year in years:
    frame = {'data': [], 'name': str(year)}
    for sport in sports:
        dataset_by_year = dataset[dataset['Year'] == int(year)]
        dataset_by_year_and_season = dataset_by_year[dataset_by_year['Sport'] == sport]

        data_dict = {
            'x': list(dataset_by_year_and_season['Height']),
            'y': list(dataset_by_year_and_season['Weight']),
            'mode': 'markers',
            'text': list(dataset_by_year_and_season['hover_text']),
            'marker': {
                'sizemode': 'area',
                'sizeref': 1,
                'size':  list(dataset_by_year_and_season['ID'])
            },
            'name': sport
        }
        frame['data'].append(data_dict)

    figure['frames'].append(frame)
    # Frame names are strings, so the slider step refers to its frame by str(year).
    slider_step = {'args': [
        [str(year)],
        {'frame': {'duration': 300, 'redraw': False},
         'mode': 'immediate',
         'transition': {'duration': 300}}
     ],
     'label': str(year),
     'method': 'animate'}
    sliders_dict['steps'].append(slider_step)
figure['layout']['sliders'] = [sliders_dict]
iplot(figure)
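
Stripped of its styling options, the animation above has three parts: the initial `data`, one named entry in `frames` per year, and a slider whose steps refer to the frames by name. The sketch below builds such a figure dictionary in plain Python; `build_animation` and its input format are hypothetical, chosen only to keep the example short:

```python
def build_animation(points_by_year):
    """Build a Plotly figure dict with one frame and one slider step per year.

    `points_by_year` maps an integer year to an (x_values, y_values) pair.
    """
    names = [str(year) for year in sorted(points_by_year)]

    def trace(year_name):
        x, y = points_by_year[int(year_name)]
        return {'type': 'scatter', 'mode': 'markers',
                'x': list(x), 'y': list(y)}

    frames = [{'name': name, 'data': [trace(name)]} for name in names]
    steps = [{'method': 'animate', 'label': name,
              'args': [[name], {'mode': 'immediate',
                                'frame': {'duration': 300, 'redraw': False},
                                'transition': {'duration': 300}}]}
             for name in names]
    play = {'label': 'Play', 'method': 'animate',
            'args': [None, {'frame': {'duration': 500, 'redraw': False},
                            'fromcurrent': True}]}
    return {
        'data': [trace(names[0])],      # what is shown before playback starts
        'frames': frames,
        'layout': {'sliders': [{'steps': steps}],
                   'updatemenus': [{'type': 'buttons', 'buttons': [play]}]},
    }

# Hypothetical points: one (height, weight) marker per year.
figure_sketch = build_animation({1896: ([170], [65]), 1900: ([172], [68])})
print([frame['name'] for frame in figure_sketch['frames']])
```

In a notebook, `iplot(figure_sketch)` would render it. The key point is that each slider step's first argument is a list containing the name of the frame it should show.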


## Athletes body measurements grouped by Sex

Let's do, as an exercise, a similar plot, grouping the athletes by `Sex` instead of `Sport`.

In [55]:
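# Select several columns with a list (double brackets); a bare tuple is rejected by recent pandas versions.
tmp = olympics_df.groupby(['Sex'])[['Height', 'Weight']].agg('mean').dropna()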
df1 = pd.DataFrame(tmp).reset_index()
tmp2 = olympics_df.groupby(['Sex'])['ID'].count()
df2 = pd.DataFrame(tmp2).reset_index()
dataset = df1.merge(df2)

In [56]:
hover_text = []
for index, row in dataset.iterrows():
    hover_text.append(('Sex: {}<br>'+
                      'Number of athletes: {}<br>'+
                      'Mean Height: {}<br>'+
                      'Mean Weight: {}<br>').format(row['Sex'],
                                            row['ID'],
                                            round(row['Height'],2),
                                            round(row['Weight'],2)))
dataset['hover_text'] = hover_text

In [57]:
data = []
for sex in dataset['Sex']:
    ds = dataset[dataset['Sex']==sex]
    trace = go.Scatter(
        x = ds['Height'],
        y = ds['Weight'],
        name = sex,
        marker=dict(
            symbol='circle',
            sizemode='area',
            sizeref=10,
            size=ds['ID'],
            line=dict(
                width=2
            ),),
        text = ds['hover_text']
    )
    data.append(trace)
                         
layout = go.Layout(
    title='Athletes height and weight mean - grouped by Sex',
    xaxis=dict(
        title='Height [cm]',
        gridcolor='rgb(128, 128, 128)',
        zerolinewidth=1,
        ticklen=1,
        gridwidth=0.5,
    ),
    yaxis=dict(
        title='Weight [kg]',
        gridcolor='rgb(128, 128, 128)',
        zerolinewidth=1,
        ticklen=1,
        gridwidth=0.5,
    ),
    paper_bgcolor='rgb(255,255,255)',
    plot_bgcolor='rgb(254, 254, 254)',
    showlegend=False,
)


fig = dict(data = data, layout = layout)

iplot(fig, filename='athlets_body_measures2')
                         

## Time variation of athletes body measurements, grouped by Sex

In [58]:
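# Select several columns with a list (double brackets); a bare tuple is rejected by recent pandas versions.
tmp = olympics_df.groupby(['Sex', 'Year'])[['Height', 'Weight']].agg('mean').dropna()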
df1 = pd.DataFrame(tmp).reset_index()
tmp2 = olympics_df.groupby(['Sex', 'Year'])['ID'].count()
df2 = pd.DataFrame(tmp2).reset_index()
dataset = df1.merge(df2)

In [59]:
hover_text = []
for index, row in dataset.iterrows():
    hover_text.append(('Year: {}<br>'+
                       'Sex: {}<br>'+
                      'Number of athletes: {}<br>'+
                      'Mean Height: {}<br>'+
                      'Mean Weight: {}<br>').format(row['Year'], 
                                            row['Sex'],
                                            row['ID'],
                                            round(row['Height'],2),
                                            round(row['Weight'],2)))
dataset['hover_text'] = hover_text

In [60]:
years = (olympics_df.groupby(['Year'])['Year'].nunique()).index
sexes = (olympics_df.groupby(['Sex'])['Sex'].nunique()).index
# make figure
figure = {
    'data': [],
    'layout': {},
    'frames': []
}

# fill in most of layout
figure['layout']['xaxis'] = {'range': [100, 200], 'title': 'Height'}
figure['layout']['yaxis'] = {'range': [20, 200],'title': 'Weight'}
figure['layout']['hovermode'] = 'closest'
figure['layout']['showlegend'] = False
# The slider is built in `sliders_dict` below and attached to the layout
# just before plotting, so no placeholder is needed here.

figure['layout']['updatemenus'] = [
    {
        'buttons': [
            {
                'args': [None, {'frame': {'duration': 500, 'redraw': False},
                         'fromcurrent': True, 'transition': {'duration': 300, 'easing': 'quadratic-in-out'}}],
                'label': 'Play',
                'method': 'animate'
            },
            {
                'args': [[None], {'frame': {'duration': 0, 'redraw': False}, 'mode': 'immediate',
                'transition': {'duration': 0}}],
                'label': 'Pause',
                'method': 'animate'
            }
        ],
        'direction': 'left',
        'pad': {'r': 10, 't': 87},
        'showactive': False,
        'type': 'buttons',
        'x': 0.1,
        'xanchor': 'right',
        'y': 0,
        'yanchor': 'top'
    }
]
sliders_dict = {
    'active': 0,
    'yanchor': 'top',
    'xanchor': 'left',
    'currentvalue': {
        'font': {'size': 20},
        'prefix': 'Year:',
        'visible': True,
        'xanchor': 'right'
    },
    'transition': {'duration': 300, 'easing': 'cubic-in-out'},
    'pad': {'b': 10, 't': 50},
    'len': 0.9,
    'x': 0.1,
    'y': 0,
    'steps': []
}
# make data
year = 1896
for sex in sexes:
    dataset_by_year = dataset[dataset['Year'] == year]
    dataset_by_year_and_season = dataset_by_year[dataset_by_year['Sex'] == sex]

    data_dict = {
        'x': list(dataset_by_year_and_season['Height']),
        'y': list(dataset_by_year_and_season['Weight']),
        'mode': 'markers',
        'text': list(dataset_by_year_and_season['hover_text']),
        'marker': {
            'sizemode': 'area',
            'sizeref': 1,
            'size': list(dataset_by_year_and_season['ID'])
        },
        'name': sex
    }
    figure['data'].append(data_dict)
# make frames
for year in years:
    frame = {'data': [], 'name': str(year)}
    for sex in sexes:
        dataset_by_year = dataset[dataset['Year'] == int(year)]
        dataset_by_year_and_season = dataset_by_year[dataset_by_year['Sex'] == sex]

        data_dict = {
            'x': list(dataset_by_year_and_season['Height']),
            'y': list(dataset_by_year_and_season['Weight']),
            'mode': 'markers',
            'text': list(dataset_by_year_and_season['hover_text']),
            'marker': {
                'sizemode': 'area',
                'sizeref': 1,
                'size':  list(dataset_by_year_and_season['ID'])
            },
            'name': sex
        }
        frame['data'].append(data_dict)

    figure['frames'].append(frame)
    # Frame names are strings, so the slider step refers to its frame by str(year).
    slider_step = {'args': [
        [str(year)],
        {'frame': {'duration': 300, 'redraw': False},
         'mode': 'immediate',
         'transition': {'duration': 300}}
     ],
     'label': str(year),
     'method': 'animate'}
    sliders_dict['steps'].append(slider_step)
figure['layout']['sliders'] = [sliders_dict]
iplot(figure)

## Athletes body measurements, grouped by Sex and Sport

Let's now group by both criteria and create two graphs, one per sex. We also compute the mean `Age` per group and make the bubble size grow with it (the marker `size` is set to the cube of the mean age).

In [61]:
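# Select several columns with a list (double brackets); a bare tuple is rejected by recent pandas versions.
tmp = olympics_df.groupby(['Sport', 'Sex'])[['Height', 'Weight', 'Age']].agg('mean').dropna()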
df1 = pd.DataFrame(tmp).reset_index()
tmp2 = olympics_df.groupby(['Sport', 'Sex'])['ID'].count()
df2 = pd.DataFrame(tmp2).reset_index()
dataset = df1.merge(df2)

In [62]:
dataset.head()

Unnamed: 0,Sport,Sex,Height,Weight,Age,ID
0,Alpine Skiing,F,167.221001,62.640307,22.334609,3398
1,Alpine Skiing,M,177.891374,78.626035,23.758266,5431
2,Archery,F,167.166483,62.013575,26.508458,1015
3,Archery,M,178.477842,77.066866,29.083267,1319
4,Art Competitions,M,174.896552,75.290909,46.062816,3201


In [63]:
hover_text = []
for index, row in dataset.iterrows():
    hover_text.append(('Sex: {}<br>'+
                       'Sport: {}<br>'+
                       'Number of athletes: {}<br>'+
                       'Mean Age: {}<br>'+
                       'Mean Height: {}<br>'+
                       'Mean Weight: {}<br>').format(row['Sex'],
                                            row['Sport'],
                                            row['ID'],
                                            round(row['Age'],2), 
                                            round(row['Height'],2),
                                            round(row['Weight'],2)))
dataset['hover_text'] = hover_text

In [64]:

def plot_bubble_chart(dataset,title):
    data = []
    for sport in dataset['Sport']:
        ds = dataset[dataset['Sport']==sport]
        trace = go.Scatter(
            x = ds['Height'],
            y = ds['Weight'],
            name = sport,
            marker=dict(
                symbol='circle',
                sizemode='area',
                sizeref=50,
                size=np.power(ds['Age'],3),
                line=dict(
                    width=2
                ),),
            text = ds['hover_text']
        )
        data.append(trace)

    layout = go.Layout(
        title= title,
        xaxis=dict(
            title='Height [cm]',
            gridcolor='rgb(128, 128, 128)',
            zerolinewidth=1,
            ticklen=1,
            gridwidth=0.5,
            range=[150,200]
        ),
        yaxis=dict(
            title='Weight [kg]',
            gridcolor='rgb(128, 128, 128)',
            zerolinewidth=1,
            ticklen=1,
            gridwidth=0.5,
            range=[45,100]
        ),
        paper_bgcolor='rgb(255,255,255)',
        plot_bgcolor='rgb(254, 254, 254)',
        showlegend=False,
    )
    fig = dict(data = data, layout = layout)
    iplot(fig, filename='athlets_body_measures')
    


In [65]:
dF = dataset[dataset['Sex']=='F']
plot_bubble_chart(dF,'Female athletes height and weight mean - grouped by sport')

In [66]:
dM = dataset[dataset['Sex']=='M']
plot_bubble_chart(dM,'Male athletes height and weight mean - grouped by sport')
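
Why cube the age? With `sizemode='area'`, the marker diameter grows with the square root of `size`, so mean ages between roughly 22 and 46 would give bubbles of nearly the same width. The sketch below compares the diameter ratio with and without the cube; `relative_diameter` is a hypothetical helper, and the two ages are taken from the table above:

```python
def relative_diameter(size_a, size_b):
    """Ratio of marker diameters for two size values under sizemode='area'.

    The diameter scales with sqrt(size), so sizeref cancels out of the ratio.
    """
    return (size_a / size_b) ** 0.5

# Mean ages from the table above: Art Competitions (M) and Alpine Skiing (F).
age_old, age_young = 46.06, 22.33

linear = relative_diameter(age_old, age_young)
cubed = relative_diameter(age_old ** 3, age_young ** 3)
print(round(linear, 2), round(cubed, 2))
```

Using the raw age, the older group's bubble is only about 1.4 times wider. Using the cubed age it is almost 3 times wider, which makes age differences much easier to see.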

<a href="#0"><font size="1" color="red">Go to top</font></a>


# <a id="7">Medals</a> 

Let's check which countries have the most medals.

In [67]:
tmp = olympics_df.groupby(['Country', 'Medal'])['ID'].agg('count').dropna()
df = pd.DataFrame(tmp).reset_index()

In [68]:
dfG = df[df['Medal']=='Gold']
dfS = df[df['Medal']=='Silver']
dfB = df[df['Medal']=='Bronze']

def draw_map(dataset, title, colorscale):
    trace = go.Choropleth(
                locations = dataset['Country'],
                locationmode='country names',
                z = dataset['ID'],
                text = dataset['Country'],
                autocolorscale =False,
                reversescale = True,
                colorscale = colorscale,
                marker = dict(
                    line = dict(
                        color = 'rgb(0,0,0)',
                        width = 0.5)
                ),
                colorbar = dict(
                    title = 'Medals',
                    tickprefix = '')
            )
    data = [trace]
    layout = go.Layout(
        title = title,
        geo = dict(
            showframe = True,
            showlakes = False,
            showcoastlines = True,
            projection = dict(
                type = 'natural earth'
            )
        )
    )
    fig = dict( data=data, layout=layout )
    iplot(fig)

Let's plot the countries with Gold, Silver and Bronze medals.

In [69]:
draw_map(dfG, "Countries with Gold Medals",'Greens')

In [70]:
draw_map(dfS, "Countries with Silver Medals",'Greys')

In [71]:
draw_map(dfB, "Countries with Bronze Medals",'Reds')

Let's show the number of medals (Gold, Silver, Bronze) per Olympic edition.

In [72]:
tmp = olympics_df.groupby(['Year', 'City','Season', 'Medal'])['ID'].agg('count').dropna()
df = pd.DataFrame(tmp).reset_index()
dfG = df[df['Medal']=='Gold']
dfS = df[df['Medal']=='Silver']
dfB = df[df['Medal']=='Bronze']

In [73]:
dfG.head()

Unnamed: 0,Year,City,Season,Medal,ID
1,1896,Athina,Summer,Gold,62
4,1900,Paris,Summer,Gold,201
7,1904,St. Louis,Summer,Gold,173
10,1906,Athina,Summer,Gold,157
13,1908,London,Summer,Gold,294


In [74]:

traceG = go.Bar(
    x = dfG['Year'],y = dfG['ID'],
    name="Gold",
     marker=dict(
                color='gold',
                line=dict(
                    color='black',
                    width=1),
                opacity=0.5,
            ),
    text = dfG['City']+ " (" + dfG['Season'] + ")",
)
traceS = go.Bar(
    x = dfS['Year'],y = dfS['ID'],
    name="Silver",
    marker=dict(
                color='Grey',
                line=dict(
                    color='black',
                    width=1),
                opacity=0.5,
            ),
    text=dfS['City']+ " (" + dfS['Season'] + ")",
)

traceB = go.Bar(
    x = dfB['Year'],y = dfB['ID'],
    name="Bronze",
    marker=dict(
                color='Brown',
                line=dict(
                    color='black',
                    width=1),
                opacity=0.5,
            ),
    text=dfB['City']+ " (" + dfB['Season'] + ")",
)

data = [traceG, traceS, traceB]
layout = dict(title = 'Medals per Olympic edition',
          xaxis = dict(title = 'Year', showticklabels=True), 
          yaxis = dict(title = 'Number of medals'),
          hovermode = 'closest',
          barmode='stack'
         )
fig = dict(data=data, layout=layout)
iplot(fig, filename='events-sports1')
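
The three bar traces above differ only in the medal name and colour, so they could also be generated in a loop. The sketch below builds the traces as plain dictionaries, in the same style as `fig = dict(data=data, layout=layout)`; `medal_bar_traces`, `MEDAL_COLORS` and the `label` key are hypothetical names for this example:

```python
MEDAL_COLORS = {'Gold': 'gold', 'Silver': 'grey', 'Bronze': 'brown'}

def medal_bar_traces(rows):
    """Build one bar trace dict per medal type.

    `rows` is a list of dicts with keys 'Year', 'Medal', 'ID' and 'label'.
    """
    traces = []
    for medal, color in MEDAL_COLORS.items():
        subset = [r for r in rows if r['Medal'] == medal]
        traces.append({
            'type': 'bar',
            'name': medal,
            'x': [r['Year'] for r in subset],
            'y': [r['ID'] for r in subset],
            'text': [r['label'] for r in subset],
            'marker': {'color': color, 'opacity': 0.5,
                       'line': {'color': 'black', 'width': 1}},
        })
    return traces

# Two Gold rows copied from the dfG output above; other medals left empty here.
rows = [
    {'Year': 1896, 'Medal': 'Gold', 'ID': 62, 'label': 'Athina (Summer)'},
    {'Year': 1900, 'Medal': 'Gold', 'ID': 201, 'label': 'Paris (Summer)'},
]
traces = medal_bar_traces(rows)
fig_sketch = {'data': traces, 'layout': {'barmode': 'stack'}}
print([t['name'] for t in traces])
```

Passing `fig_sketch` to `iplot` in a notebook would render the stacked bars. Adding a fourth category would then only require one more entry in `MEDAL_COLORS`.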

Let's also show the number of medals per sport.  
We will show separately the number of medals for **Gold**, **Silver** and **Bronze**.

In [75]:
tmp = olympics_df.groupby(['Sport', 'Medal'])['ID'].agg('count').dropna()
df = pd.DataFrame(tmp).reset_index()
dfG = df[df['Medal']=='Gold']
dfS = df[df['Medal']=='Silver']
dfB = df[df['Medal']=='Bronze']

In [76]:
traceG = go.Bar(
    x = dfG['Sport'],y = dfG['ID'],
    name="Gold",
     marker=dict(
                color='gold',
                line=dict(
                    color='black',
                    width=1),
                opacity=0.5,
            ),
    text = dfG['Sport'],
    #orientation = 'h'
)
traceS = go.Bar(
    x = dfS['Sport'],y = dfS['ID'],
    name="Silver",
    marker=dict(
                color='Grey',
                line=dict(
                    color='black',
                    width=1),
                opacity=0.5,
            ),
    text=dfS['Sport'],
    #orientation = 'h'
)

traceB = go.Bar(
    x = dfB['Sport'],y = dfB['ID'],
    name="Bronze",
    marker=dict(
                color='Brown',
                line=dict(
                    color='black',
                    width=1),
                opacity=0.5,
            ),
    text=dfB['Sport'],
   # orientation = 'h'
)

data = [traceG, traceS, traceB]
layout = dict(title = 'Medals per sport',
          xaxis = dict(title = 'Sport', showticklabels=True, tickangle=45,
            tickfont=dict(
                size=8,
                color='black'),), 
          yaxis = dict(title = 'Number of medals'),
          hovermode = 'closest',
          barmode='stack',
          showlegend=False,
          width=900,
          height=600,
         )
fig = dict(data=data, layout=layout)
iplot(fig, filename='events-sports1')

<a href="#0"><font size="1" color="red">Go to top</font></a>


# <a id="8">References</a> 

[1] Plotly cheatsheet, https://images.plot.ly/plotly-documentation/images/python_cheat_sheet.pdf  
[2] Scatterplots with Plotly, https://plot.ly/python/line-and-scatter/  
[3] Bar charts with Plotly, https://plot.ly/python/bar-charts/   
[4] Box charts with Plotly, https://plot.ly/python/box-plots/  
[5] Plotly maps, https://plot.ly/python/choropleth-maps/  
[6] Plotly axes, https://plot.ly/python/axes/  
[7] Plotly animations, https://plot.ly/python/animations/   
[8] Plotly reference, https://plot.ly/python/reference/     
[9] Kristin Henry, In Defense of Pie Charts, and Why You Shouldn’t Use Them, https://medium.com/@KristinHenry/in-defense-of-pie-charts-and-why-you-shouldnt-use-them-df2e8ccb5f76    
[10] Sven Hamberg, Why you shouldn’t use pie charts - Tips for better data visualization, https://blog.funnel.io/why-we-dont-use-pie-charts-and-some-tips-on-better-data-visualizations


# <a id="9">Known issues</a>  

Here I describe the known issues:

* The animation does not correctly refresh the circle markers of categories that are no longer present after a certain step (e.g. sports that were discontinued in the Olympics still appear in later frames). The issue can be corrected manually by refreshing the plot with the <font color="red">back to home</font> button in the Plotly controls menu.


<a href="#0"><font size="1" color="red">Go to top</font></a>