# Bokeh Demonstration - Analyzing the Dominance of The Men's Tennis "Big Three"

## Visualization Library

The visualization library I will be using for this demonstration is Bokeh.
<br><br>
Bokeh uses an underlying JavaScript architecture to let users make interactive plots using Python. It is a quick, user-friendly way for developers to add interactive plots to various applications.
<br><br>
I thought bokeh would be a fun and interesting library to use because it offers much of the same basic functionality as matplotlib and can produce the same types of charts, but it adds interactive features that make the charts more user friendly. For example, a toolbar beside each bokeh plot lets a user pan the plot or zoom in and out. You can also add widgets and other interactive elements, such as tabs to switch between plots or a drop-down menu.
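
As a rough illustration of the toolbar, here is a minimal sketch that picks the tools explicitly and places the toolbar on the right. The data points are made up, and the tool selection is just one reasonable choice, not something the rest of this demonstration depends on.

```python
from bokeh.plotting import figure

# Choose which tools appear in the toolbar and where the toolbar sits.
p = figure(
    height=300,
    title='Toolbar demo',
    tools='pan,wheel_zoom,box_zoom,reset,save',
    toolbar_location='right',
)

# Made-up points, only so the tools have something to act on.
p.line([1, 2, 3, 4], [3, 1, 4, 2], line_width=2)

# In a notebook, call show(p) after output_notebook() to render the plot.
```

If no `tools` argument is given, bokeh falls back to its default set of tools, which is what the charts in this demonstration use.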
<br><br>
Bokeh also integrates well with Jupyter; it only requires importing and calling `output_notebook` in order to view bokeh plots in the Jupyter environment.
<br><br>
To install bokeh:
<br>
Pip Installation:  `pip install bokeh`
<br>
Conda Installation:  `conda install bokeh`




## Demonstration

For this demonstration, I thought it would be fun to look at data from my favorite sport, tennis. I have been playing and following tennis since I was 8 years old. The era of men's tennis that I grew up watching has come to be known as the era of Roger Federer, Rafael Nadal, and Novak Djokovic, also referred to as the "Big Three". For nearly 20 years they have dominated the sport, breaking records and firmly cementing themselves as the three greatest players ever to play men's tennis.
<br><br>
I thought it would be interesting to use bokeh to visualize the Big Three's statistics from the four Grand Slam tournaments: the Australian Open, French Open, Wimbledon, and the U.S. Open.

In [1]:
import pandas as pd
import numpy as np
from bokeh.plotting import figure, show
from bokeh.io import output_notebook             #Import this to show bokeh visualizations in Jupyter Notebook

In [2]:
output_notebook()                                #Run this function to show bokeh visualizations in Jupyter Notebook

The source of the data set I will be using is: https://www.kaggle.com/datasets/wonduk/mens-tennis-grand-slam-winner-dataset

In [3]:
df = pd.read_csv('Mens_Tennis_Grand_Slam_Winner.csv', header = 0)

To start, let's visualize just how dominant the Big Three have been compared to their competition. To do this, I will begin by making a bar chart showing the total number of Grand Slam wins for the Big Three, as well as for Andy Murray and Stan Wawrinka, the only two players able to win multiple Grand Slams during the Big Three era.

In [4]:

players = ['Roger Federer', 'Rafael Nadal', 'Novak Djokovic', 'Andy Murray', 'Stan Wawrinka']
nadal_wins = df[df['WINNER'] =='Rafael Nadal']     

federer_wins = df[df['WINNER'] =='Roger Federer']  

djokovic_wins = df[df['WINNER'] =='Novak Djokovic'] 

murray_wins = df[df['WINNER'] =='Andy Murray']  

wawrinka_wins = df[df['WINNER'] =='Stan Wawrinka']    

big_five = pd.DataFrame({'Player': players, 'Grand Slam Wins': [len(federer_wins), len(nadal_wins), len(djokovic_wins),
                                                                len(murray_wins), len(wawrinka_wins)]})

p = figure(x_range = players, height = 400, title = 'Top 5 Grand Slam Winners 2003-2022')
p.vbar(x = players, top = big_five['Grand Slam Wins'], width = 0.9)

show(p)
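
As a side note, the five separate filters above can be collapsed into a single count. The sketch below uses a small hand-typed `toy` DataFrame as a stand-in for the real file (its rows are placeholders, not actual title counts), and assumes only that the data has a `WINNER` column as above.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle file; only the WINNER column matters here.
toy = pd.DataFrame({
    'WINNER': ['Roger Federer', 'Rafael Nadal', 'Roger Federer',
               'Novak Djokovic', 'Andy Murray', 'Rafael Nadal', 'Roger Federer']
})

players = ['Roger Federer', 'Rafael Nadal', 'Novak Djokovic', 'Andy Murray', 'Stan Wawrinka']

# Count rows per winner, then reindex so the result follows the order of
# `players` and anyone absent from the data gets 0 instead of being dropped.
wins = toy['WINNER'].value_counts().reindex(players, fill_value=0)

print(wins.tolist())
```

With the real data, `wins.tolist()` could be passed as `top` to `p.vbar` in place of the `big_five` column.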


As we can see, the Big Three have significantly more Grand Slam titles than Murray and Wawrinka.
This bar chart allows us to visualize just how far from the pack the "Big Three" have separated themselves in terms
of Grand Slam titles, and gives us a nice visual depiction of this dominance.
<br><br>
Now let's take a deeper look at the Grand Slam wins for Federer, Nadal and Djokovic specifically. I will do this by making a bar chart for each player showing the number of Grand Slams he won every year from 2003-2022. I will also add tabs using the tab widget feature of bokeh, so that I
can switch back and forth between the three bar charts.

In [5]:
#First let's filter the data and group by the players we are interested in and how many wins they have per year
df_big_three = df[(df['WINNER'] == 'Rafael Nadal') | (df['WINNER'] == 'Roger Federer') | 
                  (df['WINNER'] == 'Novak Djokovic')]
grouped = pd.DataFrame(df_big_three.groupby(['YEAR', 'WINNER']).TOURNAMENT.count()).reset_index()
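
Note that `grouped` only has rows for (year, player) pairs with at least one title, so years without a win are simply absent. One possible way to get explicit zeros is to pivot the counts into a year-by-player table. This is only a sketch on a hand-typed subset of results (the `toy` DataFrame and the 2008-2011 range are chosen for illustration), not the approach used in the cells that follow.

```python
import pandas as pd

# Hand-typed subset with the same column names as the real data.
toy = pd.DataFrame({
    'YEAR': [2008, 2008, 2008, 2009, 2009, 2010],
    'TOURNAMENT': ['Australian Open', 'French Open', 'Wimbledon',
                   'Australian Open', 'French Open', 'French Open'],
    'WINNER': ['Novak Djokovic', 'Rafael Nadal', 'Rafael Nadal',
               'Rafael Nadal', 'Roger Federer', 'Rafael Nadal'],
})

# Every year we want on the x axis, including years with no rows at all.
years = pd.Index(range(2008, 2012), name='YEAR')

wins = (toy.groupby(['YEAR', 'WINNER']).size()
           .unstack(fill_value=0)            # one column per player, 0 where no title
           .reindex(years, fill_value=0))    # add rows for years missing from the data

print(wins)
```

Each column of `wins` is then a complete series for one player, with one value per year.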

In [6]:

from bokeh.models.layouts import TabPanel, Tabs            #IMPORT THIS FOR PANELS/TABS

years = list(range(2003, 2023))
#Add a zero-win row for every (year, player) pair that is missing from the data.
#Each player is checked independently: an if/elif chain would stop at the first player
#missing a year and leave the other players without a row for that year.
for year in years:
    for player in ['Roger Federer', 'Rafael Nadal', 'Novak Djokovic']:
        if year not in grouped[grouped['WINNER'] == player]['YEAR'].unique():
            grouped = pd.concat([pd.DataFrame({'YEAR': [year], 'WINNER': [player], 'TOURNAMENT': [0]}), 
                                  grouped], ignore_index = True)

grouped= grouped.sort_values(by = ['WINNER', 'YEAR'], ascending = [False, True])

fed_data = grouped[grouped['WINNER'] == 'Roger Federer'].copy()
nadal_data = grouped[grouped['WINNER'] == 'Rafael Nadal'].copy()
djoko_data = grouped[grouped['WINNER'] == 'Novak Djokovic'].copy()

fed_data['YEAR'] = fed_data['YEAR'].astype(str)
nadal_data['YEAR'] = nadal_data['YEAR'].astype(str)
djoko_data['YEAR'] = djoko_data['YEAR'].astype(str)

p = figure(x_range = fed_data['YEAR'], width = 800, height = 400, title = 'Big Three Grand Slam Wins')
p.vbar(fed_data['YEAR'], top = fed_data['TOURNAMENT'], width = 0.9)
tab1 = TabPanel(child = p, title = 'Roger Federer')

p2 = figure(x_range = nadal_data['YEAR'], width = 800, height = 400, title = 'Big Three Grand Slam Wins')
p2.vbar(nadal_data['YEAR'], top = nadal_data['TOURNAMENT'], width = 0.9)
tab2 = TabPanel(child = p2, title = 'Rafael Nadal')

p3 = figure(x_range = djoko_data['YEAR'], width = 800, height = 400, title = 'Big Three Grand Slam Wins')
p3.vbar(djoko_data['YEAR'], top = djoko_data['TOURNAMENT'], width = 0.9)
tab3 = TabPanel(child = p3, title = 'Novak Djokovic')

tabs = Tabs(tabs = [tab1, tab2, tab3])


show(tabs)


If you click on each player's tab, you can switch to their specific bar chart.
<br><br>
As we click through the tabs between the different players' bar charts, notice that the majority of Federer's Grand Slam wins came between 2003-2010, whereas Nadal and Djokovic have had more of their wins from 2010 onward. When fans debate who among the Big Three is truly the greatest ever, they point to Grand Slam titles. However, since Federer's prime was so much earlier than Djokovic's and Nadal's, it is hard to truly determine who was the best player in his prime.
<br><br>
To try to come up with the best comparison we can, let's compare their wins from 2008-2012, a time period where all three players were playing at a high level. To do this I will make a grouped bar chart using bokeh, so that we can see all three players' statistics in one graph, allowing for easy comparisons.

In [7]:

from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure, show

#GETTING JUST THE DATA FROM 2008-2012

years = list(range(2003, 2023))
#Make sure every (year, player) pair has a row, adding a zero-win row where one is missing.
#Each player is checked on their own, so a year can be filled for more than one player.
#This is a no-op for pairs that already have a row.
for year in years:
    for player in ['Roger Federer', 'Rafael Nadal', 'Novak Djokovic']:
        if year not in grouped[grouped['WINNER'] == player]['YEAR'].unique():
            grouped = pd.concat([pd.DataFrame({'YEAR': [year], 'WINNER': [player], 'TOURNAMENT': [0]}), 
                                  grouped], ignore_index = True)

grouped= grouped.sort_values(by = ['WINNER', 'YEAR'], ascending = [False, True])

data_2008_2012 = grouped[grouped['YEAR'].between(2008, 2012)]    #between() includes both endpoints


#MAKING THE CHART
x = [(player, str(year)) for player in list(data_2008_2012['WINNER'].unique())
     for year in list(data_2008_2012['YEAR'].unique())]

counts = data_2008_2012['TOURNAMENT']
source = ColumnDataSource(data = dict(x = x, counts = counts))  


p = figure(x_range = FactorRange(*x),width = 800, height = 400, title = 'Grand Slam Wins for the Big Three 2008-2012')
p.vbar(x = 'x', top = 'counts', width = 0.9, source = source)


show(p)
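
The chart above pairs labels and counts by position, which works as long as every player has a row for every year in the range. A slightly more defensive variant builds each (player, year) factor from the same row as its count. The sketch below uses a hand-typed `toy` table for 2011 and 2012 with the same column names as `grouped`; it is an illustration under that assumption, not a replacement for the cell above.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure

# Hand-typed counts for two years, same columns as `grouped`.
toy = pd.DataFrame({
    'WINNER': ['Roger Federer', 'Roger Federer', 'Rafael Nadal', 'Rafael Nadal',
               'Novak Djokovic', 'Novak Djokovic'],
    'YEAR': [2011, 2012, 2011, 2012, 2011, 2012],
    'TOURNAMENT': [0, 1, 1, 1, 3, 1],
})

# Build each (player, year) factor from the same row as its count, so the
# labels and bar heights cannot drift out of alignment.
x = list(zip(toy['WINNER'], toy['YEAR'].astype(str)))
source = ColumnDataSource(data=dict(x=x, counts=toy['TOURNAMENT'].tolist()))

# FactorRange with two-level tuples produces the nested (grouped) x axis.
p = figure(x_range=FactorRange(*x), width=800, height=300,
           title='Grand Slam Wins 2011-2012 (toy data)')
p.vbar(x='x', top='counts', width=0.9, source=source)

# In a notebook, call show(p) to render the chart.
```

The first element of each tuple becomes the outer group label (the player) and the second becomes the inner label (the year).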


Based on the above chart, all three had success during this time period. However, while Federer and Djokovic each had years where they didn't win any Grand Slams, Nadal was the only one to win at least one Grand Slam every year from 2008-2012, when all three players were playing at a high level.
<br><br>
Finally, since we are talking about Grand Slam tournaments, it might be interesting to see how the prize money given to the winner of each tournament has grown over the years. To do this I will use a line chart, because it makes trends over many years easy to see. I will also color code the line for each tournament, and use bokeh's interactive legend so that lines can be hidden and shown by clicking on the legend.

In [8]:

aussie = df[df['TOURNAMENT'] == 'Australian Open'].reset_index(drop = True)
aussie = aussie[['YEAR', 'WINNER_PRIZE']].dropna()

french = df[df['TOURNAMENT'] == 'French Open'].reset_index(drop = True)
french = french[['YEAR', 'WINNER_PRIZE']].dropna()

wimby = df[df['TOURNAMENT'] == 'Wimbledon'].reset_index(drop = True)
wimby = wimby[['YEAR', 'WINNER_PRIZE']].dropna()

uso = df[df['TOURNAMENT'] == 'U.S. Open'].reset_index(drop = True)
uso = uso[['YEAR', 'WINNER_PRIZE']].dropna()

p = figure(width = 800, height = 300, title = 'Grand Slam Prize Money')
p.line(aussie['YEAR'], aussie['WINNER_PRIZE'], line_width=2, color = 'blue', alpha = 0.8, 
       legend_label = 'Australian Open')
p.line(french['YEAR'], french['WINNER_PRIZE'], line_width=2, color = 'orange', alpha = 0.8,
       legend_label = 'French Open')
p.line(wimby['YEAR'], wimby['WINNER_PRIZE'], line_width=2, color = 'green', alpha = 0.8, 
       legend_label = 'Wimbledon')
p.line(uso['YEAR'], uso['WINNER_PRIZE'], line_width=2, color = 'red', alpha = 0.8,
       legend_label = 'U.S. Open')

p.legend.location = 'top_left'
p.legend.click_policy="hide"



show(p)

If you click on each of the tournament names on the legend, you will be able to hide the line for that tournament, which is a useful feature if you only want to look at one particular tournament's data, or compare two.
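
Besides `"hide"`, the legend also supports a `"mute"` click policy, which fades a line instead of removing it. Here is a minimal sketch with placeholder series (the names and values are made up), assuming the same `figure` and `line` calls used above.

```python
from bokeh.plotting import figure

years = [2019, 2020, 2021]  # placeholder x values

p = figure(width=800, height=300, title='Legend click policy: mute')

# muted_alpha sets how faint a line becomes once its legend entry is clicked.
p.line(years, [1, 2, 3], line_width=2, color='blue', alpha=0.8,
       muted_alpha=0.15, legend_label='Series A')
p.line(years, [3, 2, 1], line_width=2, color='orange', alpha=0.8,
       muted_alpha=0.15, legend_label='Series B')

p.legend.location = 'top_left'
p.legend.click_policy = 'mute'

# In a notebook, call show(p) and click a legend entry to fade its line.
```

Muting can be handy when you want to focus on one tournament but still keep the others visible in the background for context.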
<br><br>
We can see that over the years the prize money has steadily been increasing. Interestingly, however, we see a sudden
drop in prize money for the French Open around 2001. The reason for this is that the French currency changed from
the Franc to the Euro, so the prize money is stated in a different unit before and after the change.
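
To put amounts from before and after the switch on one scale, franc values can be divided by the fixed conversion rate of 6.55957 francs per euro. The sketch below uses made-up prize amounts, and it assumes that rows before 2002 are in francs; both should be checked against the actual French Open rows before applying this to the real data.

```python
import pandas as pd

FRF_PER_EUR = 6.55957  # fixed conversion rate: francs per euro

# Hypothetical French Open rows; these amounts are placeholders, not real prize money.
toy = pd.DataFrame({
    'YEAR': [2000, 2001, 2002, 2003],
    'WINNER_PRIZE': [4_000_000, 4_200_000, 780_000, 840_000],
})

# Assumption: anything before 2002 is denominated in francs.
is_franc = toy['YEAR'] < 2002

# Keep euro rows as they are and convert franc rows to euros.
toy['PRIZE_EUR'] = toy['WINNER_PRIZE'].where(~is_franc, toy['WINNER_PRIZE'] / FRF_PER_EUR)

print(toy)
```

Plotting `PRIZE_EUR` instead of `WINNER_PRIZE` would remove the artificial drop, leaving only the real change in prize money.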

## References

https://blog.logrocket.com/python-data-visualization-bokeh-jupyter-notebook/ 
<br>
https://www.geeksforgeeks.org/python-bokeh-plotting-a-scatter-plot-on-a-graph/
<br>
https://bokeh.org/roadmap/
<br>
https://bokeh.org/
<br>
https://www.analyticsvidhya.com/blog/2021/05/gentle-introduction-to-bokeh-interactive-python-plotting-library/
<br>
https://docs.bokeh.org/en/latest/docs/first_steps/installation.html
<br>
https://docs.bokeh.org/en/latest/docs/examples/basic/bars/nested.html
<br>
https://towardsdatascience.com/interactive-bar-charts-with-bokeh-7230e5653ba3
<br>
https://docs.bokeh.org/en/latest/docs/user_guide/styling.html#visible-property
<br>
https://docs.bokeh.org/en/1.0.4/docs/reference/models/widgets.panels.html
<br>
https://docs.bokeh.org/en/latest/docs/examples/basic/bars/basic.html
<br>
https://docs.bokeh.org/en/latest/docs/examples/interaction/legends/legend_hide.html
