# **Chapter 1 - Introduction to Bokeh**

**Blocks vs. rebounds**

You are working for a sports media agency that produces data journalism content such as blogs, analyses, and visualizations, primarily focused on basketball. The nba dataset has been preloaded for you and contains per-game statistics for basketball players in the 2017 season as well as their team, conference, and a label for their scoring ability.

The agency has asked you to produce a scatter plot displaying the relationship between blocks and rebounds.

In [1]:
```python
import pandas as pd
```

In [2]:
```python
nba = pd.read_csv('nba.csv')
```

In [3]:
```python
nba.head()
```

| | player | position | minutes | field_goal_perc | three_point_perc | free_throw_perc | rebounds | assists | steals | blocks | points | team | conference | scorer_category |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Russell Westbrook | PG | 34.6 | 0.425 | 0.343 | 0.845 | 10.7 | 10.4 | 1.6 | 0.4 | 31.6 | OKC | West | High Scorer |
| 1 | James Harden | PG | 36.4 | 0.44 | 0.347 | 0.847 | 8.1 | 11.2 | 1.5 | 0.5 | 29.1 | HOU | West | High Scorer |
| 2 | Isaiah Thomas | PG | 33.8 | 0.463 | 0.379 | 0.909 | 2.7 | 5.9 | 0.9 | 0.2 | 28.9 | BOS | East | High Scorer |
| 3 | Anthony Davis | C | 36.1 | 0.505 | 0.299 | 0.802 | 11.8 | 2.1 | 1.3 | 2.2 | 28.0 | NO | West | High Scorer |
| 4 | DeMar DeRozan | SG | 35.4 | 0.467 | 0.266 | 0.842 | 5.2 | 3.9 | 1.1 | 0.2 | 27.3 | TOR | East | High Scorer |


In [6]:
```python
nba.describe()
```

| | minutes | field_goal_perc | three_point_perc | free_throw_perc | rebounds | assists | steals | blocks | points |
|---|---|---|---|---|---|---|---|---|---|
| count | 424.0 | 424.0 | 394.0 | 419.0 | 424.0 | 424.0 | 424.0 | 424.0 | 424.0 |
| mean | 20.51533 | 0.446231 | 0.301878 | 0.741618 | 3.706604 | 1.93467 | 0.649057 | 0.404245 | 8.826179 |
| std | 8.997185 | 0.084629 | 0.128638 | 0.133155 | 2.480926 | 1.820529 | 0.410845 | 0.411514 | 6.157515 |
| min | 2.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.2 |
| 25% | 13.375 | 0.40175 | 0.27325 | 0.679 | 2.0 | 0.7 | 0.3 | 0.1 | 4.4 |
| 50% | 20.05 | 0.444 | 0.3365 | 0.765 | 3.1 | 1.3 | 0.6 | 0.3 | 7.15 |
| 75% | 27.825 | 0.487 | 0.375 | 0.8285 | 4.8 | 2.5 | 0.9 | 0.5 | 11.4 |
| max | 37.8 | 0.75 | 1.0 | 1.0 | 14.1 | 11.2 | 2.0 | 2.6 | 31.6 |


In [11]:
```python
# Import required libraries
from bokeh.plotting import figure
from bokeh.io import output_file, output_notebook, show

# Enable viewing Bokeh plots in the notebook
output_notebook()

# Create a new figure
fig = figure(x_axis_label="Blocks per Game", y_axis_label="Rebounds per Game")

# Add circle glyphs
fig.circle(x=nba["blocks"], y=nba["rebounds"])

# Produce the HTML file and display the plot
output_file(filename="my_first_plot.html")
show(fig)
```
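
Calling show() after output_file() both writes the HTML file and opens it. When only the file is needed, for example to hand it over to the agency, bokeh.io.save can write it without displaying anything. The sketch below is a minimal illustration under a few assumptions: it uses the five rows from nba.head() above rather than the full nba DataFrame, writes to a temporary directory, uses scatter() (which draws circle markers by default) in place of circle(), and picks an arbitrary page title.

```python
import os
import tempfile

from bokeh.io import save
from bokeh.plotting import figure
from bokeh.resources import CDN

# Blocks and rebounds for the five players in nba.head() above
blocks = [0.4, 0.5, 0.2, 2.2, 0.2]
rebounds = [10.7, 8.1, 2.7, 11.8, 5.2]

fig = figure(x_axis_label="Blocks per Game", y_axis_label="Rebounds per Game")
# scatter() draws circle markers by default
fig.scatter(x=blocks, y=rebounds)

# Write a standalone HTML file (loading BokehJS from the CDN) without opening a browser
out_path = os.path.join(tempfile.mkdtemp(), "my_first_plot.html")
save(fig, filename=out_path, resources=CDN, title="Blocks vs. rebounds")
```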

**Kevin Durant's performance across seasons**

The agency is writing an article on Kevin Durant and the evolution of the Small Forward position in basketball.

They would like you to supply them with a line plot displaying Kevin Durant's points per game from his rookie year to the end of the 2016-17 season.

The kevin_durant DataFrame, which contains Kevin's points per game for the 2010-2017 seasons, has been preloaded for you.

In [16]:
```python
kevin_durant = pd.read_excel('kevin_durant.xlsx')
kevin_durant.head()
```

| | season | points |
|---|---|---|
| 0 | 2010 | 30.1 |
| 1 | 2011 | 27.7 |
| 2 | 2012 | 28.0 |
| 3 | 2013 | 28.1 |
| 4 | 2014 | 32.0 |


In [17]:
```python
# Create figure
fig = figure(x_axis_label="Season", y_axis_label="Points")

# Add line glyphs
fig.line(x=kevin_durant["season"], y=kevin_durant["points"])

# Generate HTML file
output_file(filename="Kevin_Durant_performance.html")

# Display plot
show(fig)
```
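
A line glyph connects points in the order they appear in the data, not in order of their x values. The rows of kevin_durant shown above are already chronological, but if a table arrives unsorted the line will zigzag between seasons. A minimal sketch of guarding against that, using the five rows shown above in a deliberately shuffled (hypothetical) order, and adding scatter markers so each season's value stays visible:

```python
import pandas as pd
from bokeh.plotting import figure

# The five rows from kevin_durant.head(), shuffled on purpose (hypothetical order)
kd = pd.DataFrame({"season": [2013, 2010, 2014, 2011, 2012],
                   "points": [28.1, 30.1, 32.0, 27.7, 28.0]})

# line() joins points in row order, so sort chronologically first
kd_sorted = kd.sort_values("season")

fig = figure(x_axis_label="Season", y_axis_label="Points")
fig.line(x=kd_sorted["season"], y=kd_sorted["points"])
# Mark each season's value on top of the line
fig.scatter(x=kd_sorted["season"], y=kd_sorted["points"], size=8)
```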

**Shooting ability by position**

The agency is producing an article about how three-point field goal shooting ability has changed across basketball positions over the years.

They have asked you to produce a bar plot of three-point field goal percentage by position for the 2017 NBA season, which will be included in the article.

pandas has been imported as pd, and the nba dataset has been preloaded for you. The figure function has also been imported from bokeh.plotting, along with output_file and show from bokeh.io. These will all be preloaded for you throughout the remainder of the course.

In [18]:
```python
# Calculate average three point field goal percentage by position
positions = nba.groupby("position", as_index=False)["three_point_perc"].mean()

# Instantiate figure
fig = figure(x_axis_label="Position", y_axis_label="3 Point Field Goal (%)", x_range=positions["position"])

# Add bars
fig.vbar(x=positions["position"], top=positions["three_point_perc"])

# Produce the html file and display the plot
output_file(filename="3p_fg_by_position.html")
show(fig)
```
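
describe() reports only 394 non-null three_point_perc values out of 424, and groupby(...).mean() skips those missing values rather than treating them as zero. The sketch below illustrates this with the five rows from nba.head() plus one hypothetical center whose three-point percentage is missing. It also orders the bars from highest to lowest by passing the sorted positions as the factor list for x_range; the bar width of 0.8 is an arbitrary choice.

```python
import numpy as np
import pandas as pd
from bokeh.plotting import figure

# Rows from nba.head(), plus one hypothetical center with a missing value (NaN)
sample = pd.DataFrame({
    "position": ["PG", "PG", "PG", "C", "SG", "C"],
    "three_point_perc": [0.343, 0.347, 0.379, 0.299, 0.266, np.nan],
})

# mean() skips NaN by default, so the hypothetical row does not affect the C average
positions = sample.groupby("position", as_index=False)["three_point_perc"].mean()

# Order the bars from best to worst shooting position
positions = positions.sort_values("three_point_perc", ascending=False)

fig = figure(x_axis_label="Position", y_axis_label="3 Point Field Goal (%)",
             x_range=positions["position"].tolist())
fig.vbar(x=positions["position"], top=positions["three_point_perc"], width=0.8)
```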

**Setting tools**

The agency is now interested in understanding whether the most accurate shooters are more likely to average a higher number of points per game. They would like the toolbar customized to include only the poly_select, wheel_zoom, reset, and save tools, so they can investigate the plot further themselves.

The nba DataFrame is again available to you.

In [19]:
```python
# Create a list of tools
tools = ['poly_select', 'wheel_zoom', 'reset', 'save']

# Create figure and set tools
fig = figure(x_axis_label="Field Goal (%)", y_axis_label="Points per Game", tools=tools)

# Add circle glyphs
fig.circle(x=nba["field_goal_perc"], y=nba["points"])

# Generate HTML file and display plot
output_file(filename="points_vs_field_goal_perc.html")
show(fig)
```
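
Bokeh also accepts the tools argument as a single comma-separated string. A minimal sketch assuming the same four tools; the active_scroll argument is not part of the original exercise and is shown only as an option, making the wheel zoom respond to the mouse wheel without first being selected in the toolbar.

```python
from bokeh.models import WheelZoomTool
from bokeh.plotting import figure

# Same four tools as above, given as one comma-separated string
fig = figure(x_axis_label="Field Goal (%)", y_axis_label="Points per Game",
             tools="poly_select,wheel_zoom,reset,save",
             # Activate the wheel zoom for scrolling straight away
             active_scroll="wheel_zoom")

# Class names of the tools that ended up in the toolbar
tool_names = sorted(type(t).__name__ for t in fig.toolbar.tools)
```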

**Adding LassoSelectTool**

The agency loved your last plot! However, they have realized that it would also be useful to have access to the LassoSelectTool.

The figure and glyphs from the previous exercise have been preloaded for you.

In [20]:
```python
tools = ['poly_select', 'wheel_zoom', 'reset', 'save']

# Import LassoSelectTool
from bokeh.models import LassoSelectTool

fig = figure(x_axis_label="Field Goal (%)", y_axis_label="Points per Game", tools=tools)
fig.circle(x=nba["field_goal_perc"], y=nba["points"])

# Update the figure to include LassoSelectTool
fig.add_tools(LassoSelectTool())
output_file(filename="updated_plot_with_lasso_select.html")
show(fig)
```
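
When you create the figure yourself, the tools list can also mix string names with Tool instances, so the LassoSelectTool can be included from the start. A minimal sketch assuming the same tool set as the previous exercise; add_tools() remains the way to extend a figure that already exists.

```python
from bokeh.models import LassoSelectTool
from bokeh.plotting import figure

# String names and a Tool instance in the same list
fig = figure(x_axis_label="Field Goal (%)", y_axis_label="Points per Game",
             tools=["poly_select", "wheel_zoom", "reset", "save", LassoSelectTool()])

has_lasso = any(isinstance(t, LassoSelectTool) for t in fig.toolbar.tools)
```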

**Adding a HoverTool**

The agency is producing a blog about basketball performance by position. They want to home in on assists and steals, giving their viewers the ability to hover over glyphs to find out who a player is, what position they play, and which team they play for.

You will convert the preloaded nba dataset into a Bokeh ColumnDataSource, then add tooltips to a plot.

In [21]:
```python
# Import ColumnDataSource
from bokeh.models import ColumnDataSource

# Create source
source = ColumnDataSource(data=nba)

# Create TOOLTIPS and add to figure
TOOLTIPS = [("Name", "@player"), ("Position", "@position"), ("Team", "@team")]
fig = figure(x_axis_label="Assists", y_axis_label="Steals", tooltips=TOOLTIPS)

# Add circle glyphs
fig.circle(x="assists", y="steals", source=source)
output_file(filename="first_tooltips.html")
show(fig)
```
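
A ColumnDataSource built from a DataFrame stores each column under its column name, which is what the @player, @position and @team references in TOOLTIPS look up; it also adds an index column taken from the DataFrame index. The sketch below uses two rows copied from nba.head() and attaches an explicit HoverTool with add_tools(), an alternative to the tooltips argument of figure. It uses scatter() in place of circle().

```python
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Two rows copied from nba.head() above
df = pd.DataFrame({"player": ["Russell Westbrook", "James Harden"],
                   "position": ["PG", "PG"],
                   "team": ["OKC", "HOU"],
                   "assists": [10.4, 11.2],
                   "steals": [1.6, 1.5]})

# Each DataFrame column becomes a key in source.data, plus an "index" key
source = ColumnDataSource(data=df)

fig = figure(x_axis_label="Assists", y_axis_label="Steals")
# Glyph fields refer to column names in the source
fig.scatter(x="assists", y="steals", source=source)

# Explicit HoverTool instead of figure(tooltips=...)
hover = HoverTool(tooltips=[("Name", "@player"), ("Position", "@position"), ("Team", "@team")])
fig.add_tools(hover)
```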

**Formatting the HoverTool**

Now that you've built a plot with a HoverTool, it's time to go one step further by formatting how the information is presented.

The agency has asked for a plot of average minutes versus average points per game. They would like the HoverTool to show the player's name, their conference, and their field goal percentage rounded to 2 decimal places.

In [22]:
```python
# Create TOOLTIPS; {0.00} rounds the value to two decimal places
TOOLTIPS = [("Name", "@player"),
            ("Conference", "@conference"),
            ("Field Goal %", "@field_goal_perc{0.00}")]

# Add TOOLTIPS to figure
fig = figure(x_axis_label="Minutes", y_axis_label="Points", tooltips=TOOLTIPS)

# Add circle glyphs, reusing the source created in the previous exercise
fig.circle(x="minutes", y="points", source=source)
output_file(filename="formatted_hovertool.html")
show(fig)
```
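
A pattern such as {0.00} is a numeral-style format, which Bokeh applies to tooltip fields by default. Bokeh also supports printf-style formats such as {%0.2f}, but these only take effect when the field is mapped to the printf formatter through the formatters argument of HoverTool, and that mapping applies to every tooltip that uses the field. A minimal sketch of that alternative, assuming Bokeh 2.0 or later (where formatter keys include the @ prefix):

```python
from bokeh.models import HoverTool
from bokeh.plotting import figure

hover = HoverTool(
    tooltips=[("Name", "@player"),
              ("Conference", "@conference"),
              # printf-style format with two decimal places
              ("Field Goal %", "@field_goal_perc{%0.2f}")],
    # Only field_goal_perc uses printf; other fields keep the default formatter
    formatters={"@field_goal_perc": "printf"},
)

fig = figure(x_axis_label="Minutes", y_axis_label="Points")
fig.add_tools(hover)
```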