# Interactive Visualizations of NBA Stats with Bokeh

In this tutorial we will learn how to make an interactive dashboard to visualize player data from the 2015-2016 NBA season. We will do this using data from basketball-reference.com and the Python visualization library Bokeh. The point of this tutorial is to introduce you to Bokeh.

I have prepared the data from basketball-reference. It covers every player in the NBA last season along with their season totals for all the basic NBA statistics (field goals made, field goals attempted, 3-pointers, rebounds, etc.). However, in order to follow this tutorial you will need to install Bokeh. It should already be installed if you downloaded Anaconda, but the version included is not the latest. You will need to upgrade to version 0.12.3. You can do so by entering the following in your shell:

sudo pip install bokeh --upgrade

After upgrading, you will need to shut down Jupyter completely and open this notebook again. Sorry in advance!

## The Basics

Before we get into anything too fancy, let's make some basic plots using our player data with Bokeh. First we will need to import Pandas and some basic Bokeh features. The bokeh.charts module includes the functions necessary to make high-level charts. We will use the Scatter function to make a scatterplot. The show function works the same as it does in matplotlib: it displays the plot. The output_notebook function tells Bokeh to render plots inline in the notebook.

In [1]:
import pandas as pd
from bokeh.charts import Scatter
from bokeh.io import show, output_notebook

output_notebook()

In [2]:
df = pd.read_csv("players.csv", header=0)
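
Before plotting, it can help to confirm that the dataframe has the columns the rest of the tutorial relies on. The sketch below is only an illustration: it reads a tiny in-memory stand-in for players.csv whose rows and values are invented, and the names `sample_csv`, `df_sample`, `required`, and `missing` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for players.csv; the values are invented.
sample_csv = io.StringIO(
    "Player,Pos,Tm,FG,FGA,3P,3PA\n"
    "Player A,PG,AAA,100,220,30,90\n"
    "Player B,C,BBB,150,260,0,2\n"
)
df_sample = pd.read_csv(sample_csv, header=0)

# Columns used by the plots and tooltips in this tutorial.
required = ["Player", "Pos", "Tm", "FG", "FGA", "3P", "3PA"]
missing = [col for col in required if col not in df_sample.columns]
print("missing columns:", missing)
```

Running the same check against the real `df` should report an empty list if the file loaded as expected.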

Let's start off by plotting each player's total number of field goals made by the total number of field goals taken.

In [3]:
p = Scatter(df, x='FG', y='FGA',
           title="Each Player's Number of Shots Made by Number of Shots Taken",
           xlabel="Number of FG Made", ylabel="Number of FG Attempted")

show(p)

If you are viewing this as a static page, you will not be able to see the plots without running the cells. This is an unfortunate feature of Bokeh in Jupyter. For every plot I will paste a screenshot into the pdf in the folder and refer to it by number. The plot above is plot 1.

## Adding Basic Custom Interactivity

Ok that's pretty cool! But you may be wondering which dots correspond to which players. We can add functionality to this graph to figure that out. In order to do this, we will need to stray away from the charts module, which is only intended to make charts without any interactivity. We will instead use the plotting module which includes the figure class and ColumnDataSource. The figure class is a generalized class to take in the specifications of our chart. This will replace "Scatter" from before. ColumnDataSource allows us to generalize the variables used in our plots. It essentially tells Bokeh what data we are using for our plot. In this case, it is the players dataframe.

We also import the tools we want to use in this plot. Bokeh.charts did that automatically for us before, but now with bokeh.plotting we need to specify these. If you want to know more about what each tool does, check out this link: http://bokeh.pydata.org/en/latest/docs/user_guide/tools.html#basic-tooltips

In [4]:
from bokeh.plotting import figure, show, ColumnDataSource
from bokeh.models import HoverTool, PanTool, BoxZoomTool, WheelZoomTool, ResetTool

source = ColumnDataSource(df)
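
To get a feel for what ColumnDataSource holds, here is a minimal sketch that builds one from a small made-up dataframe and looks inside it. The dataframe `toy` and its values are invented for illustration; the point is only that each dataframe column ends up in the source's `data` mapping under its column name.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Invented values, not real player totals.
toy = pd.DataFrame({
    "Player": ["Player A", "Player B"],
    "FG": [100, 150],
    "FGA": [220, 260],
})
toy_source = ColumnDataSource(toy)

# The data attribute behaves like a dict keyed by column name.
print(sorted(toy_source.data.keys()))
print(list(toy_source.data["FG"]))
```

This is why the plotting calls later on can refer to columns by name alone once a source is supplied.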

Here we specify the parameters of our tools. All tools in Bokeh are pretty simple. The most involved is the hover tool, which isn't that much work. All we do here is specify the information shown when the user hovers over a point on the plot. The first argument in each tuple is the label that will appear. The second argument is the actual data from the source. The '@' tells Bokeh to reference our source and look for the given column name. Make note that the values supplied in the second argument must exactly match the name of a column in the data source.

In [5]:
hover = HoverTool(
        tooltips=[
            ("FG", "@FG"),
            ("FGA", "@FGA"),
            ("Player", "@Player"),
            ("Team", "@Tm"),
        ]
    )

pan = PanTool()
boxzoom = BoxZoomTool()
wheelzoom = WheelZoomTool()
reset = ResetTool()
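
Because a tooltip field has to match a column name exactly, a mismatch is easy to miss. The sketch below is one possible way to catch it early: two hypothetical helpers (`tooltip_fields` and `missing_tooltip_columns`, not part of Bokeh) pull the `@name` and `@{name}` references out of the tooltip templates and compare them against a list of column names. The column list here is an assumption for illustration.

```python
import re

def tooltip_fields(tooltips):
    """Return the column names referenced with '@' in (label, template) tuples."""
    pattern = re.compile(r"@\{([^}]+)\}|@(\w+)")
    fields = []
    for _label, template in tooltips:
        for braced, plain in pattern.findall(template):
            fields.append(braced or plain)
    return fields

def missing_tooltip_columns(tooltips, columns):
    """Return referenced names that are not present in columns."""
    available = set(columns)
    return [name for name in tooltip_fields(tooltips) if name not in available]

# "@Team" is a deliberate mistake: the column is called "Tm".
tooltips = [("FG", "@FG"), ("FGA", "@FGA"), ("Player", "@Player"), ("Team", "@Team")]
columns = ["Player", "Pos", "Tm", "FG", "FGA"]
print(missing_tooltip_columns(tooltips, columns))  # → ['Team']
```

With the real data you could pass `df.columns` as the second argument.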

Here we use the figure class to actually start building our plot. We give the tools that we specified in the previous cell. Notice in "p.circle" that we are able to simply give 'FG' and 'FGA' as arguments (columns from our data source) without including anything about df, the dataframe where our data is coming from. That is because we set the source parameter to our ColumnDataSource(df).

In [6]:
p = figure(tools=[hover, pan, boxzoom, wheelzoom, reset],
           title="Each Player's Number of Shots Made by Number of Shots Taken")

p.xaxis.axis_label = "FG Made"
p.yaxis.axis_label = "FG Attempted"

p.circle('FG', 'FGA', source=source)

show(p)

The above is plot 2 in the pdf.

## Coloring Based on Other Variables

Now we can hover over points to see which players made and attempted how many shots. We can also see what team they are on. We can use the various tools to zoom in on more crowded parts of the plot to look at individual player stats.

I think another cool feature to add would be to see what types of players make the most field goals. We can split up players by their position and see how many field goals they make/attempt. We will do this by coloring each circle on our plot by a position. To do this, we first make a new column in our dataframe that assigns each position a particular color:

In [7]:
# One color per position.
pos_to_color = {
    'PF': "red",
    'SG': "green",
    'C': "blue",
    'SF': "orange",
    'PG': "brown",
    'PF-C': "purple",
    'SG-SF': "yellow",
}

# Look up each player's position. Any position not listed above falls back
# to gray so the new column always has exactly one color per row.
df["pos_colors"] = df["Pos"].map(pos_to_color).fillna("gray")

Since pos_colors is part of the dataframe, after calling ColumnDataSource, it is also part of our plot's data source. So, when we call p.circle, all we have to do is set the color parameter to the name of the column in the data source, which is "pos_colors" in this case. You may be wondering why we repeat defining source, the tools, and p. The source has to be rebuilt because the earlier one was created before pos_colors existed, and each plot we show gets its own figure and tool objects.

In [8]:
source = ColumnDataSource(df)

hover = HoverTool(
        tooltips=[
            ("FG", "@FG"),
            ("FGA", "@FGA"),
            ("Player", "@Player"),
            ("Team", "@Tm"),
            ("Position", "@Pos"),
        ]
    )

pan = PanTool()
boxzoom = BoxZoomTool()
wheelzoom = WheelZoomTool()
reset = ResetTool()

p = figure(tools=[hover, pan, boxzoom, wheelzoom, reset],
           title="Each Player's Number of Shots Made by Number of Shots Taken")

p.xaxis.axis_label = "FG Made"
p.yaxis.axis_label = "FG Attempted"


p.circle('FG', 'FGA', color="pos_colors", source=source)

show(p)

The above is plot 3 in the pdf.

A great part of data visualization is that we can see patterns that are not easily discernible just by looking at the numbers. Take a look at the blue dots in this plot, which correspond to the center position (denoted by 'C'). If you imagine a regression line fitted through the blue dots, it looks like it would have a lower slope than a line through all the other dots. Because attempts are on the y-axis and made shots are on the x-axis, a lower slope means that NBA centers make more shots on fewer attempts than other types of players. This makes sense because centers usually take shots much closer to the basket (i.e., easier shots) than other types of players.
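
One rough way to put a number on that visual impression is to compute attempts per made shot for each position, which plays a role similar to the slope when FG is on the x-axis and FGA is on the y-axis. The sketch below uses a made-up dataframe; the values are invented for illustration and are not real NBA totals, and the column name `attempts_per_make` is hypothetical.

```python
import pandas as pd

# Invented numbers for illustration only; these are not real NBA totals.
toy = pd.DataFrame({
    "Pos": ["C", "C", "PG", "PG"],
    "FG":  [200, 300, 200, 300],
    "FGA": [360, 540, 460, 690],
})

# Sum makes and attempts per position, then divide attempts by makes.
totals = toy.groupby("Pos")[["FG", "FGA"]].sum()
totals["attempts_per_make"] = totals["FGA"] / totals["FG"]
print(totals)
```

Swapping `toy` for the real `df` would show whether centers actually need fewer attempts per made shot in this data set.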

Now we will move onto making a simple dashboard. This dashboard will allow users to compare multiple columns against each other from our original data. Basically, we will allow users to select which numerical stats they would like to compare on the scatterplot.

## A Simple Dashboard

In this simple dashboard, we will just allow our users to select number of field goals made or number of 3-pointers made on the x-axis, and the number of field goals attempted or the number of 3-pointers attempted on the y-axis.

First we create new columns x and y and we assign the FG and FGA columns of the dataframe to them respectively. FG and FGA will be our default x and y. As you will see, the x and y columns will be our generalized columns, and will update based on user inputs. Last, we set our source with ColumnDataSource of our new dataframe.

In [9]:
df['x'] = df['FG']
df['y'] = df['FGA']

source = ColumnDataSource(df)

Here we define our basic interactions as we did in our earlier plots. The only difference is in defining tooltips within HoverTool. We label our scatterplot's x and y values simply "X" and "Y". This is because they will represent whatever x and y the user chooses. The values for x and y will be read from whatever is in the x and y columns of the data source, since we use the '@' notation. When we pass hover to the figure class and make our plot, the tooltips will connect to the source that we defined.

In [10]:
hover = HoverTool(
        tooltips=[
            ("X", "@x"),
            ("Y", "@y"),
            ("Player", "@Player"),
            ("Team", "@Tm"),
            ("Position", "@Pos"),
        ]
    )

pan = PanTool()
boxzoom = BoxZoomTool()
wheelzoom = WheelZoomTool()
reset = ResetTool()

Here we define our plot mostly the same as we did before. The main difference is that we store the object returned by p.circle in r. This object is a glyph renderer, and it gives us access to the data source behind the circles. This will be important for our next step.

In [11]:
p = figure(tools=[hover, pan, boxzoom, wheelzoom, reset])
p.xaxis.axis_label = "X"
p.yaxis.axis_label = "Y"

r = p.circle('x', 'y', source=source)

This update function will be called every time a user interacts with our plot. The function takes in the name of the columns we want to display on the x and y axes. It then tells our glyph instance r to change the columns x and y of our data source to mimic the columns the user chooses. The call to push_notebook is what tells Bokeh to push our changes to the plot displayed in the notebook.

In [12]:
from bokeh.io import push_notebook

def update(xax, yax):
    r.data_source.data['x'] = list(df[xax])
    r.data_source.data['y'] = list(df[yax])
    push_notebook()
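
As a variation on the same idea, the generalized columns can also be swapped by building a new dict and assigning it to the source's `data` attribute in one step, which keeps every column the same length throughout. This is only a sketch under stated assumptions: the dataframe `toy` holds invented values, and `swap_axes_columns` is a hypothetical helper, not part of Bokeh. In a notebook you would still call push_notebook afterwards.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Invented values for illustration.
toy = pd.DataFrame({"FG": [100, 150], "FGA": [220, 260],
                    "3P": [30, 0], "3PA": [90, 2]})
toy["x"] = toy["FG"]
toy["y"] = toy["FGA"]
toy_source = ColumnDataSource(toy)

def swap_axes_columns(source, frame, xax, yax):
    """Point the generic x and y columns at the chosen dataframe columns."""
    new_data = dict(source.data)       # copy the existing columns
    new_data["x"] = list(frame[xax])
    new_data["y"] = list(frame[yax])
    source.data = new_data             # replace all columns at once

swap_axes_columns(toy_source, toy, "3P", "3PA")
print(list(toy_source.data["x"]), list(toy_source.data["y"]))
```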

We finally show our plot. Note that notebook_handle is set to True. This makes show return a handle to the displayed plot, which push_notebook uses to send our updates to it.

In [13]:
show(p, notebook_handle=True)

The above is plot 4 in the pdf.

Now let's make the actual interaction capability. The interact function from ipywidgets allows us to make interactions within a Jupyter notebook. It takes in the update function, which it calls every time the user changes one of the inputs. The second and third arguments are lists of the columns the user can choose for each axis. Note that the argument names in the interact function must exactly match the argument names in the update function. The interact function picks a widget based on the kind of value it is given. If we gave it a tuple of start and end numbers instead of a list, it would create a slider to select numbers instead of the dropdowns it makes in this case. Go ahead and play with the different inputs and notice how the graph changes.

In [14]:
from ipywidgets import interact

interact(update, xax=["FG", "3P"], yax=["FGA", "3PA"])
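
To see the difference between a list and a numeric range, the sketch below builds the widgets with `interactive`, the ipywidgets function that constructs the controls without displaying them. The function `describe` and its `min_games` parameter are hypothetical and exist only to show which widget type each kind of argument produces.

```python
from ipywidgets import interactive, IntSlider, Dropdown

def describe(stat, min_games):
    """Hypothetical callback; it only formats its two inputs."""
    return "{} with at least {} games".format(stat, min_games)

# A list becomes a dropdown; a (start, end) tuple of ints becomes a slider.
widget = interactive(describe, stat=["FG", "3P"], min_games=(0, 82))

# The final child is the output area, so the controls are everything before it.
controls = widget.children[:-1]
print([type(c).__name__ for c in controls])
```

In the notebook, passing the same arguments to `interact` would display these controls directly.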

Now we have a basic interactive chart where we can choose our x and y axis variables. Let's add some more variable options for the axes and the ability to color dots by some different variables. This will be our final dashboard that we have been building up to.

## The Final Dashboard

If we want to be able to switch between colors based on user input, we will need to create a generalized "color" column in our dataframe. This is analogous to when we created x and y columns in the dataframe so the user can define what the current x and y are. We will set the default to the pos_colors column we made earlier. We also set our x and y to the default values of FG and FGA.

In [15]:
df["color"] = df["pos_colors"]

df['x'] = df['FG']
df['y'] = df['FGA']

Let's make our plot so that the user can choose to color by each position, or color all the positions like pos_colors does. To do this, we will make new columns with colors for each position. For example, we will have a PG column that is brown for the points representing players that are point guards and gray for all other points.

In [16]:
# Highlight color for each position; everyone else is drawn in gray.
highlight_colors = {
    'PF': "red",
    'SG': "green",
    'C': "blue",
    'SF': "orange",
    'PG': "brown",
    'PF-C': "purple",
    'SG-SF': "yellow",
}

# Build one column per position, e.g. df["PG"] is "brown" for point guards
# and "gray" for all other players.
for pos, color in highlight_colors.items():
    df[pos] = [color if val == pos else "gray" for val in df["Pos"]]

As before we set our source and define our tools:

In [17]:
source = ColumnDataSource(df)

hover = HoverTool(
        tooltips=[
            ("X", "@x"),
            ("Y", "@y"),
            ("Player", "@Player"),
            ("Team", "@Tm"),
            ("Position", "@Pos"),
        ]
    )

pan = PanTool()
boxzoom = BoxZoomTool()
wheelzoom = WheelZoomTool()
reset = ResetTool()

We configure the plot the same as before, but this time we add the color argument and set it to the color column of our data source. Remember, as the user changes which position they would like to view, the color column is changed to copy whichever column in the data source corresponds to the user's choice.

In [18]:
p = figure(tools=[hover, pan, boxzoom, wheelzoom, reset])
p.xaxis.axis_label = "X"
p.yaxis.axis_label = "Y"

r = p.circle('x', 'y', color="color", source=source)

The update function is mostly the same as before, but now it also overwrites the color column with the column for the position the user picked. We then show our plot, setting notebook_handle to True as we did earlier.

In [19]:
def update(xax, yax, position):
    r.data_source.data['x'] = list(df[xax])
    r.data_source.data['y'] = list(df[yax])
    r.data_source.data['color'] = list(df[position])
    push_notebook()
    
show(p, notebook_handle=True)

The above is plot 5.

Now define the interactions. This time we will dump in all the basic stats as options for both x and y. We also now add in our coloring by position options. Note that the first value in each list is the default option given in the plot.

If you are not familiar with basketball statistics nomenclature, take a look at the basketball-reference's glossary: http://www.basketball-reference.com/about/glossary.html

In [20]:
interact(update, 
         xax=["FG","2P","3P","2P%","3P%","FGA","2PA","3PA","ORB","DRB","TRB","AST","STL","BLK","TOV","PTS"], 
         yax=["FGA","2P","3P","2P%","3P%","FG","2PA","3PA","ORB","DRB","TRB","AST","STL","BLK","TOV","PTS"],
         position=["pos_colors", "PG", "SG", "SF", "PF", "C", "PF-C", "SG-SF"])

## Wrapping Up

That concludes the material for this tutorial. We have created a dashboard that allows users to compare players by position played and by various other basic statistics. This allows users to visualize relationships that are not always obvious. This kind of dashboard is especially useful in business settings, where non-technical people often need to understand the implications of the data they have.

Thank you for reading this tutorial and I hope you have gained something from it!