# Bokeh Library Tutorial

# What is Bokeh?

Bokeh is a data visualization library for building interactive visualizations in modern browsers, standalone HTML documents, or server-backed apps. Although there are other data visualization libraries, what is most fascinating about Bokeh is that it lets the user interact with the data, going beyond the static plot of data points you typically get with matplotlib.

This tutorial aims not only to teach how to use Bokeh but also to show how useful interacting with data can be, so that you are motivated to reach for Bokeh in future visualization work. We will start with the different tools that Bokeh offers for data visualization and interaction. The list is not exhaustive, but I'll cover the more important ones. Lastly, we will combine the tools we have learned in a practical example that visualizes real data.

Let's get started by loading Bokeh and the modules we need:


In [2]:
import numpy as np
from bokeh.io import output_notebook, show
from bokeh.layouts import gridplot
from bokeh.plotting import figure
from bokeh.sampledata.iris import flowers as data
from bokeh.models import (
    ColumnDataSource, Plot, Circle, Range1d,
    LinearAxis, HoverTool, Text,
    SingleIntervalTicker, CustomJS
)
output_notebook()

## Table of Contents

1. [Widgets](#Widgets)
2. [Hover Tool](#Hover-Tool)
3. [Box and Lasso Select](#Box-and-Lasso-Select)
4. [Linking Plots](#Linking-Plots)
5. [A Practical Example](#Combining-Features-:-A-Practical-Example)


# Interactivity

In the following code blocks we will look at some of the tools Bokeh offers that let us interact with plots.

# Widgets

Bokeh offers a number of widgets through which users can control which data points they want to see. Instead of throwing a lot of data points onto a static plot, the user can interact with the plot and observe individual features of the graph by varying parameters. Some of the important widgets that Bokeh provides are:

* Checkbox Buttons
* Data Tables
* Dropdown Menu
* MultiSelect
* Radio Buttons
* Select
* Slider
* Tab Panes
* TextInput
* Toggle Button

We are specifically going to look at sliders, buttons, radio button groups and select menus. Feel free to move the slider and try the select menu.

In [2]:
from bokeh.io import output_file, show
from bokeh.layouts import widgetbox
from bokeh.models.widgets import Button, RadioButtonGroup, Select, Slider

output_file("layout_widgets.html")

slider = Slider(start=0, end=10, value=1, step=.1, title="Slider")
button_group = RadioButtonGroup(labels=["Sunny", "Rainy", "Snowy"], active=0)
select = Select(title="Option:", value="Sunny", options=["Sunny", "Rainy", "Snowy"])
button_1 = Button(label="Weather Characterization")

# stack the widgets vertically in a widget box
show(widgetbox(button_1, slider, button_group, select, width=500))

In [4]:
from bokeh.layouts import row
output_file("rad_slider.html")
x = [i * 0.005 for i in range(200)]
y = [10**a for a in x]

source = ColumnDataSource(data=dict(x=x, y=y))

plot = figure(title="Slider Example", toolbar_location="above",plot_width=400, plot_height=400)
plot.line('x', 'y', source=source, line_width=3, line_alpha=0.6)

callback = CustomJS(args=dict(source=source), code="""
        // recompute y = 10^(f*x) for the slider's current value f
        var data = source.get('data');
        var f = cb_obj.get('value');
        var x = data['x'];
        var y = data['y'];
        for (var i = 0; i < x.length; i++) {
            y[i] = Math.pow(10, f * x[i]);
        }
        source.trigger('change');
    """)

slider = Slider(start=0.1, end=1, value=1, step=.01, title="Value of x", callback=callback)
layout = row(slider, plot)
show(layout)
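
To make the callback's arithmetic concrete, here is a minimal pure-Python sketch of the same update, assuming the slider value `f` simply scales the exponent as in the JavaScript above. The function name `rescale` is hypothetical and is not part of Bokeh.

```python
def rescale(xs, f):
    """Recompute y = 10**(f * x) for every x, as the slider callback does."""
    return [10 ** (f * x) for x in xs]


xs = [i * 0.005 for i in range(200)]  # same x grid as the plot above
ys_full = rescale(xs, 1.0)            # slider at its initial value
ys_half = rescale(xs, 0.5)            # slider dragged down to 0.5
```

Lowering `f` flattens the curve, which is the effect you should see when dragging the slider to the left.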

Going one step further, here is a demo from the Bokeh demo site that gives an idea of how far sliders can take you. The user can interact with every parameter of the plot, which is exactly what we wanted from Bokeh.

In [5]:
from IPython.display import IFrame
IFrame('http://demo.bokehplots.com/apps/sliders', width=1000, height=500)

# Hover Tool
The hover tool is a passive inspector tool. It is generally on at all times, but can be configured in the inspector’s menu associated with the toolbar.

The hover tool displays informational tooltips whenever the cursor is directly over a glyph. The data to show comes from the glyph's data source, and what is displayed is configurable through a `tooltips` attribute that maps display names to columns in the data source, or to special known variables. Field names starting with `@` are interpreted as columns on the data source. Field names starting with `$` are special, known fields, e.g. `$x` will display the x-coordinate under the current mouse position. More information about those fields can be found in the HoverTool reference. [Information about the hover tool has been taken from the Bokeh documentation.]
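
The substitution rules can be illustrated with a small stand-alone sketch. This is only a conceptual model of how a tooltip template gets filled in, not Bokeh's actual implementation; the helper `render_tooltip`, the `"???"` placeholder and the sample values are all made up for illustration.

```python
import re


def render_tooltip(template, row, special):
    """Fill a template: @name reads a data column, $name reads a special field."""
    def lookup(match):
        prefix, name = match.group(1), match.group(2)
        table = row if prefix == "@" else special
        return str(table.get(name, "???"))  # placeholder for unknown names
    return re.sub(r"([@$])(\w+)", lookup, template)


row = {"label": "point A", "value": 42.5}        # one row of the data source
special = {"index": 3, "x": -77.2, "y": 39.9}    # values tied to the mouse position

print(render_tooltip("@label: @value", row, special))
print(render_tooltip("($x, $y)", row, special))
```

The first template only touches data columns, while the second only touches the special fields, mirroring the two kinds of entries used in the tooltips lists in this tutorial.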

In the code block below, we build a scatter plot. The first figure is a basic scatter plot, yet notice that it already comes with tools such as Box Zoom, Wheel Zoom, Reset and Pan, unlike a static plot from a library like matplotlib. We then add the hover tool and draw a second figure from the same data. Hover over the circles in the second plot to see their values.

In [6]:
p = figure(plot_width=800, plot_height=500)
p.background_fill_color = "white"
x = np.linspace(0,10*np.pi,100)
y = np.cos(x) + np.random.randn(100)
z = 0.02*(x-1)
p.circle(x,y,radius=z,line_color="black", fill_color="navy", fill_alpha=0.5)
p.xaxis.axis_label = "Value 1"
p.xaxis.axis_label_text_font_size = "20pt"
p.yaxis.axis_label = "Value 2"
p.yaxis.axis_label_text_font_size = "20pt"
show(p) # show the results
# Put the same x, y and radius values into a data source for the hover plot
source = ColumnDataSource(
        data=dict(
            x=x,
            y=y,
            radius=z
        )
    )

hover = HoverTool(
        tooltips=[
            ("index", "$index"),
            ("(x,y)", "($x, $y)"),
        ]
    )

p = figure(plot_width=800, plot_height=500, tools=[hover], title="Hover over the dots")
p.xaxis.axis_label = "Value 1"
p.xaxis.axis_label_text_font_size = "20pt"
p.yaxis.axis_label = "Value 2"
p.yaxis.axis_label_text_font_size = "20pt"
p.background_fill_color = "beige"
# Reference columns by name so the glyph reads everything from the source
p.circle('x', 'y', source=source, radius='radius', line_color="black", fill_color="navy", fill_alpha=0.5)
show(p)

# Box and Lasso Select

This tool lets the user select a particular part of the graph and focus on it while the rest is de-emphasized. This has multiple use cases. When selecting a part of the plot, the user is essentially selecting the data points they want to focus on. The selection can then be used to show the mean, median or standard deviation of the selected points. This is especially useful when the plot contains a large amount of data.
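
As a minimal sketch of the statistics idea, assume a selection arrives as a list of row indices into the plotted values. The function `summarize_selection` and the sample numbers are hypothetical and only illustrate the calculation, not how Bokeh reports a selection.

```python
from statistics import mean, median, stdev


def summarize_selection(values, selected_indices):
    """Compute summary statistics over only the selected points."""
    chosen = [values[i] for i in selected_indices]
    return {
        "mean": mean(chosen),
        "median": median(chosen),
        "stdev": stdev(chosen),  # sample standard deviation, needs 2+ points
    }


values = [10.0, 12.0, 11.0, 15.0, 14.0]          # toy data
stats = summarize_selection(values, [1, 2, 3])   # points picked by a box select
```

Only the three selected values (12, 11 and 15) contribute to the result, so the summary changes as the selection changes.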

I have plotted below the closing values of Apple and Google stock. Note that the data is for 10 years (2006-2016). While this gives a high-level understanding of how both companies have performed, it's important to be able to select a particular time frame and compare the two. Try using the box/lasso select tool to concentrate only on the data points you want to focus on. You can also use the wheel zoom tool to get finer granularity. Notice how we have used the basic features to build interactivity into the financial charts.

Note: To make a multiple selection, press the SHIFT key. To clear the selection, press the ESC key.
 
# Linking Plots

It's often useful to link plots so that interacting with one affects the others. Here we link the Google and Apple charts, so that selecting data points with box select on one graph highlights the corresponding data points on the other. This works because both plots draw from the same ColumnDataSource. It is often useful when comparing two plots. The feature implemented below is linked brushing. Another feature Bokeh offers is linked panning.
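
The idea behind linked brushing can be sketched without Bokeh: if a selection is a set of row indices on shared data, any other column can look up the same rows. The dictionary `shared`, the helper `box_select` and the toy numbers below are assumptions made purely for illustration.

```python
# Toy rows standing in for a data source shared by two plots
shared = {
    "day": [1, 2, 3, 4],
    "first": [10, 12, 11, 15],    # values drawn on the first plot
    "second": [50, 52, 49, 55],   # values drawn on the second plot
}


def box_select(column, low, high):
    """Return the row indices whose value falls inside [low, high]."""
    return [i for i, value in enumerate(column) if low <= value <= high]


selected = box_select(shared["first"], 11, 12)            # brush on the first plot
linked = [shared["second"][i] for i in selected]          # same rows, second plot
```

Because both columns are indexed by the same rows, a selection made against one column immediately identifies the matching points in the other.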

In [3]:
from bokeh.layouts import gridplot
import pandas as pd

AAPL = pd.read_csv(
    "http://ichart.yahoo.com/table.csv?s=AAPL&a=0&b=1&c=2000",
    parse_dates=['Date'])
GOOG = pd.read_csv(
    "http://ichart.yahoo.com/table.csv?s=GOOG&a=0&b=1&c=2000",
    parse_dates=['Date'])


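# Align the two series on their shared dates so every column has the same length
merged = pd.merge(AAPL, GOOG, on='Date', suffixes=('_AAPL', '_GOOG'))

x = merged['Date']
y0 = merged['Adj Close_AAPL']
y1 = merged['Adj Close_GOOG']

# A single shared source is what links selections between the two plots
source = ColumnDataSource(data=dict(x=x, y0=y0, y1=y1))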

TOOLS = "box_select,lasso_select,help,wheel_zoom,pan"

a = figure(title="Apple", toolbar_location="above", width=1000, height=300,
           tools=TOOLS, x_axis_type="datetime")
a.circle('x', 'y0', source=source, color='black', legend='AAPL', size=2)
b = figure(title="Google", toolbar_location="above", width=1000, height=300,
           tools=TOOLS, x_axis_type="datetime")
b.circle('x', 'y1', source=source, color='green', legend='GOOG', size=2)
p = gridplot([[a],[b]])
show(p)

# Combining Features : A Practical Example
## Air and Water Quality of Pennsylvania

In addition to the features we have learned above, we will use a few new ones and see how interactivity comes together in practice. This example illustrates how much interactivity adds compared to static plots. We plot the air and water quality of Pennsylvania by county, using the following tools:

* Column Data Source
* Hover Tool
* Log Color Mapper
* Patches in Figures

We use different palettes for the air and water index plots. Each palette is passed to a Log Color Mapper, which colors the county patches according to their air and water index values.
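
To show what mapping values to colors on a log scale means, here is a simplified stand-alone sketch. It is not Bokeh's `LogColorMapper` implementation; the function `log_color`, its binning rule and the three-color palette are assumptions chosen for illustration.

```python
import math


def log_color(value, low, high, palette):
    """Pick a palette entry by the value's position on a log scale in [low, high]."""
    if value <= low:
        return palette[0]
    if value >= high:
        return palette[-1]
    fraction = math.log(value / low) / math.log(high / low)
    index = int(fraction * len(palette))
    return palette[min(index, len(palette) - 1)]


palette = ["light", "medium", "dark"]  # stand-ins for real color codes
colors = [log_color(v, 1, 1000, palette) for v in (1, 5, 50, 500, 1000)]
```

On a linear scale the value 50 would sit in the lowest twentieth of the range 1 to 1000, but on the log scale it lands in the middle bin, so small values are spread over more colors.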

I would like to highlight two things here:

* The colour coding itself gives an overview of how the air and water indexes change between counties in the state.
* A user who is interested in specifics can hover over a county to see its air and water index values and judge how good or bad they are.

Note that this is just a starting point. With other features we could highlight only the counties that are above a threshold air/water index. The threshold could be supplied by the user through a Slider, with a callback that updates the plots. By now, I hope you can see how Bokeh can be used to build an entire dashboard for interacting with plots.
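
The threshold idea boils down to a simple filter, sketched below in plain Python. The function `counties_above`, the county names and the index values are hypothetical; in a real dashboard the threshold would come from the slider and the result would drive a plot update.

```python
def counties_above(names, index_values, threshold):
    """Return the names of counties whose index value exceeds the threshold."""
    return [name for name, value in zip(names, index_values) if value > threshold]


names = ["County A", "County B", "County C"]   # placeholder names
air_index = [12.0, 48.5, 30.1]                 # placeholder index values
highlighted = counties_above(names, air_index, threshold=25.0)
```

Raising the threshold shrinks the highlighted list, which is the behaviour a slider-driven callback would expose interactively.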

In [1]:
# Run this cell first, after uncommenting the two lines below
#import bokeh
#bokeh.sampledata.download()
# Then run the next cell

In [4]:
from bokeh.io import show
from bokeh.layouts import gridplot
from bokeh.models import (
    ColumnDataSource,
    HoverTool,
    LogColorMapper
)
from bokeh.palettes import RdYlGn10 as palette
from bokeh.palettes import Blues9 as palette2
from bokeh.plotting import figure

from bokeh.sampledata.us_counties import data as counties
import pandas as pd

# Air and water index values per county, looked up by County_id below
data = pd.read_csv("pa_environment.csv")

# Reverse with slicing, which makes a copy and leaves the library's palette untouched
palette = palette[::-1]
palette2 = palette2[::-1]

counties = {
    code: county for code, county in counties.items() if county["state"] == "pa"
}

county_xs = [county["lons"] for county in counties.values()]
county_ys = [county["lats"] for county in counties.values()]

county_names = [county['name'] for county in counties.values()]

county_rates=[]
county_water=[]
for county_id in counties:
    # Keys are (state_id, county_id) tuples; combine them into one numeric id
    id_converted = county_id[0]*1000 + county_id[1]
    row = data.loc[data['County_id'] == id_converted]
    county_rates.append(float(row['air'].iloc[0]))
    county_water.append(-float(row['water'].iloc[0]))

color_mapper = LogColorMapper(palette=palette)
color_mapper2 = LogColorMapper(palette=palette2)

source = ColumnDataSource(data=dict(
    x=county_xs,
    y=county_ys,
    name=county_names,
    rate=county_rates
))

source2 = ColumnDataSource(data=dict(
    x=county_xs,
    y=county_ys,
    name=county_names,
    water=county_water
))

TOOLS = "pan,wheel_zoom,box_zoom,reset,hover,save"

p = figure(
    title="Pennsylvania Air Index, 2009", tools=TOOLS,width=500,height=500,
    x_axis_location=None, y_axis_location=None
)

q = figure(
    title="Pennsylvania Water Index, 2009", tools=TOOLS,width=500,height=500,
    x_axis_location=None, y_axis_location=None
)


p.grid.grid_line_color = None
q.grid.grid_line_color = None

p.patches('x', 'y', source=source,
          fill_color={'field': 'rate', 'transform': color_mapper},
          fill_alpha=0.7, line_color="black", line_width=1.0)
q.patches('x', 'y', source=source2,
          fill_color={'field': 'water', 'transform': color_mapper2},
          fill_alpha=0.7, line_color="black", line_width=1.0)

hover = p.select_one(HoverTool)
hover.point_policy = "follow_mouse"
hover.tooltips = [
    ("Name", "@name"),
    ("Air Index", "@rate"),
    ("(Long, Lat)", "($x, $y)"),
]

hover2 = q.select_one(HoverTool)
hover2.point_policy = "follow_mouse"
hover2.tooltips = [
    ("Name", "@name"),
    ("Water Index", "@water"),
    ("(Long, Lat)", "($x, $y)"),
]
plot = gridplot([[p,q]])
show(plot)

## References:
[1] http://bokeh.pydata.org/en/latest/

[2] http://bokeh.pydata.org/en/latest/docs/gallery.html

[3] http://www.datasciencecourse.org/