# Bokeh Note

This is my note about interactive data visualization with Bokeh, collected from various MOOCs I took. Bokeh builds interactive visualizations from Python without writing JavaScript. Plots can be written to a standalone HTML file with `output_file`, displayed inline in a Jupyter notebook with `output_notebook`, or both. In the notebook, ipywidgets' `interact` can add widgets that update a plot.

The Bokeh workflow is similar to matplotlib: (1) create a figure, specifying options such as `plot_width`, `plot_height`, and `tools` (`plot_width` and `plot_height` were renamed `width` and `height` in Bokeh 3), (2) add glyphs such as lines, (3) show the plot, and optionally (4) add interactive widgets.
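A minimal sketch of steps (1) to (3), writing a standalone HTML file. It assumes a Bokeh version that accepts `width`/`height` (these work in both Bokeh 2 and 3). The file name `workflow_demo.html` is only an example, and `save` is used instead of `show` so that no browser tab opens.

```python
import numpy as np
from bokeh.io import output_file, save
from bokeh.plotting import figure

# (1) Create the figure with a size and an explicit tool list
p = figure(width=500, height=300, title="sin(x)",
           tools="pan,wheel_zoom,box_zoom,reset,save")

# (2) Add a glyph: a line through the (x, y) points
x = np.linspace(0, 2 * np.pi, 200)
p.line(x, np.sin(x), line_width=2)

# (3) Choose the output and render it. show(p) would also open a browser tab,
# and output_notebook() would render inline in Jupyter instead.
output_file("workflow_demo.html")
save(p)
```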

Bokeh glyphs are visual shapes, such as circles and lines, whose properties (x/y position, radius, color) are attached to data.
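To illustrate "properties attached to data", here is a small sketch (made-up values) where each row of a `ColumnDataSource` is one circle, and the position, radius, and color of every circle all come from columns:

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Each row is one glyph; columns hold positions and visual properties
source = ColumnDataSource(data={
    "x": [1, 2, 3, 4],
    "y": [3, 1, 4, 2],
    "radius": [0.1, 0.2, 0.3, 0.4],
    "color": ["navy", "firebrick", "olive", "orange"],
})

p = figure(width=400, height=300, match_aspect=True)
# Strings naming a column are treated as fields, so every circle
# takes its own radius and fill color from its row
p.circle(x="x", y="y", radius="radius", fill_color="color",
         line_color=None, alpha=0.6, source=source)
```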

Here are the topics:

* Bokeh basics: marker options, drawing geometrical shapes with `patches()`, plotting pandas DataFrames, the box_select tool, the hover tool, and color mapping
* Building interactive apps with Bokeh: connecting Bokeh widgets to Python code, for example generating a fit after the user selects points, or changing the plotted data from a selection panel. Widget options include sliders, selects (dropdowns), buttons, etc.

In [68]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# interact from ipywidgets builds notebook widgets that call a Python function
from ipywidgets import interact

# Output and display helpers from bokeh.io
from bokeh.io import output_file, output_notebook, show, reset_output, push_notebook
output_notebook()

# Color mappers, palettes, plotting, map tiles, and layouts
from bokeh.models.mappers import ColorMapper, LinearColorMapper
from bokeh.palettes import Viridis5
from bokeh.plotting import figure
from bokeh.tile_providers import STAMEN_TERRAIN, CARTODBPOSITRON_RETINA
from bokeh.layouts import widgetbox, row, column, gridplot

# Basics Bokeh


* Marker glyphs: `asterisk()`, `circle()`, `circle_cross()`, `circle_x()`, `cross()`, `diamond()`, `diamond_cross()`, `inverted_triangle()`, `square()`, `square_cross()`, `square_x()`, `triangle()`, `x()`; `line()` draws connected lines instead of markers
* Draw geometrical shapes using `patches()`
* Bokeh works with NumPy arrays and pandas DataFrames
* Bokeh `ColumnDataSource`: lets a Bokeh plot refer to pandas column names directly, links selections between plots that share it, and makes extra columns available to hover tooltips
* The box_select tool, the hover tool, and color mapping
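As a sketch of the marker options listed above: instead of one method per marker, `scatter()` accepts the marker name as a `marker` argument. This assumes Bokeh 1.4 or later for `legend_label`; the y offsets are arbitrary, just to separate the rows.

```python
from bokeh.plotting import figure

x = [1, 2, 3, 4, 5]
p = figure(width=400, height=300)

# One scatter() call per marker type; each call adds one renderer and one legend entry
for offset, marker in enumerate(["circle", "square", "triangle", "diamond", "asterisk"]):
    p.scatter(x, [offset] * len(x), marker=marker, size=12, legend_label=marker)
```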

In [139]:
# interact from ipywidgets, figure from bokeh.plotting
from ipywidgets import interact
from bokeh.plotting import figure

# Output and display helpers from bokeh.io
from bokeh.io import output_file, output_notebook, show, reset_output, push_notebook

A simple example from the Bokeh documentation.

In [141]:
x = np.linspace(0, 2*np.pi, 2000)
y = np.sin(x)

p = figure(title="simple line example", plot_height=300, plot_width=600, y_range=(-5,5),
           background_fill_color='#efefef')
r = p.line(x, y, color="#8888cc", line_width=1.5, alpha=0.8)

In [142]:
show(p, notebook_handle=True)

In [138]:
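def update(f, w=1, A=1, phi=0):
    # Pick the function named by the dropdown value
    func = {"sin": np.sin, "cos": np.cos}[f]
    # Recompute y on the existing line renderer r, then push the change
    # to the plot that was shown with notebook_handle=True
    r.data_source.data['y'] = A * func(w * x + phi)
    push_notebook()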
    

In [143]:
interact(update, f=["sin", "cos"], w=(0,50), A=(1,10), phi=(0, 20, 0.1))


In [119]:
lit = pd.read_csv('data/literacy_birth_rate.csv')
lit.head()

| | Country | Continent | female literacy | fertility | population |
|---|---|---|---|---|---|
| 0 | Chine | ASI | 90.5 | 1.769 | 1324655000.0 |
| 1 | Inde | ASI | 50.8 | 2.682 | 1139965000.0 |
| 2 | USA | NAM | 99.0 | 2.077 | 304060000.0 |
| 3 | Indonésie | ASI | 88.8 | 2.132 | 227345100.0 |
| 4 | Brésil | LAT | 90.2 | 1.827 | 191971500.0 |


In [153]:
lit=lit.dropna()

In [157]:
lit['female literacy'] = lit['female literacy'].astype(float)
lit['fertility'] = lit['fertility'].astype(float)
lit.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 162 entries, 0 to 161
Data columns (total 5 columns):
Country            162 non-null object
Continent          162 non-null object
female literacy    162 non-null float64
fertility          162 non-null float64
population         162 non-null float64
dtypes: float64(3), object(2)
memory usage: 7.6+ KB


In [135]:
output_notebook()

In [158]:

# Create the figure: p
p = figure(x_axis_label='fertility (children per woman)', y_axis_label='female_literacy (% population)')

# Add a circle glyph to the figure p
p.circle(lit['fertility'], lit['female literacy'])

# Call the output_file() function and specify the name of the file
#output_file('fert_lit.html')

# Display the plot
show(p)

A scatter plot with different shapes

In [165]:
# Subsets for Latin America and Africa
lat = lit[lit['Continent'] == 'LAT']
af = lit[lit['Continent'] == 'AF']

p = figure(x_axis_label='fertility',
           y_axis_label='female_literacy (% population)')
p.circle(lat['fertility'], lat['female literacy'], color='blue', size=10, alpha=0.8)
p.x(af['fertility'], af['female literacy'])
show(p)
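Filtering each continent by hand gets repetitive. A sketch of the same idea with `groupby`, giving every continent its own marker and legend entry. The DataFrame below is a tiny made-up stand-in with the same column names as `lit`, and the continent-to-marker mapping is an arbitrary choice; `legend_label` assumes Bokeh 1.4 or later.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Tiny stand-in for lit (same column names, made-up values)
lit_demo = pd.DataFrame({
    "Continent": ["LAT", "LAT", "AF", "AF", "ASI"],
    "fertility": [1.8, 2.2, 5.1, 4.6, 1.7],
    "female literacy": [90.2, 85.0, 40.1, 55.3, 90.5],
})

# One marker per continent code (arbitrary mapping)
markers = {"LAT": "circle", "AF": "x", "ASI": "triangle"}

p = figure(x_axis_label="fertility", y_axis_label="female literacy (% population)")
for continent, group in lit_demo.groupby("Continent"):
    # Each group gets its own source, marker, and legend entry
    p.scatter("fertility", "female literacy", source=ColumnDataSource(group),
              marker=markers[continent], size=10, legend_label=continent)
```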

In [173]:
stocks = pd.read_csv('data/stocks.csv')
stocks['Date'] = pd.to_datetime(stocks['Date'])
stocks.head()

| | Date | AAPL | IBM | CSCO | MSFT |
|---|---|---|---|---|---|
| 0 | 2000-01-03 | 111.937502 | 116.0 | 108.0625 | 116.5625 |
| 1 | 2000-01-04 | 102.500003 | 112.0625 | 102.0 | 112.625 |
| 2 | 2000-01-05 | 103.999997 | 116.0 | 101.6875 | 113.8125 |
| 3 | 2000-01-06 | 94.999998 | 114.0 | 100.0 | 110.0 |
| 4 | 2000-01-07 | 99.500001 | 113.5 | 105.875 | 111.4375 |


In [176]:
p = figure(x_axis_type='datetime', x_axis_label='Date', y_axis_label='US Dollars')
p.line(stocks.Date, stocks.AAPL)
p.circle(stocks.Date, stocks.AAPL,fill_color='white', size=4)
show(p)

In [171]:
stocks.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3521 entries, 0 to 3520
Data columns (total 5 columns):
Date    3521 non-null object
AAPL    3521 non-null float64
IBM     3521 non-null float64
CSCO    3521 non-null float64
MSFT    3521 non-null float64
dtypes: float64(4), object(1)
memory usage: 137.6+ KB


Patches: in Bokeh, extended geometrical shapes can be plotted with the `patches()` glyph method. It takes list-of-lists collections of numeric values giving the x and y vertices of each distinct patch to plot, one inner list per patch.

In [182]:

x = [[1,1,2,2], [2,2,4], [2,2,3,3]]

y = [[2,5,5,2], [3,5,5], [2,3,4,2]]

# Create a figure and add one patch per inner list, with white outlines
p = figure()
p.patches(x, y, fill_color=['red', 'blue', 'green'], line_color='white')

# Show the result
show(p)
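The same three patches can also come from a `ColumnDataSource` with one row per patch, where the `xs`/`ys` columns hold each patch's vertex lists. A sketch of that variant; the `name` column is a hypothetical label added only so the hover tool has something to display.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# One row per patch: xs/ys hold that patch's vertex lists
source = ColumnDataSource(data={
    "xs": [[1, 1, 2, 2], [2, 2, 4], [2, 2, 3, 3]],
    "ys": [[2, 5, 5, 2], [3, 5, 5], [2, 3, 4, 2]],
    "color": ["red", "blue", "green"],
    "name": ["rectangle", "triangle", "quadrilateral"],
})

p = figure(width=400, height=400)
p.patches("xs", "ys", source=source, fill_color="color",
          line_color="white", fill_alpha=0.7)

# The name column is not drawn, but the hover tool can show it
p.add_tools(HoverTool(tooltips=[("shape", "@name")]))
```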

Other glyphs:

* `annulus()`, `annular_wedge()`, `wedge()`
* `rect()`, `quad()`, `vbar()`, `hbar()`
* `image()`, `image_rgba()`, `image_url()`
* `patch()`, `patches()`
* `line()`, `multi_line()`
* `circle()`, `oval()`, `ellipse()`
* `arc()`, `quadratic()`, `bezier()`
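As one example from this list, a sketch of a bar chart with `vbar()`. The categories and counts are made up; passing a list of strings as `x_range` gives a categorical axis with one bar per name.

```python
from bokeh.plotting import figure

fruits = ["apples", "pears", "plums"]
counts = [5, 3, 4]

# A list of strings as x_range creates a categorical (FactorRange) axis
p = figure(x_range=fruits, width=400, height=300)
p.vbar(x=fruits, top=counts, width=0.8)
```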

In [7]:
from bokeh.plotting import ColumnDataSource

In [179]:
df = pd.read_csv('data/sprint.csv')
df.head()

| | Name | Country | Medal | Time | Year | color |
|---|---|---|---|---|---|---|
| 0 | Usain Bolt | JAM | GOLD | 9.63 | 2012 | goldenrod |
| 1 | Yohan Blake | JAM | SILVER | 9.75 | 2012 | silver |
| 2 | Justin Gatlin | USA | BRONZE | 9.79 | 2012 | saddlebrown |
| 3 | Usain Bolt | JAM | GOLD | 9.69 | 2008 | goldenrod |
| 4 | Richard Thompson | TRI | SILVER | 9.89 | 2008 | silver |


A Bokeh `ColumnDataSource` lets a plot refer to pandas column names directly. It also links selections between plots that share it, and its extra columns can be shown in hover tooltips.
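A short sketch of the two common ways to build a `ColumnDataSource`, using two rows from the sprint table above. Built from a DataFrame, the source also gets the DataFrame index as an `index` column; built from a dict, it contains only the keys passed in.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

df = pd.DataFrame({"Year": [2012, 2008], "Time": [9.63, 9.69]})

# From a DataFrame: every column, plus the index as "index"
src_df = ColumnDataSource(df)

# From a dict: only the given keys
src_dict = ColumnDataSource(data={"Year": df["Year"].values, "Time": df["Time"].values})

print(sorted(src_df.column_names))    # ['Time', 'Year', 'index']
print(sorted(src_dict.column_names))  # ['Time', 'Year']
```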

In [180]:
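source = ColumnDataSource(df)
# Start a new figure; glyph arguments refer to columns by name, including the per-row 'color'
p = figure(x_axis_label='Year', y_axis_label='Time')
p.circle('Year', 'Time', source=source, color='color', size=8)
show(p)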

Adding the `box_select` tool and setting `selection_color` and `nonselection_alpha` on the circle glyph makes the selected circles red and the non-selected circles a nearly transparent blue.

In [181]:
p = figure(x_axis_label='Year',y_axis_label='Time', tools='box_select')
p.circle('Year','Time',source=source,selection_color='red',nonselection_alpha=0.1)
show(p)

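Adding a hover tool. With `tooltips=None` no tooltip box is shown, but the `hover_*` glyph properties still highlight points, and `mode='vline'` highlights every point under a vertical line through the cursor.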

In [1]:
from bokeh.models import HoverTool

In [5]:
glu = pd.read_csv('data/glucose.csv')
glu['datetime'] = pd.to_datetime(glu['datetime'])
glu.head()

| | datetime | isig | glucose |
|---|---|---|---|
| 0 | 2010-10-07 00:03:00 | 22.1 | 150 |
| 1 | 2010-10-07 00:08:00 | 21.46 | 152 |
| 2 | 2010-10-07 00:13:00 | 21.06 | 149 |
| 3 | 2010-10-07 00:18:00 | 20.96 | 147 |
| 4 | 2010-10-07 00:23:00 | 21.52 | 148 |


In [8]:
glu_s = ColumnDataSource(glu)

In [15]:
# Create a figure with a datetime x-axis and add circle glyphs
p = figure(x_axis_type='datetime')
p.circle('datetime', 'glucose', source=glu_s, size=10,
         fill_color='grey', alpha=0.1, line_color=None,
         hover_fill_color='firebrick', hover_alpha=0.5,
         hover_line_color='white')

# No tooltip box; highlight all points under a vertical line through the cursor
hover = HoverTool(tooltips=None, mode='vline')
p.add_tools(hover)
show(p)
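If a tooltip box is wanted, the datetime column needs a formatter; otherwise it shows as a raw timestamp number. A sketch using synthetic readings (not the `glucose.csv` data). It assumes Bokeh 2.0 or later, where formatter keys carry the `@` prefix.

```python
import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Synthetic readings every 5 minutes, standing in for glucose.csv
times = pd.date_range("2010-10-07", periods=12, freq="5min")
src = ColumnDataSource(data={"datetime": times,
                             "glucose": np.linspace(150, 140, 12)})

p = figure(x_axis_type="datetime", width=500, height=300)
p.line("datetime", "glucose", source=src)

# %F %H:%M formats the timestamp; {0.0} rounds glucose to one decimal
hover = HoverTool(
    tooltips=[("time", "@datetime{%F %H:%M}"), ("glucose", "@glucose{0.0}")],
    formatters={"@datetime": "datetime"},
    mode="vline",
)
p.add_tools(hover)
```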

Use the CategoricalColorMapper to color each glyph by a categorical property

In [16]:
from bokeh.models import CategoricalColorMapper

In [17]:
auto = pd.read_csv('data/auto-mpg.csv')

In [18]:
auto_s = ColumnDataSource(auto)

In [20]:
p = figure()
# Make a CategoricalColorMapper object: color_mapper
color_mapper = CategoricalColorMapper(factors=['Europe', 'Asia', 'US'],
                                      palette=['red', 'green', 'blue'])

# Add a circle glyph to the figure p
p.circle('weight', 'mpg', source=auto_s,
         color=dict(field='origin', transform=color_mapper),
         legend='origin')

show(p)
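`bokeh.transform.factor_cmap` is a shorthand that builds the categorical color mapper and the field reference in one call. A sketch with a few made-up rows that use the `auto-mpg.csv` column names; `legend_field` assumes Bokeh 1.4 or later.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.transform import factor_cmap

# Made-up rows with the same column names as auto-mpg.csv
auto_demo = pd.DataFrame({"weight": [3500, 2100, 2300, 4100],
                          "mpg": [18.0, 32.0, 27.0, 14.0],
                          "origin": ["US", "Asia", "Europe", "US"]})

p = figure(x_axis_label="weight", y_axis_label="mpg")
# factor_cmap maps each 'origin' value to the palette color at the same position
p.scatter("weight", "mpg", source=ColumnDataSource(auto_demo), size=8,
          fill_color=factor_cmap("origin", palette=["red", "green", "blue"],
                                 factors=["Europe", "Asia", "US"]),
          line_color=None, legend_field="origin")
```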

# Layouts, Link plot together and Annotations

* Arranging multiple plots in rows and columns, grid layout, Panel and tabbed layouts
* Linking plots together: link axes, or ColumnDataSource
* Add annotations: legends and hover tooltips

In [57]:
from bokeh.layouts import row, column, gridplot

In [23]:
lit = pd.read_csv('data/literacy_birth_rate.csv').dropna()
lit[['female literacy', 'fertility']] = lit[['female literacy', 'fertility']].astype(float)
lit = lit.rename({'Country ': 'Country'}, axis=1)
lit.head(), lit.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 162 entries, 0 to 161
Data columns (total 5 columns):
Country            162 non-null object
Continent          162 non-null object
female literacy    162 non-null float64
fertility          162 non-null float64
population         162 non-null float64
dtypes: float64(3), object(2)
memory usage: 7.6+ KB


(    Country  Continent  female literacy  fertility    population
 0      Chine       ASI             90.5      1.769  1.324655e+09
 1       Inde       ASI             50.8      2.682  1.139965e+09
 2        USA       NAM             99.0      2.077  3.040600e+08
 3  Indonésie       ASI             88.8      2.132  2.273451e+08
 4     Brésil       LAT             90.2      1.827  1.919715e+08, None)

In [62]:
source = ColumnDataSource(lit)
p1 = figure(x_axis_label='fertility (children per woman)', y_axis_label='female_literacy (% population)')
# Add a circle glyph to p1
p1.circle('fertility', 'female literacy',source=source)

# Create the second figure: p2
p2 = figure(x_axis_label='population', y_axis_label='female_literacy (% population)')
# Add a circle glyph to p2
p2.circle('population','female literacy',source=source)
p2.xaxis.major_label_orientation = 45

layout = row(p1,p2, sizing_mode = 'scale_width')
show(layout)

In [55]:
layout2 = column([p1,p2])
show(layout2)

Grid layouts

In [70]:
x = list(range(11))
y0 = x
y1 = [10 - i for i in x]
y2 = [abs(i - 5) for i in x]

# create three plots
p1 = figure(plot_width=250, plot_height=250, title=None)
p1.circle(x, y0, size=10)
p2 = figure(plot_width=250, plot_height=250, title=None)
p2.triangle(x, y1, size=10)
p3 = figure(plot_width=250, plot_height=250, title=None)
p3.square(x, y2, size=10)

# make a grid
grid = gridplot([[p1, p2], [None, p3]])

# show the results
show(grid)

A tabbed layout shows one plot per tab. First create a figure for each tab, wrap each figure in a panel with `Panel(child=..., title=...)`, then pass the list of panels to `Tabs(tabs=[...])`. In Bokeh 3, `Panel` was renamed `TabPanel`.

In [79]:
from bokeh.models.widgets import Panel, Tabs

In [85]:
p1 = figure(x_axis_label='fertility (children per woman)', y_axis_label='female_literacy (% population)',plot_width=250,plot_height=250)
# Add a circle glyph to p1 (Latin America)
p1.circle('fertility', 'female literacy',source=ColumnDataSource(lit[lit['Continent']=='LAT']))

p2 = figure(x_axis_label='fertility (children per woman)', y_axis_label='female_literacy (% population)',plot_width=250,plot_height=250)
# Add a circle glyph to p2 (Africa)
p2.circle('fertility', 'female literacy',source=ColumnDataSource(lit[lit['Continent']=='AF']))

p3 = figure(x_axis_label='fertility (children per woman)', y_axis_label='female_literacy (% population)',plot_width=250,plot_height=250)
# Add a circle glyph to p3 (Asia)
p3.circle('fertility', 'female literacy',source=ColumnDataSource(lit[lit['Continent']=='ASI']))

p4 = figure(x_axis_label='fertility (children per woman)', y_axis_label='female_literacy (% population)',plot_width=250, plot_height=250)
# Add a circle glyph to p4 (Europe)
p4.circle('fertility', 'female literacy',source=ColumnDataSource(lit[lit['Continent']=='EUR']))

In [78]:
# Create tab1 from plot p1: tab1
tab1 = Panel(child=p1, title='Latin America')

# Create tab2 from plot p2: tab2
tab2 = Panel(child=p2, title='Africa')

# Create tab3 from plot p3: tab3
tab3 = Panel(child=p3, title='Asia')

# Create tab4 from plot p4: tab4
tab4 = Panel(child=p4, title='Europe')

In [80]:
# Create a Tabs layout: layout
layout = Tabs(tabs=[tab1, tab2, tab3,tab4])
show(layout)
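The four figures and panels above follow one pattern, so they can be built in a loop. A sketch that also tolerates the `Panel` to `TabPanel` rename in Bokeh 3. The DataFrame is a small made-up stand-in for `lit`.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, Tabs
from bokeh.plotting import figure

try:                      # Bokeh >= 3.0
    from bokeh.models import TabPanel
except ImportError:       # older Bokeh
    from bokeh.models import Panel as TabPanel

# Made-up stand-in for lit
lit_demo = pd.DataFrame({
    "Continent": ["LAT", "AF", "ASI", "EUR", "LAT", "AF"],
    "fertility": [1.8, 5.1, 1.7, 1.4, 2.2, 4.6],
    "female literacy": [90.2, 40.1, 90.5, 99.0, 85.0, 55.3],
})
titles = {"LAT": "Latin America", "AF": "Africa", "ASI": "Asia", "EUR": "Europe"}

panels = []
for code, title in titles.items():
    fig = figure(width=250, height=250,
                 x_axis_label="fertility (children per woman)",
                 y_axis_label="female literacy (% population)")
    subset = lit_demo[lit_demo["Continent"] == code]
    fig.scatter("fertility", "female literacy", source=ColumnDataSource(subset))
    # One tab per continent, titled with the full name
    panels.append(TabPanel(child=fig, title=title))

layout = Tabs(tabs=panels)
```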

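Plots with linked axes are created by sharing range objects: assigning one figure's `x_range` and `y_range` to another makes panning and zooming in one plot move the other. For plots p1 to p4 above, I only have to set their ranges equal.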

In [86]:
# Share p1's x and y ranges with p2, p3, and p4 so all four pan and zoom together
for other in (p2, p3, p4):
    other.x_range = p1.x_range
    other.y_range = p1.y_range

layout = gridplot([[p1,p2],[p3,p4]])
show(layout)

Link points: By sharing the same ColumnDataSource object between multiple plots, selection tools like BoxSelect and LassoSelect will highlight points in both plots that share a row in the ColumnDataSource.

In [103]:
source = ColumnDataSource(lit)
# Create the first figure: p1
p1 = figure(x_axis_label='fertility (children per woman)', 
            y_axis_label ='female literacy (% population)',
            tools='box_select,lasso_select')

# Add a circle glyph to p1
p1.circle('fertility','female literacy',source=source)

# Create the second figure: p2
p2 = figure(x_axis_label='fertility (children per woman)', 
            y_axis_label='population', tools='box_select,lasso_select')

# Add a circle glyph to p2
p2.circle('fertility','population',source=source)

# Create row layout of figures p1 and p2: layout
layout = row(p1,p2, sizing_mode ='scale_width')
show(layout)

Legends can be added to any glyph with the `legend` keyword argument (`legend_label` in Bokeh 1.4 and later), and their location and styling are set through `p.legend`.

Hover tooltips then show details of the data, using special fields such as `$index` or column fields such as `@columnname` from the ColumnDataSource.
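A sketch of both points, using two rows from the literacy table above. Column names containing a space, such as `female literacy`, must be wrapped in braces in a tooltip (`@{female literacy}`). `legend_label` assumes Bokeh 1.4 or later.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Two rows from the literacy table; "female literacy" contains a space
src = ColumnDataSource(pd.DataFrame({
    "Country": ["Chine", "Inde"],
    "fertility": [1.769, 2.682],
    "female literacy": [90.5, 50.8],
}))

p = figure(width=300, height=300)
p.scatter("fertility", "female literacy", source=src, size=10, legend_label="Asia")
p.legend.location = "bottom_left"

# $index is the row number; @{...} braces handle column names with spaces
p.add_tools(HoverTool(tooltips=[
    ("row", "$index"),
    ("Country", "@Country"),
    ("female literacy", "@{female literacy}%"),
]))
```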

In [112]:
from bokeh.models import HoverTool

In [138]:
p = figure(plot_width=300, plot_height=300)
# Add the first circle glyph to the figure p
p.circle('fertility', 'female literacy', source=ColumnDataSource(lit[lit['Continent']=='LAT']), size=10, color='red', legend='Latin America')

# Add the second circle glyph to the figure p
p.circle('fertility', 'female literacy', source=ColumnDataSource(lit[lit['Continent']=='AF']), size=10, color='blue', legend='Africa')
p.legend.location='bottom_left'
p.legend.background_fill_color='lightgray'

hover = HoverTool(tooltips=[('Country', '@Country'), ('Population', '@population'), ('Continent', '@Continent')])
p.add_tools(hover)

show(p)

# Building interactive apps with Bokeh

The Bokeh server connects plots and widgets to live Python code, for example to compute a fit after the user selects points on a plot.

* Start by importing `curdoc`
* Create plots and widgets
* Add callback functions that respond to events, for example to a user selection or a widget change
* Arrange the plots and widgets in a layout
* Call `curdoc().add_root(layout)`
* Run the app from a shell or the Windows command prompt with `bokeh serve --show myapp.py`. A directory can be given instead of a .py file, and `--show` opens the app in a web browser.

`curdoc()` is intended for a .py app. In a Jupyter notebook I instead write a function that takes a `doc` argument, add the layout to it with `doc.add_root(layout)`, and pass that function to `show()`.
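A sketch of the "fit after the user selects points" idea as a hypothetical `myapp.py`. The `make_document(doc)` function serves both cases: `bokeh serve --show myapp.py` uses the last line, and a notebook can call `show(make_document)`. The data is synthetic, and the straight-line fit with `np.polyfit` is my own choice of fit.

```python
import numpy as np
from bokeh.io import curdoc
from bokeh.layouts import column
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure


def fit_line(x, y):
    """Least-squares slope and intercept, or None with fewer than 2 points."""
    if len(x) < 2:
        return None
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept


def make_document(doc):
    # Synthetic noisy points around y = 2x + 1
    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 50)
    y = 2 * x + 1 + rng.normal(0, 1, x.size)
    points = ColumnDataSource(data={"x": x, "y": y})
    fit = ColumnDataSource(data={"x": [], "y": []})

    p = figure(width=400, height=300, tools="box_select,lasso_select,reset")
    p.scatter("x", "y", source=points, size=6)
    p.line("x", "y", source=fit, color="red", line_width=2)

    def on_selection(attr, old, new):
        # new is the list of selected row indices
        idx = np.asarray(new, dtype=int)
        result = fit_line(x[idx], y[idx])
        if result is None:
            fit.data = {"x": [], "y": []}
            return
        slope, intercept = result
        # Draw the fitted line across the selected x range
        xs = np.array([x[idx].min(), x[idx].max()])
        fit.data = {"x": xs, "y": slope * xs + intercept}

    points.selected.on_change("indices", on_selection)
    doc.add_root(column(p))


make_document(curdoc())
```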

Widget options: slider, select (dropdown), button etc.

In [140]:
from bokeh.io import curdoc
from bokeh.plotting import figure
from bokeh.layouts import widgetbox
from bokeh.models import Slider

In [201]:
def plot_sin(doc):
    # Create a new plot: plot
    plot = figure(plot_width=300, plot_height=300)

    # A denser grid than np.arange(0, 10) keeps the curve smooth at higher frequencies
    x = np.linspace(0, 10, 200)
    source = ColumnDataSource(data={'x': x, 'y': np.sin(x)})
    # Add a line to the plot
    plot.line('x', 'y', source=source)

    # Create the sliders: slider1 (frequency) and slider2 (amplitude)
    slider1 = Slider(title='frequency', start=0, end=10, step=0.1, value=1)
    slider2 = Slider(title='amplitude', start=1, end=5, step=1, value=1)

    # One callback reads both sliders, so moving one keeps the other's setting
    def callback(attr, old, new):
        freq = slider1.value
        amp = slider2.value
        # Update source with y = amp * sin(freq * x)
        source.data = {'x': x, 'y': amp * np.sin(freq * x)}

    slider1.on_change('value', callback)
    slider2.on_change('value', callback)

    # Add the sliders and the plot to the document
    layout = column(slider1, slider2, plot)
    doc.add_root(layout)

# In a standalone .py app, add the layout to the current document instead:
# curdoc().add_root(layout)

In [202]:
show(plot_sin)

Another example from the Bokeh website

In [176]:
import yaml

from bokeh.layouts import column
from bokeh.models import ColumnDataSource, Slider
from bokeh.plotting import figure
from bokeh.themes import Theme
from bokeh.io import show, output_notebook

from bokeh.sampledata.sea_surface_temperature import sea_surface_temperature

In [177]:

def modify_doc(doc):
    df = sea_surface_temperature.copy()
    source = ColumnDataSource(data=df)

    plot = figure(x_axis_type='datetime', y_range=(0, 25),
                  y_axis_label='Temperature (Celsius)',
                  title="Sea Surface Temperature at 43.18, -70.43")
    plot.line('time', 'temperature', source=source)

    def callback(attr, old, new):
        if new == 0:
            data = df
        else:
            data = df.rolling('{0}D'.format(new)).mean()
        source.data = ColumnDataSource(data=data).data

    slider = Slider(start=0, end=30, value=0, step=1, title="Smoothing by N Days")
    slider.on_change('value', callback)

    doc.add_root(column(slider, plot))

    doc.theme = Theme(json=yaml.safe_load("""
        attrs:
            Figure:
                background_fill_color: "#DDDDDD"
                outline_line_color: white
                toolbar_location: above
                height: 500
                width: 800
            Grid:
                grid_line_dash: [6, 4]
                grid_line_color: white
    """))

In [179]:
show(modify_doc)

Dropdown callbacks

In [207]:
from bokeh.models import ColumnDataSource, Select

In [216]:
def dropdown_demo(doc):
    # Create ColumnDataSource: source
    source = ColumnDataSource(data={
        'x' : lit['fertility'].values,
        'y' : lit['female literacy'].values
        })
    # Create a new plot: plot
    plot = figure(plot_width=300,plot_height=300)

    # Add circles to the plot
    plot.circle('x', 'y', source=source)
    
    # Define a callback function: update_plot
    def update_plot(attr, old, new):
        # If the new selection is 'female_literacy', update 'y' to female literacy
        if new == 'female_literacy':
            source.data = {
                'x' : lit.fertility.values,
                'y' : lit['female literacy'].values
            }
        # Else, update 'y' to population
        else:
            source.data = {
                'x' : lit.fertility.values,
                'y' : lit.population.values
            }
    # Create a dropdown Select widget: select    
    select = Select(title="distribution", options=['female_literacy', 'population'], value='female_literacy')
    # Attach the update_plot callback to the 'value' property of select
    select.on_change('value', update_plot)
    
    layout = column(select, plot)
    doc.add_root(layout)

In [217]:
show(dropdown_demo)
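The if/else in `update_plot` grows with every new option. A sketch of a variant that maps each dropdown label to a DataFrame column and keeps the y-axis label in sync. The label-to-column dict is my own construction, and the two rows come from the literacy table above.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, Select
from bokeh.plotting import figure

# Two rows from the literacy table
lit_demo = pd.DataFrame({"fertility": [1.769, 2.682],
                         "female literacy": [90.5, 50.8],
                         "population": [1324655000.0, 1139965000.0]})

# Dropdown label -> DataFrame column (hypothetical mapping)
columns = {"female_literacy": "female literacy", "population": "population"}

source = ColumnDataSource(data={"x": lit_demo["fertility"].values,
                                "y": lit_demo["female literacy"].values})
plot = figure(width=300, height=300, y_axis_label="female_literacy")
plot.scatter("x", "y", source=source)


def update_plot(attr, old, new):
    # Swap in the chosen column and relabel the y-axis
    source.data = {"x": lit_demo["fertility"].values,
                   "y": lit_demo[columns[new]].values}
    plot.yaxis.axis_label = new


select = Select(title="distribution", options=list(columns), value="female_literacy")
select.on_change("value", update_plot)
```

In a notebook this would still be wrapped in a `doc` function with `doc.add_root(column(select, plot))`, as in `dropdown_demo`.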