# Practical Data Science Tutorial: Interactive Visualization with Bokeh
*Author: Yuwei Zhu*

## Introduction

This tutorial introduces the basic elements and methods for creating an **interactive plot** with `Bokeh` in a Jupyter Notebook.

Telling compelling stories by visualizing data is one of the most important skills a successful data scientist should acquire. To improve an audience's understanding of the data, we often want to give them the freedom to explore the analyzed data themselves. However, although matplotlib (the most popular Python visualization library) offers concise construction of versatile graphics, its plots are static by default and it offers only limited support for interaction in the browser. Therefore, I want to introduce Bokeh, an interactive visualization library, to help you better present your data analysis results.

A famous example of interactive data visualization is shown in the following video, from Hans Rosling's iconic TED Talk in 2006. After reading this tutorial, you will be able to create similar plots.

In [21]:
%%HTML
<div align="middle">
<video width="80%" controls autoplay>
      <source src="gapminder_demo.mp4" type="video/mp4">
</video></div>

[Bokeh](https://bokeh.pydata.org/en/1.3.4/) is an interactive visualization library that targets modern web browsers, rendering its graphics with HTML and JavaScript. It is also a great candidate for exploring and presenting your data since it supports inline display in Jupyter Notebook. More importantly, it allows your notebook to take inputs from readers and present the corresponding results, making your notebook more engaging.

### Tutorial content
In this tutorial, I will show you how to make your plots interactive in Python using Bokeh.

I will cover the following topics in this tutorial:
1. [Installing the libraries](#1.-Installing-the-Libraries)
2. [Plotting with basic glyphs](#2.-Ploting-with-Basic-Glyphs)
    - 2.1 [Bokeh visualization template](#2.1-Bokeh-Visualization-Template)
    - 2.2 [Organizing multiple glyphs](#2.2-Organizing-Multiple-Glyphs)
3. [Transforming data](#3.-Transforming-Data)
4. [Adding interactions](#4.-Adding-Interactions)
    - 4.1 [Interactive plots with widgets](#4.1-Interactive-Plots-with-Widgets)
    - 4.2 [Linking behavior](#4.2-Linking-Behavior)
5. [Example application: World Happiness Report](#Example-application:-World-Happiness-Report)
6. [Example application: Airbnb NYC data](#Example-application:-Airbnb-NYC-data)

## 1. Installing the Libraries

Before getting started, you'll need to install the library that we will use. You can install Bokeh using `pip`:

    $ pip install bokeh

If you are already an Anaconda user, you can simply run the command:

    $ conda install bokeh
    
Please note that Bokeh 1.3.4, the version whose documentation is linked above, is officially supported (and continuously tested) on CPython versions 2.7 and 3.5+ only. Other Python versions or implementations may work, possibly in a limited capacity, but no guarantees or support are provided.

Also, for basic usage, your machine should have the following libraries installed:
- Jinja2 >=2.7
- numpy >=1.7.1
- packaging >=16.8
- pillow >=4.0
- python-dateutil >=2.1
- PyYAML >=3.10
- six >=1.5.2
- tornado >=4.3

In [2]:
# Data handling
import numpy as np
import pandas as pd

import warnings
warnings.filterwarnings("ignore")

# Bokeh libraries
from bokeh.io import output_notebook
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource
from bokeh.layouts import gridplot, column, row
from bokeh.transform import factor_cmap, factor_mark
from bokeh.models.widgets import CheckboxButtonGroup, RangeSlider, Panel, Tabs,\
                                 Slider, TextInput, Button, Paragraph, Select, CheckboxGroup
from bokeh.palettes import all_palettes

In [3]:
# display in notebook
output_notebook() 

## *After installing the library, please trust and re-run the entire notebook to make sure the interactions show up!*
---

Before we officially start the tutorial, **let's play a quick game!**<br>
*Please run the cell below and start our little game!*

In [4]:
def modify_doc(doc):   
    text_input = TextInput(value="default", title = 'Your name:') # named text_input to avoid shadowing the built-in input()
    button = Button(label="Push !", button_type="success")
    output = Paragraph()

    # callback that runs every time the button is clicked
    def update():
        output.text = "Hello, " + text_input.value + '. Welcome to this tutorial. Hope you enjoy it!'
    button.on_click(update)

    layout = column(text_input, button, output)

    doc.add_root(layout)
    
show(modify_doc) 

It's fun, right? <br>
Want to learn how to build a similar interaction? <br>
Keep on reading this tutorial!

## 2. Ploting with Basic Glyphs

Now that we've installed the required library, let's first create some static plots using Bokeh to become familiar with its basic concepts and methods.

### 2.1 Bokeh Visualization Template

Below is a simple static plot example. Basically, creating a plot with Bokeh is quite similar to creating one with Matplotlib. There are four main steps:
- prepare your data for plotting
- determine location of rendered plots
- configure the canvas
- draw the data

In [5]:
# Prepare the data
x = np.linspace(-5, 5, 50)
y_square = x ** 2
y_exp = np.exp(x)
y_log = np.log(np.absolute(x))
y_cos = np.cos(x)

# Determine where the visualization will be rendered
output_notebook()  # Render inline in a Jupyter Notebook
# If you want to render to a static HTML, use: 
# from bokeh.io import output_file
# output_file('filename.html')

# Configure the figure(s)
fig = figure(plot_height=400, plot_width=500, # configure canvas size
             x_range = (-5,5), y_range = (-2, 10),
             title = 'Simple plot example',
             background_fill_color = "cornsilk", background_fill_alpha = 0.4)# set up background

# Connect to data and draw plots
fig.line(x, y_square, legend="y=x^2",line_color = "coral") # line 1
fig.square(x, y_square, legend="y=x^2", fill_color=None, line_color="coral") # add square to points in line 1

fig.line(x, y_exp, legend="y=exp(x)", line_color = "dodgerblue", line_dash = 'dashed')# line 2

fig.step(x, y_log , legend="y=log(|x|)", line_color = "indigo", line_alpha = 0.5)# line3 draw as step line
fig.circle(x, y_log, legend = "y=log(|x|)",line_color = 'indigo', fill_color = None)# add circle to points

fig.line(x, y_cos, legend="y=cos(x)",line_color = "green", line_width = 2)# line 4

fig.legend.location = "top_right" # set legend position

# Organize the multiple layers/plots(optional)

# Preview
show(fig) 

As the code above shows, Bokeh provides multiple figure options along with various visual attributes for customizing plots. To see the available figure options, please refer to the [bokeh.plotting](https://bokeh.pydata.org/en/1.3.4/docs/reference/plotting.html#bokeh-plotting) interface. See [Styling Visual Attributes](https://bokeh.pydata.org/en/1.3.4/docs/user_guide/styling.html#userguide-styling) for information about how to customize the visual style of plots.



An interesting element you might notice in the above plot is the `toolbar` on its right side. `Tools` are an important Bokeh feature that lets you interact with your plots. You can use different tools to report information, to change plot parameters such as zoom level or range extents, or to add, edit, or delete glyphs.

By default, the toolbar comes with the following tools (from top to bottom):
- Link to the [Bokeh homepage](https://bokeh.pydata.org/en/latest/)
- Pan
- Box Zoom
- Wheel Zoom
- Save
- Reset
- A link to Bokeh's user guide for [Configuring Plot Tools](https://bokeh.pydata.org/en/latest/docs/user_guide/tools.html#built-in-tools)

Simply click a tool on the toolbar and then interact with your plot. Give it a try!

The toolbar can be removed by passing `toolbar_location=None` when instantiating a `figure()` object, or relocated by passing any of `'above'`, `'below'`, `'left'`, or `'right'`. Additionally, the toolbar can be configured to include any combination of tools you desire. To check the available tool options, please visit [Configuring Plot Tools](https://bokeh.pydata.org/en/latest/docs/user_guide/tools.html#built-in-tools).

### 2.2 Organizing Multiple Glyphs

You have already learned how to create a single plot using Bokeh. The next step is to create multiple plots and arrange them together.

Similar to the functionality of Matplotlib’s subplot, Bokeh offers the `column`, `row`, and `gridplot` functions in its [bokeh.layouts](https://bokeh.pydata.org/en/latest/docs/reference/layouts.html) module. These functions can more generally be classified as layouts.

The usage is straightforward. If you want to stack two visualizations vertically, you can do so with the following:

    show(column(figure_1, figure_2))

If you want a more complex layout, you might want to use `gridplot` instead. Below is an example that uses `gridplot` to build a customized layout.

In [6]:
#create plot 1
p1 = figure(plot_width=200, plot_height=200, title="y = x^2")
p1.line(x, y_square, line_color = "coral")

# create plot2
p2 = figure(plot_width=200, plot_height=200, title="y = log(|x|)")
p2.line(x, y_log, line_color = "dodgerblue")

# create plot3
p3 = figure(plot_width=200, plot_height=200, title="y = exp(x)")
p3.line(x, y_exp, line_color = "indigo")

# create plot4
p4 = figure(plot_width=200, plot_height=100, title="y = cos(x)")
p4.line(x, y_cos, line_color = "green")

grid = gridplot([[p1, p2, p3], [None, p4, None]]) # plot with four plots with placeholders at the second row
show(grid)

## 3. Transforming Data

In the previous examples, we simply passed arrays of values into the plotting functions. When you pass in data like this, Bokeh works behind the scenes to make a `ColumnDataSource` for you. Learning to create and use the ColumnDataSource yourself gives you access to more advanced capabilities, such as sharing data between plots and filtering data.

The primary function of ColumnDataSource is to map names to the columns of your data. This makes it easier to reference elements of your data when building your visualization.

The ColumnDataSource takes a data parameter which is a **dict**, with **string column names as keys** and **lists (or arrays) of data values as values**. Once you initialize a ColumnDataSource object, you can pass a column’s name as a stand-in for the data values when plotting.

In [7]:
data = {'x_values': [i for i in range(20)],
        'y_values': np.random.normal(10, 10, size = 20).tolist(),
        'size': np.random.normal(10, 5, size = 20).tolist()}

source = ColumnDataSource(data=data)# create ColumnDataSource object

fig = figure(plot_width=300, plot_height=200)
fig.square(x='x_values', y='y_values', size = 'size',source=source,
         color="lightseagreen", alpha= 0.7) # use columns names to pass the data
show(fig)

Aside from quick reference, a ColumnDataSource lets you explicitly **apply transforms to the data to generate useful columns for plotting**. For example, we can create a column of colors to control how the markers in a scatter plot are shaded. Doing so reduces both the amount of code (there is no need to color-map the data by hand) and the amount of data that has to be sent to the browser (only the raw data is sent).

Let's create a toy dataset for three popular courses at CMU, with the number of registered students and the average score for each offering.

In [8]:
# generate data
average_score = np.random.normal(90, 10, size = 20).tolist()+ np.random.normal(60, 3, size = 20).tolist()+\
              np.random.normal(80, 2, size = 20).tolist()

courses = ['Practical DS'] * 20 + ['Intro to ML'] * 20 + ['Deep Learning'] * 20

students = np.random.randint(150, size = 20).tolist()+ np.random.randint(200, size = 20).tolist()+\
           np.random.randint(100, size = 20).tolist()

source = ColumnDataSource(dict(average_score=average_score, courses = courses, students = students))

Now let's create a scatter plot for this dataset. Here, I want to use different colors and marker shapes for different courses. Note how the column names of the ColumnDataSource let us generate the markers and the color map directly in lines 7 and 8 of the cell.

In [9]:
COURSE = ['Practical DS', 'Intro to ML', 'Deep Learning'] 
MARKERS = ['hex', 'circle_x', 'triangle']

fig = figure(plot_height=300, plot_width=500,toolbar_location=None)

fig.scatter('average_score', 'students', source = source, legend = 'courses', fill_alpha=0.4, size=12,
           marker=factor_mark('courses', MARKERS, COURSE), # use factor_mark() display different markers 
           color=factor_cmap('courses', all_palettes['Dark2'][4], COURSE)) # use factor_cmap() to colormap different categories
fig.xaxis.axis_label = 'Average Score'
fig.yaxis.axis_label = 'Number of students enrolled'

fig.legend.title = 'Courses'
fig.legend.title_text_font_style = "bold"
fig.legend.background_fill_alpha = 0.7
fig.legend.location = 'top_right'

show(fig)

[factor_cmap()](https://bokeh.pydata.org/en/latest/docs/reference/transform.html#bokeh.transform.factor_cmap) and [factor_mark()](https://bokeh.pydata.org/en/latest/docs/reference/transform.html#bokeh.transform.factor_mark) are used for categorical data. To handle continuous data, you can use [linear_cmap()](https://bokeh.pydata.org/en/latest/docs/reference/transform.html#bokeh.transform.linear_cmap) or [log_cmap()](https://bokeh.pydata.org/en/latest/docs/reference/transform.html#bokeh.transform.log_cmap) to perform color mapping.
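Conceptually, a categorical mapper pairs each factor with the palette entry at the same position. The sketch below is a plain-Python stand-in for that idea, not Bokeh's implementation: `map_factors` is a hypothetical helper, the hex colors are arbitrary placeholders, and the gray fallback mirrors the `nan_color` default of `factor_cmap()`. It assumes the palette has at least as many colors as there are factors.

```python
def map_factors(values, palette, factors, nan_color="gray"):
    """Return one color per value: factors[i] is drawn with palette[i]."""
    # Build the factor -> color lookup once.
    lookup = dict(zip(factors, palette))
    # Values that are not listed in `factors` fall back to nan_color.
    return [lookup.get(value, nan_color) for value in values]


COURSE = ['Practical DS', 'Intro to ML', 'Deep Learning']
PALETTE = ['#1b9e77', '#d95f02', '#7570b3']  # placeholder colors (assumption)

colors = map_factors(['Intro to ML', 'Practical DS', 'Statistics'], PALETTE, COURSE)
print(colors)
```

The course that is missing from `COURSE` receives the fallback color, which is a quick way to spot categories you forgot to list.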

For more information about ColumnDataSource, please refer to the user guide [Providing Data for Plots and Tables](https://bokeh.pydata.org/en/latest/docs/user_guide/data.html).

## 4. Adding Interactions

#### It's time to create interactions!

Aside from the toolbar mentioned above, Bokeh provides two additional ways to create interactive visualizations:
- Add widgets that allow readers to input values
- Link multiple plots through shared, filtered data

### 4.1 Interactive Plots with Widgets

Widgets are interactive controls that can be added to provide a front end user interface to a visualization. They can drive new computations, update plots, and connect to other programmatic functionality. 

To use widgets, you must add them to your document and define their functionality. Widgets can be added directly to the document root or nested inside a layout. Bokeh provides two ways to program a widget’s functionality:
- Use the CustomJS callback (see [JavaScript Callbacks](https://docs.bokeh.org/en/latest/docs/user_guide/interaction/callbacks.html#userguide-interaction-jscallbacks)). This will work in standalone HTML documents.
- Use bokeh serve to start the Bokeh server and set up event handlers with .on_change.

*Using the CustomJS callback requires JavaScript programming skills. Therefore, we illustrate the interaction functionality with the second method here.*

For a widget to react when one of its attributes changes, you must define `event handler` functions and attach them to the widget. These event handlers are passed as arguments to the widget's **.on_change** method.

Below is a simple example to show how to add widgets and define event handlers.

In [10]:
# helper function to generate data
def get_data(N=200, choice='square'):
    x = np.linspace(1,N,N)
    if choice == 'square': y = x ** 2
    elif choice == 'exp': y = np.exp(x)
    elif choice == 'log': y = np.log(x)
    elif choice == 'random': y = np.random.random(N)
    
    return dict(x = x, y = y)

# formula choices
CHOICE = ["square", "exp", "log", 'random']

# main function to create interactive plots
def modify_doc(doc):
    source = ColumnDataSource(data=get_data(200))

    p = figure(plot_height=200, plot_width=400, tools="", toolbar_location=None)
    r = p.line(x='x', y='y', source=source,color="darkgreen", alpha=0.6, line_width = 3)

    select = Select(title="Formula", value="square", options=CHOICE)# create a select widget
    text_input = TextInput(title="Maximum of X", value="200")# create a textinput widget
    
    # shared event handler: rebuild the data from the current values of both widgets
    def update(attrname, old, new):
        N = int(text_input.value)
        source.data = get_data(N, choice = select.value)
    select.on_change('value', update)# pass event handler to on_change method
    text_input.on_change('value', update)

    layout = column(row(select, text_input, width=400), row(p))

    doc.add_root(layout)

show(modify_doc)

In the above plot, you can visualize different formulas over different ranges of values simply by selecting a formula or typing in a new maximum.

Please note that every .on_change call takes an attribute name and one or more event handlers as parameters. These handlers are expected to have the function signature **(attr, old, new)**, where attr refers to the changed attribute’s name, and old and new refer to the previous and updated values of the attribute. .on_change must be used when you need the previous value of an attribute.
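The calling convention can be tried out without a running Bokeh server. The sketch below uses `FakeWidget`, a hypothetical stand-in class written only for this illustration (it is not part of Bokeh), to show how a handler registered for an attribute receives the attribute name together with the old and new values.

```python
class FakeWidget:
    """Hypothetical stand-in for a Bokeh widget, for illustration only."""

    def __init__(self, value):
        self._value = value
        self._callbacks = {}  # attribute name -> list of handlers

    def on_change(self, attr, *callbacks):
        # Register one or more handlers for a single attribute.
        self._callbacks.setdefault(attr, []).extend(callbacks)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        old, self._value = self._value, new
        # Every handler receives the attribute name, the old value and the new value.
        for callback in self._callbacks.get("value", []):
            callback("value", old, new)


log = []

def handler(attr, old, new):
    log.append((attr, old, new))

widget = FakeWidget(value="200")
widget.on_change("value", handler)
widget.value = "300"  # simulates the reader typing a new value
print(log)
```

Because the handler receives `old` as well as `new`, it can, for instance, restore the previous value when the new input turns out to be invalid.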

For more available widgets, please refer to the user guide [Adding Widgets](https://bokeh.pydata.org/en/latest/docs/user_guide/interaction/widgets.html).

### 4.2 Linking Behavior

As mentioned before, ColumnDataSource allows **shared data selection** between plots automatically, which is the key to linked interaction in Bokeh. To let Bokeh know that a selection applies to several plots, you simply pass the same data source to the different plots.

The following code shows an example of linkage between circle plots created by two different figure() calls. To see the connection, please use your mouse to **select some data points** in either plot and you will see the corresponding data points highlighted in the other plot.

In [11]:
x = list(range(-20, 21))
y0 = [i**2 for i in x]
y1 = [i**3 for i in x]

# create a column data source for the plots to share
source = ColumnDataSource(data=dict(x=x, y0=y0, y1=y1))

TOOLS = "box_select,lasso_select,reset"# define toolbars
TOOLTIPS = [("index", "$index"),
            ('square', '@y0'),
            ('cube', '@y1')]# define hovertool

# create a new plot
left = figure(tools=TOOLS, tooltips = TOOLTIPS, plot_width=300, plot_height=200, title=None)
left.circle('x', 'y0', source=source, color = 'indianred')

# create another new plot
right = figure(tools=TOOLS, tooltips = TOOLTIPS, plot_width=300, plot_height=200, title=None)
right.circle('x', 'y1', source=source, color = 'mediumaquamarine')

p = gridplot([[left, right]])

show(p)

In the above plot, we also add a useful tool called `HoverTool` to help users inspect data. It is a passive tool that annotates or reports information about the data point under the cursor. It is generally on at all times, but can be configured when plotting. To specify which values the hover tool displays, we use field names.

Field names that begin with `$` are “special fields”. These often correspond to values that are intrinsic to the plot, such as the coordinates of the mouse in data or screen space. Some of these special fields are listed here:
    
- `$index`: index of the hovered point in the data source
- `$x`: x-coordinate under the cursor in data space
- `$y`: y-coordinate under the cursor in data space

Field names that begin with `@` are associated with columns in a ColumnDataSource.
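To make the two kinds of field names concrete, the sketch below fills in the tooltip templates from the example above for one data point. `render_tooltip` is a hypothetical helper written for this illustration; it only handles `$index` and plain `@column` references, whereas Bokeh itself resolves the fields in the browser and supports more special fields and formatting options.

```python
import re

def render_tooltip(template, columns, index):
    """Fill a tooltip template for the data point at `index` (hypothetical helper)."""
    def replace(match):
        prefix, name = match.group(1), match.group(2)
        if prefix == "$" and name == "index":
            return str(index)                 # special field: position in the data source
        if prefix == "@" and name in columns:
            return str(columns[name][index])  # column field: value stored in that column
        return match.group(0)                 # leave anything unknown untouched
    return re.sub(r"([$@])(\w+)", replace, template)


x = list(range(-20, 21))
columns = {"x": x, "y0": [i**2 for i in x], "y1": [i**3 for i in x]}

TOOLTIPS = [("index", "$index"), ("square", "@y0"), ("cube", "@y1")]
rendered = [(label, render_tooltip(template, columns, 3)) for label, template in TOOLTIPS]
print(rendered)
```

The point at index 3 has x = -17, so the rendered tooltip shows its square and cube taken from the `y0` and `y1` columns.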

Bokeh also supports some advanced hover tool configurations (for example, image hover). For more details, please see the [user guide on inspectors](https://bokeh.pydata.org/en/latest/docs/user_guide/tools.html#userguide-tools-inspectors).

---
### Congratulations! 
#### You finished the main part of this tutorial. 
#### Now you can create your own pretty plots!
---

To give you an intuitive feel for how interactive plots can present your data, I provide two examples below that use real-world datasets. I hope they deepen your understanding and give you some inspiration.

## Example application: World Happiness Report

Does economic development improve people's happiness? What is the relationship between freedom and life expectancy? Do these relationships differ across continents?

To answer these questions, let's use the data from the **World Happiness Report** to visualize these relationships. For demonstration purposes, we only use the 2017 World Happiness Report here. To assign countries to continents, we also load a country_to_continent dataset.

The World Happiness Report 2017, which ranks 155 countries by their happiness levels, was released at the United Nations at an event celebrating International Day of Happiness on March 20th. The happiness scores and rankings use data from the Gallup World Poll. 

> *The World Happiness Report dataset and a detailed description can be downloaded from Kaggle: https://www.kaggle.com/unsdsn/world-happiness <br>*
> *The country_to_continent dataset can be downloaded from Kaggle: https://www.kaggle.com/statchaitya/country-to-continent*



Before visualization, let's load and transform the data first.

In [12]:
# load data
country_continent = pd.read_csv('countryContinent.csv', encoding  = 'iso-8859-1')[['country',  'continent']]
happy_17 = pd.read_csv('world_happiness_2017.csv')
# merge data
happy = happy_17.merge(country_continent, left_on = 'Country', right_on = 'country', how = 'left')
# drop nan
happy.drop(happy[happy['continent'].isnull()].index, inplace = True)
# rename
happy = happy.rename(columns = {'Happiness.Score': 'happiness_score', 
                        'Economy..GDP.per.Capita.': 'GDP_per_capita',
                       'Health..Life.Expectancy.': 'life_expectancy'})
# select useful columns
happy = happy[['Country','happiness_score','GDP_per_capita','life_expectancy','Freedom','continent']]

In [13]:
happy.head()

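|   | Country | happiness_score | GDP_per_capita | life_expectancy | Freedom | continent |
| :-- | :------ | :-------------- | :------------- | :-------------- | :------ | :-------- |
| 0 | Norway | 7.537 | 1.616463 | 0.796667 | 0.635423 | Europe |
| 1 | Denmark | 7.522 | 1.482383 | 0.792566 | 0.626007 | Europe |
| 2 | Iceland | 7.504 | 1.480633 | 0.833552 | 0.627163 | Europe |
| 3 | Switzerland | 7.494 | 1.56498 | 0.858131 | 0.620071 | Europe |
| 4 | Finland | 7.469 | 1.443572 | 0.809158 | 0.617951 | Europe |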


Here, our basic needs and corresponding visualization design are:

| Needs      | Design    |
| :--------  | :------- |
| Visualizing relationships between:<br> - happiness score and GDP<br>- life expectancy and freedom | Creating two scatter plots |
|Presenting the connection between the above two relationships |Linking the two scatter plots|
|Labelling different continents| Using different colors and markers for different continents|
|Providing details for a specific data point| Adding a hover tool to annotate data|

Moreover, we want to provide some interaction choices. Given the visualization design above, we can offer the following interaction options:
- Select multiple continents
- Define range of happiness score
- Define range of life expectancy

Now that we have our visualization design, let's start creating the plot.

In [14]:
# helper function: create corresponding dataset for plot given interaction choices
def make_dataset(select, range_start = 0, range_end = 10, life_start = 0, life_end = 1):
    # keep only the selected continents; isin() replaces the loop of DataFrame.append calls,
    # which is slow and no longer available in recent pandas versions
    new_data = happy.loc[happy['continent'].isin(select)]
    new_data = new_data.loc[(new_data['happiness_score'] > range_start) & 
                            (new_data['happiness_score'] < range_end)]
    new_data = new_data.loc[(new_data['life_expectancy'] > life_start) & 
                            (new_data['life_expectancy'] < life_end)]
    cds = ColumnDataSource(new_data)
    return cds
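To see what `make_dataset` keeps for a given widget state, the same three filters can be traced by hand on the five rows printed by `happy.head()` above. The sketch below is a plain-Python mirror of that logic; `filter_rows` is a hypothetical helper used only for this illustration, and it copies the strict inequalities of `make_dataset`.

```python
# The five rows shown by happy.head(), reduced to the columns used for filtering.
rows = [
    {"Country": "Norway", "happiness_score": 7.537, "life_expectancy": 0.796667, "continent": "Europe"},
    {"Country": "Denmark", "happiness_score": 7.522, "life_expectancy": 0.792566, "continent": "Europe"},
    {"Country": "Iceland", "happiness_score": 7.504, "life_expectancy": 0.833552, "continent": "Europe"},
    {"Country": "Switzerland", "happiness_score": 7.494, "life_expectancy": 0.858131, "continent": "Europe"},
    {"Country": "Finland", "happiness_score": 7.469, "life_expectancy": 0.809158, "continent": "Europe"},
]

def filter_rows(rows, select, range_start=0, range_end=10, life_start=0, life_end=1):
    """Plain-Python mirror of the three filters in make_dataset (hypothetical helper)."""
    return [
        row for row in rows
        if row["continent"] in select
        and range_start < row["happiness_score"] < range_end
        and life_start < row["life_expectancy"] < life_end
    ]

kept = filter_rows(rows, ["Europe"], range_start=7.5, range_end=8, life_start=0.8)
print([row["Country"] for row in kept])
```

Only Iceland passes both range filters here: Norway and Denmark fall below the life expectancy bound, while Switzerland and Finland fall below the happiness bound.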

In [16]:
def modify_doc(doc):
    source = ColumnDataSource(happy)
    
    # variables needed in plot configuration
    CONTINENT = happy['continent'].unique().tolist()
    MARKERS = ['hex', 'circle_x', 'triangle', 'square', 'asterisk']
    TOOLS = "box_select,lasso_select,reset" # set toolbar
    TOOLTIPS_left = [("index", "$index"),
                ('Country', '@Country'),
                ('Happiness score',"@happiness_score"),
                ('GDP per capita', '@GDP_per_capita')]# set hovertool for the left plot
    TOOLTIPS_right = [("index", "$index"),
                ('Country', '@Country'),
                ('Life Expectancy',"@life_expectancy"),
                ('Freedom', '@Freedom')]# set hovertool for the right plot

    left = figure(plot_width=400, plot_height=350, title = 'Happiness Score vs. GDP per Capita',
                  tools = TOOLS, tooltips = TOOLTIPS_left)
    left.scatter('happiness_score', 'GDP_per_capita',legend = 'continent',
                  size=8, hover_color="firebrick", source=source, 
                  marker=factor_mark('continent', MARKERS, CONTINENT),
                  color=factor_cmap('continent', 'Category20_5', CONTINENT))
    left.xaxis[0].axis_label = 'Happiness Score'
    left.yaxis[0].axis_label = 'GDP per Capita'
    left.legend.location = "bottom_right"
    left.legend.title = 'Continent'
    left.legend.title_text_font_style = "bold"
    left.legend.background_fill_alpha = 0.5
    

    right = figure(plot_width=400, plot_height=350, title = 'Life Expectancy vs. Freedom',
                   tools = TOOLS, tooltips = TOOLTIPS_right)
    right.scatter('life_expectancy', 'Freedom', legend = 'continent',
                   size=8, hover_color="firebrick", source=source, 
                   marker=factor_mark('continent', MARKERS, CONTINENT),
                   color=factor_cmap('continent', 'Category20_5', CONTINENT))
    right.xaxis[0].axis_label = 'Life Expectancy'
    right.yaxis[0].axis_label = 'Freedom'
    right.legend.location = "bottom_right"
    right.legend.title = 'Continent'
    right.legend.title_text_font_style = "bold"
    right.legend.background_fill_alpha = 0.5
    
    # create widgets: 1 checkbox, 2 sliders (1 for happiness score, 1 for life expectancy)
    checkbox = CheckboxGroup(labels=CONTINENT, active=list(range(len(CONTINENT))))
    
    range_slider_1 = RangeSlider(start=2, end=8, value=(2,8), step=.1, title="Happiness Score")
    
    range_slider_2 = RangeSlider(start=0, end=1, value=(0,1), step=.01, title="Life Expectancy")
    
    # define one shared event handler so that every widget respects the state of the others
    def update(attr, old, new):
        checked = [CONTINENT[i] for i in checkbox.active]
        score_start, score_end = range_slider_1.value
        life_start, life_end = range_slider_2.value
        new_src = make_dataset(checked, range_start = score_start, range_end = score_end,
                               life_start = life_start, life_end = life_end)
        source.data = dict(new_src.data) # replace all columns at once
    checkbox.on_change('active', update)
    range_slider_1.on_change('value', update)
    range_slider_2.on_change('value', update)
    
    layout = column(row(checkbox,column(range_slider_1,range_slider_2)), row(left, right))
    
    doc.add_root(layout)

show(modify_doc)

Now we have our first real-world dataset interactive visualization! 

The above plot offers the following interactions: 
- Select continents to plot
- Select the happiness score and life expectancy ranges
- Hover over a specific data point to check its values
- Select multiple data points in one plot to see the corresponding points in the other plot (you can use the toolbar to choose between selection methods: box select or lasso select)

In the next example, we will include more complex data transformations and interactive visualization methods.

## Example application: Airbnb NYC data

In this example, we will use the Airbnb NYC dataset. This dataset describes listing activity and metrics in NYC, NY for 2019. It contains 48895 observations with 16 columns.

> You can download this dataset from Kaggle: https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data

In [17]:
AB_nyc = pd.read_csv('AB_NYC_2019.csv')

In [18]:
AB_nyc.head()

|   | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
| :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- |
| 0 | 2539 | Clean & quiet apt home by the park | 2787 | John | Brooklyn | Kensington | 40.64749 | -73.97237 | Private room | 149 | 1 | 9 | 2018-10-19 | 0.21 | 6 | 365 |
| 1 | 2595 | Skylit Midtown Castle | 2845 | Jennifer | Manhattan | Midtown | 40.75362 | -73.98377 | Entire home/apt | 225 | 1 | 45 | 2019-05-21 | 0.38 | 2 | 355 |
| 2 | 3647 | THE VILLAGE OF HARLEM....NEW YORK ! | 4632 | Elisabeth | Manhattan | Harlem | 40.80902 | -73.9419 | Private room | 150 | 3 | 0 |  |  | 1 | 365 |
| 3 | 3831 | Cozy Entire Floor of Brownstone | 4869 | LisaRoxanne | Brooklyn | Clinton Hill | 40.68514 | -73.95976 | Entire home/apt | 89 | 1 | 270 | 2019-07-05 | 4.64 | 1 | 194 |
| 4 | 5022 | Entire Apt: Spacious Studio/Loft by central park | 7192 | Laura | Manhattan | East Harlem | 40.79851 | -73.94399 | Entire home/apt | 80 | 10 | 9 | 2018-11-19 | 0.1 | 1 | 0 |


Imagine you are planning a trip to New York City. Which factors would you consider when choosing a room to book on Airbnb?

**Price** will definitely be a key factor in your final decision. **Location** and **room type** are likely to be two other main factors. However, being unfamiliar with New York City, you have no idea in which neighborhood, or with what type of room, you are most likely to find a place within your price range.

Considering other travelers with similar needs, let's use the dataset to build an interactive plot that satisfies these users' needs.

In [19]:
# helper function to generate corresponding dataset
# flag refers to which column (neighbourhood/room type) we want to use to categorize data
def make_dataset(flag, input_list, range_start = 0, range_end = 1500, bin_width = 50):
    # map the flag to the DataFrame column used for grouping
    column_name = {'neighbourhood': 'neighbourhood_group', 'room_type': 'room_type'}[flag]
    n_bins = int((range_end - range_start) / bin_width)
    frames = []
  
    for i in input_list:
        subset = AB_nyc.loc[AB_nyc[column_name] == i]
        # Bokeh cannot plot a histogram directly from raw data, so we use
        # np.histogram to compute the histogram before plotting
        hist, edges = np.histogram(subset['price'], bins = n_bins, 
                                   range = [range_start, range_end])
        group_df = pd.DataFrame({'proportion': hist / np.sum(hist), 
                                 'left': edges[:-1], 'right': edges[1:] })
        group_df[column_name] = i
        frames.append(group_df)
            
    # combine all groups once with pd.concat instead of repeated DataFrame.append calls
    columns = ['proportion', 'left', 'right', column_name]
    new_data = pd.concat(frames, ignore_index = True) if frames else pd.DataFrame(columns = columns)
    return ColumnDataSource(new_data)
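The binning step inside `make_dataset` can be checked by hand on a tiny list of prices. The sketch below is a dependency-free stand-in for the `np.histogram` call, written only to show how the `proportion`, `left` and `right` columns arise; `price_histogram` is a hypothetical helper, and the three wide bins are chosen purely to keep the output short.

```python
def price_histogram(prices, range_start=0, range_end=1500, bin_width=50):
    """Return (proportions, left_edges, right_edges) for equal-width bins."""
    n_bins = int((range_end - range_start) / bin_width)
    width = (range_end - range_start) / n_bins
    counts = [0] * n_bins
    for price in prices:
        if range_start <= price <= range_end:
            # The last bin also includes its right edge, as in np.histogram.
            position = min(int((price - range_start) / width), n_bins - 1)
            counts[position] += 1
    total = sum(counts)  # assumes at least one price falls inside the range
    proportions = [count / total for count in counts]
    left = [range_start + i * width for i in range(n_bins)]
    right = [edge + width for edge in left]
    return proportions, left, right


prices = [149, 225, 150, 89, 80]  # the price column of AB_nyc.head()
proportions, left, right = price_histogram(prices, range_start=0, range_end=300, bin_width=100)
print(proportions)
print(left, right)
```

Each bin becomes one rectangle of the `quad` glyph: `left` and `right` give its horizontal extent and `proportion` its height.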

This time we will add a new feature: `Tabs`. Instead of presenting two plots side by side, we want to display only the plot that the user wants to see. Therefore, we create two tabs, one per category choice, to present our results.

To add tabs, group your plots and widgets into `Panel` objects and then pass all the panels to `Tabs`. Please see the last few lines of the following cell to see how it works.

In [20]:
def modify_doc(doc):
    # Category labels, reused for the data sources, color maps and checkboxes
    NEIGHBOUR = AB_nyc['neighbourhood_group'].unique().tolist()
    ROOM = AB_nyc['room_type'].unique().tolist()
    source_1 = make_dataset('neighbourhood', NEIGHBOUR)
    source_2 = make_dataset('room_type', ROOM)
    
    fig_1 = figure(plot_width = 500, plot_height = 400)
    fig_1.quad(source = source_1, bottom = 0, top = 'proportion', left = 'left', right = 'right',
            legend = 'neighbourhood_group', fill_alpha = 0.66, line_color = 'black',
            color=factor_cmap('neighbourhood_group', 'Category10_5', NEIGHBOUR))
    fig_2 = figure(plot_width = 500, plot_height = 400)
    fig_2.quad(source = source_2, bottom = 0, top = 'proportion', left = 'left', right = 'right',
            legend = 'room_type', fill_alpha = 0.75, line_color = 'black',
            color=factor_cmap('room_type', all_palettes['RdYlBu'][3], ROOM))
    
    checkbox_1 = CheckboxGroup(labels=NEIGHBOUR, active=[0,1,2,3,4])
    checkbox_2 = CheckboxGroup(labels=ROOM, active=[0,1,2])
    
    range_slider_1 = RangeSlider(start=0, end=1500, value=(0,1500), step=100, title="Price")
    range_slider_2 = RangeSlider(start=0, end=1500, value=(0,1500), step=100, title="Price")
    
    binwidth_slider_1 = Slider(start=20, end=100, value=50, step=10, title="Price Width")
    binwidth_slider_2 = Slider(start=20, end=100, value=50, step=10, title="Price Width")
    
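    def update_check_1(attr, old, new):
        checked_1 = [NEIGHBOUR[i] for i in checkbox_1.active]
        # Pass the current slider values too, so that toggling a checkbox
        # does not reset the price range and bin width to their defaults
        new_src = make_dataset('neighbourhood', checked_1,
                               range_start = range_slider_1.value[0],
                               range_end = range_slider_1.value[1],
                               bin_width = binwidth_slider_1.value)
        source_1.data.update(new_src.data)
    checkbox_1.on_change('active', update_check_1)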
    
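    def update_check_2(attr, old, new):
        checked_2 = [ROOM[i] for i in checkbox_2.active]
        # Pass the current slider values too, so that toggling a checkbox
        # does not reset the price range and bin width to their defaults
        new_src = make_dataset('room_type', checked_2,
                               range_start = range_slider_2.value[0],
                               range_end = range_slider_2.value[1],
                               bin_width = binwidth_slider_2.value)
        source_2.data.update(new_src.data)
    checkbox_2.on_change('active', update_check_2)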
    
    def update_slider_1(attr, old, new):
        range_start = range_slider_1.value[0]
        range_end = range_slider_1.value[1]
        checked = [NEIGHBOUR[i] for i in checkbox_1.active]
        # Keep the bin width chosen on the other slider instead of the default
        new_src = make_dataset('neighbourhood', checked, range_start = range_start,
                               range_end = range_end, bin_width = binwidth_slider_1.value)
        source_1.data.update(new_src.data)
    range_slider_1.on_change('value', update_slider_1)
    
    def update_slider_2(attr, old, new):
        range_start = range_slider_2.value[0]
        range_end = range_slider_2.value[1]
        checked = [ROOM[i] for i in checkbox_2.active]
        # Keep the bin width chosen on the other slider instead of the default
        new_src = make_dataset('room_type', checked, range_start = range_start,
                               range_end = range_end, bin_width = binwidth_slider_2.value)
        source_2.data.update(new_src.data)
    range_slider_2.on_change('value', update_slider_2)
    
    def update_width_slider_1(attr, old, new):
        bin_w = binwidth_slider_1.value
        range_start = range_slider_1.value[0]
        range_end = range_slider_1.value[1]
        checked = [NEIGHBOUR[i] for i in checkbox_1.active]
        new_src = make_dataset('neighbourhood', checked, range_start = range_start,
                               range_end = range_end, bin_width = bin_w)
        source_1.data.update(new_src.data)
    binwidth_slider_1.on_change('value', update_width_slider_1)
    
    def update_width_slider_2(attr, old, new):
        bin_w = binwidth_slider_2.value
        range_start = range_slider_2.value[0]
        range_end = range_slider_2.value[1]
        checked = [ROOM[i] for i in checkbox_2.active]
        new_src = make_dataset('room_type', checked, range_start = range_start,
                               range_end = range_end, bin_width = bin_w)
        source_2.data.update(new_src.data)

    binwidth_slider_2.on_change('value', update_width_slider_2)

    # Each Panel holds the layout of one tab: widgets stacked in a column on the left, plot on the right
    tab1 = Panel(child = row(column(checkbox_1, range_slider_1, binwidth_slider_1), fig_1), title = 'Neighbourhood')
    tab2 = Panel(child = row(column(checkbox_2, range_slider_2, binwidth_slider_2), fig_2), title = 'Room Type')
    
    layout = Tabs(tabs = [tab1, tab2])
    
    doc.add_root(layout)

show(modify_doc)

The plot above offers two ways of plotting the price histogram: by neighbourhood group or by room type. It also lets the reader select the price range and the bin width. With this interactive visualization, people can easily gain a deeper understanding of the price distribution in New York City for different neighbourhoods and room types.
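
The tab mechanism itself can be isolated from the widgets and callbacks. The following is a minimal sketch with two placeholder plots; the plot contents and titles are made up for illustration. One assumption to note: in Bokeh 3.0 and later `Panel` was renamed `TabPanel`, so the sketch tries the new name first and falls back to the old one used in this tutorial.

```python
from bokeh.plotting import figure
from bokeh.models import Tabs

# `Panel` was renamed `TabPanel` in Bokeh 3.0; fall back to the old name
# so the sketch matches whichever version is installed
try:
    from bokeh.models import TabPanel as Panel
except ImportError:
    from bokeh.models import Panel

# Two placeholder plots with made-up points
fig_a = figure(title="Plot A")
fig_a.line([1, 2, 3], [4, 6, 5])
fig_b = figure(title="Plot B")
fig_b.scatter([1, 2, 3], [2, 1, 3])

# Each panel wraps one layout and supplies the label shown on its tab
tab_a = Panel(child=fig_a, title="A")
tab_b = Panel(child=fig_b, title="B")

# Tabs collects the panels; only the selected panel is displayed
tabs = Tabs(tabs=[tab_a, tab_b])
```

Calling `show(tabs)` after `output_notebook()` should display the tabbed layout inline. Since no Python callbacks are attached here, this static version does not need the `modify_doc` wrapper.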

## Summary and Reference

This tutorial highlighted just a few elements of what is possible with interactive data visualization in Jupyter Notebook using Bokeh. Bokeh also works very well for creating web-based dashboards and applications. Much more detail about the library, along with further examples, is available from the following links.

1. Bokeh user guide: https://docs.bokeh.org/en/latest/docs/user_guide.html
2. Gallery: https://docs.bokeh.org/en/latest/docs/gallery.html
3. The World Happiness Report dataset: https://www.kaggle.com/unsdsn/world-happiness
4. The country_to_continent dataset: https://www.kaggle.com/statchaitya/country-to-continent
5. Airbnb NYC open data: https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data
6. Interactive Data Visualization in Python With Bokeh: https://realpython.com/python-data-visualization-bokeh/
7. Data Visualization with Bokeh in Python, Part II: Interactions: https://towardsdatascience.com/data-visualization-with-bokeh-in-python-part-ii-interactions-a4cf994e2512