# Learning Module - Bokeh

### Here we will show you how to use some of the basic functionality of Bokeh, a Python visualization library.
Keep in mind that there are several other libraries, each with its own pros and cons, and, as of now, none of them can be called the best overall.

Bokeh, Plotly and Matplotlib are the best known in data visualization, but there are more.

Some of these can create rich, interactive visualizations that can be served directly to the end user in a web page (Bokeh is one of them).

Bokeh is open source, which means that you can see how it is built, and you can also help build it!

## Data!

Let's use some Covid data taken from Johns Hopkins University.

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv("covid19data_cleaned.csv")

In [3]:
# Total per country; numeric_only=True skips the text columns such as the date
cumusum = df.groupby("Country").sum(numeric_only=True)

In [4]:
from bokeh.io import output_file, show
from bokeh.palettes import Category20c,Turbo256
from bokeh.plotting import figure
from bokeh.transform import cumsum
from bokeh.io import output_notebook

from math import pi



Run this so that the plots are shown inline, right here in the notebook.

In [5]:
output_notebook()

## Y'all like pies?
#### Everybody likes pies. That was rhetorical.
*I don't remember the last time I had one, though*

In [6]:
n = 15

In [7]:
n_largest = cumusum["confirmed_cases"].nlargest(n)

We need data for the pie. 

Let us set up the recipe first: one row per country, with its value, the angle of its wedge and its color.

In [8]:
data = pd.Series(n_largest).reset_index(name='value').rename(columns={'index':'Country'})
data['angle'] = data['value']/data['value'].sum() * 2*pi
data['color'] = Category20c[n]

In [9]:
data

(index),Country,value,angle,color
0,US,11202979.0,1.723609,#3182bd
1,India,8873541.0,1.365219,#6baed6
2,Brazil,5876464.0,0.90411,#9ecae1
3,France,2041293.0,0.314059,#c6dbef
4,Russia,1932711.0,0.297353,#e6550d
5,Spain,1496864.0,0.230297,#fd8d3c
6,United Kingdom,1394299.0,0.214517,#fdae6b
7,Argentina,1318384.0,0.202837,#fdd0a2
8,Italy,1205881.0,0.185528,#31a354
9,Colombia,1205217.0,0.185426,#74c476
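
To make the recipe concrete, here is a minimal sketch with made-up numbers (the `toy` series and its country names are hypothetical, not taken from the dataset). Each value becomes a share of the total, and the share is scaled to a full circle of 2π radians, so the angles should add up to 2π.

```python
from math import isclose, pi

import pandas as pd

# Hypothetical totals, only for illustration.
toy = pd.Series({"A": 50.0, "B": 30.0, "C": 20.0}, name="confirmed_cases")
toy.index.name = "Country"

# Same steps as the recipe cell: one row per country, then the wedge angle.
pie = toy.reset_index(name="value")
pie["angle"] = pie["value"] / pie["value"].sum() * 2 * pi

print(pie)
# Sanity check: together the wedges cover the whole circle.
print(isclose(pie["angle"].sum(), 2 * pi))
```

Keep in mind that `Category20c[n]` is a lookup by palette size, so it only works for the sizes that palette provides (3 to 20).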


### Pie Chart!

#### Not an actual pie, I am sorry.

In [10]:
p = figure(height=350, title="Top {} Confirmed cases".format(n), toolbar_location=None,
           tools="hover", tooltips="@Country: @value", x_range=(-0.5, 1.0))

p.wedge(x=0, y=1, radius=0.4,
        start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'),
        line_color="white", fill_color='color', legend_field='Country', source=data)

p.axis.axis_label=None
p.axis.visible=False
p.grid.grid_line_color = None
show(p)
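
What do `cumsum('angle', include_zero=True)` and `cumsum('angle')` evaluate to? Each wedge starts where the previous one ended. The sketch below redoes that arithmetic with pandas on hypothetical angles, just to show the numbers; it is an illustration, not how Bokeh implements the transform.

```python
from math import pi

import pandas as pd

# Hypothetical wedge sizes: half, a quarter and a quarter of the circle.
angles = pd.Series([pi, pi / 2, pi / 2], name="angle")

# end_angle=cumsum('angle'): running total including the current row.
end_angle = angles.cumsum()
# start_angle=cumsum('angle', include_zero=True): the same running total,
# shifted by one row so that the first wedge starts at 0.
start_angle = end_angle.shift(1, fill_value=0.0)

wedges = pd.DataFrame({"start": start_angle, "end": end_angle})
print(wedges)
```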

In [11]:
n_largest = cumusum["deaths"].nlargest(n)
data = pd.Series(n_largest).reset_index(name='value').rename(columns={'index':'Country'})
data['angle'] = data['value']/data['value'].sum() * 2*pi
data['color'] = Category20c[n]

In [12]:
p = figure(height=350, title="Top {} Deaths".format(n), toolbar_location=None,
           tools="hover", tooltips="@Country: @value", x_range=(-0.5, 1.0))

p.wedge(x=0, y=1, radius=0.4,
        start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'),
        line_color="white", fill_color='color', legend_field='Country', source=data)

p.axis.axis_label=None
p.axis.visible=False
p.grid.grid_line_color = None

show(p)

## Now that we have the basics of pie charts and data sources out of the way, let us just do a regular line.

### First, let's pick a country to see how its cases evolve over time!

In [13]:
country = "Portugal"

In [14]:
mask = df["Country"] ==country

In [15]:
selected_country = df[mask]

In [16]:
fig = figure(width= 600,height=500,tools="hover,pan,wheel_zoom,box_zoom,reset",title="{} cases evolution".format(country),x_axis_type="datetime",tooltips="value: @y")

In [17]:
x = pd.to_datetime(selected_country["date"])
y = selected_country["confirmed_cases"].astype(int)

In [18]:
fig.line(x = x,y = y,legend_label="Confirmed by Day")

In [19]:
show(fig)

## Feels a bit empty, no?

#### Let's put the deaths next to the cases.

In [20]:
y = selected_country["deaths"].astype(int)

In [21]:
fig.line(x = x,y = y,legend_label="Death by Day",color="red")

In [22]:
show(fig)

### Yet again, we still have space for more!
#### Now let us add the cumulative data for both cases and deaths.

These lines all share the same x values, and Bokeh lets us simplify the way we handle that.
This time we also want to be able to zoom, but only along the x axis.
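
One way to read those two points, sketched under assumptions: put the shared dates and every series in a single `ColumnDataSource`, let each line refer to its columns by name, and use the `xwheel_zoom` tool so that scrolling only zooms along the x axis. The toy numbers and the column names (`date`, `cases`, `deaths`) are made up for this example.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical daily numbers, only for illustration.
dates = pd.date_range("2020-03-01", periods=5, freq="D")
source = ColumnDataSource(data={
    "date": dates,
    "cases": [2, 4, 9, 13, 21],
    "deaths": [0, 0, 1, 1, 2],
})

# xwheel_zoom restricts wheel zooming to the x axis.
fig = figure(width=600, height=300, x_axis_type="datetime",
             tools="pan,xwheel_zoom,reset", title="Shared x column")

# Both lines read "date" from the same source, so x is defined only once.
fig.line(x="date", y="cases", source=source, legend_label="Cases", color="orange")
fig.line(x="date", y="deaths", source=source, legend_label="Deaths", color="red")
```

Calling `show(fig)` afterwards renders it like the other plots.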

In [23]:
cumsum_selected = selected_country[["confirmed_cases","deaths"]].cumsum()

In [24]:
y = cumsum_selected["deaths"]

fig.line(x = x,y = y,legend_label="Cumulative Death",color="yellow")

y = cumsum_selected["confirmed_cases"]
fig.line(x = x,y = y,legend_label="Cumulative Cases",color="orange")

show(fig)

### Ok. Now it is too hard to make sense of this.

Let's keep things neat and tidy, so that we don't overwhelm our end users with a lot of information.

### As an alternative, let's plot both the deaths and the confirmed cases, but as the % change from the day before.

In [25]:
country = "Portugal"
mask = df["Country"] == country
# pct_change returns a fraction of the previous day's value (0.5 means +50%)
selected_country = df[mask][["confirmed_cases", "deaths"]].pct_change()
fig = figure(width=600, height=500, tools="hover,pan,wheel_zoom,reset", title="{} cases evolution".format(country), x_axis_type="datetime", tooltips="value: @y")
# Use the dates as x so that they match the datetime axis
x = pd.to_datetime(df[mask]["date"])
y = selected_country["confirmed_cases"]
fig.line(x=x, y=y, legend_label="Cases % difference per day", color="orange")
y = selected_country["deaths"]

# A distinct label keeps the two lines apart in the legend
fig.line(x=x, y=y, legend_label="Deaths % difference per day", color="red")

show(fig)
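
A detail worth knowing about `pct_change`: when the previous day is 0 the result is not a number you can plot. The toy series below (hypothetical counts) shows what comes out, and one possible way to drop those points before plotting.

```python
import numpy as np
import pandas as pd

# Hypothetical daily counts that start at zero, like early pandemic data.
daily = pd.Series([0, 0, 10, 15, 30], dtype=float)

change = daily.pct_change()
# 0 -> 0 gives NaN and 0 -> 10 gives inf; neither can be drawn.
clean = change.replace([np.inf, -np.inf], np.nan).dropna()

print(change.tolist())
print(clean.tolist())
```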


### This might be an interesting view of the data. However, let's go back to the absolute values, with less granularity: let us see how Covid was increasing by week (7D).

In [32]:
# Index by date first: resample() further down needs a DatetimeIndex
df.index = pd.to_datetime(df["date"])

In [33]:
country = "Portugal"
mask = df["Country"] == country
selected_country = df[mask]

In [34]:
#selected_country = selected_country.resample("2D") #Attention check.

In [35]:
import numpy as np

In [36]:
selected_country

date (index),date,Country,deaths,confirmed_cases
2020-01-22,2020-01-22,Portugal,0.0,0.0
2020-01-23,2020-01-23,Portugal,0.0,0.0
2020-01-24,2020-01-24,Portugal,0.0,0.0
2020-01-25,2020-01-25,Portugal,0.0,0.0
2020-01-26,2020-01-26,Portugal,0.0,0.0
...,...,...,...,...
2020-11-12,2020-11-12,Portugal,78.0,5839.0
2020-11-13,2020-11-13,Portugal,69.0,6653.0
2020-11-14,2020-11-14,Portugal,55.0,6602.0
2020-11-15,2020-11-15,Portugal,76.0,6035.0


In [37]:
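# Keep only the numeric columns: mean and std are not defined for the text columns
selected_country = selected_country[["deaths", "confirmed_cases"]].resample("7D").agg(["mean", "sum", "std"])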
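
If the shape of that result is not obvious, here is a small sketch on hypothetical data (fourteen days with the values 1 to 14), using string names for the aggregations, which pandas also accepts. Each 7-day bin becomes one row, and the columns turn into (column, statistic) pairs, which is why the next cell selects `["confirmed_cases"]["mean"]`.

```python
import pandas as pd

# Fourteen hypothetical days with the values 1..14.
days = pd.date_range("2020-01-01", periods=14, freq="D")
toy = pd.DataFrame({"confirmed_cases": range(1, 15)}, index=days)

# One row per 7-day bin; the columns become (column, statistic) pairs.
weekly = toy.resample("7D").agg(["mean", "sum", "std"])

print(weekly.columns.tolist())
print(weekly["confirmed_cases"]["mean"].tolist())
print(weekly["confirmed_cases"]["sum"].tolist())
```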

In [38]:
x = selected_country.index
y = selected_country["confirmed_cases"]["mean"]

In [39]:
fig = figure(width= 600,height=500,tools="hover,pan,wheel_zoom,box_zoom,reset",title="{} cases evolution".format(country),x_axis_type="datetime",tooltips="value: @y")
fig.line(x = x,y = y,legend_label="Mean confirmed by week",color="orange")
show(fig)

In [40]:
conf_cases = selected_country["confirmed_cases"]
upper = conf_cases["mean"] + conf_cases["std"]
lower = conf_cases["mean"] - conf_cases["std"]

In [41]:
source = { "x": x, "upper" :upper,"lower":lower}

In [42]:
from bokeh.models import Band, ColumnDataSource

In [43]:
band = Band(base="x", lower="lower", upper="upper", source=ColumnDataSource(data=source), level='underlay',
            fill_alpha=0.2, line_width=1, line_color='orange',fill_color="orange")

fig.add_layout(band)  # a Band is an annotation, not a glyph, so we need to add it explicitly
show(fig)

In [44]:
# This also works, but then I wouldn't be able to make a pun with bands.
fig.varea(x="x", y1="upper", y2="lower", legend_label="std", color="orange", fill_alpha=0.2, source=source)
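
Side by side, the two options look like this. It is a sketch on hypothetical weekly numbers: `Band` is an annotation that gets attached with `add_layout`, while `varea` is a glyph method that creates a renderer like `line` does. In a real plot you would pick one of the two.

```python
import pandas as pd
from bokeh.models import Band, ColumnDataSource
from bokeh.plotting import figure

# Hypothetical weekly mean and standard deviation.
x = pd.date_range("2020-03-01", periods=4, freq="7D")
mean = pd.Series([10.0, 20.0, 35.0, 30.0])
std = pd.Series([2.0, 5.0, 8.0, 6.0])

source = ColumnDataSource(data={
    "x": x, "mean": mean, "lower": mean - std, "upper": mean + std,
})

fig = figure(width=600, height=300, x_axis_type="datetime")
fig.line(x="x", y="mean", source=source, color="orange")

# Option 1: an annotation, attached with add_layout.
band = Band(base="x", lower="lower", upper="upper", source=source,
            fill_alpha=0.2, fill_color="orange")
fig.add_layout(band)

# Option 2: a glyph, created like any other renderer.
area = fig.varea(x="x", y1="lower", y2="upper", source=source,
                 fill_alpha=0.2, fill_color="orange")
```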

### Let's do the same, but now let's separate the data per season.

In [45]:
# ISO dates avoid any day/month ambiguity; autumn starts at the September equinox
winter = pd.date_range(start='2019-12-21', end='2020-03-20')
spring = pd.date_range(start='2020-03-20', end='2020-06-20')
summer = pd.date_range(start='2020-06-20', end='2020-09-22')
autumn = pd.date_range(start='2020-09-22', end='2020-12-21')
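
Each of these ranges holds one timestamp per day, so `isin` can be used to pick the rows of a weekly index that fall inside a season. A minimal sketch, with a hypothetical weekly index (`weekly_index` is not a name from the notebook):

```python
import pandas as pd

# Hypothetical weekly index, one timestamp every 7 days.
weekly_index = pd.date_range("2020-01-22", periods=10, freq="7D")

winter = pd.date_range(start="2019-12-21", end="2020-03-20")
spring = pd.date_range(start="2020-03-20", end="2020-06-20")

# isin marks the weekly timestamps that fall on any day of the season.
in_winter = weekly_index.isin(winter)
in_spring = weekly_index.isin(spring)

print(in_winter.sum(), in_spring.sum())
```

Note that the boundary day belongs to both ranges, so a weekly timestamp landing exactly on it would be picked up by both seasons.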


In [46]:
country = "Portugal"
mask = df["Country"] ==country
selected_country = df[mask]
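# Keep only the numeric columns: mean and std are not defined for the text columns
selected_country = selected_country[["deaths", "confirmed_cases"]].resample("7D").agg(["mean", "sum", "std"])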

In [47]:
fig = figure(width= 600,height=500,tools="hover,pan,wheel_zoom,box_zoom,reset",title="{} cases evolution".format(country),x_axis_type="datetime",tooltips="value: @y")


In [48]:
mask = selected_country.index.isin(winter)

winter = selected_country[mask]
x = winter.index
y = winter["confirmed_cases"]["mean"]
source = {
    "x":x,
    "upper":y + winter["confirmed_cases"]["std"],
    "lower":y - winter["confirmed_cases"]["std"],
}
fig.varea(x="x",y1= "upper", y2= "lower",color="blue",fill_alpha=0.2,source=source)
fig.line(x = x,y = y,legend_label="Winter cases",color="blue")


In [49]:
mask = selected_country.index.isin(spring)

spring = selected_country[mask]
x = spring.index
y = spring["confirmed_cases"]["mean"]
source = {
    "x":x,
    "upper":y + spring["confirmed_cases"]["std"],
    "lower":y - spring["confirmed_cases"]["std"],
}
fig.varea(x="x",y1= "upper", y2= "lower",color="orange",fill_alpha=0.2,source=source)
fig.line(x = x,y = y,legend_label="Spring cases",color="orange")


In [50]:
mask = selected_country.index.isin(autumn)

autumn = selected_country[mask]
x = autumn.index
y = autumn["confirmed_cases"]["mean"]
source = {
    "x":x,
    "upper":y + autumn["confirmed_cases"]["std"],
    "lower":y - autumn["confirmed_cases"]["std"],
}
fig.varea(x="x",y1= "upper", y2= "lower",color="yellow",fill_alpha=0.2,source=source) 
fig.line(x = x,y = y,legend_label="Autumn cases",color="yellow")


In [51]:
mask = selected_country.index.isin(summer)

summer = selected_country[mask]
x = summer.index
y = summer["confirmed_cases"]["mean"]
source = {
    "x":x,
    "upper":y + summer["confirmed_cases"]["std"],
    "lower":y - summer["confirmed_cases"]["std"],
}
fig.varea(x="x",y1= "upper", y2= "lower",color="red",fill_alpha=0.2,source=source)
fig.line(x = x,y = y,legend_label="Summer cases",color="red")


In [52]:
show(fig)

### Let's just organize that code

In [53]:
# ISO dates avoid any day/month ambiguity; autumn starts at the September equinox
winter = pd.date_range(start='2019-12-21', end='2020-03-20')
spring = pd.date_range(start='2020-03-20', end='2020-06-20')
summer = pd.date_range(start='2020-06-20', end='2020-09-22')
autumn = pd.date_range(start='2020-09-22', end='2020-12-21')

colors = ["blue","orange","red","yellow"]
seasons = [winter,spring,summer,autumn]
labels = ["Winter","Spring","Summer","Autumn"]



In [54]:
fig = figure(width= 600,height=500,tools="hover,pan,xwheel_zoom,reset",title="{} cases evolution".format(country),x_axis_type="datetime",tooltips="value: @y")

for i in range(4):
    mask = selected_country.index.isin(seasons[i])

    season = selected_country[mask]
    x = season.index
    y = season["confirmed_cases"]["mean"]
    source = {
        "x":x,
        "upper":y + season["confirmed_cases"]["std"],
        "lower":y - season["confirmed_cases"]["std"],
    }
    fig.varea(x="x",y1= "upper", y2= "lower",color=colors[i],fill_alpha=0.2,source=source)
    fig.line(x = x,y = y,legend_label="{} cases".format(labels[i]),color=colors[i])


show(fig)

## The line should be continuous, right?

In [55]:
# ISO dates avoid any day/month ambiguity; autumn starts at the September equinox
winter = pd.date_range(start='2019-12-21', end='2020-03-20')
spring = pd.date_range(start='2020-03-20', end='2020-06-20')
summer = pd.date_range(start='2020-06-20', end='2020-09-22')
autumn = pd.date_range(start='2020-09-22', end='2020-12-21')


colors = ["blue","orange","red","yellow"]
seasons = [winter,spring,summer,autumn]
labels = ["Winter","Spring","Summer","Autumn"]



In [56]:
fig = figure(width= 600,height=500,tools="hover,pan,xwheel_zoom,reset",title="{} cases evolution".format(country),x_axis_type="datetime",tooltips="value: @y")

for i in range(4):
    mask = selected_country.index.isin(seasons[i])

    season = selected_country[mask]
    x = season.index
    y = season["confirmed_cases"]["mean"]
    source = {
        "x":x,
        "upper":y + season["confirmed_cases"]["std"],
        "lower":y - season["confirmed_cases"]["std"],
    }
    fig.varea(x="x",y1= "upper", y2= "lower",color=colors[i],fill_alpha=0.2,source=source)
    fig.line(x = x,y = y,legend_label="{} cases".format(labels[i]),color=colors[i])


show(fig)
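
The gaps appear because each season's line stops at its last weekly point and the next one starts a week later. One possible fix, sketched here on hypothetical weekly means, is to extend every segment by one row so that it ends on the first point of the following season. The names `bounds` and `segments` are made up for this example.

```python
import pandas as pd

# Hypothetical weekly means around the winter/spring boundary.
weekly = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    index=pd.date_range("2020-03-04", periods=6, freq="7D"),
)

bounds = {
    "Winter": (pd.Timestamp("2019-12-21"), pd.Timestamp("2020-03-20")),
    "Spring": (pd.Timestamp("2020-03-20"), pd.Timestamp("2020-06-20")),
}

segments = {}
for label, (start, end) in bounds.items():
    positions = [i for i, day in enumerate(weekly.index) if start <= day < end]
    if not positions:
        continue
    # Extend the slice by one row so it ends on the next season's first point.
    stop = min(positions[-1] + 1, len(weekly) - 1)
    segments[label] = weekly.iloc[positions[0]:stop + 1]

print(segments["Winter"].index[-1] == segments["Spring"].index[0])
```

Each entry of `segments` could then be passed to `fig.line` in place of the per-season `x` and `y`, so that consecutive lines share an endpoint.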

In [57]:
#categorical groups by season ?  heatmap? (Months -> Cases) TODO