# Learning Module - Bokeh

### Here we will show you how to use some of the basic functionality of Bokeh, a Python visualization library.
Keep in mind that there are several other libraries, each with its own pros and cons, and, as of now, none of them can be called the best overall.

Bokeh, Plotly and Matplotlib are the best known in data visualization, but there are more.

Some of these can be used to create rich, interactive visualizations that can be served directly to the end user in a web page (Bokeh is one of them).

Bokeh is open source, which means that you can see how it is built, and you can also help build it!

## Data !

Let's use some Covid data taken from Johns Hopkins University.

In [2]:
import pandas as pd

In [28]:
df = pd.read_csv("covid19data_cleaned.csv")

In [29]:
cumusum = df.groupby("Country").sum()

In [32]:
from bokeh.io import output_file, show
from bokeh.palettes import Category20c
from bokeh.plotting import figure
from bokeh.transform import cumsum
from bokeh.io import output_notebook

from math import pi



Run this so that the plots are displayed inline in the notebook.

In [33]:
output_notebook()

## Y'all like pies?
#### Everybody likes pies. That was rhetorical.
*I don't remember the last time I had one, though*

In [50]:
n = 15

In [51]:
n_largest = cumusum["confirmed_cases"].nlargest(n)


We need data for the pie. 

Let us set up the recipe first.

In [52]:
data = pd.Series(n_largest).reset_index(name='value').rename(columns={'index':'Country'})
data['angle'] = data['value']/data['value'].sum() * 2*pi
data['color'] = Category20c[n]

In [53]:
data

|   | Country | value | angle | color |
|---|---|---|---|---|
| 0 | US | 11202979.0 | 1.723609 | #3182bd |
| 1 | India | 8873541.0 | 1.365219 | #6baed6 |
| 2 | Brazil | 5876464.0 | 0.90411 | #9ecae1 |
| 3 | France | 2041293.0 | 0.314059 | #c6dbef |
| 4 | Russia | 1932711.0 | 0.297353 | #e6550d |
| 5 | Spain | 1496864.0 | 0.230297 | #fd8d3c |
| 6 | United Kingdom | 1394299.0 | 0.214517 | #fdae6b |
| 7 | Argentina | 1318384.0 | 0.202837 | #fdd0a2 |
| 8 | Italy | 1205881.0 | 0.185528 | #31a354 |
| 9 | Colombia | 1205217.0 | 0.185426 | #74c476 |
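
To make the recipe concrete, here is a small sketch of the angle arithmetic in plain Python. The three values are made up for illustration and are not taken from the dataset. The running totals mimic what the `cumsum('angle', include_zero=True)` and `cumsum('angle')` transforms supply to the wedges as start and end angles.

```python
from itertools import accumulate
from math import isclose, pi

# Made-up counts, only for illustration (not taken from the dataset).
values = [50, 30, 20]
total = sum(values)

# Each slice gets its share of the full circle (2*pi radians).
angles = [v / total * 2 * pi for v in values]

# End angle of each wedge: the running total of the angles.
end_angles = list(accumulate(angles))
# Start angle of each wedge: the same running total, shifted so it begins at 0.
start_angles = [0.0] + end_angles[:-1]

for v, start, end in zip(values, start_angles, end_angles):
    print(f"value={v:>3}  start={start:.3f}  end={end:.3f}")
```

The last end angle lands on 2π, so the wedges close the circle with no gap or overlap.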


### Pie Chart!

#### Not an actual pie, I am sorry.

In [58]:
# `height` replaces `plot_height`, which newer Bokeh releases no longer accept
p = figure(height=350, title="Top {} Confirmed".format(n), toolbar_location=None,
           tools="hover", tooltips="@Country: @value", x_range=(-0.5, 1.0))

p.wedge(x=0, y=1, radius=0.4,
        start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'),
        line_color="white", fill_color='color', legend_field='Country', source=data)

p.axis.axis_label=None
p.axis.visible=False
p.grid.grid_line_color = None
show(p)

Now we do the same for deaths and show the plot.

In [56]:
n_largest = cumusum["deaths"].nlargest(n)
data = pd.Series(n_largest).reset_index(name='value').rename(columns={'index':'Country'})
data['angle'] = data['value']/data['value'].sum() * 2*pi
data['color'] = Category20c[n]

In [59]:


# `height` replaces `plot_height`, which newer Bokeh releases no longer accept
p = figure(height=350, title="Top {} Deaths".format(n), toolbar_location=None,
           tools="hover", tooltips="@Country: @value", x_range=(-0.5, 1.0))

p.wedge(x=0, y=1, radius=0.4,
        start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'),
        line_color="white", fill_color='color', legend_field='Country', source=data)

p.axis.axis_label=None
p.axis.visible=False
p.grid.grid_line_color = None

show(p)
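
The two pie cells differ only in the column and the title, so the recipe could be wrapped in a function. The sketch below is one possible way to do that; `make_pie` is a hypothetical helper name, not something Bokeh provides, and the sample numbers are made up.

```python
from math import pi

import pandas as pd
from bokeh.palettes import Category20c
from bokeh.plotting import figure
from bokeh.transform import cumsum


def make_pie(values, title):
    """Build a pie chart from a Series indexed by country (hypothetical helper)."""
    data = values.rename_axis("Country").reset_index(name="value")
    data["angle"] = data["value"] / data["value"].sum() * 2 * pi
    # Category20c only provides palettes for 3 to 20 colors.
    data["color"] = list(Category20c[len(data)])

    p = figure(height=350, title=title, toolbar_location=None,
               tools="hover", tooltips="@Country: @value", x_range=(-0.5, 1.0))
    p.wedge(x=0, y=1, radius=0.4,
            start_angle=cumsum("angle", include_zero=True), end_angle=cumsum("angle"),
            line_color="white", fill_color="color", legend_field="Country", source=data)
    p.axis.axis_label = None
    p.axis.visible = False
    p.grid.grid_line_color = None
    return p


# Made-up numbers, only to show the call.
sample = pd.Series({"A": 50, "B": 30, "C": 20})
pie = make_pie(sample, "Top 3 (made-up data)")
```

With the notebook's data, the call would look like `show(make_pie(cumusum["deaths"].nlargest(n), "Top {} Deaths".format(n)))`.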

## Now that we have the basics of pie charts and data sources out of the way, let us do a regular line plot.

### First, let's pick a country and see how its cases evolved over time!

In [135]:
country = "Portugal"

In [136]:
mask = df["Country"] ==country

In [137]:
selected_country = df[mask]

In [173]:
fig = figure(width=600, height=500, tools="hover,pan,wheel_zoom,box_zoom,reset",
             title="{} cases evolution".format(country), x_axis_type="datetime",
             tooltips="value: @y")

In [174]:
x = pd.to_datetime(selected_country["date"])
y = selected_country["confirmed_cases"].astype(int)

In [175]:
fig.line(x = x,y = y,legend_label="Confirmed by Day")

In [176]:
show(fig)

## Still a bit too simple, right?

#### Let's compare the cases to the deaths.

In [177]:
y = selected_country["deaths"].astype(int)

In [178]:
fig.line(x = x,y = y,legend_label="Death by Day",color="red")

In [179]:
show(fig)

### Yet again, we still have space for more!
#### Now let us add the cumulative data for both cases and deaths.

These lines share the same x values, and Bokeh lets us simplify how we pass them.
This time we also want to be able to zoom, but only along the x axis.
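
One way to get both, sketched under assumptions: put every column into a single `ColumnDataSource` so the lines refer to the shared x column by name, and use the `xpan` and `xwheel_zoom` tools so that panning and zooming only affect the x axis. The small frame below is made-up data with the same column names as the notebook; `frame`, `shared` and `zoom_fig` are illustrative names.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Made-up daily numbers, only for illustration.
frame = pd.DataFrame({
    "date": pd.date_range("2020-03-01", periods=5, freq="D"),
    "confirmed_cases": [2, 3, 5, 8, 13],
    "deaths": [0, 0, 1, 1, 2],
})
frame["cum_cases"] = frame["confirmed_cases"].cumsum()
frame["cum_deaths"] = frame["deaths"].cumsum()

# One data source holds the x column and every y column.
shared = ColumnDataSource(frame)

# xpan and xwheel_zoom restrict panning and zooming to the x axis.
zoom_fig = figure(width=600, height=500, x_axis_type="datetime",
                  tools="xpan,xwheel_zoom,reset", title="Cumulative evolution")
zoom_fig.line(x="date", y="cum_cases", source=shared,
              legend_label="Cumulative Cases", color="orange")
zoom_fig.line(x="date", y="cum_deaths", source=shared,
              legend_label="Cumulative Death", color="red")
```

Calling `show(zoom_fig)` displays the result; the cells that follow keep using the plain `x` and `y` variables instead.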

In [180]:
cumsum_selected = selected_country[["confirmed_cases","deaths"]].cumsum()

In [181]:
y = cumsum_selected["deaths"]

fig.line(x = x,y = y,legend_label="Cumulative Death",color="yellow")

y = cumsum_selected["confirmed_cases"]
fig.line(x = x,y = y,legend_label="Cumulative Cases",color="orange")

show(fig)

### OK. Now it is too hard to make sense of this.

Let's keep things neat and tidy, so that we don't overwhelm our end users with too much information.

### Let's do the same, but with less granularity. Let us see how Covid was increasing by week (7D).

In [243]:
country = "Portugal"
mask = df["Country"] ==country
selected_country = df[mask]

In [244]:
selected_country = selected_country.resample("2D")  # Attention check: this fails because the index is not a datetime index yet.

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Int64Index'

In [245]:
selected_country.index = pd.to_datetime(selected_country["date"])
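
The error above comes from `resample` needing a time-based index to know where each bin starts and ends. Here is a minimal sketch of the same failure and fix on made-up data (14 days with counts 1 to 14); `daily` and `weekly` are illustrative names.

```python
import pandas as pd

# Made-up daily data, only for illustration.
daily = pd.DataFrame({
    "date": pd.date_range("2020-03-01", periods=14, freq="D").strftime("%Y-%m-%d"),
    "confirmed_cases": range(1, 15),
})

# With the default integer index, resample has no notion of time and raises.
try:
    daily.resample("7D")
    failed = False
except TypeError as err:
    failed = True
    print("resample failed:", err)

# Once the index holds datetimes, the rows can be grouped into 7-day bins.
daily.index = pd.to_datetime(daily["date"])
weekly = daily["confirmed_cases"].resample("7D").sum()
print(weekly)
```

The 14 rows collapse into two weekly bins, one per 7-day window starting at the first date.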

In [246]:
import numpy as np

In [247]:
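# Keep only the numeric columns: newer pandas versions no longer silently drop text columns when aggregating
selected_country = selected_country[["confirmed_cases", "deaths"]].resample("7D").agg([np.mean, np.sum, np.std])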

In [248]:
x = selected_country.index
y = selected_country["confirmed_cases"]["mean"]

In [261]:
fig = figure(width=600, height=500, tools="hover,pan,wheel_zoom,box_zoom,reset",
             title="{} cases evolution".format(country), x_axis_type="datetime",
             tooltips="value: @y")
fig.line(x = x,y = y,legend_label="Mean confirmed by week",color="orange")
show(fig)

In [262]:
conf_cases = selected_country["confirmed_cases"]
upper = conf_cases["mean"] + conf_cases["std"]
lower = conf_cases["mean"] - conf_cases["std"]
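
The band edges are simply the weekly mean plus and minus one standard deviation. As a quick sanity check of that arithmetic, here is a sketch in plain Python on one made-up week; `statistics.stdev` is the sample standard deviation, which is also what pandas computes for `std` by default.

```python
from statistics import mean, stdev

# Made-up daily counts for one week, only for illustration.
week = [1, 2, 3, 4, 5, 6, 7]

center = mean(week)
spread = stdev(week)  # sample standard deviation (divides by n - 1)

upper_edge = center + spread
lower_edge = center - spread
print(f"mean={center}, band=({lower_edge:.2f}, {upper_edge:.2f})")
```

Note that the lower edge can drop below zero for weeks with large swings, even though case counts themselves cannot be negative.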

In [263]:
source = { "x": x, "upper" :upper,"lower":lower}

In [267]:
from bokeh.models import Band, ColumnDataSource

In [265]:
band = Band(base="x", lower="lower", upper="upper", source=ColumnDataSource(data=source), level='underlay',
            fill_alpha=0.2, line_width=1, line_color='orange',fill_color="orange")

fig.add_layout(band)  # a Band is an annotation, so we need to add it to the figure explicitly
show(fig)

In [None]:
# This also works, but then I wouldn't be able to make a pun with bands.
fig.varea(x="x", y1="upper", y2="lower", legend_label="std", color="orange",
          fill_alpha=0.2, source=source)