# Introduction to Python Programming

## Walkthrough: Exercise 4.9 - Plotting with Bokeh

First, we import what we need from the third-party libraries Pandas and Bokeh. We import all of Pandas under the abbreviated namespace `pd`, and `figure`, `show` and `output_notebook` from `bokeh.plotting`. We use `output_notebook` here so that the plot is displayed in this Jupyter Notebook; you might want to use `output_file` instead, which renders the figure to an HTML file (see below). `figure` constructs each figure, and `show` renders the finished layout. Speaking of layout, we also import `gridplot`, to arrange the figures in rows and columns once we've created them. `gridplot` lives in `bokeh.layouts`; older Bokeh releases also exposed it through `bokeh.io`.

In [23]:
import pandas as pd
from bokeh.plotting import figure, show, output_notebook
from bokeh.layouts import gridplot

Now, we need to read the data from the file into Python. Unlike in the matplotlib example, we will use a Pandas `DataFrame` here. (Note that this approach would work with matplotlib too - we're just choosing to introduce it at the same time as we start doing things with Bokeh.) We use the `read_table` function to read the tabular data into a dataframe: `header=0` takes the column names from the first line of the file, and `index_col=0` uses the first column as the row index.

In [24]:
inputFile = 'speciesDistribution_tabular.txt'
data = pd.read_table(inputFile, header=0, index_col=0)
data

| taxonID | Grimston Wood | Hagg Wood | Hetchell Wood N | Hetchell Wood S | Scoreby Wood | Sutton Wood | Wheldrake Wood |
|---|---|---|---|---|---|---|---|
| A | 123 | 2039 | 12983 | 9380 | 920 | 883 | 0 |
| B | 1340 | 9394 | 8493 | 13928 | 3928 | 293 | 91 |
| C | 11984 | 19380 | 948 | 0 | 0 | 893 | 22649 |
| D | 0 | 9102 | 9384 | 949 | 9301 | 18990 | 2949 |
| E | 9389 | 932 | 4942 | 19023 | 19384 | 0 | 901 |
| F | 4320 | 0 | 0 | 9384 | 12949 | 3910 | 0 |
| G | 1283 | 893 | 9834 | 948 | 0 | 930 | 9204 |
| H | 0 | 5839 | 0 | 9284 | 3892 | 1738 | 2040 |
| I | 0 | 0 | 1293 | 0 | 9192 | 819 | 8173 |
| J | 8193 | 9302 | 9348 | 1093 | 0 | 0 | 0 |
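To see what the `header` and `index_col` arguments do, here is a minimal sketch that reads a small in-memory sample instead of the real file. The sample text is a hypothetical excerpt, and it assumes the file is tab-separated, which is the default separator for `read_table`.

```python
import io
import pandas as pd

# Hypothetical excerpt standing in for the real tab-separated file.
sample_text = (
    "taxonID\tGrimston Wood\tHagg Wood\n"
    "A\t123\t2039\n"
    "B\t1340\t9394\n"
)

# header=0: the first line holds the column names.
# index_col=0: the first column (taxonID) becomes the row index.
sample = pd.read_table(io.StringIO(sample_text), header=0, index_col=0)

print(list(sample.columns))          # the site names
print(list(sample.index))            # the taxon IDs
print(sample.loc["B", "Hagg Wood"])  # look up one count by taxon and site
```

With the index set this way, a single count can be looked up by taxon and site name, as in the last line.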


Great! Now that we have all of our data loaded and arranged in the way that we need, we can move on to plotting.

Now we use `output_notebook()` to make sure that our plots can be displayed in this Notebook.

In [25]:
output_notebook()

_Note: to save the plot to a file instead, use:_

```Python
from bokeh.plotting import output_file
output_file('my_amazing_plot.html') # give files descriptive names so you can easily identify them in future
```

Now we can start creating the individual bar plots, one for each site. We want to display these plots all together, so we start by creating a list to store them in. For each site we pull out two things: the bar heights, which are the dataframe column corresponding to the site name, and the taxon names, as a list created from the index of the dataframe.
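As a small illustration of what gets pulled out on each pass through the loop, the sketch below builds a two-site, three-taxon dataframe by hand (values copied from the table above) and prints the heights and taxon names for each site. The name `sample` is only for this example.

```python
import pandas as pd

# Two sites and three taxa copied from the table above, built by hand.
sample = pd.DataFrame(
    {"Grimston Wood": [123, 1340, 11984], "Hagg Wood": [2039, 9394, 19380]},
    index=pd.Index(["A", "B", "C"], name="taxonID"),
)

for site in sample.columns:
    heights = sample[site]     # one column: a Series indexed by taxon
    taxa = list(sample.index)  # the row labels as a plain list
    print(site, taxa, list(heights))
```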

Next, we create a figure object for the site, using the site name as the title and the taxon names as the range for the x-axis. `tools=[]` disables the interactive toolbar for these plots. Finally, we call the figure's `vbar` method to add the bars to the axes. Each bar should be centred on its taxon label, so we use a list comprehension that adds 0.5 to each index number along the axis to give the bar mid-points. The heights of the bars are the column that we pulled out earlier. Once the figure is constructed, we add it to the list of plots.

In [26]:
plots = []
for site in data.columns:
    heights = data[site]
    taxa = list(data.index)
    fig = figure(plot_width=300, plot_height=200, title=site, x_range=taxa, tools=[])
    fig.vbar(x=[n+0.5 for n in range(len(taxa))],  # one bar mid-point per taxon
             width=0.8,
             bottom=0,
             top=heights,
             color='firebrick')
    plots.append(fig)
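More recent Bokeh releases handle categorical axes differently: the bars can be positioned by passing the category names themselves as the `x` values, with no numeric offsets. The following is only a sketch under that assumption, using hypothetical stand-in values for one site; check the documentation for the Bokeh version you have installed.

```python
from bokeh.plotting import figure

# Hypothetical stand-in values: three taxa and their counts at one site.
taxa = ["A", "B", "C"]
heights = [123, 1340, 11984]

# Passing a list of strings as x_range makes the x-axis categorical.
fig = figure(title="Grimston Wood", x_range=taxa, tools="")

# Each bar is positioned by its category name rather than a numeric offset.
fig.vbar(x=taxa, top=heights, width=0.8, bottom=0, color="firebrick")
```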

The final step is to lay out and display the plots. Here, we use the `gridplot` function that we imported earlier, but there are plenty of other options. Check out the [layout documentation](http://bokeh.pydata.org/en/latest/docs/user_guide/layout.html).

In [27]:
layout = gridplot([[plots[0], plots[1]],
                   [plots[2], plots[3]],
                   [plots[4], plots[5]],
                   [plots[6]]])
show(layout)
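Writing out the rows by hand works for seven plots, but the nested list can also be built programmatically. The sketch below uses a hypothetical helper, `rows_of`, to split a flat list into rows of two; strings stand in for the figure objects so that the example runs on its own.

```python
def rows_of(items, ncols):
    """Split a flat list into consecutive rows of at most ncols items."""
    return [items[i:i + ncols] for i in range(0, len(items), ncols)]

# Stand-in labels; in the walkthrough these would be the figures in `plots`.
labels = ["p0", "p1", "p2", "p3", "p4", "p5", "p6"]
grid = rows_of(labels, 2)
print(grid)
# [['p0', 'p1'], ['p2', 'p3'], ['p4', 'p5'], ['p6']]
```

The result has the same shape as the nested list passed to `gridplot` above. `gridplot` also accepts a flat list together with an `ncols` argument, which may be simpler if you don't need the nested list for anything else.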

As a bonus, let's colour the bars individually according to the taxon that they refer to. With this many taxa it is hard to avoid some similar colours in a small palette, so we use the Category20 colour palette, which offers up to 20 colours. Several palettes are available via the `bokeh.palettes` module. Choose the one you want, then specify how many colours you need when you use the palette in your plot(s).

In [28]:
from bokeh.palettes import Category20
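To make "specify how many colours you need" concrete: `Category20` behaves like a dictionary keyed by the number of colours, and each entry is a sequence of that many hex colour strings. A minimal sketch, where the count of 5 is an arbitrary value chosen for illustration:

```python
from bokeh.palettes import Category20

# Category20 maps a colour count to a sequence of that many hex strings.
n_taxa = 5  # hypothetical count, for illustration only
colours = Category20[n_taxa]

print(len(colours))
print(colours[0])
```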

In [29]:
plots = []
for site in data.columns:
    heights = data[site]
    taxa = list(data.index)
    fig = figure(plot_width=300, plot_height=200, title=site, x_range=taxa, tools=[])
    fig.vbar(x=[n+0.5 for n in range(len(taxa))],  # one bar mid-point per taxon
             width=0.8,
             bottom=0,
             top=heights,
             color=Category20[len(taxa)])  # one colour per taxon
    plots.append(fig)

In [30]:
layout = gridplot([[plots[0], plots[1]],
                   [plots[2], plots[3]],
                   [plots[4], plots[5]],
                   [plots[6]]])
show(layout)