# Data Journalism
## Practical Python exercise: Interactive data visualization

*Damian Trilling and Penny Sheets*

This notebook gives some examples of how to create interactive visualizations for the web.


We will use two libraries:
- `bokeh`
- `pygal`


`bokeh` lets you create interactive visualizations in which users can hover over elements, zoom in, etc.
`pygal` lets you create standard charts with hover effects.


## Download the sample data
The first time you run this notebook, you will need to download some example data.
You only need to do this once: uncomment the following two lines, run them, and then comment them out again.

In [None]:
# import bokeh
# bokeh.sampledata.download()

# Interactivity 

As discussed in the literature of week 4, interactivity should have a function. For instance, it can be used to reduce information overload while still providing information 'on demand' if users want to dig into it.

Consider the bokeh example below: at first glance, the user gets an idea of the geographical distribution of unemployment, but those who really want to know can get the exact number for every county by hovering over it.

More complicated online demos of bokeh apps:

- https://demo.bokeh.org/movies
- https://demo.bokeh.org/weather


### An example with pygal

For more info and examples, see http://www.pygal.org .

In [21]:
import pygal
%matplotlib inline


In [19]:
bar_chart = pygal.Bar()                                            # Create a bar chart object
bar_chart.add('Fibonacci', [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55])  # Add some values

In [23]:
# Show in browser:
bar_chart.render_in_browser()
# or save to file:
# bar_chart.render_to_file('bar_chart.svg')            

file:///tmp/tmpz803qx88.html


### An example with bokeh

For more info and examples, see https://bokeh.org/

In [9]:
from bokeh.io import show, output_file
from bokeh.models import LogColorMapper
from bokeh.palettes import Viridis6 as palette
from bokeh.plotting import figure

from bokeh.sampledata.us_counties import data as counties
from bokeh.sampledata.unemployment import data as unemployment

# Reverse the palette; this works whether the palette is a list or a tuple
palette = tuple(reversed(palette))

counties = {
    code: county for code, county in counties.items() if county["state"] == "tx"
}

county_xs = [county["lons"] for county in counties.values()]
county_ys = [county["lats"] for county in counties.values()]

county_names = [county['name'] for county in counties.values()]
county_rates = [unemployment[county_id] for county_id in counties]
color_mapper = LogColorMapper(palette=palette)

data=dict(
    x=county_xs,
    y=county_ys,
    name=county_names,
    rate=county_rates,
)

TOOLS = "pan,wheel_zoom,reset,hover,save"

p = figure(
    title="Texas Unemployment, 2009", tools=TOOLS,
    x_axis_location=None, y_axis_location=None,
    tooltips=[
        ("Name", "@name"), ("Unemployment rate", "@rate%"), ("(Long, Lat)", "($x, $y)")
    ])
p.grid.grid_line_color = None
p.hover.point_policy = "follow_mouse"

p.patches('x', 'y', source=data,
          fill_color={'field': 'rate', 'transform': color_mapper},
          fill_alpha=0.7, line_color="white", line_width=0.5)



# save to file
output_file('bokeh-example.html')

# and/or show in browser
show(p)

# Publishing interactivity

Publishing and distributing your static visualizations is straightforward (see week 4 of this course). You can simply save them in any format you like, e.g., as a `.png` file (better than `.jpg` for text and sharp lines) or as a vector graphic (e.g., `.svg`) that allows loss-free scaling.
For example, we could use `plt.savefig()` for that purpose.

This file, then, can be freely used in any online or offline publication.
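
Saving a static chart in both a raster and a vector format could look roughly like the sketch below. The data values, the file names, and the temporary output folder are assumptions made for illustration only; in a notebook you would typically save next to the notebook and would not need the `Agg` backend line.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render without a display; not needed inside a notebook
import matplotlib.pyplot as plt

# Hypothetical example values, for illustration only
labels = ["A", "B", "C"]
values = [3, 7, 5]

fig, ax = plt.subplots()
ax.bar(labels, values)
ax.set_title("Example chart")

out_dir = Path(tempfile.mkdtemp())  # assumption: a temporary folder as target
png_path = out_dir / "chart.png"
svg_path = out_dir / "chart.svg"

fig.savefig(png_path, dpi=150)  # raster format: fixed resolution
fig.savefig(svg_path)           # vector format: scales without loss
plt.close(fig)
```

The file format is inferred from the file extension, so switching between `.png` and `.svg` only requires changing the file name.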

But how can we do this with interactive visualizations online? It's one thing to make a nice interactive visualization in your own browser, but another to publish it so that others can use it.


## SVG graphics


One approach is to use SVG graphics. That's the route we took in the pygal example above. As you see, you can just open the file in any browser, and the interactive elements (hovering over the bars with your mouse shows the values) work.
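
To illustrate that such hover behaviour can live inside the SVG file itself, here is a hand-written sketch that builds a tiny bar chart with the standard library. It is not how pygal constructs its charts; it only relies on the fact that browsers show an SVG `<title>` child element as a tooltip. The sizes and colour are arbitrary choices.

```python
from xml.etree import ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # write plain <svg> tags without a prefix

values = [0, 1, 1, 2, 3, 5, 8, 13]  # first Fibonacci numbers, as in the pygal example
bar_width, scale, height = 20, 10, 140  # arbitrary layout choices

svg = ET.Element(f"{{{SVG_NS}}}svg",
                 width=str(bar_width * len(values)), height=str(height))
for i, value in enumerate(values):
    bar = ET.SubElement(
        svg, f"{{{SVG_NS}}}rect",
        x=str(i * bar_width), y=str(height - value * scale),
        width=str(bar_width - 2), height=str(value * scale), fill="steelblue",
    )
    # A <title> child is displayed as a tooltip when hovering over the bar
    ET.SubElement(bar, f"{{{SVG_NS}}}title").text = f"value: {value}"

svg_text = ET.tostring(svg, encoding="unicode")
# To try it out, write svg_text to a file ending in .svg and open it in a browser
```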

However, there are two problems with this approach. First, the types of interactivity that are possible are somewhat limited. Second, and more importantly, SVG graphics are sometimes seen as a security risk (because one could construct a malicious SVG file that executes unwanted code). Therefore, many platforms restrict their use (for instance, WordPress, although you can (partly) circumvent this, for instance by installing an SVG plugin).

If you build your own website from scratch, that's less of a problem, of course.
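
To get a feeling for why platforms are cautious, the following sketch looks for two common ways in which an SVG file can carry executable code: `<script>` elements and event-handler attributes such as `onload`. The function name and the two example strings are made up for illustration. This is not a sanitizer, and the standard-library XML parser should not be used on untrusted files; it only shows what "active content" in an SVG can look like.

```python
from xml.etree import ElementTree as ET


def find_active_content(svg_text):
    """Return a list of script elements and on* attributes found in an SVG string."""
    findings = []
    for element in ET.fromstring(svg_text).iter():
        tag = element.tag.split("}")[-1]  # strip the XML namespace
        if tag == "script":
            findings.append("<script> element")
        for attribute in element.attrib:
            if attribute.lower().startswith("on"):  # e.g. onload, onclick
                findings.append(f"{attribute} attribute on <{tag}>")
    return findings


# Two hypothetical SVG snippets: one harmless, one with active content
harmless = '<svg xmlns="http://www.w3.org/2000/svg"><rect width="10" height="10"/></svg>'
active = ('<svg xmlns="http://www.w3.org/2000/svg" onload="alert(1)">'
          '<script>alert(2)</script></svg>')

print(find_active_content(harmless))
print(find_active_content(active))
```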


## JavaScript (client-side)

The bokeh example above takes a different approach: It generates an HTML file with JavaScript code that is then used to render the interactive graphic in the user's browser. That means that if we distribute the HTML file (for instance, by uploading it to our own website; I did it [here](http://www.damiantrilling.net/downloads/test.html)), anyone can use it in their browser.

It requires a bit more fiddling, though, to display such a graphic inline (for instance, embedded like a picture within a WordPress blog post). With a bit of HTML knowledge, you can get there.
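
One common way of doing this is an `<iframe>` that points to the uploaded HTML file. The helper below is a minimal sketch: the function name, the default sizes, and the URL are hypothetical, and whether your platform accepts iframes is something you would have to check.

```python
from html import escape


def iframe_snippet(url, width=800, height=500):
    """Build an <iframe> tag that embeds a standalone HTML page."""
    return (
        f'<iframe src="{escape(url, quote=True)}" '
        f'width="{int(width)}" height="{int(height)}" '
        'style="border: none;"></iframe>'
    )


# Hypothetical URL: replace it with wherever you uploaded bokeh-example.html
snippet = iframe_snippet("https://example.org/bokeh-example.html")
print(snippet)
```

The printed tag can be pasted into the HTML view of a blog post or web page. Bokeh also provides `bokeh.embed.components`, which returns a script and a div that you can paste into an existing page instead.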


## Server-side approaches

Both approaches outlined above are *self-contained*: They include all data, all calculations have already been made, etc. If you have very large data, or if you want to run Python code based on user input, you will need to run your own (bokeh) server. That's a cool thing to do (and a nice project to pursue, if you want to experiment a bit), but it is out of scope for this class.

# Exercise

The example below is from the official bokeh tutorial (https://mybinder.org/v2/gh/bokeh/bokeh-notebooks/master?filepath=tutorial%2F00%20-%20Introduction%20and%20Setup.ipynb ). It plots a complex chart with interactive hover.

**Try to understand the code (in broad strokes) and modify it to see what happens. Construct a different visualization, or use other (your own?) data.**

In [32]:
# import modules and prepare example dataset

from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure, show
from bokeh.sampledata.autompg import autompg_clean as df
from bokeh.transform import factor_cmap

df.cyl = df.cyl.astype(str)
df.yr = df.yr.astype(str)

In [31]:
df

| | mpg | cyl | displ | hp | weight | accel | yr | origin | name | mfr |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130 | 3504 | 12.0 | 70 | North America | chevrolet chevelle malibu | chevrolet |
| 1 | 15.0 | 8 | 350.0 | 165 | 3693 | 11.5 | 70 | North America | buick skylark 320 | buick |
| 2 | 18.0 | 8 | 318.0 | 150 | 3436 | 11.0 | 70 | North America | plymouth satellite | plymouth |
| 3 | 16.0 | 8 | 304.0 | 150 | 3433 | 12.0 | 70 | North America | amc rebel sst | amc |
| 4 | 17.0 | 8 | 302.0 | 140 | 3449 | 10.5 | 70 | North America | ford torino | ford |
| 5 | 15.0 | 8 | 429.0 | 198 | 4341 | 10.0 | 70 | North America | ford galaxie 500 | ford |
| 6 | 14.0 | 8 | 454.0 | 220 | 4354 | 9.0 | 70 | North America | chevrolet impala | chevrolet |
| 7 | 14.0 | 8 | 440.0 | 215 | 4312 | 8.5 | 70 | North America | plymouth fury iii | plymouth |
| 8 | 14.0 | 8 | 455.0 | 225 | 4425 | 10.0 | 70 | North America | pontiac catalina | pontiac |
| 9 | 15.0 | 8 | 390.0 | 190 | 3850 | 8.5 | 70 | North America | amc ambassador dpl | amc |


In [29]:
group = df.groupby(by=['cyl', 'mfr'])
source = ColumnDataSource(group)

# Note: newer bokeh versions expect width/height instead of plot_width/plot_height
p = figure(width=800, height=300, title="Mean MPG by # Cylinders and Manufacturer",
           x_range=group, toolbar_location=None, tools="")

p.xgrid.grid_line_color = None
p.xaxis.axis_label = "Manufacturer grouped by # Cylinders"
p.xaxis.major_label_orientation = 1.2

index_cmap = factor_cmap('cyl_mfr', palette=['#2b83ba', '#abdda4', '#ffffbf', '#fdae61', '#d7191c'], 
                         factors=sorted(df.cyl.unique()), end=1)

p.vbar(x='cyl_mfr', top='mpg_mean', width=1, source=source,
       line_color="white", fill_color=index_cmap, 
       hover_line_color="darkgrey", hover_fill_color=index_cmap)

p.add_tools(HoverTool(tooltips=[("MPG", "@mpg_mean"), ("Cyl, Mfr", "@cyl_mfr")]))

show(p)
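
The column names `cyl_mfr` and `mpg_mean` used in the chart do not exist in `df` itself. As far as the code suggests, they come from the grouping: one entry per combination of `cyl` and `mfr`, with summary statistics such as the mean of `mpg`. The plain-Python sketch below mimics that aggregation for the ten rows shown in the table above; it illustrates the idea and is not bokeh's or pandas' actual implementation.

```python
from collections import defaultdict
from statistics import mean

# (cyl, mfr, mpg) for the ten rows shown in the table above
rows = [
    ("8", "chevrolet", 18.0), ("8", "buick", 15.0), ("8", "plymouth", 18.0),
    ("8", "amc", 16.0), ("8", "ford", 17.0), ("8", "ford", 15.0),
    ("8", "chevrolet", 14.0), ("8", "plymouth", 14.0), ("8", "pontiac", 14.0),
    ("8", "amc", 15.0),
]

groups = defaultdict(list)
for cyl, mfr, mpg in rows:
    groups[(cyl, mfr)].append(mpg)  # one group per (cyl, mfr) combination

# Mimic the two columns the chart uses: the group key and the mean of mpg
cyl_mfr = sorted(groups)
mpg_mean = [mean(groups[key]) for key in cyl_mfr]

for key, value in zip(cyl_mfr, mpg_mean):
    print(key, value)
```

Each printed line corresponds to one bar in the chart: the tuple is its position on the x-axis, and the number is its height.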