In [None]:
import pandas as pd
data_file = '../data/iris.data'
colnames = [
    'sepal length',
    'sepal width',
    'petal length',
    'petal width',
    'species'
]
iris = pd.read_csv(data_file, names=colnames, index_col=False)
iris.head()

# Matplotlib

In [None]:
import matplotlib.pyplot as plt
# this enables inline plotting in the notebook
%matplotlib inline

Matplotlib is a well-established plotting library. It is very powerful and can produce high-quality figures for publication in a variety of static output formats. It was initially modeled after the plotting commands in MATLAB, so in its pyplot interface successive functions are applied to the currently active plot. It now also has an object-oriented API in which you call methods on explicit figure and axes objects.
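As a rough sketch of the difference between the two interfaces, the cell below draws the same points twice. The data values and the name `pyplot_ax` are made up for illustration and are not taken from the iris file.

```python
import matplotlib.pyplot as plt

# Illustrative values only, not taken from the iris data
x = [4.9, 5.8, 6.4, 7.1]
y = [3.0, 2.7, 3.2, 3.0]

# pyplot (state-based) style: each call acts on the currently active axes
plt.figure()
plt.scatter(x, y)
plt.xlabel('sepal length')
plt.title('pyplot interface')
pyplot_ax = plt.gca()  # the axes pyplot was implicitly drawing on

# object-oriented style: hold explicit Figure and Axes objects
fig, ax = plt.subplots()
ax.scatter(x, y)
ax.set_xlabel('sepal length')
ax.set_title('object-oriented interface')
```

Both styles end up drawing on an `Axes` object; the object-oriented form just makes it explicit which one is being modified.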

For simply exploring data it provides a fairly simple interface.

In [None]:
# Simplest use case
plt.scatter(
    data=iris,
    x='sepal length',
    y='sepal width'
)

Unfortunately, the amount of code needed for anything beyond the basics can quickly escalate. For example, if you want to map a categorical data column to colors, you have to do so manually:

In [None]:
# color maps in matplotlib map a numeric value to a color
# scmap will map a species to a number -- used for plotting 
# revscmap will map that number back to the species -- used for legend

# get a list of unique species
species = iris['species'].unique().tolist()
nspecies = len(species)
# get a colormap with the same number of colors from a pre-defined colormap
cmap = plt.get_cmap('YlGnBu', nspecies)
# forward and reverse dicts of species to numeric index
scmap = {s: i for i, s in enumerate(species)}
revscmap = {i: s for i, s in enumerate(species)}

fig, ax = plt.subplots(figsize=(8, 6))

im = ax.scatter(
    data=iris,
    x='sepal length',
    y='sepal width',
    c=iris['species'].map(scmap),
    cmap=cmap,
    linewidths=1, edgecolors='grey'
)

# set title and axis labels
fig.suptitle('Iris Dataset')
ax.set_xlabel('sepal length')
ax.set_ylabel('sepal width')

# formatter to label the colorbar with species names
formatter = plt.FuncFormatter(lambda val, loc: revscmap[val])

fig.colorbar(im, ax=ax, ticks=range(len(species)), format=formatter)

plt.show()
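The forward and reverse lookups built by hand above can also be obtained from `pandas.factorize`. The sketch below is an alternative, not the approach used in this notebook, and the small `toy` frame is made up for illustration.

```python
import pandas as pd

# Toy stand-in for the iris table (values are illustrative only)
toy = pd.DataFrame({
    'species': ['Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
})

# codes: one integer per row; uniques: the label each integer stands for
codes, uniques = pd.factorize(toy['species'])

print(codes.tolist())
print(uniques.tolist())
```

Here `codes` could play the role of `iris['species'].map(scmap)`, and indexing into `uniques` the role of `revscmap`.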

An easier way could be to separate your data by category and plot each subset in turn.

In [None]:
species_list = []
for species, grp in iris.groupby('species'):
    plt.plot(
        grp['sepal length'],
        grp['sepal width'],
        label=species,
        linestyle='None',
        marker='.'
    )
    species_list.append(species)

plt.figlegend(species_list, loc='upper right', bbox_to_anchor=(0.853, 0.85))
plt.show()

# Bokeh

Bokeh generates *dynamic* plots. Although the code you write is in Python, Bokeh generates a JSON object containing the data and the plot specification. This is then used by the Bokeh JavaScript component to create an interactive visualization in a web browser. All these steps happen quite transparently in a Jupyter notebook environment.

One weakness here is that it is more challenging to create a publication-quality image.
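To get a feel for the JSON specification mentioned above, the following sketch serializes a small plot with `bokeh.embed.json_item` and lists the top-level keys. The plot title and data points are made up for illustration; in normal notebook use you never need to call this yourself.

```python
from bokeh.embed import json_item
from bokeh.plotting import figure

# A tiny stand-alone plot (illustrative values, not the iris data)
p = figure(title='JSON sketch')
p.scatter([4.9, 5.8, 6.4], [3.0, 2.7, 3.2], size=10)

# json_item returns a plain dict that the JavaScript side can render
item = json_item(p)
top_level_keys = sorted(item.keys())
print(top_level_keys)
```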

In [None]:
from bokeh.io import output_notebook
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource
from bokeh.transform import factor_cmap
# this enables inline plotting in the notebook
output_notebook()

In [None]:
source = ColumnDataSource(iris)
species_list = iris['species'].unique().tolist()
p = figure(title='Iris Dataset')
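# scatter draws circle markers by default
# legend_field groups legend entries by the values in the 'species' column
p.scatter(x='sepal length', y='sepal width', source=source, size=10,
          legend_field='species',
          color=factor_cmap('species', 'Category10_3', species_list))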
show(p)

# Plotly

Plotly is another library for generating interactive visualizations. It is fairly similar to Bokeh in that respect, but it can also generate 3D visualizations. As we will see later, though, it lacks some other functionality that Bokeh provides.

Although the Python and JavaScript libraries are open source, they are created by and associated with an online service for collaborative charts and dashboards. You do *not* need to use these services and can use Plotly in 'offline' mode.

In [None]:
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot
# this enables inline plotting in the notebook
init_notebook_mode(connected=True)

In [None]:
# Basic plotting
data = go.Scatter(
    x=iris['sepal length'],
    y=iris['sepal width'],
    mode='markers'
)
iplot([data])

Plotly, like Matplotlib, does not make it easy to map a categorical column to a color palette, but successive calls to `go.Scatter()` will achieve our goal.

In [None]:
data = []
for species, grp in iris.groupby('species'):
    data.append(
        go.Scatter(
            x=grp['sepal length'],
            y=grp['sepal width'],
            mode='markers',
            name=species))

iplot(data)

# HoloViews

HoloViews bridges these plotting backends and lets you switch *relatively* easily between them.

At the time of writing, HoloViews 1.12.1 was the stable release. Around the same time, Plotly's offline API had some internal changes which broke the integration. This has been fixed in the development version, but the patch has not yet made it into a release, so I won't include any more Plotly examples here.

In [None]:
import holoviews as hv
hv.extension('bokeh', 'matplotlib')

In [None]:
hv.Scatter(iris)

In [None]:
points = hv.Points(iris).opts(
    color='species', cmap='Category10', size=8, width=600, height=400)
points

With a little extra effort we can convert this to a static image.

The differences between plotting backends can be bridged automatically to some extent by HoloViews. For example, here we did not have to manually map the species names to values and then map those values to a colormap. HoloViews has already done that for us. On the other hand, there is still a sizeable difference between the APIs that is not (yet) bridged. In this case our color legend and size information are completely lost, while the tick marks and label formatting are different.

In [None]:
# specify matplotlib as the backend for the already existing 'points' object
# specify a matplotlib-generated png as the final image container
hv.output(
    points.opts(backend='matplotlib'),
    backend='matplotlib',
    fig='png')

However, this is not the main purpose behind HoloViews...