There are many visualization libraries in the Python world. Visualization is tremendously useful in a notebook environment, and most of these libraries can be used in Jupyter Notebook. Most of them can also be used in Apache Zeppelin, although a few behave slightly differently there. This tutorial shows how to use these popular visualization libraries in Apache Zeppelin. Note that you need the IPython interpreter to make these libraries work in Zeppelin, and the IPython interpreter is only available starting from Zeppelin 0.8.0.

We will cover the following visualization libraries:

* Matplotlib
* Pandas
* Seaborn
* Plotnine
* Bokeh
* HoloViews
* HvPlot
* Altair
* Plotly
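
Before running the examples, it may help to check which of these libraries are installed in the Python environment used by the IPython interpreter. The following is a minimal sketch using only the standard library; the `LIBRARIES` list and the `check_libraries` helper are illustrative names, not part of Zeppelin.

```python
import importlib.util

# Import names of the libraries covered in this tutorial
# (hvplot is included because it has its own section).
LIBRARIES = ["matplotlib", "pandas", "seaborn", "plotnine",
             "bokeh", "holoviews", "hvplot", "altair", "plotly"]

def check_libraries(names):
    """Return a dict mapping each import name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

availability = check_libraries(LIBRARIES)
for name, installed in availability.items():
    print("{0:<12} {1}".format(name, "installed" if installed else "missing"))
```

In Zeppelin, this would go in a paragraph that starts with `%python.ipython`, like the other examples in this tutorial.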

# Matplotlib

Matplotlib is a Python 2D plotting library that produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. Matplotlib is used in Zeppelin the same way as in Jupyter Notebook. The key is to put `%matplotlib inline` before using Matplotlib. Below is a simple example; for more on Matplotlib, refer to the [Matplotlib website](https://matplotlib.org/).

In [2]:
%python.ipython

%matplotlib inline

import matplotlib.pyplot as plt

plt.plot([1,4,3,9])
plt.ylabel('some numbers')
plt.show()
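
The same workflow should extend to multi-panel figures. The sketch below uses Matplotlib's object-oriented API (`plt.subplots`) to draw the same numbers as a line and as bars. The variable names are illustrative, and the `%matplotlib inline` magic is left as a comment so that the block is plain Python.

```python
# In Zeppelin, start the paragraph with %python.ipython and %matplotlib inline.
import matplotlib.pyplot as plt

values = [1, 4, 3, 9]

# One figure with two side-by-side axes
fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

ax_line.plot(values)
ax_line.set_ylabel('some numbers')
ax_line.set_title('line')

ax_bar.bar(range(len(values)), values)
ax_bar.set_title('bar')

fig.tight_layout()
plt.show()
```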


# Pandas

Pandas provides a high-level API for visualizing a DataFrame. It uses Matplotlib under the hood, so its usage is the same as Matplotlib's.

In [4]:
%python.ipython

%matplotlib inline

import pandas as pd
import numpy as np

ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
ts = ts.cumsum()
ts.plot()
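
Because `plot()` returns a Matplotlib `Axes` object, the result can be customized further with the usual Matplotlib calls. The following is a small sketch with a four-column DataFrame of random data; the column names and seed are arbitrary choices for illustration.

```python
# In Zeppelin, start the paragraph with %python.ipython and %matplotlib inline.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range('2000-01-01', periods=1000)

# Four random walks, one per column
df = pd.DataFrame(rng.standard_normal((1000, 4)), index=idx, columns=list('ABCD')).cumsum()

# DataFrame.plot draws one line per column and returns the Matplotlib Axes
ax = df.plot(figsize=(8, 3), title='cumulative sums')
ax.set_ylabel('value')
```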

# Seaborn

Seaborn is a Python visualization library based on Matplotlib. It provides a high-level interface for drawing attractive statistical graphics. Its usage in Zeppelin is the same as in Jupyter. For more on Seaborn, refer to the [Seaborn website](https://seaborn.pydata.org/).


In [6]:
%python.ipython

%matplotlib inline

import seaborn as sns
sns.set(style="ticks")

# Load the example dataset for Anscombe's quartet
df = sns.load_dataset("anscombe")

# Show the results of a linear regression within each dataset
sns.lmplot(x="x", y="y", col="dataset", hue="dataset", data=df,
           col_wrap=2, ci=None, palette="muted", height=4,
           scatter_kws={"s": 50, "alpha": 1})


# Plotnine

plotnine is an implementation of a grammar of graphics in Python, based on ggplot2. The grammar allows users to compose plots by explicitly mapping data to the visual objects that make up the plot.


In [8]:
%python.ipython

%matplotlib inline  

from plotnine import *
from plotnine.data import mtcars

(ggplot(mtcars, aes('wt', 'mpg'))
 + geom_point())


In [9]:
%python.ipython

(ggplot(mtcars, aes('wt', 'mpg', color='factor(gear)'))
 + geom_point()
 + stat_smooth(method='lm')
 + facet_wrap('~gear'))

# Bokeh

[Bokeh](https://bokeh.pydata.org/en/latest/) is a Python interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of novel graphics in the style of D3.js, and to extend this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications.



In [11]:
%python.ipython

from bokeh.io import output_notebook, show
from bokeh.plotting import figure

output_notebook()

fig = figure()
fig.line([1,2], [3,4])
show(fig)



In [12]:
%python.ipython

from bokeh.server.server import Server
from bokeh.application import Application
from bokeh.application.handlers.function import FunctionHandler
from bokeh.plotting import figure, ColumnDataSource
from bokeh.io import show

def make_document(doc):
    fig = figure(title='Line plot!', sizing_mode='scale_width')
    fig.line(x=[1, 2, 3], y=[1, 4, 9])

    doc.title = "Hello, world!"
    doc.add_root(fig)

# Set up the Application
handler = FunctionHandler(make_document)
app = Application(handler)

doc = app.create_document()
# notebook_url must be the address (host:port) of the Zeppelin server
show(app, notebook_url='localhost:18086')

# HoloViews

HoloViews is an open-source Python library designed to make data analysis and visualization seamless and simple. With HoloViews, you can usually express what you want to do in very few lines of code, letting you focus on what you are trying to explore and convey, not on the process of plotting. Compared to Bokeh, HoloViews is a higher-level visualization library. Refer to the [HoloViews website](http://holoviews.org/) for more tutorials on HoloViews.


In [14]:
%python.ipython

import logging
logging.getLogger("params").setLevel(logging.ERROR)

import pandas as pd
import numpy as np
import holoviews as hv
from holoviews import opts
hv.extension('bokeh')

from bokeh.plotting import figure
from bokeh.io import show,output_notebook

output_notebook()


In [15]:
%python.ipython

station_info = pd.read_csv('https://raw.githubusercontent.com/holoviz/holoviews/master/examples/assets/station_info.csv')
station_info.head()


In [16]:
%python.ipython

scatter = hv.Scatter(station_info, 'services', 'ridership')
scatter

In [17]:
%python.ipython

layout = scatter + hv.Histogram(np.histogram(station_info['opened'], bins=24), kdims=['opened'])
layout

# HvPlot

[HvPlot](https://hvplot.holoviz.org/) is a high-level plotting API for the PyData ecosystem built on HoloViews.



In [19]:
%python.ipython

from bokeh.io import output_notebook
output_notebook()

import pandas as pd, numpy as np


idx = pd.date_range('1/1/2000', periods=1000)
df  = pd.DataFrame(np.random.randn(1000, 4), index=idx, columns=list('ABCD')).cumsum()

import hvplot.pandas
df.hvplot()


In [20]:
%python.ipython

from hvplot.sample_data import us_crime, airline_flights

crime = us_crime.read()
print(type(crime))
crime.hvplot.line(x='Year', y='Violent Crime rate')


In [21]:
%python.ipython

us_crime.plot.bivariate('Burglary rate', 'Property crime rate', legend=False, width=500, height=400) * \
us_crime.plot.scatter(  'Burglary rate', 'Property crime rate', color='black', size=15, legend=False) + \
us_crime.plot.table(['Burglary rate', 'Property crime rate'], width=350, height=350)

# Altair

[Altair](https://altair-viz.github.io/) is a declarative statistical visualization library for Python, based on Vega and Vega-Lite, and the source is available on GitHub.

With Altair, you can spend more time understanding your data and its meaning. Altair’s API is simple, friendly and consistent and built on top of the powerful Vega-Lite visualization grammar. This elegant simplicity produces beautiful and effective visualizations with a minimal amount of code.

There is still a bug that prevents Altair from working reliably with Zeppelin. The first run may not succeed; once Zeppelin has loaded the Altair JavaScript properly, subsequent runs will work.



In [23]:
%python.ipython

import altair as alt

alt.renderers.enable('zeppelin')

# load a simple dataset as a pandas DataFrame
from vega_datasets import data
cars = data.cars()

alt.Chart(cars).mark_point().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin',
)

In [24]:
%python.ipython

import altair as alt
import numpy as np
import pandas as pd

# Compute x^2 + y^2 across a 2D grid
x, y = np.meshgrid(range(-5, 5), range(-5, 5))
z = x ** 2 + y ** 2

# Convert this grid to columnar data expected by Altair
source = pd.DataFrame({'x': x.ravel(),
                     'y': y.ravel(),
                     'z': z.ravel()})

alt.Chart(source).mark_rect().encode(
    x='x:O',
    y='y:O',
    color='z:Q'
)

# Plotly

[plotly.py](https://plotly.com/python/) is an interactive, open-source, browser-based graphing library for Python.


In [26]:
%python.ipython

import plotly.graph_objects as go
fig = go.Figure()
fig.add_trace(go.Scatter(y=[2, 1, 4, 3]))
fig.add_trace(go.Bar(y=[1, 4, 3, 2]))
fig.update_layout(title = 'Hello Figure')

print("%html {0}".format(fig.to_html()))
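
The `print` call above relies on Zeppelin's display system, which renders output that begins with `%html` as HTML rather than plain text. The same mechanism can be sketched without Plotly: the example below builds a small HTML table by hand and prints it with the `%html` prefix. The `to_html_table` helper and the sample rows are hypothetical, added only for illustration.

```python
import html

def to_html_table(rows, header):
    """Build a small HTML table from a header and a list of rows."""
    head = "".join("<th>{0}</th>".format(html.escape(str(h))) for h in header)
    body = "".join(
        "<tr>" + "".join("<td>{0}</td>".format(html.escape(str(c))) for c in row) + "</tr>"
        for row in rows
    )
    return "<table><tr>{0}</tr>{1}</table>".format(head, body)

table = to_html_table([(0, 2, 1), (1, 1, 4)], header=("index", "scatter", "bar"))

# Output that starts with %html is rendered by Zeppelin as HTML
output = "%html {0}".format(table)
print(output)
```

Outside Zeppelin, the same code simply prints the raw markup, prefix included.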


