# Altair: Your Paper's Best Friend
Several Python packages help you create charts, graphs, and other visualizations. Popular ones include:
  * Matplotlib
  * Seaborn (sits on top of Matplotlib)
  * Bokeh
  * Altair
  
I have used a number of different approaches, and I've found Altair to be the most intuitive, quick to use, and aesthetically pleasing. So that's what we'll introduce today.

In [None]:
import pandas as pd
import altair as alt
import numpy as np

<img src="excellence.jpg">

## Five Principles
Within reason:

1. Above all else show the data
2. Maximize your data-ink ratio
3. Erase non-data ink
4. Erase redundant data ink
5. Revise and edit

## Visualizing VIA Reliability on the Kitchener Corridor
Using the outputs we created in the Pandas tutorial, we will produce visualizations and summaries that communicate reliability on the Kitchener Corridor portion of the VIA Rail network. Let's load our data now.

In [None]:
otp = pd.read_csv('kitchener_otp.csv')
times = pd.read_csv('kitchener_times.csv')
otp

## Charts, Marks, and Encodings
First, let's start with an introduction to Altair's `Chart` object. The `Chart` class is the starting point for every chart and takes a Pandas `DataFrame` as its data source. From there we choose a mark (the shape drawn for each row, such as a bar or a circle) and encode columns into channels (such as x position, y position, row, or color).

In [None]:
alt.Chart(otp).mark_bar().encode(
    alt.X('otp:Q', title="On-Time Performance (%)"),
    alt.Y('station:N', title=""),
    alt.Row('train:N', title=""),
    alt.Color('station:N', legend=None)
).properties(
    width=800,
    height=200,
    title='On-Time Performance in the Kitchener Corridor'
).configure(
    font='Raleway'
).configure_axis(
    grid=False,
    domain=False
)

## Example: Station Reliability by Train
Let's find an interesting and useful way to show station reliability by train using a much nicer version of a "box and whiskers" plot:

<img src="box_and_whisker.jpg">

Our steps will be as follows:
1. Construct a `DataFrame` with the values we need
2. Plot each element of the plot (bars and dots) individually
3. Combine them together into a plot
4. Erase what we don't need

In [None]:
train = 87
# Compute the 5th, 25th, 50th, 75th, and 95th percentiles of schedule deviation for each train and station
quantiles = times[['train', 'prettyname', 'depDelta']].groupby(['train', 'prettyname'], as_index=False).quantile([0.05, 0.25, 0.5, 0.75, 0.95]).reset_index()
quantiles.columns = ['id', 'q', 'train', 'station', 'depDelta']
# Reshape so each train and station is one row, with one column per quantile
quantiles = quantiles[['q', 'train', 'station', 'depDelta']].pivot(index=['train', 'station'], columns='q', values='depDelta').reset_index()
quantiles.columns = ['train', 'station', 'q5', 'q25', 'q50', 'q75', 'q95']
quantiles
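
To see what that reshaping does, here is a small sketch on made-up delays. It is not the cell above: it swaps in `unstack` as an alternative to `pivot`, and the `toy_times` values are invented for illustration.

```python
import pandas as pd

# Hypothetical delays in minutes, invented for illustration only
toy_times = pd.DataFrame({
    'train': [87] * 6,
    'prettyname': ['A'] * 3 + ['B'] * 3,
    'depDelta': [0, 2, 4, 1, 5, 9],
})

# Passing a list to quantile() adds the quantile as an extra index level,
# giving one value per (train, station, quantile)
long_form = toy_times.groupby(['train', 'prettyname'])['depDelta'].quantile([0.25, 0.5, 0.75])

# Move the quantile level into columns: one row per (train, station)
wide = long_form.unstack().reset_index()
wide.columns = ['train', 'station', 'q25', 'q50', 'q75']
print(wide)
```

Either way, the result has one row per train and station, which is the shape Altair needs to draw one bar and one dot per station.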

In [None]:
plotData = quantiles[quantiles.train == train]

# Dummy dataframe for the grey band drawn by the `area` layer below (from -1 to 5 at every station)
bandData = pd.DataFrame({'station': plotData.station.unique(), 'low': -1, 'high': 5})

# `sort` sets the order of the stations along the x-axis
tops = alt.Chart(plotData).mark_bar(width=1, color='#708090').encode(
    alt.X('station:N', sort=['Malton', 'Brampton', 'Georgetown', 'Guelph', 'Kitchener', 'Stratford', "St. Mary's", 'London', 'Strathroy', 'Wyoming'], title=None),
    alt.Y('q75:Q', title='Schedule Deviation (min)'),
    alt.Y2('q95:Q', title=None)
)

bottoms = alt.Chart(plotData).mark_bar(width=1, color='#708090').encode(
    alt.X('station:N'),
    alt.Y('q5:Q'),
    alt.Y2('q25:Q')
)

medians = alt.Chart(plotData).mark_circle(color="#f58426").encode(
    alt.X('station:N'),
    alt.Y('q50:Q')
)

area = alt.Chart(bandData).mark_area(fill='#EEEEEE', opacity=0.5).encode(
    alt.X('station:N'),
    alt.Y('low:Q'),
    alt.Y2('high:Q')
)

(area + tops + bottoms + medians).configure(
    font='Raleway'
).configure_view(
    strokeWidth=0
).configure_axis(
    grid=False,
    domain=False
).configure_axisX(
    labelAngle=0,
    tickSize=0
).properties(
    width=1080,
    height=400,
    title=f"VIA Arrival Reliabilities for Train {train} on the Kitchener Corridor"
)
