[Information Visualization Tutorials](https://infovis.fh-potsdam.de/tutorials/) · FH Potsdam · Summer 2020


# Tutorial 3: Interaction techniques

Interactive capabilities make information visualizations come alive. They let viewers adjust a visualization to change its parameters, focus on an interesting aspect of a dataset, and get more detailed information about specific items. Arguably, the Jupyter environment itself is already an interactive environment. By adding interactive elements to visualizations, you can explore the data more directly and rapidly.

In this tutorial, you will get to know several techniques that add interactivity to your notebooks and visualizations.

✏️ *Remember to follow the pencils to get ideas for edits in this tutorial*



In [1]:
# first we include the two libraries that we will work with
import pandas as pd
import altair as alt

In this tutorial, we use some basic information about cities with more than 100,000 inhabitants as an example dataset.

In [2]:
cities = pd.read_csv("http://infovis.fh-potsdam.de/tutorials/data/cities.csv")

✏️ *Curious about the structure and contents of `cities`? Apply your data examination skills from the last tutorial:*

## Zoom & pan

Visualizations created with Altair can be equipped with a simple zoom function that is triggered either with the scroll gesture of a trackpad or the scroll wheel of a mouse. Depending on the chart type, this lets the viewer gradually adjust which section of each axis is visible.

In the following scatterplot of large cities, population and elevation are mapped to x and y. Calling the `.interactive()` method on the chart makes its axes dynamic. Try zooming into the scatterplot and notice how the labels on the axes move and adjust. You can also drag the visualization to change the current viewport. Double-clicking anywhere resets the axes of the chart.

In [3]:
# create a simple scatterplot with two dimensions for x and y
alt.Chart(cities).mark_circle().encode(
    # position based on population and elevation:
    x='population',
    y='elevation',
).interactive() # this makes the axes interactive: now you can zoom & pan


✏️ *Now many dots overlap each other. Would it help to adjust their appearance, e.g., their size or opacity? You can set these default parameters as arguments to the `mark_circle()` call. Have a look at the [mark property channels](https://altair-viz.github.io/user_guide/encoding.html#encoding-channels) provided by Altair*


## Details-on-demand

Because our ability to discern multiple channels in a visualization is limited, the visual variables that can be utilized to represent data dimensions are finite. We need to decide which information is encoded visually and which information can be made available interactively on demand. 

Tooltips constitute a classic technique to provide such additional, more detailed information on data elements. Your viewers will be able to reveal the details by hovering the mouse pointer over the respective visual elements.

To include a tooltip in the above visualization, you need to tell Altair which attributes it should show. You do this by adding a `tooltip` argument to the `encode()` method call:

In [4]:
alt.Chart(cities).mark_circle().encode(
    x='population',
    y='elevation',
    tooltip=['name', 'country'] # add tooltip for name and country
).interactive() 


✏️ *Add other attributes to the tooltip that might be of interest for a reader*

## Interactive legend

At this point the scatterplot is not entirely instructive if we cannot distinguish between the cities. Maybe it would help to focus on a subset of data elements… such as by continent. The dataset contains almost 2000 cities spread across all continents. Let's adjust the color according to continent and turn the legend into an interactive filter.

For this to work, we first create a selection and then add it to the chart declaration. In addition, the opacity of each circle now depends on the selection:

In [5]:
# create a selection based on the column continent, linked to the legend:
selection = alt.selection(type="multi", fields=['continent'], bind='legend')

alt.Chart(cities).mark_circle().encode(
    x='population',
    y='elevation',
    tooltip=['name', 'country'],
    # we're adding a color coding, which comes with a legend
    color='continent',
    # the opacity is based on the selection
    opacity=alt.condition(selection, alt.value(1), alt.value(0))
).add_selection(selection) # the selection is applied to the chart

✏️ *What happens if you replace continent with country?*

## Input elements

Have you tried the pencil exercise? You must have noticed that neither the color coding nor the legend scales well to that many countries. An interactive legend only works well with a manageable number of entries, and color can likewise distinguish only a handful of categories. There are simply too many countries in the dataset to handle via an interactive legend. Let's find out exactly how many there are. To do this, we can call the `unique()` method on the `country` column and then use Python's `len()` function to count the results:

In [6]:
countries = cities.country.unique()
len(countries)

159
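
As a side note, pandas can count distinct values directly with `nunique()`, and `value_counts()` shows how many rows belong to each value. A minimal sketch on a made-up table (the `sample` DataFrame is an assumption for illustration, not the cities dataset):

```python
import pandas as pd

# made-up rows standing in for the cities table
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F"],
    "country": ["Germany", "Germany", "Japan", "Egypt", "Japan", "Germany"],
})

# number of distinct countries, same result as len(sample["country"].unique())
n_countries = sample["country"].nunique()

# number of cities per country, most frequent first
per_country = sample["country"].value_counts()
```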

Currently, the countries are listed in their order of appearance in the dataset:

In [7]:
countries

array(['Netherlands', 'Egypt', 'Spain', 'Germany', 'Nigeria', 'Iran',
       'Canada', 'Ivory Coast', 'Japan', 'United States of America',
       'United Arab Emirates', 'Mexico', 'Venezuela', 'Ghana', 'Turkey',
       'Ethiopia', 'Australia', 'Yemen', 'India', 'El Salvador',
       'Morocco', 'Kazakhstan', 'Libya', 'Sudan', 'Costa Rica', 'Ukraine',
       'Syria', 'Algeria', 'Portugal', 'Malaysia', 'Ecuador', 'Brazil',
       'Jordan', 'Italy', 'Uzbekistan', 'South Korea', 'France',
       "People's Republic of China", 'Madagascar', 'Philippines', 'Chile',
       'Romania', 'Peru', 'Colombia', 'Tanzania', 'Israel',
       'Turkmenistan', 'Eritrea', 'Greece', 'New Zealand', 'Iraq',
       'Pakistan', 'Argentina', 'Azerbaijan', 'Mali', 'Thailand',
       'Central African Republic', 'Bosnia and Herzegovina',
       'Dominican Republic', 'Switzerland', 'Equatorial Guinea',
       'Cambodia', 'Georgia', 'Cuba', 'Mozambique', 'Lebanon',
       'United Kingdom', 'Serbia', 'Angola', 'Norway',

✏️ *Find a way to get them `sorted()`*


Now let's provide an interactive element with which the viewer can focus the scatterplot on the cities belonging to a particular country. We can use `countries` and populate a dropdown menu as an interactive selection element.

In [8]:
# creating a dropdown element with the countries as options and a name
dropdown = alt.binding_select(options=countries, name="Select a country")

# applying the selection to the country column and using the dropdown element
selection = alt.selection(type="single", fields=['country'], bind=dropdown)

The following block is identical to the chart in the interactive legend example, except that the chart is now accompanied by a country selector instead of an interactive legend.

In [9]:
alt.Chart(cities).mark_circle().encode(
    x='population',
    y='elevation',
    tooltip=['name', 'country'],
    color='continent',
    opacity=alt.condition(selection, alt.value(1), alt.value(0))
).add_selection(selection)

✏️ *Consider using the circle sizes to encode another dimension not currently shown!*

## Linked views

Oftentimes we are working with complex datasets, which require selections to be made in the visualizations themselves. This allows us to specify data ranges according to particular interests as they emerge in the interaction with the visualizations.

The following example places two charts next to each other. Selecting a circle in the scatterplot of continent averages on the left filters the histogram of city populations on the right to the selected continent.


In [10]:
# we again create a selection, which links the interaction based on continents
selection = alt.selection(type="multi", fields=['continent'])

# this sets up a layout of charts with a size of 250x250 pixels each
base = alt.Chart(cities).properties(width=250, height=250)

# this creates a scatterplot of city population & elevation averages by continent
scat = base.mark_circle().encode(
    x='mean(population)',
    y='mean(elevation)',
    size='count()',
    tooltip=['continent', alt.Tooltip('count()', title='big cities')],
    color=alt.condition(selection, 'continent', alt.value('lightgray'))
).add_selection(selection)
# this last step specifies where the selection can be made

# histogram showing the distribution of cities according to population
hist = base.mark_bar(color="darkgray").encode(
    x=alt.X('population', bin=alt.Bin(maxbins=30)),
    y='count()',
).transform_filter(selection)
# this last step applies the continent selection from the scatterplot to the histogram

# the two charts are arranged together with the pipe operator
scat | hist

✏️ *There is a lot to digest in the above example. Try replacing the population histogram with an elevation histogram*

## Dynamic queries

To rapidly explore how multiple dimensions relate to one another, you can also select entire regions in a visualization and perform dynamic queries. The following example juxtaposes a (pseudo) map of big cities and a population histogram. In both you can create and move selections directly in the visualizations.

In [11]:
# the type parameter has changed, this time we set intervals in both x and y position
selection = alt.selection(type='interval', encodings=['x', 'y'])

# first we create a pseudo map using the geographic positions (lat and long) for a scatterplot
map = alt.Chart(cities).mark_point(filled=True, size=5).encode(
    x='long',
    y='lat',
    # the dots are rendered black, unless outside of the selection
    color=alt.condition(selection, alt.value("black"), alt.value("lightgray"))
).properties(
    width=400, height=200
).add_selection(selection)

# define background chart
base = alt.Chart().mark_bar(color="black").encode(
    x=alt.X("population:Q", bin=alt.Bin(maxbins=30)),
    y="count()",
).properties(
    width=200, height=200
)

# lightgray background with selection
background = base.encode(color=alt.value('lightgray')).add_selection(selection)

# black highlights on the transformed data
highlight = base.transform_filter(selection)

# layer the two charts on top of each other
layers = alt.layer(background, highlight, data=cities)

map | layers
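
Conceptually, a rectangle drawn on the map acts like a range filter on the two mapped columns, and `transform_filter(selection)` keeps only the rows inside that range. A rough pandas analogy, with made-up coordinates and a hypothetical brush extent (none of these values come from the dataset):

```python
import pandas as pd

# made-up rows standing in for the cities table
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "long": [13.4, 139.7, -74.0, 31.2],
    "lat": [52.5, 35.7, 40.7, 30.0],
})

# hypothetical brush extent: the area a dragged rectangle would cover
long_min, long_max = 0, 50
lat_min, lat_max = 25, 60

# keep only the rows whose position lies inside the rectangle
inside = sample[
    sample["long"].between(long_min, long_max)
    & sample["lat"].between(lat_min, lat_max)
]
```

In the interactive chart this filter is re-evaluated whenever the rectangle is moved or resized, which is what makes the query dynamic.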

✏️ *What would it take to turn the histogram on the right into a scatterplot?*

## Sources

From the Altair documentation:
- [Encodings](https://altair-viz.github.io/user_guide/encoding.html)
- [Interactive Legend](https://altair-viz.github.io/gallery/interactive_legend.html)
- [Bindings, Selections, Conditions: Making Charts Interactive](https://altair-viz.github.io/user_guide/interactions.html)
- [Interactive Scatter Plot and Linked Layered Histogram](https://altair-viz.github.io/gallery/scatter_with_layered_histogram.html)
- [Interactive Crossfilter](https://altair-viz.github.io/gallery/interactive_layered_crossfilter.html)

Cities data from Wikidata: [SPARQL query](https://w.wiki/Nms)
