**SA433 &#x25aa; Data Wrangling and Visualization &#x25aa; Fall 2024**

# Lesson 8. Multi-View Composition &mdash; Facet, Concatenate, Repeat

## In this lesson...

- We will focus on 3 more ways to perform multi-view composition:


- **Faceting charts** involves partitioning data into multiple charts, organized in rows and/or columns


- **Concatenating charts** involves positioning arbitrary charts in horizontal or vertical stacks


- **Repeating charts** involves taking a base chart specification and applying it to multiple variables

<hr style="border-top: 2px solid gray; margin-top: 1px; margin-bottom: 1px"></hr>

## Weather data, revisited

* First, let's import Pandas and Altair:

In [1]:
import pandas as pd
import altair as alt

- In this lesson, we're going to use the same weather data we did in the previous lesson


- The CSV file `data/weather.csv`, in the `data` folder next to this notebook, contains weather statistics for Seattle and New York


- Let's load the dataset and peek at the first 5 rows:

In [2]:
df = pd.read_csv('data/weather.csv')
df.head()

| | location | date | precipitation | temp_max | temp_min | wind | weather |
|---|---|---|---|---|---|---|---|
| 0 | Seattle | 2012-01-01 | 0.0 | 12.8 | 5.0 | 4.7 | drizzle |
| 1 | Seattle | 2012-01-02 | 10.9 | 10.6 | 2.8 | 4.5 | rain |
| 2 | Seattle | 2012-01-03 | 0.8 | 11.7 | 7.2 | 2.3 | rain |
| 3 | Seattle | 2012-01-04 | 20.3 | 12.2 | 5.6 | 4.7 | rain |
| 4 | Seattle | 2012-01-05 | 1.3 | 8.9 | 2.8 | 6.1 | rain |


<hr style="border-top: 2px solid gray; margin-top: 1px; margin-bottom: 1px"></hr>

## Faceting charts

- **Faceting** involves subdividing a dataset into groups and creating a separate chart for each group


- Let's start with a basic histogram of the daily high temperature in Seattle:

In [3]:
alt.Chart(df).transform_filter(
    'datum.location == "Seattle"'
).mark_bar().encode(
    alt.X('temp_max:Q').bin().title('Temperature (degrees C)'),
    alt.Y('count():Q')
)

### Columns and rows of facets

- What if we want to see how this distribution changes based on the weather type (e.g. sun, rain, fog)?
    - Note that the variable `weather` contains this information

- We can specify **columns** of faceted charts, with each column corresponding to a different weather type


- First, we specify a **base chart**


- Then, we apply the `.facet()` method to the base chart and specify that each column of charts corresponds to a different value of `weather`

In [4]:
# Solution
base = alt.Chart(df).transform_filter(
    'datum.location == "Seattle"'
).mark_bar().encode(
    alt.X('temp_max:Q').bin().title('Temperature (degrees C)'),
    alt.Y('count():Q')
).properties(
    width=150,
    height=150
)

base.facet(
    column=alt.Column('weather:N').title('Weather Type')
)

- What if we want to see the histograms of the daily high temperatures in New York as well?


- We can modify our code above and specify **rows** of faceted charts, with each row corresponding to a different value of `location`, like this:

In [5]:
# Solution
base = alt.Chart(df).mark_bar().encode(
    alt.X('temp_max:Q').bin().title('Temperature (degrees C)'),
    alt.Y('count():Q')
).properties(
    width=150,
    height=150
)

base.facet(
    row=alt.Row('location:N').title('Location'),
    column=alt.Column('weather:N').title('Weather Type')
)

- We end up with a matrix of charts: we subdivided the data based on the *combination* of values from two variables


- You can also specify rows *without* specifying columns

### Wrapping facets

- We can also **wrap** faceted charts to a fixed number of columns, like this:

In [6]:
# Solution
base = alt.Chart(df).transform_filter(
    'datum.location == "Seattle"'
).mark_bar().encode(
    alt.X('temp_max:Q').bin().title('Temperature (degrees C)'),
    alt.Y('count():Q')
).properties(
    width=150,
    height=150
)

base.facet(
    facet=alt.Facet('weather:N').title('Weather Type'),
    columns=3
)

- Note the keyword `columns` &mdash; not `column`!

### Independent vs. shared axes and scales

- By default, faceted charts share axes and scales

- For instance, in the example below (copied from above),
    - each row of faceted charts shares the same y-axis, and 
    - each column of faceted charts shares the same x-axis

In [7]:
base = alt.Chart(df).transform_filter(
    'datum.location == "Seattle"'
).mark_bar().encode(
    alt.X('temp_max:Q').bin().title('Temperature (degrees C)'),
    alt.Y('count():Q')
).properties(
    width=150,
    height=150
)

base.facet(
    facet=alt.Facet('weather:N').title('Weather Type'),
    columns=3
)

- Shared axes and scales help the viewer compare values accurately

- Sometimes, though, it makes sense to have independent axes and scales for each faceted chart
    - For example, when the ranges of values in the cells differ significantly

- Similar to layered charts, faceted charts also support **resolving** the axes or scales independently or in a shared manner


- We can request `independent` y-axes using the `.resolve_axis()` method, like this:

In [8]:
# Solution
base = alt.Chart(df).mark_bar().encode(
    alt.X('temp_max:Q').bin().title('Temperature (degrees C)'),
    alt.Y('count():Q')
).properties(
    width=150,
    height=150
)

base.facet(
    row=alt.Row('location:N').title('Location'),
    column=alt.Column('weather:N').title('Weather Type')
).resolve_axis(
    y='independent'
)

- Now each faceted chart has its own y-axis, but the y-axes all still have the same scale


- On the other hand, the columns of faceted charts still share an x-axis


- We can take this further and request `independent` scales to be used on the y-axes using the `.resolve_scale()` method:

In [9]:
# Solution
base = alt.Chart(df).mark_bar().encode(
    alt.X('temp_max:Q').bin().title('Temperature (degrees C)'),
    alt.Y('count():Q')
).properties(
    width=150,
    height=150
)

base.facet(
    row=alt.Row('location:N').title('Location'),
    column=alt.Column('weather:N').title('Weather Type')
).resolve_scale(
    y='independent'
)

- Now we see faceted charts with different y-axis scale domains!

❓ **Exercise 1.** Do you think the chart above with independent y-axis scales is an effective visualization? Why or why not?

*Write your notes here. Double-click to edit.*

### Faceting works on layered charts

- These same faceting operations also work on layered charts, like the ones we produced in the previous lesson


- For example, let's take the layered temperature chart we created in the previous lesson, and create facets for each location:

In [None]:
temp_min_max = alt.Chart(df).mark_area(opacity=0.3).encode(
    alt.X('month(date):T').title('Month'),
    alt.Y('mean(temp_max):Q').title('Average Temperature (degrees C)'),
    alt.Y2('mean(temp_min):Q'),
    alt.Color('location:N')
)

temp_mid = alt.Chart(df).transform_calculate(
    temp_mid='(datum.temp_min + datum.temp_max) / 2'
).mark_line().encode(
    alt.X('month(date):T'),
    alt.Y('mean(temp_mid):Q'),
    alt.Color('location:N')
)

# Create wrapped faceted charts, one for each location


In [10]:
# Solution
temp_min_max = alt.Chart(df).mark_area(opacity=0.3).encode(
    alt.X('month(date):T').title('Month'),
    alt.Y('mean(temp_max):Q').title('Average Temperature (degrees C)'),
    alt.Y2('mean(temp_min):Q'),
    alt.Color('location:N')
)

temp_mid = alt.Chart(df).transform_calculate(
    temp_mid='(datum.temp_min + datum.temp_max) / 2'
).mark_line().encode(
    alt.X('month(date):T'),
    alt.Y('mean(temp_mid):Q'),
    alt.Color('location:N')
)

# Create wrapped faceted charts, one for each location
(temp_min_max + temp_mid).facet(
    facet=alt.Facet('location:N').title('Location')
)

<hr style="border-top: 2px solid gray; margin-top: 1px; margin-bottom: 1px"></hr>

## Concatenating charts

- Faceting creates multiple smaller plots that show separate subdivisions of the data


- However, we might wish to create a multi-view display with different views of the *same* dataset (not subsets) or views involving *different* datasets


- **Concatenating** charts allows us to stack arbitrary charts vertically or horizontally

### Horizontal concatenation

- Let's start with a basic line chart showing the average high temperature per month for both New York and Seattle:

In [11]:
alt.Chart(df).mark_line().encode(
    alt.X('month(date):T').title('Month'),
    alt.Y('mean(temp_max):Q').title('Average High Temperature (degrees C)'),
    alt.Color('location:N')
).properties(
    width=240,
    height=180
)

- What if we want to compare not just temperature over time, but also precipitation and wind levels?


- We can create these 3 plots, and then concatenate them with the `|` operator so that they're side-by-side:

In [None]:
temp = alt.Chart(df).mark_line().encode(
    alt.X('month(date):T').title('Month'),
    alt.Y('mean(temp_max):Q').title('Average High Temperature (degrees C)'),
    alt.Color('location:N')
).properties(
    width=240,
    height=180
)

precip = alt.Chart(df).mark_line().encode(
    alt.X('month(date):T').title('Month'),
    alt.Y('mean(precipitation):Q').title('Average Precipitation (mm)'),
    alt.Color('location:N')
).properties(
    width=240,
    height=180
)

wind = alt.Chart(df).mark_line().encode(
    alt.X('month(date):T').title('Month'),
    alt.Y('mean(wind):Q').title('Average Wind Speed (m/s)'),
    alt.Color('location:N')
).properties(
    width=240,
    height=180
)

# Concatenate!


In [12]:
# Solution
temp = alt.Chart(df).mark_line().encode(
    alt.X('month(date):T').title('Month'),
    alt.Y('mean(temp_max):Q').title('Average High Temperature (degrees C)'),
    alt.Color('location:N')
).properties(
    width=240,
    height=180
)

precip = alt.Chart(df).mark_line().encode(
    alt.X('month(date):T').title('Month'),
    alt.Y('mean(precipitation):Q').title('Average Precipitation (mm)'),
    alt.Color('location:N')
).properties(
    width=240,
    height=180
)

wind = alt.Chart(df).mark_line().encode(
    alt.X('month(date):T').title('Month'),
    alt.Y('mean(wind):Q').title('Average Wind Speed (m/s)'),
    alt.Color('location:N')
).properties(
    width=240,
    height=180
)

# Concatenate!
temp | precip | wind

- The functional form `alt.hconcat(chart1, chart2)` is equivalent to `chart1 | chart2`

### Vertical concatenation

- Vertical concatenation works similarly to horizontal concatenation, but with the `&` operator or the `alt.vconcat()` function


- Also, note that horizontal and vertical concatenation can be combined!

❓ **Exercise 2.**
Rewrite the concatenation code above to stack the charts vertically instead of horizontally.

In [13]:
# Solution
temp & precip & wind

# Alternate solution
# alt.vconcat(temp, precip, wind)

❓ **Exercise 3.**
What happens if you write something like `temp | (precip & wind)`? Before you run the code, try to predict what the chart will look like.

In [14]:
# Solution
temp | (precip & wind)

<hr style="border-top: 2px solid gray; margin-top: 1px; margin-bottom: 1px"></hr>

## Repeating charts

- The concatenation operators above are quite general, allowing arbitrary charts to be combined

- However, the example above was a bit verbose
    - We have 3 very similar charts, but we still have to define them separately and then concatenate them

- Can we be <del>lazier</del> more efficient? 


- For cases where only one or two variables are changing, the `.repeat()` method provides a convenient shortcut for creating multiple, similar charts

- First, we create a template of the base chart, using placeholders for the variables we want to use in encodings:
    - `alt.repeat('column')` for variables we want to lay out across columns
    - `alt.repeat('row')` for variables we want to lay out across rows

- Then, we apply `.repeat()` to the base chart to specify lists of variables we want to use for the columns and rows with keyword arguments `column=...` and `row=...`, respectively

- For example, we can replicate what we did above like this:

In [15]:
# Solution
base = alt.Chart(df).mark_line().encode(
    alt.X('month(date):T').title('Month'),
    alt.Y(alt.repeat('column')).aggregate('mean').type('quantitative'),
    alt.Color('location:N')
).properties(
    width=240,
    height=180
)

base.repeat(
    column=['temp_max', 'precipitation', 'wind']
)

- Take a closer look at the methods applied to the `Y` encoding:
    - `.aggregate()` specifies the aggregate function to apply to the variable
    - `.type()` specifies the data type of the variable

- We need to apply methods here instead of the usual shorthand, like `mean()` and `:Q`, because unfortunately, Altair *cannot* parse this:

    ```python
    alt.Y("mean(alt.repeat('column')):Q")
    ```

- We can also use `.repeat()` to *wrap* a set of charts to a fixed number of columns: use the `alt.repeat('repeat')` placeholder in the encoding, and pass the `repeat=...` and `columns=...` keyword arguments to `.repeat()`, like this:

In [16]:
# Solution
base = alt.Chart(df).mark_line().encode(
    alt.X('month(date):T').title('Month'),
    alt.Y(alt.repeat('repeat')).aggregate('mean').type('quantitative'),
    alt.Color('location:N')
).properties(
    width=240,
    height=180
)

base.repeat(
    repeat=['temp_max', 'precipitation', 'wind'],
    columns=2
)

- We can also use `row` and `column` repetition together, for example, to create a [scatter plot matrix (SPLOM)](https://en.wikipedia.org/wiki/Scatter_plot#Scatter_plot_matrices)


- Given a collection of variables to inspect, a SPLOM provides a grid of all pairwise plots of those variables, allowing us to assess potential associations


- Let's use the `.repeat()` method to create a SPLOM for the `temp_max`, `precipitation`, and `wind` variables, filtered for the Seattle location:

In [17]:
# Solution
base = alt.Chart(df).transform_filter(
    'datum.location == "Seattle"'
).mark_point(size=15).encode(
    alt.X(alt.repeat('column')).type('quantitative'),
    alt.Y(alt.repeat('row')).type('quantitative')
).properties(
    width=150,
    height=150
)

base.repeat(
    row=['temp_max', 'precipitation', 'wind'],
    column=['wind', 'precipitation', 'temp_max']
)

❓ **Exercise 4.**
Modify the code above to get a better understanding of chart repetition. Try adding another variable (`temp_min`) to the SPLOM. What happens if you rearrange the order of the field names in either the `row` or `column` keyword arguments to the `.repeat()` method?

In [18]:
# Solution - play around with the code
base = alt.Chart(df).transform_filter(
    'datum.location == "Seattle"'
).mark_point(size=15).encode(
    alt.X(alt.repeat('column')).type('quantitative'),
    alt.Y(alt.repeat('row')).type('quantitative')
).properties(
    width=150,
    height=150
)

base.repeat(
    row=['temp_min', 'temp_max', 'precipitation', 'wind'],
    column=['wind', 'precipitation', 'temp_max', 'temp_min']
)

<hr style="border-top: 2px solid gray; margin-top: 1px; margin-bottom: 1px"></hr>

## What's next?

- Configuring our visualizations so that the user can *interact* with them in meaningful ways


- Creating maps to visualize *spatial* data

<hr style="border-top: 2px solid gray; margin-top: 1px; margin-bottom: 1px"></hr>

## Problems

### Problem 1

The file `data/gapminder.csv`, in the `data` folder next to this notebook, contains the Gapminder data we used in Lessons 2 and 3.

Create a 2x3 matrix of scatter plots showing life expectancy (x-axis) vs. fertility (y-axis) in the year 2000, with each region of the world having its own scatter plot. You should end up with something that looks like this:

![](img/gapfacet.svg)

In [19]:
# Solution
gap_df = pd.read_csv('data/gapminder.csv')

base = alt.Chart(gap_df).transform_filter(
    'datum.year == 2000'
).mark_point().encode(
    alt.X('life_expect:Q').title('Average life expectancy (years)'),
    alt.Y('fertility:Q').title('Number of children per woman'),
    alt.Color('cluster:N').legend(None)
).properties(
    width=200,
    height=200
)

base.facet(
    facet=alt.Facet('cluster:N').title('Region'),
    columns=3
)

### Problem 2

Using the Gapminder data from Problem 1, create a 4x5 matrix of line charts showing the average life expectancy over time among countries in the Europe & Central Asia region. Each country in the region should have its own line chart. You should end up with something that looks like the figure below.

*Hint.* Use the `.properties()` and `.configure_title()` methods on the *combined* chart to add the title, and change the alignment and anchor of the title.

![](img/gapfacet2.svg)

In [20]:
# Solution
base = alt.Chart(gap_df).transform_filter(
    'datum.cluster == "Europe & Central Asia"'
).mark_line().encode(
    alt.X('year:O').title('Year'),
    alt.Y('life_expect:Q').title('Average Life Expectancy (years)')
).properties(
    width=150,
    height=150
)

base.facet(
    facet=alt.Facet('country:N', title=None),
    columns=5
).properties(
    title='Average Life Expectancy Over Time in Europe & Central Asia'
).configure_title(
    align='center',
    anchor='middle'
)

### Problem 3

Modify the code we wrote in this lesson to create a dashboard of Seattle weather, like the figure below.

*Hints.*

- You can decompose this dashboard into 3 separate parts, each of which we've written code for above.
- Use the concatenation operators to put these 3 parts together.
- Play around with the widths and heights to align the charts in an appealing way.
- Use the `.properties()` and `.configure_title()` methods on the *combined* chart to add the title, and change the font size, alignment, anchor, and vertical offset of the title.

![](img/dashboard.svg)

In [21]:
# Solution
splom_base = alt.Chart(df).transform_filter(
    'datum.location == "Seattle"'
).mark_point(size=15).encode(
    alt.X(alt.repeat('column')).type('quantitative'),
    alt.Y(alt.repeat('row')).type('quantitative')
).properties(
    width=125,
    height=125
)

splom = splom_base.repeat(
    row=['temp_max', 'precipitation', 'wind'],
    column=['wind', 'precipitation', 'temp_max']
)

month_line_base = alt.Chart(df).transform_filter(
    'datum.location == "Seattle"'
).mark_line().encode(
    alt.X('month(date):T').title('Month'),
    alt.Y(alt.repeat('row')).aggregate('mean').type('quantitative')
).properties(
    width=200,
    height=125
)

month_line = month_line_base.repeat(
    row=['temp_max', 'precipitation', 'wind']
)

type_hist_base = alt.Chart(df).transform_filter(
    'datum.location == "Seattle"'
).mark_bar().encode(
    alt.X('temp_max:Q').bin().title('Temperature (degrees C)'),
    alt.Y('count():Q')
).properties(
    width=135,
    height=125
)

type_hist = type_hist_base.facet(
    facet=alt.Facet('weather:N').title('Weather Type')
)

((splom | month_line) & type_hist).properties(
    title='Seattle Weather Dashboard'
).configure_title(
    fontSize=16,
    align='center',
    anchor='middle',
    dy=-10
)

<hr style="border-top: 2px solid gray; margin-top: 1px; margin-bottom: 1px"></hr>

## Notes and sources

- These lesson notes are based on the [Visualization Curriculum](https://uwdata.github.io/visualization-curriculum/) by the University of Washington


- [Altair documentation on layered and multi-view charts](https://altair-viz.github.io/user_guide/compound_charts.html)