**SA433 &#x25aa; Data Wrangling and Visualization &#x25aa; Fall 2024**

# Lesson 7. Multi-View Composition &mdash; Layers

## In this lesson...

- Over the past few lessons, we've learned how to use multiple encoding channels to simultaneously visualize several variables


- Unfortunately, as the number of encoding channels increases, a chart can quickly become difficult to read


- An alternative: compose multiple charts in a way that facilitates rapid comparisons


- Over the next few lessons, we will examine a variety of ways to perform **multi-view composition**


- In this lesson, we will focus on **layering**: placing compatible charts directly on top of each other

<hr style="border-top: 2px solid gray; margin-top: 1px; margin-bottom: 1px"></hr>

## Weather data

- Before we begin, let's import Pandas and Altair:

In [1]:
import pandas as pd
import altair as alt

- The file `data/weather.csv`, in the `data` subfolder next to this notebook, contains weather statistics for Seattle and New York


- Let's load the dataset and peek at the first 5 rows:

In [2]:
df = pd.read_csv('data/weather.csv')
df.head()

| | location | date | precipitation | temp_max | temp_min | wind | weather |
|---|---|---|---|---|---|---|---|
| 0 | Seattle | 2012-01-01 | 0.0 | 12.8 | 5.0 | 4.7 | drizzle |
| 1 | Seattle | 2012-01-02 | 10.9 | 10.6 | 2.8 | 4.5 | rain |
| 2 | Seattle | 2012-01-03 | 0.8 | 11.7 | 7.2 | 2.3 | rain |
| 3 | Seattle | 2012-01-04 | 20.3 | 12.2 | 5.6 | 4.7 | rain |
| 4 | Seattle | 2012-01-05 | 1.3 | 8.9 | 2.8 | 6.1 | rain |


<hr style="border-top: 2px solid gray; margin-top: 1px; margin-bottom: 1px"></hr>

## Layered charts

- One way to combine multiple charts is to **layer** marks on top of each other


- If the underlying scale domains of the `X` and `Y` encodings are compatible, we can merge them to form **shared axes**


- If either of the `X` or `Y` encodings is not compatible, we can create a **dual-axis chart**, which overlays marks using separate scales and axes

<hr style="border-top: 2px solid gray; margin-top: 1px; margin-bottom: 1px"></hr>

## Shared axes

- Let's start by creating an area chart, showing the average minimum and maximum temperatures in each month:

In [3]:
# Solution
alt.Chart(df).mark_area().encode(
    alt.X('month(date):T'),
    alt.Y('mean(temp_max):Q'),
    alt.Y2('mean(temp_min):Q')
)
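- To see what `month(date)` and `mean(...)` compute behind the scenes, here is a rough Pandas sketch of the same aggregation. The tiny DataFrame `small` is an assumption made for illustration: its first two rows come from the table above, and the last two rows are made up.

```python
import pandas as pd

# A tiny stand-in for the weather data. The first two rows come from the
# table above; the last two rows are hypothetical values for illustration.
small = pd.DataFrame({
    'date': ['2012-01-01', '2012-01-02', '2013-01-01', '2012-02-01'],
    'temp_max': [12.8, 10.6, 6.0, 9.0],
    'temp_min': [5.0, 2.8, 1.0, 3.0],
})

# month(date): keep only the month, so January 2012 and January 2013
# land in the same group
month = pd.to_datetime(small['date']).dt.month.rename('month')

# mean(temp_max) and mean(temp_min): average within each month
monthly = small.groupby(month)[['temp_max', 'temp_min']].mean()
print(monthly)
```

- The area mark then fills the region between the two averaged columns for each month.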

- Let's refine this further by using a color encoding to differentiate between the different locations in our dataset


- Let's also adjust the mark opacity so that we can see any overlapping areas

In [4]:
# Solution
alt.Chart(df).mark_area(opacity=0.3).encode(
    alt.X('month(date):T'),
    alt.Y('mean(temp_max):Q'),
    alt.Y2('mean(temp_min):Q'),
    alt.Color('location:N')
)

- This chart does a nice job showing the temperature ranges


- What if we also want to emphasize the *middle* of the range?

- First, let's create a line chart showing the average temperature midpoint for each month
    - How can we create a new variable containing the daily temperature midpoint?

In [5]:
# Solution
alt.Chart(df).transform_calculate(
    temp_mid='(datum.temp_min + datum.temp_max) / 2'
).mark_line().encode(
    alt.X('month(date):T'),
    alt.Y('mean(temp_mid):Q'),
    alt.Color('location:N')
)
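- The expression passed to `transform_calculate` is evaluated once per row, before any aggregation, with `datum` standing for the current row. As a sketch, the same arithmetic in Pandas looks like this, using the first three rows of the table above:

```python
import pandas as pd

# First three rows of the table shown earlier in this lesson
rows = pd.DataFrame({
    'temp_max': [12.8, 10.6, 11.7],
    'temp_min': [5.0, 2.8, 7.2],
})

# Same arithmetic as '(datum.temp_min + datum.temp_max) / 2', one value per row
rows['temp_mid'] = (rows['temp_min'] + rows['temp_max']) / 2
print(rows)
```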

- Now we can combine the area chart and the line chart with **layering**


- First, let's redefine the charts, assigning them to variables `temp_min_max` and `temp_mid`, respectively:

In [6]:
# Solution
temp_min_max = alt.Chart(df).mark_area(opacity=0.3).encode(
    alt.X('month(date):T'),
    alt.Y('mean(temp_max):Q'),
    alt.Y2('mean(temp_min):Q'),
    alt.Color('location:N')
)

temp_mid = alt.Chart(df).transform_calculate(
    temp_mid='(datum.temp_min + datum.temp_max) / 2'
).mark_line().encode(
    alt.X('month(date):T'),
    alt.Y('mean(temp_mid):Q'),
    alt.Color('location:N')
)

- Then, we can layer these charts on top of each other with the syntax 

    ```python
    chart1 + chart2
    ```
    <span style="margin-top:10px;"></span>

    where `chart1` is the first layer and `chart2` is a second layer drawn on top

In [7]:
# Solution
temp_min_max + temp_mid

- Now we have a multi-layer chart! 


- However, the y-axis title (though informative) has become a bit awkward...


- Let's give the `Y` encoding channel a more compact title


- If we give an encoding channel a title in one of the layers, it will automatically be used as a shared title for all the layers:

In [8]:
# Solution
temp_min_max = alt.Chart(df).mark_area(opacity=0.3).encode(
    alt.X('month(date):T').title('Month'),
    alt.Y('mean(temp_max):Q').title('Average Temperature (degrees C)'),
    alt.Y2('mean(temp_min):Q'),
    alt.Color('location:N')
)

temp_mid = alt.Chart(df).transform_calculate(
    temp_mid='(datum.temp_min + datum.temp_max) / 2'
).mark_line().encode(
    alt.X('month(date):T'),
    alt.Y('mean(temp_mid):Q'),
    alt.Color('location:N')
)

temp_min_max + temp_mid

❓ **Exercise 1.**
What happens if the `Y` encoding channel has a custom title in both layers? Modify the code above to find out.

*Write your notes here. Double-click to edit.*

*Notes.* If the `Y` encoding channel has a custom title in both layers, Altair displays both titles, separated by a comma.

- We used the `+` operator above to layer 2 charts


- This is actually a shorthand for Altair's `alt.layer()` function


- We can generate an identical layered chart using `alt.layer()`, like this:

In [9]:
# Solution
alt.layer(temp_min_max, temp_mid)

❓ **Exercise 2.**
Note that the order of inputs to a layer matters, as subsequent layers will be drawn on top of earlier layers. Try swapping the order of the charts in the cells above. What happens? *Hint.* Look closely at the color of the `line` marks.

*Write your notes here. Double-click to edit.*

*Notes.* If `temp_mid` is drawn before `temp_min_max`, the line chart appears underneath the area chart, giving the lines a more muted appearance.

<hr style="border-top: 2px solid gray; margin-top: 1px; margin-bottom: 1px"></hr>

## Dual-axis charts

- To illustrate how dual-axis charts work in Altair, let's look at precipitation alongside temperature in Seattle


- First, let's create a line plot that shows average monthly precipitation in Seattle:

In [10]:
precip = alt.Chart(df).transform_filter(
    'datum.location == "Seattle"'
).mark_line(
    interpolate='monotone',
    stroke='grey'
).encode(
    alt.X('month(date):T').title('Month'),
    alt.Y('mean(precipitation):Q').title('Average Precipitation')
)

precip

❓ **Exercise 3.**
What do the `interpolate` and `stroke` mark properties do? What are valid values for these properties? Find the relevant part of the Altair documentation.

*Write your notes here. Double-click to edit.*

*Notes.* [Here is the Altair documentation on mark properties.](https://altair-viz.github.io/user_guide/marks/index.html#mark-properties)

- Next, let's add another layer to our chart, consisting of average monthly temperature data:

In [11]:
# Solution
temp = alt.Chart(df).transform_filter(
    'datum.location == "Seattle"'
).mark_area(opacity=0.3).encode(
    alt.X('month(date):T').title('Month'),
    alt.Y('mean(temp_max):Q').title('Average Temperature (degrees C)'),
    alt.Y2('mean(temp_min):Q')
)

precip = alt.Chart(df).transform_filter(
    'datum.location == "Seattle"'
).mark_line(
    interpolate='monotone',
    stroke='grey'
).encode(
    alt.X('month(date):T'),
    alt.Y('mean(precipitation):Q').title('Average Precipitation')
)

# Layer the charts
temp + precip

- This isn't so great, because the precipitation values use a much smaller range of the y-axis than the temperature values!

- By default, in a layered chart, common encoding channels use the same *scale*
    - Recall that the scale of an encoding channel defines how data values are mapped to visual values; it includes properties such as the domain, the type (e.g. `'sqrt'`, `'log'`), and the color scheme

- For our chart, this means that the `X` and `Y` encoding channels share the same domain by default


- This default behavior assumes that the layered values have the same units


- However, this doesn't work with our example, since we are combining temperature values (degrees Celsius) with precipitation values (millimeters)!
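- To make the effect of a shared domain concrete, here is a minimal pure-Python sketch of a linear scale. The function `linear_scale()` and all of the numbers (a 300-pixel axis, a shared domain of 0 to 30, a largest monthly precipitation value of 6) are assumptions chosen for illustration, not values taken from Altair or from the dataset:

```python
def linear_scale(value, domain, pixel_range):
    """Map a data value to a pixel position, as a linear scale would."""
    d0, d1 = domain
    r0, r1 = pixel_range
    return r0 + (value - d0) * (r1 - r0) / (d1 - d0)


# Hypothetical numbers: a 300-pixel-tall axis, a shared domain wide enough
# for the temperatures, and a largest monthly precipitation value of 6
axis_pixels = (0, 300)
shared_domain = (0, 30)
precip_domain = (0, 6)

# With the shared domain, the largest precipitation value covers a fifth of the axis
print(linear_scale(6, shared_domain, axis_pixels))

# With its own domain, the same value spans the full axis
print(linear_scale(6, precip_domain, axis_pixels))
```

- Under these assumed numbers, a shared domain squeezes the precipitation line into the bottom fifth of the chart, while its own domain would let it use the whole height.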


- To use different `Y` encoding channel scales for each layer, we can tell Altair to **resolve** these scales *independently*, like this:

In [12]:
# Solution
temp = alt.Chart(df).transform_filter(
    'datum.location == "Seattle"'
).mark_area(opacity=0.3).encode(
    alt.X('month(date):T').title('Month'),
    alt.Y('mean(temp_max):Q').title('Average Temperature (degrees C)'),
    alt.Y2('mean(temp_min):Q')
)

precip = alt.Chart(df).transform_filter(
    'datum.location == "Seattle"'
).mark_line(
    interpolate='monotone',
    stroke='grey'
).encode(
    alt.X('month(date):T'),
    alt.Y('mean(precipitation):Q').title('Average Precipitation')
)

# Layer the charts and resolve the y-axis scale
(temp + precip).resolve_scale(y='independent')

- That looks better!

- Now we know how to produce dual-axis charts, but...

- Use dual-axis charts sparingly, since they are prone to misinterpretation
    - When possible, you might consider transformations that map different variables to the same units
    - For example, you might show unitless values, such as quantiles or relative percentage change, instead of absolute values
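- As one possible sketch of such a transformation, the Pandas code below expresses each variable as a percentage difference from its own mean, so that both can share a single axis. The monthly averages are made up for illustration, and this is only one of several reasonable unitless measures:

```python
import pandas as pd

# Hypothetical monthly averages, made up for illustration
monthly = pd.DataFrame({
    'month': [1, 2, 3, 4],
    'temp': [5.0, 10.0, 15.0, 10.0],    # degrees C
    'precip': [6.0, 4.0, 2.0, 4.0],     # mm
})

# Express each value as a percentage difference from that variable's own mean
measures = monthly[['temp', 'precip']]
relative = (measures - measures.mean()) / measures.mean() * 100
relative.insert(0, 'month', monthly['month'])
print(relative)
```

- Both columns of `relative` are now in percent, so they could be layered with a shared `Y` scale.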

- Here's one way to give the viewer a visual cue as to which mark corresponds to which axis: we can
    1. specify a color for the area and line marks, and
    2. modify the axis titles to use the same colors

- Something like this:

In [13]:
temp = alt.Chart(df).transform_filter(
    'datum.location == "Seattle"'
).mark_area(
    opacity=0.3,
    color='blue'
).encode(
    alt.X('month(date):T').title('Month'),
    alt.Y('mean(temp_max):Q') 
        .title('Average Temperature (degrees C)')
        .axis(titleColor='blue'),
    alt.Y2('mean(temp_min):Q')
)

precip = alt.Chart(df).transform_filter(
    'datum.location == "Seattle"'
).mark_line(
    interpolate='monotone',
    stroke='red'
).encode(
    alt.X('month(date):T'),
    alt.Y('mean(precipitation):Q') 
        .title('Average Precipitation')
        .axis(titleColor='red')
)

# Layer the charts and resolve the y-axis scale
(temp + precip).resolve_scale(y='independent')

<hr style="border-top: 2px solid gray; margin-top: 1px; margin-bottom: 1px"></hr>

## Problems

### Problem 1

Create a chart with two lines that show the median minimum temperature and the median maximum temperature for each month in New York. Your final product should look like this:

![](img/newyork.svg)

In [14]:
# Solution
minimum = alt.Chart(df).transform_filter(
    'datum.location == "New York"'
).mark_line().encode(
    alt.X('month(date):T').title('Month'),
    alt.Y('median(temp_min):Q').title('Median minimum temperature'),
)

maximum = alt.Chart(df).transform_filter(
    'datum.location == "New York"'
).mark_line().encode(
    alt.X('month(date):T'),
    alt.Y('median(temp_max):Q').title('Median maximum temperature'),
)

minimum + maximum

### Problem 2

Create a dual-axis line chart that shows the average monthly wind speed and the average monthly precipitation with separate lines. Use monotone interpolation and different colors for both lines. You should end up with something that looks like this:

![](img/dual.svg)

In your opinion, is this a good visualization? Why or why not?

In [15]:
# Solution
wind = alt.Chart(df).mark_line(
    interpolate='monotone'
).encode(
    alt.X('month(date):T').title('Month'),
    alt.Y('mean(wind):Q').title('Average wind speed (m/s)')
)

precip = alt.Chart(df).mark_line(
    color='red',
    interpolate='monotone'
).encode(
    alt.X('month(date):T'),
    alt.Y('mean(precipitation):Q').title('Average precipitation (mm)')
)

(wind + precip).resolve_scale(y='independent')

*Write your answer here. Double-click to edit.*

*Solution.* The visualization above is not good as-is. Most importantly, it's not clear which line corresponds to which y-axis! One fix, as shown earlier in this lesson, is to color each axis title to match its line.

### Problem 3

The file `data/gapminder.csv` contains the Gapminder data we used in Lessons 2 and 3.

In this problem, you will create a scatter plot with text labels.

1. Create a scatter plot of life expectancy vs fertility. Use only data for the year 2000 in the South Asia region.

2. Create a second scatter plot with the same filtered data and variables, but this time, use `.mark_text()`. Map the country names to the `Text` encoding channel.

3. Layer the two charts you created. 

4. Modify the chart you created in Step 2 so that the text labels are appropriately offset from the points. To do this, adjust the appropriate mark properties using the correct keyword arguments in `.mark_text()`; for example, `align=...`, `baseline=...`, `dx=...`, and/or `dy=...`.

5. Modify the chart you created in Step 1 to have proper axis labels.

Your final product should look like this:

![](img/gap.svg)

In [16]:
# Solution
gap_df = pd.read_csv('data/gapminder.csv')
gap_df.head()

points = alt.Chart(gap_df).transform_filter(
    '(datum.year == 2000) && (datum.cluster == "South Asia")'
).mark_point().encode(
    alt.X('life_expect:Q').title('Average life expectancy (years)'),
    alt.Y('fertility:Q').title('Number of children per woman')
)

text = alt.Chart(gap_df).transform_filter(
    '(datum.year == 2000) && (datum.cluster == "South Asia")'
).mark_text(
    align='right',
    dx=-5
).encode(
    alt.X('life_expect:Q'),
    alt.Y('fertility:Q'),
    alt.Text('country:N')
)

points + text

<hr style="border-top: 2px solid gray; margin-top: 1px; margin-bottom: 1px"></hr>

## Notes and sources

- These lesson notes are based on the [Visualization Curriculum](https://uwdata.github.io/visualization-curriculum/) by the University of Washington


- [Altair documentation on layered charts](https://altair-viz.github.io/user_guide/compound_charts.html#layered-charts)