# Cleveland's Research on Decoding

In their paper, *Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods*, William S. Cleveland and Robert McGill set out to determine, *by experiment*, which encodings allow information to be *decoded* most accurately. The results were expounded upon in Cleveland's book, *The Elements of Graphing Data*. Their results are generally considered the starting point for all evidence-based discussions about visualizations and static charts.

The idea that a chart is an encoding of data that viewers must then decode originates with Cleveland, so his experiments attempted to answer the following question: **"Of all the possible ways of encoding data, which encodings lead to the most accurate decodings?"**

The encodings he studied were as follows (at the time they were called "perceptual tasks"):

![Perceptual Tasks](../resources/visualization/perceptual_tasks.png){width="50%"}

We will talk more later about the visual elements, the "alphabet" of visualization. For now, it's enough to recognize these elements as the building blocks of many charts we've seen:

* **Position on a common scale** includes both scatter plots and bar charts. For example, the bar chart I drew at the start of the notebook has all the bars positioned on a common scale. 
* **Position on non-aligned scales** is best exemplified by the example Excel chart. When we compare the red and purple bars, we can see that they are not on aligned scales.
* **Length** is a general feature of bar charts. 
* **Direction** is related to line charts and charts using line direction to encode relationships. 
* **Angle** is, of course, a common element of pie charts. 
* **Area**, especially circular area, covers what are often called "triple scatter plots" or "bubble charts" made famous by Hans Rosling's TED talk. 
* **Volume** addresses 3D charts made infamous by Excel.
* **Curvature** relates to Playfair's charts, especially those containing two lines. 
* **Shading** specifically refers to *choropleths*: maps shaded in different colors to indicate values (yes, there is a word for those!).

Unfortunately, Cleveland presented all his results as charts, so the exact numbers are not available either for creating new charts or for citing directly.
This makes describing his results difficult short of reproducing and re-explaining the charts from his paper and book.
I will reproduce some of the results here, but the reader is invited to look into his paper, which is listed in the Additional Resources.

## Experiment 1

For the first experiment, Cleveland looked into the difference between *position* and *length*. He did this by having subjects decode values from bar charts and stacked bar charts. In a bar chart, a value can be decoded simply by looking at the top of the bar and comparing it against the y-axis: the viewer only needs *position*.

In a stacked bar chart, every "slice" other than the bottom one requires the viewer to use *length* as well as position: they must judge the position of both the bottom and the top of the slice. It gets somewhat worse when slices on two different bars are being compared.
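To make the position/length distinction concrete, here is a minimal sketch. The helper names `stacked_slice_extents` and `decode_slice` are hypothetical, and the slice values are made up. It computes where each slice of a single stacked bar starts and ends: only the bottom slice starts at zero, so only its value can be read straight off the axis.

```python
from itertools import accumulate


def stacked_slice_extents(values):
    """Return (bottom, top) axis positions for each slice of one stacked bar.

    Slices are stacked in the order given, starting from a baseline of 0.
    """
    tops = list(accumulate(values))
    bottoms = [0] + tops[:-1]
    return list(zip(bottoms, tops))


def decode_slice(bottom, top):
    """Recover a slice's value from the two positions a viewer must judge."""
    return top - bottom


# A hypothetical stacked bar with three slices (made-up values).
slices = [4, 3, 5]
for value, (bottom, top) in zip(slices, stacked_slice_extents(slices)):
    # Only the bottom slice has bottom == 0, so its value is a single position
    # reading; every other slice needs top - bottom, i.e. a length judgment.
    print(value, bottom, top, decode_slice(bottom, top))
```

The arithmetic is trivial for the computer; the point is that the viewer has to perform the subtraction by eye for every slice above the first.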

Note that there is a lot of non-standard language surrounding "bar charts". *Most* of the time, "bar chart" refers to what is often called a *horizontal bar chart*, in which the bars run from left to right. A *vertical bar chart* is one in which the bars run from bottom to top; this is also called a "columnar chart" or "column chart".

For Cleveland (and this text), a bar chart is a vertical bar chart (no need to specify "vertical"), and the left-to-right variant is a horizontal bar chart.

![Experiment 1](../resources/visualization/cleveland_experiment_1.png){width="75%"}

In this experiment, Types 1 and 3 are ordinary (unstacked) bar charts in different arrangements, and Types 2, 4, and 5 are stacked bar charts. The dot on Type 2 marks a slice that can be decoded using position; on Types 4 and 5, the dots mark slices that must be decoded using length.

## Experiment 2

For his second experiment, Cleveland compared decoding accuracy for position (bar charts) versus angles (pie charts) as in the figure below.

![Experiment 2](../resources/visualization/pie_v_bar.png){width="50%"}
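A pie chart encodes each value as an angle proportional to its share of the total (360° × value ÷ total), while a bar chart encodes it as a position on a shared axis. The sketch below computes both encodings for the same made-up data; the helper names `pie_angles` and `bar_heights` are hypothetical. Comparing 24 and 26 in the pie means comparing angles of roughly 86° and 94°, whereas in the bar chart the two bar tops sit on the same y-axis.

```python
def pie_angles(values):
    """Angle in degrees of each pie slice: 360 * value / total."""
    total = sum(values)
    return [360 * v / total for v in values]


def bar_heights(values):
    """In a bar chart the value is encoded directly as a position on the y-axis."""
    return list(values)


# Hypothetical data with two similar values, 24 and 26.
values = [24, 26, 50]
print(pie_angles(values))   # [86.4, 93.6, 180.0]
print(bar_heights(values))  # [24, 26, 50]
```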

## Results

The results are reported below. Because the errors were log-normally distributed (something we'll talk about in a later chapter), Cleveland reported the log base 2 of the absolute error. He chose base 2 because the errors do not generally vary by factors of 10, so a log base 10 scale wouldn't have helped.

The dot represents the *midmean*: the mean of the observations that fall within the interquartile range (between the first and third quartiles). The line represents the 95% confidence interval. For now, we can interpret the confidence interval in a Bayesian way and say that, based on the data, there is a 95% probability that the true midmean lies in the interval. (Strictly speaking, this is not the frequentist interpretation of a confidence interval.)
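The summary statistics in the results chart can be sketched in a few lines. The error measure follows Cleveland and McGill's paper, which takes log2(|judged − true| + 1/8); the 1/8 keeps a perfect judgment from producing log2(0). The `midmean` helper trims a quarter of the sorted observations from each end, which is one simple approximation of the interquartile mean. The interval uses a percentile bootstrap, a swapped-in technique that is not necessarily how Cleveland computed his intervals. All function names and the judgment data are hypothetical.

```python
import math
import random
import statistics


def log_abs_error(judged, true):
    """log2(|judged - true| + 1/8); the 1/8 avoids log2(0) for perfect answers."""
    return math.log2(abs(judged - true) + 0.125)


def midmean(data):
    """Mean of the middle half of the sorted data (interquartile mean).

    Sketch: trims floor(n/4) observations from each end.
    """
    xs = sorted(data)
    k = len(xs) // 4
    return statistics.mean(xs[k:len(xs) - k])


def bootstrap_ci(data, stat, reps=2000, level=0.95, seed=0):
    """Percentile-bootstrap interval for `stat` (an assumed method, not Cleveland's)."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement, compute the statistic, and sort the results.
    boots = sorted(stat(rng.choices(data, k=n)) for _ in range(reps))
    lo = boots[round((1 - level) / 2 * reps)]
    hi = boots[round((1 + level) / 2 * reps) - 1]
    return lo, hi


# Hypothetical judgments (in percent) of a true value of 50%.
judged = [48, 55, 50, 43, 52, 60, 47, 51]
errors = [log_abs_error(j, 50) for j in judged]
print(round(midmean(errors), 3), bootstrap_ci(errors, midmean))
```

Using the midmean rather than the plain mean keeps a few wild guesses from dominating the summary, which matters when errors are skewed.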

![Cleveland Results](../resources/visualization/cleveland_results.png){width="50%"}

The results show that both stacked bar charts and pie charts produce considerably larger decoding errors than plain bar charts.

Although it is easy to see why decoding angles might be harder than decoding position, why should length be so much harder to judge than position?

In the figure below, we show two bars with unaligned scales.

![Unaligned Bars](../resources/visualization/cleveland_bars.png){width="30%"}

![Unaligned Stacked Bars](../resources/visualization/cleveland_framed_bars.png){width="30%"}

The dark bars in the second chart are the same size in both A and B.

## Curve Difference Charts

Cleveland's book expands considerably on his paper. If you're interested in the specifics, it's worth reading. However, if you simply want to make good charts *based* on the results, you can do better by reading Stephen Few. Nevertheless, I will report on one of the most interesting results, which concerns charts whose main story is the *difference* between two curves. We will call such charts "curve-difference charts". They are essentially line charts with two (or more) lines showing magnitudes (here, exports and imports), but the *message* is the difference between them (the trade balance).

Below is a reproduction of Playfair's curve-difference chart showing the balance of trade between England and the East Indies.

![playfair curves](../resources/visualization/playfair_curves.png)

The chief problem with such charts is that our eyes want to read them differently than they should. Perceptually, we judge the perpendicular distance between the curves (red), as if we were gauging the width of a river. A correct reading of the chart, however, requires us to gauge the vertical distance between the lines (blue), and our eyes just won't go for it.
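A little geometry shows why the two readings diverge. Assuming the two curves are locally straight and parallel (a simplification), with slope m and vertical gap d, the perpendicular distance between them is d / √(1 + m²). The sketch below (the helper name `perpendicular_gap` is hypothetical) holds the vertical gap, the quantity the chart actually encodes, fixed while the slope increases.

```python
import math


def perpendicular_gap(vertical_gap, slope):
    """Shortest distance between the parallel lines y = slope*x + b
    and y = slope*x + b + vertical_gap."""
    return abs(vertical_gap) / math.sqrt(1 + slope ** 2)


# Same vertical gap at increasingly steep slopes.
for slope in [0, 1, 3]:
    print(slope, round(perpendicular_gap(2.0, slope), 3))
# 0 2.0
# 1 1.414
# 3 0.632
```

Where both curves climb steeply, the perpendicular gap the eye picks up shrinks to about a third of the vertical gap, so the difference looks smaller than it is.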

If you plot just the difference, you get the following chart:

![playfair difference](../resources/visualization/playfair_difference.png)

Did you notice the bump in 1763? Probably not. This is a general problem with trying to judge the difference between two curves by eye.

The diagram below, reproduced from Cleveland's book, shows nine such curve-difference charts on the left. They all look pretty much the same. We've probably looked at or produced many charts just like this without a second thought.

![Curves and Differences between Curves](../resources/visualization/curve_differences.png)

The charts on the right show just the difference between the curves. If our impression from the charts on the left was that they are largely the same, or at least similar, the charts on the right shatter it. They couldn't be more different. Some of the differences actually move in opposite directions. Does this remind you of the Müller-Lyer Illusion?
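The same effect can be reproduced with a toy example. This is not Cleveland's data: the two pairs of curves below are made up, and `differences` is a hypothetical helper. Both pairs consist of steep, nearly parallel lines that would look alike when plotted, yet the gap widens in one pair and narrows in the other.

```python
def differences(upper, lower):
    """Vertical difference between two curves sampled at the same x values."""
    return [u - l for u, l in zip(upper, lower)]


xs = range(6)
# Two hypothetical pairs of steep, roughly parallel curves (made-up data).
pair_a = ([3 * x + 2.0 + 0.2 * x for x in xs], [3 * x for x in xs])
pair_b = ([3 * x + 3.0 - 0.2 * x for x in xs], [3 * x for x in xs])

diff_a = differences(*pair_a)  # grows from about 2.0 to about 3.0
diff_b = differences(*pair_b)  # shrinks from about 3.0 to about 2.0
print([round(d, 2) for d in diff_a])
print([round(d, 2) for d in diff_b])
```

Computing and plotting the difference directly, as in the charts on the right, removes the perpendicular-versus-vertical ambiguity altogether.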

Cleveland's research largely started the *scientific* investigation of charts and visualization. The field has since expanded to draw on psychology and neuroscience.

Next we turn to the basic elements of charts. This discussion follows Stephen Few's *Show Me the Numbers*, first covering Colin Ware's *Preattentive Processing* and then the *Gestalt Principles of Visualization*.