# Design of Visualizations

## The Four Levels of Measurement

#### Qualitative or categorical types (non-numeric types)
1. **Nominal data**: pure labels without inherent order (no label is intrinsically greater or less than any other). Examples: movie genre and country. 
2. **Ordinal data**: labels with an intrinsic order or ranking (values can be compared, but the magnitude of the difference between them is not well-defined). Examples: letter grade and ranking.

#### Quantitative or numeric types
3. **Interval data**: numeric values where absolute differences are meaningful (addition and subtraction operations can be made). Example: temperature.
4. **Ratio data**: numeric values where relative differences are meaningful (multiplication and division operations can be made). Examples: word count, mass (kg).
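
The distinction between interval and ratio data can be sketched with the temperature example. The snippet below is a minimal illustration, assuming two made-up readings: differences survive a change of unit, but ratios are only meaningful on a scale with a true zero (Kelvin here, which is not mentioned in the text above).

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius (interval scale) to Kelvin (ratio scale with a true zero)."""
    return celsius + 273.15


# Hypothetical readings, chosen only for illustration.
morning_c, afternoon_c = 10.0, 20.0
morning_k = celsius_to_kelvin(morning_c)
afternoon_k = celsius_to_kelvin(afternoon_c)

# Differences are meaningful on an interval scale: the gap is the same in both units.
diff_c = afternoon_c - morning_c
diff_k = afternoon_k - morning_k

# Ratios need a true zero: "20 C is twice 10 C" does not hold physically.
ratio_c = afternoon_c / morning_c
ratio_k = afternoon_k / morning_k

print(f"difference: {diff_c:.1f} C vs {diff_k:.1f} K")
print(f"ratio: {ratio_c:.2f} (Celsius) vs {ratio_k:.2f} (Kelvin)")
```

The Celsius ratio comes out as 2, while the Kelvin ratio is roughly 1.04, which is why "twice as hot" is not a valid statement for interval data.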

All quantitative-type variables also come in one of two varieties: discrete and continuous.
- **Discrete** quantitative variables can only take on a specific set of values at some maximum level of precision.
- **Continuous** quantitative variables can (hypothetically) take on values to any level of precision.

#### Likert Scale Example

Technically, responses to Likert-scale questions (for example, Strongly Disagree through Strongly Agree) should be considered `ordinal` in nature. There is a clear order in the response values, but the differences between consecutive levels may not be consistent in size. The criteria to move between Strongly Disagree and Disagree might be different from the criteria between Agree and Strongly Agree. However, Likert data is often treated as `interval` to simplify analyses.
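
As a rough sketch of what the two treatments mean in practice, the snippet below summarizes the same responses both ways. The response list and the 1 to 5 coding are hypothetical, chosen only for illustration; the median respects the ordinal view, while the mean assumes equally spaced levels (the interval view).

```python
from statistics import mean, median_low

# Hypothetical five-point scale, listed from lowest to highest.
LEVELS = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]
CODE = {label: rank for rank, label in enumerate(LEVELS, start=1)}

# Made-up survey responses.
responses = [
    "Agree", "Strongly Agree", "Disagree", "Agree",
    "Neutral", "Strongly Disagree", "Agree",
]
codes = [CODE[r] for r in responses]

# Ordinal treatment: only order is used, so the summary is an actual level.
ordinal_summary = LEVELS[median_low(codes) - 1]

# Interval treatment: assumes the gap between adjacent levels is constant.
interval_summary = mean(codes)

print("median response:", ordinal_summary)
print("mean score:", round(interval_summary, 2))
```

`median_low` is used so that the ordinal summary is always one of the real response levels, even when the number of responses is even.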

## What Experts Say About Visual Encodings
Experts and researchers have determined the types of visual patterns that allow humans to best understand certain information. In general, humans best understand data encoded with **positional changes** (differences in x- and y-position, as in scatterplots) and **length changes** (differences in box heights, as in bar charts and histograms).

By contrast, humans *struggle* to interpret data encoded with **color hue changes** (unfortunately common as an additional variable encoding in scatter plots) and **area changes** (as in pie charts, which is why they are often not the best plot choice).

## Chart Junk
It's important to think about not only what to put in a chart but also what will be left out. [Chart junk](https://en.wikipedia.org/wiki/Chartjunk) refers to all visual elements in charts and graphs that are not necessary to comprehend the information represented on the graph or that distract the viewer from this information.

Examples of chart junk include:

- Heavy grid lines
- Unnecessary text
- Pictures surrounding the visual
- Shading or 3D components
- Ornamented chart axes

## Data Ink Ratio
The data-ink ratio, credited to Edward Tufte, is directly related to the idea of chart junk. 

$$ \text{data-ink ratio} = \frac{\text{amount of ink used to describe the data}}{\text{total amount of ink used in the graphic}} $$

The more of the ink in your visual that is related to conveying the message in the data, the better.

Limiting chart junk increases the data-ink ratio.
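
A small worked example may help. The sketch below uses Tufte's definition of the ratio (data ink divided by the total ink in the graphic); the function name and the ink amounts are hypothetical, since ink is rarely measured literally.

```python
def data_ink_ratio(data_ink, non_data_ink):
    """Share of a graphic's total ink that is spent on the data itself."""
    total_ink = data_ink + non_data_ink
    if total_ink <= 0:
        raise ValueError("a graphic must use some ink")
    return data_ink / total_ink


# Hypothetical chart: the same data ink, before and after removing chart junk.
cluttered = data_ink_ratio(data_ink=30.0, non_data_ink=70.0)
cleaned = data_ink_ratio(data_ink=30.0, non_data_ink=20.0)

print(f"cluttered: {cluttered:.2f}, cleaned: {cleaned:.2f}")
```

Removing heavy grid lines and decoration lowers only the non-data ink, so the ratio rises from 0.30 to 0.60 in this made-up case.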

## Design Integrity

The lie factor measures the degree to which a visualization distorts or misrepresents the data values being plotted. It is calculated in the following way:

$$ lie\ factor = \frac{\Delta visual \ / \ visual_{start}}{\Delta data \ / \ data_{start}} $$

Any lie factor different from 1 suggests that the visual distorts the data. A factor greater than 1 exaggerates an effect, making it look larger than it actually is; a factor less than 1 understates the effect, hiding its magnitude.
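
The formula can be sketched as a short function. The helper names and the numbers are hypothetical: the example imagines a bar chart of the values 10 and 15 whose y-axis is truncated to start at 8, so the bars are drawn with heights 2 and 7.

```python
def relative_change(start, end):
    """Change from start to end, expressed as a fraction of start."""
    if start == 0:
        raise ValueError("start must be non-zero")
    return (end - start) / start


def lie_factor(visual_start, visual_end, data_start, data_end):
    """Relative change shown in the graphic divided by relative change in the data."""
    data_change = relative_change(data_start, data_end)
    if data_change == 0:
        raise ValueError("the data must change for the lie factor to be defined")
    return relative_change(visual_start, visual_end) / data_change


# Bars drawn from a zero baseline: heights match the data.
honest = lie_factor(10.0, 15.0, 10.0, 15.0)

# Same data with the axis starting at 8: heights are 2 and 7.
truncated = lie_factor(2.0, 7.0, 10.0, 15.0)

print(f"zero baseline: {honest}, truncated axis: {truncated}")
```

In this made-up case the zero-baseline chart has a lie factor of 1.0, while the truncated axis turns a 50% increase in the data into a 250% increase in bar height, a lie factor of 5.0.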

## Using Color
Three tips for using color effectively:
1. Before adding color to a visualization, start with black and white.
2. When using color, use less intense colors rather than all the colors of the rainbow, which is the default in many software applications.
3. Color for communication. Use color to highlight your message and separate groups of interest. Don't add color just to have color in your visualization.

### Designing for Color Blindness
To be sensitive to those with color blindness, use color palettes that **do not move from red to green** without another element, such as shape, position, or lightness, to distinguish the change. Both of these colors appear as a yellowish tint to individuals with the most common types of color blindness. Instead, **use colors on a blue to orange palette**.
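
One way to check whether two colors also differ in lightness is to compare their relative luminance. The sketch below assumes the standard sRGB relative luminance formula, which is not part of the text above, and the example colors are arbitrary 8-bit RGB values picked for illustration.

```python
def _linearize(channel_8bit):
    """Convert one 8-bit sRGB channel to linear light in the range 0 to 1."""
    c = channel_8bit / 255.0
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color: 0 is black, 1 is white."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


# Arbitrary example colors (assumptions, not taken from any specific palette).
colors = {
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (31, 119, 180),
    "orange": (255, 127, 14),
}

for name, rgb in colors.items():
    print(f"{name:>6}: luminance {relative_luminance(rgb):.3f}")
```

A larger luminance gap between two colors means lightness gives viewers an additional cue beyond hue; this is only a rough check, not a simulation of color blindness.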

Further Reading
- Tableau Blog: [5 tips on designing colorblind-friendly visualizations](https://www.tableau.com/about/blog/2016/4/examining-data-viz-rules-dont-use-red-green-together-53463)

## Additional Encodings
In general, **color and shape** are best for **categorical** variables, while the **size of marker** can assist in adding additional **quantitative data**.
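
When marker size carries a quantitative variable, a common convention is to make the marker's area, not its radius, proportional to the value; otherwise large values look disproportionately big. The helper below is a minimal sketch of that convention, assuming circular markers; the function name and reference values are hypothetical.

```python
import math


def marker_radius(value, ref_value, ref_radius):
    """Radius of a circular marker whose area scales linearly with value."""
    if value < 0 or ref_value <= 0:
        raise ValueError("value must be non-negative and ref_value positive")
    return ref_radius * math.sqrt(value / ref_value)


# A value 4 times the reference gets twice the radius, hence 4 times the area.
small = marker_radius(1.0, ref_value=1.0, ref_radius=1.0)
large = marker_radius(4.0, ref_value=1.0, ref_radius=1.0)
area_ratio = (math.pi * large**2) / (math.pi * small**2)

print(f"radius ratio: {large / small}, area ratio: {area_ratio}")
```

Scaling the radius directly would instead give the larger marker 16 times the area for a value only 4 times as large.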

Only use these additional encodings when absolutely necessary. Often, overuse of these additional encodings suggests that you are providing too much information in a single plot. **Instead, it might be better to break the information into multiple individual messages**, so the audience can understand every aspect of your message.