# Choropleth Mapping
### by [Kate Vavra-Musser](https://vavramusser.github.io) for the [R Spatial Notebook Series](https://vavramusser.github.io/r-spatial)

## Introduction

Choropleth maps are one of the most recognizable and widely used types of thematic maps. From election results to disease prevalence, from income distribution to educational attainment, choropleths are the go-to method for showing how values vary across geographic areas.

But creating an effective choropleth requires more than just coloring areas by data values. You need to understand **data classification**, **color scheme selection**, **normalization strategies**, and **design principles** that can make the difference between a map that reveals patterns and one that misleads.

This notebook provides a comprehensive guide to choropleth mapping in R, covering both the technical implementation and the cartographic theory that makes them effective.

In this notebook, we'll create choropleth maps of population data for Minnesota counties. [Choropleth maps](https://en.wikipedia.org/wiki/Choropleth_map) are thematic maps in which areas are shaded or patterned according to the value of a variable of interest, in this case population. These maps help visualize spatial distributions and identify patterns, such as variations in population density across geographic areas.

This notebook will use NHGIS population data by county based on the 2020 US Decennial Census.

### What is a Choropleth Map?

A [**choropleth map**](https://en.wikipedia.org/wiki/Choropleth_map) (from Greek *choros* "area" + *plethos* "multitude") uses color or shading to represent the magnitude of a variable across geographic areas. Key characteristics:

- **Area-based:** Values are mapped to polygons (counties, states, census tracts, etc.)
- **Univariate:** Shows one variable at a time
- **Pattern-focused:** Designed to reveal spatial patterns and clusters
- **Comparative:** Allows visual comparison across areas

**Examples:**
- Population density by county
- Unemployment rates by state
- Median income by census tract
- COVID-19 case rates by region
- Election results by precinct

### When to Use Choropleth Maps

**Use choropleths when:**
- ✅ Data is associated with enumeration units (areas)
- ✅ You want to show spatial patterns or clustering
- ✅ Comparing values across regions is important
- ✅ The variable is a rate, ratio, or density

**Avoid choropleths when:**
- ❌ Data represents counts of discrete phenomena (use proportional symbols instead)
- ❌ Areas vary dramatically in size (can mislead perception)
- ❌ Precise values matter more than patterns (use a table)
- ❌ Data is point-based, not area-based

### Notebook Goals

This notebook uses the *ipums_nhgis_example.zip* file created in [IPUMS NHGIS Data Extraction Using ipumsr](https://platform.i-guide.io/notebooks/be08e56e-1c08-458e-a230-263c64d386bc). If you have not completed that notebook, download the file (see **Data Used in this Notebook** below) and place it in your working directory before running this notebook.

By the end of this notebook, you will be able to:

- Understand when and why to use choropleth maps
- Calculate and normalize data appropriately (counts vs. rates)
- Apply different data classification methods
- Choose appropriate color schemes for your data
- Create choropleths with both ggplot2 and tmap
- Understand the strengths and limitations of classification methods
- Make informed cartographic design decisions
- Avoid common choropleth pitfalls
- Create effective, honest, readable choropleth maps

### ✨ Prerequisites ✨

**Required:**
* [Introduction to sf: Reading, Writing, and Inspecting Vector Data](https://platform.i-guide.io/notebooks/9968babe-22e4-4c3d-98e2-d8b45e9672cd)
* [Working with CRS: Reprojection and Transformation](https://platform.i-guide.io/notebooks/76912ca7-73e4-437e-8ecf-0cb456bd7282)
* [Mapping Fundamentals](https://platform.i-guide.io/notebooks/dfe8fd72-f896-4dd2-9d61-6d9982394f1f)

**Recommended:**
* [Preparing Vector Data for Analysis](https://platform.i-guide.io/notebooks/44926d85-7f08-4774-a103-a22ff3876cad)
* [Thematic and Reference Mapping (6.02)](https://platform.i-guide.io/notebooks/2b9f579c-32b0-4078-af39-994bb31d50ec)
* [IPUMS NHGIS Data Extraction Using ipumsr](https://platform.i-guide.io/notebooks/be08e56e-1c08-458e-a230-263c64d386bc)

### 💽 Data Used in this Notebook 💽

**Minnesota County-Level Population Data** (*ipums_nhgis_example.zip*)
- Contains 2020 U.S. Decennial Census population data at the county level for the state of Minnesota
- Source: IPUMS NHGIS
- Created in [IPUMS NHGIS Data Extraction Using ipumsr](https://platform.i-guide.io/notebooks/be08e56e-1c08-458e-a230-263c64d386bc)
- **Download:** [I-GUIDE Platform](https://platform.i-guide.io/datasets/0cb99a7c-97c0-4ffc-a2d7-ff539c8eadae) or [Kate's GitHub](https://github.com/vavramusser/r-spatial/blob/main/ipums_nhgis_example.zip)

### Notebook Overview

1. **Setup**
2. **Data Exploration and Preprocessing**
3. **Understanding Normalization**
4. **Choropleth Mapping with ggplot**
5. **Data Classification Methods**
6. **Color Schemes for Choropleth Maps**
7. **Design Principles for Choropleth Maps**

---

## 1. Setup
This section guides you through installing and loading the required packages and loading the data.

#### Required Packages

**[ggplot2](https://cran.r-project.org/web/packages/ggplot2/index.html)** · Create Elegant Data Visualizations

* [*ggplot*](https://rdrr.io/cran/ggplot2/man/ggplot.html) · initialize a ggplot object
* [*geom_sf*](https://rdrr.io/cran/ggplot2/man/ggsf.html) · map spatial sf objects
* [*scale_fill_*](https://rdrr.io/cran/ggplot2/man/scale_fill_gradient.html) · control fill colors
* [*coord_sf*](https://rdrr.io/cran/ggplot2/man/ggsf.html) · coordinate system

**[tmap](https://cran.r-project.org/web/packages/tmap/index.html)** · Thematic Maps

* [*tm_shape*](https://rdrr.io/cran/tmap/man/tm_shape.html) · specify data to map
* [*tm_polygons*](https://rdrr.io/cran/tmap/man/tm_polygons.html) · draw and style polygons
* [*tm_layout*](https://rdrr.io/cran/tmap/man/tm_layout.html) · customize map layout

**[classInt](https://cran.r-project.org/web/packages/classInt/index.html)** · Classification Intervals

* [*classIntervals*](https://rdrr.io/cran/classInt/man/classIntervals.html) · compute class intervals

**[viridis](https://cran.r-project.org/web/packages/viridis/index.html)** · Colorblind-Friendly Color Palettes

**[RColorBrewer](https://cran.r-project.org/web/packages/RColorBrewer/index.html)** · [ColorBrewer](https://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3) Color Palettes

**[sf](https://cran.r-project.org/web/packages/sf/index.html)** · Simple Features for R

**[dplyr](https://cran.r-project.org/web/packages/dplyr/index.html)** · Data Manipulation

**[patchwork](https://cran.r-project.org/web/packages/patchwork/index.html)** · The Composer of Plots (combine multiple ggplots)

### 1.1 Install and Load Required Packages

If you have not already installed the required packages, uncomment and run the code below.

In [None]:
# install.packages(c("sf", "ggplot2", "tmap", "dplyr", "viridis", "RColorBrewer", "classInt", "patchwork"))

Load the packages into your workspace.

In [None]:
library(sf)
library(ggplot2)
library(tmap)
library(dplyr)
library(viridis)
library(RColorBrewer)
library(classInt)

### 1.2 Load Data

Let's load our Minnesota county-level population data.

If you do not already have the Minnesota county demographic file (*ipums_nhgis_example.zip*) in your working directory, you can **download** it from the [I-GUIDE Platform](https://platform.i-guide.io/datasets/0cb99a7c-97c0-4ffc-a2d7-ff539c8eadae) or [Kate's GitHub](https://github.com/vavramusser/r-spatial/blob/main/ipums_nhgis_example.zip).

The *ipums_nhgis_example.zip* file contains data from the 2020 Decennial Census.

In [None]:
# unzip and load county demographic data
unzip("ipums_nhgis_example.zip")
counties <- st_read("ipums_nhgis_example.shp")

## 2. Data Exploration and Preprocessing

Before mapping, we need to ensure our data is clean and create any derived variables we'll need.

### 2.1 Inspect Data

In [None]:
# view the first few rows of data
head(counties)

In [None]:
# get a list of available attributes (columns)
names(counties)

In [None]:
# check the coordinate reference system
st_crs(counties)

### 2.2 Ensure Data is Valid sf Object

Before mapping, we need to ensure the data is in the correct format and that each geometry is valid. Invalid geometries can prevent accurate area calculation and mapping, so we'll clean and validate them before moving forward.

First, we make sure *counties* is an sf object. st_read() already returns one, so this step is a safeguard.

Next, we fix any invalid geometries using st_make_valid() to handle geometric issues that might interfere with area calculations or plotting.

In [None]:
# Ensure it's an sf object
counties <- st_as_sf(counties)

# Fix any invalid geometries
counties <- st_make_valid(counties)

### 2.3 Transform to Appropriate CRS

In this step, we transform the Coordinate Reference System (CRS) to a projection suitable for calculating area. For U.S. mapping and area calculations, we'll use the Albers Equal Area Conic projection (EPSG:5070) rather than geographic longitude/latitude coordinates such as EPSG:4326.

In [None]:
# Transform to Albers Equal Area for accurate area calculations
counties <- st_transform(counties, crs = 5070)

# Verify transformation
st_crs(counties)

**Why Albers Equal Area?**
- Preserves area (crucial for density calculations)
- Optimized for continental U.S.
- Minimizes distortion across the country
- Standard for U.S. Census Bureau mapping
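To see why longitude/latitude degrees (such as EPSG:4326) are a poor basis for area calculations, the sketch below computes the area of a 1° × 1° cell at two latitudes. It assumes a spherical Earth with a mean radius of 6,371.0088 km, which is a simplification: projected CRSs such as EPSG:5070 are based on an ellipsoid. The function name `cell_area_km2` is purely illustrative.

```r
# Area (km^2) of a cell between two parallels, spanning lon_width_deg of longitude,
# on a sphere: R^2 * (longitude width in radians) * |sin(lat2) - sin(lat1)|
cell_area_km2 <- function(lat1, lat2, lon_width_deg, radius_km = 6371.0088) {
  to_rad <- pi / 180
  radius_km^2 * (lon_width_deg * to_rad) *
    abs(sin(lat2 * to_rad) - sin(lat1 * to_rad))
}

equator_cell   <- cell_area_km2(0, 1, 1)    # 1 x 1 degree cell at the equator
minnesota_cell <- cell_area_km2(46, 47, 1)  # 1 x 1 degree cell at about 46-47 N

round(equator_cell)                      # roughly 12,360 km^2
round(minnesota_cell / equator_cell, 2)  # 0.69: same size in degrees, about 31% less area
```

A "one degree" cell covers noticeably less ground in Minnesota than at the equator. Areas therefore need to be computed in an equal-area projection rather than in raw degrees.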

## 3. Understanding Normalization: Counts vs. Rates vs. Density

Before making a choropleth, you need to understand what type of variable you're mapping.

Population data is often more informative when normalized by area. In Section 3.2, we'll calculate population density for each county as the number of people per square kilometer. This allows us to compare population concentrations across counties of different sizes.

The calculation has three steps. First, we calculate the area of each county in square kilometers using st_area(), converting the result to a numeric value to simplify further calculations. Then we calculate population density (pop_density_2020) as the total 2020 population (CL8AA2020) divided by the area in square kilometers. Finally, we make sure pop_density_2020 is a plain numeric variable (without units), which avoids potential issues when visualizing the data with ggplot2.

### 3.1 The Problem with Raw Counts

Let's first map raw population (a count):

In [None]:
# Map raw 2020 population
ggplot(counties) +
  geom_sf(aes(fill = CL8AA2020), color = NA) +
  scale_fill_viridis_c(option = "plasma",
                       name = "Population",
                       labels = scales::comma) +
  labs(title = "Raw Population Count by County (2020)",
       subtitle = "Why this is misleading") +
  theme_void()

**The Problem:**
- Large counties appear "high population" simply because they're large
- Small urban counties may have higher populations but don't stand out
- The map shows **size × density** rather than just density
- Visually misleading: the eye is drawn to large areas

**Rule:** For choropleth maps, **almost always normalize counts by area or base population**.

### 3.2 Calculating Population Density

The raw population map isn't very informative, so we'll calculate population density (pop_density_2020) instead. Population density normalizes by area: it divides each county's population by its area in square kilometers.

Because density values often span a wide range between low-density and high-density areas, in Section 4.2 we will also customize the color scale and legend. There we:

1. Apply scale_fill_viridis_c() with a log transformation and specific breaks to improve visual contrast across the density range.
2. Adjust the legend position and add descriptive labels for clarity.

Fitting the color scale to the data's distribution helps readers interpret the map more effectively. The code below calculates area and population density:

In [None]:
# Calculate area in square kilometers and population density
counties <- counties %>%
  mutate(
    area_km2 = as.numeric(st_area(geometry) / 1e6),        # Convert m² to km²
    pop_density_2020 = CL8AA2020 / area_km2,               # People per km²
    pop_density_2020 = as.numeric(pop_density_2020)        # Ensure plain numeric (no units)
  )

# Check results
summary(counties$pop_density_2020)       # summary statistics of county density
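The arithmetic behind this step is simple, and a small made-up example shows why it matters. The sketch below uses two hypothetical counties with invented names and numbers. Their areas are given in square meters, the unit st_area() reports in EPSG:5070. The example then compares the ranking by raw count with the ranking by density.

```r
# Two hypothetical counties (invented values for illustration)
counties_toy <- data.frame(
  name    = c("Large rural", "Small urban"),
  pop     = c(60000, 45000),
  area_m2 = c(5000e6, 150e6),   # square meters, as st_area() returns in EPSG:5070
  stringsAsFactors = FALSE
)

# 1 km^2 = 1,000,000 m^2, so divide by 1e6; then compute people per km^2
counties_toy$area_km2    <- counties_toy$area_m2 / 1e6
counties_toy$pop_density <- counties_toy$pop / counties_toy$area_km2

counties_toy[, c("name", "pop", "area_km2", "pop_density")]

# Ranking by raw count puts the rural county first;
# ranking by density puts the urban county first
counties_toy$name[order(-counties_toy$pop)]
counties_toy$name[order(-counties_toy$pop_density)]
```

The small urban county has fewer people but 25 times the density (300 vs. 12 people per km²). A raw-count map hides exactly this difference.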

### 3.3 Types of Normalized Variables

**Density (per unit area):**
- Population density (people per km²)
- Housing density (units per km²)
- Road density (km of roads per km²)

**Rate (per unit population):**
- Unemployment rate (unemployed per 100 people in labor force)
- Mortality rate (deaths per 100,000 people)
- Vaccination rate (vaccinated per 100 people)

**Percentage:**
- Percent with bachelor's degree
- Percent minority population
- Percent below poverty line

**Ratio:**
- Dependency ratio ((young + old) / working-age population)
- Sex ratio (males per 100 females)
- Income-to-poverty ratio

**Index:**
- Gini coefficient (inequality)
- Diversity index
- Deprivation index

All of these are appropriate for choropleth mapping because they're **already normalized**.
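As a small worked example of rates and percentages, the sketch below normalizes hypothetical county counts by their base populations. The column names and values are invented for illustration.

```r
# Hypothetical county counts and the base population each count refers to
dat <- data.frame(
  county     = c("A", "B", "C"),
  deaths     = c(30, 300, 12),
  population = c(20000, 400000, 6000),
  graduates  = c(4000, 150000, 900),   # adults with a bachelor's degree
  adults     = c(13000, 260000, 4200)  # adults in the base population
)

# Rate: count / base population, scaled to "per 100,000 people"
dat$deaths_per_100k <- dat$deaths / dat$population * 1e5

# Percentage: part / whole * 100
dat$pct_bachelors <- round(dat$graduates / dat$adults * 100, 1)

dat[, c("county", "deaths", "deaths_per_100k", "pct_bachelors")]
# County B has the most deaths (300) but the lowest rate (75 per 100,000)
```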

## 4. Choropleth Mapping with ggplot

With our data prepared and population density calculated, we can now map population density across Minnesota counties. ggplot2's geom_sf() draws the county polygons and fills each one according to its population density (pop_density_2020), using a gradient color scale from low to high density. In this step we:

1. Use geom_sf(aes(fill = pop_density_2020)) to color each county by its 2020 population density.
2. Use scale_fill_viridis_c() to apply a colorblind-friendly gradient scale.
3. Set color = NA to remove county borders and use theme_void() to drop axes and gridlines.

This produces a clear choropleth map that makes it easy to identify areas of high and low population density across the state.

Now let's create our first proper choropleth using population density.

### 4.1 Simple Continuous Color Scale

In [None]:
# Basic choropleth with continuous scale
ggplot(counties) +
  geom_sf(aes(fill = pop_density_2020), 
          color = NA) +                    # No borders for cleaner look
  scale_fill_viridis_c(
    option = "plasma",
    name = "People per km²",
    na.value = "gray90",                   # Color for missing data
    labels = scales::comma
  ) +
  labs(title = "Population Density by Minnesota County (2020)") +
  theme_void() +
  theme(legend.position = "right")

### 4.2 Log Transformation for Skewed Data

Population density is typically highly skewed (few very dense areas, many sparse areas). A log transformation can help:

In [None]:
# Check data distribution
hist(counties$pop_density_2020, 
     breaks = 50,
     main = "Distribution of Population Density",
     xlab = "People per km²")

In [None]:
# Choropleth with log transformation
ggplot(counties) +
  geom_sf(aes(fill = pop_density_2020), color = NA) +
  scale_fill_viridis_c(
    option = "plasma",
    trans = "log10",                      # Log transformation
    breaks = c(1, 10, 100, 1000, 10000),  # Nice round breaks
    labels = c("1", "10", "100", "1k", "10k"),
    name = "People per km²\n(log scale)",
    na.value = "gray90"
  ) +
  labs(title = "Population Density by Minnesota County (2020)",
       subtitle = "Log-transformed scale reveals patterns in sparse areas") +
  theme_void() +
  theme(legend.position = "right")

**When to use log transformation:**
- ✅ Data spans multiple orders of magnitude
- ✅ Want to show variation in both low and high values
- ✅ Data is right-skewed (long tail of high values)
- ❌ Don't use for data with zeros or negative values
- ❌ Don't use if audience unfamiliar with logarithms
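A quick way to see why the log scale helps is to compare the raw and log10 values of a skewed variable. The densities below are hypothetical. In the ggplot2 code above, `trans = "log10"` applies the same log10 transformation to the color scale.

```r
# Hypothetical right-skewed densities (people per km^2)
density <- c(2, 4, 5, 8, 12, 20, 35, 60, 150, 900)

mean(density)      # 119.6: pulled up by the long right tail
median(density)    # 16

# log10 compresses the range from 2-900 to about 0.3-2.95,
# so each order of magnitude gets the same share of the color scale
log_density <- log10(density)
round(range(log_density), 2)

# Why zeros are a problem: log10(0) is -Inf (and negative values give NaN)
log10(0)
```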

## 5. Data Classification Methods

Instead of continuous colors, you can classify data into discrete categories. **This is one of the most important decisions in choropleth mapping.**

### 5.1 Understanding Classification

Classification groups continuous data into discrete bins. Each bin gets one color. This:
- Simplifies interpretation
- Makes patterns more obvious
- But also **loses information** and can be **misleading** if done poorly
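The core mechanic of a classified choropleth can be sketched in base R. cut() assigns each value to a bin, and each bin indexes one palette color. The densities and breaks below are hypothetical. `hcl.colors()` from base R's grDevices stands in for the ColorBrewer palettes used elsewhere in this notebook. The comment on palette order reflects its default dark-to-light ordering for sequential palettes; check the output for your R version.

```r
# Hypothetical densities (people per km^2) and class breaks
density <- c(3, 8, 15, 40, 95, 230)
breaks  <- c(0, 10, 50, 100, 250)

# cut() assigns each value to a bin; include.lowest = TRUE closes the first bin on the left
classes <- cut(density, breaks = breaks, include.lowest = TRUE)

# One color per class. hcl.colors() sequential palettes run dark to light by default,
# so rev = TRUE gives the lowest class the lightest color
pal  <- hcl.colors(nlevels(classes), palette = "YlOrRd", rev = TRUE)
fill <- pal[as.integer(classes)]

data.frame(density, classes, fill)
```

Every value in a bin gets the same color. This is where classification simplifies the map and also where it discards detail (3 and 8 become indistinguishable).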

Let's explore the main classification methods:

### 5.2 Equal Interval Classification

In [None]:
# Create equal interval classes
breaks_equal <- classIntervals(counties$pop_density_2020, 
                               n = 5, 
                               style = "equal")
print(breaks_equal)

# Add classification to data
counties$density_equal <- cut(counties$pop_density_2020,
                              breaks = breaks_equal$brks,
                              include.lowest = TRUE)

# Map with equal intervals
ggplot(counties) +
  geom_sf(aes(fill = density_equal), color = NA) +
  scale_fill_brewer(palette = "YlOrRd",
                    name = "People per km²",
                    na.value = "gray90") +
  labs(title = "Population Density: Equal Interval Classification",
       subtitle = "Divides range into equal-sized bins") +
  theme_void()

**Equal Interval:**
- Divides data range into equal-sized bins
- Example: 0-100, 100-200, 200-300, etc.

**Pros:**
- Easy to understand
- Good for evenly distributed data
- Maintains relative position in range

**Cons:**
- Poor for skewed data
- May create empty bins
- Most data may fall in one bin
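A base-R sketch of the equal interval idea on hypothetical skewed densities illustrates the drawbacks listed above. This notebook uses classInt's `style = "equal"`; the sketch only reproduces the general idea and is not that function's exact code.

```r
# Hypothetical right-skewed densities (people per km^2)
density <- c(2, 4, 5, 8, 12, 20, 35, 60, 150, 900)
n_classes <- 5

# Equal interval: n_classes bins of identical width spanning [min, max]
breaks_equal <- seq(min(density), max(density), length.out = n_classes + 1)
breaks_equal                      # 2.0 181.6 361.2 540.8 720.4 900.0

classes_equal <- cut(density, breaks = breaks_equal, include.lowest = TRUE)
table(classes_equal)              # 9, 0, 0, 0, 1: most values in one bin, three empty bins
```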

### 5.3 Quantile Classification

In [None]:
# Create quantile classes
breaks_quantile <- classIntervals(counties$pop_density_2020,
                                  n = 5,
                                  style = "quantile")
print(breaks_quantile)

counties$density_quantile <- cut(counties$pop_density_2020,
                                 breaks = breaks_quantile$brks,
                                 include.lowest = TRUE)

# Map with quantiles
ggplot(counties) +
  geom_sf(aes(fill = density_quantile), color = NA) +
  scale_fill_brewer(palette = "YlOrRd",
                    name = "People per km²",
                    na.value = "gray90") +
  labs(title = "Population Density: Quantile Classification",
       subtitle = "Equal number of counties in each bin") +
  theme_void()

**Quantile (Equal Count):**
- Equal number of features in each bin
- Example: 20% of counties in each of 5 bins

**Pros:**
- Even distribution of colors across map
- Works well for any distribution
- Highlights relative position

**Cons:**
- Bin ranges can be very different
- May group dissimilar values
- Can exaggerate small differences
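Classifying the same hypothetical densities by quantiles shows the trade-off: equal counts per bin but very unequal bin widths. This base-R sketch uses quantile() with its default method. classInt's `style = "quantile"` may place breaks slightly differently.

```r
# Hypothetical right-skewed densities (people per km^2)
density <- c(2, 4, 5, 8, 12, 20, 35, 60, 150, 900)
n_classes <- 5

# Quantile: breaks at the 0%, 20%, ..., 100% quantiles
breaks_quantile <- quantile(density, probs = seq(0, 1, length.out = n_classes + 1))
breaks_quantile     # 2, 4.8, 10.4, 26, 78, 900

classes_quantile <- cut(density, breaks = breaks_quantile, include.lowest = TRUE)
table(classes_quantile)   # two values in every bin...
diff(breaks_quantile)     # ...but bin widths range from 2.8 to 822
```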

### 5.4 Natural Breaks (Jenks)

In [None]:
# Create Jenks natural breaks
breaks_jenks <- classIntervals(counties$pop_density_2020,
                               n = 5,
                               style = "jenks")
print(breaks_jenks)

counties$density_jenks <- cut(counties$pop_density_2020,
                              breaks = breaks_jenks$brks,
                              include.lowest = TRUE)

# Map with Jenks
ggplot(counties) +
  geom_sf(aes(fill = density_jenks), color = NA) +
  scale_fill_brewer(palette = "YlOrRd",
                    name = "People per km²",
                    na.value = "gray90") +
  labs(title = "Population Density: Natural Breaks (Jenks)",
       subtitle = "Minimizes within-class variance") +
  theme_void()

**Natural Breaks (Jenks):**
- Finds "natural" groupings in data
- Minimizes variance within classes
- Maximizes variance between classes

**Pros:**
- Often reveals actual patterns
- Good for most distributions
- Optimal with respect to within-class variance

**Cons:**
- Computationally intensive
- Different for different datasets (can't compare)
- Breaks may seem arbitrary
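To make "minimizes within-class variance" concrete, the sketch below finds optimal one-dimensional breaks by dynamic programming, in the spirit of Fisher's exact method. classInt also offers this family of methods through `style = "fisher"`. The function `natural_breaks()` is a hypothetical helper written for illustration, not classInt's implementation. It runs in O(k·n²) time, which is fine for a few hundred counties.

```r
# Sketch: exact 1-D natural breaks by dynamic programming (Fisher-style).
# Minimizes the total within-class sum of squared deviations (SSD).
# Assumes x has at least k distinct values.
natural_breaks <- function(x, k) {
  x <- sort(x)
  n <- length(x)
  stopifnot(k >= 1, k <= n)

  # Prefix sums give the SSD of any run x[i..j] in constant time
  s1 <- c(0, cumsum(x))
  s2 <- c(0, cumsum(x^2))
  ssd <- function(i, j) {
    total <- s1[j + 1] - s1[i]
    (s2[j + 1] - s2[i]) - total^2 / (j - i + 1)
  }

  # cost[cls, j]: smallest total SSD when x[1..j] is split into cls classes
  # last_start[cls, j]: index where the last of those classes begins
  cost       <- matrix(Inf, nrow = k, ncol = n)
  last_start <- matrix(1L, nrow = k, ncol = n)
  for (j in seq_len(n)) cost[1, j] <- ssd(1, j)

  if (k > 1) {
    for (cls in 2:k) {
      for (j in cls:n) {
        for (i in cls:j) {
          # first cls - 1 classes cover x[1..(i - 1)], last class covers x[i..j]
          candidate <- cost[cls - 1, i - 1] + ssd(i, j)
          if (candidate < cost[cls, j]) {
            cost[cls, j]       <- candidate
            last_start[cls, j] <- i
          }
        }
      }
    }
  }

  # Walk backwards to recover where each class starts
  starts <- integer(k)
  j <- n
  for (cls in k:1) {
    starts[cls] <- last_start[cls, j]
    j <- starts[cls] - 1L
  }

  # Breaks: the minimum, then the largest value in each class
  c(x[1], x[c(starts[-1] - 1L, n)])
}

x <- c(52, 1, 11, 3, 50, 10, 2, 51, 12)
brks <- natural_breaks(x, 3)
brks                                                  # 1 3 12 52
table(cut(x, breaks = brks, include.lowest = TRUE))   # 3 values per class
```

The breaks depend entirely on this particular dataset, so two maps classified this way generally use different class boundaries. That is why Jenks-style maps are hard to compare.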

### 5.5 Standard Deviation

In [None]:
# Create standard deviation classes
breaks_sd <- classIntervals(counties$pop_density_2020,
                            n = 5,
                            style = "sd")
print(breaks_sd)

counties$density_sd <- cut(counties$pop_density_2020,
                           breaks = breaks_sd$brks,
                           include.lowest = TRUE)

# Map with standard deviation
ggplot(counties) +
  geom_sf(aes(fill = density_sd), color = NA) +
  scale_fill_brewer(palette = "RdYlBu",
                    name = "People per km²",
                    na.value = "gray90",
                    direction = -1) +
  labs(title = "Population Density: Standard Deviation Classification",
       subtitle = "Breaks based on distance from mean") +
  theme_void()

**Standard Deviation:**
- Breaks based on distance from mean
- Example: mean ± 0.5 SD, ± 1 SD, etc.

**Pros:**
- Shows deviation from average
- Statistically meaningful
- Good with diverging color schemes

**Cons:**
- Assumes normal distribution
- Poor for skewed data
- May create empty bins
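The sketch below builds standard deviation classes by hand on hypothetical skewed data, with breaks at the mean ± 0.5 SD and ± 1.5 SD. That layout is one common choice, and the exact multipliers are an assumption. classInt's `style = "sd"` chooses its own break positions, so its classes can differ from this sketch.

```r
# Hypothetical right-skewed densities with one very dense county
density <- c(1, 2, 2, 3, 3, 4, 5, 6, 8, 200)

m <- mean(density)
s <- sd(density)

# Breaks at mean +/- 0.5 SD and +/- 1.5 SD, with open-ended outer classes
breaks_sd  <- m + s * c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf)
classes_sd <- cut(density, breaks = breaks_sd)

table(classes_sd)  # 0, 0, 9, 0, 1: mean - 0.5 SD is already below zero
```

The single outlier inflates both the mean and the SD. As a result, the two "below average" classes are empty and nine of ten counties share one class, which illustrates the cons listed above.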

### 5.6 Comparison of Methods

In [None]:
# Create comparison plot
library(patchwork)

p1 <- ggplot(counties) +
  geom_sf(aes(fill = density_equal), color = NA) +
  scale_fill_brewer(palette = "YlOrRd") +
  labs(title = "Equal Interval") +
  theme_void() +
  theme(legend.position = "none")

p2 <- ggplot(counties) +
  geom_sf(aes(fill = density_quantile), color = NA) +
  scale_fill_brewer(palette = "YlOrRd") +
  labs(title = "Quantile") +
  theme_void() +
  theme(legend.position = "none")

p3 <- ggplot(counties) +
  geom_sf(aes(fill = density_jenks), color = NA) +
  scale_fill_brewer(palette = "YlOrRd") +
  labs(title = "Natural Breaks") +
  theme_void() +
  theme(legend.position = "none")

(p1 | p2 | p3) +
  plot_annotation(title = "Comparison of Classification Methods")

### 5.7 Which Method Should You Use?

**General Guidance:**

**Use Natural Breaks (Jenks) when:**
- ✅ You want to reveal natural groupings
- ✅ Data has clusters or modes
- ✅ Creating single standalone map

**Use Quantile when:**
- ✅ You want to show relative position
- ✅ Data is highly skewed
- ✅ Visual balance across map is important

**Use Equal Interval when:**
- ✅ Data is roughly normally distributed
- ✅ Comparing multiple maps
- ✅ Breaks need to be intuitive (0-10, 10-20, etc.)

**Use Standard Deviation when:**
- ✅ Showing deviation from mean is the point
- ✅ Using diverging color scheme
- ✅ Data is approximately normal

**Use Manual Breaks when:**
- ✅ Domain-specific thresholds exist (e.g., poverty line)
- ✅ Maintaining consistency across time series
- ✅ Specific policy or scientific breakpoints matter
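Manual breaks are also how you keep a time series comparable. In the sketch below, one fixed set of hypothetical thresholds is applied to hypothetical densities from two years, so each color means the same density range on both maps. In classInt, `classIntervals()` accepts `style = "fixed"` together with a `fixedBreaks` vector for the same purpose.

```r
# Hypothetical densities for the same five counties in two years
density_2010 <- c(3, 9, 40, 120, 600)
density_2020 <- c(4, 12, 45, 160, 700)

# One fixed set of hypothetical thresholds applied to both years
manual_breaks <- c(0, 10, 50, 250, Inf)
class_2010 <- cut(density_2010, breaks = manual_breaks, include.lowest = TRUE)
class_2020 <- cut(density_2020, breaks = manual_breaks, include.lowest = TRUE)

# Same class definitions in both years, so a color means the same range on both maps
identical(levels(class_2010), levels(class_2020))   # TRUE

# Cross-tabulation: only the second county moves up a class
table(class_2010, class_2020)
```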

## 6. Color Schemes for Choropleth Maps

Color choice is just as important as classification method.

### 6.1 Sequential Color Schemes

For data that goes from low to high (most common):

In [None]:
# YlOrRd (yellow-orange-red)
p1 <- ggplot(counties) +
  geom_sf(aes(fill = density_jenks), color = NA) +
  scale_fill_brewer(palette = "YlOrRd") +
  labs(title = "YlOrRd") +
  theme_void() +
  theme(legend.position = "none")

# Blues
p2 <- ggplot(counties) +
  geom_sf(aes(fill = density_jenks), color = NA) +
  scale_fill_brewer(palette = "Blues") +
  labs(title = "Blues") +
  theme_void() +
  theme(legend.position = "none")

# Viridis
p3 <- ggplot(counties) +
  geom_sf(aes(fill = pop_density_2020), color = NA) +
  scale_fill_viridis_c(option = "viridis",
                       trans = "log10") +
  labs(title = "Viridis (continuous)") +
  theme_void() +
  theme(legend.position = "none")

(p1 | p2 | p3) +
  plot_annotation(title = "Sequential Color Schemes")

### 6.2 Diverging Color Schemes

For data with a meaningful midpoint (zero, average, baseline):

In [None]:
# Create deviation from median
median_density <- median(counties$pop_density_2020, na.rm = TRUE)
counties$density_deviation <- counties$pop_density_2020 - median_density

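# Classify with standard deviation (good for diverging)
# Note: "sd" breaks are centered on the mean, so the neutral middle class
# is not guaranteed to fall at the median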
breaks_div <- classIntervals(counties$density_deviation,
                             n = 7,
                             style = "sd")
counties$density_div_class <- cut(counties$density_deviation,
                                  breaks = breaks_div$brks,
                                  include.lowest = TRUE)

# Map with diverging colors
ggplot(counties) +
  geom_sf(aes(fill = density_div_class), color = NA) +
  scale_fill_brewer(palette = "RdYlBu",
                    name = "Deviation from Median",
                    direction = -1,        # Reverse so red = high
                    na.value = "gray90") +
  labs(title = "Population Density Deviation from Minnesota Median",
       subtitle = "Blue = lower density, Red = higher density") +
  theme_void()

### 6.3 Color Scheme Best Practices

**Sequential Schemes (low → high):**
- Use for: population density, income, temperature, any ordered data
- Light → Dark or Saturated color
- Examples: YlOrRd, Blues, Viridis, Greens

**Diverging Schemes (low ← middle → high):**
- Use for: change over time, deviation from average, political lean
- Two hues meeting at neutral middle
- Examples: RdYlBu, BrBG, PiYG, RdGy

**General Rules:**
- ❌ Never use rainbow (perceptually non-uniform)
- ✅ Always test for colorblind accessibility
- ✅ Use colorbrewer or viridis palettes
- ✅ Limit to 5-7 classes for legibility
- ✅ Ensure sufficient contrast
- ❌ Don't use red-green together (colorblind issue)
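For a diverging scheme, the class breaks should sit symmetrically around the meaningful midpoint, so the neutral color really marks that midpoint. The sketch below does this for hypothetical percent-change values, using an odd number of classes. Base R's `hcl.colors()` with its "RdYlBu" palette (available in recent R versions) stands in for ColorBrewer.

```r
# Hypothetical percent change in population, 2010 to 2020
change    <- c(-12, -6, -1, 0, 2, 4, 8, 15)
midpoint  <- 0   # the meaningful middle value (no change)
n_classes <- 5   # odd, so one neutral class straddles the midpoint

# Breaks symmetric about the midpoint, wide enough to cover the larger side
half_width  <- max(abs(change - midpoint))
breaks_div  <- midpoint + seq(-half_width, half_width, length.out = n_classes + 1)
classes_div <- cut(change, breaks = breaks_div, include.lowest = TRUE)

# A diverging palette from base R: one color per class, neutral color in the middle
pal <- hcl.colors(n_classes, palette = "RdYlBu")

data.frame(change, class = classes_div, fill = pal[as.integer(classes_div)])
```

The trade-off is that symmetric breaks can leave classes empty on the side with the smaller spread, but the neutral class stays anchored to the midpoint.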

## 7. Design Principles for Effective Choropleths

### 7.1 The Choropleth Checklist

Before publishing your choropleth, verify:

**Data:**
- [ ] Variable is normalized (rate/density, not raw count)
- [ ] Classification method is appropriate for distribution
- [ ] Number of classes is reasonable (usually 4-7)

**Colors:**
- [ ] Color scheme matches data type (sequential/diverging)
- [ ] Colors are colorblind-friendly
- [ ] Sufficient contrast between adjacent classes
- [ ] Legend is clear and readable

**Design:**
- [ ] Borders are appropriate (often no borders for clean look)
- [ ] Title clearly states what's being shown
- [ ] Data source is cited
- [ ] Map projection is appropriate

### 7.2 Common Design Mistakes

**❌ Don't:**
1. Map raw counts (almost always misleading)
2. Use too many classes (>7 hard to distinguish)
3. Use rainbow color schemes
4. Use red-green combinations
5. Forget to show missing data
6. Use equal interval for skewed data
7. Omit the data source
8. Make legend text too small

**✅ Do:**
1. Normalize by area or population
2. Test multiple classification methods
3. Use colorbrewer or viridis palettes
4. Show your data distribution (histogram)
5. Include clear title and legend
6. Consider your audience's familiarity
7. Test for colorblind accessibility
8. Be honest about uncertainty

### 7.3 When Choropleths Lie

Choropleths can be misleading:

**Visual Weight Problem:**
- Large areas draw attention even if values are low
- Small urban areas may be overlooked even if values are high
- Solution: Consider cartograms or proportional symbols

**Modifiable Areal Unit Problem (MAUP):**
- Patterns change depending on how areas are defined
- County vs. census tract vs. zip code show different patterns
- Solution: Acknowledge this limitation

**Classification Manipulation:**
- Different methods can show very different patterns
- Can be used to support predetermined conclusions
- Solution: Show multiple methods, explain your choice

**False Precision:**
- Sharp boundaries imply sudden changes that rarely exist
- Reality is usually more gradual
- Solution: Acknowledge smoothing, consider interpolated surfaces
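The MAUP can be shown with a toy example. The same four hypothetical units of equal area produce very different density patterns depending on how they are grouped into zones. The values and the helper `zone_density()` are invented for illustration.

```r
# Four hypothetical small units of equal area, in a row
pop  <- c(100, 900, 100, 900)
area <- c(1, 1, 1, 1)   # km^2

# Density of each zone = total population / total area of its units
zone_density <- function(pop, area, zones) {
  tapply(pop, zones, sum) / tapply(area, zones, sum)
}

# Two different ways of grouping the same units into zones
zoning_a <- c(1, 1, 2, 2)
zoning_b <- c(1, 2, 2, 3)

zone_density(pop, area, zoning_a)   # 500 500: looks perfectly uniform
zone_density(pop, area, zoning_b)   # 100 500 900: looks strongly varied
```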

## Summary and Next Steps

Congratulations! You now understand the art and science of choropleth mapping. You've learned:

- ✅ When to use choropleths (and when not to)
- ✅ The critical importance of normalization
- ✅ Five major classification methods and when to use each
- ✅ Sequential vs. diverging color schemes
- ✅ Both ggplot2 and tmap workflows
- ✅ Side-by-side map comparisons with patchwork
- ✅ Design principles for effective communication
- ✅ Common pitfalls and how to avoid them

### Key Takeaways

1. **Always normalize** - Map rates/densities, not counts
2. **Classification matters** - Test multiple methods
3. **Color is critical** - Use colorblind-friendly palettes
4. **Be honest** - Acknowledge limitations
5. **Test your map** - Try different classifications, get feedback

### Continue Learning:

**Next in Chapter 6:**
* **6.04-6.06 Adding Basemaps** - Add geographic context
* **6.07-6.08 Interactive Mapping** - Make choropleths explorable
* **6.09-6.10 Professional Cartography** - Final polish

**Related Topics:**
* Cartograms (area-adjusted choropleths)
* Dasymetric mapping (refined population distribution)
* Spatial statistics (identifying clusters)

**Resources:**
* [ColorBrewer](https://colorbrewer2.org) - Interactive color scheme picker
* [R Spatial Notebooks](https://vavramusser.github.io/r-spatial)
* [Mailing List](https://mailchi.mp/ab01e8fc8397/r-spatial-email-signup)

---

## ⭐ Thank You ⭐

Thank you for working through this comprehensive choropleth guide!

* [**Support the Project**](https://buymeacoffee.com/vavramusser)
* Share with colleagues
* Provide feedback

---

## Quick Reference Code

In [None]:
# === ESSENTIAL CHOROPLETH PATTERNS ===
library(sf)
library(dplyr)
library(ggplot2)

# Load and prepare data
# ("counties.shp" and the population column are placeholders for your own data)
counties <- st_read("counties.shp") %>%
  st_make_valid() %>%
  st_transform(crs = 5070)

# Calculate density (people per km²)
counties <- counties %>%
  mutate(area_km2 = as.numeric(st_area(geometry) / 1e6),
         pop_density = population / area_km2)

# Basic continuous choropleth
ggplot(counties) +
  geom_sf(aes(fill = pop_density), color = NA) +
  scale_fill_viridis_c(option = "plasma",
                       trans = "log10") +
  theme_void()

# Classified choropleth with natural breaks
library(classInt)
breaks <- classIntervals(counties$pop_density, n = 5, style = "jenks")
counties$density_class <- cut(counties$pop_density,
                              breaks = breaks$brks,
                              include.lowest = TRUE)

ggplot(counties) +
  geom_sf(aes(fill = density_class), color = NA) +
  scale_fill_brewer(palette = "YlOrRd") +
  theme_void()

# tmap version (tmap v3-style arguments; tmap v4 still runs them but prints migration messages)
library(tmap)
tm_shape(counties) +
  tm_polygons("pop_density",
              style = "jenks",
              n = 5,
              palette = "YlOrRd",
              title = "Density")

# Interactive
tmap_mode("view")
tm_shape(counties) +
  tm_polygons("pop_density",
              style = "jenks",
              palette = "YlOrRd",
              popup.vars = c("Name" = "name",
                            "Density" = "pop_density"))