# Choropleth Mapping
### by [Kate Vavra-Musser](https://vavramusser.github.io) for the [R Spatial Notebook Series](https://vavramusser.github.io/r-spatial)

## Introduction

In this notebook, we’ll explore how to create a choropleth map using population data by county. [Choropleth maps](https://en.wikipedia.org/wiki/Choropleth_map) are thematic maps where areas are shaded or patterned in proportion to a variable of interest, in this case, population data. These maps help visualize spatial distributions and identify patterns, such as population density variations across geographic areas.

This notebook will use NHGIS population data by county based on the 2020 US Decennial Census.

### Notebook Goals
At the end of Chapter 2.4: IPUMS NHGIS Data Extraction Using ipumsr, you saved your data extraction in two file formats, *ipums_nhgis_example.rds* and *ipums_nhgis_example.csv*.  You will need these files to run this notebook.  If you are working through this chapter without having completed that chapter, you will need to copy the *ipums_nhgis_example.rds* file into your working directory before running this notebook.

### ✨ Prerequisites ✨
* Complete [Introduction to sf: Reading, Writing, and Inspecting Vector Data](https://platform.i-guide.io/notebooks/9968babe-22e4-4c3d-98e2-d8b45e9672cd)
* Complete [Working with CRS: Reprojection and Transformation](https://platform.i-guide.io/notebooks/76912ca7-73e4-437e-8ecf-0cb456bd7282)
* Complete [Preparing Vector Data for Analysis](https://platform.i-guide.io/notebooks/44926d85-7f08-4774-a103-a22ff3876cad)
* Complete [IPUMS NHGIS Data Extraction Using ipumsr](https://platform.i-guide.io/notebooks/be08e56e-1c08-458e-a230-263c64d386bc)

### 💽 Data Used in this Notebook 💽
* IPUMS NHGIS Example Data Extraction (*ipums_nhgis_example.zip*)
  * If you worked through [IPUMS NHGIS Data Extraction Using ipumsr](https://platform.i-guide.io/datasets/b033e365-cb1f-41d6-ad99-e6a13c41127c) you should have created and saved a copy of *ipums_nhgis_example.zip* in the final section of the notebook.
  * You can download a copy of the *ipums_nhgis_example.zip* file from [the I-GUIDE platform](https://platform.i-guide.io/datasets/0cb99a7c-97c0-4ffc-a2d7-ff539c8eadae) or [Kate's GitHub](https://github.com/vavramusser/r-spatial/blob/main/ipums_nhgis_example.zip).  You will need to unzip *ipums_nhgis_example.zip* and extract the *ipums_nhgis_example.shp* file, together with its companion files (such as *.dbf* and *.shx*), to your workspace.

## 1. Setup
This section will guide you through the process of installing and loading the required packages and reading in the data.

##### Required Packages

[**dplyr**](https://cran.r-project.org/web/packages/dplyr/index.html) · A Grammar of Data Manipulation. This notebook uses the following functions from *dplyr*.

* [*mutate*](https://rdrr.io/cran/dplyr/man/mutate.html) · create, modify, and delete columns
* [*rename*](https://rdrr.io/cran/dplyr/man/rename.html) · rename columns
* This notebook also uses [*%>%*](https://magrittr.tidyverse.org/reference/pipe.html), referred to as the *pipe* operator, which is used to pass the output from one function directly into the next function for the purpose of creating streamlined workflows.  The *pipe* operator is a commonly used component of the [*tidyverse*](https://www.tidyverse.org).
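
As a quick illustration of the pipe, the sketch below builds the same result twice, once with nested function calls and once with *%>%*. The data frame `counties` and its values are made up for this example and are not part of the NHGIS data.

```r
library(dplyr)

# hypothetical data frame with made-up values
counties <- data.frame(name = c("A", "B"), pop = c(1000, 250), area = c(10, 5))

# nested calls: read from the inside out
nested <- rename(mutate(counties, density = pop / area), population = pop)

# piped calls: read from top to bottom
piped <- counties %>%
  mutate(density = pop / area) %>%
  rename(population = pop)

identical(nested, piped)   # both approaches give the same data frame
```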

[**ggplot2**](https://cran.r-project.org/web/packages/ggplot2/index.html) · Create Elegant Data Visualisations Using the Grammar of Graphics.  A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.  This notebook uses the following functions from *ggplot2*.

* *CoordSf* · Visualize sf objects
  * *coord_sf* · geometric coordinates
  * *geom_sf* · geometric objects (points, lines, or polygons)
* [*ggplot*](https://rdrr.io/cran/ggplot2/man/ggplot.html) · Create a new ggplot
* [*ggtheme*](https://rdrr.io/cran/ggplot2/man/ggtheme.html) · Complete themes
  * *theme_minimal* · Minimal theme
* [*labs*](https://rdrr.io/cran/ggplot2/man/labs.html) · Modify axis, legend, and plot labels
* *scale_colour_viridis_d* · Viridis colour scales from [viridisLite](https://cran.r-project.org/web/packages/viridisLite/index.html)
  * *scale_fill_viridis_c*
* [*theme*](https://rdrr.io/cran/ggplot2/man/theme.html) · Modify components of a theme

[**sf**](https://cran.r-project.org/web/packages/sf/index.html) · Support for simple features, a standardized way to encode spatial vector data. Binds to 'GDAL' for reading and writing data, to 'GEOS' for geometrical operations, and to 'PROJ' for projection conversions and datum transformations. Uses by default the 's2' package for spherical geometry operations on ellipsoidal (long/lat) coordinates.  This notebook uses the following functions from *sf*.

* [*geos_measures*](https://rdrr.io/cran/sf/man/geos_measures.html) · Compute geometric measurements
  * *st_area* · Compute area
* [*st_as_sf*](https://rdrr.io/cran/sf/man/st_as_sf.html) · Convert foreign object to an sf object
* [*st_make_valid*](https://rdrr.io/cran/sf/man/valid.html) · Check validity or make an invalid geometry valid
* [*st_transform*](https://rdrr.io/cran/sf/man/st_transform.html) · Transform or convert coordinates of simple feature

### 1a. Install and Load Required Packages
If you have not already installed the required packages, uncomment and run the code below:

In [None]:
# install.packages(c("dplyr", "ggplot2", "sf"))

Load the packages into your workspace.

In [1]:
library(dplyr)
library(ggplot2)
library(sf)


Attaching package: 'dplyr'


The following objects are masked from 'package:stats':

    filter, lag


The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union


Linking to GEOS 3.11.2, GDAL 3.8.2, PROJ 9.3.1; sf_use_s2() is TRUE



### 1b. Read in the Data File

Run the following lines of code to unzip *ipums_nhgis_example.zip* and read the *ipums_nhgis_example.shp* file into memory.  You may need to update the file path to reflect the file's location on your machine or in your working directory.

The file contains county-level population information from the 2020 Decennial Census.

In [None]:
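unzip("ipums_nhgis_example.zip")                  # extract the shapefile and its companion files
dat_shp <- st_read("ipums_nhgis_example.shp")     # read the shapefile; later cells expect the name dat_shp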

## 2. Data Preparation

Before mapping, we need to ensure the data is in the correct format and that each geometry is valid. Invalid geometries can prevent accurate area calculation and mapping, so we’ll clean and validate these before moving forward.

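First we will make sure *dat_shp* is an sf object.  st_read() already returns an sf object, so st_as_sf() acts here as a safeguard and leaves the data unchanged.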

In [None]:
dat_shp <- st_as_sf(dat_shp)
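
For context, st_as_sf() does more work when the input is not yet spatial. The sketch below converts a plain data frame of points into an sf object; the data frame `pts` and its coordinates are made up for illustration.

```r
library(sf)

# hypothetical table of point locations (made-up coordinates)
pts <- data.frame(id = 1:2, lon = c(-93.27, -87.63), lat = c(44.98, 41.88))
class(pts)

# convert to sf by naming the coordinate columns and the CRS
pts_sf <- st_as_sf(pts, coords = c("lon", "lat"), crs = 4326)
class(pts_sf)

# calling st_as_sf() on an object that is already sf keeps it as sf
class(st_as_sf(pts_sf))
```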

Next we will fix any invalid geometries using st_make_valid() to handle any geometric issues that might interfere with area calculations or plotting.

In [None]:
dat_shp <- st_make_valid(dat_shp)
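
To see what st_make_valid() does, here is a minimal sketch using a self-intersecting "bowtie" polygon. The object `bowtie` is a made-up example and is not part of the NHGIS data.

```r
library(sf)

# hypothetical "bowtie" polygon whose edges cross at (0.5, 0.5)
bowtie <- st_sfc(st_polygon(list(rbind(
  c(0, 0), c(1, 1), c(1, 0), c(0, 1), c(0, 0)
))))

st_is_valid(bowtie)              # check validity before the repair

fixed <- st_make_valid(bowtie)   # repair the self-intersection
st_is_valid(fixed)               # check validity after the repair
```

Running st_is_valid() on your own data before and after the repair is a simple way to confirm whether any geometries needed fixing.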

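In this step, we will transform the data to a common Coordinate Reference System (CRS).  For this exercise, we will use EPSG:4326 (WGS 84), which is a geographic (longitude/latitude) CRS rather than a projected one.  Because sf uses s2 by default for longitude/latitude data, st_area() still returns areas in square meters, and the longitude/latitude bounds passed to coord_sf() later in this notebook apply directly.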

In [None]:
dat_shp <- st_transform(dat_shp, crs = 4326)
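
If you want to check how the choice of CRS affects area, the sketch below compares st_area() on a longitude/latitude polygon with the area of the same polygon after projecting to an equal-area CRS. The polygon `box` is made up, and EPSG:5070 (NAD83 / Conus Albers) is chosen here only as an example of an equal-area projection for the contiguous United States; it is not used elsewhere in this notebook.

```r
library(sf)

# hypothetical 1-degree by 1-degree box in the central United States (EPSG:4326)
box <- st_sfc(st_polygon(list(rbind(
  c(-100, 40), c(-99, 40), c(-99, 41), c(-100, 41), c(-100, 40)
))), crs = 4326)

area_lonlat <- st_area(box)                       # computed from lon/lat coordinates, in m^2
area_albers <- st_area(st_transform(box, 5070))   # planar area in an equal-area projection, in m^2

# relative difference between the two estimates
rel_diff <- abs(as.numeric(area_lonlat) - as.numeric(area_albers)) /
  as.numeric(area_albers)
rel_diff
```

For a polygon of this size the two estimates should differ only slightly, which is why either approach can work for county-level density calculations.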

## 3. Calculating Population Density

Population data is often more informative when normalized by area. In this step, we’ll calculate population density for each county as the number of people per square kilometer. This allows us to compare population concentrations across areas of different sizes.

In the next cell we first calculate the area of each county using st_area(), which returns square meters, then divide by 1e6 to get square kilometers and convert the result to numeric values to simplify further calculations.  Then we calculate population density (pop_density) as the total population (pop2020) divided by the area in square kilometers.  Finally, we make sure pop_density is a plain numeric variable (without units), which avoids potential issues when visualizing data with ggplot2.

In [None]:
# calculate area in square kilometers and population density
dat_shp <- dat_shp %>%
  mutate(area_km2 = as.numeric(st_area(.) / 1e6),     # convert area to square kilometers
         pop_density = pop2020 / area_km2)            # population density per sq km

# remove units from pop_density
dat_shp <- dat_shp %>%
  mutate(pop_density = as.numeric(pop_density))       # convert to numeric to remove units
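
As an alternative to dividing by 1e6, the unit conversion can be made explicit with the units package, which is installed alongside sf. The sketch below assumes a made-up polygon and population value (`toy`) rather than the NHGIS data.

```r
library(sf)

# hypothetical example: one small lon/lat square with a made-up population
toy <- st_sf(
  pop = 5000,
  geometry = st_sfc(st_polygon(list(rbind(
    c(-100, 40), c(-99.9, 40), c(-99.9, 40.1), c(-100, 40.1), c(-100, 40)
  ))), crs = 4326)
)

area_m2  <- st_area(toy)                       # a "units" object in m^2
area_km2 <- units::set_units(area_m2, km^2)    # unit-aware conversion to km^2

toy$area_km2    <- as.numeric(area_km2)        # drop units for plotting
toy$pop_density <- toy$pop / toy$area_km2      # people per square kilometer
toy
```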

## 4. Basic Choropleth Mapping with ggplot2

With our data prepared and population density calculated, we can now start mapping. ggplot2 and geom_sf() allow us to map the polygons by filling each county according to total population (pop2020), using a gradient color scale to represent low to high population counts.  In this step we do the following:

1. Use geom_sf(aes(fill = pop2020)) to color each county based on 2020 population (pop2020).
2. Use scale_fill_viridis_c() to apply a colorblind-friendly gradient scale for the population count.
3. Limit the map view to the contiguous United States using coord_sf() with specified latitude and longitude bounds, focusing the map and removing excess whitespace.

This produces a choropleth map that shows which counties have high and low total population across the contiguous U.S.

In [None]:
ggplot(data = dat_shp) +
  geom_sf(aes(fill = pop2020), color = NA) +
  scale_fill_viridis_c(option = "plasma", na.value = "grey50") +     # use a colorblind-friendly scale
  coord_sf(xlim = c(-125, -66), ylim = c(24, 50)) +                  # limit to the contiguous United States
  labs(title = "Population by County (2020)", fill = "Population") +
  theme_minimal()
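
If you do not have the NHGIS file at hand, the same mapping pattern can be tried on the North Carolina county shapefile that ships with the sf package. This is a sketch on a different dataset: the variable BIR74 (births in 1974) stands in for pop2020 as an example of a count variable.

```r
library(ggplot2)
library(sf)

# North Carolina counties demo shapefile included with the sf package
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# fill each county by a count variable, with no county outlines
p <- ggplot(data = nc) +
  geom_sf(aes(fill = BIR74), color = NA) +
  scale_fill_viridis_c(option = "plasma", na.value = "grey50") +
  labs(title = "Births by County, North Carolina (1974)", fill = "Births") +
  theme_minimal()

p
```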

## 5. Mapping Population Density and Customizing the Color Scale and Legend

The population map isn't very informative, so we'll make another version based on population density (pop_density).  To make the map more readable, we will also customize the color scale and legend. For example, using a logarithmic transformation can better capture population density variations, particularly if there’s a wide range between low-density and high-density areas.  In this step we:

1. Apply scale_fill_viridis_c() with a log transformation and specific breaks to improve visual contrast across the density range.
2. Adjust the legend position and add descriptive labels for clarity.

This step helps users interpret the data more effectively by adjusting the color scale to better fit the data’s distribution.

In [None]:
ggplot(data = dat_shp) +
  geom_sf(aes(fill = pop_density), color = NA) +
  scale_fill_viridis_c(option = "plasma", trans = "log",  # Log transformation for density range
                       breaks = c(10, 100, 1000, 10000),  # Adjust breaks as needed
                       labels = c("10", "100", "1k", "10k"),
                       na.value = "grey50") +
  coord_sf(xlim = c(-125, -66), ylim = c(24, 50)) +
  labs(title = "Population Density by County (2020)",
       subtitle = "Log-transformed color scale for population density",
       fill = "Density (per sq km)") +
  theme_minimal() +
  theme(legend.position = "bottom")
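
To see why the log transformation helps, the sketch below places a few made-up density values along a 0 to 1 color gradient, first on a linear scale and then on a log scale. The values in `dens` and the helper `rescale01` are hypothetical and only serve the illustration; the relative positions are the same for any log base, including the natural log used by `trans = "log"`.

```r
# hypothetical densities spanning four orders of magnitude (people per sq km)
dens <- c(1, 10, 100, 1000, 10000)

# position of each value along a 0-1 color gradient
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

linear_pos <- rescale01(dens)          # linear scale: most values crowd near 0
log_pos    <- rescale01(log10(dens))   # log scale: values are evenly spaced

data.frame(dens, linear_pos = round(linear_pos, 4), log_pos)
```

On the linear scale, every value below 10,000 falls in the lowest tenth of the gradient, so most counties would look nearly identical. On the log scale, each tenfold increase moves the same distance along the gradient.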

The final map reveals spatial patterns of 2020 population density across United States counties. By examining these density patterns, you can identify urban centers (higher population density) and rural areas (lower population density). This information is essential for understanding demographic distributions and can support analyses in public health, urban planning, and environmental impact studies.

## Next Steps

* Continue to [**Chapter 6.2 Mapping Point and Polygon Data**](https://platform.i-guide.io/notebooks/2b9f579c-32b0-4078-af39-994bb31d50ec)
* Move on to Chapter 7: Foundational Spatial Analyses
  * [**Chapter 7.1 Geometric Binary Predicates: The Building Blocks of Geometric Queries**](https://platform.i-guide.io/notebooks/06a40182-91cc-4ed4-befb-7dad6ff99966)
  * [**Chapter 7.2 Spatial Joins and Filter by Location**](https://platform.i-guide.io/notebooks/a4f2cf0c-b777-4811-8aa1-6d5420795)
  * [**Chapter 7.3 Distance and Nearest Neighbor Calculations**](https://platform.i-guide.io/notebooks/02f7f46b-c45f-4a06-81e0-d7df3f81ca23)
* Return to the [**R Spatial Notebooks Project Chapter List**](https://vavramusser.github.io/r-spatial/#:~:text=Chapter%201%3A%20Data%20Sources%20and%20APIs) to view a list of all available notebooks organized in the R Spatial Notebooks chapter structure.
* Visit the [**R Spatial Notebooks Project Homepage**](https://vavramusser.github.io/r-spatial) to learn more about the project, view the list of all notebooks, and explore additional resources.
* Join the project [**Mailing List**](https://mailchi.mp/ab01e8fc8397/r-spatial-email-signup) to hear about future notebook releases and other updates.
* If you have an idea for a new notebook please submit your idea via the [**Suggestion Box**](https://us19.list-manage.com/survey?u=746bf8d366d6fbc99c699e714&id=54590a28ea&attribution=false).

---

## ★ Thank You ★

Thank you so much for engaging with this notebook and supporting the project!  The R Spatial Notebooks Project is a labor of love, so if you enjoy or benefit from these notebooks, please consider [**Donating to the Project**](https://buymeacoffee.com/vavramusser).  Your support allows me to continue producing notebooks and supporting the R Spatial Notebooks community.