# IPUMS [NHGIS](https://www.nhgis.org) Data Extraction Using [ipumsr](https://cran.r-project.org/web/packages/ipumsr/index.html) - Supplemental Exercise 2
### by [Kate Vavra-Musser](https://vavramusser.github.io) for the [R Spatial Notebook Series](https://vavramusser.github.io/r-spatial)

## Introduction
This notebook provides an additional example of the IPUMS NHGIS data extraction process using the IPUMS API via the ipumsr R package.  This exercise is a supplement to the workflow introduced in Chapter 3.4 IPUMS NHGIS Data Extraction Using ipumsr.

### Notebook Goals
This notebook replicates the IPUMS NHGIS data extraction process and extracts an NHGIS polygon shapefile.  The resulting data file is used in subsequent notebooks in the R Spatial Notebooks series.  The notebook provides an example of extracting spatial data only (without associated attribute data) from the IPUMS NHGIS repository.

### ✨ Prerequisites ✨
* Complete [Introduction to IPUMS and the IPUMS API](https://platform.i-guide.io/notebooks/82d3b176-e4e6-4307-8186-318a3fe6c81a)
* Set Up Your [IPUMS Account and API Key](https://account.ipums.org/api_keys)
* Complete [Introduction to sf: Reading, Writing, and Inspecting Vector Data](https://platform.i-guide.io/notebooks/9968babe-22e4-4c3d-98e2-d8b45e9672cd)
* Complete [IPUMS NHGIS Data Extraction Using ipumsr](https://platform.i-guide.io/notebooks/be08e56e-1c08-458e-a230-263c64d386bc)

### Notebook Overview
1. Setup
2. Extraction Workflow: Shapefiles Only

---

## 1. Setup
This section will guide you through the process of installing essential packages and setting your IPUMS API key.

#### Required Packages

[**dplyr**](https://cran.r-project.org/web/packages/dplyr/index.html) A Grammar of Data Manipulation.  This notebook uses the following functions from *dplyr*.

* [*filter*](https://rdrr.io/cran/dplyr/man/filter.html) · keep rows that match a condition

[**ipumsr**](https://cran.r-project.org/web/packages/ipumsr/index.html) An R Interface for Downloading, Reading, and Handling IPUMS Data.  This notebook uses the following functions from *ipumsr*.

* [*define_extract_nhgis*](https://rdrr.io/cran/ipumsr/man/define_extract_nhgis.html) · define an IPUMS NHGIS extract request
* [*download_extract*](https://rdrr.io/cran/ipumsr/man/download_extract.html) · download a completed IPUMS data extract
* [*get_metadata_nhgis*](https://rdrr.io/cran/ipumsr/man/get_metadata_nhgis.html) · list available data sources from IPUMS NHGIS
* [*read_ipums_sf*](https://rdrr.io/cran/ipumsr/man/read_ipums_sf.html) · read spatial data from an IPUMS extract
* [*set_ipums_api_key*](https://rdrr.io/cran/ipumsr/man/set_ipums_api_key.html) · set your IPUMS API key
* [*submit_extract*](https://rdrr.io/cran/ipumsr/man/submit_extract.html) · submit an extract request via the IPUMS API
* [*wait_for_extract*](https://rdrr.io/cran/ipumsr/man/wait_for_extract.html) · wait for an extract to finish processing

[**sf**](https://cran.r-project.org/web/packages/sf/index.html) Support for simple features, a standardized way to encode spatial vector data. Binds to 'GDAL' for reading and writing data, to 'GEOS' for geometrical operations, and to 'PROJ' for projection conversions and datum transformations. Uses by default the 's2' package for spherical geometry operations on ellipsoidal (long/lat) coordinates.  This notebook uses the following functions from *sf*.

* [*st_write*](https://rdrr.io/cran/sf/man/st_write.html) · Write simple features object to file or database

### 1a. Install and Load Required Packages
If you have not already installed the required packages, uncomment and run the code below:

In [None]:
# install.packages(c("dplyr", "ipumsr", "purrr", "sf"))
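As an optional alternative, the sketch below checks which packages are already available before installing anything.  It relies only on base R (`requireNamespace`); the names `required_pkgs`, `is_installed`, and `missing_pkgs` are illustrative and not part of the original workflow.

```r
# Packages loaded in this notebook
required_pkgs <- c("dplyr", "ipumsr", "purrr", "sf")

# requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install only what is missing
} else {
  message("All required packages are installed.")
}
```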

Load the packages into your workspace.

In [None]:
library(dplyr)
library(ipumsr)
library(purrr)
library(sf)

### 1b. Set Your IPUMS API Key

Store your [IPUMS API key](https://account.ipums.org/api_keys) in your environment using the following code.

Refer to [Chapter 1.1 Introduction to IPUMS and the IPUMS API](https://platform.i-guide.io/notebooks/82d3b176-e4e6-4307-8186-318a3fe6c81a) for instructions on setting up your IPUMS account and API key.

In [None]:
ipums_api_key <- readline("Please enter your IPUMS API key: ")
set_ipums_api_key(ipums_api_key, save = TRUE, overwrite = TRUE)
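To confirm that a key is available without printing it, a quick base R check along the following lines may help.  This sketch assumes that *set_ipums_api_key* stores the key in the `IPUMS_API_KEY` environment variable; check the ipumsr documentation for your installed version.  The names `api_key_value` and `key_is_set` are illustrative.

```r
# Look up the environment variable that ipumsr is assumed to use for the API key
api_key_value <- Sys.getenv("IPUMS_API_KEY")

# nzchar() is TRUE for a non-empty string; the key itself is never printed
key_is_set <- nzchar(api_key_value)

if (key_is_set) {
  message("An IPUMS API key is set for this session.")
} else {
  message("No IPUMS API key found; run set_ipums_api_key() first.")
}
```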

## 2. NHGIS Polygons Only

### 2a. View and Filter the List of Geography Shapefiles

For this exercise, we only want the geography shapefile and are not interested in downloading any data from the NHGIS time-series tables.  Therefore, we will jump directly to the list of geography shapefiles that fit our criteria.  We are looking for state boundary shapefiles, so let's filter the shapefile metadata to include only shapefiles with the word "state" in the description of their *geographic_level*.  We will also focus only on 2010 shapefiles, so we will also filter on the *year == 2010* criterion.

In [None]:
metadata_shp <- get_metadata_nhgis("shapefiles") %>%
    filter(year == 2010 & grepl("state", geographic_level, ignore.case = TRUE)) %>%
    print(n = Inf)

This filter resulted in a list of seven potential shapefiles.  Let's select the 2010 state shapefile based on the 2010 TIGER/Line files (*us_state_2010_tl2010*).
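To see how the two filter conditions combine, here is a minimal sketch on a small mock table.  The rows in `mock_metadata` are made up for illustration and are not real API output, and the filter is written with base R subsetting rather than *dplyr* so that the block runs without any packages.

```r
# Mock metadata table; the rows are illustrative, not real API output
mock_metadata <- data.frame(
  name = c("us_state_2010_tl2010", "us_county_2010_tl2010", "us_state_2000_tl2010"),
  year = c(2010, 2010, 2000),
  geographic_level = c("State", "County", "State"),
  stringsAsFactors = FALSE
)

# Same two conditions as the filter above: year is 2010 AND the
# geographic level contains "state" (case-insensitive)
keep <- mock_metadata$year == 2010 &
  grepl("state", mock_metadata$geographic_level, ignore.case = TRUE)

mock_filtered <- mock_metadata[keep, ]
mock_filtered$name
```

Only the row that satisfies both conditions remains; the county row fails the text match and the 2000 row fails the year match.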

### 2b. Shapefile Extraction Specification and Submission

Now that we have selected our shapefile (*us_state_2010_tl2010*) we are ready to define and submit our extraction request to the IPUMS API.

In [None]:
extract_definition <- define_extract_nhgis(description = "I-GUIDE NHGIS State Polygons Shapefile Extraction",
                                           shapefiles = "us_state_2010_tl2010")
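Before submitting, it can be useful to confirm that the definition contains what you expect.  The sketch below rebuilds the same definition under a different, illustrative name (`extract_check`) and inspects a few of its fields.  It assumes that the extract object exposes `collection`, `description`, and `shapefiles` elements and that defining an extract does not contact the API; verify both against the documentation for your installed version of ipumsr.

```r
library(ipumsr)

# Rebuild the definition so this block runs on its own
extract_check <- define_extract_nhgis(
  description = "I-GUIDE NHGIS State Polygons Shapefile Extraction",
  shapefiles = "us_state_2010_tl2010"
)

# Fields assumed to be stored on the extract object
extract_check$collection
extract_check$description
extract_check$shapefiles
```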

Next, submit the extraction definition object *extract_definition* to the API, wait for the extract to finish processing, and download the result.

In [None]:
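# Submit the request, then wait until IPUMS finishes processing it
extraction_submitted <- submit_extract(extract_definition)
extraction_complete <- wait_for_extract(extraction_submitted)
extraction_complete$status
# Download the completed extract and keep the path to the downloaded file
filepath <- download_extract(extraction_complete, overwrite = TRUE)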

The result of the extraction request will be one file, the state boundaries geography shapefile.  The next step is to read that file into R.

In [None]:
shp <- read_ipums_sf(filepath)

Let's take a look at the dimensions of the shapefile (*shp*).

In [None]:
dim(shp)

The shapefile includes 18 variables for 52 polygons, including the 50 states, Washington D.C., and Puerto Rico.

Let's take a look at the first few rows of the shapefile.

In [None]:
head(shp)

Finally, we will save the shapefile.  Before we do that, however, we will subset our data to only the columns we will use in subsequent analyses.  This will make it easier to save and work with this data.

In [None]:
colnames(shp)

In [None]:
shp_cols <- c("STATEFP10", "STUSPS10", "NAME10")
shp <- shp[shp_cols]
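Note that the geometry column does not need to be listed in *shp_cols*: subsetting an *sf* object by column name keeps its geometry.  The sketch below shows this on a tiny made-up object (`toy` and its point geometries are illustrative stand-ins, not the NHGIS data).

```r
library(sf)

# Tiny illustrative sf object with three attribute columns and point geometries
toy <- st_sf(
  STATEFP10 = c("01", "02"),
  NAME10 = c("Alabama", "Alaska"),
  ALAND10 = c(1, 2),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)))
)

# Select two attribute columns by name; the geometry column is retained
toy_subset <- toy[c("STATEFP10", "NAME10")]
names(toy_subset)
```

The result is still an *sf* object, with the two selected columns plus `geometry`.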

We are ready to save the shapefile to our workspace.

In [None]:
st_write(shp, "ipums_nhgis_states.shp", driver = "ESRI Shapefile", delete_dsn = TRUE)

At the end of this notebook we have saved a copy of the geographic data file for US states to the shapefile *ipums_nhgis_states.shp*.
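A simple way to check a saved shapefile is to read it back and compare it with what was written.  The sketch below does this round trip on a tiny made-up object in a temporary file, so it runs without the NHGIS extract; `toy`, `out_path`, and `toy_check` are illustrative names.  The ESRI Shapefile format limits field names to 10 characters, which the three column names kept above already satisfy.

```r
library(sf)

# Tiny illustrative sf object standing in for the state polygons
toy <- st_sf(
  NAME10 = c("Alabama", "Alaska"),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), crs = 4326)
)

# Write to a fresh temporary shapefile, then read it back
out_path <- tempfile(fileext = ".shp")
st_write(toy, out_path, driver = "ESRI Shapefile", quiet = TRUE)
toy_check <- st_read(out_path, quiet = TRUE)

# The number of features and the attribute column should survive the round trip
nrow(toy_check)
names(toy_check)
```

The same check can be run on *ipums_nhgis_states.shp* by passing that file name to *st_read*.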

---

## Next Steps

* Continue to [**Chapter 3.01.3 IPUMS NHGIS Data Extraction using ipumsr: Supplemental Exercise 3**](https://platform.i-guide.io/notebooks/55dd96e5-fdf6-408f-a050-7fcd006d0575)
* Move on to Chapter 5: Data Cleaning, Preparation, and Exploratory Data Analysis (EDA)
  * [**Chapter 5.02 Spatial Data Exploration and Preprocessing with IPUMS NHGIS**]()
* Return to the [**R Spatial Notebooks Project Chapter List**](https://vavramusser.github.io/r-spatial/#:~:text=Chapter%201%3A%20Data%20Sources%20and%20APIs) to view a list of all available notebooks organized in the R Spatial Notebooks chapter structure.
* Visit the [**R Spatial Notebooks Project Homepage**](https://vavramusser.github.io/r-spatial) to learn more about the project, view the list of all notebooks, and explore additional resources.
* Join the project [**Mailing List**](https://mailchi.mp/ab01e8fc8397/r-spatial-email-signup) to hear about future notebook releases and other updates.
* If you have an idea for a new notebook please submit your idea via the [**Suggestion Box**](https://us19.list-manage.com/survey?u=746bf8d366d6fbc99c699e714&id=54590a28ea&attribution=false).

---

## ★ Thank You ★

Thank you so much for engaging with this notebook and supporting the project!  The R Spatial Notebooks Project is a labor of love so if you enjoy or benefit from these notebooks, please consider [**Donating to the Project**](https://buymeacoffee.com/vavramusser).  Your support allows me to continue producing notebooks and supporting the R Spatial Notebooks community.