# Reading IPUMS Higher Education Data Extracts Using ipumsr

## Introduction

The [IPUMS Higher Education](https://highered.ipums.org/highered) database offers harmonized microdata on postsecondary education, workforce outcomes, and research activities, drawn from the [National Survey of College Graduates (NSCG)](https://ncses.nsf.gov/surveys/national-survey-college-graduates), [National Survey of Recent College Graduates (NSRCG)](https://www.nsf.gov/statistics/srvyrecentgrads), and [Survey of Doctorate Recipients (SDR)](https://ncses.nsf.gov/surveys/doctorate-recipients). It provides detailed information on educational attainment, employment, and demographic characteristics, enabling analysis of trends in higher education and their effects on the labor market. Harmonization reconciles differences in variable definitions, survey design, and geographic classifications so that data can be compared across survey years.

**From the [IPUMS Higher Education Website](https://highered.ipums.org/highered):** IPUMS Higher Ed disseminates data from the [Scientists and Engineers Statistical Data System (SESTAT)](https://www.nsf.gov/statistics/sestat), the leading surveys for studying the science and engineering (STEM) workforce in the United States. Data from the National Surveys of College Graduates (NSCG), Recent College Graduates (NSRCG), and Doctorate Recipients (SDR) are integrated from 1993 to the present.

This notebook introduces the process of using the [ipumsr R package](https://cran.r-project.org/web/packages/ipumsr/index.html) to read [IPUMS Higher Education](https://highered.ipums.org/highered) data extracts that were previously downloaded from the IPUMS data repository. By the end of this notebook, users will have the skills to prepare IPUMS Higher Education datasets for spatial and statistical workflows.

*Note that the IPUMS API does not currently support direct data extraction from the IPUMS Higher Education database.*

### ★ Prerequisites ★
* Complete Chapter 1.1: Introduction to IPUMS and the IPUMS API

### Notebook Overview
1. Setup
2. ...

## 1. Setup
This section will guide you through installing and loading the required packages. An IPUMS API key is not needed to read a previously downloaded extract.

##### Required Packages

[**ipumsr**](https://cran.r-project.org/web/packages/ipumsr/index.html) A package for interacting with IPUMS datasets and the IPUMS API. It allows users to define and submit data extraction requests, download data, and read it directly into R for analysis. This notebook uses the following functions from *ipumsr*.

* *set_ipums_api_key()* for setting your IPUMS API key
* *get_sample_info()* for retrieving sample identification codes and descriptions for IPUMS microdata collections
* *get_metadata_nhgis()* for listing available data sources from IPUMS NHGIS
* *define_extract_micro()* for defining the parameters of an IPUMS microdata extract request to be submitted via the IPUMS API
* *define_extract_nhgis()* for defining an IPUMS NHGIS extract request
* *tst_spec()* for creating a tst_spec object containing a time-series table specification
* *submit_extract()* for submitting an extract request via the IPUMS API and returning an *ipums_extract* object
* *wait_for_extract()* for waiting for an extract to finish processing
* *download_extract()* for downloading an extract's data files
* *read_ipums_ddi()* for reading metadata about an IPUMS microdata extract from a DDI codebook (.xml) file
* *read_ipums_micro()* for reading data from an IPUMS microdata extract
* *read_nhgis()* for reading tabular data from an NHGIS extract
* *read_ipums_sf()* for reading spatial data from an IPUMS extract
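
Because Higher Education extracts are downloaded manually, only the reading functions in this list are needed for them. The sketch below shows one way the read step might look. It assumes the extract's DDI codebook is saved in the working directory under the hypothetical name `highered_00001.xml`, with the data file it describes stored alongside it; substitute your own file name.

```r
library(ipumsr)

# Hypothetical path to the DDI codebook; replace with your own file name.
ddi_path <- "highered_00001.xml"

if (file.exists(ddi_path)) {
  # Read the extract metadata (variable labels, value labels, file layout).
  ddi <- read_ipums_ddi(ddi_path)

  # Read the microdata described by the codebook.
  highered_data <- read_ipums_micro(ddi)

  # Summarize the variables included in the extract.
  print(ipums_var_info(ddi))
} else {
  message("No codebook found at ", ddi_path, "; update ddi_path first.")
}
```

The `file.exists()` guard is only there so the cell runs without error before an extract has been downloaded.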

### 1a. Install and Load Required Packages
If you have not already installed the required packages, uncomment and run the code below:

```r
# install.packages("ipumsr")
```

Load the packages into your workspace.

```r
library(ipumsr)
```
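
As noted in the introduction, the IPUMS API does not currently support IPUMS Higher Education. One way to see which collections the installed version of ipumsr knows about, and whether it reports API support for them, is `ipums_data_collections()`. This is a minimal sketch that assumes a reasonably recent ipumsr release; the columns and rows returned may differ between versions.

```r
library(ipumsr)

# List the IPUMS collections known to the installed ipumsr version.
collections <- ipums_data_collections()

# Convert to a plain data frame so that every row is printed.
print(as.data.frame(collections))
```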

##### SDR 2013

**Variable Selection**
* Age (AGE)
* Gender (GENDER)
* Type of Highest Certificate or Degree (DGRDG)
* Field of Major for Highest Degree, Major Group (NDGMEMG)
* Labor Force Status (LFSTAT)

**IPUMS Preselected Variables**
* Individual Identification Number (PERSONID)
* Survey Year (YEAR)
* SESTAT Weight (WEIGHT)
* Sample Identifier (SAMPLE)
* Survey Identifier (SURID)
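
Once an extract is loaded, it can help to confirm that the expected variables are present and to apply the survey weight when tabulating. The sketch below checks for a subset of the variables listed above and computes a weighted count by labor force status. It uses a tiny made-up data frame in place of a real extract (the values and codes are placeholders, not SDR results), and `missing_vars()` is a hypothetical helper defined here, not an ipumsr function.

```r
# Variables expected in the extract (a subset of the lists above).
expected_vars <- c("PERSONID", "YEAR", "WEIGHT", "SAMPLE", "SURID",
                   "AGE", "GENDER", "DGRDG", "LFSTAT")

# Hypothetical helper: return the expected variables absent from a data frame.
missing_vars <- function(data, expected) {
  setdiff(expected, names(data))
}

# Made-up stand-in for a real extract; codes and weights are placeholders.
toy_extract <- data.frame(
  PERSONID = 1:4,
  YEAR     = 2013,
  WEIGHT   = c(10, 20, 30, 40),
  SAMPLE   = 1,
  SURID    = 1,
  AGE      = c(34, 45, 29, 61),
  GENDER   = c(1, 2, 2, 1),
  DGRDG    = 3,
  LFSTAT   = c(1, 1, 2, 3)
)

# An empty result means every expected variable is present.
print(missing_vars(toy_extract, expected_vars))

# Weighted count of respondents by labor force status code.
weighted_lfstat <- xtabs(WEIGHT ~ LFSTAT, data = toy_extract)
print(weighted_lfstat)
```

With a real extract, replace `toy_extract` with the data frame returned by `read_ipums_micro()` and consult the codebook for the meaning of each code.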

## Recommended Next Steps
* **Continue with Chapter 2: IPUMS Data Acquisition and Extraction**
  * 2.1: IPUMS USA Data Extraction Using ipumsr
  * 2.2: IPUMS CPS Data Extraction Using ipumsr
  * 2.3: IPUMS International Microdata Extraction Using ipumsr
  * 2.4: IPUMS NHGIS Data Extraction Using ipumsr
  * 2.5: IPUMS Time Use Data Extraction Using ipumsr
  * 2.6: IPUMS Health Surveys Data Extraction Using ipumsr
  * 2.7: Reading IPUMS Global Health Data Extracts Using ipumsr