# IPUMS Health Surveys Data Extraction Using ipumsr

## Introduction

The [IPUMS Health Surveys](https://healthsurveys.ipums.org) database offers harmonized microdata from the [National Health Interview Survey (NHIS)](https://www.cdc.gov/nchs/nhis/index.html) and the [Medical Expenditure Panel Survey (MEPS)](https://meps.ahrq.gov/mepsweb), providing detailed information on population health, healthcare access, and medical expenditures. It enables researchers to analyze trends in health outcomes, insurance coverage, and healthcare utilization across time and demographic groups. Through harmonization, IPUMS Health Surveys ensures data can be seamlessly compared across survey years, addressing changes in survey design, variable definitions, and geographic classifications.

**From the [IPUMS Health Surveys Website](https://healthsurveys.ipums.org):** IPUMS Health Surveys provide free individual-level survey data for research purposes from two leading sources of self-reported health and health care access information: the National Health Interview Survey (NHIS) and the Medical Expenditure Panel Survey (MEPS).

The IPUMS Health Surveys database is organized into the following two collections, based on data source:

* [National Health Interview Survey (NHIS)](https://nhis.ipums.org/nhis): The National Health Interview Survey (NHIS) provides harmonized annual microdata from the 1960s to the present.
* [Medical Expenditure Panel Survey (MEPS)](https://meps.ipums.org/meps): The Medical Expenditure Panel Survey (MEPS) provides harmonized microdata from the longitudinal survey of U.S. health care expenditures and utilization, covering the period 1996 to the present.

This notebook introduces the process of extracting [IPUMS Health Surveys](https://healthsurveys.ipums.org) data using the [IPUMS API](https://developer.ipums.org/docs/v2/apiprogram) via the [ipumsr R package](https://cran.r-project.org/web/packages/ipumsr/index.html). Users will learn how to define, submit, and download an IPUMS Health Surveys data extract, specifying desired variables, time periods, and geographic units for analysis. By the end of this notebook, users will have the skills to efficiently acquire customized IPUMS Health Surveys datasets and prepare them for spatial and statistical workflows.

### ★ Prerequisites ★
* Complete Chapter 1.1: Introduction to IPUMS and the IPUMS API
* Set Up Your [IPUMS Account and API Key](https://account.ipums.org/api_keys)

### Notebook Overview
1. Setup
2. IPUMS NHIS Metadata Exploration
3. IPUMS NHIS Data Extraction Specification and Submission
4. IPUMS MEPS Metadata Exploration
5. IPUMS MEPS Data Extraction Specification and Submission

## 1. Setup
This section will guide you through the process of installing essential packages and setting your IPUMS API key.

##### Required Packages

[**ipumsr**](https://cran.r-project.org/web/packages/ipumsr/index.html): A package for interacting with IPUMS datasets and the IPUMS API. It allows users to define and submit data extraction requests, download data, and read it directly into R for analysis. This notebook uses the following functions from *ipumsr*:

* *set_ipums_api_key()* for setting your IPUMS API key
* *get_sample_info()* for retrieving sample identification codes and descriptions for IPUMS microdata collections
* *get_metadata_nhgis()* for listing available data sources from IPUMS NHGIS
* *define_extract_micro()* for defining the parameters of an IPUMS microdata extract request to be submitted via the IPUMS API
* *define_extract_nhgis()* for defining an IPUMS NHGIS extract request
* *tst_spec()* for creating a tst_spec object containing a time-series table specification
* *submit_extract()* for submitting an extract request via the IPUMS API and returning an *ipums_extract* object
* *wait_for_extract()* for waiting until an extract finishes processing
* *download_extract()* for downloading an extract's data files
* *read_ipums_ddi()* for reading metadata about an IPUMS microdata extract from a DDI codebook (.xml) file
* *read_ipums_micro()* for reading data from an IPUMS microdata extract
* *read_nhgis()* for reading tabular data from an NHGIS extract
* *read_ipums_sf()* for reading spatial data from an IPUMS extract

### 1a. Install and Load Required Packages
If you have not already installed the required packages, uncomment and run the code below:

In [None]:
# install.packages("ipumsr")

Load the packages into your workspace.

In [None]:
library(ipumsr)

### 1b. Set Your IPUMS API Key

Store your [IPUMS API key](https://account.ipums.org/api_keys) in your environment using the following code.

Refer to *Chapter 1.1: Introduction to IPUMS and the IPUMS API* for instructions on setting up your IPUMS account and API key.

In [None]:
# Prompt for the API key, then save it to .Renviron for future sessions
ipums_api_key <- readline("Please enter your IPUMS API key: ")
set_ipums_api_key(ipums_api_key, save = TRUE, overwrite = TRUE)


## National Health Interview Survey (NHIS)

From the [**IPUMS National Health Interview Survey (NHIS) Webpage**](https://nhis.ipums.org/nhis): The National Health Interview Survey is a survey collecting information on the health, health care access, and health behaviors of the civilian, non-institutionalized U.S. population, with digital data files available from 1963 to present. IPUMS Health Surveys harmonizes these data and allows users to create custom NHIS data extracts for analysis.
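Before defining an extract, it helps to look up the sample identifiers that the API expects. The sketch below assumes an API key is already stored in the `IPUMS_API_KEY` environment variable and uses `get_sample_info()` to list NHIS samples. The `filter_samples()` helper is hypothetical (it is not part of *ipumsr*) and simply searches the returned `name` and `description` columns.

```r
library(ipumsr)

# Hypothetical helper: keep rows whose sample name or description matches `pattern`
filter_samples <- function(sample_info, pattern) {
  keep <- grepl(pattern, sample_info$name) |
    grepl(pattern, sample_info$description)
  sample_info[keep, ]
}

# get_sample_info() calls the IPUMS API, so run it only when a key is available
if (nzchar(Sys.getenv("IPUMS_API_KEY"))) {
  nhis_samples <- get_sample_info("nhis")
  print(filter_samples(nhis_samples, "2020"))
}
```

The `name` column holds the sample ID to pass to an extract definition, while `description` gives the human-readable label.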

##### NHIS 2020

**Variable Selection**
* Region of Residence (REGION)
* Health Status (HEALTH)
* Health Insurance Coverage Status (HINOTCOVE)
* Ever Told Had Hypertension on 2+ Visits (HYP2TIME)
* Ever Smoked 100 Cigarettes in Life (SMOKEV)
* Frequency of Vigorous Activity 10+ Minutes: Times per Week (VIG10FWK)

**IPUMS Preselected Variables**
* Survey Year (YEAR)
* Sequential Serial Number, Household Record (SERIAL)
* Stratum for Variance Estimation (STRATA)
* Primary Sampling Unit (PSU) for Variance Estimation (PSU)
* NHIS Unique Identifier, Household (NHISHID)
* Person Number within Family/Household (from reformatting) (PERNUM)
* NHIS Unique Identifier, Person (NHISPID)
* Household Number (from NHIS)
* Sample Person Weight (SAMPWEIGHT)
* Sample Adult Weight, Longitudinal Sample (LONGWEIGHT)
* Sample Adult Weight, Partial Sample (PARTWEIGHT)
* Sample Adult Flag (ASTATFLG)
* Sample Child Flag (CSTATFLG)
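The selection above could be turned into an extract request along the following lines. This is a minimal sketch, not a definitive workflow: the sample ID `"ih2020"` is an assumption that should be confirmed with `get_sample_info("nhis")`, the description text and download directory are placeholders, and the API steps run only if `IPUMS_API_KEY` is set. Only the selected variables are requested, on the expectation that IPUMS adds the preselected variables to the extract itself.

```r
library(ipumsr)

# Variables from the "Variable Selection" list above
nhis_vars <- c("REGION", "HEALTH", "HINOTCOVE", "HYP2TIME", "SMOKEV", "VIG10FWK")

# Build the extract definition locally; no API call is made at this point.
# "ih2020" is an assumed sample ID; confirm it with get_sample_info("nhis").
nhis_extract <- define_extract_micro(
  collection  = "nhis",
  description = "NHIS 2020 health status and behaviors",
  samples     = "ih2020",
  variables   = nhis_vars
)

# Submitting, waiting, and downloading require an API key and network access
if (nzchar(Sys.getenv("IPUMS_API_KEY"))) {
  submitted <- submit_extract(nhis_extract)
  ready     <- wait_for_extract(submitted)

  # Placeholder download location; the directory must exist before downloading
  download_dir <- file.path(tempdir(), "nhis_2020")
  dir.create(download_dir, showWarnings = FALSE)
  ddi_path <- download_extract(ready, download_dir = download_dir)

  # Read the DDI codebook first, then the microdata it describes
  nhis_ddi  <- read_ipums_ddi(ddi_path)
  nhis_data <- read_ipums_micro(nhis_ddi)
  print(head(nhis_data))
}
```

Keeping the definition separate from submission makes it possible to inspect or revise `nhis_extract` before any request is sent.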

## Medical Expenditure Panel Survey (MEPS)

From the [**IPUMS Medical Expenditure Panel Survey (MEPS) Webpage**](https://meps.ipums.org/meps): MEPS provides nationally representative, longitudinal data from 1996 to the present on health status, medical conditions, healthcare utilization, and healthcare expenditures for the U.S. civilian, non-institutionalized population. IPUMS MEPS harmonizes these data and allows users to create customized data extracts for analysis.

##### MEPS 2020

**Variable Selection**
* Health Status (HEALTH)
* Health Insurance Coverage Type (Hierarchy) (COVERTYPE)
* Annual Total of Direct Health Care Payments (EXPTOT)
* Annual Total Number of Visits Made to Office-Based Medical Providers (OBTOTVIS)
* Respondent Has Been Told They Have Diabetes (DCSDIABDX)

**IPUMS Preselected Variables**
* Record Type (RECTYPE)
* Survey Year (YEAR)
* Sequential Serial Number, Person Record (SERIAL)
* Person Number within Family/Household (from reformatting) (PERNUM)
* Dwelling Unit ID (DUID)
* Person Number (PID)
* MEPS Unique Identifier (IPUMS Generated)
* Panel (PANEL)
* Annual Primary Sampling Unit (PSU) for Variance Estimation (PSUANN)
* Annual Stratum for Variance Estimation (STRATANN)
* Pooled Primary Sampling Unit (PSU) for Variance Estimation (PSUPLD)
* Pooled Variance Stratum (STRATAPLD)
* Year Entered MEPS (PANELYR)
* Relative Year 1 or 2 in Panel (RELYR)
* Final Basic Annual Weight (PERWEIGHT)
* Self-Administered Questionnaire Weight (SAQWEIGHT)
* Diabetes Care Weight (DIABWEIGHT)
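A MEPS request for the variables listed above might look like the sketch below. It assumes the 2020 sample is identified as `"mp2020"`, which the code checks against `get_sample_info("meps")` before submitting. The description text is a placeholder, and everything that contacts the API is skipped unless `IPUMS_API_KEY` is set.

```r
library(ipumsr)

# Define the MEPS extract locally; "mp2020" is an assumed sample ID
meps_extract <- define_extract_micro(
  collection  = "meps",
  description = "MEPS 2020 coverage, expenditures, and utilization",
  samples     = "mp2020",
  variables   = c("HEALTH", "COVERTYPE", "EXPTOT", "OBTOTVIS", "DCSDIABDX")
)

if (nzchar(Sys.getenv("IPUMS_API_KEY"))) {
  # Confirm the assumed sample ID exists before submitting the request
  meps_samples <- get_sample_info("meps")
  stopifnot("mp2020" %in% meps_samples$name)

  # Submit, wait for processing, and download into a temporary directory
  submitted <- submit_extract(meps_extract)
  ready     <- wait_for_extract(submitted)
  ddi_path  <- download_extract(ready, download_dir = tempdir())

  # read_ipums_micro() accepts the path to the downloaded DDI codebook
  meps_data <- read_ipums_micro(ddi_path)
  print(head(meps_data))
}
```

For analysis, the weight and variance-estimation variables in the preselected list (for example PERWEIGHT, PSUANN, and STRATANN) are the ones a survey-design specification would typically draw on.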

## Recommended Next Steps
* **Continue with Chapter 2: IPUMS Data Acquisition and Extraction**
  * 2.1: IPUMS USA Data Extraction Using ipumsr
  * 2.2: IPUMS CPS Data Extraction Using ipumsr
  * 2.3: IPUMS International Microdata Extraction Using ipumsr
  * 2.4: IPUMS NHGIS Data Extraction Using ipumsr
  * 2.5: IPUMS Time Use Data Extraction Using ipumsr
  * 2.7: Reading IPUMS Global Health Data Extracts Using ipumsr
  * 2.8: Reading IPUMS Higher Education Data Extracts Using ipumsr