# Reading IPUMS Global Health Data Extracts Using ipumsr

## Introduction

The [IPUMS Global Health](https://globalhealth.ipums.org) database offers harmonized microdata from international health surveys, including the [Demographic and Health Surveys (DHS)](https://dhsprogram.com), [Performance Monitoring for Action (PMA)](https://www.pmadata.org) surveys, and the [Multiple Indicator Cluster Surveys (MICS)](https://mics.unicef.org). It provides detailed information on maternal and child health, family planning, nutrition, and access to healthcare across countries. Through harmonization, IPUMS Global Health enables researchers to compare health outcomes and determinants across countries and survey years, overcoming challenges posed by differences in survey design, variable definitions, and geographic contexts.

**From the [IPUMS Global Health Website](https://globalhealth.ipums.org):** IPUMS Global Health provides integrated international health survey data at no cost for research and educational purposes from three data series: the Demographic Health Surveys (DHS), the UNICEF Multiple Indicator Cluster surveys (MICS), and Performance Monitoring for Action (PMA).

The IPUMS Global Health database is organized into the following three subsections, based on data source:

* [Demographic and Health Surveys (DHS)](https://www.idhsdata.org/idhs): Integrated Demographic and Health Surveys from The DHS Program, currently covering African and South Asian surveys from the 1980s to the present.
* [Performance Monitoring for Action (PMA)](https://pma.ipums.org/pma): Integrated PMA surveys on fertility, contraception, hygiene, and health, administered frequently to monitor trends in select high-fertility countries.
* [Multiple Indicator Cluster Surveys (MICS)](https://mics.ipums.org/mics): Integrated MICS surveys covering child health and well-being.

This notebook introduces the process of using the [ipumsr R package](https://cran.r-project.org/web/packages/ipumsr/index.html) to read [IPUMS Global Health](https://globalhealth.ipums.org) data extracts that were previously downloaded from the IPUMS data repository. By the end of this notebook, users will have the skills to prepare IPUMS Global Health datasets for spatial and statistical workflows.

*Note that the IPUMS API does not currently support direct data extraction from the IPUMS Global Health database.*

### ★ Prerequisites ★
* Complete Chapter 1.1: Introduction to IPUMS and the IPUMS API

### Notebook Overview
1. Setup
2. ...

## 1. Setup
This section will guide you through the process of installing essential packages and setting your IPUMS API key.

##### Required Packages

[**ipumsr**](https://cran.r-project.org/web/packages/ipumsr/index.html) A package for interacting with IPUMS datasets and the IPUMS API. It allows users to define and submit data extraction requests, download data, and read it directly into R for analysis. This notebook uses the following functions from *ipumsr*.

* *set_ipums_api_key()* for setting your IPUMS API key
* *get_sample_info()* for retrieving sample identification codes and descriptions for IPUMS microdata collections
* *get_metadata_nhgis()* for listing available data sources from IPUMS NHGIS
* *define_extract_micro()* for defining the parameters of an IPUMS microdata extract request to be submitted via the IPUMS API
* *define_extract_nhgis()* for defining an IPUMS NHGIS extract request
* *tst_spec()* for creating a tst_spec object containing a time-series table specification
* *submit_extract()* for submitting an extract request via the IPUMS API and returning an *ipums_extract* object
* *wait_for_extract()* for waiting for an extract to finish processing
* *download_extract()* for downloading an extract's data files
* *read_ipums_ddi()* for reading metadata about an IPUMS microdata extract from a DDI codebook (.xml) file
* *read_ipums_micro()* for reading data from an IPUMS microdata extract
* *read_nhgis()* for reading tabular data from an NHGIS extract
* *read_ipums_sf()* for reading spatial data from an IPUMS extract

### 1a. Install and Load Required Packages
If you have not already installed the required packages, uncomment and run the code below:

In [None]:
# install.packages("ipumsr")

Load the packages into your workspace.

In [1]:
library(ipumsr)

## 2. Read an IPUMS Demographic and Health Surveys (DHS) Data Extract

From the [**IPUMS Demographic and Health Surveys (DHS) Webpage**](https://www.idhsdata.org/idhs): IPUMS-DHS facilitates analysis of Demographic and Health Surveys, administered in low- and middle-income countries since the 1980s. IPUMS-DHS contains thousands of consistently coded variables on the health and well-being of women, children, births, men, and on all members of randomly selected households, for 42 African countries and 9 Asian countries. Users can determine variable availability at a glance and create data files with just the variables and samples they need.

### ★ Create Your DHS Account ★
Prior to downloading from the IPUMS DHS repository, you will need to [**set up a DHS account**](https://dhsprogram.com/data/dataset_admin/login_main.cfm).

##### Kenya 2020 (Women)

**Variable Selection**
* Displaced GPS Coordinates of Primary Sampling Unit (GPSLATLONG)
* Number of Entries in the Birth History (TOTBIRTHIST)
* Timing of First Antenatal Visit for the Pregnancy: Months (ANVISMO_ALL)
* Number of Antenatal Visits During the Pregnancy (ANVISNO_ALL)
* Took SP/Fansidar for Malaria During Pregnancy (Last Birth) (ANMALSP_01)
* Woman's Household Has Bednet for Sleeping (BEDNETHAVE)

**IPUMS Preselected Variables**
* IPUMS-DHS Sample Identifier (SAMPLE)
* IPUMS-DHS Sample Identifier (string) (SAMPLESTR)
* Country (COUNTRY)
* Year of Sample (YEAR)
* Unique Cross-Sample Respondent Identifier (IDHSPID)
* Unique Cross-Sample Household Identifier (IDHSHID)
* Key to Link DHS Clusters to Context String (string) (DHSID)
* Unique Sample-Case PSU Identifier (IDHSPSU)
* Unique Cross-Sample Sampling Strata (IDHSSTRATA)
* Sample-Specific Respondent Identifier (CASEID)
* Sample-Specific Household Identifier (HHID)
* Sample-Specific Primary Sampling Unit (PSU)
* Sample-Specific Sampling Strata (STRATA)
* Sample-Specific Domain (DOMAIN)
* Household Number in Cluster (HHNUM)
* Sample-Specific Cluster Number (CLUSTERNO)
* Household Line Number of Women Respondent (LINENO)
* Sample Weight for Persons (PERWEIGHT)
* Population Factor Weight (Women) (POPWT)
* All Woman Factor for Total Population (AWFACTT)
* Urban-Rural Status (URBAN)
* Kenya Regions, 2020 (GIS) (GEO_KE2020)
* Age (AGE)
* Age in 5 Year Groups (AGE5YEAR)
* Usual Resident or Visitor (RESIDENT)
* Religion (RELIGION)
* Total Children Ever Born (CHEB)
* Household Wealth Index in Quintiles (WEALTHQ)
* Wealth Index Factor Score (5 Decimals) (WEALTHS)
* Highest Education Level (EDUCLVL)
* Total Years of Education (EDYRTOTAL)
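
Once the extract has been downloaded, it can be loaded with the *read_ipums_ddi()* and *read_ipums_micro()* functions listed in the Setup section. The following is a minimal sketch rather than a definitive workflow. The file name `data/idhs_00001.xml` and the helper `read_dhs_extract()` are hypothetical; replace the path with the DDI codebook from your own extract. The sketch also assumes the data file sits in the same folder as the codebook.

```r
library(ipumsr)

# Hypothetical path: replace with the DDI codebook (.xml) from your own extract.
# The matching data file is assumed to sit in the same folder.
dhs_ddi_path <- "data/idhs_00001.xml"

# Hypothetical helper: returns NULL (with a message) when the codebook is missing,
# otherwise a list holding the extract metadata and the microdata.
read_dhs_extract <- function(ddi_path) {
  if (!file.exists(ddi_path)) {
    message("DDI file not found: ", ddi_path)
    return(NULL)
  }
  ddi <- read_ipums_ddi(ddi_path)   # extract metadata from the codebook
  data <- read_ipums_micro(ddi)     # microdata, located via the codebook
  list(ddi = ddi, data = data)
}

dhs <- read_dhs_extract(dhs_ddi_path)

if (!is.null(dhs)) {
  print(dim(dhs$data))   # number of records and variables
  print(head(dhs$data))  # preview the first rows
}
```

Keeping the metadata object alongside the data makes it easier to look up variable labels and descriptions later in an analysis.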

## 3. Read an IPUMS Performance Monitoring for Action (PMA) Data Extract

From the [**IPUMS Performance Monitoring for Action (PMA) Webpage**](https://pma.ipums.org/pma): IPUMS PMA harmonizes the Performance Monitoring for Action (PMA) data series (it was formerly known as Performance Monitoring and Accountability 2020 - PMA2020). It provides an interactive web dissemination system for PMA data with variable documentation on thousands of harmonized variables on reproductive and sexual health, family planning, maternal and newborn health, and nutrition. PMA is fielded by the Bill & Melinda Gates Foundation and Johns Hopkins University using streamlined and high-frequency data collection in 11 FP2020 pledging countries.

### ★ Authorize Your IPUMS Account for PMA Data Access ★
Prior to downloading from the IPUMS PMA repository, you will need to [**authorize your IPUMS account for PMA data access**](https://uma.pop.umn.edu/pma/registration). Note that, unlike other data sources within the IPUMS repository, **the PMA access authorization process is not automatic**. You may be required to wait multiple days for your request to be processed and your account to be approved.

##### Indonesia 2016 (Service Delivery Point - Family Planning)

**Variable Selection**
* Census Block, Indonesia (BLOCKID)
* Number of Days per Week Facility is Open (DAYSOPEN)
* Number of Beds (BEDS)
* Total Number of Doctors (DOCTORNUM)
* Facility Usually Offers FP (FPOFFERED)
* Provide Antenatal Services (ANCPROV)

**IPUMS Preselected Variables**
* PMA Sample Number (SAMPLE)
* PMA Country (COUNTRY)
* Year (YEAR)
* PMA Survey Round in this Country (ROUND)
* Enumeration Area (EAID)
* Facility ID (FACILITYID)
* Consent Obtained from Interviewee (CONSENTSQ)
* Strata (STRATA)
* Type of Facility, Detailed (FACILITYTYPE)
* Type of Facility, General (FACILITYTYPEGEN)
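
When only some of the variables are needed, the `vars` argument of *read_ipums_micro()* can limit which columns are loaded. The sketch below assumes a hypothetical file name, `data/pma_00001.xml`, and a hypothetical helper, `read_pma_subset()`. It also assumes that FACILITYTYPEGEN arrives as a labelled column, which the code checks before converting it.

```r
library(ipumsr)

# Hypothetical path: replace with the DDI codebook (.xml) from your own extract.
pma_ddi_path <- "data/pma_00001.xml"

# Hypothetical helper: reads a subset of columns, or returns NULL if the
# codebook cannot be found.
read_pma_subset <- function(ddi_path) {
  if (!file.exists(ddi_path)) {
    message("DDI file not found: ", ddi_path)
    return(NULL)
  }
  # Variable names are taken from the selection lists above.
  read_ipums_micro(
    ddi_path,
    vars = c("FACILITYID", "FACILITYTYPEGEN", "DAYSOPEN",
             "BEDS", "DOCTORNUM", "FPOFFERED"),
    verbose = FALSE
  )
}

pma_sdp <- read_pma_subset(pma_ddi_path)

if (!is.null(pma_sdp)) {
  # Convert a labelled column to a factor so tabulations show category labels
  # instead of numeric codes.
  if (inherits(pma_sdp$FACILITYTYPEGEN, "haven_labelled")) {
    pma_sdp$FACILITYTYPEGEN <- as_factor(pma_sdp$FACILITYTYPEGEN)
  }
  print(table(pma_sdp$FACILITYTYPEGEN))
}
```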

## 4. Read an IPUMS Multiple Indicator Cluster Surveys (MICS) Data Extract

From the [**IPUMS Multiple Indicator Cluster Surveys (MICS) Webpage**](https://mics.ipums.org/mics): IPUMS MICS harmonizes the Multiple Indicator Cluster Surveys (MICS) implemented by countries under the program developed by UNICEF. IPUMS MICS contains consistently coded variables on the health and well-being of women, children, adolescents, men, household members, and households. Users can determine variable availability at a glance and view documentation to create and request custom extracts with just the variables and samples they need.

### ★ Create Your UNICEF Account ★
Prior to downloading from the IPUMS MICS repository, you will need to [**set up a UNICEF account**](https://knowledge.unicef.org/user/login).

##### Bangladesh 2019 (Children 0-4)

**Variable Selection**
* Bangladesh, District 2006-2019 (Consistent Boundaries, GIS) (GEO2_BD)
* Child's Age in Months (AGECHMO)
* Urban Wealth Index Quintile (WINDEX5U)
* Rural Wealth Index Quintile (WINDEX5R)
* Highest Level of School Attended by Mother/Caretaker (EDLEVELMOM)
* Child Attends Early Childhood Education Programme (ECATTEND)

**IPUMS Preselected Variables**
* Unit of Analysis (UNITANALYSIS)
* IPUMS Sample Identifier (SAMPLE)
* Country (COUNTRY)
* Survey Year (YEAR)
* Survey Round (ROUND)
* Subnational Survey (SUBNATIONAL)
* Serial Number (SERIAL)
* Cluster Number (CLUSTER)
* Household Number (HHNO)
* Child's Line Number (LINECH)
* Mother/Caretaker's Line Number (LINEMC)
* Result of Interview for Children Under 5 (RESULTCH)
* Children Under 5 Sample Weight (WEIGHTCH)
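
Before loading the full data file, it can help to check how the selected variables are coded. The DDI codebook alone is enough for this. The sketch below uses *read_ipums_ddi()* with ipumsr's `ipums_var_info()` to list labels and value codes for a few of the variables above. The file name `data/mics_00001.xml` and the helper `describe_mics_vars()` are hypothetical.

```r
library(ipumsr)

# Hypothetical path: replace with the DDI codebook (.xml) from your own extract.
mics_ddi_path <- "data/mics_00001.xml"

# Hypothetical helper: returns a table of variable metadata read from the
# codebook, or NULL if the codebook cannot be found.
describe_mics_vars <- function(ddi_path) {
  if (!file.exists(ddi_path)) {
    message("DDI file not found: ", ddi_path)
    return(NULL)
  }
  ddi <- read_ipums_ddi(ddi_path)
  # One row per requested variable, with its label, description, and value labels.
  ipums_var_info(ddi, vars = c("EDLEVELMOM", "ECATTEND", "WEIGHTCH"))
}

mics_info <- describe_mics_vars(mics_ddi_path)

if (!is.null(mics_info)) {
  print(mics_info[, c("var_name", "var_label")])
  # Value labels are stored as a list column, one table per variable.
  print(mics_info$val_labels[[1]])
}
```

Reviewing value labels first shows which codes represent missing or not-applicable responses, so they can be handled before any summary statistics are computed.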

## Recommended Next Steps
* **Continue with Chapter 2: IPUMS Data Acquisition and Extraction**
  * 2.1: IPUMS USA Data Extraction Using ipumsr
  * 2.2: IPUMS CPS Data Extraction Using ipumsr
  * 2.3: IPUMS International Microdata Extraction Using ipumsr
  * 2.4: IPUMS NHGIS Data Extraction Using ipumsr
  * 2.5: IPUMS Time Use Data Extraction Using ipumsr
  * 2.6: IPUMS Health Surveys Data Extraction Using ipumsr
  * 2.8: Reading IPUMS Higher Education Data Extracts Using ipumsr