# IPUMS Time Use Data Extraction Using ipumsr
### by [Kate Vavra-Musser](https://vavramusser.github.io) for the [R Spatial Notebook Series](https://vavramusser.github.io/r-spatial)

This notebook builds on the workflow introduced in the **[Introduction to the IPUMS API for R Users](https://tech.popdata.org/ipumsr/articles/ipums-api.html)** article on the IPUMS website.  As the author of the R Spatial Notebook series, I recognize the IPUMS article as a significant inspiration and source of information for this notebook.

## Introduction
The [IPUMS Time Use](https://timeuse.ipums.org) database offers harmonized microdata from the [American Time Use Survey (ATUS)](https://www.bls.gov/tus) and international time use surveys, providing insights into how individuals allocate their time across various activities. It includes detailed records on work, caregiving, leisure, and daily routines, enabling analysis of time-use patterns and trends across different populations and time periods. Through harmonization, IPUMS Time Use ensures data can be seamlessly compared across survey years and countries, addressing differences in activity coding, survey methodologies, and geographic contexts.

**From the [IPUMS Time Use Website](https://timeuse.ipums.org):** These projects provide free individual-level time use data for research purposes. The data extract systems make it easy to create data sets containing time use and other variables a user needs.

#### Data Included in the IPUMS Time Use Repository
The IPUMS Time Use data repository is divided into three subsections.

#### [American Time Use Survey (ATUS)](https://www.atusdata.org/atus)
The ATUS is a nationally representative U.S. time diary survey conducted since 2003. IPUMS Time Use harmonizes these data and provides a data extract builder that allows users to create custom time use variables and data extracts for analysis. ATUS-X is a collaboration of the IPUMS Center for Data Integration and the [Maryland Population Research Center](https://www.popcenter.umd.edu).
* The IPUMS ATUS repository includes annual ATUS data from 2003 to present.

#### [American Heritage Time Use Study (AHTUS)](https://www.ahtusdata.org/ahtus)
The AHTUS is a harmonized collection of time diary data from the U.S. for the period 1930 to 2012. AHTUS-X is a data extract builder that allows users to create custom time use variables and data extracts for analysis. This project is a collaboration of the Minnesota Population Center, the Maryland Population Research Center and the Centre for Time Use Research at University College London.  The IPUMS AHTUS repository includes data from the following surveys:

* [The 1930 Women's College Time Use Study (1930-1931)](https://www.ahtusdata.org/ahtus/us1930a.shtml) administered by the United States Department of Agriculture (USDA)
* [The Multinational Comparative Time-Budget Research Project (1965-1966)](https://www.ahtusdata.org/ahtus/us1965a.shtml), including a Jackson Michigan and a national USA sample, administered by the [Survey Research Center at the University of Michigan](https://src.isr.umich.edu) and the [Social Relations Department at Harvard University](https://en.wikipedia.org/wiki/Harvard_Department_of_Social_Relations), with funding from the National Science Foundation (NSF) (part of the Szalai Multinational Time Budget Research Project).
* [American's Use of Time: Time Use in Economic and Social Accounts (1975-1976)](https://www.ahtusdata.org/ahtus/us1975a.shtml) administered by the [Survey Research Center at the University of Michigan](https://src.isr.umich.edu) with funding from the National Science Foundation (NSF) and the US Department of Health, Education, and Welfare.
* [American's Use of Time Project (1985)](https://www.ahtusdata.org/ahtus/us1985a.shtml) administered by the [Survey Research Center at the University of Michigan](https://src.isr.umich.edu) with funding from The National Science Foundation (NSF) and the US Department of Health, Education, and Welfare.
* [National Human Activity Pattern Survey (1993)](https://www.ahtusdata.org/ahtus/us1993a.shtml) administered by the [Survey Research Center at the University of Maryland](https://jpsm.umd.edu) with funding from the United States Environmental Protection Agency (EPA).
* [National Time-Diary Study (1994-1995)](https://www.ahtusdata.org/ahtus/us1995a.shtml) administered by the [Survey Research Center at the University of Maryland](https://jpsm.umd.edu) with funding from the Environmental Protection Agency (EPA).
* [Family Interaction, Social Capital, and Trends in Time Use Study (FISCT) (1998)](https://www.ahtusdata.org/ahtus/us1998a.shtml) administered by the [Survey Research Center at the University of Maryland](https://jpsm.umd.edu) with funding from the National Science Foundation (NSF) and the National Institute on Aging (NIA).  This data set combines two small-scale surveys collected by the University of Maryland Survey Research Center: the 1998-99 Family Interaction, Social Capital, and Trends in Time Use Study (FISCT), a small-scale contiguous state sample funded by the National Science Foundation, and the 1999-2001 National Survey of Parents (NSP), funded by the Sloan Foundation.
* [American Time Use Survey (ATUS) (2003-2012 and 2018)](https://www.bls.gov/tus)

#### [Multinational Time Use Study (MTUS)](https://www.mtusdata.org/mtus)
MTUS is a collection of time diary data from a growing number of countries that are harmonized for compatibility across time and space. IPUMS MTUS is a data extract builder that allows users to create custom time use variables and data extracts for analysis. This project is a collaboration of the Minnesota Population Center, the Maryland Population Research Center and the Centre for Time Use Research at University College London.
* The IPUMS MTUS repository includes [time use surveys from over 20 countries dating back to the 1960s](https://www.mtusdata.org/mtus/samples.shtml). 

### ★ Prerequisites ★
* Complete [Chapter 1.1 Introduction to IPUMS and the IPUMS API](https://platform.i-guide.io/notebooks/82d3b176-e4e6-4307-8186-318a3fe6c81a)
* Set Up Your [IPUMS Account and API Key](https://account.ipums.org/api_keys)

### Notebook Overview
1. Setup
2. IPUMS American Time Use Survey (ATUS) Metadata Exploration
3. IPUMS American Time Use Survey (ATUS) Data Extraction Specification and Submission
4. IPUMS American Heritage Time Use Study (AHTUS) Metadata Exploration
5. IPUMS American Heritage Time Use Study (AHTUS) Data Extraction Specification and Submission
6. IPUMS Multinational Time Use Study (MTUS) Metadata Exploration
7. IPUMS Multinational Time Use Study (MTUS) Data Extraction Specification and Submission

## 1. Setup
This section will guide you through the process of installing essential packages and setting your IPUMS API key.

#### Required Packages

[**dplyr**](https://cran.r-project.org/web/packages/dplyr/index.html) A Grammar of Data Manipulation. This notebook uses the following function from *dplyr*.

* [*filter*](https://rdrr.io/cran/dplyr/man/filter.html) · keep rows that match a condition
* This notebook also uses [*%>%*](https://magrittr.tidyverse.org/reference/pipe.html), referred to as the *pipe* operator.  The *pipe* operator passes the output of one function directly into the next function, which makes for streamlined workflows, and is a commonly used component of the [*tidyverse*](https://www.tidyverse.org).

[**ipumsr**](https://cran.r-project.org/web/packages/ipumsr/index.html) An R Interface for Downloading, Reading, and Handling IPUMS Data.  This notebook uses the following functions from *ipumsr*.

* [*define_extract_micro*](https://rdrr.io/github/mnpopcenter/ripums/man/define_extract_micro.html) · define an extract request for an IPUMS microdata collection
* [*download_extract*](https://rdrr.io/cran/ipumsr/man/download_extract.html) · download a completed IPUMS data extract
* [*get_sample_info*](https://rdrr.io/cran/ipumsr/man/get_sample_info.html) · list available samples for IPUMS microdata collections
* [*read_ipums_ddi*](https://rdrr.io/cran/ipumsr/man/read_ipums_ddi.html) · read metadata about an IPUMS microdata extract from a DDI codebook (.xml) file
* [*read_ipums_micro*](https://rdrr.io/cran/ipumsr/man/read_ipums_micro.html) · read data from an IPUMS microdata extract
* [*set_ipums_api_key*](https://rdrr.io/cran/ipumsr/man/set_ipums_api_key.html) · set your IPUMS API key
* [*submit_extract*](https://rdrr.io/cran/ipumsr/man/submit_extract.html) · submit an extract request via the IPUMS API
* [*wait_for_extract*](https://rdrr.io/cran/ipumsr/man/wait_for_extract.html) · wait for an extract to finish processing

[**stringr**](https://cran.r-project.org/web/packages/stringr/index.html) Simple, Consistent Wrappers for Common String Operations.  This notebook uses the following function from *stringr*.

* [*str_detect*](https://stringr.tidyverse.org/reference/str_detect.html) · detect the presence or absence of a match

### 1a. Install and Load Required Packages
If you have not already installed the required packages, uncomment and run the code below:

In [None]:
# install.packages(c("dplyr", "ipumsr", "stringr"))

Load the packages into your workspace.

In [None]:
library(dplyr)
library(ipumsr)
library(stringr)

### 1b. Set Your IPUMS API Key

Store your [IPUMS API key](https://account.ipums.org/api_keys) in your environment using the following code.

Refer to [Chapter 1.1 Introduction to IPUMS and the IPUMS API](https://platform.i-guide.io/notebooks/82d3b176-e4e6-4307-8186-318a3fe6c81a) for instructions on setting up your IPUMS account and API key.

In [None]:
# prompt for the key, then store it for this and future sessions
ipums_api_key <- readline("Please enter your IPUMS API key: ")
set_ipums_api_key(ipums_api_key, save = TRUE, overwrite = TRUE)
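
To confirm that a key is available without printing it to the screen, one option is to check the environment variable directly.  The sketch below assumes that *ipumsr* stores the key in the `IPUMS_API_KEY` environment variable; the helper name `has_ipums_key` is made up for this example.

```r
# Hypothetical helper: TRUE when a non-empty value is stored under `var`.
# Assumes ipumsr reads the key from the IPUMS_API_KEY environment variable.
has_ipums_key <- function(var = "IPUMS_API_KEY") {
  nzchar(Sys.getenv(var))
}

if (has_ipums_key()) {
  message("IPUMS API key found.")
} else {
  message("No IPUMS API key found; run set_ipums_api_key() first.")
}
```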

## 2. IPUMS American Time Use Survey (ATUS) Metadata Exploration

Before submitting an IPUMS data extraction request, it’s essential to ensure the parameters of the extraction definition are set up correctly.  The extraction definition specifies the sample, variables, and other options.

If this is your first time using the IPUMS API in R, or if you are setting up a new data extract for a new project, it is a good idea to start by exploring the available data which can be done using the *ipumsr* package.

### 2a. Review the List of Samples

First, let's take a look at the entire list of datasets available from the [IPUMS ATUS data repository](https://www.atusdata.org/atus/sample_summary.shtml).  The time use data available for direct extraction using the IPUMS API include the [American Time Use Survey (ATUS)](https://www.bls.gov/tus) from 2003 to present.

For this step, we will use the [*get_sample_info*](https://rdrr.io/cran/ipumsr/man/get_sample_info.html) function from the [**ipumsr**](https://cran.r-project.org/web/packages/ipumsr/index.html) package.  This function will return a list of all datasets from the specified IPUMS data repository which are available to be downloaded using the IPUMS API.  Since we are focusing on IPUMS ATUS, we will specify that we want to view all available samples from the IPUMS ATUS repository by passing *"atus"* to the function.  This code stores the metadata from all available samples in the IPUMS ATUS repository to the object *metadata_atus*.

**★ Pro Tip:** You can use the [*get_sample_info*](https://rdrr.io/cran/ipumsr/man/get_sample_info.html) function to retrieve metadata for any of the available IPUMS repositories by changing the database reference code.

In [None]:
# retrieve the list of samples from the IPUMS ATUS database
metadata_atus <- get_sample_info("atus")

# view the list of samples
metadata_atus

Viewing the metadata table tells us that there are 21 samples available from IPUMS ATUS.

The IPUMS ATUS metadata table has 1) a **name**, corresponding to a sample identification code, and 2) a **description**, providing a short description or label for each sample.  We will need to select a sample and make note of its sample identification code (**name**) which we will use when defining our data extraction.  Refer to the [Sample-Level Information](https://www.atusdata.org/atus/sample_summary.shtml) page on the IPUMS ATUS website for a list of all IPUMS ATUS samples and more detailed information on each sample.

For this exercise we will use the **ATUS 2020** which is referred to using identification code (*name*) **at2020**.
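
When the sample table is long, the *filter* and *str_detect* functions listed in the Setup section can narrow it down.  The sketch below runs on a small mock table with the same **name** and **description** columns; only the at2020 row is taken from this notebook and the other rows are made up for illustration.

```r
library(dplyr)
library(stringr)

# Mock stand-in for metadata_atus; only the at2020 row comes from this notebook,
# the other rows are made up for illustration.
metadata_mock <- data.frame(
  name = c("at2018", "at2019", "at2020"),
  description = c("ATUS 2018", "ATUS 2019", "ATUS 2020")
)

# keep only the samples whose description mentions 2020
metadata_2020 <- metadata_mock %>%
  filter(str_detect(description, "2020"))

metadata_2020
```

Swapping `metadata_mock` for `metadata_atus` should apply the same filter to the real sample table.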

## 3. IPUMS American Time Use Survey (ATUS) Data Extraction Specification and Submission

Once we have reviewed the available samples and decided on the dataset, the next step is to set up a data extraction using the [*define_extract_micro*](https://rdrr.io/github/mnpopcenter/ripums/man/define_extract_micro.html) function from the [**ipumsr**](https://cran.r-project.org/web/packages/ipumsr/index.html) package.  This function requires the following minimum parameters:

* *collection* · the IPUMS data collection for the extract (for this exercise we are downloading from IPUMS ATUS so we use the code "atus")
* *description* · text description of the extract
* *samples* · vector of samples to include in the extract; samples should be specified using the sample identification codes
* *variables* · vector of variables to include in the extract

### 3a. Define the Variable List

We already know what we will pass to the function for the *collection* ("atus") and *samples* ("at2020") parameters.  Next we will need to determine which variables we want.

If you are already familiar with IPUMS ATUS data extractions using their web-based data extract platforms, you might already know which variables are available for our selected sample.  If not, the best place to start is by exploring the web-based [**IPUMS ATUS Data Extract Platform**](https://www.atusdata.org/atus-action/variables/group) to see what variables are available and identify the appropriate variable codes.  Before searching for variables, be sure to click the **Select Samples** button in the top-left corner of the search platform and select the samples you are planning to use.  Since we are using the ATUS 2020 sample for this example, you should select the ATUS 2020 sample within the search platform.  What variables are available, and the codes used for the variables, may differ based on your selected sample, so it is important to be specific.

For this example we will use the following set of variables from the ATUS 2020.

**Variable Selection**
* [County FIPS Code (COUNTY)](https://en.wikipedia.org/wiki/Federal_Information_Processing_Standard_state_code)
* Age (AGE)
* Sex (SEX)
* Hours Usually Worked per Week (UHRSWORKT)
* Total Time Spent on Secondary Childcare for All Children (SCC_ALL)
* Eldercare Provided in Last 3 Months (ECPRIOR)

By default, the data extraction will include both our selected variables and a set of IPUMS preselected variables.  The preselected variables include metainformation such as identification codes and survey weights.  We will explore and list the preselected variables after completing the data extraction.

### 3b. Define the Data Extract

Now that we know the collection ("atus"), sample ("at2020"), and list of variables (c("COUNTY", "AGE", "SEX", "UHRSWORKT", "SCC_ALL", "ECPRIOR")) we are ready to submit our data extract request.  In this step we will add a text description of the request which can be anything and is included to help us differentiate between requests.  For this extract we will use the simple description "IPUMS ATUS Data Extraction".

Here we pass all the extraction definition information to the [*define_extract_micro*](https://rdrr.io/github/mnpopcenter/ripums/man/define_extract_micro.html) function from the [**ipumsr**](https://cran.r-project.org/web/packages/ipumsr/index.html) package and store the resulting extraction definition in the object *extract_definition*.

**★ Pro Tip:** You can specify multiple samples in the same data extract by passing all sample identification codes as a character vector.  Be sure that the variables you specify are available for all of the samples!
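
As a sketch of the tip above, a two-sample definition might look like the following.  The sample code "at2019" is an assumption used only for illustration (confirm real codes with *get_sample_info*), and the object name `extract_multi` is made up for this example.  Defining an extract does not submit anything to IPUMS.

```r
library(ipumsr)

# "at2019" is an assumed sample code used only for illustration;
# confirm real codes with get_sample_info("atus").
extract_multi <- define_extract_micro(
  collection = "atus",
  description = "IPUMS ATUS Data Extraction with Two Samples",
  samples = c("at2019", "at2020"),
  variables = c("AGE", "SEX")
)

# number of samples in the definition
length(extract_multi$samples)
```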

In [None]:
# set up the data extraction definition
extract_definition <- define_extract_micro(collection = "atus",
                                           description = "IPUMS ATUS Data Extraction",
                                           samples = c("at2020"),
                                           variables = c("COUNTY", "AGE", "SEX", "UHRSWORKT", "SCC_ALL", "ECPRIOR"))

Let's review the extraction definition information to make sure we have set it up the way we intended.

In [None]:
# review the extraction definition
extract_definition

Everything looks good so we will submit the extraction request, wait for it to complete, and download the resulting data.

### 3c. Submit the Extract Request

Now that the extraction definition is set up, we can submit it to the IPUMS API using the [*submit_extract*](https://rdrr.io/cran/ipumsr/man/submit_extract.html) function from the [**ipumsr**](https://cran.r-project.org/web/packages/ipumsr/index.html) package.

For this exercise, after submitting the request we will also use the [*wait_for_extract*](https://rdrr.io/cran/ipumsr/man/wait_for_extract.html) function from the [**ipumsr**](https://cran.r-project.org/web/packages/ipumsr/index.html) package to monitor the status of the request.  This is not a necessary step but it is helpful, especially when submitting large requests.

Finally, once the extract is complete, we can download it using the [*download_extract*](https://rdrr.io/cran/ipumsr/man/download_extract.html) function from the [**ipumsr**](https://cran.r-project.org/web/packages/ipumsr/index.html) package and save it in the object *filepath*.

In [None]:
# submit extraction request
extract_submitted <- submit_extract(extract_definition)

# wait for completion
extraction_complete <- wait_for_extract(extract_submitted)

# check completion status
extraction_complete$status

# get the extract filepath
filepath <- download_extract(extract_submitted, overwrite = TRUE)
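
The `status` value checked above can also be used to guard the download step.  The following is a minimal sketch, assuming that a finished extract reports the status string "completed"; the helper name `extract_is_complete` and the mock objects are made up for this example.

```r
# Hypothetical helper: TRUE only when the extract reports a "completed" status.
# The status string "completed" is an assumption about the values ipumsr returns.
extract_is_complete <- function(extract) {
  identical(extract$status, "completed")
}

# mock objects standing in for the result of wait_for_extract()
mock_done <- list(status = "completed")
mock_failed <- list(status = "failed")

extract_is_complete(mock_done)
extract_is_complete(mock_failed)
```

In the notebook, wrapping the download in `if (extract_is_complete(extraction_complete)) { ... }` would skip the download when the extract did not finish.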

### 3d. Review the Extract

Once we have downloaded the extract, we are ready to review it and transform it to a format we can easily use.  The data extract download will contain the following two files.

1. A [DDI (Data Documentation Initiative)](https://ddialliance.org) codebook file (file extension .xml) containing metadata and descriptive information for the data.
2. A zipped data (.dat) file (file extension .dat.gz) containing the data.

Read the ddi and data files into a format which we can work with in R.  The final *dat* object will contain the data from our extraction in a table format which is easy to use in R.

In [None]:
ddi <- read_ipums_ddi(filepath)
dat <- read_ipums_micro(ddi)

We now have a useable version of our dataset stored in *dat*.  Let's take a look at the number of observations and variables in the data.

In [None]:
dim(dat)

The 2020 ATUS data we downloaded includes information on 12 variables for 8782 individuals.

Let's take a look at the first few lines of the data.

In [None]:
head(dat)

Notice that this data is in [*tibble*](https://tibble.tidyverse.org) format rather than the more common *data.frame* format you might be used to as an R user.  A tibble can be thought of as a version of a data.frame that includes additional functionality and metadata visibility.  It is also more compatible with the [*tidyverse*](https://www.tidyverse.org) packages, including the [*dplyr*](https://cran.r-project.org/web/packages/dplyr/index.html) package we use in this notebook.

As mentioned above, IPUMS includes a set of preselected variables in data extractions, along with the variables selected by the user.  We only selected 6 variables for the extraction but the resulting download includes 12 variables.  Let's take a look at the list of column names.

In [None]:
colnames(dat)

This list includes the 6 variables we originally selected as well as 6 additional IPUMS preselected variables which mainly include metadata such as identification codes, weights, and other metainformation.

**Variable Selection**
* [County (FIPS Code) (COUNTY)](https://en.wikipedia.org/wiki/Federal_Information_Processing_Standard_state_code)
* Age (AGE)
* Sex (SEX)
* Hours Usually Worked per Week (UHRSWORKT)
* Total Time Spent on Secondary Childcare for All Children (SCC_ALL)
* Eldercare Provided in Last 3 Months (ECPRIOR)

**IPUMS Preselected Variables**
* Survey Year (YEAR)
* ATUS Case ID (CASEID)
* Household Serial Number (SERIAL)
* Person Number (General) (PERNUM)
* Person Line Number (LINENO)
* Person Weight, 2020 Methodology (WT20)
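
Survey weights such as WT20 are included so that estimates can reflect the population rather than just the respondents.  The sketch below shows the mechanics on a mock table that borrows the AGE and WT20 column names; the values are made up, and which weight is appropriate for a given analysis should be checked against the IPUMS ATUS documentation.

```r
# Mock data using the AGE and WT20 column names from this extract;
# the values are made up for illustration.
dat_mock <- data.frame(
  AGE = c(25, 40, 60),
  WT20 = c(1000, 3000, 1000)
)

# unweighted mean treats every respondent equally
age_unweighted <- mean(dat_mock$AGE)

# weighted mean lets each respondent count in proportion to the weight
age_weighted <- weighted.mean(dat_mock$AGE, w = dat_mock$WT20)

c(unweighted = age_unweighted, weighted = age_weighted)
```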

### 3e. Save the Data

Finally, let's save the data we extracted from IPUMS ATUS.  We will save the data in the following two formats:

* A *.rds* version of the data.  The **R Data Serialization (RDS)** format will retain metadata for the next time we want to import the file back into R.  One downside to the .rds format is it is only useable within R.
* A *.csv* version of the data.  The [**Comma-Separated Values (CSV)**](https://en.wikipedia.org/wiki/Comma-separated_values) format is versatile and can be easily accessed in other programs.  However, the CSV file format does not include metadata such as labels for variable levels.

**★ Pro Tip:** Setting the *row.names* parameter to *FALSE* in the *write.csv* function will suppress the creation of an additional column of index values which is automatically generated in the CSV writing process.

In [None]:
saveRDS(dat, "ipums_atus_example.rds")
write.csv(dat, "ipums_atus_example.csv", row.names = FALSE)
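
The difference between the two formats can be seen with a small round trip.  The sketch below attaches a label attribute to one column of a mock table (the table and the `rds_path` and `csv_path` names are made up for this example), writes it to temporary files in both formats, and reads each one back.

```r
# Small mock table with a label attribute attached to one column
dat_mock <- data.frame(SEX = c(1, 2, 2))
attr(dat_mock$SEX, "label") <- "Sex"

# write to temporary files so nothing in the working directory is touched
rds_path <- tempfile(fileext = ".rds")
csv_path <- tempfile(fileext = ".csv")
saveRDS(dat_mock, rds_path)
write.csv(dat_mock, csv_path, row.names = FALSE)

# read each file back and look for the label attribute
label_from_rds <- attr(readRDS(rds_path)$SEX, "label")
label_from_csv <- attr(read.csv(csv_path)$SEX, "label")

label_from_rds
label_from_csv
```

The label comes back from the RDS file, while the CSV version returns `NULL` because attributes are not written to plain text.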

At the end of this exercise we have a freshly downloaded dataset from the IPUMS ATUS repository saved in our workspace.

## 4. IPUMS American Heritage Time Use Study (AHTUS) Metadata Exploration

Before submitting an IPUMS data extraction request, it’s essential to ensure the parameters of the extraction definition are set up correctly.  The extraction definition specifies the sample, variables, and other options.

If this is your first time using the IPUMS API in R, or if you are setting up a new data extract for a new project, it is a good idea to start by exploring the available data which can be done using the *ipumsr* package.

### 4a. Review the List of Samples

First, let's take a look at the entire list of datasets available from the [IPUMS AHTUS data repository](https://www.ahtusdata.org/ahtus/sample_summary.shtml).  The IPUMS AHTUS data repository includes seven historic US time-use studies and [American Time Use Survey](https://www.atusdata.org/atus) information for 2003-2012 and 2018.

For this step, we will use the [*get_sample_info*](https://rdrr.io/cran/ipumsr/man/get_sample_info.html) function from the [**ipumsr**](https://cran.r-project.org/web/packages/ipumsr/index.html) package.  This function will return a list of all datasets from the specified IPUMS data repository which are available to be downloaded using the IPUMS API.  Since we are focusing on IPUMS AHTUS, we will specify that we want to view all available samples from the IPUMS AHTUS repository by passing *"ahtus"* to the function.  This code stores the metadata from all available samples in the IPUMS AHTUS repository to the object *metadata_ahtus*.

**★ Pro Tip:** You can use the [*get_sample_info*](https://rdrr.io/cran/ipumsr/man/get_sample_info.html) function to retrieve metadata for any of the available IPUMS repositories by changing the database reference code.

In [None]:
# retrieve the list of samples from the IPUMS AHTUS database
metadata_ahtus <- get_sample_info("ahtus")

# view the list of samples
metadata_ahtus

Viewing the metadata table tells us that there are 118 samples available from IPUMS AHTUS.

The IPUMS AHTUS metadata table has 1) a **name**, corresponding to a sample identification code, and 2) a **description**, providing a short description or label for each sample.  We will need to select a sample and make note of its sample identification code (**name**) which we will use when defining our data extraction.  Refer to the [Sample-Level Information](https://www.ahtusdata.org/ahtus/sample_summary.shtml) page on the IPUMS AHTUS website for a list of all IPUMS AHTUS samples and more detailed information on each sample.

For this exercise we will use the **ATUS 2010** which is referred to using identification code (*name*) **us2010a**.

## 5. IPUMS AHTUS Data Extraction Specification and Submission

Once we have reviewed the available samples and decided on the dataset, the next step is to set up a data extraction using the [*define_extract_micro*](https://rdrr.io/github/mnpopcenter/ripums/man/define_extract_micro.html) function from the [**ipumsr**](https://cran.r-project.org/web/packages/ipumsr/index.html) package.  This function requires the following minimum parameters:

* *collection* · the IPUMS data collection for the extract (for this exercise we are downloading from IPUMS AHTUS so we use the code "ahtus")
* *description* · text description of the extract
* *samples* · vector of samples to include in the extract; samples should be specified using the sample identification codes
* *variables* · vector of variables to include in the extract

### 5a. Define the Variable List

We already know what we will pass to the function for the *collection* ("ahtus") and *samples* ("us2010a") parameters.  Next we will need to determine which variables we want.

If you are already familiar with IPUMS AHTUS data extractions using their web-based data extract platforms, you might already know which variables are available for our selected sample.  If not, the best place to start is by exploring the web-based [**IPUMS AHTUS Data Extract Platform**](https://www.ahtusdata.org/ahtus-action/variables/group) to see what variables are available and identify the appropriate variable codes.  Before searching for variables, be sure to click the **Select Samples** button in the top-left corner of the search platform and select the samples you are planning to use.  Since we are using the 2010 ATUS sample for this example, you should select the 2010 ATUS sample within the search platform.  What variables are available, and the codes used for the variables, may differ based on your selected sample, so it is important to be specific.

For this example we will use the following set of variables from the 2010 ATUS.

**Variable Selection**
* State (STATE)
* Age (AGE)
* Sex (SEX)
* Household Type (HHTYPE)
* Number of Hours Worked Per Week (WKHRS)
* Number of Full-Time Workers in Household (NWORK)

By default, the data extraction will include both our selected variables and a set of IPUMS preselected variables.  The preselected variables include metainformation such as identification codes and survey weights.  We will explore and list the preselected variables after completing the data extraction.

### 5b. Define the Data Extract

Now that we know the collection ("ahtus"), sample ("us2010a"), and list of variables (c("STATE", "AGE", "SEX", "HHTYPE", "WKHRS", "NWORK")) we are ready to submit our data extract request.  In this step we will add a text description of the request which can be anything and is included to help us differentiate between requests.  For this extract we will use the simple description "IPUMS AHTUS Data Extraction".

Here we pass all the extraction definition information to the [*define_extract_micro*](https://rdrr.io/github/mnpopcenter/ripums/man/define_extract_micro.html) function from the [**ipumsr**](https://cran.r-project.org/web/packages/ipumsr/index.html) package and store the resulting extraction definition in the object *extract_definition*.

**★ Pro Tip:** You can specify multiple samples in the same data extract by passing all sample identification codes as a character vector.  Be sure that the variables you specify are available for all of the samples!

In [None]:
# set up the data extraction definition
extract_definition <- define_extract_micro(collection = "ahtus",
                                           description = "IPUMS AHTUS Data Extraction",
                                           samples = c("us2010a"),
                                           variables = c("STATE", "AGE", "SEX", "HHTYPE", "WKHRS", "NWORK"))

Let's review the extraction definition information to make sure we have set it up the way we intended.

In [None]:
# review the extraction definition
extract_definition

Everything looks good so we will submit the extraction request, wait for it to complete, and download the resulting data.

### 5c. Submit the Extract Request

Now that the extraction definition is set up, we can submit it to the IPUMS API using the [*submit_extract*](https://rdrr.io/cran/ipumsr/man/submit_extract.html) function from the [**ipumsr**](https://cran.r-project.org/web/packages/ipumsr/index.html) package.

For this exercise, after submitting the request we will also use the [*wait_for_extract*](https://rdrr.io/cran/ipumsr/man/wait_for_extract.html) function from the [**ipumsr**](https://cran.r-project.org/web/packages/ipumsr/index.html) package to monitor the status of the request.  This is not a necessary step but it is helpful, especially when submitting large requests.

Finally, once the extract is complete, we can download it using the [*download_extract*](https://rdrr.io/cran/ipumsr/man/download_extract.html) function from the [**ipumsr**](https://cran.r-project.org/web/packages/ipumsr/index.html) package and save it in the object *filepath*.

In [None]:
# submit extraction request
extract_submitted <- submit_extract(extract_definition)

# wait for completion
extraction_complete <- wait_for_extract(extract_submitted)

# check completion status
extraction_complete$status

# get the extract filepath
filepath <- download_extract(extract_submitted, overwrite = TRUE)

### 5d. Review the Extract

Once we have downloaded the extract, we are ready to review it and transform it to a format we can easily use.  The data extract download will contain the following two files.

1. A [DDI (Data Documentation Initiative)](https://ddialliance.org) codebook file (file extension .xml) containing metadata and descriptive information for the data.
2. A zipped data (.dat) file (file extension .dat.gz) containing the data.

Read the ddi and data files into a format which we can work with in R.  The final *dat* object will contain the data from our extraction in a table format which is easy to use in R.

In [None]:
ddi <- read_ipums_ddi(filepath)
dat <- read_ipums_micro(ddi)

We now have a useable version of our dataset stored in *dat*.  Let's take a look at the number of observations and variables in the data.

In [None]:
dim(dat)

The 2010 ATUS data we downloaded includes information on 13 variables for 13,260 individuals.

Let's take a look at the first few lines of the data.

In [None]:
head(dat)

Notice that this data is in [*tibble*](https://tibble.tidyverse.org) format rather than the more common *data.frame* format you might be used to as an R user.  A tibble can be thought of as a version of a data.frame that includes additional functionality and metadata visibility.  It is also more compatible with the [*tidyverse*](https://www.tidyverse.org) packages, including the [*dplyr*](https://cran.r-project.org/web/packages/dplyr/index.html) package we use in this notebook.

As mentioned above, IPUMS includes a set of preselected variables in data extractions, along with the variables selected by the user.  We only selected 6 variables for the extraction but the resulting download includes 13 variables.  Let's take a look at the list of column names.

In [None]:
colnames(dat)

This list includes the 6 variables we originally selected as well as 7 additional IPUMS preselected variables which mainly include metadata such as identification codes, weights, and other metainformation.

**Variable Selection**
* State (STATE)
* Age (AGE)
* Sex (SEX)
* Household Type (HHTYPE)
* Number of Hours Worked Per Week (WKHRS)
* Number of Full-Time Workers in Household (NWORK)

**IPUMS Preselected Variables**
* Sample (SAMPLE)
* Person Number (PERNUM)
* Identifier (IDENT)
* Person Identifier (PID)
* Person Serial Number (SERIAL)
* Year Diary Kept (YEAR)
* Recommended Sample (Day) Weight Removing Low Quality Diaries and Missing Age or Sex (RECWGHT)
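
One way to separate the two lists programmatically is to compare the column names against the variables we requested.  This is a minimal sketch: *extract_cols* is a hand-typed stand-in for `colnames(dat)` so the example runs on its own, and the column order shown is illustrative.

```r
# variables requested in the extract definition
selected_vars <- c("STATE", "AGE", "SEX", "HHTYPE", "WKHRS", "NWORK")

# stand-in for colnames(dat); the order here is illustrative
extract_cols <- c("SAMPLE", "PERNUM", "IDENT", "PID", "SERIAL", "YEAR",
                  "RECWGHT", selected_vars)

# anything we did not request was added by IPUMS
preselected_vars <- setdiff(extract_cols, selected_vars)
preselected_vars
length(preselected_vars)
```

With the real data, replace *extract_cols* with `colnames(dat)`.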

### 5e. Save the Data

Finally, let's save the data we extracted from IPUMS AHTUS.  We will save the data in the following two formats:

* A *.rds* version of the data.  The **R Data Serialization (RDS)** format will retain metadata for the next time we want to import the file back into R.  One downside to the .rds format is that it is only usable within R.
* A *.csv* version of the data.  The [**Comma-Separated Values (CSV)**](https://en.wikipedia.org/wiki/Comma-separated_values) format is versatile and can be easily accessed in other programs.  However, the CSV file format does not include metadata such as labels for variable levels.

**★ Pro Tip:** Setting the *row.names* parameter to *FALSE* in the *write.csv* function will suppress the creation of an additional column of index values which is automatically generated in the CSV writing process.

In [None]:
saveRDS(dat, "ipums_ahtus_example.rds")
write.csv(dat, "ipums_ahtus_example.csv", row.names = FALSE)
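
To see the difference between the two formats, the sketch below writes a small made-up data frame to temporary files and reads it back.  The *toy* object and its "label" attribute are illustrative stand-ins for the extract and its IPUMS variable metadata, not the real data.

```r
# toy stand-in for the extract; the "label" attribute mimics variable metadata
toy <- data.frame(AGE = c(34L, 58L), SEX = c(1L, 2L))
attr(toy$SEX, "label") <- "Sex"

# write both formats to temporary files so nothing in the workspace is overwritten
rds_path <- tempfile(fileext = ".rds")
csv_path <- tempfile(fileext = ".csv")
saveRDS(toy, rds_path)
write.csv(toy, csv_path, row.names = FALSE)

# read both files back in
from_rds <- readRDS(rds_path)
from_csv <- read.csv(csv_path)

attr(from_rds$SEX, "label")  # the attribute survives the RDS round trip
attr(from_csv$SEX, "label")  # NULL: the CSV keeps only the values
```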

At the end of this exercise we have a freshly downloaded dataset from the IPUMS AHTUS repository saved in our workspace.

## 6. IPUMS Multinational Time Use Study (MTUS) Metadata Exploration

From the [**IPUMS Multinational Time Use Study (MTUS) Webpage**](https://www.mtusdata.org/mtus): MTUS is a collection of time diary data from a growing number of countries that are harmonized for compatibility across time and space. IPUMS MTUS is a data extract builder that allows users to create custom time use variables and data extracts for analysis. This project is a collaboration of the Minnesota Population Center, the Maryland Population Research Center and the Centre for Time Use Research at University College London.

### 6a. Review the List of Samples

In [None]:
# retrieve and view the list of samples from the IPUMS MTUS database
metadata_mtus <- get_sample_info("mtus")

# view the dimensions of the list of samples
dim(metadata_mtus)

In [None]:
# view the first few lines of the list of samples
head(metadata_mtus)

In [None]:
# filter the list of samples by country
metadata_mtus %>% filter(str_detect(description, "Finland"))
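
The filter above relies on *dplyr* and *stringr* being loaded.  As a hedged alternative, the same subset can be taken with base R.  The sketch below uses a made-up table with the *name* and *description* columns that *get_sample_info* returns; the sample codes and descriptions other than "fi2009a" are placeholders.

```r
# illustrative stand-in for metadata_mtus; rows are placeholders, not real metadata
toy_samples <- data.frame(
  name = c("fi2009a", "xx0000a"),
  description = c("Finland 2009 (illustrative)", "Elsewhere 0000 (illustrative)")
)

# keep rows whose description contains the literal text "Finland"
finland_samples <- toy_samples[grepl("Finland", toy_samples$description, fixed = TRUE), ]
finland_samples
```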

## 7. IPUMS MTUS Data Extraction Specification and Submission

Once we know the dataset and variable selection we want, we can define our data extraction using the *define_extract_micro* function from the *ipumsr* package.

### 7a. Define the Data Extract

For this example we will use the Finland 2009 sample (fi2009a).

**Variable Selection**
* Region: Finland (REGION_FI)
* Age (AGE)
* Sex (SEX)
* Sector of Employment (SECTOR)
* Hours Paid Work Last Week Including Overtime (WORKHRS)
* Has Disability/Long-Term Health Condition (DISAB)

By default, the data extraction will also include a number of IPUMS preselected variables.  These variables include metainformation such as identification codes and survey weights.  We will explore and list the preselected variables after completing the data extraction.

The *define_extract_micro* function requires the following parameters:

* **collection** Code for the IPUMS collection represented by this extract request.  In our case we are downloading from IPUMS MTUS so we use the code "mtus".
* **description** Text description of the extract.
* **samples** Vector of samples to include in the extract request.  In our case we are downloading the Finland 2009 sample (fi2009a).
* **variables** Vector of variable names or a list of detailed variable specifications to include in the extract request.

In [None]:
# set up the data extraction definition
extract_definition <- define_extract_micro(collection = "mtus",
                                           description = "IPUMS MTUS Data Extraction",
                                           samples = c("fi2009a"),
                                           variables = c("REGION_FI", "AGE", "SEX", "SECTOR", "WORKHRS", "DISAB"))

In [None]:
# review the extraction definition
extract_definition

### 7b. Submit the Extract Request

In [None]:
# submit extraction request
extract_submitted <- submit_extract(extract_definition)

# wait for completion
extraction_complete <- wait_for_extract(extract_submitted)

# check completion status
extraction_complete$status

# download the extract and store the filepath of the DDI codebook
filepath <- download_extract(extract_submitted, overwrite = TRUE)
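
Submitting an extract requires a registered IPUMS API key.  The sketch below checks for one before any request is sent.  It assumes the key is stored in the *IPUMS_API_KEY* environment variable, which is where the *ipumsr* API functions look by default; the *key_is_set* name is introduced here for illustration.

```r
# ipumsr's API functions default to the key stored in IPUMS_API_KEY
api_key <- Sys.getenv("IPUMS_API_KEY")
key_is_set <- nzchar(api_key)

if (key_is_set) {
  message("IPUMS API key found; extract requests can be submitted.")
} else {
  message("No IPUMS API key found; register one with set_ipums_api_key() first.")
}
```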

### 7c. Review the Extract

The data extract download will contain the following two files.

1. A [DDI (Data Documentation Initiative)](https://ddialliance.org) codebook file (file extension .xml) containing metadata and descriptive information for your data.
2. A zipped data (.dat) file (file extension .dat.gz) containing your data.

Read the ddi and data files into a format which we can work with in R.

In [None]:
ddi <- read_ipums_ddi(filepath)
dat <- read_ipums_micro(ddi)

In [None]:
dim(dat)

In [None]:
head(dat)

In [None]:
colnames(dat)

**Variable Selection**
* Region: Finland (REGION_FI)
* Age (AGE)
* Sex (SEX)
* Sector of Employment (SECTOR)
* Hours Paid Work Last Week Including Overtime (WORKHRS)
* Has Disability/Long-Term Health Condition (DISAB)

**IPUMS Preselected Variables**
* Person Serial Number (SERIAL)
* Sample (SAMPLE)
* Identifier (IDENT)
* Country or Region of Survey (COUNTRY)
* Household ID (HLDID)
* Person/Diarist Identifier (PERSID)
* Diary Order (DIARY)
* Year Diary Kept (YEAR)
* Diary Identifier (DIARYID)
* Person Number (PERNUM)
* Proposed Weight (PROPWT)

### 7d. Save the Data

Next, let's save two versions of our IPUMS MTUS data file.

* A *.rds* version of the data.  The **R Data Serialization (RDS)** format will retain metadata for the next time we want to import the file back into R.  One downside to the .rds format is that it is only usable within R.
* A *.csv* version of the data.  The [**Comma-Separated Values (CSV)**](https://en.wikipedia.org/wiki/Comma-separated_values) format is versatile and can be easily accessed in other programs.  However, the CSV file format does not include metadata such as labels for variable levels.

**★ Pro Tip:** Setting the *row.names* parameter to *FALSE* in the *write.csv* function will suppress the creation of an additional column of index values which is automatically generated in the CSV writing process.

In [None]:
saveRDS(dat, "ipums_mtus_example.rds")
write.csv(dat, "ipums_mtus_example.csv", row.names = FALSE)

## Recommended Next Steps
* **Continue with Chapter 2: IPUMS Data Acquisition and Extraction**
  * 2.1: IPUMS USA Data Extraction Using ipumsr
  * 2.2: IPUMS CPS Data Extraction Using ipumsr
  * 2.3: IPUMS International Microdata Extraction Using ipumsr
  * 2.4: IPUMS NHGIS Data Extraction Using ipumsr
  * 2.6: IPUMS Health Surveys Data Extraction Using ipumsr
  * 2.7: Reading IPUMS Global Health Data Extracts Using ipumsr
  * 2.8: Reading IPUMS Higher Education Data Extracts Using ipumsr

## Quick Code
Don't forget to update the code with your IPUMS API key!