# Missing Data Treatment: A Hands-on Illustration Using the `R` Package `mice`

## 0. A Bird's-eye View of this Session

### 0.1 Learning Intention
This hands-on session gives `R` users a gentle introduction to multiple imputation as a way to address missing data problems, using one widely used `R` package: `mice` (multivariate imputation by chained equations; van Buuren & Groothuis-Oudshoorn, 2011).

### 0.2 Success Criteria
By the end of this session, participants should have gained *interest* and *confidence* in dealing with missing data in their own work.

### 0.3 Learning Structure

## 1. Background

### 1.1 Rationale for Multiple Imputation
Complete-case analysis is valid and unbiased only under a very restrictive condition (MCAR, defined below). Even when that condition holds, removing incomplete cases can cause a large loss in estimation efficiency. In addition, all the resources and effort spent collecting a case go to waste because of a single missing value in that observation. Multiple imputation tries to salvage these imperfect cases by filling the "holes" with "guesses". The uncertainty of our guesses is reflected in the variation of the imputed values: the wider the variation, the less certain we are about our guesses.
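The claim about complete-case analysis can be checked with a small simulation. The sketch below uses base `R` only; the variable names, sample size, and missingness probabilities are illustrative assumptions, not values from this session. It compares the complete-case mean of an outcome when values are deleted completely at random and when deletion depends on an observed covariate.

```r
# Simulated illustration; all names and parameter values are hypothetical.
set.seed(2024)
n <- 10000
x <- rnorm(n)                           # fully observed covariate
y <- 2 + 0.8 * x + rnorm(n, sd = 0.6)   # outcome with true mean 2

# Mechanism 1: every y has the same 40% chance of being missing.
miss_mcar <- rbinom(n, 1, 0.4) == 1

# Mechanism 2: the chance that y is missing increases with the observed x.
miss_mar <- rbinom(n, 1, plogis(-0.5 + 1.5 * x)) == 1

y_mcar <- ifelse(miss_mcar, NA, y)
y_mar  <- ifelse(miss_mar, NA, y)

# Complete-case estimates of the mean of y, next to the full-data mean.
results <- c(
  full    = mean(y),
  cc_mcar = mean(y_mcar, na.rm = TRUE),
  cc_mar  = mean(y_mar, na.rm = TRUE)
)

# Number of cases that survive listwise deletion under each mechanism.
n_kept <- c(mcar = sum(!miss_mcar), mar = sum(!miss_mar))

print(round(results, 3))
print(n_kept)
```

Under the first mechanism the complete-case mean should stay close to the full-data mean, although it is based on far fewer cases. Under the second mechanism it should be pulled downward, because cases with large `x` (and therefore large `y`) are more likely to be dropped.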

### 1.2 Two Approaches to Missing Data Treatment
Joint modelling (JM; Schafer, 1997; `R` package `jomo`) and fully conditional specification (FCS) are the two main approaches to missing data treatment. FCS is also known as multivariate imputation by chained equations (MICE). This session focuses exclusively on the `R` package `mice` by van Buuren and Groothuis-Oudshoorn (2011), currently at version 3.14.0, to demonstrate the power and flexibility of the MICE procedure for handling missing data. Other `R` packages that work with missing data include `Amelia`, `Hmisc`, `jomo`, `mi`, `norm`, `norm2` and `pan`. See Table 5.1 of Kleinke et al. (2020) for a comparison of these packages. Table 6 of Grund et al. (2018) provides specific advice on package use for multilevel models.
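To give a first impression of the FCS workflow in `mice`, here is a minimal sketch, assuming the package is installed. It uses `nhanes`, a small example data set shipped with `mice`; the number of imputations, the imputation method, the seed, and the analysis model are illustrative choices rather than recommendations.

```r
library(mice)

# `nhanes` ships with mice: 25 rows with the variables age, bmi, hyp and chl.
# Tabulate the missing-data patterns without drawing the plot.
md.pattern(nhanes, plot = FALSE)

# Impute m = 5 data sets by chained equations.
# method = "pmm" (predictive mean matching) and the seed are illustrative.
imp <- mice(nhanes, m = 5, method = "pmm", seed = 123, printFlag = FALSE)

# One row per case with missing bmi, one column per imputation.
# The spread across columns reflects the uncertainty about each missing value.
print(imp$imp$bmi)

# Fit the (illustrative) analysis model in each imputed data set,
# then combine the m sets of estimates with Rubin's rules.
fit <- with(imp, lm(chl ~ age + bmi))
pooled <- pool(fit)
print(summary(pooled))
```

The three steps (impute with `mice()`, analyse with `with()`, combine with `pool()`) mirror the usual multiple imputation workflow; `complete(imp, 1)` would return the first completed data set for inspection.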

## 2. Missing Data Mechanisms (Rubin, 1976)

### 2.1 Missing Completely at Random (**MCAR**)

### 2.2 Missing at Random (**MAR**)

### 2.3 Missing not at Random (**MNAR**)

### 2.4 Ignorability


## 9. References
van Buuren, S., & Groothuis-Oudshoorn, K. (2011). `mice`: Multivariate imputation by chained equations in `R`. Journal of Statistical
Software, 45(3), 1–67. <https://doi.org/10.18637/jss.v045.i03>

Grund, S., Lüdtke, O., & Robitzsch, A. (2018). Multiple imputation of missing data for multilevel models: Simulations and
recommendations. Organizational Research Methods, 21(1), 111–149. <https://doi.org/10.1177/1094428117703686>

Kleinke, K., Reinecke, J., Salfrán, D., & Spiess, M. (2020). Applied multiple imputation: Advantages, pitfalls, new developments and
applications in `R`. Springer. <https://doi.org/10.1007/978-3-030-38164-6>

Rose, N. (2013). Item nonresponses in educational and psychological measurement [PhD Thesis, Friedrich-Schiller-Universität
Jena]. Open Access Thesis and Dissertations.
<https://www.db-thueringen.de/servlets/MCRFileNodeServlet/dbt_derivate_00027809/Diss/NormanRose.pdf>

Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3), 581–592. <https://doi.org/10.1093/biomet/63.3.581>

Schafer, J. L. (1997). Analysis of incomplete multivariate data. Chapman & Hall/CRC.