OpenSDP data cleaning tutorials in R for education data analysts
OpenSDP Data Janitor Tutorial (R)

Nearly Unique

This tutorial shows how to implement decision rules in R when cleaning longitudinal data. You will start with a sample data file that is nearly unique at the student and school-year level, and clean each variable until the data is internally consistent.
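
As a rough illustration of what "nearly unique" and a "decision rule" mean in practice, here is a minimal base R sketch. It is not taken from the tutorial itself: the data frame, the column names (sid, school_year, male), and the choice of rule (keep each student's most frequently reported value) are all assumptions made for the example.

```r
# Hypothetical sample data: the goal is one row per student per school year,
# but student 1 has two conflicting rows for 2015.
students <- data.frame(
  sid         = c(1, 1, 1, 2, 2, 3),
  school_year = c(2015, 2015, 2016, 2015, 2016, 2015),
  male        = c(1, 0, 1, 0, 0, 1)
)

# Count rows that repeat an earlier student and school year combination.
key <- students[, c("sid", "school_year")]
n_duplicate_rows <- sum(duplicated(key))

# Illustrative decision rule (an assumption, not the tutorial's rule):
# give each student the value reported most often across that student's rows.
# table() sorts by value, so ties resolve to the smaller value.
modal_value <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}
students$male <- ave(students$male, students$sid, FUN = modal_value)

# Rows that are now exact duplicates can be dropped.
cleaned <- unique(students)

# Confirm the result is unique at the student and school year level.
is_unique <- !any(duplicated(cleaned[, c("sid", "school_year")]))
```

The same pattern (check uniqueness, apply a rule to one variable, drop exact duplicates, check again) would be repeated for each variable, with a rule chosen to suit that variable.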

This tutorial is an R Markdown Notebook; to run it you will need RStudio version 1.0 or higher. Download and unzip the files. Then open the .Rmd file in RStudio and click the "Preview Notebook" button, or step through the code on your own by running each section and checking the results.

This tutorial was originally authored by the Strategic Data Project.

OpenSDP is an online, public repository of analytic code, tools, and training intended to foster collaboration among education analysts and researchers in order to accelerate the improvement of our school systems. The community is hosted by the Strategic Data Project, an initiative of the Center for Education Policy Research at Harvard University. We welcome contributions and feedback.