7. Organizing, preparing and processing metadata

Shelley Staples edited this page Nov 30, 2021 · 8 revisions

Organizing, preparing, and processing metadata are steps in corpus building that ultimately produce one or more metadata spreadsheets containing information about the texts and the participants who created them (e.g., course, assignment, student country of origin, student TOEFL scores). As you work through these steps, consider which participants you have information about (e.g., instructors, students, interviewers). You might also have information on other contextual variables, such as the courses in which assignments were completed or the length of a timed exam.

This information is helpful to have on its own, as a part of your dataset that keeps track of information about participants. A metadata spreadsheet can also be used to add headers to files, to change filenames, and as an aid to the deidentification process. The metadata may be gathered from your university’s registrar or from a survey that participants take. Alternatively, if your filenames already contain metadata, you can create a metadata spreadsheet by extracting the information from the filenames. We will focus on the first case, in which metadata is gathered from an external source such as the registrar or a survey.
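
For the alternative case, where filenames already encode metadata, the extraction could look roughly like the sketch below. It is not part of the CIABATTA scripts. The filename convention (`course_assignment_country_studentID.txt`), the column names, and the function names are all hypothetical and would need to be adapted to your own corpus.

```python
import csv
import io
from pathlib import Path

# Hypothetical filename convention: course_assignment_country_studentID.txt
# Adjust FIELDS to match however your own filenames are structured.
FIELDS = ["course", "assignment", "country", "student_id"]


def parse_filename(filename):
    """Split one filename into a metadata row, or return None if it does not fit."""
    parts = Path(filename).stem.split("_")
    if len(parts) != len(FIELDS):
        return None
    row = dict(zip(FIELDS, parts))
    row["filename"] = filename
    return row


def build_metadata_rows(filenames):
    """Return (rows, skipped): parsed rows plus the filenames that did not match."""
    rows, skipped = [], []
    for name in filenames:
        row = parse_filename(name)
        if row is None:
            skipped.append(name)
        else:
            rows.append(row)
    return rows, skipped


def write_metadata_csv(rows, out):
    """Write the rows as CSV to any open text stream (file or StringIO)."""
    writer = csv.DictWriter(out, fieldnames=["filename"] + FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Invented example filenames, for illustration only.
    rows, skipped = build_metadata_rows(["106_LN_CHN_1234.txt", "notes.txt"])
    buffer = io.StringIO()
    write_metadata_csv(rows, buffer)
    print(buffer.getvalue())
    print("Skipped:", skipped)
```

Files that do not match the expected pattern are collected rather than silently dropped, so you can review them by hand before relying on the spreadsheet.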

Next steps

First, we provide guidance on gathering and preparing metadata from various sources in 7a. Gathering and preparing metadata. Next, we provide a script to combine metadata into one spreadsheet in 7b. Running the metadata processing script. You will add the metadata to your files in 8. Adding headers and changing filenames.

Navigating CIABATTA

Previous: 6b. Manually converting your data

Next: 7a. Gathering and preparing metadata