
Importing Data to REDCap

Shawn Garbett edited this page Jan 25, 2023 · 1 revision

My initial experience with importing data to REDCap through the API was horrific. REDCap is strict about what it will accept for data, and in most cases it only accepts coded data. When I received labelled data that needed to be imported to REDCap, I found it quite time-consuming to encode the data properly so that REDCap would take it. More frustrating, the process felt prone to error: I might mistakenly assign the wrong codes to the data, thereby corrupting it.

Illustrating the Danger

Suppose I have received data that is going to be used in a chart review. I have extracted surgical procedures from a database of medical records, and wish to import the surgical modality. The surgical modality is classified in REDCap as

a, Abdominal

b, Laparoscopic

c, Robotic

d, Single Port Robotic

If I try to pass the value 'Abdominal' to REDCap, the API returns the message "'Abdominal' is not a valid category." In the process of trying to convert from labelled to coded data, I could mistakenly reverse the codes for Abdominal and Laparoscopic and not realize my mistake until well into analysis.

It seemed to me that the import process was very machine-oriented, and I wanted a human-oriented process.

Making Imports Work for Humans

The underlying principle for the import method is that the data should be easy for a human to understand when submitted for import. It is easier to understand 'Robotic' than it is to understand 'c'. I also wanted the method to handle coded data seamlessly, since I anticipate receiving coded data as well.

In the end, importRecords is much more relaxed about what data it will accept while still meeting the rigorous standards of the API. For the surgical modality example, we may pass any of the values in the definition, coded or labelled, and know that the correct value is imported to REDCap. The data we pass may even be a mix of coded and labelled values.

This flexible and mixed coding is especially advantageous in the case of checkboxes, which will accept the values '0', '1', 'Checked', 'Unchecked', or the labelled value from the data dictionary (this labelled value can be returned from the API using the checkboxLabel argument with REDCap version 6.0).

In instances where the package validation cannot decide what the appropriate code is, the value is changed to NA, which prevents any value from going into REDCap. Thus, you should always review the log for data that may have failed to import.

Data validation details

The following details are also available in R via ?validateImport

Although the log messages will indicate a preference for dates to be in mm/dd/yyyy format, the function will accept mm/dd/yy, yyyy-mm-dd, yyyy/mm/dd, and yyyymmdd formats as well. When possible, pass dates as Date objects or POSIXct objects to avoid confusion. Dates are also compared to minimum and maximum values listed in the data dictionary. Records where a date is found out of range are allowed to import and a message is printed in the log.
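The date handling described above can be sketched as follows. This is only an illustration in Python, not the package's R implementation: the names `DATE_FORMATS`, `parse_date`, and `check_date` are hypothetical, and returning `None` for unreadable input is a choice made for this sketch. It tries each of the listed formats in turn and logs, but still keeps, a date that falls outside the allowed range.

```python
from datetime import date, datetime

# Formats named in the text, tried in order (hypothetical helper, not redcapAPI code).
DATE_FORMATS = ("%m/%d/%Y", "%m/%d/%y", "%Y-%m-%d", "%Y/%m/%d", "%Y%m%d")


def parse_date(value):
    """Return a date for any of the accepted formats, or None if none match."""
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(str(value).strip(), fmt).date()
        except ValueError:
            continue
    return None


def check_date(value, minimum=None, maximum=None, log=None):
    """Parse a date; out-of-range values are kept and only noted in the log."""
    log = log if log is not None else []
    parsed = parse_date(value)
    if parsed is None:
        log.append(f"{value!r} could not be read as a date")
    elif (minimum and parsed < minimum) or (maximum and parsed > maximum):
        log.append(f"{parsed.isoformat()} is outside the allowed range")
    return parsed, log
```

Passing real date objects skips the format guessing entirely, which is why they are the safer input.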

For continuous/numeric variables, the values are checked against the minimum and maximum allowed in the data dictionary. Records where a value is found out of range are allowed to import and a message is printed in the log.

ZIP codes are tested to see if they fit either the 5 digit or 5 digit + 4 format. When these conditions are not met, the data point is deleted and a message is printed in the log.
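A minimal sketch of this check, assuming the "5 digit + 4" form is written with a hyphen (for example 37232-0001). The pattern and the `validate_zip` helper are hypothetical, written in Python for illustration rather than taken from the package.

```python
import re

# Five digits, optionally followed by a hyphen and four more digits (assumed layout).
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def validate_zip(value, log):
    """Return the ZIP code if it fits either format; otherwise log it and return None."""
    text = str(value).strip()
    if ZIP_PATTERN.fullmatch(text):
        return text
    log.append(f"{text!r} is not a valid ZIP code and was removed")
    return None
```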

YesNo fields permit any of the values 'yes', 'no', '0', '1' to be imported to REDCap, with 0=No and 1=Yes. The values are converted to lower case for validation, so any combination of lower and upper case values will pass (i.e., the validation is not case-sensitive).

TrueFalse fields will accept 'TRUE', 'FALSE', 0, 1, and logical values and are also not case-sensitive.
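The case-insensitive handling of YesNo and TrueFalse fields might look something like the sketch below. The lookup tables and function names are hypothetical, and coding TRUE as 1 and FALSE as 0 is an assumption made by analogy with the YesNo coding above.

```python
# Lower-cased lookups; keys are compared after str(value).strip().lower().
YES_NO = {"yes": 1, "no": 0, "1": 1, "0": 0}
TRUE_FALSE = {"true": 1, "false": 0, "1": 1, "0": 0}  # assumed coding: 1=True, 0=False


def encode_yes_no(value):
    """Return 1 or 0 for an accepted YesNo value, or None when it is not recognised."""
    return YES_NO.get(str(value).strip().lower())


def encode_true_false(value):
    """Return 1 or 0 for an accepted TrueFalse value (text, number, or logical)."""
    return TRUE_FALSE.get(str(value).strip().lower())
```

Lower-casing before the lookup is what makes 'Yes', 'YES', and 'yes' equivalent.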

Radio and dropdown fields may contain either the codes or the labels from the data dictionary. The validation uses the metadata to convert any matching values to the appropriate code before importing to REDCap. Values that cannot be reconciled are deleted with a message printed in the log. These variables are case-sensitive.
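To make the idea concrete, here is a small sketch of reconciling a mixed column of codes and labels against the surgical modality choices from earlier on this page. It is written in Python purely for illustration; `MODALITY_CHOICES` and `encode_choices` are hypothetical names, and `None` stands in for a deleted (missing) value.

```python
# Code -> label, as the choices appear in the data dictionary example.
MODALITY_CHOICES = {
    "a": "Abdominal",
    "b": "Laparoscopic",
    "c": "Robotic",
    "d": "Single Port Robotic",
}


def encode_choices(values, choices, log):
    """Convert labels to codes, keep valid codes, and blank anything unmatched."""
    label_to_code = {label: code for code, label in choices.items()}
    encoded = []
    for position, value in enumerate(values):
        if value in choices:            # already a valid code
            encoded.append(value)
        elif value in label_to_code:    # exact, case-sensitive label match
            encoded.append(label_to_code[value])
        else:                           # cannot be reconciled: drop and report
            log.append(f"row {position}: {value!r} is not a valid code or label")
            encoded.append(None)
    return encoded
```

Because the lookup comes from the data dictionary rather than from hand-written recoding, Abdominal and Laparoscopic cannot be swapped by accident.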

Checkbox fields require a value of "Checked", "Unchecked", "0", "1", or the labelled value from the data dictionary. These are case-sensitive. Values that do not match are deleted with a warning printed in the log.
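A rough sketch of the checkbox rule for a single option follows. It assumes that passing the option's own label means the box is checked; that reading, along with the `encode_checkbox` name, is an assumption of this Python illustration and not something confirmed by the package.

```python
# Fixed values accepted for every checkbox option (case-sensitive).
CHECKBOX_FIXED = {"Checked": "1", "Unchecked": "0", "1": "1", "0": "0"}


def encode_checkbox(value, option_label, log):
    """Return '1' or '0' for an accepted value, or None (with a log entry) otherwise."""
    if value in CHECKBOX_FIXED:
        return CHECKBOX_FIXED[value]
    if value == option_label:   # assumed: the option's label means "checked"
        return "1"
    log.append(f"{value!r} is not a valid checkbox value and was removed")
    return None
```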

Phone numbers are required to be 10 digit numbers. The phone number is broken into three parts: 1) a 3 digit area code, 2) a 3 digit exchange code, and 3) a 4 digit station code. The area code must start with a number from 2-9, followed by 0-8, and then any third digit. The exchange code starts with a number from 2-9, followed by any two digits. The station code is 4 digits with no restrictions.
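The three-part rule can be written as one regular expression. The sketch below reads the first restriction (2-9, then 0-8, then any digit) as applying to the area code, and it strips punctuation before matching; both the stripping step and the `is_valid_phone` name are assumptions of this Python illustration.

```python
import re

PHONE_PATTERN = re.compile(
    r"[2-9][0-8][0-9]"   # area code: 2-9, then 0-8, then any digit
    r"[2-9][0-9]{2}"     # exchange code: 2-9, then any two digits
    r"[0-9]{4}"          # station code: any four digits
)


def is_valid_phone(value):
    """Check a phone number after removing every non-digit character."""
    digits = re.sub(r"\D", "", str(value))
    return PHONE_PATTERN.fullmatch(digits) is not None
```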

E-mail addresses are considered valid when they have three parts. The first part comes before the @ symbol, and may be any number of characters from a-z, A-Z, a period, underscore, percent, plus, or minus. The second part comes after the @, but before the final period, and may consist of any number of letters, numbers, periods, or dashes. Finally, the string ends with a period followed by anywhere from 2 to 6 letters.
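Taken literally, the description above corresponds to a pattern like the one in this sketch. It follows the wording as written, so digits are not allowed before the @ because the text does not list them; the package's actual pattern may differ. The names `EMAIL_PATTERN` and `is_valid_email` are hypothetical.

```python
import re

EMAIL_PATTERN = re.compile(
    r"[A-Za-z._%+-]+"    # part 1: letters, period, underscore, percent, plus, minus
    r"@"
    r"[A-Za-z0-9.-]+"    # part 2: letters, numbers, periods, dashes
    r"\.[A-Za-z]{2,6}"   # part 3: a period, then 2 to 6 letters
)


def is_valid_email(value):
    """Return True when the whole string fits the three-part description."""
    return EMAIL_PATTERN.fullmatch(str(value).strip()) is not None
```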