Case Study: Report of Fields With Missing Values

Shawn Garbett edited this page Jan 25, 2023 · 1 revision

Scenario

A research institution conducted a randomized trial to evaluate the effectiveness of a surgical technique. Before beginning the analysis, the research team wanted to know whether data collection was incomplete for any patient and to ensure that all critical fields had been completed. With 155 records, 477 fields, and 6 events, a manual audit of the data was impractical, so the statistician was asked whether it was possible to create a report of all fields containing missing values for each patient-event.

Obstacles

  1. Branching Logic: When branching logic applies, fields do not appear on the data input form unless certain conditions apply. If the field does not appear, we expect the data to be missing and the field should not be included in the report.
  2. Applicability of Forms: Some forms may not apply to all patients. Most notably, the Adverse Event form for a patient is not filled out if the patient did not experience an adverse event. Fields in these scenarios should not be included in the report.
  3. Clinical Interest: Not all fields are of clinical interest to the researchers and may not be vital to the analysis. Only fields of clinical interest should be included in the report.

To complete the request, a function was written that can access the REDCap database and search for missing values. The function returns a data.frame object listing the patient ID, REDCap event name, data access group, number of missing fields, and the list of missing fields.

Obtaining the Function

The missingSummary function is not a formal part of the redcapAPI package, though it makes use of the package functionality. The function can be downloaded as a gist from GitHub. After saving the code to the location gist_location, it can be loaded into the R workspace using

source([gist_location])

Function Arguments

missingSummary is a generic function with an active method for redcapApiConnection objects. The core arguments are:

  • rcon: a REDCap connection object generated by redcapConnection.
  • excludeMissingForms: when TRUE, assumes that a form containing only missing values was intentionally left blank (for example, no adverse events occurred) and leaves its fields off the report.

For redcapApiConnection objects, the user may also specify proj, a redcapProjectInfo object (created by the redcapProjectInfo function), and batch.size, which limits the number of record IDs pulled in any one batch.
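As a rough illustration of what excludeMissingForms = TRUE implies, the sketch below applies the same idea to a toy data frame. The data and the field-to-form lookup form_of are hypothetical and do not come from the gist; the actual function may implement this step differently.

```r
# Toy records: p1 had no adverse event, so the whole adverse event form is blank
records <- data.frame(
  id      = c("p1", "p2"),
  ae_type = c(NA, "rash"),
  ae_date = c(NA, NA),
  weight  = c(NA, "65"),
  height  = c("170", "168"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup of which form each field belongs to
form_of <- c(ae_type = "adverse_event", ae_date = "adverse_event",
             weight = "vitals", height = "vitals")

# Logical matrix: TRUE where a value is missing
is_missing <- is.na(records[, names(form_of)])

# For each form, find rows where every field is missing and clear the flag
for (f in unique(form_of)) {
  cols <- names(form_of)[form_of == f]
  unused <- rowSums(!is_missing[, cols, drop = FALSE]) == 0
  is_missing[unused, cols] <- FALSE
}

# Fields that would still be reported for each record
reported <- unname(apply(is_missing, 1, function(r) {
  paste(colnames(is_missing)[r], collapse = ", ")
}))
```

In this toy case the blank adverse event form for p1 is left off the report, while the partially completed form for p2 still reports ae_date. The trade-off is that a form skipped by mistake is also left off the report.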

Offline Arguments

If using the function in "offline mode" (meaning using the data downloads instead of the API), the rcon argument is replaced by records and meta_data, which take the file paths of the raw/unlabelled data download and the data dictionary, respectively.

Function Procedure

missingSummary operates by doing the following tasks:

  1. Export records from REDCap.
  2. Export metadata from REDCap.
  3. Translate REDCap branching logic to R expressions.
  4. Designate fields excluded by branching logic as non-missing.
  5. Designate fields excluded by unused forms as non-missing.
  6. Apply is.na to all fields.
  7. Produce the summary of results.

This procedure deals with obstacles 1 and 2. Obstacle 3 is dealt with outside of the function, and we will show how to do this later.
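Steps 3 through 7 can be sketched on a toy data set as follows. This is a simplified illustration, not the code from the gist: the data, the helper logicToR, and the summary layout are all hypothetical, and the translation handles only bare [field] references with a single "=" comparison. For brevity it applies is.na first and then clears the flag for hidden fields, which gives the same result as the order listed above.

```r
# Toy records: preg_weeks only appears on the form when sex == "2"
records <- data.frame(
  id         = c("p1", "p2", "p3"),
  sex        = c("1", "2", "2"),
  preg_weeks = c(NA, NA, "12"),
  weight     = c("70", NA, "65"),
  stringsAsFactors = FALSE
)

# Toy data dictionary with REDCap-style branching logic
meta <- data.frame(
  field_name      = c("id", "sex", "preg_weeks", "weight"),
  branching_logic = c("", "", "[sex] = '2'", ""),
  stringsAsFactors = FALSE
)

# Step 3 (simplified): turn "[sex] = '2'" into the R expression "sex == '2'"
logicToR <- function(x) {
  x <- gsub("\\[([a-z0-9_]+)\\]", "\\1", x)        # [field] -> field
  gsub("(?<![<>!=])=(?!=)", "==", x, perl = TRUE)  # lone "=" -> "=="
}

# Step 6: flag every NA value
is_missing <- is.na(records[, meta$field_name])

# Step 4: where branching logic hides a field, treat it as non-missing
for (j in which(meta$branching_logic != "")) {
  expr   <- parse(text = logicToR(meta$branching_logic[j]))
  shown  <- eval(expr, envir = records)
  hidden <- !(shown %in% TRUE)
  is_missing[hidden, meta$field_name[j]] <- FALSE
}

# Step 7: one row per record with the count and names of missing fields
summary_df <- data.frame(
  id        = records$id,
  n_missing = unname(rowSums(is_missing)),
  missing   = unname(apply(is_missing, 1, function(r) {
    paste(colnames(is_missing)[r], collapse = ", ")
  })),
  stringsAsFactors = FALSE
)
```

Here p1 is not reported as missing preg_weeks because the branching logic hides that field, while p2 is reported for both preg_weeks and weight.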

Generating the Summary

The basic summary is generated with the following code:

library(redcapAPI)
library(stringr)
options(redcap_api_url = [REDCAP_API_ADDRESS])
rcon <- redcapConnection(token = [SUPER_SECRET_TOKEN])

Miss <- missingSummary(rcon)

The resulting summary produces the following data frame (only the first six rows are shown)

  patient_id_incl redcap_event_name       redcap_data_access_group n_missing missing
1 10-abc          baseline_arm_1          dag1                     3         reason, partner_satisfaction, lying
2 10-abc          surgery_arm_1           dag1                     0
3 10-abc          6_week_followup_arm_1   dag1                     1         pt_initials_pod_7
4 10-abc          3_month_followup_arm_1  dag1                     8         pain_interfere, reason, partner_satisfaction, overall_satisfaction, incon16, recur_incon17, bulk_agent18, urin_ret19
5 10-abc          6_month_followup_arm_1  dag1                     0
6 10-abc          12_month_followup_arm_1 dag1                     4         limit, reason, partner_satisfaction, incon16

Reducing the results to clinically interesting variables

Not all of the variables in the table above are of concern if they are missing. For instance, if initials are missing from pt_initials_pod_7 at the six-week follow-up, it won't affect the results of the analysis. To reduce this table further to just the clinically meaningful variables, we require additional information from the researchers.

In this case, the researchers provided a list of fields they felt were crucial to gather. With their list of fields, the following code was run:

varsToKeep <- c("hosp_6m", "er_visits_6_m", "clinic_6_m", ...
        [A lot of other field names],
        "prolapse", "sexual", "urinary_incont")                

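# Work with the list of missing fields as character strings
Miss$missing <- as.character(Miss$missing)
Miss$label <- NA

# Note: VAULT is not defined on this page; it is presumably the data frame
# of exported records, whose columns carry the field labels
for (i in seq_len(nrow(Miss))){
  # Split the comma-separated field names and keep only the fields of interest
  tmp <- unlist(str_split(Miss$missing[i], ", "))
  tmp <- tmp[tmp %in% varsToKeep]
  # Look up the label of each remaining field, one per line
  lbl <- if (length(tmp) > 0) paste(Hmisc::label(VAULT[, tmp]), collapse="\r\n") else ""
  # Collapse the remaining field names, one per line ("" if none remain)
  tmp <- if (length(tmp) > 0) paste(tmp, collapse="\r\n") else ""
  Miss$missing[i] <- tmp
  Miss$label[i] <- lbl
}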

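The label column was added for the convenience of the data entry staff, since the field label is probably more familiar to them than the field name. Note that n_missing is not recalculated by this code, so it still counts all missing fields rather than only those retained. The resulting data frame shows: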

  patient_id_incl redcap_event_name       redcap_data_access_group n_missing missing                                          label
1 10-abc          baseline_arm_1          dag1                     3
2 10-abc          surgery_arm_1           dag1                     0
3 10-abc          6_week_followup_arm_1   dag1                     1
4 10-abc          3_month_followup_arm_1  dag1                     8         incon16; recur_incon17; bulk_agent18; urin_ret19 18. Has the patient had surgery for new urinary incontinence since the study procedure ?; 19. Has the patient had surgery for recurrent or persistent urinary incontinence since the study procedure?; 20. Has the patient undergone bulking agent Injection (i.e. Collagen, Contigen, Durasphere, Coaptite) for post-op urinary incontinence since the study procedure?; 21. Does the patient have or has the patient had voiding dysfunction or urinary retention requiring intermittent self-catheritization or surgery (i.e. urethrolysis) since the study procedure. (Do not include catheterization less than 6 weeks postop.)
5 10-abc          6_month_followup_arm_1  dag1                     0
6 10-abc          12_month_followup_arm_1 dag1                     4         incon16                                          18. Has the patient had surgery for new urinary incontinence since the study procedure ?

Note: In the code, I specify the characters \r\n to separate each field and field label. I replaced these with semicolons for the wiki presentation. The \r\n separator is useful when passing a CSV file back to the researchers because it places each field on a separate line within its CSV cell, which makes for easy reading.
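The filtering step and the \r\n separator can be tried on a toy data frame without a REDCap connection. The sketch below is a vectorized base R variant of the loop above, not the code that was actually run: the data, the labels lookup (standing in for Hmisc::label), and the recalculated n_missing column are assumptions made for illustration.

```r
# Toy summary in the same shape as the missingSummary output
Miss <- data.frame(
  patient_id = c("10-abc", "10-abc"),
  missing    = c("reason, incon16, urin_ret19", "pt_initials_pod_7"),
  stringsAsFactors = FALSE
)

# Fields of clinical interest and placeholder labels (hypothetical)
varsToKeep <- c("incon16", "urin_ret19")
labels <- c(incon16 = "Label for incon16", urin_ret19 = "Label for urin_ret19")

# Split each comma-separated list and keep only the fields of interest
keep <- lapply(strsplit(Miss$missing, ", ", fixed = TRUE),
               function(x) x[x %in% varsToKeep])

# One field (or label) per line within each cell
Miss$missing   <- vapply(keep, paste, character(1), collapse = "\r\n")
Miss$label     <- vapply(keep, function(x) paste(labels[x], collapse = "\r\n"),
                         character(1))
Miss$n_missing <- lengths(keep)  # recount after filtering

# write.csv quotes character fields by default, so embedded line breaks
# stay inside a single cell
csv_file <- tempfile(fileext = ".csv")
write.csv(Miss, csv_file, row.names = FALSE)
```

Opening the resulting file in a spreadsheet program should show each retained field on its own line within the cell, although the exact rendering depends on the program used.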