Vincent Major edited this page Nov 16, 2016 · 5 revisions

Welcome to the OLD MIMIC-ICU-filter-modules wiki!

New documentation is coming soon!

This wiki describes the main modules and how they are applied to the MIMIC-III database. Each module is written to be reusable and transferable to other use cases. The surrounding code that ties the modules together can be written in your language of choice, but the modules themselves are written in R.

a_diagnosis_groups_by_icd9_list

Script that filters DIAGNOSIS_DATA_TABLE.csv by a vector of ICD-9 codes.

Usage:

a_diagnosis_groups_by_icd9_list = function(icd9.codes.vector, path.raw)

Where,

  • icd9.codes.vector is a data.frame or vector of ICD-9 codes in either format, e.g. 123.45 or 12345. Codes with fewer than three characters are left-padded with zeros, e.g. 1 --> 001. Matching is hierarchical, so a code of 123 includes any code within the range 123.01-123.99.
  • path.raw is the subdirectory that contains the raw DIAGNOSIS_DATA_TABLE.csv file, e.g. "raw"

Output is the DIAGNOSIS table subsetted to the rows that include ICD-9 codes within the given list. The SUBJECT_ID or HADM_ID can then easily be extracted from the output data.frame.
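
The padding and hierarchical matching described above can be sketched as follows. This is a minimal illustration, not the module's own code: the helper names normalise_icd9 and icd9_matches, the toy table, and its ICD9_CODE column are assumptions. Hierarchy is approximated by prefix matching on undotted codes.

```r
# Hypothetical helper (not from the module): convert ICD-9 codes to the
# undotted form, padding the part before the decimal point to three characters.
normalise_icd9 <- function(codes) {
  codes <- trimws(as.character(codes))
  parts <- strsplit(codes, ".", fixed = TRUE)
  vapply(parts, function(p) {
    major <- p[1]
    minor <- if (length(p) > 1) p[2] else ""
    if (nchar(major) < 3) {
      major <- paste0(strrep("0", 3 - nchar(major)), major)
    }
    paste0(major, minor)
  }, character(1), USE.NAMES = FALSE)
}

# Hypothetical helper: TRUE where a table code starts with any wanted code,
# so a wanted code of "123" also selects "12345".
icd9_matches <- function(table_codes, wanted) {
  table_codes <- normalise_icd9(table_codes)
  wanted <- normalise_icd9(wanted)
  vapply(table_codes, function(code) any(startsWith(code, wanted)),
         logical(1), USE.NAMES = FALSE)
}

# Toy stand-in for the diagnosis table (column names are assumptions).
diagnoses <- data.frame(
  HADM_ID = c(10L, 11L, 12L),
  ICD9_CODE = c("12345", "4280", "0010"),
  stringsAsFactors = FALSE
)
subset_rows <- diagnoses[icd9_matches(diagnoses$ICD9_CODE, c("123", 1)), ]
```

HADM_ID can then be read straight from subset_rows$HADM_ID.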

b_ef_regex_from_raw_note

A script that, given the raw free-text echocardiogram report, first searches for numeric instances of LVEF = X%. If none are found, it falls back to searching for mentions of severely, moderately or mildly reduced LV systolic function/hypokinesis, or of normal LVEF function.

Usage:

b_ef_regex_from_raw_note = function (temp_str)

Where,

  • temp_str is the large string containing the raw echocardiogram note with all grammar, newlines etc included.

Output is a vector of the numeric EF values recovered (I suggest subsequently taking its min and max to obtain the extremes). For text-based severe/moderate/mild matches, the values instead represent the extremes of the clinically defined ranges.
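
As an illustration of the first, numeric step only, here is a minimal sketch. The function name extract_lvef_numeric and the regular expression are assumptions, not the pattern used by b_ef_regex_from_raw_note, and the text-based fallback is not shown.

```r
# Hypothetical sketch: pull numeric EF values such as "LVEF = 55%" or
# "ejection fraction is 30-35 %" out of a raw note.
extract_lvef_numeric <- function(temp_str) {
  # Flatten line breaks so a phrase split across lines can still match.
  flat <- gsub("[\r\n]+", " ", temp_str)
  # Keyword, up to 20 non-digit characters, one or two digits,
  # an optional second value ("30-35" or "30 to 35"), then a percent sign.
  pattern <- paste0(
    "(?i)(?:LVEF|ejection fraction)[^0-9%]{0,20}",
    "[0-9]{1,2}(?:\\s*(?:-|to)\\s*[0-9]{1,2})?\\s*%"
  )
  hits <- regmatches(flat, gregexpr(pattern, flat, perl = TRUE))[[1]]
  if (length(hits) == 0) {
    return(numeric(0))
  }
  # The prefix cannot contain digits, so every number in a hit is an EF value.
  as.numeric(unlist(regmatches(hits, gregexpr("[0-9]+", hits))))
}

ef_values <- extract_lvef_numeric("LV systolic function is reduced (LVEF 30-35%).")
ef_extremes <- range(ef_values)  # min and max, as suggested above
```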

b_ef_binning_3210

A function to take in numeric values of LVEF min and max and bin them into the clinically defined ranges

  • severe (3) EF < 30
  • moderate (2) 30 <= EF < 40
  • mild (1) 40 <= EF <= 50
  • normal (0) 50 < EF

Usage:

b_ef_binning_3210 = function(EFmin, EFmax)

Where,

  • EFmin is the minimum EF obtained from the b_ef_regex_from_raw_note function
  • EFmax is the maximum EF obtained from the b_ef_regex_from_raw_note function

Output is the integer 0, 1, 2, or 3 representing normal EF, mild, moderate or severely depressed LVEF/hypokinesis.
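
The ranges above translate into a small lookup. The sketch below bins a single EF value; how b_ef_binning_3210 combines EFmin and EFmax into one bin is not stated here, so the name bin_ef and the single-value interface are assumptions.

```r
# Hypothetical sketch: map EF values (in percent) to the 3/2/1/0 bins.
bin_ef <- function(ef) {
  ifelse(ef < 30, 3L,          # severe
  ifelse(ef < 40, 2L,          # moderate: 30 <= EF < 40
  ifelse(ef <= 50, 1L,         # mild: 40 <= EF <= 50
                   0L)))       # normal: EF > 50
}

bins <- bin_ef(c(25, 30, 39.9, 40, 50, 50.1))
```

A caller could, for example, pass EFmin to obtain the worst-case bin for a report.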

c_string_based_admission

Function that uses two predefined lists of strings to include and exclude patients by their unstructured string-based admission diagnosis.

Usage:

c_string_based_admission = function (ADMISS, in_list, ex_list)

Where,

  • ADMISS is a data.frame subset of the raw ADMISSIONS table that includes the DIAGNOSIS field.
  • in_list is a character vector of inclusion terms to match on
  • ex_list is a character vector of exclusion terms to match on

Output is a list of HADM_ID vectors for included, missed and excluded in that order i.e. list(included_HADM_ID,missed_HADM_ID,excluded_HADM_ID)
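
A minimal sketch of this kind of include/exclude matching is shown below. It rests on assumptions that are not confirmed by the module: matching is case-insensitive substring matching, exclusion terms take precedence over inclusion terms, and "missed" means rows that match neither list. The name classify_admissions is hypothetical.

```r
# Hypothetical sketch of string-based inclusion/exclusion on DIAGNOSIS.
classify_admissions <- function(ADMISS, in_list, ex_list) {
  diagnosis <- toupper(ADMISS$DIAGNOSIS)
  # TRUE where the diagnosis contains at least one of the terms.
  matches_any <- function(terms) {
    if (length(terms) == 0) {
      return(rep(FALSE, length(diagnosis)))
    }
    Reduce(`|`, lapply(toupper(terms), function(term) {
      grepl(term, diagnosis, fixed = TRUE)
    }))
  }
  is_excluded <- matches_any(ex_list)
  is_included <- matches_any(in_list) & !is_excluded  # assumption: exclusion wins
  is_missed <- !is_included & !is_excluded            # assumption: matched neither list
  list(
    included = ADMISS$HADM_ID[is_included],
    missed = ADMISS$HADM_ID[is_missed],
    excluded = ADMISS$HADM_ID[is_excluded]
  )
}

# Toy stand-in for a subset of the ADMISSIONS table.
ADMISS <- data.frame(
  HADM_ID = c(1L, 2L, 3L),
  DIAGNOSIS = c("CONGESTIVE HEART FAILURE", "SEPSIS", "CHF;TRAUMA"),
  stringsAsFactors = FALSE
)
groups <- classify_admissions(ADMISS, c("heart failure", "CHF"), c("trauma"))
```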

d_past_medical_history_regex_raw_note

A function that takes the raw discharge notes as a data.frame subset of the raw NOTE table with CATEGORY == "Discharge". The input data.frame must have at least the HADM_ID and TEXT fields. A regex is used to extract the Past Medical History section from each raw discharge note. Unfortunately, instances of 'Prior Medical History' as well as 'HISTORY OF PRESENT ILLNESS' also occur and are included.

Usage:

d_past_medical_history_regex_raw_note = function(NOTE_HADM_from_file)

Where,

  • NOTE_HADM_from_file is a data.frame subset of the raw NOTE table with every row $CATEGORY == "Discharge".

Output is a data.frame of HADM_ID and Past Medical History snippets that can subsequently be regexed for strings of interest.
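
One way to picture the extraction is the sketch below. It is not the module's regex: it matches only the header 'Past Medical History', and it assumes that a section ends at the first blank line, which may not hold for every note. The names extract_pmh_one and extract_pmh, and the output column PMH, are hypothetical.

```r
# Hypothetical sketch: return the text between a "Past Medical History"
# header and the next blank line, or NA if the header is absent.
extract_pmh_one <- function(text) {
  header <- regexpr("past medical history:?", text, ignore.case = TRUE)
  if (is.na(header) || header == -1) {
    return(NA_character_)
  }
  rest <- substring(text, header + attr(header, "match.length"))
  rest <- sub("^\\s+", "", rest)                  # drop whitespace after the header
  section <- strsplit(rest, "\n[ \t]*\n")[[1]][1]  # assumption: blank line ends the section
  trimws(section)
}

extract_pmh <- function(NOTE_HADM_from_file) {
  data.frame(
    HADM_ID = NOTE_HADM_from_file$HADM_ID,
    PMH = vapply(NOTE_HADM_from_file$TEXT, extract_pmh_one,
                 character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Toy notes for illustration only.
notes <- data.frame(
  HADM_ID = c(100L, 101L),
  TEXT = c(
    "Chief Complaint: dyspnea\n\nPast Medical History:\nHypertension\nDiabetes\n\nSocial History:\nNone",
    "No relevant sections"
  ),
  stringsAsFactors = FALSE
)
pmh <- extract_pmh(notes)
has_diabetes <- grepl("diabetes", pmh$PMH, ignore.case = TRUE)
```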

e_CHARTEVENTS_OASIS_extraction

A function that takes each separate OASIS parameter from a data.frame subset of CHARTEVENTS and extracts the extreme and median values over the first 24 hours, for later use in e_chart_parameters_to_oasis_scores. It takes as input the subsetted CHARTEVENTS table, the variable to inspect, and the ICUIN table for the time of ICU admission.

Usage:

e_CHARTEVENTS_OASIS_extraction = function(CHARTEVENTS, variable, ICUIN)

Where,

  • CHARTEVENTS is a data.frame subset of raw CHARTEVENTS of the patients of interest for only the parameter of interest (for the sake of memory).
  • variable is a string of the OASIS variable to inspect. Must be one of c('FiO2', 'HR', 'GCS', 'MAP', 'RR', 'Temp').
  • ICUIN is the table that supplies the time of ICU admission.

Output is a data.frame of the min, max and median values of the OASIS parameter, which can then be passed to e_chart_parameters_to_oasis_scores.
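
The first-24-hour summary can be sketched as below for a single parameter. The column names (ICUSTAY_ID, CHARTTIME, VALUENUM, INTIME), the layout of ICUIN, the inclusive 0 to 24 hour window, and the name summarise_first_24h are all assumptions made for illustration; any variable-specific handling in the real function is not shown.

```r
# Hypothetical sketch: min, max and median of one charted parameter over
# the first 24 hours after ICU admission, per ICU stay.
summarise_first_24h <- function(CHARTEVENTS, ICUIN) {
  merged <- merge(CHARTEVENTS, ICUIN, by = "ICUSTAY_ID")
  hours <- as.numeric(difftime(merged$CHARTTIME, merged$INTIME, units = "hours"))
  keep <- !is.na(hours) & hours >= 0 & hours <= 24 & !is.na(merged$VALUENUM)
  first_day <- merged[keep, , drop = FALSE]
  groups <- split(as.numeric(first_day$VALUENUM), first_day$ICUSTAY_ID)
  data.frame(
    ICUSTAY_ID = as.integer(names(groups)),
    min = vapply(groups, min, numeric(1)),
    max = vapply(groups, max, numeric(1)),
    median = vapply(groups, median, numeric(1)),
    row.names = NULL
  )
}

# Toy data: one stay, with the last heart-rate reading at hour 30.
t0 <- as.POSIXct("2101-01-01 00:00:00", tz = "UTC")
ICUIN <- data.frame(ICUSTAY_ID = 1L, INTIME = t0)
CHARTEVENTS <- data.frame(
  ICUSTAY_ID = c(1L, 1L, 1L, 1L),
  CHARTTIME = t0 + 3600 * c(1, 5, 12, 30),
  VALUENUM = c(80, 100, 90, 150)
)
hr_summary <- summarise_first_24h(CHARTEVENTS, ICUIN)
```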

e_chart_parameters_to_oasis_scores

A function to take in raw OASIS parameter values, as a table, and translate into scores based on the paper, Johnson et al. 2013, A New Severity of Illness Scale Using a Subset of Acute Physiology and Chronic Health Evaluation Data Elements Shows Comparable Predictive Accuracy.

Usage:

e_chart_parameters_to_oasis_scores = function(OASIS_table)

Where,

  • OASIS_table is a table of the OASIS score parameters, at least min and max. Fields that are required are age, GCS, HRmin, HRmax, MAPmin, MAPmax, RRmin, RRmax, Tmin, Tmax, urine, FiO2min, and elective_flag.

Output is a table of scores based on the thresholds from the Johnson 2013 paper.

f_fluids_administered_summation

A script that sums a patient's fluids in, fluids out, and urine out over a specified timerange, with options for the minimum volume to include and for whether to normalize when the time interval over which fluids in are recorded is less than the specified timerange.

Usage:

f_fluids_administered_summation = function(temp_INS, temp_OUTS, temp_urine, temp_ICUSTAY_INTIME, timerange, min_volume_to_include, normalize)

Where,

  • temp_INS - table of INS for one HADM_ID to include all of
  • temp_OUTS - table of OUTS for one HADM_ID to include all of
  • temp_urine - table of urine OUTS for one HADM_ID to include all of
  • temp_ICUSTAY_INTIME - ICUSTAY$INTIME of the individual, the start time from which timerange extends
  • timerange - the time interval to sum fluids over.
  • min_volume_to_include - If a minimum fluid volume is desired, in mL. For example, large intravenous fluids will all be > 250 mL
  • normalize - TRUE/FALSE value to determine whether or not to normalize the fluids over time for instances where fluids are given for times shorter than timerange.

Output is the summed fluids in, out and urine out over the timerange respectively as a list i.e. list(Vol_in, Vol_out, urine_out)
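
A rough sketch of the summation follows. Much of it is assumed rather than taken from the module: each table is taken to have CHARTTIME (POSIXct) and VOLUME (mL) columns, timerange is taken to be in hours, the minimum volume is applied to fluids in only, and normalization is read as scaling fluids in by timerange divided by the hours up to the last recorded input. The name sum_fluids is hypothetical.

```r
# Hypothetical sketch of summing fluids over a window after ICU admission.
sum_fluids <- function(temp_INS, temp_OUTS, temp_urine, temp_ICUSTAY_INTIME,
                       timerange, min_volume_to_include = 0, normalize = FALSE) {
  hours_since <- function(tbl) {
    as.numeric(difftime(tbl$CHARTTIME, temp_ICUSTAY_INTIME, units = "hours"))
  }
  # Keep rows charted between admission and admission + timerange hours.
  in_window <- function(tbl) {
    h <- hours_since(tbl)
    tbl[!is.na(h) & h >= 0 & h <= timerange, , drop = FALSE]
  }
  ins <- in_window(temp_INS)
  # Assumption: the volume threshold applies to fluids in only.
  ins <- ins[!is.na(ins$VOLUME) & ins$VOLUME >= min_volume_to_include, , drop = FALSE]
  Vol_in <- sum(ins$VOLUME)
  if (normalize && nrow(ins) > 0) {
    span <- max(hours_since(ins))  # hours from admission to the last input
    # Assumption: scale up when inputs cover less than the full timerange.
    if (span > 0 && span < timerange) {
      Vol_in <- Vol_in * timerange / span
    }
  }
  list(
    Vol_in = Vol_in,
    Vol_out = sum(in_window(temp_OUTS)$VOLUME, na.rm = TRUE),
    urine_out = sum(in_window(temp_urine)$VOLUME, na.rm = TRUE)
  )
}

# Toy data for one admission.
intime <- as.POSIXct("2101-01-01 00:00:00", tz = "UTC")
INS <- data.frame(CHARTTIME = intime + 3600 * c(2, 6, 30), VOLUME = c(500, 100, 1000))
OUTS <- data.frame(CHARTTIME = intime + 3600 * 3, VOLUME = 200)
URINE <- data.frame(CHARTTIME = intime + 3600 * 4, VOLUME = 150)
totals <- sum_fluids(INS, OUTS, URINE, intime, timerange = 24,
                     min_volume_to_include = 250)
```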