### Hematocrit, D-Dimer and CvO2 (SvO2/vO2)

Several measurements indicative of various kinds of shock are not included in the MIMIC-Code output. In this notebook, we therefore extract these quantities manually from the raw MIMIC-III data files.

In [1]:
import numpy as np
import pandas as pd
from collections import defaultdict
from tqdm import tqdm

CHUNK_SIZE = 10000

### Convenience functions

As the files are on the order of gigabytes, we read and process them incrementally, one chunk at a time.

In [2]:
# reads file from path in chunks of size `chunksize`
def read_csv(path, usecols, chunksize=CHUNK_SIZE):
    for i, chunk in enumerate(pd.read_csv(path, usecols=usecols, encoding='latin1', engine='c', chunksize=chunksize)):
        yield i, chunk.reset_index(drop=True) # resets index so that indices range from 0 to chunksize - 1
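
To illustrate what chunked reading yields, here is a minimal sketch on a small in-memory CSV. The data and the helper name `read_csv_chunks` are made up for this example; the helper only mirrors the idea of `read_csv` above.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a large MIMIC-III table (25 rows)
csv_text = "SUBJECT_ID,ITEMID,VALUENUM\n" + "\n".join(
    f"{i},51480,{30 + i}" for i in range(25)
)

def read_csv_chunks(source, usecols, chunksize):
    # Yield (chunk number, chunk) pairs, each chunk with a fresh 0-based index
    for i, chunk in enumerate(pd.read_csv(source, usecols=usecols, chunksize=chunksize)):
        yield i, chunk.reset_index(drop=True)

# Only one chunk is held in memory at a time; the last chunk may be shorter
chunk_sizes = [
    len(chunk)
    for _, chunk in read_csv_chunks(io.StringIO(csv_text), ['SUBJECT_ID', 'VALUENUM'], chunksize=10)
]
total_rows = sum(chunk_sizes)
print(chunk_sizes, total_rows)
```

Because every chunk is processed independently, any per-chunk result has to be collected (for example in a list) and combined after the loop.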

### Patients

As the raw MIMIC-III database does not identify admissions by `ICUSTAY_ID` but by `SUBJECT_ID` (or, more specifically, `HADM_ID`), and subjects can be admitted several times, we need to determine for each measurement which ICU stay it belongs to, based on the patient and the time at which it was recorded.
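
The linking step can be sketched on toy data as follows. All IDs, timestamps and values here are invented, and the timestamps are parsed with `pd.to_datetime`, which is an assumption of this sketch rather than something the notebook does.

```python
import pandas as pd

# Toy data (made up): subject 1 has two ICU stays, subject 2 has one
stays = pd.DataFrame({
    'ICUSTAY_ID': [100, 101, 200],
    'SUBJECT_ID': [1, 1, 2],
    'INTIME': pd.to_datetime(['2100-01-01 08:00', '2100-03-01 08:00', '2100-02-01 00:00']),
    'OUTTIME': pd.to_datetime(['2100-01-03 08:00', '2100-03-02 08:00', '2100-02-05 00:00']),
})
events = pd.DataFrame({
    'SUBJECT_ID': [1, 1, 1, 2],
    'CHARTTIME': pd.to_datetime(['2100-01-02 12:00', '2100-03-01 09:30',
                                 '2100-02-10 10:00', '2100-02-02 06:00']),
    'VALUENUM': [31.0, 29.5, 33.0, 40.2],
})

# Pair every event with every stay of the same subject ...
linked = stays.merge(events, on='SUBJECT_ID', how='inner')
# ... then keep only the pairs where the event falls strictly inside the stay
linked = linked[(linked.CHARTTIME > linked.INTIME) & (linked.CHARTTIME < linked.OUTTIME)]
linked = linked[['ICUSTAY_ID', 'CHARTTIME', 'VALUENUM']].reset_index(drop=True)
print(linked)
```

The third event of subject 1 lies between the two stays and is therefore dropped, which is the intended behaviour: measurements taken outside any ICU stay are not assigned to one.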

In [3]:
icustay_cols = ['ICUSTAY_ID', 'SUBJECT_ID', 'INTIME', 'OUTTIME']
icustays = pd.read_csv(r"D:/mimic-iii-clinical-database-1.4/ICUSTAYS.csv", usecols=icustay_cols)
icustays.head()

| | SUBJECT_ID | ICUSTAY_ID | INTIME | OUTTIME |
|---|---|---|---|---|
| 0 | 268 | 280836 | 2198-02-14 23:27:38 | 2198-02-18 05:26:11 |
| 1 | 269 | 206613 | 2170-11-05 11:05:29 | 2170-11-08 17:46:57 |
| 2 | 270 | 220345 | 2128-06-24 15:05:20 | 2128-06-27 12:32:29 |
| 3 | 271 | 249196 | 2120-08-07 23:12:42 | 2120-08-10 00:39:04 |
| 4 | 272 | 210407 | 2186-12-25 21:08:04 | 2186-12-27 12:01:13 |


### Hematocrit, S(c)vO2 and D-Dimer

Sources: 
- [D_LABITEMS.csv - MIMIC_III](https://physionet.org/content/mimiciii-demo/1.4/D_LABITEMS.csv)
- [D_ITEMS.csv - MIMIC_III](https://physionet.org/content/mimiciii-demo/1.4/D_ITEMS.csv)

#### Hematocrit, D-Dimer

In [4]:
dct = pd.read_csv('D:/mimic-iii-clinical-database-1.4/D_ITEMS.csv')
dct = dct[dct.LABEL.str.contains('SCVO', case=False).fillna(False)]
dct

| | ROW_ID | ITEMID | LABEL | ABBREVIATION | DBSOURCE | LINKSTO | CATEGORY | UNITNAME | PARAM_TYPE | CONCEPTID |
|---|---|---|---|---|---|---|---|---|---|---|
| 10265 | 14787 | 227542 | ScvO2 (Presep) Calibrated | ScvO2 (Presep) Calibrated | metavision | datetimeevents | Hemodynamics | | Date time | |
| 10269 | 14791 | 227549 | ScvO2 (Presep) | ScvO2 (Presep) | metavision | chartevents | Hemodynamics | % | Numeric | |
| 10601 | 14928 | 227806 | ScvO2 (Presep) SQI | ScvO2 (Presep) SQI | metavision | chartevents | Hemodynamics | | Text | |
| 11892 | 14408 | 226541 | ScvO2 Central Venous O2% Sat | ScvO2 Central Venous O2% Sat | metavision | chartevents | Labs | | Numeric | |


In [5]:
HEMATOCRIT = [
    227017, # -- Hematocrit_ApacheIV
    220545, # -- Hematocrit (serum)
    226540, # -- Hematocrit (whole blood - calc)
    226762, # -- HematocritApacheIIValue
    50810,  # -- Hematocrit, Calculated
    51480,  # -- Hematocrit
]

DDIMER = [
    1526,   # -- D-Dimer
    228240, # -- D-Dimer (SOFT)
    50915,  # -- D-Dimer
    51196,  # -- D-Dimer
]

SVO2 = [
    223772, # -- SvO2
    226541, # -- ScvO2 Central Venous O2% Sat (not the same but only 2-3% different)
    227549, # -- ScvO2 (Presep)
]

VALUENUM_RANGES = {
    '<500': 500,
    '500-1000': 750,
    '1000-2000': 1500,
    '>2000': 2000,
    '>10000': 10000,
    'GREATER THAN 10,000': 10000,
    'GREATER THAN 10000': 10000,
    'GREATER THAN 21000': 21000
}

def range_to_valuenum(string):
    """Best-effort parser for the VALUE column of LABEVENTS.csv.

    Returns a float, or NaN if the value cannot be interpreted.
    """
    # If it can be parsed as a float, return it as a float
    try:
        return float(string)
    except (TypeError, ValueError):
        pass

    # If it is a known range or threshold string, return its representative value
    # (midpoint for closed ranges, the bound itself for '<' and '>' entries)
    if string in VALUENUM_RANGES:
        return VALUENUM_RANGES[string]

    # Anything else (e.g. ERROR or NETWORK TIMEOUT) is dropped
    return np.nan
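
As a quick check of the parsing logic, the sketch below applies the same three-step fallback to a few sample strings. It repeats a subset of `VALUENUM_RANGES` under the hypothetical name `KNOWN_RANGES` so that it runs on its own; the input strings are illustrative, not taken from the data.

```python
import math
import numpy as np

# Subset of the lookup table above, repeated so the sketch is self-contained
KNOWN_RANGES = {'<500': 500, '500-1000': 750, '>10000': 10000}

def parse_value(string):
    # 1) plain numbers
    try:
        return float(string)
    except (TypeError, ValueError):
        pass
    # 2) known range strings, 3) everything else becomes NaN
    return KNOWN_RANGES.get(string, np.nan)

samples = ['742', '500-1000', '<500', 'ERROR', None]
parsed = [parse_value(v) for v in samples]
print(parsed)
```

Numeric strings pass through unchanged, range strings map to a single representative number, and unparseable or missing entries end up as NaN so they can be filtered out later.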

In [None]:
results = []

for _, chunk in tqdm(read_csv(r"D:/mimic-iii-clinical-database-1.4/LABEVENTS.csv", usecols=['SUBJECT_ID', 'HADM_ID', 'ITEMID', 'VALUEUOM', 'CHARTTIME', 'VALUE', 'VALUENUM'])):
    # Link lab results to their corresponding icustay_ids (using subject_id and ICU stay time window)
    chunk = icustays.merge(chunk, on='SUBJECT_ID', how='inner')
    chunk = chunk[(chunk.CHARTTIME > chunk.INTIME) & (chunk.CHARTTIME < chunk.OUTTIME)].copy()
    
    # Drop rows whose item IDs are not of interest
    chunk = chunk[chunk.ITEMID.isin(HEMATOCRIT + DDIMER)]
    
    chunk['LABID'] = ''
    chunk.loc[chunk.ITEMID.isin(HEMATOCRIT), 'LABID'] = 'HEMATOCRIT'
    chunk.loc[chunk.ITEMID.isin(DDIMER), 'LABID'] = 'D-DIMER'
    chunk.loc[chunk.ITEMID.isin(SVO2), 'LABID'] = 'SVO2'
    
    # Map non-numeric VALUE strings of D-Dimer (e.g. ranges such as '500-1000') to numbers
    ddimer_rows = chunk.ITEMID.isin([50915, 51196])
    chunk.loc[ddimer_rows, 'VALUENUM'] = chunk.loc[ddimer_rows, 'VALUE'].map(range_to_valuenum)
    
    chunk = pd.DataFrame({
        'icustay_id': chunk.ICUSTAY_ID,
        'charttime': chunk.CHARTTIME,
        'lab_id': chunk.LABID,
        'valuenum': chunk.VALUENUM
    })
    results.append(chunk)

523it [00:24, 22.30it/s]

---

In [None]:
for _, chunk in tqdm(read_csv(r"D:/mimic-iii-clinical-database-1.4/CHARTEVENTS.csv", usecols=['SUBJECT_ID', 'ITEMID', 'CHARTTIME', 'VALUENUM'])):
    # Limit measurements to the item IDs of interest (hematocrit, D-Dimer and SvO2/ScvO2)
    chunk = chunk[chunk.ITEMID.isin(HEMATOCRIT + DDIMER + SVO2)].copy()
    
    chunk['LABID'] = ''
    chunk.loc[chunk.ITEMID.isin(HEMATOCRIT), 'LABID'] = 'HEMATOCRIT'
    chunk.loc[chunk.ITEMID.isin(DDIMER), 'LABID'] = 'D-DIMER'
    chunk.loc[chunk.ITEMID.isin(SVO2), 'LABID'] = 'SVO2'
    
    if not len(chunk):
        continue
    
    # Link vitals to their corresponding icustay_ids (using subject_id and ICU stay time window)
    chunk = icustays.merge(chunk, on='SUBJECT_ID', how='inner')
    chunk = chunk[(chunk.CHARTTIME > chunk.INTIME) & (chunk.CHARTTIME < chunk.OUTTIME)].copy()
    
    chunk = pd.DataFrame({
        'icustay_id': chunk.ICUSTAY_ID,
        'charttime': chunk.CHARTTIME,
        'lab_id': chunk.LABID,
        'valuenum': chunk.VALUENUM
    })
    results.append(chunk)

In [None]:
# merge!
results = pd.concat(results, axis=0).reset_index(drop=True)

# save!
results.to_csv('final/hematocrit_d-dimer_svo2.csv', index=False)