# Notebook to Convert AmsterdamUMCdb data files to MIMIC-III Format

Here we will be converting the [AmsterdamUMCdb](https://github.com/AmsterdamUMC/AmsterdamUMCdb) data files to the MIMIC-III data file format as generated by [MIMIC-Code](https://github.com/MIT-LCP/mimic-code). We do this as to allow the exact same preprocessing pipeline to be applied to both MIMIC and the AmsterdamUMCdb.

In [21]:
import re
import numpy as np
import pandas as pd
from tqdm import tqdm

CHUNK_SIZE = 20000
MAX_CHUNKS = 1000   # kept small for debugging

OUT_DIR = 'final/'

## I/O and Formatting

As the files are in the order of tens of GBs, we read in and process the files incrementally, i.e. in chunks;

In [2]:
def read_csv(path, usecols, chunksize=CHUNK_SIZE):
    """ Reads file limited to columns in `usecols` from path in chunks of size `chunksize` """
    for i, chunk in enumerate(pd.read_csv(path, usecols=usecols, encoding='latin1', engine='c', chunksize=chunksize)):
        yield i, chunk.reset_index(drop=True) # resets index so that indices range from 0 to chunksize - 1

Also, we use a progressbar which shows the number of admissions processed so far, how many there are still to go and the estimated time left;

In [3]:
def pbar(iterator, total_admissions=23106):
    # Keep track of admissions already seen
    processed_admissions = set()
    
    with tqdm(total=total_admissions) as progress_bar:
        for i, chunk in iterator:
            
            # count number of new admissions not yet seen
            # we do it only if last admission is new (saves time checking)
            if chunk.admissionid.values[-1] not in processed_admissions:
                new_admissions = set(chunk.admissionid) - processed_admissions
                processed_admissions.update(new_admissions)
            
                # update progress bar
                progress_bar.update(len(new_admissions))
                
            yield i, chunk

---
## Times
Times, e.g. `administeredat` of a vital measurement, are defined in milliseconds relative to the first admission of that same patient (which itself has a starttime of 0ms). As this can be hard to reason over, we convert charttimes in the dataset to absolute timestamps (of the form `YYYY-MM-DD HH:mm:ss.xxxxxx`) and define the start of each first admission as the time of running this notebook (this can be arbitrarily chosen);

In [4]:
# Start times of admissions relative to the first admission (with start = 0)
start_of_admission_ms = dict()
for _, chunk in read_csv(r"D:/AmsterdamUMCdb-v1.0.2/admissions.csv", usecols=['admissionid', 'admittedat']):
    # admissionid + admission time
    start_of_admission_ms.update(dict(zip(chunk.admissionid, chunk.admittedat)))

print('Time since last admission: %.2f years' % (start_of_admission_ms[5489] / (1000 * 60 * 60 * 24 * 365)))

Time since last admission: 1.51 years


#### Convenience functions

In [5]:
# unit conversions
def hours_to_ms(hours):
    return hours * 1000 * 3600

def ms_to_hours(ms):
    return ms / (1000 * 3600)

In [6]:
# creates a timestamp by adding relative miliseconds to some chosen starttime (e.g. chosen time stamp of start first admission)
def to_timestamp(starttime, ms):       
    return starttime + pd.to_timedelta(arg=ms, unit='ms')

# computes time in hours within current admission from ms relative to first admission
def hours_in_admission(ms, admissionid):
    ms_since_first_admission = admissionid.transform(lambda x: start_of_admission_ms[x] if x in start_of_admission_ms else 0)
    return ms_to_hours(ms - ms_since_first_admission)

In [7]:
# for all first admissions, assume start date is now
first_admission_start = pd.Timestamp.now()

print('Admission start:  ', first_admission_start)
print('Two hours in:     ', to_timestamp(first_admission_start, ms=hours_to_ms(2)))

Admission start:   2022-11-18 18:01:52.896987
Two hours in:      2022-11-18 20:01:52.896987


---

## Patient cohort

We define our cohort as all patient admitted to the ICU, therefore no further filtering is performed here

In [9]:
cohort_df = []
for i, chunk in read_csv(r"D:/AmsterdamUMCdb-v1.0.2/admissions.csv", usecols=['admissionid', 'admittedat', 'destination']):
    # infer whether patient passed away in-hospital (1) or discharged (0)
    expire_flag = chunk.destination == 'Overleden'
    
    # determine start- and end-time of admission in hours
    # Note: As a proxy for the start of first admission we use the time of running this notebook
    window_start = to_timestamp(first_admission_start, ms=chunk.admittedat)
    window_end = to_timestamp(first_admission_start, ms=chunk.admittedat + hours_to_ms(72))
    
    chunk_df = pd.DataFrame({
        'icustay_id': chunk.admissionid,
        'window_start': window_start,
        'window_end': window_end,
        'hospital_expire_flag': expire_flag.astype(int),
    })
    cohort_df.append(chunk_df)
    
# merge
cohort_df = pd.concat(cohort_df, axis=0).reset_index(drop=True)
cohort_df.head()

Unnamed: 0,icustay_id,window_start,window_end,hospital_expire_flag
0,0,2022-11-18 18:01:52.896987,2022-11-21 18:01:52.896987,0
1,1,2022-11-18 18:01:52.896987,2022-11-21 18:01:52.896987,0
2,2,2022-11-18 18:01:52.896987,2022-11-21 18:01:52.896987,0
3,3,2022-11-18 18:01:52.896987,2022-11-21 18:01:52.896987,0
4,4,2022-11-18 18:01:52.896987,2022-11-21 18:01:52.896987,0


In [10]:
# save!
cohort_df.to_csv(OUT_DIR + 'cohort.csv', index=False)

----

## Demographics

### Mechanical ventilation

Source: [mechanical_ventilation.sql](https://github.com/AmsterdamUMC/AmsterdamUMCdb/blob/b6f295fbfbd9a1f8f9cfa71dab58c670120543d8/amsterdamumcdb/sql/lifesupport/mechanical_ventilation.sql)

In [11]:
# Combinations of itemids and valueids in listitems.csv indicating
# use of invasive or non-invasive mechanical ventilation
MECH_VENT_SETTINGS = [
    (9534, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)),                    # --Type beademing Evita 1
    (6685, (1, 3, 5, 6, 8, 9, 10, 11, 12, 13, 14, 20, 22)),                 # --Type Beademing Evita 4
    (8189, (16,)),                                                          # --Toedieningsweg O2
    (12290, (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18)),  # --Ventilatie Mode (Set) Servo-I and Servo-U ventilators
    (12347, (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18)),  # --Ventilatie Mode (Set) (2) Servo-I and Servo-U ventilators
    (12376, (1, 2)),                                                        # --Mode (Bipap Vision)
]    

In [12]:
# to help check whether some (itemid, valueid) pair in listitems indicates use of ventilator
class Ventilation():
    def __init__(self, mech_vent):
        """ Helper class to check for mechanical ventilation
        """
        self._vent_settings = set()
        for itemid, valueids in mech_vent:
            self._vent_settings.update([(itemid, valueid) for valueid in valueids])
            
    def __call__(self, itemids, valueids):
        """ Checks if (itemid, valueid) pair is a valid ventilator settings """
        return np.array([x in self._vent_settings for x in list(zip(itemids, valueids))])
    
is_mech_vent = Ventilation(MECH_VENT_SETTINGS)

For each admission, see if mechanical ventilation is used in the first X hours of admission (defined by `START_OF_ADMISSION`) (see Roggeveen et al., 2021)

In [13]:
admissions_with_vent = set()

for _, chunk in pbar(read_csv(r"D:/AmsterdamUMCdb-v1.0.2/listitems.csv", usecols=['admissionid', 'itemid', 'valueid', 'measuredat'])):
    # which observations are within 12 hours of start admission?
    times = hours_in_admission(chunk.measuredat, chunk.admissionid)
    within_12h = (times > 0) & (times < 12)
    
    # which of those admissions indicate the use of ventilatory support?
    adm_with_vent = chunk[within_12h & is_mech_vent(chunk.itemid, chunk.valueid)].admissionid
    
    admissions_with_vent.update(adm_with_vent)
        
print('Number of admissions with ventilation:', len(admissions_with_vent))

100%|███████████████████████████████████████████████████████████████████████████| 23106/23106 [01:31<00:00, 251.60it/s]

Number of admissions with ventilation: 16098





### Glasgow Coma Scale
Sources: 
- Glasgow Coma Scale: [gcs.sql](https://github.com/AmsterdamUMC/AmsterdamUMCdb/blob/b6f295fbfbd9a1f8f9cfa71dab58c670120543d8/amsterdamumcdb/sql/common/gcs.sql)
- GCS definition: https://www.mdcalc.com/calc/64/glasgow-coma-scale-score-gcs

In [15]:
## credits Patrick Thoral

def gcs_eyes_score(df):
    df.loc[df.itemid == 6732, 'eyes_score'] = 5 - df.valueid   # --Actief openen van de ogen
    df.loc[df.itemid == 13077, 'eyes_score'] = df.valueid      # --A_Eye
    df.loc[df.itemid == 14470, 'eyes_score'] = df.valueid - 4  # --RA_Eye
    df.loc[df.itemid == 16628, 'eyes_score'] = df.valueid - 4  # --MCA_Eye
    df.loc[df.itemid == 19635, 'eyes_score'] = df.valueid - 4  # --E_EMV_NICE_24uur
    df.loc[df.itemid == 19638, 'eyes_score'] = df.valueid - 8  # --E_EMV_NICE_Opname
    return df
    
def gcs_motor_score(df):
    df.loc[df.itemid == 6734, 'motor_score'] = 7 - df.valueid   # --Beste motore reactie van de armen
    df.loc[df.itemid == 13072, 'motor_score'] = df.valueid      # --A_Motoriek
    df.loc[df.itemid == 14476, 'motor_score'] = df.valueid - 6  # --RA_Motoriek
    df.loc[df.itemid == 16634, 'motor_score'] = df.valueid - 6  # --MCA_Motoriek
    df.loc[df.itemid == 19636, 'motor_score'] = df.valueid - 6  # --M_EMV_NICE_24uur
    df.loc[df.itemid == 19639, 'motor_score'] = df.valueid - 12 # --M_EMV_NICE_Opname
    return df

def gcs_verbal_score(df):
    df.loc[df.itemid == 6735, 'verbal_score'] = 6 - df.valueid   # --Beste verbale reactie
    df.loc[df.itemid == 13066, 'verbal_score'] = df.valueid      # --A_Verbal
    df.loc[df.itemid == 14482, 'verbal_score'] = df.valueid - 5  # --RA_Verbal
    df.loc[df.itemid == 16640, 'verbal_score'] = df.valueid - 5  # --MCA_Verbal
    df.loc[df.itemid == 19637, 'verbal_score'] = df.valueid - 9  # --V_EMV_NICE_24uur
    df.loc[df.itemid == 19640, 'verbal_score'] = df.valueid - 15 # --M_EMV_NICE_Opname
    
    # cap verbal score to at least 1
    df.loc[df['verbal_score'] < 1, 'verbal_score'] = 1
    return df

In [16]:
gcs_on_admission = dict()

for i, chunk in pbar(read_csv(r"D:/AmsterdamUMCdb-v1.0.2/listitems.csv", usecols=['admissionid', 'itemid', 'valueid', 'measuredat'])): 
    # which observations are within 12h of start admission?
    times = hours_in_admission(chunk.measuredat, chunk.admissionid)
    chunk = chunk[(times > 0) & (times < 12)].copy()
    
    # component scores
    chunk = gcs_eyes_score(chunk)
    chunk = gcs_motor_score(chunk)
    chunk = gcs_verbal_score(chunk)
        
    # GCS = eye + motor + verbal score
    gcs = chunk.groupby('admissionid').first()[['eyes_score', 'motor_score', 'verbal_score']].sum(axis=1, min_count=3) # need all three component scores else NaN!
    
    # add gcs scores for admissions we could compute it for
    gcs = gcs[gcs.notna()].to_dict()
    gcs_on_admission.update(gcs)

100%|███████████████████████████████████████████████████████████████████████████| 23106/23106 [01:41<00:00, 227.81it/s]


In [17]:
print('GCS on start of admission 11:', gcs_on_admission[11])
print('Admissions with available GCS scores:', len(gcs_on_admission))

GCS on start of admission 11: 3.0
Admissions with available GCS scores: 12522


### Estimating FiO2

For computing the SOFA score we need to have an estimate of the fraction of inspired oxygen (FiO2). As this is not readily known for patients not on ventilatory support we must estimate it from the O2 flow and oxygen device used (see [sofa.ipynb](https://github.com/AmsterdamUMC/AmsterdamUMCdb/blob/master/concepts/severityscores/sofa.ipynb) and the data dictionary by [Stichting NICE](https://www.stichting-nice.nl/dd/#468));

#### Oxygen device settings, PaCo2 and PaO2

In [18]:
OXY_FLOW_NO_SUPP = [
    8845,  # -- O2 l/min
    10387, # --Zuurstof toediening (bloed)
    18587  # --Zuurstof toediening
]

OXY_FLOW_RESP_SUPP = [
    6699,  # --FiO2 %: setting on Evita ventilator
    12279, # --O2 concentratie --measurement by Servo-i/Servo-U ventilator
    12369, # --SET %O2: used with BiPap Vision ventilator
    16246  # --Zephyros FiO2: Non-invasive ventilation
]

PAO2 = [
    7433, # --PO2
    9996, # --PO2 (bloed)
    21214 # --PO2 (bloed) - kPa
]

PACO2 = [
    6846, # --PCO2
    9990, # --pCO2 (bloed)
    21213 # --PCO2 (bloed) - kPa
]

In [22]:
## Oxygen device and O2 flow from numericitems
oxy_devices = []
pao2 = []
paco2 = []

for i, chunk in pbar(read_csv(r"D:/AmsterdamUMCdb-v1.0.2/numericitems.csv", usecols=['admissionid', 'itemid', 'unitid', 'value', 'measuredat'])):    
    # Oxygen device settings
    oxy_chunk = chunk[chunk.itemid.isin(OXY_FLOW_NO_SUPP + OXY_FLOW_RESP_SUPP)]
    if len(oxy_chunk) > 0:
        oxy_devices.append(oxy_chunk)
        
    # Convert PaO2 units from kPa to mmHg
    chunk.loc[chunk.unitid == 152, 'value'] = chunk.value * 7.50061683
    
    # PaO2
    pao2_chunk = chunk[chunk.itemid.isin(PAO2)].rename({'value': 'pao2'}, axis=1)
    if len(pao2_chunk) > 0:
        pao2.append(pao2_chunk)
    
    # PaCO2
    paco2_chunk = chunk[chunk.itemid.isin(PACO2)].rename({'value': 'paco2'}, axis=1)
    if len(paco2_chunk) > 0:
        paco2.append(paco2_chunk)
        
    if i > MAX_CHUNKS:
        break
    
# merge 
oxy_devices = pd.concat(oxy_devices, axis=0)
pao2 = pd.concat(pao2, axis=0)
paco2 = pd.concat(paco2, axis=0)

  2%|█▌                                                                            | 456/23106 [00:39<32:29, 11.62it/s]


In [23]:
print('Oxygen devices:')
oxy_devices.head()

Oxygen devices:


Unnamed: 0,admissionid,itemid,value,unitid,measuredat
525,0,8845,6.0,26,60720000
526,0,8845,6.0,26,65520000
527,0,8845,10.0,26,69120000
528,0,8845,15.0,26,69180000
529,0,8845,15.0,26,72720000


In [24]:
print('PaO2:')
pao2.head()

PaO2:


Unnamed: 0,admissionid,itemid,pao2,unitid,measuredat
729,0,9996,71.0,173,-978000000
730,0,9996,90.0,173,20520000
731,0,9996,149.0,173,25920000
732,0,9996,104.0,173,33120000
733,0,9996,32.0,173,36720000


In [25]:
print('PaCO2:')
paco2.head()

PaCO2:


Unnamed: 0,admissionid,itemid,paco2,unitid,measuredat
683,0,9990,41.0,173,-978000000
684,0,9990,39.0,173,20520000
685,0,9990,37.0,173,25920000
686,0,9990,36.0,173,33120000
687,0,9990,43.0,173,36720000


#### Estimating FiO2

In [26]:
# For each admission estimate FiO2 from Oxygen Flow settings without respiratory support
estimated_fio2 = []

for i, chunk in pbar(read_csv(r"D:/AmsterdamUMCdb-v1.0.2/listitems.csv", usecols=['admissionid', 'itemid', 'valueid', 'measuredat'])): 
    ## -- Toedieningsweg (Oxygen device)
    chunk = chunk[chunk.itemid == 8189]
    if len(chunk) == 0:
        continue
    
    # merge valueid of oxy_devices by admission and measurementtime
    chunk = chunk.merge(oxy_devices, on=['admissionid', 'measuredat'], how='inner').copy().reset_index(drop=True)
    if len(chunk) == 0:
        continue
    
    # Set all FiO2 to their expected "regular air" baseline
    chunk.loc[:, 'fio2'] = 0.21
        
    # Option 1: Patient received ventilatory support, thus FiO2 can be read off from settings
    fio2_known = chunk.itemid_y.isin(OXY_FLOW_RESP_SUPP)
    chunk.loc[fio2_known, 'fio2'] = chunk.value.fillna(0.21)
    
    # Option 2: Patient did not receive vent. support, thus we must estimate FiO2
    # Source: https://github.com/AmsterdamUMC/AmsterdamUMCdb/blob/master/amsterdamumcdb/sql/common/pO2_FiO2_estimated.sql
    cat1 = chunk.valueid.isin([2, 7])
    chunk.loc[~fio2_known & cat1 & (chunk.value >= 1) & (chunk.value < 2), 'fio2'] = 0.22
    chunk.loc[~fio2_known & cat1 & (chunk.value >= 2) & (chunk.value < 3), 'fio2'] = 0.25
    chunk.loc[~fio2_known & cat1 & (chunk.value >= 3) & (chunk.value < 4), 'fio2'] = 0.27
    chunk.loc[~fio2_known & cat1 & (chunk.value >= 4) & (chunk.value < 5), 'fio2'] = 0.30
    chunk.loc[~fio2_known & cat1 & (chunk.value >= 5), 'fio2'] = 0.35
    
    cat2 = chunk.valueid.isin([1, 3, 8, 9, 4, 18, 19])
    chunk.loc[~fio2_known & cat2 & (chunk.value >= 1) & (chunk.value < 2), 'fio2'] = 0.22
    chunk.loc[~fio2_known & cat2 & (chunk.value >= 2) & (chunk.value < 3), 'fio2'] = 0.25
    chunk.loc[~fio2_known & cat2 & (chunk.value >= 3) & (chunk.value < 4), 'fio2'] = 0.27
    chunk.loc[~fio2_known & cat2 & (chunk.value >= 4) & (chunk.value < 5), 'fio2'] = 0.30
    chunk.loc[~fio2_known & cat2 & (chunk.value >= 5) & (chunk.value < 6), 'fio2'] = 0.35
    chunk.loc[~fio2_known & cat2 & (chunk.value >= 6) & (chunk.value < 7), 'fio2'] = 0.40
    chunk.loc[~fio2_known & cat2 & (chunk.value >= 7) & (chunk.value < 8), 'fio2'] = 0.45
    chunk.loc[~fio2_known & cat2 & (chunk.value >= 8), 'fio2'] = 0.50
    
    cat3 = chunk.valueid.isin([10, 11, 13, 14, 15, 16, 17])
    chunk.loc[~fio2_known & cat3 & (chunk.value >= 6) & (chunk.value < 7), 'fio2'] = 0.60
    chunk.loc[~fio2_known & cat3 & (chunk.value >= 7) & (chunk.value < 8), 'fio2'] = 0.70
    chunk.loc[~fio2_known & cat3 & (chunk.value >= 8) & (chunk.value < 9), 'fio2'] = 0.80
    chunk.loc[~fio2_known & cat3 & (chunk.value >= 9) & (chunk.value < 10), 'fio2'] = 0.85
    chunk.loc[~fio2_known & cat3 & (chunk.value >= 10), 'fio2'] = 0.90
    
    estimated_fio2.append(chunk[['admissionid', 'measuredat', 'fio2']])
    
# merge 
estimated_fio2 = pd.concat(estimated_fio2, axis=0)
estimated_fio2.head()

100%|███████████████████████████████████████████████████████████████████████████| 23106/23106 [03:05<00:00, 124.36it/s]


Unnamed: 0,admissionid,measuredat,fio2
0,0,60720000,0.35
1,0,65520000,0.35
2,0,69120000,0.5
3,0,69180000,0.9
4,0,72720000,0.9


#### Merge with PaO2 and PaCO2 (for PF-ratio)

In [27]:
# merge PaO2, PaCO2 into one DataFrame by admission and measurement time
merged_pao2_paco2 = pao2.merge(paco2, on=['admissionid', 'measuredat'], how='inner', suffixes=('_pao2', '_paco2'))

# merge with FiO2 within window of -60 min to 15 min from a PaO2/PaCO2 measurement
merged_fio2 = merged_pao2_paco2.merge(estimated_fio2, on='admissionid', how='inner', suffixes=('', '_fio2'))

merged_fio2 = merged_fio2[
    (merged_fio2.measuredat_fio2 > (merged_fio2.measuredat - 60 * 60 * 1000)) &\
    (merged_fio2.measuredat_fio2 < (merged_fio2.measuredat + 15 * 60 * 1000))
]
merged_fio2.head()

Unnamed: 0,admissionid,itemid_pao2,pao2,unitid_pao2,measuredat,itemid_paco2,paco2,unitid_paco2,measuredat_fio2,fio2
154,0,9996,72.0,173,61920000,9990,43.0,173,60720000,0.35
178,0,9996,65.0,173,69120000,9990,45.0,173,69120000,0.5
179,0,9996,65.0,173,69120000,9990,45.0,173,69180000,0.9
203,0,9996,122.0,173,76320000,9990,42.0,173,76320000,0.9
227,0,9996,103.0,173,83520000,9990,48.0,173,83520000,0.9


### Severity Scores
SIRS sources:
- Temperature: [temperature.sql](https://github.com/AmsterdamUMC/AmsterdamUMCdb/blob/b6f295fbfbd9a1f8f9cfa71dab58c670120543d8/amsterdamumcdb/sql/common/temperature.sql)
- Heart rate: [heart_rate.sql](https://github.com/AmsterdamUMC/AmsterdamUMCdb/blob/b6f295fbfbd9a1f8f9cfa71dab58c670120543d8/amsterdamumcdb/sql/common/heart_rate.sql)
- Respiratory rate: [resp_rate.sql](https://github.com/AmsterdamUMC/AmsterdamUMCdb/blob/b6f295fbfbd9a1f8f9cfa71dab58c670120543d8/amsterdamumcdb/sql/common/resp_rate.sql)
- WBC/Leukocytes: [wbc.sql](https://github.com/AmsterdamUMC/AmsterdamUMCdb/blob/b6f295fbfbd9a1f8f9cfa71dab58c670120543d8/amsterdamumcdb/sql/common/wbc.sql)
- SIRS definition: https://www.mdcalc.com/calc/1096/sirs-sepsis-septic-shock-criteria

SOFA sources:
- PaO2/FiO2: [pO2_pCO2_FiO2.sql](https://github.com/AmsterdamUMC/AmsterdamUMCdb/blob/b6f295fbfbd9a1f8f9cfa71dab58c670120543d8/amsterdamumcdb/sql/common/pO2_pCO2_FiO2.sql)
- Ventilators: [mechanical_ventilation.sql](https://github.com/AmsterdamUMC/AmsterdamUMCdb/blob/b6f295fbfbd9a1f8f9cfa71dab58c670120543d8/amsterdamumcdb/sql/lifesupport/mechanical_ventilation.sql)
- Platelets: [platelets.sql](https://github.com/AmsterdamUMC/AmsterdamUMCdb/blob/b6f295fbfbd9a1f8f9cfa71dab58c670120543d8/amsterdamumcdb/sql/common/platelets.sql)
- Glasgow Coma Scale: [gcs.sql](https://github.com/AmsterdamUMC/AmsterdamUMCdb/blob/b6f295fbfbd9a1f8f9cfa71dab58c670120543d8/amsterdamumcdb/sql/common/gcs.sql)
- Bilirubin: [bilirubin.sql](https://github.com/AmsterdamUMC/AmsterdamUMCdb/blob/b6f295fbfbd9a1f8f9cfa71dab58c670120543d8/amsterdamumcdb/sql/common/bilirubin.sql)
- SOFA definition: https://www.mdcalc.com/calc/691/sequential-organ-failure-assessment-sofa-score

In [30]:
TEMPERATURE = [
    8658,  # --Temp Bloed
    8659,  # --Temperatuur Perifeer 2
    8662,  # --Temperatuur Perifeer 1
    13058, # --Temp Rectaal
    13059, # --Temp Lies
    13060, # --Temp Axillair
    13061, # --Temp Oraal
    13062, # --Temp Oor
    13063, # --Temp Huid
    13952, # --Temp Blaas
    16110  # --Temp Oesophagus
]

HEARTRATE = [
    6640   # --Hartfrequentie
]

RESP_RATE = [
    8873,  # --Ademfrequentie Evita: measurement by Evita ventilator, most accurate
    12266, # --Ademfreq.: measurement by Servo-i/Servo-U ventilator, most accurate
    8874   # --Ademfrequentie Monitor: measurement by patient monitor using ECG-impedance, less accurate  
]

WBC = [
    6779,  # --Leucocyten 10^9/l
    9965   # --Leuco's (bloed) 10^9/l
]

BANDS = [
    11586  # -- Staaf % (bloed)
]

## PaO2, PaCO2 and FiO2 were previously estimated :)

PLATELETS = [
    9964,  # --Thrombo's (bloed)
    6797,  # --Thrombocyten
    10409, # --Thrombo's citr. bloed (bloed)
    14252  # --Thrombo CD61 (bloed)   
]

BILIRUBIN = [
    6813,  # --Bili Totaal
    9945   # --Bilirubine (bloed)
]

MEANBP = [
    6642,  # --ABP gemiddeld
    6679,  # --Niet invasieve bloeddruk gemiddeld
    8843   # --ABP gemiddeld II
]

CREATININE = [
    6836,  # --Kreatinine µmol/l (erroneously documented as µmol)
    9941,  # --Kreatinine (bloed) µmol/l
    14216  # --KREAT enzym. (bloed) µmol/l   
]

For each chunk of the `numericitems.csv` with a few admissions (actually quite a few), we will extract the above parameters (i.e. temperature, heart rate, pCO2, leukocyte count, etc.) and infer their earliest values within the admission. We follow Roggeveen et al. and limit these values to at most 8 hours after the start of admission. We then compute the SIRS and SOFA scores from those parameters.

#### SIRS

In [31]:
def compute_sirs(resp_rate, paco2, temp, heart_rate, wbc, bands):
    """ Computes Systemic inflammatory response syndrome (SIRS) score given vital parameters 
    For details, see: http://www.mdcalc.com/calc/1096/sirs-sepsis-septic-shock-criteria
    
    :param *kwargs: dicts mapping from admissionids to vital parameters at the beginning of the admission (if known)
    """
    # which admissions have available the data to estimate SIRS?
    admissionids = temp.keys() & heart_rate.keys() & (resp_rate.keys() | paco2.keys()) & (wbc.keys() | bands.keys())
    
    # compute SIRS for eligible admissions
    out = dict()
    for admissionid in admissionids:
        score = int(temp[admissionid] < 36 or temp[admissionid] > 38)
        score += int(heart_rate[admissionid] > 90)
        score += int(resp_rate[admissionid] > 20) if admissionid in resp_rate else int(paco2[admissionid] < 32)
        score += int(wbc[admissionid] > 12 or wbc[admissionid] < 4) if admissionid in wbc else int(bands[admissionid] > 10)
        
        out[admissionid] = score
    return out

#### SOFA

In [32]:
def compute_sofa(po2, fio2, vent, platelets, gcs, bilirubin, meanbp, creatinine):
    """ Computes Sequential Organ Failure Assessment (SOFA) score given vital parameters 
    For details, see: https://files.asprtracie.hhs.gov/documents/aspr-tracie-sofa-score-fact-sheet.pdf
    
    :param *kwargs: dicts mapping from admissionids to vital parameters at the beginning of the admission (if known)
    """
    # which admissions have available the data to estimate SOFA?
    admissionids = po2.keys() & fio2.keys() & vent.keys() & platelets.keys() & bilirubin.keys() & meanbp.keys() & creatinine.keys() & gcs.keys()
    
    # compute SIRS for eligible admissions
    out = dict()
    for admissionid in admissionids:
        score = 0
        
        # respiratory
        pf_ratio = po2[admissionid] / fio2[admissionid]
        if pf_ratio < 100 and vent[admissionid]:
            score += 4
        elif pf_ratio < 200 and vent[admissionid]:
            score += 3
        elif pf_ratio < 300:
            score += 2
        elif pf_ratio < 400:
            score += 1
            
        # coagulation
        coag = platelets[admissionid]
        if coag < 20:
            score += 4
        elif coag < 50:
            score += 3
        elif coag < 100:
            score += 2
        elif coag < 150:
            score += 1
            
        # liver
        bili = bilirubin[admissionid] 
        if bili >= 12:
            score += 4
        elif 6 <= bili < 12:
            score += 3
        elif 2 <= bili < 6:
            score += 2
        elif 1.2 <= bili < 2:
            score += 1
            
        # cardiovascular
        # TODO: add vasopressor agents
        blood_pressure = meanbp[admissionid] 
        if blood_pressure < 70:
            score += 1
        
        # central nervous system
        gcs_score = gcs[admissionid]
        if gcs_score < 6:
            score += 4
        elif 6 <= gcs_score <= 9:
            score += 3
        elif 10 <= gcs_score <= 12:
            score += 2
        elif 13 <= gcs_score <= 14:
            score += 1
            
        # renal
        creat = creatinine[admissionid]
        if creat > 440:
            score += 4
        elif 300 <= creat < 440:
            score += 3
        elif 170 <= creat < 300:
            score += 2
        elif 110 <= creat < 170:
            score += 1
        
        out[admissionid] = score
    return out

#### \*Deep breath\*

In [33]:
sirs_on_admission = dict()
sofa_on_admission = dict()

for i, chunk in pbar(read_csv(r"D:/AmsterdamUMCdb-v1.0.2/numericitems.csv", usecols=['admissionid', 'itemid', 'value', 'measuredat', 'unitid'])):
    
    # which observations are within 12h of start admission?
    times = hours_in_admission(chunk.measuredat, chunk.admissionid)
    chunk = chunk[(times > 0) & (times < 12)]
    
    if len(chunk) == 0:
        continue
        
    #########
    #  SIRS
    #########
    
    # Respiratory rate (unit = breaths/min)
    resp_rate = chunk[chunk.itemid.isin(RESP_RATE) & (chunk.unitid == 15)]
    resp_rate = resp_rate.groupby('admissionid', sort=False).value.first().to_dict()
    
    # PaCO2, PaO2 (mmHg) and FiO2 (%)
    gas_tension = chunk.merge(merged_fio2, on='admissionid')
    gas_tension = gas_tension.groupby('admissionid', sort=False).first()
    pao2_ = gas_tension.pao2.to_dict()
    paco2_ = gas_tension.paco2.to_dict()
    fio2_ = gas_tension.fio2.to_dict()
        
    # Temperature (degrees C)
    temp = chunk[chunk.itemid.isin(TEMPERATURE) & (chunk.unitid == 59)]
    temp = temp.groupby('admissionid', sort=False).value.first().to_dict()
    
    # Heart rate (beats/min)
    heart_rate = chunk[chunk.itemid.isin(HEARTRATE) & (chunk.unitid == 15)]
    heart_rate = heart_rate.groupby('admissionid', sort=False).value.first().to_dict()
    
    # Leukocytes (10^9/l)
    wbc = chunk[chunk.itemid.isin(WBC) & (chunk.unitid == 101)]
    wbc = wbc.groupby('admissionid', sort=False).value.first().to_dict()
    
    # Bands (bloed %)
    bands = chunk[chunk.itemid.isin(BANDS)]
    bands = bands.groupby('admissionid', sort=False).value.first().to_dict()
        
    sirs = compute_sirs(resp_rate, paco2_, temp, heart_rate, wbc, bands)
    sirs_on_admission.update(sirs)
    
    #########
    #  SOFA
    #########
    
    # Mechanical ventilation
    vent = {a:a in admissions_with_vent for a in chunk.admissionid.unique()}
    
    # Platelets (10^9/l)
    platelets = chunk[chunk.itemid.isin(PLATELETS)]
    platelets = platelets.groupby('admissionid', sort=False).value.first().to_dict()
    
    # Bilirubin (µmol/l)
    bilirubin = chunk[chunk.itemid.isin(BILIRUBIN)]
    bilirubin = bilirubin.groupby('admissionid', sort=False).value.first().to_dict()
    
    # MAP (mmHg)
    map_ = chunk[chunk.itemid.isin(MEANBP)]
    map_ = map_.groupby('admissionid', sort=False).value.first().to_dict()
    
    # Creatinine (µmol/l)
    creatinine = chunk[chunk.itemid.isin(CREATININE)]
    creatinine = creatinine.groupby('admissionid', sort=False).value.first().to_dict()
    
    sofa = compute_sofa(pao2_, fio2_, vent, platelets, gcs_on_admission, bilirubin, map_, creatinine)
    sofa_on_admission.update(sofa)
    
    if i > MAX_CHUNKS:
        break

  2%|█▌                                                                            | 456/23106 [01:02<51:41,  7.30it/s]


In [38]:
sirs_on_admission[10]  # SIRS of admission 11 in first 12h

2

In [37]:
sofa_on_admission[10]  # SIRS of admission 10 in first 12h

6

### Age, Height, Weight, Gender

In [40]:
# Age/weight/height groups to their centroids (best guess)
AGE_GROUPS = {
    '18-39': 30,
    '40-49': 45,
    '50-59': 55,
    '60-69': 65,
    '70-79': 75, 
    '80+': 85  # Arbitrary
}

WEIGHT_GROUPS = {
    '59-': 55,
    '60-69': 65, 
    '70-79': 75,
    '80-89': 85,
    '90-99': 95,
    '100-109': 105,
    '110+': 115,
    'N/A': np.NaN # To be replaced with Dutch average weight (for men and women separately)
}
  
HEIGHT_GROUPS = {
    '159-': 155,
    '160-169': 165,
    '170-179': 175,
    '180-189': 185, 
    '190+': 195, 
    'N/A': np.NaN  # To be replaced with average height of men and women in NL
}

# Average weight/height of Dutch men/women
AVG_MALE_HEIGHT = 181 # cm
AVG_MALE_WEIGHT = 84  # kg
AVG_FEMALE_HEIGHT = 168
AVG_FEMALE_WEIGHT = 70

Using the above dictionaries we can now map from grouped demographics (`agegroup`, `heightgroup`, `weightgroup`) to a real value. When unknown use per-gender guesses of weight and height (better than a naive average over the population)

### Received prophylactic treatment

In [72]:
# TODO

### Reason for admission

Sources:
- [reason_for_admission.sql](https://github.com/AmsterdamUMC/AmsterdamUMCdb/blob/master/amsterdamumcdb/sql/diagnosis/reason_for_admission.sql)

### Putting it all together

In [47]:
demo_cols = ['admissionid', 'gender', 'agegroup', 'heightgroup', 'weightgroup']

demo_df = []
for i, chunk in read_csv(r"D:/AmsterdamUMCdb-v1.0.2/admissions.csv", usecols=demo_cols, chunksize=100000000):
    demo_chunk = pd.DataFrame({
        'icustay_id': chunk.admissionid,
        'age': chunk.agegroup.transform(lambda x: AGE_GROUPS[x] if x in AGE_GROUPS else np.NaN),
        'is_male': (chunk.gender == 'Man').astype(int),
        'height': chunk.heightgroup.transform(lambda x: HEIGHT_GROUPS[x] if x in HEIGHT_GROUPS else np.NaN),
        'weight': chunk.weightgroup.transform(lambda x: WEIGHT_GROUPS[x] if x in WEIGHT_GROUPS else np.NaN),
        'vent': chunk.admissionid.isin(admissions_with_vent).astype(int),
        'sirs': chunk.admissionid.transform(lambda x: sirs_on_admission[x] if x in sirs_on_admission else np.NaN),  # NaN if couldn't be estimated
        'sofa': chunk.admissionid.transform(lambda x: sofa_on_admission[x] if x in sofa_on_admission else np.NaN)
    })
    demo_df.append(demo_chunk)
    
demo_df = pd.concat(demo_df, axis=0).reset_index(drop=True)

#### Missing height/weights

Instead of using an overall average of weights/heights in the dataset to impute missing weight/height values, we opt for per-gender estimates of adult men and women in the Netherlands as this will likely give us more precise values.

In [48]:
# If weight/height unknown but gender is, used per-gender estimates
demo_df.loc[demo_df.height.isna() & demo_df.is_male, 'height'] = AVG_MALE_HEIGHT
demo_df.loc[demo_df.weight.isna() & demo_df.is_male, 'weight'] = AVG_MALE_WEIGHT
demo_df.loc[demo_df.height.isna() & ~demo_df.is_male, 'height'] = AVG_FEMALE_HEIGHT
demo_df.loc[demo_df.weight.isna() & ~demo_df.is_male, 'weight'] = AVG_FEMALE_WEIGHT

In [49]:
# save!
demo_df.to_csv(OUT_DIR + 'demographics_cohort.csv', index=False)

---

## Vitals

Sources:
- Vital parameters: [amsterdamumcdb/sql/common](https://github.com/AmsterdamUMC/AmsterdamUMCdb/tree/master/amsterdamumcdb/sql/common)

In [58]:
VITALS = {
    'HeartRate':  [6640],                                      # -- 'Hartfrequentie'
    'SysBP':      [6641, 6678, 8841],                          # -- 'ABP systolisch', 'Niet invasieve bloeddruk systolisch', 'ABP systolisch II', 
    'DiasBP':     [6643, 6680, 8842],                          # -- 'ABP diastolisch', 'Niet invasieve bloeddruk diastolisch', 'ABP diastolisch II'
    'MeanBP':     [6642, 6679, 8843],                          # -- 'ABP gemiddeld', 'Niet invasieve bloeddruk gemiddeld', 'ABP gemiddeld II'
    'Glucose':    [6833, 9557, 9947],                          # -- 'Glucose Bloed', 'Glucose Astrup', 'Glucose (bloed)'
    'SpO2':       [12311, 6709],                               # -- 'O2-Saturatie (bloed)', 'Saturatie (Monitor)'
    'TempC':      [8658, 8659, 8662, 13058, 13059,             # -- 'Temp Bloed', 'Temperatuur Perifeer 2', 'Temperatuur Perifeer 1', 'Temp Rectaal', 'Temp Lies',
                   13060, 13061, 13062, 13063, 13952, 16110],  # -- 'Temp Axillair', 'Temp Oraal', 'Temp Oor', 'Temp Huid', 'Temp Blaas', 'Temp Oesophagus' 
    'RespRate':   [8874, 8873, 9654, 7726, 12266],             # -- 'Ademfrequentie Monitor', 'Ademfrequentie Evita', 'Ademfreq.'
    'SvO2':       [8507, 12534, 12311],                        # -- 'SVO2', 'SvO2 Vigilance'
    # Remark: few measurements of SvO2 (missing some itemids?)
}

In [74]:
vitals_df = []

for i, chunk in pbar(read_csv(r"D:/AmsterdamUMCdb-v1.0.2/numericitems.csv", usecols=['admissionid', 'itemid', 'value', 'measuredat'])):
        
    # which observations are within 72h of start admission?
    times = hours_in_admission(chunk.measuredat, chunk.admissionid)
    chunk = chunk[(times > 0) & (times < 72)]
    
    for vital, vital_ids in VITALS.items():
        # Mask entries of vital
        is_vital = chunk.itemid.isin(vital_ids)
                
        if is_vital.any():
            
            # Store measurements of vital in new DataFrame
            vital_df = pd.DataFrame({
                'icustay_id': chunk.admissionid[is_vital].values,
                'charttime': to_timestamp(first_admission_start, ms=chunk[is_vital].measuredat),
                'vital_id': vital,
                'valuenum': chunk.value[is_vital].values,
            })
            
            vitals_df.append(vital_df)
        
    if i > MAX_CHUNKS:
        break
        
# Merge vital DataFrames
vitals_df = pd.concat(vitals_df, axis=0).reset_index(drop=True)
vitals_df.head()

  2%|█▌                                                                            | 456/23106 [00:40<33:52, 11.15it/s]


Unnamed: 0,icustay_id,charttime,vital_id,valuenum
0,0,2022-11-18 23:43:52.896987,HeartRate,83.0
1,0,2022-11-19 00:13:52.896987,HeartRate,84.0
2,0,2022-11-19 01:13:52.896987,HeartRate,82.0
3,0,2022-11-19 02:13:52.896987,HeartRate,78.0
4,0,2022-11-19 03:13:52.896987,HeartRate,73.0


In [75]:
vitals_df.value_counts('vital_id')

vital_id
RespRate     358668
HeartRate    340912
SpO2         326199
MeanBP       297163
SysBP        296720
DiasBP       296662
TempC         65140
Glucose        6405
SvO2           5247
dtype: int64

In [77]:
# save!
vitals_df.to_csv(OUT_DIR + 'vitals_cohort.csv', index=False)

---

## FiO2 (Fraction of Inspired Oxygen)

We already did this above to compute the SOFA score, therefore we will simply reuse those estimates here

In [86]:
fio2_df = estimated_fio2.rename({
    'admissionid': 'icustayid',
    'measuredat': 'charttime',
}, axis=1)

# convert charttime to timestamps
fio2_df['charttime'] = to_timestamp(first_admission_start, ms=fio2_df.charttime)

In [87]:
# save!
fio2_df.to_csv(OUT_DIR + 'fio2_cohort.csv', index=False)

---

## Lab Results
Sources: 

- Kalium, Platelets, etc.: [AmsterdamUMCdb/common](https://github.com/AmsterdamUMC/AmsterdamUMCdb/tree/master/amsterdamumcdb/sql/common)
- lots of hard work, blood, sweat and tears and enough coffee to take down an elephant

In [95]:
# Missing: PT
LABRESULTS = {
    'CALCIUM':     [6817, 9933],                     # -- 'Calcium', 'Calcium totaal (bloed)'
    'ION_CALCIUM': [10267],                          # -- 'Ca-ion (7.4) (bloed)'
    'ASAT':        [11990],                          # -- 'ASAT (bloed)'
    # Remark: PTT is not equivalent to aPTT nor PT, however used as a proxy in Roggeveen et al.
    'PTT':         [11944, 11948, 17982],            # -- 'APTT (bloed)', 'APTT Gecorrigeerd (bloed)', 'APTT (bloed)'
    'POTASSIUM':   [6835, 9927, 9556, 10285],        # -- 'Kalium', 'Kalium (bloed)', 'Kalium Astrup', 'K (onv.ISE) (bloed)'
    'PLATELET':    [9964, 6797, 10409, 14252],       # -- 'Thrombocyten', "Thrombo's (bloed)", "Thrombo's citr. bloed (bloed)", 'Thrombo CD61 (bloed)'
    'ANION GAP':   [9559, 8492],                     # -- 'Anion-Gap (bloed)', 'AnGap'
    'PAO2':        [7433, 9996, 21214],              # -- 'PaO2', 'PaO2 (bloed)', 'PaO2 (bloed) - kPa'
    'ALAT':        [6800, 11978],                    # -- 'ALAT', 'ALAT (bloed)'
    'WBC':         [6779, 9965],                     # -- 'Leucocyten', "Leuco's (bloed)"
    'BILIRUBIN':   [9945, 6813],                     # -- 'Bilirubine (bloed)', 'Bili Totaal'
    'SODIUM':      [12233, 9555, 9924, 10284],       # -- 'Natrium (overig)', 'Natrium Astrup', 'Natrium (bloed)', 'Na (onv.ISE) (bloed)'
    'CHLORIDE':    [14413],                          # -- Cl (onv.ISE) (bloed)
    'MAGNESIUM':   [9952],                           # -- 'Magnesium (bloed)'
    'LACTATE':     [10053],                          # -- 'Lactaat (bloed)'
    'PACO2':       [6846, 9990, 21213],              # -- 'PCO2', 'PCO2 (bloed)', 'PCO2 (bloed) - kPa'
    'GLUCOSE':     [6833, 9947],                     # -- 'Glucose Bloed', 'Glucose (bloed)'
    'CREATININE':  [9941, 14216],                    # -- 'Kreatinine (bloed)', 'KREAT enzym. (bloed)'
    'BICARBONATE': [6810, 9992],                     # -- 'HCO3 (astrups)', 'Act.HCO3 (bloed)'
    # Remark: ureum is not equivalent to BUN, but used as a proxy in Roggeveen et al.
    'BUN':         [9943],                           # -- 'Ureum (bloed)'
    'PH':          [6848, 12310],                    # -- 'pH', 'pH (bloed)'
    'ALBUMIN':     [9937],                           # -- 'Alb.Chem (bloed)'
    'BANDS':       [11586],                          # -- Staaf % (bloed)
    'HEMOGLOBIN':  [9960, 10286, 6778],              # -- 'Hb (bloed)', 'Hb(v.Bgs) (bloed)', 'Hemoglobine'
    'BaseExcess':  [9994],                           # -- 'B.E. (bloed)'
    'HEMATOCRIT':  [6777, 11423, 11545],             # -- 'Hematocriet', 'Ht (bloed)', 'Ht(v.Bgs) (bloed)'
    'D-DIMER':     [10393, 12163]                    # -- 'D-dimeren', '	D-DIMEER (bloed)'
}

In [92]:
labs_df = []

for i, chunk in pbar(read_csv(r"D:/AmsterdamUMCdb-v1.0.2/numericitems.csv", usecols=['admissionid', 'itemid', 'value', 'measuredat', 'unitid'])):
    
    # which observations are within 72h of start admission?
    times = hours_in_admission(chunk.measuredat, chunk.admissionid)
    chunk = chunk[(times > 0) & (times < 72)].copy()
    
    for lab, lab_ids in LABRESULTS.items():
        # Mask entries of one lab test
        is_lab = chunk.itemid.isin(lab_ids)
        
        if is_lab.any():
            
            # Rate conversion form kPa to mmHg
            lab_chunk = chunk[is_lab].copy()
            lab_chunk.loc[lab_chunk.unitid == 152, 'value'] *= lab_chunk.value * 7.50061683
            
            # Store lab measurements in new DataFrame
            lab_df = pd.DataFrame({
                'icustay_id': lab_chunk.admissionid.values,
                'charttime': to_timestamp(first_admission_start, ms=lab_chunk.measuredat),
                'lab_id': lab,
                'valuenum': lab_chunk.value,
            })
            
            labs_df.append(lab_df)
        
    if i > MAX_CHUNKS:
        break
        
# Merge lab result DataFrames
labs_df = pd.concat(labs_df, axis=0).reset_index(drop=True)
labs_df.head()

  2%|█▌                                                                            | 456/23106 [00:52<43:05,  8.76it/s]


Unnamed: 0,icustay_id,charttime,lab_id,valuenum
0,0,2022-11-19 00:07:52.896987,CALCIUM,2.28
1,0,2022-11-19 06:42:52.896987,CALCIUM,2.1
2,1,2022-11-18 18:26:52.896987,CALCIUM,2.01
3,1,2022-11-18 20:56:52.896987,CALCIUM,2.08
4,2,2022-11-18 18:22:52.896987,CALCIUM,2.02


In [93]:
labs_df.value_counts('lab_id')

lab_id
POTASSIUM      6404
SODIUM         6305
GLUCOSE        6300
HEMOGLOBIN     6031
HEMATOCRIT     5930
PACO2          5466
PAO2           5429
BICARBONATE    5370
PH             5360
BaseExcess     5225
ION_CALCIUM    4132
ANION GAP      4029
CHLORIDE       2957
PLATELET       1774
PTT            1620
LACTATE        1571
WBC            1423
CREATININE     1395
MAGNESIUM      1342
CALCIUM        1261
BUN             842
ALBUMIN         776
ALAT            648
ASAT            633
BILIRUBIN       468
BANDS             7
DDIMER            3
dtype: int64

In [101]:
# save!
labs_df.to_csv(OUT_DIR + 'labs_cohort.csv', index=False)

---
## Urine Output

In [97]:
URINE_OUTPUTS = [
    8794,  # UrineCAD
    8796,  # UrineSupraPubis
    8798,  # UrineSpontaan
    8800,  # UrineIncontinentie
    8803,  # UrineUP
    10743, # Nefrodrain li Uit
    10745, # Nefrodrain re Uit
    19921, # UrineSplint Li
    19922  # UrineSplint Re
]

In [98]:
urine_outputs_df = []

for i, chunk in pbar(read_csv(r"D:/AmsterdamUMCdb-v1.0.2/numericitems.csv", usecols=['admissionid', 'itemid', 'value', 'measuredat'])):
    
    # which observations are within 72h of start admission?
    times = hours_in_admission(chunk.measuredat, chunk.admissionid)
    chunk = chunk[(times > 0) & (times < 72)].copy()
    
    # which entries have urine output
    is_urine_output = chunk.itemid.isin(URINE_OUTPUTS)

    if is_urine_output.any():
        
        # Store lab measurements in new DataFrame
        urine_output_df = pd.DataFrame({
            'icustay_id': chunk.admissionid[is_urine_output].values,
            'charttime': to_timestamp(first_admission_start, ms=chunk.measuredat[is_urine_output]),
            'value': chunk.value[is_urine_output].values,
        })

        urine_outputs_df.append(urine_output_df)
        
    if i > MAX_CHUNKS:
        break
        
# Merge urine output DataFrames
urine_outputs_df = pd.concat(urine_outputs_df, axis=0).reset_index(drop=True)
urine_outputs_df.head()

1001it [00:40, 24.99it/s]


Unnamed: 0,icustay_id,charttime,value
0,0,2022-11-18 23:43:52.896987,90.0
1,0,2022-11-19 00:13:52.896987,310.0
2,0,2022-11-19 01:13:52.896987,360.0
3,0,2022-11-19 02:13:52.896987,180.0
4,0,2022-11-19 03:13:52.896987,120.0


In [100]:
# save!
urine_outputs_df.to_csv(OUT_DIR + 'urineoutput_cohort.csv', index=False)

---

## Vasopressors

Sources:
- Overview common vasopressor drugs for treatment of sepsis: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7333107/
- Vasopressors in UMCdb: [vasopressors_inotropes.ipynb](https://github.com/AmsterdamUMC/AmsterdamUMCdb/blob/master/concepts/lifesupport/vasopressors_inotropes.ipynb), [vasopressors_inotropes.sql](https://github.com/AmsterdamUMC/AmsterdamUMCdb/blob/master/amsterdamumcdb/sql/common/vasopressors_inotropes.sql)
- Norepinephrine-equivalents: [get_vassopressor_mv.sql](https://github.com/LucaMD/SRL/blob/master/SEPSIS/MIMIC_sql/get_vassopressor_mv.sql)

Exclude *inotropes* which may be used to increase heart rate

In [105]:
VASOPRESSORS = [
    7179,  # -- Dopamine (Inotropin)
    # 7178, -- Dobutamine (Dobutrex)           -> inotropic
    6818,  # -- Adrenaline (Epinefrine)        -> inotropic but also known to induce vasoconstriction
    7229,  # -- Noradrenaline (Norepinefrine)   
    # 7135, -- Isoprenaline (Isuprel)          -> inotropic
    # 7196, -- Enoximon (Perfan)               -> inotropic
    # 12467, -- Terlipressine (Glypressin)
    # 13490, -- Methyleenblauw IV (Methylthionide chloride)
    19929  # -- Fenylefrine
]

#### Extracting weight of patients
To compute the vasopressor dose in mcg/kg/min we need to know the weight of the patient in kg. We determine the approximate weight of the patient using the `weightgroup` entry in `admissions.csv` and the weight definitions in `WEIGHT_GROUPS` above.

In [122]:
patient_weight = dict()

for i, chunk in read_csv(r"D:/AmsterdamUMCdb-v1.0.2/admissions.csv", usecols=['admissionid', 'weightgroup', 'gender'], chunksize=10000000): # -> do all at once, this file is super small
    # Convert weightgroups, e.g. '70-79', to approximate weight (75)
    weights = chunk.weightgroup.transform(lambda adm: WEIGHT_GROUPS[adm] if adm in WEIGHT_GROUPS else np.NaN)
        
    # If weight is not known but gender is, use per-gender average of population in NL
    weights.loc[weights.isna() & (chunk.gender == 'Man')] = AVG_MALE_WEIGHT
    weights.loc[weights.isna() & (chunk.gender == 'Female')] = AVG_FEMALE_WEIGHT
    
    patient_weight = dict(zip(chunk.admissionid, weights))

In [123]:
patient_weight[75]

84.0

#### Computing vasopressor rate in mcg/kg/min

sources:
- [vasopressors_inotropes.sql](https://github.com/AmsterdamUMC/AmsterdamUMCdb/blob/master/amsterdamumcdb/sql/common/vasopressors_inotropes.sql)

ordercategoryid
 - 65 -> continuous IV perfusor
 
doserateperkg:
 - Whether dose is already per kg -> 0/1
 
doseunitid:
 - 10 -> mg
 - 11 -> µg
 
doserateunitid:
 - 4 -> min
 - 5 -> uur

In [127]:
vaso_cols = ['admissionid', 'itemid', 'start', 'stop', 'ordercategoryid', 'administeredunit', 'dose', 'doserateperkg', 'doseunitid', 'doserateunitid', 'rate']

vasopressors_df = []

for i, chunk in pbar(read_csv(r"D:/AmsterdamUMCdb-v1.0.2/drugitems.csv", usecols=vaso_cols)):
    
    # which treatments were administered starting before 72h after start admission?
    starttimes = hours_in_admission(chunk.start, chunk.admissionid)
    chunk = chunk[(starttimes > 0) & (starttimes < 72)].copy()
    
    # extract drugitems corresponding to vasopressors via continuous IV
    chunk = chunk[chunk.itemid.isin(VASOPRESSORS) & (chunk.rate > 0) & (chunk.ordercategoryid == 65)].copy()   
      
    # rate conversion
    per_kg = chunk.doserateperkg == 1
    is_mg = chunk.doseunitid == 10 
    per_hour = chunk.doserateunitid == 5
    weight_kg = chunk.admissionid.transform(lambda adm: patient_weight[adm] if adm in patient_weight else 80)
        
    chunk['gamma'] = chunk.dose.copy()
    chunk.loc[~per_kg, 'gamma'] = chunk.dose / weight_kg
    chunk.loc[is_mg, 'gamma'] = 1000 * chunk.dose
    chunk.loc[per_hour, 'gamma'] = chunk.dose / 60
    
    # çonvert to norepinephrine equivalent dose
    chunk.loc[chunk.itemid == 7179, 'gamma'] = 0.01 * chunk.dose   # Dopamine
    chunk.loc[chunk.itemid == 19929, 'gamma'] = 0.45 * chunk.dose  # Fenylefrine
    
    # convert to DataFrame
    vasopressor_df = pd.DataFrame({
        'icustay_id': chunk.admissionid,
        'starttime': to_timestamp(first_admission_start, ms=chunk.start),
        'endtime': to_timestamp(first_admission_start, ms=chunk.stop),
        'mcgkgmin': chunk.gamma
    })
    vasopressors_df.append(vasopressor_df)
    
    if i > MAX_CHUNKS:
        break
    
# merge
vasopressors_df = pd.concat(vasopressors_df, axis=0).reset_index(drop=True)
vasopressors_df.head()

100%|█████████████████████████████████████████████████████████████████████████▋| 22999/23106 [00:18<00:00, 1217.51it/s]


Unnamed: 0,icustay_id,starttime,endtime,mcgkgmin
0,1,2022-11-18 18:51:52.896987,2022-11-18 23:52:52.896987,0.08
1,2,2022-11-18 18:23:52.896987,2022-11-18 18:59:52.896987,0.006667
2,2,2022-11-18 18:59:52.896987,2022-11-18 19:19:52.896987,0.01
3,2,2022-11-18 19:19:52.896987,2022-11-18 19:34:52.896987,0.006667
4,2,2022-11-18 19:34:52.896987,2022-11-18 19:52:52.896987,0.003333


In [128]:
# save!
vasopressors_df.to_csv(OUT_DIR + 'vassopressors_mv_cohort.csv', index=False)

---

## Fluids

In [129]:
IV_CATEGORIES = [
    # 25, # Injecties Haematologisch
    55, # Infuus - Crystalloid
    # 27, # Injecties Overig
    # 23, # Injecties CZS/Sedatie/Analgetica
    # 67, # Injecties Hormonen/Vitaminen/Mineralen
    65, # 2. Spuitpompen
    # 24, # Injecties Circulatie/Diuretica
    15, # Injecties Antimicrobiele middelen
    17, # Infuus - Colloid
    61, # Infuus - Bloedproducten
]

In [None]:
fluid_cols = ['admissionid', 'itemid', 'item', 'start', 'stop', 'ordercategory', 'ordercategoryid', 'administered', 'administeredunitid']

iv_fluids_df = []

for i, chunk in tqdm(read_csv(r"D:/AmsterdamUMCdb-v1.0.2/drugitems.csv", usecols=fluid_cols)):
    # convert starttime and endtime to hours since start of admission
    chunk['start'] = time_within_admission(chunk.start, chunk.admissionid)
    chunk['stop'] = time_within_admission(chunk.stop, chunk.admissionid)
        
    # extract drugitems corresponding to intravenous infusion in ml
    chunk = chunk[chunk.ordercategoryid.isin(IV_CATEGORIES) & (chunk.administeredunitid == 6)].copy()
        
    # TODO: tonicity

    # convert to DataFrame
    iv_fluid_df = pd.DataFrame({
        'icustay_id': chunk.admissionid,
        'starttime': to_timestamp(first_admission_start, ms=chunk.start),
        'endtime': to_timestamp(first_admission_start, ms=chunk.stop),
        'itemid': -1, # -1 to prevent conflict with MIMIC itemids
        'ordercategoryname': chunk.ordercategory,
        'amountuom': 'ml',
        'amount': chunk.administered
    })
    iv_fluids_df.append(iv_fluid_df)
    
    if i > MAX_CHUNKS:
        break
    
# merge
iv_fluids_df = pd.concat(iv_fluids_df, axis=0).reset_index(drop=True)
iv_fluids_df.head()

In [None]:
# save!
iv_fluids_df.to_csv(OUT_DIR + 'inputevents_mv_cohort.csv')

--- 
## Statistics

In [None]:
print('Mortality rate: %.3f' % cohort_df.hospital_expire_flag.mean())