# 1. Description

This notebook produces the IDs of the sepsis group (4,226 cases) and the non-sepsis group (23,170 cases) described in the paper "Prediction of the ICU mortality based on the missing events".

# 2. Before running...

Before proceeding, please set up the Python environment. This program requires the following libraries.

In [1]:
import pandas as pd # 1.2.1
import numpy as np # 1.20.2
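
To see whether the installed libraries match the versions noted in the import comments, a quick check like the following may help. This is only a sketch: the `expected` dictionary is an illustrative name, and other versions may well work too.

```python
import numpy as np
import pandas as pd

# Versions noted in the import comments above; other versions may also work.
expected = {"pandas": "1.2.1", "numpy": "1.20.2"}
installed = {"pandas": pd.__version__, "numpy": np.__version__}

for name, version in expected.items():
    status = "matches" if installed[name] == version else "differs from"
    print(f"{name} {installed[name]} {status} the notebook's {version}")
```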

Then place the eICU files in the appropriate directory so that the program can reach the input files. To test whether the input files are set up correctly, run the cell below. If they are, the cell will finish without errors.

In [2]:
df_patient = pd.read_csv("data/patient.csv")
df_apachePredVar = pd.read_csv("data/apachePredVar.csv")
df_apacheApsVar = pd.read_csv("data/apacheApsVar.csv")
df_apachePatientResult = pd.read_csv("data/apachePatientResult.csv")

To obtain the eICU files, please see "https://www.usa.philips.com/healthcare/solutions/enterprise-telehealth/eri" or "https://eicu-crd.mit.edu/gettingstarted/access/"
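
Reading the full CSV files can take a while, so a lighter pre-check may be handy. The sketch below only tests whether the four files exist under `data/`. The helper name `missing_inputs` is hypothetical and not part of the original notebook. It also creates the `ids/` directory used by the later "Save" cells, since `to_csv` does not create missing directories.

```python
from pathlib import Path

def missing_inputs(data_dir="data"):
    """Return the names of expected eICU CSV files not found in data_dir."""
    required = [
        "patient.csv",
        "apachePredVar.csv",
        "apacheApsVar.csv",
        "apachePatientResult.csv",
    ]
    return [name for name in required if not (Path(data_dir) / name).is_file()]

# The later "Save" cells write into "ids/", so make sure it exists.
Path("ids").mkdir(exist_ok=True)
print("Missing input files:", missing_inputs())
```

An empty list means all four input files were found.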

# 3. IDs for the sepsis group (4,226)

## 3.1 Get all patients

In [3]:
# Load
df_patient = pd.read_csv("data/patient.csv")
len(df_patient)

200859

In [4]:
# Save
df_patient["patientunitstayid"].to_csv("ids/001_all_200859.csv", header=False, index=False)

## 3.2 Select the sepsis group

In [5]:
# Load
df_apachePredVar = pd.read_csv("data/apachePredVar.csv")
len(df_apachePredVar)

171177

In [6]:
# Drop the cases without an admitdiagnosis
df_apachePredVar = df_apachePredVar.dropna(subset=['admitdiagnosis'])
len(df_apachePredVar)

167438

In [7]:
# In eRI, sepsis cases are recorded with a diagnosis name that starts with the word "SEPSIS"
df_apachePredVar.admitdiagnosis.head(10)

0       RHYTHATR
2      SEPSISUTI
3     SEPSISPULM
4     RESPARREST
5       ODSEDHYP
6     SEPSISPULM
7            CHF
8       S-VALVMI
9     S-FEMPGRAF
10        ASTHMA
Name: admitdiagnosis, dtype: object

In [8]:
# List the distinct "admitdiagnosis" values that start with SEPSIS
df_apachePredVar[df_apachePredVar["admitdiagnosis"].str.startswith("SEPSIS")]["admitdiagnosis"].unique()

array(['SEPSISUTI', 'SEPSISPULM', 'SEPSISUNK', 'SEPSISOTH', 'SEPSISGI',
       'SEPSISCUT', 'SEPSISGYN'], dtype=object)
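
pandas offers both `str.contains` and `str.startswith`, and they select different rows whenever the word appears somewhere other than the beginning of a value. The toy example below illustrates the difference. The code "S-SEPSISX" is invented for this sketch and is not an eICU value.

```python
import pandas as pd

# Toy codes; "S-SEPSISX" is invented for illustration and is not an eICU value.
codes = pd.Series(["SEPSISUTI", "CHF", "S-SEPSISX", "SEPSISPULM"])

anywhere = codes[codes.str.contains("SEPSIS")].tolist()
prefix_only = codes[codes.str.startswith("SEPSIS")].tolist()

print(anywhere)     # includes the invented "S-SEPSISX"
print(prefix_only)  # only codes that begin with "SEPSIS"
```

In the output above, all seven distinct values begin with "SEPSIS", so either method would list the same values here.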

In [9]:
# Select the sepsis group
df_sepsis = df_apachePredVar[df_apachePredVar["admitdiagnosis"].str.startswith("SEPSIS")]
len(df_sepsis)

21980

In [10]:
# Save
df_sepsis["patientunitstayid"].to_csv("ids/002_sepsis_21980.csv", header=False, index=False)

## 3.3 Select the cases where APS-related variables are not missing

In [11]:
# Load
df_apacheApsVar = pd.read_csv("data/apacheApsVar.csv")
len(df_apacheApsVar)

171177

In [12]:
# The definition of APS
aps=[
    "eyes",
    "motor",
    "verbal",
    "wbc",
    "temperature",
    "respiratoryrate",
    "sodium",
    "heartrate",
    "meanbp",
    "ph",
    "hematocrit",
    "pao2",
    "pco2",
    "fio2"
]

In [13]:
# Exclude the cases that have a value of -1 in any APS variable
for i in aps:
    df_apacheApsVar = df_apacheApsVar[df_apacheApsVar[i] != -1]
len(df_apacheApsVar)

30173
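
The same exclusion can also be written as a single boolean mask instead of a loop. The sketch below uses a toy frame with three hypothetical stays and only two of the APS columns to show that both forms keep the same rows.

```python
import pandas as pd

# Toy frame with three hypothetical stays; -1 marks a missing measurement.
toy = pd.DataFrame({
    "patientunitstayid": [1, 2, 3],
    "wbc": [7.5, -1, 9.1],
    "ph": [7.4, 7.3, -1],
})
cols = ["wbc", "ph"]

# Loop form: filter one column at a time
looped = toy
for c in cols:
    looped = looped[looped[c] != -1]

# Mask form: keep rows where no listed column equals -1
masked = toy[(toy[cols] != -1).all(axis=1)]

print(masked["patientunitstayid"].tolist())  # [1]
```

Note that both forms only test for the value -1. A `NaN` entry compares as not equal to -1, so such a row would be kept.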

In [14]:
# Merge
df_sepsis_aps = pd.merge(df_sepsis, df_apacheApsVar, on="patientunitstayid").drop_duplicates()
len(df_sepsis_aps)

4672

In [15]:
# Save
df_sepsis_aps["patientunitstayid"].to_csv("ids/003_sepsis_4672.csv", header=False, index=False)

## 3.4 Select the cases where "actual mortality" is not missing

In [16]:
# Load
df_apachePatientResult = pd.read_csv("data/apachePatientResult.csv")
df_apachePatientResult = df_apachePatientResult[["patientunitstayid", "actualicumortality"]]

In [17]:
# Merge
df_sepsis_aps_mortality = pd.merge(df_sepsis_aps, df_apachePatientResult, on="patientunitstayid", how="inner").drop_duplicates()
len(df_sepsis_aps_mortality)

4226
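
To illustrate what the inner merge followed by `drop_duplicates()` does, here is a small sketch on toy data. The stay IDs and mortality values are made up for illustration. An inner merge drops stays that have no row in the result table, and `drop_duplicates()` collapses rows that are identical in every column.

```python
import pandas as pd

cohort = pd.DataFrame({"patientunitstayid": [1, 2, 3]})
# Stay 1 appears twice with identical values; stay 3 has no result row.
results = pd.DataFrame({
    "patientunitstayid": [1, 1, 2],
    "actualicumortality": ["ALIVE", "ALIVE", "EXPIRED"],
})

merged = pd.merge(cohort, results, on="patientunitstayid", how="inner")
deduped = merged.drop_duplicates()

print(len(merged), len(deduped))  # 3 2
```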

In [18]:
# Save
df_sepsis_aps_mortality["patientunitstayid"].to_csv("ids/004_sepsis_aps_4226.csv", header=False, index=False)

# 4. IDs for the non-sepsis group (23,170)

In [19]:
# Load
df_apacheApsVar = pd.read_csv("data/apacheApsVar.csv")
len(df_apacheApsVar)

171177

In [20]:
# Exclude the cases that have a value of -1 in any APS variable
for i in aps:
    df_apacheApsVar = df_apacheApsVar[df_apacheApsVar[i] != -1]
len(df_apacheApsVar)

30173

In [21]:
# Load
df_apachePatientResult = pd.read_csv("data/apachePatientResult.csv")
df_apachePatientResult = df_apachePatientResult[["patientunitstayid", "actualicumortality"]]

In [22]:
# Merge
df_aps_mortality = pd.merge(df_apacheApsVar, df_apachePatientResult, on="patientunitstayid", how="inner").drop_duplicates()
len(df_aps_mortality)

In [23]:
# Select IDs
ids_aps = set(df_aps_mortality["patientunitstayid"])
len(ids_aps)

27396

In [24]:
# Select the 4,226 sepsis-group IDs
ids_sepsis_aps = set(df_sepsis_aps_mortality["patientunitstayid"])
len(ids_sepsis_aps)

4226

In [25]:
# Get non-sepsis group IDs
ids_nonSepsis_aps = ids_aps - ids_sepsis_aps
len(ids_nonSepsis_aps)

23170
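
The set difference used for the non-sepsis IDs can also be expressed with `isin` on a pandas Series. The sketch below compares the two on toy IDs, which are made up for illustration.

```python
import pandas as pd

all_ids = pd.Series([10, 11, 12, 13, 14], name="patientunitstayid")
sepsis_ids = {11, 13}

# Set difference
non_sepsis_set = set(all_ids) - sepsis_ids

# Equivalent boolean mask; keeps the original row order
non_sepsis_series = all_ids[~all_ids.isin(sepsis_ids)]

print(sorted(non_sepsis_set))
print(non_sepsis_series.tolist())
```

A Python set has no guaranteed ordering, so sorting the IDs (or using the mask form) may be preferable if the saved file should come out in a reproducible order.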

In [26]:
# Save
df_non_sepsis = pd.DataFrame(list(ids_nonSepsis_aps))
df_non_sepsis.to_csv("ids/005_non_sepsis_aps_23170.csv", header=False, index=False)
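
As a final sanity check, one might confirm that the two groups do not overlap and together cover all cases with complete APS variables and a mortality record. The helper `check_partition` below is a hypothetical sketch and not part of the original notebook. It is shown on toy IDs.

```python
def check_partition(sepsis_ids, non_sepsis_ids, all_ids):
    """Return True if the two groups are disjoint and together cover all_ids."""
    sepsis_ids = set(sepsis_ids)
    non_sepsis_ids = set(non_sepsis_ids)
    all_ids = set(all_ids)
    return sepsis_ids.isdisjoint(non_sepsis_ids) and (sepsis_ids | non_sepsis_ids) == all_ids

# Toy IDs for illustration
print(check_partition({1, 2}, {3, 4, 5}, {1, 2, 3, 4, 5}))  # True
print(check_partition({1, 2}, {2, 3}, {1, 2, 3}))           # False

# The counts reported in this notebook are consistent with such a split
print(4226 + 23170 == 27396)  # True
```

With the notebook's variables, the call would be `check_partition(ids_sepsis_aps, ids_nonSepsis_aps, ids_aps)`.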