## Getting Started

Links:
- Multimodality Reproducibility Study [[Google Docs](https://docs.google.com/document/d/1mALf80aaZ5XzacFPy8Dwy1GnyqkfOIS1C0CniKbhszw/edit?tab=t.0)]
- PyHealth Multimodal MIMIC4 Minimal Setup Script [[GitHub](https://github.com/sunlabuiuc/PyHealth/blob/master/examples/mortality_prediction/multimodal_mimic4_minimal.py)]
- Data Modules from PhysioNet [[Link](https://mimic.mit.edu/docs/iv/modules/)]
    - EHR [v2.2](https://physionet.org/content/mimiciv/2.2/)

In [2]:
# Change directory to package root
import os
PROJECT_ROOT = '/Users/wpang/Desktop/PyHealth'
os.chdir(PROJECT_ROOT)

In [3]:
# PyHealth Packages
from pyhealth.datasets import MIMIC4Dataset
from pyhealth.tasks import MultimodalMortalityPredictionMIMIC4

In [4]:
# Paths
EHR_ROOT = os.path.join(PROJECT_ROOT, "srv/local/data/physionet.org/files/mimiciv/2.2")
NOTE_ROOT = os.path.join(PROJECT_ROOT, "srv/local/data/physionet.org/files/mimic-iv-note/2.2")
CXR_ROOT = os.path.join(PROJECT_ROOT, "srv/local/data/physionet.org/files/mimic-cxr-jpg/2.0.0")
CACHE_DIR = os.path.join(PROJECT_ROOT, "srv/local/data/wp/pyhealth_cache")
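
Before constructing the dataset, it can help to confirm that each root directory exists, since a mistyped path may otherwise surface only partway through loading. A minimal sketch, assuming the roots are plain local directories; `find_missing_dirs` is a hypothetical helper and not part of PyHealth:

```python
import os


def find_missing_dirs(named_paths):
    """Return the names whose paths are not existing directories."""
    return [name for name, path in named_paths.items() if not os.path.isdir(path)]


# Illustrative call with placeholder paths; in this notebook the dict would
# instead hold EHR_ROOT, NOTE_ROOT, and CXR_ROOT.
example = {
    "here": os.getcwd(),
    "nowhere": os.path.join(os.getcwd(), "no_such_dir_12345"),
}
missing = find_missing_dirs(example)
if missing:
    print("Missing data directories:", ", ".join(missing))
```

The cache directory is left out of the check on purpose, on the assumption that it may not exist before the first run.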

In [5]:
dataset = MIMIC4Dataset(
    ehr_root=EHR_ROOT,
    ehr_tables=[
        "patients", "admissions", "diagnoses_icd",
        "procedures_icd", "prescriptions", "labevents",
    ],
    note_root=NOTE_ROOT,
    note_tables=["discharge", "radiology"],
    cxr_root=CXR_ROOT,
    cxr_tables=["metadata", "negbio"],
    cache_dir=CACHE_DIR,
    num_workers=8,
)

Memory usage Starting MIMIC4Dataset init: 449.4 MB
Initializing mimic4 dataset from /Users/wpang/Desktop/PyHealth/srv/local/data/physionet.org/files/mimiciv/2.2|/Users/wpang/Desktop/PyHealth/srv/local/data/physionet.org/files/mimic-iv-note/2.2|/Users/wpang/Desktop/PyHealth/srv/local/data/physionet.org/files/mimic-cxr-jpg/2.0.0 (dev mode: False)
Initializing MIMIC4EHRDataset with tables: ['patients', 'admissions', 'diagnoses_icd', 'procedures_icd', 'prescriptions', 'labevents'] (dev mode: False)
Using default EHR config: /Users/wpang/Desktop/PyHealth/pyhealth/datasets/configs/mimic4_ehr.yaml
Memory usage Before initializing mimic4_ehr: 449.4 MB
Duplicate table names in tables list. Removing duplicates.
Initializing mimic4_ehr dataset from /Users/wpang/Desktop/PyHealth/srv/local/data/physionet.org/files/mimiciv/2.2 (dev mode: False)
Memory usage After initializing mimic4_ehr: 449.7 MB
Memory usage After EHR dataset initialization: 449.7 MB
Initializing MIMIC4NoteDataset with tables: ['di



Memory usage Before initializing mimic4_cxr: 696.6 MB
Initializing mimic4_cxr dataset from /Users/wpang/Desktop/PyHealth/srv/local/data/physionet.org/files/mimic-cxr-jpg/2.0.0 (dev mode: False)
Memory usage After initializing mimic4_cxr: 695.6 MB
Memory usage After CXR dataset initialization: 695.6 MB
Memory usage Completed MIMIC4Dataset init: 695.6 MB


In [None]:
# Apply multimodal task
task = MultimodalMortalityPredictionMIMIC4()
samples = dataset.set_task(task, cache_dir=os.path.join(CACHE_DIR, "task"), num_workers=8)

# Get and print sample
sample = samples[0]
print(sample)

Setting task MultimodalMortalityPredictionMIMIC4 for mimic4 base dataset...
Applying task transformations on data with 8 workers...
Combining data from ehr dataset
Scanning table: diagnoses_icd from /Users/wpang/Desktop/PyHealth/srv/local/data/physionet.org/files/mimiciv/2.2/hosp/diagnoses_icd.csv.gz
Joining with table: /Users/wpang/Desktop/PyHealth/srv/local/data/physionet.org/files/mimiciv/2.2/hosp/admissions.csv.gz
Scanning table: icustays from /Users/wpang/Desktop/PyHealth/srv/local/data/physionet.org/files/mimiciv/2.2/icu/icustays.csv.gz
Scanning table: admissions from /Users/wpang/Desktop/PyHealth/srv/local/data/physionet.org/files/mimiciv/2.2/hosp/admissions.csv.gz
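
Printing a full sample can be verbose for multimodal data. A minimal sketch of a more compact view, assuming each sample behaves like a dictionary (the source does not confirm this); `summarize_sample` is a hypothetical helper, and the stand-in dictionary below uses made-up field names rather than the task's real schema:

```python
def summarize_sample(sample, max_len=40):
    """Map each key to a short 'type: preview' string."""
    summary = {}
    for key, value in sample.items():
        preview = repr(value)
        if len(preview) > max_len:
            # Truncate long values so each entry fits on one line.
            preview = preview[: max_len - 3] + "..."
        summary[key] = f"{type(value).__name__}: {preview}"
    return summary


# Stand-in data with hypothetical field names, for illustration only.
fake_sample = {"patient_id": "p001", "codes": ["A", "B", "C"], "label": 0}
for key, line in summarize_sample(fake_sample).items():
    print(f"{key} -> {line}")
```

If the samples do turn out to be dictionaries, the same helper could be called on `samples[0]` in place of `fake_sample`.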
