# Data Exploration

Exploring MIMIC waveform data.

---

## Identify ICU stays

Identify the ICU stays in the [MIMIC III Waveform Database](https://doi.org/10.13026/c2607m).

### Specify the required Python packages
- The WFDB package is imported using `import wfdb`

In [113]:
# Setup
import sys
import wfdb # The WFDB Toolbox

<div class="alert alert-block alert-warning"> <b>Resource:</b> You can find out more about the WFDB package <a href="https://physionet.org/content/wfdb-python/3.4.1/">here</a>. </div>

### Get a list of ICU stays in the database
- Use the [`get_record_list`](https://wfdb.readthedocs.io/en/latest/io.html#wfdb.io.get_record_list) function from the WFDB toolbox to get a list of ICU stays (here, corresponding to records) in the database.

In [114]:
database_name = 'mimic3wdb/1.0' # The name of the MIMIC III Waveform Database on Physionet (see URL: https://physionet.org/content/mimic3wdb/1.0/)
icustay_records = wfdb.get_record_list(database_name)
print("Done: Loaded list of {} ICU stays for '{}' database".format(len(icustay_records), database_name))

Done: Loaded list of 67830 ICU stays for 'mimic3wdb/1.0' database


- Display the first few records

In [115]:
print("First five ICU stays: {}".format(icustay_records[0:5]))

First five ICU stays: ['30/3000003/', '30/3000031/', '30/3000051/', '30/3000060/', '30/3000063/']


Note the formatting of these records: each starts with an intermediate directory ("30" in this case), followed by a record directory.
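This two-level structure can be unpacked with ordinary string handling. The following is a minimal sketch: `split_record_path` is a hypothetical helper name (it is not part of the WFDB package), and it assumes every entry has exactly the two-level form shown above.

```python
def split_record_path(record):
    """Split an entry such as '30/3000003/' into its two directory names."""
    # Drop the trailing slash, then separate the two path components.
    intermediate_dir, record_dir = record.strip("/").split("/")
    return intermediate_dir, record_dir


# The first two entries from the list printed above.
example_records = ['30/3000003/', '30/3000031/']
for record in example_records:
    print(split_record_path(record))
```

An entry with more or fewer than two path components would raise a `ValueError` here, which makes any unexpected formatting easy to spot.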

<div class="alert alert-block alert-info"> <b>Q:</b> Can you print the names of the last five ICU stays? <br> <b>Hint:</b> in Python, the last five elements can be specified using '[-5:]' </div>

---
## Extract metadata for an ICU stay

Each ICU stay contains metadata stored in a header file, named "\<ICU stay record name\>.hea"

### Specify the online directory containing an ICU stay's data

In this case, each ICU stay corresponds to a record.

In [116]:
icustay_no = 0 # specify the first record (noting that in Python the first index is 0)
icustay_record = icustay_records[icustay_no]
icustay_record_dir = database_name + '/' + icustay_record
print("Physionet directory specified for this ICU stay: {}".format(icustay_record_dir))

Physionet directory specified for this ICU stay: mimic3wdb/1.0/30/3000003/


### Specify the name of the ICU stay

Extract the ICU stay record name (e.g. '3000003') from the ICU stay record (e.g. '30/3000003/'):

In [117]:
icustay_record_name = icustay_record.split("/")[1]
print("ICU stay name: {}".format(icustay_record_name))

ICU stay name: 3000003


### Load the metadata for this ICU stay
- Use the [`rdheader`](https://wfdb.readthedocs.io/en/latest/io.html#wfdb.io.rdheader) function from the WFDB toolbox to load metadata from the record header file

In [118]:
icustay_record_data = wfdb.rdheader(icustay_record_name, pn_dir=icustay_record_dir, rd_segments=False)
print("Done: metadata loaded for ICU stay '{}' from header file at URL: {}".format(icustay_record_name, "https://physionet.org/content/" + icustay_record_dir + icustay_record_name + ".hea"))

Done: metadata loaded for ICU stay '3000003' from header file at URL: https://physionet.org/content/mimic3wdb/1.0/30/3000003/3000003.hea


---
## Inspect details of physiological signals recorded in this ICU stay
- Print a few details of the signals from the extracted metadata

In [119]:
print("- Number of signals: {}".format(icustay_record_data.n_sig))
print("- Duration: {:.1f} hours".format(icustay_record_data.sig_len/(icustay_record_data.fs*60*60)))
print("- Sampling frequency: {} Hz".format(icustay_record_data.fs))

- Number of signals: 2
- Duration: 0.0 hours
- Sampling frequency: 125 Hz


Note that:
- Not all signals may be present throughout the duration of the record
- All waveform signals in the MIMIC III Waveform Database are sampled at 125 Hz (the numerics are stored separately, at a lower rate).
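The duration printed above is the record length in samples (`sig_len`) divided by the sampling frequency (`fs`), converted from seconds to hours. As a small worked sketch of that arithmetic, using a hypothetical helper `duration_hours` and made-up example values (not taken from any MIMIC record):

```python
def duration_hours(sig_len, fs):
    """Convert a record length in samples to a duration in hours."""
    seconds = sig_len / fs       # samples divided by samples-per-second
    return seconds / (60 * 60)   # seconds to hours


# Hypothetical values chosen for illustration only.
example_fs = 125            # Hz
example_sig_len = 450_000   # samples
print("{:.1f} hours".format(duration_hours(example_sig_len, example_fs)))  # prints "1.0 hours"
```

Because the result is formatted to one decimal place, any record shorter than three minutes (0.05 hours) is displayed as 0.0 hours.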

---
## Inspect the segments making up an ICU stay
Each ICU stay is typically made up of several segments, each of which is stored as a record of its own.

- Inspect the files in this ICU stay

In [120]:
icustay_files = wfdb.get_record_list(icustay_record_dir)
print("Done: Loaded list of {} files for '{}' ICU stay".format(len(icustay_files), icustay_record_dir))

Done: Loaded list of 19 files for 'mimic3wdb/1.0/30/3000003/' ICU stay


Inspect the contents of the first two files:

In [121]:
print("The first file, '{}', contains the metadata for the ICU stay.".format(icustay_files[0]) )
print("The second file, '{}', contains the numerics data for the ICU stay.".format(icustay_files[1]) )

The first file, '3000003', contains the metadata for the ICU stay.
The second file, '3000003n', contains the numerics data for the ICU stay.


The remaining files contain the waveform data for the ICU stay, split into segments, with one file per segment.

In [122]:
icustay_segments = [s for s in icustay_files if "_" in s]
print("The remaining {} files: {}".format(len(icustay_segments), icustay_segments) )

The remaining 17 files: ['3000003_0001', '3000003_0002', '3000003_0003', '3000003_0004', '3000003_0005', '3000003_0006', '3000003_0007', '3000003_0008', '3000003_0009', '3000003_0010', '3000003_0011', '3000003_0012', '3000003_0013', '3000003_0014', '3000003_0015', '3000003_0016', '3000003_0017']


Note the format of the names of the files containing waveform data for each segment: the ICU stay record name, then "_", then a four-digit segment number (e.g. '3000003_0001').
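The naming pattern seen in this ICU stay can be turned into a small classifier. This is only a sketch based on the file names listed above: `classify_file` and `segment_number` are hypothetical helpers, and other ICU stays may contain file names that fit none of these patterns (they are labelled "other" here).

```python
def classify_file(file_name, record_name):
    """Label a file as 'header', 'numerics', 'segment' or 'other' from its name."""
    if file_name == record_name:
        return "header"      # e.g. '3000003'
    if file_name == record_name + "n":
        return "numerics"    # e.g. '3000003n'
    prefix = record_name + "_"
    # A segment is the record name, "_", then digits only, e.g. '3000003_0001'.
    if file_name.startswith(prefix) and file_name[len(prefix):].isdigit():
        return "segment"
    return "other"


def segment_number(file_name):
    """Return the segment number encoded after the underscore, as an integer."""
    return int(file_name.split("_")[1])


# A subset of the file names listed above.
example_files = ['3000003', '3000003n', '3000003_0001', '3000003_0017']
for name in example_files:
    print(name, "->", classify_file(name, '3000003'))
```

Checking that the suffix is numeric is slightly stricter than testing for "_" alone, so a name that merely contains an underscore is not mistaken for a waveform segment.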

---
## Inspect an individual segment
### Read the metadata for this segment
- Read the metadata from the header file

In [123]:
segment_name = icustay_segments[0]
segment_metadata = wfdb.rdheader(record_name=segment_name, pn_dir=icustay_record_dir) 
print("Header metadata loaded for segment '{}' in ICU stay '{}'".format(segment_name, icustay_record_name))

Header metadata loaded for segment '3000003_0001' in ICU stay '3000003'


### Find out what signals are present, and for how long

In [124]:
print("This segment contains the following signals: {}".format(segment_metadata.sig_name))
print("The signals are measured in units of: {}".format(segment_metadata.units))

This segment contains the following signals: ['II', 'V']
The signals are measured in units of: ['mV', 'mV']


See [here](https://archive.physionet.org/mimic2/mimic2_waveform_overview.shtml#signals-125-samplessecond) for definitions of signal abbreviations.

<div class="alert alert-block alert-info"> <b>Q:</b> Which of these signals is still present in segment '3000003_0014'? </div>

All signals in a segment are time-aligned, measured at the same sampling frequency, and last the same duration:

In [125]:
print("All the signals are sampled at {} Hz".format(segment_metadata.fs))
print("and they last for {:.1f} minutes".format(segment_metadata.sig_len/(segment_metadata.fs*60)) )

All the signals are sampled at 125 Hz
and they last for 2.3 minutes
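Because every signal in a segment shares one sampling frequency, a sample index and an elapsed time can be converted into each other directly. The sketch below illustrates this with two hypothetical helpers, `sample_times` and `seconds_to_sample`, using plain Python rather than any WFDB function.

```python
def sample_times(n_samples, fs):
    """Return the elapsed time in seconds of each of the first n_samples samples."""
    return [i / fs for i in range(n_samples)]


def seconds_to_sample(seconds, fs):
    """Return the index of the sample nearest to a given elapsed time."""
    return int(round(seconds * fs))


example_fs = 125  # Hz, the sampling frequency reported above
print(sample_times(5, example_fs))        # times of the first five samples
print(seconds_to_sample(60, example_fs))  # sample index one minute into the segment
```

At 125 Hz, consecutive samples are 8 ms apart, so one minute of signal spans 7500 samples.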
