# Data Exploration

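Explore waveform data in the MIMIC III Waveform Database: list the available records, read a record's metadata, and inspect its segments.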

## Identify records

Identify the records in the [MIMIC III Waveform Database](https://doi.org/10.13026/c2607m).

### Specify the required Python packages
- The WFDB package is imported using `import wfdb`

In [60]:
# Setup
import sys
import wfdb # The WFDB Toolbox

### Get a list of records in the database
- Use the [`get_record_list`](https://wfdb.readthedocs.io/en/latest/io.html#wfdb.io.get_record_list) function from the WFDB toolbox to get a list of records in the database.

In [69]:
database_name = 'mimic3wdb' # The name of the MIMIC III Waveform Database on PhysioNet (see URL: https://physionet.org/content/mimic3wdb/)
records = wfdb.get_record_list(database_name)
print("Done: Loaded list of {} records for '{}' database".format(len(records), database_name))

Done: Loaded list of 67830 records for 'mimic3wdb' database


- Display the first few records

In [62]:
print("First ten records: {}".format(records[0:10]))
# you could also try: print("Last ten records: {}".format(records[-10:]))

First ten records: ['30/3000003/', '30/3000031/', '30/3000051/', '30/3000060/', '30/3000063/', '30/3000065/', '30/3000086/', '30/3000100/', '30/3000103/', '30/3000105/']


_Note the formatting of these records: each starts with an intermediate directory ('30' in this case), followed by a record directory._
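
The path format noted above can be split into its two parts with plain string handling. The following is a minimal sketch, not part of the WFDB toolbox: the helper name `split_record_path` is hypothetical, and the example entries are copied from the output above.

```python
def split_record_path(record_path):
    """Split an entry such as '30/3000003/' into (intermediate_dir, record_name).

    Hypothetical helper for illustration; it is not a WFDB function.
    """
    # Drop the trailing slash, then split on the remaining separator
    parts = record_path.strip("/").split("/")
    if len(parts) != 2:
        raise ValueError(
            "Expected '<intermediate_dir>/<record_dir>/', got {!r}".format(record_path)
        )
    intermediate_dir, record_name = parts
    return intermediate_dir, record_name


# Entries copied from the record listing shown above
example_records = ['30/3000003/', '30/3000031/', '30/3000051/']

for entry in example_records:
    intermediate_dir, record_name = split_record_path(entry)
    print("{} -> intermediate directory '{}', record '{}'".format(
        entry, intermediate_dir, record_name))
```

Raising an error on unexpected input makes it obvious if an entry does not follow the two-level layout, rather than silently returning the wrong part.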

## Extract metadata for a record

Each record's metadata is stored in a header file with the `.hea` extension.

### Specify the online directory containing a record's data

In [63]:
record_no = 0 # specify the first record (noting that in Python the first index is 0)
record_dir = database_name + '/' + records[record_no]
print("Physionet directory specified for this record: {}".format(record_dir))

Physionet directory specified for this record: mimic3wdb/30/3000003/


### Specify the name of the record

In [64]:
record_name = records[record_no].split("/")[1] # This extracts the record name (e.g. '3000003') from the record
print("Record name: {}".format(record_name))

Record name: 3000003


### Load the metadata for this record
- Use the [`rdheader`](https://wfdb.readthedocs.io/en/latest/io.html#wfdb.io.rdheader) function from the WFDB toolbox to load metadata from the record header file

In [65]:
record_data = wfdb.rdheader(record_name, pn_dir=record_dir, rd_segments=False)
print("Done: metadata loaded from header file at URL: {}".format("https://physionet.org/content/" + record_dir.replace("mimic3wdb/", "mimic3wdb/1.0/") + record_name + ".hea"))

Done: metadata loaded from header file at URL: https://physionet.org/content/mimic3wdb/1.0/30/3000003/3000003.hea


## Inspect details of physiological signals in this record
- Print a few details from the extracted metadata

In [66]:
print("- Number of signals: {}".format(record_data.n_sig))
print("- Duration: {:.1f} hours".format(record_data.sig_len/(record_data.fs*60*60))) 
print("- Sampling frequency: {} Hz".format(record_data.fs))

- Number of signals: 5
- Duration: 42.0 hours
- Sampling frequency: 125 Hz


Note that:
- Not every signal is necessarily present for the full duration of the record.
- The waveform signals in this record are sampled at 125 Hz, as reported by `fs` above.
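
The duration printed above is the number of samples divided by the number of samples per hour. A minimal sketch of that arithmetic follows; the helper name `duration_hours` is hypothetical, and the sample count is an assumed value chosen to give a round figure, not one read from the header.

```python
def duration_hours(n_samples, fs):
    """Convert a sample count at sampling frequency fs (in Hz) into hours."""
    if fs <= 0:
        raise ValueError("fs must be positive")
    seconds_per_hour = 60 * 60
    return n_samples / (fs * seconds_per_hour)


fs = 125                # Hz, the sampling frequency reported above
n_samples = 18_900_000  # hypothetical sample count, assumed for illustration

print("Duration: {:.1f} hours".format(duration_hours(n_samples, fs)))
```

With real data, `n_samples` would be `record_data.sig_len` and `fs` would be `record_data.fs`, as in the cell above.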

## Inspecting the segments in a record
Each record is typically made up of several segments.

- Inspect the segments in this record

In [72]:
from pprint import pprint
print("This record contains the following {} segments:".format(len(record_data.seg_name)-1) ) # -1 because the first item is the layout.
print(record_data.seg_name)

This record contains the following 22 segments:
['3000003_layout', '3000003_0001', '3000003_0002', '3000003_0003', '3000003_0004', '3000003_0005', '3000003_0006', '3000003_0007', '~', '3000003_0008', '~', '3000003_0009', '~', '3000003_0010', '3000003_0011', '3000003_0012', '3000003_0013', '3000003_0014', '3000003_0015', '~', '3000003_0016', '3000003_0017', '~']


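Note the format of the segment names: the record name, an underscore, then a four-digit segment number (e.g. `3000003_0001`). Entries named `~` mark gaps in the recording where no signals were stored, and the count of 22 above includes them.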
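
To separate the segments that hold data from the layout entry and the `~` entries, the list can be filtered with plain Python. This is a minimal sketch: the helper names `recorded_segments` and `segment_number` are hypothetical (not WFDB functions), and `seg_names` is copied from the output above.

```python
# Segment names copied from the output above
seg_names = ['3000003_layout', '3000003_0001', '3000003_0002', '3000003_0003',
             '3000003_0004', '3000003_0005', '3000003_0006', '3000003_0007', '~',
             '3000003_0008', '~', '3000003_0009', '~', '3000003_0010',
             '3000003_0011', '3000003_0012', '3000003_0013', '3000003_0014',
             '3000003_0015', '~', '3000003_0016', '3000003_0017', '~']


def recorded_segments(names):
    """Return the names of segments that hold data, skipping layout and '~' entries."""
    return [n for n in names if n != "~" and not n.endswith("_layout")]


def segment_number(seg_name):
    """Extract the integer segment number from a name such as '3000003_0001'."""
    return int(seg_name.rsplit("_", 1)[1])


segments = recorded_segments(seg_names)
n_gaps = seg_names.count("~")

print("Segments with data: {}".format(len(segments)))
print("Entries marked '~': {}".format(n_gaps))
print("First and last segment numbers: {} and {}".format(
    segment_number(segments[0]), segment_number(segments[-1])))
```

In the notebook itself, `record_data.seg_name` would take the place of the hard-coded `seg_names` list.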

## Inspecting individual segments