# Data Extraction

Extract data from the MIMIC-IV Waveform Database.

The **objectives** are:
- To extract the signals from one segment of a record
- To extract only the required duration of the required signals from this segment (_i.e._ 10 minutes of photoplethysmography and blood pressure signals)

<div class="alert alert-block alert-warning"> <b>Context:</b>
    In the <a href="https://wfdb.io/mimic_wfdb_tutorials/tutorial/notebooks/data-exploration.html">Data Exploration</a> tutorial we learnt how to identify segments of waveform data which are suitable for a particular research study (i.e. which have the required duration of the required signals). We extracted metadata for such a segment, providing high-level details of what is contained in the segment (e.g. which signals, their sampling frequency, and their duration). In this tutorial we will go a step further and extract signals in preparation for analysis.
</div>

---
## Setup
<div class="alert alert-block alert-warning"> <b>Resource:</b> These steps are taken from the <a href="https://wfdb.io/mimic_wfdb_tutorials/tutorial/notebooks/data-exploration.html">Data Exploration</a> tutorial. </div>

- Specify the required Python packages

In [7]:
import sys
from pathlib import Path

- Specify a particular version of the WFDB toolbox

In [9]:
!pip install wfdb==4.0.0

- Specify the settings for the MIMIC-IV database

In [2]:
import wfdb

wfdb.set_db_index_url('https://challenge.physionet.org/benjamin/db')  # set the database index URL from which WFDB retrieves MIMIC-IV data
database_name = 'mimic4wdb/0.1.0'  # the name of the MIMIC-IV Waveform Demo Database on PhysioNet

- Provide a list of segments which meet the requirements for the study (NB: these are copied from the end of the [Data Exploration Tutorial](https://wfdb.io/mimic_wfdb_tutorials/tutorial/notebooks/data-exploration.html)).

In [3]:
segment_names = ['83404654_0005']
segment_dirs = ['mimic4wdb/0.1.0/p100/p10020306/83404654']

- Specify a segment from which to extract data

In [4]:
rel_segment_no = 0
rel_segment_name = segment_names[rel_segment_no]
rel_segment_dir = segment_dirs[rel_segment_no]
print("Specified segment '{}' in directory '{}'".format(rel_segment_name, rel_segment_dir))

Specified segment '83404654_0005' in directory 'mimic4wdb/0.1.0/p100/p10020306/83404654'


---
## Extract data for this segment

- Use the [`rdrecord`](https://wfdb.readthedocs.io/en/latest/io.html#wfdb.io.rdrecord) function from the WFDB toolbox to read the data for this segment.

In [6]:
# e.g. record_data = wfdb.rdrecord(record_name='83411188',pn_dir='mimic4wdb/0.1.0/p100/p10039708/83411188')
record_data = wfdb.rdrecord(record_name=rel_segment_name, pn_dir=rel_segment_dir) 
print("Data loaded from segment: {}".format(rel_segment_name))


- Look at the class of the object in which the data are stored:

In [31]:
print("Data stored in class of type: {}".format(type(record_data)))

Data stored in class of type: <class 'wfdb.io.record.Record'>


<div class="alert alert-block alert-warning"> <b>Resource:</b> You can find out more about the class representing single-segment WFDB records <a href="https://wfdb.readthedocs.io/en/stable/io.html?highlight=class#wfdb.io.Record">here</a>. </div>
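
As a rough illustration of how the fields of a `Record` fit together, the sketch below uses a small stand-in object rather than a real download. The values and the `get_channel` helper are hypothetical. The attribute names (`sig_name`, `fs`, `sig_len`, `p_signal`) follow the `Record` class, where `p_signal` holds the physical signal values with one row per sample and one column per signal.

```python
from types import SimpleNamespace

# Stand-in for a wfdb Record (made-up values, for illustration only).
record = SimpleNamespace(
    sig_name=['II', 'V'],
    fs=125,
    sig_len=4,
    # rows = samples, columns = signals (same layout as Record.p_signal)
    p_signal=[[0.1, 0.5], [0.2, 0.6], [0.3, 0.7], [0.4, 0.8]],
)


def get_channel(record, name):
    """Hypothetical helper: return the samples of the signal called `name`."""
    col = record.sig_name.index(name)  # raises ValueError if the signal is absent
    return [row[col] for row in record.p_signal]


print(get_channel(record, 'II'))
```

With a real `Record`, `p_signal` is a NumPy array, so `record_data.p_signal[:, col]` would be the more usual way to select a column.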

- Find out about the signals which have been extracted

In [32]:
print("This segment contains waveform data for the following {} signals: {}".format(record_data.n_sig, record_data.sig_name))
print("The signals are sampled at {} Hz".format(record_data.fs))
print("They last for {:.1f} minutes".format(record_data.sig_len/(60*record_data.fs)))

This segment contains waveform data for the following 2 signals: ['II', 'V']
The signals are sampled at 125 Hz
They last for 2.3 minutes
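
The duration calculation above suggests a way to request only part of a segment, in line with the second objective. The sketch below converts a start time and a duration into sample indices and checks them against the segment length. `sample_window` and `read_window` are hypothetical helpers, not part of the WFDB toolbox. `read_window` assumes that `rdrecord` accepts the `sampfrom`, `sampto` and `channel_names` arguments, and it is not called here because it needs network access.

```python
def sample_window(fs, sig_len, start_seconds, n_seconds):
    """Convert a time window into (sampfrom, sampto) sample indices.

    Raises ValueError if the window extends beyond the segment.
    """
    sampfrom = int(round(start_seconds * fs))
    sampto = sampfrom + int(round(n_seconds * fs))
    if sampfrom < 0 or sampto > sig_len:
        raise ValueError(
            "Window of samples {}-{} does not fit in a segment of {} samples".format(
                sampfrom, sampto, sig_len))
    return sampfrom, sampto


def read_window(segment_name, segment_dir, fs, sig_len,
                start_seconds, n_seconds, channel_names=None):
    """Hypothetical wrapper: read only the requested window (and signals)."""
    import wfdb  # imported lazily so sample_window works without wfdb installed
    sampfrom, sampto = sample_window(fs, sig_len, start_seconds, n_seconds)
    # Assumption: rdrecord accepts sampfrom, sampto and channel_names.
    return wfdb.rdrecord(record_name=segment_name, pn_dir=segment_dir,
                         sampfrom=sampfrom, sampto=sampto,
                         channel_names=channel_names)


# Example with made-up numbers: 10 minutes starting 100 s into a
# one-hour segment sampled at 125 Hz.
print(sample_window(fs=125, sig_len=125 * 3600, start_seconds=100, n_seconds=600))
```

For the 2.3-minute segment shown above, requesting 600 s would raise a `ValueError`, since the segment holds far fewer than the 75,000 samples needed at 125 Hz.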
