# Workshop 0: Preparatory Materials

### Background



Pulse oximeters are medical devices used to assess peripheral arterial oxygen saturation ($SpO_2$) noninvasively. In contrast, the "gold standard" measurement of arterial oxygen saturation ($SaO_{2}$) requires arterial blood to be drawn. Pulse oximeters currently on the market measure $SpO_2$ less accurately in populations with darker skin tones.

Pulse oximetry inaccuracies can cause episodes of hidden hypoxemia, i.e., low $SaO_{2}$ despite high $SpO_2$, to go undetected. Hidden hypoxemia can result in less treatment and increased mortality. Although flawed, pulse oximeters remain ubiquitous because of their ease of use; debiasing the underlying algorithms could alleviate the downstream repercussions of hidden hypoxemia.
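To make the definition concrete, here is a minimal sketch of how a single $SaO_2$/$SpO_2$ pair could be flagged. The 88% cut-offs are an assumption for illustration only (this document does not state the thresholds), and the function name `is_hidden_hypoxemia` is hypothetical; check the data dictionary for the definition used in the dataset.

```python
def is_hidden_hypoxemia(sao2, spo2, sao2_threshold=88.0, spo2_threshold=88.0):
    """Flag a pair as hidden hypoxemia: low SaO2 despite a reassuring SpO2.

    Both thresholds are ASSUMED values for illustration, not taken from
    this workshop's dataset definition.
    """
    return sao2 < sao2_threshold and spo2 >= spo2_threshold


# Illustrative (SaO2, SpO2) pairs; the values are made up for the example.
examples = [(85.0, 93.0), (90.0, 89.0), (84.0, 85.0)]
flags = [is_hidden_hypoxemia(sao2, spo2) for sao2, spo2 in examples]
print(flags)  # [True, False, False]
```

Only the first pair is flagged: the second has a normal $SaO_2$, and in the third the pulse oximeter also reads low, so the hypoxemia is not hidden.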


## 1. Literature Review



Here is some recent literature that motivates this Case Study.

A comprehensive living literature compilation has been put together by [Open Oximetry](https://openoximetry.org/publications/). Explore it, if you are curious to learn more!

However, the 5 papers below are our main recommendations.

Each paper is summarized below, and the full articles are available in the
[Google Drive Folder](https://drive.google.com/drive/folders/1fey9LDUynWk2ZgVKeFc9B2tfnrlyiR7c?usp=share_link).

If you are short on time, we recommend the first 2-3 papers as must-reads; the others are optional.

### 🗎 [Paper 1](https://drive.google.com/file/d/14IirFD9SnVvmiD4W3KMXYmc2eQVwcSlB/view?usp=share_link) (letter)
**Racial Bias in Pulse Oximetry Measurement**



*This study of 48,097 pairs of measurements from the University of Michigan reports nearly three times more occult hypoxemia in Black patients than in White patients. However, not all Black patients with a pulse oximetry reading of 92-96% had occult hypoxemia, so a correction factor is not straightforward.*

Sjoding MW, Dickson RP, Iwashyna TJ, Gay SE, Valley TS. Racial Bias in Pulse Oximetry Measurement. N Engl J Med. 2020 Dec 17;383(25):2477-2478. doi: 10.1056/NEJMc2029240. Erratum in: N Engl J Med. 2021 Dec 23;385(26):2496. PMID: 33326721; PMCID: PMC7808260. https://www.nejm.org/doi/10.1056/NEJMc2029240

### 🗎 [Paper 2](https://drive.google.com/file/d/15H9rYmtFrHGE3BjqxpChZOQNgHLZsjs-/view?usp=share_link) (original research)
**Analysis of Discrepancies Between Pulse Oximetry and Arterial Oxygen Saturation Measurements by Race and Ethnicity and Association With Organ Dysfunction and Mortality**

*In this cross-sectional study of 5 databases with 87 971 patients, significant disparities in pulse oximetry accuracy across racial and ethnic subgroups (ie, Asian, Black, Hispanic, and White individuals) were found, with higher rates of hidden hypoxemia associated with mortality, future organ dysfunction, and abnormal laboratory test results.*

Wong AI, Charpignon M, Kim H, et al. Analysis of Discrepancies Between Pulse Oximetry and Arterial Oxygen Saturation Measurements by Race and Ethnicity and Association With Organ Dysfunction and Mortality. JAMA Netw Open. 2021;4(11):e2131674. https://doi.org/10.1001/jamanetworkopen.2021.31674

### 🗎 [Paper 3](https://drive.google.com/file/d/1JbfOuWENSHAzn5f6ztTZXwnZGkUfbiZZ/view?usp=share_link) (original research)
**Assessment of Racial and Ethnic Differences in Oxygen Supplementation Among Patients in the Intensive Care Unit**

*In this cohort study of 3069 patients in the intensive care unit, Asian, Black, and Hispanic patients had a higher adjusted time-weighted average pulse oximetry reading and were administered significantly less supplemental oxygen for a given average hemoglobin oxygen saturation compared with White patients.*

Gottlieb ER, Ziegler J, Morley K, Rush B, Celi LA. Assessment of Racial and Ethnic Differences in Oxygen Supplementation Among Patients in the Intensive Care Unit. JAMA Intern Med. 2022;182(8):849–858. https://doi.org/10.1001/jamainternmed.2022.2587

### 🗎 [Paper 4](https://drive.google.com/file/d/1PUtDG5TWKvClljRHokx1fl60mzU7zCy5/view?usp=share_link) (review)

**Racial Disparity in Oxygen Saturation Measurements by Pulse Oximetry**

*In this article, we review available data regarding the accuracy of pulse oximeters for individuals with dark skin tones, as well as the clinical implications of device inaccuracy at both a patient and public health level. Moreover, we emphasize the urgent need to address the problem of pulse oximeter inaccuracy for individuals with dark skin tones and provide suggestions for the next steps to address the resulting racial health disparity.*

Jamali, H., Castillo, L. T., Morgan, C. C., Coult, J., Muhammad, J. L., Osobamiro, O. O., Parsons, E. C., & Adamson, R. (2022). Racial Disparity in Oxygen Saturation Measurements by Pulse Oximetry: Evidence and Implications. Annals of the American Thoracic Society, 19(12), 1951–1964. https://doi.org/10.1513/AnnalsATS.202203-270CME



### 🗎 [Paper 5](https://drive.google.com/file/d/1EmAe4a1tQPnWifVOhST4jmCDeD9C1Hx6/view?usp=share_link) (letter)

**Dynamic Errors in Pulse Oximetry Preclude Use of Correction Factor**

*In “Racial Disparity in Oxygen Saturation Measurements by Pulse Oximetry: Evidence and Implications” [Paper 4 of this workshop] the authors raise the possibility of implementing a skin-tone correction factor to compensate for the overestimation of arterial oxygen saturation ($SaO_2$) by pulse oximeters ($SpO_2$) among individuals with darker skin. We wish to highlight more recent data that suggest this strategy would not rectify the error but rather cement the inequity imposed by current technology and worsen disparities by disproportionately harming patients of color.*

(...) *More concerning, three-quarters of patients had bidirectional errors over time, such that pulse oximeters both under- and overestimated oxygen saturation for the same subject at different time points.*

Fawzy, A., Valbuena, V. S. M., Chesley, C. F., Wu, T. D., & Iwashyna, T. J. (2023). Dynamic Errors in Pulse Oximetry Preclude Use of Correction Factor. Annals of the American Thoracic Society, 20(2), 338–339. https://doi.org/10.1513/AnnalsATS.202210-872LE

## 2. Objective of the Datathon


The worldwide use of pulse oximeters demands urgent action to prevent further downstream harm. While new devices are being designed, a new approach to recalibrating existing devices is necessary, with the goal of mitigating their underperformance across racial and ethnic groups. To the best of our knowledge, this has not been done before.

We aim to fill this gap by creating a correction model for $SpO_2$ using Machine Learning (ML) methods.

**The hypothesis is that recalibration can be achieved by leveraging an ML model that is fed with $SpO_2$ measurements, alongside patient demographics, physiological data, and specific treatment information.**
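Before building any correction model, it helps to quantify the error that the model is supposed to reduce. The sketch below computes the mean bias and mean absolute error of raw $SpO_2$ against $SaO_2$, using only the five sample rows shown in the Data Load section of this document. The choice of these two metrics is an assumption for illustration, not the datathon's official evaluation protocol, and five rows are far too few to draw any conclusion.

```python
# SaO2 and SpO2 values copied from the five sample rows in the Data Load section.
sao2 = [90.0, 92.0, 99.0, 96.0, 97.0]
spo2 = [89, 94, 99, 97, 98]

# Signed error of the pulse oximeter: positive means SpO2 overestimates SaO2.
errors = [p - a for p, a in zip(spo2, sao2)]

# Mean bias (signed) and mean absolute error (unsigned) of the uncorrected device.
bias = sum(errors) / len(errors)
mae = sum(abs(e) for e in errors) / len(errors)

print(errors)               # [-1.0, 2.0, 0.0, 1.0, 1.0]
print(round(bias, 2), mae)  # 0.6 1.0
```

A correction model would be expected to beat this uncorrected baseline, ideally within each demographic subgroup and not only on average.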

## 3. The Dataset

## [MIT Critical Datathon 2023: a MIMIC-IV Derived Dataset for Pulse Oximetry Correction Models](https://physionet.org/content/mit-critical-datathon-2023/1.0.0/)




This dataset supports the building of Pulse Oximetry Correction Models. Derived from MIMIC-IV v2.2, it includes 14,404 distinct patients admitted to the Intensive Care Unit (ICU), across 15,923 ICU stays. Paired $SpO_2$ and $SaO_2$ measurements are aligned with patient demographics, physiological data, and treatment information. There are 81,797 $SpO_2$-$SaO_2$ pairs in total, captured within 90 minutes, where each variable has a time delta relative to the $SaO_2$ timestamp.
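Because each variable carries a time delta, you may want to restrict an analysis to pairs measured close together in time. The sketch below does this for the `delta_SpO2` column, using values copied from the sample rows in the Data Load section. It assumes the deltas are expressed in minutes, and the 30-minute window is a hypothetical choice; confirm both against `mimic_pulseOx_dictionary.csv`.

```python
import pandas as pd

# Values copied from the five sample rows shown in the Data Load section.
pairs = pd.DataFrame({
    "SaO2": [90.0, 92.0, 99.0, 96.0, 97.0],
    "SpO2": [89, 94, 99, 97, 98],
    "delta_SpO2": [-4, -28, 0, -42, -32],
})

# ASSUMPTION: delta_SpO2 is in minutes; 30 is an arbitrary illustrative window.
max_delta = 30

# Use the absolute value so the filter does not depend on the sign convention.
close_pairs = pairs[pairs["delta_SpO2"].abs() <= max_delta]

print(len(close_pairs))  # 3
```

A tighter window yields fewer but better-aligned pairs, so the threshold is a trade-off worth reporting alongside your results.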

More information is available on the [PhysioNet Project Page](https://physionet.org) (not published yet!)

*Johnson, A.E.W., Bulgarelli, L., Shen, L. et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci Data 10, 1 (2023). https://doi.org/10.1038/s41597-022-01899-x*

## 4. Data Access

### Prerequisites

The MIMIC datasets are access-controlled by [PhysioNet](https://physionet.org/).

Follow the instructions at the bottom of the PhysioNet dataset page to get access to the data. Overall, you must:
- Be a credentialed PhysioNet user,
- Complete the appropriate institutional research training (CITI training in human research subject protection and HIPAA regulations) and have it verified by PhysioNet,
- Sign the Data Use Agreement.


**Useful Materials**
- [MIMIC - Get Started Tutorial](https://mimic.mit.edu/docs/gettingstarted/)
- [CITI Course Instructions](https://physionet.org/about/citi-course/#:~:text=In%20order%20to%20become%20a,subject%20protections%20and%20HIPAA%20regulations.)

### Access Data

Please download the zip file of the dataset from [MIT Critical Datathon 2023: a MIMIC-IV Derived Dataset for Pulse Oximetry Correction Models](https://physionet.org/content/mit-critical-datathon-2023/1.0.0/).

## 5. Data Load

### Understand the variables:

The first step is to understand which variables your dataset contains and how they are distributed.

You can find the definition of each variable in `mimic_pulseOx_dictionary.csv`.
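Once the data is loaded, one possible way to get an overview of the columns and their data types is a small summary table. The helper `summarize_columns` below is hypothetical (it is not part of the workshop materials), and the tiny `sample` frame only reuses a few values from the sample rows in this document so that the example runs on its own.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype, missing count, and unique count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "n_unique": df.nunique(),
    })


# Small stand-in for the real dataset, built from the sample rows in this document.
sample = pd.DataFrame({
    "SaO2": [90.0, 92.0, 99.0, 96.0, 97.0],
    "SpO2": [89, 94, 99, 97, 98],
    "gender": ["F", "F", "F", "F", "F"],
})

summary = summarize_columns(sample)
print(summary)
```

On the full dataset you would call `summarize_columns(data)` and read the result side by side with the data dictionary.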

### Try to load data:

Load dataset from `mimic_pulseOx_data.csv`.

`You can load data through 2 options:`

### Option 1: Google Drive
(if you are using Google Colab)

`Mount Google Drive to access your files:`

In [1]:
# from google.colab import drive
# drive.mount('/content/drive')

In [2]:
# # check current directory
# %pwd

# # move to directory of ongoing project
# %cd /content/drive/MyDrive/"Colab Notebooks"/"Data_Science_For_Digital_Health_2025 - Datathon"/code/Python

`After placing your CSV in your Google Drive, run:`

In [3]:
# import pandas as pd

# # Code here !

# # Read the csv file
# csv_file_path = "../../data/mimic_pulseOx_data.csv"
# data = pd.read_csv(csv_file_path)

`Did it work? Inspect the Data:`

In [4]:
# # Inspect first 5 rows of data
# data.head(5)

### Option 2: In your Local Machine
(if you are not using Google Colab)

`Run`

In [5]:
import pandas as pd

# Code here !

# Read the csv file
csv_file_path = "../../data/mimic_pulseOx_data.csv"
data = pd.read_csv(csv_file_path)

In [6]:
# Inspect first 5 rows of data
data.head(5)

| Unnamed: 0 | subject_id | stay_id | SaO2_timestamp | SaO2 | delta_SpO2 | SpO2 | hidden_hypoxemia | hadm_id | gender | sex_female | ... | delta_mbp | mbp | delta_resp_rate | resp_rate | delta_temperature | temperature | delta_glucose | glucose | delta_heart_rhythm | heart_rhythm |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 10001884 | 37510196 | 2131-01-12 21:04:00 | 90.0 | -4 | 89 | 0 | 26184834 | F | 1 | ... | -3 | 96.0 | -4.0 | 19.5 | -64.0 | 36.72 | 445.0 | 199.0 | -4.0 | SR (Sinus Rhythm) |
| 1 | 10001884 | 37510196 | 2131-01-13 02:28:00 | 92.0 | -28 | 94 | 0 | 26184834 | F | 1 | ... | -27 | 98.0 | -28.0 | 22.0 | 92.0 | 36.56 | 121.0 | 199.0 | -28.0 | SR (Sinus Rhythm) |
| 2 | 10002013 | 39060235 | 2160-05-18 16:03:00 | 99.0 | 0 | 99 | 0 | 23581541 | F | 1 | ... | -3 | 86.5 | -2.0 | 14.0 | -3.0 | 36.9 | 0.0 | 155.0 | -3.0 | SR (Sinus Rhythm) |
| 3 | 10002013 | 39060235 | 2160-05-18 17:42:00 | 96.0 | -42 | 97 | 0 | 23581541 | F | 1 | ... | 18 | 73.0 | 18.0 | 23.0 | 18.0 | 36.7 | 0.0 | 149.0 | 18.0 | ST (Sinus Tachycardia) |
| 4 | 10002013 | 39060235 | 2160-05-18 21:32:00 | 97.0 | -32 | 98 | 0 | 23581541 | F | 1 | ... | 28 | 86.0 | 28.0 | 18.0 | 28.0 | 37.5 | 0.0 | 141.0 | 28.0 | SR (Sinus Rhythm) |
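The `Unnamed: 0` column in the preview usually means that a row index was saved into the CSV, and `SaO2_timestamp` is read as plain text by default. The sketch below shows one possible way to handle both at load time and to count pairs, patients, and stays. It assumes the first CSV column is indeed a saved index, and it reads a short in-memory excerpt (a subset of columns copied from the preview) instead of the real file so that it runs on its own.

```python
import io

import pandas as pd

# In-memory excerpt with a subset of the columns from the preview above.
# ASSUMPTION: the first, unnamed column of the real file is a saved row index.
csv_excerpt = io.StringIO(
    ",subject_id,stay_id,SaO2_timestamp,SaO2,SpO2,hidden_hypoxemia\n"
    "0,10001884,37510196,2131-01-12 21:04:00,90.0,89,0\n"
    "1,10001884,37510196,2131-01-13 02:28:00,92.0,94,0\n"
    "2,10002013,39060235,2160-05-18 16:03:00,99.0,99,0\n"
    "3,10002013,39060235,2160-05-18 17:42:00,96.0,97,0\n"
    "4,10002013,39060235,2160-05-18 21:32:00,97.0,98,0\n"
)

# index_col=0 reuses the saved index; parse_dates converts the timestamp column.
df = pd.read_csv(csv_excerpt, index_col=0, parse_dates=["SaO2_timestamp"])

counts = {
    "pairs": len(df),
    "patients": df["subject_id"].nunique(),
    "stays": df["stay_id"].nunique(),
}
print(counts)  # {'pairs': 5, 'patients': 2, 'stays': 2}
```

With the real file, replace `csv_excerpt` with `csv_file_path`; the counts can then be compared with the figures quoted in the dataset description (81,797 pairs, 14,404 patients, 15,923 ICU stays).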
