# Deep Learning for Healthcare Team 55 Project
### Zeeshan Haidry, Hamza Mahmood, Nithin Nathan

Team 55 GitHub Repo: https://github.com/zeeshanhaidry/cs598dlh-team55

Video: https://mediaspace.illinois.edu/media/t/1_x24myjzn

Project based on:
Fayyaz H, Strang A, Beheshti R. Bringing At-home Pediatric Sleep Apnea Testing Closer to Reality: A Multi-modal Transformer Approach. Proc Mach Learn Res. 2023 Aug;219:167-185. PMID: 38344396; PMCID: PMC10854997.

Original GitHub: https://github.com/healthylaife/Pediatric-Apnea-Detection

In [None]:
# instead of drive, we will be using uofi box

# from google.colab import drive
# drive.mount('/content/drive')

In [None]:

# download dependencies

!pip install biosppy
!pip install boxsdk
!pip install mne==1.0
!pip install tensorflow
!pip install tensorflow-addons
!pip install gdown

Collecting biosppy
  Downloading biosppy-2.2.1-py2.py3-none-any.whl (149 kB)
Collecting shortuuid (from biosppy)
  Downloading shortuuid-1.0.13-py3-none-any.whl (10 kB)
Collecting mock (from biosppy)
  Downloading mock-5.1.0-py3-none-any.whl (30 kB)
Installing collected packages: shortuuid, mock, biosppy
Successfully installed biosppy-2.2.1 mock-5.1.0 shortuuid-1.0.13
Collecting boxsdk
  Downloading boxsdk-3.9.2-py2.py3-none-any.whl (139 kB)
Collecting requests-toolbelt>=0.4.0 (from boxsdk)
  Downloading requests_toolbelt-1.0.0-py2.py3-none-any.whl (54 kB)
Installing collected packages: requests-toolbelt, boxsdk
Successfully ins

In [None]:
# authenticate to connect to uofi box

# Data was uploaded to Box from the sleepdata website. (Link provided in the data section)
# Commented out so the rest of the project can run, since the secrets might not be set up.

from google.colab import userdata
from boxsdk import Client, OAuth2, CCGAuth
from boxsdk.object import file, folder
from pprint import pformat
import json

# CLIENT_ID = userdata.get('clientid2')
# CLIENT_SECRET = userdata.get('clientsecret2')
# ACCESS_TOKEN = userdata.get('token2')

# oauth2 = OAuth2(CLIENT_ID, CLIENT_SECRET, access_token=ACCESS_TOKEN)
# client = Client(oauth2)

# Introduction
This is an introduction to your report; you should edit this text/markdown section to compose it. In this text/markdown, you should introduce:

*   Background of the problem
  * what type of problem: disease/readmission/mortality prediction, feature engineering, data processing, etc.
  * what is the importance/meaning of solving the problem
  * what is the difficulty of the problem
  * the state-of-the-art methods and their effectiveness.
*   Paper explanation
  * what did the paper propose
  * what are the innovations of the method
  * how well the proposed method works (in its own metrics)
  * what is the contribution to the research regime (referring to the Background above, how important the paper is to the problem).

---

Obstructive sleep apnea hypopnea syndrome (OSAHS) is a disorder in which breathing is obstructed during sleep (Loughlin et al., 1996). Sleep apnea affects 1%-5% of children in the United States and can lead to other health problems if left untreated (Loughlin et al., 1996; Marcus et al., 2012). Currently, at-home diagnostic tools for sleep apnea are only available for adults, which leaves a gap for models designed for children (Fayyaz et al., 2023). The current state-of-the-art sleep apnea detection models created for adults (CNN (Chang et al., 2020), SE-MSCNN (Chen et al., 2022), CNN+LSTM (Zarei et al., 2022), Hybrid Transformer (Hu et al., 2022)) are effective but cannot be applied directly to children because sleep data differs between the two populations, and OSAHS symptoms in children require more attention (Choi et al., 2010; Gipson et al., 2019).

Polysomnography is commonly used to diagnose OSAHS. It records various signals during sleep, such as brain activity (EEG), eye movement (EOG), heart rhythm (ECG), blood oxygen saturation (SpO2), end-tidal CO2 (EtCO2), and airflow. Although polysomnography is the best method to diagnose OSAHS, it is complex, costly, intrusive, and requires clinician involvement (Spielmanns et al., 2019). These issues make it difficult for children and their families to use polysomnography to detect OSAHS at home. To address them, Fayyaz et al. propose a transformer-based model to help detect OSAHS in children. They also compare using all available polysomnography modalities against using only a subset. This comparison matters because a subset of modalities may be significantly easier to collect at home, so showing that a subset performs as well as the full set makes at-home detection more feasible. In terms of F1 score and AUROC, the proposed transformer-based model outperforms the current state-of-the-art sleep apnea detection models, and adding demographic data to the input modalities improves its performance even further.



# Scope of Reproducibility:

List hypotheses from the paper you will test and the corresponding experiments you will run.
---

The present study is based on the primary hypothesis that adult-level performance can be achieved in detecting OSAHS in children. Specifically, we hypothesize that a custom transformer-based neural network, given preprocessed ECG and SpO2 signals as input, can effectively classify apnea-hypopnea events in children.

# Methodology

This methodology is the core of your project. It consists of runnable code with the necessary annotations to show the experiment you executed to test the hypotheses.

The methodology contains three subsections for our experiment: **environment**, **data**, and **model**.

# Environment

Our experiment used several libraries and packages. **Python 3** (specifically Python 3.10) was the programming language; the other libraries and packages are described below:

---

- boxsdk.Client
- boxsdk.OAuth2
- boxsdk.CCGAuth
- boxsdk.object.file
- boxsdk.object.folder

**The boxsdk package was used to make calls to the Box API. It was helpful for downloading files for pre-processing and re-uploading processed data for later use with the models.**

---

- csv
- xml.etree.ElementTree

**These two packages were used to convert XML files to TSV files before they could be preprocessed.**

---

- gdown

**The gdown package was used to download the pre-trained model and the loaded data for evaluation purposes.**

---

- glob

**The glob package was used to iterate through root path and search for all .edf files.**

---

- keras
- keras.Model
- keras.callbacks.LearningRateScheduler
- keras.callbacks.EarlyStopping
- keras.activations.sigmoid
- keras.activations.relu
- keras.layers.Dense
- keras.layers.Input
- keras.layers.Conv1D
- keras.layers.SeparableConvolution1D
- keras.layers.concatenate
- keras.layers.Layer
- keras.layers.MultiHeadAttention
- keras.layers.Add
- keras.layers.LayerNormalization
- keras.layers.Dropout
- keras.layers.GlobalAveragePooling1D
- keras.regularizers.L2
- keras.losses.BinaryCrossentropy

**Keras is an open-source library that provides functionality to build and train deep learning models. The above packages were all used in the building and training of our transformer model.**

---

- mne == 1.0

**The mne package is used to work with several types of neurophysiological data. In our project, it was used to load study and annotation data during the preprocessing phase. We pinned a specific version to match what was used in the original paper.**

---

- numpy

**The numpy package was used to load and save data as .npz files to be fed into the transformer model.**

---

- os

**The os package was used for several miscellaneous operations, such as making new directories, accessing different paths, and renaming directories.**

---

- PediatricApneaDetection.data.chat.preprocessing
- PediatricApneaDetection.data.chat.dataloader
- PediatricApneaDetection.metrics.Result

**These modules were borrowed from the original paper's repository. We modified them to run with our code because the originals contain hardcoded values.**

---

- tensorflow
- tensorflow_addons (installed as tensorflow-addons)
- tensorflow.python.client.device_lib

**TensorFlow is an open-source library for numerical computation that is widely used to build machine learning models. The above packages were used throughout the building, training, and testing of our transformer model.**


In [None]:
# we will be re-using the authors' code for data pre-processing and loading; we will also extract some code for the model
!git clone https://github.com/healthylaife/Pediatric-Apnea-Detection.git

Cloning into 'Pediatric-Apnea-Detection'...
remote: Enumerating objects: 75, done.
remote: Counting objects: 100% (75/75), done.
remote: Compressing objects: 100% (73/73), done.
remote: Total 75 (delta 33), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (75/75), 31.40 KiB | 1.01 MiB/s, done.
Resolving deltas: 100% (33/33), done.


In [None]:
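# rename cloned github folder for easier access to its functions
# guard the rename: os.rename raises OSError if the destination already exists and is not empty
# (for example, when this cell is re-run after a fresh clone)

import os
source = 'Pediatric-Apnea-Detection/'
dest = 'PediatricApneaDetection/'
if os.path.isdir(source) and not os.path.isdir(dest):
  os.rename(source, dest)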

OSError: [Errno 39] Directory not empty: 'Pediatric-Apnea-Detection/' -> 'PediatricApneaDetection/'

In [None]:
# import packages needed for data preprocessing, loading, and model training/testing

import numpy as np
from google.colab import drive
import os

import tensorflow as tf
import tensorflow_addons as tfa

import keras
from keras import Model
from keras.callbacks import LearningRateScheduler, EarlyStopping
from keras.activations import sigmoid, relu
from keras.layers import Dense, Input, Conv1D, SeparableConvolution1D, concatenate, Layer, MultiHeadAttention, Add, LayerNormalization, Dropout, GlobalAveragePooling1D
from keras.regularizers import L2
from keras.losses import BinaryCrossentropy

from sklearn.utils import shuffle

In [None]:
#check if gpu available.
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 10660083710439385681
xla_global_id: -1
]


##  Data
Data includes raw data (MIMIC III tables), descriptive statistics (our homework questions), and data processing (feature engineering).
  * Source of the data: where the data is collected from; if data is synthetic or self-generated, explain how. If possible, please provide a link to the raw datasets.
  * Statistics: include basic descriptive statistics of the dataset like size, cross validation split, label distribution, etc.
  * Data process: how do you manipulate the data, e.g., change the class labels, split the dataset to train/valid/test, refining the dataset.
  * Illustration: printing results, plotting figures for illustration.
  * You can upload your raw dataset to Google Drive and mount this Colab to the same directory. If your raw dataset is too large, you can upload the processed dataset and have a code to load the processed dataset.

---

Data Download Instructions:
The data is collected from the National Sleep Research Resource, an NHLBI-supported repository responsible for sharing large amounts of sleep data from various cohorts, clinical trials, and other data sources to advance sleep and circadian science. The two datasets we used can be found at these links: https://sleepdata.org/datasets/chat and https://sleepdata.org/datasets/nchsdb. Please note that we each individually needed to complete a survey intake form and get it approved by the organization in order to get access to this data.  

The original authors used two datasets: one from the Childhood Adenotonsillectomy Trial (CHAT) and the other from the NCH Sleep Data Bank (NCHSDB).
The CHAT dataset is roughly 969 GB in size and was collected from 1,243 subjects ages 5-9 over a period of 5 years (2007-2012). The NCHSDB dataset is roughly 2.07 TB in size and was collected from 3,673 subjects ages 0-58 over a period of 2 years (2017-2019).
Both datasets were pre-processed to include only the necessary attributes before being loaded into the model for training and testing.

Because Colab only provides about 70 GB of disk space, we limited our report to the CHAT dataset and compare only those results to the original paper's.

The raw data has been uploaded to a Box account for storage since the University of Illinois provides us with unlimited Box storage. Because of the limited space in Colab, we repeatedly downloaded a subset of the data, pre-processed it, uploaded the processed data, and cleared the workspace. This was beneficial because the processed dataset was about 10% the size of the raw data. To maximize how much data was available for training and testing, we prepared approximately 73 GB of processed data (~200 files, ~700 GB of raw data).

When it came to training the model, we realized the free version of Colab only offers 12 GB of RAM, which is not enough for our data. We upgraded to the premium version of Colab, which provides ~50 GB of RAM. To make sure our data could be loaded into memory, we used only 44 GB of processed data when training and testing our models. As shown below, this was the right decision: training the 6-signal model used 41 GB of the 51 GB of available RAM. Using more data would crash the environment.

<div>
<img src="https://drive.google.com/uc?export=view&id=1mFozLUT2fKusICCougiB1knnS6FMT5Ud"/>
</div>
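As a rough way to reason about the memory budget described above, the size of the loaded array can be estimated from its shape. This is a minimal sketch, not part of the original pipeline: the function name `estimate_array_gib` and the epoch count in the usage lines are hypothetical, while the per-epoch shape (30 * 128 samples, 15 signals plus 2 ECG-derived channels) and the float64 default of `np.zeros` follow the `load_data` code later in this notebook.

```python
import numpy as np

def estimate_array_gib(n_epochs, samples_per_epoch=30 * 128, n_channels=17, dtype=np.float64):
    """Return the in-memory size, in GiB, of an (n_epochs, samples, channels) array."""
    bytes_total = n_epochs * samples_per_epoch * n_channels * np.dtype(dtype).itemsize
    return bytes_total / 2**30

# Hypothetical epoch count, for illustration only
print(f"{estimate_array_gib(50_000):.1f} GiB as float64")
print(f"{estimate_array_gib(50_000, dtype=np.float32):.1f} GiB as float32")
```

Concatenating arrays with `np.concatenate` temporarily holds both the old and the new copy, so peak usage during loading can be noticeably higher than the final array size.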

Two file types are used for the CHAT data construction, a \*.edf file containing time-series for multiple signals and a \*-nsrr.xml that contains annotations of the dataset. These annotations described events that happened during the study such as obstructive apnea, central apnea, hypopnea, SpO2 desaturation, EtCO2 artifact, limb movements, etc. These events include event type/concept (description), start time (onset), and duration.

For this model, we only consider apnea (Obstructive and Central grouped together) and hypopnea events.
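The event selection described above can be sketched as a simple filter over annotation rows. This is an illustrative sketch, not the authors' preprocessing code: the function `split_events`, the event-name sets, and the sample rows are assumptions, and the column names (`onset`, `duration`, `description`) follow the TSV layout produced by the conversion step later in this notebook.

```python
import csv
import io

# Assumed event names, as they would appear after the "|" split in the XML-to-TSV conversion
APNEA_EVENTS = {"Obstructive apnea", "Central apnea"}
HYPOPNEA_EVENTS = {"Hypopnea"}

def split_events(tsv_text):
    """Return (apnea, hypopnea) lists of (onset_s, duration_s) tuples."""
    apnea, hypopnea = [], []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        interval = (float(row["onset"]), float(row["duration"]))
        if row["description"] in APNEA_EVENTS:
            apnea.append(interval)
        elif row["description"] in HYPOPNEA_EVENTS:
            hypopnea.append(interval)
    return apnea, hypopnea

# Made-up annotation rows, for illustration only
sample = (
    "onset\tduration\tdescription\n"
    "120.0\t15.5\tObstructive apnea\n"
    "300.0\t12.0\tHypopnea\n"
    "410.0\t8.0\tSpO2 desaturation\n"
    "900.0\t10.0\tCentral apnea\n"
)
apnea, hypopnea = split_events(sample)
print(apnea)     # [(120.0, 15.5), (900.0, 10.0)]
print(hypopnea)  # [(300.0, 12.0)]
```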

To run this notebook, the CHAT data from the sleep data site needs to be uploaded to the /chatdata/ path. Only the \*.edf and \*-nsrr.xml files are needed, from the chat/polysomnography/edfs and chat/polysomnography/annotations-events-nsrr subdirectories in https://sleepdata.org/datasets/chat/ .
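Before preprocessing, it can help to confirm that every \*.edf file has its matching \*-nsrr.xml file in the same folder. The sketch below is one way to check this; the helper `find_study_pairs` and the file names in the demonstration are hypothetical, and the naming rule (replace `.edf` with `-nsrr.xml`) mirrors the one used by the preprocessing loop in this notebook.

```python
import tempfile
from pathlib import Path

def find_study_pairs(data_dir):
    """Return (paired, missing): .edf files with and without a matching -nsrr.xml file."""
    paired, missing = [], []
    for edf in sorted(Path(data_dir).glob("*.edf")):
        annot = edf.with_name(edf.stem + "-nsrr.xml")
        if annot.exists():
            paired.append((edf.name, annot.name))
        else:
            missing.append(edf.name)
    return paired, missing

# Illustration on a throwaway directory with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    for name in ["study-a.edf", "study-a-nsrr.xml", "study-b.edf"]:
        (Path(tmp) / name).touch()
    paired, missing = find_study_pairs(tmp)

print(paired)   # [('study-a.edf', 'study-a-nsrr.xml')]
print(missing)  # ['study-b.edf']
```

In this notebook the call would be `find_study_pairs('/chatdata/')`, assuming the files were uploaded as described.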

In [None]:
# Commented to allow the rest of the notebook to run. (No data processing/loading/training for submission)
# We have the data loaded and saved, so we can avoid using compute time to download processed data and create the datasets.
# Commented below. Instead we will just download "chatloader" to get the usable dataset for model training.

def downloadFromBox(filepath, file_content):
  with open(filepath, "wb") as binary_file:
    binary_file.write(file_content)

fields = [
    'type',
    'id',
    'name',
]

#First Download processed data from box. Store filenames in list so we don't reupload same file.
chat_out = '/content/chatprocessed/'
os.makedirs(os.path.dirname(chat_out), exist_ok=True)
# folder_chatprocessed = client.folder(folder_id='261105873173').get_items(fields=fields)
#Only download 119 processed files because that is all that fits in Colab and enough room for data loading + model.
# i = 0
# for item in folder_chatprocessed:
#   if(i>119):
#     break
#   print(f'download "{item.name}"')
#   file_content = client.file(item.id).content()
#   downloadFromBox(chat_out + item.name, file_content)
#   i+=1

# processed_data = os.listdir(chat_out)

#get files already pre-processed so we don't do it again
# processed_data = []
# for item in folder_chatprocessed:
#   print(f' "{item.name}"')
#   processed_data.append(item.name)
# print(processed_data)
# print("total processed: " + str(len(processed_data)))

#Second, download raw + annot data from Box that is not already downloaded, not in the badFiles list, and not already processed (so we don't have to re-process data)
chat_data = '/chatdata/'
os.makedirs(os.path.dirname(chat_data), exist_ok=True)
curr_downloaded = os.listdir(chat_data)

#empirically known bad files (such as missing signals)
badFiles = ["chat-baseline-300567.edf","chat-baseline-300554.edf","chat-baseline-300452.edf","chat-baseline-300013.edf","chat-baseline-300108.edf","chat-baseline-300051.edf","chat-baseline-300078.edf","chat-baseline-300037.edf","chat-baseline-300195.edf","chat-baseline-300206.edf","chat-baseline-300310.edf","chat-baseline-300203.edf","chat-baseline-300277.edf","chat-baseline-300260.edf","chat-baseline-300189.edf","chat-baseline-300379.edf"]

#Do not need any more raw data - all data we can fit in instance is already processed.
#Commented out below.

# folder_raw = client.folder(folder_id='257515840362').get_items(fields=fields)
# folder_annot = client.folder(folder_id='257513450272').get_items(fields=fields)
filenames = curr_downloaded

# i = 0
# for item in folder_raw:
#   if i>=60:
#     break
#   if not any(processed.startswith(item.name.split('.')[0]) for processed in processed_data) and item.name not in curr_downloaded and item.name not in badFiles:
#     print(f'download "{item.name}"')
#     file_content = client.file(item.id).content()
#     downloadFromBox(chat_data + item.name, file_content)
#     filenames.append(item.name)
#     i = i + 1
# print(filenames)
# for item in folder_annot:
#   if (item.name.split('-nsrr')[0] + '.edf') in filenames and item.name not in curr_downloaded:
#     print(f'download "{item.name}"')
#     file_content = client.file(item.id).content()
#     downloadFromBox(chat_data + item.name, file_content)

### Data Descriptions

Below is a screenshot showing the data columns from one of the CHAT dataset's -nsrr.xml annotation files. As you can see, a signal value is recorded for a given start time and duration.


<div>
<img src="https://drive.google.com/uc?export=view&id=1bjYVAm0lduaG1F227PZHmPt0uH3x9HNZ" width="400" height="350"/>
</div>


Additionally, the code below prints a snippet of what the raw .edf file data looks like.




In [None]:
# commenting out so it doesnt throw error "file does not exist"
# --------------------
# import mne
# file = "/chatdata/chat-baseline-300008.edf"
# data = mne.io.read_raw_edf(file)
# raw_data = data.get_data()
# # you can get the metadata included in the file and a list of all channels:
# info = data.info
# channels = data.ch_names

Extracting EDF parameters from /chatdata/chat-baseline-300008.edf...
EDF file detected
Setting channel info structure...
Creating raw.info structure...


#### Raw data signals

In [None]:
# raw_data # this is raw signal data

array([[-4.50505837e-05, -4.86258624e-05, -4.24165772e-05, ...,
        -5.09971622e-04, -5.09976474e-04, -5.09981432e-04],
       [-5.30242466e-05, -6.03679164e-05, -6.27936028e-05, ...,
        -5.11971445e-04, -5.11976290e-04, -5.11981293e-04],
       [-4.56335088e-05, -5.31779551e-05, -5.48853706e-05, ...,
        -5.11971510e-04, -5.11976382e-04, -5.11981359e-04],
       ...,
       [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00, ...,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
       [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00, ...,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
       [ 4.00000000e+01,  4.00000000e+01,  4.00000000e+01, ...,
         4.00000000e+01,  4.00000000e+01,  4.00000000e+01]])

#### Raw data shape: 39 channels, each with 33256448 values (for this sample file)

In [None]:
# raw_data.shape

(39, 33256448)

In [None]:
# info # metadata info

| Field | Value |
|---|---|
| Measurement date | January 01, 1985 21:03:39 GMT |
| Experimenter | Unknown |
| Digitized points | 0 points |
| Good channels | 39 EEG |
| Bad channels | |
| EOG channels | Not available |
| ECG channels | Not available |
| Sampling frequency | 1024.00 Hz |
| Highpass | 0.00 Hz |
| Lowpass | 512.00 Hz |
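The shape and sampling frequency above give the recording length directly. The short calculation below is only a worked example for this one sample file: it uses the 30-second epoch length and 128 Hz target frequency described in the pre-processing section, and it counts full epochs only, which is an assumption (the authors' code may treat the trailing partial epoch differently).

```python
n_samples = 33_256_448   # samples per channel, from raw_data.shape
sfreq = 1024             # Hz, sampling frequency from the file metadata
EPOCH_LENGTH = 30        # seconds per epoch
FREQ = 128               # Hz after resampling

duration_s = n_samples / sfreq                  # recording length in seconds
full_epochs = int(duration_s // EPOCH_LENGTH)   # whole 30-second windows that fit
samples_per_epoch = EPOCH_LENGTH * FREQ         # samples per epoch after resampling

print(duration_s)         # 32477.0
print(full_epochs)        # 1082
print(samples_per_epoch)  # 3840
```

A duration of 32477 seconds is about 9 hours, consistent with an overnight sleep study.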


### All "channels" in the given raw data file

In [None]:
# print(channels)

['Cchin', 'F3', 'F4', 'C3', 'C4', 'O1', 'O2', 'T3', 'T4', 'M1', 'M2', 'E1', 'E2', 'ECG1', 'ECG2', 'ECG3', 'Lleg1', 'Lleg2', 'Rleg1', 'Rleg2', 'LChin', 'RChin', 'Airflow', 'ABD', 'Chest', 'Snore', 'Sum', 'Position', 'OxSTAT', 'Pulse', 'SAO2', 'CannulaFlow', 'Cap', 'C-Pres', 'EtCO2', 'Pleth', 'Light', 'ManPos', 'DHR']


## Annotation file conversion + Pre-processing
The paper's provided code requires the annotation file to be in TSV format.
Since the sleep data site only provides it as XML, we had to convert it and rename the columns as shown below.

The Pre-processing code does the following:


*   Loads the study from the raw \*.edf file and the \*-nsrr.tsv annotations file using the mne library. The annotations file is read in as a dataframe (pandas read_csv).
*   Checks whether the required channels are available in the study. If not, the study is discarded.
*   Finds the event ids of apnea and hypopnea events in the annotations.
*   Selects specific channels from the raw file.
*   Divides the signals into equal-length epochs (the authors chose an **EPOCH_LENGTH of 30** seconds).
*   Resamples the epochs to a **frequency of 128** Hz.
*   For each epoch, finds the overlap in seconds with the apnea events and with the hypopnea events, and appends it to the labels_apnea and labels_hypopnea arrays. Essentially, these labels contain the seconds of apnea and hypopnea, respectively, in each epoch.
*   Saves the numpy arrays containing data, labels_apnea, and labels_hypopnea.
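The per-epoch labeling step in the list above can be sketched as an interval-overlap computation. This is an illustration under stated assumptions, not the authors' `preprocessing.preprocess` implementation: the function `label_epochs` and the sample events are hypothetical, and overlapping events would be counted more than once in this simple version.

```python
import numpy as np

def label_epochs(events, n_epochs, epoch_length=30.0):
    """Return, for each epoch, the number of seconds covered by the given events.

    events: iterable of (onset_s, duration_s) tuples.
    """
    labels = np.zeros(n_epochs)
    for onset, duration in events:
        end = onset + duration
        # Range of epochs that the event can touch, clipped to the recording
        first = max(int(onset // epoch_length), 0)
        last = min(int(end // epoch_length), n_epochs - 1)
        for k in range(first, last + 1):
            epoch_start = k * epoch_length
            overlap = min(end, epoch_start + epoch_length) - max(onset, epoch_start)
            labels[k] += max(overlap, 0.0)
    return labels

# Made-up events: one spans the boundary between epochs 0 and 1, one sits inside epoch 2
labels = label_epochs([(25.0, 10.0), (70.0, 5.0)], n_epochs=4)
print(labels)  # [5. 5. 5. 0.]
```

An epoch with a label greater than zero contains at least part of an event, which is how the positive and negative samples are separated in the data loading step.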




In [None]:
import pandas as pd
import io
import xml.etree.ElementTree as ET
import csv

def convert_xml_to_tsv(xml_file):
  """Convert an NSRR annotation XML file to a TSV file with onset, duration, and description columns."""

  # parse the xml file
  tree = ET.parse(xml_file)
  root = tree.getroot()

  # relevant fields from the xml: Start (onset), Duration (duration), EventConcept (description)
  fields = ['onset', 'duration', 'description']

  # name the tsv file after the xml file
  tsv_file_name = xml_file.replace(".xml", ".tsv")

  # open the output in a context manager so the file is flushed and closed
  with open(tsv_file_name, 'w', newline='') as tsv_file:
    csv_writer = csv.writer(tsv_file, delimiter='\t')

    # write the header row
    csv_writer.writerow(fields)

    # iterate over the xml elements and extract the data we want
    for element in root:

      if element.tag == "ScoredEvents":

        # this is all events
        for event in element:

          # reset per event so values never carry over from the previous event
          onset = duration = description = None

          # this is single event
          for attr in event:

            if attr.tag == "Start":
              onset = attr.text

            if attr.tag == "Duration":
              duration = attr.text

            if attr.tag == "EventConcept":
              description = attr.text.split("|")[0]

          # skip events that are missing any of the three fields
          if None in (onset, duration, description):
            continue

          # write row to tsv file
          csv_writer.writerow([onset, duration, description])

In [None]:
#Commented to allow rest of notebook to run. (No Data processing/loading/training for Submission)
#No more files need to be processed: commenting this out.
# import glob
# import mne
# from PediatricApneaDetection.data.chat import preprocessing

# root = "/chatdata/"
# OUT_FOLDER = ''

# for edf_file in glob.glob(root + "*.edf"):
#     filename = edf_file.replace('/chatdata/','')
#     if not any(processed.startswith(filename.split('.')[0]) for processed in processed_data):
#         print("preprocessing " + edf_file)

#         annot_file = edf_file.replace(".edf", "-nsrr.xml")
#         convert_xml_to_tsv(annot_file)
#         annot_file_tsv = edf_file.replace(".edf", "-nsrr.tsv")

#         # preprocess data
#         shape = preprocessing.preprocess((edf_file, annot_file_tsv), preprocessing.identity, OUT_FOLDER)

#         print(f"final preprocessing shape: {shape}")

preprocessing 10 files took ~15min

preprocessing 30 files took ~37 min

preprocessing 50 files took ~57min

preprocessing 60 files took 1hr+

In [None]:
# Commented to allow rest of notebook to run. (No Data processing/loading/training for Submission)
# moves npz files to processed folder

# root = '/content/'
# chat_out = '/content/chatprocessed/'
# os.makedirs(os.path.dirname(chat_out), exist_ok=True)

# for npz_file in glob.glob(root + "*.npz"):

#   print(npz_file)

#   dest = chat_out + npz_file.replace("/content/","").replace("\\","")

#   print(dest)

#   os.rename(npz_file, dest)


/content/\chat-baseline-300575.edf_221_164.npz
/content/chatprocessed/chat-baseline-300575.edf_221_164.npz
/content/\chat-baseline-300555.edf_1125_1336.npz
/content/chatprocessed/chat-baseline-300555.edf_1125_1336.npz
/content/\chat-baseline-300585.edf_320_899.npz
/content/chatprocessed/chat-baseline-300585.edf_320_899.npz
/content/\chat-baseline-300571.edf_500_885.npz
/content/chatprocessed/chat-baseline-300571.edf_500_885.npz
/content/\chat-baseline-300496.edf_931_791.npz
/content/chatprocessed/chat-baseline-300496.edf_931_791.npz
/content/\chat-baseline-300594.edf_74_30.npz
/content/chatprocessed/chat-baseline-300594.edf_74_30.npz
/content/\chat-baseline-300563.edf_31_184.npz
/content/chatprocessed/chat-baseline-300563.edf_31_184.npz
/content/\chat-baseline-300566.edf_91_146.npz
/content/chatprocessed/chat-baseline-300566.edf_91_146.npz
/content/\chat-baseline-300579.edf_134_87.npz
/content/chatprocessed/chat-baseline-300579.edf_134_87.npz
/content/\chat-baseline-300573.edf_18_178.n

In [None]:
#upload pre-processed to box for reusability.

# for npz_file in glob.glob(chat_out + "*.npz"):
#   #if not already uploaded
#   filename = npz_file.replace('/content/chatprocessed/','')
#   print(filename.split('.')[0])
#   if not any(processed.startswith(filename.split('.')[0]) for processed in processed_data):
#     client.folder(folder_id='261105873173').upload(npz_file)
#     processed_data.append(filename)
#     print("uploaded " + npz_file)

chat-baseline-300573
uploaded /content/chatprocessed/chat-baseline-300573.edf_18_178.npz
chat-baseline-300563
uploaded /content/chatprocessed/chat-baseline-300563.edf_31_184.npz
chat-baseline-300575
uploaded /content/chatprocessed/chat-baseline-300575.edf_221_164.npz
chat-baseline-300555
uploaded /content/chatprocessed/chat-baseline-300555.edf_1125_1336.npz
chat-baseline-300561
uploaded /content/chatprocessed/chat-baseline-300561.edf_52_236.npz
chat-baseline-300566
uploaded /content/chatprocessed/chat-baseline-300566.edf_91_146.npz
chat-baseline-300579
uploaded /content/chatprocessed/chat-baseline-300579.edf_134_87.npz
chat-baseline-300496
uploaded /content/chatprocessed/chat-baseline-300496.edf_931_791.npz
chat-baseline-300585
uploaded /content/chatprocessed/chat-baseline-300585.edf_320_899.npz
chat-baseline-300571
uploaded /content/chatprocessed/chat-baseline-300571.edf_500_885.npz
chat-baseline-300560
uploaded /content/chatprocessed/chat-baseline-300560.edf_1085_819.npz
chat-baselin

## CHAT Data Loading
CHAT data loading is done as follows:

*   Path is provided containing processed \*.npz files (from previous steps)
*   Divide studies into folds (5 folds used here). For example, if 10 processed files are in the folder, each fold will have data from 2 processed files.
*   For each study in each fold, the signals, apnea labels, and hypopnea labels are loaded.
*   Then, the apnea labels and hypopnea labels are combined to y_c.
*   To reduce the size of the data and improve model training performance, negative sampling is conducted. This is done by getting the indexes where y_c == 0 (negative samples) and where y_c > 0 (positive samples). Then, the ratio between the number of positive samples and negative samples determines how many negative samples are kept. The indexes of the kept negative samples are stored in neg_survived as shown below. Only the indexes in neg_survived and pos_samples are kept in the data. This balances the amount of data in the positive and negative classes, since there were more records associated with the negative class (no apnea or hypopnea).
*   extract_rri returns two ECG-derived series that have the same number of data points per epoch as the other signals (EPOCH_LENGTH * FREQ). This is 30\*128=3840, which matches the model input size shown later.
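The fold assignment and negative sampling steps in the list above can be sketched in isolation. This is a simplified illustration, not the notebook's `load_data` function: the helper names are hypothetical, the labels are made up, and a vectorized NumPy random generator is swapped in for the per-sample `random.random()` loop used in the loader.

```python
import numpy as np

def round_robin_folds(n_studies, n_folds=5):
    """Assign study indices to folds by striding, as in `indices[i::n_folds]`."""
    indices = list(range(n_studies))
    return [indices[i::n_folds] for i in range(n_folds)]

def downsample_negatives(y_c, rng):
    """Keep all positive epochs and each negative epoch with probability n_pos / n_neg."""
    neg = np.where(y_c == 0)[0]
    pos = np.where(y_c > 0)[0]
    if len(neg) == 0:
        return pos  # nothing to downsample; also avoids dividing by zero
    ratio = len(pos) / len(neg)
    keep = rng.random(len(neg)) < ratio
    return np.concatenate([neg[keep], pos])

print(round_robin_folds(10))  # [[0, 5], [1, 6], [2, 7], [3, 8], [4, 9]]

rng = np.random.default_rng(0)        # fixed seed so the illustration is repeatable
y_c = np.array([0] * 90 + [3] * 10)   # made-up labels: 90 negative and 10 positive epochs
kept = downsample_negatives(y_c, rng)
print((y_c[kept] > 0).sum(), (y_c[kept] == 0).sum())
```

Because each negative epoch is kept independently at random, the number of surviving negatives is only approximately equal to the number of positives, and it changes from run to run unless the seed is fixed.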



In [None]:
# due to hardcoded values, we had to copy the dataloader code and change it slightly to be able to run it
# from - https://github.com/healthylaife/Pediatric-Apnea-Detection/blob/main/data/chat/dataloader.py

import glob
import os
import random
import numpy as np
import pandas as pd
from scipy.signal import resample
from biosppy.signals.ecg import hamilton_segmenter, correct_rpeaks
from biosppy.signals import tools as st
from scipy.interpolate import splev, splrep

from PediatricApneaDetection.data.chat import dataloader

SIGS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
s_count = len(SIGS)

FREQ = 128
EPOCH_LENGTH = 30
ECG_SIG = 8

def load_data(path):
    # demo = pd.read_csv("../misc/result.csv")
    # ahi = pd.read_csv(r"C:\Data\AHI.csv")
    # ahi_dict = dict(zip(ahi.Study, ahi.AHI))
    root_dir = os.path.expanduser(path)
    file_list = os.listdir(root_dir)
    print(file_list)
    length = len(file_list)

    ################################### Assign studies to 5 folds round-robin by file index #########################
    study_event_counts = [i for i in range(0, length)]
    folds = []
    for i in range(5):
        folds.append(study_event_counts[i::5])

    x = []
    y_apnea = []
    y_hypopnea = []
    counter = 0
    for idx, fold in enumerate(folds):
        first = True
        for patient in fold:
            rri_succ_counter = 0
            rri_fail_counter = 0
            counter += 1
            print(counter)
            # for study in glob.glob(PATH + patient[0] + "_*"):
            study_data = np.load(path + file_list[patient - 1])
            signals = study_data['data']
            labels_apnea = study_data['labels_apnea']
            labels_hypopnea = study_data['labels_hypopnea']

            # identifier = study.split('\\')[-1].split('_')[0] + "_" + study.split('\\')[-1].split('_')[1]
            # demo_arr = demo[demo['id'] == identifier].drop(columns=['id']).to_numpy().squeeze()

            y_c = labels_apnea + labels_hypopnea
            neg_samples = np.where(y_c == 0)[0]
            pos_samples = list(np.where(y_c > 0)[0])
            ratio = len(pos_samples) / len(neg_samples)
            neg_survived = []
            for s in range(len(neg_samples)):
                if random.random() < ratio:
                    neg_survived.append(neg_samples[s])
            samples = neg_survived + pos_samples
            signals = signals[samples, :, :]
            labels_apnea = labels_apnea[samples]
            labels_hypopnea = labels_hypopnea[samples]

            data = np.zeros((signals.shape[0], EPOCH_LENGTH * FREQ, s_count + 2))
            for i in range(signals.shape[0]):  # for each epoch
                # data[i, :len(demo_arr), -3] = demo_arr
                data[i, :, -1], data[i, :, -2], status = dataloader.extract_rri(signals[i, ECG_SIG, :], FREQ,
                                                                     float(EPOCH_LENGTH))

                if status:
                    rri_succ_counter += 1
                else:
                    rri_fail_counter += 1

                for j in range(s_count):  # for each signal
                    data[i, :, j] = signals[i, SIGS[j], :]

            if first:
                aggregated_data = data
                aggregated_label_apnea = labels_apnea
                aggregated_label_hypopnea = labels_hypopnea
                first = False
            else:
                aggregated_data = np.concatenate((aggregated_data, data), axis=0)
                aggregated_label_apnea = np.concatenate((aggregated_label_apnea, labels_apnea), axis=0)
                aggregated_label_hypopnea = np.concatenate((aggregated_label_hypopnea, labels_hypopnea), axis=0)
            print(rri_succ_counter, rri_fail_counter)

        x.append(aggregated_data)
        y_apnea.append(aggregated_label_apnea)
        y_hypopnea.append(aggregated_label_hypopnea)

    return x, y_apnea, y_hypopnea


In [None]:
# remove file to avoid issues in dataload
%rmdir /content/chatprocessed/.ipynb_checkpoints

rmdir: failed to remove '/content/chatprocessed/.ipynb_checkpoints': No such file or directory


In [None]:
#Commented to allow rest of notebook to run (No Data processing/loading/training for Submission)

# PATH = chat_out
# OUT_PATH = '/content/chatloader/'
# os.makedirs(os.path.dirname(OUT_PATH), exist_ok=True)

# load data
# x, y_apnea, y_hypopnea = load_data(PATH)
# # save data into .npz file
# for i in range(5):
#       print(x[i].shape, y_apnea[i].shape, y_hypopnea[i].shape)
#       np.savez_compressed(OUT_PATH + "chat_" + str(i), x=x[i], y_apnea=y_apnea[i], y_hypopnea=y_hypopnea[i])

# np.savez_compressed(OUT_PATH + "chat_1", x=x, y_apnea=y_apnea, y_hypopnea=y_hypopnea) #doesn't work because of mismatching shapes after first dimension



['chat-baseline-300315.edf_18_968.npz', 'chat-baseline-300295.edf_91_210.npz', 'chat-baseline-300242.edf_737_1500.npz', 'chat-baseline-300008.edf_186_103.npz', 'chat-baseline-300264.edf_132_275.npz', 'chat-baseline-300352.edf_52_167.npz', 'chat-baseline-300282.edf_99_661.npz', 'chat-baseline-300312.edf_113_83.npz', 'chat-baseline-300343.edf_210_160.npz', 'chat-baseline-300271.edf_220_815.npz', 'chat-baseline-300215.edf_1022_2853.npz', 'chat-baseline-300026.edf_305_783.npz', 'chat-baseline-300176.edf_170_885.npz', 'chat-baseline-300058.edf_23_180.npz', 'chat-baseline-300041.edf_300_264.npz', 'chat-baseline-300133.edf_89_256.npz', 'chat-baseline-300186.edf_310_355.npz', 'chat-baseline-300066.edf_169_135.npz', 'chat-baseline-300038.edf_77_305.npz', 'chat-baseline-300036.edf_486_607.npz', 'chat-baseline-300349.edf_172_429.npz', 'chat-baseline-300072.edf_240_345.npz', 'chat-baseline-300224.edf_639_1994.npz', 'chat-baseline-300019.edf_196_337.npz', 'chat-baseline-300052.edf_189_360.npz', 'ch

In [None]:
# !zip -r chatprocessed.zip chatprocessed/
# %rm -r chatprocessed/
# !zip -r chatloader.zip chatloader/

  adding: chatloader/ (stored 0%)
  adding: chatloader/chat_4.npz (deflated 0%)
  adding: chatloader/chat_3.npz (deflated 0%)
  adding: chatloader/chat_0.npz (deflated 0%)
  adding: chatloader/chat_2.npz (deflated 0%)
  adding: chatloader/chat_1.npz (deflated 0%)


In [None]:
# client.folder(folder_id='261223530618').upload("chatloader.zip")

<Box File - 1516312798979 (chatloader.zip)>

### Preprocessing **command** to download the prepared data and pre-trained models to local Colab storage for use in the rest of the notebook.

In [None]:
#Download pre-made chatloader to avoid recreating it.
# file_content = client.file("1516312798979").content()
# downloadFromBox("chatloader.zip", file_content)

# This is a subset of the data so that graders can run the model tests; pre-trained models are also included. This dataset is significantly smaller than the one we used for our own testing.
!gdown --fuzzy https://drive.google.com/file/d/1CAtL7c4q1VpSUMeadIwyJIuL2_6aSiz9/view?usp=sharing
!unzip chatloader.zip
!gdown --fuzzy https://drive.google.com/file/d/196ZMPkv8Q0RKy4N0KwSB1i0WGoHSX6_0/view?usp=sharing
!unzip model_final.zip

Downloading...
From (original): https://drive.google.com/uc?id=1CAtL7c4q1VpSUMeadIwyJIuL2_6aSiz9
From (redirected): https://drive.google.com/uc?id=1CAtL7c4q1VpSUMeadIwyJIuL2_6aSiz9&confirm=t&uuid=53498735-794f-429b-820a-d9bb913c14ba
To: /content/chatloader.zip
100% 457M/457M [00:12<00:00, 35.6MB/s]
Archive:  chatloader.zip
   creating: chatloader/
  inflating: chatloader/chat_2.npz   
  inflating: chatloader/chat_1.npz   
  inflating: chatloader/chat_4.npz   
  inflating: chatloader/chat_3.npz   
  inflating: chatloader/chat_0.npz   
Downloading...
From (original): https://drive.google.com/uc?id=196ZMPkv8Q0RKy4N0KwSB1i0WGoHSX6_0
From (redirected): https://drive.google.com/uc?id=196ZMPkv8Q0RKy4N0KwSB1i0WGoHSX6_0&confirm=t&uuid=6bf30f08-4976-40fb-b358-29c71999e593
To: /content/model_final.zip
100% 112M/112M [00:03<00:00, 34.3MB/s]
Archive:  model_final.zip
   creating: model/
   creating: model/model_5dropout_ECGSPO21/
   creating: model/model_5dropout_ECGSPO21/assets/
 extracting: model

##   Model

For reference, the model is described in the authors' original paper, which appeared in Proceedings of Machine Learning Research and is archived by the National Institutes of Health (PubMed Central). The citation is provided again below:

Fayyaz H, Strang A, Beheshti R. Bringing At-home Pediatric Sleep Apnea Testing Closer to Reality: A Multi-modal Transformer Approach. Proc Mach Learn Res. 2023 Aug;219:167-185. PMID: 38344396; PMCID: PMC10854997.
https://github.com/healthylaife/Pediatric-Apnea-Detection/tree/main

---

The model includes the model definition which usually is a class, model training, and other necessary parts.
  * Model architecture: layer number/size/type, activation function, etc
  * Training objectives: loss function, optimizer, weight of each loss term, etc
  * Others: whether the model is pretrained, Monte Carlo simulation for uncertainty analysis, etc
  * The code of model should have classes of the model, functions of model training, model validation, etc.
  * If your model training is done outside of this notebook, please upload the trained model here and develop a function to load and test it.

---

Similar to what’s documented in the original paper, we will be implementing a model consisting of four components: segmentor, tokenizer, transformer, and multi-layer perceptron.

The segmentor will divide signals into equal-length epochs and forward them to the tokenizer. (This is done through pre-processing/dataloading steps shown previously.)

The tokenizer will construct tokenized representations of the segmentor’s output. Once these tokens have been generated, they will be passed to the transformer.
*   The tokenizer will handle regular and irregular time series data as well as data in tabular format. For consistency, data from all three formats will be resampled using a desired frequency (shown in pre-processing/dataloading).
*   Tokenizing can be seen in the model code below, between Input(...) and the loop over the transformer layers. Note that the Input shape is (Freq*Epoch_length, Num_signals). This input shape needs to match what was created in the preprocessing and dataloading steps.
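
As a rough illustration of the tokenizing step, the sketch below cuts one epoch of shape (Freq*Epoch_length, Num_signals) into non-overlapping patches and flattens each patch into a token. It is a NumPy stand-in for the idea, not the Keras `Patches` layer itself; the dummy batch and the constant names are assumptions made for the example.

```python
import numpy as np

# Values used for the 2-signal (3-channel) model in this notebook.
FREQ, EPOCH_LENGTH, NUM_SIGNALS = 128, 30, 3
NUM_PATCHES = 30                                    # one patch per second
patch_size = (FREQ * EPOCH_LENGTH) // NUM_PATCHES   # 128 samples per patch

# Dummy batch of 2 epochs with shape (batch, time, channels).
x = np.arange(2 * FREQ * EPOCH_LENGTH * NUM_SIGNALS, dtype=np.float32)
x = x.reshape(2, FREQ * EPOCH_LENGTH, NUM_SIGNALS)

# Cut the time axis into non-overlapping windows and flatten each window
# (patch_size samples x NUM_SIGNALS channels) into a single token vector.
patches = x.reshape(x.shape[0], NUM_PATCHES, patch_size * NUM_SIGNALS)
print(patches.shape)  # (2, 30, 384)
```

Each of the 30 tokens is then projected to `projection_dim` values by a dense layer before entering the transformer.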




The transformer will be constructed using five encoder modules. Each encoder module will consist of multi-head attention and a position-wise feed-forward network, supplemented by residual and normalization layers. The inspiration for each encoder module came from the established transformer architecture (Vaswani et al. 2017a); note that the decoder component of this architecture is typically used for generative tasks and is therefore not needed for our model. The multi-head attention will consist of concatenated attention heads and a final fully connected layer, which lets the model attend to information across various representation sub-spaces. The position-wise feed-forward network will be comprised of one fully connected layer followed by a ReLU activation unit and then another fully connected layer (the code below follows the authors' repository and applies GELU after each fully connected layer). Output from the transformer unit will be forwarded to the multi-layer perceptron for analysis and prediction.
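
To make the attention step more concrete, here is a minimal single-head sketch of scaled dot-product self-attention with a residual connection, written in NumPy. It is an illustration only: the model uses Keras' `MultiHeadAttention` with 4 heads, and the weight matrices `w_q`, `w_k`, `w_v` and the random tokens below are hypothetical stand-ins.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    # Project the tokens to queries, keys and values.
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    # Similarity of every token with every other token, scaled by sqrt(d_k).
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    # Each output token is a weighted average of the value vectors.
    return weights @ v, weights

rng = np.random.default_rng(0)
tokens = rng.normal(size=(30, 32))  # 30 patches, projection_dim = 32
w_q, w_k, w_v = (rng.normal(size=(32, 32)) * 0.1 for _ in range(3))

attended, weights = self_attention(tokens, w_q, w_k, w_v)
out = tokens + attended  # residual connection around the attention block
print(out.shape)  # (30, 32)
```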

The multi-layer perceptron will be a two-layer fully connected network for forecasting the likelihood of an apnea-hypopnea event happening within a given epoch. The initial and subsequent layers of this network will consist of 256 and 128 neurons, respectively. Our model will use binary cross-entropy to determine loss.
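
For readers unfamiliar with the loss, the following NumPy sketch computes binary cross-entropy by hand for a few made-up predictions. The labels and probabilities are invented for illustration; in the notebook the loss is provided by Keras' `BinaryCrossentropy`.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    # Clip probabilities so that log(0) never occurs.
    y_prob = np.clip(y_prob, eps, 1 - eps)
    per_sample = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(per_sample))

y_true = np.array([1.0, 0.0, 1.0, 0.0])      # 1 = apnea/hypopnea epoch
confident = np.array([0.9, 0.1, 0.8, 0.2])   # mostly correct predictions
unsure = np.array([0.5, 0.5, 0.5, 0.5])      # uninformative predictions

loss_confident = binary_cross_entropy(y_true, confident)
loss_unsure = binary_cross_entropy(y_true, unsure)
print(loss_confident < loss_unsure)  # True
```

A model that always outputs 0.5 scores ln 2 (about 0.693), so a useful classifier should end up well below that value.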

In [None]:
# this function is provided in the paper's github repo but we included it here for
# clarity/describing key components
# from create_transformer_model - https://github.com/healthylaife/Pediatric-Apnea-Detection/blob/main/models/transformer.py

class Patches(Layer):
    def __init__(self, patch_size):
        super(Patches, self).__init__()
        self.patch_size = patch_size

    def call(self, input):
        input = input[:, tf.newaxis, :, :]
        batch_size = tf.shape(input)[0]
        patches = tf.image.extract_patches(
            images=input,
            sizes=[1, 1, self.patch_size, 1],
            strides=[1, 1, self.patch_size, 1],
            rates=[1, 1, 1, 1],
            padding="VALID",
        )
        patch_dims = patches.shape[-1]
        patches = tf.reshape(patches,
                             [batch_size, -1, patch_dims])
        return patches

class PatchEncoder(Layer):
    def __init__(self, num_patches, projection_dim, l2_weight):
        super(PatchEncoder, self).__init__()
        self.projection_dim = projection_dim
        self.l2_weight = l2_weight
        self.num_patches = num_patches
        self.projection = Dense(units=projection_dim, kernel_regularizer=L2(l2_weight),
                                bias_regularizer=L2(l2_weight))
        self.position_embedding = tf.keras.layers.Embedding(
            input_dim=num_patches, output_dim=projection_dim)

    def call(self, patch):
        positions = tf.range(start=0, limit=self.num_patches, delta=1)
        encoded = self.projection(patch)# + self.position_embedding(positions)
        return encoded

def mlp(x, hidden_units, dropout_rate, l2_weight):
    for _, units in enumerate(hidden_units):
        x = Dense(units, activation=None, kernel_regularizer=L2(l2_weight), bias_regularizer=L2(l2_weight))(x)
        x = tf.nn.gelu(x)
        x = Dropout(dropout_rate)(x)
    return x

def create_transformer_model(input_shape, num_patches,
                             projection_dim, transformer_layers,
                             num_heads, transformer_units, mlp_head_units,
                             num_classes, drop_out, reg, l2_weight, demographic=False):
    # not sure if we need to bother with regression.
    if reg:
        activation = None
    else:
        activation = 'sigmoid'
    inputs = Input(shape=input_shape)
    # Integer division keeps patch_size an int (3840 // 30 = 128) for extract_patches.
    patch_size = input_shape[0] // num_patches
    normalized_inputs = tfa.layers.InstanceNormalization(axis=-1, epsilon=1e-6, center=False, scale=False,
                                                             beta_initializer="glorot_uniform",
                                                             gamma_initializer="glorot_uniform")(inputs)
    patches = Patches(patch_size=patch_size)(normalized_inputs)
    encoded_patches = PatchEncoder(num_patches=num_patches, projection_dim=projection_dim, l2_weight=l2_weight)(patches)
    for i in range(transformer_layers):
        x1 = encoded_patches # LayerNormalization(epsilon=1e-6)(encoded_patches) # TODO
        attention_output = MultiHeadAttention(
            num_heads=num_heads, key_dim=projection_dim, dropout=drop_out, kernel_regularizer=L2(l2_weight),  # i *
            bias_regularizer=L2(l2_weight))(x1, x1)
        x2 = Add()([attention_output, encoded_patches])
        x3 = LayerNormalization(epsilon=1e-6)(x2)
        x3 = mlp(x3, transformer_units, drop_out, l2_weight)  # i *
        encoded_patches = Add()([x3, x2])
    x = LayerNormalization(epsilon=1e-6)(encoded_patches)
    x = GlobalAveragePooling1D()(x)
    #x = Concatenate()([x, demo])
    features = mlp(x, mlp_head_units, 0.0, l2_weight)

    logits = Dense(num_classes, kernel_regularizer=L2(l2_weight), bias_regularizer=L2(l2_weight),
                   activation=activation)(features)

    return tf.keras.Model(inputs=inputs, outputs=logits)

In [None]:
# model + training code is similar to https://github.com/healthylaife/Pediatric-Apnea-Detection/blob/main/train.py with slight adjustments and documentation

# Commented out to let the rest of the notebook run. (No data processing/loading/training for submission.) We will be loading the model from Drive.
# model = create_model((128 * 30, 3)) #draft mistake
# model = create_transformer_model(input_shape, num_patches,
#                              projection_dim, transformer_layers,
#                              num_heads, transformer_units, mlp_head_units,
#                              num_classes, drop_out, reg, l2_weight, demographic=False):

# Our first model uses only 3 channels.
input_shape = (128 * 30, 3)
num_patches = 30

transformer_layers = 5
num_heads = 4
transformer_units = 32
reg = False
drop_out = 0.25
l2_weight = 0.001

model = create_transformer_model(input_shape, num_patches,
                                projection_dim=transformer_units, transformer_layers=transformer_layers,
                                num_heads=num_heads,
                                transformer_units = [transformer_units*2, transformer_units],
                                mlp_head_units=[256, 128],
                                num_classes=1, drop_out=drop_out, reg=reg, l2_weight=l2_weight, demographic=False)
loss_func = BinaryCrossentropy()
optimizer = "adam"

#Prevent over-fitting on same fold.
def lr_schedule(epoch, lr):

    if epoch > 50 and (epoch - 1) % 5 == 0:
        lr *= 0.5

    return lr

# Training

The training procedure contains the following steps:
*   Load data from each fold and append to list.
*   For each fold, set y = 1 whenever an epoch contains any seconds of apnea/hypopnea events, and reduce x to only the required signals (currently only ["ECG", "SPO2"]).
*   For each fold, we generate x_train and y_train from the data in all other folds. The model is then trained on this set for up to 100 epochs. Training can stop early via early_stopper if the validation loss isn't improving. lr_scheduler halves the learning rate every 5 epochs after epoch 50 to avoid over-fitting in the fold.
*   A model for each fold is created and saved.
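
The fold handling in the steps above can be sketched as follows. This is a simplified illustration on tiny random arrays, not the notebook's `train` function; `split_for_fold` is a hypothetical helper and the fold sizes are made up, while the channel indices match the ECG and SPO2 columns used in this notebook.

```python
import numpy as np

def split_for_fold(x_folds, y_folds, held_out):
    # Training data: every fold except the held-out one, stacked along axis 0.
    x_train = np.concatenate([x for i, x in enumerate(x_folds) if i != held_out])
    y_train = np.concatenate([y for i, y in enumerate(y_folds) if i != held_out])
    return x_train, y_train, x_folds[held_out], y_folds[held_out]

rng = np.random.default_rng(0)
channels = [15, 16, 13]    # ECG-derived channels and SPO2
sizes = [4, 6, 5, 7, 3]    # made-up number of epochs per fold

# Keep only the selected channels out of the 17 stored per epoch.
x_folds = [rng.normal(size=(n, 8, 17))[:, :, channels] for n in sizes]
# Binarize: any positive count of event seconds becomes label 1.
y_folds = [(rng.integers(0, 3, size=n) >= 1).astype(int) for n in sizes]

x_train, y_train, x_test, y_test = split_for_fold(x_folds, y_folds, held_out=0)
print(x_train.shape, x_test.shape)  # (21, 8, 3) (4, 8, 3)
```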

We will start by training and testing a model using only 2 signals (3 channels): ECG and SPO2.

#### Some Hyperparameters
- Number of transformer layers = 5
- Dropout = 0.25
- Number of cross-validation folds = 5
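
The learning-rate schedule mentioned in the training steps also acts as a hyperparameter. The sketch below replays `lr_schedule` over 100 epochs outside of Keras to show when the rate is halved. It assumes a starting rate of 1e-3 (the Keras default for Adam) and assumes that each epoch receives the rate returned for the previous one, which is how Keras' `LearningRateScheduler` calls the function.

```python
def lr_schedule(epoch, lr):
    # Same rule as in the notebook: halve the rate every 5 epochs after epoch 50.
    if epoch > 50 and (epoch - 1) % 5 == 0:
        lr *= 0.5
    return lr

lr = 1e-3          # assumed initial learning rate
halved_at = []     # epochs (0-indexed) at which the rate was halved
for epoch in range(100):
    new_lr = lr_schedule(epoch, lr)
    if new_lr != lr:
        halved_at.append(epoch)
    lr = new_lr

print(halved_at)  # [51, 56, 61, 66, 71, 76, 81, 86, 91, 96]
print(lr)         # final rate after 10 halvings, about 9.8e-07
```

In practice early stopping usually ends training well before epoch 50, so the schedule only matters for folds that train for a long time.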

#### Computation requirements
The training and testing code was run on the T4 GPU provided by the Colab environment.

Each epoch's average runtime was just a bit over 1 second and the maximum number of training epochs was 100.

Total number of trials: in the draft, we used a smaller dataset, but for the final version we trained and tested on more data. We trained the model a few times, each time increasing the number of input files while making sure the GPU was able to handle it.

The training code is found right in the cell below.


In [None]:
# training function for model

def train(config, fold):
  FOLD = fold
  x = []
  y = []
  for i in range(FOLD):
    data = np.load(config["data_path"] + str(i) + ".npz", allow_pickle=True)
    x.append(data['x'])
    y.append(data['y_apnea'] + data['y_hypopnea'])

  #x for specific channels
  # print(x.shape)
  x_chan = []
  #  np.zeros( (x.shape[0],x.shape[1],x.shape[2], len(config["channels"])))

  print(len(x))
  for i in range(FOLD):
    x[i], y[i] = shuffle(x[i], y[i])
    x[i] = np.nan_to_num(x[i], nan=-1)
    y[i] = np.where(y[i] >= 1, 1, 0)
    print(x[i].shape)
    #Select specific channels from data.
    x_chan.append(x[i][:, :, config["channels"]])
    print(x_chan[i].shape)

  print("training")
  for fold in range(FOLD):
    x_train, y_train = None, None
    for i in range(FOLD):
      if i != fold:
        if isinstance(x_train, np.ndarray):
          # x_train = x[i]
          # y_train = y[i]
          x_train = np.concatenate((x_train, x_chan[i]))
          y_train = np.concatenate((y_train, y[i]))
        else:
          # x_train = np.concatenate((x_train, x[i]))
          # y_train = np.concatenate((y_train, y[i]))
          x_train = x_chan[i]
          y_train = y[i]
    print(x_train.shape)
    print(y_train.shape)
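    # NOTE: `model` is the global model built in the cell above. It is not re-created here,
    # so its weights carry over from one fold to the next.
    model.compile(optimizer=optimizer, loss=loss_func, metrics=[keras.metrics.Precision(), keras.metrics.Recall()])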

    # Early stopping halts training once the validation loss (val_loss) has not improved
    # for 10 epochs (patience=10) and restores the best weights seen so far.
    early_stopper = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
    lr_scheduler = LearningRateScheduler(lr_schedule)
    model.fit(x=x_train, y=y_train, batch_size=512, epochs=config["epochs"], validation_split=0.1,
                    callbacks=[early_stopper, lr_scheduler])
    model.save(config["model_path"] + config["model_name"] + str(fold))
    keras.backend.clear_session()

  print("training complete")


### Train Model for ECG and SPO2 (2 Signals)

In [None]:
# from - https://github.com/healthylaife/Pediatric-Apnea-Detection/blob/main/main_chat.py with slight modifications
# Commented to allow for rest of notebook to run. (No Data processing/loading/training for Submission)

data_path = '/content/chatloader/'
model_path = '/content/model/'
os.makedirs(os.path.dirname(model_path), exist_ok=True)

# sig_dict_chat = {
#     "EOG": [0, 1],
#     "EEG": [4, 5],
#     "ECG": [15,16],
#     "Resp": [9, 10],
#     "SPO2": [13],
#     "CO2": [14],
# }

# channel_list_chat = [
#     ["ECG", "SPO2"],
#     # ["EOG","EEG","ECG","Resp","SPO2","CO2"]
# ]

# for ch in channel_list_chat:
#     chs = []
#     chstr = ""
#     for name in ch:
#         chstr += name
#         chs = chs + sig_dict_chat[name]
#     print(chstr, chs)
#     config = {
#         "data_path": data_path + 'chat_',
#         "model_path": model_path,
#         "model_name": "model_nochanges_"+ chstr,
#         "regression": False,
#         "epochs": 100,
#         "channels": chs,
#     }
#     train(config, 5)

ECGSPO2 [15, 16, 13]
5
(2474, 3840, 17)
(2474, 3840, 3)
(3034, 3840, 17)
(3034, 3840, 3)
(3541, 3840, 17)
(3541, 3840, 3)
(3952, 3840, 17)
(3952, 3840, 3)
(3506, 3840, 17)
(3506, 3840, 3)
training
(14033, 3840, 3)
(14033,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 



(13473, 3840, 3)
(13473,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100




(12966, 3840, 3)
(12966,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100




(12555, 3840, 3)
(12555,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100




(13001, 3840, 3)
(13001,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100




training complete


### Testing 2 signal model

In [None]:
# Similar to https://github.com/healthylaife/Pediatric-Apnea-Detection/blob/main/test.py with some modifications + comments
from PediatricApneaDetection.metrics import Result

def test(config, fold):
  FOLD = fold
  x = []
  y = []
  for i in range(FOLD):
    data = np.load(config["data_path"] + str(i) + ".npz", allow_pickle=True)
    x.append(data['x'])
    y.append(data['y_apnea'] + data['y_hypopnea'])

  #x for specific channels
  x_chan = []

  for i in range(FOLD):
    x[i], y[i] = shuffle(x[i], y[i])
    x[i] = np.nan_to_num(x[i], nan=-1)
    y[i] = np.where(y[i] >= 1, 1, 0)
    print(x[i].shape)
    #Select specific channels from data.
    x_chan.append(x[i][:, :, config["channels"]])
    print(x_chan[i].shape)

  print("test starting")
  result = Result()
  for i in range(FOLD):
    x_test = x_chan[i]
    y_test = y[i]
    model = tf.keras.models.load_model(config["model_path"] + config["model_name"] + str(i), compile=False)

    predict = model.predict(x_test)
    y_score = predict
    y_predict = np.where(predict > 0.5, 1, 0)

    result.add(y_test, y_predict, y_score)

  result.print()
  result.save(config["model_name"] + ".txt", config)

  del data, x_test, y_test, model, predict, y_score, y_predict

In [None]:
#Test chat data

data_path = '/content/chatloader/'
model_path = '/content/model/'
os.makedirs(os.path.dirname(model_path), exist_ok=True)

sig_dict_chat = {
    "EOG": [0, 1],
    "EEG": [4, 5],
    "ECG": [15,16],
    "Resp": [9, 10],
    "SPO2": [13],
    "CO2": [14],
}

channel_list_chat = [
    ["ECG", "SPO2"],
]

for ch in channel_list_chat:
    chs = []
    chstr = ""
    for name in ch:
        chstr += name
        chs = chs + sig_dict_chat[name]
    print(chstr, chs)
    config = {
        "data_path": data_path + 'chat_',
        "model_path": model_path,
        "model_name": "model_nochanges_"+ chstr,
        "channels": chs,
    }
    test(config, 5)

ECGSPO2 [15, 16, 13]
(2474, 3840, 17)
(2474, 3840, 3)
(3034, 3840, 17)
(3034, 3840, 3)
(3541, 3840, 17)
(3541, 3840, 3)
(3952, 3840, 17)
(3952, 3840, 3)
(3506, 3840, 17)
(3506, 3840, 3)
test starting

[83.67016976556184, 85.33289386947924, 79.63852019203614, 85.3997975708502, 83.37136337706788] 
[84.06275805119736, 85.02350570852921, 77.87750791974656, 83.49657198824681, 79.75077881619937] 
[82.83157038242473, 85.08064516129032, 83.00506471581318, 87.66066838046272, 88.83747831116253] 
[84.49799196787149, 85.57567917205692, 76.24716553287982, 83.208769307424, 78.05289814293754] 
[83.44262295081968, 85.05206583809203, 80.35957504767093, 85.52796588913971, 84.04924760601915] 
[91.66795742775822, 91.98799538176911, 87.12880762884782, 91.951460377112, 90.44789585322121] 
[90.50532562601704, 91.08212051423195, 85.78370576284499, 91.15527727700152, 89.01497585364044] 
Accuracy: 83.48 -+ 2.094 
Precision: 82.04 -+ 2.745 
Recall: 85.48 -+ 2.422 
Specifity: 81.52 -+ 3.688 
F1: 83.69 -+ 1.817 
AUROC: 90.64 -+ 1.842 
AUPRC: 89.51 -+ 2.015 
$ 83.5 \pm 2.1$& $82.0 \pm 2.7$& $85.5 \pm 2.4$& $83.7 \pm 1.8$& $90.6 \pm 1.8$& 


# Evaluation

As seen in the cell's output above, the testing code reports seven metrics: accuracy, precision, recall, specificity, F1, AUROC, and AUPRC. The evaluation code is adapted from the authors' code.

A brief description of each metric is given below:

- Accuracy - the percentage of classifications the model got correct.
- Precision - measures the accuracy of positive predictions made by the model. It calculates the ratio of true positive predictions to all positive predictions made by the model.
- Recall - measures the ability of the model to capture all the positive instances in the dataset. It calculates the ratio of true positive predictions to all actual positive instances in the dataset.
- Specificity - measures the ability of the model to correctly identify negative instances. It calculates the ratio of true negative predictions to all actual negative instances in the dataset.
- F1 - harmonic mean of precision and recall to give a single score.
- AUROC - stands for area under the receiver-operating characteristic. The true positive rate (sensitivity) is plotted against the false positive rate (1 - specificity). The higher the value the better.
- AUPRC - stands for area under the precision recall curve. Precision is plotted against recall. The higher the value the better.
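
As a small worked example of the threshold-based metrics above, the sketch below computes them from a confusion matrix for ten made-up predictions. The labels are invented for illustration and `confusion_metrics` is a hypothetical helper, not the `Result` class used in this notebook. AUROC and AUPRC are not covered because they need the predicted scores rather than thresholded labels.

```python
import numpy as np

def confusion_metrics(y_true, y_pred):
    # Count the four cells of the confusion matrix.
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 1])

metrics = confusion_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Here there are 3 true positives, 1 false negative, 4 true negatives, and 2 false positives, which gives an accuracy of 0.7, precision of 0.6, recall of 0.75, and specificity and F1 of about 0.667.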

**These metrics will similarly be outputted for the experiments below and be compared to the 2 signal model's output above.**

# Results and Experiments Below

### Experiment: Training 6 Signals Model


In [None]:
#10 Channels used for the 6 signals (EOG, EEG, ECG, Resp, SPO2, and CO2).
#Adjust Input Accordingly, (3 -> 10) from previous 2 signal model.
input_shape = (128 * 30, 10)
num_patches = 30

transformer_layers = 5
num_heads = 4
transformer_units = 32
reg = False
drop_out = 0.25
l2_weight = 0.001

model = create_transformer_model(input_shape, num_patches,
                                projection_dim=transformer_units, transformer_layers=transformer_layers,
                                num_heads=num_heads,
                                transformer_units = [transformer_units*2, transformer_units],
                                mlp_head_units=[256, 128],
                                num_classes=1, drop_out=drop_out, reg=reg, l2_weight=l2_weight, demographic=False)

In [None]:
data_path = '/content/chatloader/'
model_path = '/content/model/'
os.makedirs(os.path.dirname(model_path), exist_ok=True)

# sig_dict_chat = {
#     "EOG": [0, 1],
#     "EEG": [4, 5],
#     "ECG": [15,16],
#     "Resp": [9, 10],
#     "SPO2": [13],
#     "CO2": [14],
# }

# channel_list_chat = [
#     ["EOG","EEG","ECG","Resp","SPO2","CO2"]
# ]

# for ch in channel_list_chat:
#     chs = []
#     chstr = ""
#     for name in ch:
#         chstr += name
#         chs = chs + sig_dict_chat[name]
#     print(chstr, chs)
#     config = {
#         "data_path": data_path + 'chat_',
#         "model_path": model_path,
#         "model_name": "model_nochanges_"+ chstr,
#         "regression": False,
#         "epochs": 100,
#         "channels": chs,
#     }
#     train(config, 5)

EOGEEGECGRespSPO2CO2 [0, 1, 4, 5, 15, 16, 9, 10, 13, 14]
5
(2474, 3840, 17)
(2474, 3840, 10)
(3034, 3840, 17)
(3034, 3840, 10)
(3541, 3840, 17)
(3541, 3840, 10)
(3952, 3840, 17)
(3952, 3840, 10)
(3506, 3840, 17)
(3506, 3840, 10)
training
(14033, 3840, 10)
(14033,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100




(13473, 3840, 10)
(13473,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100




(12966, 3840, 10)
(12966,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100




(12555, 3840, 10)
(12555,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100




(13001, 3840, 10)
(13001,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100




training complete


In [None]:
data_path = '/content/chatloader/'
model_path = '/content/model/'
os.makedirs(os.path.dirname(model_path), exist_ok=True)

sig_dict_chat = {
    "EOG": [0, 1],
    "EEG": [4, 5],
    "ECG": [15,16],
    "Resp": [9, 10],
    "SPO2": [13],
    "CO2": [14],
}

channel_list_chat = [
    ["EOG","EEG","ECG","Resp","SPO2","CO2"]
]

for ch in channel_list_chat:
    chs = []
    chstr = ""
    for name in ch:
        chstr += name
        chs = chs + sig_dict_chat[name]
    print(chstr, chs)
    config = {
        "data_path": data_path + 'chat_',
        "model_path": model_path,
        "model_name": "model_nochanges_"+ chstr,
        "channels": chs,
    }
    test(config, 5)

EOGEEGECGRespSPO2CO2 [0, 1, 4, 5, 15, 16, 9, 10, 13, 14]
(2474, 3840, 17)
(2474, 3840, 10)
(3034, 3840, 17)
(3034, 3840, 10)
(3541, 3840, 17)
(3541, 3840, 10)
(3952, 3840, 17)
(3952, 3840, 10)
(3506, 3840, 17)
(3506, 3840, 10)
test starting

[79.66855295068714, 84.27818061964403, 84.66534877153347, 85.32388663967612, 84.56930975470623] 
[76.15273775216139, 82.76085547634479, 81.13017154389506, 82.39202657807309, 80.52415210688592] 
[86.00488201790073, 85.81989247311827, 90.48958919527294, 89.25449871465295, 90.63042220936957] 
[73.4136546184739, 82.79430789133248, 78.79818594104309, 81.5146985550573, 78.67191896454699] 
[80.779518532671, 84.26261959749257, 85.55466879489227, 85.68608094768015, 85.27891156462584] 
[87.55719378735446, 91.78481756596975, 91.82828074017077, 91.57224777545942, 91.0882027370491] 
[85.26341794049439, 90.42547665950818, 89.23613177535422, 88.13724284150621, 87.73540429889051] 
Accuracy: 83.70 -+ 2.045 
Precision: 80.59 -+ 2.364 
Recall: 88.44 -+ 2.119 
Specifity: 79.04 -+ 3.227 
F1: 84.31 -+ 1.836 
AUROC: 90.77 -+ 1.626 
AUPRC: 88.16 -+ 1.724 
$ 83.7 \pm 2.0$& $80.6 \pm 2.4$& $88.4 \pm 2.1$& $84.3 \pm 1.8$& $90.8 \pm 1.6$& 


## Discussion of experiment - Comparing the 2 and 6 signal models:

ECG, SPO2 Model:
*   F1: 83.69 -+ 1.817
*   AUROC: 90.64 -+ 1.842

EOG, EEG, ECG, Resp, SPO2, CO2 Model:
*   F1: 84.31 -+ 1.836
*   AUROC: 90.77 -+ 1.626

As shown, the F1 and AUROC values are less than 1 percentage point higher in the 6-signal model (0.62 for F1 and 0.13 for AUROC), which is within one standard deviation across folds. This aligns with the original authors' findings as discussed in the results section. Since using all 6 signals brings little improvement, and since it is more practical to collect ECG and SPO2 at home than all signals, we will run hyperparameter and ablation experiments on the 2-signal model.
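
The comparison can be checked with a few lines of arithmetic. The sketch below uses the mean and standard deviation values reported above; comparing the gap against the smaller of the two standard deviations is a rough rule of thumb chosen for this example, not a formal significance test.

```python
# (mean, std) across the 5 folds, copied from the test outputs above.
results = {
    "2 signals": {"F1": (83.69, 1.817), "AUROC": (90.64, 1.842)},
    "6 signals": {"F1": (84.31, 1.836), "AUROC": (90.77, 1.626)},
}

summary = {}
for metric in ("F1", "AUROC"):
    mean_2, std_2 = results["2 signals"][metric]
    mean_6, std_6 = results["6 signals"][metric]
    diff = round(mean_6 - mean_2, 2)
    # Is the gap smaller than the fold-to-fold spread of either model?
    within_spread = abs(diff) < min(std_2, std_6)
    summary[metric] = (diff, within_spread)
    print(f"{metric}: difference = {diff}, within one std: {within_spread}")
```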


### Train/Test Hyperparameters

### Experiment - Dropout Test.

The dropout rate was initially 0.25. Here we run the model with a dropout rate of 0 and then 0.5.

Dropout: 0

In [None]:
input_shape = (128 * 30, 3) #input changed back to 3 channels.
num_patches = 30

transformer_layers = 5
num_heads = 4
transformer_units = 32
reg = False
drop_out = 0 #changed
l2_weight = 0.001

model = create_transformer_model(input_shape, num_patches,
                                projection_dim=transformer_units, transformer_layers=transformer_layers,
                                num_heads=num_heads,
                                transformer_units = [transformer_units*2, transformer_units],
                                mlp_head_units=[256, 128],
                                num_classes=1, drop_out=drop_out, reg=reg, l2_weight=l2_weight, demographic=False)

In [None]:
data_path = '/content/chatloader/'
model_path = '/content/model/'
os.makedirs(os.path.dirname(model_path), exist_ok=True)

# sig_dict_chat = {
#     "EOG": [0, 1],
#     "EEG": [4, 5],
#     "ECG": [15,16],
#     "Resp": [9, 10],
#     "SPO2": [13],
#     "CO2": [14],
# }

# channel_list_chat = [
#     ["ECG", "SPO2"],
# ]

# for ch in channel_list_chat:
#     chs = []
#     chstr = ""
#     for name in ch:
#         chstr += name
#         chs = chs + sig_dict_chat[name]
#     print(chstr, chs)
#     config = {
#         "data_path": data_path + 'chat_',
#         "model_path": model_path,
#         "model_name": "model_0dropout_"+ chstr,
#         "regression": False,
#         "epochs": 100,
#         "channels": chs,
#     }
#     train(config, 5)

ECGSPO2 [15, 16, 13]
5
(2474, 3840, 17)
(2474, 3840, 3)
(3034, 3840, 17)
(3034, 3840, 3)
(3541, 3840, 17)
(3541, 3840, 3)
(3952, 3840, 17)
(3952, 3840, 3)
(3506, 3840, 17)
(3506, 3840, 3)
training
(14033, 3840, 3)
(14033,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 



(13473, 3840, 3)
(13473,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100




(12966, 3840, 3)
(12966,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100




(12555, 3840, 3)
(12555,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100




(13001, 3840, 3)
(13001,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100




training complete


In [None]:
data_path = '/content/chatloader/'
model_path = '/content/model/'
os.makedirs(os.path.dirname(model_path), exist_ok=True)

sig_dict_chat = {
    "EOG": [0, 1],
    "EEG": [4, 5],
    "ECG": [15,16],
    "Resp": [9, 10],
    "SPO2": [13],
    "CO2": [14],
}

channel_list_chat = [
    ["ECG", "SPO2"],
]

for ch in channel_list_chat:
    chs = []
    chstr = ""
    for name in ch:
        chstr += name
        chs = chs + sig_dict_chat[name]
    print(chstr, chs)
    config = {
        "data_path": data_path + 'chat_',
        "model_path": model_path,
        "model_name": "model_0dropout_"+ chstr,
        "channels": chs,
    }
    test(config, 5)

ECGSPO2 [15, 16, 13]
(2474, 3840, 17)
(2474, 3840, 3)
(3034, 3840, 17)
(3034, 3840, 3)
(3541, 3840, 17)
(3541, 3840, 3)
(3952, 3840, 17)
(3952, 3840, 3)
(3506, 3840, 17)




(3506, 3840, 3)
test starting
















[83.54890864995957, 84.37705998681608, 80.45749788195425, 85.90587044534414, 84.16999429549344] 
[84.13621262458472, 85.15950069348128, 79.15099408919936, 85.62628336755647, 81.3568376068376] 
[82.42473555736372, 82.52688172043011, 82.89251547552054, 85.75835475578405, 88.0855986119144] 
[84.65863453815261, 86.15782664941786, 78.00453514739229, 86.04882909815646, 80.3601575689364] 
[83.27168105219894, 83.82252559726963, 80.97855964815832, 85.69226817364499, 84.58761455151347] 
[91.19622509566337, 91.49261361265285, 87.9512337668138, 91.9877600634284, 91.00261909698274] 
[89.86421604815239, 90.88401382921899, 87.05273866730381, 90.54911904518397, 89.74822417328838] 
Accuracy: 83.69 -+ 1.793 
Precision: 83.09 -+ 2.463 
Recall: 84.34 -+ 2.240 
Specifity: 83.05 -+ 3.284 
F1: 83.67 -+ 1.572 
AUROC: 90.73 -+ 1.427 
AUPRC: 89.62 -+ 1.351 
$ 83.7 \pm 1.8$& $83.1 \pm 2.5$& $84.3 \pm 2.2$& $83.7 \pm 1.6$& $90.7 \pm 1.4$& 
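
The summary lines above are consistent with taking the mean and the population standard deviation over the five folds. The sketch below checks this for the accuracy row, using the per-fold values printed above. `summarize` is a hypothetical helper, and the use of the population (rather than sample) standard deviation is inferred from the numbers, not confirmed from the `test` source.

```python
from statistics import fmean, pstdev

# Per-fold accuracies copied from the first list printed above (dropout = 0).
accuracy = [
    83.54890864995957,
    84.37705998681608,
    80.45749788195425,
    85.90587044534414,
    84.16999429549344,
]


def summarize(name, values):
    """Format a metric as 'name: mean -+ std' using the population std."""
    return f"{name}: {fmean(values):.2f} -+ {pstdev(values):.3f}"


print(summarize("Accuracy", accuracy))  # Accuracy: 83.69 -+ 1.793
```

The same calculation applies to the other six lists, which hold precision, recall, specificity, F1, AUROC, and AUPRC in that order.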


Dropout: 0.5. This drops twice as many units during training as the first transformer model we tested (dropout 0.25), and hence serves as an important ablation study. An additional ablation study is conducted further down in the notebook.

In [None]:
input_shape = (128 * 30, 3) #input changed back to 3 channels.
num_patches = 30

transformer_layers = 5
num_heads = 4
transformer_units = 32
reg = False
drop_out = 0.5 #changed
l2_weight = 0.001

model = create_transformer_model(input_shape, num_patches,
                                projection_dim=transformer_units, transformer_layers=transformer_layers,
                                num_heads=num_heads,
                                transformer_units = [transformer_units*2, transformer_units],
                                mlp_head_units=[256, 128],
                                num_classes=1, drop_out=drop_out, reg=reg, l2_weight=l2_weight, demographic=False)

In [None]:
data_path = '/content/chatloader/'
model_path = '/content/model/'
os.makedirs(os.path.dirname(model_path), exist_ok=True)

# sig_dict_chat = {
#     "EOG": [0, 1],
#     "EEG": [4, 5],
#     "ECG": [15,16],
#     "Resp": [9, 10],
#     "SPO2": [13],
#     "CO2": [14],
# }

# channel_list_chat = [
#     ["ECG", "SPO2"]
# ]

# for ch in channel_list_chat:
#     chs = []
#     chstr = ""
#     for name in ch:
#         chstr += name
#         chs = chs + sig_dict_chat[name]
#     print(chstr, chs)
#     config = {
#         "data_path": data_path + 'chat_',
#         "model_path": model_path,
#         "model_name": "model_5dropout_"+ chstr,
#         "regression": False,
#         "epochs": 100,
#         "channels": chs,
#     }
#     train(config, 5)

ECGSPO2 [15, 16, 13]
5
(2474, 3840, 17)
(2474, 3840, 3)
(3034, 3840, 17)
(3034, 3840, 3)
(3541, 3840, 17)
(3541, 3840, 3)
(3952, 3840, 17)
(3952, 3840, 3)
(3506, 3840, 17)
(3506, 3840, 3)
training
(14033, 3840, 3)
(14033,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 



(13473, 3840, 3)
(13473,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100




(12966, 3840, 3)
(12966,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100




(12555, 3840, 3)
(12555,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100




(13001, 3840, 3)
(13001,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100




training complete


In [None]:
data_path = '/content/chatloader/'
model_path = '/content/model/'
os.makedirs(os.path.dirname(model_path), exist_ok=True)

sig_dict_chat = {
    "EOG": [0, 1],
    "EEG": [4, 5],
    "ECG": [15,16],
    "Resp": [9, 10],
    "SPO2": [13],
    "CO2": [14],
}

channel_list_chat = [
    ["ECG", "SPO2"],
]

for ch in channel_list_chat:
    chs = []
    chstr = ""
    for name in ch:
        chstr += name
        chs = chs + sig_dict_chat[name]
    print(chstr, chs)
    config = {
        "data_path": data_path + 'chat_',
        "model_path": model_path,
        "model_name": "model_5dropout_"+ chstr,
        "channels": chs,
    }
    test(config, 5)

ECGSPO2 [15, 16, 13]
(2474, 3840, 17)
(2474, 3840, 3)
(3034, 3840, 17)
(3034, 3840, 3)
(3541, 3840, 17)
(3541, 3840, 3)
(3952, 3840, 17)
(3952, 3840, 3)
(3506, 3840, 17)




(3506, 3840, 3)
test starting
















[83.75101050929669, 84.31114040870138, 80.00564812199944, 84.94433198380567, 83.25727324586423] 
[83.37368845843423, 81.70426065162907, 77.36815156169995, 83.02348336594912, 82.25988700564972] 
[84.0520748576078, 87.63440860215054, 85.03095104108047, 87.24935732647815, 84.21052631578947] 
[83.45381526104417, 81.11254851228978, 74.94331065759637, 82.71051320378675, 82.3297692740574] 
[83.71150729335494, 84.56549935149157, 81.01876675603218, 85.08398094760591, 83.22377822234924] 
[91.21334810356151, 91.43275570671452, 87.2983014252409, 91.44610828680594, 90.16320290792346] 
[89.36658219557626, 90.02628470594267, 85.51510273940235, 90.06881690443224, 88.87322465739358] 
Accuracy: 83.25 -+ 1.719 
Precision: 81.55 -+ 2.169 
Recall: 85.64 -+ 1.517 
Specifity: 80.91 -+ 3.078 
F1: 83.52 -+ 1.409 
AUROC: 90.31 -+ 1.579 
AUPRC: 88.77 -+ 1.687 
$ 83.3 \pm 1.7$& $81.5 \pm 2.2$& $85.6 \pm 1.5$& $83.5 \pm 1.4$& $90.3 \pm 1.6$& 


###Discussion of experiment - dropout

The table below summarizes the impact on model performance of using no dropout and of doubling the dropout rate to 0.5.

<div>
<img src="https://drive.google.com/uc?export=view&id=11cBNlp4eMD7JI8BsvAAns35-xIH_xdhT"/>
</div>

Interestingly, no dropout leads to better performance across almost all metrics. Compared with dropout 0.5, recall is the only metric that drops (84.34 vs. 85.64). One possible explanation is that the network performs better with all of its units active and still generalizes well. However, having no dropout means having no regularization technique, so it could potentially lead to some overfitting.
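
To make the comparison concrete, the sketch below tabulates the two runs reported in this section (dropout 0 and dropout 0.5), using the means and standard deviations printed above. The 0.25 baseline is left out because its numbers come from an earlier section. The dictionary layout and variable names are illustrative assumptions.

```python
# (mean, std) per metric, copied from the summary lines printed above.
dropout_0 = {
    "Accuracy": (83.69, 1.793), "Precision": (83.09, 2.463),
    "Recall": (84.34, 2.240), "Specificity": (83.05, 3.284),
    "F1": (83.67, 1.572), "AUROC": (90.73, 1.427), "AUPRC": (89.62, 1.351),
}
dropout_05 = {
    "Accuracy": (83.25, 1.719), "Precision": (81.55, 2.169),
    "Recall": (85.64, 1.517), "Specificity": (80.91, 3.078),
    "F1": (83.52, 1.409), "AUROC": (90.31, 1.579), "AUPRC": (88.77, 1.687),
}

rows = []
for metric, (mean_0, std_0) in dropout_0.items():
    mean_05, std_05 = dropout_05[metric]
    diff = mean_0 - mean_05  # positive: no dropout scores higher
    within_one_std = abs(diff) <= min(std_0, std_05)
    rows.append((metric, diff, within_one_std))
    print(f"{metric:<12} diff={diff:+.2f} within one std: {within_one_std}")
```

Only recall favors dropout 0.5, and every gap is smaller than the fold-to-fold standard deviation, so the ranking should be read with caution.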


###Experiment - Learning rate scheduler

The initial model halves the learning rate after epoch 50 and again every 5 epochs thereafter.

We compare it against two variants: one with no learning rate adjustment, and one whose decay starts after epoch 25 instead of 50.
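
The three schedules can be compared without training by replaying them over 100 epochs. The sketch below assumes the Keras `LearningRateScheduler` convention (the schedule is called once per epoch with a 0-based epoch index and the current rate) and an initial rate of 0.001, the Keras default for Adam. Both are assumptions about how `train` wires the schedule. The function names are illustrative; the bodies follow the `lr_schedule` definitions used in this notebook.

```python
def schedule_none(epoch, lr):
    return lr  # constant learning rate


def schedule_from_50(epoch, lr):
    # Original schedule: halve every 5 epochs once epoch > 50.
    if epoch > 50 and (epoch - 1) % 5 == 0:
        lr *= 0.5
    return lr


def schedule_from_25(epoch, lr):
    # Earlier variant: halve every 5 epochs once epoch > 25.
    if epoch > 25 and (epoch - 1) % 5 == 0:
        lr *= 0.5
    return lr


def replay(schedule, epochs=100, initial_lr=1e-3):
    """Return the learning rate used at each 0-based epoch."""
    lrs, lr = [], initial_lr
    for epoch in range(epochs):
        lr = schedule(epoch, lr)
        lrs.append(lr)
    return lrs


for name, fn in [("none", schedule_none), ("from 50", schedule_from_50),
                 ("from 25", schedule_from_25)]:
    lrs = replay(fn)
    print(f"{name:>8}: epoch 30 -> {lrs[30]:.2e}, epoch 60 -> {lrs[60]:.2e}, "
          f"epoch 99 -> {lrs[99]:.2e}")
```

Under these assumptions the first halving happens at 0-based epoch 26 for the earlier variant and at epoch 51 for the original, so a fold that ends before epoch 26 trains at the same rate under all three schedules.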

**No learning rate schedule:**

In [None]:
input_shape = (128 * 30, 3)
num_patches = 30

transformer_layers = 5
num_heads = 4
transformer_units = 32
reg = False
drop_out = 0.25
l2_weight = 0.001

model = create_transformer_model(input_shape, num_patches,
                                projection_dim=transformer_units, transformer_layers=transformer_layers,
                                num_heads=num_heads,
                                transformer_units = [transformer_units*2, transformer_units],
                                mlp_head_units=[256, 128],
                                num_classes=1, drop_out=drop_out, reg=reg, l2_weight=l2_weight, demographic=False)
loss_func = BinaryCrossentropy()
optimizer = "adam"

# No-op schedule: the learning rate stays constant for the whole run.
def lr_schedule(epoch, lr):
    return lr

In [None]:
data_path = '/content/chatloader/'
model_path = '/content/model/'
os.makedirs(os.path.dirname(model_path), exist_ok=True)

# sig_dict_chat = {
#     "EOG": [0, 1],
#     "EEG": [4, 5],
#     "ECG": [15,16],
#     "Resp": [9, 10],
#     "SPO2": [13],
#     "CO2": [14],
# }

# channel_list_chat = [
#     ["ECG", "SPO2"]
# ]

# for ch in channel_list_chat:
#     chs = []
#     chstr = ""
#     for name in ch:
#         chstr += name
#         chs = chs + sig_dict_chat[name]
#     print(chstr, chs)
#     config = {
#         "data_path": data_path + 'chat_',
#         "model_path": model_path,
#         "model_name": "model_0lrs_"+ chstr,
#         "regression": False,
#         "epochs": 100,
#         "channels": chs,
#     }
#     train(config, 5)

ECGSPO2 [15, 16, 13]
5
(2474, 3840, 17)
(2474, 3840, 3)
(3034, 3840, 17)
(3034, 3840, 3)
(3541, 3840, 17)
(3541, 3840, 3)
(3952, 3840, 17)
(3952, 3840, 3)
(3506, 3840, 17)
(3506, 3840, 3)
training
(14033, 3840, 3)
(14033,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100




(13473, 3840, 3)
(13473,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100




(12966, 3840, 3)
(12966,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100




(12555, 3840, 3)
(12555,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100




(13001, 3840, 3)
(13001,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100




training complete


In [None]:
data_path = '/content/chatloader/'
model_path = '/content/model/'
os.makedirs(os.path.dirname(model_path), exist_ok=True)

sig_dict_chat = {
    "EOG": [0, 1],
    "EEG": [4, 5],
    "ECG": [15,16],
    "Resp": [9, 10],
    "SPO2": [13],
    "CO2": [14],
}

channel_list_chat = [
    ["ECG", "SPO2"],
]

for ch in channel_list_chat:
    chs = []
    chstr = ""
    for name in ch:
        chstr += name
        chs = chs + sig_dict_chat[name]
    print(chstr, chs)
    config = {
        "data_path": data_path + 'chat_',
        "model_path": model_path,
        "model_name": "model_0lrs_"+ chstr,
        "channels": chs,
    }
    test(config, 5)

ECGSPO2 [15, 16, 13]
(2474, 3840, 17)
(2474, 3840, 3)
(3034, 3840, 17)
(3034, 3840, 3)
(3541, 3840, 17)
(3541, 3840, 3)
(3952, 3840, 17)
(3952, 3840, 3)
(3506, 3840, 17)




(3506, 3840, 3)
test starting
















[82.90218270008084, 84.14634146341463, 80.4010166619599, 85.60222672064778, 82.82943525385055] 
[84.44444444444444, 87.43494423791822, 79.28610059491618, 85.39094650205762, 80.64165307232192] 
[80.39056143205858, 79.03225806451613, 82.49859313449635, 85.34704370179949, 85.77212261422787] 
[85.38152610441767, 89.06856403622251, 78.28798185941042, 85.84952665670154, 79.96623522791221] 
[82.36765318882868, 83.02153194493471, 80.86045228902373, 85.36898945744407, 83.12780269058297] 
[91.03473291048655, 91.69440039505348, 87.90791124178054, 91.98417364417341, 90.25612600828073] 
[89.49346610240492, 90.78534714965475, 87.28368343781358, 90.62761158007426, 89.04605145336507] 
Accuracy: 83.18 -+ 1.716 
Precision: 83.44 -+ 3.029 
Recall: 82.61 -+ 2.654 
Specifity: 83.71 -+ 3.987 
F1: 82.95 -+ 1.456 
AUROC: 90.58 -+ 1.460 
AUPRC: 89.45 -+ 1.267 
$ 83.2 \pm 1.7$& $83.4 \pm 3.0$& $82.6 \pm 2.7$& $82.9 \pm 1.5$& $90.6 \pm 1.5$& 


**Learning rate schedule starting at 25 epochs.**

In [None]:
input_shape = (128 * 30, 3)
num_patches = 30

transformer_layers = 5
num_heads = 4
transformer_units = 32
reg = False
drop_out = 0.25
l2_weight = 0.001

model = create_transformer_model(input_shape, num_patches,
                                projection_dim=transformer_units, transformer_layers=transformer_layers,
                                num_heads=num_heads,
                                transformer_units = [transformer_units*2, transformer_units],
                                mlp_head_units=[256, 128],
                                num_classes=1, drop_out=drop_out, reg=reg, l2_weight=l2_weight, demographic=False)
loss_func = BinaryCrossentropy()
optimizer = "adam"

#Prevent over-fitting on same fold.
def lr_schedule(epoch, lr):

    if epoch > 25 and (epoch - 1) % 5 == 0:
        lr *= 0.5

    return lr

In [None]:
data_path = '/content/chatloader/'
model_path = '/content/model/'
os.makedirs(os.path.dirname(model_path), exist_ok=True)

# sig_dict_chat = {
#     "EOG": [0, 1],
#     "EEG": [4, 5],
#     "ECG": [15,16],
#     "Resp": [9, 10],
#     "SPO2": [13],
#     "CO2": [14],
# }

# channel_list_chat = [
#     ["ECG", "SPO2"]
# ]

# for ch in channel_list_chat:
#     chs = []
#     chstr = ""
#     for name in ch:
#         chstr += name
#         chs = chs + sig_dict_chat[name]
#     print(chstr, chs)
#     config = {
#         "data_path": data_path + 'chat_',
#         "model_path": model_path,
#         "model_name": "model_25lrs_"+ chstr,
#         "regression": False,
#         "epochs": 100,
#         "channels": chs,
#     }
#     train(config, 5)

ECGSPO2 [15, 16, 13]
5
(2474, 3840, 17)
(2474, 3840, 3)
(3034, 3840, 17)
(3034, 3840, 3)
(3541, 3840, 17)
(3541, 3840, 3)
(3952, 3840, 17)
(3952, 3840, 3)
(3506, 3840, 17)
(3506, 3840, 3)
training
(14033, 3840, 3)
(14033,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 



(13473, 3840, 3)
(13473,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100




(12966, 3840, 3)
(12966,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100




(12555, 3840, 3)
(12555,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100




(13001, 3840, 3)
(13001,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100




training complete


In [None]:
data_path = '/content/chatloader/'
model_path = '/content/model/'
os.makedirs(os.path.dirname(model_path), exist_ok=True)

sig_dict_chat = {
    "EOG": [0, 1],
    "EEG": [4, 5],
    "ECG": [15,16],
    "Resp": [9, 10],
    "SPO2": [13],
    "CO2": [14],
}

channel_list_chat = [
    ["ECG", "SPO2"],
]

for ch in channel_list_chat:
    chs = []
    chstr = ""
    for name in ch:
        chstr += name
        chs = chs + sig_dict_chat[name]
    print(chstr, chs)
    config = {
        "data_path": data_path + 'chat_',
        "model_path": model_path,
        "model_name": "model_25lrs_"+ chstr,
        "channels": chs,
    }
    test(config, 5)

ECGSPO2 [15, 16, 13]
(2474, 3840, 17)
(2474, 3840, 3)
(3034, 3840, 17)
(3034, 3840, 3)
(3541, 3840, 17)
(3541, 3840, 3)
(3952, 3840, 17)
(3952, 3840, 3)
(3506, 3840, 17)




(3506, 3840, 3)
test starting
















[82.94260307194826, 85.00329597890574, 80.59870093194013, 85.4251012145749, 83.71363377067884] 
[84.16596104995766, 86.1948142957253, 79.20685959271168, 86.19777895293495, 82.1309655937847] 
[80.87876322213181, 82.66129032258065, 83.17388857625211, 83.80462724935732, 85.59861191440139] 
[84.97991967871485, 87.25743855109961, 78.00453514739229, 86.99551569506725, 81.87957231288688] 
[82.48962655601659, 84.3910806174957, 81.14191600329399, 84.98435870698644, 83.82894364202775] 
[91.33418948372824, 92.00768719831964, 88.05676463044419, 91.70366442387376, 90.40758577973872] 
[89.99353325682134, 91.07876060697824, 87.14165388985897, 90.46401956243955, 88.60939313919211] 
Accuracy: 83.54 -+ 1.717 
Precision: 83.58 -+ 2.655 
Recall: 83.22 -+ 1.536 
Specifity: 83.82 -+ 3.488 
F1: 83.37 -+ 1.386 
AUROC: 90.70 -+ 1.428 
AUPRC: 89.46 -+ 1.415 
$ 83.5 \pm 1.7$& $83.6 \pm 2.7$& $83.2 \pm 1.5$& $83.4 \pm 1.4$& $90.7 \pm 1.4$& 


### Discussion of learning rate schedule experiment

The table below summarizes how model performance differs across the learning rate schedules.

The schedule that starts decaying earlier (after epoch 25) performs better than no schedule on every metric reported above, although the margins are small. The original authors' schedule (decay after epoch 50) still outperforms it on some metrics.


<div>
<img src="https://drive.google.com/uc?export=view&id=1MBbzNvq9A_zoXf9eMoSFzuFHPZoP5nl5"/>
</div>



###Experiment - number of epochs

We now compare 100 epochs (already tested in the initial model above) with 200 and 400 epochs.

In [None]:
#Same as initial model
input_shape = (128 * 30, 3)
num_patches = 30

transformer_layers = 5
num_heads = 4
transformer_units = 32
reg = False
drop_out = 0.25
l2_weight = 0.001

model = create_transformer_model(input_shape, num_patches,
                                projection_dim=transformer_units, transformer_layers=transformer_layers,
                                num_heads=num_heads,
                                transformer_units = [transformer_units*2, transformer_units],
                                mlp_head_units=[256, 128],
                                num_classes=1, drop_out=drop_out, reg=reg, l2_weight=l2_weight, demographic=False)
loss_func = BinaryCrossentropy()
optimizer = "adam"

#Prevent over-fitting on same fold.
def lr_schedule(epoch, lr):

    if epoch > 50 and (epoch - 1) % 5 == 0:
        lr *= 0.5

    return lr

###**200 epochs:**

In [None]:
data_path = '/content/chatloader/'
model_path = '/content/model/'
os.makedirs(os.path.dirname(model_path), exist_ok=True)

# sig_dict_chat = {
#     "EOG": [0, 1],
#     "EEG": [4, 5],
#     "ECG": [15,16],
#     "Resp": [9, 10],
#     "SPO2": [13],
#     "CO2": [14],
# }

# channel_list_chat = [
#     ["ECG", "SPO2"]
# ]

# for ch in channel_list_chat:
#     chs = []
#     chstr = ""
#     for name in ch:
#         chstr += name
#         chs = chs + sig_dict_chat[name]
#     print(chstr, chs)
#     config = {
#         "data_path": data_path + 'chat_',
#         "model_path": model_path,
#         "model_name": "model_200epochs_"+ chstr,
#         "regression": False,
#         "epochs": 200, #changed to 200
#         "channels": chs,
#     }
#     train(config, 5)

ECGSPO2 [15, 16, 13]
5
(2474, 3840, 17)
(2474, 3840, 3)
(3034, 3840, 17)
(3034, 3840, 3)
(3541, 3840, 17)
(3541, 3840, 3)
(3952, 3840, 17)
(3952, 3840, 3)
(3506, 3840, 17)
(3506, 3840, 3)
training
(14033, 3840, 3)
(14033,)
Epoch 1/200
Epoch 2/200
Epoch 3/200
Epoch 4/200
Epoch 5/200
Epoch 6/200
Epoch 7/200
Epoch 8/200
Epoch 9/200
Epoch 10/200
Epoch 11/200
Epoch 12/200
Epoch 13/200
Epoch 14/200
Epoch 15/200
Epoch 16/200
Epoch 17/200
Epoch 18/200
Epoch 19/200
Epoch 20/200
Epoch 21/200
Epoch 22/200
Epoch 23/200
Epoch 24/200
Epoch 25/200
Epoch 26/200
Epoch 27/200
Epoch 28/200
Epoch 29/200
Epoch 30/200
Epoch 31/200
Epoch 32/200
Epoch 33/200
Epoch 34/200
Epoch 35/200
Epoch 36/200
Epoch 37/200
Epoch 38/200
Epoch 39/200
Epoch 40/200
Epoch 41/200
Epoch 42/200
Epoch 43/200
Epoch 44/200
Epoch 45/200
Epoch 46/200
Epoch 47/200
Epoch 48/200
Epoch 49/200
Epoch 50/200
Epoch 51/200
Epoch 52/200
Epoch 53/200
Epoch 54/200
Epoch 55/200
Epoch 56/200
Epoch 57/200
Epoch 58/200




(13473, 3840, 3)
(13473,)
Epoch 1/200
Epoch 2/200
Epoch 3/200
Epoch 4/200
Epoch 5/200
Epoch 6/200
Epoch 7/200
Epoch 8/200
Epoch 9/200
Epoch 10/200
Epoch 11/200
Epoch 12/200
Epoch 13/200
Epoch 14/200
Epoch 15/200
Epoch 16/200
Epoch 17/200




(12966, 3840, 3)
(12966,)
Epoch 1/200
Epoch 2/200
Epoch 3/200
Epoch 4/200
Epoch 5/200
Epoch 6/200
Epoch 7/200
Epoch 8/200
Epoch 9/200
Epoch 10/200
Epoch 11/200
Epoch 12/200
Epoch 13/200
Epoch 14/200
Epoch 15/200
Epoch 16/200




(12555, 3840, 3)
(12555,)
Epoch 1/200
Epoch 2/200
Epoch 3/200
Epoch 4/200
Epoch 5/200
Epoch 6/200
Epoch 7/200
Epoch 8/200
Epoch 9/200
Epoch 10/200
Epoch 11/200
Epoch 12/200
Epoch 13/200
Epoch 14/200
Epoch 15/200
Epoch 16/200
Epoch 17/200
Epoch 18/200
Epoch 19/200
Epoch 20/200
Epoch 21/200
Epoch 22/200




(13001, 3840, 3)
(13001,)
Epoch 1/200
Epoch 2/200
Epoch 3/200
Epoch 4/200
Epoch 5/200
Epoch 6/200
Epoch 7/200
Epoch 8/200
Epoch 9/200
Epoch 10/200
Epoch 11/200
Epoch 12/200
Epoch 13/200
Epoch 14/200
Epoch 15/200
Epoch 16/200
Epoch 17/200
Epoch 18/200
Epoch 19/200
Epoch 20/200
Epoch 21/200
Epoch 22/200
Epoch 23/200
Epoch 24/200
Epoch 25/200
Epoch 26/200
Epoch 27/200
Epoch 28/200




training complete


In [None]:
data_path = '/content/chatloader/'
model_path = '/content/model/'
os.makedirs(os.path.dirname(model_path), exist_ok=True)

sig_dict_chat = {
    "EOG": [0, 1],
    "EEG": [4, 5],
    "ECG": [15,16],
    "Resp": [9, 10],
    "SPO2": [13],
    "CO2": [14],
}

channel_list_chat = [
    ["ECG", "SPO2"],
]

for ch in channel_list_chat:
    chs = []
    chstr = ""
    for name in ch:
        chstr += name
        chs = chs + sig_dict_chat[name]
    print(chstr, chs)
    config = {
        "data_path": data_path + 'chat_',
        "model_path": model_path,
        "model_name": "model_200epochs_"+ chstr,
        "channels": chs,
    }
    test(config, 5)

ECGSPO2 [15, 16, 13]
(2474, 3840, 17)
(2474, 3840, 3)
(3034, 3840, 17)
(3034, 3840, 3)
(3541, 3840, 17)
(3541, 3840, 3)
(3952, 3840, 17)
(3952, 3840, 3)
(3506, 3840, 17)




(3506, 3840, 3)
test starting
















[83.30638641875505, 85.36585365853658, 79.77972324202203, 84.89372469635627, 83.62806617227609] 
[83.83084577114428, 85.1278600269179, 77.76033490319205, 82.9423264907136, 81.36882129277566] 
[82.2620016273393, 85.01344086021506, 83.62408553742262, 87.24935732647815, 86.63967611336032] 
[84.33734939759037, 85.70504527813712, 75.90702947845806, 82.6108619830593, 80.6978052898143] 
[83.03901437371664, 85.07061197041021, 80.58568329718004, 85.04134302179905, 83.92156862745097] 
[90.9763055476585, 91.71013646037642, 87.32978841508466, 91.40145736708153, 90.03106007519122] 
[89.18326775654197, 90.20537623923101, 85.59015980529287, 89.55404635501469, 88.3405913579643] 
Accuracy: 83.39 -+ 1.963 
Precision: 82.21 -+ 2.537 
Recall: 84.96 -+ 1.851 
Specifity: 81.85 -+ 3.414 
F1: 83.53 -+ 1.657 
AUROC: 90.29 -+ 1.584 
AUPRC: 88.57 -+ 1.609 
$ 83.4 \pm 2.0$& $82.2 \pm 2.5$& $85.0 \pm 1.9$& $83.5 \pm 1.7$& $90.3 \pm 1.6$& 


###**400 epochs:**

In [None]:
data_path = '/content/chatloader/'
model_path = '/content/model/'
os.makedirs(os.path.dirname(model_path), exist_ok=True)

# sig_dict_chat = {
#     "EOG": [0, 1],
#     "EEG": [4, 5],
#     "ECG": [15,16],
#     "Resp": [9, 10],
#     "SPO2": [13],
#     "CO2": [14],
# }

# channel_list_chat = [
#     ["ECG", "SPO2"]
# ]

# for ch in channel_list_chat:
#     chs = []
#     chstr = ""
#     for name in ch:
#         chstr += name
#         chs = chs + sig_dict_chat[name]
#     print(chstr, chs)
#     config = {
#         "data_path": data_path + 'chat_',
#         "model_path": model_path,
#         "model_name": "model_400epochs_"+ chstr,
#         "regression": False,
#         "epochs": 400, #changed to 400
#         "channels": chs,
#     }
#     train(config, 5)

ECGSPO2 [15, 16, 13]
5
(2474, 3840, 17)
(2474, 3840, 3)
(3034, 3840, 17)
(3034, 3840, 3)
(3541, 3840, 17)
(3541, 3840, 3)
(3952, 3840, 17)
(3952, 3840, 3)
(3506, 3840, 17)
(3506, 3840, 3)
training
(14033, 3840, 3)
(14033,)
Epoch 1/400
Epoch 2/400
Epoch 3/400
Epoch 4/400
Epoch 5/400
Epoch 6/400
Epoch 7/400
Epoch 8/400
Epoch 9/400
Epoch 10/400
Epoch 11/400
Epoch 12/400
Epoch 13/400
Epoch 14/400
Epoch 15/400
Epoch 16/400
Epoch 17/400
Epoch 18/400
Epoch 19/400
Epoch 20/400
Epoch 21/400
Epoch 22/400
Epoch 23/400
Epoch 24/400
Epoch 25/400
Epoch 26/400




(13473, 3840, 3)
(13473,)
Epoch 1/400
Epoch 2/400
Epoch 3/400
Epoch 4/400
Epoch 5/400
Epoch 6/400
Epoch 7/400
Epoch 8/400
Epoch 9/400
Epoch 10/400
Epoch 11/400
Epoch 12/400
Epoch 13/400
Epoch 14/400
Epoch 15/400
Epoch 16/400




(12966, 3840, 3)
(12966,)
Epoch 1/400
Epoch 2/400
Epoch 3/400
Epoch 4/400
Epoch 5/400
Epoch 6/400
Epoch 7/400
Epoch 8/400
Epoch 9/400
Epoch 10/400
Epoch 11/400
Epoch 12/400
Epoch 13/400
Epoch 14/400
Epoch 15/400
Epoch 16/400
Epoch 17/400
Epoch 18/400
Epoch 19/400
Epoch 20/400
Epoch 21/400
Epoch 22/400




(12555, 3840, 3)
(12555,)
Epoch 1/400
Epoch 2/400
Epoch 3/400
Epoch 4/400
Epoch 5/400
Epoch 6/400
Epoch 7/400
Epoch 8/400
Epoch 9/400
Epoch 10/400
Epoch 11/400
Epoch 12/400
Epoch 13/400
Epoch 14/400
Epoch 15/400
Epoch 16/400
Epoch 17/400
Epoch 18/400
Epoch 19/400




(13001, 3840, 3)
(13001,)
Epoch 1/400
Epoch 2/400
Epoch 3/400
Epoch 4/400
Epoch 5/400
Epoch 6/400
Epoch 7/400
Epoch 8/400
Epoch 9/400
Epoch 10/400
Epoch 11/400
Epoch 12/400
Epoch 13/400




training complete


In [None]:
data_path = '/content/chatloader/'
model_path = '/content/model/'
os.makedirs(os.path.dirname(model_path), exist_ok=True)

sig_dict_chat = {
    "EOG": [0, 1],
    "EEG": [4, 5],
    "ECG": [15,16],
    "Resp": [9, 10],
    "SPO2": [13],
    "CO2": [14],
}

channel_list_chat = [
    ["ECG", "SPO2"],
]

for ch in channel_list_chat:
    chs = []
    chstr = ""
    for name in ch:
        chstr += name
        chs = chs + sig_dict_chat[name]
    print(chstr, chs)
    config = {
        "data_path": data_path + 'chat_',
        "model_path": model_path,
        "model_name": "model_400epochs_"+ chstr,
        "channels": chs,
    }
    test(config, 5)

ECGSPO2 [15, 16, 13]
(2474, 3840, 17)
(2474, 3840, 3)
(3034, 3840, 17)
(3034, 3840, 3)
(3541, 3840, 17)
(3541, 3840, 3)
(3952, 3840, 17)
(3952, 3840, 3)
(3506, 3840, 17)




(3506, 3840, 3)
test starting
















[83.34680679062248, 85.72841133816743, 79.8079638520192, 84.59008097165992, 83.3998859098688] 
[85.99118942731278, 85.3315472203617, 77.37113402061856, 85.38135593220339, 80.13662637940094] 
[79.41415785191212, 85.61827956989248, 84.46820483961733, 82.87917737789203, 88.20127241179873] 
[87.2289156626506, 85.83441138421733, 75.11337868480726, 86.24813153961136, 78.7281935846933] 
[82.57191201353638, 85.47467292854746, 80.76405703524348, 84.11166188364206, 83.97577092511014] 
[91.42830067217609, 92.24194591662148, 87.49692148478225, 91.84791532976485, 90.37879751975065] 
[89.75743349699317, 90.97601461985424, 85.92852064797634, 90.54725734577242, 88.91609773907493] 
Accuracy: 83.37 -+ 1.987 
Precision: 82.84 -+ 3.459 
Recall: 84.12 -+ 2.923 
Specifity: 82.63 -+ 4.821 
F1: 83.38 -+ 1.598 
AUROC: 90.68 -+ 1.708 
AUPRC: 89.23 -+ 1.792 
$ 83.4 \pm 2.0$& $82.8 \pm 3.5$& $84.1 \pm 2.9$& $83.4 \pm 1.6$& $90.7 \pm 1.7$& 


### Discussion of epochs experiment

The table below summarizes how increasing the number of epochs affects model performance. The highest value in each column is bolded. Not all metrics improve as the number of epochs increases.

The authors ran a similar study and chose 200 as the best number of epochs. We chose 100 epochs for the rest of the notebook because, in our runs, it performs better than 200 epochs and trains in less time.

<div>
<img src="https://drive.google.com/uc?export=view&id=15YFd2MK6_F6UQ4r7hMt464XtgpycJXVE"/>
</div>


## Ablation Study
We wanted to see how changing the number of transformer layers affects the model. The authors used 5 layers; we test 1, 3, and 10 layers.
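
The three ablation runs below differ only in `transformer_layers` and in the model name. As a minimal sketch (the helper `ablation_settings` is hypothetical and is not part of this notebook or the authors' repository), the settings that change between runs could be generated in one place instead of being edited by hand in each cell:

```python
# Hypothetical helper: lists the settings that change between ablation runs.
# The model-name pattern mirrors the names used in the cells below.
def ablation_settings(layer_counts, channel_tag="ECGSPO2", epochs=100):
    settings = []
    for n_layers in layer_counts:
        settings.append({
            "transformer_layers": n_layers,
            "model_name": f"model_{n_layers}trasformerlayer_{channel_tag}",
            "epochs": epochs,
        })
    return settings


for s in ablation_settings([1, 3, 10]):
    print(s["transformer_layers"], s["model_name"], s["epochs"])
```

Each entry would still need to be passed to the notebook's own model-building and `train` code; this sketch only collects the values that vary.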

### 1 layer

In [None]:
input_shape = (128 * 30, 3)
num_patches = 30

transformer_layers = 1 #changed to single layer
num_heads = 4
transformer_units = 32
reg = False
drop_out = 0.25
l2_weight = 0.001

model = create_transformer_model(input_shape, num_patches,
                                projection_dim=transformer_units, transformer_layers=transformer_layers,
                                num_heads=num_heads,
                                transformer_units = [transformer_units*2, transformer_units],
                                mlp_head_units=[256, 128],
                                num_classes=1, drop_out=drop_out, reg=reg, l2_weight=l2_weight, demographic=False)
loss_func = BinaryCrossentropy()
optimizer = "adam"

# Halve the learning rate every 5 epochs once epoch > 50 to limit over-fitting on a fold.
def lr_schedule(epoch, lr):

    if epoch > 50 and (epoch - 1) % 5 == 0:
        lr *= 0.5

    return lr
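
To see what this schedule does over a full run, the sketch below replays it outside Keras. It assumes the 0-indexed epoch numbering that Keras' `LearningRateScheduler` passes to a schedule function, and Adam's default starting rate of 0.001. `simulate_schedule` is a hypothetical helper written only for this illustration.

```python
def lr_schedule(epoch, lr):
    # Same rule as the cell above: halve the rate every 5 epochs once epoch > 50.
    if epoch > 50 and (epoch - 1) % 5 == 0:
        lr *= 0.5
    return lr


def simulate_schedule(schedule, initial_lr, epochs):
    """Replay a Keras-style schedule and record the rate used in each epoch."""
    lr = initial_lr
    history = []
    for epoch in range(epochs):  # assumed 0-indexed, as in Keras callbacks
        lr = schedule(epoch, lr)
        history.append(lr)
    return history


history = simulate_schedule(lr_schedule, initial_lr=1e-3, epochs=100)
halving_epochs = [e for e in range(1, 100) if history[e] < history[e - 1]]
print(halving_epochs)  # epochs at which the rate drops
print(history[-1])     # rate used in the final epoch
```

Under these assumptions the rate stays at its initial value through epoch 50 and is then halved ten times over a 100-epoch run. A fold that stops before epoch 51 never sees a reduced rate.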

In [None]:
data_path = '/content/chatloader/'
model_path = '/content/model/'
os.makedirs(os.path.dirname(model_path), exist_ok=True)

# sig_dict_chat = {
#     "EOG": [0, 1],
#     "EEG": [4, 5],
#     "ECG": [15,16],
#     "Resp": [9, 10],
#     "SPO2": [13],
#     "CO2": [14],
# }

# channel_list_chat = [
#     ["ECG", "SPO2"]
# ]

# for ch in channel_list_chat:
#     chs = []
#     chstr = ""
#     for name in ch:
#         chstr += name
#         chs = chs + sig_dict_chat[name]
#     print(chstr, chs)
#     config = {
#         "data_path": data_path + 'chat_',
#         "model_path": model_path,
#         "model_name": "model_1trasformerlayer_"+ chstr,
#         "regression": False,
#         "epochs": 100,
#         "channels": chs,
#     }
#     train(config, 5)

ECGSPO2 [15, 16, 13]
5
(2474, 3840, 17)
(2474, 3840, 3)
(3034, 3840, 17)
(3034, 3840, 3)
(3541, 3840, 17)
(3541, 3840, 3)
(3952, 3840, 17)
(3952, 3840, 3)
(3506, 3840, 17)
(3506, 3840, 3)
training
(14033, 3840, 3)
(14033,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100




(13473, 3840, 3)
(13473,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100




(12966, 3840, 3)
(12966,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100




(12555, 3840, 3)
(12555,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100




(13001, 3840, 3)
(13001,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100




training complete


In [None]:
data_path = '/content/chatloader/'
model_path = '/content/model/'
os.makedirs(os.path.dirname(model_path), exist_ok=True)

sig_dict_chat = {
    "EOG": [0, 1],
    "EEG": [4, 5],
    "ECG": [15,16],
    "Resp": [9, 10],
    "SPO2": [13],
    "CO2": [14],
}

channel_list_chat = [
    ["ECG", "SPO2"],
]

for ch in channel_list_chat:
    chs = []
    chstr = ""
    for name in ch:
        chstr += name
        chs = chs + sig_dict_chat[name]
    print(chstr, chs)
    config = {
        "data_path": data_path + 'chat_',
        "model_path": model_path,
        "model_name": "model_1trasformerlayer_"+ chstr,
        "channels": chs,
    }
    test(config, 5)

ECGSPO2 [15, 16, 13]
(2474, 3840, 17)
(2474, 3840, 3)
(3034, 3840, 17)
(3034, 3840, 3)
(3541, 3840, 17)
(3541, 3840, 3)
(3952, 3840, 17)
(3952, 3840, 3)
(3506, 3840, 17)




(3506, 3840, 3)
test starting
















[82.90218270008084, 84.54185893210283, 78.76306128212369, 84.48886639676113, 82.80091272104963] 
[84.80138169257341, 85.85503166783955, 76.62337662337663, 85.05263157894737, 80.07478632478633] 
[79.90235964198536, 81.98924731182797, 83.00506471581318, 83.08483290488432, 86.69751301330248] 
[85.86345381526105, 86.9987063389392, 74.48979591836735, 85.84952665670154, 79.00956668542487] 
[82.27901131126939, 83.87762117566174, 79.68665586169638, 84.05721716514954, 83.25465148569842] 
[90.8246166112783, 91.10368936833173, 86.42925093503919, 91.1202308629309, 89.46736999635144] 
[89.54340177442536, 89.02047582501295, 83.89965620232837, 89.09087747733659, 86.39791293109691] 
Accuracy: 82.70 -+ 2.104 
Precision: 82.48 -+ 3.563 
Recall: 82.94 -+ 2.203 
Specifity: 82.44 -+ 4.881 
F1: 82.63 -+ 1.598 
AUROC: 89.79 -+ 1.787 
AUPRC: 87.59 -+ 2.152 
$ 82.7 \pm 2.1$& $82.5 \pm 3.6$& $82.9 \pm 2.2$& $82.6 \pm 1.6$& $89.8 \pm 1.8$& 


### 3 layers

In [None]:
input_shape = (128 * 30, 3)
num_patches = 30

transformer_layers = 3 #changed to 3 layers
num_heads = 4
transformer_units = 32
reg = False
drop_out = 0.25
l2_weight = 0.001

model = create_transformer_model(input_shape, num_patches,
                                projection_dim=transformer_units, transformer_layers=transformer_layers,
                                num_heads=num_heads,
                                transformer_units = [transformer_units*2, transformer_units],
                                mlp_head_units=[256, 128],
                                num_classes=1, drop_out=drop_out, reg=reg, l2_weight=l2_weight, demographic=False)
loss_func = BinaryCrossentropy()
optimizer = "adam"

# Halve the learning rate every 5 epochs once epoch > 50 to limit over-fitting on a fold.
def lr_schedule(epoch, lr):

    if epoch > 50 and (epoch - 1) % 5 == 0:
        lr *= 0.5

    return lr

In [None]:
data_path = '/content/chatloader/'
model_path = '/content/model/'
os.makedirs(os.path.dirname(model_path), exist_ok=True)

# sig_dict_chat = {
#     "EOG": [0, 1],
#     "EEG": [4, 5],
#     "ECG": [15,16],
#     "Resp": [9, 10],
#     "SPO2": [13],
#     "CO2": [14],
# }

# channel_list_chat = [
#     ["ECG", "SPO2"]
# ]

# for ch in channel_list_chat:
#     chs = []
#     chstr = ""
#     for name in ch:
#         chstr += name
#         chs = chs + sig_dict_chat[name]
#     print(chstr, chs)
#     config = {
#         "data_path": data_path + 'chat_',
#         "model_path": model_path,
#         "model_name": "model_3trasformerlayer_"+ chstr,
#         "regression": False,
#         "epochs": 100,
#         "channels": chs,
#     }
#     train(config, 5)

ECGSPO2 [15, 16, 13]
5
(2474, 3840, 17)
(2474, 3840, 3)
(3034, 3840, 17)
(3034, 3840, 3)
(3541, 3840, 17)
(3541, 3840, 3)
(3952, 3840, 17)
(3952, 3840, 3)
(3506, 3840, 17)
(3506, 3840, 3)
training
(14033, 3840, 3)
(14033,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 



(13473, 3840, 3)
(13473,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100




(12966, 3840, 3)
(12966,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100




(12555, 3840, 3)
(12555,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100




(13001, 3840, 3)
(13001,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100




training complete


In [None]:
data_path = '/content/chatloader/'
model_path = '/content/model/'
os.makedirs(os.path.dirname(model_path), exist_ok=True)

sig_dict_chat = {
    "EOG": [0, 1],
    "EEG": [4, 5],
    "ECG": [15,16],
    "Resp": [9, 10],
    "SPO2": [13],
    "CO2": [14],
}

channel_list_chat = [
    ["ECG", "SPO2"],
]

for ch in channel_list_chat:
    chs = []
    chstr = ""
    for name in ch:
        chstr += name
        chs = chs + sig_dict_chat[name]
    print(chstr, chs)
    config = {
        "data_path": data_path + 'chat_',
        "model_path": model_path,
        "model_name": "model_3trasformerlayer_"+ chstr,
        "channels": chs,
    }
    test(config, 5)

ECGSPO2 [15, 16, 13]
(2474, 3840, 17)
(2474, 3840, 3)
(3034, 3840, 17)
(3034, 3840, 3)
(3541, 3840, 17)
(3541, 3840, 3)
(3952, 3840, 17)
(3952, 3840, 3)
(3506, 3840, 17)




(3506, 3840, 3)
test starting
















[83.91269199676637, 85.39881344759394, 79.8079638520192, 85.60222672064778, 83.59954363947519] 
[83.8079739625712, 85.28021607022282, 77.42768595041323, 84.43443443443444, 82.30683090705487] 
[83.8079739625712, 84.87903225806451, 84.3556555993247, 86.73521850899742, 85.02024291497976] 
[84.0160642570281, 85.89909443725745, 75.22675736961452, 84.50423517688091, 82.21722003376478] 
[83.8079739625712, 85.07915122937015, 80.74333423108, 85.56936342886128, 83.64153627311522] 
[90.83265527529157, 91.57538009987618, 86.93589797577256, 92.13490572200384, 90.34885382366352] 
[88.8484297199888, 90.12326158903589, 85.13001954696466, 90.77013983768445, 88.30344409804339] 
Accuracy: 83.66 -+ 2.083 
Precision: 82.65 -+ 2.787 
Recall: 84.96 -+ 0.985 
Specifity: 82.37 -+ 3.762 
F1: 83.77 -+ 1.681 
AUROC: 90.37 -+ 1.821 
AUPRC: 88.64 -+ 1.960 
$ 83.7 \pm 2.1$& $82.7 \pm 2.8$& $85.0 \pm 1.0$& $83.8 \pm 1.7$& $90.4 \pm 1.8$& 


### 10 layers

In [None]:
input_shape = (128 * 30, 3)
num_patches = 30

transformer_layers = 10 #changed to 10 layers
num_heads = 4
transformer_units = 32
reg = False
drop_out = 0.25
l2_weight = 0.001

model = create_transformer_model(input_shape, num_patches,
                                projection_dim=transformer_units, transformer_layers=transformer_layers,
                                num_heads=num_heads,
                                transformer_units = [transformer_units*2, transformer_units],
                                mlp_head_units=[256, 128],
                                num_classes=1, drop_out=drop_out, reg=reg, l2_weight=l2_weight, demographic=False)
loss_func = BinaryCrossentropy()
optimizer = "adam"

# Halve the learning rate every 5 epochs once epoch > 50 to limit over-fitting on a fold.
def lr_schedule(epoch, lr):

    if epoch > 50 and (epoch - 1) % 5 == 0:
        lr *= 0.5

    return lr

In [None]:
data_path = '/content/chatloader/'
model_path = '/content/model/'
os.makedirs(os.path.dirname(model_path), exist_ok=True)

# sig_dict_chat = {
#     "EOG": [0, 1],
#     "EEG": [4, 5],
#     "ECG": [15,16],
#     "Resp": [9, 10],
#     "SPO2": [13],
#     "CO2": [14],
# }

# channel_list_chat = [
#     ["ECG", "SPO2"]
# ]

# for ch in channel_list_chat:
#     chs = []
#     chstr = ""
#     for name in ch:
#         chstr += name
#         chs = chs + sig_dict_chat[name]
#     print(chstr, chs)
#     config = {
#         "data_path": data_path + 'chat_',
#         "model_path": model_path,
#         "model_name": "model_10trasformerlayer_"+ chstr,
#         "regression": False,
#         "epochs": 100,
#         "channels": chs,
#     }
#     train(config, 5)

ECGSPO2 [15, 16, 13]
5
(2474, 3840, 17)
(2474, 3840, 3)
(3034, 3840, 17)
(3034, 3840, 3)
(3541, 3840, 17)
(3541, 3840, 3)
(3952, 3840, 17)
(3952, 3840, 3)
(3506, 3840, 17)
(3506, 3840, 3)
training
(14033, 3840, 3)
(14033,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 



(13473, 3840, 3)
(13473,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100




(12966, 3840, 3)
(12966,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100




(12555, 3840, 3)
(12555,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100




(13001, 3840, 3)
(13001,)
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100




training complete


In [None]:
data_path = '/content/chatloader/'
model_path = '/content/model/'
os.makedirs(os.path.dirname(model_path), exist_ok=True)

sig_dict_chat = {
    "EOG": [0, 1],
    "EEG": [4, 5],
    "ECG": [15,16],
    "Resp": [9, 10],
    "SPO2": [13],
    "CO2": [14],
}

channel_list_chat = [
    ["ECG", "SPO2"],
]

for ch in channel_list_chat:
    chs = []
    chstr = ""
    for name in ch:
        chstr += name
        chs = chs + sig_dict_chat[name]
    print(chstr, chs)
    config = {
        "data_path": data_path + 'chat_',
        "model_path": model_path,
        "model_name": "model_10trasformerlayer_"+ chstr,
        "channels": chs,
    }
    test(config, 5)

ECGSPO2 [15, 16, 13]
(2474, 3840, 17)
(2474, 3840, 3)
(3034, 3840, 17)
(3034, 3840, 3)
(3541, 3840, 17)
(3541, 3840, 3)
(3952, 3840, 17)
(3952, 3840, 3)
(3506, 3840, 17)




(3506, 3840, 3)
test starting
















[84.03395311236864, 85.92617007251152, 80.48573849195144, 85.98178137651821, 84.45521962350256] 
[84.69217970049917, 86.11300204220558, 78.31074035453598, 86.50918635170603, 84.37862950058071] 
[82.83157038242473, 85.01344086021506, 84.52447945976364, 84.73007712082263, 84.03701561596299] 
[85.22088353413655, 86.80465717981889, 76.41723356009071, 87.19481813652217, 84.86212718064154] 
[83.75154257507198, 85.55968887385863, 81.29905277401895, 85.61038961038963, 84.20747609388583] 
[91.5688139049281, 92.61404735077689, 87.60829036172713, 92.18895818363237, 91.31102289293209] 
[90.13142981548908, 91.87645144016732, 86.57728616166679, 91.07255823789431, 90.01140651543938] 
Accuracy: 84.18 -+ 2.002 
Precision: 84.00 -+ 2.958 
Recall: 84.23 -+ 0.767 
Specifity: 84.10 -+ 3.943 
F1: 84.09 -+ 1.574 
AUROC: 91.06 -+ 1.785 
AUPRC: 89.93 -+ 1.810 
$ 84.2 \pm 2.0$& $84.0 \pm 3.0$& $84.2 \pm 0.8$& $84.1 \pm 1.6$& $91.1 \pm 1.8$& 


In [None]:
!zip -r model.zip model/

  adding: model/ (stored 0%)
  adding: model/model_1trasformerlayer_ECGSPO22/ (stored 0%)
  adding: model/model_1trasformerlayer_ECGSPO22/assets/ (stored 0%)
  adding: model/model_1trasformerlayer_ECGSPO22/fingerprint.pb (stored 0%)
  adding: model/model_1trasformerlayer_ECGSPO22/keras_metadata.pb (deflated 92%)
  adding: model/model_1trasformerlayer_ECGSPO22/saved_model.pb (deflated 89%)
  adding: model/model_1trasformerlayer_ECGSPO22/variables/ (stored 0%)
  adding: model/model_1trasformerlayer_ECGSPO22/variables/variables.data-00000-of-00001 (deflated 7%)
  adding: model/model_1trasformerlayer_ECGSPO22/variables/variables.index (deflated 69%)
  adding: model/model_1trasformerlayer_ECGSPO24/ (stored 0%)
  adding: model/model_1trasformerlayer_ECGSPO24/assets/ (stored 0%)
  adding: model/model_1trasformerlayer_ECGSPO24/fingerprint.pb (stored 0%)
  adding: model/model_1trasformerlayer_ECGSPO24/keras_metadata.pb (deflated 92%)
  adding: model/model_1trasformerlayer_ECGSPO24/saved_model.p

#### Discussion of ablation experiment
As seen in the cells above, performance is best with 10 layers, though not by much on every metric. For instance, going from 5 layers to 10 layers gives a relative increase of 0.83% in accuracy and 0.46% in AUROC. This is somewhat surprising, since we expected additional transformer layers to capture more of the complexity in the data. Precision shows a larger jump, rising from 82.04 to 84.00 as the layers increase from 5 to 10. Overall, the results show that more layers do not necessarily yield much better performance across all metrics.
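
The relative increases quoted above can be checked with a few lines of arithmetic. The sketch below assumes that the main model reported in the Model Comparison section (AUROC 90.64) is the 5-layer configuration, and takes the 10-layer AUROC (91.06) from the test output above and the precision values from this paragraph. `relative_increase` is a hypothetical helper, not part of the notebook.

```python
def relative_increase(old, new):
    """Percentage change from old to new, relative to old."""
    return (new - old) / old * 100.0


# AUROC: 5 layers (main model, assumed) vs. 10 layers
print(round(relative_increase(90.64, 91.06), 2))
# Precision: 5 layers vs. 10 layers, as quoted in the discussion
print(round(relative_increase(82.04, 84.00), 2))
```

The AUROC figure reproduces the 0.46% quoted above; the precision change works out to roughly 2.4% in relative terms.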


Below is a table summarizing these metrics for the different layer configurations. The highest value in each column is bolded. All experiments reported below were identical apart from the number of layers (e.g., all used 100 epochs and the same learning rate schedule).

<div>
<img src="https://drive.google.com/uc?export=view&id=1W8NgzUSerGnUf_rA5GVPwUf6UZQDXqsg"/>
</div>

One notable metric, highlighted in yellow, is the recall for 5 transformer layers. It indicates that, of all the positive apnea records, the model with 5 transformer layers was best at detecting the positive instances.


# Results

## Testing + Results Output
Each experiment above is followed by a table of its model's performance.

Each model is tested on its held-out fold, which was not used to train that model. x_test and y_test are set up the same way as the training data, except that only the held-out fold's data is used.
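
The fold handling described above can be sketched as follows. The fold sizes are the ones printed in the logs earlier in the notebook; the tiny feature dimensions and the helper `leave_one_fold_out` are assumptions made only to keep the example small, and the notebook's own `train` and `test` functions may assemble the arrays differently.

```python
import numpy as np

# Fold sizes as printed in the notebook logs; feature dims shrunk for the sketch.
fold_sizes = [2474, 3034, 3541, 3952, 3506]
folds_x = [np.zeros((n, 4, 3), dtype=np.float32) for n in fold_sizes]
folds_y = [np.zeros((n,), dtype=np.int8) for n in fold_sizes]


def leave_one_fold_out(folds_x, folds_y, test_fold):
    """Return (x_train, y_train, x_test, y_test) with one fold held out."""
    x_train = np.concatenate([x for i, x in enumerate(folds_x) if i != test_fold])
    y_train = np.concatenate([y for i, y in enumerate(folds_y) if i != test_fold])
    return x_train, y_train, folds_x[test_fold], folds_y[test_fold]


train_sizes = []
for k in range(len(fold_sizes)):
    x_train, y_train, x_test, y_test = leave_one_fold_out(folds_x, folds_y, k)
    train_sizes.append(x_train.shape[0])
    print(k, x_train.shape, x_test.shape)
```

The resulting training-set sizes (14033, 13473, 12966, 12555, 13001) match the shapes printed before each fold's training run.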

We use the authors' metrics code to generate each model's results, which are added to the results object. Once every fold's results are in, the metrics are calculated and printed as a mean and standard deviation across folds.
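
As an illustration of the final aggregation step, the sketch below recomputes the accuracy summary from the per-fold accuracies printed in the 400-epoch test output earlier in the notebook. It assumes the metrics code reports the mean and the population standard deviation (NumPy's default, `ddof=0`), which is consistent with the printed `83.37 -+ 1.987`. `summarize` is a hypothetical helper.

```python
import numpy as np


def summarize(per_fold_values):
    """Mean and population standard deviation across folds."""
    values = np.asarray(per_fold_values, dtype=float)
    return values.mean(), values.std()  # np.std defaults to ddof=0


# Per-fold accuracies copied from the 400-epoch test output.
accuracy = [83.34680679062248, 85.72841133816743, 79.8079638520192,
            84.59008097165992, 83.3998859098688]
mean, std = summarize(accuracy)
print(f"Accuracy: {mean:.2f} -+ {std:.3f}")
```

With only five folds, the sample standard deviation (`ddof=1`) would be noticeably larger, so it is worth stating which one is reported when comparing against other work.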

## Model Comparison

The main model's results are highlighted below.

Our transformer model using ECG + SpO2 signals, trained for 100 epochs on the CHAT dataset, had the following results:
* F1: 83.69
* AUROC: 90.64


The original paper's ECG + SpO2 model had the following F1 and AUROC on the CHAT dataset (Table 4 of the paper):
* F1: 82.5(0.7)
* AUROC: 89.4(0.7)

These metrics are summarized in the table below.

(**Note that the authors reported only F1 and AUROC, not the other metrics defined in the Evaluation section.**)

<div>
<img src="https://drive.google.com/uc?export=view&id=1EyBiVyumg6lIuxuqxsqqtDFuUQ1ALWWY"/>
</div>

Although our model's performance is similar to the authors', it is somewhat better on both metrics.
We hypothesize that this is because we used a different amount of data than the authors did; the model may have generalized better on our subset of the data.

Ultimately, with respect to the original hypothesis of predicting hypopnea-apnea in children, our implementation of the paper's transformer model performs well, on par with the authors' model.

# Discussion/Analyses

After reviewing the individual components of the original paper, we conclude that the paper is reproducible. In the draft, model performance did not match the original paper because we processed only a small subset of the data; once we used more data, we achieved results on par with those in the original paper. We also got closer to the authors' results by re-implementing the model correctly instead of using the hybrid transformer, as discussed under **Draft Mistake** in the models section.

Reproducing the paper was satisfying overall, with parts that were easy and parts that were difficult.

The easiest part was understanding the high-level flow of data through the model. The authors' code on GitHub was a great way to understand their implementation details.

The actual implementation was harder because it required several practical decisions. One was where to store the data so that preprocessing, training, and testing could access it easily; we uploaded all data to a Box account because it can store large datasets and allows easy connection and retrieval for machine learning tasks. Another was choosing compute resources. We started with the free version of Google Colab, which did not have enough RAM to train our model. We upgraded to Colab Pro for a higher-RAM machine, but it still could not hold the full processed dataset, so we trained on a subset of the data.

A final decision was whether to reuse the functions from the original paper's GitHub repository as given. We could not, because some of them contain hard-coded values, so we made slight modifications to fit our setup. Reproducing the paper also required implementing several helper functions and using version-specific libraries. Making these helper functions available and stating the library versions explicitly would improve reproducibility and let others spend more time on advancing the model and improving results.

We have a few suggestions for the authors. First, making the code more generic would help reproducibility. For instance, we had to manually copy and adapt some functions that had hardcoded local data paths; if these paths were read from a config file, the functions would be readily reusable. Second, the authors should include a pip requirements file listing the package versions they used. In one instance, preprocessing failed because the latest "mne" version we installed did not have the same functionality as the older "mne" version the authors used, so we had to install that specific version to get the code to work. Finally, more code comments linking ideas from the paper to the code would help readers bridge the gap between the theoretical solution and its implementation. The segmentation and tokenization steps were somewhat challenging to understand at first, so clearly describing what each preprocessing step does would help readers understand the data methodology.
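
One way to act on the first suggestion is to read data locations from a small configuration object instead of hardcoding them inside functions. The sketch below is only an assumption about how this could look: `PathConfig`, `load_path_config`, and the environment variable names are hypothetical and are not part of the authors' repository. The default paths are the ones used in this notebook.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PathConfig:
    data_path: str
    model_path: str


def load_path_config(env=None):
    """Build paths from environment variables, falling back to notebook defaults."""
    env = os.environ if env is None else env
    return PathConfig(
        data_path=env.get("APNEA_DATA_PATH", "/content/chatloader/"),
        model_path=env.get("APNEA_MODEL_PATH", "/content/model/"),
    )


# Override only the data location; the model path keeps its default.
cfg = load_path_config(env={"APNEA_DATA_PATH": "/tmp/chat/"})
print(cfg.data_path, cfg.model_path)
```

For the second suggestion, a requirements file with pinned versions (for example `mne==1.0`, the version installed at the top of this notebook) would avoid the preprocessing error described above.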

# Final Draft Plans (completed)
In the final submission, we switched to the authors' model to correct the mistake made in the first draft.

We also used more CHAT data to train and test the model. We could not use all of the data because of Colab size limits, but we used more than in the first draft.

We also trained models using all 6 signals discussed in the paper (EOG, EEG, ECG, Resp, SpO2, CO2).
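
Selecting signals amounts to indexing the last axis of the preprocessed array with the channel indices from `sig_dict_chat`. A minimal sketch, using a small random array in place of the real data (the sample and time dimensions are placeholders; the real folds have shape `(N, 3840, 17)`), with `select_channels` as a hypothetical helper:

```python
import numpy as np

sig_dict_chat = {
    "EOG": [0, 1],
    "EEG": [4, 5],
    "ECG": [15, 16],
    "Resp": [9, 10],
    "SPO2": [13],
    "CO2": [14],
}


def select_channels(x, signal_names):
    """Keep only the columns of x that belong to the named signals."""
    chs = [idx for name in signal_names for idx in sig_dict_chat[name]]
    return x[:, :, chs], chs


x = np.random.rand(8, 16, 17)  # placeholder for a (N, 3840, 17) fold
x_two, chs_two = select_channels(x, ["ECG", "SPO2"])
x_all, chs_all = select_channels(x, list(sig_dict_chat))
print(chs_two, x_two.shape)
print(chs_all, x_all.shape)
```

The 2-signal selection yields 3 channels and the 6-signal selection yields 10. The channel order in the 6-signal case simply follows the dictionary here and may differ from the order used in the notebook's 6-signal run.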

We compared our 2-signal (ECG, SpO2) model to the authors' model and to the 6-signal model. We also ran experiments that varied dropout, the learning rate scheduler, and the number of epochs.

As part of the ablations in our proposal, we varied the number of transformer layers to see how it affected model performance.

Ultimately, the data and model implementations were reproducible and produced performance similar to the authors' results.




# References

1. Fayyaz H, Strang A, Beheshti R. Bringing At-home Pediatric Sleep Apnea Testing Closer to Reality: A Multi-modal Transformer Approach. Proc Mach Learn Res. 2023 Aug;219:167-185. PMID: 38344396; PMCID: PMC10854997.

2. Choi Ji Ho, Kim Eun Joong, Choi June, Kwon Soon Young, Kim Tae Hoon, Lee Sang Hag, Lee Heung Man, Shin Choi, and Lee Seung Hoon. Obstructive sleep apnea syndrome: a child is not just a small adult. Annals of Otology, Rhinology & Laryngology, 119(10): 656–661, 2010.

3. Gipson Kevin, Lu Mengdi, and Kinane T Bernard. Sleep-disordered breathing in children. Pediatrics in review, 40(1):3, 2019.

4. Loughlin GM, Brouillette RT, Brooke LJ, Carroll JL, Chipps BE, England SJ, Ferber P, Ferraro NF, Gaultier C, Givan DC, et al. Standards and indications for cardiopulmonary sleep studies in children. American journal of respiratory and critical care medicine, 153 (2):866–878, 1996.

5. Marcus Carole L, Brooks Lee J, Ward Sally Davidson, Draper Kari A, Gozal David, Halbower Ann C, Jones Jacqueline, Lehmann Christopher, Schechter Michael S, Sheldon Stephen, et al. Diagnosis and management of childhood obstructive sleep apnea syndrome. Pediatrics, 130(3):e714–e755, 2012.

6. Spielmanns Marc, Bost David, Windisch Wolfram, Alter Peter, Greulich Tim, Nell Christoph, Storre Jan Henrik, Koczulla Andreas Rembert, and Boeselt Tobias. Measuring sleep quality and efficiency with an activity monitoring device in comparison to polysomnography. Journal of clinical medicine research, 11(12):825, 2019.

7. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.

8. Hu Shuaicong, Cai Wenjie, Gao Tijie, and Wang Mingjie. A hybrid transformer model for obstructive sleep apnea detection based on self-attention mechanism using single-lead ecg. IEEE Transactions on Instrumentation and Measurement, 71:1–11, 2022.



**Draft Mistake (corrected in final submission)** The model we created in the draft was actually the hybrid transformer (Hu et al., 2022) that the authors compared their original model against. In the authors' GitHub repository, the last configuration they set up appears to be for the comparison to the hybrid transformer, which led to our confusion. Because of the first draft deadline, we could not train the authors' model in time. For the final draft, we trained and tested the authors' original model (provided at line 109 of https://github.com/healthylaife/Pediatric-Apnea-Detection/blob/main/models/models.py).

In [None]:
# this function is provided in the paper's github repo but we included it here for clarity/describing key components
# from create_hybrid_transformer_model - https://github.com/healthylaife/Pediatric-Apnea-Detection/blob/main/models/transformer.py



# Input shape used in the paper (did not work here): (60 * 32, 3)
# We used (128 * 30, 3) instead, which matches the dataloader.
def create_model(input_shape):
    transformer_units = [32,32]
    transformer_layers = 2
    num_heads = 4
    l2_weight = 0.001
    drop_out= 0.25
    mlp_head_units = [256, 128]
    num_patches= 30
    projection_dim= 32

    input1 = Input(shape=input_shape)
    conv11 = Conv1D(16, 256)(input1)
    conv12 = Conv1D(16, 256)(input1)
    conv13 = Conv1D(16, 256)(input1)

    pwconv1 = SeparableConvolution1D(32, 1)(input1)
    pwconv2 = SeparableConvolution1D(32, 1)(pwconv1)

    conv21 = Conv1D(16, 256)(conv11)
    conv22 = Conv1D(16, 256)(conv12)
    conv23 = Conv1D(16, 256)(conv13)

    concat = concatenate([conv21, conv22, conv23], axis=-1)
    concat = Dense(64, activation=relu)(concat)
    concat = Dense(64, activation=sigmoid)(concat)
    concat = SeparableConvolution1D(32,1)(concat)
    concat = concatenate([concat, pwconv2], axis=1)

    ####################################################################################################################
    # Integer division: the patch size is a count of time steps, so keep it an int.
    patch_size = input_shape[0] // num_patches

    normalized_inputs = tfa.layers.InstanceNormalization(axis=-1, epsilon=1e-6, center=False, scale=False,
                                                            beta_initializer="glorot_uniform",
                                                            gamma_initializer="glorot_uniform")(concat)

    patches = Patches(patch_size=patch_size)(normalized_inputs)
    encoded_patches = PatchEncoder(num_patches=num_patches, projection_dim=projection_dim, l2_weight=l2_weight)(patches)

    for i in range(transformer_layers):
        x1 = encoded_patches # LayerNormalization(epsilon=1e-6)(encoded_patches) # TODO
        attention_output = MultiHeadAttention(
            num_heads=num_heads, key_dim=projection_dim, dropout=drop_out, kernel_regularizer=L2(l2_weight),  # i *
            bias_regularizer=L2(l2_weight))(x1, x1)
        x2 = Add()([attention_output, encoded_patches])
        x3 = LayerNormalization(epsilon=1e-6)(x2)
        x3 = mlp(x3, transformer_units, drop_out, l2_weight)  # i *
        encoded_patches = Add()([x3, x2])

    x = LayerNormalization(epsilon=1e-6)(encoded_patches)
    x = GlobalAveragePooling1D()(x)
    #x = Concatenate()([x, demo])
    features = mlp(x, mlp_head_units, 0.0, l2_weight)

    logits = Dense(1, kernel_regularizer=L2(l2_weight), bias_regularizer=L2(l2_weight),
                   activation='sigmoid')(features)

    ####################################################################################################################

    model = Model(inputs=input1, outputs=logits)
    return model
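
The sequence lengths flowing through the convolutional front end of `create_model` can be checked with plain arithmetic. The sketch below is illustrative only: `conv1d_valid_length` is a hypothetical helper that is not part of the original repository, and it assumes the Keras defaults used in the function above (`padding="valid"`, stride 1). It does not model `Patches` or `PatchEncoder`, which are defined elsewhere.

```python
# Illustrative shape arithmetic only; TensorFlow is not required.
# conv1d_valid_length is a hypothetical helper, not part of the original repo.

def conv1d_valid_length(length, kernel_size, stride=1):
    """Output length of a 1-D convolution with 'valid' padding (the Keras default)."""
    return (length - kernel_size) // stride + 1


input_shape = (128 * 30, 3)  # shape used in this notebook
num_patches = 30

# Number of time steps per patch when the raw input is split into num_patches.
patch_size = input_shape[0] // num_patches

# Each convolutional branch stacks two Conv1D(16, 256) layers.
branch_length = conv1d_valid_length(conv1d_valid_length(input_shape[0], 256), 256)

# The kernel-size-1 separable convolutions leave the time dimension unchanged.
pointwise_length = conv1d_valid_length(input_shape[0], 1)

# concatenate(..., axis=1) joins the two tensors along the time axis.
concat_length = branch_length + pointwise_length

print(patch_size, branch_length, pointwise_length, concat_length)
# prints: 128 3330 3840 7170
```

The same helper can be reused to compare against the paper's `(60 * 32, 3)` shape by changing `input_shape`.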

