# An interpretable prediction system for the Site Of Origin (SOO) of Outflow Tract Ventricular Arrhythmias (OTVAs)

## 0. Starting point

See the [README](README.md) first for a general introduction to the project. We start by analyzing what we have as input and how we will approach the tasks given our restrictions (dataset, interpretability, fast inference).

### 0.1. About the dataset

We were provided with a dataset of anonymized data from Teknon Medical Center, Barcelona, Spain. As explained in the [release](https://github.com/uripont/arrhythmia-origin-predictor/releases/download/dataset/dataset_arrhythmias.zip), the dataset contains electrocardiogram (ECG) recordings and demographic information from patients with Outflow Tract Ventricular Arrhythmias (OTVAs). Each case includes several 12-lead ECG signal segments of 2.5 seconds, patient demographic data (age, sex, height, weight, ...), and clinician-validated labels indicating the arrhythmia's Site of Origin (SOO). The SOO labels include Left Ventricular Outflow Tract (LVOT) and Right Ventricular Outflow Tract (RVOT); LVOT cases are further classified into sub-regions such as the Right Coronary Cusp (RCC) or the aortomitral commissure, among others.

The following code retrieves the dataset from the release and unzips it into the `dataset/` folder, from which we will later load and preprocess the data. Running this notebook's logic on the same input data makes the results reproducible 1:1.


In [None]:
# Standard library packages:
import os
import urllib.request
import zipfile
import shutil

# Third-party packages:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# TODO: add missing packages here and in requirements.txt

 
# If running this notebook produces import errors, please install missing packages:
# pip install -r requirements.txt
# 
# Consider using a virtual environment:
# python -m venv venv
# source venv/bin/activate  # On Windows: venv\Scripts\activate

In [2]:
# URL of the dataset zip file
dataset_url = "https://github.com/uripont/arrhythmia-origin-predictor/releases/download/dataset/dataset_arrhythmias.zip"

In [3]:
# Local paths
zip_path = "dataset_arrhythmias.zip"
extract_dir = "dataset"

# Create dataset directory if it doesn't exist
os.makedirs(extract_dir, exist_ok=True)

# Download the zip file
if not os.path.exists(zip_path):
    print(f"Downloading dataset from {dataset_url}...")
    urllib.request.urlretrieve(dataset_url, zip_path)
    print(f"Download complete. Saved to {zip_path}")
else:
    print(f"Dataset zip file already exists at {zip_path}")

# Extract the contents
print(f"Extracting files to {extract_dir}/")
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(extract_dir)
print(f"Dataset is ready in the '{extract_dir}' directory.")


Dataset zip file already exists at dataset_arrhythmias.zip
Extracting files to dataset/
Dataset is ready in the 'dataset' directory.
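
Before loading anything, it can help to check what the archive actually contained. The following is a minimal sketch, not part of the original pipeline: the helper name `summarize_extracted` is hypothetical, and it deliberately makes no assumption about the file names or formats inside the zip. It only counts the extracted files by extension.

```python
from collections import Counter
from pathlib import Path


def summarize_extracted(root="dataset"):
    """Count files under `root` by lower-cased extension.

    Hypothetical helper for illustration; '' is used as the key for
    files without an extension. Returns an empty Counter if `root`
    does not exist.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return Counter()
    return Counter(
        p.suffix.lower() for p in root_path.rglob("*") if p.is_file()
    )


# Print one line per extension found in the extraction directory.
counts = summarize_extracted("dataset")
for ext, n in sorted(counts.items()):
    print(f"{ext or '(no extension)'}: {n} file(s)")
```

If the counts look unexpected (for example, an empty result), the download or extraction step above is the first place to check.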


### 0.2. Patient cases

## 1. Our approach

## 2. Demographic data preprocessing

## 3. ECG signal preprocessing

## 4. Model A: Dimensionality reduction for ECG signals

## 5. Interpreting Model A

## 6. Preparing data for training for the two tasks

## 7. Model B: Classification of the SOO

## 8. Interpreting Model B

## 9. Model C: Classification of the sub-regions

## 10. Interpreting Model C

## 11. Final model evaluation and exporting for inference
