## LSTM Intermediate Model: Learning & Testing
### Implementation of LSTM
- split by patient ID
- z-score normalization
- window folds
- downsampling to 100 Hz
- improved classification classes
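The preprocessing steps listed above could look roughly like the NumPy sketch below. It is an illustration under assumptions, not the notebook's own code: `patient_split`, `zscore`, `downsample` and `make_windows` are hypothetical helpers, each signal is assumed to be a `(time, leads)` array (a 10 s, 12-lead record at 500 Hz is `(5000, 12)`), and the 2.5 s window with 50 % overlap is an arbitrary choice. Block averaging is only a crude anti-aliasing filter; `scipy.signal.decimate` is the more careful option. PTB-XL also provides 100 Hz versions of each record and a `strat_fold` column that keeps all records of a patient in the same fold, which can replace the hand-written resampling and split.

```python
import numpy as np

def patient_split(patient_ids, test_frac=0.2, seed=42):
    """Hypothetical helper: return (train_idx, test_idx) so that
    no patient appears in both sets."""
    rng = np.random.default_rng(seed)
    unique_ids = np.unique(patient_ids)
    rng.shuffle(unique_ids)
    n_test = max(1, int(round(len(unique_ids) * test_frac)))
    is_test = np.isin(patient_ids, unique_ids[:n_test])
    return np.flatnonzero(~is_test), np.flatnonzero(is_test)

def zscore(signal, eps=1e-8):
    """Per-lead z-score normalization of a (time, leads) array."""
    mean = signal.mean(axis=0, keepdims=True)
    std = signal.std(axis=0, keepdims=True)
    return (signal - mean) / (std + eps)

def downsample(signal, factor=5):
    """Reduce the sampling rate by `factor` (500 Hz -> 100 Hz for factor=5)
    by averaging non-overlapping blocks of `factor` samples."""
    n = (signal.shape[0] // factor) * factor  # drop a ragged tail, if any
    return signal[:n].reshape(-1, factor, signal.shape[1]).mean(axis=1)

def make_windows(signal, win_len, step):
    """Cut a (time, leads) array into windows of shape (win_len, leads)."""
    starts = range(0, signal.shape[0] - win_len + 1, step)
    return np.stack([signal[s:s + win_len] for s in starts])

# Example: one synthetic 10 s, 12-lead record at 500 Hz
record = np.random.default_rng(0).normal(size=(5000, 12))
x = zscore(downsample(record, factor=5))          # (1000, 12) at 100 Hz
windows = make_windows(x, win_len=250, step=125)  # 2.5 s windows, 50 % overlap
print(windows.shape)  # (7, 250, 12)
```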
### Implementation of SMOTE model pipeline
1. **Irregular rhythms**: extractor and converter to numeric values
2. **RR-Interval** based features
   - Detect R-peaks
   - Time measured between consecutive R-peaks (RR interval)
   - Stable vs unstable rhythm explainability
   - ***Regular rhythms: low variability***
   - ***Atrial fibrillation: chaotic variability***
3. **HRV** (Heart Rate Variability)
    - Short-term variability
    - Long-term variability
    - Autonomic irregularity
    - ***Normal (tight) vs AF (explodes)***
4. **P-wave absence** (atrial instability proxies)
    - consistency of atrial activity before the QRS complex
    - similarity of atrial segments across beats
    - Interpretation ***stable vs unstable (SR, AF)***
5. **Entropy**: predictability and complexity of rhythms
    - AF ***(high randomness, high entropy)***
    - Normal ***(repetitive, low entropy)***
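Features 1, 2, 3 and 5 all start from the RR-interval series. The sketch below is a hedged illustration, not the pipeline's actual extractor: `detect_r_peaks` is a deliberately naive, hypothetical threshold detector (a validated detector such as Pan-Tompkins would be used in practice), SDNN and RMSSD are assumed stand-ins for the long-term and short-term variability items, and Shannon entropy of the RR histogram is one of several possible entropy measures (sample entropy is another common one). P-wave features are not covered.

```python
import numpy as np

def detect_r_peaks(ecg, fs, threshold_ratio=0.6, refractory_s=0.25):
    """Naive R-peak detector (illustrative only): keep local maxima above
    threshold_ratio * max(ecg) that are at least refractory_s seconds apart."""
    threshold = threshold_ratio * np.max(ecg)
    min_gap = int(refractory_s * fs)
    peaks = []
    for i in range(1, len(ecg) - 1):
        is_local_max = ecg[i] >= ecg[i - 1] and ecg[i] > ecg[i + 1]
        if ecg[i] >= threshold and is_local_max:
            if not peaks or i - peaks[-1] >= min_gap:
                peaks.append(i)
    return np.array(peaks, dtype=int)

def rr_intervals(peaks, fs):
    """Time in seconds between consecutive R-peaks."""
    return np.diff(peaks) / fs

def hrv_features(rr):
    """A few common HRV statistics on an RR series (needs >= 2 intervals)."""
    sdnn = float(np.std(rr, ddof=1))                     # spread over the whole recording
    rmssd = float(np.sqrt(np.mean(np.diff(rr) ** 2)))  # beat-to-beat changes
    mean_rr = float(np.mean(rr))
    return {"mean_rr": mean_rr, "sdnn": sdnn, "rmssd": rmssd, "cv_rr": sdnn / mean_rr}

def rr_entropy(rr, n_bins=16):
    """Shannon entropy (bits) of the RR-interval histogram: sum p * log2(1/p)."""
    counts, _ = np.histogram(rr, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    return float(np.sum(p * np.log2(1.0 / p)))

# Synthetic check: evenly spaced spikes mimic a perfectly regular rhythm
fs = 100
ecg = np.zeros(1000)
ecg[np.arange(40, 1000, 80)] = 1.0  # one "R-peak" every 0.8 s
rr_regular = rr_intervals(detect_r_peaks(ecg, fs), fs)
rr_irregular = np.array([0.5, 0.9, 0.6, 1.1, 0.7, 0.45, 1.0, 0.8])  # made-up AF-like series
for name, rr in [("regular", rr_regular), ("irregular", rr_irregular)]:
    print(name, hrv_features(rr), "entropy:", rr_entropy(rr))
```

On the synthetic regular rhythm SDNN, RMSSD and entropy are all 0; the made-up irregular series falls into 8 different histogram bins, giving an entropy of 3 bits.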


### PTB-XL ECG
 - **Path A:** Raw ECG → LSTM → Rhythm learning
 - **Path B:** Engineered features → SMOTE → Classical classifier
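Path B oversamples the minority AF class before training the classical classifier. In practice `imblearn.over_sampling.SMOTE(...).fit_resample(X, y)` from imbalanced-learn does this; the from-scratch NumPy sketch below (hypothetical `smote` helper, made-up feature matrices) only shows the core idea: each synthetic row lies on the segment between a minority row and one of its k nearest minority neighbours.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority-class rows by interpolating each randomly
    chosen row toward one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    assert n >= 2, "need at least two minority samples"
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class only
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)                  # a row is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]   # (n, k) neighbour indices
    base = rng.integers(0, n, size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                    # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[partner] - X_min[base])

# Example: 200 majority rows vs 20 minority ("AF") rows of 4 engineered features
rng = np.random.default_rng(1)
X_major = rng.normal(0.0, 1.0, size=(200, 4))
X_minor = rng.normal(2.0, 0.5, size=(20, 4))
X_synth = smote(X_minor, n_new=180, k=5)
X_bal = np.vstack([X_major, X_minor, X_synth])
y_bal = np.array([0] * 200 + [1] * (20 + 180))
print(X_bal.shape, np.bincount(y_bal))  # (400, 4) [200 200]
```

Oversampling is normally applied to the training split only, after the patient-wise split, so that no synthetic row built from test data leaks into training.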


In [2]:
import os
import pandas as pd
import ast
import random
from collections import Counter
import wfdb
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from pathlib import Path


df = pd.read_csv("../data/ptbxl_database.csv")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)


print(torch.__file__)
print(torch.__version__)
print(torch.version.cuda)
print(torch.cuda.is_available())

Using device: cuda
c:\Users\arjan\Documents\GitHub\SEARCH_AF_detection_OsloMet_BachelorGroup\venv\Lib\site-packages\torch\__init__.py
2.5.1+cu121
12.1
True


# Statistical data

### Clinical Interpretation of the PTB-XL Dataset
The **PTB-XL** dataset contains various clinical labels.
The most important labeling features for the AF-detection model:
- the model should recognize normal sinus rhythm, which can be labeled with different likelihood values
- the model should detect AF consistently whenever AF is present in a recording, regardless of other diagnoses
- the model should explain its result and give a clinical interpretation of a detected AF diagnosis, reasoning from the irregular rhythm in the recording
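One way to turn the requirements above into a training target is sketched below. The mapping is an assumption, not the project's confirmed definition: `rhythm_target` (hypothetical) labels a record AF whenever `AFIB` appears in its SCP codes, whatever other diagnoses are present, otherwise SR if the sinus-rhythm code `SR` is present, otherwise OTHER. The toy rows are made up but use the same string format as the `scp_codes` column.

```python
import ast
import pandas as pd

def rhythm_target(scp: dict) -> str:
    """Hypothetical target mapping: AFIB wins over any co-occurring statement,
    then sinus rhythm (SR), otherwise OTHER."""
    if "AFIB" in scp:
        return "AF"
    if "SR" in scp:
        return "SR"
    return "OTHER"

# Toy rows in the same string format as PTB-XL's scp_codes column
toy = pd.DataFrame({"scp_codes": [
    "{'NORM': 100.0, 'SR': 0.0}",
    "{'AFIB': 0.0, 'IMI': 50.0}",
    "{'STACH': 0.0}",
]})
toy["scp_dict"] = toy["scp_codes"].apply(ast.literal_eval)
toy["target"] = toy["scp_dict"].apply(rhythm_target)
toy["is_af"] = (toy["target"] == "AF").astype(int)  # binary AF-vs-rest label
print(toy[["target", "is_af"]])
```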

In [6]:
df["scp_dict"] = df["scp_codes"].apply(ast.literal_eval)
all_labels = (
    df["scp_dict"]
    .apply(lambda d: d.keys())
    .explode()
    .unique()
)

print("All unique SCP labels:")
print(sorted(all_labels))


df["label_set"] = df["scp_dict"].apply(
    lambda d: tuple(sorted(d.keys()))
)
print("All unique label sets:")
print(df["label_set"].unique())


All unique SCP labels:
['1AVB', '2AVB', '3AVB', 'ABQRS', 'AFIB', 'AFLT', 'ALMI', 'AMI', 'ANEUR', 'ASMI', 'BIGU', 'CLBBB', 'CRBBB', 'DIG', 'EL', 'HVOLT', 'ILBBB', 'ILMI', 'IMI', 'INJAL', 'INJAS', 'INJIL', 'INJIN', 'INJLA', 'INVT', 'IPLMI', 'IPMI', 'IRBBB', 'ISCAL', 'ISCAN', 'ISCAS', 'ISCIL', 'ISCIN', 'ISCLA', 'ISC_', 'IVCD', 'LAFB', 'LAO/LAE', 'LMI', 'LNGQT', 'LOWT', 'LPFB', 'LPR', 'LVH', 'LVOLT', 'NDT', 'NORM', 'NST_', 'NT_', 'PAC', 'PACE', 'PMI', 'PRC(S)', 'PSVT', 'PVC', 'QWAVE', 'RAO/RAE', 'RVH', 'SARRH', 'SBRAD', 'SEHYP', 'SR', 'STACH', 'STD_', 'STE_', 'SVARR', 'SVTAC', 'TAB_', 'TRIGU', 'VCLVH', 'WPW']
All unique label sets:
[('LVOLT', 'NORM', 'SR') ('NORM', 'SBRAD') ('NORM', 'SR') ...
 ('1AVB', 'ABQRS', 'AMI', 'IMI', 'LAFB', 'SR')
 ('ABQRS', 'IMI', 'ISCLA', 'PVC', 'SR') ('NDT', 'PVC', 'STACH', 'VCLVH')]


### Clinical interpretation of AFIB representation in the PTB-XL dataset
- If the AFIB value is 100, the recording was taken with the intent to detect AFIB.
- If the AFIB value is 0, the AFIB statement is still present, but no likelihood was given (PTB-XL uses 0 for an unknown likelihood); the recording was taken to diagnose another condition in a patient who already has atrial fibrillation.
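Instead of one boolean mask per likelihood range, the distribution of likelihood values for a code can be listed directly. `likelihood_counts` is a hypothetical helper; on the real data it would be called as `likelihood_counts(df["scp_dict"], "AFIB")`, and the toy Series below is made up.

```python
import pandas as pd

def likelihood_counts(scp_dicts: pd.Series, code: str) -> pd.Series:
    """How often each likelihood value occurs for one SCP code;
    records that do not contain the code are skipped."""
    values = scp_dicts.apply(lambda d: d.get(code)).dropna()
    return values.value_counts().sort_index()

# Made-up example rows; on the real data use df["scp_dict"]
toy = pd.Series([{"AFIB": 100.0}, {"AFIB": 0.0, "IMI": 50.0}, {"NORM": 100.0}, {"AFIB": 0.0}])
print(likelihood_counts(toy, "AFIB"))
```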

In [None]:
afib_any = df["scp_dict"].apply(lambda d: "AFIB" in d)
print("Number of AFIB any:", afib_any.sum())

afib_100 = df["scp_dict"].apply(lambda d: "AFIB" in d and d["AFIB"] == 100)
print("Number of AFIB 100:", afib_100.sum())

afib_75_99 = df["scp_dict"].apply(lambda d: "AFIB" in d and 75 <= d["AFIB"] < 100)
print("Number of AFIB 75-99:", afib_75_99.sum())


Number of AFIB any: 1514
Number of AFIB 100: 48
Number of AFIB 75-99: 0


In [14]:
norm_any = df["scp_dict"].apply(lambda d: "NORM" in d)
norm_100 = df["scp_dict"].apply(lambda d: "NORM" in d and d["NORM"] == 100)
norm_75_99 = df["scp_dict"].apply(lambda d: "NORM" in d and 75 <= d["NORM"] < 100)
norm_40_74 = df["scp_dict"].apply(lambda d: "NORM" in d and 40 <= d["NORM"] < 75)
norm_0_40 = df["scp_dict"].apply(lambda d: "NORM" in d and 0 < d["NORM"] < 40)

print("Number of NORM any:", norm_any.sum())
print("Number of NORM 100:", norm_100.sum())
print("Number of NORM 75-99:", norm_75_99.sum())
print("Number of NORM 40-74:", norm_40_74.sum())
print("Number of NORM 0-40:", norm_0_40.sum())

Number of NORM any: 9528
Number of NORM 100: 7185
Number of NORM 75-99: 1761
Number of NORM 40-74: 506
Number of NORM 0-40: 76
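The NORM ranges above can also be counted in a single pass with `pd.cut`. The hypothetical `likelihood_bins` helper below uses half-open bins `[0, 40)`, `[40, 75)`, `[75, 100)` and `[100, 101)`. Because the four counts above add up to 9,528 (7,185 + 1,761 + 506 + 76), exactly the NORM-any count, no NORM record has likelihood 0 or above 100, so these bins should reproduce the same counts on `df["scp_dict"]`; the toy Series is made up.

```python
import pandas as pd

def likelihood_bins(scp_dicts: pd.Series, code: str) -> pd.Series:
    """Count records per likelihood range for one SCP code using half-open bins."""
    values = scp_dicts.apply(lambda d: d.get(code)).dropna().astype(float)
    binned = pd.cut(values, bins=[0, 40, 75, 100, 101], right=False,
                    labels=["[0, 40)", "[40, 75)", "[75, 100)", "100"])
    return binned.value_counts().sort_index()

toy = pd.Series([{"NORM": 100.0}, {"NORM": 80.0}, {"NORM": 50.0},
                 {"NORM": 15.0}, {"SR": 0.0}, {"NORM": 100.0}])
print(likelihood_bins(toy, "NORM"))
```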
