# <u> **Deep Learning for ECG Analysis - Residual CNNs & Interpretability** </u>

# **➤ Summary** 

### **Introduction**

This notebook presents a method to make an ECG deep-learning model interpretable to physicians. When a model's decisions are transparent, clinicians and hospitals are more likely to trust and adopt it. Moreover, this approach aligns closely with the requirements of the EU AI Act.

Using the MIT-BIH Arrhythmia dataset to classify heartbeats, we faced two main challenges: class imbalance and beats that are not aligned in time.  
We address these issues with two different preprocessing pipelines and a residual 1-D CNN. The final model is highly accurate, and SHAP values reveal which moments in each beat guide its predictions.


### **➤➤ Key Takeaways:**

- ✅ **Goal** – Build an interpretable ECG classifier that meets clinical and regulatory needs.

- ✅ **Dataset** – MIT-BIH Arrhythmia Dataset for Rhythmological Classification (109 446 beats, 5 classes, sampled at 125 Hz). 

- ✅ **Model** – Deep-learning model with a task-optimized architecture featuring three residual blocks combined with two dense layers and dropout

- ✅ **Class Balancing** – Oversampling and Data Augmentation of Minority Classes -  Maintaining physiological ECG curve integrity while adding noise using Gaussian jitter

- ✅ **Data Challenges** 
  1. Strong class imbalance with underrepresentation of classes 1 and 3 
  2. In contrast to individual signals, the mean signals by class do not allow clear correlation to the different phases of the heart cycle.

- ✅ **Two Pre-processors** - Goal: Achieve Global Interpretability through P-QRS-T Signal Alignment
  1. *High Data Alteration* – deletes 22 columns and left-shifts most signals for perfect R-peak alignment.  
  2. *Low Data Alteration* – keeps all columns and right-shifts signals, losing less information.  
 
- ✅ **Performance** 

  – Original Data: ≈ 99 % test accuracy; weighted F1-score 0.9886 - excellent model performance. 

  – Preprocessor 1: ≈ 97 % test accuracy; weighted F1-score 0.9728 - significant drop in recall for the minority classes. 

  – Preprocessor 2: ≈ 98 % test accuracy; weighted F1-score 0.9791 - improved recall and F1-score for the minority classes compared with Preprocessor 1, while maintaining high performance for the majority classes

    ➤ Preprocessor 2 represents an excellent compromise between interpretability and overall model performance.

- ✅ **Explainability** – SHAP shows which parts of the P, QRS, and T waves drive each prediction — now allowing direct correlation of SHAP values to the different phases of the heart cycle

---
## **✅ 1. The Model** 
---

### <u>**Model Architecture:**</u>

<img src="../visualizations/model_architecture.png" alt="Model Architecture" width="1500"/>

=== Figure 1 ===

#### **Main Features**
- **Input Layer:** accepts ECG segments of length 187 (shape 187 × 1).  

- **Initial Conv & Pool:** Conv1D with 64 filters (kernel = 3) + ReLU, followed by MaxPooling (pool = 2).  

- **Residual Feature Extractors:** three blocks with [64, 128, 256] filters; each block has two Conv1D→BatchNorm→ReLU stages plus a skip connection.  

- **Flatten:** converts the final feature map into a 23 808-dimensional vector.  

- **Dense Classifier:** two fully connected layers (128 → ReLU → Dropout 0.4, then 64 → ReLU → Dropout 0.3).  

- **Output Layer:** softmax over five heartbeat classes.  

- **Model Capacity:** ≈ 3.45 million trainable parameters.  

- **Training Setup:** Adam optimizer (lr = 1 × 10⁻⁴) with categorical cross-entropy loss.  

##### **Key Point:** Deep-learning model with a task-optimized architecture featuring three residual blocks combined with two dense layers and dropout ✅
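
The layer sizes listed above can be cross-checked with a few lines of arithmetic. This is only a sketch: it assumes `same` padding and stride 1 in every Conv1D layer (so only the pooling layer shortens the signal) and no pooling inside the residual blocks, neither of which is stated explicitly in this summary. Under those assumptions the flatten size matches the 23 808 quoted above, and the dense head alone accounts for roughly 3.06 million of the ≈ 3.45 million parameters.

```python
# Shape and parameter arithmetic for the layers listed above.
# Assumption: 'same' padding and stride 1 in all Conv1D layers, no pooling
# inside the residual blocks, so only MaxPooling changes the sequence length.
input_len = 187
pool_size = 2
final_filters = 256
dense_units = [128, 64]
n_classes = 5

len_after_pool = input_len // pool_size      # pooling floors odd lengths: 187 -> 93
flatten_dim = len_after_pool * final_filters


def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


head_params = 0
n_in = flatten_dim
for n_out in dense_units + [n_classes]:
    head_params += dense_params(n_in, n_out)
    n_in = n_out

print(len_after_pool, flatten_dim, head_params)  # 93 23808 3056133
```

Almost all of the head's parameters sit in the first dense layer, which is a direct consequence of flattening a 93 × 256 feature map.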


---

### <u>**Class Imbalance:**</u>

<img src="../visualizations/class_distribution.png" alt="Class Distribution" width="1000"/>

=== Figure 2 ===


- In the MIT-BIH dataset (109 446 beats in total), normal beats (Class 0) constitute over 80 % of samples in both the training set (≈72 k of ≈87.5 k) and the test set (≈18 k of 21 891), while Classes 1–4 each account for less than 10 % of the data.  

- Classes 1 (SVEB) and 3 (Fusion) are especially underrepresented—together comprising under 1 % of all beats—making reliable detection of these rhythms challenging.

- To mitigate this skew, we introduce a targeted resampling strategy specifically for the rare classes.  

##### **Key Point:** Strong class imbalance with underrepresentation of classes 1 and 3 ✅

### **Signal Granularity**
<img src="../visualizations/signal_granularity.png" alt="Signal Granularity" width="1200"/>

=== Figure 3 ===

- This example contrasts one ECG beat at full 16-decimal precision with the same beat rounded to two decimals—both clearly preserve the RS-fragment, T- and P-waves.  

- <u>**Conclusion:** For clinical purposes, two-decimal precision is sufficient to retain all diagnostically relevant morphology.</u>


### **Class Imbalance ➤ Targeted Oversampling with Physiological Fidelity**

To address the extreme class imbalance without compromising ECG morphology, we implement a targeted oversampling method that preserves clinical signal integrity:

- **Physiological rounding & noise injection:** Each minority-class beat is rounded to two decimal places—sufficient for diagnostic purposes—then perturbed with small Gaussian noise before re-expanding to full 16-decimal precision.  

- **Class-specific multipliers:** SVEB (Class 1) samples are tripled and Fusion (Class 3) samples are increased ninefold, producing a more uniform class distribution.  

- **Robust but faithful augmentation:** This approach strengthens minority-class boundaries—boosting recall—while retaining true waveform features and leaving majority-class performance unaffected.  

##### **Key Point:** Oversampling and Data Augmentation of Minority Classes -  Maintaining physiological ECG curve integrity while adding noise using Gaussian jitter ✅
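
The oversampling idea described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the function name `oversample_with_jitter`, the noise level `sigma=0.01`, and the reading of "tripled" and "ninefold" as total factors of 3 and 9 (original beats plus jittered copies) are all assumptions.

```python
import numpy as np


def oversample_with_jitter(X, y, multipliers, sigma=0.01, decimals=2, seed=0):
    """Append jittered copies of the classes named in `multipliers`.

    multipliers maps class label -> total factor, e.g. 3 means the original
    beats plus two synthetic copies. sigma is an assumed noise level.
    """
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [X], [y]
    for label, factor in multipliers.items():
        beats = X[y == label]
        for _ in range(factor - 1):
            rounded = np.round(beats, decimals)          # two-decimal rounding
            noisy = rounded + rng.normal(0.0, sigma, size=rounded.shape)
            X_parts.append(noisy)                        # back to full float precision
            y_parts.append(np.full(len(beats), label))
    return np.concatenate(X_parts), np.concatenate(y_parts)


# Toy data: 14 beats of class 0, 4 of class 1 (SVEB), 2 of class 3 (Fusion)
rng = np.random.default_rng(1)
X = rng.random((20, 187))
y = np.array([0] * 14 + [1] * 4 + [3] * 2)

X_bal, y_bal = oversample_with_jitter(X, y, {1: 3, 3: 9})
print(np.bincount(y_bal, minlength=4))  # [14 12  0 18]
```

Drawing fresh noise for every copy means no two synthetic beats are identical, while the small `sigma` keeps each copy close to the waveform it was derived from.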


---

### <u>**Results on original Data:** </u>

<img src="../visualizations/results_original.png" alt="Results Original" width="600"/>

=== Figure 4 ===

### **Multi Class ROC**
<img src="../visualizations/roc_curve.png" alt="ROC" width="700"/>

=== Figure 5 ===

### <u>**Classification Results:**</u>

- **Accuracy & F1:** 99 % accuracy, weighted F1 ≈ 0.9886 — exceptional overall performance  

- **Minority Recall:** SVEB & Fusion recall ~0.82, showing solid but improvable detection.  

- **Precision:** ≥ 0.94 for all classes—very few false positives.   

- <u>**Assessment:**</u> The model is highly reliable; minority-class recall is the main target for further refinement.  

##### **Key Point:** Original Data: ≈ 99 % test accuracy; weighted F1-score 0.9886 - excellent model performance ✅
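
The gap between a very high weighted F1 and a noticeably lower minority recall follows from how the weighted average is built: each class's F1 is weighted by its support, so the majority class dominates. The sketch below shows this with a small, entirely hypothetical 3-class confusion matrix; the numbers are made up for illustration and are not the notebook's results.

```python
import numpy as np

# Hypothetical confusion matrix (rows = true class, columns = predicted class).
# These counts are invented for illustration only.
cm = np.array([
    [980, 10, 10],   # majority class, 1000 beats
    [4, 14, 2],      # rare class, 20 beats
    [3, 1, 16],      # rare class, 20 beats
])

tp = np.diag(cm)
support = cm.sum(axis=1)               # true beats per class
recall = tp / support
precision = tp / cm.sum(axis=0)
f1 = 2 * precision * recall / (precision + recall)

# Support-weighted mean of the per-class F1 scores
weighted_f1 = (f1 * support).sum() / support.sum()

print("recall per class:", np.round(recall, 2))
print("weighted F1:", round(float(weighted_f1), 4))
```

In this toy case the weighted F1 stays above 0.95 even though one rare class is recalled only 70 % of the time, which is why per-class recall is reported alongside the aggregate score.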


---
## **✅ 2. The Interpretability Challenge**
---

### <u>**Individual vs Mean Signals:**</u>

### **Sinus Rhythm Reference**
<img src="../visualizations/normal_ekg_signal.png" alt="Signal" width="800"/>

=== Figure 6 ===

- Displays a standard ECG “sinus rhythm” as a baseline for illustrating heart‐cycle phases.

- Follows the sequence: P wave → QRS complex → T wave.

### **The Individual Signals**
<img src="../visualizations/individual_signal_mitbih.png" alt="Signal" width="900"/>

=== Figure 7 ===

### **The Mean Signals**
<img src="../visualizations/mean_original.png" alt="Signal" width="900"/>

=== Figure 8 ===

### <u>**The Challenge: Averaged ECG Signals**</u>

The previous figures expose a critical limitation: simply averaging beats without alignment obscures true waveform features.

- **Individual beats** exhibit the classic P–QRS–T sequence, fully interpretable by clinicians for rhythm diagnosis.  

- **Class-averaged waveforms** drift away from textbook ECG shapes due to heart rate variability.  

- **R–R interval dispersion** shifts the secondary QRS peak to different time indices across classes.  

- **P–QRS interval** remains relatively constant, reflecting its lower dependence on instantaneous heart rate.  

- **Misaligned T- and subsequent QRS complexes** highlight the imperative for precise beat-by-beat alignment before averaging. 

- <u>**Conclusion:**</u> To unlock meaningful global model interpretability, the signals must be preprocessed and aligned before averaging.  

##### **Key Point:** In contrast to individual signals, the mean signals by class do not allow clear correlation to the different phases of the heart cycle.✅
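
The smearing effect can be reproduced on synthetic data. In the sketch below each "beat" is just a single Gaussian bump whose position varies from beat to beat, standing in for the second R peak at different R-R intervals; the bump shape, the position range, and the target index are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_peak(center, length=187, width=3.0):
    """Synthetic stand-in for one R peak at time index `center`."""
    t = np.arange(length)
    return np.exp(-0.5 * ((t - center) / width) ** 2)


rng = np.random.default_rng(0)
centers = rng.integers(100, 160, size=200)      # peak position varies per beat
beats = np.stack([gaussian_peak(c) for c in centers])

# Averaging without alignment spreads the peak over many time indices
mean_unaligned = beats.mean(axis=0)

# Shifting every beat so its peak sits at one target index restores the shape
target = 145
aligned = np.stack([np.roll(b, target - c) for b, c in zip(beats, centers)])
mean_aligned = aligned.mean(axis=0)

print("peak height, unaligned mean:", round(float(mean_unaligned.max()), 3))
print("peak height, aligned mean:  ", round(float(mean_aligned.max()), 3))
```

The aligned mean keeps the full peak height at the target index, while the unaligned mean is a low, broad plateau that no longer resembles any individual beat.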

---
## **✅ 3. Data Preprocessing**
---

### <u>**Two Different Preprocessors:**</u>

Both preprocessors align the R-wave peak to optimize ECG signal classification, but employ fundamentally different approaches regarding data preservation and signal manipulation.

**➤ Preprocessor 1 (High Data Alteration - P-QRS-T):**
- **Data Loss:** Removes first 22 columns to eliminate noise and duplications
- **Alignment Strategy:** Aligns R-wave to earlier column index, requiring extensive left-shifting (>90% of signals)
- **Signal Integrity:** Creates highly consistent averaged signals but with reduced information content
- **Morphology Impact:** T-wave artificially merges with QRS complex due to left-shifting

**➤ Preprocessor 2 (Low Data Alteration - (R)-T-P-QRS):**
- **Data Preservation:** Retains all original columns; only the first 20 are ignored during R-peak detection
- **Alignment Strategy:** Aligns R-wave to higher column index (145), primarily using right-shifting
- **Signal Integrity:** Maintains complete information content with minimal signal manipulation  
- **Morphology Impact:** First incomplete R-peak overlaps with T-wave region, reducing interpretability in that area

**Key Difference:** Preprocessor 1 prioritizes signal consistency through data reduction, while Preprocessor 2 emphasizes data preservation with accepted morphological trade-offs.

##### **Key Point:** To Achieve Global Interpretability through P-QRS-T Signal Alignment: Preprocessor 1 prioritizes signal consistency through data reduction, while Preprocessor 2 emphasizes data preservation with accepted morphological trade-offs✅
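
A minimal sketch of the shifting step, using the Preprocessor 2 settings named above (target index 145, first 20 columns ignored for peak detection). Taking the maximum amplitude as the R peak and filling the vacated samples with zeros are assumptions; the helper name `align_r_peak` is hypothetical.

```python
import numpy as np


def align_r_peak(beat, target_idx=145, skip=20):
    """Shift `beat` so its R peak lands on `target_idx`.

    The R peak is taken as the maximum after the first `skip` samples
    (assumption). Vacated samples are zero-filled (assumption).
    """
    peak_idx = skip + int(np.argmax(beat[skip:]))
    shift = target_idx - peak_idx
    aligned = np.zeros_like(beat)
    if shift >= 0:                       # right shift
        aligned[shift:] = beat[:len(beat) - shift]
    else:                                # left shift
        aligned[:shift] = beat[-shift:]
    return aligned, shift


# Toy beat: incomplete initial R peak at index 0, full R peak at index 120
beat = np.zeros(187)
beat[0] = 1.0
beat[120] = 0.9

aligned, shift = align_r_peak(beat)
print(shift, aligned[145], aligned[25])  # 25 0.9 1.0
```

Skipping the first samples keeps the incomplete initial R peak from being mistaken for the alignment target, even when it is the largest value in the segment.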

---

### **Example: Individual Signal Alteration**

#### **➤ Preprocessor 1:**
<img src="../visualizations/signal_alteration_p1.png" alt="Signal" width="800"/>

=== Figure 9 ===

#### **➤ Preprocessor 2:**
<img src="../visualizations/signal_alteration_p2.png" alt="Signal" width="800"/>

=== Figure 10 ===

---
### **Mean Signal Alteration**

#### **➤ Preprocessor 1:**
<img src="../visualizations/mean_p1.png" alt="Signal" width="900"/>

=== Figure 11 ===

#### **➤ Preprocessor 2:**
<img src="../visualizations/mean_p2.png" alt="Signal" width="900"/>

=== Figure 12 ===

**Conclusion:** 
- **Preprocessor 1:** Creates compact P-QRS-T sequences with clear phase separation but artificial T-wave positioning
- **Preprocessor 2:** Maintains natural cardiac cycle timing with complete (R)-T-P-QRS representation, enabling better feature distinction for minority class detection

---

### <u>**Results on Preprocessed Data:**</u>
#### Preprocessor 1 (left) vs. Preprocessor 2 (right)

<img src="../visualizations/preprocess_results_comparison.png" alt="Signal" width="1200"/>

=== Figure 13 ===

### <u>**Classification Results - Comparison of Preprocessor 1 & 2:**</u>

**➤ Preprocessor 1 (Left):**
- **Accuracy & F1:** 97% accuracy, weighted F1 = 0.9728 — strong overall performance
- **Minority Recall:** SVEB recall 0.61, Fusion recall 0.67 — significant room for improvement
- **Precision:** ≥ 0.80 for all classes, with Normal at 0.98

**➤ Preprocessor 2 (Right):**
- **Accuracy & F1:** 98% accuracy, weighted F1 = 0.9791 — superior overall performance
- **Minority Recall:** SVEB recall 0.74, Fusion recall 0.80 — notably improved detection
- **Precision:** ≥ 0.80 for all classes, with Unclassified at perfect 1.00

##### **Key Point:** Preprocessor 2 represents an excellent compromise between interpretability and overall model performance: better minority-class detection (+13 percentage points recall for both SVEB and Fusion) while maintaining high precision across all categories ✅
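
The recall gains can be stated either in absolute percentage points or relative to the Preprocessor 1 baseline. The short calculation below uses only the recall values reported above and shows both readings.

```python
# Minority-class recall values as reported in the comparison above
recall_p1 = {"SVEB": 0.61, "Fusion": 0.67}
recall_p2 = {"SVEB": 0.74, "Fusion": 0.80}

gains = {}
for name in recall_p1:
    abs_gain = recall_p2[name] - recall_p1[name]   # difference in recall
    rel_gain = abs_gain / recall_p1[name]          # relative to Preprocessor 1
    gains[name] = (abs_gain, rel_gain)
    print(f"{name}: +{abs_gain * 100:.0f} points absolute, "
          f"+{rel_gain * 100:.0f} % relative")

# SVEB: +13 points absolute, +21 % relative
# Fusion: +13 points absolute, +19 % relative
```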


---
## **✅ 4. Interpretability**
---

### <u>**Global Model Interpretability with SHAP:**</u>

SHAP (SHapley Additive exPlanations) analysis reveals which temporal features drive classification decisions by quantifying each time step's contribution to the model's predictions. We use DeepExplainer with 1000 background samples and compute mean SHAP values across all classes to identify the critical cardiac cycle phases.

**Visualization Structure:**
- **Blue Signal Lines:** Mean ECG amplitude per class across all time steps
- **Color-Coded Bars:** SHAP importance values (Red = positive contribution, Blue = negative contribution)
- **Intensity:** Magnitude of SHAP influence on classification decisions
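
The aggregation behind such a plot can be sketched with NumPy alone. The snippet below covers only the post-processing step and uses random numbers in place of real explainer output; the array layout `(sample, time step, class)` is an assumption, since the layout returned by an explainer depends on the library version, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_steps, n_classes = 1000, 187, 5

# Random stand-ins, NOT real model output. Assumed layout: (sample, step, class)
shap_values = rng.normal(0.0, 0.01, size=(n_samples, n_steps, n_classes))
X = rng.random((n_samples, n_steps))

# Signed mean per time step and class: sign gives the bar colour
mean_shap = shap_values.mean(axis=0)                 # shape (187, 5)

# Mean absolute value per time step and class: magnitude of influence
mean_abs_shap = np.abs(shap_values).mean(axis=0)     # shape (187, 5)

# Mean signal for the overlay (here over all samples; per class in practice)
mean_signal = X.mean(axis=0)                         # shape (187,)

# Five most influential time steps for each class, strongest first
top_steps = np.argsort(mean_abs_shap, axis=0)[::-1][:5]
print(top_steps[:, 0])   # top time indices for class 0
```

Keeping the signed mean and the mean absolute value separate avoids positive and negative contributions cancelling out when the importance of a time step is ranked.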

---
### **SHAP-Values & Mean ECG-Signals**

#### **➤ Preprocessor 1:**
<img src="../visualizations/shap_p1.png" alt="Signal" width="1000"/>

=== Figure 14 ===

#### **➤ Preprocessor 2:**
<img src="../visualizations/shap_p2.png" alt="Signal" width="1000"/>

=== Figure 15 ===

---

### <u>**SHAP Analysis Results Comparison:**</u>

**➤ Preprocessor 1 (P-QRS-T Configuration):**
- **Feature Concentration:** SHAP values cluster around P-wave (columns 21-23) and QRS regions
- **Anomalous Patterns:** Class 1 shows unexpected P-wave importance despite being SVEB beats
- **T-Wave Relevance:** Classes 1 and 4 demonstrate broader SHAP distribution extending into T-wave regions
- **Interpretability:** Clear temporal localization but with artificial phase relationships due to left-shifting

**➤ Preprocessor 2 ((R)-T-P-QRS Configuration):**
- **Distributed Importance:** More homogeneous SHAP distribution across multiple temporal regions
- **Natural Phases:** Better preservation of cardiac cycle timing enables authentic phase-specific analysis
- **Pacemaker Detection:** Class 4 shows clear SHAP attribution to incomplete initial R-peak, validating clinical relevance
- **Model Behavior:** Direct linkage between SHAP patterns and natural cardiac cycle phases

##### **Key Point:** Preprocessor 2 enables more clinically meaningful interpretability by maintaining natural cardiac timing, while Preprocessor 1's artificial phase compression limits biological relevance of SHAP attributions.✅

**➤ Limitations:**
- **Computational Constraints:** Background sample size limited to 1000 due to processing time requirements
- **Class Imbalance Impact:** SHAP significance reduced for minority classes (SVEB, Fusion) due to limited representation
- **Sample Size:** Test sample restricted to 1000, potentially affecting generalizability of SHAP patterns


##### **Key Point:** SHAP shows which parts of the P, QRS, and T waves drive each prediction — now allowing direct correlation of SHAP values to the different phases of the heart cycle✅