# WESAD Training - Google Colab

This notebook trains PyTorch models for binary stress classification using the WESAD dataset.

## Features:
- ✅ **Baseline**: Optimized LSTM-only model
- ✅ **DP**: Differential Privacy with Opacus

## Requirements:
- Google Colab with GPU enabled
- Data in Google Drive: `MyDrive/mhealth-data/data/processed/wesad/`
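
The GPU requirement can be confirmed up front by asking PyTorch directly. The sketch below assumes PyTorch is already present in the runtime, which is usually the case on Colab; the helper name `gpu_available` is illustrative and not part of the repository.

```python
import importlib.util


def gpu_available() -> bool:
    """Return True if PyTorch is importable and reports a CUDA device."""
    # Guard the import so the check degrades gracefully if torch is missing.
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


if gpu_available():
    print("GPU detected")
else:
    print("No GPU detected: enable one via Runtime > Change runtime type")
```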

## 1. Initial Setup

In [None]:
# Install dependencies
!pip install -q torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
!pip install -q numpy pandas scikit-learn matplotlib seaborn opacus

# Clone repository
!git clone https://github.com/vasco-fernandes21/mhealth-data-privacy.git
import sys
sys.path.append('/content/mhealth-data-privacy')
sys.path.append('/content/mhealth-data-privacy/src')

print('✅ Setup complete')

In [None]:
# Mount Google Drive
from google.colab import drive
import os

drive.mount('/content/drive')
data_path = '/content/drive/MyDrive/mhealth-data/data/processed/wesad'

if os.path.exists(data_path):
    print(f'✅ Data found: {data_path}')
else:
    print(f'❌ Data not found: {data_path}')

## 2. Setup Data Paths

In [None]:
# Link the Drive data into the repository before training
import os
import shutil

repo_data_dir = '/content/mhealth-data-privacy/data/processed/wesad'
drive_data_dir = '/content/drive/MyDrive/mhealth-data/data/processed/wesad'

# Remove any stale link or directory, then create the symlink
os.makedirs(os.path.dirname(repo_data_dir), exist_ok=True)
if os.path.islink(repo_data_dir):
    os.unlink(repo_data_dir)
elif os.path.exists(repo_data_dir):
    shutil.rmtree(repo_data_dir)

os.symlink(drive_data_dir, repo_data_dir)
print(f'✅ Data linked: {repo_data_dir} -> {drive_data_dir}')

## 3. Train Baseline

In [None]:
!python /content/mhealth-data-privacy/src/train/wesad/train_baseline.py

## 4. Train DP Model

In [None]:
!python /content/mhealth-data-privacy/src/train/wesad/differential_privacy/train_dp.py

## 5. Analyze Results

In [None]:
import json
import os
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Load results
baseline_path = '/content/mhealth-data-privacy/models/wesad/baseline/results_wesad_baseline.json'
dp_path = '/content/mhealth-data-privacy/models/wesad/dp/results_wesad_dp.json'

with open(baseline_path) as f:
    baseline = json.load(f)
with open(dp_path) as f:
    dp = json.load(f)

# Print comparison
print('📊 RESULTS COMPARISON')
print('=' * 70)
print(f"{'Metric':<20} {'Baseline':<15} {'DP':<15} {'Degradation':<15}")
print('-' * 70)

for metric in ['accuracy', 'precision', 'recall', 'f1_score']:
    b_val = baseline[metric]
    d_val = dp[metric]
    deg = (b_val - d_val) / b_val * 100
    print(f"{metric.capitalize():<20} {b_val:<15.4f} {d_val:<15.4f} {deg:.2f}%")

print(f"\nPrivacy: ε={dp['dp_params']['final_epsilon']:.2f}, δ={dp['dp_params']['delta']}")

deg_pct = (baseline['accuracy'] - dp['accuracy']) / baseline['accuracy'] * 100
status = '✅ Good' if deg_pct <= 20 else '⚠️ High'
print(f'\nPrivacy-Utility Trade-off: {status} ({deg_pct:.2f}% degradation)')
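
The degradation column is a relative drop: the difference between the baseline and DP scores, divided by the baseline score. A minimal sketch of that calculation, with a guard for a zero baseline, could look like this; the function name and the sample scores are illustrative and not taken from the training runs.

```python
def relative_degradation(baseline_value: float, dp_value: float) -> float:
    """Relative drop from baseline to DP, in percent."""
    if baseline_value == 0:
        # A zero baseline makes the ratio undefined; report NaN instead of crashing.
        return float("nan")
    return (baseline_value - dp_value) / baseline_value * 100


# Hypothetical scores, for illustration only
print(f"{relative_degradation(0.80, 0.60):.2f}%")  # prints 25.00%
```

A negative result would mean the DP model scored higher than the baseline on that metric.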

In [None]:
# Confusion matrices
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

cm_b = np.array(baseline['confusion_matrix'])
cm_d = np.array(dp['confusion_matrix'])
labels = baseline['class_names']

sns.heatmap(cm_b, annot=True, fmt='d', cmap='Blues', xticklabels=labels, yticklabels=labels, ax=ax1)
ax1.set_title(f"Baseline (Acc: {baseline['accuracy']*100:.2f}%)")
ax1.set_xlabel('Predicted')
ax1.set_ylabel('Actual')

sns.heatmap(cm_d, annot=True, fmt='d', cmap='Reds', xticklabels=labels, yticklabels=labels, ax=ax2)
ax2.set_title(f"DP (Acc: {dp['accuracy']*100:.2f}%, ε={dp['dp_params']['final_epsilon']:.2f})")
ax2.set_xlabel('Predicted')
ax2.set_ylabel('Actual')

plt.tight_layout()
plt.show()
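
The plotted matrices can also be used to cross-check the reported metrics. The sketch below derives accuracy, precision, recall and F1 from a 2x2 matrix, assuming rows are actual classes and columns are predicted classes (matching the axis labels above) and that the stress class sits at index 1. The index convention, the helper name and the sample counts are all assumptions; if the training scripts report averaged metrics, the numbers will differ.

```python
def binary_metrics(cm):
    """Compute metrics from a 2x2 confusion matrix laid out as [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1_score": 2 * precision * recall / denom if denom else 0.0,
    }


# Hypothetical counts, for illustration only
for name, value in binary_metrics([[50, 10], [5, 35]]).items():
    print(f"{name:<10} {value:.4f}")
```

In the notebook, the same helper could be called with `baseline['confusion_matrix']` or `dp['confusion_matrix']`.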

In [None]:
# Training history
baseline_hist_path = '/content/mhealth-data-privacy/models/wesad/baseline/history_wesad_baseline.json'
dp_hist_path = '/content/mhealth-data-privacy/models/wesad/dp/history_wesad_dp.json'

if os.path.exists(baseline_hist_path) and os.path.exists(dp_hist_path):
    with open(baseline_hist_path) as f:
        b_hist = json.load(f)
    with open(dp_hist_path) as f:
        d_hist = json.load(f)
    
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(14, 10))
    
    # Baseline
    ax1.plot(b_hist['loss'], label='Train', color='blue')
    ax1.plot(b_hist['val_loss'], label='Val', color='lightblue')
    ax1.set_title('Baseline: Loss')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    
    ax2.plot(b_hist['accuracy'], label='Train', color='green')
    ax2.plot(b_hist['val_accuracy'], label='Val', color='lightgreen')
    ax2.set_title('Baseline: Accuracy')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    
    # DP
    ax3.plot(d_hist['loss'], label='Train', color='red')
    ax3.plot(d_hist['val_loss'], label='Val', color='lightcoral')
    ax3.set_title('DP: Loss')
    ax3.legend()
    ax3.grid(True, alpha=0.3)
    
    ax4.plot(d_hist['accuracy'], label='Train', color='purple')
    ax4.plot(d_hist['val_accuracy'], label='Val', color='plum')
    ax4.set_title('DP: Accuracy')
    ax4.legend()
    ax4.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Epsilon evolution
    if 'epsilon' in d_hist:
        plt.figure(figsize=(10, 5))
        plt.plot(d_hist['epsilon'], color='orange', linewidth=2)
        plt.axhline(y=dp['dp_params']['target_epsilon'], color='r', linestyle='--', label=f"Target ε={dp['dp_params']['target_epsilon']}")
        plt.title('Privacy Budget Evolution')
        plt.xlabel('Epoch')
        plt.ylabel('Epsilon (ε)')
        plt.legend()
        plt.grid(True, alpha=0.3)
        plt.show()
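
Alongside the plot, a numeric check can show whether and when the spent budget crossed the target. The sketch assumes the history stores one cumulative ε value per epoch, as the plot above suggests; the helper name and the sample values are hypothetical.

```python
def first_epoch_over_budget(epsilons, target_epsilon):
    """Return the 1-based epoch where epsilon first exceeds the target, or None."""
    for epoch, eps in enumerate(epsilons, start=1):
        if eps > target_epsilon:
            return epoch
    return None


# Hypothetical per-epoch epsilon values, for illustration only
history = [1.2, 2.9, 4.5, 6.1, 7.4, 8.3]
print(first_epoch_over_budget(history, target_epsilon=8.0))  # prints 6
```

With the loaded results, the call would be `first_epoch_over_budget(d_hist['epsilon'], dp['dp_params']['target_epsilon'])`.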

## Summary

### Architecture:
- LSTM-only: Bidirectional LSTM (2 layers, 64 units)
- Parameters: ~211K
- Normalization: GroupNorm (DP-compatible)
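
The size of the recurrent part can be estimated by hand. In PyTorch's `nn.LSTM`, each direction of each layer holds an input-to-hidden matrix of shape 4H×I, a hidden-to-hidden matrix of shape 4H×H, and two bias vectors of length 4H. The sketch below applies that formula. The input size of 14 is a placeholder, since the number of input features is not stated here, and any normalization or classifier layers add to the total, so the result is not expected to match the ~211K figure.

```python
def lstm_param_count(input_size, hidden_size, num_layers=1, bidirectional=False):
    """Count trainable parameters of an nn.LSTM-style stack (no projection)."""
    directions = 2 if bidirectional else 1
    total = 0
    for layer in range(num_layers):
        # Layers after the first receive the concatenated outputs of all directions.
        layer_in = input_size if layer == 0 else hidden_size * directions
        # 4 gates x (input weights + recurrent weights) plus two bias vectors of 4H.
        per_direction = 4 * hidden_size * (layer_in + hidden_size) + 8 * hidden_size
        total += directions * per_direction
    return total


# input_size=14 is a hypothetical feature count, for illustration only
print(lstm_param_count(14, 64, num_layers=2, bidirectional=True))
```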

### Results:
- **Baseline**: ~82-83% accuracy
- **DP**: ~66-70% accuracy, ε≤8.0
- **Trade-off**: ~15-20% relative accuracy degradation

### Next Steps:
1. ✅ Baseline and DP trained
2. 🔄 Implement Federated Learning
3. 🔄 Compare all approaches

---
**SIDM MSc - MHealth Data Privacy** 🚀