# Phase 3: Iterative Reduction Evaluation (LDA/QDA Benchmarking)

**Master's thesis:** Non-destructive materials testing using 3MA-X8 micromagnetics  
**Input:** 8 feature rankings from Phase 2 (plus the feature table from Phase 1)  
**Output:** Pareto curves (number of features vs. performance)

---

## Methodological Foundations

### Evaluation Protocol

1. **Reduction levels:** 10 levels per ranking  
   → 90%, 80%, 70%, 60%, 50%, 40%, 30%, 20%, 10%, 5% of the ~84 features

2. **Classifiers:**
   - **LDA (Linear Discriminant Analysis):** assumes a single covariance matrix shared by all classes
   - **QDA (Quadratic Discriminant Analysis):** estimates a separate covariance matrix for each class

3. **Validation:** 5-fold GroupKFold CV  
   → Preprocessing (imputation + scaling) is fitted **inside** each fold, on the training part only.

4. **Metrics (in order of priority):**
   1. Balanced accuracy
   2. F1 score (macro)
   3. Cohen's kappa
   4. Standard accuracy

5. **Confidence intervals:** 95% CI from the t-distribution over the 5 fold scores (df = 5 - 1 = 4)
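
The confidence interval in step 5 can be sketched as follows. The notebook delegates this to helpers in `utils`, whose implementation is not shown here, so the function name `t_confidence_interval` and the use of `scipy.stats` are assumptions for illustration. The fold scores are made up. Because CV folds share training data, such an interval is best read as approximate.

```python
import numpy as np
from scipy import stats


def t_confidence_interval(fold_scores, confidence=0.95):
    """Return (mean, ci_lower, ci_upper) for a list of per-fold scores."""
    scores = np.asarray(fold_scores, dtype=float)
    scores = scores[~np.isnan(scores)]        # ignore failed folds
    n = scores.size
    if n < 2:
        # No spread can be estimated from fewer than two folds
        return (float(scores.mean()) if n else np.nan, np.nan, np.nan)
    mean = scores.mean()
    sem = scores.std(ddof=1) / np.sqrt(n)     # standard error of the mean
    t_crit = stats.t.ppf(0.5 + confidence / 2, df=n - 1)
    return mean, mean - t_crit * sem, mean + t_crit * sem


# Made-up fold scores, for illustration only
mean, lo, hi = t_confidence_interval([0.80, 0.75, 0.85, 0.70, 0.90])
print(f"{mean:.3f} [{lo:.3f}, {hi:.3f}]")
```

With only 5 folds the critical t value (df = 4) is about 2.78 rather than the 1.96 of a normal approximation, so the intervals are noticeably wider.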

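### CRITICAL WARNING

**QDA is extremely prone to overfitting with small samples!**  
With typically only 2-3 samples per class in the training fold (with 12 classes), each class covariance matrix rests on very few independent samples and becomes rank-deficient whenever a class has no more rows than features, so unstable parameter estimates are to be expected. The `reg_param` regularization set in the evaluation pipeline mitigates this but does not remove it.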
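
A small numeric illustration of the warning above: a sample covariance estimated from `n` rows has rank at most `n - 1`, so it cannot be inverted when there are more features than that. The sizes are illustrative and not taken from the thesis data. The shrinkage line only illustrates the idea behind `reg_param` and is not meant as scikit-learn's exact internal computation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 3, 8               # illustrative sizes only
X_class = rng.normal(size=(n_samples, n_features))

cov = np.cov(X_class, rowvar=False)        # 8 x 8 sample covariance of one class
rank = np.linalg.matrix_rank(cov)
print(f"covariance shape: {cov.shape}, rank: {rank}")

# Shrinking toward the identity makes every eigenvalue at least reg_param,
# which restores full rank (illustration of the idea, see lead-in).
reg_param = 0.1
cov_reg = (1 - reg_param) * cov + reg_param * np.eye(n_features)
rank_reg = np.linalg.matrix_rank(cov_reg)
print(f"rank after shrinkage: {rank_reg}")
```

Full rank makes the matrix invertible, but the estimate is still driven by a handful of points, which is why the QDA results should be read with caution.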

---

In [None]:
# Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from tqdm.notebook import tqdm
import warnings
warnings.filterwarnings('ignore')

# Custom Utilities
import sys
sys.path.append('..')
from utils.validation import create_group_kfold_splits, calculate_confidence_intervals
from utils.metrics import compute_classification_metrics, aggregate_cv_metrics
from utils.visualization import plot_pareto_curve, plot_classifier_comparison

# Plotting
plt.style.use('seaborn-v0_8-darkgrid')
%matplotlib inline

## 1. Load Data (Output of Phase 1)

In [None]:
# Data
DATA_PATH = '../data/processed/features_after_phase1.csv'
df = pd.read_csv(DATA_PATH)

TARGET_COL = 'class'
GROUP_COL = 'sample_id'

feature_cols = [col for col in df.columns if col not in [TARGET_COL, GROUP_COL]]
X = df[feature_cols].copy()
y = df[TARGET_COL].copy()
groups = df[GROUP_COL].copy()

print(f"✓ Data loaded: {X.shape}")
print(f"  Features: {X.shape[1]}")
print(f"  Samples: {X.shape[0]}")
print(f"  Classes: {y.nunique()}")
print(f"  Groups: {groups.nunique()}")
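
Since `groups` drives the cross-validation split in the later cells, it can help to confirm that grouped splitting keeps every group on one side of the split. The sketch below uses scikit-learn's `GroupKFold` directly on toy data. It assumes that the project helper `create_group_kfold_splits` wraps something equivalent, which this notebook does not show, and that one `sample_id` may cover several rows.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy data: 10 specimens with 3 measurements each (illustrative only)
toy_groups = np.repeat(np.arange(10), 3)
toy_X = np.zeros((toy_groups.size, 2))
toy_y = toy_groups % 2

gkf = GroupKFold(n_splits=5)
overlaps = []
for train_idx, test_idx in gkf.split(toy_X, toy_y, toy_groups):
    # Groups that appear on both sides of the split would indicate leakage
    shared = set(toy_groups[train_idx]) & set(toy_groups[test_idx])
    overlaps.append(len(shared))

print("groups shared between train and test per fold:", overlaps)
```

The same loop can be run with `X`, `y` and `groups` from the cell above to check the real split.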

## 2. Load Rankings (Output of Phase 2)

In [None]:
import os
import glob

# Load all rankings (sorted so that the processing order is reproducible)
ranking_files = sorted(glob.glob('../results/rankings/phase2_ranking_*.csv'))

rankings = {}
for file_path in ranking_files:
    # File name pattern: phase2_ranking_<method>.csv
    method_name = os.path.basename(file_path).replace('phase2_ranking_', '').replace('.csv', '')
    rankings[method_name] = pd.read_csv(file_path)

print(f"✓ {len(rankings)} rankings loaded:")
for method_name in rankings:
    print(f"  - {method_name}")
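
The later cells select features with `ranking_df.head(k)['feature']`, so each ranking needs a `feature` column whose entries exist in the data. A minimal sanity check could look like the sketch below. The helper `check_ranking` and the toy feature names are hypothetical and not part of the project's `utils`.

```python
import pandas as pd


def check_ranking(ranking_df, available_features, name="ranking"):
    """Raise ValueError if a ranking cannot be used for top-k selection."""
    if "feature" not in ranking_df.columns:
        raise ValueError(f"{name}: column 'feature' is missing")
    ranked = ranking_df["feature"]
    if ranked.duplicated().any():
        raise ValueError(f"{name}: duplicate feature names")
    unknown = sorted(set(ranked) - set(available_features))
    if unknown:
        raise ValueError(f"{name}: features not in the data: {unknown}")
    return True


# Toy example with hypothetical feature names
toy_ranking = pd.DataFrame({"feature": ["f2", "f1", "f3"]})
ok = check_ranking(toy_ranking, ["f1", "f2", "f3"], name="toy")
print(ok)
```

In this notebook the check could be applied as `check_ranking(ranking_df, feature_cols, method_name)` for every entry of `rankings`.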

## 3. Define Reduction Levels

In [None]:
# 10 reduction levels (fraction of features kept)
REDUCTION_PERCENTAGES = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]

# Number of features per level (rounded down, at least 1)
n_total_features = X.shape[1]
reduction_steps = [max(1, int(n_total_features * p)) for p in REDUCTION_PERCENTAGES]

print("Reduction levels:")
for pct, n_feat in zip(REDUCTION_PERCENTAGES, reduction_steps):
    print(f"  {int(pct*100):2d}%: {n_feat:3d} features")

## 4. Evaluation Pipeline

In [None]:
def evaluate_feature_subset(
    X, y, groups,
    feature_subset,
    classifier_name='LDA',
    n_splits=5,
    random_state=42
):
    """
    Evaluate a feature subset with LDA or QDA via GroupKFold CV.

    CRITICAL: preprocessing is fitted inside each fold, on the training data only.

    Note: `random_state` is accepted but not used in this function.

    Returns:
    --------
    Aggregated metrics with CI, as returned by `aggregate_cv_metrics`
    (indexed by metric name; the main loop reads 'mean', 'std',
    'ci_lower' and 'ci_upper' from it).
    """
    # Subset of the data
    X_subset = X[feature_subset].copy()

    # Classifier
    if classifier_name == 'LDA':
        clf_base = LinearDiscriminantAnalysis(solver='lsqr', shrinkage='auto')
    elif classifier_name == 'QDA':
        clf_base = QuadraticDiscriminantAnalysis(reg_param=0.1)  # regularization!
    else:
        raise ValueError(f"Unknown classifier: {classifier_name}")

    # Pipeline: imputation → scaling → classifier
    pipeline = Pipeline([
        ('imputer', SimpleImputer(strategy='median')),
        ('scaler', StandardScaler()),
        ('classifier', clf_base)
    ])

    # GroupKFold
    gkf = create_group_kfold_splits(n_splits=n_splits)

    cv_results = []

    for train_idx, test_idx in gkf.split(X_subset, y, groups):
        X_train, X_test = X_subset.iloc[train_idx], X_subset.iloc[test_idx]
        y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]

        try:
            # Fit the pipeline (preprocessing + classifier) on the training fold only
            pipeline.fit(X_train, y_train)

            # Predict
            y_pred = pipeline.predict(X_test)

            # Metrics
            fold_metrics = compute_classification_metrics(y_test, y_pred)
            cv_results.append(fold_metrics)

        except Exception as exc:
            # On failure (e.g. QDA singularity): report it and record NaN metrics
            print(f"  [{classifier_name}, {len(feature_subset)} features] fold failed: {exc}")
            cv_results.append({
                'balanced_accuracy': np.nan,
                'f1_macro': np.nan,
                'cohen_kappa': np.nan,
                'accuracy': np.nan
            })

    # Aggregation
    aggregated = aggregate_cv_metrics(cv_results)

    return aggregated


print("✓ Evaluation pipeline defined")
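
The helpers `compute_classification_metrics` and `aggregate_cv_metrics` live in `utils.metrics` and are not shown in this notebook. The sketch below is a guess at a compatible shape, based only on how the cells here use them: a dict with four metric keys per fold, and a table indexed by metric with the columns `mean`, `std`, `ci_lower` and `ci_upper`. The function names ending in `_sketch` are hypothetical, and the real implementation may differ.

```python
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             cohen_kappa_score, f1_score)


def compute_metrics_sketch(y_true, y_pred):
    """Per-fold metrics, keyed like the NaN fallback in the cell above."""
    return {
        "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),
        "f1_macro": f1_score(y_true, y_pred, average="macro", zero_division=0),
        "cohen_kappa": cohen_kappa_score(y_true, y_pred),
        "accuracy": accuracy_score(y_true, y_pred),
    }


def aggregate_metrics_sketch(fold_results, confidence=0.95):
    """Table indexed by metric with columns mean, std, ci_lower, ci_upper."""
    folds = pd.DataFrame(fold_results)            # one row per fold
    n = folds.notna().sum()                       # valid folds per metric
    mean = folds.mean()                           # NaN folds are skipped
    std = folds.std(ddof=1)
    dof = np.maximum(n.to_numpy() - 1, 1)         # avoid df = 0
    t_crit = stats.t.ppf(0.5 + confidence / 2, df=dof)
    half = t_crit * std / np.sqrt(n)              # half-width of the interval
    return pd.DataFrame({"mean": mean, "std": std,
                         "ci_lower": mean - half, "ci_upper": mean + half})


# Two toy folds with made-up labels
fold_a = compute_metrics_sketch([0, 1, 2, 2], [0, 1, 2, 1])
fold_b = compute_metrics_sketch([0, 1, 2, 2], [0, 1, 2, 2])
summary = aggregate_metrics_sketch([fold_a, fold_b])
print(summary.round(3))
```

Skipping NaN folds in the mean keeps a single failed QDA fold from wiping out the whole row, at the cost of an interval that is based on fewer folds.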

## 5. Main Evaluation Loop

**WARNING:** This can take several minutes (8 rankings × 10 levels × 2 classifiers × 5 folds = 800 model fits)!

In [None]:
# Storage for results
results_master = []

# Loop over rankings
for method_name, ranking_df in tqdm(rankings.items(), desc="Methods"):
    print(f"\n{'='*70}")
    print(f"Evaluating: {method_name}")
    print(f"{'='*70}")

    # Loop over reduction levels
    for k_features in tqdm(reduction_steps, desc="Reduction levels", leave=False):
        # Top-k features (assumes the ranking file is sorted from best to worst)
        top_k_features = ranking_df.head(k_features)['feature'].tolist()
        
        # Result row for this (ranking, reduction level) combination
        result_row = {
            'method': method_name,
            'n_features': k_features,
            'pct_features': k_features / n_total_features
        }

        # Evaluate both classifiers on the same top-k subset
        for clf_name in ('LDA', 'QDA'):
            try:
                clf_metrics = evaluate_feature_subset(
                    X=X, y=y, groups=groups,
                    feature_subset=top_k_features,
                    classifier_name=clf_name,
                    n_splits=5
                )
            except Exception as exc:
                # Catch Exception only, so that KeyboardInterrupt can still stop the loop
                print(f"  {clf_name} with {k_features} features failed: {exc}")
                continue

            # Copy mean, std and CI bounds into flat columns, e.g. 'LDA_f1_macro_mean'
            for metric in ('balanced_accuracy', 'f1_macro', 'cohen_kappa', 'accuracy'):
                if metric in clf_metrics.index:
                    for stat in ('mean', 'std', 'ci_lower', 'ci_upper'):
                        result_row[f'{clf_name}_{metric}_{stat}'] = clf_metrics.loc[metric, stat]
        
        results_master.append(result_row)

# Results DataFrame
results_df = pd.DataFrame(results_master)

print("\n" + "="*70)
print("✓ EVALUATION COMPLETE")
print("="*70)
print(f"Result rows (ranking × reduction level): {len(results_df)}")
print(f"Methods: {results_df['method'].nunique()}")
print(f"Reduction levels: {results_df['n_features'].nunique()}")

## 6. Display Results

In [None]:
# Example: ANOVA ranking
if 'ANOVA' in results_df['method'].values:
    anova_results = results_df[results_df['method'] == 'ANOVA'].sort_values('n_features', ascending=False)

    print("\nANOVA ranking - LDA performance:")
    print(anova_results[['n_features', 'LDA_balanced_accuracy_mean', 'LDA_balanced_accuracy_ci_lower', 'LDA_balanced_accuracy_ci_upper']].to_string(index=False))

    print("\nANOVA ranking - QDA performance:")
    print(anova_results[['n_features', 'QDA_balanced_accuracy_mean', 'QDA_balanced_accuracy_ci_lower', 'QDA_balanced_accuracy_ci_upper']].to_string(index=False))

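## 7. Visualization: Pareto Curves

### 7.1 LDA Pareto Curve

In [None]:
# Make sure the plot directory exists before the helper saves into it
os.makedirs('../results/plots', exist_ok=True)

# Prepare data for the Pareto plot (LDA)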
pareto_data_lda = results_df[['method', 'n_features', 'LDA_balanced_accuracy_mean', 
                               'LDA_balanced_accuracy_ci_lower', 'LDA_balanced_accuracy_ci_upper']].copy()
pareto_data_lda.columns = ['method', 'n_features', 'mean', 'ci_lower', 'ci_upper']
pareto_data_lda = pareto_data_lda.dropna()

# Plot
fig_lda = plot_pareto_curve(
    results_df=pareto_data_lda,
    metric='balanced_accuracy',
    classifier='LDA',
    save_path='../results/plots/phase3_pareto_lda.png'
)
plt.show()

### 7.2 QDA Pareto Curve

In [None]:
# Prepare data for the Pareto plot (QDA)
pareto_data_qda = results_df[['method', 'n_features', 'QDA_balanced_accuracy_mean', 
                               'QDA_balanced_accuracy_ci_lower', 'QDA_balanced_accuracy_ci_upper']].copy()
pareto_data_qda.columns = ['method', 'n_features', 'mean', 'ci_lower', 'ci_upper']
pareto_data_qda = pareto_data_qda.dropna()

# Plot
fig_qda = plot_pareto_curve(
    results_df=pareto_data_qda,
    metric='balanced_accuracy',
    classifier='QDA',
    save_path='../results/plots/phase3_pareto_qda.png'
)
plt.show()
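
What `plot_pareto_curve` draws is defined in `utils.visualization` and not shown here. If only the Pareto-optimal points are of interest (no other point has at most as many features and at least the same score), they could be extracted with a sketch like the one below. The function `pareto_front` is hypothetical, the numbers are made up, and on exact ties only the first point is kept.

```python
import pandas as pd


def pareto_front(points, size_col="n_features", score_col="mean"):
    """Keep the rows that no other row beats on both size and score."""
    # Smallest subsets first; within one size, best score first
    ordered = points.sort_values([size_col, score_col], ascending=[True, False])
    best_so_far = float("-inf")
    keep = []
    for idx, row in ordered.iterrows():
        # A larger subset is only worth keeping if it scores strictly higher
        if row[score_col] > best_so_far:
            keep.append(idx)
            best_so_far = row[score_col]
    return points.loc[keep]


# Made-up numbers, for illustration only
toy = pd.DataFrame({
    "method": ["A", "B", "A", "B", "A"],
    "n_features": [4, 4, 8, 8, 16],
    "mean": [0.60, 0.55, 0.70, 0.72, 0.71],
})
front = pareto_front(toy)
print(front)
```

Applied to `pareto_data_lda` or `pareto_data_qda`, this would return the trade-off points across all ranking methods. Grouping by `method` first would give one front per ranking.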

### 7.3 LDA vs. QDA Comparison (Best Method at 50% of Features)

In [None]:
# Select the rows whose feature fraction lies within 5 percentage points of 50%
target_pct = 0.50
subset_50 = results_df[np.abs(results_df['pct_features'] - target_pct) < 0.05].copy()

for clf_name in ('LDA', 'QDA'):
    mean_col = f'{clf_name}_balanced_accuracy_mean'
    lower_col = f'{clf_name}_balanced_accuracy_ci_lower'
    upper_col = f'{clf_name}_balanced_accuracy_ci_upper'

    # Drop rows without a valid score, otherwise idxmax() fails on all-NaN input
    candidates = subset_50.dropna(subset=[mean_col])
    if candidates.empty:
        print(f"No valid {clf_name} results at ~50% of features\n")
        continue

    # Best ranking method for this classifier
    best = candidates.loc[candidates[mean_col].idxmax()]
    print(f"Best {clf_name} performance at ~50% of features:")
    print(f"  Method: {best['method']}")
    print(f"  Features: {best['n_features']}")
    print(f"  Balanced accuracy: {best[mean_col]:.3f} "
          f"[{best[lower_col]:.3f}, {best[upper_col]:.3f}]\n")

## 8. Save Results

In [None]:
import os
os.makedirs('../results/evaluations', exist_ok=True)

# Master table
output_path = '../results/evaluations/phase3_evaluation_master.csv'
results_df.to_csv(output_path, index=False)
print(f"✓ Evaluation results saved: {output_path}")

# Compact version (balanced accuracy only)
compact_cols = ['method', 'n_features', 'pct_features',
                'LDA_balanced_accuracy_mean', 'LDA_balanced_accuracy_ci_lower', 'LDA_balanced_accuracy_ci_upper',
                'QDA_balanced_accuracy_mean', 'QDA_balanced_accuracy_ci_lower', 'QDA_balanced_accuracy_ci_upper']
compact_df = results_df[compact_cols].copy()

compact_path = '../results/evaluations/phase3_evaluation_compact.csv'
compact_df.to_csv(compact_path, index=False)
print(f"✓ Compact evaluation saved: {compact_path}")

---
## ✓ Phase 3 complete!

**Next step:** Notebook 4 - Phase 4: Consensus analysis and final ranking