# Phase 4: Consensus Analysis and Final Ranking

**Master's thesis:** Non-destructive materials testing using 3MA-X8 micromagnetics  
**Input:** 8 feature rankings from Phase 2  
**Output:** Method-agnostic core set of robust features

---

## Methodological Foundations

### Consensus Score Calculation

1. **Rank normalization:**  
   For each method $m$ and feature $f$:
   $$\text{norm\_rank}_{m,f} = 1 - \frac{\text{rank}_{m,f} - 1}{N_{\text{features}} - 1}$$
   
   → Rank 1 → score 1.0  
   → Rank N → score 0.0

2. **Consensus score (unweighted):**  
   $$\text{Consensus}_f = \frac{1}{M} \sum_{m=1}^{M} \text{norm\_rank}_{m,f}$$
   
   where $M = 8$ (number of methods)

3. **Robustness metric:**  
   $$\text{Rank\_Variance}_f = \text{Var}(\text{rank}_{1,f}, \ldots, \text{rank}_{M,f})$$
   
   → Low variance = high agreement between methods
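
The three steps above can be traced on a small example. The sketch below is a minimal illustration, assuming three hypothetical methods and four made-up features (`feat_1` to `feat_4`); the ranks are invented and are not results from the thesis. It uses the sample variance, matching `ddof=1` in the notebook code.

```python
from statistics import mean, variance

# Hypothetical ranks (1 = best) of four features under three methods.
ranks = {
    "feat_1": [1, 1, 2],
    "feat_2": [2, 3, 1],
    "feat_3": [3, 2, 4],
    "feat_4": [4, 4, 3],
}
n_features = len(ranks)


def norm_rank(rank, n):
    """Map rank 1 to 1.0 and rank n to 0.0, linearly in between."""
    return 1 - (rank - 1) / (n - 1)


summary = {}
for feature, method_ranks in ranks.items():
    scores = [norm_rank(r, n_features) for r in method_ranks]
    summary[feature] = {
        "consensus": mean(scores),                # unweighted mean over methods
        "rank_variance": variance(method_ranks),  # sample variance (n - 1)
    }

# Print features from highest to lowest consensus score.
for feature, stats in sorted(summary.items(), key=lambda kv: -kv[1]["consensus"]):
    print(f"{feature}: consensus={stats['consensus']:.3f}, "
          f"rank_variance={stats['rank_variance']:.3f}")
```

In this toy example `feat_1` and `feat_4` share the same rank variance but sit at opposite ends of the consensus ranking, which is why the variance is only meaningful alongside the score.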

### Validation

The final consensus ranking is evaluated separately with LDA/QDA (10 reduction levels, GroupKFold cross-validation).

---

In [None]:
# Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from tqdm.notebook import tqdm
import warnings
warnings.filterwarnings('ignore')

# Custom Utilities
import sys
sys.path.append('..')
from utils.validation import create_group_kfold_splits
from utils.metrics import compute_classification_metrics, aggregate_cv_metrics, compare_classifiers
from utils.visualization import plot_pareto_curve, plot_feature_reduction_timeline

# Plotting
plt.style.use('seaborn-v0_8-darkgrid')
%matplotlib inline

## 1. Load Rankings (Output of Phase 2)

In [None]:
import os
import glob

# Load all Phase 2 rankings (sorted for a reproducible method order)
ranking_files = sorted(glob.glob('../results/rankings/phase2_ranking_*.csv'))

rankings = {}
for file_path in ranking_files:
    method_name = os.path.basename(file_path).replace('phase2_ranking_', '').replace('.csv', '')
    rankings[method_name] = pd.read_csv(file_path)

print(f"✓ {len(rankings)} rankings loaded:")
for method_name, ranking_df in rankings.items():
    print(f"  - {method_name}: {len(ranking_df)} features")

## 2. Consensus Score Calculation

### 2.1 Rank Normalization

In [None]:
def normalize_ranks(ranking_df, rank_col='final_rank'):
    """
    Normalize ranks to [0, 1].
    
    Rank 1 → 1.0 (best)
    Rank N → 0.0 (worst)
    """
    ranks = ranking_df[rank_col].values
    n_features = len(ranks)
    
    # A ranking with fewer than two features has no spread; avoid division by zero
    if n_features < 2:
        return np.ones(n_features)
    
    # Linear rescaling of the ranks
    normalized = 1 - (ranks - 1) / (n_features - 1)
    
    return normalized

# Normalize all rankings
normalized_rankings = {}

for method_name, ranking_df in rankings.items():
    df_norm = ranking_df.copy()
    df_norm['normalized_score'] = normalize_ranks(df_norm, rank_col='final_rank')
    normalized_rankings[method_name] = df_norm

print("✓ Ranks normalized")

# Example: ANOVA
if 'ANOVA' in normalized_rankings:
    print("\nExample (ANOVA, top 10):")
    print(normalized_rankings['ANOVA'][['feature', 'final_rank', 'normalized_score']].head(10))

### 2.2 Consensus Score (Mean Across Methods)

In [None]:
# Collect all features
all_features = set()
for ranking_df in normalized_rankings.values():
    all_features.update(ranking_df['feature'].tolist())

all_features = sorted(all_features)
print(f"✓ Total features: {len(all_features)}")

# Consensus data structure
consensus_data = []

for feature in all_features:
    feature_scores = []
    feature_ranks = []
    
    for method_name, ranking_df in normalized_rankings.items():
        # Look up the feature in this ranking
        feature_row = ranking_df[ranking_df['feature'] == feature]
        
        if len(feature_row) > 0:
            feature_scores.append(feature_row.iloc[0]['normalized_score'])
            feature_ranks.append(feature_row.iloc[0]['final_rank'])
        else:
            # Fallback if the feature is missing (should not happen): worst score and rank
            feature_scores.append(0.0)
            feature_ranks.append(len(all_features))
    
    # Consensus score (mean)
    consensus_score = np.mean(feature_scores)
    
    # Rank variance (robustness), computed as sample variance
    rank_variance = np.var(feature_ranks, ddof=1)
    rank_std = np.std(feature_ranks, ddof=1)
    
    consensus_data.append({
        'feature': feature,
        'consensus_score': consensus_score,
        'rank_variance': rank_variance,
        'rank_std': rank_std,
        'mean_rank': np.mean(feature_ranks),
        'median_rank': np.median(feature_ranks)
    })

# Consensus DataFrame
consensus_df = pd.DataFrame(consensus_data)
consensus_df = consensus_df.sort_values('consensus_score', ascending=False).reset_index(drop=True)
consensus_df['consensus_rank'] = range(1, len(consensus_df) + 1)

print("\n✓ Consensus ranking created")
print("\nTop 20 features (consensus):")
print(consensus_df.head(20)[['consensus_rank', 'feature', 'consensus_score', 'rank_std']].to_string(index=False))

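## 3. Robustness Assessment

Features with low rank variance are ranked consistently across methods, i.e. the methods agree on their position. Low variance alone does not imply a high rank: a feature that every method places near the bottom is also consistent, so robustness should be read together with the consensus rank.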
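
Rank spread and rank position are separate properties, so it can help to filter on both. The sketch below is only an illustration: the rows are made up (they mirror the `consensus_rank` and `rank_std` columns of `consensus_df`), and the thresholds `max_rank` and `max_std` are hypothetical and not part of the original analysis.

```python
# Made-up rows mirroring the columns of consensus_df (values are illustrative).
rows = [
    {"feature": "feat_a", "consensus_rank": 1, "rank_std": 0.6},
    {"feature": "feat_b", "consensus_rank": 2, "rank_std": 9.5},
    {"feature": "feat_c", "consensus_rank": 3, "rank_std": 1.2},
    {"feature": "feat_d", "consensus_rank": 80, "rank_std": 0.4},
]

# Hypothetical thresholds, chosen only for this example.
max_rank = 10   # keep features inside the consensus top 10
max_std = 2.0   # whose rank also varies little between methods

robust_and_strong = [
    row["feature"]
    for row in rows
    if row["consensus_rank"] <= max_rank and row["rank_std"] <= max_std
]

# Sorting by rank_std alone puts feat_d first, although it is ranked 80th.
most_consistent = min(rows, key=lambda row: row["rank_std"])["feature"]

print(robust_and_strong)  # ['feat_a', 'feat_c']
print(most_consistent)    # feat_d
```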

In [None]:
# 20 most robust features (lowest rank standard deviation)
robust_features = consensus_df.nsmallest(20, 'rank_std')

print("20 most robust features (lowest rank standard deviation):")
print(robust_features[['feature', 'consensus_rank', 'consensus_score', 'rank_std']].to_string(index=False))

## 4. Consensus Ranking Validation

Evaluate the consensus ranking with LDA/QDA (10 reduction levels).
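
To make the ten reduction levels concrete, the sketch below computes how many features are kept at each level. It assumes 84 features after Phase 1 (the pipeline summary gives roughly 84; the exact count depends on the data) together with the percentages used in this notebook.

```python
# Fractions of features kept at each level (as used in this notebook).
reduction_percentages = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]

# Assumption for illustration: 84 features remain after Phase 1.
n_total_features = 84

# int() truncates toward zero; max(1, ...) guarantees at least one feature.
reduction_steps = [max(1, int(n_total_features * p)) for p in reduction_percentages]
print(reduction_steps)  # [75, 67, 58, 50, 42, 33, 25, 16, 8, 4]

# With very few features, truncation produces repeated levels.
small_steps = [max(1, int(10 * p)) for p in reduction_percentages]
print(len(set(small_steps)) < len(small_steps))  # True
```

Repeated levels matter for the elbow search in Section 7, because a zero difference in feature count between two adjacent levels makes the gradient undefined.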

In [None]:
# Load data
DATA_PATH = '../data/processed/features_after_phase1.csv'
df = pd.read_csv(DATA_PATH)

TARGET_COL = 'class'
GROUP_COL = 'sample_id'

feature_cols = [col for col in df.columns if col not in [TARGET_COL, GROUP_COL]]
X = df[feature_cols].copy()
y = df[TARGET_COL].copy()
groups = df[GROUP_COL].copy()

print(f"✓ Data loaded: {X.shape}")

In [None]:
# Evaluation function (from Phase 3)
def evaluate_feature_subset(
    X, y, groups,
    feature_subset,
    classifier_name='LDA',
    n_splits=5
):
    """
    Evaluate a feature subset with LDA or QDA using GroupKFold cross-validation.
    """
    X_subset = X[feature_subset].copy()
    
    if classifier_name == 'LDA':
        clf_base = LinearDiscriminantAnalysis(solver='lsqr', shrinkage='auto')
    elif classifier_name == 'QDA':
        clf_base = QuadraticDiscriminantAnalysis(reg_param=0.1)
    else:
        raise ValueError(f"Unknown classifier: {classifier_name}")
    
    pipeline = Pipeline([
        ('imputer', SimpleImputer(strategy='median')),
        ('scaler', StandardScaler()),
        ('classifier', clf_base)
    ])
    
    gkf = create_group_kfold_splits(n_splits=n_splits)
    
    cv_results = []
    
    for train_idx, test_idx in gkf.split(X_subset, y, groups):
        X_train, X_test = X_subset.iloc[train_idx], X_subset.iloc[test_idx]
        y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]
        
        try:
            pipeline.fit(X_train, y_train)
            y_pred = pipeline.predict(X_test)
            fold_metrics = compute_classification_metrics(y_test, y_pred)
            cv_results.append(fold_metrics)
        except Exception:  # a failed fold is recorded as NaN; KeyboardInterrupt still propagates
            cv_results.append({
                'balanced_accuracy': np.nan,
                'f1_macro': np.nan,
                'cohen_kappa': np.nan,
                'accuracy': np.nan
            })
    
    aggregated = aggregate_cv_metrics(cv_results)
    return aggregated

print("✓ Evaluation function defined")

In [None]:
# Reduction levels (fraction of features retained)
REDUCTION_PERCENTAGES = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
n_total_features = len(consensus_df)
reduction_steps = [max(1, int(n_total_features * p)) for p in REDUCTION_PERCENTAGES]

# Evaluation
consensus_results = []

print("Evaluating consensus ranking...\n")

for k_features in tqdm(reduction_steps, desc="Reduction levels"):
    # Top-k features
    top_k_features = consensus_df.head(k_features)['feature'].tolist()
    
    # LDA
    try:
        lda_metrics = evaluate_feature_subset(
            X=X, y=y, groups=groups,
            feature_subset=top_k_features,
            classifier_name='LDA',
            n_splits=5
        )
    except Exception:
        lda_metrics = None
    
    # QDA
    try:
        qda_metrics = evaluate_feature_subset(
            X=X, y=y, groups=groups,
            feature_subset=top_k_features,
            classifier_name='QDA',
            n_splits=5
        )
    except Exception:
        qda_metrics = None
    
    # Store results for this level
    result_row = {
        'method': 'Consensus',
        'n_features': k_features,
        'pct_features': k_features / n_total_features
    }
    
    if lda_metrics is not None:
        for metric in ['balanced_accuracy', 'f1_macro', 'cohen_kappa']:
            if metric in lda_metrics.index:
                result_row[f'LDA_{metric}_mean'] = lda_metrics.loc[metric, 'mean']
                result_row[f'LDA_{metric}_ci_lower'] = lda_metrics.loc[metric, 'ci_lower']
                result_row[f'LDA_{metric}_ci_upper'] = lda_metrics.loc[metric, 'ci_upper']
    
    if qda_metrics is not None:
        for metric in ['balanced_accuracy', 'f1_macro', 'cohen_kappa']:
            if metric in qda_metrics.index:
                result_row[f'QDA_{metric}_mean'] = qda_metrics.loc[metric, 'mean']
                result_row[f'QDA_{metric}_ci_lower'] = qda_metrics.loc[metric, 'ci_lower']
                result_row[f'QDA_{metric}_ci_upper'] = qda_metrics.loc[metric, 'ci_upper']
    
    consensus_results.append(result_row)

consensus_eval_df = pd.DataFrame(consensus_results)

print("\n✓ Consensus ranking evaluated")
print("\nPerformance overview (balanced accuracy):")
print(consensus_eval_df[['n_features', 'LDA_balanced_accuracy_mean', 'QDA_balanced_accuracy_mean']].to_string(index=False))

## 5. Visualization: Consensus Pareto Curve

In [None]:
# LDA Pareto
pareto_data = consensus_eval_df[['method', 'n_features', 'LDA_balanced_accuracy_mean',
                                  'LDA_balanced_accuracy_ci_lower', 'LDA_balanced_accuracy_ci_upper']].copy()
pareto_data.columns = ['method', 'n_features', 'mean', 'ci_lower', 'ci_upper']
pareto_data = pareto_data.dropna()

fig = plot_pareto_curve(
    results_df=pareto_data,
    metric='balanced_accuracy',
    classifier='LDA (Consensus Ranking)',
    save_path='../results/plots/phase4_consensus_pareto_lda.png'
)
plt.show()

## 6. Feature Reduction Timeline (Overview)

In [None]:
# Timeline data
reduction_history = [
    {'phase': 'Phase 0', 'n_features': 261, 'description': 'Initial (3MA-X8)'},
    {'phase': 'Phase 1', 'n_features': len(consensus_df), 'description': 'After pre-pruning'},
    {'phase': 'Phase 4 (50%)', 'n_features': int(len(consensus_df) * 0.5), 'description': 'Consensus 50%'},
    {'phase': 'Phase 4 (20%)', 'n_features': int(len(consensus_df) * 0.2), 'description': 'Consensus 20%'},
    {'phase': 'Phase 4 (10%)', 'n_features': int(len(consensus_df) * 0.1), 'description': 'Consensus 10%'}
]

fig = plot_feature_reduction_timeline(
    reduction_history=reduction_history,
    save_path='../results/plots/phase4_reduction_timeline.png'
)
plt.show()

## 7. Final Recommendation: Optimal Feature Set

Identification of the "elbow point" (best trade-off between feature count and accuracy).
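
The elbow search can be illustrated on made-up numbers. The sketch below assumes a hypothetical accuracy curve over decreasing feature counts, including one failed level (NaN), and locates the segment with the largest absolute gradient. Which end of that segment to report is a convention; other elbow definitions exist.

```python
import numpy as np

# Made-up values: feature counts in decreasing order and LDA balanced accuracy.
n_features_arr = np.array([75, 67, 58, 50, 42, 33, 25, 16, 8, 4])
balanced_acc = np.array([0.90, 0.90, 0.91, 0.91, np.nan, 0.90, 0.89, 0.88, 0.80, 0.60])

# Accuracy change per feature between adjacent levels.
gradients = np.diff(balanced_acc) / np.diff(n_features_arr)

# np.argmax would return the first NaN position; np.nanargmax ignores NaN.
steepest_segment = int(np.nanargmax(np.abs(gradients)))

k_before = int(n_features_arr[steepest_segment])      # larger set, before the change
k_after = int(n_features_arr[steepest_segment + 1])   # smaller set, after the change

print(f"Steepest change between {k_before} and {k_after} features")
```

In this example accuracy falls sharply across the steepest segment, so the larger set (8 features) retains the performance while the smaller set (4 features) does not. It is worth checking the direction of the change before settling on one end of the segment.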

In [None]:
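# Find the elbow point (largest absolute gradient)
balanced_acc = consensus_eval_df['LDA_balanced_accuracy_mean'].values
n_features_arr = consensus_eval_df['n_features'].values

# Gradient of accuracy with respect to feature count between adjacent levels
gradients = np.diff(balanced_acc) / np.diff(n_features_arr)
# nanargmax skips levels whose evaluation failed (NaN); +1 selects the
# smaller feature set at the end of the steepest segment
elbow_idx = np.nanargmax(np.abs(gradients)) + 1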

optimal_k = n_features_arr[elbow_idx]
optimal_perf = balanced_acc[elbow_idx]

print("="*70)
print("RECOMMENDED FEATURE SET (ELBOW POINT)")
print("="*70)
print(f"Number of features: {optimal_k}")
print(f"Percentage:         {optimal_k / n_total_features * 100:.1f}%")
print(f"LDA balanced accuracy: {optimal_perf:.3f}")
print("\nTop features:")
top_optimal = consensus_df.head(optimal_k)
print(top_optimal[['consensus_rank', 'feature', 'consensus_score']].head(20).to_string(index=False))
print("="*70)

## 8. Save Results

In [None]:
import os
os.makedirs('../results/rankings', exist_ok=True)

# 1. Full consensus ranking
consensus_path = '../results/rankings/phase4_consensus_ranking_full.csv'
consensus_df.to_csv(consensus_path, index=False)
print(f"✓ Consensus ranking saved: {consensus_path}")

# 2. Top-k consensus features (optimal set)
optimal_features_path = '../results/rankings/phase4_optimal_features.csv'
top_optimal.to_csv(optimal_features_path, index=False)
print(f"✓ Optimal feature set saved: {optimal_features_path}")

# 3. Evaluation results
os.makedirs('../results/evaluations', exist_ok=True)
consensus_eval_path = '../results/evaluations/phase4_consensus_evaluation.csv'
consensus_eval_df.to_csv(consensus_eval_path, index=False)
print(f"✓ Consensus evaluation saved: {consensus_eval_path}")

# 4. Method comparison table (all rankings at the optimal k, within ±2 features)
comparison_data = []
phase3_results = pd.read_csv('../results/evaluations/phase3_evaluation_master.csv')

for method in phase3_results['method'].unique():
    method_data = phase3_results[
        (phase3_results['method'] == method) & 
        (np.abs(phase3_results['n_features'] - optimal_k) <= 2)
    ]
    
    if len(method_data) > 0:
        best_row = method_data.iloc[0]
        comparison_data.append({
            'Method': method,
            'LDA_Mean': best_row.get('LDA_balanced_accuracy_mean', np.nan),
            'LDA_CI_Lower': best_row.get('LDA_balanced_accuracy_ci_lower', np.nan),
            'LDA_CI_Upper': best_row.get('LDA_balanced_accuracy_ci_upper', np.nan)
        })

# Add the consensus ranking
consensus_row = consensus_eval_df[consensus_eval_df['n_features'] == optimal_k].iloc[0]
comparison_data.append({
    'Method': 'Consensus',
    'LDA_Mean': consensus_row['LDA_balanced_accuracy_mean'],
    'LDA_CI_Lower': consensus_row['LDA_balanced_accuracy_ci_lower'],
    'LDA_CI_Upper': consensus_row['LDA_balanced_accuracy_ci_upper']
})

comparison_df = pd.DataFrame(comparison_data).sort_values('LDA_Mean', ascending=False)
comparison_path = '../results/evaluations/phase4_method_comparison.csv'
comparison_df.to_csv(comparison_path, index=False)
print(f"✓ Method comparison saved: {comparison_path}")

print("\nMethod comparison at optimal k={} features:".format(optimal_k))
print(comparison_df.to_string(index=False))

---
## ✓ Phase 4 complete!

## Pipeline Summary

| Phase | Input | Output | Main goal |
|-------|-------|--------|-----------|
| **Phase 1** | 261 features | ~84 features | Quality filtering and redundancy elimination |
| **Phase 2** | ~84 features | 8 rankings | Multi-method feature ranking (fold-aware) |
| **Phase 3** | 8 rankings | Performance curves | LDA/QDA benchmarking (10 levels) |
| **Phase 4** | 8 rankings | Consensus ranking | Method-agnostic core set |

### Final Outputs

1. **Consensus ranking:** `results/rankings/phase4_consensus_ranking_full.csv`
2. **Optimal feature set:** `results/rankings/phase4_optimal_features.csv`
3. **Pareto plots:** `results/plots/`
4. **Evaluation tables:** `results/evaluations/`

---

**For the master's thesis:**  
Use the optimal feature set for further experiments or for the final classification.