# Hybrid Ensemble System for Fraud Detection

## Tutorial 8: Advanced Ensemble Techniques and Meta-Learning

In this tutorial, you'll learn how to build sophisticated ensemble systems that combine multiple approaches:
- **Context-Aware Ensembles**: Adapt to transaction characteristics
- **Meta-Learning (Stacking)**: Learn from base model predictions
- **Dynamic Model Selection**: Choose optimal models for each prediction
- **Hierarchical Ensembles**: Multi-level ensemble architectures

## Learning Objectives

By the end of this tutorial, you'll understand:

1. **Advanced Ensemble Strategies**: Beyond simple voting and averaging
2. **Meta-Learning**: Train models on model predictions
3. **Context-Aware Systems**: Adapt behavior based on transaction context
4. **Dynamic Model Selection**: Choose best models for each prediction
5. **Hierarchical Ensembles**: Multi-level ensemble architectures
6. **Feature Engineering**: Create meaningful features at multiple levels
7. **Cross-Validation**: Proper validation to avoid overfitting
8. **Performance Optimization**: Combine complementary strengths

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, cross_val_score, StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.svm import SVC
from sklearn.cluster import KMeans
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score, roc_auc_score,
    confusion_matrix, classification_report, roc_curve, auc
)
import xgboost as xgb
from scipy import stats
from collections import defaultdict
import warnings
warnings.filterwarnings('ignore')

# Set style for better visualizations
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print("Hybrid Ensemble System for Fraud Detection")
print("Advanced ensemble techniques and meta-learning tutorial")

## Part 1: Understanding Ensemble Diversity

### The Power of Diversity

Ensemble methods work best when individual models:
- **Make different types of errors**: Complementary strengths
- **Use different algorithms**: Diverse learning approaches
- **Focus on different aspects**: Feature importance, decision boundaries
- **Perform well in different contexts**: Time, amount, pattern types
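One way to check the first bullet (different types of errors) is to measure how often two models disagree and how often they are wrong on the same samples. The sketch below is illustrative only: `disagreement_rate` and `error_overlap` are hypothetical helper names that the tutorial's classes do not use, and the toy labels are made up.

```python
import numpy as np

def disagreement_rate(pred_a, pred_b):
    """Fraction of samples on which two classifiers output different labels."""
    pred_a, pred_b = np.asarray(pred_a), np.asarray(pred_b)
    return float(np.mean(pred_a != pred_b))

def error_overlap(pred_a, pred_b, y_true):
    """Fraction of samples that BOTH classifiers get wrong (lower = more complementary)."""
    y_true = np.asarray(y_true)
    wrong_a = np.asarray(pred_a) != y_true
    wrong_b = np.asarray(pred_b) != y_true
    return float(np.mean(wrong_a & wrong_b))

# Toy example: each model makes two mistakes, but they share only one
y_true = np.array([0, 0, 1, 1, 0, 1])
pred_a = np.array([0, 1, 1, 0, 0, 1])  # wrong on samples 1 and 3
pred_b = np.array([0, 0, 1, 0, 1, 1])  # wrong on samples 3 and 4

print(disagreement_rate(pred_a, pred_b))      # 0.333... (2 of 6 samples)
print(error_overlap(pred_a, pred_b, y_true))  # 0.166... (only sample 3)
```

Two models with a low error overlap are good candidates for combining, because one can correct the other.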

### Traditional vs Advanced Ensembles

- **Traditional**: Simple voting, averaging, basic weighting
- **Advanced**: Meta-learning, context-aware weighting, dynamic selection
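The core difference between the two families is whether one weight vector is shared by every sample or looked up per sample. A minimal NumPy sketch, with hypothetical function names and toy probabilities:

```python
import numpy as np

def fixed_weight_average(probs, weights):
    """Traditional ensemble: one weight vector shared by every sample."""
    weights = np.asarray(weights, dtype=float)
    return probs @ (weights / weights.sum())

def context_weighted_average(probs, contexts, weights_by_context):
    """Context-aware ensemble: each sample uses the weight vector of its context."""
    w = np.array([weights_by_context[c] for c in contexts], dtype=float)
    w = w / w.sum(axis=1, keepdims=True)  # normalize each row to sum to 1
    return np.sum(probs * w, axis=1)

# Rows are samples, columns are two models' fraud probabilities
probs = np.array([[0.9, 0.2],
                  [0.1, 0.8]])

fixed = fixed_weight_average(probs, [0.5, 0.5])  # ≈ [0.55, 0.45]
ctx = context_weighted_average(
    probs, contexts=[0, 1],
    weights_by_context={0: [1.0, 0.0], 1: [0.0, 1.0]},  # trust model 0 in context 0, model 1 in context 1
)                                                        # ≈ [0.9, 0.8]
```

With fixed weights, the confident model is diluted on every sample; per-context weights let each sample lean on whichever model is trusted in its context.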

In [None]:
# Load and prepare data
df = pd.read_csv('creditcard.csv')
print(f"Dataset shape: {df.shape}")
print(f"Fraud rate: {df['Class'].mean()*100:.3f}%")

# Prepare features and target
X = df.drop('Class', axis=1)
y = df['Class']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

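# Scale features. The scaled arrays are kept for models that need them; the ensembles
# below are fit on the raw DataFrames because their context features look up the
# 'Time' and 'Amount' columns by name.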
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f"\nTraining set: {X_train.shape}")
print(f"Test set: {X_test.shape}")
print(f"Training fraud rate: {y_train.mean()*100:.3f}%")
print(f"Test fraud rate: {y_test.mean()*100:.3f}%")

# Visualize ensemble diversity concept
def visualize_ensemble_diversity():
    # (title, subtitle, formula, note 1, note 2) for each of the four panels
    panels = [
        ('Traditional Ensemble', 'Simple Voting/Averaging', 'Model 1 + Model 2 + Model 3',
         'Fixed weights', 'Same treatment for all predictions'),
        ('Context-Aware Ensemble', 'Adaptive Weighting', 'w₁(context) × Model 1 + ...',
         'Context-specific weights', 'Adapts to transaction characteristics'),
        ('Meta-Learning Ensemble', 'Stacking', 'Meta-Model(pred₁, pred₂, pred₃)',
         'Learns from predictions', 'Optimal combination strategy'),
        ('Dynamic Selection', 'Instance-based Selection', 'Select best models per instance',
         'Confidence-weighted', 'Optimal model per prediction'),
    ]
    
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    
    for ax, (title, subtitle, formula, note1, note2) in zip(axes.flat, panels):
        ax.text(0.5, 0.9, title, ha='center', fontsize=14, fontweight='bold')
        ax.text(0.5, 0.7, subtitle, ha='center', fontsize=12)
        ax.text(0.5, 0.5, formula, ha='center', fontsize=11, family='monospace')
        ax.text(0.5, 0.3, note1, ha='center', fontsize=10, style='italic')
        ax.text(0.5, 0.1, note2, ha='center', fontsize=10, style='italic')
        ax.set_xlim(0, 1)
        ax.set_ylim(0, 1)
        ax.axis('off')
    
    plt.tight_layout()
    plt.show()
    
    print("Key Advantages of Advanced Ensembles:")
    print("1. Context-Aware: Adapt to transaction characteristics")
    print("2. Meta-Learning: Learn optimal combination strategies")
    print("3. Dynamic: Choose best models for each prediction")
    print("4. Hierarchical: Multi-level ensemble architectures")

visualize_ensemble_diversity()

## Part 2: Context-Aware Ensemble

### Understanding Context in Fraud Detection

Different transaction contexts require different model emphasis:
- **Time of Day**: Different fraud patterns during business hours vs night
- **Transaction Amount**: High-value vs low-value transaction patterns
- **Feature Patterns**: Statistical characteristics of V1-V28 features
- **Historical Performance**: Which models work best in which contexts
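The Time and Amount bullets translate into simple vectorized features. This sketch assumes the creditcard.csv convention that `Time` is seconds elapsed since the first transaction in the dataset, so the derived hour is relative to that first transaction rather than a wall-clock hour. `context_frame` and `AMOUNT_EDGES` are hypothetical names; the bin edges are copied from `_get_amount_bin` in the class below.

```python
import numpy as np
import pandas as pd

# Hypothetical constant: same thresholds as _get_amount_bin (<=10, <=50, <=200, <=1000, above)
AMOUNT_EDGES = [10, 50, 200, 1000]

def context_frame(df):
    """Vectorized context features from 'Time' (seconds since first transaction) and 'Amount'."""
    out = pd.DataFrame(index=df.index)
    out['hour'] = (df['Time'] % 86400) // 3600  # hour relative to the first transaction
    # right=True gives bin i when edges[i-1] < amount <= edges[i], matching the <= tests
    out['amount_bin'] = np.digitize(df['Amount'], AMOUNT_EDGES, right=True)
    out['log_amount'] = np.log1p(df['Amount'])
    return out

demo = pd.DataFrame({'Time': [0, 13 * 3600 + 5, 86400 + 2 * 3600],
                     'Amount': [5.0, 120.0, 2500.0]})
ctx = context_frame(demo)
# ctx['hour'] -> [0, 13, 2]; ctx['amount_bin'] -> [0, 2, 4]
```

Working on whole columns at once avoids a Python loop over every transaction, which matters at the size of this dataset.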

### Context-Aware Weighting Strategy

1. **Extract Context Features**: Time, amount, statistical measures
2. **Cluster Similar Contexts**: Group transactions with similar characteristics
3. **Learn Context-Specific Weights**: Optimize weights for each context cluster
4. **Apply Dynamic Weighting**: Use context to determine model weights
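The four steps can be sketched end to end on synthetic data. This is an illustration, not the `ContextAwareEnsemble` code: it swaps the class's `Ridge(positive=True)` for `scipy.optimize.nnls` (plain non-negative least squares, no regularization), and the data is constructed so that model 0 is correct in one context and model 1 in the other. Function names are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls
from sklearn.cluster import KMeans

def learn_cluster_weights(context, base_probs, y, n_clusters=2, seed=0):
    """Steps 2-3: cluster the context features, then fit non-negative weights per cluster."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(context)
    n_models = base_probs.shape[1]
    weights = {}
    for c in range(n_clusters):
        mask = km.labels_ == c
        w, _ = nnls(base_probs[mask], y[mask].astype(float))  # least squares with w >= 0
        weights[c] = w / w.sum() if w.sum() > 0 else np.full(n_models, 1.0 / n_models)
    return km, weights

def apply_cluster_weights(km, weights, context, base_probs):
    """Step 4: assign each sample to a cluster and apply that cluster's weights."""
    labels = km.predict(context)
    w = np.vstack([weights[c] for c in labels])
    return np.sum(base_probs * w, axis=1)

# Step 1 stand-in: two well-separated synthetic "contexts" of 100 samples each
rng = np.random.default_rng(0)
context = np.vstack([rng.normal(-5, 0.5, (100, 2)), rng.normal(5, 0.5, (100, 2))])
y = rng.integers(0, 2, 200)
noise = rng.random(200)
base_probs = np.column_stack([y, noise]).astype(float)  # model 0 perfect, model 1 noise
base_probs[100:] = base_probs[100:, ::-1].copy()        # roles swap in the second context

km, weights = learn_cluster_weights(context, base_probs, y)
blended = apply_cluster_weights(km, weights, context, base_probs)
```

Each cluster ends up putting all its weight on the model that is reliable there, which no single global weight vector could do.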

In [None]:
class ContextAwareEnsemble:
    """
    Context-aware ensemble that adapts model weights based on transaction context.
    
    Key features:
    - Extracts contextual features (time, amount, statistical measures)
    - Uses clustering to identify similar transaction contexts
    - Learns optimal model weights for each context
    - Dynamically applies context-specific weights
    """
    
    def __init__(self, base_models, n_context_clusters=5):
        """
        Initialize context-aware ensemble.
        
        Args:
            base_models: Dictionary of base models
            n_context_clusters: Number of context clusters
        """
        self.base_models = base_models
        self.n_context_clusters = n_context_clusters
        self.model_names = list(base_models.keys())
        self.n_models = len(base_models)
        
        # Context analysis components
        # n_init is set explicitly because its default changed across scikit-learn versions
        self.context_clusterer = KMeans(n_clusters=n_context_clusters, n_init=10, random_state=42)
        self.context_weights = {}  # weights for each context cluster
        self.context_scaler = StandardScaler()
        
        # Performance tracking
        self.context_performance = defaultdict(lambda: defaultdict(list))
        
    def _extract_context_features(self, X):
        """
        Extract contextual features from transactions (vectorized).
        
        Args:
            X: Transaction features, either a DataFrame with 'Time', 'V1'..'V28'
               and 'Amount' columns, or an array in creditcard.csv column order
               (Time, V1..V28, Amount) with 'Class' already dropped
        
        Returns:
            Context feature matrix of shape (n_samples, 10)
        """
        if isinstance(X, pd.DataFrame):
            time_vals = X['Time'].to_numpy(dtype=float)
            amounts = X['Amount'].to_numpy(dtype=float)
            v_features = X[[f'V{i}' for i in range(1, 29)]].to_numpy(dtype=float)
        else:
            X = np.asarray(X, dtype=float)
            time_vals = X[:, 0]
            v_features = X[:, 1:29]  # V1-V28
            amounts = X[:, 29]       # column 28 is V28; Amount is column 29
        
        # Time-based features. 'Time' is seconds elapsed since the first transaction,
        # so these are relative hours/days, not calendar ones. creditcard.csv spans
        # about two days, so is_weekend is constant on that dataset.
        hour_of_day = (time_vals % (24 * 3600)) // 3600
        is_weekend = (((time_vals // (24 * 3600)) % 7) >= 5).astype(float)
        
        # Amount-based features
        amount_bin = np.vectorize(self._get_amount_bin, otypes=[float])(amounts)
        log_amount = np.log1p(amounts)
        
        # Row-wise statistical features from V1-V28, combined with the above
        return np.column_stack([
            hour_of_day,
            is_weekend,
            amount_bin,
            log_amount,
            v_features.mean(axis=1),
            v_features.std(axis=1),
            stats.skew(v_features, axis=1),
            stats.kurtosis(v_features, axis=1),
            v_features.max(axis=1),
            v_features.min(axis=1),
        ])
    
    def _get_amount_bin(self, amount):
        """
        Categorize transaction amount into bins.
        
        Args:
            amount: Transaction amount
        
        Returns:
            Amount bin (0-4)
        """
        if amount <= 10:
            return 0  # Very low
        elif amount <= 50:
            return 1  # Low
        elif amount <= 200:
            return 2  # Medium
        elif amount <= 1000:
            return 3  # High
        else:
            return 4  # Very high
    
    def fit(self, X, y):
        """
        Fit the context-aware ensemble.
        
        Args:
            X: Training features
            y: Training labels
        """
        print("Training context-aware ensemble...")
        
        # Train base models
        for name, model in self.base_models.items():
            print(f"  Training {name}...")
            model.fit(X, y)
        
        # Extract context features
        context_features = self._extract_context_features(X)
        
        # Scale context features
        context_features_scaled = self.context_scaler.fit_transform(context_features)
        
        # Cluster contexts
        context_clusters = self.context_clusterer.fit_predict(context_features_scaled)
        
        # Learn context-specific weights
        self._learn_context_weights(X, y, context_clusters)
        
        print(f"  Identified {self.n_context_clusters} context clusters")
        print(f"  Learned context-specific weights for {self.n_models} models")
    
    def _learn_context_weights(self, X, y, context_clusters):
        """
        Learn optimal model weights for each context cluster.
        
        Args:
            X: Training features
            y: Training labels
            context_clusters: Cluster assignments for each sample
        """
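        # Use out-of-fold predictions: in-sample predictions from the base models
        # fitted in fit() would reward overfitting models such as Random Forest.
        # cross_val_predict clones each model, so the fitted models are untouched.
        from sklearn.model_selection import cross_val_predict
        base_predictions = np.zeros((len(X), self.n_models))
        cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
        
        for i, (name, model) in enumerate(self.base_models.items()):
            if hasattr(model, 'predict_proba'):
                base_predictions[:, i] = cross_val_predict(
                    model, X, y, cv=cv, method='predict_proba')[:, 1]
            else:
                # Map decision scores to (0, 1), as predict_proba() does
                decisions = cross_val_predict(model, X, y, cv=cv, method='decision_function')
                base_predictions[:, i] = 1 / (1 + np.exp(-decisions))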
        
        # Learn weights for each context cluster
        for cluster_id in range(self.n_context_clusters):
            cluster_mask = context_clusters == cluster_id
            
            if np.sum(cluster_mask) < 10:  # Skip clusters with too few samples
                self.context_weights[cluster_id] = np.ones(self.n_models) / self.n_models
                continue
            
            cluster_predictions = base_predictions[cluster_mask]
            cluster_labels = y.iloc[cluster_mask] if isinstance(y, pd.Series) else y[cluster_mask]
            
            # Use Ridge regression to learn optimal weights
            ridge = Ridge(alpha=1.0, fit_intercept=False, positive=True)
            ridge.fit(cluster_predictions, cluster_labels)
            
            # Normalize weights to sum to 1
            weights = ridge.coef_
            weights = np.maximum(weights, 0)  # Ensure non-negative
            weights = weights / np.sum(weights) if np.sum(weights) > 0 else np.ones(self.n_models) / self.n_models
            
            self.context_weights[cluster_id] = weights
    
    def predict_proba(self, X):
        """
        Make predictions using context-aware weighting.
        
        Args:
            X: Features
        
        Returns:
            Prediction probabilities
        """
        # Extract context features
        context_features = self._extract_context_features(X)
        context_features_scaled = self.context_scaler.transform(context_features)
        
        # Assign context clusters
        context_clusters = self.context_clusterer.predict(context_features_scaled)
        
        # Get base model predictions
        base_predictions = np.zeros((len(X), self.n_models))
        
        for i, (name, model) in enumerate(self.base_models.items()):
            if hasattr(model, 'predict_proba'):
                base_predictions[:, i] = model.predict_proba(X)[:, 1]
            else:
                # Convert decision function to probabilities
                decisions = model.decision_function(X)
                base_predictions[:, i] = 1 / (1 + np.exp(-decisions))
        
        # Apply context-specific weights (uniform weights for any cluster without learned weights)
        uniform_weights = np.ones(self.n_models) / self.n_models
        sample_weights = np.vstack([
            self.context_weights.get(cluster_id, uniform_weights) for cluster_id in context_clusters
        ])
        ensemble_predictions = np.sum(base_predictions * sample_weights, axis=1)
        
        # Convert to probability format
        prob_positive = ensemble_predictions
        prob_negative = 1 - prob_positive
        
        return np.column_stack([prob_negative, prob_positive])
    
    def predict(self, X):
        """Make binary predictions."""
        probabilities = self.predict_proba(X)
        return (probabilities[:, 1] > 0.5).astype(int)
    
    def get_context_analysis(self):
        """
        Get analysis of context clusters and their weights.
        
        Returns:
            Dictionary with context analysis
        """
        analysis = {
            'n_clusters': self.n_context_clusters,
            'cluster_weights': self.context_weights,
            'model_names': self.model_names
        }
        
        return analysis

# Create and test context-aware ensemble
print("Creating Context-Aware Ensemble...")

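# Define base models. Logistic Regression gets its own StandardScaler because the
# raw 'Time' and 'Amount' columns are on very different scales from V1-V28;
# the tree-based models are insensitive to feature scaling.
from sklearn.pipeline import make_pipeline

base_models = {
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42, class_weight='balanced'),
    'XGBoost': xgb.XGBClassifier(n_estimators=100, random_state=42, scale_pos_weight=10, eval_metric='logloss'),
    'Logistic Regression': make_pipeline(
        StandardScaler(),
        LogisticRegression(random_state=42, class_weight='balanced', max_iter=1000)
    ),
    'Gradient Boosting': GradientBoostingClassifier(n_estimators=100, random_state=42)
}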

# Create context-aware ensemble
context_ensemble = ContextAwareEnsemble(base_models, n_context_clusters=5)

# Train ensemble
context_ensemble.fit(X_train, y_train)

# Make predictions
context_predictions = context_ensemble.predict(X_test)
context_probabilities = context_ensemble.predict_proba(X_test)

# Evaluate performance
context_accuracy = accuracy_score(y_test, context_predictions)
context_precision = precision_score(y_test, context_predictions)
context_recall = recall_score(y_test, context_predictions)
context_f1 = f1_score(y_test, context_predictions)
context_roc_auc = roc_auc_score(y_test, context_probabilities[:, 1])

print(f"\nContext-Aware Ensemble Performance:")
print(f"Accuracy: {context_accuracy:.4f}")
print(f"Precision: {context_precision:.4f}")
print(f"Recall: {context_recall:.4f}")
print(f"F1-Score: {context_f1:.4f}")
print(f"ROC-AUC: {context_roc_auc:.4f}")

# Analyze context clusters
context_analysis = context_ensemble.get_context_analysis()
print(f"\nContext Analysis:")
print(f"Number of clusters: {context_analysis['n_clusters']}")
print(f"\nCluster weights:")
for cluster_id, weights in context_analysis['cluster_weights'].items():
    print(f"  Cluster {cluster_id}:")
    for model_name, weight in zip(context_analysis['model_names'], weights):
        print(f"    {model_name}: {weight:.3f}")

## Part 3: Meta-Learning Ensemble (Stacking)

### Understanding Meta-Learning

Meta-learning in ensembles involves:
1. **Level 0**: Train base models on original data
2. **Level 1**: Use base model predictions as features for meta-model
3. **Cross-Validation**: Build meta-features from out-of-fold predictions, so the meta-model never learns from predictions a base model made on rows it was trained on
4. **Feature Engineering**: Create meaningful meta-features
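The level-0/level-1 split can be written with scikit-learn's `cross_val_predict`, which returns out-of-fold predictions. A minimal sketch on synthetic data from `make_classification`; the model choices, sizes, and the `stacked_predict_proba` helper name are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Imbalanced synthetic data (about 10% positives)
X, y = make_classification(n_samples=600, n_features=10, weights=[0.9, 0.1], random_state=0)
base_models = [RandomForestClassifier(n_estimators=50, random_state=0),
               LogisticRegression(max_iter=1000)]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Level 0: out-of-fold probabilities, so every meta-feature comes from a model
# that never saw that row during training
meta_X = np.column_stack([
    cross_val_predict(m, X, y, cv=cv, method='predict_proba')[:, 1] for m in base_models
])

# Level 1: the meta-model learns how to combine the base predictions
meta_model = LogisticRegression().fit(meta_X, y)

# For prediction, refit the base models on all training data
for m in base_models:
    m.fit(X, y)

def stacked_predict_proba(X_new):
    level0 = np.column_stack([m.predict_proba(X_new)[:, 1] for m in base_models])
    return meta_model.predict_proba(level0)[:, 1]
```

scikit-learn also provides `StackingClassifier` (with `estimators`, `final_estimator`, and `cv` parameters), which performs the same out-of-fold construction internally; the hand-written version only makes each level visible.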

### Advanced Meta-Features

Beyond simple predictions, we can create:
- **Agreement measures**: How much models agree/disagree
- **Confidence measures**: Model uncertainty indicators
- **Ranking features**: Relative model performance
- **Statistical features**: Mean, std, min, max of predictions
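Each meta-feature family above is a row-wise reduction over the matrix of base-model probabilities. A small worked example; `meta_feature_summary` is a hypothetical name, and the `margin` column is one possible confidence measure chosen for illustration (distance of the mean probability from the threshold), not something the tutorial's classes compute.

```python
import numpy as np

def meta_feature_summary(P, threshold=0.5):
    """P: (n_samples, n_models) base-model fraud probabilities."""
    votes = (P > threshold).astype(int)
    return {
        'mean': P.mean(axis=1),
        'std': P.std(axis=1),
        'min': P.min(axis=1),
        'max': P.max(axis=1),
        'vote_fraction': votes.mean(axis=1),                # share of models voting "fraud"
        'margin': np.abs(P.mean(axis=1) - threshold) * 2,   # 0 = undecided, 1 = unanimous extreme
        'ranks': np.argsort(np.argsort(P, axis=1), axis=1),  # 0 = lowest score in the row
    }

P = np.array([[0.9, 0.7, 0.2],
              [0.1, 0.3, 0.4]])
f = meta_feature_summary(P)
# f['vote_fraction'] -> [0.667, 0.0]; f['ranks'] -> [[2, 1, 0], [0, 1, 2]]
```

Note that ties between model scores are broken arbitrarily by `argsort`, so the ranking features are only meaningful when scores differ.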

In [None]:
class MetaLearnerEnsemble:
    """
    Meta-learning ensemble using stacking with advanced feature engineering.
    
    Key features:
    - Cross-validation for meta-feature generation
    - Advanced meta-feature engineering
    - Multiple meta-learners
    - Ensemble of meta-learners
    """
    
    def __init__(self, base_models, meta_models=None, cv_folds=5):
        """
        Initialize meta-learning ensemble.
        
        Args:
            base_models: Dictionary of base models
            meta_models: Dictionary of meta-learners
            cv_folds: Cross-validation folds for meta-feature generation
        """
        self.base_models = base_models
        self.cv_folds = cv_folds
        self.model_names = list(base_models.keys())
        self.n_models = len(base_models)
        
        # Default meta-learners if not provided
        if meta_models is None:
            self.meta_models = {
                'Meta_Logistic': LogisticRegression(random_state=42, class_weight='balanced'),
                'Meta_RF': RandomForestClassifier(n_estimators=50, random_state=42, class_weight='balanced'),
                'Meta_XGB': xgb.XGBClassifier(n_estimators=50, random_state=42, scale_pos_weight=10, eval_metric='logloss')
            }
        else:
            self.meta_models = meta_models
        
        # Meta-ensemble for combining meta-learners
        self.meta_ensemble = LogisticRegression(random_state=42, class_weight='balanced')
        
    def fit(self, X, y):
        """
        Fit the meta-learning ensemble.
        
        Args:
            X: Training features
            y: Training labels
        """
        print("Training meta-learning ensemble...")
        
        # Train base models
        for name, model in self.base_models.items():
            print(f"  Training base model: {name}")
            model.fit(X, y)
        
        # Generate meta-features using cross-validation
        print("  Generating meta-features...")
        meta_features = self._generate_meta_features(X, y)
        
        # Engineer additional meta-features
        print("  Engineering meta-features...")
        engineered_meta_features = self._engineer_meta_features(meta_features)
        
        # Train meta-learners
        print("  Training meta-learners...")
        for name, meta_model in self.meta_models.items():
            print(f"    Training {name}")
            meta_model.fit(engineered_meta_features, y)
        
        # Generate meta-meta-features for final ensemble
        meta_meta_features = np.zeros((len(X), len(self.meta_models)))
        
        # Use cross-validation to generate meta-meta-features
        skf = StratifiedKFold(n_splits=self.cv_folds, shuffle=True, random_state=42)
        
        for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
            # Train meta-learners on fold
            fold_meta_features = engineered_meta_features[train_idx]
            fold_labels = y.iloc[train_idx] if isinstance(y, pd.Series) else y[train_idx]
            
            for i, (name, meta_model) in enumerate(self.meta_models.items()):
                # Clone and train meta-model on fold
                from sklearn.base import clone
                fold_meta_model = clone(meta_model)
                fold_meta_model.fit(fold_meta_features, fold_labels)
                
                # Predict on validation set
                val_meta_features = engineered_meta_features[val_idx]
                if hasattr(fold_meta_model, 'predict_proba'):
                    meta_meta_features[val_idx, i] = fold_meta_model.predict_proba(val_meta_features)[:, 1]
                else:
                    decisions = fold_meta_model.decision_function(val_meta_features)
                    meta_meta_features[val_idx, i] = 1 / (1 + np.exp(-decisions))
        
        # Train final meta-ensemble
        print("  Training final meta-ensemble...")
        self.meta_ensemble.fit(meta_meta_features, y)
        
        print(f"  Meta-learning ensemble trained with {self.n_models} base models and {len(self.meta_models)} meta-learners")
    
    def _generate_meta_features(self, X, y):
        """
        Generate meta-features using cross-validation.
        
        Args:
            X: Training features
            y: Training labels
        
        Returns:
            Meta-features matrix
        """
        meta_features = np.zeros((len(X), self.n_models))
        
        # Use stratified k-fold cross-validation
        skf = StratifiedKFold(n_splits=self.cv_folds, shuffle=True, random_state=42)
        
        for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
            # Get fold data
            if isinstance(X, pd.DataFrame):
                X_fold_train, X_fold_val = X.iloc[train_idx], X.iloc[val_idx]
            else:
                X_fold_train, X_fold_val = X[train_idx], X[val_idx]
            
            y_fold_train = y.iloc[train_idx] if isinstance(y, pd.Series) else y[train_idx]
            
            # Train base models on fold and predict on validation
            for i, (name, model) in enumerate(self.base_models.items()):
                # Clone and train model on fold
                from sklearn.base import clone
                fold_model = clone(model)
                fold_model.fit(X_fold_train, y_fold_train)
                
                # Predict on validation set
                if hasattr(fold_model, 'predict_proba'):
                    meta_features[val_idx, i] = fold_model.predict_proba(X_fold_val)[:, 1]
                else:
                    decisions = fold_model.decision_function(X_fold_val)
                    meta_features[val_idx, i] = 1 / (1 + np.exp(-decisions))
        
        return meta_features
    
    def _engineer_meta_features(self, meta_features):
        """
        Engineer additional meta-features from base predictions.
        
        Args:
            meta_features: Base model predictions
        
        Returns:
            Engineered meta-features
        """
        n_samples = meta_features.shape[0]
        
        # Initialize feature list with original meta-features
        features = [meta_features]
        
        # Statistical features
        features.append(np.mean(meta_features, axis=1).reshape(-1, 1))  # Mean prediction
        features.append(np.std(meta_features, axis=1).reshape(-1, 1))   # Std of predictions
        features.append(np.min(meta_features, axis=1).reshape(-1, 1))   # Min prediction
        features.append(np.max(meta_features, axis=1).reshape(-1, 1))   # Max prediction
        
        # Agreement features
        binary_predictions = (meta_features > 0.5).astype(int)
        agreement = np.mean(binary_predictions, axis=1).reshape(-1, 1)  # Fraction agreeing
        features.append(agreement)
        
        # Disagreement features
        disagreement = np.std(binary_predictions, axis=1).reshape(-1, 1)
        features.append(disagreement)
        
        # Confidence features
        confidence = 1 - np.std(meta_features, axis=1).reshape(-1, 1)  # Inverse of std
        features.append(confidence)
        
        # Ranking features
        ranks = np.argsort(np.argsort(meta_features, axis=1), axis=1)  # Rank of each prediction
        features.append(ranks)
        
        # Pairwise differences
        for i in range(self.n_models):
            for j in range(i + 1, self.n_models):
                diff = (meta_features[:, i] - meta_features[:, j]).reshape(-1, 1)
                features.append(diff)
        
        # Combine all features
        engineered_features = np.hstack(features)
        
        return engineered_features
    
    def predict_proba(self, X):
        """
        Make predictions using meta-learning ensemble.
        
        Args:
            X: Features
        
        Returns:
            Prediction probabilities
        """
        # Get base model predictions
        base_predictions = np.zeros((len(X), self.n_models))
        
        for i, (name, model) in enumerate(self.base_models.items()):
            if hasattr(model, 'predict_proba'):
                base_predictions[:, i] = model.predict_proba(X)[:, 1]
            else:
                decisions = model.decision_function(X)
                base_predictions[:, i] = 1 / (1 + np.exp(-decisions))
        
        # Engineer meta-features
        engineered_meta_features = self._engineer_meta_features(base_predictions)
        
        # Get meta-learner predictions
        meta_predictions = np.zeros((len(X), len(self.meta_models)))
        
        for i, (name, meta_model) in enumerate(self.meta_models.items()):
            if hasattr(meta_model, 'predict_proba'):
                meta_predictions[:, i] = meta_model.predict_proba(engineered_meta_features)[:, 1]
            else:
                decisions = meta_model.decision_function(engineered_meta_features)
                meta_predictions[:, i] = 1 / (1 + np.exp(-decisions))
        
        # Final ensemble prediction
        final_predictions = self.meta_ensemble.predict_proba(meta_predictions)
        
        return final_predictions
    
    def predict(self, X):
        """Make binary predictions."""
        probabilities = self.predict_proba(X)
        return (probabilities[:, 1] > 0.5).astype(int)

# Create and test meta-learning ensemble
print("\nCreating Meta-Learning Ensemble...")

# Create meta-learning ensemble
meta_ensemble = MetaLearnerEnsemble(base_models, cv_folds=5)

# Train ensemble (noticeably slower: every base model is refit once per CV fold,
# and every meta-learner is refit once more per fold)
meta_ensemble.fit(X_train, y_train)

# Make predictions
meta_predictions = meta_ensemble.predict(X_test)
meta_probabilities = meta_ensemble.predict_proba(X_test)

# Evaluate performance
meta_accuracy = accuracy_score(y_test, meta_predictions)
meta_precision = precision_score(y_test, meta_predictions)
meta_recall = recall_score(y_test, meta_predictions)
meta_f1 = f1_score(y_test, meta_predictions)
meta_roc_auc = roc_auc_score(y_test, meta_probabilities[:, 1])

print(f"\nMeta-Learning Ensemble Performance:")
print(f"Accuracy: {meta_accuracy:.4f}")
print(f"Precision: {meta_precision:.4f}")
print(f"Recall: {meta_recall:.4f}")
print(f"F1-Score: {meta_f1:.4f}")
print(f"ROC-AUC: {meta_roc_auc:.4f}")

## Part 4: Dynamic Model Selection

### Instance-Based Model Selection

Instead of using all models for every prediction, we can:
1. **Evaluate model confidence** for each prediction
2. **Select best-performing models** for each instance
3. **Weight by confidence** or performance
4. **Use diversity measures** to avoid similar models
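Steps 1 to 3 can be combined into one vectorized routine: score each model's confidence per instance as |p - 0.5|, keep the `top_k` most confident models, and weight them by a validation score. This hybrid rule and the `select_and_combine` name are assumptions made for illustration; the `DynamicEnsembleSelector` below may combine these signals differently.

```python
import numpy as np

def select_and_combine(P, val_scores, top_k=2):
    """
    Hypothetical hybrid selector.
    P: (n_samples, n_models) fraud probabilities; val_scores: (n_models,) validation scores.
    Per row, keep the top_k most confident models (largest |p - 0.5|) and average
    their probabilities, weighted by validation score.
    """
    P = np.asarray(P, dtype=float)
    val_scores = np.asarray(val_scores, dtype=float)
    confidence = np.abs(P - 0.5)
    top = np.argsort(-confidence, axis=1)[:, :top_k]  # selected model indices per row
    sel_p = np.take_along_axis(P, top, axis=1)
    sel_w = val_scores[top]                            # (n_samples, top_k)
    totals = sel_w.sum(axis=1, keepdims=True)
    # Normalize weights per row; fall back to equal weights if every selected score is 0
    safe_totals = np.where(totals > 0, totals, 1.0)
    sel_w = np.where(totals > 0, sel_w / safe_totals, 1.0 / top_k)
    return np.sum(sel_p * sel_w, axis=1), top

P = np.array([[0.95, 0.55, 0.10],
              [0.45, 0.52, 0.90]])
scores, chosen = select_and_combine(P, val_scores=[0.8, 0.6, 0.4], top_k=2)
# chosen -> [[0, 2], [2, 0]]; scores ≈ [0.667, 0.6]
```

Model 1 is dropped on both rows because its probabilities sit close to 0.5, even though its validation score is not the worst.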

### Selection Strategies

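- **Performance-based**: Choose models with the highest validation score (the implementation below uses F1, because at this fraud rate a model that never flags fraud still reaches near-100% accuracy)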
- **Confidence-based**: Weight models by prediction confidence
- **Diversity-based**: Select diverse models to avoid correlation
- **Hybrid**: Combine multiple selection criteria
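For the diversity-based strategy, one common heuristic is a greedy pass: add models in order of validation score, skipping any whose validation predictions are too correlated with a model already chosen. The function name and the `max_corr=0.9` cutoff are hypothetical choices for this sketch.

```python
import numpy as np

def greedy_diverse_subset(P_val, scores, k, max_corr=0.9):
    """
    P_val: (n_samples, n_models) validation probabilities; scores: (n_models,) validation scores.
    Start from the best-scoring model, then add models in score order only if their
    predictions correlate below max_corr with every model already chosen.
    """
    corr = np.corrcoef(P_val, rowvar=False)  # (n_models, n_models) correlation of columns
    order = np.argsort(-np.asarray(scores))
    chosen = [int(order[0])]
    for m in order[1:]:
        if len(chosen) == k:
            break
        if all(corr[m, c] < max_corr for c in chosen):
            chosen.append(int(m))
    return chosen

base = np.array([0.1, 0.9, 0.2, 0.8, 0.3, 0.7])
P_val = np.column_stack([base, base + 0.01, base[::-1]])  # model 1 is a near-copy of model 0
subset = greedy_diverse_subset(P_val, scores=[0.80, 0.79, 0.70], k=2)
# subset -> [0, 2]: model 1 scores well but adds no new information
```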

In [None]:
class DynamicEnsembleSelector:
    """
    Dynamic ensemble that selects optimal models for each prediction.
    
    Key features:
    - Instance-based model selection
    - Multiple selection strategies
    - Confidence-weighted predictions
    - Performance-based ranking
    """
    
    def __init__(self, base_models, selection_strategy='performance', top_k=3):
        """
        Initialize dynamic ensemble selector.
        
        Args:
            base_models: Dictionary of base models
            selection_strategy: Strategy for model selection ('performance', 'confidence', 'diversity')
            top_k: Number of top models to select
        """
        self.base_models = base_models
        self.selection_strategy = selection_strategy
        self.top_k = min(top_k, len(base_models))
        self.model_names = list(base_models.keys())
        self.n_models = len(base_models)
        
        # Performance tracking
        self.model_performance = {}
        self.model_confidence = {}
        
    def fit(self, X, y):
        """
        Fit the dynamic ensemble selector.
        
        Args:
            X: Training features
            y: Training labels
        """
        print("Training dynamic ensemble selector...")
        
        # Train base models
        for name, model in self.base_models.items():
            print(f"  Training {name}...")
            model.fit(X, y)
        
        # Evaluate model performance using cross-validation
        print("  Evaluating model performance...")
        self._evaluate_model_performance(X, y)
        
        print(f"  Dynamic selector trained with {self.n_models} models")
        print(f"  Selection strategy: {self.selection_strategy}")
        print(f"  Top-k models: {self.top_k}")
    
    def _evaluate_model_performance(self, X, y):
        """
        Evaluate model performance using cross-validation.
        
        Args:
            X: Training features
            y: Training labels
        """
        from sklearn.base import clone  # fresh, unfitted copies of each model per fold
        
        skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
        
        # Initialize performance tracking
        for name in self.model_names:
            self.model_performance[name] = []
            self.model_confidence[name] = []
        
        # Cross-validation evaluation
        for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
            # Get fold data
            if isinstance(X, pd.DataFrame):
                X_fold_train, X_fold_val = X.iloc[train_idx], X.iloc[val_idx]
            else:
                X_fold_train, X_fold_val = X[train_idx], X[val_idx]
            
            y_fold_train = y.iloc[train_idx] if isinstance(y, pd.Series) else y[train_idx]
            y_fold_val = y.iloc[val_idx] if isinstance(y, pd.Series) else y[val_idx]
            
            # Evaluate each model
            for name, model in self.base_models.items():
                # Clone and train model on fold
                fold_model = clone(model)
                fold_model.fit(X_fold_train, y_fold_train)
                
                # Predict on validation set
                if hasattr(fold_model, 'predict_proba'):
                    y_prob = fold_model.predict_proba(X_fold_val)[:, 1]
                else:
                    decisions = fold_model.decision_function(X_fold_val)
                    y_prob = 1 / (1 + np.exp(-decisions))
                
                y_pred = (y_prob > 0.5).astype(int)
                
                # Calculate performance metrics
                f1 = f1_score(y_fold_val, y_pred)
                self.model_performance[name].append(f1)
                
                # Calculate confidence (inverse of prediction uncertainty)
                confidence = np.mean(np.abs(y_prob - 0.5)) * 2  # Scale to 0-1
                self.model_confidence[name].append(confidence)
        
        # Average performance across folds
        for name in self.model_names:
            self.model_performance[name] = np.mean(self.model_performance[name])
            self.model_confidence[name] = np.mean(self.model_confidence[name])
    
    def _select_models_for_instance(self, instance_predictions):
        """
        Select optimal models for a single instance.
        
        Args:
            instance_predictions: Predictions for a single instance
        
        Returns:
            Selected model indices and weights
        """
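        if self.selection_strategy == 'performance':
            # Select top-k models by cross-validated F1. This branch ignores
            # instance_predictions, so every instance gets the same models and weights
            performance_scores = [self.model_performance[name] for name in self.model_names]
            selected_indices = np.argsort(performance_scores)[-self.top_k:]
            weights = np.array([performance_scores[i] for i in selected_indices])
            # Guard against all-zero F1 scores, which would otherwise divide by zero
            weights = weights / np.sum(weights) if np.sum(weights) > 0 else np.ones(len(weights)) / len(weights)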
            
        elif self.selection_strategy == 'confidence':
            # Select models with highest confidence for this instance
            confidences = [np.abs(instance_predictions[i] - 0.5) * 2 for i in range(len(instance_predictions))]
            selected_indices = np.argsort(confidences)[-self.top_k:]
            weights = np.array([confidences[i] for i in selected_indices])
            weights = weights / np.sum(weights) if np.sum(weights) > 0 else np.ones(len(weights)) / len(weights)
            
        elif self.selection_strategy == 'diversity':
            # Select diverse models (different predictions)
            # Simple diversity measure: variance of predictions
            selected_indices = []
            remaining_indices = list(range(len(instance_predictions)))
            
            # Start with best performing model
            performance_scores = [self.model_performance[name] for name in self.model_names]
            best_idx = np.argmax(performance_scores)
            selected_indices.append(best_idx)
            remaining_indices.remove(best_idx)
            
            # Add models that maximize diversity
            for _ in range(self.top_k - 1):
                if not remaining_indices:
                    break
                
                best_diversity = -1
                best_candidate = None
                
                for candidate in remaining_indices:
                    # Calculate diversity with selected models
                    candidate_preds = [instance_predictions[i] for i in selected_indices + [candidate]]
                    diversity = np.var(candidate_preds)
                    
                    if diversity > best_diversity:
                        best_diversity = diversity
                        best_candidate = candidate
                
                if best_candidate is not None:
                    selected_indices.append(best_candidate)
                    remaining_indices.remove(best_candidate)
            
            # Equal weights for diversity
            weights = np.ones(len(selected_indices)) / len(selected_indices)
            
        else:
            # Default: use all models with equal weights
            selected_indices = list(range(len(instance_predictions)))
            weights = np.ones(len(selected_indices)) / len(selected_indices)
        
        return selected_indices, weights
    
    def predict_proba(self, X):
        """
        Make predictions using dynamic model selection.
        
        Args:
            X: Features
        
        Returns:
            Prediction probabilities
        """
        # Get predictions from all base models
        all_predictions = np.zeros((len(X), self.n_models))
        
        for i, (name, model) in enumerate(self.base_models.items()):
            if hasattr(model, 'predict_proba'):
                all_predictions[:, i] = model.predict_proba(X)[:, 1]
            else:
                decisions = model.decision_function(X)
                all_predictions[:, i] = 1 / (1 + np.exp(-decisions))
        
        # Dynamic selection for each instance
        ensemble_predictions = np.zeros(len(X))
        
        for i in range(len(X)):
            instance_predictions = all_predictions[i]
            selected_indices, weights = self._select_models_for_instance(instance_predictions)
            
            # Weighted prediction
            selected_predictions = instance_predictions[selected_indices]
            ensemble_predictions[i] = np.dot(selected_predictions, weights)
        
        # Convert to probability format
        prob_positive = ensemble_predictions
        prob_negative = 1 - prob_positive
        
        return np.column_stack([prob_negative, prob_positive])
    
    def predict(self, X):
        """Make binary predictions."""
        probabilities = self.predict_proba(X)
        return (probabilities[:, 1] > 0.5).astype(int)
    
    def get_model_ranking(self):
        """
        Get model performance ranking.
        
        Returns:
            Sorted list of models by performance
        """
        performance_items = [(name, perf) for name, perf in self.model_performance.items()]
        return sorted(performance_items, key=lambda x: x[1], reverse=True)

# Create and test dynamic selector
print("\nCreating Dynamic Ensemble Selector...")

# Create dynamic selector
dynamic_selector = DynamicEnsembleSelector(base_models, selection_strategy='performance', top_k=3)

# Train selector
dynamic_selector.fit(X_train, y_train)

# Make predictions
dynamic_predictions = dynamic_selector.predict(X_test)
dynamic_probabilities = dynamic_selector.predict_proba(X_test)

# Evaluate performance
dynamic_accuracy = accuracy_score(y_test, dynamic_predictions)
dynamic_precision = precision_score(y_test, dynamic_predictions)
dynamic_recall = recall_score(y_test, dynamic_predictions)
dynamic_f1 = f1_score(y_test, dynamic_predictions)
dynamic_roc_auc = roc_auc_score(y_test, dynamic_probabilities[:, 1])

print(f"\nDynamic Ensemble Selector Performance:")
print(f"Accuracy: {dynamic_accuracy:.4f}")
print(f"Precision: {dynamic_precision:.4f}")
print(f"Recall: {dynamic_recall:.4f}")
print(f"F1-Score: {dynamic_f1:.4f}")
print(f"ROC-AUC: {dynamic_roc_auc:.4f}")

# Show model ranking
model_ranking = dynamic_selector.get_model_ranking()
print(f"\nModel Performance Ranking:")
for i, (name, performance) in enumerate(model_ranking):
    print(f"  {i+1}. {name}: {performance:.4f}")
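The `'performance'` strategy used above picks the same top-k models for every row, so the instance-level behavior only shows up with the other strategies. Here is a standalone sketch of the `'confidence'` weighting rule applied to one hypothetical row of base-model probabilities; the function name `confidence_weighted_prediction` and the example numbers are illustrative assumptions, not part of the selector.

```python
import numpy as np

def confidence_weighted_prediction(instance_predictions, top_k):
    """Mirror of the 'confidence' branch: keep the top_k most confident models
    (probability furthest from 0.5) and weight them by that confidence."""
    preds = np.asarray(instance_predictions, dtype=float)
    confidences = np.abs(preds - 0.5) * 2          # 0 = coin flip, 1 = certain
    selected = np.argsort(confidences)[-top_k:]    # indices of the top_k confidences
    weights = confidences[selected]
    if weights.sum() > 0:
        weights = weights / weights.sum()
    else:
        weights = np.ones(len(selected)) / len(selected)
    return selected, float(np.dot(preds[selected], weights))

# Four hypothetical base-model probabilities for a single transaction
selected, prob = confidence_weighted_prediction([0.9, 0.45, 0.2, 0.55], top_k=2)
print(sorted(selected.tolist()), round(prob, 4))   # [0, 2] 0.6
```

The two near-0.5 models are dropped, and the confident fraud vote (0.9) outweighs the confident legitimate vote (0.2) because its confidence is higher.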

## Part 5: Hybrid Ensemble System

### Combining All Approaches

The hybrid ensemble system runs the three approaches side by side and combines their outputs in a final layer:
1. **Context-Aware Ensemble**: Adapts to transaction characteristics
2. **Meta-Learning Ensemble**: Learns how to combine base-model predictions
3. **Dynamic Selector**: Chooses which base models to combine (here with the `'performance'` strategy)
4. **Final Meta-Ensemble**: A balanced logistic regression trained on the three systems' predictions and summary statistics of them

### Hierarchical Architecture

```
Base Models → Context-Aware Ensemble →
             Meta-Learning Ensemble  → Final Meta-Ensemble → Prediction
             Dynamic Selector       →
```
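The core hierarchical idea (lower-level models feeding a combiner that is trained on their out-of-fold predictions) is also available off the shelf in scikit-learn's `StackingClassifier`. The sketch below is a simplified stand-in, not this tutorial's hybrid system: it uses plain base models on synthetic imbalanced data (`make_classification` with roughly 5% positives, an assumption for illustration) in place of the three component ensembles.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for transaction data (~5% positives)
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.95],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    # Level-1 combiner, balanced like the tutorial's final meta-ensemble
    final_estimator=LogisticRegression(class_weight="balanced"),
    cv=5,                          # combiner is trained on 5-fold out-of-fold predictions
    stack_method="predict_proba",
)
stack.fit(X_tr, y_tr)
proba = stack.predict_proba(X_te)[:, 1]
print(f"Stacked ROC-AUC: {roc_auc_score(y_te, proba):.4f}")
```

`StackingClassifier` refits the base estimators on the full training set after generating the out-of-fold predictions, so the combiner's inputs at training time resemble what it sees at prediction time.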

In [None]:
class HybridEnsembleSystem:
    """
    Hybrid ensemble system combining multiple advanced ensemble techniques.
    
    Architecture:
    - Context-aware ensemble
    - Meta-learning ensemble
    - Dynamic model selector
    - Final meta-ensemble
    """
    
    def __init__(self, base_models):
        """
        Initialize hybrid ensemble system.
        
        Args:
            base_models: Dictionary of base models
        """
        self.base_models = base_models
        
        # Initialize ensemble components
        self.context_ensemble = ContextAwareEnsemble(base_models, n_context_clusters=5)
        self.meta_ensemble = MetaLearnerEnsemble(base_models, cv_folds=5)
        self.dynamic_selector = DynamicEnsembleSelector(base_models, selection_strategy='performance', top_k=3)
        
        # Final meta-ensemble to combine all approaches
        self.final_meta_ensemble = LogisticRegression(random_state=42, class_weight='balanced')
        
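    def fit(self, X, y, holdout_size=0.3):
        """
        Fit the hybrid ensemble system.
        
        The final meta-ensemble is trained on a stratified holdout slice that the
        three components never see. Training it on the components' in-sample
        predictions would leak: the components partly memorize the training labels,
        so the final layer would learn to over-trust them.
        
        Args:
            X: Training features
            y: Training labels
            holdout_size: Fraction of the training data reserved for the final meta-ensemble
        """
        print("Training Hybrid Ensemble System...")
        print("="*50)
        
        # Split off a holdout for the final meta-ensemble (blending)
        X_fit, X_hold, y_fit, y_hold = train_test_split(
            X, y, test_size=holdout_size, stratify=y, random_state=42
        )
        
        # Train all ensemble components on the fitting split only
        print("\n1. Training Context-Aware Ensemble...")
        self.context_ensemble.fit(X_fit, y_fit)
        
        print("\n2. Training Meta-Learning Ensemble...")
        self.meta_ensemble.fit(X_fit, y_fit)
        
        print("\n3. Training Dynamic Selector...")
        self.dynamic_selector.fit(X_fit, y_fit)
        
        # Generate meta-features on data the components were not trained on
        print("\n4. Generating final meta-features on the holdout split...")
        final_meta_features = self._generate_final_meta_features(X_hold)
        
        # Train final meta-ensemble
        print("\n5. Training final meta-ensemble...")
        self.final_meta_ensemble.fit(final_meta_features, y_hold)
        
        print("\n" + "="*50)
        print("Hybrid Ensemble System Training Complete!")
        print("="*50)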
    
    def _generate_final_meta_features(self, X):
        """
        Generate meta-features from all ensemble components.
        
        Args:
            X: Features
        
        Returns:
            Final meta-features
        """
        # Get predictions from all ensemble components
        context_probs = self.context_ensemble.predict_proba(X)[:, 1]
        meta_probs = self.meta_ensemble.predict_proba(X)[:, 1]
        dynamic_probs = self.dynamic_selector.predict_proba(X)[:, 1]
        
        # Engineer meta-features
        features = []
        
        # Raw ensemble predictions
        features.append(context_probs.reshape(-1, 1))
        features.append(meta_probs.reshape(-1, 1))
        features.append(dynamic_probs.reshape(-1, 1))
        
        # Statistical features
        ensemble_preds = np.column_stack([context_probs, meta_probs, dynamic_probs])
        features.append(np.mean(ensemble_preds, axis=1).reshape(-1, 1))
        features.append(np.std(ensemble_preds, axis=1).reshape(-1, 1))
        features.append(np.min(ensemble_preds, axis=1).reshape(-1, 1))
        features.append(np.max(ensemble_preds, axis=1).reshape(-1, 1))
        
        # Vote features: fraction of components predicting fraud at a 0.5 threshold
        binary_preds = (ensemble_preds > 0.5).astype(int)
        agreement = np.mean(binary_preds, axis=1).reshape(-1, 1)
        features.append(agreement)
        
        # Disagreement: std of the binary votes (0 when all three components agree)
        disagreement = np.std(binary_preds, axis=1).reshape(-1, 1)
        features.append(disagreement)
        
        # Pairwise differences
        features.append((context_probs - meta_probs).reshape(-1, 1))
        features.append((context_probs - dynamic_probs).reshape(-1, 1))
        features.append((meta_probs - dynamic_probs).reshape(-1, 1))
        
        # Combine all features
        final_features = np.hstack(features)
        
        return final_features
    
    def predict_proba(self, X):
        """
        Make predictions using hybrid ensemble.
        
        Args:
            X: Features
        
        Returns:
            Prediction probabilities
        """
        # Generate final meta-features
        final_meta_features = self._generate_final_meta_features(X)
        
        # Make final prediction
        final_prediction = self.final_meta_ensemble.predict_proba(final_meta_features)
        
        return final_prediction
    
    def predict(self, X):
        """Make binary predictions."""
        probabilities = self.predict_proba(X)
        return (probabilities[:, 1] > 0.5).astype(int)
    
    def get_detailed_predictions(self, X):
        """
        Get predictions from all ensemble components.
        
        Args:
            X: Features
        
        Returns:
            Dictionary with predictions from all components
        """
        context_probs = self.context_ensemble.predict_proba(X)[:, 1]
        meta_probs = self.meta_ensemble.predict_proba(X)[:, 1]
        dynamic_probs = self.dynamic_selector.predict_proba(X)[:, 1]
        final_probs = self.predict_proba(X)[:, 1]
        
        return {
            'context_aware': context_probs,
            'meta_learning': meta_probs,
            'dynamic_selector': dynamic_probs,
            'final_ensemble': final_probs
        }

# Create and test hybrid ensemble system
print("\nCreating Hybrid Ensemble System...")

# Create hybrid ensemble
hybrid_ensemble = HybridEnsembleSystem(base_models)

# Train hybrid ensemble (this will take some time)
hybrid_ensemble.fit(X_train, y_train)

# Make predictions
hybrid_predictions = hybrid_ensemble.predict(X_test)
hybrid_probabilities = hybrid_ensemble.predict_proba(X_test)

# Evaluate performance
hybrid_accuracy = accuracy_score(y_test, hybrid_predictions)
hybrid_precision = precision_score(y_test, hybrid_predictions)
hybrid_recall = recall_score(y_test, hybrid_predictions)
hybrid_f1 = f1_score(y_test, hybrid_predictions)
hybrid_roc_auc = roc_auc_score(y_test, hybrid_probabilities[:, 1])

print(f"\nHybrid Ensemble System Performance:")
print(f"Accuracy: {hybrid_accuracy:.4f}")
print(f"Precision: {hybrid_precision:.4f}")
print(f"Recall: {hybrid_recall:.4f}")
print(f"F1-Score: {hybrid_f1:.4f}")
print(f"ROC-AUC: {hybrid_roc_auc:.4f}")

# Get detailed predictions for analysis
detailed_predictions = hybrid_ensemble.get_detailed_predictions(X_test)

print(f"\nDetailed Predictions Analysis:")
for component, predictions in detailed_predictions.items():
    component_binary = (predictions > 0.5).astype(int)
    component_f1 = f1_score(y_test, component_binary)
    component_auc = roc_auc_score(y_test, predictions)
    print(f"  {component}: F1={component_f1:.4f}, AUC={component_auc:.4f}")
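Where the final meta-ensemble's training predictions come from matters: a flexible model scores its own training rows far more confidently than unseen rows, so a combiner fitted on in-sample predictions learns to trust the components too much. The sketch below uses synthetic noisy data (an assumption for illustration) to compare a random forest's in-sample ROC-AUC with its out-of-fold ROC-AUC from `cross_val_predict`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

# flip_y adds label noise so the gap between memorization and generalization is visible
X, y = make_classification(n_samples=1500, n_features=10, flip_y=0.1, random_state=1)
rf = RandomForestClassifier(n_estimators=100, random_state=1)

# In-sample: fit and score on the same rows (what a leaky stacker would train on)
in_sample = rf.fit(X, y).predict_proba(X)[:, 1]

# Out-of-fold: every row is scored by a model that never saw it
oof = cross_val_predict(rf, X, y, cv=5, method="predict_proba")[:, 1]

in_sample_auc = roc_auc_score(y, in_sample)
oof_auc = roc_auc_score(y, oof)
print(f"In-sample AUC:   {in_sample_auc:.4f}")
print(f"Out-of-fold AUC: {oof_auc:.4f}")
```

The in-sample score is typically close to 1.0 even though the labels are noisy. Training a combiner on out-of-fold predictions, or on a held-out split, gives it inputs that look like what it will see at prediction time.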

## Part 6: Comprehensive Performance Comparison

Let's compare all the ensemble approaches we've implemented:

In [None]:
# Comprehensive performance comparison
def comprehensive_performance_comparison():
    """
    Compare all ensemble approaches and visualize results.
    """
    print("\nComprehensive Performance Comparison")
    print("="*80)
    
    # Collect all results
    results = {
        'Context-Aware': {
            'accuracy': context_accuracy,
            'precision': context_precision,
            'recall': context_recall,
            'f1_score': context_f1,
            'roc_auc': context_roc_auc,
            'probabilities': context_probabilities[:, 1]
        },
        'Meta-Learning': {
            'accuracy': meta_accuracy,
            'precision': meta_precision,
            'recall': meta_recall,
            'f1_score': meta_f1,
            'roc_auc': meta_roc_auc,
            'probabilities': meta_probabilities[:, 1]
        },
        'Dynamic Selector': {
            'accuracy': dynamic_accuracy,
            'precision': dynamic_precision,
            'recall': dynamic_recall,
            'f1_score': dynamic_f1,
            'roc_auc': dynamic_roc_auc,
            'probabilities': dynamic_probabilities[:, 1]
        },
        'Hybrid Ensemble': {
            'accuracy': hybrid_accuracy,
            'precision': hybrid_precision,
            'recall': hybrid_recall,
            'f1_score': hybrid_f1,
            'roc_auc': hybrid_roc_auc,
            'probabilities': hybrid_probabilities[:, 1]
        }
    }
    
    # Create comparison table
    comparison_data = []
    for method, metrics in results.items():
        comparison_data.append({
            'Method': method,
            'Accuracy': f"{metrics['accuracy']:.4f}",
            'Precision': f"{metrics['precision']:.4f}",
            'Recall': f"{metrics['recall']:.4f}",
            'F1-Score': f"{metrics['f1_score']:.4f}",
            'ROC-AUC': f"{metrics['roc_auc']:.4f}"
        })
    
    comparison_df = pd.DataFrame(comparison_data)
    print(comparison_df.to_string(index=False))
    
    # Visualize performance comparison
    fig, axes = plt.subplots(2, 2, figsize=(15, 12))
    
    # Performance metrics bar chart
    metric_names = ['accuracy', 'precision', 'recall', 'f1_score', 'roc_auc']
    methods = list(results.keys())
    
    x = np.arange(len(methods))
    width = 0.15
    
    for i, metric in enumerate(metric_names):
        values = [results[method][metric] for method in methods]
        axes[0, 0].bar(x + i * width, values, width, label=metric.replace('_', ' ').title())
    
    axes[0, 0].set_xlabel('Ensemble Method')
    axes[0, 0].set_ylabel('Score')
    axes[0, 0].set_title('Performance Metrics Comparison')
    axes[0, 0].set_xticks(x + width * 2)
    axes[0, 0].set_xticklabels(methods, rotation=45, ha='right')
    axes[0, 0].legend()
    axes[0, 0].grid(True, alpha=0.3)
    
    # ROC curves
    for method, metrics in results.items():
        fpr, tpr, _ = roc_curve(y_test, metrics['probabilities'])
        roc_auc = auc(fpr, tpr)
        axes[0, 1].plot(fpr, tpr, linewidth=2, label=f'{method} (AUC = {roc_auc:.3f})')
    
    axes[0, 1].plot([0, 1], [0, 1], 'k--', linewidth=1)
    axes[0, 1].set_xlabel('False Positive Rate')
    axes[0, 1].set_ylabel('True Positive Rate')
    axes[0, 1].set_title('ROC Curves Comparison')
    axes[0, 1].legend()
    axes[0, 1].grid(True, alpha=0.3)
    
    # F1-Score comparison
    f1_scores = [results[method]['f1_score'] for method in methods]
    bars = axes[1, 0].bar(methods, f1_scores, color=['lightblue', 'lightgreen', 'lightcoral', 'gold'])
    
    # Highlight best performer
    best_idx = np.argmax(f1_scores)
    bars[best_idx].set_color('darkgreen')
    
    axes[1, 0].set_ylabel('F1-Score')
    axes[1, 0].set_title('F1-Score Comparison')
    axes[1, 0].set_ylim(0, 1)
    
    # Add value labels
    for i, (method, score) in enumerate(zip(methods, f1_scores)):
        axes[1, 0].text(i, score + 0.01, f'{score:.3f}', ha='center', va='bottom')
    
    plt.setp(axes[1, 0].get_xticklabels(), rotation=45, ha='right')
    axes[1, 0].grid(True, alpha=0.3)
    
    # Prediction correlation heatmap
    correlation_data = {}
    for method, metrics in results.items():
        correlation_data[method] = metrics['probabilities']
    
    correlation_df = pd.DataFrame(correlation_data)
    correlation_matrix = correlation_df.corr()
    
    sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0, 
                square=True, ax=axes[1, 1])
    axes[1, 1].set_title('Prediction Correlation Matrix')
    
    plt.tight_layout()
    plt.show()
    
    # Find best performing method
    best_method = max(results.keys(), key=lambda x: results[x]['f1_score'])
    best_f1 = results[best_method]['f1_score']
    
    print(f"\nBest Performing Method: {best_method}")
    print(f"Best F1-Score: {best_f1:.4f}")
    
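    # Relative F1 change of the hybrid over the best individual component ensemble.
    # Use a new name here: assigning to hybrid_f1 inside this function would make it
    # a local variable and raise UnboundLocalError when the results dict is built above.
    best_component_f1 = max(results[method]['f1_score'] for method in ['Context-Aware', 'Meta-Learning', 'Dynamic Selector'])
    hybrid_f1_score = results['Hybrid Ensemble']['f1_score']
    
    if best_component_f1 > 0:
        improvement = ((hybrid_f1_score - best_component_f1) / best_component_f1) * 100
        print(f"\nHybrid Ensemble F1 change vs. best individual method: {improvement:+.2f}%")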
    
    return results

# Run comprehensive comparison
comparison_results = comprehensive_performance_comparison()
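The comparison above ranks methods on a single test split, and differences of a few thousandths in F1 can be noise, especially with few fraud cases. One way to check is a paired bootstrap over test rows. The helper `bootstrap_f1_difference` below is a hypothetical addition, not part of the tutorial code.

```python
import numpy as np
from sklearn.metrics import f1_score

def bootstrap_f1_difference(y_true, pred_a, pred_b, n_boot=1000, seed=0):
    """Paired bootstrap: resample test rows with replacement and record
    F1(pred_a) - F1(pred_b) on each resample. Returns (mean, 2.5th pct, 97.5th pct)."""
    y_true, pred_a, pred_b = map(np.asarray, (y_true, pred_a, pred_b))
    rng = np.random.default_rng(seed)
    n = len(y_true)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # same rows for both models (paired)
        diffs[b] = (f1_score(y_true[idx], pred_a[idx], zero_division=0)
                    - f1_score(y_true[idx], pred_b[idx], zero_division=0))
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return diffs.mean(), lo, hi

# Example usage with variables computed earlier in this notebook:
# mean_diff, lo, hi = bootstrap_f1_difference(y_test, hybrid_predictions, dynamic_predictions)
```

If the interval straddles 0, the F1 gap between the two methods is not clearly distinguishable from resampling noise on this test set.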

## Practice Exercises

Now it's your turn! Try these exercises to deepen your understanding:

### Exercise 1: Custom Context Features
Modify the `ContextAwareEnsemble` to use different context features:
- Add seasonal features (month, quarter)
- Include economic indicators (if available)
- Try different clustering algorithms (DBSCAN, Hierarchical)

How do these changes affect performance?
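If you try DBSCAN or hierarchical clustering, note that scikit-learn's `DBSCAN` and `AgglomerativeClustering` have no `predict` method, so new transactions cannot be assigned to a context cluster directly. A common workaround, sketched here with random stand-in context features (an assumption), is to fit a 1-nearest-neighbor classifier on the training cluster labels:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Stand-in context features (e.g. scaled hour-of-day and log-amount) for training transactions
context_train = np.vstack([rng.normal(0, 0.3, (100, 2)), rng.normal(3, 0.3, (100, 2))])
context_new = np.array([[0.1, -0.2], [2.9, 3.1]])

# Fit the clustering on training contexts only
clusterer = AgglomerativeClustering(n_clusters=2)
train_clusters = clusterer.fit_predict(context_train)

# Assign unseen transactions to the cluster of their nearest training transaction
assigner = KNeighborsClassifier(n_neighbors=1).fit(context_train, train_clusters)
new_clusters = assigner.predict(context_new)
print(new_clusters)
```

With DBSCAN, noise points carry the label -1, so decide whether "noise" should be its own context or be excluded before fitting the assigner.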

In [None]:
# Your code here
# Hint: Modify the _extract_context_features method
# Try different context features and compare performance

### Exercise 2: Meta-Feature Engineering
Enhance the meta-learning ensemble with new features:
- Add prediction confidence intervals
- Include model uncertainty measures
- Try different meta-learners (Neural Networks, SVM)

Which meta-features are most important?
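For the question of which meta-features matter, permutation importance works with any fitted meta-learner. The sketch below uses synthetic meta-features (an assumption: one informative column standing in for a strong component's probability, plus two noise columns). In practice, compute importance on held-out meta-features rather than the rows the meta-learner was trained on.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, size=n)

# Column 0: noisy but informative "component probability"; columns 1-2: pure noise
strong = np.clip(0.2 + 0.6 * y + rng.normal(0, 0.15, n), 0, 1)
meta_features = np.column_stack([strong, rng.random(n), rng.random(n)])
names = ["strong_component", "noise_a", "noise_b"]

meta_learner = LogisticRegression().fit(meta_features, y)
result = permutation_importance(meta_learner, meta_features, y,
                                scoring="roc_auc", n_repeats=10, random_state=0)
for name, mean, std in zip(names, result.importances_mean, result.importances_std):
    print(f"{name:>16}: {mean:.4f} +/- {std:.4f}")
```

Each value is the average drop in ROC-AUC when that column is shuffled, so near-zero values flag meta-features the meta-learner does not rely on.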

In [None]:
# Your code here
# Hint: Modify the _engineer_meta_features method
# Add feature importance analysis for meta-learners

### Exercise 3: Dynamic Selection Strategies
Implement new selection strategies:
- **Competence-based**: Select models based on local performance
- **Clustering-based**: Use feature space clustering for selection
- **Ensemble of selectors**: Combine multiple selection strategies

Which strategy works best for your data?
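As a starting point for the competence-based strategy, one classic formulation (overall local accuracy) scores each model by its accuracy on the k validation transactions nearest to the query. The helper `local_competence` and the toy two-model setup below are hypothetical illustrations; inside the selector you would feed it held-out base-model predictions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_competence(X_val, y_val, val_preds, X_query, k=10):
    """Return an (n_query, n_models) array: each model's accuracy on the
    k validation points nearest to each query point.

    val_preds: (n_val, n_models) array of hard 0/1 predictions on X_val.
    """
    nn = NearestNeighbors(n_neighbors=k).fit(X_val)
    _, neighbor_idx = nn.kneighbors(X_query)              # (n_query, k)
    correct = (val_preds == np.asarray(y_val)[:, None])   # (n_val, n_models)
    return correct[neighbor_idx].mean(axis=1)             # average over the k neighbors

# Toy setup: model 0 is right only for x < 0, model 1 only for x >= 0
x = np.linspace(-1, 1, 200).reshape(-1, 1)
y_val = np.arange(200) % 2                                # arbitrary labels
val_preds = np.column_stack([
    np.where(x[:, 0] < 0, y_val, 1 - y_val),
    np.where(x[:, 0] >= 0, y_val, 1 - y_val),
])
competence = local_competence(x, y_val, val_preds, np.array([[-0.8], [0.8]]))
print(competence.argmax(axis=1))   # [0 1]
```

Selecting the model (or top-k models) with the highest local competence for each query turns this into a per-instance selection strategy.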

In [None]:
# Your code here
# Hint: Add new selection strategies to the DynamicEnsembleSelector
# Compare different strategies on the same dataset

## Key Takeaways

### 1. Advanced Ensemble Strategies
- **Context-Aware**: Adapt to transaction characteristics and patterns
- **Meta-Learning**: Learn optimal combination strategies from data
- **Dynamic Selection**: Choose best models for each prediction
- **Hierarchical**: Multi-level ensemble architectures

### 2. Meta-Learning Principles
- **Cross-Validation**: Prevent overfitting in meta-feature generation
- **Feature Engineering**: Create meaningful meta-features
- **Multiple Meta-Learners**: Ensemble at the meta-level
- **Stacking**: Layer models so each level learns how to combine the predictions of the level below

### 3. Context-Aware Systems
- **Feature Extraction**: Time, amount, statistical measures
- **Clustering**: Group similar transaction contexts
- **Adaptive Weighting**: Learn context-specific model weights
- **Dynamic Application**: Apply weights based on current context
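The clustering-plus-weighting idea in the list above fits in a few lines. This is a simplified, hypothetical sketch rather than the tutorial's `ContextAwareEnsemble`: it clusters stand-in context features with `KMeans`, gives each model a per-cluster weight proportional to its validation accuracy in that cluster, and combines probabilities using the weights of each row's cluster.

```python
import numpy as np
from sklearn.cluster import KMeans

def context_weights(context_val, val_correct, n_clusters=2, seed=0):
    """Fit KMeans on validation contexts and return (kmeans, weights), where
    weights[c, m] is model m's accuracy in cluster c, normalized per cluster.

    val_correct: (n_val, n_models) boolean array, True where model m was right.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(context_val)
    n_models = val_correct.shape[1]
    weights = np.full((n_clusters, n_models), 1.0 / n_models)   # fallback: equal weights
    for c in range(n_clusters):
        acc = val_correct[km.labels_ == c].mean(axis=0)
        if acc.sum() > 0:
            weights[c] = acc / acc.sum()
    return km, weights

def context_predict(km, weights, context_new, model_probs):
    """Weighted average of model probabilities using each row's cluster weights."""
    clusters = km.predict(context_new)
    return (model_probs * weights[clusters]).sum(axis=1)

# Toy data: model 0 is reliable in context A (around 0), model 1 in context B (around 5)
rng = np.random.default_rng(0)
context_val = np.vstack([rng.normal(0, 0.5, (50, 1)), rng.normal(5, 0.5, (50, 1))])
val_correct = np.vstack([np.tile([True, False], (50, 1)), np.tile([False, True], (50, 1))])
km, w = context_weights(context_val, val_correct)
probs = context_predict(km, w, np.array([[0.1], [4.9]]), np.array([[0.9, 0.1], [0.2, 0.7]]))
print(probs)
```

Each new transaction inherits the weights of its context cluster, so the same base-model outputs can be combined differently in different contexts.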

### 4. Dynamic Model Selection
- **Instance-Based**: Choose models per prediction (the confidence and diversity strategies)
- **Performance-Based**: Select the top-k models by cross-validated F1 (the same subset for every instance)
- **Confidence-Based**: Weight by prediction confidence
- **Diversity-Based**: Ensure model diversity

### 5. System Integration
- **Modular Design**: Independent ensemble components
- **Hierarchical Structure**: Multiple levels of ensembling
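- **Out-of-Sample Training**: Train every combining layer, including the final meta-ensemble, on predictions for data the lower layers did not fit
- **Final Meta-Ensemble**: Learn how much to trust each component ensemble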

### 6. Performance Optimization
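- **Complementary Strengths**: Different ensembles can excel on different kinds of transactions
- **Reduced Variance**: Averaging diverse models can reduce variance, although an extra stacking layer can itself overfit if trained on in-sample predictions
- **Generalization**: Gains must be confirmed on held-out data; a more complex ensemble is not automatically better
- **Robustness**: A single weak base model has less influence on the final decision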

### 7. Production Considerations
- **Computational Cost**: Balance performance vs efficiency
- **Model Interpretability**: Understand ensemble decisions
- **Maintenance**: Update and retrain individual components
- **Scalability**: Handle large-scale deployment

### 8. When to Use Each Approach
- **Context-Aware**: When transaction patterns vary by context
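- **Meta-Learning**: When you have enough labeled data, especially fraud cases, to train a meta-learner on out-of-fold predictions
- **Dynamic Selection**: When base models' accuracy varies across instances or regions of the feature space
- **Hybrid**: When the accuracy gain over simpler ensembles, measured on held-out data, justifies the extra training and serving cost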

## Next Steps

In the next tutorial, we'll explore:
- Professional fraud detection dashboards
- Real-time monitoring and alerting
- Business intelligence and reporting
- User interface design for fraud analysts

Remember: hybrid ensemble systems are complex to build and maintain, but by combining the strengths of several approaches they can outperform any single ensemble. Whether they do so on your data is an empirical question, so validate every layer on held-out data and understand when and how to apply each technique!