# Fairness Pipeline Toolkit - Complete Demonstration

This notebook provides a comprehensive walkthrough of the Fairness Pipeline Development Toolkit, demonstrating how to:

1. Configure and run fairness-aware ML pipelines
2. Analyze bias detection results
3. Apply bias mitigation techniques
4. Train fairness-constrained models
5. Evaluate improvements using MLflow tracking
6. Interpret results and make data-driven decisions

## Overview

The toolkit implements a three-step fairness pipeline:

**Step 1: Baseline Measurement** - Analyze raw data for bias and fairness violations  
**Step 2: Data Processing & Training** - Apply bias mitigation and train fair models  
**Step 3: Final Validation** - Compare results and generate improvement reports


## Setup and Installation

First, let's import the required libraries and make the toolkit importable. This cell assumes the dependencies are already installed:

In [None]:
import sys
import warnings
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Add the toolkit's src directory to the import path before importing from it
parent_dir = Path().resolve().parent
src_path = parent_dir / "src"
sys.path.insert(0, str(src_path))

from fairness_pipeline_toolkit.config import ConfigParser
from fairness_pipeline_toolkit.pipeline_executor import PipelineExecutor

# Suppress library warnings to keep the demo output readable
warnings.filterwarnings('ignore')

# Configure plotting
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("✅ Environment setup complete!")
print(f"📁 Working directory: {Path().resolve()}")
print(f"📦 Source path: {src_path}")

## 1. Configuration Overview

Let's examine and understand our pipeline configuration:
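
As a point of reference before the cell runs, the keys it reads can be sketched as a plain Python dictionary. The key names are the ones used in this notebook. Every value is an illustrative placeholder (an assumption), not the contents of the real `config.yml`, and `missing_keys` is a hypothetical helper rather than part of the toolkit.

```python
# Illustrative config shape. Key names come from this notebook;
# all values are placeholder assumptions, not the real config.yml.
example_config = {
    "data": {
        "input_path": "data/example.csv",  # placeholder path
        "sensitive_features": ["race", "sex"],
    },
    "preprocessing": {
        "transformer": {
            "name": "ExampleTransformer",  # placeholder name
            "parameters": {"repair_level": 0.8},
        },
    },
    "training": {
        "method": {
            "name": "ExampleFairClassifier",  # placeholder name
            "parameters": {"constraint": "demographic_parity"},
        },
    },
    "evaluation": {
        "primary_metric": "demographic_parity_difference",
        "fairness_threshold": 0.1,
        "additional_metrics": ["accuracy", "precision", "recall"],
    },
    "mlflow": {
        "experiment_name": "example_experiment",
        "log_model": True,
        "log_config": True,
    },
}

# Nested keys that the notebook cells look up directly.
REQUIRED_KEYS = [
    ("data", "sensitive_features"),
    ("preprocessing", "transformer", "name"),
    ("training", "method", "name"),
    ("evaluation", "primary_metric"),
    ("evaluation", "fairness_threshold"),
    ("mlflow", "experiment_name"),
]


def missing_keys(config, required=REQUIRED_KEYS):
    """Return dotted paths of required keys that are absent from config."""
    missing = []
    for path in required:
        node = config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing


print(missing_keys(example_config))
```

A check like this fails early with a readable list of missing keys instead of a `KeyError` halfway through the notebook.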

In [None]:
# Load and display the configuration
config_path = parent_dir / "config.yml"
config = ConfigParser.load(config_path)

print("📋 PIPELINE CONFIGURATION")
print("="*50)

print("🔵 Data Configuration:")
for key, value in config['data'].items():
    print(f"  {key}: {value}")

print("\n🟡 Preprocessing Configuration:")
transformer_config = config['preprocessing']['transformer']
print(f"  Transformer: {transformer_config['name']}")
for param, value in transformer_config['parameters'].items():
    print(f"  {param}: {value}")

print("\n🟢 Training Configuration:")
training_config = config['training']['method']
print(f"  Method: {training_config['name']}")
for param, value in training_config['parameters'].items():
    print(f"  {param}: {value}")

print("\n🔴 Evaluation Configuration:")
eval_config = config['evaluation']
print(f"  Primary Metric: {eval_config['primary_metric']}")
print(f"  Fairness Threshold: {eval_config['fairness_threshold']}")
print(f"  Additional Metrics: {', '.join(eval_config['additional_metrics'])}")

print("\n🟣 MLflow Configuration:")
mlflow_config = config['mlflow']
print(f"  Experiment Name: {mlflow_config['experiment_name']}")
print(f"  Log Model: {mlflow_config['log_model']}")
print(f"  Log Config: {mlflow_config['log_config']}")

## 2. Understanding the Synthetic Dataset

Since no external dataset is provided, our pipeline generates synthetic data with intentional bias. Let's explore this data first:
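
To make "intentional bias" concrete, here is a minimal sketch of how a label can be made to depend on group membership. It is not the toolkit's `_generate_synthetic_data` implementation, which is not shown in this notebook. The group names, base rates, and the `make_biased_rows` helper are all assumptions chosen for illustration.

```python
import random


def make_biased_rows(n_samples=1000, seed=0):
    """Generate toy rows whose positive rate depends on group membership."""
    rng = random.Random(seed)
    # Hypothetical base rates: the gap between them is the injected bias.
    base_rate = {"A": 0.60, "B": 0.35}
    rows = []
    for _ in range(n_samples):
        group = rng.choice(["A", "B"])
        target = 1 if rng.random() < base_rate[group] else 0
        rows.append({"group": group, "target": target})
    return rows


def positive_rate(rows, group):
    """Share of rows in the given group with target == 1."""
    selected = [r["target"] for r in rows if r["group"] == group]
    return sum(selected) / len(selected)


rows = make_biased_rows()
gap = positive_rate(rows, "A") - positive_rate(rows, "B")
print(f"positive-rate gap between groups: {gap:.3f}")
```

Because the label is drawn with a different probability per group, any model trained on these rows can pick up the disparity unless it is mitigated.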

In [None]:
# Create a pipeline executor to access data generation
executor = PipelineExecutor(config, verbose=False)

# Generate synthetic data for exploration
synthetic_data = executor._generate_synthetic_data(n_samples=1000)

print("📊 SYNTHETIC DATASET OVERVIEW")
print("="*50)
print(f"Dataset shape: {synthetic_data.shape}")
print("\nColumn types:")
for col in synthetic_data.columns:
    print(f"  {col}: {synthetic_data[col].dtype}")

# Display first few rows
print("\n📋 Sample Data:")
display(synthetic_data.head(10))

# Basic statistics
print("\n📈 Descriptive Statistics:")
display(synthetic_data.describe())

### Visualizing Bias in the Synthetic Data

Let's visualize the intentional bias built into our synthetic dataset:

In [None]:
# Create visualizations to show bias
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# Target rate by race
race_target = synthetic_data.groupby('race')['target'].mean().sort_values(ascending=False)
race_target.plot(kind='bar', ax=axes[0,0], title='Target Rate by Race', color='skyblue')
axes[0,0].set_ylabel('Positive Rate')
axes[0,0].tick_params(axis='x', rotation=45)

# Target rate by sex
sex_target = synthetic_data.groupby('sex')['target'].mean().sort_values(ascending=False)
sex_target.plot(kind='bar', ax=axes[0,1], title='Target Rate by Sex', color='lightcoral')
axes[0,1].set_ylabel('Positive Rate')
axes[0,1].tick_params(axis='x', rotation=0)

# Income distribution by race
synthetic_data.boxplot(column='income', by='race', ax=axes[1,0])
axes[1,0].set_title('Income Distribution by Race')
axes[1,0].set_xlabel('Race')
axes[1,0].set_ylabel('Income')

# Education distribution by sex
synthetic_data.boxplot(column='education_years', by='sex', ax=axes[1,1])
axes[1,1].set_title('Education Years by Sex')
axes[1,1].set_xlabel('Sex')
axes[1,1].set_ylabel('Education Years')

plt.tight_layout()
plt.suptitle('Bias Visualization in Synthetic Dataset', fontsize=16, y=1.02)
plt.show()

# Print bias summary
print("\n⚠️  INTENTIONAL BIAS SUMMARY:")
print("="*50)
print(f"Overall target rate: {synthetic_data['target'].mean():.3f}")
print("\nTarget rate by race:")
for race, rate in race_target.items():
    print(f"  {race}: {rate:.3f}")
print(f"  Max difference: {race_target.max() - race_target.min():.3f}")

print("\nTarget rate by sex:")
for sex, rate in sex_target.items():
    print(f"  {sex}: {rate:.3f}")
print(f"  Max difference: {sex_target.max() - sex_target.min():.3f}")

## 3. Running the Complete Fairness Pipeline

Now let's execute the complete pipeline and observe each step in detail:

In [None]:
executor = PipelineExecutor(config, verbose=True)

results = executor.execute_pipeline()

print("\n✅ Pipeline execution completed successfully!")

## 4. Detailed Analysis of Results

Let's dive deeper into the results and understand what happened at each step:

### 4.1 Baseline Analysis
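
The two fairness metrics reported in this section can be computed by hand on a tiny example. The sketch below follows the common max-minus-min convention (the one Fairlearn also uses). Whether the toolkit computes them in exactly this way is an assumption, and the labels and predictions are made up.

```python
def rate(values):
    """Mean of a list of 0/1 values; 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def demographic_parity_difference(y_pred, groups):
    """Largest gap in positive-prediction rate between any two groups."""
    rates = [
        rate([p for p, g in zip(y_pred, groups) if g == grp])
        for grp in set(groups)
    ]
    return max(rates) - min(rates)


def equalized_odds_difference(y_true, y_pred, groups):
    """Larger of the between-group gaps in true- and false-positive rate."""
    tprs, fprs = [], []
    for grp in set(groups):
        rows = [(t, p) for t, p, g in zip(y_true, y_pred, groups) if g == grp]
        tprs.append(rate([p for t, p in rows if t == 1]))
        fprs.append(rate([p for t, p in rows if t == 0]))
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))


# Made-up labels and predictions for two groups of four samples each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

dpd = demographic_parity_difference(y_pred, groups)
eod = equalized_odds_difference(y_true, y_pred, groups)
print(dpd, eod)  # → 0.5 0.5
```

Both metrics are 0 for a perfectly balanced model, which is why the cells below flag a violation when a value exceeds `fairness_threshold`.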

In [None]:
# Extract baseline results
baseline_report = results['baseline_report']['prediction_audit']
baseline_metrics = baseline_report['metrics']

print("🎯 Performance Metrics:")
performance_metrics = ['accuracy', 'precision', 'recall']
for metric in performance_metrics:
    if metric in baseline_metrics:
        print(f"  {metric.title()}: {baseline_metrics[metric]:.4f}")

print("\n⚖️  Fairness Metrics:")
fairness_metrics = ['demographic_parity_difference', 'equalized_odds_difference']
for metric in fairness_metrics:
    if metric in baseline_metrics:
        value = baseline_metrics[metric]
        threshold = config['evaluation']['fairness_threshold']
        status = "❌ VIOLATION" if value > threshold else "✅ OK"
        print(f"  {metric.replace('_', ' ').title()}: {value:.4f} ({status})")

print(f"\n📈 Overall Fairness Score: {baseline_report['overall_fairness_score']:.4f}")

# Violations summary
violations = baseline_report['fairness_violations']
if any(violations.values()):
    print("\n⚠️  DETECTED VIOLATIONS:")
    for violation, detected in violations.items():
        if detected:
            print(f"  - {violation.replace('_', ' ').title()}")
else:
    print("\n✅ No fairness violations detected in baseline!")

### 4.2 Final Model Analysis

In [None]:
# Extract final results
final_report = results['final_report']
final_metrics = final_report['metrics']

print("🎯 FINAL MODEL PERFORMANCE ANALYSIS")
print("="*50)

print("🎯 Performance Metrics:")
for metric in performance_metrics:
    if metric in final_metrics:
        print(f"  {metric.title()}: {final_metrics[metric]:.4f}")

print("\n⚖️  Fairness Metrics:")
for metric in fairness_metrics:
    if metric in final_metrics:
        value = final_metrics[metric]
        threshold = config['evaluation']['fairness_threshold']
        status = "❌ VIOLATION" if value > threshold else "✅ OK"
        print(f"  {metric.replace('_', ' ').title()}: {value:.4f} ({status})")

print(f"\n📈 Overall Fairness Score: {final_report['overall_fairness_score']:.4f}")

# Violations summary
final_violations = final_report['fairness_violations']
if any(final_violations.values()):
    print("\n⚠️  REMAINING VIOLATIONS:")
    for violation, detected in final_violations.items():
        if detected:
            print(f"  - {violation.replace('_', ' ').title()}")
else:
    print("\n✅ No fairness violations in final model!")

### 4.3 Improvement Comparison

In [None]:
# Create improvement comparison visualization
fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# Performance metrics comparison (only metrics present in both reports)
shared_perf = [m for m in performance_metrics if m in baseline_metrics and m in final_metrics]
perf_comparison = pd.DataFrame({
    'Baseline': [baseline_metrics[m] for m in shared_perf],
    'Final': [final_metrics[m] for m in shared_perf]
}, index=[m.title() for m in shared_perf])

perf_comparison.plot(kind='bar', ax=axes[0], title='Performance Metrics Comparison')
axes[0].set_ylabel('Score')
axes[0].set_ylim(0, 1)
axes[0].legend()
axes[0].tick_params(axis='x', rotation=0)

# Fairness metrics comparison (only metrics present in both reports)
shared_fair = [m for m in fairness_metrics if m in baseline_metrics and m in final_metrics]
fair_comparison = pd.DataFrame({
    'Baseline': [baseline_metrics[m] for m in shared_fair],
    'Final': [final_metrics[m] for m in shared_fair]
}, index=[m.replace('_', ' ').title() for m in shared_fair])

fair_comparison.plot(kind='bar', ax=axes[1], title='Fairness Metrics Comparison', color=['red', 'green'])
axes[1].set_ylabel('Difference Score')
axes[1].axhline(y=config['evaluation']['fairness_threshold'], color='orange', linestyle='--', label='Fairness Threshold')
axes[1].legend()
axes[1].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

# Print detailed improvement analysis
print("\n📊 DETAILED IMPROVEMENT ANALYSIS")
print("="*50)

primary_metric = config['evaluation']['primary_metric']
baseline_primary = baseline_metrics[primary_metric]
final_primary = final_metrics[primary_metric]
improvement = baseline_primary - final_primary
improvement_pct = (improvement / baseline_primary) * 100 if baseline_primary != 0 else 0

print(f"🎯 Primary Fairness Metric ({primary_metric.replace('_', ' ').title()}):")
print(f"  Baseline: {baseline_primary:.4f}")
print(f"  Final: {final_primary:.4f}")
print(f"  Absolute Improvement: {improvement:.4f}")
print(f"  Percentage Improvement: {improvement_pct:.1f}%")

# Performance trade-offs
print("\n⚖️  Performance Trade-offs:")
for metric in performance_metrics:
    if metric in baseline_metrics and metric in final_metrics:
        baseline_val = baseline_metrics[metric]
        final_val = final_metrics[metric]
        change = final_val - baseline_val
        change_pct = (change / baseline_val) * 100 if baseline_val != 0 else 0
        direction = "📈 Improved" if change > 0 else "📉 Decreased" if change < 0 else "➡️  Unchanged"
        print(f"  {metric.title()}: {baseline_val:.4f} → {final_val:.4f} ({change:+.4f}, {change_pct:+.1f}%) {direction}")

## 5. Understanding the Transformations

Let's examine what the bias mitigation transformer actually did to our data:
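
The details printed below (a repair level, per-group means, and overall means) suggest a mean-based repair. As a minimal sketch under that assumption, the function below shifts each group's values toward the overall mean by a fraction `repair_level`. The toolkit's actual algorithm is not shown in this notebook, so `repair_feature` and the sample numbers are hypothetical.

```python
from statistics import mean


def repair_feature(values, groups, repair_level):
    """Shift each group's values toward the overall mean by repair_level (0 to 1)."""
    overall = mean(values)
    group_means = {
        g: mean(v for v, gg in zip(values, groups) if gg == g)
        for g in set(groups)
    }
    # repair_level=0 leaves the data unchanged; 1 fully aligns the group means.
    return [
        v - repair_level * (group_means[g] - overall)
        for v, g in zip(values, groups)
    ]


# Made-up feature values: group "a" averages 40, group "b" averages 70.
income = [30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
group = ["a", "a", "a", "b", "b", "b"]

repaired = repair_feature(income, group, repair_level=1.0)
print(repaired)  # → [45.0, 55.0, 65.0, 45.0, 55.0, 65.0]
```

With full repair both groups end up with the same mean (55), while the spread within each group is preserved. Intermediate repair levels trade remaining disparity against distortion of the original feature.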

In [None]:
# Get transformation details
transformer = results['transformer']
if transformer:
    transformation_info = transformer.get_mitigation_details()
    
    print("🔧 BIAS MITIGATION TRANSFORMATION DETAILS")
    print("="*50)
    
    print(f"Repair Level: {transformation_info['repair_level']}")
    print(f"Sensitive Features: {transformation_info['sensitive_features']}")
    print(f"Non-sensitive Features: {transformation_info['non_sensitive_features']}")
    
    print("\n📊 Group Statistics:")
    for sensitive_attr, group_stats in transformation_info['group_statistics'].items():
        print(f"\n  {sensitive_attr}:")
        for group, stats in group_stats.items():
            print(f"    {group}: {stats['size']} samples")
            for feature, mean_val in stats['mean'].items():
                overall_mean = transformation_info['overall_means'][feature]
                diff = mean_val - overall_mean  # group mean minus overall mean
                print(f"      {feature}: {mean_val:.3f} (diff from overall: {diff:+.3f})")
else:
    print("❌ No transformer found in results")

## 6. Model Information and Fairness Constraints

Let's examine the fairness-constrained model that was trained:
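
The cell below mentions a fallback classifier with group-specific decision thresholds. One simple way such thresholds could be derived is to pick, per group, the score cutoff that yields the same positive rate. This is a sketch under that assumption, not the toolkit's implementation. The helper names and scores are hypothetical, and ties between scores are ignored.

```python
def threshold_for_rate(scores, target_rate):
    """Pick a cutoff so that roughly target_rate of the scores are >= it."""
    ranked = sorted(scores, reverse=True)
    k = max(1, round(target_rate * len(ranked)))
    return ranked[k - 1]


def fit_group_thresholds(scores, groups, target_rate):
    """Compute one cutoff per group, each aiming for the same positive rate."""
    return {
        g: threshold_for_rate(
            [s for s, gg in zip(scores, groups) if gg == g], target_rate
        )
        for g in set(groups)
    }


def predict(scores, groups, thresholds):
    """Apply each sample's group-specific cutoff to its score."""
    return [1 if s >= thresholds[g] else 0 for s, g in zip(scores, groups)]


# Made-up model scores: group "b" is scored systematically lower.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

thresholds = fit_group_thresholds(scores, groups, target_rate=0.5)
preds = predict(scores, groups, thresholds)
print(thresholds["a"], thresholds["b"], preds)
```

A single global cutoff of 0.5 would accept all of group "a" and only one member of group "b". The per-group cutoffs accept half of each group, which is what demographic parity asks for.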

In [None]:
# Get model details
model = results['model']
if model:
    fairness_info = model.get_fairness_info()
    
    print("🤖 FAIRNESS-CONSTRAINED MODEL DETAILS")
    print("="*50)
    
    for key, value in fairness_info.items():
        print(f"{key.replace('_', ' ').title()}: {value}")
    
    # Show whether fairlearn is being used
    if hasattr(model, 'use_fairlearn'):
        if model.use_fairlearn:
            print("\n✅ Using Fairlearn's ExponentiatedGradient for constraint optimization")
        else:
            print("\n⚠️  Using fallback fair classifier (Fairlearn not available)")
            if hasattr(model, 'group_thresholds_'):
                print("\n📊 Group-specific Decision Thresholds:")
                for attr, thresholds in model.group_thresholds_.items():
                    print(f"  {attr}:")
                    for group, threshold in thresholds.items():
                        print(f"    {group}: {threshold:.4f}")
else:
    print("❌ No model found in results")

## 7. MLflow Experiment Tracking

Our results have been automatically logged to MLflow. Let's explore what was captured:
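
MLflow stores metrics as flat name-to-number pairs, so nested results have to be flattened before logging. The sketch below shows one possible naming scheme with a stage prefix. The scheme, the `flatten_metrics` helper, and the sample values are assumptions, since the names the toolkit actually logs only become visible when the cell below runs.

```python
def flatten_metrics(baseline, final):
    """Merge two metric dicts into one, prefixing each key with its stage."""
    flat = {}
    for stage, metrics in (("baseline", baseline), ("final", final)):
        for name, value in metrics.items():
            # Keep numeric values only; bools and nested objects are skipped.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                flat[f"{stage}_{name}"] = float(value)
    return flat


# Made-up metric values for illustration.
flat = flatten_metrics(
    {"accuracy": 0.81, "demographic_parity_difference": 0.20},
    {"accuracy": 0.79, "demographic_parity_difference": 0.05},
)

# With MLflow installed, this dict could be logged inside a run, e.g.:
#     with mlflow.start_run():
#         mlflow.log_metrics(flat)
print(sorted(flat))
```

Prefixing by stage keeps baseline and final values side by side in the MLflow UI, which makes runs easier to compare.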

In [None]:
import mlflow
from mlflow.tracking import MlflowClient

# Get the experiment
experiment_name = config['mlflow']['experiment_name']
experiment = mlflow.get_experiment_by_name(experiment_name)

if experiment:
    print(f"🔬 MLFLOW EXPERIMENT: {experiment_name}")
    print("="*50)
    print(f"Experiment ID: {experiment.experiment_id}")
    print(f"Artifact Location: {experiment.artifact_location}")
    
    # Get recent runs
    client = MlflowClient()
    runs = client.search_runs(
        experiment_ids=[experiment.experiment_id],
        max_results=5,
        order_by=["attributes.start_time DESC"]
    )
    
    if runs:
        latest_run = runs[0]
        print(f"\n📊 Latest Run ID: {latest_run.info.run_id}")
        print(f"Status: {latest_run.info.status}")
        print(f"Start Time: {pd.to_datetime(latest_run.info.start_time, unit='ms')}")
        
        # Show logged metrics
        print("\n📈 Logged Metrics:")
        for metric_name, metric_value in latest_run.data.metrics.items():
            print(f"  {metric_name}: {metric_value:.4f}")
        
        # Show artifacts
        artifacts = client.list_artifacts(latest_run.info.run_id)
        print("\n📁 Logged Artifacts:")
        for artifact in artifacts:
            if artifact.is_dir:
                print(f"  📁 {artifact.path}/")
                # List contents of directory
                sub_artifacts = client.list_artifacts(latest_run.info.run_id, artifact.path)
                for sub_artifact in sub_artifacts[:3]:  # Show first 3 items
                    print(f"    📄 {sub_artifact.path}")
                if len(sub_artifacts) > 3:
                    print(f"    ... and {len(sub_artifacts) - 3} more files")
            else:
                print(f"  📄 {artifact.path}")
        
        # Show tags
        if latest_run.data.tags:
            print("\n🏷️  Tags:")
            for tag_key, tag_value in latest_run.data.tags.items():
                if not tag_key.startswith('mlflow.'):
                    print(f"  {tag_key}: {tag_value}")
    
    print("\n🌐 To view in MLflow UI, run: mlflow ui")
    print(f"📂 In the parent directory: {parent_dir}")
else:
    print(f"❌ Experiment '{experiment_name}' not found")

## 8. Comprehensive Results Summary

Let's create a final comprehensive summary of our fairness pipeline results:

In [None]:
print("🎯 FAIRNESS PIPELINE RESULTS SUMMARY")
print("="*60)

# Configuration summary
print("⚙️  Configuration Used:")
print(f"  • Data: Synthetic dataset ({config['data']['input_path']})")
print(f"  • Sensitive Features: {', '.join(config['data']['sensitive_features'])}")
print(f"  • Bias Mitigation: {config['preprocessing']['transformer']['name']} (repair_level={config['preprocessing']['transformer']['parameters']['repair_level']})")
print(f"  • Fair Training: {config['training']['method']['name']} ({config['training']['method']['parameters']['constraint']})")
print(f"  • Primary Metric: {config['evaluation']['primary_metric']}")
print(f"  • Fairness Threshold: {config['evaluation']['fairness_threshold']}")

# Results summary
print("\n📊 Key Results:")

# Fairness improvements
primary_metric = config['evaluation']['primary_metric']
if primary_metric in baseline_metrics and primary_metric in final_metrics:
    baseline_val = baseline_metrics[primary_metric]
    final_val = final_metrics[primary_metric]
    improvement = baseline_val - final_val
    improvement_pct = (improvement / baseline_val) * 100 if baseline_val != 0 else 0
    
    threshold = config['evaluation']['fairness_threshold']
    baseline_violation = "Yes" if baseline_val > threshold else "No"
    final_violation = "Yes" if final_val > threshold else "No"
    
    print(f"\n  🎯 Primary Fairness Metric ({primary_metric.replace('_', ' ').title()}):")
    print(f"    Baseline: {baseline_val:.4f} (Violation: {baseline_violation})")
    print(f"    Final: {final_val:.4f} (Violation: {final_violation})")
    print(f"    Improvement: {improvement:.4f} ({improvement_pct:+.1f}%)")

# Performance summary
print("\n  📈 Performance Summary:")
for metric in ['accuracy', 'precision', 'recall']:
    if metric in baseline_metrics and metric in final_metrics:
        baseline_val = baseline_metrics[metric]
        final_val = final_metrics[metric]
        change = final_val - baseline_val
        symbol = "📈" if change > 0 else "📉" if change < 0 else "➡️"
        print(f"    {metric.title()}: {baseline_val:.4f} → {final_val:.4f} ({change:+.4f}) {symbol}")

# Overall assessment
baseline_fairness_score = baseline_report['overall_fairness_score']
final_fairness_score = final_report['overall_fairness_score']
fairness_improvement = final_fairness_score - baseline_fairness_score

print("\n  🏆 Overall Fairness Score:")
print(f"    Baseline: {baseline_fairness_score:.4f}")
print(f"    Final: {final_fairness_score:.4f}")
print(f"    Improvement: {fairness_improvement:+.4f}")

# Success criteria
print("\n✅ Success Assessment:")
success_count = 0
total_criteria = 4

# Criterion 1: Reduced primary fairness metric
if improvement > 0:
    print(f"  ✅ Primary fairness metric improved by {improvement:.4f}")
    success_count += 1
else:
    print(f"  ❌ Primary fairness metric did not improve ({improvement:.4f})")

# Criterion 2: No severe performance degradation (< 10% decrease)
accuracy_change = final_metrics['accuracy'] - baseline_metrics['accuracy']
accuracy_change_pct = (accuracy_change / baseline_metrics['accuracy']) * 100
if accuracy_change_pct > -10:
    print(f"  ✅ Accuracy maintained within acceptable range ({accuracy_change_pct:+.1f}%)")
    success_count += 1
else:
    print(f"  ❌ Accuracy degraded significantly ({accuracy_change_pct:+.1f}%)")

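# Criterion 3: Final model meets fairness threshold
# Look the value up again: final_val was overwritten by the performance loop above
final_primary = final_metrics[primary_metric]
if final_primary <= threshold:
    print(f"  ✅ Final model meets fairness threshold ({final_primary:.4f} ≤ {threshold})")
    success_count += 1
else:
    print(f"  ⚠️  Final model still exceeds fairness threshold ({final_primary:.4f} > {threshold})")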

# Criterion 4: Successful MLflow logging
if experiment:
    print(f"  ✅ Results successfully logged to MLflow experiment '{experiment_name}'")
    success_count += 1
else:
    print("  ❌ Failed to log results to MLflow")

# Overall success
success_rate = (success_count / total_criteria) * 100
print(f"\n🎯 Overall Success Rate: {success_count}/{total_criteria} ({success_rate:.0f}%)")

if success_rate >= 75:
    print("🌟 EXCELLENT: Fairness pipeline achieved strong results!")
elif success_rate >= 50:
    print("👍 GOOD: Fairness pipeline achieved satisfactory results")
else:
    print("⚠️  NEEDS IMPROVEMENT: Consider tuning parameters or trying different approaches")

print("\n" + "="*60)
print("🎉 FAIRNESS PIPELINE DEMONSTRATION COMPLETE!")
print("="*60)

## 9. Next Steps and Recommendations

Based on the results of this demonstration, here are key insights and recommendations:

### 🔧 **Configuration Tuning Recommendations**

1. **Repair Level Optimization**: Try different `repair_level` values (0.1 to 1.0) to find the optimal balance between fairness and performance

2. **Constraint Selection**: Experiment with different fairness constraints:
   - `demographic_parity`: Targets equal positive prediction rates across groups
   - `equalized_odds`: Targets equal true positive and false positive rates across groups

3. **Base Estimator Tuning**: Consider using different base estimators or tuning hyperparameters for better convergence
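
The repair-level tuning suggested above can be sketched as a small sweep. The `evaluate` callable is an assumption: in practice it would wrap a full pipeline run with a modified config and return `(accuracy, disparity)`. Here a toy stand-in with made-up numbers is used so the example runs on its own.

```python
def sweep_repair_levels(evaluate, levels, fairness_threshold, max_accuracy_drop):
    """Return the smallest repair level that meets both criteria, or None."""
    baseline_accuracy, _ = evaluate(0.0)
    for level in sorted(levels):
        accuracy, disparity = evaluate(level)
        fair_enough = disparity <= fairness_threshold
        accurate_enough = baseline_accuracy - accuracy <= max_accuracy_drop
        if fair_enough and accurate_enough:
            return level
    return None


def toy_evaluate(repair_level):
    """Stand-in for a real pipeline run: returns (accuracy, disparity)."""
    return 0.80 - 0.05 * repair_level, 0.20 * (1.0 - repair_level)


best = sweep_repair_levels(
    toy_evaluate,
    levels=[0.25, 0.5, 0.75, 1.0],
    fairness_threshold=0.12,
    max_accuracy_drop=0.04,
)
print(best)  # → 0.5
```

Choosing the smallest level that passes keeps the data as close to the original as possible while still meeting the fairness target.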

### 📊 **Monitoring and Production Considerations**

1. **Continuous Monitoring**: Set up automated fairness monitoring in production
2. **Data Drift Detection**: Monitor for changes in data distribution that might affect fairness
3. **A/B Testing**: Compare fairness-constrained models against baseline models in production
4. **Stakeholder Review**: Regular review of fairness metrics with domain experts
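
As a minimal sketch of the continuous monitoring idea, the class below tracks the positive-rate gap between groups over a sliding window of recent predictions and flags when it exceeds a threshold. `FairnessMonitor` is a hypothetical helper, not part of the toolkit, and the window size and threshold are placeholder values.

```python
from collections import deque


class FairnessMonitor:
    """Track the positive-rate gap over a sliding window of predictions."""

    def __init__(self, threshold, window_size=1000):
        self.threshold = threshold
        self.window = deque(maxlen=window_size)  # oldest entries drop out

    def update(self, prediction, group):
        """Record one 0/1 prediction together with its group label."""
        self.window.append((prediction, group))

    def disparity(self):
        """Max minus min positive rate across groups seen in the window."""
        totals, positives = {}, {}
        for prediction, group in self.window:
            totals[group] = totals.get(group, 0) + 1
            positives[group] = positives.get(group, 0) + prediction
        rates = [positives[g] / totals[g] for g in totals]
        return max(rates) - min(rates) if len(rates) > 1 else 0.0

    def is_violating(self):
        return self.disparity() > self.threshold


monitor = FairnessMonitor(threshold=0.1, window_size=4)
for prediction, group in [(1, "a"), (1, "a"), (0, "b"), (1, "b")]:
    monitor.update(prediction, group)

print(monitor.disparity(), monitor.is_violating())  # → 0.5 True
```

In production the window would be much larger, and a violation would typically trigger an alert or a review rather than a print statement.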

### 🚀 **Advanced Usage**

1. **Custom Datasets**: Replace synthetic data with real-world datasets
2. **Multiple Sensitive Attributes**: Experiment with intersectional fairness
3. **Custom Transformers**: Implement domain-specific bias mitigation techniques
4. **Advanced Constraints**: Implement individual fairness or counterfactual fairness constraints
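
For the intersectional fairness item above, a first step is to measure outcomes per combination of sensitive attributes rather than per attribute. The sketch below does this with plain Python. The records, group labels, and the `intersectional_rates` helper are illustrative assumptions.

```python
from collections import defaultdict


def intersectional_rates(records, attributes, outcome="prediction"):
    """Positive rate for every observed combination of the given attributes."""
    counts = defaultdict(lambda: [0, 0])  # key -> [positives, total]
    for record in records:
        key = tuple(record[a] for a in attributes)
        counts[key][0] += record[outcome]
        counts[key][1] += 1
    return {key: pos / total for key, (pos, total) in counts.items()}


# Made-up predictions with placeholder group labels.
records = [
    {"race": "x", "sex": "f", "prediction": 1},
    {"race": "x", "sex": "m", "prediction": 1},
    {"race": "y", "sex": "f", "prediction": 0},
    {"race": "y", "sex": "m", "prediction": 1},
    {"race": "y", "sex": "f", "prediction": 0},
]

rates = intersectional_rates(records, ["race", "sex"])
gap = max(rates.values()) - min(rates.values())
print(rates[("y", "f")], gap)  # → 0.0 1.0
```

In this toy data the gap by `race` alone or by `sex` alone is about 0.67, while the intersectional gap is 1.0, so subgroup analysis can reveal disparities that single-attribute metrics understate.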

### 📚 **Learning Resources**

- **Fairlearn Documentation**: https://fairlearn.org/
- **MLflow Documentation**: https://mlflow.org/docs/latest/index.html
- **Fairness in ML Research**: Explore recent papers on algorithmic fairness

---

**Thank you for exploring the Fairness Pipeline Toolkit! 🎉**

This toolkit provides a solid foundation for implementing fairness-aware machine learning in production environments. Remember that fairness is an ongoing process that requires continuous monitoring, evaluation, and improvement.