# Bias and Fairness Analysis

**Goal**: Evaluate the fairness of the offer completion prediction model across different demographic groups.

**Why Fairness Matters:**
- Avoid discriminatory outcomes in marketing offers
- Ensure equitable customer experience
- Build trust in AI-driven recommendations
- Meet regulatory and ethical standards

**Protected Attributes Analyzed:**
1. **Gender**: Male, Female, Other, Missing
2. **Age Group**: 18-30, 31-45, 46-60, 61-75, 76+
3. **Income Bracket**: Missing, Low, Medium, High, Very High
4. **Tenure Group**: Membership tenure bands

**Fairness Metrics:**
- **Demographic Parity**: Similar positive-prediction rates across groups
- **Equal Opportunity**: Similar true positive rates across groups
- **Predictive Parity**: Similar precision across groups
- **Disparate Impact**: Ratio of favorable outcomes between groups
- **Overall Accuracy**: Similar accuracy across groups
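
The definitions above can be made concrete on a small toy table. The sketch below is illustrative only: the `toy` data, the group labels `A` and `B`, and the helper `group_rates` are hypothetical and not part of this project. It computes the per-group rate behind each metric and the largest gap between groups.

```python
import pandas as pd

# Toy data (hypothetical): true labels, predictions, and a group label.
toy = pd.DataFrame({
    "group":  ["A", "A", "A", "A", "B", "B", "B", "B"],
    "y_true": [1, 1, 0, 0, 1, 1, 0, 0],
    "y_pred": [1, 1, 1, 0, 1, 0, 0, 0],
})

def group_rates(g: pd.DataFrame) -> pd.Series:
    """Per-group rates that the fairness metrics compare (hypothetical helper)."""
    tp = ((g["y_true"] == 1) & (g["y_pred"] == 1)).sum()
    fp = ((g["y_true"] == 0) & (g["y_pred"] == 1)).sum()
    fn = ((g["y_true"] == 1) & (g["y_pred"] == 0)).sum()
    return pd.Series({
        "positive_rate": g["y_pred"].mean(),                         # demographic parity
        "tpr": tp / (tp + fn) if (tp + fn) else float("nan"),        # equal opportunity
        "precision": tp / (tp + fp) if (tp + fp) else float("nan"),  # predictive parity
        "accuracy": (g["y_true"] == g["y_pred"]).mean(),             # overall accuracy
    })

# One row per group, one column per rate.
rates = pd.DataFrame({name: group_rates(g) for name, g in toy.groupby("group")}).T

# Largest between-group gap for each rate.
gaps = rates.max() - rates.min()
print(rates.round(3))
print(gaps.round(3))
```

In this toy table both groups have the same accuracy, yet their positive rates, true positive rates, and precision all differ, which is why accuracy alone is not treated as sufficient evidence of fairness.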

In [13]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                            f1_score, confusion_matrix, roc_auc_score,
                            precision_recall_curve, roc_curve)
import joblib
import warnings
warnings.filterwarnings('ignore')

pd.set_option('display.max_columns', None)
plt.style.use('seaborn-v0_8-whitegrid')

print("Environment ready! ✓")

Environment ready! ✓


## Load Data and Models

In [14]:
processed_dir = '../Cafe_Rewards_Offers/processed'
models_dir = '../Cafe_Rewards_Offers/models'

X_test = joblib.load(f'{processed_dir}/X_test_scaled.pkl')
y_test = joblib.load(f'{processed_dir}/y_test.pkl')
feature_names = joblib.load(f'{processed_dir}/feature_names.pkl')

print(f"Test set loaded: {X_test.shape[0]:,} samples × {X_test.shape[1]} features")
print(f"Target distribution in test set:")
print(y_test.value_counts(normalize=True).round(3))

Test set loaded: 17,287 samples × 26 features
Target distribution in test set:
target
1    0.534
0    0.466
Name: proportion, dtype: float64


In [18]:
print("="*60)
print("CHECKING FOR DATA LEAKAGE")
print("="*60)

# Check feature names
print(f"\nFeatures ({len(feature_names)} total):")
for i, feat in enumerate(feature_names):
    print(f"  {i:2}. {feat}")

# Check if target has perfect correlation with any feature
print("\n" + "="*60)
print("CHECKING FOR PERFECT CORRELATION")
print("="*60)

# Load the training split; nothing earlier in this notebook defines it
X_train = joblib.load(f'{processed_dir}/X_train_scaled.pkl')
y_train = joblib.load(f'{processed_dir}/y_train.pkl')

# Combine X and y for correlation check
train_df = X_train.copy()
train_df['target'] = y_train.values

# Correlation of each feature with the target. The target itself is dropped,
# because its self-correlation is always 1.0 and would be flagged as a leak.
correlations = (train_df.corr()['target']
                .drop('target')
                .sort_values(ascending=False))

print("\nTop correlations with target:")
for feat, corr in correlations.head(10).items():
    print(f"  {feat:30}: {corr:.4f}")

# Flag potential data leaks (|r| = 1.0, allowing for floating-point error).
# Absolute values are used so perfect negative correlations are caught too.
abs_corr = correlations.abs()
is_perfect = np.isclose(abs_corr, 1.0)
perfect_leaks = abs_corr[is_perfect]
if len(perfect_leaks) > 0:
    print(f"\n⚠️  DATA LEAKAGE DETECTED!")
    print(f"Features with perfect correlation (|r|=1.0):")
    for feat in perfect_leaks.index:
        print(f"  - {feat}")
    print("\n⚠️  ACTION REQUIRED: Remove these features before modeling!")
else:
    print("\n✓ No perfect data leaks detected (|r| < 1.0)")

# Check for near-perfect leaks (|r| > 0.95)
near_leaks = abs_corr[(abs_corr > 0.95) & ~is_perfect]
if len(near_leaks) > 0:
    print(f"\n⚠️  NEAR-PERFECT DATA LEAKAGE DETECTED!")
    print(f"Features with near-perfect correlation (|r| > 0.95):")
    for feat in near_leaks.index:
        print(f"  - {feat:30}: {correlations[feat]:.4f}")

CHECKING FOR DATA LEAKAGE

Features (26 total):
   0. received_time
   1. difficulty
   2. duration
   3. in_email
   4. in_mobile
   5. in_social
   6. in_web
   7. offer_received
   8. offer_viewed
   9. offer_completed
  10. age
  11. income
  12. membership_year
  13. is_demographics_missing
  14. membership_duration_days
  15. membership_month
  16. offer_type_bogo
  17. offer_type_discount
  18. offer_type_informational
  19. gender_F
  20. gender_M
  21. gender_Missing
  22. gender_O
  23. age_group_encoded
  24. income_bracket_encoded
  25. tenure_group_encoded

In [17]:
rf_model = joblib.load(f'{models_dir}/random_forest.pkl')
rf_tuned = joblib.load(f'{models_dir}/random_forest_tuned.pkl')

print(f"Model expects {len(rf_model.feature_names_in_)} features:")
print(f"  Test set has {X_test.shape[1]} features")

if 'offer_completed' in rf_model.feature_names_in_:
    print("\n⚠️  Model was trained with 'offer_completed' (data leakage)")
    print("   Retraining model without leakage feature...")
    
    X_train = joblib.load(f'{processed_dir}/X_train_scaled.pkl')
    y_train = joblib.load(f'{processed_dir}/y_train.pkl')
    X_train = X_train.drop(columns=['offer_completed'], errors='ignore')
    
    rf_model.fit(X_train, y_train)
    rf_tuned.fit(X_train, y_train)
    
    print("   ✓ Model retrained without data leakage")

# Keep only the columns the model saw at fit time, in the same order.
# This also drops 'offer_completed' from X_test when the saved model was
# already trained without it (25 model features vs. 26 test features).
X_test = X_test[list(rf_model.feature_names_in_)]

y_pred = rf_model.predict(X_test)
y_proba = rf_model.predict_proba(X_test)[:, 1]

print("\nRandom Forest model loaded!")
print(f"\nOverall Model Performance on Test Set:")
print(f"  Accuracy:  {accuracy_score(y_test, y_pred):.4f}")
print(f"  Precision: {precision_score(y_test, y_pred):.4f}")
print(f"  Recall:    {recall_score(y_test, y_pred):.4f}")
print(f"  F1-Score:  {f1_score(y_test, y_pred):.4f}")
print(f"  AUC-ROC:   {roc_auc_score(y_test, y_proba):.4f}")

Model expects 25 features:
  Test set has 26 features


## Protected Attributes Analysis

We'll analyze model performance across protected attributes from the original data.
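
Attaching attributes from the original data only works if each test row is matched to the right original row. The sketch below uses hypothetical toy frames (`original`, `X_test_toy`, `filtered`) to show when label-based (`.loc`) and position-based (`.iloc`) lookups agree and when they diverge. It assumes the test split kept the original row labels as its index, which is what `train_test_split` does for DataFrames.

```python
import pandas as pd

# Hypothetical stand-in for the original data, with a default RangeIndex.
original = pd.DataFrame({"gender": ["F", "M", "O", "M", "F"]})

# Hypothetical test split that kept the original row labels 4, 1, 3.
X_test_toy = pd.DataFrame({"age": [30, 45, 60]}, index=[4, 1, 3])

# With a default RangeIndex, labels and positions coincide.
by_label = original.loc[X_test_toy.index, "gender"]
by_position = original["gender"].iloc[X_test_toy.index]

# Joining on the index is an explicit, label-based alternative.
joined = X_test_toy.join(original["gender"])

# After rows are dropped without resetting the index, the two lookups diverge.
filtered = original.drop(index=0)             # labels 1..4 now sit at positions 0..3
rows = pd.Index([1, 3])
label_lookup = filtered.loc[rows, "gender"]        # rows labelled 1 and 3
position_lookup = filtered["gender"].iloc[rows]    # rows at positions 1 and 3

print(by_label.tolist(), by_position.tolist())
print(label_lookup.tolist(), position_lookup.tolist())
```

If the original frame was filtered or re-sorted between the split and this analysis, a label-based lookup or an index join is the safer choice.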

In [None]:
df_original = pd.read_csv('../Cafe_Rewards_Offers/processed_data_for_classification.csv')

print(f"Original dataset loaded: {df_original.shape[0]:,} rows × {df_original.shape[1]} columns")
print(f"\nColumns available for fairness analysis:")
protected_cols = ['gender', 'age', 'income', 'age_group', 'income_bracket', 'tenure_group']
for col in protected_cols:
    if col in df_original.columns:
        unique_vals = df_original[col].unique()
        print(f"  - {col}: {len(unique_vals)} unique values")

In [None]:
df_test_fairness = X_test.copy()
df_test_fairness['target'] = y_test.values
df_test_fairness['prediction'] = y_pred
df_test_fairness['prediction_proba'] = y_proba

# Positional lookup: assumes X_test.index holds row positions in df_original
df_original_test = df_original.iloc[X_test.index].copy()

# Track which attributes were actually found, so the summary below is accurate
added_cols = []
for col in ['gender', 'age_group', 'income_bracket', 'tenure_group']:
    if col in df_original_test.columns:
        df_test_fairness[col] = df_original_test[col].values
        added_cols.append(col)

print(f"Fairness analysis dataframe created: {df_test_fairness.shape}")
print(f"\nProtected attributes added: {added_cols}")

## Fairness Metrics Functions

In [None]:
def calculate_group_metrics(y_true, y_pred, y_proba, group_mask):
    """Calculate classification metrics for a specific subgroup."""
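    # Skip groups that are too small for stable estimates
    group_mask = np.asarray(group_mask, dtype=bool)
    if group_mask.sum() < 10:
        return None
    
    y_true_g = y_true[group_mask]
    y_pred_g = y_pred[group_mask]
    y_proba_g = y_proba[group_mask]
    
    # Fix the label order so the matrix is always 2x2, even when a
    # subgroup contains only one class
    tn, fp, fn, tp = confusion_matrix(y_true_g, y_pred_g, labels=[0, 1]).ravel()
    
    metrics = {
        'count': int(group_mask.sum()),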
        'positive_rate': y_pred_g.mean(),
        'accuracy': accuracy_score(y_true_g, y_pred_g),
        'precision': precision_score(y_true_g, y_pred_g, zero_division=0),
        'recall': recall_score(y_true_g, y_pred_g, zero_division=0),
        'f1': f1_score(y_true_g, y_pred_g, zero_division=0),
        'tpr': recall_score(y_true_g, y_pred_g, zero_division=0),
        'tnr': tn / (tn + fp) if (tn + fp) > 0 else 0,
        'fpr': fp / (fp + tn) if (fp + tn) > 0 else 0,
        'fnr': fn / (fn + tp) if (fn + tp) > 0 else 0,
        'auc': roc_auc_score(y_true_g, y_proba_g) if len(np.unique(y_true_g)) > 1 else np.nan
    }
    
    return metrics


def analyze_fairness_by_attribute(df, attribute, y_true_col='target', 
                                  y_pred_col='prediction', y_proba_col='prediction_proba'):
    """Analyze fairness metrics across all values of a protected attribute."""
    
    results = []
    overall_metrics = calculate_group_metrics(
        df[y_true_col].values, 
        df[y_pred_col].values, 
        df[y_proba_col].values,
        np.ones(len(df), dtype=bool)
    )
    
    for value in df[attribute].unique():
        if pd.isna(value):
            continue
        
        mask = df[attribute] == value
        group_metrics = calculate_group_metrics(
            df[y_true_col].values, 
            df[y_pred_col].values, 
            df[y_proba_col].values,
            mask
        )
        
        if group_metrics:
            group_metrics['attribute'] = attribute
            group_metrics['value'] = value
            
            for metric in ['accuracy', 'precision', 'recall', 'f1', 'positive_rate', 'tpr', 'fpr']:
                diff = group_metrics[metric] - overall_metrics[metric]
                group_metrics[f'{metric}_diff'] = diff
                # Relative difference is undefined when the overall value is 0
                if overall_metrics[metric] > 0:
                    group_metrics[f'{metric}_pct_diff'] = (diff / overall_metrics[metric]) * 100
            
            results.append(group_metrics)
    
    return pd.DataFrame(results)


def calculate_disparate_impact(df, attribute, y_pred_col='prediction', reference_value=None):
    """Calculate disparate impact ratio for a protected attribute."""
    
    positive_rates = df.groupby(attribute)[y_pred_col].mean()
    
    if reference_value is None:
        reference_value = positive_rates.idxmax()
    
    reference_rate = positive_rates[reference_value]
    
    di_results = []
    for value, rate in positive_rates.items():
        if reference_rate > 0:
            di = rate / reference_rate
        else:
            di = np.nan
        
        di_results.append({
            'attribute': attribute,
            'value': value,
            'positive_rate': rate,
            'reference': reference_value,
            'reference_rate': reference_rate,
            'disparate_impact': di,
            'is_fair': 0.8 <= di <= 1.25
        })
    
    return pd.DataFrame(di_results)


def plot_fairness_comparison(metrics_df, attribute, metric_cols=['accuracy', 'precision', 'recall', 'f1']):
    """Plot fairness metrics comparison across groups."""
    
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    fig.suptitle(f'Fairness Analysis by {attribute}', fontsize=16, fontweight='bold')
    
    metrics_df = metrics_df.sort_values('value')
    values = metrics_df['value'].values
    
    for idx, metric in enumerate(metric_cols):
        ax = axes[idx // 2, idx % 2]
        
        bars = ax.bar(values, metrics_df[metric].values, alpha=0.7, edgecolor='black')
        
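        # '<metric>_diff' stores group minus overall, so subtracting it recovers
        # the overall (all-samples) value, which is the same for every row
        overall_value = (metrics_df[metric] - metrics_df[f'{metric}_diff']).iloc[0]
        ax.axhline(y=overall_value, color='red', linestyle='--', linewidth=2, 
                  label=f'Overall: {overall_value:.3f}')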
        
        for bar, val in zip(bars, metrics_df[metric].values):
            height = bar.get_height()
            ax.text(bar.get_x() + bar.get_width()/2., height,
                   f'{val:.3f}',
                   ha='center', va='bottom', fontsize=9)
        
        ax.set_xlabel(attribute)
        ax.set_ylabel(metric.replace('_', ' ').title())
        ax.set_title(f'{metric.replace("_", " ").title()} by {attribute}')
        ax.legend()
        ax.grid(axis='y', alpha=0.3)
        
        plt.setp(ax.xaxis.get_majorticklabels(), rotation=45, ha='right')
    
    plt.tight_layout()
    plt.show()


def plot_positive_rates(metrics_df, attribute):
    """Plot positive prediction rates across groups (Demographic Parity)."""
    
    plt.figure(figsize=(12, 6))
    
    metrics_df = metrics_df.sort_values('positive_rate')
    
    colors = ['green' if 0.8 <= (rate / metrics_df['positive_rate'].max()) <= 1.25 
              else 'orange' for rate in metrics_df['positive_rate']]
    
    bars = plt.bar(metrics_df['value'], metrics_df['positive_rate'], 
                    color=colors, alpha=0.7, edgecolor='black')
    
    # Unweighted mean of the group rates (not the rate over all samples)
    plt.axhline(y=metrics_df['positive_rate'].mean(), color='red', 
                linestyle='--', linewidth=2, 
                label=f'Mean of group rates: {metrics_df["positive_rate"].mean():.3f}')
    
    for bar, val in zip(bars, metrics_df['positive_rate'].values):
        height = bar.get_height()
        plt.text(bar.get_x() + bar.get_width()/2., height,
                f'{val:.3f}',
                ha='center', va='bottom', fontsize=10)
    
    plt.xlabel(attribute)
    plt.ylabel('Positive Prediction Rate')
    plt.title(f'Demographic Parity - Positive Rate by {attribute}', 
              fontweight='bold')
    plt.legend()
    plt.grid(axis='y', alpha=0.3)
    plt.xticks(rotation=45, ha='right')
    plt.tight_layout()
    plt.show()

print("Fairness metrics functions defined! ✓")

## 1. Gender-based Fairness Analysis

In [None]:
print("="*70)
print("GENDER DISTRIBUTION")
print("="*70)

gender_dist = df_test_fairness['gender'].value_counts(normalize=True).sort_index()
print("\nTest set distribution:")
for gender, pct in gender_dist.items():
    count = (df_test_fairness['gender'] == gender).sum()
    print(f"  {gender}: {count:5,} ({pct*100:5.2f}%)")

print("\nTarget completion rate by gender:")
for gender in df_test_fairness['gender'].unique():
    if pd.notna(gender):
        subset = df_test_fairness[df_test_fairness['gender'] == gender]
        completion_rate = subset['target'].mean()
        print(f"  {gender}: {completion_rate:.3f}")

In [None]:
gender_fairness = analyze_fairness_by_attribute(df_test_fairness, 'gender')

print("="*70)
print("GENDER FAIRNESS METRICS")
print("="*70)

display_cols = ['value', 'count', 'accuracy', 'precision', 'recall', 'f1', 
                'positive_rate', 'tpr', 'fpr', 'auc']
print(gender_fairness[display_cols].round(4).to_string(index=False))

print("\n" + "="*70)
print("DISPARITIES FROM OVERALL")
print("="*70)

diff_cols = ['value', 'accuracy_pct_diff', 'precision_pct_diff', 
             'recall_pct_diff', 'f1_pct_diff']
print(gender_fairness[diff_cols].round(2).to_string(index=False))

In [None]:
gender_di = calculate_disparate_impact(df_test_fairness, 'gender')

print("="*70)
print("GENDER DISPARATE IMPACT ANALYSIS")
print("="*70)
print("\nDisparate Impact Ratio = (Group Positive Rate) / (Reference Group Positive Rate)")
print("Fair range: 0.8 ≤ DI ≤ 1.25 (80% rule)")
print("\n")

for _, row in gender_di.iterrows():
    status = "✓ FAIR" if row['is_fair'] else "⚠️  UNFAIR"
    print(f"{row['value']:10} | Rate: {row['positive_rate']:.4f} | DI: {row['disparate_impact']:.3f} | {status}")

In [None]:
plot_fairness_comparison(gender_fairness, 'gender')
plot_positive_rates(gender_fairness, 'gender')

## 2. Age Group-based Fairness Analysis

In [None]:
print("="*70)
print("AGE GROUP DISTRIBUTION")
print("="*70)

age_dist = df_test_fairness['age_group'].value_counts().sort_index()
print("\nTest set distribution:")
for age, count in age_dist.items():
    if pd.notna(age):
        pct = (count / len(df_test_fairness)) * 100
        print(f"  {age:10}: {count:6,} ({pct:5.2f}%)")

print("\nTarget completion rate by age group:")
for age in sorted(df_test_fairness['age_group'].dropna().unique()):  # drop NaN so sorting cannot fail
    if pd.notna(age):
        subset = df_test_fairness[df_test_fairness['age_group'] == age]
        completion_rate = subset['target'].mean()
        print(f"  {age:10}: {completion_rate:.3f}")

In [None]:
age_fairness = analyze_fairness_by_attribute(df_test_fairness, 'age_group')

print("="*70)
print("AGE GROUP FAIRNESS METRICS")
print("="*70)

display_cols = ['value', 'count', 'accuracy', 'precision', 'recall', 'f1', 
                'positive_rate', 'tpr', 'fpr', 'auc']
print(age_fairness[display_cols].round(4).to_string(index=False))

print("\n" + "="*70)
print("DISPARITIES FROM OVERALL")
print("="*70)

diff_cols = ['value', 'accuracy_pct_diff', 'precision_pct_diff', 
             'recall_pct_diff', 'f1_pct_diff']
print(age_fairness[diff_cols].round(2).to_string(index=False))

In [None]:
age_di = calculate_disparate_impact(df_test_fairness, 'age_group')

print("="*70)
print("AGE GROUP DISPARATE IMPACT ANALYSIS")
print("="*70)
print("\nDisparate Impact Ratio = (Group Positive Rate) / (Reference Group Positive Rate)")
print("Fair range: 0.8 ≤ DI ≤ 1.25 (80% rule)")
print("\n")

for _, row in age_di.iterrows():
    status = "✓ FAIR" if row['is_fair'] else "⚠️  UNFAIR"
    print(f"{row['value']:10} | Rate: {row['positive_rate']:.4f} | DI: {row['disparate_impact']:.3f} | {status}")

In [None]:
plot_fairness_comparison(age_fairness, 'age_group')
plot_positive_rates(age_fairness, 'age_group')

## 3. Income Bracket-based Fairness Analysis

In [None]:
print("="*70)
print("INCOME BRACKET DISTRIBUTION")
print("="*70)

income_dist = df_test_fairness['income_bracket'].value_counts().sort_index()
print("\nTest set distribution:")
for income, count in income_dist.items():
    if pd.notna(income):
        pct = (count / len(df_test_fairness)) * 100
        print(f"  {income:12}: {count:6,} ({pct:5.2f}%)")

print("\nTarget completion rate by income bracket:")
for income in sorted(df_test_fairness['income_bracket'].dropna().unique()):  # drop NaN so sorting cannot fail
    if pd.notna(income):
        subset = df_test_fairness[df_test_fairness['income_bracket'] == income]
        completion_rate = subset['target'].mean()
        print(f"  {income:12}: {completion_rate:.3f}")

In [None]:
income_fairness = analyze_fairness_by_attribute(df_test_fairness, 'income_bracket')

print("="*70)
print("INCOME BRACKET FAIRNESS METRICS")
print("="*70)

display_cols = ['value', 'count', 'accuracy', 'precision', 'recall', 'f1', 
                'positive_rate', 'tpr', 'fpr', 'auc']
print(income_fairness[display_cols].round(4).to_string(index=False))

print("\n" + "="*70)
print("DISPARITIES FROM OVERALL")
print("="*70)

diff_cols = ['value', 'accuracy_pct_diff', 'precision_pct_diff', 
             'recall_pct_diff', 'f1_pct_diff']
print(income_fairness[diff_cols].round(2).to_string(index=False))

In [None]:
income_di = calculate_disparate_impact(df_test_fairness, 'income_bracket')

print("="*70)
print("INCOME BRACKET DISPARATE IMPACT ANALYSIS")
print("="*70)
print("\nDisparate Impact Ratio = (Group Positive Rate) / (Reference Group Positive Rate)")
print("Fair range: 0.8 ≤ DI ≤ 1.25 (80% rule)")
print("\n")

for _, row in income_di.iterrows():
    status = "✓ FAIR" if row['is_fair'] else "⚠️  UNFAIR"
    print(f"{row['value']:12} | Rate: {row['positive_rate']:.4f} | DI: {row['disparate_impact']:.3f} | {status}")

In [None]:
plot_fairness_comparison(income_fairness, 'income_bracket')
plot_positive_rates(income_fairness, 'income_bracket')

## 4. Tenure Group-based Fairness Analysis

In [None]:
print("="*70)
print("TENURE GROUP DISTRIBUTION")
print("="*70)

tenure_dist = df_test_fairness['tenure_group'].value_counts()
print("\nTest set distribution:")
for tenure, count in tenure_dist.items():
    if pd.notna(tenure):
        pct = (count / len(df_test_fairness)) * 100
        print(f"  {tenure:15}: {count:6,} ({pct:5.2f}%)")

print("\nTarget completion rate by tenure group:")
for tenure in df_test_fairness['tenure_group'].unique():
    if pd.notna(tenure):
        subset = df_test_fairness[df_test_fairness['tenure_group'] == tenure]
        completion_rate = subset['target'].mean()
        print(f"  {tenure:15}: {completion_rate:.3f}")

In [None]:
tenure_fairness = analyze_fairness_by_attribute(df_test_fairness, 'tenure_group')

print("="*70)
print("TENURE GROUP FAIRNESS METRICS")
print("="*70)

display_cols = ['value', 'count', 'accuracy', 'precision', 'recall', 'f1', 
                'positive_rate', 'tpr', 'fpr', 'auc']
print(tenure_fairness[display_cols].round(4).to_string(index=False))

print("\n" + "="*70)
print("DISPARITIES FROM OVERALL")
print("="*70)

diff_cols = ['value', 'accuracy_pct_diff', 'precision_pct_diff', 
             'recall_pct_diff', 'f1_pct_diff']
print(tenure_fairness[diff_cols].round(2).to_string(index=False))

In [None]:
plot_fairness_comparison(tenure_fairness, 'tenure_group')
plot_positive_rates(tenure_fairness, 'tenure_group')

## 5. Confusion Matrices by Protected Groups

In [None]:
def plot_group_confusion_matrix(y_true, y_pred, group_name, ax):
    # labels=[0, 1] keeps the matrix 2x2 so it always matches the tick labels
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
    
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', cbar=False, ax=ax,
                xticklabels=['Not Completed', 'Completed'],
                yticklabels=['Not Completed', 'Completed'])
    ax.set_xlabel('Predicted')
    ax.set_ylabel('Actual')
    ax.set_title(group_name, fontweight='bold')


fig, axes = plt.subplots(2, 3, figsize=(18, 12))
fig.suptitle('Confusion Matrices by Gender', fontsize=16, fontweight='bold')

gender_values = [g for g in df_test_fairness['gender'].unique() if pd.notna(g)]
axes_flat = axes.flatten()

for idx, gender in enumerate(gender_values):
    mask = df_test_fairness['gender'] == gender
    plot_group_confusion_matrix(
        df_test_fairness[mask]['target'],
        df_test_fairness[mask]['prediction'],
        f'Gender: {gender} (n={mask.sum():,})',
        axes_flat[idx]
    )

for idx in range(len(gender_values), len(axes_flat)):
    axes_flat[idx].axis('off')

plt.tight_layout()
plt.show()

In [None]:
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
fig.suptitle('Confusion Matrices by Age Group', fontsize=16, fontweight='bold')

age_values = sorted([a for a in df_test_fairness['age_group'].unique() if pd.notna(a)])
axes_flat = axes.flatten()

for idx, age in enumerate(age_values):
    mask = df_test_fairness['age_group'] == age
    plot_group_confusion_matrix(
        df_test_fairness[mask]['target'],
        df_test_fairness[mask]['prediction'],
        f'Age: {age} (n={mask.sum():,})',
        axes_flat[idx]
    )

for idx in range(len(age_values), len(axes_flat)):
    axes_flat[idx].axis('off')

plt.tight_layout()
plt.show()

## 6. ROC Curves by Protected Groups

In [None]:
def plot_roc_by_group(df, attribute, title_suffix):
    plt.figure(figsize=(10, 8))
    
    values = sorted([v for v in df[attribute].unique() if pd.notna(v)])
    
    for value in values:
        mask = df[attribute] == value
        if sum(mask) < 10:
            continue
        
        y_true = df[mask]['target']
        y_proba = df[mask]['prediction_proba']
        
        if len(np.unique(y_true)) < 2:
            continue
        
        fpr, tpr, _ = roc_curve(y_true, y_proba)
        auc_score = roc_auc_score(y_true, y_proba)
        
        plt.plot(fpr, tpr, label=f'{value} (AUC = {auc_score:.3f})', linewidth=2)
    
    plt.plot([0, 1], [0, 1], 'k--', label='Random', linewidth=1)
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title(f'ROC Curves by {title_suffix}', fontweight='bold')
    plt.legend(loc='lower right')
    plt.grid(alpha=0.3)
    plt.tight_layout()
    plt.show()

plot_roc_by_group(df_test_fairness, 'gender', 'Gender')
plot_roc_by_group(df_test_fairness, 'age_group', 'Age Group')
plot_roc_by_group(df_test_fairness, 'income_bracket', 'Income Bracket')

## 7. Intersectional Fairness Analysis

Analyze fairness across intersections of protected attributes (e.g., Gender × Age Group).
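
As a compact illustration of the idea, the sketch below groups a hypothetical toy table by two attributes at once and reports per-cell accuracy and positive rate. The data, the `correct` helper column, and `MIN_GROUP_SIZE = 2` are assumptions made for the example; this notebook itself skips cells with fewer than 10 samples.

```python
import pandas as pd

# Hypothetical toy predictions with two attributes per row.
toy = pd.DataFrame({
    "gender":     ["F", "F", "F", "M", "M", "M", "M", "F"],
    "age_group":  ["18-30", "18-30", "31-45", "18-30", "31-45", "31-45", "18-30", "31-45"],
    "target":     [1, 0, 1, 1, 0, 1, 0, 1],
    "prediction": [1, 0, 0, 1, 0, 1, 1, 1],
})
toy["correct"] = (toy["target"] == toy["prediction"]).astype(int)

MIN_GROUP_SIZE = 2  # assumption for this toy table only

# One row per (gender, age_group) cell.
cells = (
    toy.groupby(["gender", "age_group"])
       .agg(count=("target", "size"),
            accuracy=("correct", "mean"),
            positive_rate=("prediction", "mean"))
       .reset_index()
)

# Drop cells too small to interpret, then measure the spread across cells.
cells = cells[cells["count"] >= MIN_GROUP_SIZE]
accuracy_gap = cells["accuracy"].max() - cells["accuracy"].min()

print(cells)
print(f"Accuracy gap across cells: {accuracy_gap:.3f}")
```

Intersectional cells shrink quickly as attributes are combined, so the minimum-size filter matters more here than in the single-attribute analyses.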

In [None]:
print("="*70)
print("INTERSECTIONAL FAIRNESS: GENDER × AGE GROUP")
print("="*70)

intersectional_results = []

for gender in ['M', 'F']:
    for age in sorted([a for a in df_test_fairness['age_group'].unique() if pd.notna(a)]):
        mask = (df_test_fairness['gender'] == gender) & (df_test_fairness['age_group'] == age)
        
        if sum(mask) < 10:
            continue
        
        y_true = df_test_fairness[mask]['target']
        y_pred = df_test_fairness[mask]['prediction']
        
        metrics = {
            'gender': gender,
            'age_group': age,
            'count': sum(mask),
            'accuracy': accuracy_score(y_true, y_pred),
            'precision': precision_score(y_true, y_pred, zero_division=0),
            'recall': recall_score(y_true, y_pred, zero_division=0),
            'f1': f1_score(y_true, y_pred, zero_division=0),
            'positive_rate': y_pred.mean()
        }
        
        intersectional_results.append(metrics)

intersectional_df = pd.DataFrame(intersectional_results)

print("\nIntersectional Fairness Metrics:")
print(intersectional_df.round(4).to_string(index=False))

print("\n" + "="*70)
print("MAXIMUM DISPARITY IN INTERSECTIONAL GROUPS")
print("="*70)

for metric in ['accuracy', 'precision', 'recall', 'f1', 'positive_rate']:
    if len(intersectional_df) > 0:
        max_val = intersectional_df[metric].max()
        min_val = intersectional_df[metric].min()
        disparity = max_val - min_val
        print(f"{metric:15}: Max={max_val:.3f}, Min={min_val:.3f}, Disparity={disparity:.3f}")

In [None]:
pivot_accuracy = intersectional_df.pivot(index='age_group', columns='gender', values='accuracy')
pivot_recall = intersectional_df.pivot(index='age_group', columns='gender', values='recall')
pivot_f1 = intersectional_df.pivot(index='age_group', columns='gender', values='f1')

fig, axes = plt.subplots(1, 3, figsize=(18, 5))

sns.heatmap(pivot_accuracy, annot=True, fmt='.3f', cmap='RdYlGn', cbar_kws={'label': 'Accuracy'}, ax=axes[0])
axes[0].set_title('Accuracy by Gender × Age', fontweight='bold')

sns.heatmap(pivot_recall, annot=True, fmt='.3f', cmap='RdYlGn', cbar_kws={'label': 'Recall'}, ax=axes[1])
axes[1].set_title('Recall by Gender × Age', fontweight='bold')

sns.heatmap(pivot_f1, annot=True, fmt='.3f', cmap='RdYlGn', cbar_kws={'label': 'F1-Score'}, ax=axes[2])
axes[2].set_title('F1-Score by Gender × Age', fontweight='bold')

plt.tight_layout()
plt.show()

## 8. Summary and Recommendations

In [None]:
def generate_fairness_summary(gender_fairness, age_fairness, income_fairness, tenure_fairness):
    """Generate a comprehensive fairness summary."""
    
    summary = {}
    
    fairness_dfs = {
        'gender': gender_fairness,
        'age_group': age_fairness,
        'income_bracket': income_fairness,
        'tenure_group': tenure_fairness
    }
    
    for attr_name, df in fairness_dfs.items():
        if df is None or len(df) == 0:
            continue
        
        metrics_to_check = ['accuracy', 'precision', 'recall', 'f1', 'positive_rate']
        
        summary[attr_name] = {}
        
        for metric in metrics_to_check:
            if f'{metric}_pct_diff' in df.columns:
                max_diff = df[f'{metric}_pct_diff'].abs().max()
                
                if max_diff > 10:
                    risk_level = "HIGH"
                elif max_diff > 5:
                    risk_level = "MEDIUM"
                else:
                    risk_level = "LOW"
                
                summary[attr_name][metric] = {
                    'max_disparity_pct': round(max_diff, 2),
                    'risk_level': risk_level
                }
    
    return summary


fairness_summary = generate_fairness_summary(
    gender_fairness, age_fairness, income_fairness, tenure_fairness
)

print("="*70)
print("FAIRNESS ANALYSIS SUMMARY")
print("="*70)

for attr_name, metrics in fairness_summary.items():
    print(f"\n{attr_name.upper()}:")
    for metric_name, values in metrics.items():
        risk_emoji = {
            'HIGH': '🔴',
            'MEDIUM': '🟡',
            'LOW': '🟢'
        }[values['risk_level']]
        print(f"  {metric_name:15}: {values['max_disparity_pct']:>6.2f}% disparity - {risk_emoji} {values['risk_level']}")

In [None]:
print("="*70)
print("RECOMMENDATIONS")
print("="*70)

recommendations = [
    {
        'category': 'Model Monitoring',
        'items': [
            "Implement ongoing fairness monitoring in production",
            "Set up alerts for fairness metric degradation > 5%",
            "Regularly audit model performance across protected groups"
        ]
    },
    {
        'category': 'Data Collection',
        'items': [
            "Ensure balanced representation across all groups",
            "Collect more data for underrepresented groups",
            "Address missing demographic data systematically"
        ]
    },
    {
        'category': 'Mitigation Strategies',
        'items': [
            "Consider fairness-aware algorithms if disparities are high",
            "Apply post-processing techniques to calibrate predictions",
            "Use reweighting strategies during training",
            "Test with/without sensitive attributes to understand bias sources"
        ]
    },
    {
        'category': 'Business Context',
        'items': [
            "Define acceptable fairness thresholds for your use case",
            "Balance fairness with business objectives",
            "Document trade-offs between accuracy and fairness",
            "Engage stakeholders in fairness decisions"
        ]
    },
    {
        'category': 'Governance',
        'items': [
            "Document fairness analysis process and results",
            "Create model cards documenting biases and limitations",
            "Establish fairness review process before deployment",
            "Consider regulatory requirements (e.g., EEOC guidelines)"
        ]
    }
]

for rec in recommendations:
    print(f"\n{rec['category']}:")
    for item in rec['items']:
        print(f"  • {item}")

In [None]:
os.makedirs('../Cafe_Rewards_Offers/fairness_analysis', exist_ok=True)

gender_fairness.to_csv('../Cafe_Rewards_Offers/fairness_analysis/gender_fairness.csv', index=False)
age_fairness.to_csv('../Cafe_Rewards_Offers/fairness_analysis/age_fairness.csv', index=False)
income_fairness.to_csv('../Cafe_Rewards_Offers/fairness_analysis/income_fairness.csv', index=False)
tenure_fairness.to_csv('../Cafe_Rewards_Offers/fairness_analysis/tenure_fairness.csv', index=False)
intersectional_df.to_csv('../Cafe_Rewards_Offers/fairness_analysis/intersectional_fairness.csv', index=False)

print("="*70)
print("✓ FAIRNESS ANALYSIS RESULTS SAVED")
print("="*70)
print("\nSaved files:")
print("  - gender_fairness.csv")
print("  - age_fairness.csv")
print("  - income_fairness.csv")
print("  - tenure_fairness.csv")
print("  - intersectional_fairness.csv")
print("\nLocation: ../Cafe_Rewards_Offers/fairness_analysis/")

## Conclusion

This notebook sets up a bias and fairness analysis of the offer completion prediction model across four customer attributes. Key takeaways:

### What Was Analyzed:
1. **Protected Attributes**: Gender, Age Group, Income Bracket, Tenure Group
2. **Fairness Metrics**: Accuracy, Precision, Recall, F1-Score, Positive Rate, TPR, FPR, AUC
3. **Bias Types**: Demographic parity, equal opportunity, predictive parity
4. **Intersectional Analysis**: Combinations of protected attributes

### Fairness Frameworks Applied:
- **Demographic Parity (80% Rule)**: Disparate impact between 0.8-1.25
- **Equal Opportunity**: Similar true positive rates across groups
- **Predictive Parity**: Similar precision across groups
- **Individual Fairness**: Similar predictions for similar individuals (listed for reference; not measured in this notebook)
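
A short worked example of the 80% rule, using hypothetical positive-prediction rates for three made-up groups. As in this notebook's disparate impact cells, the group with the highest rate is taken as the reference.

```python
import pandas as pd

# Hypothetical positive-prediction rates per group (not results from this model).
positive_rates = pd.Series({"A": 0.60, "B": 0.51, "C": 0.42})

# Reference group: the one with the highest positive rate.
reference = positive_rates.idxmax()

# Disparate impact ratio of every group relative to the reference.
di = positive_rates / positive_rates[reference]

# Groups inside the 0.8 to 1.25 band pass the rule.
passes = di.between(0.8, 1.25)

print(pd.DataFrame({"rate": positive_rates, "di": di.round(3), "passes": passes}))
```

With the highest-rate group as the reference, no ratio can exceed 1, so only the 0.8 lower bound can fail. The 1.25 upper bound becomes relevant only when a different reference group is chosen.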

### Next Steps:
1. Review the fairness analysis results
2. Identify groups with high disparities (>10%)
3. Implement appropriate mitigation strategies
4. Set up ongoing fairness monitoring
5. Document findings and create model cards
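
Step 2 can be sketched as a small filter over a table shaped like the per-attribute results, which carry `*_pct_diff` columns. The numbers below are hypothetical, and `flag_disparities` is an illustrative helper rather than a function defined in this notebook.

```python
import pandas as pd

# Hypothetical excerpt shaped like the per-attribute fairness tables.
metrics = pd.DataFrame({
    "value": ["F", "M", "O"],
    "accuracy_pct_diff": [1.2, -0.8, -12.5],
    "recall_pct_diff": [3.0, -6.1, 4.0],
})

def flag_disparities(df: pd.DataFrame, threshold_pct: float = 10.0) -> pd.DataFrame:
    """Return (group, metric) pairs whose absolute percent difference exceeds the threshold."""
    long = df.melt(id_vars="value", var_name="metric", value_name="pct_diff")
    return long[long["pct_diff"].abs() > threshold_pct].reset_index(drop=True)

high = flag_disparities(metrics, threshold_pct=10.0)
medium_or_worse = flag_disparities(metrics, threshold_pct=5.0)

print(high)
print(medium_or_worse)
```

The same helper could back a monitoring check by running it on fresh predictions at a lower threshold, such as the 5% level used for the MEDIUM risk band in the summary.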

### Key Questions to Consider:
- Are the observed disparities acceptable for your business context?
- Do these disparities reflect real differences in customer behavior or model bias?
- What are the legal and ethical implications of these disparities?
- How can you balance business objectives with fairness considerations?
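
One rough way to approach the second question is to compare each group's observed completion rate with its predicted positive rate. The sketch below uses hypothetical toy data. A gap near zero suggests the model mirrors observed behavior for that group, while a large gap suggests it over- or under-predicts. This is a heuristic starting point, not a test for bias.

```python
import pandas as pd

# Hypothetical toy outcomes and predictions for two groups.
toy = pd.DataFrame({
    "group":      ["A"] * 4 + ["B"] * 4,
    "target":     [1, 1, 1, 0, 1, 0, 0, 0],
    "prediction": [1, 1, 1, 0, 0, 0, 0, 0],
})

# Observed base rate vs. predicted positive rate, per group.
by_group = toy.groupby("group").agg(
    base_rate=("target", "mean"),
    predicted_rate=("prediction", "mean"),
)
by_group["gap"] = by_group["predicted_rate"] - by_group["base_rate"]

print(by_group)
```

In this toy table the groups genuinely differ in base rate, and the model additionally under-predicts group B. The first effect reflects behavior in the data; the second is attributable to the model.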

**Remember**: Fairness is context-dependent. What's "fair" depends on your specific use case, regulations, and values. Regular monitoring and iteration are essential.