# Bias and Fairness Assessment (Binary Classification: Adult Income)

Assessing machine learning models for bias and fairness matters for several reasons:

- Prevent Discrimination
  - Avoid unfair treatment based on protected attributes.

- Meet Legal Standards
  - Ensure compliance with laws and anti-discrimination acts.

- Build Trust
  - Fair models are more accepted by users, stakeholders, and regulators.

- Expose Hidden Gaps
  - Surface performance differences across demographic subgroups.

- Promote Ethical AI
  - Prevent reinforcement of societal or historical biases in data.

- Enable Accountability
  - Make models more transparent and open to external review.

- Guide Fairness Fixes
  - Identify where to apply debiasing or fairness-enhancing techniques.

## Dataset Overview: UCI Adult Income Dataset
The **Adult Income dataset** (also known as the **Census Income** dataset) originates from the **UCI Machine Learning Repository**. It was extracted from the 1994 U.S. Census database and is widely used for benchmarking classification models, especially in fairness and bias research.

The task is to **predict whether an individual earns more than $50K per year** based on features such as age, education, occupation, and marital status.

- Target variable: income (binary: <=50K or >50K)

- Samples: 48,842

- Features: 14 demographic and employment-related attributes

- Use case: Benchmarking algorithms, fairness audits, and bias mitigation

Due to its inclusion of sensitive attributes (e.g., sex, race), it’s commonly used in studies evaluating algorithmic fairness and disparate impact.



# Modeling

In this notebook, we’ll train an XGBoost model to predict whether an individual’s annual income exceeds \$50K and then evaluate its performance and fairness across different demographic groups.

### Step 1: Install and import dependencies


In [None]:
! pip install equiboots

In [None]:
! pip install ucimlrepo

In [None]:
from ucimlrepo import fetch_ucirepo
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from sklearn.metrics import classification_report

In [None]:
# fetch dataset
adult = fetch_ucirepo(id=2)
adult = adult.data.features.join(adult.data.targets, how="inner")

In [None]:
adult.head(3)

## Basic Preprocessing Steps

### 1. Drop missing values

In [None]:
# Drop missing values
adult.dropna(inplace=True)

### 2. Copy the DataFrame to preserve the original

In [None]:
df = adult.copy()

In [None]:
adult["income"].value_counts()

### 3. Encode the target variable and inspect base rates by group

In [None]:
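def outcome_merge(val):
    # Labels appear both as "<=50K"/">50K" and with a trailing period ("<=50K."/">50K.");
    # map both "<=50K" spellings to 0 and everything else to 1
    if val == "<=50K" or val == "<=50K.":
        return 0
    else:
        return 1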

In [None]:
df["income"] = df["income"].apply(outcome_merge)
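The `income` column holds two spellings per class: `<=50K`/`>50K`, plus the same labels with a trailing period (these appear to come from the test file of the original UCI release). `outcome_merge` maps anything that is not a `<=50K` spelling to 1, so an unexpected label would be silently counted as high income. The sketch below is a slightly more defensive alternative. It uses a hypothetical helper, `encode_income`, that strips the period and raises an error on unknown labels.

```python
import pandas as pd


def encode_income(labels: pd.Series) -> pd.Series:
    """Hypothetical helper: map income labels to 0 (<=50K) and 1 (>50K).

    A trailing period is ignored. Any label outside the two expected classes
    becomes NaN after mapping and triggers a ValueError.
    """
    cleaned = labels.astype(str).str.strip().str.rstrip(".")
    encoded = cleaned.map({"<=50K": 0, ">50K": 1})
    if encoded.isna().any():
        unexpected = sorted(cleaned[encoded.isna()].unique())
        raise ValueError(f"Unexpected income labels: {unexpected}")
    return encoded.astype(int)


# The four spellings found in the merged dataset
raw = pd.Series(["<=50K", ">50K", "<=50K.", ">50K."])
print(encode_income(raw).tolist())  # [0, 1, 0, 1]
```

In the notebook this would be used as `df["income"] = encode_income(df["income"])`.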

In [None]:
# Count of individuals and percentage earning >50K, by sex

income_by_sex = df.groupby("sex")["income"].agg(
    ["count", lambda x: (x.sum() / x.count()) * 100]
)
income_by_sex.columns = ["count", "percentage_above_50k"]
income_by_sex

In [None]:
# Count of individuals and percentage earning >50K, by race

income_by_race = df.groupby("race")["income"].agg(
    ["count", lambda x: (x.sum() / x.count()) * 100]
)
income_by_race.columns = ["count", "percentage_above_50k"]
income_by_race
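The percentages above are each group's base rate of the positive class. A common way to summarize the disparate impact mentioned in the dataset overview is to divide each group's positive rate by a reference group's rate. Ratios below 0.8 are often flagged under the "four-fifths rule".

Below is a minimal sketch on toy data, using a hypothetical helper name. Applied to `df`, it would describe imbalance in the labels themselves. The same calculation can be run on model predictions instead of `income`.

```python
import pandas as pd


def disparate_impact(df: pd.DataFrame, group_col: str, outcome_col: str,
                     reference: str) -> pd.Series:
    """Hypothetical helper: each group's positive rate divided by the reference group's rate."""
    rates = df.groupby(group_col)[outcome_col].mean()  # mean of a 0/1 column = positive rate
    return rates / rates[reference]


toy = pd.DataFrame({
    "sex": ["Male"] * 4 + ["Female"] * 4,
    "income": [1, 1, 0, 0, 1, 0, 0, 0],
})
ratios = disparate_impact(toy, "sex", "income", reference="Male")
print(ratios.round(2).to_dict())  # {'Female': 0.5, 'Male': 1.0}
print((ratios < 0.8).to_dict())   # {'Female': True, 'Male': False}
```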

### 4. Split the data

In [None]:
# Separate the features from the target
X = df.drop("income", axis=1)
y = df["income"]

In [None]:
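for col in X.columns:
    # Convert only non-numeric (string) columns to the category dtype,
    # so XGBoost can consume them with enable_categorical=True
    if not pd.api.types.is_numeric_dtype(X[col]):
        X[col] = X[col].astype("category")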
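The test `isinstance(X[col], object)` is true for every column, because every Python value (including a pandas Series) is an instance of `object`. It therefore also converts numeric columns such as `age` to categories. Checking the column's dtype selects only the non-numeric columns. A short illustration on a toy frame:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

toy = pd.DataFrame({
    "age": [25, 38, 52],
    "workclass": ["Private", "State-gov", "Private"],
})

# isinstance(..., object) is True for every column, numeric or not
print([isinstance(toy[c], object) for c in toy.columns])  # [True, True]

# Testing the dtype instead selects only the non-numeric (string) columns
non_numeric = [c for c in toy.columns if not is_numeric_dtype(toy[c])]
print(non_numeric)  # ['workclass']

toy[non_numeric] = toy[non_numeric].astype("category")
print(isinstance(toy["workclass"].dtype, pd.CategoricalDtype))  # True
print(is_numeric_dtype(toy["age"]))  # True
```

Using `is_numeric_dtype` rather than comparing against `"object"` also covers pandas versions where strings are stored in a dedicated string dtype.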

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42,
)

In [None]:
y_train.value_counts()

## Train XGBoost Model

In [None]:
model = XGBClassifier(eval_metric="logloss", random_state=42, enable_categorical=True)
model.fit(X_train, y_train)

## Evaluate XGBoost Model

In [None]:
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)
print(classification_report(y_test, y_pred))
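The per-class precision, recall, and F1 in the report are derived from confusion-matrix counts. The stdlib-only sketch below computes them for the positive class (label 1, income >50K) on toy labels. `binary_report` is a hypothetical name; it is not part of scikit-learn.

```python
def binary_report(y_true, y_pred, positive=1):
    """Hypothetical helper: precision, recall, F1 and accuracy for one class, from counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


toy_true = [0, 0, 1, 1, 1, 0]
toy_pred = [0, 1, 1, 0, 1, 0]
print(binary_report(toy_true, toy_pred))
```

Here there are 2 true positives, 1 false positive and 1 false negative, so precision, recall, F1 and accuracy all equal 2/3.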

# Bias and Fairness Analysis with EquiBoots

**EquiBoots supports fairness analysis both as point estimates at a model's operating point (e.g., an optimal decision threshold) and across many bootstrap resamples drawn with replacement.**
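A point estimate says nothing about sampling noise, and that noise matters most for small subgroups. A bootstrap resamples the test rows with replacement many times and recomputes the metric on each resample. The spread of those values then gives a confidence interval.

The sketch below computes a percentile interval for accuracy with numpy. The function name and defaults are illustrative assumptions, not EquiBoots' API.

```python
import numpy as np


def bootstrap_metric_ci(y_true, y_pred, metric, n_boot=1000, level=0.95, seed=42):
    """Illustrative helper: percentile bootstrap CI, resampling rows with replacement."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # n row indices drawn with replacement
        stats[b] = metric(y_true[idx], y_pred[idx])
    tail = (1 - level) / 2 * 100
    lower, upper = np.percentile(stats, [tail, 100 - tail])
    return metric(y_true, y_pred), lower, upper


def accuracy(t, p):
    return float(np.mean(t == p))


rng = np.random.default_rng(0)
toy_true = rng.integers(0, 2, size=200)
# Toy predictions that agree with the truth roughly 80% of the time
toy_pred = np.where(rng.random(200) < 0.8, toy_true, 1 - toy_true)
point, lo, hi = bootstrap_metric_ci(toy_true, toy_pred, accuracy)
print(f"accuracy={point:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

Running the same loop separately within each subgroup yields per-group intervals. These widen noticeably for groups with few rows.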


To initialize an analysis with equiboots:

1. Define a fairness DataFrame containing the variables of interest.
2. Initialize an EquiBoots object using:
    - Ground truth (y_true)
    - Model probabilities (y_prob)
    - Model predictions (y_pred)
3. Identify the columns/variables to assess (e.g., race, sex).
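Conceptually, grouping splits the three arrays by the value of a sensitive attribute, so that metrics can be computed per subgroup. The sketch below is a hypothetical stand-in that illustrates the idea. It is not how EquiBoots implements `grouper` or `slicer`.

The sketch relies on the fairness DataFrame being positionally aligned with the arrays, which is why the notebook calls `reset_index(drop=True)`.

```python
import numpy as np
import pandas as pd


def slice_by_group(fairness_df, column, y_true, y_pred, y_prob):
    """Hypothetical helper: {group: {"y_true", "y_pred", "y_prob"}} for one sensitive column."""
    y_true, y_pred, y_prob = map(np.asarray, (y_true, y_pred, y_prob))
    slices = {}
    # .indices maps each group value to the integer positions of its rows
    for group, positions in fairness_df.groupby(column).indices.items():
        slices[group] = {
            "y_true": y_true[positions],
            "y_pred": y_pred[positions],
            "y_prob": y_prob[positions],
        }
    return slices


toy_df = pd.DataFrame({"sex": ["Male", "Female", "Male", "Female"]})
slices = slice_by_group(toy_df, "sex", [1, 0, 0, 1], [1, 0, 1, 1], [0.9, 0.2, 0.6, 0.7])
print(slices["Female"]["y_true"])  # [0 1]
```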

In [None]:
import equiboots as eqb

In [None]:
# get predictions and true values
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:,1]
y_test = y_test.to_numpy()

# Cast the sensitive columns from category to plain strings for grouping
X_test[['race', 'sex']] = X_test[['race', 'sex']].astype(str)
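`predict_proba` returns one column per class in the order of `model.classes_`, so `[:, 1]` is the probability of label 1 (>50K). For a binary `XGBClassifier`, `predict` effectively applies a 0.5 cutoff to this probability. A different operating point can be chosen by thresholding manually.

A numpy sketch with made-up probabilities follows. `apply_threshold` is a hypothetical helper, and its `>=` comparison is a convention choice.

```python
import numpy as np

# Made-up positive-class probabilities, shaped like predict_proba(X)[:, 1]
probs = np.array([0.10, 0.45, 0.55, 0.80, 0.30])


def apply_threshold(p, threshold=0.5):
    """Hypothetical helper: turn positive-class probabilities into 0/1 predictions."""
    return (np.asarray(p) >= threshold).astype(int)


print(apply_threshold(probs))       # [0 0 1 1 0]
print(apply_threshold(probs, 0.4))  # [0 1 1 1 0]
```

Lowering the threshold trades precision for recall, and it can shift fairness metrics differently for each group.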

## Point Estimates

In [None]:
sensitive_features = ['race', 'sex']

fairness_df = X_test[sensitive_features].reset_index(drop=True)

eq = eqb.EquiBoots(y_true=y_test, y_pred=y_pred, y_prob=y_prob, fairness_df=fairness_df, fairness_vars=sensitive_features)

eq.grouper(groupings_vars=sensitive_features)

In [None]:
sliced_race_data = eq.slicer("race")
sliced_sex_data = eq.slicer("sex")

race_metrics = eq.get_metrics(sliced_race_data)
sex_metrics = eq.get_metrics(sliced_sex_data)
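Per-group metrics become fairness measures once they are compared across groups. Two common comparisons are:

- the gap in true-positive rates between groups (often called the equal-opportunity difference);
- the larger of the TPR gap and the false-positive-rate gap (the equalized-odds difference).

The sketch below uses toy groups and hypothetical names. It does not reproduce the output format of `get_metrics`.

```python
import numpy as np


def tpr_fpr(y_true, y_pred):
    """Hypothetical helper: true-positive rate and false-positive rate for one group."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tpr = np.mean(y_pred[y_true == 1] == 1)  # NaN if the group has no positives
    fpr = np.mean(y_pred[y_true == 0] == 1)  # NaN if the group has no negatives
    return float(tpr), float(fpr)


toy_groups = {
    "A": ([1, 1, 0, 0], [1, 1, 1, 0]),
    "B": ([1, 1, 0, 0], [1, 0, 0, 0]),
}
per_group = {g: tpr_fpr(t, p) for g, (t, p) in toy_groups.items()}
tprs = [tpr for tpr, _ in per_group.values()]
fprs = [fpr for _, fpr in per_group.values()]
equal_opportunity_gap = max(tprs) - min(tprs)
equalized_odds_gap = max(equal_opportunity_gap, max(fprs) - min(fprs))
print(per_group)  # {'A': (1.0, 0.5), 'B': (0.5, 0.0)}
print(equal_opportunity_gap, equalized_odds_gap)  # 0.5 0.5
```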

In [None]:
eqb.eq_plot_group_curves(
    sliced_race_data,
    curve_type="roc",
    title="ROC AUC by Race Group",
    exclude_groups=['Other', 'Amer-Indian-Eskimo']
)
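Each curve's area equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting half. The pairwise sketch below uses a hypothetical helper name. It is handy for sanity-checking a single group's AUC on a small slice, but it is O(n_pos × n_neg), so it suits small inputs only.

```python
def pairwise_auc(y_true, y_prob):
    """Hypothetical helper: ROC AUC as P(score_pos > score_neg), ties counted as 0.5."""
    pos = [s for t, s in zip(y_true, y_prob) if t == 1]
    neg = [s for t, s in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


print(pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A group containing only one class has no defined AUC. This is one reason very small subgroups are often left out of per-group ROC plots.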

In [None]:
test_config = {
    "test_type": "chi_square",
    "alpha": 0.05,
    "adjust_method": "bonferroni",
    "confidence_level": 0.95,
    "classification_task": "binary_classification"
}
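As a rough illustration of what a chi-square test with Bonferroni adjustment does (not EquiBoots' internal implementation), the sketch below does two things.

- It compares two groups using a 2×2 table of counts and Pearson's statistic, without continuity correction. With one degree of freedom, the p-value is `erfc(sqrt(chi2 / 2))`.
- It applies Bonferroni, which multiplies each p-value by the number of comparisons (capped at 1). This is equivalent to testing each one at `alpha / m`.

The helper names and counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Hypothetical helper: Pearson chi-square for [[a, b], [c, d]]; returns (chi2, p)."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # survival function of chi-square with 1 df
    return chi2, p_value


def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Toy counts: rows are two groups, columns are (correct, incorrect) predictions
chi2, p = chi_square_2x2(90, 10, 70, 30)
adjusted = bonferroni([p, 0.03, 0.20])
print(round(chi2, 3), p, adjusted)
```

Here the expected counts are 80/20 in each row, so the statistic is 12.5. Its adjusted p-value stays well below `alpha = 0.05`, while `0.03` would no longer be significant once adjusted for three comparisons.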

In [None]:
stat_test_results_race = eq.analyze_statistical_significance(race_metrics, "race", test_config)

In [None]:
stat_test_results_sex = eq.analyze_statistical_significance(sex_metrics, "sex", test_config)

In [None]:
stat_test_results_sex

In [None]:
# Drop the 'Other' race group from the metrics before plotting
race_metrics.pop('Other')
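The notebook does not say why `'Other'` is removed. If the reason is small group size (an assumption here), a size-based filter makes the exclusion rule explicit and repeatable.

The sketch below is a hypothetical helper with an arbitrary `min_size` default. It runs on a toy frame and a dict of made-up values keyed by group name.

```python
import pandas as pd


def groups_to_keep(fairness_df: pd.DataFrame, column: str, min_size: int = 100) -> list:
    """Hypothetical helper: groups in `column` with at least `min_size` rows."""
    counts = fairness_df[column].value_counts()
    return sorted(counts[counts >= min_size].index)


toy = pd.DataFrame({"race": ["White"] * 150 + ["Black"] * 120 + ["Other"] * 20})
keep = groups_to_keep(toy, "race")
print(keep)  # ['Black', 'White']

# Filter any dict keyed by group name (made-up values here) to the retained groups
toy_metrics = {"White": 0.86, "Black": 0.90, "Other": 0.95}
filtered = {g: m for g, m in toy_metrics.items() if g in keep}
print(filtered)  # {'White': 0.86, 'Black': 0.9}
```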

In [None]:
eqb.eq_plot_metrics_forest(
    group_metrics=race_metrics,
    metric_name="Accuracy",
    title="Forest Plot: Accuracy Across Groups",
    reference_group="White",
    statistical_tests=stat_test_results_race,
)
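A forest plot places each group's estimate and interval on a common axis, usually with a reference line at the reference group's value. The data behind such a plot can be summarized as each group's difference from the reference group.

In the sketch below, the input format `{group: (point, lower, upper)}`, the helper name and the numbers are all made up for illustration. Checking whether an interval contains the reference point is only a visual heuristic, not a substitute for the statistical tests passed through `statistical_tests`.

```python
def differences_from_reference(estimates, reference):
    """Hypothetical helper: difference from the reference point estimate for each
    non-reference group, and whether the group's interval excludes that point."""
    ref_point = estimates[reference][0]
    rows = {}
    for group, (point, lower, upper) in estimates.items():
        if group == reference:
            continue
        rows[group] = {
            "diff": point - ref_point,
            "excludes_reference": not (lower <= ref_point <= upper),
        }
    return rows


toy = {
    "White": (0.86, 0.85, 0.87),
    "Black": (0.92, 0.90, 0.94),
    "Asian-Pac-Islander": (0.85, 0.81, 0.89),
}
summary = differences_from_reference(toy, "White")
for group, row in summary.items():
    print(f"{group:>20}: diff={row['diff']:+.2f}, excludes reference={row['excludes_reference']}")
```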