# COS598I Spring 2024: Responsible AI in Societal Deployments
**Authors:** Lelia Marie Hampton, Prof. Marzyeh Ghassemi

Assignment adapted from MIT 6.882: Ethical Machine Learning in Human Deployments


## Assignment 2: Algorithmic Fairness Exploration

#### Guidelines

Add helper functions as needed. **Do NOT use outside libraries** such as AI Fairness 360 Toolkit or What-If Toolkit. It is important to learn the mechanics of the underlying approaches.

Please include comments (both block comments and inline comments) so that we can easily understand what you're doing.

#### Resources

[Fairness Definitions Explained](https://fairware.cs.umass.edu/papers/Verma.pdf)

[Taiwan Default of Credit Card Clients](https://archive.ics.uci.edu/dataset/350/default+of+credit+card+clients)

- [Paper](https://bradzzz.gitbooks.io/ga-dsi-seattle/content/dsi/dsi_05_classification_databases/2.1-lesson/assets/datasets/DefaultCreditCardClients_yeh_2009.pdf): see Section 3.1 for a description of the dataset features.

- [Dataset](https://archive.ics.uci.edu/static/public/350/default+of+credit+card+clients.zip): you can download the dataset from this URL or from the Ed Resources Tab.

[Machine Learning Glossary: Fairness](https://developers.google.com/machine-learning/glossary/fairness)

[Chapter 4. Fairness Pre-Processing](https://www.oreilly.com/library/view/practical-fairness/9781492075721/ch04.html)

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, LabelEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, RepeatedStratifiedKFold, train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

In [None]:
# preprocessing code for Taiwan Default dataset

def load_dataset(filename='default of credit card clients.xls'):
    # Load the dataset
    df = pd.read_excel(filename, header=1)

    # Define categorical and numerical features
    categorical_features = ['SEX', 'EDUCATION', 'MARRIAGE', 'PAY_0', 'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6']
    numerical_features = ['LIMIT_BAL', 'AGE', 'BILL_AMT1', 'BILL_AMT2', 'BILL_AMT3', 'BILL_AMT4', 'BILL_AMT5', 'BILL_AMT6',
                          'PAY_AMT1', 'PAY_AMT2', 'PAY_AMT3', 'PAY_AMT4', 'PAY_AMT5', 'PAY_AMT6']

    # Separate features and target variable
    X = df.drop(columns=['ID', 'default payment next month'])
    y = df['default payment next month']

    # Preprocessing: One-hot encoding for categorical variables and scaling for numerical variables
    preprocessor = ColumnTransformer(
        transformers=[
            ('num', StandardScaler(), numerical_features),
            ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)
        ])

    # Extract gender as a binary attribute before any transformations
    gender_binary = df['SEX'].apply(lambda x: 1 if x == 2 else 0)

    # Split the dataset into training and testing sets with a fixed random state for reproducibility
    X_train_full, X_test_full, y_train, y_test, gender_train, gender_test = train_test_split(X, y, gender_binary, test_size=0.2, random_state=42)

    # Apply preprocessing to training and testing set separately
    X_train_processed = preprocessor.fit_transform(X_train_full)
    X_test_processed = preprocessor.transform(X_test_full)

    # Return processed training and testing sets along with gender attributes
    return X_train_processed, X_test_processed, y_train, y_test, gender_train.values, gender_test.values

In [None]:
def train_and_predict_model(X_train, X_test, y_train, weights=None):

    # Initialize the Logistic Regression model
    model = LogisticRegression(max_iter=10000, random_state=0)

    # Train the Logistic Regression model
    model.fit(X_train, y_train, sample_weight=weights)

    # Predict on the testing set
    y_pred = model.predict(X_test)
    y_pred_proba = model.predict_proba(X_test)[:, 1]

    return y_pred, y_pred_proba

In [None]:
# preprocess and load the data
X_train, X_test, y_train, y_test, gender_train, gender_test = load_dataset()

# train a model and obtain predictions on the test set
y_pred, y_pred_proba = train_and_predict_model(X_train, X_test, y_train)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_pred_proba)

print('Accuracy:', accuracy, '\nAUC:', auc)

In [None]:
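def determine_confusion_matrix(row):
    """Label one row (with 'y_true' and 'y_pred' entries) as 'TP', 'FP', 'TN', or 'FN'."""
    if row['y_true'] == row['y_pred'] == 1:
        return 'TP'  # predicted positive, actually positive
    elif row['y_pred'] == 1 and row['y_true'] != row['y_pred']:
        return 'FP'  # predicted positive, actually negative
    elif row['y_true'] == row['y_pred'] == 0:
        return 'TN'  # predicted negative, actually negative
    else:
        return 'FN'  # predicted negative, actually positive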

In [None]:
# Female = 1 and Male = 0
fair_df = pd.DataFrame({'sex': gender_test, 'y_true': y_test, 'y_pred': y_pred})
fair_df['confusion_matrix'] = fair_df[['y_true','y_pred']].apply(determine_confusion_matrix, axis=1)

In [None]:
fair_df.head()

In [None]:
fair_df['sex'].value_counts()

## Q1) Detecting Algorithmic Bias through Fairness Measurements and Definitions
Fill in the code for the fairness definitions.

#### Statistical Parity (Demographic Parity) **5 points**
A classifier satisfies this definition if subjects in both protected and unprotected groups have equal probability of being assigned to the positive predicted class.

$$P(y_{pred} = 1|G = male) = P(y_{pred} = 1|G = female )$$

Recall the formula for conditional probability: $P(B|A)=\frac{P(A \cap B)}{P(A)}$
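
To make the conditional-probability formula concrete, here is a minimal sketch on a hand-made toy table. The data and the helper name `positive_rate` are hypothetical illustrations, not part of the assignment's dataset or required solution; only the `sex` encoding (Female = 1, Male = 0) follows the notebook.

```python
import pandas as pd

# Hypothetical toy predictions; sex follows the notebook's encoding (Female = 1, Male = 0)
toy = pd.DataFrame({
    'sex':    [1, 1, 1, 1, 0, 0, 0, 0],
    'y_pred': [1, 0, 0, 0, 1, 1, 0, 0],
})

def positive_rate(df, group):
    """Estimate P(y_pred = 1 | G = group) as |G = group and y_pred = 1| / |G = group|."""
    in_group = df['sex'] == group
    joint = (in_group & (df['y_pred'] == 1)).sum()
    return joint / in_group.sum()

toy_female_rate = positive_rate(toy, 1)  # 1 of the 4 females is predicted positive
toy_male_rate = positive_rate(toy, 0)    # 2 of the 4 males are predicted positive
print(toy_female_rate, toy_male_rate)
```

On real data the two rates will rarely match to the last decimal, so it can be informative to look at the size of the gap (or to compare with a tolerance such as `numpy.isclose`) in addition to a strict equality check.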

In [None]:
def statistical_parity(df):
    """
    TODO: Add your code here
    """

    print('Female Probability of Positive Predictions: %.3f' % female_positive_prob)
    print('Male Probability of Positive Predictions: %.3f' % male_positive_prob)
    print('Achieves Statistical Parity: %r' % (female_positive_prob == male_positive_prob))

In [None]:
statistical_parity(fair_df)

#### Predictive Value Parity **5 points**
Positive predictive value (PPV): the fraction of positive cases correctly predicted to be in the positive class out of all predicted positive cases.
$$\frac{TP}{TP+FP}$$
A classifier satisfies this definition if both protected and unprotected groups have equal PPV: the probability that a subject with a positive prediction truly belongs to the positive class.
$$ P(y_{true}=1|y_{pred} = 1,G = male) = P(y_{true} = 1|y_{pred} = 1,G = female ) $$
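
As an illustration of the PPV formula, the sketch below computes $\frac{TP}{TP+FP}$ per group on a small hypothetical table. The data and the helper name `ppv` are assumptions made for this example only.

```python
import pandas as pd

# Hypothetical toy labels and predictions (Female = 1, Male = 0)
toy = pd.DataFrame({
    'sex':    [1, 1, 1, 1, 0, 0, 0, 0],
    'y_true': [1, 0, 1, 0, 1, 1, 0, 0],
    'y_pred': [1, 1, 0, 0, 1, 1, 1, 0],
})

def ppv(df, group):
    """Positive predictive value TP / (TP + FP) restricted to one group."""
    predicted_pos = df[(df['sex'] == group) & (df['y_pred'] == 1)]
    tp = (predicted_pos['y_true'] == 1).sum()  # predicted positive and truly positive
    fp = (predicted_pos['y_true'] == 0).sum()  # predicted positive but truly negative
    return tp / (tp + fp)

toy_ppv_female = ppv(toy, 1)  # TP = 1, FP = 1
toy_ppv_male = ppv(toy, 0)    # TP = 2, FP = 1
print(toy_ppv_female, toy_ppv_male)
```

If a group has no positive predictions at all, the denominator is zero and the PPV is undefined, a case worth handling explicitly when working with real data.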

In [None]:
def predictive_parity(df):
    """
    TODO: Add your code here
    """

    print('Female Positive Predictive Value: %.3f' % PPV_female)
    print('Male Positive Predictive Value: %.3f' % PPV_male)
    print('Achieves Predictive Parity: %r' % (PPV_female == PPV_male))

In [None]:
predictive_parity(fair_df)

#### Equalized Odds (Error Rate Balance) **5 points**
**False Positive Error Rate Balance**: A classifier satisfies this definition if both protected and unprotected groups have an equal False Positive Rate (FPR): the probability that a subject in the negative class receives a positive prediction.
$$P(y_{pred} = 1|y_{true} = 0,G = male) = P(y_{pred} = 1|y_{true} = 0,G = female) $$
**False Negative Error Rate Balance (Equal Opportunity)**: A classifier satisfies this definition if both protected and unprotected groups have an equal False Negative Rate (FNR): the probability that a subject in the positive class receives a negative prediction.
$$P(y_{pred} = 0|y_{true} = 1,G = male) = P(y_{pred} = 0|y_{true} = 1,G = female)$$
The definition of equalized odds combines the previous two: a classifier satisfies the definition if protected and unprotected groups have equal True Positive Rate (TPR) and equal FPR. Because $TPR = 1 - FNR$, this is equivalent to the conjunction of the false positive error rate balance and false negative error rate balance conditions given above.
$$P(y_{pred} = 1|y_{true} = i,G = male) = P(y_{pred}=1|y_{true}=i,G=female),\quad i \in \{0,1\}$$
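
The two error rates above can be sketched on a toy table as follows. The data and the helper name `error_rates` are hypothetical and chosen only to illustrate the conditioning on $y_{true}$.

```python
import pandas as pd

# Hypothetical toy labels and predictions (Female = 1, Male = 0)
toy = pd.DataFrame({
    'sex':    [1, 1, 1, 1, 0, 0, 0, 0],
    'y_true': [1, 1, 0, 0, 1, 1, 0, 0],
    'y_pred': [1, 0, 1, 0, 1, 1, 0, 0],
})

def error_rates(df, group):
    """Return (FPR, FNR) for one group."""
    g = df[df['sex'] == group]
    positives = g[g['y_true'] == 1]  # condition on the true positive class
    negatives = g[g['y_true'] == 0]  # condition on the true negative class
    fnr = (positives['y_pred'] == 0).mean()  # P(y_pred = 0 | y_true = 1, G)
    fpr = (negatives['y_pred'] == 1).mean()  # P(y_pred = 1 | y_true = 0, G)
    return fpr, fnr

fpr_f, fnr_f = error_rates(toy, 1)
fpr_m, fnr_m = error_rates(toy, 0)
print('female:', fpr_f, fnr_f, 'TPR =', 1 - fnr_f)
print('male:  ', fpr_m, fnr_m, 'TPR =', 1 - fnr_m)
```

In this toy table the female group has both a higher FPR and a higher FNR than the male group, so neither error rate balance condition holds.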

In [None]:
def equalized_odds(df):
    """
    TODO: Add your code here
    """

    # In this dataset the positive class (y = 1) means the client defaults next month
    print('Female False Negative Rate (defaulters predicted not to default): %.3f' % fnr_female)
    print('Male False Negative Rate (defaulters predicted not to default): %.3f' % fnr_male)
    print('Achieves False Negative Error Rate Balance: %r' % (fnr_female == fnr_male))
    print('Female False Positive Rate (non-defaulters predicted to default): %.3f' % fpr_female)
    print('Male False Positive Rate (non-defaulters predicted to default): %.3f' % fpr_male)
    print('Achieves False Positive Error Rate Balance: %r' % (fpr_female == fpr_male))

In [None]:
equalized_odds(fair_df)

#### Accuracy Equality **5 points**

A classifier satisfies this definition if both protected and unprotected groups have equal prediction accuracy – the probability of a subject from either positive or negative class to be assigned to its respective class. The definition assumes that true negatives are as desirable as true positives.
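
Written in the same notation as the other definitions (this restatement is added as a reading aid and is not taken verbatim from the referenced paper), the condition is:

$$P(y_{pred} = y_{true}|G = male) = P(y_{pred} = y_{true}|G = female)$$

In confusion-matrix terms, each side can be estimated within a group as:

$$\frac{TP+TN}{TP+FP+TN+FN}$$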

In [None]:
def accuracy_equality(df):
    """
    TODO: Add your code here
    """
    print('Female Accuracy: %.3f' % accuracy_female)
    print('Male Accuracy: %.3f' % accuracy_male)
    print('Equality of Accuracy: %r' % (accuracy_female == accuracy_male))

In [None]:
accuracy_equality(fair_df)

#### Treatment Equality **5 points**
This definition looks at the ratio of errors that the classifier makes rather than at its accuracy. A classifier satisfies this definition if both protected and unprotected groups have an equal ratio of false negatives to false positives.

$$\frac{FN_{male}}{FP_{male}}=\frac{FN_{female}}{FP_{female}}$$
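
A minimal sketch of the ratio above, assuming each row already carries a confusion-matrix label like the `confusion_matrix` column built earlier in this notebook. The toy data and the helper name `error_ratio` are hypothetical.

```python
import pandas as pd

# Hypothetical toy table with one confusion-matrix label per row (Female = 1, Male = 0)
toy = pd.DataFrame({
    'sex':              [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    'confusion_matrix': ['FN', 'FN', 'FP', 'TP', 'TN', 'FN', 'FP', 'FP', 'TP', 'TN'],
})

def error_ratio(df, group):
    """Return FN / FP for one group, or NaN when the group has no false positives."""
    counts = df.loc[df['sex'] == group, 'confusion_matrix'].value_counts()
    fn = counts.get('FN', 0)  # .get avoids a KeyError if a label never occurs
    fp = counts.get('FP', 0)
    return fn / fp if fp > 0 else float('nan')

toy_ratio_female = error_ratio(toy, 1)  # FN = 2, FP = 1
toy_ratio_male = error_ratio(toy, 0)    # FN = 1, FP = 2
print(toy_ratio_female, toy_ratio_male)
```

Returning NaN for a zero denominator is one possible convention; raising an error or reporting the raw counts would be equally defensible.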

In [None]:
def treatment_equality(df):
    """
    TODO: Add your code here
    """

    print('Female Ratio of Errors: %.3f' % ratio_female)
    print('Male Ratio of Errors: %.3f' % ratio_male)
    print('Achieves Treatment Equality: %r' % (ratio_female == ratio_male))

In [None]:
treatment_equality(fair_df)

## Q3 Mitigating Algorithmic Bias through Pre-Processing and Post-Processing
Satisfying algorithmic fairness definitions can be done through a variety of methods, including pre-, in-, and post-processing. In this homework, we will explore pre-processing and post-processing as approaches to mitigate algorithmic bias.

### Pre-Processing
We will start with pre-processing methods. Pre-processing methods alter the dataset in order to satisfy one or more fairness metrics.

#### Fairness through Unawareness **5 points**
Fairness through unawareness is an approach to fairness mitigation in which protected/sensitive attributes are not used to train the model. Run the model without the feature for **sex**.

Source: [Fairness Through Unawareness](https://ocw.mit.edu/resources/res-ec-001-exploring-fairness-in-machine-learning-for-international-development-spring-2020/module-three-framework/fairness-criteria/MITRES_EC001S19_video6.pdf)
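
The mechanics of withholding a sensitive column can be sketched on a tiny hypothetical frame. The values below are made up; only the column names and the `SEX` coding (1 = male, 2 = female) mirror the dataset. The point of the sketch is that the attribute is set aside for auditing and then removed both from the feature table and from any list of column names handed to a preprocessor.

```python
import pandas as pd

# Hypothetical rows using the dataset's column names (SEX: 1 = male, 2 = female)
toy = pd.DataFrame({
    'SEX':       [1, 2, 2, 1],
    'EDUCATION': [2, 1, 2, 3],
    'AGE':       [30, 41, 25, 52],
})

# Set the sensitive attribute aside for the fairness audit (Female = 1, Male = 0)
sensitive = (toy['SEX'] == 2).astype(int)

# Remove it from the model inputs
features = toy.drop(columns=['SEX'])

# Column-name lists used for preprocessing should exclude it as well
categorical = [c for c in ['SEX', 'EDUCATION'] if c in features.columns]
print(list(features.columns), categorical, list(sensitive))
```

A column name left in such a list but missing from the frame would be expected to make a `ColumnTransformer` fail at fit time, which is why both places are updated together here.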

In [None]:
# preprocessing code for Taiwan Default dataset

def load_dataset_unawareness(filename='default of credit card clients.xls'):
    # Load the dataset
    df = pd.read_excel(filename, header=1)

    '''
    TODO: Add your code here to drop the 'SEX' feature, then split and preprocess as in load_dataset
    '''

    # Return processed training and testing sets along with gender attributes
    return X_train_processed, X_test_processed, y_train, y_test, gender_train.values, gender_test.values

# preprocess and load the data
X_train, X_test, y_train, y_test, gender_train, gender_test = load_dataset_unawareness()

# train a model and obtain predictions on the test set
y_pred, y_pred_proba = train_and_predict_model(X_train, X_test, y_train)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_pred_proba)

print(accuracy, auc)

# Female = 1 and Male = 0
fair_df = pd.DataFrame({'sex': gender_test, 'y_true': y_test, 'y_pred': y_pred})
fair_df['confusion_matrix'] = fair_df[['y_true','y_pred']].apply(determine_confusion_matrix, axis=1)

Use the fairness measurement functions you created to measure the fairness of this method.

In [None]:
statistical_parity(fair_df)
predictive_parity(fair_df)
equalized_odds(fair_df)
accuracy_equality(fair_df)
treatment_equality(fair_df)

#### Reweighing **5 points**
Reweighing assigns weights for each tuple in the dataset. For those with marginalized protected attributes, positive outcomes will receive greater weights than negative outcomes. For those with non-marginalized sensitive attributes, negative outcomes will receive greater weights than positive outcomes. We assume that we want to remove all discrimination while maintaining the overall positive class probability.

For $s \in \{m,f\}$ and $c \in \{0,1\}$, we define the 'unbiased' weights as:
$$W(s,c)=\frac{|\{X \in D : X(S)=s\}| \times |\{X \in D : X(C)=c\}|}{|D| \times |\{X \in D : X(S)=s \land X(C)=c\}|}$$
where $W$ is the reweighing function, $D$ is the dataset, $X \in D$ is a data point, $X(S)$ is its value of the sensitive attribute of interest, and $X(C)$ is its class label.

In our case, we should have 4 different weights.

Source: [Data preprocessing techniques for classification without discrimination](https://core.ac.uk/download/pdf/81728147.pdf) - See Algorithm 3

**Reweigh the dataset and then create a model for the new dataset.**
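
To see what the weight formula does, here is a minimal sketch on a ten-row hypothetical table. The data and the helper name `reweighing_weight` are illustrative assumptions, not the assignment's solution scaffold. The final check shows the intended effect: after weighting, both groups have the same positive rate as the dataset overall.

```python
import pandas as pd

# Hypothetical toy training labels (Female = 1, Male = 0)
toy = pd.DataFrame({
    'sex':    [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    'y_true': [1, 1, 1, 0, 1, 0, 0, 0, 0, 0],
})

def reweighing_weight(df, s, c):
    """W(s, c) = (|S = s| * |C = c|) / (|D| * |S = s and C = c|)."""
    n = len(df)
    n_s = (df['sex'] == s).sum()
    n_c = (df['y_true'] == c).sum()
    n_sc = ((df['sex'] == s) & (df['y_true'] == c)).sum()
    return (n_s * n_c) / (n * n_sc)

# One weight per (group, class) combination: four in total
weights = {(s, c): reweighing_weight(toy, s, c) for s in (0, 1) for c in (0, 1)}

# Attach the matching weight to every row
toy['weight'] = [weights[(s, c)] for s, c in zip(toy['sex'], toy['y_true'])]

def weighted_positive_rate(df, group):
    """Weighted share of positive labels within one group."""
    g = df[df['sex'] == group]
    return (g['weight'] * g['y_true']).sum() / g['weight'].sum()

print(weights)
print(weighted_positive_rate(toy, 0), weighted_positive_rate(toy, 1))
```

Combinations that are rarer than independence of group and class would predict receive weights above 1, and over-represented combinations receive weights below 1. The weights sum to the number of rows, so the overall positive class probability is preserved.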

In [None]:
def calculate_weights(df):
    # group sizes, computed from the DataFrame passed in (Female = 1 and Male = 0)
    # add other variables if desired
    dataset_size = len(df)
    male_size = (df['sex'] == 0).sum()
    female_size = (df['sex'] == 1).sum()


    """TODO: add your code here"""
    # To determine the negative variables (i.e., y = 0), simply subtract
    # the following variables from male_size or female_size
    male_1_size =
    female_1_size =
    male_0_size = male_size - male_1_size
    female_0_size = female_size - female_1_size

    # reweighting scheme for each scenario
    w_male_0 =
    w_male_1 =
    w_female_0 =
    w_female_1 =

    return w_male_0, w_male_1, w_female_0, w_female_1

In [None]:
def reweigh_df(df):
    """
    Multiply each reweighting scheme by the appropriate subset of the dataset
    """
    # Get the reweighting scheme
    w_male_0, w_male_1, w_female_0, w_female_1 = calculate_weights(df)

    '''
    TODO: add code here
    '''

    df['weight'] =

    return df

In [None]:
# Female = 1 and Male = 0
X_train, X_test, y_train, y_test, gender_train, gender_test = load_dataset()
df_weight = reweigh_df(pd.DataFrame({'sex': gender_train, 'y_true': y_train}))
print(df_weight['weight'].value_counts())
y_pred, y_pred_proba = train_and_predict_model(X_train, X_test, y_train, df_weight['weight'])
accuracy = accuracy_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_pred_proba)

print(accuracy, auc)

fair_df = pd.DataFrame({'sex': gender_test, 'y_true': y_test, 'y_pred': y_pred})
fair_df['confusion_matrix'] = fair_df[['y_true','y_pred']].apply(determine_confusion_matrix, axis=1)

Use the fairness measurement functions you created to measure the fairness of this method.

In [None]:
statistical_parity(fair_df)
predictive_parity(fair_df)
equalized_odds(fair_df)
accuracy_equality(fair_df)
treatment_equality(fair_df)

#### Relabeling **5 points**
Another approach to pre-processing is relabeling. Relabeling changes the labels of some objects in the dataset in order to attempt to remove the discrimination from the input data.

One way to relabel the data is to identify likely unfair/discriminatory decisions and correct these by changing the outcome to what ought to have happened. The other way is to change the labeled sensitive class rather than the outcome, and this can be done either randomly or in a systematic way to correct discrimination.

Let's keep it simple. Randomly reassign the sensitive attribute **sex** of each individual data point and create a model for this new dataset.
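
Two simple ways to randomize a column are sketched below on a hypothetical toy frame. The seed, the new column names, and the choice between the two options are assumptions made for illustration; the assignment does not prescribe one.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column using the dataset's coding (1 = male, 2 = female)
toy = pd.DataFrame({'SEX': [1, 1, 1, 2, 2, 2, 2, 2]})

rng = np.random.default_rng(0)  # arbitrary seed so the sketch is reproducible

# Option A: shuffle the existing values, which keeps the group proportions unchanged
toy['SEX_shuffled'] = rng.permutation(toy['SEX'].to_numpy())

# Option B: draw every value independently, so group proportions may change
toy['SEX_random'] = rng.choice([1, 2], size=len(toy))

print(toy)
```

Either way, it may help to keep a copy of the original values before overwriting them, so that the fairness measurements can still be reported against each person's actual group.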

In [None]:
# preprocessing code for Taiwan Default dataset

def load_dataset_relabel(filename='default of credit card clients.xls'):
    # Load the dataset
    df = pd.read_excel(filename, header=1)

    '''
    TODO: Add code here to randomly reassign 'sex' of each individual datapoint
    '''

    # Return processed training and testing sets along with gender attributes
    return X_train_processed, X_test_processed, y_train, y_test, gender_train.values, gender_test.values

# preprocess and load the data
X_train, X_test, y_train, y_test, gender_train, gender_test = load_dataset_relabel()

# train a model and obtain predictions on the test set
y_pred, y_pred_proba = train_and_predict_model(X_train, X_test, y_train)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_pred_proba)

print(accuracy, auc)

# Female = 1 and Male = 0
fair_df = pd.DataFrame({'sex': gender_test, 'y_true': y_test, 'y_pred': y_pred})
fair_df['confusion_matrix'] = fair_df[['y_true','y_pred']].apply(determine_confusion_matrix, axis=1)

Use the fairness measurement functions you created to measure the fairness of this method.

In [None]:
statistical_parity(fair_df)
predictive_parity(fair_df)
equalized_odds(fair_df)
accuracy_equality(fair_df)
treatment_equality(fair_df)