# Credit Scoring in FHE

In this notebook, we build and evaluate a model that predicts the probability that a loan applicant defaults on repayment, while keeping the applicant's data private using Fully Homomorphic Encryption (FHE). It is closely based on an [existing notebook](https://www.kaggle.com/code/ajay1735/my-credit-scoring-model) found on Kaggle, which uses the [Home Equity (HMEQ) dataset](https://www.kaggle.com/code/ajay1735/my-credit-scoring-model/input). In addition, we compare the performance of the original scikit-learn models with that of their Concrete ML equivalents.

### Import libraries

In [1]:
import time

import pandas as pd
from sklearn.ensemble import RandomForestClassifier as SklearnRandomForestClassifier
from sklearn.linear_model import LogisticRegression as SklearnLogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Import models from scikit-learn and XGBoost
from sklearn.tree import DecisionTreeClassifier as SklearnDecisionTreeClassifier
from xgboost import XGBClassifier as SklearnXGBoostClassifier

# Import models from Concrete ML
from concrete.ml.sklearn import DecisionTreeClassifier as ConcreteDecisionTreeClassifier
from concrete.ml.sklearn import LogisticRegression as ConcreteLogisticRegression
from concrete.ml.sklearn import RandomForestClassifier as ConcreteRandomForestClassifier
from concrete.ml.sklearn import XGBClassifier as ConcreteXGBoostClassifier

CONCRETE_ML_MODELS = [
    ConcreteDecisionTreeClassifier,
    ConcreteLogisticRegression,
    ConcreteRandomForestClassifier,
    ConcreteXGBoostClassifier,
]

### Load the HMEQ dataset

In [2]:
# Reading the dataset
df = pd.read_csv("hmeq.csv")

### Clean the dataset
Further details about dataset cleaning can be found in the [original notebook](https://www.kaggle.com/code/ajay1735/my-credit-scoring-model).

In [3]:
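# Replace missing values. Assigning the result back avoids calling fillna(inplace=True) on a
# column selection, which recent pandas versions warn about and which may not update df
df["REASON"] = df["REASON"].fillna("DebtCon")
df["JOB"] = df["JOB"].fillna("Other")
df["DEROG"] = df["DEROG"].fillna(0)
df["DELINQ"] = df["DELINQ"].fillna(0)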

df.fillna(value=df.mean(numeric_only=True), inplace=True)
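The imputation strategy above can be checked on a small example. The following is a minimal sketch on a toy DataFrame with made-up values (only the column names come from HMEQ): categorical columns receive a constant, count columns receive 0, and the remaining numeric columns receive their mean.

```python
import pandas as pd

# Toy frame with hypothetical values; only the column names follow the HMEQ dataset
toy = pd.DataFrame(
    {
        "REASON": ["HomeImp", None, "DebtCon"],
        "DEROG": [0.0, None, 2.0],
        "YOJ": [10.0, None, 4.0],
    }
)

# Constant for the categorical column, 0 for the count column
toy["REASON"] = toy["REASON"].fillna("DebtCon")
toy["DEROG"] = toy["DEROG"].fillna(0)

# Column mean for every numeric column that still has missing values
toy = toy.fillna(toy.mean(numeric_only=True))

remaining_missing = int(toy.isna().sum().sum())
print(toy)
print("Missing values left:", remaining_missing)
```

Running the same `isna().sum().sum()` check on `df` after cleaning is a cheap way to confirm that no missing value reaches the models.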

In [4]:
df.head()

| | BAD | LOAN | MORTDUE | VALUE | REASON | JOB | YOJ | DEROG | DELINQ | CLAGE | NINQ | CLNO | DEBTINC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1100 | 25860.0 | 39025.0 | HomeImp | Other | 10.5 | 0.0 | 0.0 | 94.366667 | 1.0 | 9.0 | 33.779915 |
| 1 | 1 | 1300 | 70053.0 | 68400.0 | HomeImp | Other | 7.0 | 0.0 | 2.0 | 121.833333 | 0.0 | 14.0 | 33.779915 |
| 2 | 1 | 1500 | 13500.0 | 16700.0 | HomeImp | Other | 4.0 | 0.0 | 0.0 | 149.466667 | 1.0 | 10.0 | 33.779915 |
| 3 | 1 | 1500 | 73760.8172 | 101776.048741 | DebtCon | Other | 8.922268 | 0.0 | 0.0 | 179.766275 | 1.186055 | 21.296096 | 33.779915 |
| 4 | 0 | 1700 | 97800.0 | 112000.0 | HomeImp | Office | 3.0 | 0.0 | 0.0 | 93.333333 | 0.0 | 14.0 | 33.779915 |


In [5]:
# Build the input feature set: drop the target (BAD) and the categorical features (JOB, REASON)
x_basic = df.drop(columns=["BAD", "JOB", "REASON"])
y = df["BAD"]

### Credit scoring with Concrete ML
In the following step, we first define the scikit-learn models found in the original notebook and build their FHE equivalents using Concrete ML. Then, we evaluate and compare them side by side using several metrics (accuracy, F1 score, recall, precision). For Concrete ML models, the inference time in FHE is also reported.
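The evaluation function below computes F1, precision and recall with `average="macro"`, which gives each class the same weight regardless of its frequency. The following pure-Python sketch, using made-up labels, illustrates why this matters when defaults are the minority class: a classifier that never predicts a default can still reach a high accuracy, while its macro recall exposes the problem. The helper names are illustrative and are not part of scikit-learn.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_recall(y_true, y_pred):
    """Average the per-class recall, giving each class the same weight."""
    recalls = []
    for label in sorted(set(y_true)):
        # Indices of the samples that truly belong to this class
        indices = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(1 for i in indices if y_pred[i] == label)
        recalls.append(hits / len(indices))
    return sum(recalls) / len(recalls)


# Hypothetical toy labels: 8 repaid loans (0) and 2 defaults (1)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# A classifier that always predicts "no default"
y_pred = [0] * 10

acc = accuracy(y_true, y_pred)
rec = macro_recall(y_true, y_pred)
print(f"accuracy={acc:.2f}, macro recall={rec:.2f}")
```

Here the accuracy is 0.8 while the macro recall is only 0.5, because the recall of the default class is 0.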

In [6]:
def evaluate(
    model, x, y, test_size=0.33, show_circuit=False, predict_in_fhe=True, fhe_samples=None
):
    """Evaluate the given model using several metrics.

    The model is evaluated using the following metrics: accuracy, F1 score, precision, recall.
    For Concrete ML models, the inference's execution time is provided when done in FHE.

    Args:
        model: The initialized model to consider.
        x: The input data to consider.
        y: The target data to consider.
        test_size: The proportion to use for the test data. Defaults to 0.33.
        show_circuit: If the FHE circuit should be printed for Concrete ML models. Defaults to
            False.
        predict_in_fhe: If the inference should be executed in FHE for Concrete ML models. Else, it
            will only be simulated. Defaults to True.
        fhe_samples: The number of samples to consider for evaluating the inference of Concrete ML
            models if predict_in_fhe is set to True. If None, the complete test set is used.
            Defaults to None.
    """
    evaluation_result = {}

    is_concrete_ml = model.__class__ in CONCRETE_ML_MODELS

    name = model.__class__.__name__ + (" (Concrete ML)" if is_concrete_ml else " (sklearn)")

    evaluation_result["name"] = name

    print(f"Evaluating model {name}")

    # Split the data into test and train sets. Stratify is used to make sure that the test set
    # contains some representative class distribution for targets
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, stratify=y, test_size=test_size, random_state=1
    )
    test_length = len(x_test)

    evaluation_result["Test samples"] = test_length

    evaluation_result["n_bits"] = model.n_bits if is_concrete_ml else None

    # Normalization pipeline
    model = Pipeline(
        [
            ("preprocessor", StandardScaler()),
            ("model", model),
        ]
    )

    # Train the model
    model.fit(x_train, y_train)

    # Run the prediction in the clear
    y_pred = model.predict(x_test)

    # Evaluate the model
    # For Concrete ML models, this will execute the (quantized) inference in the clear
    evaluation_result["Accuracy (clear)"] = accuracy_score(y_test, y_pred)
    evaluation_result["F1 (clear)"] = f1_score(y_test, y_pred, average="macro")
    evaluation_result["Precision (clear)"] = precision_score(y_test, y_pred, average="macro")
    evaluation_result["Recall (clear)"] = recall_score(y_test, y_pred, average="macro")

    # If the model is from Concrete ML
    if is_concrete_ml:

        print("Compile the model")

        # Compile the model using the training data
        circuit = model["model"].compile(x_train)  # pylint: disable=no-member

        # Print the FHE circuit if needed
        if show_circuit:
            print(circuit)

        # Retrieve the circuit's max bit-width
        evaluation_result["max bit-width"] = circuit.graph.maximum_integer_bit_width()

        print("Predict (simulated)")

        # Run the prediction in the clear using FHE simulation and evaluate the accuracy score
        y_pred_simulate = model.predict(x_test, fhe="simulate")

        evaluation_result["Accuracy (simulated)"] = accuracy_score(y_test, y_pred_simulate)

        # Run the prediction in FHE, store its execution time and evaluate the accuracy score
        if predict_in_fhe:
            if fhe_samples is not None:
                x_test = x_test[0:fhe_samples]
                y_test = y_test[0:fhe_samples]

                # Use the actual length, in case fhe_samples exceeds the test set size
                test_length = len(x_test)

            evaluation_result["FHE samples"] = test_length

            print("Predict (FHE)")

            before_time = time.time()
            y_pred_fhe = model.predict(x_test, fhe="execute")
            evaluation_result["FHE execution time (second per sample)"] = (
                time.time() - before_time
            ) / test_length

            evaluation_result["Accuracy (FHE)"] = accuracy_score(y_test, y_pred_fhe)

    print("Done !\n")

    return evaluation_result

### Run the evaluation
In the following, we evaluate several types of classifiers: logistic regression, decision tree, random forest and XGBoost.

In [7]:
results = []

# Define the test size proportion
test_size = 0.2

# For testing FHE execution locally, define the number of inferences to run. If None, the complete
# test set is used
fhe_samples = None

# Logistic regression
results.append(evaluate(SklearnLogisticRegression(), x_basic, y, test_size=test_size))
results.append(evaluate(ConcreteLogisticRegression(), x_basic, y, test_size=test_size))

# Define the initialization parameters for tree-based models
init_params_dt = {"max_depth": 10}
init_params_rf = {"max_depth": 7, "n_estimators": 5}
init_params_xgb = {"max_depth": 7, "n_estimators": 5}
init_params_cml = {"n_bits": 3}

# Determine the type of models to evaluate
use_dt = True
use_rf = True
use_xgb = True
predict_in_fhe = True

# Decision tree models
if use_dt:

    # Scikit-Learn model
    results.append(
        evaluate(
            SklearnDecisionTreeClassifier(**init_params_dt),
            x_basic,
            y,
            test_size=test_size,
        )
    )

    # Concrete ML model
    results.append(
        evaluate(
            ConcreteDecisionTreeClassifier(**init_params_dt, **init_params_cml),
            x_basic,
            y,
            test_size=test_size,
            predict_in_fhe=predict_in_fhe,
            fhe_samples=fhe_samples,
        )
    )

# Random Forest
if use_rf:

    # Scikit-Learn model
    results.append(
        evaluate(
            SklearnRandomForestClassifier(**init_params_rf),
            x_basic,
            y,
            test_size=test_size,
        )
    )

    # Concrete ML model
    results.append(
        evaluate(
            ConcreteRandomForestClassifier(**init_params_rf, **init_params_cml),
            x_basic,
            y,
            test_size=test_size,
            predict_in_fhe=predict_in_fhe,
            fhe_samples=fhe_samples,
        )
    )

# XGBoost
if use_xgb:

    # XGBoost model (scikit-learn compatible API)
    results.append(
        evaluate(
            SklearnXGBoostClassifier(**init_params_xgb),
            x_basic,
            y,
            test_size=test_size,
        )
    )

    # Concrete ML model
    results.append(
        evaluate(
            ConcreteXGBoostClassifier(**init_params_xgb, **init_params_cml),
            x_basic,
            y,
            test_size=test_size,
            predict_in_fhe=predict_in_fhe,
            fhe_samples=fhe_samples,
        )
    )

Evaluating model LogisticRegression (sklearn)
Done !

Evaluating model LogisticRegression (Concrete ML)
Compile the model
Predict (simulated)
Predict (FHE)
Done !

Evaluating model DecisionTreeClassifier (sklearn)
Done !

Evaluating model DecisionTreeClassifier (Concrete ML)
Compile the model
Predict (simulated)
Predict (FHE)
Done !

Evaluating model RandomForestClassifier (sklearn)
Done !

Evaluating model RandomForestClassifier (Concrete ML)
Compile the model
Predict (simulated)
Predict (FHE)
Done !

Evaluating model XGBClassifier (sklearn)
Done !

Evaluating model XGBClassifier (Concrete ML)
Compile the model
Predict (simulated)
Predict (FHE)
Done !



### Compare the models

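Let's compare the models' performance in a pandas DataFrame. With only a few bits of quantization, the Concrete ML models stay close to their scikit-learn equivalents: logistic regression (8 bits) matches exactly, while the 3-bit tree-based models lose about 3 to 5 points of accuracy, with a larger drop in F1 score and recall. These differences are the result of quantization only: for every model, the accuracy measured in FHE is identical to the simulated one, so running the inference in FHE does not impact the accuracy score.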
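To give an intuition of what a given number of quantization bits costs, here is a minimal sketch of uniform quantization in pure Python. It is an illustration under simplifying assumptions (a single min/max grid over made-up values) and is not the quantizer that Concrete ML uses internally; the function names are hypothetical.

```python
def quantize(values, n_bits):
    """Map floats to integer codes in [0, 2**n_bits - 1] on a uniform grid."""
    low, high = min(values), max(values)
    levels = 2**n_bits - 1
    # Fall back to a scale of 1.0 when all values are identical
    scale = (high - low) / levels or 1.0
    codes = [round((v - low) / scale) for v in values]
    return codes, scale, low


def dequantize(codes, scale, low):
    """Map integer codes back to approximate float values."""
    return [low + scale * c for c in codes]


# Hypothetical standardized feature values
values = [-1.2, -0.4, 0.0, 0.3, 0.9, 2.3]

errors = {}
for n_bits in (3, 8):
    codes, scale, low = quantize(values, n_bits)
    restored = dequantize(codes, scale, low)
    # Largest absolute difference between original and restored values
    errors[n_bits] = max(abs(v - r) for v, r in zip(values, restored))
    print(f"n_bits={n_bits}: codes={codes}, max error={errors[n_bits]:.4f}")
```

With 3 bits, only 8 distinct values are available, so nearby inputs such as -0.4 and 0.0 end up on the same code; with 8 bits the rounding error becomes much smaller.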

In [8]:
pd.set_option("display.precision", 3)

results_dataframe = pd.DataFrame(results)
results_dataframe.fillna("")

| | name | Test samples | n_bits | Accuracy (clear) | F1 (clear) | Precision (clear) | Recall (clear) | max bit-width | Accuracy (simulated) | FHE samples | FHE execution time (second per sample) | Accuracy (FHE) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LogisticRegression (sklearn) | 1192 | | 0.824 | 0.627 | 0.748 | 0.606 | | | | | |
| 1 | LogisticRegression (Concrete ML) | 1192 | 8.0 | 0.824 | 0.627 | 0.748 | 0.606 | 18.0 | 0.824 | 1192.0 | 0.001 | 0.824 |
| 2 | DecisionTreeClassifier (sklearn) | 1192 | | 0.879 | 0.783 | 0.843 | 0.75 | | | | | |
| 3 | DecisionTreeClassifier (Concrete ML) | 1192 | 3.0 | 0.852 | 0.705 | 0.818 | 0.67 | 4.0 | 0.848 | 1192.0 | 0.194 | 0.848 |
| 4 | RandomForestClassifier (sklearn) | 1192 | | 0.872 | 0.761 | 0.839 | 0.724 | | | | | |
| 5 | RandomForestClassifier (Concrete ML) | 1192 | 3.0 | 0.84 | 0.645 | 0.836 | 0.618 | 4.0 | 0.84 | 1192.0 | 0.295 | 0.84 |
| 6 | XGBClassifier (sklearn) | 1192 | | 0.888 | 0.806 | 0.846 | 0.78 | | | | | |
| 7 | XGBClassifier (Concrete ML) | 1192 | 3.0 | 0.841 | 0.647 | 0.848 | 0.619 | 4.0 | 0.841 | 1192.0 | 0.226 | 0.841 |
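To read the gap between each pair of models at a glance, the accuracy difference can be computed directly. The sketch below copies the scikit-learn "Accuracy (clear)" and Concrete ML "Accuracy (FHE)" values from the table above into a plain dictionary; the dictionary layout and key names are only an illustrative choice.

```python
# Accuracy values copied from the results table above:
# "sklearn" is Accuracy (clear) of the scikit-learn model,
# "concrete_ml" is Accuracy (FHE) of the Concrete ML model
accuracies = {
    "LogisticRegression": {"sklearn": 0.824, "concrete_ml": 0.824},
    "DecisionTreeClassifier": {"sklearn": 0.879, "concrete_ml": 0.848},
    "RandomForestClassifier": {"sklearn": 0.872, "concrete_ml": 0.840},
    "XGBClassifier": {"sklearn": 0.888, "concrete_ml": 0.841},
}

# Accuracy lost when moving from the float model to the quantized FHE model
gaps = {
    name: round(scores["sklearn"] - scores["concrete_ml"], 3)
    for name, scores in accuracies.items()
}

for name, gap in gaps.items():
    print(f"{name}: {gap:.3f}")
```

The gap is zero for logistic regression at 8 bits and between roughly 0.03 and 0.05 for the tree-based models at 3 bits.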
