# Credit Risk Scorecard Model Validation

## Before you begin
To use the ValidMind Developer Framework with a Jupyter notebook, you first need to prepare your Python environment, then install and initialize the client library.

If you don't already have one, you should also create a documentation project on the ValidMind platform. You will use this project to upload your documentation and test results.

## Install the client library

In [1]:
# %pip install --upgrade validmind

## Initialize the client library
In a browser, go to the Client Integration page of your documentation project and click Copy to clipboard next to the code snippet. This code snippet gives you the API key, API secret, and project identifier to link your notebook to your documentation project.

This step requires a documentation project. Learn how you can create one.

Next, replace this placeholder with your own code snippet:

In [2]:
import validmind as vm

# Replace these values with the ones from your own code snippet
vm.init(
  api_host = "http://localhost:3000/api/v1/tracking",
  api_key = "<your-api-key>",
  api_secret = "<your-api-secret>",
  project = "clk00h0u800x9qjy67gduf5om"
)

2023-08-14 11:40:38,598 - INFO(validmind.api_client): Connected to ValidMind. Project: [6] Credit Risk Scorecard - Initial Validation (clk00h0u800x9qjy67gduf5om)


## Setup

#### Introduction

The **Credit Risk Scorecard** model, built from the Lending Club dataset, is used to compute the Probability of Default (PD), a key input to Expected Credit Loss (ECL) calculations. The scorecard assesses several credit characteristics of potential borrowers, such as credit history, income, and outstanding debt, and assigns each one a specific score. Combining these scores gives a total score for each borrower, which translates into an estimated Point-in-Time (PiT) PD. The PiT PD reflects the borrower's likelihood of default at a specific point in time, accounting for both current and foreseeable future conditions.
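As a minimal sketch of how an additive scorecard combines characteristic scores, the code below assigns points to the bin each characteristic falls into and sums them into a total score. The characteristics, bin edges, and point values are hypothetical illustrations, not the points of this model.

```python
import math

# Hypothetical points table: characteristic -> list of (exclusive upper bound, points).
# The last bin of each characteristic uses math.inf so every value is covered.
POINTS = {
    "int_rate": [(10.0, 60), (15.0, 40), (math.inf, 15)],
    "annual_inc": [(40_000, 20), (80_000, 35), (math.inf, 50)],
}

def points_for(characteristic, value):
    """Return the points of the first bin whose upper bound exceeds the value."""
    for upper, pts in POINTS[characteristic]:
        if value < upper:
            return pts
    raise ValueError(f"{value} is not covered by any bin of {characteristic}")

def total_score(borrower):
    """Sum the points assigned to each characteristic of one borrower."""
    return sum(points_for(name, borrower[name]) for name in POINTS)

borrower = {"int_rate": 12.5, "annual_inc": 55_000}
print(total_score(borrower))  # 40 + 35 = 75
```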

Additionally, a holistic view of credit risk requires an estimate of the Lifetime PD. As the name suggests, the Lifetime PD is the borrower's likelihood of default over the entire life of the exposure, taking into account potential future changes in economic and financial conditions.
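This notebook does not show how the Lifetime PD is derived. One common approach, sketched here under that assumption, chains a term structure of conditional (hazard) PDs h_t through survival probabilities, so that Lifetime PD = 1 - prod(1 - h_t). The flat 2% annual hazard in the example is hypothetical.

```python
def lifetime_pd(conditional_pds):
    """Cumulative default probability over the life of the exposure.

    conditional_pds[t] is assumed to be the probability of defaulting in
    period t given survival to the start of period t.
    """
    survival = 1.0
    for h in conditional_pds:
        if not 0.0 <= h <= 1.0:
            raise ValueError("conditional PDs must lie in [0, 1]")
        survival *= 1.0 - h
    return 1.0 - survival

def marginal_pds(conditional_pds):
    """Unconditional probability of defaulting in each individual period."""
    survival, marginals = 1.0, []
    for h in conditional_pds:
        marginals.append(survival * h)
        survival *= 1.0 - h
    return marginals

# A 3-year exposure with a hypothetical flat 2% annual conditional PD
print(round(lifetime_pd([0.02, 0.02, 0.02]), 6))  # 0.058808
```

The marginal PDs sum to the Lifetime PD, which is a useful consistency check when building a term structure.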

#### Import Libraries

In [3]:
# Load API key and secret from environment variables
%load_ext dotenv
%dotenv .env

from IPython.display import HTML
from notebooks.probability_of_default.helpers.Developer import Developer
from notebooks.probability_of_default.helpers.scorecard_tasks import *
from notebooks.probability_of_default.helpers.model_development_tasks import *

# Visualization imports
%matplotlib inline

#### Model Validation Parameters

In [4]:
default_column = "default"

#### Load Datasets and Models

In [5]:
developer = Developer()
scorecard = developer.load_objects_from_pickle("datasets/scorecard_data_and_models.pkl")

df_1 = scorecard["df_1"]
df_3 = scorecard["df_3"]
df_4 = scorecard["df_4"]

df_train_1 = scorecard["df_train_1"]
df_train_2 = scorecard["df_train_2"]
df_train_3 = scorecard["df_train_3"]

df_train_6 = scorecard["df_train_6"]
df_test_6 = scorecard["df_test_6"]

model_fit_2 = scorecard["model_fit_2"]

INFO: Loaded 10 objects from datasets/scorecard_data_and_models.pkl


## Model validation

#### Validation Plan

In [6]:
validation_plan = scorecard["df_validation"]
display(HTML(validation_plan.to_html(escape=False)))

| | Area ID | Task ID | Input | Output | Validation Tests |
|---|---|---|---|---|---|
| 0 | data_description | import_raw_data | lending_club_url | df_1 | descriptive_statistics missing_values_bar_plot |
| 1 | data_preparation | drop_features | df_1, preliminary_features_to_drop | df_2 | none |
| 2 | data_preparation | add_default_definition | df_2, default_column | df_3 | missing_values_bar_plot class_imbalance iqr_outliers_table |
| 3 | data_preparation | remove_features_missing_values | df_3, min_missing_percentage | df_4 | missing_values_bar_plot |
| 4 | data_preparation | convert_term_column | df_4 | df_5 | none |
| 5 | data_preparation | convert_emp_length_column | df_5 | df_6 | none |
| 6 | data_preparation | convert_inq_last_6mths_column | df_6 | df_7 | none |
| 7 | data_sampling | data_split | df_7, default_column | df_train_1, df_test_1 | tabular_numerical_histograms high_cardinality tabular_categorical_bar_plots |
| 8 | exploratory_data_analysis | drop_categories | df_train_1 | df_train_2 | target_rate_bar_plots |
| 9 | exploratory_data_analysis | drop_features | df_train_2, final_features_to_drop | df_train_3 | chi_squared_features_table anova_one_way_table pearson_correlation_matrix feature_target_correlation_plot woe_bin_table woe_bin_table woe_bin_plots |


#### Create ValidMind Datasets

In [7]:
vm_df_1 = vm.init_dataset(dataset=df_1, target_column=default_column)
vm_df_3 = vm.init_dataset(dataset=df_3, target_column=default_column)
vm_df_4 = vm.init_dataset(dataset=df_4, target_column=default_column)
vm_df_train_1 = vm.init_dataset(dataset=df_train_1, target_column=default_column)
vm_df_train_2 = vm.init_dataset(dataset=df_train_2, target_column=default_column)
vm_df_train_3 = vm.init_dataset(dataset=df_train_3, target_column=default_column)

2023-08-14 11:40:39,315 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...
INFO: Pandas dataset detected. Initializing VM Dataset instance...
2023-08-14 11:40:46,480 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...
INFO: Pandas dataset detected. Initializing VM Dataset instance...
2023-08-14 11:40:48,028 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...
INFO: Pandas dataset detected. Initializing VM Dataset instance...
2023-08-14 11:40:49,080 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...
INFO: Pandas dataset detected. Initializing VM Dataset instance...
2023-08-14 11:40:50,029 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...
INFO: Pandas dataset detected. Initializing VM Dataset instance...
2023-08-14 11:40:50,716 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...

#### Create ValidMind Model

In [8]:
vm_df_train = vm.init_dataset(dataset=df_train_6, target_column=default_column)
vm_df_test = vm.init_dataset(dataset=df_test_6, target_column=default_column)

vm_model_fit_2 = vm.init_model(
    model=model_fit_2,
    train_ds=vm_df_train,
    test_ds=vm_df_test,
)

2023-08-14 11:40:51,289 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...
INFO: Pandas dataset detected. Initializing VM Dataset instance...
2023-08-14 11:40:52,116 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...
INFO: Pandas dataset detected. Initializing VM Dataset instance...


#### Run All Validation Tests

In [9]:
from validmind.vm_models.test_context import TestContext
from validmind.tests.data_validation.DescriptiveStatistics import DescriptiveStatistics

test_context_1 = TestContext(dataset=vm_df_1)

metric = DescriptiveStatistics(test_context_1)
metric.run()
await metric.result.log()
metric.result.show()

VBox(children=(HTML(value='<p>This section provides descriptive statistics for numerical and categorical varia…

In [10]:
from validmind.tests.data_validation.MissingValuesBarPlot import MissingValuesBarPlot

params = {"threshold": 80,
          "fig_height": 1100}

metric = MissingValuesBarPlot(test_context_1, params)
metric.run()
await metric.result.log()
metric.result.show()

VBox(children=(HTML(value='<p>Generates a visual analysis of missing values by plotting horizontal bar plots w…
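The missing-values check reduces to a per-column percentage compared against a threshold. The sketch below assumes `threshold` is a percentage and that a column is flagged when its share of missing values is strictly greater than it; both are assumptions about the test's semantics. The rows and column names are illustrative only.

```python
import math

def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def missing_percentages(rows, columns):
    """Percentage of missing values per column for a list of dict rows."""
    n = len(rows)
    return {col: 100.0 * sum(is_missing(r.get(col)) for r in rows) / n for col in columns}

def columns_above(percentages, threshold=80.0):
    """Columns whose missing percentage is strictly greater than the threshold."""
    return sorted(col for col, pct in percentages.items() if pct > threshold)

rows = [
    {"int_rate": 10.5, "mths_since_last_record": None},
    {"int_rate": None, "mths_since_last_record": float("nan")},
    {"int_rate": 13.1, "mths_since_last_record": None},
    {"int_rate": 7.9, "mths_since_last_record": None},
    {"int_rate": 15.2, "mths_since_last_record": None},
]
pcts = missing_percentages(rows, ["int_rate", "mths_since_last_record"])
print(pcts)
print(columns_above(pcts))
```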

In [11]:
test_context_3 = TestContext(dataset=vm_df_3)

params = {"threshold": 80,
          "fig_height": 1100}

metric = MissingValuesBarPlot(test_context_3, params)
metric.run()
await metric.result.log()
metric.result.show()

VBox(children=(HTML(value='<p>Generates a visual analysis of missing values by plotting horizontal bar plots w…

In [12]:
from validmind.tests.data_validation.ClassImbalance import ClassImbalance

metric = ClassImbalance(test_context_3)
metric.run()
await metric.result.log()
metric.result.show()

VBox(children=(HTML(value='\n            <h2>Class Imbalance ❌</h2>\n            <p>The class imbalance test m…
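A class imbalance check compares each class's share of the target against a minimum. In the sketch below, the 10% minimum and the 5% default rate are hypothetical values chosen for illustration; the threshold ValidMind's `ClassImbalance` actually applies is not shown in this notebook.

```python
from collections import Counter

def class_shares(labels, min_percent=10.0):
    """Percentage share of each class and whether it reaches min_percent."""
    n = len(labels)
    return {
        cls: (100.0 * count / n, 100.0 * count / n >= min_percent)
        for cls, count in Counter(labels).items()
    }

labels = [0] * 95 + [1] * 5  # hypothetical target with 5% defaults
print(class_shares(labels))  # {0: (95.0, True), 1: (5.0, False)}
```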

In [13]:
from validmind.tests.data_validation.IQROutliersTable import IQROutliersTable

num_features = get_numerical_columns(df_3)
params = {"num_features": num_features,
          "threshold": 1.5}

metric = IQROutliersTable(test_context_3, params)
metric.run()
await metric.result.log()
metric.result.show()

VBox(children=(HTML(value='<p>Analyzes the distribution of outliers in numerical features using the Interquart…
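The `threshold: 1.5` parameter corresponds to the usual Tukey fences: values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR count as outliers. This sketch uses the standard library's `statistics.quantiles`, whose default "exclusive" method may place quartiles slightly differently from pandas or NumPy, so counts can differ at the margins.

```python
import statistics

def iqr_bounds(values, threshold=1.5):
    """Lower and upper Tukey fences based on the interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - threshold * iqr, q3 + threshold * iqr

def iqr_outliers(values, threshold=1.5):
    """Values that fall outside the fences."""
    lower, upper = iqr_bounds(values, threshold)
    return [v for v in values if v < lower or v > upper]

values = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(iqr_bounds(values))    # (-5.0, 15.0)
print(iqr_outliers(values))  # [100]
```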

In [14]:
from validmind.tests.data_validation.IQROutliersBarPlot import IQROutliersBarPlot

num_features = get_numerical_columns(df_3)
params = {"num_features": num_features,
          "threshold": 1.5,
          "fig_width": 500}

metric = IQROutliersBarPlot(test_context_3, params)
metric.run()
await metric.result.log()
metric.result.show()

VBox(children=(HTML(value='<p>Generates a visual analysis of the outliers for numeric variables based on perce…

In [15]:
from validmind.tests.data_validation.TabularNumericalHistograms import TabularNumericalHistograms

test_context_train_1 = TestContext(dataset=vm_df_train_1)

metric = TabularNumericalHistograms(test_context_train_1)
metric.run()
await metric.result.log()
metric.result.show()

VBox(children=(HTML(value='<p>Generates a visual analysis of numerical data by plotting the histogram. The inp…

In [16]:
from validmind.tests.data_validation.HighCardinality import HighCardinality
metric = HighCardinality(test_context_train_1)
metric.run()
await metric.result.log()
metric.result.show()

VBox(children=(HTML(value='\n            <h2>Cardinality ✅</h2>\n            <p>The high cardinality test meas…
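High cardinality means a categorical column has many distinct values relative to the number of rows. The sketch below flags a column when its distinct-value ratio exceeds a limit; the 10% ratio and the `grade`/`zip_code` data are hypothetical, and ValidMind's `HighCardinality` may use different parameters or rules.

```python
def cardinality_report(rows, columns, max_unique_ratio=0.1):
    """Distinct-value count per column and whether its ratio to rows exceeds the limit."""
    n = len(rows)
    report = {}
    for col in columns:
        n_unique = len({r[col] for r in rows})
        report[col] = (n_unique, n_unique / n > max_unique_ratio)
    return report

rows = [{"grade": "AB"[i % 2], "zip_code": f"{i:03d}xx"} for i in range(40)]
print(cardinality_report(rows, ["grade", "zip_code"]))
# {'grade': (2, False), 'zip_code': (40, True)}
```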

In [17]:
from validmind.tests.data_validation.TabularCategoricalBarPlots import TabularCategoricalBarPlots
metric = TabularCategoricalBarPlots(test_context_train_1)
metric.run()
await metric.result.log()
metric.result.show()

VBox(children=(HTML(value='<p>Generates a visual analysis of categorical data by plotting bar plots. The input…

In [18]:
from validmind.tests.data_validation.TargetRateBarPlots import TargetRateBarPlots

test_context_train_2 = TestContext(dataset=vm_df_train_2)

# Configure the metric
params = {
    "default_column": default_column,
    "columns": None
}

metric = TargetRateBarPlots(test_context_train_2, params=params)
metric.run()
await metric.result.log()
metric.result.show()

The column default is correct and contains only 1 and 0.


VBox(children=(HTML(value='<p>Generates a visual analysis of target ratios by plotting bar plots. The input da…
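A target-rate bar plot shows, for each category, the share of rows with `default == 1`. This sketch computes those rates and rejects targets that are not strictly 0/1, mirroring the check reported in the output above; the rows are hypothetical.

```python
from collections import defaultdict

def target_rates(rows, column, target="default"):
    """Default rate (share of target == 1) for each category of column."""
    totals, defaults = defaultdict(int), defaultdict(int)
    for row in rows:
        if row[target] not in (0, 1):
            raise ValueError(f"{target} must contain only 0 and 1")
        totals[row[column]] += 1
        defaults[row[column]] += row[target]
    return {cat: defaults[cat] / totals[cat] for cat in totals}

rows = [
    {"grade": "A", "default": 0},
    {"grade": "A", "default": 1},
    {"grade": "B", "default": 1},
    {"grade": "B", "default": 1},
]
print(target_rates(rows, "grade"))  # {'A': 0.5, 'B': 1.0}
```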

In [19]:
from validmind.tests.data_validation.ChiSquaredFeaturesTable import ChiSquaredFeaturesTable

test_context_train_3 = TestContext(dataset=vm_df_train_3)

cat_features = get_categorical_columns(df_train_3)
params = {"cat_features": cat_features,
          "p_threshold": 0.05}

metric = ChiSquaredFeaturesTable(test_context_train_3, params)
metric.run()
await metric.result.log() 
metric.result.show()

VBox(children=(HTML(value='<p>Perform a Chi-Squared test of independence for each categorical variable with th…
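The chi-squared test of independence compares observed category-by-target counts with the counts expected if the two were independent. This is a minimal stdlib sketch of the Pearson statistic (without Yates' continuity correction); for the 1-degree-of-freedom case the p-value has a closed form via `math.erfc`. The data is hypothetical.

```python
import math
from collections import Counter

def chi_squared_statistic(feature, target):
    """Pearson chi-squared statistic and degrees of freedom for two categorical sequences."""
    n = len(feature)
    joint = Counter(zip(feature, target))
    f_counts, t_counts = Counter(feature), Counter(target)
    stat = 0.0
    for f, fc in f_counts.items():
        for t, tc in t_counts.items():
            expected = fc * tc / n
            observed = joint.get((f, t), 0)
            stat += (observed - expected) ** 2 / expected
    dof = (len(f_counts) - 1) * (len(t_counts) - 1)
    return stat, dof

def p_value_dof1(stat):
    """Upper-tail p-value of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

# Hypothetical grades: A has 10/50 defaults, B has 30/50 defaults
feature = ["A"] * 50 + ["B"] * 50
target = [0] * 40 + [1] * 10 + [0] * 20 + [1] * 30
stat, dof = chi_squared_statistic(feature, target)
print(stat, dof, p_value_dof1(stat) < 0.05)
```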

In [20]:
from validmind.tests.data_validation.ANOVAOneWayTable import ANOVAOneWayTable

num_features = get_numerical_columns(df_train_3)
params = {"num_features": num_features,
          "p_threshold": 0.05}

metric = ANOVAOneWayTable(test_context_train_3, params)
metric.run()
await metric.result.log()
metric.result.show()

VBox(children=(HTML(value='<p>Perform an ANOVA F-test for each numerical variable with the target. The input d…
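For a numerical feature, the one-way ANOVA F-test compares the variance of the group means (here, defaulters versus non-defaulters) with the variance within groups. A minimal sketch of the F statistic follows; a p-value would then come from the F distribution, for example `scipy.stats.f.sf`. The group values are hypothetical.

```python
def anova_f(groups):
    """One-way ANOVA F statistic with (df_between, df_within)."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, (df_between, df_within)

# Hypothetical feature values split by target: non-defaulters, defaulters
print(anova_f([[1, 2, 3], [4, 5, 6]]))  # (13.5, (1, 4))
```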

In [21]:
from validmind.tests.data_validation.PearsonCorrelationMatrix import PearsonCorrelationMatrix

params = {"declutter": False,
          "features": None,
          "fontsize": 13}

metric = PearsonCorrelationMatrix(test_context_train_3, params)
metric.run()
await metric.result.log()
metric.result.show()

VBox(children=(HTML(value='<p>Extracts the Pearson correlation coefficient for all pairs of numerical variable…
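The correlation matrix holds the Pearson coefficient r = cov(x, y) / (sd(x) · sd(y)) for every pair of numerical columns. This stdlib sketch computes it for columns stored as lists; the column names and values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Pairwise Pearson coefficients keyed by (column_a, column_b)."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

columns = {"loan_amnt": [1, 2, 3, 4], "installment": [2, 4, 6, 8], "term_left": [4, 3, 2, 1]}
matrix = correlation_matrix(columns)
print(round(matrix[("loan_amnt", "installment")], 6), round(matrix[("loan_amnt", "term_left")], 6))
```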

In [22]:
from validmind.tests.data_validation.FeatureTargetCorrelationPlot import FeatureTargetCorrelationPlot

params = {"features": None}

metric = FeatureTargetCorrelationPlot(test_context_train_3, params)
metric.run()
await metric.result.log()
metric.result.show()

VBox(children=(HTML(value='<p>Generates a visual analysis of correlations between features and target by plott…

In [23]:
from validmind.tests.data_validation.WOEBinTable import WOEBinTable

metric = WOEBinTable(test_context_train_3)
metric.run()
await metric.result.log()
metric.result.show()

Running with breaks_adj: None
Performing binning with breaks_adj: None
[INFO] creating woe binning ...



There are blank strings in 1 columns, which are replaced with NaN. 
 (ColumnNames: emp_length)



VBox(children=(HTML(value="<p>Implements WoE-based automatic binning for features in a dataset and calculates …
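The log lines above appear to come from `scorecardpy`'s WoE binning. Given good (target 0) and bad (target 1) counts per bin, WoE and Information Value (IV) can be sketched as below. This uses WoE = ln(share of bads / share of goods), which we believe matches `scorecardpy`; some texts invert the ratio, which flips the sign of WoE but leaves IV unchanged. The bin counts are hypothetical.

```python
import math

def woe_iv(bins):
    """WoE per bin and total IV from a list of (good_count, bad_count) pairs."""
    total_good = sum(g for g, _ in bins)
    total_bad = sum(b for _, b in bins)
    woes, iv = [], 0.0
    for good, bad in bins:
        if good == 0 or bad == 0:
            raise ValueError("a bin has no goods or no bads; merge bins or smooth counts")
        share_good, share_bad = good / total_good, bad / total_bad
        woe = math.log(share_bad / share_good)
        woes.append(woe)
        iv += (share_bad - share_good) * woe
    return woes, iv

woes, iv = woe_iv([(80, 20), (20, 80)])
print([round(w, 4) for w in woes], round(iv, 4))
```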

In [24]:
params = {
    "breaks_adj": {"int_rate": [5, 10, 15]},
}

metric = WOEBinTable(test_context_train_3, params)
metric.run()
await metric.result.log()
metric.result.show()

Running with breaks_adj: {'int_rate': [5, 10, 15]}
Performing binning with breaks_adj: {'int_rate': [5, 10, 15]}
[INFO] creating woe binning ...



There are blank strings in 1 columns, which are replaced with NaN. 
 (ColumnNames: emp_length)



VBox(children=(HTML(value="<p>Implements WoE-based automatic binning for features in a dataset and calculates …
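The `breaks_adj` parameter replaces the automatic `int_rate` bins with manual breakpoints 5, 10, and 15. Assuming right-open intervals (the form `scorecardpy` reports as `[a,b)`), each value maps to its bin as in this sketch.

```python
import bisect
import math

def assign_bins(values, breaks):
    """Label each value with its right-open interval defined by the breakpoints."""
    cuts = sorted(breaks)
    edges = [-math.inf] + cuts + [math.inf]
    labels = [f"[{lo},{hi})" for lo, hi in zip(edges, edges[1:])]
    # bisect_right puts a value equal to a breakpoint into the bin that starts there
    return [labels[bisect.bisect_right(cuts, v)] for v in values]

print(assign_bins([4.9, 5, 12.3, 15, 30], [5, 10, 15]))
# ['[-inf,5)', '[5,10)', '[10,15)', '[15,inf)', '[15,inf)']
```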

In [25]:
from validmind.tests.data_validation.WOEBinPlots import WOEBinPlots

params = {
    "breaks_adj": {"int_rate": [5,10,15]},
    "fig_height": 500,
}

metric = WOEBinPlots(test_context_train_3, params=params)
metric.run()
await metric.result.log()
metric.result.show()

[INFO] creating woe binning ...



There are blank strings in 1 columns, which are replaced with NaN. 
 (ColumnNames: emp_length)



VBox(children=(HTML(value='<p>Generates a visual analysis of the WoE and IV values distribution for categorica…

In [26]:
from validmind.tests.model_validation.statsmodels.RegressionCoeffsPlot import RegressionCoeffsPlot

test_context_models_fit_2 = TestContext(models=[vm_model_fit_2])

metric = RegressionCoeffsPlot(test_context_models_fit_2)
metric.run()
await metric.result.log()
metric.result.show()

VBox(children=(HTML(value="<p>Regression Coefficients with Confidence Intervals Plot</p>\n<p>This class is use…
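A coefficient plot with confidence intervals typically shows each estimate with a Wald interval, coefficient ± z · standard error, where z ≈ 1.96 for 95%. If `model_fit_2` is a fitted statsmodels results object (the test module path suggests so, but that is an assumption), its `conf_int()` method returns such bounds directly. The numbers below are hypothetical, not taken from this model.

```python
def wald_confidence_intervals(coefs, std_errors, z=1.959964):
    """Approximate 95% Wald intervals: coefficient +/- z * standard error."""
    return {
        name: (beta - z * std_errors[name], beta + z * std_errors[name])
        for name, beta in coefs.items()
    }

# Hypothetical coefficients and standard errors
coefs = {"int_rate": 0.12, "annual_inc": -0.000004}
std_errors = {"int_rate": 0.01, "annual_inc": 0.000001}
for name, (low, high) in wald_confidence_intervals(coefs, std_errors).items():
    print(f"{name}: [{low:.6f}, {high:.6f}]")
```

An interval that excludes zero indicates the coefficient is significantly different from zero at roughly the 5% level.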

In [27]:
from validmind.tests.model_validation.statsmodels.RegressionModelsCoeffs import RegressionModelsCoeffs

metric = RegressionModelsCoeffs(test_context_models_fit_2)
metric.run()
await metric.result.log()
metric.result.show()

VBox(children=(HTML(value='<p>This section shows the coefficients of different regression models that were tra…

In [28]:
from validmind.tests.model_validation.statsmodels.LogRegressionConfusionMatrix import LogRegressionConfusionMatrix

test_context_model_fit_2 = TestContext(model=vm_model_fit_2)

# Configure test parameters
params = {
    "cut_off_threshold": 0.5,
}

metric = LogRegressionConfusionMatrix(test_context_model_fit_2, params)
metric.run()
await metric.result.log()
metric.result.show()

VBox(children=(HTML(value='<p>A confusion matrix is a table that is used to describe the performance of a clas…
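With `cut_off_threshold: 0.5`, each predicted PD is turned into a default/non-default prediction before counting. This sketch assumes a PD equal to the cut-off is classified as a default; the test's handling of that edge case is not shown. The labels and PDs are hypothetical.

```python
def confusion_matrix(y_true, probs, cut_off=0.5):
    """Counts of TP, FP, TN, FN when PD >= cut_off is predicted as default (1)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, p in zip(y_true, probs):
        predicted = 1 if p >= cut_off else 0
        if predicted == 1 and actual == 1:
            counts["tp"] += 1
        elif predicted == 1:
            counts["fp"] += 1
        elif actual == 1:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts

print(confusion_matrix([1, 0, 1, 0], [0.9, 0.6, 0.4, 0.1]))
# {'tp': 1, 'fp': 1, 'tn': 1, 'fn': 1}
```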

In [29]:
from validmind.tests.model_validation.statsmodels.RegressionROCCurve import RegressionROCCurve

metric = RegressionROCCurve(test_context_model_fit_2)
metric.run()
await metric.result.log()
metric.result.show()

VBox(children=(HTML(value='<p>A receiver operating characteristic (ROC), or simply ROC curve, is a graphical p…
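An ROC curve traces the true positive rate against the false positive rate as the decision threshold sweeps from high to low, and the area under it (AUC) can be computed with the trapezoidal rule. The sketch below groups tied scores into a single step; the labels and scores are hypothetical.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) points for thresholds swept from the highest score down."""
    pairs = sorted(zip(scores, y_true), reverse=True)
    pos = sum(y_true)
    neg = len(y_true) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Consume all rows tied at this score before emitting a point
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

pts = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(pts)
print(auc(pts))  # 0.75
```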

In [30]:
from validmind.tests.model_validation.statsmodels.GINITable import GINITable

metric = GINITable(test_context_model_fit_2)
metric.run()
await metric.result.log() 
metric.result.show()

Predicted scores obtained...
Computing AUC...
Computing GINI...
Computing AUC...
Computing KS...
Predicted scores obtained...
Computing AUC...
Computing GINI...
Computing AUC...
Computing KS...


VBox(children=(HTML(value='<p>Compute and display the AUC, GINI, and KS for train and test sets.</p>'), HTML(v…
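The log shows AUC, GINI, and KS computed for both the train and test sets. These metrics relate in the standard way: AUC is the probability that a random defaulter scores higher than a random non-defaulter (ties count half), Gini = 2·AUC - 1, and KS is the maximum gap between the cumulative score distributions of defaulters and non-defaulters. This quadratic-time sketch uses hypothetical data and is not the test's own implementation.

```python
def auc_mann_whitney(y_true, scores):
    """AUC as the probability a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

def gini(y_true, scores):
    """Gini coefficient derived from AUC."""
    return 2 * auc_mann_whitney(y_true, scores) - 1

def ks_statistic(y_true, scores):
    """Maximum |TPR - FPR| over all thresholds taken from the observed scores."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    best = 0.0
    for t in sorted(set(scores)):
        tpr = sum(1 for s, y in zip(scores, y_true) if y == 1 and s >= t) / pos
        fpr = sum(1 for s, y in zip(scores, y_true) if y == 0 and s >= t) / neg
        best = max(best, abs(tpr - fpr))
    return best

y, s = [0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]
print(auc_mann_whitney(y, s), gini(y, s), ks_statistic(y, s))  # 0.75 0.5 0.5
```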

In [31]:
from validmind.tests.model_validation.statsmodels.LogisticRegPredictionHistogram import LogisticRegPredictionHistogram

# Configure test parameters
params = {
    "title": "Histogram of Probability of Default",
}

metric = LogisticRegPredictionHistogram(test_context_model_fit_2, params)
metric.run()
await metric.result.log()
metric.result.show()

VBox(children=(HTML(value='<p>This metric calculates the probability of default (PD) for each instance in the …
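For a logistic regression, each instance's PD is the logistic function of the linear predictor, PD = 1 / (1 + exp(-(b0 + Σ bᵢxᵢ))), and the histogram bins these PDs. This sketch uses hypothetical coefficients and equal-width bins on [0, 1]; the test's own binning choices are not shown here.

```python
import math

def predict_pd(intercept, coefs, features):
    """Logistic PD: 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    z = intercept + sum(coefs[name] * features[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))

def histogram_counts(values, n_bins=10, lo=0.0, hi=1.0):
    """Equal-width bin counts on [lo, hi]; a value equal to hi goes in the last bin."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if not lo <= v <= hi:
            raise ValueError(f"value {v} outside [{lo}, {hi}]")
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts

# Hypothetical model: intercept -3, int_rate coefficient 0.15
pds = [predict_pd(-3.0, {"int_rate": 0.15}, {"int_rate": r}) for r in (6.0, 12.0, 24.0)]
print([round(p, 3) for p in pds])
print(histogram_counts(pds))
```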

In [32]:
from validmind.tests.model_validation.statsmodels.LogisticRegCumulativeProb import LogisticRegCumulativeProb

# Configure test parameters
params = {
    "title": "Cumulative Probability of Default",
}

metric = LogisticRegCumulativeProb(test_context_model_fit_2, params)
metric.run()
await metric.result.log()
metric.result.show()

VBox(children=(HTML(value='<p>This metric calculates the cumulative probabilities for each instance in the tra…

In [33]:
from validmind.tests.model_validation.statsmodels.ScorecardHistogram import ScorecardHistogram

# Configure test parameters
params = {
    "target_score": 600,
    "target_odds": 50,
    "pdo": 20,
    "title": "Histogram of Credit Scores",
}

metric = ScorecardHistogram(test_context_model_fit_2, params)
metric.run()
await metric.result.log()
metric.result.show()

VBox(children=(HTML(value='<p>This metric calculates the credit score for each instance in the training and te…
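The `target_score`, `target_odds`, and `pdo` parameters match the conventional points-to-double-the-odds scaling: factor = pdo / ln(2), offset = target_score - factor · ln(target_odds), and score = offset + factor · ln(odds), where odds = (1 - PD) / PD are good:bad odds. The sketch assumes `ScorecardHistogram` uses this standard formulation; its exact implementation is not shown in this notebook.

```python
import math

def scaling(target_score=600, target_odds=50, pdo=20):
    """Offset and factor for points-to-double-the-odds score scaling."""
    factor = pdo / math.log(2)
    offset = target_score - factor * math.log(target_odds)
    return offset, factor

def pd_to_score(pd, target_score=600, target_odds=50, pdo=20):
    """Credit score for a PD, using good:bad odds (1 - PD) / PD."""
    offset, factor = scaling(target_score, target_odds, pdo)
    return offset + factor * math.log((1 - pd) / pd)

def score_to_pd(score, target_score=600, target_odds=50, pdo=20):
    """Inverse mapping from a credit score back to a PD."""
    offset, factor = scaling(target_score, target_odds, pdo)
    odds = math.exp((score - offset) / factor)
    return 1 / (1 + odds)

# Odds of 50:1 map to 600 points; doubling the odds to 100:1 adds pdo = 20 points
print(round(pd_to_score(1 / 51), 6), round(pd_to_score(1 / 101), 6))
```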