# Lifetime PD Model POC

## Introduction

## Setup

### Initialize the client library

In [1]:

import validmind as vm

vm.init(
  api_host = "http://localhost:3000/api/v1/tracking",
  api_key = "2494c3838f48efe590d531bfe225d90b",
  api_secret = "4f692f8161f128414fef542cab2a4e74834c75d01b3a8e088a1834f2afcfe838",
  project = "cllaz74gb067dszy6donpqm98"
)

2023-08-15 12:13:32,143 - INFO(validmind.api_client): Connected to ValidMind. Project: [9] Credit Risk Scorecard - Initial Validation (cllaz74gb067dszy6donpqm98)


### Import Libraries

In [2]:
from notebooks.probability_of_default.helpers.Developer import Developer
from notebooks.probability_of_default.helpers.scorecard_tasks import *
from notebooks.probability_of_default.helpers.model_development_tasks import *

### Input Parameters

In [3]:
default_column = 'default'
macro_to_micro_target_column = 'DRSFRMACBS'
macro_to_micro_feature_columns = ['GS3', 'GS5', 'GS10', 'GDPC1', 'UNRATE', 'MORTGAGE30US', 'CPIAUCSL', 'FEDFUNDS', 'CSUSHPISA']

### Load Macroeconomic Data

In [4]:
from validmind.datasets.regression import fred

df_macro_micro_raw = fred.load_all_data()

# Combining the target column with feature columns
selected_columns = [macro_to_micro_target_column] + macro_to_micro_feature_columns

# Filtering the dataframe to only have the desired columns
df_macro_micro_raw = df_macro_micro_raw[selected_columns]

### Load Credit Risk Scorecard

In [5]:
developer = Developer()
scorecard = developer.load_objects_from_pickle("datasets/scorecard_data_and_models.pkl")

df_train_feateng = scorecard["df_train_feateng"]
df_test_feateng = scorecard["df_test_feateng"]

model_fit_final = scorecard["model_fit_final"]

INFO: Loaded 7 objects from datasets/scorecard_data_and_models.pkl


### Create ValidMind Model

In [6]:
from validmind.vm_models.test_context import TestContext

vm_df_train = vm.init_dataset(dataset=df_train_feateng, target_column=default_column)
vm_df_test = vm.init_dataset(dataset=df_test_feateng, target_column=default_column)

vm_model_fit_final = vm.init_model(
    model = model_fit_final, 
    train_ds=vm_df_train, 
    test_ds=vm_df_test)

test_context_models_fit_final = TestContext(models = [vm_model_fit_final])

2023-08-15 12:13:32,758 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...
INFO: Pandas dataset detected. Initializing VM Dataset instance...
2023-08-15 12:13:33,422 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...
INFO: Pandas dataset detected. Initializing VM Dataset instance...


## Data Description

### Macroeconomic Data

**Target Variable**

**DRSFRMACBS (Delinquency Rate on Single-Family Residential Mortgages, Booked in Domestic Offices, All Commercial Banks)**: The percentage of single-family residential mortgage loans that are past due.

Why is the delinquency rate a good target variable for building lifetime PD and ECL models?
  
- Measure of credit risk: Delinquency rate directly captures the proportion of borrowers who are behind on their payments. It's a straightforward and intuitive measure of credit risk.
- Relevance to ECL: ECL requires a forward-looking assessment of credit risk. Delinquencies can provide early warning signals about loans that might eventually result in credit losses, which makes the delinquency rate directly relevant to ECL modeling.

**Features**

- **GDPC1 (Real Gross Domestic Product)**: Economic downturns, indicated by shrinking GDP, can lead to an increase in loan delinquencies as borrowers may face financial difficulties. A growing economy, on the other hand, may correlate with fewer delinquencies.

- **UNRATE (U.S. Unemployment Rate)**: A rise in unemployment rates usually correlates with an increase in delinquencies. When people lose jobs, they may have difficulty meeting financial obligations, including loan payments.

- **MORTGAGE30US (30-year fixed rate mortgage average)**: The interest rate environment can influence the propensity for delinquencies, especially for adjustable-rate loans. Higher interest rates can lead to larger monthly payments, increasing the chances of delinquency for some borrowers.

- **CPIAUCSL (Consumer Price Index for All Urban Consumers)**: Inflation can erode purchasing power, making it more challenging for borrowers to meet their debt obligations.

- **FEDFUNDS (Effective federal funds rate)**: The short-term interest rate can impact borrowing costs. It might indirectly influence delinquency rates, especially if borrowers are sensitive to changes in their loan rates or if they have loans with variable rates.

- **GS3, GS5, GS10 (Treasury constant maturity rates)**: These rates can serve as a proxy for the broader interest rate environment. They can influence both the borrowing cost and the appetite of financial institutions to lend. Fluctuations in these rates can potentially impact delinquency rates.

- **CSUSHPISA (S&P/Case-Shiller U.S. National Home Price Index)**: For mortgage loans, changes in home values can play a significant role. Borrowers are more likely to default on a mortgage if the value of the underlying property falls below the loan amount.

In [7]:
from validmind.tests.data_validation.TimeSeriesOutliers import TimeSeriesOutliers

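# The macroeconomic frame has no 'default' column; its target is the delinquency rate
vm_df = vm.init_dataset(dataset=df_macro_micro_raw, target_column=macro_to_micro_target_column)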
test_context = TestContext(dataset=vm_df)

params = {"zscore_threshold": 3}

metric = TimeSeriesOutliers(test_context, params)
metric.run()
await metric.result.log()
metric.result.show()

2023-08-15 12:13:33,660 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...
INFO: Pandas dataset detected. Initializing VM Dataset instance...
INFO: No artists with labels found to put in legend.  Note that artists whose label start with an underscore are ignored when legend() is called with no argument.

VBox(children=(HTML(value='\n            <h2>Time Series Outliers ❌</h2>\n            <p>Test that find outlie…
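The `zscore_threshold` parameter above suggests that values are flagged when their z-score exceeds 3 in absolute value. The following is a minimal sketch of that idea in plain pandas, not the `TimeSeriesOutliers` implementation itself; the helper name `zscore_outliers` and the toy data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def zscore_outliers(df: pd.DataFrame, threshold: float = 3.0) -> pd.DataFrame:
    """Return a boolean frame marking values whose |z-score| exceeds threshold.

    Hypothetical helper: the z-score is computed per column using the
    population standard deviation (ddof=0).
    """
    z = (df - df.mean()) / df.std(ddof=0)
    return z.abs() > threshold


# Toy monthly series (made-up values): flat at 1.0 with a single spike
idx = pd.date_range("2000-01-01", periods=24, freq="MS")
values = np.ones(24)
values[10] = 50.0
toy = pd.DataFrame({"UNRATE": values}, index=idx)

flags = zscore_outliers(toy, threshold=3)
outlier_dates = flags.index[flags["UNRATE"]].tolist()
print(outlier_dates)
```

With a threshold of 3, only the spike is flagged in this toy series. On short or heavily trending series the mean and standard deviation are themselves distorted by the outliers, so a rolling or robust variant may be preferable.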

In [8]:
from validmind.tests.data_validation.TimeSeriesMissingValues import TimeSeriesMissingValues

params = {"min_threshold": 2}

metric = TimeSeriesMissingValues(test_context, params)
metric.run()
await metric.result.log()
metric.result.show()

VBox(children=(HTML(value='\n            <h2>Time Series Missing Values ❌</h2>\n            <p>Test that the n…

In [9]:
from validmind.tests.data_validation.TimeSeriesFrequency import TimeSeriesFrequency

metric = TimeSeriesFrequency(test_context)
metric.run()
await metric.result.log()
metric.result.show()

VBox(children=(HTML(value='\n            <h2>Time Series Frequency ❌</h2>\n            <p>Test that detects fr…
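A frequency check can fail when series sampled at different frequencies (for example monthly and quarterly) are combined into one frame, or when the index is irregular. As a rough illustration, and not the logic of `TimeSeriesFrequency` itself, `pandas.infer_freq` can be used to inspect a `DatetimeIndex`; the index values below are made up.

```python
import pandas as pd

# Three toy indexes: regular monthly, regular quarterly, and an irregular one
monthly = pd.date_range("2020-01-01", periods=12, freq="MS")
quarterly = pd.date_range("2020-01-01", periods=4, freq="QS")
mixed = monthly.union(pd.DatetimeIndex(["2020-01-15"]))

# infer_freq returns a frequency string, or None when no regular
# frequency fits the index
freqs = {
    "monthly": pd.infer_freq(monthly),
    "quarterly": pd.infer_freq(quarterly),
    "mixed": pd.infer_freq(mixed),
}
print(freqs)
```

An irregular index yields `None`, which is a quick signal that the data needs resampling before modeling.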

### GLM Logistic Regression Model

In [10]:
print(model_fit_final.summary())

                 Generalized Linear Model Regression Results                  
Dep. Variable:                default   No. Observations:               109746
Model:                            GLM   Df Residuals:                   109732
Model Family:                Binomial   Df Model:                           13
Link Function:                  Logit   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:                -47702.
Date:                Tue, 15 Aug 2023   Deviance:                       95403.
Time:                        12:13:50   Pearson chi2:                 1.10e+05
No. Iterations:                     5   Pseudo R-squ. (CS):            0.08209
Covariance Type:            nonrobust                                         
                              coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------------------
const                     
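The summary reports a binomial GLM with a logit link, so the fitted coefficients act on the log-odds of default. As a sketch of how such coefficients translate into a probability of default, the snippet below applies the inverse logit to a linear predictor. The intercept, coefficient values, and feature names are hypothetical and are not taken from the fitted model.

```python
import math


def predict_pd(intercept: float, coefs: dict, features: dict) -> float:
    """Inverse-logit of the linear predictor: PD = 1 / (1 + exp(-(b0 + x.b)))."""
    linear = intercept + sum(coefs[name] * features[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients and one hypothetical borrower
intercept = -2.0
coefs = {"feature_a": 0.8, "feature_b": -0.5}
borrower = {"feature_a": 1.0, "feature_b": 2.0}

pd_estimate = predict_pd(intercept, coefs, borrower)
print(round(pd_estimate, 4))
```

A positive coefficient raises the log-odds, and therefore the PD, as the feature increases; a negative coefficient lowers it.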

In [11]:
from validmind.tests.model_validation.statsmodels.RegressionModelsCoeffs import RegressionModelsCoeffs

metric = RegressionModelsCoeffs(test_context_models_fit_final)
metric.run()
await metric.result.log()
metric.result.show()

VBox(children=(HTML(value='<p>This section shows the coefficients of different regression models that were tra…

In [12]:
from validmind.tests.model_validation.statsmodels.RegressionCoeffsPlot import RegressionCoeffsPlot

metric = RegressionCoeffsPlot(test_context_models_fit_final)
metric.run()
await metric.result.log()
metric.result.show()

VBox(children=(HTML(value="<p>Regression Coefficients with Confidence Intervals Plot</p>\n<p>This class is use…

## Data Preparation

### Macroeconomic Data

In [13]:
# Resample to quarterly frequency (quarter starts anchored on October), keeping the last value in each quarter
resampled_df = df_macro_micro_raw.resample("QS-OCT").last()

# Remove all missing values
nona_df = resampled_df.dropna()

# Take the first difference across all variables
preprocessed_df = nona_df.diff().dropna()
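To make the effect of these three steps concrete, here is the same resample, drop, and difference pipeline applied to a small synthetic frame. The column names and values are made up for illustration; only the pandas calls mirror the cell above.

```python
import numpy as np
import pandas as pd

# One monthly series and one quarterly series that starts a quarter later
monthly_idx = pd.date_range("2020-01-01", periods=12, freq="MS")
quarterly_idx = pd.to_datetime(["2020-04-01", "2020-07-01", "2020-10-01"])

toy = pd.DataFrame(index=monthly_idx)
toy["monthly_series"] = np.arange(1.0, 13.0)
# Assignment aligns on the index, so months without a quarterly value are NaN
toy["quarterly_series"] = pd.Series([12.0, 15.0, 19.0], index=quarterly_idx)

# Step 1: one row per quarter, keeping the last non-missing value per column
resampled = toy.resample("QS-OCT").last()

# Step 2: drop quarters where any series is still missing
nona = resampled.dropna()

# Step 3: first difference; the first remaining row has no predecessor and is dropped
differenced = nona.diff().dropna()
print(differenced)
```

The first quarter is removed because the quarterly series has no value there, and differencing then consumes one more row, so four quarters of input yield two rows of output.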

In [14]:
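# The preprocessed macroeconomic frame has no 'default' column; its target is the delinquency rate
vm_df = vm.init_dataset(dataset=preprocessed_df, target_column=macro_to_micro_target_column)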
test_context = TestContext(dataset=vm_df)

params = {"min_threshold": 2}

metric = TimeSeriesMissingValues(test_context, params)
metric.run()
await metric.result.log()
metric.result.show()

2023-08-15 12:13:52,740 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...
INFO: Pandas dataset detected. Initializing VM Dataset instance...


VBox(children=(HTML(value='\n            <h2>Time Series Missing Values ✅</h2>\n            <p>Test that the n…

In [15]:
metric = TimeSeriesFrequency(test_context)
metric.run()
await metric.result.log()
metric.result.show()

VBox(children=(HTML(value='\n            <h2>Time Series Frequency ✅</h2>\n            <p>Test that detects fr…

In [16]:
from validmind.tests.data_validation.TimeSeriesLinePlot import TimeSeriesLinePlot

metric = TimeSeriesLinePlot(test_context)
metric.run()
await metric.result.log()
metric.result.show()

VBox(children=(HTML(value='<p>Generates a visual analysis of time series data by plotting the raw time series.…