# Lifetime PD Model POC

## Introduction

## Setup

## Initialize the client library

ValidMind generates a unique _code snippet_ for each registered model to connect with your developer environment. You initialize the client library with this code snippet, which ensures that your documentation and tests are uploaded to the correct model when you run the notebook.

Get your code snippet:

1. In a browser, log into the [Platform UI](https://app.prod.validmind.ai).

2. In the left sidebar, navigate to **Model Inventory** and click **+ Register new model**.

3. Enter the model details and click **Continue**. ([Need more help?](https://docs.validmind.ai/guide/register-models-in-model-inventory.html))

4. Go to **Getting Started** and click **Copy snippet to clipboard**.

Next, replace this placeholder with your own code snippet:

In [None]:

import validmind as vm

vm.init(
  api_host = "https://api.prod.validmind.ai/api/v1/tracking",
  api_key = "...",
  api_secret = "...",
  project = "..."
)
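
To keep credentials out of the notebook itself, one option is to read them from environment variables before calling `vm.init`. The sketch below is illustrative only: the variable names (`VM_API_HOST`, `VM_API_KEY`, `VM_API_SECRET`, `VM_PROJECT`) and the helper `load_vm_credentials` are assumptions made for this example, not part of the snippet the platform generates.

```python
import os

# Hypothetical environment variable names mapped to vm.init keyword arguments.
ENV_TO_ARG = {
    "VM_API_HOST": "api_host",
    "VM_API_KEY": "api_key",
    "VM_API_SECRET": "api_secret",
    "VM_PROJECT": "project",
}


def load_vm_credentials(environ=os.environ):
    """Collect vm.init keyword arguments from a mapping of environment variables."""
    missing = [name for name in ENV_TO_ARG if not environ.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {arg: environ[name] for name, arg in ENV_TO_ARG.items()}


# Usage, once the variables are set in your shell:
# vm.init(**load_vm_credentials())
```

Failing early on a missing variable avoids a less obvious authentication error later in the notebook.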

### Import Libraries

In [None]:
from notebooks.probability_of_default.helpers.Developer import Developer
from notebooks.probability_of_default.helpers.scorecard_tasks import *
from notebooks.probability_of_default.helpers.model_development_tasks import *

### Input Parameters

In [None]:
default_column = 'default'

### Load Credit Risk Scorecard

In [None]:
developer = Developer()
scorecard = developer.load_objects_from_pickle("datasets/scorecard_data_and_models.pkl")

df_train_feateng = scorecard["df_train_feateng"]
df_test_feateng = scorecard["df_test_feateng"]

model_fit_final = scorecard["model_fit_final"]

### Create ValidMind Model

In [None]:
from validmind.vm_models.test_context import TestContext

vm_df_train = vm.init_dataset(
    dataset=df_train_feateng,
    target_column=default_column)
vm_df_test = vm.init_dataset(
    dataset=df_test_feateng,
    target_column=default_column)

vm_model_fit_final = vm.init_model(
    model = model_fit_final,
    train_ds=vm_df_train,
    test_ds=vm_df_test)

test_context_models_fit_final = TestContext(models = [vm_model_fit_final])

## Data Description

### Macroeconomic Data

**Target Variable**

**DRSFRMACBS (Delinquency Rate on Single-Family Residential Mortgages, Booked in Domestic Offices, All Commercial Banks)**: The percentage of single-family residential mortgage loans that are past due.

Why is the delinquency rate a good target variable for building lifetime PD and ECL models?
  
- Measure of credit risk: Delinquency rate directly captures the proportion of borrowers who are behind on their payments. It's a straightforward and intuitive measure of credit risk.
- Relevance to ECL: ECL requires a forward-looking assessment of credit risk. Delinquencies can provide early warning signals about loans that might eventually result in credit losses, making it directly relevant to ECL modeling.
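
As a small worked example, a delinquency rate is a ratio expressed as a percentage. The sketch below uses a hypothetical helper, `delinquency_rate`, and made-up figures. Whether the inputs are loan counts or outstanding balances depends on how the series is defined, so treat the inputs here as generic amounts.

```python
def delinquency_rate(past_due, total):
    """Return past-due exposure as a percentage of total exposure."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * past_due / total


# Hypothetical portfolio: 250 past-due units out of 10,000 in total.
rate = delinquency_rate(past_due=250, total=10_000)
print(f"Delinquency rate: {rate:.2f}%")
```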

**Features**

- **GDPC1 (Real Gross Domestic Product)**: Economic downturns, indicated by shrinking GDP, can lead to an increase in loan delinquencies as borrowers may face financial difficulties. A growing economy, on the other hand, may correlate with fewer delinquencies.

- **UNRATE (U.S. Unemployment Rate)**: A rise in unemployment rates usually correlates with an increase in delinquencies. When people lose jobs, they may have difficulty meeting financial obligations, including loan payments.

- **MORTGAGE30US (30-year fixed rate mortgage average)**: The interest rate environment can influence the propensity for delinquencies, especially for adjustable-rate loans. Higher interest rates can lead to larger monthly payments, increasing the chances of delinquency for some borrowers.

- **CPIAUCSL (Consumer Price Index for All Urban Consumers)**: Inflation can erode purchasing power, making it more challenging for borrowers to meet their debt obligations.

- **FEDFUNDS (Effective federal funds rate)**: The short-term interest rate can impact borrowing costs. It might indirectly influence delinquency rates, especially if borrowers are sensitive to changes in their loan rates or if they have loans with variable rates.

- **GS3, GS5, GS10 (Treasury constant maturity rates)**: These rates can serve as a proxy for the broader interest rate environment. They can influence both the borrowing cost and the appetite of financial institutions to lend. Fluctuations in these rates can potentially impact delinquency rates.

- **CSUSHPISA (S&P/Case-Shiller U.S. National Home Price Index)**: For mortgage loans, changes in home values can play a significant role. Borrowers are more likely to default on a mortgage if the value of the underlying property falls below the loan amount.
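
Two of the mechanisms above can be made concrete with simple arithmetic: how the interest rate drives the monthly payment, and when a mortgage exceeds the value of the property. This is a minimal sketch using the standard level-payment formula for a fixed-rate, fully amortizing loan. The helper names and all figures are hypothetical and are not taken from the notebook's data.

```python
def monthly_payment(principal, annual_rate, years=30):
    """Level monthly payment on a fixed-rate, fully amortizing loan."""
    n = years * 12          # number of monthly payments
    r = annual_rate / 12    # monthly interest rate
    if r == 0:
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)


def loan_to_value(loan_balance, property_value):
    """Loan balance as a fraction of the property value."""
    return loan_balance / property_value


# Hypothetical figures, for illustration only.
low = monthly_payment(300_000, 0.03)
high = monthly_payment(300_000, 0.07)
print(f"Payment at 3%: {low:,.2f}; at 7%: {high:,.2f}")

ltv = loan_to_value(loan_balance=300_000, property_value=250_000)
print(f"LTV: {ltv:.2f}, negative equity: {ltv > 1}")
```

An LTV above 1 means the borrower owes more than the property is worth, which is the situation the home price index is meant to capture.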

In [None]:
from validmind.tests.data_validation.TimeSeriesOutliers import TimeSeriesOutliers

vm_df = vm.init_dataset(
    dataset=df_macro_micro_raw,
    target_column=macro_to_micro_target_column)

test_context = TestContext(dataset=vm_df)

params = {"zscore_threshold": 3}

metric = TimeSeriesOutliers(test_context, params)
metric.run()
metric.result.log()
metric.result.show()
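
For intuition about the `zscore_threshold` parameter, the sketch below flags points whose absolute z-score exceeds a threshold. It is a simplified stand-in written with the standard library, not the ValidMind implementation, which may differ in details such as how the standard deviation is estimated. The function name `zscore_outliers` is hypothetical.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return the indices of values whose absolute z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # constant series: no point deviates from the mean
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Illustrative series: 19 stable observations followed by one spike.
series = [1.0] * 19 + [100.0]
print(zscore_outliers(series, threshold=3.0))
```

With this series only the final observation is flagged. In very short series no point can reach a z-score of 3, so the threshold only becomes meaningful with enough observations.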

In [None]:
from validmind.tests.data_validation.TimeSeriesMissingValues import TimeSeriesMissingValues

params = {"min_threshold": 2}

metric = TimeSeriesMissingValues(test_context, params)
metric.run()
metric.result.log()
metric.result.show()
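
The idea behind a missing-values check can be sketched as a per-column count compared against a threshold. The example below assumes that a column passes when its number of missing values is below `min_threshold`; the library's actual rule may differ. The helper names and sample data are hypothetical.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_value_report(columns, min_threshold=2):
    """Count missing values per column and compare against the threshold."""
    report = {}
    for name, values in columns.items():
        n_missing = sum(is_missing(v) for v in values)
        # Assumed pass rule: fewer missing values than the threshold.
        report[name] = {"missing": n_missing, "passed": n_missing < min_threshold}
    return report


# Illustrative data only.
sample = {
    "UNRATE": [3.9, None, 4.1, 4.0],
    "FEDFUNDS": [float("nan"), None, 1.5, None],
}
print(missing_value_report(sample, min_threshold=2))
```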

In [None]:
from validmind.tests.data_validation.TimeSeriesFrequency import TimeSeriesFrequency

metric = TimeSeriesFrequency(test_context)
metric.run()
metric.result.log()
metric.result.show()
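
A frequency check asks whether the observations are evenly spaced. The sketch below is a simplified illustration that looks only at the number of months between consecutive dates. It is not how the `TimeSeriesFrequency` test is implemented, and the function names are hypothetical.

```python
from datetime import date


def month_steps(dates):
    """Number of calendar months between each pair of consecutive dates."""
    return [
        (later.year - earlier.year) * 12 + (later.month - earlier.month)
        for earlier, later in zip(dates, dates[1:])
    ]


def describe_frequency(dates):
    """Label a date sequence by its (uniform) month spacing."""
    steps = set(month_steps(dates))
    labels = {1: "monthly", 3: "quarterly", 12: "annual"}
    if len(steps) == 1:
        return labels.get(steps.pop(), "regular, other spacing")
    return "irregular or mixed"


quarterly_dates = [date(2019, 1, 1), date(2019, 4, 1), date(2019, 7, 1), date(2019, 10, 1)]
mixed_dates = [date(2019, 1, 1), date(2019, 2, 1), date(2019, 5, 1)]
print(describe_frequency(quarterly_dates))
print(describe_frequency(mixed_dates))
```

Mixed spacing is common when monthly and quarterly series are merged into one table, which is one reason to resample to a common frequency before modelling.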

### GLM Logistic Regression Model

In [None]:
print(model_fit_final.summary())

In [None]:
from validmind.tests.model_validation.statsmodels.RegressionModelsCoeffs import RegressionModelsCoeffs

metric = RegressionModelsCoeffs(test_context_models_fit_final)
metric.run()
metric.result.log()
metric.result.show()

In [None]:
from validmind.tests.model_validation.statsmodels.RegressionCoeffsPlot import RegressionCoeffsPlot

metric = RegressionCoeffsPlot(test_context_models_fit_final)
metric.run()
metric.result.log()
metric.result.show()

## Data Preparation

### Macroeconomic Data

In [None]:
# Remove COVID years to avoid outliers
df_macro_micro_filtered = df_macro_micro_raw[df_macro_micro_raw.index <= '2019-12-31']

# Resample to quarterly frequency (quarters starting in Jan/Apr/Jul/Oct), keeping the last observation in each quarter
resampled_df = df_macro_micro_filtered.resample("QS-OCT").last()

# Remove all missing values
nona_df = resampled_df.dropna()

# Take the first difference of all variables
preprocessed_df = nona_df.diff().dropna()
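
To see what each preparation step does to the shape of the data, the sketch below applies the same four operations to a small synthetic monthly dataset. The values and the two-column layout are made up for illustration and do not come from the notebook's dataset.

```python
import pandas as pd

# Synthetic monthly data from January 2018 to June 2020 (illustrative only).
index = pd.date_range("2018-01-01", "2020-06-01", freq="MS")
raw = pd.DataFrame(
    {
        "UNRATE": [4.0 + 0.1 * i for i in range(len(index))],
        "DRSFRMACBS": [3.0 - 0.05 * i for i in range(len(index))],
    },
    index=index,
)

# 1. Keep observations up to the end of 2019.
filtered = raw[raw.index <= "2019-12-31"]

# 2. Resample to quarters starting in Jan/Apr/Jul/Oct, keeping the last value in each.
quarterly = filtered.resample("QS-OCT").last()

# 3. Drop rows with missing values.
clean = quarterly.dropna()

# 4. First difference; the first row has no predecessor and is dropped.
diffed = clean.diff().dropna()

print(len(raw), len(filtered), len(quarterly), len(diffed))
print(diffed.head())
```

Differencing always costs one observation, and each quarterly difference here reflects three months of change in the underlying monthly series.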

In [None]:
vm_df = vm.init_dataset(
    dataset=preprocessed_df,
    target_column=macro_to_micro_target_column)

test_context = TestContext(dataset=vm_df)

params = {"min_threshold": 2}

metric = TimeSeriesMissingValues(test_context, params)
metric.run()
metric.result.log()
metric.result.show()

In [None]:
metric = TimeSeriesFrequency(test_context)
metric.run()
metric.result.log()
metric.result.show()

## Exploratory Data Analysis

In [None]:
from validmind.tests.data_validation.TimeSeriesLinePlot import TimeSeriesLinePlot

metric = TimeSeriesLinePlot(test_context)
metric.run()
metric.result.log()
metric.result.show()

In [None]:
from validmind.tests.data_validation.LaggedCorrelationHeatmap import LaggedCorrelationHeatmap

metric = LaggedCorrelationHeatmap(test_context)
metric.run()
metric.result.log()
metric.result.show()
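
A lagged correlation measures how strongly the target at time t moves with a feature observed some periods earlier. The sketch below computes Pearson correlations for a range of lags on made-up data. It illustrates the concept only, and the heatmap test may compute or arrange its values differently. The function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lagged_correlations(feature, target, max_lag):
    """Correlate target[t] with feature[t - lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        lagged_feature = feature[: len(feature) - lag]
        aligned_target = target[lag:]
        result[lag] = pearson(lagged_feature, aligned_target)
    return result


# Illustrative data: the target repeats the feature two periods later.
feature = [1, 3, 2, 5, 4, 6, 8, 7, 9, 10]
target = [0, 0] + feature[:-2]
correlations = lagged_correlations(feature, target, max_lag=3)
best_lag = max(correlations, key=correlations.get)
print(correlations)
print("Strongest lag:", best_lag)
```

Here the correlation peaks at a lag of two periods, matching how the target was constructed.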

In [None]:
from validmind.tests.data_validation.EngleGrangerCoint import EngleGrangerCoint

metric = EngleGrangerCoint(test_context)
metric.run()
metric.result.log()
metric.result.show()

## Feature Selection

In [None]:
# Assumes the target column is a single name (string), so wrap it in a list before concatenating
feature_selection_df = preprocessed_df[macro_to_micro_preliminary_features + [macro_to_micro_target_column]]

## Model Training 

In [None]:
import statsmodels.api as sm

# Split the data into predictors and target
X = feature_selection_df.drop(columns=macro_to_micro_target_column)
y = feature_selection_df[macro_to_micro_target_column]

# Add a constant to the predictors
X = sm.add_constant(X)

# Fit the OLS model
model = sm.OLS(y, X).fit()

# Print the summary statistics of the regression model
print(model.summary())


In [None]:
final_features = ['UNRATE', 'FEDFUNDS', 'CSUSHPISA']

# Assumes the target column is a single name (string), so wrap it in a list before concatenating
final_features_df = feature_selection_df[final_features + [macro_to_micro_target_column]]

# Split the data into predictors and target
X = final_features_df.drop(columns=macro_to_micro_target_column)
y = final_features_df[macro_to_micro_target_column]

# Add a constant to the predictors
X = sm.add_constant(X)

# Fit the OLS model
model = sm.OLS(y, X).fit()

# Print the summary statistics of the regression model
print(model.summary())