# Expected Loss (EL)

**Expected Loss (EL)** represents the average loss a lender can anticipate from a loan portfolio, taking into account the probability of default, exposure at default, and loss given default.  

Mathematically, it is defined as:

$$
\text{EL} = \text{PD} \times \text{EAD} \times \text{LGD}
$$

where:

- $\text{PD}$ = Probability of Default, the likelihood that a borrower will default
- $\text{EAD}$ = Exposure at Default, the amount of money at risk at the time of default
- $\text{LGD}$ = Loss Given Default, the proportion of the exposure that is lost after recoveries

Expected Loss is a **key risk metric** in credit risk management, used for setting capital requirements, pricing loans, and monitoring portfolios. It provides a forward-looking estimate of losses and forms the foundation for **regulatory capital calculations** under the Basel II framework.
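
As a quick illustration of the formula, here is a minimal, dependency-free sketch for a single loan. The function name `expected_loss` and the loan figures are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the data used below.

```python
def expected_loss(pd_: float, ead: float, lgd: float) -> float:
    """Expected loss for one loan: EL = PD x EAD x LGD."""
    if not 0.0 <= pd_ <= 1.0:
        raise ValueError("PD must be a probability in [0, 1]")
    if not 0.0 <= lgd <= 1.0:
        raise ValueError("LGD must be a fraction in [0, 1]")
    if ead < 0:
        raise ValueError("EAD must be non-negative")
    return pd_ * ead * lgd


# Hypothetical loan: 2% chance of default, $10,000 at risk, 60% lost on default.
el = expected_loss(pd_=0.02, ead=10_000, lgd=0.60)
print(f"EL = ${el:,.2f}")
```

Because EL is a simple product, halving any one of the three components halves the expected loss.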


In [36]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme()

import statsmodels.api as sm

import pickle

import warnings
warnings.filterwarnings("ignore")

# Import Data

In [37]:
df_input_train = pd.read_csv('data/loan_inputs_train.csv')
df_input_test = pd.read_csv('data/loan_inputs_test.csv')

In [38]:
df_input_train.shape

(400000, 325)

In [39]:
df_input_test.shape

(100000, 325)

In [40]:
df_combined = pd.concat((df_input_train, df_input_test), axis=0, ignore_index=True)  # stack rows; rebuild a unique 0..n-1 index

Let's load our fitted models.

In [41]:
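# Use context managers so each file handle is closed after loading
with open('data/pd_model.pkl', 'rb') as f:
    pd_model = pickle.load(f)
with open('data/ead_model.pkl', 'rb') as f:
    ead_model = pickle.load(f)
with open('data/lgd_model.pkl', 'rb') as f:
    lgd_model = pickle.load(f)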

Let's also load the feature names, then drop the reference categories from the data.

In [42]:
with open('dummy_variables_revised.txt') as f:
    all_cols_revised = [line.strip() for line in f] #all columns

with open('ref_variables_revised.txt') as f:
    ref_cols_revised = [line.strip() for line in f] #reference columns (safest baseline variables)

df_all_w_ref_cols = df_combined.loc[:, all_cols_revised]
df_all = df_all_w_ref_cols.drop(ref_cols_revised, axis=1)

# Expected Loss

### PD

In [43]:
df_all_const = sm.add_constant(df_all.astype(float))
y_pred_prob = pd_model.predict(df_all_const)

Note: we originally coded Y=1 as a good borrower, so `y_pred_prob` is the probability of **not** defaulting. PD is therefore `1 - y_pred_prob`.
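
To make the complement explicit, the sketch below assumes a logistic model whose linear predictor `z` scores the "good" class, and checks that `1 - sigmoid(z)` equals `sigmoid(-z)`. The helper names and the scores are hypothetical; this only illustrates the identity and is not the fitted `pd_model`.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def pd_from_good_score(z: float) -> float:
    """PD when z is the linear predictor of a model fitted with Y=1 = good."""
    p_good = sigmoid(z)   # probability of NOT defaulting
    return 1.0 - p_good   # probability of default


for z in (2.0, 4.0, 6.0):  # hypothetical scores
    pd_value = pd_from_good_score(z)
    # Flipping the sign of the score gives the same probability of default.
    assert math.isclose(pd_value, sigmoid(-z), rel_tol=1e-9)
    print(f"z = {z:.1f} -> P(good) = {sigmoid(z):.4f}, PD = {pd_value:.4f}")
```

A higher score for the good class always means a lower PD, which is why the coefficients of such a model have the opposite sign to those of a model fitted directly on the default flag.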

In [44]:
(1-y_pred_prob).describe()

count    500000.000000
mean          0.013104
std           0.010786
min           0.000775
25%           0.004514
50%           0.010511
75%           0.017893
max           0.180404
dtype: float64

In [45]:
df_final = pd.DataFrame(1-y_pred_prob, columns=['PD'])

### EAD = CCF x Funded amount

In [46]:
ccf_pred = ead_model.predict(df_all_const)

In [47]:
df_final['EAD'] = ccf_pred * df_combined['funded_amnt']

### LGD = 1 - Recovery rate

In [48]:
recoveries_pred = lgd_model.predict(df_all_const)

In [49]:
df_final['LGD'] = 1 - recoveries_pred

### EL = PD x EAD x LGD

In [50]:
df_final['EL'] = df_final['PD'] * df_final['EAD'] * df_final['LGD']

In [51]:
df_final.head()

|   | PD | EAD | LGD | EL |
|---|---|---|---|---|
| 0 | 0.005847 | 18546.578649 | 0.980948 | 106.372351 |
| 1 | 0.015987 | 26610.680578 | 0.984688 | 418.904272 |
| 2 | 0.003065 | 15729.986968 | 0.981235 | 47.314991 |
| 3 | 0.003863 | 21811.049065 | 0.982086 | 82.745341 |
| 4 | 0.007478 | 27438.688331 | 0.983234 | 201.740027 |
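
The per-loan steps above (PD from the good-class probability, EAD from CCF, LGD from the recovery rate) can be sketched as one small function. This is a dependency-free illustration, not the notebook's code: `loan_el` and `clip01` are hypothetical helpers, the inputs are made up, and clipping predictions to [0, 1] is an added defensive assumption that the cells above do not apply.

```python
def clip01(x: float) -> float:
    """Clamp a predicted rate to the valid range [0, 1]."""
    return min(1.0, max(0.0, x))


def loan_el(p_good: float, ccf: float, recovery_rate: float, funded_amnt: float) -> dict:
    """Combine model outputs for one loan into PD, EAD, LGD and EL."""
    pd_value = 1.0 - clip01(p_good)      # model predicts P(good), so PD is the complement
    ead = clip01(ccf) * funded_amnt      # EAD = CCF x funded amount
    lgd = 1.0 - clip01(recovery_rate)    # LGD = 1 - recovery rate
    return {"PD": pd_value, "EAD": ead, "LGD": lgd, "EL": pd_value * ead * lgd}


# Hypothetical predictions for a single $20,000 loan.
row = loan_el(p_good=0.99, ccf=0.75, recovery_rate=0.05, funded_amnt=20_000)
print({name: round(value, 4) for name, value in row.items()})
```

Clipping matters mainly when a regression model can produce values slightly outside the unit interval; without it, a negative recovery rate would yield an LGD above 100%.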


Let's look at the total portfolio amount.

In [52]:
df_combined['funded_amnt'].sum() / 1e9  # total funded amount in $ billions (about $8B)

np.float64(8.003335325)

In [53]:
df_final['EL'].sum() / 1e6

np.float64(92.29046545685813)

In [54]:
(df_final['EL'].sum() / df_combined['funded_amnt'].sum() ) *100

np.float64(1.1531500519361049)

## Portfolio Overview

The total portfolio size (total funded amount) is approximately $8B. According to our risk models, the expected loss is about $92M, or roughly 1.15% of the total portfolio.
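
The portfolio-level ratio is just total EL divided by total funded amount. The sketch below shows that aggregation on three made-up loans and then cross-checks the percentage using the two totals reported in the cells above. `portfolio_summary` is a hypothetical helper written for this illustration.

```python
def portfolio_summary(funded_amounts, expected_losses) -> dict:
    """Aggregate per-loan figures into portfolio totals and an EL rate in percent."""
    total_funded = sum(funded_amounts)
    total_el = sum(expected_losses)
    if total_funded <= 0:
        raise ValueError("total funded amount must be positive")
    return {
        "total_funded": total_funded,
        "total_el": total_el,
        "el_rate_pct": 100.0 * total_el / total_funded,
    }


# Three hypothetical loans (not taken from the data).
summary = portfolio_summary([10_000, 20_000, 30_000], [100.0, 250.0, 340.0])
print(summary)

# Cross-check with the totals printed above: $92.29M of EL on $8.003B funded.
reported_rate = 100.0 * 92.29046545685813e6 / 8.003335325e9
print(f"Reported EL rate: {reported_rate:.2f}%")
```

Note that this is a funded-amount-weighted rate, so large loans influence it more than a simple average of per-loan EL ratios would.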


# Credit Risk Modeling Project Summary

This project focuses on analyzing and modeling credit risk for a loan portfolio using key risk metrics: Probability of Default (PD), Exposure at Default (EAD), Loss Given Default (LGD), and Expected Loss (EL).

## Key Steps

1. **Data Exploration and Preprocessing**
   - Performed EDA to understand feature distributions, missing values, and correlations.
   - Identified and handled outliers, missing data, and categorical variables.

2. **Probability of Default (PD)**
   - Target variable: `good_bad` (binary indicator; 1 = good borrower, 0 = default)
   - Features binned using **fine classing** and aggregated into **coarse classes**.
   - Weight of Evidence (WoE) and Information Value (IV) computed for variable selection.
   - Logistic regression applied to model PD.

3. **Loss Given Default (LGD)**
   - Computed as the proportion of the exposure not recovered after default, $\text{LGD} = 1 - \text{Recovery Rate}$, where:  
     $$
     \text{Recovery Rate} = \frac{\text{Recoveries}}{\text{Funded Amount}}
     $$
   - Beta regression used to model continuous recovery rates between 0 and 1.

4. **Exposure at Default (EAD)**
   - Measured using the **Credit Conversion Factor (CCF)**, the share of the funded amount still outstanding at default, so that $\text{EAD} = \text{CCF} \times \text{Funded Amount}$:  
     $$
     \text{CCF} = \frac{\text{funded\_amnt} - \text{total\_prcp\_recieved}}{\text{funded\_amnt}}
     $$
   - Modeled via beta regression due to its continuous range $(0,1)$.

5. **Expected Loss (EL)**
   - Computed as:
     $$
     \text{EL} = \text{PD} \times \text{EAD} \times \text{LGD}
     $$
   - Provides a forward-looking estimate of portfolio loss and informs capital allocation.

6. **Portfolio Insights**
   - Total portfolio (funded amount): ~$8B
   - Estimated expected loss: ~$92M (≈1.15% of portfolio)
   - Highlights areas of potential risk concentration and informs mitigation strategies.
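
The Weight of Evidence and Information Value computation mentioned in step 2 can be sketched as follows. This is a dependency-free illustration under stated assumptions: it uses the common definitions WoE = ln(% of goods / % of bads) per category and IV = Σ (% of goods − % of bads) × WoE, the `good_bad` coding 1 = good, and a made-up two-grade feature. The function name `woe_iv` is hypothetical and is not the project's own implementation.

```python
import math
from collections import defaultdict


def woe_iv(categories, good_bad):
    """Return ({category: WoE}, IV) for one discrete feature.

    good_bad uses 1 = good, 0 = bad. Every category is assumed to contain
    at least one good and one bad loan; otherwise WoE is undefined.
    """
    good = defaultdict(int)
    bad = defaultdict(int)
    for cat, y in zip(categories, good_bad):
        if y == 1:
            good[cat] += 1
        else:
            bad[cat] += 1

    n_good = sum(good.values())
    n_bad = sum(bad.values())

    woe = {}
    iv = 0.0
    for cat in sorted(set(good) | set(bad)):
        pct_good = good[cat] / n_good   # share of all good loans in this category
        pct_bad = bad[cat] / n_bad      # share of all bad loans in this category
        woe[cat] = math.log(pct_good / pct_bad)
        iv += (pct_good - pct_bad) * woe[cat]
    return woe, iv


# Hypothetical feature: grade A has 90 good / 10 bad, grade B has 60 good / 40 bad.
categories = ["A"] * 100 + ["B"] * 100
good_bad = [1] * 90 + [0] * 10 + [1] * 60 + [0] * 40

woe, iv = woe_iv(categories, good_bad)
print({cat: round(value, 4) for cat, value in woe.items()}, round(iv, 4))
```

A positive WoE marks a category with relatively more good borrowers than the portfolio as a whole, and a larger IV indicates a feature that separates good from bad loans more strongly.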

## Conclusion

The project demonstrates a systematic approach to credit risk assessment, combining statistical modeling, WoE/IV analysis, and beta regression for continuous targets, producing interpretable metrics (PD, LGD, EAD, EL) for effective risk management and capital planning.
