# **Loan Default Prediction**

This notebook demonstrates **Exploratory Data Analysis (EDA)** on a loan dataset with a **binary target** (`bad_flag`), followed by training a **PyTorch neural network** to predict whether a loan will default.

## **Notebook Outline**
1. **Imports & Setup**
2. **Data Loading and Initial Inspection**
3. **Data Cleaning & Preprocessing**
4. **Exploratory Data Analysis (EDA)**
5. **Feature Selection & Final Preparation**
6. **Train-Test Split & Imbalance Handling**
7. **PyTorch Model Building & Training**
8. **Evaluation & Threshold Tuning**

Each step pairs a **Markdown explanation** (like this one) with the **Code cell** that implements it.

---

In [1]:
# (Cell 1) 1. Imports & Setup
import pandas as pd
import numpy as np
import re
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (confusion_matrix, precision_score, recall_score, 
                             f1_score, roc_auc_score, classification_report)
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

from imblearn.over_sampling import RandomOverSampler

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

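# Optional aesthetics. The bare 'seaborn' style name is deprecated since
# matplotlib 3.6; 'seaborn-v0_8' is the renamed equivalent.
plt.style.use('seaborn-v0_8')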
print("Setup complete.")

Setup complete.



## **2. Data Loading and Initial Inspection**
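This cell expects a CSV file named `training_loan_data.csv` in the working directory. The call passes `header=1`, so the second line of the file is read as the column names and the line before it is discarded. Adjust `header` or `skiprows` if your copy of the file is laid out differently.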
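To make the effect of `header` concrete, here is a minimal sketch on an in-memory CSV. The banner line and the three-column layout are made up for illustration and are not taken from the real file; only the `pd.read_csv` parameters are the point.

```python
import io

import pandas as pd

# Hypothetical file contents: one banner line, then the real header row.
raw = (
    "Loan export,,\n"
    "id,loan_amnt,bad_flag\n"
    "10000001,7550,0.0\n"
    "10000002,27050,0.0\n"
)

# header=1: use the second line (0-indexed row 1) as the column names;
# anything before it is ignored.
with_header_arg = pd.read_csv(io.StringIO(raw), header=1)

# For this layout the same result comes from skipping the first line
# and keeping the default header=0.
with_skiprows = pd.read_csv(io.StringIO(raw), skiprows=1)

print(list(with_header_arg.columns))
print(with_header_arg.equals(with_skiprows))
```

If the column names come out as `Unnamed: 0`, `Unnamed: 1`, and so on, the header row is probably on a different line than the one requested.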

In [3]:
# (Cell 2) 2. Data Loading and Quick Look
df = pd.read_csv('training_loan_data.csv', header=1)  # Adjust path if necessary
print("Data loaded. Shape:", df.shape)
df.head()

Data loaded. Shape: (199121, 23)


| | id | member_id | loan_amnt | term | int_rate | emp_length | home_ownership | annual_inc | desc | purpose | ... | inq_last_6mths | mths_since_recent_inq | revol_util | total_bc_limit | mths_since_last_major_derog | tot_hi_cred_lim | tot_cur_bal | application_approved_flag | internal_score | bad_flag |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 10000001 | 11983056.0 | 7550 | 36 months | 16.24% | 3 years | RENT | 28000.0 | NaN | debt_consolidation | ... | 0.0 | 17.0 | 72% | 4000.0 | NaN | 3828.953801 | 5759.0 | 1 | 99 | 0.0 |
| 1 | 10000002 | 12002921.0 | 27050 | 36 months | 10.99% | 10+ years | OWN | 55000.0 | Borrower added on 12/31/13 > Combining high ... | debt_consolidation | ... | 0.0 | 8.0 | 61.20% | 35700.0 | NaN | 34359.94073 | 114834.0 | 1 | 353 | 0.0 |
| 2 | 10000003 | 11983096.0 | 12000 | 36 months | 10.99% | 4 years | RENT | 60000.0 | Borrower added on 12/31/13 > I would like to... | debt_consolidation | ... | 1.0 | 3.0 | 24% | 18100.0 | NaN | 16416.61776 | 7137.0 | 1 | 157 | 0.0 |
| 3 | 10000004 | 12003142.0 | 28000 | 36 months | 7.62% | 5 years | MORTGAGE | 325000.0 | NaN | debt_consolidation | ... | 1.0 | 3.0 | 54.60% | 42200.0 | NaN | 38014.14976 | 799592.0 | 1 | 365 | 0.0 |
| 4 | 10000005 | 11993233.0 | 12000 | 36 months | 13.53% | 10+ years | RENT | 40000.0 | NaN | debt_consolidation | ... | 0.0 | 17.0 | 68.80% | 7000.0 | 53.0 | 6471.462236 | 13605.0 | 1 | 157 | 0.0 |
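`df.head()` shows only five rows, so a per-column summary is a useful next step in the initial inspection. The sketch below is one possible way to do it, not part of the original notebook: `summarize_columns` is a hypothetical helper, and `sample` is a tiny stand-in frame built from three columns of the preview above so the block runs without the CSV.

```python
import numpy as np
import pandas as pd


def summarize_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing fraction, distinct count."""
    summary = pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing_frac": frame.isna().mean(),
        "n_unique": frame.nunique(dropna=True),
    })
    # Columns with the most missing values come first.
    return summary.sort_values("missing_frac", ascending=False)


# Stand-in data: values copied from the five preview rows.
sample = pd.DataFrame({
    "int_rate": ["16.24%", "10.99%", "10.99%", "7.62%", "13.53%"],
    "mths_since_last_major_derog": [np.nan, np.nan, np.nan, np.nan, 53.0],
    "bad_flag": [0.0, 0.0, 0.0, 0.0, 0.0],
})

summary = summarize_columns(sample)
print(summary)

# Share of each target label, counting missing labels as their own group.
print(sample["bad_flag"].value_counts(normalize=True, dropna=False))
```

On the real data, calling `summarize_columns(df)` would flag columns such as `int_rate` and `revol_util`, which are stored as percent strings rather than numbers, and sparse columns such as `mths_since_last_major_derog`.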
