
# Silverstone Mortgages – Loan Approval with Random Forest 🌲🏡

This notebook demonstrates **Random Forest classification** through a realistic underwriting story at **Silverstone Mortgages**.

**Objective:**  
Predict whether a mortgage application should be **APPROVED (1)** or **DENIED (0)**, with particular attention to *edge cases* such as Leo, a high-income applicant with a past bankruptcy.

This notebook follows a clear ML workflow:

1. Installation  
2. Imports & Setup  
3. Dataset Preparation  
4. Descriptive Analysis  
5. Train–Test Split  
6. Model Building  
7. Model Evaluation  
8. Edge-case Explanation (Leo’s case)  
9. A Second Edge Case (Decision Tree vs. Random Forest)



## 1. Installation


In [1]:

# Uncomment if running outside Colab
# !pip install numpy pandas matplotlib scikit-learn



## 2. Imports & Setup


In [2]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

np.random.seed(42)
%matplotlib inline



## 3. Dataset Preparation
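Synthetic mortgage data for 30,000 applicants, loosely inspired by underwriting logic. Each applicant gets a linear score (the log-odds of approval) built from income, employment years, DTI, credit score and bankruptcy history. The score is passed through the logistic function, and the `approved` label is drawn as a Bernoulli sample with that probability. `down_payment_pct` and `loan_amount` are generated as features but do not enter the score.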


In [3]:

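n = 30000  # number of synthetic applicants

# Applicant features
income = np.clip(np.random.lognormal(np.log(80000), 0.5, n), 30000, 400000)  # median ~80k, clipped
down_pct = np.random.uniform(5, 40, n)                    # down payment (%)
credit = np.clip(np.random.normal(700, 60, n), 500, 850)  # credit score
dti = np.random.uniform(0.1, 0.6, n)                      # debt-to-income ratio
employment = np.clip(np.random.exponential(5, n), 0, 25)  # years employed

# ~15% of applicants have a past bankruptcy, 0-15 years ago (0 for everyone else)
bankruptcy = (np.random.rand(n) < 0.15).astype(int)
yrs_bankruptcy = np.zeros(n)
yrs_bankruptcy[bankruptcy == 1] = np.random.uniform(0, 15, (bankruptcy == 1).sum())

# Loan amount: 3.5x income, reduced by the down-payment share
loan_amount = 3.5 * income * (1 - down_pct / 100)

# Linear score = log-odds of approval (down_pct and loan_amount are not used)
score = (
    -6
    + 0.00002 * income
    + 0.08 * employment
    - 4 * (dti - 0.3)
    + 0.1 * (credit - 650) / 10
    - 2 * bankruptcy
    + 0.15 * yrs_bankruptcy
)

# Logistic link, then a Bernoulli draw for the label
prob = 1 / (1 + np.exp(-score))
approved = np.random.binomial(1, prob)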

df = pd.DataFrame({
    "income": income,
    "down_payment_pct": down_pct,
    "credit_score": credit,
    "dti": dti,
    "employment_years": employment,
    "bankruptcy_flag": bankruptcy,
    "years_since_bankruptcy": yrs_bankruptcy,
    "loan_amount": loan_amount,
    "approved": approved
})

df.head()


|   | income | down_payment_pct | credit_score | dti | employment_years | bankruptcy_flag | years_since_bankruptcy | loan_amount | approved |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 102553.407451 | 34.115402 | 802.337142 | 0.174907 | 0.045764 | 0 | 0.0 | 236484.152547 | 0 |
| 1 | 74656.267885 | 35.857531 | 562.779709 | 0.275154 | 1.774899 | 1 | 14.986143 | 167602.306069 | 0 |
| 2 | 110594.560233 | 20.163765 | 754.843405 | 0.358741 | 3.119418 | 1 | 2.546343 | 309030.866166 | 0 |
| 3 | 171321.44083 | 18.114081 | 781.850336 | 0.107819 | 1.564187 | 0 | 0.0 | 491008.478582 | 0 |
| 4 | 71161.358067 | 26.444463 | 790.386478 | 0.58041 | 0.490613 | 0 | 0.0 | 183200.916357 | 0 |



## 4. Descriptive Analysis


In [4]:

df.describe()


|   | income | down_payment_pct | credit_score | dti | employment_years | bankruptcy_flag | years_since_bankruptcy | loan_amount | approved |
|---|---|---|---|---|---|---|---|---|---|
| count | 30000.0 | 30000.0 | 30000.0 | 30000.0 | 30000.0 | 30000.0 | 30000.0 | 30000.0 | 30000.0 |
| mean | 90682.916319 | 22.489324 | 700.165216 | 0.349703 | 5.027159 | 0.146933 | 1.093295 | 246040.9 | 0.059867 |
| std | 47716.313513 | 10.098168 | 59.894934 | 0.144683 | 4.876743 | 0.354045 | 3.116717 | 134371.1 | 0.237244 |
| min | 30000.0 | 5.000295 | 500.0 | 0.100014 | 3.3e-05 | 0.0 | 0.0 | 63142.26 | 0.0 |
| 25% | 57021.416615 | 13.796068 | 659.750316 | 0.223904 | 1.463723 | 0.0 | 0.0 | 151464.9 | 0.0 |
| 50% | 80074.953839 | 22.492889 | 700.476827 | 0.34929 | 3.506032 | 0.0 | 0.0 | 215321.1 | 0.0 |
| 75% | 112109.500718 | 31.301318 | 740.946367 | 0.475256 | 6.990048 | 0.0 | 0.0 | 304848.8 | 0.0 |
| max | 400000.0 | 39.999025 | 850.0 | 0.599961 | 25.0 | 1.0 | 14.999667 | 1314557.0 | 1.0 |
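Why are only about 6% of applications approved (the `approved` mean in the table)? One way to build intuition is to plug the median row of the table into the score from section 3. The sketch below re-implements that score as a standalone function; the name `approval_probability` is ours, not part of the notebook. Evaluating the score at the medians is only a rough guide, because the logistic function is nonlinear.

```python
import math


def approval_probability(income, employment_years, dti, credit_score,
                         bankruptcy_flag=0, years_since_bankruptcy=0.0):
    """Hypothetical helper: re-implements the section-3 score and returns P(approve)."""
    score = (
        -6
        + 0.00002 * income
        + 0.08 * employment_years
        - 4 * (dti - 0.3)
        + 0.1 * (credit_score - 650) / 10
        - 2 * bankruptcy_flag
        + 0.15 * years_since_bankruptcy
    )
    return 1 / (1 + math.exp(-score))  # logistic (sigmoid) link


# Median values from the df.describe() table above (no bankruptcy)
p_median = approval_probability(
    income=80074.953839,
    employment_years=3.506032,
    dti=0.34929,
    credit_score=700.476827,
)
print(f"P(approve) for the 'median' applicant: {p_median:.3f}")
```

The median applicant comes out at roughly a 2% approval probability. The overall rate is higher because, in this low-probability region, the logistic curve rises steeply for applicants with higher income, longer employment or better credit, so the average of the probabilities exceeds the probability at the medians.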



## 5. Train–Test Split


In [5]:

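X = df.drop("approved", axis=1)  # features
y = df["approved"]               # target: 1 = APPROVED, 0 = DENIED

# Hold out 25% for testing; stratify=y keeps the ~6% approval rate equal in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)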
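With approvals this rare, a plain random split can leave the train and test sets with noticeably different approval rates. The sketch below illustrates what `stratify` does, using hypothetical toy labels with the same ~6% positive rate rather than the mortgage data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy labels with the same ~6% approval rate as `approved`
y_toy = np.array([1] * 60 + [0] * 940)
X_toy = np.arange(len(y_toy)).reshape(-1, 1)  # placeholder feature column

rates = {}
for label, strat in [("plain", None), ("stratified", y_toy)]:
    _, _, y_tr, y_te = train_test_split(
        X_toy, y_toy, test_size=0.25, random_state=0, stratify=strat
    )
    rates[label] = (y_tr.mean(), y_te.mean())
    print(f"{label:>10}: train rate {y_tr.mean():.3f}, test rate {y_te.mean():.3f}")
```

With `stratify`, both splits keep the 6% rate, so the 449 approvals in the test set of section 7 are a faithful sample of the minority class.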



## 6. Model Building
### 6.1 Single Decision Tree


In [12]:
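# A single shallow tree (depth 4) as the baseline model.
# It is fit on the NumPy array (.values), so later predict() calls also pass arrays.
dt = DecisionTreeClassifier(max_depth=4, random_state=42)
dt.fit(X_train.values, y_train)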
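A depth-limited decision tree is just a small set of threshold rules, and `sklearn.tree.export_text` prints them. The sketch below shows this on a hypothetical six-applicant toy dataset (credit score and DTI only), not on the notebook's model.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy applicants: [credit_score, dti]
X_toy = np.array([[600, 0.5], [620, 0.4], [640, 0.1],
                  [700, 0.2], [760, 0.3], [780, 0.5]])
y_toy = np.array([0, 0, 0, 1, 1, 1])  # 1 = APPROVED

tree = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X_toy, y_toy)
rules = export_text(tree, feature_names=["credit_score", "dti"])
print(rules)
```

For the notebook's tree, the equivalent call would be `export_text(dt, feature_names=list(X.columns))`; since `dt` was fit on `.values`, the feature names have to be passed explicitly.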


### 6.2 Random Forest (Council of Trees)


In [13]:
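# 100 trees, each trained on a bootstrap sample of the rows and considering a
# random subset of features at each split; n_jobs=-1 uses all CPU cores.
rf = RandomForestClassifier(n_estimators=100, max_depth=8, random_state=42, n_jobs=-1)
rf.fit(X_train.values, y_train)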
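The "council of trees" idea can be sketched by hand: draw bootstrap samples of the rows, grow a tree on each with a random feature subset per split, and average the trees' probabilities. This is a simplified illustration of the technique on hypothetical `make_classification` data, not scikit-learn's actual `RandomForestClassifier` implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data standing in for the mortgage features
X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

rng = np.random.default_rng(42)
council = []
for _ in range(50):
    # Bootstrap: sample rows with replacement
    rows = rng.integers(0, len(X_tr), size=len(X_tr))
    # max_features="sqrt": each split only considers a random subset of features
    tree = DecisionTreeClassifier(max_depth=8, max_features="sqrt",
                                  random_state=int(rng.integers(1_000_000)))
    council.append(tree.fit(X_tr[rows], y_tr[rows]))

# Soft vote: average P(class 1) across trees, then threshold at 0.5
p_approve = np.mean([t.predict_proba(X_te)[:, 1] for t in council], axis=0)
y_pred = (p_approve > 0.5).astype(int)
print(f"Hand-rolled forest accuracy: {(y_pred == y_te).mean():.3f}")
```

The bootstrap and the per-split feature sampling make the trees disagree with each other; averaging many such partly independent trees is what reduces variance compared with a single tree.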


## 7. Model Evaluation


In [8]:

for name, model in [("Decision Tree", dt), ("Random Forest", rf)]:
    pred = model.predict(X_test.values)  # .values to match how the models were fit
    print(f"{name} Accuracy:", accuracy_score(y_test, pred))
    print(classification_report(y_test, pred))


Decision Tree Accuracy: 0.9422666666666667
              precision    recall  f1-score   support

           0       0.95      0.99      0.97      7051
           1       0.58      0.12      0.21       449

    accuracy                           0.94      7500
   macro avg       0.77      0.56      0.59      7500
weighted avg       0.93      0.94      0.92      7500

Random Forest Accuracy: 0.9434666666666667
              precision    recall  f1-score   support

           0       0.95      1.00      0.97      7051
           1       0.66      0.11      0.19       449

    accuracy                           0.94      7500
   macro avg       0.80      0.55      0.58      7500
weighted avg       0.93      0.94      0.92      7500
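With only about 6% of applications approved, accuracy alone is misleading: a rule that denies everyone is already right on every denied application. The arithmetic below uses only the support counts and accuracies printed above. Both models beat that trivial baseline by well under one percentage point, and the class-1 recall of 0.12 (tree) and 0.11 (forest) means most true approvals are still predicted as DENY.

```python
# Support counts copied from the classification reports above
n_denied, n_approved = 7051, 449
n_test = n_denied + n_approved

# Always predicting DENY gets every denied application right and every approval wrong
baseline_accuracy = n_denied / n_test

reported = {"Decision Tree": 0.9422666666666667, "Random Forest": 0.9434666666666667}
print(f"Always-DENY baseline accuracy: {baseline_accuracy:.4f}")
for name, acc in reported.items():
    print(f"{name}: {acc:.4f} ({acc - baseline_accuracy:+.4f} vs. baseline)")
```

`confusion_matrix(y_test, pred)`, already imported in section 2, would show the same picture as raw counts of missed approvals.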




## 8. Leo’s Edge Case


In [14]:
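leo = pd.DataFrame({
    "income": [200000],
    "down_payment_pct": [30],
    "credit_score": [650],
    "dti": [0.2],
    "employment_years": [7],
    "bankruptcy_flag": [1],          # past bankruptcy...
    "years_since_bankruptcy": [4],   # ...4 years ago
    "loan_amount": [600000]          # above the section-3 rule 3.5 x income x (1 - down%), i.e. 490,000
})

print("Decision Tree:", "APPROVE" if dt.predict(leo.values)[0] else "DENY")
print("Random Forest:", "APPROVE" if rf.predict(leo.values)[0] else "DENY")

# Hard vote: each tree casts a 0/1 ballot. The forest's trees return class
# indices as floats (0.0 / 1.0), which is why the total prints as 14.0.
votes = sum(tree.predict(leo.values)[0] for tree in rf.estimators_)
print(f"Votes to APPROVE: {votes} / {len(rf.estimators_)}")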

Decision Tree: DENY
Random Forest: DENY
Votes to APPROVE: 14.0 / 100
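The vote count above is a *hard* vote: each tree casts one 0/1 ballot. `RandomForestClassifier.predict` instead averages the trees' `predict_proba` outputs (soft voting) and picks the most probable class, so close to 50% the two rules can disagree. The sketch below checks both facts on hypothetical `make_classification` data, not on the mortgage data; the forest settings mirror section 6.2.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data; forest settings mirror section 6.2
X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                           random_state=1)
rf_toy = RandomForestClassifier(n_estimators=100, max_depth=8,
                                random_state=42).fit(X, y)

# Hard vote: fraction of trees predicting class index 1
hard = np.mean([tree.predict(X) for tree in rf_toy.estimators_], axis=0)
# Soft vote: mean of the trees' class-probability estimates
soft = np.mean([tree.predict_proba(X) for tree in rf_toy.estimators_], axis=0)

# The forest's own probabilities and predictions follow the soft vote
assert np.allclose(soft, rf_toy.predict_proba(X))
assert np.array_equal(rf_toy.predict(X), rf_toy.classes_[soft.argmax(axis=1)])

disagree = int(((hard > 0.5).astype(int) != soft.argmax(axis=1)).sum())
print(f"Rows where hard and soft voting disagree: {disagree} of {len(X)}")
```

For Leo, 14 of 100 hard votes is far from the 50% line, so both rules give DENY; the distinction matters for borderline applicants.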


## 9. New Edge Case: DT Denies, RF Approves?

In [None]:
leo_edge_case = pd.DataFrame({
    "income": [200000],
    "down_payment_pct": [20],
    "credit_score": [785],
    "dti": [0.2],
    "employment_years": [15],
    "bankruptcy_flag": [0],
    "years_since_bankruptcy": [0],
    "loan_amount": [200000 * 3.5 * (1 - 20/100)] # Consistent with data generation
})

print("--- Leo's Edge Case (new) ---")
print("Decision Tree:", "APPROVE" if dt.predict(leo_edge_case.values)[0] else "DENY")

votes_edge = sum(tree.predict(leo_edge_case.values)[0] for tree in rf.estimators_)
print(f"Random Forest (Votes to APPROVE): {votes_edge} / {len(rf.estimators_)}")

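# Hard majority vote: APPROVE if more than 50% of trees vote 'approve'.
# Note: rf.predict() uses soft voting instead (it averages the trees'
# predicted probabilities), so at 51/100 votes it may disagree with this rule;
# rf.predict_proba(leo_edge_case.values) shows the averaged probability.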
rf_decision = "APPROVE" if votes_edge / len(rf.estimators_) > 0.5 else "DENY"
print(f"Random Forest (Overall Decision): {rf_decision}")

--- Leo's Edge Case (new) ---
Decision Tree: DENY
Random Forest (Votes to APPROVE): 51.0 / 100
Random Forest (Overall Decision): APPROVE
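Because the data are synthetic, we can also ask what the *true* approval probability is for each of Leo's profiles under the section-3 formula, and which terms drive the difference. The sketch below re-derives the score term by term; the helper name `score_terms` is hypothetical, and `loan_amount` and `down_payment_pct` are omitted because they do not enter the score.

```python
import math


def score_terms(a):
    """Hypothetical helper: per-feature contributions to the section-3 score."""
    return {
        "intercept": -6.0,
        "income": 0.00002 * a["income"],
        "employment_years": 0.08 * a["employment_years"],
        "dti": -4 * (a["dti"] - 0.3),
        "credit_score": 0.1 * (a["credit_score"] - 650) / 10,
        "bankruptcy_flag": -2 * a["bankruptcy_flag"],
        "years_since_bankruptcy": 0.15 * a["years_since_bankruptcy"],
    }


profiles = {
    "Leo (section 8)": {"income": 200000, "employment_years": 7, "dti": 0.2,
                        "credit_score": 650, "bankruptcy_flag": 1,
                        "years_since_bankruptcy": 4},
    "Leo (section 9)": {"income": 200000, "employment_years": 15, "dti": 0.2,
                        "credit_score": 785, "bankruptcy_flag": 0,
                        "years_since_bankruptcy": 0},
}

true_prob = {}
for name, applicant in profiles.items():
    terms = score_terms(applicant)
    score = sum(terms.values())
    true_prob[name] = 1 / (1 + math.exp(-score))  # logistic link
    print(f"{name}: score = {score:+.2f}, true P(approve) = {true_prob[name]:.2f}")
    for feature, value in terms.items():
        print(f"    {feature:<24}{value:+.2f}")
```

Under the generating process, the section-9 profile (score +0.95) is approved about 72% of the time, so the forest's narrow 51/100 vote is closer to the ground truth than the single tree's DENY. Leo's original profile (score -2.44, about 8%) is more likely to be denied, which matches both models.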
