# Intelligent Loan Approval Prediction System: ML Models
The system is designed to predict the likelihood of loan approval by analyzing historical data containing applicants’ financial and personal attributes. By employing advanced ML algorithms, the model can capture complex patterns and relationships between features, enabling more informed and objective predictions.

In this project, multiple machine learning models are explored and compared to identify the most effective approach for predicting loan approvals. The selected models include:

- Logistic Regression
- Random Forest
- XGBoost
- LightGBM
- Support Vector Machine (SVM)
- Neural Network

By training and evaluating these models on historical loan datasets, the system aims to provide a reliable, data-driven mechanism for loan approval prediction. This approach can help financial institutions reduce default risk and supports a more consistent evaluation process for applicants.

# Import Libraries

In [1]:
# data handling and numerical computation
import pandas as pd
import numpy as np

# visualizations
import matplotlib.pyplot as plt
import seaborn as sns

# core ML
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score, average_precision_score, f1_score, precision_recall_curve, confusion_matrix
from sklearn.metrics import classification_report, roc_curve, auc, accuracy_score
from sklearn.preprocessing import StandardScaler
from sklearn.calibration import calibration_curve

# gradient boosting model
import lightgbm as lgb
import xgboost as xgb

# neural network
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam
from scikeras.wrappers import KerasClassifier

# optimization
import optuna
from skopt import BayesSearchCV
from skopt.space import Real, Integer, Categorical

# Interpretability
import shap

# Fairness (optional)
from fairlearn.metrics import demographic_parity_difference, equalized_odds_difference

# Persistence
import joblib
import pickle

# set seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# suppress all warnings to keep the notebook output clean
import warnings
warnings.filterwarnings('ignore')

# file path handling
from pathlib import Path



# Load and Prepare Dataset for Modeling

## Load Dataset

In [2]:
# path to the cleaned dataset
file = Path(r"../data/CleanedLoanData.csv")

# load the CSV file into a DataFrame
df = pd.read_csv(file)

In [3]:
df.head()

| Unnamed: 0 | Dependents | LoanAmount | CreditHistory | LoanStatus | TotalIncome | IncomeLoanRatio | LoanTermYears | Has_CoApplicantIncome | Gender_Male | Married_Yes | Education_Not Graduate | PropertyArea_Semiurban | PropertyArea_Urban | SelfEmployed_Yes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 146.369492 | 1 | 1 | 5849.0 | 25.024704 | 30.0 | 0 | True | False | False | False | True | False |
| 1 | 1 | 128.0 | 1 | 0 | 6091.0 | 21.014612 | 30.0 | 1 | True | True | False | False | False | False |
| 2 | 0 | 66.0 | 1 | 1 | 3000.0 | 22.0 | 30.0 | 0 | True | True | False | False | True | True |
| 3 | 0 | 120.0 | 1 | 1 | 4941.0 | 24.286582 | 30.0 | 1 | True | True | True | False | True | False |
| 4 | 0 | 141.0 | 1 | 1 | 6000.0 | 23.5 | 30.0 | 0 | True | False | False | False | True | False |


## Split the dataset into input features (x) and target variable (y)

In [4]:
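# input features: drop the target and the leftover index column 'Unnamed: 0',
# which is a row counter written by an earlier to_csv call, not an applicant attribute
x = df.drop(columns=['LoanStatus', 'Unnamed: 0'])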

# target variable
y = df['LoanStatus']

## Split the dataset into training and test sets and set up cross-validation

In [5]:
x_train_full, x_test, y_train_full, y_test = train_test_split(x, y, test_size=0.2, stratify=y, random_state=42)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scoring_metric = 'roc_auc'
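
The effect of `stratify=y` and `StratifiedKFold` can be checked on a small example. The sketch below uses a synthetic stand-in for the loan data (the arrays `X_demo` and `y_demo` and the 70/30 class balance are assumptions for illustration, not values from the real dataset) and confirms that the share of approved loans stays roughly constant across the train set, the test set, and each validation fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

# Synthetic stand-in for the loan data (assumption): 100 rows, 70% approved.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(100, 3))
y_demo = np.array([1] * 70 + [0] * 30)

# Same split settings as the notebook cell above.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

# Stratification keeps the positive-class share the same in both parts.
train_rate = y_tr.mean()
test_rate = y_te.mean()

# Each of the 5 validation folds also mirrors the overall class balance.
cv_demo = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_rates = [y_tr[val_idx].mean() for _, val_idx in cv_demo.split(X_tr, y_tr)]

print(f"train: {train_rate:.2f}, test: {test_rate:.2f}")
print("fold approval rates:", np.round(fold_rates, 2))
```

Without stratification, a small test set could end up with noticeably more or fewer approved loans than the training set, which would make ROC AUC comparisons between models less stable.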

In [6]:
# containers shared by all models: metrics, fitted estimators, and test-set predictions
all_results = {}
model_objects = {}
probability_predictions = {}
test_predictions = {}

# Advanced Modeling

# Logistic Regression

## Hyperparameter Tuning

In [7]:
# hyperparameter search grid for logistic regression
# (liblinear supports both the l1 and l2 penalties)
lr_param_grid = {
    'C': [0.1, 1, 10, 100],
    'penalty': ['l2', 'l1'],
    'solver': ['liblinear'],
    'class_weight': ['balanced']
}
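
To get a feel for the cost of this search, the grid can be expanded with scikit-learn's `ParameterGrid`. The sketch below copies the grid into a separate variable (`grid_demo`, a name introduced here for illustration) and counts the candidate settings and the resulting cross-validation fits, assuming the 5-fold setup defined earlier.

```python
from sklearn.model_selection import ParameterGrid

# Copy of the logistic regression grid (named grid_demo for this sketch only).
grid_demo = {
    'C': [0.1, 1, 10, 100],
    'penalty': ['l2', 'l1'],
    'solver': ['liblinear'],
    'class_weight': ['balanced'],
}

# Expand the grid into the explicit list of settings a grid search would try.
candidates = list(ParameterGrid(grid_demo))
n_candidates = len(candidates)   # 4 values of C * 2 penalties * 1 * 1
n_fits = n_candidates * 5        # one fit per candidate per CV fold

print(n_candidates, "candidates,", n_fits, "cross-validation fits")
for params in candidates[:2]:
    print(params)
```

The count excludes the final refit of the best candidate on the full training set, which `GridSearchCV` performs by default (`refit=True`).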

## GridSearch Optimization

In [8]:
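# base estimator; C, penalty, solver, and class_weight are set by the grid search
lr = LogisticRegression(random_state=42, max_iter=1000)

# exhaustive grid search with stratified 5-fold CV, scored by ROC AUC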
lr_grid = GridSearchCV(
    lr,
    lr_param_grid,
    cv=cv,
    scoring=scoring_metric,
    n_jobs=-1
)

## Train model with Hyperparameter Optimization

In [9]:
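# evaluate every grid candidate with cross-validation, then refit the best one on the full training set
lr_grid.fit(x_train_full, y_train_full)

# keep the refit best estimator for later evaluation and comparison
model_objects['Logistic Regression'] = lr_grid.best_estimator_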
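
Once a grid search has been fitted, its chosen settings and scores are available as attributes. The sketch below is a minimal, self-contained illustration on synthetic data from `make_classification` (an assumption; it does not reproduce the loan dataset or its results) showing how `best_params_`, `best_score_`, and a held-out ROC AUC can be read from a search configured like the one above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Synthetic binary classification data (assumption, stands in for the loan features).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=4,
    n_clusters_per_class=1, class_sep=1.5, random_state=42,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

# Smaller grid than the notebook's, to keep the sketch fast.
search = GridSearchCV(
    LogisticRegression(solver='liblinear', max_iter=1000, random_state=42),
    {'C': [0.1, 1, 10], 'penalty': ['l1', 'l2']},
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
    scoring='roc_auc',
)
search.fit(X_tr, y_tr)

best_params = search.best_params_   # winning hyperparameter combination
cv_auc = search.best_score_         # mean cross-validated ROC AUC of the winner

# Score the refit best estimator on data the search never saw.
test_proba = search.best_estimator_.predict_proba(X_te)[:, 1]
test_auc = roc_auc_score(y_te, test_proba)

print("best params:", best_params)
print(f"CV ROC AUC: {cv_auc:.3f}, test ROC AUC: {test_auc:.3f}")
```

Comparing the cross-validated score with the held-out score is a quick check for overfitting to the validation folds: a test ROC AUC far below the CV value suggests the selected settings do not generalize as well as the search implied.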