# Loan Default Prediction using Deep Learning
In this notebook, we will:
1. Perform data preprocessing and feature transformation.
2. Conduct exploratory data analysis (EDA).
3. Perform additional feature engineering.
4. Build a deep learning model using Keras.
5. Evaluate the model.
6. Use hyperparameter tuning and cross-validation to optimize the model.

### Step 1: Importing Libraries and Loading Data


In [4]:
!pip install pandas
!pip install numpy
!pip install scikit-learn

Defaulting to user installation because normal site-packages is not writeable


In [2]:
# Step 1: Importing Libraries and Loading Data
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, RandomizedSearchCV, StratifiedKFold
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.utils.class_weight import compute_class_weight
from scikeras.wrappers import KerasClassifier
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import matplotlib.pyplot as plt
import seaborn as sns
import warnings # - v1.1
warnings.filterwarnings('ignore') # - v1.1

# Load dataset
df = pd.read_csv('loan_data.csv')
df.head()


### Step 2: EDA - Data Preprocessing and Feature Transformation

In [None]:
# Step 2.0: Data Preprocessing and Feature Transformation
# Transform categorical values into numerical values (discrete)
label_encoder = LabelEncoder()
df['purpose'] = label_encoder.fit_transform(df['purpose'])

# Check for missing values
df.isnull().sum()
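
`LabelEncoder` assigns integer codes in sorted order, which gives a nominal column such as `purpose` an artificial ordering that a neural network may treat as magnitude. One-hot encoding is a common alternative. The sketch below uses only the standard library and made-up category names (not the dataset's actual values) to contrast the two encodings.

```python
# Hypothetical category values, for illustration only.
purposes = ["car", "education", "car", "home", "education"]

# Label encoding: each category gets an integer code based on sorted order,
# which is how scikit-learn's LabelEncoder assigns codes.
categories = sorted(set(purposes))
label_codes = {category: code for code, category in enumerate(categories)}
label_encoded = [label_codes[p] for p in purposes]

# One-hot encoding: one binary column per category, with no implied ordering.
one_hot_encoded = [
    [1 if p == category else 0 for category in categories]
    for p in purposes
]

print(label_encoded)
print(one_hot_encoded)
```

In pandas, `pd.get_dummies(df, columns=['purpose'])` produces the one-hot columns directly.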

In [None]:
# Step 2.1: Exploratory Data Analysis (EDA)
# Visualize class imbalance
sns.countplot(x='credit.policy', data=df)
plt.show()

# Correlation heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(df.corr(), annot=True, cmap='coolwarm')
plt.show()
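
The notebook imports `compute_class_weight` but never uses it. If the count plot shows a strong imbalance, class weights are one way to compensate. The sketch below reproduces the "balanced" heuristic, `n_samples / (n_classes * class_count)`, in plain Python on made-up labels; the function name `balanced_class_weights` is hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Return {class: weight} using n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Made-up labels: 8 samples of class 1 and 2 samples of class 0.
toy_labels = [1] * 8 + [0] * 2
weights = balanced_class_weights(toy_labels)
print(weights)  # the rarer class receives the larger weight
```

Keras's `Model.fit` accepts such a dictionary through its `class_weight` argument.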

### Step 3: Additional Feature Engineering

In [None]:
# Step 3: Additional Feature Engineering
numerical_features = ['installment', 'log.annual.inc', 'dti', 'fico', 'days.with.cr.line', 'revol.bal', 'revol.util', 'inq.last.6mths', 'delinq.2yrs', 'pub.rec']

# Split the data into features and target
X = df.drop('credit.policy', axis=1)
y = df['credit.policy']

# Split into training and testing data (stratified to preserve the class ratio)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
X_train, X_test = X_train.copy(), X_test.copy()

# Scale numerical features: fit on the training set only so test statistics do not leak
scaler = StandardScaler()
X_train[numerical_features] = scaler.fit_transform(X_train[numerical_features])
X_test[numerical_features] = scaler.transform(X_test[numerical_features])
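
Standardization parameters are usually learned from the training rows only and then reused for the test rows, so that the held-out data does not influence preprocessing. The sketch below shows that idea with the standard library and made-up values; the helper names are hypothetical, and the population standard deviation is used because that is what `StandardScaler` computes.

```python
import statistics

def fit_standardizer(values):
    """Learn the mean and population standard deviation from training values."""
    return statistics.fmean(values), statistics.pstdev(values)

def standardize(values, mean, std):
    """Apply previously learned statistics to any set of values."""
    return [(v - mean) / std for v in values]

# Made-up feature values, for illustration only.
train_values = [600.0, 700.0, 800.0]
test_values = [900.0]

mean, std = fit_standardizer(train_values)            # statistics come from train only
train_scaled = standardize(train_values, mean, std)
test_scaled = standardize(test_values, mean, std)     # test reuses the train statistics

print(train_scaled, test_scaled)
```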

### Step 4: Build the Deep Learning Model

In [None]:
# Step 4: Build the Deep Learning Model
def create_model(optimizer='adam', activation='relu'):
    model = Sequential()
    model.add(Dense(128, input_shape=(X_train.shape[1],), activation=activation))
    model.add(Dense(64, activation=activation))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    return model

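# Initialize the KerasClassifier; epochs and batch_size are set on the wrapper
model = KerasClassifier(model=create_model, epochs=10, batch_size=32, verbose=1)

# Train the model (scikeras returns the fitted estimator, not a Keras History object)
model.fit(X_train, y_train, validation_data=(X_test, y_test))
history = model.history_  # per-epoch training and validation metrics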
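
To get a feel for the size of this network, the number of trainable parameters can be worked out by hand: a `Dense` layer with bias has `inputs * units + units` parameters. The sketch below assumes 13 input features, which is a hypothetical value; the real number is `X_train.shape[1]`.

```python
def dense_param_count(n_inputs, n_units):
    """Weights plus biases for a fully connected layer."""
    return n_inputs * n_units + n_units

n_features = 13               # hypothetical; use X_train.shape[1] in the notebook
layer_units = [128, 64, 1]    # the three Dense layers defined in create_model

total_params = 0
previous_width = n_features
for units in layer_units:
    total_params += dense_param_count(previous_width, units)
    previous_width = units

print(total_params)
```

Calling `summary()` on the compiled Keras model reports the same per-layer counts.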

### Step 5: Model Evaluation 

In [None]:
# Evaluate the model
accuracy = model.score(X_test, y_test)
print(f'Test Accuracy: {accuracy}')
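
Accuracy alone can be misleading when the classes are imbalanced. The sketch below, on made-up labels, shows a classifier that always predicts the majority class: it reaches 80% accuracy while never identifying the minority class. The function `binary_metrics` is a hypothetical helper written for this illustration.

```python
def binary_metrics(y_true, y_pred, positive):
    """Compute accuracy, precision, recall and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Made-up data: 8 samples of class 1, 2 of class 0, and a model that always predicts 1.
y_true = [1] * 8 + [0] * 2
y_pred = [1] * 10

metrics = binary_metrics(y_true, y_pred, positive=0)  # treat the minority class as positive
print(metrics)
```

In scikit-learn, `classification_report(y_test, model.predict(X_test))` from `sklearn.metrics` gives the same per-class breakdown.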

In [None]:
# !pip install keras-tuner
# !pip install scikeras

### Step 6: Hyperparameter Tuning and Cross-Validation

In [None]:
# Step 6.1: Hyperparameter Tuning
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import RandomizedSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

# Function to create model, with hyperparameters as arguments
def create_model(optimizer='adam', activation='relu'):
    model = Sequential()
    # Use an Input layer
    model.add(Input(shape=(X_train.shape[1],)))
    model.add(Dense(128, activation=activation))
    model.add(Dense(64, activation=activation))
    model.add(Dense(1, activation='sigmoid'))
    
    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    return model

# Wrap Keras model with KerasClassifier using scikeras
model = KerasClassifier(model=create_model, verbose=0)

# Define the grid of hyperparameters, using 'model__' prefix for model-specific parameters
param_grid = {
    'batch_size': [16, 32, 64],
    'epochs': [10, 20],
    'model__optimizer': ['adam', 'rmsprop'],
    'model__activation': ['relu', 'tanh']
}

# Perform Randomized Search with Cross-Validation
print("Starting Randomized Search for Hyperparameter Tuning...")
random_search = RandomizedSearchCV(estimator=model, param_distributions=param_grid, n_iter=10, cv=3, verbose=2)
random_search_result = random_search.fit(X_train, y_train)

# Best parameters and model
best_params = random_search_result.best_params_
best_model = random_search_result.best_estimator_

print("Hyperparameter Tuning Complete!")
print("Best Parameters:", best_params)
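
It helps to know how much of the search space a randomized search actually covers. The grid above has 3 × 2 × 2 × 2 = 24 combinations, and `n_iter=10` with `cv=3` evaluates 10 of them with 30 cross-validation fits, plus one final refit on the full training set by default. The sketch below enumerates and samples the same space with the standard library; `search_space` and the seed are illustrative, and the actual combinations scikit-learn draws will differ.

```python
import itertools
import random

# Same values as the notebook's param_grid, renamed to avoid shadowing it.
search_space = {
    "batch_size": [16, 32, 64],
    "epochs": [10, 20],
    "model__optimizer": ["adam", "rmsprop"],
    "model__activation": ["relu", "tanh"],
}

keys = list(search_space)
all_combinations = [
    dict(zip(keys, values))
    for values in itertools.product(*search_space.values())
]

rng = random.Random(42)                        # arbitrary seed for repeatability
sampled = rng.sample(all_combinations, k=10)   # sampling without replacement
n_cv_fits = len(sampled) * 3                   # one fit per candidate per fold (cv=3)

print(len(all_combinations), len(sampled), n_cv_fits)
```

Passing `random_state` to `RandomizedSearchCV` makes its own sampling repeatable in the same way.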


In [None]:
# Step 6.3: Cross-Validation
from sklearn.base import clone

# Convert to numpy arrays if not already
X_train_array = X_train.to_numpy() if isinstance(X_train, pd.DataFrame) else X_train
y_train_array = y_train.to_numpy() if isinstance(y_train, pd.Series) else y_train

# Define StratifiedKFold
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Perform cross-validation with the tuned hyperparameters
results = []
for train_idx, val_idx in kfold.split(X_train_array, y_train_array):
    # Clone so each fold starts from an unfitted estimator with the best parameters
    fold_model = clone(best_model)
    fold_model.fit(X_train_array[train_idx], y_train_array[train_idx])
    score = fold_model.score(X_train_array[val_idx], y_train_array[val_idx])
    results.append(score)

print(f'Cross-Validation Scores: {results}')
print(f'Mean Accuracy: {np.mean(results)}')
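
Stratification matters here because each fold should contain roughly the same class ratio as the full training set. The sketch below is a simplified, unshuffled version of that idea in plain Python on made-up labels: indices are grouped by class and dealt round-robin into folds. The function `stratified_folds` is a hypothetical helper, not the algorithm `StratifiedKFold` uses internally.

```python
from collections import defaultdict

def stratified_folds(labels, n_splits):
    """Assign sample indices to folds so each fold keeps a similar class ratio."""
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)

    folds = [[] for _ in range(n_splits)]
    for class_indices in indices_by_class.values():
        # Deal each class's indices round-robin across the folds.
        for position, index in enumerate(class_indices):
            folds[position % n_splits].append(index)
    return folds

# Made-up labels: 8 samples of class 1 and 2 samples of class 0.
toy_labels = [1] * 8 + [0] * 2
folds = stratified_folds(toy_labels, n_splits=2)

for fold in folds:
    positives = sum(toy_labels[i] for i in fold)
    print(len(fold), positives)  # fold size and number of class-1 samples
```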

### Summary and Conclusion
In this project, we built a deep learning model to predict loan defaults using historical data. The steps involved data preprocessing, exploratory data analysis, feature engineering, model building, hyperparameter tuning, and cross-validation.
1. Data Preprocessing: Encoding categorical variables and scaling numerical features to prepare the data for modeling.
2. Exploratory Data Analysis: Understanding the data distribution and relationships between features.
3. Feature Engineering: Ensuring that all features are scaled appropriately for model training.
4. Model Building: Using Keras to build a neural network capable of capturing complex patterns in the data.
5. Hyperparameter Tuning: Optimizing the model’s hyperparameters to improve its performance.
6. Cross-Validation: Validating the model across stratified folds to estimate how well it generalizes.

Following these steps yields a deep learning model that can estimate the likelihood of a loan default. Hyperparameter tuning and cross-validation help optimize the model and check that it generalizes. This approach provides a foundation for building predictive models in financial domains, where accurate risk assessment is crucial.

Implementing a loan default prediction model in a real-world scenario involves more than just building this model. It requires planning around data collection, deployment, integration, monitoring, and compliance. By automating the data pipeline and decision-making processes, and incorporating continuous monitoring and retraining, the model can provide significant value to financial institutions in managing risk and making informed lending decisions. This approach can help streamline the loan approval process, reduce default rates, and ultimately improve the profitability and efficiency of the lending operations.