# 1. Business Understanding

Financial institutions face significant challenges due to customer loan defaults, which impact
profitability and risk exposure. Predicting whether a borrower is likely to default on a loan is
essential for risk management, credit scoring, and decision-making.


This project focuses on building an automated loan default prediction system using machine
learning techniques to assess the probability that a borrower will default.

# 2. Importing Libraries

In [2]:
# %pip installs into the environment of the running kernel, which !pip does not guarantee.
%pip install --upgrade pip
%pip install pandas numpy matplotlib seaborn scikit-learn



In [3]:
# pandas (pd) provides DataFrame data structures and data manipulation utilities.
import pandas as pd  # pd.DataFrame, pd.read_csv, pd.to_datetime, etc.

# numpy (np) provides numerical operations, random number generation, and arrays.
import numpy as np  # np.array, np.random, arithmetic operations

# matplotlib.pyplot for plotting (low-level plotting API). We use it for fine control.
import matplotlib.pyplot as plt  # plt.figure, plt.plot, plt.show

# seaborn is a statistical plotting library built on matplotlib; it provides high-level plot functions.
import seaborn as sns  # sns.histplot, sns.boxplot, sns.heatmap, sns.scatterplot

%matplotlib inline
sns.set(font_scale=1)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# Suppress warnings to keep the notebook output readable.

import warnings
warnings.filterwarnings("ignore")

# 3. EDA

## 3.1 Data Understanding

In [4]:
# Load dataset from CSV into a pandas DataFrame.
df = pd.read_csv("loan_default_sample.csv")

# Show the number of rows and columns (shape): (rows, columns)
print('Shape:', df.shape)

# Display the first few rows for a quick sanity check. display() is Jupyter-friendly.
display(df.head(5))

# Show data types for each column so we can detect incorrect types (e.g., strings representing dates)
print('\nData types:') 
print(df.dtypes)

Shape: (500, 14)


,loan_id,age,annual_income,employment_length,home_ownership,purpose,loan_amount,term_months,interest_rate,dti,credit_score,delinquency_2yrs,num_open_acc,target_default
0,L100000,24,48586.06,2,MORTGAGE,other,18943.19,60,13.31,17.75,746.0,3,7,0
1,L100001,55,23634.07,17,MORTGAGE,debt_consolidation,15802.09,36,14.34,13.33,713.0,1,14,0
2,L100002,49,27994.32,6,OWN,other,17309.6,36,8.37,18.95,601.0,3,12,1
3,L100003,40,81938.71,17,MORTGAGE,debt_consolidation,20443.27,36,12.15,20.54,717.0,3,12,0
4,L100004,40,30688.66,20,RENT,other,9333.6,36,5.0,11.73,552.0,4,1,1



Data types:
loan_id               object
age                    int64
annual_income        float64
employment_length      int64
home_ownership        object
purpose               object
loan_amount          float64
term_months            int64
interest_rate        float64
dti                  float64
credit_score         float64
delinquency_2yrs       int64
num_open_acc           int64
target_default         int64
dtype: object


In [5]:
# Show descriptive statistics for the numeric columns (describe() skips object columns by default).
df.describe()

,age,annual_income,employment_length,loan_amount,term_months,interest_rate,dti,credit_score,delinquency_2yrs,num_open_acc,target_default
count,500.0,500.0,500.0,500.0,500.0,500.0,500.0,500.0,500.0,500.0,500.0
mean,42.456,49372.77744,14.29,13913.15824,43.008,12.28892,18.19742,679.894,2.0,7.612,0.216
std,12.58367,15221.358836,8.727786,6834.649042,10.923304,3.917982,7.77403,59.541255,1.401402,4.055904,0.411926
min,21.0,15000.0,0.0,1000.0,36.0,5.0,0.0,522.0,0.0,1.0,0.0
25%,32.0,38808.9075,7.0,9278.4075,36.0,9.41,12.8125,636.0,1.0,4.0,0.0
50%,42.0,49765.93,14.0,14000.12,36.0,12.155,18.13,677.5,2.0,8.0,0.0
75%,53.25,59155.7775,22.0,18607.7825,60.0,15.1425,23.5025,719.0,3.0,11.0,0.0
max,64.0,93576.01,29.0,35399.71,60.0,24.24,42.71,850.0,4.0,14.0,1.0
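
The mean of `target_default` in the summary above is 0.216, which corresponds to 108 of the 500 loans (0.216 × 500), so defaults are the minority class. A minimal sketch of how the class balance could be checked directly follows. To run without the CSV, it rebuilds a small frame from the five rows printed by `df.head(5)`; the names `sample`, `class_share`, and `default_rate` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Five rows copied from the df.head(5) output above (subset of columns).
sample = pd.DataFrame({
    "loan_id": ["L100000", "L100001", "L100002", "L100003", "L100004"],
    "credit_score": [746.0, 713.0, 601.0, 717.0, 552.0],
    "target_default": [0, 0, 1, 0, 1],
})

# Share of each class (0 = repaid, 1 = default), ordered by class label.
class_share = sample["target_default"].value_counts(normalize=True).sort_index()

# For a 0/1 target, the mean equals the default rate.
default_rate = sample["target_default"].mean()

print(class_share)
print(f"Default rate in this 5-row sample: {default_rate:.1%}")
```

On the full dataset the same two expressions would be applied to `df` instead of `sample`. The five-row rate is not representative; only the 0.216 figure from `df.describe()` describes the whole dataset.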


## 3.2 Missing Values
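
The `count` row of `df.describe()` is 500 for every numeric column, matching the 500 rows reported by `df.shape`, so the numeric columns contain no missing values. The object columns (`loan_id`, `home_ownership`, `purpose`) are not covered by that summary. A minimal sketch of a per-column check is shown below; `missing_summary` is a hypothetical helper, `sample` is rebuilt from the rows printed by `df.head(5)`, and the `damaged` frame contains one deliberately blanked value that is artificial and not from the dataset.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing (NaN/None) values per column."""
    counts = frame.isna().sum()
    return pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })


# Rows copied from the df.head(5) output above (subset of columns).
sample = pd.DataFrame({
    "home_ownership": ["MORTGAGE", "MORTGAGE", "OWN", "MORTGAGE", "RENT"],
    "purpose": ["other", "debt_consolidation", "other", "debt_consolidation", "other"],
    "credit_score": [746.0, 713.0, 601.0, 717.0, 552.0],
})
print(missing_summary(sample))

# Artificial example: blank one value to show that the helper detects it.
damaged = sample.copy()
damaged.loc[2, "credit_score"] = np.nan
print(missing_summary(damaged))
```

Calling `missing_summary(df)` on the loaded dataset would extend the check to the object columns as well.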

## 3.3 Null Values

## 3.4 Numerical Values
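
One way to start examining the numerical columns is to test the extremes reported by `df.describe()` against outlier fences. The sketch below uses Tukey's 1.5 × IQR rule, which is an assumption made here for illustration and not a method the notebook names. `iqr_fences` is a hypothetical helper, and the quartiles, minima, and maxima are copied from the `df.describe()` output above.

```python
from typing import Tuple


def iqr_fences(q1: float, q3: float, k: float = 1.5) -> Tuple[float, float]:
    """Return the (lower, upper) fences q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Values copied from the df.describe() output above.
summary = {
    "loan_amount": {"min": 1000.0, "q1": 9278.4075, "q3": 18607.7825, "max": 35399.71},
    "credit_score": {"min": 522.0, "q1": 636.0, "q3": 719.0, "max": 850.0},
}

for name, stats in summary.items():
    low, high = iqr_fences(stats["q1"], stats["q3"])
    print(
        f"{name}: fences=({low:.2f}, {high:.2f}), "
        f"min below lower fence={stats['min'] < low}, "
        f"max above upper fence={stats['max'] > high}"
    )
```

For both columns the maximum lies above the upper fence while the minimum stays inside the lower one, so each has at least one high-side value worth inspecting. On the full dataset, `df[col].between(low, high)` would mark which rows fall inside the fences.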

## 3.5 Categorical Values
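
The `df.dtypes` output lists three object columns: `loan_id`, `home_ownership`, and `purpose`. Of these, `loan_id` is an identifier, which leaves two categorical features. A minimal sketch of how their level counts and per-level default rates could be inspected is given below. `sample` is rebuilt from the five rows printed by `df.head(5)`, and the names `categorical_cols` and `rate_by_home` are illustrative.

```python
import pandas as pd

# Rows copied from the df.head(5) output above (loan_id omitted: it is an identifier).
sample = pd.DataFrame({
    "home_ownership": ["MORTGAGE", "MORTGAGE", "OWN", "MORTGAGE", "RENT"],
    "purpose": ["other", "debt_consolidation", "other", "debt_consolidation", "other"],
    "target_default": [0, 0, 1, 0, 1],
})

# Treat every non-numeric column as categorical.
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

# Frequency of each level per categorical column.
for col in categorical_cols:
    print(sample[col].value_counts())
    print()

# Default rate per home-ownership level (mean of the 0/1 target within each group).
rate_by_home = sample.groupby("home_ownership")["target_default"].mean()
print(rate_by_home)
```

With only five rows the per-level rates are not meaningful; the same calls on the full `df` (after dropping `loan_id`) would give usable figures.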