#  Credit Analysis Mini Project

##  Objective

 The objective of this project is to analyze credit-related customer data to understand risk patterns and 
 build machine learning models for credit prediction and decision support.

##  Dataset Description

 The dataset includes credit and customer-related features such as:

 1) Customer demographic details
 2) Financial and credit attributes
 3) Payment / balance related fields
 4) Derived risk indicators
 5) Target credit outcome label

##  Models Used

 1) Logistic Regression
 2) KNN
 3) Naive Bayes

##  Evaluation Metrics

 1) Accuracy
 2) Precision and Recall
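
As a quick illustration of how these metrics relate to confusion-matrix counts, the sketch below computes them by hand. The counts are made up for illustration and are not results from this project; `classification_metrics` is a hypothetical helper name.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision: of all predicted positives, the share that were correct
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of all actual positives, the share that were found
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts, for illustration only
acc, prec, rec = classification_metrics(tp=40, fp=10, fn=5, tn=45)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```

In a lending setting, precision is often the more cautious metric to watch, since a false positive corresponds to approving a loan that should have been rejected.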

## Importing Libraries

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import LabelEncoder,StandardScaler
from sklearn.model_selection import train_test_split,GridSearchCV
from sklearn.metrics import accuracy_score,precision_score,recall_score

## Data Loading

In [None]:
data=pd.read_csv("loan_approval_data.csv")

In [None]:
data.info()

## Data Cleaning

In [None]:
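# Split columns by dtype: object columns are treated as categorical, float64 columns as numeric.
# Integer columns cannot hold NaN in pandas, so they need no imputation.
categorical_col=data.select_dtypes(include=["object"]).columns
numeric_col=data.select_dtypes(include=["float64"]).columns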

In [None]:
from sklearn.impute import SimpleImputer

In [None]:
# Fill missing values: mean for numeric columns, most frequent value for categorical columns
mean_fill= SimpleImputer(strategy="mean")
mode_fill=SimpleImputer(strategy="most_frequent")

In [None]:
data[categorical_col]=mode_fill.fit_transform(data[categorical_col])
data[numeric_col]=mean_fill.fit_transform(data[numeric_col])
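
The cell above fits the imputers on the full dataset, so the fill values also reflect rows that later end up in the test set. One possible alternative, shown here as a minimal sketch on made-up data, is to learn the fill value from the training rows only and reuse it on the held-out rows. The `Income` column and the row split are assumptions for illustration, not part of this dataset.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data; "Income" is an illustrative column name
toy = pd.DataFrame({"Income": [40.0, np.nan, 60.0, 80.0, np.nan, 100.0]})
train_part = toy.iloc[:4]  # rows used to learn the fill value
test_part = toy.iloc[4:]   # held-out rows

imputer = SimpleImputer(strategy="mean")
train_filled = imputer.fit_transform(train_part)  # mean learned from training rows only
test_filled = imputer.transform(test_part)        # same mean reused on held-out rows

print(imputer.statistics_)   # fill value learned from the training rows
print(test_filled.ravel())   # held-out rows after imputation
```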

## Feature Engineering

In [None]:
data=data.drop("Applicant_ID",axis=1)

In [None]:
data.head()

In [None]:
one_hot=["Employment_Status","Marital_Status","Loan_Purpose","Gender","Employer_Category"]
label=["Property_Area","Education_Level","Loan_Approved"]

In [None]:
data=pd.get_dummies(data,columns=one_hot,drop_first=True,dtype=int)

In [None]:
label_model=LabelEncoder()

In [None]:
for i in label:
    data[i]=label_model.fit_transform(data[i])
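
Because a single `LabelEncoder` is refit for each column in the loop above, only the mapping of the last column remains available afterwards. If the mappings are needed later (for example, to decode predictions), one option is to keep one encoder per column. The sketch below assumes made-up category values; the real categories in the dataset may differ.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical category values, for illustration only
toy_columns = {
    "Education_Level": ["Graduate", "Not Graduate", "Graduate"],
    "Loan_Approved": ["Yes", "No", "Yes"],
}

encoders = {}  # one fitted encoder per column
encoded = {}   # integer codes per column
for name, values in toy_columns.items():
    enc = LabelEncoder()
    encoded[name] = enc.fit_transform(values)
    encoders[name] = enc

# LabelEncoder assigns codes in sorted order of the class labels
mappings = {
    name: {str(cls): code for code, cls in enumerate(enc.classes_)}
    for name, enc in encoders.items()
}
print(mappings)

# Decode integer codes back to the original labels
decoded = encoders["Loan_Approved"].inverse_transform(encoded["Loan_Approved"])
print(list(decoded))
```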

## Data Exploration

In [None]:
nums=data.select_dtypes(include="number")
matrix =nums.corr()

In [None]:
#correlation heatmap
plt.figure(figsize=(18,10))
sns.heatmap(matrix,cmap="plasma",annot=True,fmt=".2f")
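
A large annotated heatmap can be hard to read when the goal is only to see which features relate most strongly to the target. A possible complement, sketched here on a small made-up frame, is to pull the target column out of the correlation matrix and sort it by absolute value. The numbers below are invented for illustration, and `Age` is a hypothetical column.

```python
import pandas as pd

# Hypothetical toy values, for illustration only
toy = pd.DataFrame({
    "Credit_Score": [600, 650, 700, 750, 800, 850],
    "DTI_Ratio": [0.6, 0.5, 0.45, 0.3, 0.2, 0.1],
    "Age": [30, 45, 28, 52, 39, 41],
    "Loan_Approved": [0, 0, 0, 1, 1, 1],
})

# Correlation of every feature with the target, strongest first (by magnitude)
target_corr = (
    toy.corr()["Loan_Approved"]
    .drop("Loan_Approved")
    .sort_values(key=abs, ascending=False)
)
print(target_corr)
```

On the real data, the same expression would be applied to `matrix` from the cell above instead of `toy.corr()`.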


## Data Preprocessing

In [None]:
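# In the heatmap, the Loan_Approved row highlights two features, Credit_Score and DTI_Ratio,
# so both are squared to increase their influence. Squaring is a nonlinear transform: after
# standard scaling it changes the shape of each feature's distribution, not its scale.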
data["DTI_Ratio"]=data["DTI_Ratio"]**2
data["Credit_Score"]=data["Credit_Score"]**2

In [None]:
X=data.drop(["Loan_Approved"],axis=1)
Y=data["Loan_Approved"]

## Train-Test-Split

In [None]:
x_train,x_test,y_train,y_test=train_test_split(X,Y,test_size=0.2,random_state=42)

## Feature Scaling

In [None]:
scaler=StandardScaler()
x_train=scaler.fit_transform(x_train)
x_test=scaler.transform(x_test)
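
To make the fit/transform split above concrete, the sketch below scales a tiny made-up array. The scaler learns the mean and standard deviation from the training rows, and the test row is then scaled with those same statistics rather than its own.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy arrays, for illustration only
train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
test = np.array([[4.0, 400.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns per-column mean and std from train
test_scaled = scaler.transform(test)        # reuses the training statistics

print(scaler.mean_)    # per-column means learned from the training rows
print(scaler.scale_)   # per-column standard deviations learned from the training rows
print(test_scaled)     # test row expressed in training-set units
```

Both columns end up on the same scale despite differing by a factor of 100 in raw units, which matters for distance-based models such as KNN.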

## Models Training and Evaluation

In [None]:
# Logistic Regression (best-performing model in this comparison)
from sklearn.linear_model import LogisticRegression

lr_model=LogisticRegression()
lr_model.fit(x_train,y_train)
y_predict=lr_model.predict(x_test)
print("Accuracy score : ",accuracy_score(y_test,y_predict))
print("Precision score : ",precision_score(y_test,y_predict))
print("Recall score : ",recall_score(y_test,y_predict))

In [None]:
#KNN
from sklearn.neighbors import KNeighborsClassifier

# Tune n_neighbors (2 to 14) with cross-validated grid search, selecting by precision
knn_model=GridSearchCV(
    KNeighborsClassifier(),
    {"n_neighbors":list(range(2,15))},
    scoring="precision",
)

knn_model.fit(x_train,y_train)
y_predict=knn_model.predict(x_test)
print("Accuracy score : ",accuracy_score(y_test,y_predict))
print("Precision score : ",precision_score(y_test,y_predict))
print("Recall score : ",recall_score(y_test,y_predict))
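
In the cell above, the grid search runs on data that was already scaled using the whole training set, so each cross-validation fold has indirectly seen its own validation rows through the scaler. A possible refinement, sketched below on synthetic data from `make_classification`, is to wrap the scaler and the classifier in a `Pipeline` so that scaling is refit inside every fold. The synthetic data and the step names `scaler` and `knn` are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the loan data, for illustration only
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Scaling happens inside the pipeline, so it is refit on each training fold
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])
search = GridSearchCV(
    pipe,
    {"knn__n_neighbors": list(range(2, 15))},  # step name + "__" + parameter name
    scoring="precision",
    cv=5,
)
search.fit(X_tr, y_tr)

best_k = search.best_params_["knn__n_neighbors"]
test_precision = search.score(X_te, y_te)  # uses the same scoring as the search
print("Best n_neighbors:", best_k)
print("Test precision:", round(test_precision, 3))
```

Inspecting `best_params_` also shows which value of `n_neighbors` the search actually selected, which the original cell does not print.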

In [None]:
#Naive Bayes
from sklearn.naive_bayes import GaussianNB

nb_model=GaussianNB()
nb_model.fit(x_train,y_train)
y_predict=nb_model.predict(x_test)
print("Accuracy score : ",accuracy_score(y_test,y_predict))
print("Precision score : ",precision_score(y_test,y_predict))
print("Recall score : ",recall_score(y_test,y_predict))
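
The three cells above repeat the same fit, predict, and print steps. As a sketch of one way to compare the models side by side, the loop below collects the metrics into a single table. It runs on synthetic data from `make_classification`, so the numbers it prints are not the results of this project, and KNN is left at its default `n_neighbors` for brevity.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the loan data, for illustration only
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_tr = scaler.fit_transform(X_tr)
X_te = scaler.transform(X_te)

models = {
    "Logistic Regression": LogisticRegression(),
    "KNN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
}

rows = []
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    rows.append({
        "model": name,
        "accuracy": accuracy_score(y_te, pred),
        "precision": precision_score(y_te, pred, zero_division=0),
        "recall": recall_score(y_te, pred, zero_division=0),
    })

results = pd.DataFrame(rows).set_index("model")
print(results.round(3))
```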

##  Conclusion

Three models were trained and compared on the credit dataset: Logistic Regression, KNN, and Naive Bayes. Based on the evaluation metrics, Logistic Regression performed best, achieving the highest accuracy (~0.88) along with strong precision and recall. Naive Bayes performed similarly (~0.87 accuracy), while KNN scored lower.

Therefore, Logistic Regression is the most suitable model for this credit prediction task on the given dataset, providing the best overall balance of performance metrics.
