**Loan Eligibility Prediction using Machine Learning**

The dataset consists of 2,000 samples split across two classes (1,146 not approved and 854 approved). Each sample has five variables:


1.   Income
2.   Age
3.   Loan
4.   CreditScore
5.   Approved (the target: 1 = approved, 0 = not approved)

In [1]:
# Step 1: Import Libraries
import pandas as pd

In [5]:
# Step 2: Load Dataset
df = pd.read_csv('https://raw.githubusercontent.com/vishalbairwapyd/Internship_project/main/loan_approval.csv')

In [6]:
print(df.head())

   Income  Age   Loan  CreditScore  Approved
0   76422   60  14865          666         1
1   35795   51  10569          512         0
2   20860   43  18823          836         0
3   58158   58  28756          531         0
4   74343   20  25114          539         0


In [10]:
# Step 3: Dataset Information
print("\nDataset Info:")
df.info()  # info() prints its summary directly and returns None, so no print() is needed


Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   Income       2000 non-null   int64
 1   Age          2000 non-null   int64
 2   Loan         2000 non-null   int64
 3   CreditScore  2000 non-null   int64
 4   Approved     2000 non-null   int64
dtypes: int64(5)
memory usage: 78.3 KB


In [11]:
print("\nDataset Description:")
print(df.describe())


Dataset Description:
             Income          Age          Loan  CreditScore     Approved
count   2000.000000  2000.000000   2000.000000  2000.000000  2000.000000
mean   49810.707000    41.100000  15670.275000   673.714000     0.427000
std    17582.257252    13.538931   8406.414513   100.875092     0.494766
min    20009.000000    18.000000   1024.000000   500.000000     0.000000
25%    34394.750000    29.000000   8316.250000   589.000000     0.000000
50%    49254.500000    41.000000  15771.500000   675.000000     0.000000
75%    65094.750000    53.000000  23092.250000   763.250000     1.000000
max    79965.000000    64.000000  29985.000000   849.000000     1.000000


In [12]:
print("\nClass Distribution:")
print(df['Approved'].value_counts())


Class Distribution:
Approved
0    1146
1     854
Name: count, dtype: int64
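
The class counts above give a reference point for reading the accuracy reported later. As a minimal sketch (the counts are copied from the output above, and the variable names are illustrative only), a trivial classifier that always predicts the majority class would reach the following accuracy on the full dataset:

```python
# Class counts copied from the value_counts() output above.
class_counts = {0: 1146, 1: 854}

total = sum(class_counts.values())
majority_class = max(class_counts, key=class_counts.get)

# Accuracy of a trivial model that always predicts the majority class.
baseline_accuracy = class_counts[majority_class] / total

print(f"Majority class: {majority_class}")
print(f"Baseline accuracy: {baseline_accuracy:.3f}")
```

Any trained model should be judged against this baseline rather than against 0%.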


In [13]:
# Step 4: Split Features and Target
X = df.drop('Approved', axis=1)
y = df['Approved']

In [15]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=42)

In [16]:
print("\nTrain/Test Shape:")
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)


Train/Test Shape:
(1400, 4) (600, 4) (1400,) (600,)


In [17]:
# Step 5: Select and Train Model
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

In [18]:
print("\nModel Coefficients:")
print(model.coef_)
print("Intercept:", model.intercept_)


Model Coefficients:
[[ 1.06073057e-04  6.57827749e-03 -7.25945144e-05  3.04318314e-02]]
Intercept: [-25.77985968]
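
To make the coefficients more concrete, the sketch below applies the standard logistic regression formula, P(Approved = 1) = 1 / (1 + exp(-(w·x + b))), by hand. The coefficient and intercept values are copied from the output above, and the applicant is the first row shown by `df.head()`. The function name `approval_probability` is a hypothetical helper, not part of scikit-learn.

```python
import math

# Coefficients and intercept copied from the model output above,
# in feature order: Income, Age, Loan, CreditScore.
coef = [1.06073057e-04, 6.57827749e-03, -7.25945144e-05, 3.04318314e-02]
intercept = -25.77985968


def approval_probability(features):
    """Return P(Approved = 1) for one applicant using the logistic function."""
    z = intercept + sum(w * x for w, x in zip(coef, features))
    return 1.0 / (1.0 + math.exp(-z))


# First row of df.head(): Income=76422, Age=60, Loan=14865, CreditScore=666.
applicant = [76422, 60, 14865, 666]
prob = approval_probability(applicant)

# A probability of at least 0.5 maps to class 1 (approved).
predicted_class = int(prob >= 0.5)

print(f"P(approved) = {prob:.3f}, predicted class = {predicted_class}")
```

Because the features are on very different scales (income in the tens of thousands, age in the tens), the raw coefficient magnitudes are not directly comparable as measures of feature importance.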


In [19]:
# Step 6: Predictions
y_pred = model.predict(X_test)
print("\nSample Predictions:")
print(y_pred[:20])


Sample Predictions:
[1 1 0 1 1 0 0 0 1 0 0 0 0 0 0 1 1 1 1 1]


In [20]:
# Step 7: Model Evaluation
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))


Confusion Matrix:
[[312  35]
 [ 50 203]]


In [21]:
print("\nAccuracy Score:")
print(accuracy_score(y_test, y_pred))


Accuracy Score:
0.8583333333333333


In [22]:
print("\nClassification Report:")
print(classification_report(y_test, y_pred))


Classification Report:
              precision    recall  f1-score   support

           0       0.86      0.90      0.88       347
           1       0.85      0.80      0.83       253

    accuracy                           0.86       600
   macro avg       0.86      0.85      0.85       600
weighted avg       0.86      0.86      0.86       600



📊 **Classification Report Explained**

The target variable (`Approved`) has two classes:


*   0 → Loan Not Approved
*   1 → Loan Approved

For each class, the report shows precision, recall, F1-score, and support (the number of test samples that actually belong to that class).


*   **Class 0 (Loan Not Approved)**
    *   Precision = 0.86: of all test samples predicted as "Loan Not Approved," 86% were actually not approved.
    *   Recall = 0.90: of all actual "Loan Not Approved" cases, the model correctly identified 90%.
    *   F1-score = 0.88: the harmonic mean of precision and recall, so it is high only when both are high.

*   **Class 1 (Loan Approved)**
    *   Precision = 0.85: of all test samples predicted as "Loan Approved," 85% were actually approved.
    *   Recall = 0.80: of all actual "Loan Approved" cases, the model detected 80%. The remaining 20% (50 of 253) were incorrectly predicted as not approved.
    *   F1-score = 0.83: slightly weaker than Class 0, mainly because of the lower recall.
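
The per-class figures can be recomputed directly from the confusion matrix. The sketch below assumes scikit-learn's convention that rows are actual classes and columns are predicted classes. The matrix values are copied from the output above, and `per_class_metrics` is a hypothetical helper written for illustration.

```python
# Confusion matrix copied from the evaluation output:
# rows are actual classes (0, 1), columns are predicted classes (0, 1).
cm = [[312, 35],
      [50, 203]]


def per_class_metrics(cm, cls):
    """Return (precision, recall, f1, support) for class index `cls`."""
    true_pos = cm[cls][cls]
    predicted_as_cls = sum(row[cls] for row in cm)  # column sum
    support = sum(cm[cls])                          # row sum
    precision = true_pos / predicted_as_cls
    recall = true_pos / support
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1, support


for cls in (0, 1):
    p, r, f1, n = per_class_metrics(cm, cls)
    print(f"class {cls}: precision={p:.2f} recall={r:.2f} f1={f1:.2f} support={n}")
```

Rounded to two decimals, these values should agree with the rows of the classification report.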

**Overall Performance**


*   Accuracy = 0.86 (86%): of the 600 test predictions, 515 were correct (0.8583 before rounding).

*   Macro avg (precision 0.86, recall 0.85, F1 0.85): the unweighted mean of each metric across the two classes, so both classes count equally.

*   Weighted avg (0.86 for all three metrics): the mean of each metric weighted by class support (347 test samples for class 0 and 253 for class 1).
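
As a rough check on the summary rows, the sketch below recomputes accuracy and the macro and weighted F1 averages from the confusion matrix reported in the evaluation step. It uses plain Python rather than scikit-learn, and the variable names are illustrative.

```python
# Confusion matrix copied from the evaluation output:
# rows are actual classes (0, 1), columns are predicted classes (0, 1).
cm = [[312, 35],
      [50, 203]]

total = sum(sum(row) for row in cm)

# Accuracy: correct predictions (the diagonal) over all predictions.
accuracy = sum(cm[i][i] for i in range(len(cm))) / total

f1_scores = []
supports = []
for cls in range(len(cm)):
    true_pos = cm[cls][cls]
    predicted = sum(row[cls] for row in cm)  # column sum
    actual = sum(cm[cls])                    # row sum (support)
    # F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (predicted + actual)
    f1_scores.append(2 * true_pos / (predicted + actual))
    supports.append(actual)

# Macro average: every class counts equally.
macro_f1 = sum(f1_scores) / len(f1_scores)

# Weighted average: each class counts in proportion to its support.
weighted_f1 = sum(f * n for f, n in zip(f1_scores, supports)) / total

print(f"accuracy    = {accuracy:.4f}")
print(f"macro F1    = {macro_f1:.2f}")
print(f"weighted F1 = {weighted_f1:.2f}")
```

The weighted average sits closer to the class 0 score because class 0 has more test samples.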



