# 🏦 AllLife Bank: Personal Loan Prediction

## 🎯 Objective
Predict whether a liability customer (a depositor) will buy a personal loan, and identify the customer attributes that most influence this decision.



## 📘 Problem Context

AllLife Bank wants to expand its loan customer base. A previous campaign converted over 9% of depositors into loan customers. As the data scientist, your task is to identify likely buyers so that marketing can run higher-ROI campaigns.

### 🧠 Problem Statement

Identify potential personal loan buyers among existing depositors using classification models.

### 🔍 Data Dictionary
- **ID**: Customer ID
- **Age**: Customer age in years
- **Experience**: Years of professional experience
- **Income**: Annual income (in $1000s)
- **ZIPCode**: Customer's ZIP code
- **Family**: Family size
- **CCAvg**: Average monthly credit card spending (in $1000s)
- **Education**: 1 = Undergrad, 2 = Graduate, 3 = Advanced/Professional
- **Mortgage**: Value of house mortgage (if any)
- **Personal_Loan**: Target variable (1 if accepted loan, else 0)
- **Securities_Account**: 1 if customer has one, else 0
- **CD_Account**: 1 if has certificate of deposit, else 0
- **Online**: 1 if uses online banking, else 0
- **CreditCard**: 1 if has external bank credit card, else 0
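
The encodings in the dictionary above can be made explicit in code. The following is a minimal sketch, not part of the original analysis: the helper names `label_education` and `non_binary_columns` are hypothetical, and the small `demo` frame is made-up data used only to show the behavior.

```python
import pandas as pd

# Code-to-label mapping copied from the data dictionary.
EDUCATION_LABELS = {1: "Undergrad", 2: "Graduate", 3: "Advanced/Professional"}

# Columns the data dictionary describes as 0/1 flags.
BINARY_COLUMNS = ["Personal_Loan", "Securities_Account", "CD_Account", "Online", "CreditCard"]


def label_education(data: pd.DataFrame) -> pd.Series:
    """Return Education as an ordered categorical with readable labels."""
    labels = data["Education"].map(EDUCATION_LABELS)
    categories = list(EDUCATION_LABELS.values())
    return pd.Series(
        pd.Categorical(labels, categories=categories, ordered=True),
        index=data.index,
        name="Education_Label",
    )


def non_binary_columns(data: pd.DataFrame) -> list:
    """List flag columns that contain values other than 0 and 1."""
    present = [c for c in BINARY_COLUMNS if c in data.columns]
    return [c for c in present if not data[c].isin([0, 1]).all()]


# Made-up rows for illustration only; CreditCard deliberately contains an invalid 2.
demo = pd.DataFrame({"Education": [1, 3, 2], "Online": [0, 1, 1], "CreditCard": [0, 2, 1]})
demo_labels = label_education(demo)
demo_bad = non_binary_columns(demo)
```

On the real data, `non_binary_columns(df)` returning an empty list would suggest the flag columns match the dictionary.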


In [None]:
import pandas as pd

df = pd.read_csv("/mnt/data/Loan_Modelling.csv")
df.head()

In [None]:
# Dataset overview
{
    "shape": df.shape,
    "columns": df.columns.tolist(),
    "dtypes": df.dtypes,
    "missing_values": df.isnull().sum(),
    "duplicate_rows": df.duplicated().sum(),
    "target_distribution": df["Personal_Loan"].value_counts(normalize=True)
}

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

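def plot_univariate_distributions(data):
    """Plot histograms for continuous columns and count plots for discrete ones."""
    categorical = ["Education", "Family", "Securities_Account", "CD_Account", "Online", "CreditCard"]
    # Exclude identifiers, the target, and the integer-coded categorical columns,
    # which would otherwise be picked up as numeric and plotted twice.
    excluded = ["ID", "ZIPCode", "Personal_Loan"] + categorical
    numerical = data.select_dtypes(include="number").columns.drop(excluded, errors="ignore")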

    for col in numerical:
        plt.figure(figsize=(6, 4))
        sns.histplot(data[col], kde=True, bins=30)
        plt.title(f"Distribution of {col}")
        plt.show()

    for col in categorical:
        plt.figure(figsize=(6, 4))
        sns.countplot(x=col, data=data)
        plt.title(f"Count plot of {col}")
        plt.show()

plot_univariate_distributions(df)

## 🔎 Key Observations

- The dataset has no missing values and no duplicate rows.
- The distributions of `CCAvg` and `Mortgage` are moderately skewed.
- `Education`, `Family`, and `Online` are integer-coded categorical features and look potentially useful as predictors.
- Only about 9.6% of customers accepted the loan, so the target classes are imbalanced.
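
The observations above can be backed by numbers instead of being read off the plots. This is a minimal sketch under stated assumptions: `check_observations` is a hypothetical helper, and the `demo` frame is made-up data, not rows from `Loan_Modelling.csv`.

```python
import pandas as pd


def check_observations(data: pd.DataFrame, target: str = "Personal_Loan") -> dict:
    """Compute missing values, duplicate rows, skewness, and the positive-class rate."""
    skew_cols = [c for c in ("CCAvg", "Mortgage") if c in data.columns]
    return {
        "missing_values": int(data.isnull().sum().sum()),
        "duplicate_rows": int(data.duplicated().sum()),
        # Sample skewness per column; positive values indicate a long right tail.
        "skewness": data[skew_cols].skew().round(2).to_dict(),
        # Mean of a 0/1 target equals the share of positive cases.
        "positive_rate": float(data[target].mean()),
    }


# Made-up rows for illustration only.
demo = pd.DataFrame({
    "CCAvg": [0.5, 1.0, 1.5, 2.0, 8.0],
    "Mortgage": [0, 0, 0, 0, 300],
    "Personal_Loan": [0, 0, 0, 0, 1],
})
demo_report = check_observations(demo)
```

Calling `check_observations(df)` on the loaded data should reproduce the figures quoted in the list, including the positive rate of roughly 9.6%.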
