# 🏦 AllLife Bank: Personal Loan Campaign Analysis

## 🎯 Objective
To help AllLife Bank identify which deposit customers (liability customers) are more likely to purchase a **Personal Loan**, using predictive models and clustering. This will enable **targeted marketing** and increase loan conversion.

## 🧠 Problem Statement
We aim to:
- Predict whether a customer will accept a personal loan.
- Identify important attributes driving this decision.
- Segment the customer base using clustering (K-Means).
- Recommend marketing strategies based on results.

## 📘 Dataset Dictionary
- `ID`: Customer ID
- `Age`, `Experience`: Customer’s age and years of professional experience
- `Income`: Annual income (in $1000s)
- `ZIPCode`: Residential ZIP code
- `Family`: Family size
- `CCAvg`: Average credit card spending per month (in $1000s)
- `Education`: Education level (1 = Undergrad, 2 = Graduate, 3 = Professional)
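- `Mortgage`: Value of house mortgage, if any (in $1000s)
- `Personal_Loan`: Target variable (1 = accepted the loan, 0 = did not)
- `Securities_Account`, `CD_Account`, `Online`, `CreditCard`: Binary (0/1) flags for holding a securities account, holding a certificate of deposit (CD) account, using online banking, and using a credit card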
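The dictionary above doubles as a schema. A minimal sketch that checks a loaded frame against it, assuming the CSV names the ZIP column `ZIPCode` (as the later `drop` call does) and that the flag columns are coded 0/1; `check_schema` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

# Column names from the data dictionary (ZIP column assumed to be "ZIPCode").
EXPECTED_COLUMNS = [
    "ID", "Age", "Experience", "Income", "ZIPCode", "Family", "CCAvg",
    "Education", "Mortgage", "Personal_Loan", "Securities_Account",
    "CD_Account", "Online", "CreditCard",
]
# Columns assumed to be binary 0/1 indicators.
BINARY_FLAGS = ["Personal_Loan", "Securities_Account", "CD_Account", "Online", "CreditCard"]


def check_schema(frame: pd.DataFrame) -> list:
    """Return a list of problems; an empty list means the frame matches the dictionary."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in frame.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    for col in BINARY_FLAGS:
        if col in frame.columns and not frame[col].isin([0, 1]).all():
            problems.append(f"{col} has values other than 0/1")
    # Education codes: 1 = Undergrad, 2 = Graduate, 3 = Professional
    if "Education" in frame.columns and not frame["Education"].isin([1, 2, 3]).all():
        problems.append("Education has codes outside 1-3")
    return problems


# One made-up row, just to show the call (not real customer data).
row = {c: 0 for c in EXPECTED_COLUMNS}
row.update({"Education": 2, "Online": 1})
print(check_schema(pd.DataFrame([row])))  # → []
```

Calling `check_schema(df)` right after `pd.read_csv` surfaces naming or coding mismatches before any later step fails on them.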


In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("/mnt/data/Loan_Modelling.csv")
df.info()
print("Duplicate rows:", df.duplicated().sum())
print(df.isnull().sum())
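Since `Personal_Loan` is the target, its class balance is worth checking alongside duplicates and missing values: if acceptors are a small minority, plain accuracy will flatter any model that predicts "No" for everyone. A minimal sketch; `target_balance` is a hypothetical helper and the toy frame is made up.

```python
import pandas as pd


def target_balance(frame: pd.DataFrame, target: str = "Personal_Loan") -> pd.DataFrame:
    """Count and share of each class of the target column."""
    counts = frame[target].value_counts().sort_index()
    return pd.DataFrame({"count": counts, "share": counts / counts.sum()})


# Toy stand-in for df: 3 refusals, 1 acceptance (made-up values).
toy = pd.DataFrame({"Personal_Loan": [0, 0, 0, 1]})
print(target_balance(toy))
```

In the notebook the call would simply be `target_balance(df)`.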

In [None]:
# Univariate
num_vars = ['Age', 'Experience', 'Income', 'CCAvg', 'Mortgage']
cat_vars = ['Family', 'Education', 'Securities_Account', 'CD_Account', 'Online', 'CreditCard']

for col in num_vars:
    sns.histplot(df[col], kde=True)
    plt.title(f"Distribution of {col}")
    plt.show()

for col in cat_vars:
    sns.countplot(data=df, x=col)
    plt.title(f"Count plot of {col}")
    plt.show()

In [None]:
# Bivariate with target
for col in num_vars + cat_vars:
    plt.figure(figsize=(6,4))
    if col in num_vars:
        sns.boxplot(data=df, x='Personal_Loan', y=col)
    else:
        sns.barplot(data=df, x=col, y='Personal_Loan')
    plt.title(f"{col} vs Personal_Loan")
    plt.show()
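Because the target is 0/1, each bar in the bar plots is the mean of `Personal_Loan`, i.e. the acceptance rate for that category. The same numbers as a table, with numeric columns binned first, are easier to read off. A sketch using plain `groupby`; `acceptance_rate_by` and the bin edges in the example are hypothetical choices.

```python
import pandas as pd


def acceptance_rate_by(frame, col, target="Personal_Loan", bins=None):
    """Customer count and loan acceptance rate per level of `col` (binned if `bins` is given)."""
    key = pd.cut(frame[col], bins=bins) if bins is not None else frame[col]
    return (
        frame.groupby(key, observed=True)[target]
        .agg(["count", "mean"])
        .rename(columns={"mean": "acceptance_rate"})
    )


# Made-up rows standing in for df.
toy = pd.DataFrame({
    "Education":     [1, 1, 2, 3, 3, 3],
    "Income":        [20, 40, 60, 120, 150, 180],
    "Personal_Loan": [0, 1, 0, 1, 1, 0],
})
print(acceptance_rate_by(toy, "Education"))
print(acceptance_rate_by(toy, "Income", bins=[0, 100, 200]))
```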

## 🧹 Data Preprocessing

In [None]:
df.drop(columns=["ID", "ZIPCode"], inplace=True)

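# Cap outliers at the Tukey fences (Q1 - 1.5*IQR, Q3 + 1.5*IQR).
# The quartiles are computed on the full dataset, i.e. before the train/test split.
for col in ['Age', 'Experience', 'Income', 'CCAvg', 'Mortgage']:
    Q1 = df[col].quantile(0.25)
    Q3 = df[col].quantile(0.75)
    IQR = Q3 - Q1
    lower, upper = Q1 - 1.5 * IQR, Q3 + 1.5 * IQR
    df[col] = df[col].clip(lower, upper)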
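Computing the fences on the whole dataset lets test rows influence the caps. A minimal sketch of the leak-free variant, which fits the fences on training rows only and reuses them on any other split; `fit_iqr_bounds` and `apply_bounds` are hypothetical helpers.

```python
import pandas as pd


def fit_iqr_bounds(train: pd.DataFrame, cols, k: float = 1.5) -> dict:
    """Tukey fences (Q1 - k*IQR, Q3 + k*IQR) computed from the training rows only."""
    bounds = {}
    for col in cols:
        q1, q3 = train[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        bounds[col] = (q1 - k * iqr, q3 + k * iqr)
    return bounds


def apply_bounds(frame: pd.DataFrame, bounds: dict) -> pd.DataFrame:
    """Return a copy of `frame` with each column clipped to its fitted fences."""
    out = frame.copy()
    for col, (lo, hi) in bounds.items():
        out[col] = out[col].clip(lower=lo, upper=hi)
    return out


# Made-up values: fences come from `train_part`, then are reused on `test_part`.
train_part = pd.DataFrame({"Income": [10, 20, 30, 40, 1000]})
test_part = pd.DataFrame({"Income": [500, 5]})
bounds = fit_iqr_bounds(train_part, ["Income"])  # fences: (-10.0, 70.0)
print(apply_bounds(test_part, bounds))           # 500 is capped to 70
```

In the notebook this would run after `train_test_split`: fit on `X_train`, then apply to both `X_train` and `X_test`. Decision trees split on thresholds and are largely unaffected by extreme values, so the capping matters mainly for Logistic Regression and K-Means.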


In [None]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = df.drop("Personal_Loan", axis=1)
y = df["Personal_Loan"]

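# Stratify so both splits keep the same share of loan acceptors
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
X_train, X_test = X_train.copy(), X_test.copy()  # explicit copies avoid SettingWithCopyWarning below

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler()
num_cols = ['Age', 'Experience', 'Income', 'CCAvg', 'Mortgage']
X_train[num_cols] = scaler.fit_transform(X_train[num_cols])
X_test[num_cols] = scaler.transform(X_test[num_cols])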
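Manual column slicing works, but a `Pipeline` wrapping a `ColumnTransformer` refits the scaler on exactly the data passed to `.fit`, which keeps cross-validation leak-free as well. A minimal sketch on synthetic data; the column list mirrors `num_cols`, the extra `Family` column stands in for the unscaled ones, and `make_scaled_model` is a hypothetical helper. Note that the transformer outputs the scaled columns first and the passthrough columns after them.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

NUM_COLS = ["Age", "Experience", "Income", "CCAvg", "Mortgage"]


def make_scaled_model(estimator):
    """Scale the numeric columns, pass the rest through unchanged, then fit `estimator`."""
    pre = ColumnTransformer(
        [("num", StandardScaler(), NUM_COLS)],
        remainder="passthrough",
    )
    return Pipeline([("pre", pre), ("model", estimator)])


# Synthetic stand-in for X_train / y_train (not the bank's data).
rng = np.random.default_rng(0)
n = 200
toy_X = pd.DataFrame({
    "Age": rng.integers(23, 67, n),
    "Experience": rng.integers(0, 43, n),
    "Income": rng.integers(8, 225, n),
    "CCAvg": rng.uniform(0, 10, n),
    "Mortgage": rng.integers(0, 300, n),
    "Family": rng.integers(1, 5, n),
})
toy_y = (toy_X["Income"] > 120).astype(int)  # made-up rule for the label
model = make_scaled_model(LogisticRegression(max_iter=1000)).fit(toy_X, toy_y)
print(model.score(toy_X, toy_y))
```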


## 🌳 Decision Tree Model

In [None]:
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import classification_report, roc_auc_score

# Fully grown tree (no depth or leaf-size limits), so it is likely to overfit the training data
dt = DecisionTreeClassifier(random_state=0)
dt.fit(X_train, y_train)
y_pred_dt = dt.predict(X_test)
print(classification_report(y_test, y_pred_dt))
print("ROC AUC Score:", roc_auc_score(y_test, dt.predict_proba(X_test)[:,1]))
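`classification_report` uses a fixed 0.5 cutoff. A small helper that reports the confusion matrix plus precision and recall for the positive class at any cutoff shows how many acceptors a model misses. A sketch; `summarize_at_threshold` is a hypothetical name, and in the notebook its inputs would be `y_test` and `dt.predict_proba(X_test)[:, 1]`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score


def summarize_at_threshold(y_true, proba, threshold=0.5):
    """Confusion-matrix counts, precision and recall for class 1 at a chosen cutoff."""
    y_hat = (np.asarray(proba) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_hat, labels=[0, 1]).ravel()
    return {
        "tn": int(tn), "fp": int(fp), "fn": int(fn), "tp": int(tp),
        "precision": precision_score(y_true, y_hat, zero_division=0),
        "recall": recall_score(y_true, y_hat, zero_division=0),
    }


# Made-up labels and scores.
y_demo = [0, 0, 1, 1, 1]
p_demo = [0.1, 0.6, 0.7, 0.4, 0.9]
print(summarize_at_threshold(y_demo, p_demo))
print(summarize_at_threshold(y_demo, p_demo, threshold=0.35))
```

A fully grown tree with pure leaves outputs probabilities of 0 or 1, so moving the cutoff changes little for `dt`; the helper is more informative for Logistic Regression or a pruned tree.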

In [None]:
plt.figure(figsize=(20,10))
plot_tree(dt, feature_names=list(X.columns), class_names=["No", "Yes"], filled=True, max_depth=3)
plt.title("Decision Tree (top 3 levels)")
plt.show()
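`sklearn.tree.export_text` prints a fitted tree's splits as nested if-else rules, which is the form needed for campaign filters. Shown here on synthetic data with hypothetical feature names; in the notebook one would pass `dt_pre` (or `dt`) with `feature_names=list(X.columns)`.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for X_train / y_train; feature names are hypothetical labels.
X_toy, y_toy = make_classification(
    n_samples=300, n_features=4, n_informative=2, n_redundant=0, random_state=0
)
feature_names = ["Income", "CCAvg", "Family", "Education"]

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_toy, y_toy)
rules = export_text(tree, feature_names=feature_names)  # one "class: ..." line per leaf
print(rules)
```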

## 📈 Logistic Regression

In [None]:
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression(max_iter=1000)  # default max_iter=100 may stop before the lbfgs solver converges
lr.fit(X_train, y_train)
y_pred_lr = lr.predict(X_test)
print(classification_report(y_test, y_pred_lr))
print("ROC AUC Score:", roc_auc_score(y_test, lr.predict_proba(X_test)[:,1]))
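Exponentiating Logistic Regression coefficients gives odds ratios, which are easier to discuss than raw log-odds. With the notebook's preprocessing, a standardized column's odds ratio is per one standard deviation, while an unscaled column (the 0/1 flags, `Family`, `Education`) is per one-unit step. A sketch on synthetic data; `odds_ratios` is a hypothetical helper, and in the notebook the call would be `odds_ratios(lr, X_train.columns)`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def odds_ratios(model, feature_names):
    """exp(coef) per feature, sorted; >1 raises the odds of acceptance, <1 lowers them."""
    coefs = model.coef_.ravel()
    return (
        pd.DataFrame({"coef": coefs, "odds_ratio": np.exp(coefs)}, index=feature_names)
        .sort_values("odds_ratio", ascending=False)
    )


# Synthetic data in which only "Income" drives the label.
rng = np.random.default_rng(1)
Xs = pd.DataFrame(rng.normal(size=(500, 3)), columns=["Income", "Age", "Online"])
ys = (Xs["Income"] + 0.3 * rng.normal(size=500) > 0).astype(int)
lr_toy = LogisticRegression(max_iter=1000).fit(Xs, ys)
table = odds_ratios(lr_toy, Xs.columns)
print(table)
```

Odds ratios describe association under the fitted model, not causal effects of a feature on acceptance.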

## 📊 K-Means Clustering

In [None]:
from sklearn.cluster import KMeans

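# Standardize first: K-Means uses Euclidean distance, so unscaled Income and Mortgage would dominate
X_num_scaled = StandardScaler().fit_transform(X[num_cols])
km = KMeans(n_clusters=3, n_init=10, random_state=0)
clusters = km.fit_predict(X_num_scaled)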
df["Cluster"] = clusters
sns.boxplot(data=df, x='Cluster', y='Income')
plt.title("Clustered Income Groups")
plt.show()
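The number of clusters can be checked rather than fixed, and because K-Means label numbers are arbitrary, the high-income segment is best found from a per-cluster profile. A sketch that standardizes the features, compares silhouette scores over a range of k (a common heuristic, not the notebook's stated method), and profiles the result; `scaled_kmeans` is a hypothetical helper and the data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def scaled_kmeans(frame, cols, k_values=range(2, 7), random_state=0):
    """Standardize `cols`, fit K-Means for each k, and keep the k with the best silhouette score."""
    Z = StandardScaler().fit_transform(frame[cols])
    scores, labels = {}, {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(Z)
        scores[k] = silhouette_score(Z, km.labels_)
        labels[k] = km.labels_
    best_k = max(scores, key=scores.get)
    return scores, best_k, labels[best_k]


# Three made-up customer groups in (Income, CCAvg) space.
rng = np.random.default_rng(0)
centers = [(40, 1.0), (100, 3.0), (180, 7.0)]
toy = pd.DataFrame(
    np.vstack([rng.normal(c, (5, 0.3), size=(100, 2)) for c in centers]),
    columns=["Income", "CCAvg"],
)
scores, best_k, labels = scaled_kmeans(toy, ["Income", "CCAvg"])

# Profile clusters; find the highest-income group by its mean Income, not by label number.
profile = toy.assign(Cluster=labels).groupby("Cluster").mean()
top_cluster = profile["Income"].idxmax()
print(best_k, profile, top_cluster, sep="\n")
```

On the bank data the silhouette curve may be much flatter than on these well-separated toy groups, so business interpretability of the segments is a reasonable tie-breaker.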

## ✂️ Decision Tree Pruning (Pre & Post)

In [None]:
# Pre-pruning
dt_pre = DecisionTreeClassifier(max_depth=4, min_samples_leaf=50, random_state=0)
dt_pre.fit(X_train, y_train)
print(classification_report(y_test, dt_pre.predict(X_test)))
print("ROC AUC Score:", roc_auc_score(y_test, dt_pre.predict_proba(X_test)[:,1]))
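For post-pruning, scikit-learn provides minimal cost-complexity pruning: `cost_complexity_pruning_path` lists the effective alphas at which subtrees are pruned away, and `ccp_alpha` is then chosen by cross-validation. A minimal sketch on synthetic data, scored by ROC AUC to match the rest of the notebook; `best_ccp_alpha` is a hypothetical helper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def best_ccp_alpha(X, y, cv=5, random_state=0):
    """Choose ccp_alpha along the cost-complexity pruning path by mean cross-validated ROC AUC."""
    path = DecisionTreeClassifier(random_state=random_state).cost_complexity_pruning_path(X, y)
    # Drop the last alpha (it prunes to a single node) and guard against tiny negative round-off.
    alphas = np.unique(np.clip(path.ccp_alphas[:-1], 0.0, None))
    mean_auc = [
        cross_val_score(
            DecisionTreeClassifier(random_state=random_state, ccp_alpha=a),
            X, y, cv=cv, scoring="roc_auc",
        ).mean()
        for a in alphas
    ]
    best = int(np.argmax(mean_auc))
    return float(alphas[best]), float(mean_auc[best])


# Synthetic, slightly noisy data standing in for X_train / y_train.
X_toy, y_toy = make_classification(n_samples=300, n_features=5, n_informative=3,
                                   flip_y=0.1, random_state=0)
alpha, auc = best_ccp_alpha(X_toy, y_toy)
full = DecisionTreeClassifier(random_state=0).fit(X_toy, y_toy)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_toy, y_toy)
print(f"ccp_alpha={alpha:.4f}  CV ROC AUC={auc:.3f}  leaves: {full.get_n_leaves()} -> {pruned.get_n_leaves()}")
```

In the notebook, `best_ccp_alpha(X_train, y_train)` followed by refitting on `X_train` and evaluating on `X_test` would give a post-pruned counterpart to `dt_pre`.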

## ✅ Recommendations for Marketing

- Focus on **high-income**, **graduate/advanced degree**, and **high CCAvg** customers.
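- Customers in the **highest-income cluster** are prime targets for campaigns. K-Means label numbers are arbitrary, so identify this cluster from the per-cluster Income profile (for example, the box plot above) rather than assuming it is cluster 2.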
- Use the **decision tree rules** to create if-else campaign filters.
- Rank customers by their Logistic Regression predicted probabilities and choose a contact cutoff that balances acquisition cost against expected loan conversions.
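Turning scores into a contact list needs a cost model. A minimal sketch, assuming a flat cost per contacted customer and a flat value per accepted loan (both hypothetical inputs, not figures from this study) and treating predicted probabilities as calibrated: rank customers by score and contact the top n that maximizes expected profit. `best_contact_count` is a hypothetical helper.

```python
import numpy as np


def best_contact_count(proba, cost_per_contact, value_per_loan):
    """How many top-scored customers to contact to maximize expected profit.

    Assumes `proba` are calibrated acceptance probabilities; the two money
    inputs are hypothetical business parameters.
    """
    p = np.sort(np.asarray(proba, dtype=float))[::-1]  # highest scores first
    expected_profit = np.cumsum(p * value_per_loan - cost_per_contact)
    if expected_profit.size == 0 or expected_profit.max() <= 0:
        return 0, 0.0  # no contact list pays off
    n_best = int(np.argmax(expected_profit)) + 1
    return n_best, float(expected_profit[n_best - 1])


# Made-up scores: contacting the top two customers is optimal here.
print(best_contact_count([0.9, 0.05, 0.6, 0.2], cost_per_contact=10, value_per_loan=40))
```

In the notebook, `proba` would be `lr.predict_proba(X_test)[:, 1]`; if the probabilities are poorly calibrated, the ranking still holds but the profit estimate does not.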

## 🏁 Conclusion

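- **Decision Tree**: the pre-pruned tree is the easiest model to explain, and its rules map directly onto campaign filters. Compare its recall and ROC AUC on the test set with those of Logistic Regression before choosing one, since accuracy alone is misleading when loan acceptors are a minority.
- **Logistic Regression** coefficients show the direction and strength of each feature's association with loan acceptance on the log-odds scale, for features such as Income and Education.
- **K-Means** groups customers on age, experience, income, card spending and mortgage, giving segments that marketing can target separately.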

Once its test-set performance is confirmed, the chosen model can be used to score customers for campaign targeting. ✅
