# Acceptance of Personal Loan

Excerpt From: Galit Shmueli. “Data Mining for Business Analytics.”

Universal Bank is a relatively young bank that is growing rapidly in terms of overall customer acquisition. The majority of these customers are liability customers with varying sizes of relationship with the bank. The customer base of asset customers is quite small, and the bank is interested in growing this base rapidly to bring in more loan business. In particular, it wants to explore ways of converting its liability (deposit) customers to personal loan customers.

A campaign that the bank ran for liability customers showed a healthy conversion rate of over 9%. This has encouraged the retail marketing department to devise smarter campaigns with better target marketing. The goal of our analysis is to model customer behavior in the previous campaign and identify which combination of factors makes a customer more likely to accept a personal loan. This will serve as the basis for the design of a new campaign.

The bank’s dataset includes data on 5000 customers. The data include customer demographic information (age, income, etc.), customer response to the last personal loan campaign (Personal Loan), and the customer’s relationship with the bank (mortgage, securities account, etc.).


In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import RandomForestClassifier
from dmba import classificationSummary

In [2]:
bank_df = pd.read_csv('data/UniversalBank.csv')
bank_df.drop(columns=['ID', 'ZIP Code'], inplace=True)
# split into training (60%) and validation (40%) sets
X = bank_df.drop(columns=['Personal Loan'])
y = bank_df['Personal Loan']
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.40, 
                                                      random_state=3)

In [3]:
# single tree
defaultTree = DecisionTreeClassifier(random_state=1)
defaultTree.fit(X_train, y_train)
classes = defaultTree.classes_
classificationSummary(y_valid, defaultTree.predict(X_valid), class_names=classes)

Confusion Matrix (Accuracy 0.9825)

       Prediction
Actual    0    1
     0 1778   15
     1   20  187
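
Accuracy alone hides how the errors are distributed between the two classes. As a rough illustration, the sketch below recomputes accuracy from the four cells of the single-tree matrix above and adds recall and precision for the loan-acceptor class (class 1). The variable names are chosen here for illustration; only the four counts come from the output above.

```python
# Validation confusion matrix reported above for the single tree
# (rows = actual class, columns = predicted class).
tn, fp = 1778, 15   # actual 0: predicted 0, predicted 1
fn, tp = 20, 187    # actual 1: predicted 0, predicted 1

total = tn + fp + fn + tp

# Share of all validation customers classified correctly.
accuracy = (tp + tn) / total
# Share of actual loan acceptors that the tree identified.
recall = tp / (tp + fn)
# Share of predicted acceptors that really accepted the loan.
precision = tp / (tp + fp)

print(f"accuracy  {accuracy:.4f}")
print(f"recall    {recall:.4f}")
print(f"precision {precision:.4f}")
```

The accuracy matches the 0.9825 reported by `classificationSummary`, while recall for class 1 is only about 0.90, which is the figure that matters most when the goal is to find likely acceptors.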


In [4]:
# bagging (bootstrap aggregating): train each tree on a different bootstrap sample of the training set
bagging = BaggingClassifier(DecisionTreeClassifier(random_state=1),
                            n_estimators=100, random_state=1)
bagging.fit(X_train, y_train)
classificationSummary(y_valid, bagging.predict(X_valid), class_names=classes)

Confusion Matrix (Accuracy 0.9855)

       Prediction
Actual    0    1
     0 1781   12
     1   17  190
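
To make the bootstrapping idea behind the bagging cell concrete, here is a minimal standard-library sketch, not the `BaggingClassifier` implementation. It draws one bootstrap sample of row indices and combines hypothetical model votes by simple majority. The training-set size of 3000 follows from the 60/40 split of 5000 customers; the five votes are invented for illustration.

```python
import random
from collections import Counter

n_train = 3000  # 60% of the 5000 customers, matching the split above
rng = random.Random(1)

# A bootstrap sample draws n_train row indices with replacement,
# so some rows repeat and others are left out entirely.
bootstrap_idx = rng.choices(range(n_train), k=n_train)
unique_fraction = len(set(bootstrap_idx)) / n_train
print(f"distinct rows in one bootstrap sample: {unique_fraction:.1%}")


def majority_vote(votes):
    """Return the most common class label among the models' votes."""
    return Counter(votes).most_common(1)[0][0]


# Hypothetical votes from five trees for a single customer.
print(majority_vote([0, 1, 1, 0, 1]))
```

Each bootstrap sample contains roughly 63% of the distinct training rows, which is what makes the individual trees differ. Hard voting is a simplification: scikit-learn's `BaggingClassifier` averages predicted class probabilities when the base estimator provides them.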


In [5]:
# boosting (AdaBoost): train models sequentially, increasing the weights of misclassified records
boost = AdaBoostClassifier(DecisionTreeClassifier(random_state=1),
                           n_estimators=100, random_state=1)
boost.fit(X_train, y_train)
classificationSummary(y_valid, boost.predict(X_valid), class_names=classes)

Confusion Matrix (Accuracy 0.9835)

       Prediction
Actual    0    1
     0 1776   17
     1   16  191
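
The reweighting step mentioned in the AdaBoost comment can be sketched for a single round. This follows the textbook two-class formulation with made-up data (ten records, three of them misclassified); scikit-learn's `AdaBoostClassifier` differs in some details, so treat it as an illustration of the idea rather than a description of the library's internals.

```python
import math

# Ten training records with equal starting weights (hypothetical data).
weights = [0.1] * 10
# Assume the current weak learner misclassifies the last three records.
misclassified = [False] * 7 + [True] * 3

# Weighted error of the current learner.
error = sum(w for w, m in zip(weights, misclassified) if m)
# Learner weight: larger when the weighted error is smaller.
alpha = 0.5 * math.log((1 - error) / error)

# Increase the weights of misclassified records, decrease the rest, then normalise.
updated = [w * math.exp(alpha if m else -alpha)
           for w, m in zip(weights, misclassified)]
norm = sum(updated)
weights = [w / norm for w in updated]

misclassified_share = sum(w for w, m in zip(weights, misclassified) if m)
print(f"alpha = {alpha:.4f}")
print(f"weight on misclassified records after update = {misclassified_share:.2f}")
```

After the update, the three misclassified records carry half of the total weight, so the next learner is pushed to concentrate on them.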


In [6]:
# gradient boosting: fit each new tree to the residual errors of the current ensemble
from sklearn.ensemble import GradientBoostingClassifier

gbrt = GradientBoostingClassifier(max_depth=2, n_estimators=3,
                                  learning_rate=1.0, random_state=1)
gbrt.fit(X_train, y_train)
classificationSummary(y_valid, gbrt.predict(X_valid), class_names=classes)

Confusion Matrix (Accuracy 0.9610)

       Prediction
Actual    0    1
     0 1733   60
     1   18  189
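
The residual-fitting idea behind gradient boosting is easiest to see with squared-error loss on a toy regression problem. The sketch below is an assumption-laden illustration: the data, `fit_stump`, and `predict_stump` are hypothetical, and `GradientBoostingClassifier` itself fits trees to gradients of a classification loss rather than to raw residuals. It reuses `n_estimators=3` and `learning_rate=1.0` from the cell above only for continuity.

```python
def fit_stump(x, r):
    """Fit a one-split regression stump to residuals r; return (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(x))[:-1]:  # skip the largest value so both sides are non-empty
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lm) ** 2 for ri in left) + sum((ri - rm) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]


def predict_stump(stump, xi):
    """Return the leaf mean for the side of the threshold that xi falls on."""
    t, lm, rm = stump
    return lm if xi <= t else rm


# Toy one-dimensional data (hypothetical).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 5.2, 5.0]

n_estimators, learning_rate = 3, 1.0
pred = [sum(y) / len(y)] * len(y)  # start from the overall mean
mse_history = []

for _ in range(n_estimators):
    # Residuals are what the current ensemble still gets wrong.
    residuals = [yi - pi for yi, pi in zip(y, pred)]
    stump = fit_stump(x, residuals)
    # Add the new stump's correction to the running prediction.
    pred = [pi + learning_rate * predict_stump(stump, xi) for xi, pi in zip(x, pred)]
    mse_history.append(sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y))

print([round(m, 4) for m in mse_history])
```

Each round lowers the training error a little further. With only three shallow trees, as in the cell above, the ensemble has had few chances to correct itself, which is consistent with the lower validation accuracy (0.9610) reported for that configuration.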
