Custumer-Churn

Customer Churn Analysis in R: Logistic, Classification Tree, XGBoost, Random Forest.

Content:

Preprocessing & Data cleaning
Exploratory Data Analysis (EDA)
Feature selection & Chi-square Test
Predictive Models: Logistic, Classification Tree, XGBoost, Random Forest
Compare Models’ Performance
Feature Importance

Code:

https://github.com/trajceskijovan/Custumer-Churn/blob/main/Customer%20Churn%20Analysis.R

Context:

This analysis focuses on the behavior of bank customers who are more likely to leave the bank (i.e., close their bank account). The goal is first to understand how churning and staying customers differ through Exploratory Data Analysis, and then to use predictive models to identify the customers who are most likely to churn (leave).

EDA

- CreditScore: from 350 to 850
- Geography: France, Germany and Spain
- Gender: Male, Female
- Age: from 18 to 92
- Tenure: how long the customer has stayed with the bank (groups 0 to 10)
- Balance: the amount of money available for withdrawal
- NumOfProducts: number of bank products the customer uses (1 to 4)
- HasCrCard: 0,1 -> No, Yes (whether the customer has the bank’s credit card)
- IsActiveMember: 0,1 -> Inactive, Active
- EstimatedSalary: the customer’s estimated annual salary
- Exited: whether the customer has churned (closed the bank account), where 0,1 -> Stay, Churn

No NAs: there are no missing values in the data set.

Target [ Stay (0), Churn/Leave (1) ]:

Distribution - Continuous Variables:

Age is somewhat right-skewed
Balance is fairly normally distributed
Most credit scores are above 600, so many churners are likely to be high-quality customers as well
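The skew observations above can be checked numerically with the sample skewness coefficient. The following is a minimal Python sketch; the original analysis is in R. The function name sample_skewness and the toy ages are hypothetical and not taken from the repository. It uses the adjusted Fisher-Pearson coefficient, where a positive value indicates a longer right tail.

```python
import math


def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (G1).

    Positive -> longer right tail (right-skewed); negative -> longer left tail.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant column: no spread, so no skew
    g1 = m3 / m2 ** 1.5
    # small-sample correction: the G1 form that several statistics packages report
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


# Hypothetical toy ages, not the bank data set
ages = [22, 25, 28, 30, 31, 35, 38, 40, 45, 70]
print(round(sample_skewness(ages), 2))
```

A single large value (70 here) stretches the right tail and makes the coefficient positive. This is the numeric counterpart of the histogram shape described for Age.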

Correlation Matrix

No strong correlation between the continuous variables (i.e., no sign of multicollinearity)
We will keep all of the continuous variables
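One way to make the "no strong correlation" check concrete is to compute pairwise Pearson correlations and flag any pair above a cutoff. This Python sketch is illustrative only. The 0.7 threshold is an assumed rule of thumb, not a value from the original R script, and the column names in the example are hypothetical. Pairwise correlations only catch collinearity between two variables. Detecting a variable that is a near-linear combination of several others would need something like the variance inflation factor (VIF).

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Full correlation matrix as a dict keyed by (column_a, column_b)."""
    return {(a, b): pearson(columns[a], columns[b]) for a in columns for b in columns}


def high_correlation_pairs(columns, threshold=0.7):
    """Pairs of distinct columns whose |r| meets the (assumed) threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


# Hypothetical toy columns: "y" is an exact multiple of "x"
cols = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [1, -1, 1, -1]}
print(high_correlation_pairs(cols))
```

An empty result from high_correlation_pairs on the continuous columns is what "we will keep all of the continuous variables" rests on.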

Distribution - Categorical Variables:

More male customers than female customers
Customers are mostly from France
Most customers have the bank’s credit card
Almost equal numbers of active and inactive members, which is not a very good sign
Most customers use one or two products; very few use three or four
Almost equal numbers of customers in each tenure group, except for tenures 0 and 10

Variables Deep Dive:

AGE

Non-churned customers have a right-skewed age distribution (they tend to be younger)
Outliers above 60 years old, perhaps our most stable customers
Churned customers are mostly between 40 and 50. They may be switching to another bank’s services for retirement or family purposes
We can see a very clear difference between these two groups

BALANCE

The distributions of the two groups are similar
Surprisingly, some non-churned customers have lower balances than churned customers

CREDIT SCORE

Similar distributions
Some customers with extremely low credit scores (on the left tail), as well as some with high credit scores, also churned
This suggests that very low- and very high-quality customers churn more easily than average-quality customers

CUSTOMER ESTIMATED SALARY

Both groups have very similar distributions
Estimated salary might not be very useful for deciding whether a customer will churn

CATEGORICAL VARIABLES

Female customers are more likely to churn than male customers
Customers in Germany are more likely to churn than customers in France and Spain
Inactive customers are more likely to churn than active ones
HasCrCard may not be a useful feature, as having a credit card does not seem to tell us whether a customer will churn
Customers in different tenure groups show no apparent tendency to churn or stay
Customers who use 3 or 4 products are extremely likely to churn
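Each comparison above reduces to the churn rate, the share of Exited = 1, within each level of a categorical variable. Here is a minimal Python sketch. It assumes each customer is a dict keyed by the column names from the EDA section. The helper name churn_rate_by and the toy rows are hypothetical.

```python
from collections import defaultdict


def churn_rate_by(rows, feature, target="Exited"):
    """Share of churners (target == 1) within each level of `feature`."""
    counts = defaultdict(lambda: [0, 0])  # level -> [churned, total]
    for row in rows:
        level = row[feature]
        counts[level][0] += row[target]
        counts[level][1] += 1
    return {level: churned / total for level, (churned, total) in counts.items()}


# Hypothetical toy rows, not the bank data set
rows = [
    {"Geography": "Germany", "IsActiveMember": 0, "Exited": 1},
    {"Geography": "Germany", "IsActiveMember": 1, "Exited": 0},
    {"Geography": "France", "IsActiveMember": 1, "Exited": 0},
    {"Geography": "France", "IsActiveMember": 0, "Exited": 1},
    {"Geography": "Spain", "IsActiveMember": 1, "Exited": 0},
    {"Geography": "France", "IsActiveMember": 1, "Exited": 0},
]
print(churn_rate_by(rows, "Geography"))
print(churn_rate_by(rows, "IsActiveMember"))
```

A level is "more likely to churn" when its rate sits clearly above the overall churn rate, which is about 20% per the target distribution.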

Feature selection by using chi-square test:
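The chi-square test of independence asks whether churn and a categorical feature are related. It compares the observed counts in a feature-by-Exited contingency table with the counts expected if the two were independent: expected = row total × column total / grand total. The README does not show the test output, so this Python sketch only illustrates the mechanics. The counts are made up and the function names are hypothetical. Instead of a p-value, the sketch compares the statistic with the standard 5% critical values for 1 to 10 degrees of freedom. That range covers Tenure's 11 levels against Exited.

```python
# Upper 5% critical values of the chi-square distribution, by degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
                6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307}


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Rows are feature levels, columns are target classes (e.g. Exited = 0, 1).
    No continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand  # count under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def keep_feature(table, crit=CHI2_CRIT_05):
    """Keep the feature if independence is rejected at the 5% level."""
    stat, dof = chi_square_independence(table)
    return stat > crit[dof]


# Made-up counts: rows = two feature levels, columns = [Stay, Churn]
print(chi_square_independence([[10, 20], [20, 10]]))
print(keep_feature([[10, 20], [20, 10]]), keep_feature([[15, 15], [15, 15]]))
```

R's chisq.test applies Yates' continuity correction to 2 x 2 tables by default. Its statistic for two-level features such as HasCrCard will therefore be somewhat smaller than this uncorrected one.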

Preprocessing & fixing Class Balance for Train data set:

Split the data using a stratified sampling approach (70/30 ratio)
Inspect the target distribution
Our data set is imbalanced: 80% Stay and 20% Leave/Churn
We first try oversampling and undersampling to balance the training set via the "ovun.sample" function (ROSE package)
Now the data set is balanced; however, undersampling has cost us significant information, because it discards majority-class observations
To fix this, we perform both undersampling and oversampling on the imbalanced data via method = "both"
In this case, the minority class is oversampled with replacement and the majority class is undersampled without replacement
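The split and the combined resampling can be sketched in a few lines of Python. This is an illustration of the idea, not the ROSE implementation. stratified_split and resample_both are hypothetical helpers and the seed is arbitrary. resample_both fixes the class counts at exactly round(n * p) minority rows, whereas ROSE's ovun.sample may differ in detail. Resampling is applied to the training set only, so the 30% test set keeps the original 80/20 class mix.

```python
import random


def stratified_split(rows, target="Exited", train_frac=0.7, seed=42):
    """Split rows so each class keeps (roughly) the same share in train and test."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[target], []).append(row)
    train, test = [], []
    for members in by_class.values():
        members = members[:]  # copy before shuffling
        rng.shuffle(members)
        cut = round(len(members) * train_frac)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


def resample_both(rows, target="Exited", minority=1, p=0.5, n=None, seed=42):
    """Combined over/undersampling in the spirit of ovun.sample(method = "both").

    The minority class is drawn WITH replacement and the majority class
    WITHOUT replacement, giving n rows with a minority share of round(n * p) / n.
    """
    rng = random.Random(seed)
    n = len(rows) if n is None else n
    minority_rows = [r for r in rows if r[target] == minority]
    majority_rows = [r for r in rows if r[target] != minority]
    n_min = round(n * p)
    n_maj = n - n_min
    if n_maj > len(majority_rows):
        raise ValueError("cannot undersample the majority class to a larger size")
    sample = [rng.choice(minority_rows) for _ in range(n_min)]  # oversample
    sample += rng.sample(majority_rows, n_maj)                  # undersample
    rng.shuffle(sample)
    return sample


# Hypothetical 80/20 toy data: ids 0-19 churned, ids 20-99 stayed
customers = [{"id": i, "Exited": 1 if i < 20 else 0} for i in range(100)]
train, test = stratified_split(customers)
balanced = resample_both(train)
print(len(train), len(test), sum(r["Exited"] for r in balanced), len(balanced))
```

Keeping n equal to the original training size means the majority class shrinks and the minority class grows to meet in the middle. Undersampling alone shrinks the data set, which is the loss of information the steps above try to avoid.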

Build and Train Predictive Models:

Logistic Model
Decision Tree
Random Forest
XGBoost
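Of the four models, logistic regression is the simplest to write out. It models P(Exited = 1) as the sigmoid of a linear score and fits the weights by minimising log-loss. In R this is usually done with glm(..., family = binomial), but the README does not show the exact call. The Python sketch below fits the same kind of model with plain batch gradient descent on toy data. The learning rate, epoch count, and helper names are assumptions. Features should be standardised first for gradient descent to behave well.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # gradient of log-loss w.r.t. the linear score is (prediction - label)
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability of churn (class 1) for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical standardised single feature (e.g. scaled age) and churn labels
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w, b = fit_logistic(X, y)
print(predict_proba(w, b, [-2.0]), predict_proba(w, b, [2.0]))
```

The fitted weight's sign shows the direction of the effect. A positive weight on a feature raises the predicted churn probability as the feature grows.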

Compare model performance via ROC:
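An ROC curve plots the true-positive rate against the false-positive rate as the classification threshold moves. The area under the curve (AUC) summarises each model in one number: the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. Below is a minimal Python sketch with hypothetical helper names and made-up scores. The README does not say which R package produced its ROC curves.

```python
def roc_auc(labels, scores):
    """AUC as P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_points(labels, scores):
    """(FPR, TPR) pairs from sweeping the threshold over the distinct scores."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Made-up labels (1 = churn) and model scores
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))
print(roc_points(labels, scores))
```

The pairwise loop is O(positives × negatives), which is fine for illustration. For the full data set, a rank-based formula gives the same AUC faster.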

Feature Importance:

Feature importance is similar under the two approaches
Age has the highest impact on customer churn. The bank could introduce preferential policies to engage these older customers
The number of products used by the customer also affects churn. As discussed in the exploratory analysis, customers who use more than two products are very likely to churn, so the bank should start examining why this is happening. Perhaps the bank’s reward program has limitations, or perhaps its loan interest rates are not very attractive?
Customers from Germany are more likely to churn than customers in France and Spain. The marketing team should focus on retaining German customers
