
Credit Card Customers Churn Project

Project Description

Customer churn (customer attrition) is one of the most challenging problems for businesses such as credit card or telecommunications companies. Building models that predict who is going to churn helps the business: companies can intervene early and avoid losing customers.

In this project, I analyzed a credit card customers dataset and built machine learning (ML) models to predict which customers will churn. The ROC-AUC score of the final model is 0.993.

More details on the dataset are available here. I also wrote a blog post about this project and published it on Kaggle.

Machine Learning Model Pipeline

I compared 6 models (Logistic Regression, Support Vector Machine, K-Nearest Neighbors (KNN), Random Forest (RF), AdaBoost (ADA), and Gradient Boosting Model (GBM)) using four metrics: Accuracy, Recall, Precision, and ROC-AUC score.
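The comparison step can be sketched as below (a minimal sketch, not the exact notebook code; it assumes `X_train`, `X_test`, `y_train`, and `y_test` already exist from a train/test split of the prepared data):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import (
    RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier,
)
from sklearn.metrics import (
    accuracy_score, recall_score, precision_score, roc_auc_score,
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(probability=True),          # probability=True enables predict_proba
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(),
    "AdaBoost": AdaBoostClassifier(),
    "GBM": GradientBoostingClassifier(),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_proba = model.predict_proba(X_test)[:, 1]  # probability of the churn class
    print(
        f"{name}: acc={accuracy_score(y_test, y_pred):.3f} "
        f"recall={recall_score(y_test, y_pred):.3f} "
        f"precision={precision_score(y_test, y_pred):.3f} "
        f"roc_auc={roc_auc_score(y_test, y_proba):.3f}"
    )
```

Each model's predicted churn probability feeds the ROC-AUC score, which is the main selection metric here.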

The GBM model showed the best results among the six models.

[Figure ML1: metric comparison across the six models]

The tuned GBM model shows better performance than the baseline GBM model.

[Figure ML2: baseline vs. tuned GBM performance]

Using the best hyperparameters, the improved GBM model raised the ROC-AUC score from 0.916 (baseline) to 0.993, an absolute gain of about 0.08.
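A hedged sketch of the Optuna tuning step follows; the search space and trial count below are illustrative assumptions, not the notebook's exact settings:

```python
import optuna
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

def objective(trial):
    # Illustrative search space for the GBM hyperparameters
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "max_depth": trial.suggest_int("max_depth", 2, 8),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
    }
    model = GradientBoostingClassifier(**params, random_state=42)
    # Cross-validated ROC-AUC on the (oversampled) training data
    return cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params, study.best_value)
```

The best parameters found this way are then used to refit the final GBM model.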

NOTE: Please see Building_Model_Pipeline.ipynb for more detail.

Dealing with an Imbalanced Dataset

[Figure graph1: class distribution of Existing vs. Attrited customers]

This is a binary classification problem on an imbalanced dataset: 84% of the records are Existing customers and only 16% are Attrited customers, so the imbalance had to be handled carefully.
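The split can be checked directly from the raw file; this sketch assumes the Kaggle file's target column is named `Attrition_Flag`:

```python
import pandas as pd

# Load the raw Kaggle data and inspect the class balance
df = pd.read_csv("Data/BankChurners.csv")
print(df["Attrition_Flag"].value_counts(normalize=True))
# Existing Customer    ~0.84
# Attrited Customer    ~0.16
```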

I compared 5 different over-sampling methods: RandomOverSampler, SMOTE, ADASYN, BorderlineSMOTE, and SVMSMOTE.
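A minimal sketch of that comparison with imbalanced-learn (reusing the `X_train`/`X_test` split from above; only the training split is resampled, and the test split stays untouched):

```python
from imblearn.over_sampling import (
    RandomOverSampler, SMOTE, ADASYN, BorderlineSMOTE, SVMSMOTE,
)
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

samplers = {
    "RandomOverSampler": RandomOverSampler(random_state=42),
    "SMOTE": SMOTE(random_state=42),
    "ADASYN": ADASYN(random_state=42),
    "BorderlineSMOTE": BorderlineSMOTE(random_state=42),
    "SVMSMOTE": SVMSMOTE(random_state=42),
}

for name, sampler in samplers.items():
    # Rebalance the training data, then evaluate on the untouched test split
    X_res, y_res = sampler.fit_resample(X_train, y_train)
    model = GradientBoostingClassifier(random_state=42).fit(X_res, y_res)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: roc_auc={auc:.3f}")
```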

As a result, ADASYN and BorderlineSMOTE showed the best performance, so I chose ADASYN to balance the dataset in the final pipeline.

[Figure graph2: comparison of the five over-sampling methods]

NOTE: Please see Oversampling.ipynb for more detail.

Visualization and EDA

Creating graphs helps answer the questions below:

  1. Are there differences between the Attrited (churn) and Existing groups for each feature? Some features show clear differences between the two groups.

[Figures graph2, graph3, and graph4: feature distributions for the Attrited vs. Existing groups]

  2. Are there correlations between the numerical variables?

[Figure graph5: correlation heatmap of the numerical variables]
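The correlation check can be reproduced with a few lines of seaborn, reusing the `df` loaded earlier (a sketch, not the notebook's exact plot settings):

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Correlation matrix over the numerical columns only
corr = df.select_dtypes(include="number").corr()

plt.figure(figsize=(12, 10))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlations between numerical variables")
plt.tight_layout()
plt.show()
```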

NOTE: Please see Cleaning_EDA_and_Visualization.ipynb for more detail.

File Description

The file structure is arranged as below:

- Data
    - BankChurners.csv: raw data 
    
- Building_Model_Pipeline.ipynb
    : Workflow for building the Gradient Boosting model pipeline.
- Cleaning_EDA_and_Visualization.ipynb
    : Data cleaning, exploratory data analysis, and data visualization.
- FeatureEngineeringScaling.ipynb
    : Investigations of feature engineering, feature scaling, and feature importance.
- HyperparameterTuningForGBM.ipynb
    : Hyperparameter tuning with Optuna to find the best parameters for the GBM model.
- Oversampling.ipynb
    : Finding the best over-sampling method to deal with this project's imbalanced dataset.
- readme.md

Dependencies

  • Python 3.5+
  • Machine Learning Libraries:
    • Numpy
    • Pandas
    • Scikit-learn
    • Feature-engine
    • Imbalanced-learn (imblearn)
    • Optuna
  • Visualization: Matplotlib, Seaborn

Acknowledgements

Data was provided by Kaggle.
