Freddie Mac Single Family Loan Classification Project

Summary:

Using data collected at loan origination, I predict whether a loan will default. At 75% recall, my model cuts the false positive rate almost in half compared with a strict credit score cutoff. This project is published in Towards Data Science on Medium: Link.

Aim

To predict whether a loan is good or bad using data collected when the loan is originated.

Dataset

Single Family Loan-Level dataset downloaded from Freddie Mac's website

Year range: 1999-2003. Only completed/terminated loans are used. A good loan is one that has been fully paid off; a bad loan is one that was terminated for any other reason. Raw data is stored in a SQLite database.
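The good/bad labeling rule can be sketched directly against SQLite. This is a minimal illustration only: the table name `loans`, its columns, and the use of termination code "01" to mean "paid off in full" are assumptions made for the example, not the schema used in the notebook.

```python
import sqlite3

# Hypothetical schema: one row per terminated loan, plus the code that
# records why its balance went to zero. All names here are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loans (loan_id TEXT, credit_score INTEGER, zero_balance_code TEXT)"
)
conn.executemany(
    "INSERT INTO loans VALUES (?, ?, ?)",
    [("A1", 760, "01"), ("A2", 610, "09"), ("A3", 700, "01"), ("A4", 640, "03")],
)

# Assumption: code '01' marks a fully paid-off (good) loan;
# any other termination code is labeled as a bad loan (is_bad = 1).
rows = conn.execute(
    """
    SELECT loan_id,
           CASE WHEN zero_balance_code = '01' THEN 0 ELSE 1 END AS is_bad
    FROM loans
    ORDER BY loan_id
    """
).fetchall()
conn.close()
print(rows)
```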

Preprocessing

Map letter values to numeric values in true/false columns
Clear NaN fields
Label-encode categorical fields
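The three steps above might look like the following in pandas. The column names are hypothetical, and dropping rows is only one reading of "clear NaN fields" (imputing missing values would be another).

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical column names.
df = pd.DataFrame({
    "first_time_buyer": ["Y", "N", "Y", "N"],
    "property_type": ["SF", "CO", "SF", None],
    "credit_score": [720, 650, np.nan, 700],
})

# 1. Map letter flags to numeric values.
df["first_time_buyer"] = df["first_time_buyer"].map({"Y": 1, "N": 0})

# 2. Clear missing fields (here: drop any row that contains a NaN).
df = df.dropna().reset_index(drop=True)

# 3. Label-encode categorical fields as integer codes.
df["property_type"] = df["property_type"].astype("category").cat.codes

print(df)
```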

Exploratory Data Analysis

Distribution of Credit Score by Loan Outcome

Under-sampling

Subsample majority class (good loans) to match minority class (bad loans).
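A minimal sketch of random under-sampling with pandas, assuming a binary label column named `is_bad` (a hypothetical name) where 1 marks a bad loan:

```python
import pandas as pd

# Toy data: 8 good loans (is_bad = 0) and 2 bad loans (is_bad = 1).
df = pd.DataFrame({
    "credit_score": range(600, 700, 10),
    "is_bad": [0] * 8 + [1] * 2,
})

bad = df[df["is_bad"] == 1]
good = df[df["is_bad"] == 0]

# Draw as many good loans as there are bad loans, without replacement.
good_down = good.sample(n=len(bad), random_state=0)

# Combine and shuffle so the two classes are interleaved.
balanced = pd.concat([good_down, bad]).sample(frac=1, random_state=0)
print(balanced["is_bad"].value_counts())
```

Under-sampling discards most of the majority class, so it trades information for a balanced, smaller training set.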

Over-sampling

Resample minority class (bad loans) to match majority class (good loans).
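Random over-sampling is the mirror image: bad loans are drawn with replacement until they match the number of good loans. As before, the `is_bad` column name is an assumption for illustration.

```python
import pandas as pd

# Toy data: 8 good loans (is_bad = 0) and 2 bad loans (is_bad = 1).
df = pd.DataFrame({
    "credit_score": range(600, 700, 10),
    "is_bad": [0] * 8 + [1] * 2,
})

bad = df[df["is_bad"] == 1]
good = df[df["is_bad"] == 0]

# Draw bad loans with replacement until they match the good-loan count.
bad_up = bad.sample(n=len(good), replace=True, random_state=0)

# Combine and shuffle so the two classes are interleaved.
balanced = pd.concat([good, bad_up]).sample(frac=1, random_state=0)
print(balanced["is_bad"].value_counts())
```

Because over-sampling duplicates rows, it is usually applied to the training split only, so that copies of the same loan do not end up in both training and test data.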

Classifiers for Balanced Data

Gradient Boosting Classifier
Random Forest Classifier
SGD Classifier (Logistic Classifier)
Hard Voting Classifier of all of the above
Light GBM

Isolation Forest

Can we solve this problem with an anomaly detection algorithm?
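One way to frame the question is to treat bad loans as anomalies and flag whatever scikit-learn's `IsolationForest` isolates most easily. The sketch below uses synthetic features; the `contamination` value is an assumed share of bad loans, not a figure from the project.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Synthetic features: 200 "typical" loans plus 10 far from the bulk.
X = np.vstack([
    rng.normal(0, 1, size=(200, 2)),
    rng.normal(6, 1, size=(10, 2)),
])

# contamination is the assumed fraction of anomalies in the data.
iso = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
labels = iso.fit_predict(X)  # +1 = inlier, -1 = anomaly

flagged_bad = labels == -1
print("loans flagged as anomalous:", int(flagged_bad.sum()))
```

The model is unsupervised: it never sees the good/bad labels, so its flags can only be compared with the true outcomes after the fact.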

Classifiers for Imbalanced Data

Balanced Bagging Classifier
Balanced Random Forest Classifier
Easy Ensemble Classifier
Voting Classifier
Light GBM

Compared to Credit Score Cutoff

How does the model fare against the status quo?
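The comparison at a fixed recall can be computed from the ROC curve: pick the first threshold at which recall reaches 75%, then read off the false positive rate. In the sketch below, `fpr_at_recall` is a hypothetical helper and the numbers are toy values, not project results. A credit score is negated so that a higher value means higher risk, which is what `roc_curve` expects.

```python
import numpy as np
from sklearn.metrics import roc_curve


def fpr_at_recall(y_true, risk_score, target_recall=0.75):
    """False positive rate at the first threshold whose recall reaches the target."""
    fpr, tpr, _ = roc_curve(y_true, risk_score, drop_intermediate=False)
    idx = int(np.argmax(tpr >= target_recall))  # first point with enough recall
    return float(fpr[idx])


# Toy data: 1 = bad loan, 0 = good loan.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
credit_score = np.array([600, 620, 700, 720, 650, 710, 750, 780])
model_risk = np.array([0.9, 0.8, 0.7, 0.2, 0.6, 0.3, 0.1, 0.05])

# Baseline: reject every loan below some credit score cutoff.
baseline_fpr = fpr_at_recall(y_true, -credit_score)
# Model: reject every loan above some predicted-risk cutoff.
model_fpr = fpr_at_recall(y_true, model_risk)

print("credit score cutoff FPR:", baseline_fpr)
print("model FPR:", model_fpr)
```

Fixing recall first makes the two approaches comparable: both catch the same share of bad loans, and the false positive rate shows how many good loans each one turns away to do so.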
