LendingClub

Detecting defaults and predicting interest rates in Lending Club loan data

Summary of analysis:

  • Extracted features from raw Lending Club loan data of different types (categorical, numerical and time-series), and imputed missing values using the Multivariate Imputation by Chained Equations (MICE) algorithm.

  • Performed feature selection through exploratory analysis.

  • Fitted a linear regression model to predict loan interest rate, using regularization to control for multicollinearity.

  • At the time of the initial loan application, predicted whether the loan will be charged off or default.

  • Throughout the loan repayment period, predicted whether the next payment will be missed, or whether the loan status will change in the next quarter.
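The README does not say which MICE implementation was used (in R, the `mice` package is a common choice). As a minimal sketch of the chained-equations idea only, the code below works on two toy numeric columns: every missing entry starts at its column's observed mean, then each incomplete column is repeatedly re-predicted by regressing it on the other column. This is a deterministic, single-imputation simplification; full MICE draws imputations stochastically and produces several completed datasets. The function names and the toy data are hypothetical.

```python
from typing import List, Optional, Tuple


def simple_ols(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Least-squares fit y ~ intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    return my - slope * mx, slope


def chained_impute(a: List[Optional[float]], b: List[Optional[float]],
                   n_iter: int = 20) -> Tuple[List[float], List[float]]:
    """Hypothetical two-column chained-equations imputation (toy MICE sketch)."""
    miss_a = [v is None for v in a]
    miss_b = [v is None for v in b]
    obs_a = [v for v in a if v is not None]
    obs_b = [v for v in b if v is not None]
    # Step 0: fill every missing entry with its column's observed mean.
    fa = [sum(obs_a) / len(obs_a) if m else v for v, m in zip(a, miss_a)]
    fb = [sum(obs_b) / len(obs_b) if m else v for v, m in zip(b, miss_b)]

    def update(target, miss, predictor):
        # Fit only on rows where the target was originally observed,
        # then overwrite only the originally missing entries.
        rows = [i for i, m in enumerate(miss) if not m]
        c, s = simple_ols([predictor[i] for i in rows], [target[i] for i in rows])
        return [c + s * predictor[i] if m else target[i] for i, m in enumerate(miss)]

    for _ in range(n_iter):
        fa = update(fa, miss_a, fb)  # impute a from the current b
        fb = update(fb, miss_b, fa)  # impute b from the freshly updated a
    return fa, fb
```

On toy data where b is exactly 2a, for example `chained_impute([1.0, 2.0, None, 4.0, 5.0], [2.0, 4.0, 6.0, None, 10.0])`, the imputed values settle near a[2] = 3 and b[3] = 8 after a few iterations, while observed entries are never changed.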
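The README mentions regularization against multicollinearity for the interest-rate model but does not name the penalty. Ridge (L2) regression is one standard option, so the sketch below assumes it: it centers the predictors and target (leaving the intercept unpenalized) and solves `(X^T X + lam * I) beta = X^T y`. With two perfectly collinear columns, ordinary least squares has no unique solution, while ridge splits the weight evenly between them. `ridge_fit`, `solve` and `lam` are illustrative names, not taken from the project.

```python
from typing import List, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_fit(X: List[List[float]], y: List[float], lam: float) -> Tuple[float, List[float]]:
    """Ridge regression with an unpenalized intercept; returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # X^T X + lam * I: the penalty on the diagonal makes the system solvable
    # even when columns are collinear.
    xtx = [[sum(Xc[i][r] * Xc[i][c] for i in range(n)) + (lam if r == c else 0.0)
            for c in range(p)] for r in range(p)]
    xty = [sum(Xc[i][r] * yc[i] for i in range(n)) for r in range(p)]
    beta = solve(xtx, xty)
    intercept = y_mean - sum(bj * mj for bj, mj in zip(beta, x_mean))
    return intercept, beta
```

For example, `ridge_fit([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]], [2.0, 4.0, 6.0, 8.0], lam=1.0)` gives each duplicated column a coefficient of 10/11 (about 0.909), so their combined effect (about 1.82) is shrunk slightly below the true slope of 2. In practice `lam` would be chosen by cross-validation.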

Data:

The data come from the Kaggle Lending Club Loan Data set (https://www.kaggle.com/wendykan/lending-club-loan-data).

It contains complete loan data for all loans issued from 2007 through 2015, including the current loan status (Current, Late, Fully Paid, etc.) and the latest payment information.
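Predicting charge-off or default at application time needs a binary target derived from the loan status. A minimal sketch, assuming the status strings look like "Charged Off", "Default" and "Fully Paid" (some rows may carry a "Does not meet the credit policy. Status:" prefix): loans still in repayment (Current, Late, etc.) have no final outcome yet and are returned as `None` so they can be excluded from training. The mapping and the names below are hypothetical; the README does not state how the target was defined.

```python
from typing import Optional

# Hypothetical mapping of final outcomes; unresolved statuses are left out.
BAD_OUTCOMES = {"charged off", "default"}
GOOD_OUTCOMES = {"fully paid"}


def default_label(loan_status: str) -> Optional[int]:
    """Return 1 for charged off/default, 0 for fully paid, None if unresolved."""
    # Drop an assumed "Does not meet the credit policy. Status:" prefix.
    status = loan_status.lower().split("status:")[-1].strip()
    if status in BAD_OUTCOMES:
        return 1
    if status in GOOD_OUTCOMES:
        return 0
    return None  # e.g. "Current", "Late (31-120 days)": outcome not yet known
```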

The file containing loan data through the "present" covers all loans issued up to the end of the most recently completed calendar quarter.
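The "most recently completed calendar quarter" cutoff is simple date arithmetic: find the first day of the quarter containing a given date and step back one day. The sketch below assumes issue dates are stored as month-year strings such as "Dec-2011"; that format and both function names are assumptions for illustration.

```python
from datetime import date, datetime, timedelta


def last_completed_quarter_end(today: date) -> date:
    """Last day of the most recent calendar quarter that has fully ended."""
    # First month of the quarter containing `today`: 1, 4, 7 or 10.
    q_start_month = 3 * ((today.month - 1) // 3) + 1
    q_start = date(today.year, q_start_month, 1)
    # The day before the current quarter starts closes the previous quarter.
    return q_start - timedelta(days=1)


def issued_by_cutoff(issue_d: str, today: date) -> bool:
    """True if a loan issued in month `issue_d` (assumed "Mon-YYYY") is in the snapshot."""
    issued = datetime.strptime(issue_d, "%b-%Y").date()  # "Dec-2011" -> 2011-12-01
    return issued <= last_completed_quarter_end(today)
```

For a snapshot taken on 15 January 2016, the cutoff is 31 December 2015, so a loan issued in "Dec-2015" is included and one issued in "Jan-2016" is not.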

Additional features include credit scores, the number of finance inquiries, address (zip code and state), and collections, among others.

The file is a table of about 890,000 observations (rows) and 75 variables (columns).
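Before feature selection and imputation, it helps to confirm the table's size and see how much of each column is missing. A minimal sketch using only Python's standard `csv` module, which streams rows so memory stays flat at this scale; `profile_csv` is a hypothetical helper, and treating blank strings as missing is an assumption about how the CSV encodes gaps.

```python
import csv
from typing import Dict, TextIO, Tuple


def profile_csv(f: TextIO) -> Tuple[int, int, Dict[str, float]]:
    """Return (n_rows, n_columns, fraction of missing values per column)."""
    reader = csv.DictReader(f)
    cols = reader.fieldnames or []
    missing = {c: 0 for c in cols}
    n = 0
    for row in reader:  # one row at a time; the file is never fully loaded
        n += 1
        for c in cols:
            value = row[c]
            # Short rows yield None; empty or whitespace-only cells count as missing.
            if value is None or value.strip() == "":
                missing[c] += 1
    rates = {c: (missing[c] / n if n else 0.0) for c in cols}
    return n, len(cols), rates
```

Called as `with open("loan.csv", newline="") as f: profile_csv(f)` (the file name is an assumption about the Kaggle download), it should report roughly 890,000 rows and 75 columns, and the per-column missing rates indicate which variables to drop and which to pass to the imputation step.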
