vikashnin/Credit_Risk_Analysis

Credit_Risk_Analysis

Overview of Analysis:

Using the credit card credit dataset from LendingClub, a peer-to-peer lending services company, I will oversample the data with the RandomOverSampler and SMOTE algorithms and undersample it with the ClusterCentroids algorithm. I will then apply a combined over- and undersampling approach using the SMOTEENN algorithm. Next, I will compare two machine learning models that reduce bias, BalancedRandomForestClassifier and EasyEnsembleClassifier, to predict credit risk. Finally, I will evaluate the performance of these models and make a written recommendation on whether they should be used to predict credit risk.

Results

1. Naive Random Oversampling

Balanced Accuracy Score: 0.6742571941946299. Precision Score: precision is low for high_risk loans and high for low_risk loans. Recall Score: high_risk 0.74, low_risk 0.61.
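To make the idea behind naive random oversampling concrete, here is a minimal pure-Python sketch. It is an illustration of the technique only, not the imbalanced-learn `RandomOverSampler` implementation used in the analysis; the helper name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class
    until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res, y_res = list(X), list(y)
    for label, n in counts.items():
        # Rows belonging to this class in the original data.
        rows = [x for x, lab in zip(X, y) if lab == label]
        # Draw with replacement until the class reaches the target size.
        for _ in range(target - n):
            X_res.append(rng.choice(rows))
            y_res.append(label)
    return X_res, y_res

# Hypothetical toy data: 2 high_risk rows against 6 low_risk rows.
X = [[i] for i in range(8)]
y = ["high_risk"] * 2 + ["low_risk"] * 6
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))  # both classes now have 6 rows
```

Resampling of this kind is normally applied to the training split only, so that the test set keeps the original class balance.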

2. SMOTE Oversampling

Balanced Accuracy Score: 0.6623356588465208. Precision Score: precision is low for high_risk loans and high for low_risk loans. Recall Score: high_risk 0.63, low_risk 0.69.

3. Undersampling

Balanced Accuracy Score: 0.5442166848817717. Precision Score: precision is low for high_risk loans and high for low_risk loans. Recall Score: high_risk 0.69, low_risk 0.40.

4. Combination (Under and Over) Sampling

Balanced Accuracy Score: 0.5441784794709592. Precision Score: precision is low for high_risk loans and high for low_risk loans. Recall Score: high_risk 0.72, low_risk 0.57.

5. Balanced Random Forest Classifier

Balanced Accuracy Score: 0.7885466545953005. Precision Score: precision is low for high_risk loans and high for low_risk loans. Recall Score: high_risk 0.70, low_risk 0.87.

6. Easy Ensemble AdaBoost Classifier

Balanced Accuracy Score: 0.9316600714093861. Precision Score: precision is low for high_risk loans and high for low_risk loans. Recall Score: high_risk 0.92, low_risk 0.94.
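Balanced accuracy is the unweighted mean of the per-class recall values, so the score can be sanity-checked from the recalls. The short sketch below does this for the Easy Ensemble figures reported here; the helper name `balanced_accuracy` is hypothetical, and the recalls are rounded to two decimals, so only approximate agreement is expected.

```python
def balanced_accuracy(recalls):
    """Unweighted mean of the per-class recall values."""
    return sum(recalls) / len(recalls)

# Values reported above for the Easy Ensemble AdaBoost Classifier.
reported_score = 0.9316600714093861
recall_high_risk, recall_low_risk = 0.92, 0.94

estimate = balanced_accuracy([recall_high_risk, recall_low_risk])
# The recalls are rounded to two decimals, so allow a small tolerance.
print(abs(estimate - reported_score) < 0.005)
```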

Summary

All of the models showed very poor precision for predicting credit risk. Balanced accuracy scores ranged from about 54% to 93%. Based on these results, the best model to use is the Easy Ensemble AdaBoost Classifier, because overall it had the highest scores for balanced accuracy, precision, and recall.
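The combination of high recall and low precision for high_risk loans is typical when one class is rare. The sketch below uses hypothetical confusion-matrix counts (not taken from the LendingClub data) chosen so that the recalls match the 0.92 and 0.94 reported for the Easy Ensemble model; the function name `class_metrics` is also an assumption for illustration.

```python
def class_metrics(tp, fn, fp, tn):
    """Precision and recall for the positive (high_risk) class,
    plus recall for the negative (low_risk) class."""
    return {
        "precision_high_risk": tp / (tp + fp),
        "recall_high_risk": tp / (tp + fn),
        "recall_low_risk": tn / (tn + fp),
    }

# Hypothetical counts: 100 high_risk loans and 10,000 low_risk loans.
metrics = class_metrics(tp=92, fn=8, fp=600, tn=9400)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these assumed counts, misclassifying only 6% of the low_risk loans still produces far more false positives than there are true high_risk loans, which is why precision for the rare class stays low even when both recalls are high.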
