Using the credit card credit dataset from LendingClub, a peer-to-peer lending services company, I will oversample the data using the RandomOverSampler and SMOTE algorithms and undersample the data using the ClusterCentroids algorithm. Next, I will use a combinatorial approach of over- and undersampling with the SMOTEENN algorithm. Then I will compare two machine learning models that reduce bias, BalancedRandomForestClassifier and EasyEnsembleClassifier, to predict credit risk. Finally, I will evaluate the performance of these models and make a written recommendation on whether they should be used to predict credit risk.
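To make the simplest of these resampling ideas concrete, here is a minimal, standard-library-only sketch of naive random oversampling. It randomly duplicates minority-class rows, drawn with replacement, until every class is as large as the majority class. This is an illustration of the concept behind RandomOverSampler, not imbalanced-learn's implementation. The function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=1):
    """Hypothetical helper: duplicate randomly chosen rows of each smaller
    class (sampling with replacement) until every class has as many rows
    as the largest class. Original rows are kept in their original order."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res, y_res = list(X), list(y)
    for label, count in counts.items():
        # Indices of the rows that belong to this class.
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            j = rng.choice(idx)  # pick an existing row of this class at random
            X_res.append(X[j])
            y_res.append(label)
    return X_res, y_res


# Hypothetical toy data: 6 low_risk rows and 2 high_risk rows.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [5.0], [6.0]]
y = ["low_risk"] * 6 + ["high_risk"] * 2

X_res, y_res = random_oversample(X, y)
print(Counter(y_res))  # both classes now have 6 rows
```

Because the added rows are exact copies, this approach can encourage overfitting to the few minority examples. SMOTE avoids exact copies by interpolating new synthetic points between minority neighbours instead.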
Naive Random Oversampling (RandomOverSampler)
Balanced accuracy score: 0.6742571941946299
Precision: low for high_risk loans, high for low_risk loans
Recall: high_risk 0.74, low_risk 0.61

SMOTE Oversampling
Balanced accuracy score: 0.6623356588465208
Precision: low for high_risk loans, high for low_risk loans
Recall: high_risk 0.63, low_risk 0.69

Undersampling (ClusterCentroids)
Balanced accuracy score: 0.5442166848817717
Precision: low for high_risk loans, high for low_risk loans
Recall: high_risk 0.69, low_risk 0.40

Combination Over- and Undersampling (SMOTEENN)
Balanced accuracy score: 0.5441784794709592
Precision: low for high_risk loans, high for low_risk loans
Recall: high_risk 0.72, low_risk 0.57

Balanced Random Forest Classifier
Balanced accuracy score: 0.7885466545953005
Precision: low for high_risk loans, high for low_risk loans
Recall: high_risk 0.70, low_risk 0.87

Easy Ensemble AdaBoost Classifier
Balanced accuracy score: 0.9316600714093861
Precision: low for high_risk loans, high for low_risk loans
Recall: high_risk 0.92, low_risk 0.94

All of the models showed very poor precision for high_risk loans. Balanced accuracy scores ranged from about 54% to 93%. From my observations, the best model to use is the Easy Ensemble AdaBoost Classifier. Overall it had the best scores for balanced accuracy, precision, and recall, including the highest balanced accuracy (0.93) and the highest recall for both high_risk (0.92) and low_risk (0.94) loans.
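The metrics above fit together in a simple way. For a two-class problem, balanced accuracy is the unweighted mean of the per-class recalls. For example, the Easy Ensemble recalls give (0.92 + 0.94) / 2 = 0.93, in line with its reported 0.9317 (the recalls are rounded). The sketch below computes per-class precision, recall, and balanced accuracy from a confusion matrix using only the standard library. The confusion-matrix counts are hypothetical, chosen only to illustrate the effect, and are not this project's results. The helper names are also hypothetical.

```python
def per_class_metrics(cm, labels):
    """cm[i][j] = number of samples whose true class is labels[i]
    and whose predicted class is labels[j]."""
    n = len(labels)
    metrics = {}
    for k, label in enumerate(labels):
        tp = cm[k][k]
        actual = sum(cm[k])                          # row sum: all true samples of this class
        predicted = sum(cm[i][k] for i in range(n))  # column sum: all predictions of this class
        metrics[label] = {
            "precision": tp / predicted if predicted else 0.0,
            "recall": tp / actual if actual else 0.0,
        }
    return metrics


def balanced_accuracy(cm, labels):
    # Balanced accuracy is the unweighted mean of the per-class recalls.
    m = per_class_metrics(cm, labels)
    return sum(m[label]["recall"] for label in labels) / len(labels)


# Hypothetical confusion matrix (NOT the project's results):
# rows = true class, columns = predicted class, in the order of `labels`.
labels = ["high_risk", "low_risk"]
cm = [
    [80, 20],    # 100 true high_risk loans: 80 caught, 20 missed
    [150, 850],  # 1000 true low_risk loans: 150 wrongly flagged as high_risk
]

for label, m in per_class_metrics(cm, labels).items():
    print(f"{label}: precision={m['precision']:.2f} recall={m['recall']:.2f}")
print(f"balanced accuracy={balanced_accuracy(cm, labels):.3f}")
```

This shows why high_risk precision can stay low even when its recall is good. Low_risk loans outnumber high_risk loans ten to one in this toy matrix, so wrongly flagging just 15% of them produces 150 false alarms against only 80 true high_risk hits. That gives a high_risk precision of 80 / 230, or about 0.35.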





