#Evaluating the Credit Risk of Borrowers with Random Forest Machine Learning Models

#In this demonstration we highlight the importance of feature selection and the impact it can have on a model, independent of data augmentation, hyperparameter tuning, algorithm blending, etc.

#We use the Python library GraphLab Create, backed by a C++ engine, for quickly building large-scale, high-performance data products.

In [2]:
import numpy as np
import pandas as pd

In [3]:
import graphlab
graphlab.canvas.set_target('ipynb')

In [4]:
loanData = graphlab.SFrame('lc-data.gl/')



[INFO] graphlab.cython.cy_server: GraphLab Create v2.1 started. Logging: /tmp/graphlab_server_1504192733.log


In [7]:
# Quick Summary overview of the distributions of our feature sets.

In [6]:
loanData.show(view="Summary")

In [7]:
numberOfFeatures = len(loanData.column_names())
numberOfRows = len(loanData)
print 'The number of features in our data set is: ',numberOfFeatures
print 'The number of rows is: ',numberOfRows

The number of features in our data set is:  68
The number of rows is:  122607


#We will try to determine the characteristics of a safe loan.

#We will defer a deep dive into individual features until a preliminary model has shown which ones have the greatest impact on our predictions. We begin by evaluating the structure of the data.

#We must recode the loans column so GraphLab Create can pick up the class labels: safe loans become +1 and bad loans become -1.

In [9]:
loanData['safe_loans'] = loanData['bad_loans'].apply(lambda x : +1 if x==0 else -1)
loanData = loanData.remove_column('bad_loans')

In [10]:
loanData['safe_loans'].show(view = 'Categorical')

#The data is heavily weighted towards good loans, with approximately 81% safe and 18% risky. On data this imbalanced, a model that always predicts "safe" would already score about 81% accuracy, so raw accuracy would be misleading. We therefore rebalance the data to roughly 50/50 safe and risky loans.

#We re-sample the data after a random shuffle to minimize potential sampling bias. To keep performance consistent, we divide the safe loans into four subsets, pair each subset with the full set of risky loans, and train a model on each of the four resulting balanced sets.
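#A minimal sketch of the rebalancing idea in plain Python, independent of pandas and GraphLab. The function name `balanced_subsets`, its parameters and the toy label lists are illustrative assumptions, not part of the notebook.

```python
import random

def balanced_subsets(safe, risky, k=4, seed=0):
    """Split `safe` into k near-equal chunks and pair each chunk with all of `risky`."""
    rng = random.Random(seed)
    shuffled = list(safe)
    rng.shuffle(shuffled)                      # shuffle before splitting
    chunk, extra = divmod(len(shuffled), k)
    subsets, start = [], 0
    for i in range(k):
        size = chunk + (1 if i < extra else 0)  # the first `extra` chunks get one more row
        subsets.append(list(risky) + shuffled[start:start + size])
        start += size
    return subsets

# Toy labels with roughly the 81/19 imbalance described above.
safe = [+1] * 81
risky = [-1] * 19
for subset in balanced_subsets(safe, risky):
    share_safe = subset.count(+1) / float(len(subset))
    print(len(subset), round(share_safe, 2))
```

#Every risky loan appears in all four subsets while each safe loan appears in exactly one, which is what brings each subset close to a 50/50 class mix.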

In [11]:
# setting target data
target = 'safe_loans'

In [12]:
allLoans = loanData.to_dataframe()

In [13]:
# custom encoder: maps each string category to an integer code (label encoding, not true one-hot encoding)
def oneHotEncoding(dframe):
    from tqdm import tqdm
    StringFeatures=[]
    vlength = len(dframe)
    hlength = len(dframe.columns)
    [StringFeatures.append(col) for col in dframe.columns if isinstance(dframe[col][0],str)==True]
    featuresDict = {}
    numFeatures=len(StringFeatures)
    print("The number of features are: ",numFeatures)
    for feature in dframe.columns:
        iVars = set(dframe[feature])
        print('created ivars set')
        iVarsLength = len(iVars)
        if feature in StringFeatures:
            print('The current feature transforming: ',feature) 
            if iVarsLength<=50:
                newVals = dict(list(enumerate(iVars)))
                ListNewVal=[]
                [ListNewVal.append(x) for x in newVals]
                print ('New Values: ',ListNewVal)
                reverse_dict = {v:k for k,v in newVals.iteritems()}

                for m in tqdm(range(vlength)):
                    #for k in range(iVarsLength):
                    for newVal in reverse_dict:
                        #print ('comparing: %s with: %s')%(newVal,dframe[feature].iloc[m])
                        if newVal == dframe[feature].iloc[m]:
                            dframe=dframe.set_value(m,feature,reverse_dict[newVal])
                            #print ('the new val set was: ',reverse_dict[newVal])
                featuresDict[feature]=reverse_dict
            else:
                print('more than 50 vars. deleting: ',feature)
                del dframe[feature]
        else:
            print ('skipping %s, not a string feature')%(feature)
    return (featuresDict,dframe)

In [None]:
featuresDict,dframe=oneHotEncoding(allLoans)

  2%|▏         | 2888/122607 [00:00<00:04, 28873.63it/s]

('The number of features are: ', 24)
created ivars set
skipping id, not a string feature
created ivars set
skipping member_id, not a string feature
created ivars set
skipping loan_amnt, not a string feature
created ivars set
skipping funded_amnt, not a string feature
created ivars set
skipping funded_amnt_inv, not a string feature
created ivars set
('The current feature transforming: ', 'term')
('New Values: ', [0, 1])


100%|██████████| 122607/122607 [00:04<00:00, 29886.14it/s]
  1%|          | 978/122607 [00:00<00:12, 9770.53it/s]

created ivars set
skipping int_rate, not a string feature
created ivars set
skipping installment, not a string feature
created ivars set
('The current feature transforming: ', 'grade')
('New Values: ', [0, 1, 2, 3, 4, 5, 6])


100%|██████████| 122607/122607 [00:12<00:00, 9532.09it/s]
  0%|          | 195/122607 [00:00<01:03, 1941.55it/s]

created ivars set
('The current feature transforming: ', 'sub_grade')
('New Values: ', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34])


100%|██████████| 122607/122607 [01:02<00:00, 1951.70it/s]
  0%|          | 534/122607 [00:00<00:22, 5325.78it/s]

created ivars set
('The current feature transforming: ', 'emp_title')
('more than 50 vars. deleting: ', 'emp_title')
created ivars set
('The current feature transforming: ', 'emp_length')
('New Values: ', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])


100%|██████████| 122607/122607 [00:22<00:00, 5476.41it/s]
  1%|          | 1507/122607 [00:00<00:08, 15066.85it/s]

created ivars set
('The current feature transforming: ', 'home_ownership')
('New Values: ', [0, 1, 2, 3])


100%|██████████| 122607/122607 [00:07<00:00, 16333.72it/s]
  2%|▏         | 1972/122607 [00:00<00:06, 19717.25it/s]

created ivars set
skipping annual_inc, not a string feature
created ivars set
('The current feature transforming: ', 'is_inc_v')
('New Values: ', [0, 1, 2])


100%|██████████| 122607/122607 [00:05<00:00, 21054.11it/s]
  1%|          | 1332/122607 [00:00<00:09, 13317.35it/s]

created ivars set
('The current feature transforming: ', 'issue_d')
('more than 50 vars. deleting: ', 'issue_d')
created ivars set
('The current feature transforming: ', 'loan_status')
('New Values: ', [0, 1, 2, 3, 4])


100%|██████████| 122607/122607 [00:09<00:00, 13065.23it/s]
  4%|▍         | 5439/122607 [00:00<00:02, 54382.92it/s]

created ivars set
('The current feature transforming: ', 'pymnt_plan')
('New Values: ', [0])


100%|██████████| 122607/122607 [00:02<00:00, 55459.56it/s]
  0%|          | 576/122607 [00:00<00:21, 5757.86it/s]

created ivars set
('The current feature transforming: ', 'url')
('more than 50 vars. deleting: ', 'url')
created ivars set
('The current feature transforming: ', 'desc')
('more than 50 vars. deleting: ', 'desc')
created ivars set
('The current feature transforming: ', 'purpose')
('New Values: ', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])


100%|██████████| 122607/122607 [00:21<00:00, 5637.09it/s]
  0%|          | 130/122607 [00:00<01:34, 1299.00it/s]

created ivars set
('The current feature transforming: ', 'title')
('more than 50 vars. deleting: ', 'title')
created ivars set
('The current feature transforming: ', 'zip_code')
('more than 50 vars. deleting: ', 'zip_code')
created ivars set
('The current feature transforming: ', 'addr_state')
('New Values: ', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])


100%|██████████| 122607/122607 [01:29<00:00, 1364.98it/s]
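#The encoder above compares every row against every category, which is why high-cardinality columns such as addr_state take over a minute. A minimal single-pass sketch, assuming the column is available as a plain list of strings; `encode_column` and `max_levels` are hypothetical names, and the codes are assigned in sorted order, so they will generally differ from the codes produced by the notebook's set-based ordering.

```python
def encode_column(values, max_levels=50):
    """Return (codes, mapping), or (None, None) if the column has too many levels."""
    levels = sorted(set(values))                   # sorted so the codes are reproducible
    if len(levels) > max_levels:
        return None, None                          # the caller would drop this column
    mapping = {level: code for code, level in enumerate(levels)}
    codes = [mapping[value] for value in values]   # one dictionary lookup per row
    return codes, mapping

# Toy grade column (illustrative values only).
grades = ['B', 'A', 'C', 'A']
codes, mapping = encode_column(grades)
print(codes)
print(mapping)
```

#With a pandas column the same dictionary could be applied through `Series.map(mapping)`. Note that the result is still one integer column per feature rather than one indicator column per category.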

In [11]:
#checkpoint save dframe to csv
tmp = '/Users/home/Desktop/devProjects/MLdemo/dframeNEW.csv'
#dframe.to_csv(path_or_buf=tmp)

In [12]:
#import saved dframe
dframe=pd.read_csv(tmp)
del dframe['Unnamed: 0']

In [14]:
dframeCopy = dframe.copy()  # real copy; plain assignment would only alias dframe

In [18]:
def delColsNoVariance(dframe,threshold):
    #remove features whose variance is at or below the threshold; a constant column cannot help the model
    delCols = []
    print ('col amt before: ',len(dframe.columns))
    for column in dframe.columns:
        try:
            theVar = dframe[column].var()
            if theVar <= threshold:
                delCols.append(column)
                del dframe[column]
        except Exception:
            #non-numeric columns have no variance to test, so leave them in place
            pass
    print ('col amt after: ',len(dframe.columns))
    print ('removed following 0 var cols: ',delCols)
    return dframe

In [19]:
dframeCopy = delColsNoVariance(dframeCopy,0)

('col amt before: ', 56)
('col amt after: ', 53)
('removed following 0 var cols: ', ['pymnt_plan', 'policy_code', 'inactive_loans'])


In [20]:
def removeNaNColumns(dframe):
    #drop every column that contains at least one missing value; also return the target column
    nancolumns=[]
    safe_loans = dframe['safe_loans']
    print (dframe.shape)
    for col in dframe.columns:
        if dframe[col].isnull().values.any():
            nancolumns.append(col)
            del dframe[col]
    print (dframe.shape)
    #dframe.append(safe_loans)
    print('nans removed: ',nancolumns)
    return dframe,safe_loans

In [21]:
dframeCopy,safeLoans = removeNaNColumns(dframeCopy)

(122607, 53)
(122607, 40)
('nans removed: ', ['annual_inc', 'delinq_2yrs', 'inq_last_6mths', 'mths_since_last_delinq', 'mths_since_last_record', 'open_acc', 'pub_rec', 'total_acc', 'collections_12_mths_ex_med', 'delinq_2yrs_zero', 'pub_rec_zero', 'collections_12_mths_zero', 'payment_inc_ratio'])


In [23]:
#remove unnecessary features
del dframeCopy['id']
del dframeCopy['member_id']

In [7]:
#load saved dataset
tmp = '/Users/home/Desktop/devProjects/MLdemo/dframeCopy.csv'
#dframeCopy.to_csv(path_or_buf=tmp)
dframeCopy = pd.read_csv(tmp)
del dframeCopy['Unnamed: 0']
del dframeCopy['Unnamed: 0.1']

In [8]:
def separateAtTarget(dframe,target):
    safeLoans = dframe[dframe[target]==+1]
    riskyLoans = dframe[dframe[target]==-1]
    return safeLoans,riskyLoans

In [9]:
target='safe_loans'
safeLoans,riskyLoans = separateAtTarget(dframeCopy,target)

In [10]:
#shuffle the data to avoid sampling bias
safeLoans = safeLoans.sample(frac=1).reset_index(drop=True)
riskyLoans = riskyLoans.sample(frac=1).reset_index(drop=True)

In [11]:
#split the safe loans into 4 near-equal subsets and append each one to the full set of risky loans
subSafeLoans = np.array_split(safeLoans,4)
set1 = riskyLoans.append(subSafeLoans[0])
set2 = riskyLoans.append(subSafeLoans[1])
set3 = riskyLoans.append(subSafeLoans[2])
set4 = riskyLoans.append(subSafeLoans[3])

In [12]:
def removeTargetColumn(dframe):
    #list every column except the target so the target is not also used as a feature
    featureColumns = dframe.columns
    features=[]
    for feature in featureColumns:
        if feature != 'safe_loans':
            features.append(feature)
    return features

In [13]:
features1=removeTargetColumn(set1)
features2=removeTargetColumn(set2)
features3=removeTargetColumn(set3)
features4=removeTargetColumn(set4)

In [14]:
loans1 = graphlab.SFrame(set1)
loans2 = graphlab.SFrame(set2)
loans3 = graphlab.SFrame(set3)
loans4 = graphlab.SFrame(set4)

In [15]:
#split data into training and validation sets
train_data1, validation_data1 = loans1.random_split(.8, seed=1)
train_data2, validation_data2 = loans2.random_split(.8, seed=1)
train_data3, validation_data3 = loans3.random_split(.8, seed=1)
train_data4, validation_data4 = loans4.random_split(.8, seed=1)

In [16]:
train1 = graphlab.SFrame(train_data1)
val1 = graphlab.SFrame(validation_data1)
train2 = graphlab.SFrame(train_data2)
val2 = graphlab.SFrame(validation_data2)
train3 = graphlab.SFrame(train_data3)
val3 = graphlab.SFrame(validation_data3)
train4 = graphlab.SFrame(train_data4)
val4 = graphlab.SFrame(validation_data4)

#Determining high-impact features by ranking each feature's standalone accuracy

In [17]:
from operator import itemgetter

def buildTheTree(trainData,validationData,target,depth):
    #rank features by the average validation accuracy of a decision tree trained on each feature alone
    allFeatures = [col for col in trainData.column_names() if col != target]
    historicalAccuracies=[]

    for feature in allFeatures:
        currentFeature=[feature]
        print('current feature list is: ',currentFeature)
        dtreeIter = graphlab.decision_tree_classifier.create(trainData,validation_set=None,target=target,features=currentFeature,max_depth=depth)
        #average the accuracy over all supplied validation sets
        sumAcc=0
        for valSets in validationData:
            sumAcc += dtreeIter.evaluate(valSets)['accuracy']
        avgAccuracy=(sumAcc/(len(validationData)))
        historicalAccuracies.append([avgAccuracy,feature])
    #highest average accuracy first
    sortedA = sorted(historicalAccuracies,key=itemgetter(0),reverse=True)
    return sortedA
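#The same ranking idea can be sketched without GraphLab. In the sketch below a simple "majority label per feature value" rule stands in for the depth-limited decision tree; that substitution, the function names and the toy rows are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def fit_single_feature(rows, feature, target):
    """Learn value -> majority label for one feature, plus the overall majority label."""
    votes = defaultdict(Counter)
    for row in rows:
        votes[row[feature]][row[target]] += 1
    lookup = {value: counts.most_common(1)[0][0] for value, counts in votes.items()}
    default = Counter(row[target] for row in rows).most_common(1)[0][0]
    return lookup, default

def accuracy(rows, feature, target, lookup, default):
    """Share of rows whose label matches the rule; unseen values fall back to `default`."""
    hits = sum(1 for row in rows if lookup.get(row[feature], default) == row[target])
    return hits / float(len(rows))

def rank_features(train, validation_sets, features, target):
    """Score each feature alone, averaged over the validation sets, best first."""
    ranked = []
    for feature in features:
        lookup, default = fit_single_feature(train, feature, target)
        scores = [accuracy(v, feature, target, lookup, default) for v in validation_sets]
        ranked.append((sum(scores) / len(scores), feature))
    return sorted(ranked, reverse=True)

# Toy rows (illustrative only): grade separates the classes, term does not.
train = [
    {'grade': 'A', 'term': 36, 'safe_loans': 1},
    {'grade': 'A', 'term': 60, 'safe_loans': 1},
    {'grade': 'B', 'term': 36, 'safe_loans': 1},
    {'grade': 'C', 'term': 36, 'safe_loans': -1},
    {'grade': 'C', 'term': 60, 'safe_loans': -1},
    {'grade': 'C', 'term': 60, 'safe_loans': 1},
]
val = [
    {'grade': 'A', 'term': 36, 'safe_loans': 1},
    {'grade': 'C', 'term': 60, 'safe_loans': -1},
    {'grade': 'B', 'term': 60, 'safe_loans': 1},
    {'grade': 'C', 'term': 36, 'safe_loans': -1},
]
print(rank_features(train, [val], ['grade', 'term'], 'safe_loans'))
```

#As in buildTheTree, each feature is scored in isolation, so the ranking says nothing about how features interact once they are combined in one model.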

In [38]:
bestFeatures = buildTheTree(train1,[val1,val2,val3,val4],'safe_loans',5)

('current feature list is: ', ['loan_amnt'])


('current feature list is: ', ['funded_amnt'])


('current feature list is: ', ['funded_amnt_inv'])


('current feature list is: ', ['term'])


('current feature list is: ', ['int_rate'])


('current feature list is: ', ['installment'])


('current feature list is: ', ['grade'])


('current feature list is: ', ['sub_grade'])


('current feature list is: ', ['emp_length'])


('current feature list is: ', ['home_ownership'])


('current feature list is: ', ['is_inc_v'])


('current feature list is: ', ['loan_status'])


('current feature list is: ', ['purpose'])


('current feature list is: ', ['addr_state'])


('current feature list is: ', ['dti'])


('current feature list is: ', ['revol_bal'])


('current feature list is: ', ['revol_util'])


('current feature list is: ', ['initial_list_status'])


('current feature list is: ', ['out_prncp'])


('current feature list is: ', ['out_prncp_inv'])


('current feature list is: ', ['total_pymnt'])


('current feature list is: ', ['total_pymnt_inv'])


('current feature list is: ', ['total_rec_prncp'])


('current feature list is: ', ['total_rec_int'])


('current feature list is: ', ['total_rec_late_fee'])


('current feature list is: ', ['recoveries'])


('current feature list is: ', ['collection_recovery_fee'])


('current feature list is: ', ['last_pymnt_amnt'])


('current feature list is: ', ['not_compliant'])


('current feature list is: ', ['status'])


('current feature list is: ', ['emp_length_num'])


('current feature list is: ', ['grade_num'])


('current feature list is: ', ['sub_grade_num'])


('current feature list is: ', ['short_emp'])


('current feature list is: ', ['last_delinq_none'])


('current feature list is: ', ['last_record_none'])


('current feature list is: ', ['last_major_derog_none'])


In [39]:
bestFeatures

[[1.0, 'loan_status'],
 [1.0, 'status'],
 [0.8863138114625744, 'total_rec_prncp'],
 [0.8763440860215054, 'last_pymnt_amnt'],
 [0.7645892055538156, 'recoveries'],
 [0.7491909385113269, 'total_pymnt'],
 [0.7467898528030066, 'total_pymnt_inv'],
 [0.7214740578348471, 'collection_recovery_fee'],
 [0.6179663848000836, 'int_rate'],
 [0.6091711034554755, 'sub_grade'],
 [0.6070310053241466, 'grade'],
 [0.6070310053241466, 'grade_num'],
 [0.5807756550788182, 'term'],
 [0.5635765737550893, 'revol_util'],
 [0.557469464453492, 'dti'],
 [0.5432978390228624, 'funded_amnt_inv'],
 [0.5417841110763127, 'loan_amnt'],
 [0.5417841110763127, 'total_rec_late_fee'],
 [0.5417058148032154, 'funded_amnt'],
 [0.5374256185405575, 'total_rec_int'],
 [0.5352855204092286, 'installment'],
 [0.5324668545777221, 'home_ownership'],
 [0.5317882868775446, 'is_inc_v'],
 [0.5302745589309948, 'purpose'],
 [0.5244545359640882, 'addr_state'],
 [0.5221839440442635, 'emp_length'],
 [0.5221839440442635, 'emp_length_num'],
 [0.5180

In [19]:
features = ['grade',                     # grade of the loan
            'sub_grade',                 # sub-grade of the loan
            'short_emp',                 # one year or less of employment
            'emp_length_num',            # number of years of employment
            'home_ownership',            # home_ownership status: own, mortgage or rent
            'dti',                       # debt to income ratio
            'purpose',                   # the purpose of the loan
            'term',                      # the term of the loan
            'last_delinq_none',          # has borrower had a delinquency
            'last_major_derog_none',     # has borrower had 90 day or worse rating
            'revol_util',                # percent of available credit being used
            'total_rec_late_fee',        # total late fees received to date
           ]

In [20]:
rforest1 = graphlab.random_forest_classifier.create(train1, target=target, features=features, max_iterations=10, validation_set='auto', verbose=True, class_weights=None, random_seed=None, metric='auto')
rforest2 = graphlab.random_forest_classifier.create(train2, target=target, features=features, max_iterations=10, validation_set='auto', verbose=True, class_weights=None, random_seed=None, metric='auto')
rforest3 = graphlab.random_forest_classifier.create(train3, target=target, features=features, max_iterations=10, validation_set='auto', verbose=True, class_weights=None, random_seed=None, metric='auto')
rforest4 = graphlab.random_forest_classifier.create(train4, target=target, features=features, max_iterations=10, validation_set='auto', verbose=True, class_weights=None, random_seed=None, metric='auto')

PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.



PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.



PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.



PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.



In [21]:
rforest1.show(view="Tree")

#Evaluating our Random Forest Classifier

In [22]:
print rforest1.evaluate(val1)['accuracy']
print rforest2.evaluate(val2)['accuracy']
print rforest3.evaluate(val3)['accuracy']
print rforest4.evaluate(val4)['accuracy']

0.645265685353
0.640045933814
0.648293141246
0.636600897797


In [23]:
rforest1.evaluate(val1)

{'accuracy': 0.6452656853533771,
 'auc': 0.7003964499648219,
 'confusion_matrix': Columns:
 	target_label	int
 	predicted_label	int
 	count	int
 
 Rows: 4
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |      -1      |        1        |  1832 |
 |      1       |        -1       |  1566 |
 |      -1      |        -1       |  2842 |
 |      1       |        1        |  3339 |
 +--------------+-----------------+-------+
 [4 rows x 3 columns],
 'f1_score': 0.6627630011909488,
 'log_loss': 0.6315396801751006,
 'precision': 0.6457164958421968,
 'recall': 0.6807339449541284,
 'roc_curve': Columns:
 	threshold	float
 	fpr	float
 	tpr	float
 	p	int
 	n	int
 
 Rows: 100001
 
 Data:
 +-----------+-----+-----+------+------+
 | threshold | fpr | tpr |  p   |  n   |
 +-----------+-----+-----+------+------+
 |    0.0    | 1.0 | 1.0 | 4905 | 4674 |
 |   1e-05   | 1.0 | 1.0 | 4905 | 4674 |
 |   2e-05   | 1.0 

#Evaluating predictions from our random forest model on the preliminary feature set

In [25]:
predictions=rforest1.predict(val1)
cmatrix0 = graphlab.evaluation.confusion_matrix(val1['safe_loans'],predictions)
cmatrix0

target_label,predicted_label,count
1,1,3339
-1,1,1832
-1,-1,2842
1,-1,1566
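#The accuracy, precision, recall and F1 values reported by evaluate() follow directly from these four counts. A short worked check, treating +1 (safe) as the positive class; the helper name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / float(tp + fp)   # of loans predicted safe, how many were safe
    recall = tp / float(tp + fn)      # of truly safe loans, how many were found
    return {
        'accuracy': (tp + tn) / float(total),
        'precision': precision,
        'recall': recall,
        'f1_score': 2 * precision * recall / (precision + recall),
    }

# Counts taken from cmatrix0 above.
metrics = binary_metrics(tp=3339, fp=1832, tn=2842, fn=1566)
for name in sorted(metrics):
    print(name, round(metrics[name], 4))
```

#The false-positive count (risky loans predicted safe, 1832 here) is the figure that matters most for credit losses, since those are the loans that would be funded in error.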


#We will now repeat the experiment with a feature set extended by the features that the single-feature decision trees ranked highest by accuracy.
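#The notebook does not state the rule used to pick the extra features. One rule that reproduces the list, offered here as an assumption, is to skip features that score a perfect 1.0 on their own (loan_status and status, which presumably record the outcome itself) and take the next six from the ranking. The helper `pick_top_features` is hypothetical; the scores are copied from the bestFeatures output.

```python
def pick_top_features(ranked, already_selected, k, leak_threshold=1.0):
    """Take the k best-ranked features, skipping perfect scorers and ones already in use."""
    picked = []
    for score, name in ranked:              # ranked is sorted best-first
        if score >= leak_threshold:         # a perfect score suggests the feature encodes the target
            continue
        if name in already_selected:
            continue
        picked.append(name)
        if len(picked) == k:
            break
    return picked

ranked = [
    [1.0, 'loan_status'],
    [1.0, 'status'],
    [0.8863138114625744, 'total_rec_prncp'],
    [0.8763440860215054, 'last_pymnt_amnt'],
    [0.7645892055538156, 'recoveries'],
    [0.7491909385113269, 'total_pymnt'],
    [0.7467898528030066, 'total_pymnt_inv'],
    [0.7214740578348471, 'collection_recovery_fee'],
    [0.6179663848000836, 'int_rate'],
]
already_selected = ['grade', 'sub_grade', 'short_emp', 'emp_length_num',
                    'home_ownership', 'dti', 'purpose', 'term',
                    'last_delinq_none', 'last_major_derog_none',
                    'revol_util', 'total_rec_late_fee']

picked = pick_top_features(ranked, already_selected, k=6)
print(picked)
```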

In [26]:
Newfeatures = ['grade',                     # grade of the loan
            'sub_grade',                 # sub-grade of the loan
            'short_emp',                 # one year or less of employment
            'emp_length_num',            # number of years of employment
            'home_ownership',            # home_ownership status: own, mortgage or rent
            'dti',                       # debt to income ratio
            'purpose',                   # the purpose of the loan
            'term',                      # the term of the loan
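            'last_delinq_none',          # has borrower had a delinquency
            'last_major_derog_none',     # has borrower had 90 day or worse rating
            'revol_util',                # percent of available credit being used
            'total_rec_late_fee',        # total late fees received to date
            "total_rec_prncp",           # principal received to date
            "last_pymnt_amnt",           # last total payment amount received
            "recoveries",                # post charge-off gross recovery
            "total_pymnt",               # payments received to date for total amount funded
            "total_pymnt_inv",           # payments received to date for the investor-funded portion
            "collection_recovery_fee"    # post charge-off collection fee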
           ]

In [27]:
rforest5 = graphlab.random_forest_classifier.create(train1, target=target, features=Newfeatures, max_iterations=10, validation_set='auto', verbose=True, class_weights=None, random_seed=None, metric='auto')
rforest6 = graphlab.random_forest_classifier.create(train2, target=target, features=Newfeatures, max_iterations=10, validation_set='auto', verbose=True, class_weights=None, random_seed=None, metric='auto')
rforest7 = graphlab.random_forest_classifier.create(train3, target=target, features=Newfeatures, max_iterations=10, validation_set='auto', verbose=True, class_weights=None, random_seed=None, metric='auto')
rforest8 = graphlab.random_forest_classifier.create(train4, target=target, features=Newfeatures, max_iterations=10, validation_set='auto', verbose=True, class_weights=None, random_seed=None, metric='auto')

PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.



PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.



PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.



PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.



In [28]:
print rforest5.evaluate(val1)['accuracy']
print rforest6.evaluate(val2)['accuracy']
print rforest7.evaluate(val3)['accuracy']
print rforest8.evaluate(val4)['accuracy']

0.96335734419
0.961373838605
0.960434283328
0.95417058148


#Ranking features with single-feature decision trees and then hand-picking the strongest ones for our new model increased accuracy from approximately 64% to over 95% on all four validation sets.

In [29]:
predictions=rforest5.predict(val1)
cmatrix1 = graphlab.evaluation.confusion_matrix(val1['safe_loans'],predictions)
cmatrix1

target_label,predicted_label,count
1,1,4709
-1,1,155
-1,-1,4519
1,-1,196


In [37]:
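# rows 1 and 3 of cmatrix1 are the misclassified loans; rows 0 and 2 are the correctly classified ones
allLosses = cmatrix1[1]['count'] + cmatrix1[3]['count']
allGoodLoans = cmatrix1[0]['count']+cmatrix1[2]['count']
print('Good Loans: ',allGoodLoans)
print('All losses: ',allLosses)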

('Good Loans: ', 9228)
('All losses: ', 351)


In [42]:
print('Potential loss $',(allLosses*51000)*0.67)

('Potential loss $', 11993670.0)


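#Given a potential average loan of $51,000, the 351 misclassified loans imply a capital reserve requirement of approximately $12.0M if the loans are sold at 33 cents on the dollar (a 67% loss, matching the 0.67 factor used in the calculation above).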
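#The reserve figure is a product of three inputs, so it is easy to re-run under other assumptions. A small sketch using the $51,000 average loan and the 0.67 loss factor from the cell above; the function name `capital_reserve` is hypothetical.

```python
def capital_reserve(n_loans, avg_loan, loss_rate):
    """Reserve = number of loans * average balance * fraction lost on each loan."""
    return n_loans * avg_loan * loss_rate

# All misclassified loans: the two off-diagonal counts from cmatrix1.
misclassified = 155 + 196
print(capital_reserve(misclassified, 51000, 0.67))

# Only risky loans predicted safe (155), i.e. the errors that would fund a bad loan.
print(capital_reserve(155, 51000, 0.67))
```

#Counting only the risky loans that were predicted safe gives a noticeably smaller reserve, because safe loans predicted risky are missed business rather than credit losses.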

In [46]:
rforest5.summary()

Class                          : RandomForestClassifier

Schema
------
Number of examples             : 36551
Number of feature columns      : 18
Number of unpacked features    : 18
Number of classes              : 2

Settings
--------
Number of trees                : 10
Max tree depth                 : 6
Training time (sec)            : 0.3087
Training accuracy              : 0.9639
Validation accuracy            : 0.9645
Training log_loss              : 0.1998
Validation log_loss            : 0.1997

#We have fitted a random forest model to our dataset and reduced validation log loss to just under 0.2 (0.1997).

#We can now predict credit risk with 95-96% accuracy on the four validation sets, which helps limit potential losses.