# MACHINE LEARNING AN END TO END SOLUTION: Section 4
## Loan Application

Presented by Shaun

Throughout the financial sector, machine learning algorithms are being developed to approve loan applications. In this project, we will process a dataset and construct three machine learning models to predict loan approvals. A dataset with 1102 observations will be examined, cleaned and organised; data exploration will identify the features, their data types, and the most important features, which are the best predictors of loan approval.

### Model Persistence 
*Models are persisted because large and complex datasets may take days to train, so it is inefficient to retrain a model each time we want to classify new data. To persist the model, we will use **pickle**, a module in the Python standard library that is commonly used to serialize scikit-learn models; it saves the model to a file in a serialized format. Once the model is saved, the file can be loaded and deserialized to classify new data. One difference between the three classifiers when pickled is that the k-nearest neighbour algorithm stores the entire training dataset in the file, which could pose a problem on very large datasets.*
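
To make the size difference concrete, here is a minimal sketch that pickles a k-nearest neighbour classifier and a Gaussian Naive Bayes classifier fitted on the same data and compares the serialized sizes. The random data, its shape (5000 rows, 12 features) and the variable names are assumptions for illustration; this is not the loan dataset.

```python
import pickle

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption): 5000 rows, 12 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 12))
y = (X[:, 0] > 0).astype(int)

# Fit both classifiers on the same data.
knn = KNeighborsClassifier().fit(X, y)
nb = GaussianNB().fit(X, y)

# pickle.dumps returns the serialized bytes without writing a file.
knn_size = len(pickle.dumps(knn))
nb_size = len(pickle.dumps(nb))

print("KNN pickle size (bytes):", knn_size)
print("NB pickle size (bytes): ", nb_size)
```

The KNN pickle grows with the number of training rows, while the Naive Bayes pickle only holds per-class summary statistics.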

#### Final Model
We are now at the stage where we need to create the **final model.** This is the model that will be used to make predictions on new data. In our project the final model will be trained on the full dataset. **Note:** There is a lot of debate about how the final model should be trained. As this is a beginner tutorial we will keep it simple, but please review the research and discussions in this area.
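
One common pattern, sketched here under stated assumptions, is to estimate performance on a hold-out split first and then refit the same configuration on every row to produce the final model. The synthetic data and variable names below are illustrative only and do not come from the loan dataset.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (assumption), not the loan dataset.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Step 1: estimate performance on rows the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
holdout_model = GaussianNB().fit(X_train, y_train)
estimate = accuracy_score(y_test, holdout_model.predict(X_test))
print("Hold-out accuracy estimate:", estimate)

# Step 2: refit the same configuration on every row for the final model.
final_model = GaussianNB().fit(X, y)
```

The hold-out score describes how the approach is expected to perform; the final model itself has no unseen rows left to score against.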

**Pickle:**  The pickle module implements a fundamental, but powerful algorithm for serializing and de-serializing a Python object structure. More details here:  __[Pickle](https://docs.python.org/3.1/library/pickle.html)__ 
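
As a quick illustration of serializing and de-serializing, the sketch below round-trips a plain Python dictionary through pickle. The `settings` dictionary is a made-up object for the example; a fitted classifier is handled the same way.

```python
import pickle

# A hypothetical object to serialize; any picklable Python object works.
settings = {"model": "NB", "features": 12, "threshold": 0.5}

# Serialize to bytes, then rebuild an equal but separate object.
payload = pickle.dumps(settings)
restored = pickle.loads(payload)

print(type(payload))
print(restored == settings)
print(restored is settings)
```

Only unpickle data you trust: loading a pickle can execute arbitrary code, so files from unknown sources should not be loaded.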

In [29]:
import numpy as np
import pandas as pd

##importing the classifiers
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier

##import Pickle to serialize the machine learning models.
import pickle

##import the train test split
from sklearn.model_selection import train_test_split

##scoring metrics
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score, f1_score, roc_curve, auc
 
##Visualisations
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('ggplot')
%matplotlib inline

In [30]:
predata = pd.read_csv('c:\\ml\\LoanPredOrig.csv')
predata = predata.dropna()
#Categorical data clean-up. This raises an error if run twice in a row without reloading the dataset,
#because the columns are no longer strings after the first run.
predata['Loan_Status'] = predata.Loan_Status.astype(int)
predata['Employed'] = np.where(predata['Employed'].str.contains('YES'), 1, 0)
predata['Marital_Status'] = np.where(predata['Marital_Status'].str.contains('YES'), 1, 0)
predata['Graduate'] = np.where(predata['Graduate'].str.contains('YES'), 1, 0)
predata['Credit_History'] = np.where(predata['Credit_History'].str.contains('YES'), 1, 0)
predata['PropertyOwner'] = np.where(predata['PropertyOwner'].str.contains('YES'), 1, 0)
df = pd.DataFrame(predata)

#Drop the Loanid column
del df['Loanid']

#One-hot encoding on the Gender column
data = pd.get_dummies(df,columns=['Gender'])
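
The clean-up cell fails on a second run because the .str accessor only works on text columns. A minimal sketch of a rerun-safe variant is shown below; `encode_yes_no` is a hypothetical helper written for this example, and the two-row sample is made up rather than taken from LoanPredOrig.csv.

```python
import pandas as pd

def encode_yes_no(frame, columns):
    """Return a copy with 'YES'/'NO' text columns mapped to 1/0.

    Hypothetical helper: columns that are already numeric are left alone,
    so calling it twice gives the same result instead of raising.
    """
    out = frame.copy()
    for col in columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            continue  # already encoded on an earlier run
        out[col] = out[col].str.contains("YES").astype(int)
    return out

# Tiny made-up sample, not rows from the tutorial's CSV file.
sample = pd.DataFrame({"Employed": ["YES", "NO"], "Graduate": ["NO", "YES"]})
once = encode_yes_no(sample, ["Employed", "Graduate"])
twice = encode_yes_no(once, ["Employed", "Graduate"])
print(twice)
```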
 

### Specify the classifier you want to pickle
*Here we specify the classifier and the filename of the final model.* <BR>
**Note:** We have kept the code simple here; feel free to rewrite it to use a loop for this process.<BR>
*Uncomment (remove the #) the line for the model you want to pickle.* <BR>
**Note:** *In our example we have selected Naive Bayes ('NB'), as this model was clearly the best classifier.*

In [63]:
name = 'NB'
#name = 'KNN'
#name = 'ANN'
if name == 'KNN':
    clf = KNeighborsClassifier()
    filename = 'KNNClassifier.pkl'
elif name == 'NB':
    clf = GaussianNB()
    filename = 'NBGauClassifier.pkl'
elif name == 'ANN':
    clf = MLPClassifier()  ## add the optimal parameters for your MLP model here, otherwise the defaults are used
    filename = 'ANNClassifier.pkl'
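
For readers who want the loop-friendly version mentioned in the note, one possible sketch keeps the classifiers and filenames in a dictionary. The `models` name and its layout are assumptions for illustration; the filenames match the ones used in the cell above.

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Map each short name to an unfitted classifier and its output file.
models = {
    "KNN": (KNeighborsClassifier(), "KNNClassifier.pkl"),
    "NB": (GaussianNB(), "NBGauClassifier.pkl"),
    "ANN": (MLPClassifier(), "ANNClassifier.pkl"),  # default parameters
}

# Pick one entry by name...
clf, filename = models["NB"]

# ...or visit every entry in turn.
for name, (candidate, target) in models.items():
    print(name, type(candidate).__name__, target)
```

Inside the loop, each `candidate` could be fitted and pickled to `target` in the same way as the single model in the following sections.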

### Train the final model
**Note:** *Here we use the fit() method, to train the final model on the full dataset.*

In [64]:
clf.fit(data.loc[:, data.columns != 'Loan_Status'], data['Loan_Status'])

GaussianNB(priors=None)

### Pickle the Model
**To Pickle the model we need to:**
1. *First we create a variable, 'dumpmodelfile', and assign it the result of the open() function, which opens the file for writing. open() takes two arguments here.*
2. *The first argument is the name of our file, 'filename', which we already assigned a value in the 'Specify the classifier you want to pickle' section.*
3. *For the second argument we specify 'wb'. The w means that we will be writing to the file, and the b refers to binary mode.*
4. *Now we can dump the model using the pickle.dump() function, which takes two arguments.*
5. *The first is 'clf', the object we want to pickle.*
6. *The second is 'dumpmodelfile', the open file that the pickled object is written to.*
7. *Lastly we close the file with the close() method.*

In [81]:
dumpmodelfile = open(filename,'wb')
pickle.dump(clf,dumpmodelfile)
dumpmodelfile.close()
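
An alternative to calling close() by hand is a with block, which closes the file automatically. The sketch below is self-contained: the `model` dictionary is a stand-in for a fitted classifier, and the temporary directory and `example_model.pkl` filename are assumptions chosen so that the example leaves no files behind.

```python
import os
import pickle
import tempfile

# Stand-in for a fitted classifier (assumption); any picklable object works.
model = {"name": "NB", "classes": [0, 1]}

# Write into a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example_model.pkl")  # hypothetical filename

    # The file is closed when the block exits, even if dump() raises.
    with open(path, "wb") as model_file:
        pickle.dump(model, model_file)

    saved_bytes = os.path.getsize(path)

print("Bytes written:", saved_bytes)
```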

### UnPickle the Model
#### To UnPickle the model we need to:

1. *First we create a variable, 'model_pkl', and assign it the result of the open() function, which opens the file for reading. open() takes two arguments here.*
2. *The first argument is 'filename', which we specified earlier.*
3. *For the second argument we specify 'rb'. The r means that we will be reading the file, and the b refers to binary mode, which is how the file was saved.*
4. *Now we load the model using pickle.load(), with the open file 'model_pkl' as the argument.*
5. *Lastly we close the file with the close() method.*

In [82]:
model_pkl = open(filename, 'rb')
FinalModel = pickle.load(model_pkl)
model_pkl.close()
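
A useful sanity check after loading is to confirm that the unpickled model predicts exactly what the original did. The sketch below does this on synthetic data; the data, the temporary directory and the `example_model.pkl` filename are assumptions for illustration.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (assumption), not the loan dataset.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)
original = GaussianNB().fit(X, y)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example_model.pkl")  # hypothetical filename
    with open(path, "wb") as f:
        pickle.dump(original, f)

    # Reading mirrors writing: 'rb' mode and pickle.load().
    with open(path, "rb") as f:
        loaded = pickle.load(f)

# The loaded model should reproduce the original's predictions.
same = np.array_equal(original.predict(X), loaded.predict(X))
print("Predictions identical:", same)
```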

### Review the UnPickled Model.
**Review the unpickled model using print and type().**

In [85]:
print("Loaded the saved Final Model :: ", FinalModel)
print('Model Class Type', type(FinalModel))

Loaded the saved Final Model ::  GaussianNB(priors=None)
Model Class Type <class 'sklearn.naive_bayes.GaussianNB'>


### Test the model with new data
We can test the model by passing it data as a 2D array (a list of lists, [[...]]), which is the shape the model's predict() method expects: one inner list per applicant, with the feature values in the same order as the training columns.
I have provided examples below: new_data1 contains one row of data and new_data2 contains two rows.

In [86]:
#(Marital_Status,Dependents,Graduate,Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,PropertyOwner,Gender_Female,Gender_Male)
## 1 set of data
new_data1 =[[1,2,1,1,5000,1508,120,360,1,1,0,1]]
## 2 sets of data
new_data2 =[[0,7,0,0,0,0,120,360,0,0,1,0],[1,2,1,1,5000,1508,120,360,1,1,0,1]]


FinalModelPred = FinalModel.predict(new_data1)
print('Saved Final Model Prediction: ', FinalModelPred)

FinalModelPred = FinalModel.predict(new_data2)
print('Saved Final Model Prediction: ', FinalModelPred)



Saved Final Model Prediction:  [1]
Saved Final Model Prediction:  [0 1]
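
Because plain lists rely on getting the column order right by hand, one option is to pass new data as a DataFrame and reorder it to the training layout before predicting. The sketch below assumes a tiny made-up training frame that borrows three of the tutorial's column names; the values are invented and the model is fitted only for this example.

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB

# Made-up training frame using three of the tutorial's column names (assumption).
train = pd.DataFrame({
    "ApplicantIncome": [5000, 1200, 4200, 900, 6100, 1500],
    "LoanAmount": [120, 200, 110, 180, 130, 210],
    "Credit_History": [1, 0, 1, 0, 1, 0],
    "Loan_Status": [1, 0, 1, 0, 1, 0],
})
feature_cols = [c for c in train.columns if c != "Loan_Status"]
model = GaussianNB().fit(train[feature_cols], train["Loan_Status"])

# New applicant supplied in a different column order.
applicant = pd.DataFrame(
    [{"Credit_History": 1, "LoanAmount": 125, "ApplicantIncome": 4800}]
)

# Reorder to the training layout before predicting.
applicant = applicant[feature_cols]
prediction = model.predict(applicant)
probabilities = model.predict_proba(applicant)

print("Prediction:", prediction)
print("Class probabilities:", probabilities)
```

predict_proba() returns one row per applicant and one column per class, which can be more informative than the bare 0 or 1 label.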



## Congratulations! You have built a final machine learning model and have completed Section 4