### Naive Bayes (Classifier) - Gaussian Naive Bayes

#### Predicting Personal Loan Acceptance Based on Naive Bayes Classifier

- The file **UniversalBank.csv** contains data on 5000 customers of Universal Bank. The data include customer demographic information (age, income, etc.), the customer's relationship with the bank (mortgage, securities account, etc.), and the customer response to the last personal loan campaign (Personal Loan). Among these 5000 customers, only 480 (=9.6%) accepted the personal loan that was offered to them in the earlier campaign. 

A brief description of the 14 variables is given below:

1. ID: Customer ID
2. Age: Customer's age in completed years
3. Experience: Number of years of professional experience
4. Income: Annual income of the customer (in thousands of dollars)
5. ZIPcode: Home address ZIP code
6. Family: Family size of the customer
7. CCAvg: Average monthly credit card spending (in thousands of dollars)
8. Education: Education level (1: Undergraduate; 2: Graduate; 3: Advanced/Professional)
9. Mortgage: Value of house mortgage, if any (in thousands of dollars)
10. Personal Loan: Did this customer accept the personal loan offered in the last campaign? (1: yes; 0: no)
11. Securities Account: Does the customer have a securities account with the bank?
12. CD Account: Does the customer have a certificate of deposit (CD) account with the bank?
13. Online: Does the customer use internet banking facilities?
14. CreditCard: Does the customer use a credit card issued by the bank?

In [None]:
# Import libraries used for data management

import numpy as np
import pandas as pd

# load dataset 'UniversalBank.csv', using 'ID' as index column

data = pd.read_csv('UniversalBank.csv', index_col='ID')

In [None]:
data.head()

In [None]:
data.describe()

In [None]:
# define independent attributes 
features = ['Age','Experience','Income','Family','CCAvg','Education','Mortgage','Securities Account', 'CD Account', 'Online', 'CreditCard']

# assign values for independent variables and target variable ('Personal Loan')
X = data[features]
y = data['Personal Loan']

In [None]:
# import cross validation
from sklearn.model_selection import cross_val_predict, cross_val_score

In [None]:
# Import Gaussian Naive Bayes model from sklearn
from sklearn.naive_bayes import GaussianNB

# Create a Gaussian Naive Bayes classifier
# https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html
gnb = GaussianNB()

In [None]:
pred_y = cross_val_predict(gnb,X,y,cv=10)

In [None]:
pred_y

In [None]:
score = cross_val_score(gnb,X,y,cv=10)

In [None]:
score.mean()
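
With an integer `cv` and a classifier, `cross_val_score` uses stratified folds and, by default, the estimator's own score (accuracy for classifiers). The sketch below spells both defaults out and also requests recall, which is often more informative when the positive class is rare. It uses synthetic stand-in data; `X_demo`, `y_demo`, and the result names are illustrative.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (NOT the UniversalBank data), roughly 10% positives
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(500, 4))
y_demo = (rng.random(500) < 0.1).astype(int)
X_demo[y_demo == 1] += 1.0   # shift positives so the classes are separable

# cv=10 with a classifier is equivalent to an unshuffled StratifiedKFold
implicit = cross_val_score(GaussianNB(), X_demo, y_demo, cv=10)
explicit = cross_val_score(GaussianNB(), X_demo, y_demo,
                           cv=StratifiedKFold(n_splits=10), scoring='accuracy')

# Same folds, but scored on recall of the positive class
recall = cross_val_score(GaussianNB(), X_demo, y_demo, cv=10, scoring='recall')

print(implicit.mean(), explicit.mean(), recall.mean())
```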

In [None]:
# Do majority-class prediction (not accept: 0) -> Benchmark model
# https://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros.htm
# one prediction of 0 per customer; len(y) avoids hard-coding the sample size
pred_val_maj = np.zeros(len(y), dtype=int)

In [None]:
pred_val_maj
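
The benchmark's accuracy follows directly from the class balance stated at the top: with 480 acceptors among 5000 customers, always predicting 0 is correct for the remaining 4520. A small sketch of that arithmetic, using a label vector rebuilt from those two counts rather than the CSV (`y_demo` and `baseline_pred` are illustrative names):

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Labels rebuilt from the counts in the text: 480 acceptors, 4520 non-acceptors
y_demo = np.array([1] * 480 + [0] * 4520)
baseline_pred = np.zeros(len(y_demo), dtype=int)

baseline_acc = accuracy_score(y_demo, baseline_pred)   # 4520 / 5000
baseline_recall = recall_score(y_demo, baseline_pred)  # no acceptor is ever found

print(baseline_acc, baseline_recall)
```

High accuracy with zero recall on the positive class is why accuracy alone is a weak yardstick for this problem.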

In [None]:
# import libraries for evaluation measures
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix

In [None]:
# print the accuracy
# https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html
print("Accuracy of Benchmark Model:","\n", accuracy_score(y, pred_val_maj, normalize=True, sample_weight=None))
print("Accuracy of Naive Bayes Model:","\n", accuracy_score(y, pred_y, normalize=True, sample_weight=None))

In [None]:
# print the confusion matrix
# https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
print("Confusion Matrix of Benchmark Model:", "\n", confusion_matrix(y, pred_val_maj))
print("Confusion Matrix of Naive Bayes Model:", "\n", confusion_matrix(y, pred_y))
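
For binary labels, scikit-learn lays the confusion matrix out with true classes in rows and predicted classes in columns, so `ravel()` yields the counts in the order TN, FP, FN, TP. The toy example below (hand-made labels, not model output) unpacks them and derives precision and recall for class 1:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hand-made toy labels for illustration only
y_true_demo = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_hat_demo = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])

tn, fp, fn, tp = confusion_matrix(y_true_demo, y_hat_demo).ravel()

precision = tp / (tp + fp)   # share of predicted acceptors that really accepted
recall = tp / (tp + fn)      # share of real acceptors that were found

print(tn, fp, fn, tp, precision, recall)
```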

In [None]:
# print the classification report with precision, recall
# https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html
print("Classification Report of Benchmark Model:", "\n", classification_report(y, pred_val_maj))
print("Classification Report of Naive Bayes Model:","\n",classification_report(y, pred_y))

In [None]:
prob = cross_val_predict(gnb,X,y,cv=10, method = 'predict_proba')

In [None]:
prob

In [None]:
prob[:,1]

In [None]:
from sklearn.metrics import roc_curve, auc
from sklearn import metrics

In [None]:
# y is the true label; prob[:,1] is the predicted probability of belonging to class 1 (the positive class)
# https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html
fpr, tpr, thresholds = roc_curve(y, prob[:,1], pos_label=1)

In [None]:
# Area under the ROC curve
roc_auc=auc(fpr,tpr)
roc_auc
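
As a cross-check, `roc_auc_score` computes the same area directly from labels and scores, without building the curve first. A toy example with hand-made scores (not model output; all names are illustrative):

```python
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Hand-made toy labels and scores for illustration only
y_true_demo = [0, 0, 1, 1]
scores_demo = [0.1, 0.4, 0.35, 0.8]

# Two-step route: build the curve, then integrate with the trapezoidal rule
fpr_demo, tpr_demo, _ = roc_curve(y_true_demo, scores_demo, pos_label=1)
area_trapezoid = auc(fpr_demo, tpr_demo)

# One-step route
area_direct = roc_auc_score(y_true_demo, scores_demo)

print(area_trapezoid, area_direct)
```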

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

plt.figure()
lw = 2
plt.plot(fpr, tpr, marker='o',color='darkorange',
         lw=lw, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
plt.xlim([-0.05, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic')
plt.legend(loc="lower right")
plt.show()

In [None]:
# train the model on the full dataset
# https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html#sklearn.naive_bayes.GaussianNB.fit
gnb.fit(X, y)

In [None]:
# test model (output probability estimate)
# https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html#sklearn.naive_bayes.GaussianNB.predict_proba
pred_prob = gnb.predict_proba(?)
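
`predict_proba` expects a 2-D input with the same columns, in the same order, as the data used for fitting. As one possible way to score a single new record, the sketch below fits on synthetic stand-in data and scores one entirely hypothetical customer. The training values, the labeling rule, and every value in `new_customer` are made up for illustration and are not taken from UniversalBank.csv.

```python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB

features = ['Age', 'Experience', 'Income', 'Family', 'CCAvg', 'Education',
            'Mortgage', 'Securities Account', 'CD Account', 'Online', 'CreditCard']

# Synthetic stand-in training data (NOT the UniversalBank data)
rng = np.random.default_rng(42)
X_demo = pd.DataFrame(rng.uniform(0, 100, size=(300, len(features))), columns=features)
y_demo = (X_demo['Income'] > 80).astype(int)   # made-up labeling rule

demo_gnb = GaussianNB().fit(X_demo, y_demo)

# One hypothetical customer; indexing by `features` enforces the column order
new_customer = pd.DataFrame([{
    'Age': 40, 'Experience': 15, 'Income': 90, 'Family': 2, 'CCAvg': 3,
    'Education': 2, 'Mortgage': 0, 'Securities Account': 0, 'CD Account': 0,
    'Online': 1, 'CreditCard': 0,
}])[features]

new_proba = demo_gnb.predict_proba(new_customer)   # columns follow demo_gnb.classes_
print(demo_gnb.classes_, new_proba)
```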