# A Notebook to Use Naïve Bayes Classifiers

This notebook shows how to train Naïve Bayes classifiers to classify unseen instances.

For those interested in understanding the code: it uses predefined functions from [scikit-learn](http://scikit-learn.org) (`sklearn`), a library of machine learning primitives.

A Naïve Bayes classifier is a probabilistic classifier based on Bayes' theorem, which gives the probability that a hypothesis (H) is true given that an event (E) has occurred. It is called "naïve" because it assumes the features are conditionally independent given the class.

$$
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}
$$
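
As a quick numeric illustration of the formula, the sketch below plugs in made-up probabilities and expands $P(E)$ with the law of total probability. The values of `p_h`, `p_e_given_h`, and `p_e_given_not_h` are assumptions chosen for the example; they do not come from any dataset used in this notebook.

```python
# Hypothetical numbers, chosen only for illustration.
p_h = 0.01              # prior P(H)
p_e_given_h = 0.90      # likelihood P(E|H)
p_e_given_not_h = 0.05  # P(E|not H)

# Law of total probability: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Bayes' theorem: P(H|E) = P(E|H)P(H) / P(E)
p_h_given_e = p_e_given_h * p_h / p_e

print(f"P(E) = {p_e:.4f}, P(H|E) = {p_h_given_e:.4f}")
```

Even with a high likelihood, the small prior keeps the posterior at roughly 0.15 in this example.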

First, download the data and define the helper functions.

In [None]:
!wget https://raw.githubusercontent.com/khider/INF549/master/Homework%20Assignments/Homework%204/Dataset/lenses.csv
!wget https://raw.githubusercontent.com/khider/INF549/master/Homework%20Assignments/Homework%204/Dataset/iris.csv 

In [None]:
import numpy as np
from sklearn.naive_bayes import BernoulliNB,GaussianNB,MultinomialNB
from sklearn.model_selection import cross_val_score

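def loadDataSet(dataset):
    """Read a CSV file with a header row, numeric feature columns, and the class label in the last column."""
    with open(dataset) as f:
        # Split every non-empty line into fields; a blank line would otherwise break float().
        rows = [line.rstrip().split(',') for line in f if line.strip()]
    # Feature names: the header row without the label column.
    attributes = rows[0][:-1]
    # Feature matrix of shape (number of instances, number of features).
    instances = np.array([[float(value) for value in row[:-1]] for row in rows[1:]])
    # Class labels are kept as strings.
    labels = [row[-1] for row in rows[1:]]
    return attributes, instances, labels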



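def predict(testset):
    """Print each trained classifier's predictions for testset (2D array, one row per instance)."""
    # Only classifiers that already exist as notebook globals are used.
    if "clf_G" in globals():
        prediction = clf_G.predict(testset)
        print("GaussianNB: ", prediction)
    if "clf_M" in globals():
        prediction = clf_M.predict(testset)
        print("MultinomialNB: ", prediction)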
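
The `loadDataSet` helper above assumes a particular CSV layout: a header row, numeric feature columns, and the class label in the last column. The sketch below applies the same parsing steps to a small in-memory sample so the resulting shapes are visible. The column names and values are hypothetical and are not taken from `iris.csv` or `lenses.csv`.

```python
import io
import numpy as np

# Hypothetical file contents: header row, numeric features, label in the last column.
sample_csv = io.StringIO(
    "sepal_length,petal_length,species\n"
    "5.1,1.4,setosa\n"
    "7.0,4.7,versicolor\n"
    "6.3,6.0,virginica\n"
)

# Same parsing steps as loadDataSet, applied to the in-memory sample.
rows = [line.rstrip().split(",") for line in sample_csv if line.strip()]
attributes = rows[0][:-1]
instances = np.array([[float(v) for v in row[:-1]] for row in rows[1:]])
labels = [row[-1] for row in rows[1:]]

print(attributes)       # feature names
print(instances.shape)  # (number of instances, number of features)
print(labels)           # class labels as strings
```

A test set passed to `predict` should have the same two-dimensional layout, with one row per instance and one column per feature.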

## Building and Evaluating Naïve Bayes Classifiers

We will look at the performance of two different Naïve Bayes classifiers:

* Multinomial Naïve Bayes: suitable for classification with discrete features.
* Gaussian Naïve Bayes: suitable for classification with continuous features.
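
To make the distinction concrete, here is a minimal sketch that fits each classifier on the kind of data it is designed for. The tiny arrays and the labels `"a"` and `"b"` are made up for illustration. Note that `MultinomialNB` expects non-negative feature values such as counts.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB

labels = ["a", "a", "a", "b", "b", "b"]

# Continuous measurements (hypothetical values): GaussianNB models each
# feature with a per-class normal distribution.
X_cont = np.array([[1.0, 2.1], [1.2, 1.9], [0.9, 2.0],
                   [5.0, 6.1], [5.2, 5.9], [4.9, 6.0]])
gnb = GaussianNB().fit(X_cont, labels)
pred_cont = gnb.predict(np.array([[1.1, 2.0], [5.1, 6.0]]))

# Non-negative counts (hypothetical values): MultinomialNB models how
# often each feature occurs within a class.
X_counts = np.array([[3, 0], [4, 1], [5, 0],
                     [0, 3], [1, 4], [0, 5]])
mnb = MultinomialNB().fit(X_counts, labels)
pred_counts = mnb.predict(np.array([[4, 0], [0, 4]]))

print("GaussianNB:   ", pred_cont)
print("MultinomialNB:", pred_counts)
```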

### Gaussian Naïve Bayes Classifier

In [None]:
dataset=input('Please Enter Your Dataset:')
attributes,instances,labels=loadDataSet(dataset)
clf_G = GaussianNB()
clf_G.fit(instances, labels)
print("Gaussian Naïve Bayes is used.")

In [None]:
n_foldCV=int(input("Please Enter the Number of Folds:"))
attributes,instances,labels=loadDataSet(dataset)
# Cross-validate a fresh estimator so the fitted clf_G from the previous cell stays usable by predict().
scores = cross_val_score(GaussianNB(), instances, labels, cv=n_foldCV)
print("======GaussianNB======")
print(np.array2string(scores,separator=","))
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
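
For readers curious about what `cross_val_score` does in the cell above, the sketch below reproduces it by hand. It relies on scikit-learn's documented behaviour that an integer `cv` with a classifier uses stratified folds. The synthetic two-class data and the names `manual_scores` and `n_folds` are assumptions for the example, not part of the notebook's datasets.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic two-class data (an assumption for this example).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(20, 2)),
               rng.normal(3.0, 1.0, size=(20, 2))])
y = np.array(["a"] * 20 + ["b"] * 20)

n_folds = 5
clf = GaussianNB()

# Manual loop: split into stratified folds, fit on the training part,
# and record the accuracy on the held-out fold.
manual_scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=n_folds).split(X, y):
    fold_clf = clone(clf).fit(X[train_idx], y[train_idx])
    manual_scores.append(fold_clf.score(X[test_idx], y[test_idx]))
manual_scores = np.array(manual_scores)

# The library call should produce the same per-fold accuracies.
scores = cross_val_score(clf, X, y, cv=n_folds)
print(manual_scores)
print(scores)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
```

The printed range is the mean plus or minus two standard deviations of the fold scores. It gives a rough sense of the spread across folds rather than a formal confidence interval.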

### Multinomial Naïve Bayes Classifier





In [None]:
dataset=input('Please Enter Your Dataset:')
attributes,instances,labels=loadDataSet(dataset)
clf_M = MultinomialNB()
clf_M.fit(instances, labels)
print("Multinomial Naïve Bayes is used.")

In [None]:
n_foldCV=int(input("Please Enter the Number of Folds:"))
attributes,instances,labels=loadDataSet(dataset)
# Cross-validate a fresh estimator so the fitted clf_M from the previous cell stays usable by predict().
scores = cross_val_score(MultinomialNB(), instances, labels, cv=n_foldCV)
print("======MultinomialNB======")
print(np.array2string(scores,separator=","))
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))