## Gaussian Naïve Bayes algorithm

**Gaussian** Naïve Bayes is a variant of the Naïve Bayes algorithm.

It's used when the features take **continuous** values.

It assumes that, within each class, every feature follows a **Gaussian distribution** (i.e., a **normal distribution**).
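
To make the Gaussian assumption concrete, here is a minimal sketch of the computation. For each class it estimates a per-feature mean and variance, then scores a new point by the class prior times the product of per-feature normal densities (done in log space). The toy data, the helper names `fit_gaussian_nb` and `predict_gaussian_nb`, and the small variance floor `eps` are all assumptions made for illustration; scikit-learn's `GaussianNB` applies its own variance smoothing.

```python
import numpy as np

def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate class priors plus per-class, per-feature means and variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # eps is an assumed floor that keeps variances strictly positive
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + eps
    return classes, priors, means, variances

def predict_gaussian_nb(X, classes, priors, means, variances):
    """Pick the class with the highest log posterior (up to a constant)."""
    X = np.asarray(X, dtype=float)
    # diff has shape (n_samples, n_classes, n_features)
    diff = X[:, None, :] - means[None, :, :]
    # log N(x | mean, var), summed over features -> (n_samples, n_classes)
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                      + diff ** 2 / variances[None, :, :]).sum(axis=2)
    log_post = np.log(priors)[None, :] + log_lik
    return classes[np.argmax(log_post, axis=1)]

# Toy data (hypothetical): two well-separated classes, two continuous features.
X_toy = np.array([[1.0, 2.1], [1.2, 1.9], [0.8, 2.0],
                  [3.0, 4.2], [3.2, 3.9], [2.9, 4.0]])
y_toy = np.array(["a", "a", "a", "b", "b", "b"])

params = fit_gaussian_nb(X_toy, y_toy)
toy_pred = predict_gaussian_nb([[1.1, 2.0], [3.1, 4.1]], *params)
print(toy_pred)
```

The "naïve" part is the sum over features inside `log_lik`: it treats the features as independent given the class.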

In [1]:
import pandas as pd
import numpy as np

from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

import seaborn as sns

## Example: Iris dataset (predict-a-flower)

In [2]:
iris = sns.load_dataset("iris")

iris.head(7)

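|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 6 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |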


In [3]:
Y = iris["species"]  # response

X = iris.drop("species", axis=1)  # predictors


In [4]:
# split the data
# 25% hold out for testing
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=.25, random_state=25)

X_train.shape, X_test.shape, y_train.shape, y_test.shape

((112, 4), (38, 4), (112,), (38,))

### Fit the default Gaussian NB classifier

In [5]:
gnb = GaussianNB()

gnb.fit(X_train, y_train)

GaussianNB()

In [6]:
Y_pred = gnb.predict(X_test)

Y_pred

array(['setosa', 'versicolor', 'virginica', 'versicolor', 'virginica',
       'versicolor', 'virginica', 'setosa', 'versicolor', 'versicolor',
       'setosa', 'setosa', 'setosa', 'versicolor', 'setosa', 'versicolor',
       'virginica', 'virginica', 'versicolor', 'versicolor', 'versicolor',
       'versicolor', 'versicolor', 'setosa', 'setosa', 'virginica',
       'versicolor', 'virginica', 'virginica', 'setosa', 'versicolor',
       'virginica', 'virginica', 'setosa', 'virginica', 'virginica',
       'versicolor', 'setosa'], dtype='<U10')

In [7]:
from sklearn.metrics import confusion_matrix

# Store the result under its own name so the imported function isn't overwritten.
cm = confusion_matrix(y_test, Y_pred)
cm

array([[11,  0,  0],
       [ 0, 14,  2],
       [ 0,  1, 10]])

In [9]:
from sklearn.metrics import accuracy_score

print(accuracy_score(y_test, Y_pred))

0.9210526315789473
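
The accuracy can be read straight off the confusion matrix: the diagonal holds the correctly classified samples, so accuracy is the diagonal sum divided by the total. The sketch below copies the matrix printed above into a hypothetical variable `cm_iris`. Rows are true classes and columns are predicted classes, in the sorted label order setosa, versicolor, virginica.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class).
cm_iris = np.array([[11, 0, 0],
                    [0, 14, 2],
                    [0, 1, 10]])

correct = np.trace(cm_iris)  # sum of the diagonal: 11 + 14 + 10
total = cm_iris.sum()        # number of test samples
iris_accuracy = correct / total
print(correct, total, iris_accuracy)
```

The three off-diagonal entries are the only errors: two versicolor flowers predicted as virginica and one virginica predicted as versicolor.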


## Glass classification

In [10]:
glass = pd.read_csv("glassClass.csv")

glass.head()

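|   | RI | Na | Mg | Al | Si | K | Ca | Ba | Fe | Type |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.52101 | 13.64 | 4.49 | 1.1 | 71.78 | 0.06 | 8.75 | 0.0 | 0.0 | 1 |
| 1 | 1.51761 | 13.89 | 3.6 | 1.36 | 72.73 | 0.48 | 7.83 | 0.0 | 0.0 | 1 |
| 2 | 1.51618 | 13.53 | 3.55 | 1.54 | 72.99 | 0.39 | 7.78 | 0.0 | 0.0 | 1 |
| 3 | 1.51766 | 13.21 | 3.69 | 1.29 | 72.61 | 0.57 | 8.22 | 0.0 | 0.0 | 1 |
| 4 | 1.51742 | 13.27 | 3.62 | 1.24 | 73.08 | 0.55 | 8.07 | 0.0 | 0.0 | 1 |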


In [14]:
Y = glass["Type"]  # response
X = glass.drop("Type", axis=1)  # predictors

X.head(1)
# Y.head(1)  # integer glass-type labels (1, 2, 3, 5, 6, 7)

|   | RI | Na | Mg | Al | Si | K | Ca | Ba | Fe |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.52101 | 13.64 | 4.49 | 1.1 | 71.78 | 0.06 | 8.75 | 0.0 | 0.0 |


In [15]:
# SPLIT. 25% hold out for testing
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=.25, random_state=25)

X_train.shape, X_test.shape, y_train.shape, y_test.shape

((160, 9), (54, 9), (160,), (54,))

In [18]:
gnb2 = GaussianNB()

gnb2.fit(X_train, y_train)

GaussianNB()

In [19]:
Y_pred = gnb2.predict(X_test)

Y_pred

array([3, 3, 3, 1, 2, 2, 1, 3, 6, 7, 1, 6, 5, 1, 1, 6, 3, 3, 1, 1, 1, 1,
       3, 3, 1, 3, 2, 1, 7, 2, 1, 1, 7, 1, 7, 3, 7, 3, 1, 1, 1, 5, 7, 2,
       5, 6, 1, 3, 3, 1, 7, 1, 3, 1])

In [20]:
from sklearn.metrics import confusion_matrix

confusion_matrix = confusion_matrix(y_test, Y_pred)
confusion_matrix

array([[ 9,  0,  7,  0,  0,  0],
       [11,  3,  5,  3,  1,  0],
       [ 0,  0,  2,  0,  0,  0],
       [ 0,  2,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  3,  0],
       [ 1,  0,  0,  0,  0,  7]])

In [21]:
accuracy_score(y_test, Y_pred)

0.4444444444444444
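
An overall accuracy of 0.44 doesn't show which glass types are being missed. As a rough sketch, per-class recall can be computed from the confusion matrix printed above. The names `cm_glass` and `glass_labels` are hypothetical. The label order is inferred: scikit-learn sorts the labels, and the six values seen in the predictions are 1, 2, 3, 5, 6, 7.

```python
import numpy as np

# Labels in sorted order, matching the rows/columns of the matrix above.
glass_labels = [1, 2, 3, 5, 6, 7]

# Confusion matrix copied from the GaussianNB output above
# (rows: true type, columns: predicted type).
cm_glass = np.array([[9, 0, 7, 0, 0, 0],
                     [11, 3, 5, 3, 1, 0],
                     [0, 0, 2, 0, 0, 0],
                     [0, 2, 0, 0, 0, 0],
                     [0, 0, 0, 0, 3, 0],
                     [1, 0, 0, 0, 0, 7]])

support = cm_glass.sum(axis=1)         # test samples per true type
recall = np.diag(cm_glass) / support   # fraction of each type predicted correctly

for label, r, n in zip(glass_labels, recall, support):
    print(f"type {label}: recall {r:.2f} ({n} test samples)")
```

Most of the errors come from type 2, where only 3 of 23 test samples are classified correctly. Types 3, 5, and 6 have just two or three test samples each, so their recall values are very noisy.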

## Multinomial NB

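The Multinomial Naïve Bayes classifier is suited to classification with discrete features, such as counts. The glass features are continuous, so it's fitted here only as a comparison with Gaussian NB.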
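
For contrast, here is a minimal sketch of the kind of input `MultinomialNB` is designed for: non-negative counts. The word-count data, the topic labels, and the variable names are all made up for illustration.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word counts per document:
# column 0 counts the word "goal", column 1 counts the word "vote".
X_counts = np.array([[3, 0], [4, 1], [0, 3], [1, 4]])
y_topic = np.array(["sports", "sports", "politics", "politics"])

mnb_toy = MultinomialNB()  # default alpha=1.0, i.e. Laplace smoothing
mnb_toy.fit(X_counts, y_topic)

# Two new documents: one dominated by "goal", one by "vote".
topic_pred = mnb_toy.predict(np.array([[5, 0], [0, 5]]))
print(topic_pred)
```

`MultinomialNB` rejects negative feature values. The glass features happen to be non-negative, which is why the fit below runs even though they aren't counts.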

In [23]:
from sklearn.naive_bayes import MultinomialNB

mnb = MultinomialNB()

mnb.fit(X_train, y_train)

MultinomialNB()

In [24]:
Y_pred = mnb.predict(X_test)

Y_pred

array([1, 1, 1, 1, 7, 2, 1, 1, 1, 7, 1, 2, 5, 1, 1, 2, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 2, 2, 2, 7, 1, 1, 1, 7, 2, 7, 1, 7, 2, 1, 1, 1, 6, 7, 2,
       2, 2, 1, 1, 1, 1, 7, 1, 1, 1])

In [25]:
# Recompute the confusion matrix for the MultinomialNB predictions;
# the variable from In [20] still holds the GaussianNB result.
from sklearn.metrics import confusion_matrix

confusion_matrix(y_test, Y_pred)

In [26]:
accuracy_score(y_test, Y_pred)

0.46296296296296297