# **Naive Bayes Algorithm (NB)**

## What is it?

 * A simple but surprisingly powerful algorithm for predictive modeling and machine learning.
 * Based on **Bayes' Theorem**.
 * Particularly useful in classification tasks:
   * Text classification tasks such as spam filtering
   * Sentiment analysis
  
 * Naive Bayes is a probabilistic machine learning model used for classification tasks.
  It is based on Bayes' theorem, which describes the probability of an event based on prior knowledge of conditions related to the event.

  The 'naive' aspect of the algorithm comes from the assumption that the features used to predict the target variable are independent of each other given the class.
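
The independence assumption can be written out as a short formula. The symbols here are introduced only for illustration and do not come from the notebook: $y$ is the class and $x_1, \dots, x_n$ are the features. Under the assumption, the class posterior factors into a prior times one likelihood term per feature:

$$
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$

The predicted class is the $y$ that maximizes the right-hand side.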

**Mathematically**

$$
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
$$

where

  * P(A|B) is the probability of hypothesis A given the data B.
  * P(B|A) is the probability of the data B given that the hypothesis A is true.
  * P(A) is the probability of hypothesis A being true (regardless of the data).
  * P(B) is the probability of the data (regardless of the hypothesis).
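
A small numeric example may make the four terms concrete. The sketch below plugs made-up probabilities into Bayes' theorem for a spam filter: A is "the email is spam" and B is "the email contains the word *free*". All numbers are hypothetical and chosen only for illustration.

```python
# Hypothetical probabilities, chosen for illustration only.
p_spam = 0.2               # P(A): prior probability that an email is spam
p_free_given_spam = 0.6    # P(B|A): "free" appears in a spam email
p_free_given_ham = 0.05    # P(B|not A): "free" appears in a non-spam email

# P(B): total probability of seeing "free", summed over both hypotheses
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)

# P(A|B): Bayes' theorem
p_spam_given_free = p_free_given_spam * p_spam / p_free

print(f"P(free) = {p_free:.2f}")
print(f"P(spam | free) = {p_spam_given_free:.2f}")
```

With these numbers, seeing the word raises the probability of spam from the 0.2 prior to 0.75.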

## Types of Naive Bayes Classifiers:

There are three main types of Naive Bayes models; which one to use depends on the nature of the feature variables:

1. **Gaussian Naive Bayes:** Used when features are continuous and assumed to follow a normal distribution within each class.

2. **Multinomial Naive Bayes:** Often used for document classification, where the features are the frequencies of the words or tokens in the documents.

3. **Bernoulli Naive Bayes:** Used when features are binary (0s and 1s).
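
To illustrate how each variant pairs with a feature type, the sketch below fits all three scikit-learn classifiers on small synthetic datasets. The data, seed, sample sizes, and distribution parameters are assumptions made up for this example; only the three estimator classes come from the notebook.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

rng = np.random.default_rng(0)   # fixed seed so the synthetic data is reproducible
n_per_class = 50                 # arbitrary sample size per class
labels = np.array([0] * n_per_class + [1] * n_per_class)

# Continuous features: two Gaussian blobs with different means
X_continuous = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(n_per_class, 3)),
    rng.normal(loc=3.0, scale=1.0, size=(n_per_class, 3)),
])

# Count features: Poisson draws standing in for word counts
X_counts = np.vstack([
    rng.poisson(lam=[5, 1, 1], size=(n_per_class, 3)),
    rng.poisson(lam=[1, 1, 5], size=(n_per_class, 3)),
])

# Binary features: 0/1 flags such as "word present or not"
X_binary = np.vstack([
    rng.binomial(n=1, p=[0.9, 0.5, 0.1], size=(n_per_class, 3)),
    rng.binomial(n=1, p=[0.1, 0.5, 0.9], size=(n_per_class, 3)),
])

# Fit each variant on the feature type it is designed for
scores = {
    "GaussianNB": GaussianNB().fit(X_continuous, labels).score(X_continuous, labels),
    "MultinomialNB": MultinomialNB().fit(X_counts, labels).score(X_counts, labels),
    "BernoulliNB": BernoulliNB().fit(X_binary, labels).score(X_binary, labels),
}

for name, score in scores.items():
    print(f"{name}: training accuracy = {score:.2f}")
```

These are training accuracies on toy data, so they only show that each model accepts its matching feature type; they say nothing about generalization.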

## Applications:

1. Email spam filtering
2. Sentiment Analysis
3. Document Categorization
4. Medical Diagnosis

In [1]:
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.datasets import load_iris

# load the datasets
iris = load_iris()
x = iris.data
y = iris.target

# train test split the data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# initialize the model
gnb = GaussianNB()

# fit the model
gnb.fit(x_train, y_train)

# predict on the test set
y_pred = gnb.predict(x_test)

# evaluate the model
print("accuracy", accuracy_score(y_test, y_pred))
print("confusion matrix:\n", confusion_matrix(y_test, y_pred))
print("classification_report:\n", classification_report(y_test, y_pred))

accuracy 1.0
confusion matrix:
 [[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
classification_report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
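
To show what Gaussian Naive Bayes computes on this data, here is a minimal from-scratch sketch in NumPy. It estimates a prior, mean, and variance per class, then sums per-feature Gaussian log-likelihoods. It is a simplified illustration, not scikit-learn's implementation: scikit-learn additionally applies a small variance smoothing term, which is omitted here. The helper name `log_posterior` is hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

classes = np.unique(y_train)

# Per-class statistics estimated from the training data
log_prior = np.log(np.array([np.mean(y_train == c) for c in classes]))
means = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
variances = np.array([X_train[y_train == c].var(axis=0) for c in classes])


def log_posterior(samples):
    """Unnormalized log P(class | features) for each sample and class."""
    # diff has shape (n_samples, n_classes, n_features)
    diff = samples[:, None, :] - means[None, :, :]
    log_likelihood = -0.5 * (
        np.log(2 * np.pi * variances)[None, :, :] + diff ** 2 / variances[None, :, :]
    )
    # Naive assumption: per-feature log-likelihoods simply add up
    return log_prior[None, :] + log_likelihood.sum(axis=2)


manual_pred = classes[np.argmax(log_posterior(X_test), axis=1)]
sklearn_pred = GaussianNB().fit(X_train, y_train).predict(X_test)

manual_accuracy = np.mean(manual_pred == y_test)
agreement = np.mean(manual_pred == sklearn_pred)
print("manual accuracy:", manual_accuracy)
print("agreement with GaussianNB:", agreement)
```

The sum over features inside `log_posterior` is where the independence assumption enters: each feature contributes its own term, with no interaction between features.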



## MultinomialNB()

In [2]:
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.datasets import load_iris

# load the datasets
iris = load_iris()
x = iris.data
y = iris.target

# train test split the data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# initialize the model
model = MultinomialNB()

# fit the model
model.fit(x_train, y_train)

# predict on the test set
y_pred = model.predict(x_test)

# evaluate the model
print("accuracy", accuracy_score(y_test, y_pred))
print("confusion matrix:\n", confusion_matrix(y_test, y_pred))
print("classification_report:\n", classification_report(y_test, y_pred))

accuracy 0.9
confusion matrix:
 [[10  0  0]
 [ 0  9  0]
 [ 0  3  8]]
classification_report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       0.75      1.00      0.86         9
           2       1.00      0.73      0.84        11

    accuracy                           0.90        30
   macro avg       0.92      0.91      0.90        30
weighted avg       0.93      0.90      0.90        30
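
Iris measurements are not counts, whereas Multinomial Naive Bayes is described above as a fit for word frequencies in documents. The sketch below shows that more typical use on a tiny spam filter. The messages and labels are invented for illustration, and pairing `CountVectorizer` with `MultinomialNB` in a pipeline is one common arrangement, not something the notebook itself does.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training messages, made up for this example
messages = [
    "win a free prize now",
    "free money click now",
    "claim your free prize",
    "meeting at noon tomorrow",
    "lunch with the team tomorrow",
    "project report due at noon",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# CountVectorizer turns each message into word counts;
# MultinomialNB models those counts per class
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(messages, labels)

new_messages = ["claim free money now", "team meeting at noon"]
predictions = spam_filter.predict(new_messages)

for text, label in zip(new_messages, predictions):
    print(f"{text!r} -> {label}")
```

With only six training messages this is a toy, but the same pipeline shape scales to a real labelled corpus.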



## BernoulliNB()

In [3]:
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.datasets import load_iris

# load the datasets
iris = load_iris()
x = iris.data
y = iris.target

# train test split the data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# initialize the model
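model = BernoulliNB()

# fit the model
model.fit(x_train, y_train)

# predict on the test set
y_pred = model.predict(x_test)

# evaluate the model
print("accuracy", accuracy_score(y_test, y_pred))
print("confusion matrix:\n", confusion_matrix(y_test, y_pred))
print("classification_report:\n", classification_report(y_test, y_pred))

Note: the output below was recorded while the classifier was assigned to the misspelled name `mobel`, so `model` still referred to the `MultinomialNB` from the previous cell. That is why it matches the MultinomialNB results exactly. Rerun the corrected cell to obtain BernoulliNB's own scores.

accuracy 0.9
confusion matrix: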
 [[10  0  0]
 [ 0  9  0]
 [ 0  3  8]]
classification_report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       0.75      1.00      0.86         9
           2       1.00      0.73      0.84        11

    accuracy                           0.90        30
   macro avg       0.92      0.91      0.90        30
weighted avg       0.93      0.90      0.90        30
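
BernoulliNB expects binary features, but the iris measurements are continuous and strictly positive. With the default `binarize=0.0`, every value is mapped to 1, which leaves the model nothing to discriminate on. One possible workaround, sketched below as an assumption rather than the notebook's method, is to threshold each feature at its training-set median before fitting. The median is an arbitrary choice of threshold.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# With the default threshold of 0.0, every iris feature would become 1
all_ones_at_default = bool((X_train > 0.0).all())
print("all features binarize to 1 at threshold 0.0:", all_ones_at_default)

# Assumed alternative: threshold each feature at its training median
thresholds = np.median(X_train, axis=0)
X_train_bin = (X_train > thresholds).astype(int)
X_test_bin = (X_test > thresholds).astype(int)

# binarize=None tells BernoulliNB the inputs are already binary
bnb = BernoulliNB(binarize=None)
bnb.fit(X_train_bin, y_train)

bnb_accuracy = accuracy_score(y_test, bnb.predict(X_test_bin))
print("BernoulliNB accuracy with median thresholds:", bnb_accuracy)
```

Reducing four continuous measurements to four bits discards a lot of information, so GaussianNB remains the more natural choice for this dataset.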

