# Naive Bayes Algorithm

Naive Bayes is a classification algorithm based on Bayes' theorem. It is called naive because it assumes that the features in a dataset are conditionally independent of each other given the class. This assumption rarely holds in real data, but it simplifies the computation and often gives good results in practice.

## Bayes Theorem

Bayes' theorem is a formula for calculating a conditional probability from its reverse. It is defined as:

$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$

where A and B are events, P(A|B) is the probability of A given that B has occurred, and P(B) ≠ 0.
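
As a small worked example of the formula, the sketch below plugs in made-up numbers for a spam filter. The event names and all probabilities are assumptions chosen for illustration, and `bayes_posterior` is a hypothetical helper, not part of any library.

```python
def bayes_posterior(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B); P(B) must be non-zero."""
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return p_b_given_a * p_a / p_b


# Hypothetical numbers: A = "email is spam", B = "email contains the word 'offer'"
p_spam = 0.2               # P(A)
p_offer_given_spam = 0.6   # P(B|A)
p_offer_given_ham = 0.05   # P(B|not A)

# The law of total probability gives P(B)
p_offer = p_offer_given_spam * p_spam + p_offer_given_ham * (1 - p_spam)

p_spam_given_offer = bayes_posterior(p_offer_given_spam, p_spam, p_offer)
print(round(p_spam_given_offer, 3))  # 0.75
```

With these numbers, seeing the word raises the probability of spam from the prior of 0.2 to 0.75.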

## Naive Bayes Algorithm

Naive Bayes applies Bayes' theorem to a classification problem:

$$P(y|x_1,x_2,...,x_n) = \frac{P(x_1,x_2,...,x_n|y)P(y)}{P(x_1,x_2,...,x_n)}$$

where y is the class variable and x1, x2, ..., xn are the features.

The algorithm assumes that the features are conditionally independent given the class y, so the joint likelihood factorizes into a product of per-feature terms. The above equation can then be written as:

$$P(y|x_1,x_2,...,x_n) = \frac{P(x_1|y)P(x_2|y)...P(x_n|y)P(y)}{P(x_1,x_2,...,x_n)}$$

The denominator does not depend on y, so it is constant for a given input and can be dropped when comparing classes:

$$P(y|x_1,x_2,...,x_n) \propto P(x_1|y)P(x_2|y)...P(x_n|y)P(y)$$

The class with the highest posterior probability is the output of the algorithm:

$$\hat{y} = \arg\max_y P(y)P(x_1|y)P(x_2|y)...P(x_n|y)$$
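
The decision rule can be sketched in a few lines of Python. The priors and likelihood tables below are hand-picked, hypothetical values for a two-class toy problem with binary features, and `log_score` and `predict` are illustrative names. The sketch sums log-probabilities instead of multiplying probabilities, a common way to avoid numerical underflow when there are many features.

```python
import math

# Hypothetical, hand-picked probabilities for a two-class toy problem.
priors = {"spam": 0.4, "ham": 0.6}  # P(y)
likelihoods = {                      # P(x_i = present | y)
    "spam": {"offer": 0.7, "meeting": 0.1},
    "ham": {"offer": 0.2, "meeting": 0.6},
}


def log_score(features, label):
    """log P(y) + sum_i log P(x_i | y): the log of the unnormalized posterior."""
    score = math.log(priors[label])
    for name, present in features.items():
        p = likelihoods[label][name]
        score += math.log(p if present else 1.0 - p)
    return score


def predict(features):
    """Return the class with the highest unnormalized posterior."""
    return max(priors, key=lambda label: log_score(features, label))


email = {"offer": True, "meeting": False}
print(predict(email))  # spam
```

For this input the unnormalized posteriors are 0.4 × 0.7 × 0.9 = 0.252 for spam and 0.6 × 0.2 × 0.4 = 0.048 for ham, so spam wins.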

## Types of Naive Bayes Algorithm

Three variants of Naive Bayes are most commonly used:

1. Gaussian Naive Bayes: It is used when the features are continuous. It assumes that the features follow a normal distribution within each class.

2. Multinomial Naive Bayes: It is used when the features are discrete counts, such as word counts. It is commonly used for text classification.

3. Bernoulli Naive Bayes: It is used when the features are binary.
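
For the Gaussian variant, the per-feature likelihood is usually modeled as a normal density. The symbols $\mu_{y,i}$ and $\sigma_{y,i}^2$ are introduced here for illustration and denote the mean and variance of feature $x_i$ among the training samples of class y:

$$P(x_i|y) = \frac{1}{\sqrt{2\pi\sigma_{y,i}^2}} \exp\left(-\frac{(x_i - \mu_{y,i})^2}{2\sigma_{y,i}^2}\right)$$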

## Steps to implement Naive Bayes Algorithm

1. Load the dataset
2. Split the dataset into training and testing sets
3. Initialize the parameters
4. Calculate the prior probabilities
5. Calculate the likelihood
6. Calculate the posterior probabilities
7. Make predictions
8. Evaluate the model
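
The steps above can be sketched from scratch for the Gaussian case. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (`fit`, `log_posterior`, `predict`) are hypothetical, the data is made up and split by hand, and `var_smoothing` is a small constant added to each variance to avoid division by zero (the name is borrowed from scikit-learn's `GaussianNB` parameter).

```python
import math
from collections import defaultdict


def fit(X, y, var_smoothing=1e-9):
    """Steps 3-5: estimate priors and per-class, per-feature mean and variance."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_smoothing
            for col, m in zip(columns, means)
        ]
        model[label] = {"prior": n / len(X), "means": means, "vars": variances}
    return model


def log_posterior(model, row):
    """Step 6: unnormalized log posterior log P(y) + sum_i log P(x_i | y)."""
    scores = {}
    for label, params in model.items():
        score = math.log(params["prior"])
        for x, m, v in zip(row, params["means"], params["vars"]):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        scores[label] = score
    return scores


def predict(model, X):
    """Step 7: pick the class with the highest score for each row."""
    all_scores = (log_posterior(model, row) for row in X)
    return [max(scores, key=scores.get) for scores in all_scores]


# Steps 1-2: made-up two-feature data, split by hand into train and test sets
X_train = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 4.1], [3.2, 3.9], [2.9, 4.0]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[1.1, 2.0], [3.1, 4.0]]
y_test = [0, 1]

model = fit(X_train, y_train)
y_pred = predict(model, X_test)

# Step 8: accuracy on the held-out rows
accuracy = sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)
print(y_pred, accuracy)  # [0, 1] 1.0
```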

## Advantages of Naive Bayes Algorithm

**1-Simplicity** (straightforward to implement and understand)

**2-Efficiency** (requires a relatively small amount of training data to estimate its parameters)

**3-Speed** (training and prediction are very fast, making it suitable for real-time prediction)

**4-Good performance** (often gives good results, including in multi-class prediction)

**5-Works well with high-dimensional data** (performs well even when the number of features is large)

**6-Versatility** (can be used for both binary and multiclass classification)

## Disadvantages of Naive Bayes Algorithm

**Feature Independence:** Naive Bayes assumes that the features are independent of each other given the class. This assumption rarely holds in real data, so the algorithm may not give accurate results in some cases.

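**Data Scarcity:** Naive Bayes can give poor estimates when training data is scarce. If a feature value never appears together with a class in the training set, its estimated likelihood is zero, and that zero wipes out the whole product for that class (the zero-frequency problem). Smoothing techniques such as Laplace smoothing reduce this problem.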
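
One concrete way scarce data hurts is the zero-frequency problem: a value never seen with a class gets a likelihood of zero. The sketch below shows additive (Laplace) smoothing on made-up word lists. `smoothed_likelihoods` is a hypothetical helper; its `alpha` argument plays the same role as the `alpha` parameter of scikit-learn's `MultinomialNB`.

```python
from collections import Counter


def smoothed_likelihoods(values, vocabulary, alpha=1.0):
    """Estimate P(value | class) with additive (Laplace) smoothing."""
    counts = Counter(values)
    total = len(values) + alpha * len(vocabulary)
    return {v: (counts[v] + alpha) / total for v in vocabulary}


vocabulary = ["offer", "meeting", "invoice"]
spam_words = ["offer", "offer", "invoice"]  # "meeting" never appears in spam

unsmoothed = smoothed_likelihoods(spam_words, vocabulary, alpha=0.0)
smoothed = smoothed_likelihoods(spam_words, vocabulary, alpha=1.0)

print(unsmoothed["meeting"])          # 0.0
print(round(smoothed["meeting"], 3))  # 0.167
```

With `alpha=1.0` every value gets one pseudo-count, so the unseen word receives (0 + 1) / (3 + 3) = 1/6 instead of zero, and the estimates still sum to one.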

**Imbalanced Datasets:** Naive Bayes can perform poorly on imbalanced datasets. The class prior P(y) is estimated from class frequencies, so the majority class receives more weight and minority-class samples are misclassified more often.

**Non-Linear Data:** Naive Bayes does not work well when the class depends on non-linear interactions between features. Each feature is modeled separately given the class, so the algorithm cannot capture a pattern that only shows up in a combination of features, and it may not give accurate results in such cases.

**Highly Correlated Features:** Naive Bayes does not work well with highly correlated features. Because it assumes that the features are independent of each other, correlated features are treated as separate pieces of evidence and are effectively counted more than once, which can make the predicted probabilities overconfident.

**Missing Values:** Naive Bayes does not work well with missing values. Most implementations, including those in scikit-learn, expect complete data, so missing values have to be imputed or removed first.
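
To illustrate the point about highly correlated features, the sketch below counts the same piece of evidence one, two, and three times, as would happen with perfectly correlated copies of a feature. The priors and likelihoods are made-up values and `posterior_spam` is a hypothetical helper.

```python
def posterior_spam(n_copies):
    """P(spam | evidence) when the same evidence is counted n_copies times."""
    prior_spam, prior_ham = 0.5, 0.5
    p_word_spam, p_word_ham = 0.8, 0.4  # hypothetical P(word | class)
    spam = prior_spam * p_word_spam ** n_copies
    ham = prior_ham * p_word_ham ** n_copies
    return spam / (spam + ham)


for n in (1, 2, 3):
    print(n, round(posterior_spam(n), 3))
# 1 0.667
# 2 0.8
# 3 0.889
```

The evidence is identical in all three cases, yet the posterior grows with every duplicated copy.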


1. It assumes that the features are independent of each other
2. It can struggle with small datasets, where unseen feature values get zero probability unless smoothing is used
3. It does not work well with imbalanced datasets
4. It does not work well with missing values
5. It does not work well when the class depends on non-linear interactions between features

## Applications of Naive Bayes Algorithm

1. Email spam detection (e.g. spam or not spam)
2. Text classification (e.g. news articles, web pages, etc.)
3. Sentiment analysis (e.g. movie reviews, product reviews, etc.)
4. Medical diagnosis (e.g. cancer or not cancer)
5. Credit scoring (e.g. good credit or bad credit)
6. Recommendation systems (e.g. movie recommendations, product recommendations, etc.)
7. Fraud detection (e.g. fraudulent or not fraudulent)
8. Weather prediction (e.g. rainy or sunny day)
9. Face recognition 
10. Handwriting recognition 



## Naive Bayes in scikit-learn

There are five types of NB models under the scikit-learn library:

**Gaussian Naive Bayes** (`GaussianNB`): It is used in classification tasks with continuous features and assumes that the feature values follow a Gaussian distribution within each class.

**Multinomial Naive Bayes** (`MultinomialNB`): It is used for discrete counts. In a text classification problem, for example, instead of recording only whether a word occurs in the document (as the Bernoulli model does), each feature counts how often the word occurs in the document. You can think of it as the number of times outcome x_i is observed over n trials.

**Bernoulli Naive Bayes** (`BernoulliNB`): The Bernoulli model is useful if your feature vectors are boolean (i.e. zeros and ones). One application is text classification with a bag-of-words model, where 1 means "word occurs in the document" and 0 means "word does not occur in the document".

**Complement Naive Bayes** (`ComplementNB`): It is an adaptation of Multinomial NB in which the complement of each class is used to calculate the model weights. This makes it suitable for imbalanced datasets, and it often outperforms Multinomial NB on text classification tasks.

**Categorical Naive Bayes** (`CategoricalNB`): It is useful if the features are categorically distributed. The categorical variables have to be encoded as numbers first, for example with `OrdinalEncoder`.
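
As a sketch of the encoding step for Categorical Naive Bayes, the example below encodes a made-up table of string categories with `OrdinalEncoder` and fits `CategoricalNB` on it. The data and column meanings are assumptions for illustration only. Note that a category that was never seen during fitting cannot be encoded or scored without extra configuration, so the sample being predicted uses categories from the training data.

```python
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Made-up toy data: [outlook, wind] -> play outside?
X_raw = [
    ["sunny", "weak"], ["sunny", "strong"], ["rainy", "weak"],
    ["rainy", "strong"], ["overcast", "weak"], ["overcast", "strong"],
]
y = ["yes", "no", "yes", "no", "yes", "yes"]

# Map each string category to an integer code 0..n-1, one column at a time
encoder = OrdinalEncoder()
X = encoder.fit_transform(X_raw).astype(int)

# alpha=1.0 applies Laplace smoothing to the per-category counts
clf = CategoricalNB(alpha=1.0)
clf.fit(X, y)

# Encode a new sample with the same encoder before predicting
sample = encoder.transform([["sunny", "weak"]]).astype(int)
prediction = clf.predict(sample)
probabilities = clf.predict_proba(sample)
print(prediction, probabilities)
```

The columns of `probabilities` follow the order of `clf.classes_`.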

```python
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.datasets import load_iris
```

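```python
# Load the iris dataset (150 samples, 4 continuous features, 3 classes)
iris = load_iris()
X = iris.data
y = iris.target

# Hold out 30% of the data for testing; fix the seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```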

```python
# Initialize the model (Gaussian NB suits the continuous iris features)
gnb = GaussianNB()

# Train the model
gnb.fit(X_train, y_train)

# Predict the test data
y_pred = gnb.predict(X_test)

# Evaluate the model
print("Accuracy Score: ", accuracy_score(y_test, y_pred))
print("Confusion Matrix: \n", confusion_matrix(y_test, y_pred))
print("Classification Report: \n", classification_report(y_test, y_pred))
```

```
Accuracy Score:  0.9777777777777777
Confusion Matrix: 
 [[19  0  0]
 [ 0 12  1]
 [ 0  0 13]]
Classification Report: 
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      0.92      0.96        13
           2       0.93      1.00      0.96        13

    accuracy                           0.98        45
   macro avg       0.98      0.97      0.97        45
weighted avg       0.98      0.98      0.98        45
```



```python
# Initialize the model
# Note: MultinomialNB is designed for count features. It runs here because the
# iris measurements are non-negative, although they are not counts.
mnb = MultinomialNB()

# Train the model
mnb.fit(X_train, y_train)

# Predict the test data
y_pred = mnb.predict(X_test)

# Evaluate the model
print("Accuracy Score: ", accuracy_score(y_test, y_pred))
print("Confusion Matrix: \n", confusion_matrix(y_test, y_pred))
print("Classification Report: \n", classification_report(y_test, y_pred))
```

```
Accuracy Score:  0.9555555555555556
Confusion Matrix: 
 [[19  0  0]
 [ 0 12  1]
 [ 0  1 12]]
Classification Report: 
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       0.92      0.92      0.92        13
           2       0.92      0.92      0.92        13

    accuracy                           0.96        45
   macro avg       0.95      0.95      0.95        45
weighted avg       0.96      0.96      0.96        45
```



```python
# Initialize the model
# Note: BernoulliNB binarizes features at a threshold (binarize=0.0 by default).
# Every iris measurement is positive, so all features become 1 and carry no
# information; the model then predicts the same class for every sample.
bnb = BernoulliNB()

# Train the model
bnb.fit(X_train, y_train)

# Predict the test data
y_pred = bnb.predict(X_test)

# Evaluate the model
# zero_division=0 reports 0.00 for classes that are never predicted
# without raising an undefined-metric warning
print("Accuracy Score: ", accuracy_score(y_test, y_pred))
print("Confusion Matrix: \n", confusion_matrix(y_test, y_pred))
print("Classification Report: \n", classification_report(y_test, y_pred, zero_division=0))
```

```
Accuracy Score:  0.28888888888888886
Confusion Matrix: 
 [[ 0 19  0]
 [ 0 13  0]
 [ 0 13  0]]
Classification Report: 
               precision    recall  f1-score   support

           0       0.00      0.00      0.00        19
           1       0.29      1.00      0.45        13
           2       0.00      0.00      0.00        13

    accuracy                           0.29        45
   macro avg       0.10      0.33      0.15        45
weighted avg       0.08      0.29      0.13        45
```
