# What is Naive Bayes?

Naïve Bayes classifiers are a family of simple "probabilistic classifiers" based on applying Bayes' theorem with strong (naïve) independence assumptions between the features.

- In machine learning we are often interested in selecting the best hypothesis (h) given data (d).

- In a classification problem, our hypothesis (h) may be the class to assign for a new data instance (d).

Bayes’ theorem provides a way to calculate the probability of a hypothesis given the data:

\begin{equation}
P(h|d)= \frac{(P(d|h) P(h))}{ P(d)}
\end{equation}

- P(h|d) = Posterior probability. The probability of hypothesis h being true, given the data d. Under the naive independence assumption with d = (d1, d2, …, dn), the likelihood factorizes as P(d|h) = P(d1|h) P(d2|h) … P(dn|h), so P(h|d) is proportional to P(d1|h) P(d2|h) … P(dn|h) P(h).
- P(d|h) = Likelihood. The probability of data d given that the hypothesis h is true.
- P(h) = Class prior probability. The probability of hypothesis h being true (irrespective of the data).
- P(d) = Predictor prior probability, also called the evidence. The probability of the data (irrespective of the hypothesis).

The posterior P(h|d) is therefore a conditional probability: the probability of h once the data d has been observed.

After calculating the posterior probability for a number of different hypotheses, you can select the hypothesis with the highest probability. This is the most probable hypothesis and is formally called the maximum a posteriori (MAP) hypothesis.
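
As a minimal sketch of the calculation above, the snippet below applies Bayes' theorem to two hypotheses and picks the MAP hypothesis. The hypotheses (`spam`, `ham`), the observed data (a message containing the word "offer") and all probabilities are hypothetical values chosen only for illustration.

```python
# Hypothetical numbers, chosen only for illustration.
priors = {"spam": 0.3, "ham": 0.7}        # P(h)
likelihoods = {"spam": 0.8, "ham": 0.1}   # P(d | h), d = "message contains 'offer'"


def posteriors(priors, likelihoods):
    """Return P(h | d) for every hypothesis h using Bayes' theorem."""
    # P(d) = sum over h of P(d | h) * P(h)  (law of total probability)
    evidence = sum(likelihoods[h] * priors[h] for h in priors)
    return {h: likelihoods[h] * priors[h] / evidence for h in priors}


post = posteriors(priors, likelihoods)

# The MAP hypothesis is the one with the largest posterior probability.
map_hypothesis = max(post, key=post.get)

print(post)
print("MAP hypothesis:", map_hypothesis)
```

With these numbers the posterior for `spam` is 0.24 / 0.31, roughly 0.77, so `spam` is the MAP hypothesis even though its prior is lower than that of `ham`.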

# Naive Bayes classifiers
Naive Bayes classifiers are a collection of classification algorithms based on Bayes’ theorem. Naive Bayes is not a single algorithm but a family of algorithms that share a common principle: every pair of features is assumed to be conditionally independent of each other, given the class.

- Naive Bayes is a classification algorithm for binary (two-class) and multi-class classification problems. 
- The technique is easiest to understand when described using binary or categorical input values.

Now, with regard to our dataset, we can apply Bayes’ theorem in the following way:

\begin{equation}
P(y|X) = \frac{P(X|y)P(y)}{P(X)}
\end{equation}

where y is the class variable and X = (x_1, x_2, ..., x_n) is a feature vector of size n.

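If the features are assumed to be conditionally independent given the class, the likelihood factorizes into one term per feature:
\begin{equation}
P(y|x_1,x_2,...,x_n) = \frac{P(x_1|y)P(x_2|y)...P(x_n|y)P(y)}{P(x_1,x_2,...,x_n)}
\end{equation}

or, written with a product:

\begin{equation}
P(y|x_1,x_2,...,x_n) = \frac{P(y)\prod_{i=1}^nP(x_i|y)}{P(x_1,x_2,...,x_n)}
\end{equation}

The denominator does not depend on y, so it is the same for every class.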
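
Because the denominator is identical for every class, a classifier only needs to compare the numerators P(y) ∏ P(x_i|y). The following is a rough sketch of that idea for categorical features, with the probabilities estimated by simple counting. The toy dataset (`outlook`, `windy`, and the labels `play` / `stay`) and the helper names `fit`, `score` and `predict` are hypothetical and exist only for this illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each row is (outlook, windy); labels are "play" / "stay".
X = [("sunny", "no"), ("sunny", "yes"), ("rainy", "yes"),
     ("rainy", "no"), ("sunny", "no"), ("rainy", "yes")]
y = ["play", "play", "stay", "play", "play", "stay"]


def fit(X, y):
    """Estimate P(y) and the counts needed for P(x_i | y)."""
    class_counts = Counter(y)
    priors = {c: n / len(y) for c, n in class_counts.items()}
    # value_counts[(feature index, class)][value] = number of occurrences
    value_counts = defaultdict(Counter)
    for row, label in zip(X, y):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return priors, value_counts, class_counts


def score(row, priors, value_counts, class_counts):
    """Return the unnormalised posterior P(y) * prod_i P(x_i | y) per class."""
    scores = {}
    for c, prior in priors.items():
        s = prior
        for i, value in enumerate(row):
            s *= value_counts[(i, c)][value] / class_counts[c]
        scores[c] = s
    return scores


def predict(row, model):
    """Pick the class with the largest unnormalised posterior."""
    scores = score(row, *model)
    return max(scores, key=scores.get)


model = fit(X, y)
scores = score(("rainy", "yes"), *model)
prediction = predict(("rainy", "yes"), model)
print(scores, prediction)
```

Note that this sketch uses raw counts, so a feature value never seen with a class drives that class's score to zero. Smoothing the counts (for example Laplace smoothing) is a common remedy, but it is left out here to keep the example short.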

In [1]:
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# load the iris dataset
iris = load_iris()

# store the feature matrix (X) and response vector (y)
X = iris.data
y = iris.target

# splitting X and y into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

# training the model on training set
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# making predictions on the testing set
y_pred = gnb.predict(X_test)

# comparing actual response values (y_test) with predicted response values (y_pred)
print("Gaussian Naive Bayes model accuracy(in %):", metrics.accuracy_score(y_test, y_pred)*100)

Gaussian Naive Bayes model accuracy(in %): 95.0
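
To make the idea behind a Gaussian naive Bayes model more concrete, here is a simplified pure-Python sketch, not scikit-learn's actual implementation. It assumes each feature follows a normal distribution within each class, estimates a mean and variance per class and feature, and predicts the class that maximises log P(y) + Σ log P(x_i|y). The small two-feature dataset and the function names are hypothetical. scikit-learn's `GaussianNB` additionally adds a small smoothing term to the variances (its `var_smoothing` parameter), which this sketch omits.

```python
import math
from statistics import mean, pvariance

# Hypothetical two-feature data (think: petal length, petal width), two classes.
X = [(1.4, 0.2), (1.5, 0.3), (1.3, 0.2), (1.6, 0.4),
     (4.5, 1.5), (4.7, 1.4), (4.4, 1.3), (4.9, 1.6)]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_gaussian_nb(X, y):
    """Estimate the prior, per-feature mean and per-feature variance of each class."""
    params = {}
    for c in sorted(set(y)):
        rows = [row for row, label in zip(X, y) if label == c]
        columns = list(zip(*rows))  # one tuple of values per feature
        params[c] = {
            "prior": len(rows) / len(y),
            "means": [mean(col) for col in columns],
            "variances": [pvariance(col) for col in columns],
        }
    return params


def log_gaussian(x, mu, var):
    """Log of the normal density with mean mu and variance var, evaluated at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def predict(row, params):
    """Return the class maximising log P(y) + sum_i log P(x_i | y)."""
    scores = {}
    for c, p in params.items():
        score = math.log(p["prior"])
        for x, mu, var in zip(row, p["means"], p["variances"]):
            score += log_gaussian(x, mu, var)
        scores[c] = score
    return max(scores, key=scores.get)


params = fit_gaussian_nb(X, y)
print(predict((1.5, 0.25), params))
print(predict((4.6, 1.5), params))
```

Working with log-probabilities turns the product over features into a sum, which avoids numerical underflow when many small probabilities are multiplied together.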
