###  Naive Bayes Implementation
In Naive Bayes classification, we apply Bayes' theorem together with a "naive" independence assumption that makes the mathematical computation easier. This assumption is where the name comes from.

#### Bayes Theorem
$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$$
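
As a quick numeric illustration of Bayes' theorem, the sketch below plugs made-up probabilities into the formula. The numbers are hypothetical and chosen only for illustration; $P(B)$ is obtained from the law of total probability.

```python
# Hypothetical probabilities, chosen only for illustration.
p_a = 0.01              # P(A): prior probability of event A
p_b_given_a = 0.95      # P(B|A)
p_b_given_not_a = 0.05  # P(B|not A)

# Law of total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)
```

Even though $P(B|A)$ is high here, the small prior $P(A)$ keeps the posterior well below it, which is why the prior term matters in the classifier.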

#### Bayes Theorem in case of classification
$$P(y|X) = \frac{P(X|y) \cdot P(y)}{P(X)}$$
where,<br/>
y: class <br/>
X: n-dimensional Feature Vector with $X = (x_1, x_2, x_3,..., x_n)$

#### Assumption
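We assume that the features are conditionally independent given the class, so that $P(x_1, x_2, \ldots, x_n|y) = \prod_{i=1}^{n} P(x_i|y)$. (For two independent events in general, $P(A \cap B) = P(A) \cdot P(B)$.)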

Now we have,
$$P(y|X) = \frac{P(x_1|y) \cdot P(x_2|y) \cdot P(x_3|y) \cdots P(x_n|y) \cdot P(y)}{P(X)}$$
where, <br/>
$P(y)$: Prior Probability of class <br/>
$P(x_i|y)$: class-conditional probability of feature $i$

$P(X)$ does not depend on $y$; it only acts as a normalization factor, so it can be ignored when comparing classes.
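
A small sketch of why the normalization factor can be dropped: dividing every class score by the same positive constant rescales the scores but leaves their ranking unchanged. The scores below are hypothetical values standing in for $P(X|y) \cdot P(y)$.

```python
import numpy as np

# Hypothetical unnormalized scores P(X|y) * P(y) for three classes.
unnormalized = np.array([0.002, 0.010, 0.004])

# P(X) is the sum of the scores over all classes; dividing by it
# rescales the scores so that they sum to 1 ...
evidence = unnormalized.sum()
posterior = unnormalized / evidence

# ... but the class with the largest score stays the same.
best_unnormalized = int(np.argmax(unnormalized))
best_posterior = int(np.argmax(posterior))
print(best_unnormalized, best_posterior)
```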

Then,
$$P(y|X) \propto P(x_1|y) \cdot P(x_2|y) \cdot P(x_3|y) \cdots P(x_n|y) \cdot P(y)$$

Now, we select the class with the highest probability given the features.

$$y = \arg\max_y \left( P(x_1|y) \cdot P(x_2|y) \cdot P(x_3|y) \cdots P(x_n|y) \cdot P(y) \right)$$

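For mathematical convenience, we take the $\log$ of the product. This turns the product into a sum and avoids numerical underflow when many small probabilities are multiplied. Because $\log$ is monotonically increasing, the maximizing class is unchanged: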

$$y = \arg\max_y \left( \log P(x_1|y) + \log P(x_2|y) + \log P(x_3|y) + \cdots + \log P(x_n|y) + \log P(y) \right)$$
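
To illustrate why a sum of logs is preferred over a raw product, the sketch below multiplies many small likelihoods in double precision. The likelihood values are hypothetical; the point is only that the product underflows to zero while the log-sum stays finite.

```python
import numpy as np

# Hypothetical likelihoods: 1000 features, each with P(x_i|y) = 0.01.
likelihoods = np.full(1000, 0.01)

# The true product is 1e-2000, far below the smallest float64 value,
# so it underflows to 0.0 and all classes would look identical.
product = np.prod(likelihoods)

# The sum of logs is about 1000 * log(0.01) and remains finite.
log_sum = np.sum(np.log(likelihoods))

print(product, log_sum)
```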

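where $P(x_i|y)$ is modeled as a Gaussian distribution whose mean $\mu_{y,i}$ and variance $\sigma_{y,i}^2$ are estimated separately for each class $y$ and each feature $i$:

$$P(x_i|y) = \frac{1}{\sqrt{2 \pi \sigma_{y,i}^2}} \cdot \exp \left( - \frac{(x_i - \mu_{y,i})^2}{2\sigma_{y,i}^2} \right)$$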
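
Since the classifier only needs $\log P(x_i|y)$, the log of the Gaussian density can also be written in closed form, $-\frac{1}{2}\log(2\pi\sigma^2) - \frac{(x_i - \mu)^2}{2\sigma^2}$, without evaluating the exponential first. The following is a minimal sketch of that idea; the helper name `gaussian_log_pdf` and the sample statistics are hypothetical.

```python
import numpy as np

def gaussian_log_pdf(x, mean, var):
    """Element-wise log of the Gaussian density (hypothetical helper)."""
    return -0.5 * np.log(2.0 * np.pi * var) - (x - mean) ** 2 / (2.0 * var)

# Hypothetical per-feature statistics for a single class.
mean = np.array([0.0, 1.0])
var = np.array([1.0, 4.0])
x = np.array([0.0, 3.0])

log_p = gaussian_log_pdf(x, mean, var)

# Cross-check against the density formula evaluated directly.
direct = np.exp(-((x - mean) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)
print(np.allclose(np.exp(log_p), direct))
```

Working in log space directly avoids computing `exp` of a very negative number (which can underflow to zero) and then taking its `log`.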



In [16]:
import numpy as np

class NaiveBayes:
    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.classes = np.unique(y)
        n_classes = len(self.classes)
        # initialize mean, var, priors
        self.mean_ = np.zeros((n_classes, n_features), dtype=np.float64)
        self.var_ = np.zeros((n_classes, n_features), dtype=np.float64)
        self.priors_ = np.zeros(n_classes, dtype=np.float64)
        
        for idx, c in enumerate(self.classes):
            X_c = X[y == c]  # samples that have the label c
            # index by position so labels need not be 0..n_classes-1
            self.mean_[idx, :] = X_c.mean(axis=0)
            self.var_[idx, :] = X_c.var(axis=0)
            self.priors_[idx] = X_c.shape[0] / float(n_samples)
            
    def predict(self, X):
        y_pred = [self._predict(x) for x in X]
        return np.array(y_pred)

    def _predict(self, x):
        posteriors = []

        # calculate posterior probability for each class
        for idx, c in enumerate(self.classes):
            prior = np.log(self.priors_[idx])
            posterior = np.sum(np.log(self._pdf(idx, x)))
            posterior = prior + posterior
            posteriors.append(posterior)

        # return class with highest posterior probability
        return self.classes[np.argmax(posteriors)]

    def _pdf(self, class_idx, x):
        mean = self.mean_[class_idx]
        var = self.var_[class_idx]
        numerator = np.exp(-((x - mean) ** 2) / (2 * var))
        denominator = np.sqrt(2 * np.pi * var)
        return numerator / denominator
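
The `predict` method above loops over samples in Python. As a possible alternative, the same log-posterior can be computed for all samples at once with NumPy broadcasting. This is only a sketch under stated assumptions: `predict_batch` is a hypothetical standalone function, not part of the class above, and it takes the fitted statistics as plain arrays.

```python
import numpy as np

def predict_batch(X, classes, mean, var, priors):
    """Vectorized Gaussian Naive Bayes prediction (hypothetical helper).

    X: (n_samples, n_features); mean, var: (n_classes, n_features);
    priors: (n_classes,); classes: (n_classes,) array of labels.
    """
    # Broadcast to shape (n_samples, n_classes, n_features).
    diff = X[:, None, :] - mean[None, :, :]
    log_lik = (-0.5 * np.log(2.0 * np.pi * var)[None, :, :]
               - diff ** 2 / (2.0 * var[None, :, :]))
    # Sum over features and add the log prior: (n_samples, n_classes).
    log_post = log_lik.sum(axis=2) + np.log(priors)[None, :]
    return classes[np.argmax(log_post, axis=1)]

# Toy statistics for two well-separated classes (made-up values).
classes = np.array([0, 1])
mean = np.array([[0.0, 0.0], [5.0, 5.0]])
var = np.ones((2, 2))
priors = np.array([0.5, 0.5])
X_toy = np.array([[0.1, -0.2], [4.8, 5.3]])

toy_predictions = predict_batch(X_toy, classes, mean, var, priors)
print(toy_predictions)
```

The intermediate array has `n_samples * n_classes * n_features` entries, so for very large inputs the per-sample loop (or processing in chunks) may use less memory.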
        

In [21]:
from sklearn.model_selection import train_test_split
from sklearn import datasets

def accuracy(y_true, y_pred):
    # fraction of predictions that match the true labels
    return np.sum(y_true == y_pred) / len(y_true)

X, y = datasets.make_classification(
    n_samples=1000, n_features=10, n_classes=2, random_state=123
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)

nb = NaiveBayes()
nb.fit(X_train, y_train)
predictions = nb.predict(X_test)

print("Naive Bayes classification accuracy", accuracy(y_test, predictions))



Naive Bayes classification accuracy 0.965
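
One way to sanity-check a hand-written implementation is to fit scikit-learn's `GaussianNB` on the same split and compare accuracies. The sketch below regenerates the data with the same parameters as above; the variable names are hypothetical. Note that `GaussianNB` adds a small smoothing term to the variances (its `var_smoothing` parameter), so tiny differences from a from-scratch version are possible.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Same synthetic data and split parameters as in the cell above.
X, y = datasets.make_classification(
    n_samples=1000, n_features=10, n_classes=2, random_state=123
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)

# Fit the reference Gaussian Naive Bayes model from scikit-learn.
reference = GaussianNB()
reference.fit(X_train, y_train)
reference_predictions = reference.predict(X_test)

reference_accuracy = np.mean(reference_predictions == y_test)
print("GaussianNB accuracy", reference_accuracy)
```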
