### Probabilistic model
Naive Bayes models the conditional probability of each class $C_k$, given an instance represented by a feature vector $x=(x_1, \dots, x_n)$, via Bayes' rule:
\begin{align}
p(C_k \mid x) = \frac{p(x \mid C_k) p(C_k)}{p(x)}.
\end{align}
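As a quick numeric illustration of Bayes' rule, the following sketch computes the posterior for two classes. The prior and likelihood values are hypothetical numbers chosen only for this example.

```python
import numpy as np

# Hypothetical numbers for two classes (K = 2) and one observed instance x.
prior = np.array([0.7, 0.3])        # p(C_k)
likelihood = np.array([0.1, 0.4])   # p(x | C_k)

# Evidence p(x) = sum_k p(x | C_k) p(C_k); it normalizes the posterior.
evidence = np.sum(likelihood * prior)

# Bayes' rule: p(C_k | x) = p(x | C_k) p(C_k) / p(x).
posterior = likelihood * prior / evidence
print(posterior)
```

Although the first class has the larger prior, the second class ends up with the larger posterior because its likelihood is four times higher.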
The defining assumption of the Naive Bayes model is that all features are mutually independent conditional on the class $C_k$, i.e., for every $i$,
\begin{align}
p(x_i \mid x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n, C_k) = p(x_i \mid C_k).
\end{align}
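A short derivation shows how this assumption turns the class-conditional likelihood into a product. By the chain rule and the conditional independence assumption,
\begin{align}
p(x \mid C_k) = \prod_{i=1}^n p(x_i \mid x_1, \dots, x_{i-1}, C_k) = \prod_{i=1}^n p(x_i \mid C_k),
\end{align}
and since $p(x)$ does not depend on $k$,
\begin{align}
p(C_k \mid x) \propto p(C_k) \prod_{i=1}^n p(x_i \mid C_k).
\end{align}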

### Naive Bayes classifier
The Naive Bayes classifier uses the MAP (maximum a posteriori) decision rule, i.e., given a feature vector $x$, we predict the class
\begin{align}
\hat{y} = \text{argmax}_{k \in [K]} p(C_k) \prod_{i=1}^n p(x_i \mid C_k),
\end{align}
or equivalently, since the logarithm is monotonic (and to avoid numerical underflow when multiplying many small probabilities),
\begin{align}
\hat{y} = \text{argmax}_{k \in [K]} \left[ \log \left( p(C_k) \right) + \sum_{i=1}^n \log \left( p(x_i \mid C_k) \right) \right].
\end{align}
\end{align}
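The numerical motivation for the log form can be seen in a minimal sketch. The likelihood values below are random placeholders (an assumption for illustration, not fitted parameters), and the number of features is chosen large enough that the direct product underflows in double precision.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, n_classes = 2000, 3
# Hypothetical per-feature likelihoods p(x_i | C_k), drawn at random for illustration.
likelihoods = rng.uniform(0.01, 0.5, size=(n_classes, n_features))
prior = np.array([0.5, 0.3, 0.2])

# Direct product: every factor is at most 0.5, so the product of 2000 factors
# underflows to 0.0 for every class and the argmax carries no information.
direct_scores = prior * np.prod(likelihoods, axis=1)

# Log-space: sums of logs stay well within the representable range.
log_scores = np.log(prior) + np.sum(np.log(likelihoods), axis=1)

print(direct_scores)
print(log_scores, np.argmax(log_scores))
```

Both expressions have the same maximizer in exact arithmetic; only the log form remains usable in floating point when $n$ is large.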

#### Modeling the conditional probabilities
One can choose any model for the conditional probabilities $p(x_i \mid C_k)$, e.g., Gaussian, Bernoulli, Multinomial, etc. Note that the Naive Bayes classifier can easily handle mixtures of categorical and real-valued features.
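To illustrate how a real-valued and a binary feature can be combined, here is a minimal sketch for one Gaussian and one Bernoulli feature. The helper names (`gaussian_log_pdf`, `bernoulli_log_pmf`) and all parameter values are hypothetical and chosen only for this example.

```python
import numpy as np

def gaussian_log_pdf(x, mean, std):
    # Log density of a univariate Gaussian, evaluated per class.
    return -0.5 * np.log(2 * np.pi * std**2) - (x - mean)**2 / (2 * std**2)

def bernoulli_log_pmf(x, p):
    # Log probability of a binary value x in {0, 1}, where p = p(x = 1 | C_k).
    return x * np.log(p) + (1 - x) * np.log(1 - p)

# Hypothetical parameters for K = 2 classes (one entry per class).
log_prior = np.log(np.array([0.5, 0.5]))
means = np.array([0.0, 3.0])
stds = np.array([1.0, 1.0])
p_on = np.array([0.2, 0.9])

# One instance with a real-valued and a binary feature.
x_real, x_binary = 2.5, 1

# Because of conditional independence, the per-feature log-likelihoods simply add.
log_scores = (log_prior
              + gaussian_log_pdf(x_real, means, stds)
              + bernoulli_log_pmf(x_binary, p_on))
y_hat = int(np.argmax(log_scores))
print(log_scores, y_hat)
```

Each feature contributes its own term to the sum, so the feature types do not need to match.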

In [12]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [13]:
class naive_bayes:
    def __init__(self, class_prior, features, n_classes):
        self.class_prior = class_prior
        self.features = features
        self.n_classes = n_classes
        
        self._init_weights()
        
    def _init_weights(self):
        self.weights = []
        
        for i, feature in enumerate(self.features):
            if feature == 'gaussian':
                self.weights.append((self._random_normal(), self._random_normal()))
            
            elif feature == 'bernoulli':
                self.weights.append((self._random_uniform()))
     
    # These helpers read self.n_classes, so they must be instance methods.
    def _random_normal(self, loc_in=0.0, scale_in=1.0, size_in=None):
        if size_in is None:
            size_in = self.n_classes
        
        return np.random.normal(loc=loc_in, scale=scale_in, size=size_in)
    
    def _random_uniform(self, low_in=0.0, high_in=1.0, size_in=None):
        if size_in is None:
            size_in = self.n_classes
        
        return np.random.uniform(low=low_in, high=high_in, size=size_in)
        
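    def _log_probability(self, feature, class_index, feature_index):
        # Log-likelihood log p(x_i | C_k) of one feature value under class `class_index`.
        if self.features[feature_index] == 'gaussian':
            # Gaussian weights are stored as (means, standard deviations), one entry per class.
            mean = self.weights[feature_index][0][class_index]
            std = self.weights[feature_index][1][class_index]
            return - (1/2)*np.log(2*np.pi*std**2) - (feature-mean)**2/(2*std**2)
        
        elif self.features[feature_index] == 'bernoulli':
            # Bernoulli weights store p(x_i = 1 | C_k), one entry per class.
            p = self.weights[feature_index][class_index]
            if feature == 0:
                return np.log(1-p)
            else:
                return np.log(p)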
        
    def prediction(self, X):
        for x in X:
            predictions = []
            for i in range(self.n_classes):
                sum_of_logs = 0.0
                for j, _ in enumerate(self.features):
                    sum_of_logs += self._log_probability(x[j], i, j)