## 1. Algorithm
### Naive Bayes Classifier
$$\begin{eqnarray}
    c & = & \arg\max_c p(c | \mathbf{x}) & (1) \\
      & = & \arg\max_c \frac{p(\mathbf{x} | c) p(c)}{p(\mathbf{x})} ~~~& (2)\\
      & = & \arg\max_c p(\mathbf{x} | c) p(c) & (3)\\
\end{eqnarray}$$
Explanation:
- (1): We choose the class with the highest posterior probability given $\mathbf{x}$.
- (2): Apply Bayes' theorem.
- (3): Simplify the formula by dropping $p(\mathbf{x})$, because it does not depend on $c$ and therefore does not change the $\arg\max$.
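
As a small numeric illustration of steps (1) to (3), the sketch below uses made-up likelihood and prior values (assumptions chosen for illustration, not taken from any dataset) and checks that dividing by $p(\mathbf{x})$ does not change which class wins.

```python
import numpy as np

# Hypothetical values for two classes; chosen only for illustration.
likelihood = np.array([0.02, 0.05])   # p(x | c)
prior = np.array([0.7, 0.3])          # p(c)

joint = likelihood * prior            # p(x | c) p(c), the quantity in (3)
evidence = joint.sum()                # p(x), identical for every class
posterior = joint / evidence          # p(c | x), the quantity in (1) and (2)

# The winning class is the same with or without the normalization.
best_from_joint = int(np.argmax(joint))
best_from_posterior = int(np.argmax(posterior))
print(best_from_joint, best_from_posterior)
```

Because `evidence` is a single positive number shared by all classes, it rescales every entry of `joint` equally and leaves the ordering untouched.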

### Multinomial Naive Bayes
$$\lambda_{ci} = p(x_i | c) = \frac{N_{ci}}{N_c}$$
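Note:
- $N_{ci}$ is the number of times word $i$ appears in the training documents of class $c$, and $N_c = \sum_i N_{ci}$ is the total number of words in those documents.
- If a word never appears in class $c$, then $N_{ci} = 0$ and the whole product $p(\mathbf{x} | c)$ becomes 0. To avoid this, we use Laplace smoothing.
- Laplace smoothing: $\hat{\lambda}_{ci} = \frac{N_{ci} + \alpha}{N_{c} + d\alpha}$
    - $\alpha$ is a positive number (usually equal to 1).
    - $d$ is the number of unique words (the vocabulary size).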
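
To make the effect of smoothing concrete, here is a minimal sketch that computes both the raw and the smoothed estimates for a single class. The count matrix reuses the three rows labelled `'B'` in the training data of Section 2; the variable names (`X_c`, `lam_raw`, `lam_smooth`) are chosen here for illustration.

```python
import numpy as np

# Word counts for the documents of one class (rows = documents, columns = words).
X_c = np.array([
    [2, 1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 1, 0, 0, 0],
])
alpha = 1.0
d = X_c.shape[1]          # vocabulary size

N_ci = X_c.sum(axis=0)    # count of each word in the class
N_c = N_ci.sum()          # total number of words in the class

lam_raw = N_ci / N_c                               # unsmoothed: zero for unseen words
lam_smooth = (N_ci + alpha) / (N_c + d * alpha)    # Laplace-smoothed

print(lam_raw)
print(lam_smooth)
```

The last three words never occur in this class, so their raw estimates are 0, while the smoothed estimates are all strictly positive and still sum to 1.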

### Bernoulli Naive Bayes
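$$p(x_i | c) = p(i | c)^{x_i} (1 - p(i | c)) ^{1 - x_i}$$
Here $x_i \in \{0, 1\}$ indicates whether word $i$ appears in the document, and $p(i | c)$ is the probability that word $i$ appears in a document of class $c$.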
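
A minimal sketch of how this formula is evaluated for one binary document vector. The per-word probabilities in `p` are made-up values, used only for illustration.

```python
import numpy as np

# Hypothetical per-word presence probabilities p(i | c) for one class.
p = np.array([0.8, 0.5, 0.1])
# Binary document vector: word 0 present, words 1 and 2 absent.
x = np.array([1, 0, 0])

# Per-word factors p^x * (1 - p)^(1 - x): p where the word is present, 1 - p where absent.
factors = p**x * (1 - p)**(1 - x)
# Naive independence assumption: multiply the per-word factors.
likelihood = factors.prod()
print(factors, likelihood)
```

Unlike the multinomial model, absent words also contribute a factor, namely $1 - p(i | c)$.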

## 2. Code
Multinomial Naive Bayes:
- Step 1: Calculate the prior probability $p(c)$ of each class from `y_train`.
- Step 2: Calculate the smoothed probability $p(x_i | c)$ of each word in each class.
- Step 3: Use the Naive Bayes formula to predict the labels of new data.
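
The three steps can also be sketched in log space, which avoids numerical underflow when many small probabilities are multiplied. This is an alternative formulation under stated assumptions, not the notebook's class: the function names `fit_multinomial_nb` and `predict_multinomial_nb` and the tiny dataset are hypothetical.

```python
import numpy as np

def fit_multinomial_nb(X, y, alpha=1.0):
    """Steps 1 and 2: log priors and Laplace-smoothed log word probabilities."""
    classes, counts = np.unique(y, return_counts=True)
    log_prior = np.log(counts / counts.sum())
    d = X.shape[1]
    log_lambda = np.zeros((classes.shape[0], d))
    for k, c in enumerate(classes):
        N_ci = X[y == c].sum(axis=0)   # word counts within class c
        log_lambda[k] = np.log((N_ci + alpha) / (N_ci.sum() + d * alpha))
    return classes, log_prior, log_lambda

def predict_multinomial_nb(X_new, classes, log_prior, log_lambda):
    """Step 3: argmax over log p(c) + sum_i x_i * log(lambda_ci)."""
    scores = X_new @ log_lambda.T + log_prior
    return classes[np.argmax(scores, axis=1)]

# Tiny made-up dataset: 2 words, 2 classes.
X = np.array([[3, 0], [2, 1], [0, 4]])
y = np.array(['a', 'a', 'b'])
params = fit_multinomial_nb(X, y)
pred = predict_multinomial_nb(np.array([[1, 0], [0, 2]]), *params)
print(pred)
```

Taking logarithms turns the product over words into a sum, and since the logarithm is monotonic the $\arg\max$ is unchanged.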

In [99]:
import numpy as np


class NaiveBayesClassifier(object):
    def __init__(self):
        self.X_train = np.array([])
        self.y_train = np.array([])
        self.X_predict = np.array([])
        self.y_predict = np.array([])
        self.classes = np.array([])
        self.count = np.array([])
        self.lambda_array = np.array([])
        self.probability = np.array([])

    def fit(self, X_train, y_train):
        self.X_train = X_train
        self.y_train = y_train

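    def separate_classes(self):
        # Multinomial variant: estimate p(x_i | c) for every class with Laplace smoothing
        self.classes, self.count = np.unique(self.y_train, return_counts=True)
        num_classes = self.classes.shape[0]
        d = self.X_train.shape[1]  # number of unique words
        self.lambda_array = np.zeros((num_classes, d))
        for i in range(num_classes):
            # rows of X_train that belong to class i
            table = self.X_train[np.where(self.y_train == self.classes[i])]
            # N_ci: how often each word occurs in class i
            total = np.sum(table, axis=0)
            # (N_ci + alpha) / (N_c + d * alpha) with alpha = 1
            self.lambda_array[i] = (total+1)/(total.sum()+d)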
    def test(self):
        # Bernoulli variant: binarize a copy so the original counts in X_train are kept
        X_binary = (self.X_train >= 1).astype(int)
        self.classes, self.count = np.unique(self.y_train, return_counts=True)
        num_classes = self.classes.shape[0]
        d = X_binary.shape[1]
        self.lambda_array = np.zeros((num_classes, d))
        for i in range(num_classes):
            # documents of class i
            table = X_binary[self.y_train == self.classes[i]]
            # number of class-i documents that contain each word
            total = np.sum(table, axis=0)
            # Laplace smoothing with alpha = 1 and two outcomes (present / absent)
            self.lambda_array[i] = (total+1)/(table.shape[0]+2)
            
    def predict_test(self, X_predict):
        # Bernoulli variant: binarize a copy so the caller's array is not modified
        self.X_predict = (np.asarray(X_predict) >= 1).astype(int)
        self.test()
        num_classes = self.classes.shape[0]
        N = self.X_predict.shape[0]
        self.probability = np.zeros((N, num_classes))
        for i in range(N):
            for j in range(num_classes):
                # for binary x this equals lambda^x * (1 - lambda)^(1 - x)
                power = self.lambda_array[j]*self.X_predict[i]+(1-self.lambda_array[j])*(1-self.X_predict[i])
                self.probability[i][j] = np.prod(power)
        # multiply by the class priors p(c), then normalize each row to sum to 1
        self.probability = np.multiply(self.probability, (self.count/self.count.sum(axis=0)))
        self.probability = np.divide(self.probability, self.probability.sum(axis=1).reshape(-1, 1))
        # note: returns the class probabilities, not the predicted labels
        return self.probability
    def predict(self, X_predict):
        # Multinomial variant: returns the predicted label for each row of X_predict
        self.X_predict = X_predict
        self.separate_classes()
        num_classes = self.classes.shape[0]
        N = self.X_predict.shape[0]
        self.probability = np.zeros((N, num_classes))
        for i in range(N):
            for j in range(num_classes):
                # p(x | c) = prod_i lambda_ci ^ x_i
                power = np.power(self.lambda_array[j],self.X_predict[i])
                self.probability[i][j] = np.prod(power)
        # multiply by the class priors p(c), then normalize each row to sum to 1
        self.probability = np.multiply(self.probability, (self.count/self.count.sum(axis=0)))
        self.probability = np.divide(self.probability, self.probability.sum(axis=1).reshape(-1, 1))
        # pick the class with the highest posterior probability
        self.y_predict = self.classes[np.argmax(self.probability, axis=1)]
        return self.y_predict

In [100]:
# Create data
d1 = [2, 1, 1, 0, 0, 0, 0, 0, 0]
d2 = [1, 1, 0, 1, 1, 0, 0, 0, 0]
d3 = [0, 1, 0, 0, 1, 1, 0, 0, 0]
d4 = [0, 1, 0, 0, 0, 0, 1, 1, 1]

train_data = np.array([d1, d2, d3, d4])
label = np.array(['B', 'B', 'B', 'N']) 

In [101]:
model = NaiveBayesClassifier()
model.fit(train_data,label)

In [102]:
# test data
d5 = np.array([[2, 0, 0, 1, 0, 0, 0, 1, 0]])
d6 = np.array([[0, 1, 0, 0, 0, 0, 0, 1, 1]])

In [103]:
model.predict(d6)

array(['N'], dtype='<U1')

In [104]:
model.probability

array([[0.29175335, 0.70824665]])
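
These probabilities can be checked by hand. The sketch below redoes the arithmetic for `d6` with exact fractions, using Laplace smoothing with $\alpha = 1$ and $d = 9$ as in `separate_classes`; the variable names are chosen here for illustration.

```python
from fractions import Fraction

# d6 contains words 1, 7 and 8 once each (0-based indices).
# Class 'B': word counts over d1..d3 are [3, 3, 1, 1, 2, 1, 0, 0, 0], so N_B = 11.
# Class 'N': word counts over d4 are    [0, 1, 0, 0, 0, 0, 1, 1, 1], so N_N = 4.
d = 9
lik_B = Fraction(3 + 1, 11 + d) * Fraction(0 + 1, 11 + d) * Fraction(0 + 1, 11 + d)
lik_N = Fraction(1 + 1, 4 + d) ** 3

# Multiply by the priors p(B) = 3/4 and p(N) = 1/4, then normalize.
joint_B = lik_B * Fraction(3, 4)
joint_N = lik_N * Fraction(1, 4)
post_B = joint_B / (joint_B + joint_N)
post_N = joint_N / (joint_B + joint_N)

print(float(post_B), float(post_N))
```

The exact posteriors are $6591/22591$ for `'B'` and $16000/22591$ for `'N'`, which agree with the values in `model.probability`.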

In [105]:
model.X_train

array([[2, 1, 1, 0, 0, 0, 0, 0, 0],
       [1, 1, 0, 1, 1, 0, 0, 0, 0],
       [0, 1, 0, 0, 1, 1, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 1, 1, 1]])