<h2>Naive Bayes</h2>
<p>The main idea of the Naive Bayes algorithm is the "naive" assumption that the features of a data point are mutually independent given its class.</p>

<h3>Bayes Theorem</h3>
<p>The probability of <i>A</i> given <i>B</i> is the probability of <i>B</i> given <i>A</i>, multiplied by the probability of <i>A</i> and divided by the probability of <i>B</i>:</p>
\begin{align}
P(A|B) = \frac{P(B|A) \cdot P(A)} {P(B)}
\end{align}
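<p>As a quick numeric check of the formula, the sketch below plugs in made-up probabilities. The events, the values and the variable names are illustrative assumptions, not taken from any dataset. P(B) is obtained with the law of total probability.</p>

```python
# Illustrative (made-up) numbers:
# A = "email is spam", B = "email contains the word 'offer'"
p_a = 0.2               # P(A): prior probability of spam
p_b_given_a = 0.6       # P(B|A)
p_b_given_not_a = 0.05  # P(B|not A)

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A)P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))
```

<p>With these numbers, observing B raises the probability of A from 0.2 to 0.75.</p>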

<p>For classification, with a class label <i>y</i> and a feature vector <i>X</i>, this can be written as:</p>

\begin{align}
P(y|X) = \frac{P(X|y) \cdot P(y)} {P(X)}
\end{align}

<p>where <i>X</i> is a vector consisting of several features:</p>

\begin{align}
X = (x_{1}, x_{2}, x_{3}, ..., x_{n})
\end{align}

<p>Assuming all features are mutually independent given the class <i>y</i>, the likelihood P(X|y) factorizes into a product of per-feature terms:</p>

\begin{align}
P(y|X) = \frac{P(x_{1}|y) \cdot P(x_{2}|y) \cdot ...\cdot P(x_{n}|y) \cdot P(y)} {P(X)}
\end{align}


<p>We want to select the class with the highest posterior probability:</p>
\begin{align}
y = \arg\max_{y} P(y|X) = \arg\max_{y} \frac{P(x_{1}|y) \cdot P(x_{2}|y) \cdot ... \cdot P(x_{n}|y) \cdot P(y)} {P(X)}
\end{align}

<p>Since P(X) does not depend on <i>y</i>, it is the same for every class and can be dropped without changing which class wins:</p>

\begin{align}
y = \arg\max_{y} P(x_{1}|y) \cdot P(x_{2}|y) \cdot ... \cdot P(x_{n}|y) \cdot P(y)
\end{align}
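<p>To illustrate why dropping P(X) is harmless, here is a minimal sketch with hypothetical priors and per-feature likelihoods for a single sample (the class names and all numbers are invented for the example). Dividing by P(X) rescales every score by the same positive constant, so the winning class is unchanged.</p>

```python
import math

# Hypothetical priors P(y) and per-feature likelihoods P(x_i|y) for one sample X
priors = {"ham": 0.7, "spam": 0.3}
likelihoods = {
    "ham": [0.1, 0.5, 0.4],
    "spam": [0.6, 0.4, 0.7],
}

# Unnormalized score: P(x_1|y) * ... * P(x_n|y) * P(y)
scores = {label: math.prod(likelihoods[label]) * priors[label] for label in priors}

# P(X) is the sum of the scores over all classes
p_x = sum(scores.values())
posteriors = {label: score / p_x for label, score in scores.items()}

best_unnormalized = max(scores, key=scores.get)
best_normalized = max(posteriors, key=posteriors.get)
print(best_unnormalized, best_normalized)
```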

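<p>Note: a product of many small factors is typically a very small number, which can underflow to zero in floating point. Because the logarithm is monotonically increasing, taking the log of the product does not change the argmax, and it turns the product into a sum of numbers that are easier to work with:</p>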
\begin{align}
y = \arg\max_{y} \left[ \log(P(x_{1}|y)) + \log(P(x_{2}|y)) + ... + \log(P(x_{n}|y)) + \log(P(y)) \right]
\end{align}
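<p>The underflow problem is easy to reproduce. The sketch below uses 1000 hypothetical factors of 0.01 each (an arbitrary choice for illustration): the direct product cannot be represented as a 64-bit float, while the sum of logs is an ordinary number.</p>

```python
import math

# 1000 hypothetical per-feature probabilities, each equal to 0.01
probs = [0.01] * 1000

# The true product is 1e-2000, far below the smallest positive float64
product = math.prod(probs)

# The log turns the product into a sum of moderate negative numbers
log_sum = sum(math.log(p) for p in probs)

print(product)  # underflows to 0.0
print(log_sum)  # finite, about -4605.17
```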



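<p>From that equation, we see that we also need P(y), the prior probability of each class, which is estimated as the relative frequency of that class in the training data.</p>
<p>To calculate each class-conditional probability P(x_i|y), we model each feature with a Gaussian distribution, where <i>μ<sub>y</sub></i> and <i>σ<sub>y</sub><sup>2</sup></i> are the mean and variance of feature x_i among the training samples of class y:</p>
\begin{align}
P(x_{i}|y) = \frac{1}{\sqrt{2\pi\sigma_{y}^{2}}} \cdot \exp\left(-\frac{(x_{i}-\mu_{y})^{2}}{2\sigma_{y}^{2}}\right)
\end{align}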
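<p>The density can be evaluated directly from the formula. The sketch below uses only the standard library. The function names <code>gaussian_pdf</code> and <code>gaussian_log_pdf</code> are hypothetical helpers for this example, and the log variant is an optional alternative that avoids calling <code>exp</code>, not something the formula above requires.</p>

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def gaussian_log_pdf(x, mean, var):
    """Logarithm of the same density, computed without calling exp."""
    return -0.5 * math.log(2 * math.pi * var) - ((x - mean) ** 2) / (2 * var)

# At the mean of a standard normal the density equals 1/sqrt(2*pi)
print(gaussian_pdf(0.0, 0.0, 1.0))

# Far from the mean the density underflows to 0.0, but its log stays finite
print(gaussian_pdf(50.0, 0.0, 1.0), gaussian_log_pdf(50.0, 0.0, 1.0))
```

<p>Computing the log of the density in closed form can be useful when a sample lies far from a class mean, because taking the log of a density that has already underflowed to zero yields negative infinity.</p>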

<img src="../images/gaussian.png">


<h3>Code the algorithm</h3>


In [26]:
import numpy as np

class NaiveBayes:
    def fit(self,X,y):
        # X is a 2D numpy array: rows are samples, columns are features
        n_samples, n_features = X.shape
        self._classes = np.unique(y)  # sorted unique labels
        n_classes = len(self._classes)

        # initialize the per-class parameters
        self._mean = np.zeros((n_classes, n_features), dtype=np.float64)
        self._var = np.zeros((n_classes, n_features), dtype=np.float64)
        self._priors = np.zeros(n_classes, dtype=np.float64)

        # estimate the parameters; index by position so that labels
        # do not have to be the integers 0..n_classes-1
        for idx, c in enumerate(self._classes):
            X_c = X[y == c]
            self._mean[idx, :] = X_c.mean(axis=0)
            self._var[idx, :] = X_c.var(axis=0)
            self._priors[idx] = X_c.shape[0] / float(n_samples)
            
    
    def predict(self,X):
        #create list of predictions
        y_pred = [self._predict(x) for x in X]
        return np.array(y_pred)
    
    def _predict(self,x):
        #only gets one sample
        posteriors = []
        for idx,c in enumerate(self._classes):
            prior = np.log(self._priors[idx])
            class_conditional = np.sum(np.log(self._pdf(idx,x)))
            posterior = prior + class_conditional
            posteriors.append(posterior)
        return self._classes[np.argmax(posteriors)]
            
    def _pdf(self,class_idx,x):
        #probability density function
        
        #helper method to calculate conditional probability
        mean = self._mean[class_idx]
        var = self._var[class_idx]
        numerator = np.exp(-(x-mean)**2/(2*var))
        denominator = np.sqrt(2*np.pi*var)
        return numerator/denominator
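
<p>The fitting step amounts to computing, for each class, a per-feature mean and variance plus a class prior. The following library-free sketch does the same on a tiny made-up dataset so the numbers can be checked by hand. The data and the <code>params</code> dictionary are assumptions for illustration. <code>pvariance</code> is the population variance, which matches the default of <code>np.var</code>.</p>

```python
from statistics import fmean, pvariance

# Tiny made-up dataset: two features per sample, labels 0 and 1
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [10.0, 0.0], [12.0, 2.0]]
y = [0, 0, 0, 1, 1]

params = {}
for c in sorted(set(y)):
    # keep only the samples that belong to class c
    rows = [x for x, label in zip(X, y) if label == c]
    columns = list(zip(*rows))  # transpose: one tuple per feature
    params[c] = {
        "mean": [fmean(col) for col in columns],
        "var": [pvariance(col) for col in columns],
        "prior": len(rows) / len(X),
    }

print(params[0])
print(params[1])
```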
    

<p>Watch the algorithm in action</p>

In [27]:
from sklearn.model_selection import train_test_split
from sklearn import datasets
import matplotlib.pyplot as plt

# accuracy: fraction of predictions that match the true labels
def accuracy(y_true, y_pred):
    return np.sum(y_true == y_pred) / len(y_true)

X, y = datasets.make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=123)
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.2,random_state=123)

nb = NaiveBayes()
nb.fit(X_train,y_train)
predictions = nb.predict(X_test)

print('accuracy',accuracy(y_test,predictions))

accuracy 0.965
