## Bayes Theorem

\begin{theorem}[Bayes theorem]

Given an observed feature vector $X=x$, the prior $\mathbb{P}(y=i)$, the likelihood $\mathbb{P}(X=x\mid y=i)$ and the evidence $\mathbb{P}(X=x)$ together determine the posterior:

$$\mathbb{P}(y=i\mid X=x)=\frac{\mathbb{P}(X=x\mid y=i)\mathbb{P}(y=i)}{\mathbb{P}(X=x)}$$
\end{theorem}
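
As a small numerical illustration of the theorem, the sketch below computes a posterior for a two-class problem. The priors and likelihoods are made-up numbers, not taken from any dataset, and the evidence $\mathbb{P}(X=x)$ is obtained from the law of total probability.

```python
import numpy as np

# Hypothetical two-class problem for one fixed observation x.
# All numbers are invented for illustration only.
prior = np.array([0.7, 0.3])        # P(y=0), P(y=1)
likelihood = np.array([0.1, 0.6])   # P(X=x | y=0), P(X=x | y=1)

# Evidence P(X=x) = sum_i P(X=x | y=i) P(y=i)
evidence = np.sum(likelihood * prior)

# Posterior P(y=i | X=x) from Bayes' theorem
posterior = likelihood * prior / evidence
print(posterior)
```

Although class 0 has the larger prior, the observation is six times more likely under class 1, so the posterior favors class 1.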

## Naive Bayes Classifier

Assume the data has $m$ binary features. To apply Bayes' theorem directly, we need the class-conditional joint distribution $\mathbb{P}(X=x\mid y=i)$ of all features, which means estimating a probability for each of the $2^m$ possible feature combinations in every class.

Naive Bayes classifiers assume that, given the class label $y$, the features are conditionally independent of each other:

$$\mathbb{P}(X=x\mid y)=\prod_j\mathbb{P}_j(X_j=x_j\mid y)$$


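The full classification rule follows from Bayes' theorem. The denominator $\mathbb{P}(X=x)$ does not depend on $i$, so it drops out of the $\arg\max$: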

\begin{align*}
\hat{y}&=\arg\max_i\mathbb{P}(y=i\mid X=x)\\
       &=\arg\max_i\frac{\mathbb{P}(X=x\mid y=i)\mathbb{P}(y=i)}{\mathbb{P}(X=x)}\\
       &=\arg\max_i\prod_j\mathbb{P}_j(X_j=x_j\mid y=i)\mathbb{P}(y=i)
\end{align*}

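We can estimate $\mathbb{P}(y=i)$ by $\frac{n_i}{n}$, the fraction of training samples in class $i$ (i.e. $y$ is modeled as a categorical variable with one probability per class), and $\mathbb{P}_j(X_j=1\mid y=i)$ by $\frac{n_{ij}}{n_i}$, where $n_{ij}$ is the number of class-$i$ samples with $X_j=1$. This needs only $m$ probabilities per class instead of the $2^m$ required for the full joint distribution.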

If the conditional independence assumption holds, naive Bayes is the Bayes-optimal classifier.
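
The frequency estimate $\frac{n_{ij}}{n_i}$ becomes exactly 0 or 1 whenever a feature value never (or always) occurs in a class, which zeroes out the whole product. A common remedy is add-$\alpha$ (Laplace) smoothing combined with summing logarithms instead of multiplying probabilities. The sketch below is one way to do this; the function names, the parameter `alpha` and the toy data are assumptions made for illustration and do not come from the original notebook.

```python
import numpy as np

def fit_bernoulli_nb(x, y, alpha=1.0):
    """Estimate class priors and smoothed P(X_j=1 | y=c) for binary features."""
    classes = np.unique(y)
    prior = np.array([np.mean(y == c) for c in classes])
    # Add-alpha smoothing: (count of ones + alpha) / (class size + 2*alpha).
    # With alpha=0 this reduces to the plain frequency n_ij / n_i.
    theta = np.array([(x[y == c].sum(axis=0) + alpha) / ((y == c).sum() + 2 * alpha)
                      for c in classes])
    return classes, prior, theta

def predict_bernoulli_nb(f, classes, prior, theta):
    """Return the class maximizing log prior + sum of per-feature log likelihoods."""
    f = np.asarray(f)
    log_lik = (f * np.log(theta) + (1 - f) * np.log(1 - theta)).sum(axis=1)
    return classes[np.argmax(np.log(prior) + log_lik)]

# Toy data (invented): the second feature is never 1 in class 0
x_toy = np.array([[1, 1], [0, 1], [1, 0], [0, 0]])
y_toy = np.array([1, 1, 0, 0])

classes, prior, theta = fit_bernoulli_nb(x_toy, y_toy)
pred_a = predict_bernoulli_nb([1, 1], classes, prior, theta)
pred_b = predict_bernoulli_nb([0, 0], classes, prior, theta)
print(theta)
print(pred_a, pred_b)
```

With `alpha=1.0` every estimated probability stays strictly between 0 and 1, so the logarithms are always finite.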

In [1]:
from __future__ import division
import numpy as np
x=np.array([[1,1,1],[0,0,0],[1,0,0],[0,0,1],[1,1,0],[1,0,1]])
y=np.array([1,0,1,1,0,1])

def Naive_Bayes(f):
    # Class priors estimated from the training labels
    p_1=y.sum()/len(y)
    p_0=1-p_1
    # Training rows of each class
    x_1=x[y==1]
    x_0=x[y==0]
    # Per-feature likelihoods P(X_i=f_i | y=1) and P(X_i=f_i | y=0)
    p_c=np.zeros(len(f))
    q_c=np.zeros(len(f))
    for i in range(len(f)):
        # Fraction of rows in each class whose feature i equals 1
        freq_1=x_1[:,i].sum()/len(x_1)
        freq_0=x_0[:,i].sum()/len(x_0)
        if f[i]==0:
            p_c[i]=1-freq_1
            q_c[i]=1-freq_0
        else:
            p_c[i]=freq_1
            q_c[i]=freq_0
    # Unnormalized posteriors: product of likelihoods times prior
    posterior_1=p_c.prod()*p_1
    posterior_0=q_c.prod()*p_0
    if (posterior_1>posterior_0):
        return 1
    else:
        return 0
def validation(sample,output):
    # Print the fraction of samples classified correctly
    count=0
    for i in range(len(sample)):
        if (Naive_Bayes(sample[i])==output[i]):
            count=count+1
    print(count/len(output))
# Note: no class-0 training row has x[:,2]==1, so the unsmoothed estimate gives
# the second test point zero likelihood under class 0 and it is misclassified
x_test=np.array([[0,1,0],[0,1,1]])
y_test=np.array([0,0])
validation(x_test,y_test)

0.5


## Gaussian Bayes Classifier

\begin{theorem}[Continuous version of Bayes theorem]

$$\mathbb{P}(Y=y_k\mid X_1,X_2,\cdots,X_n)=\frac{\mathbb{P}(Y=y_k)\prod_{i=1}^n\mathbb{P}(X_i\mid Y=y_k)}{\sum_j\mathbb{P}(Y=y_j)\prod_{i=1}^n\mathbb{P}(X_i\mid Y=y_j)}$$

Here each class-conditional $\mathbb{P}(X_i\mid Y=y_k)$ is a Gaussian density rather than a probability mass.
\end{theorem}

Multivariate normal density for $x\in\mathbb{R}^p$: $$f(x;\mu,\Sigma)=\frac{1}{(2\pi)^\frac{p}{2}\left\vert\Sigma\right\vert^\frac{1}{2}}e^{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)}$$
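
For more than a handful of features the density can underflow and the determinant can become numerically unstable, so it is often evaluated in log space. The following is a minimal sketch of that idea; `gaussian_logpdf` is a hypothetical helper name, and it assumes `sigma` is a symmetric positive definite matrix.

```python
import numpy as np

def gaussian_logpdf(x, mu, sigma):
    """Log of the multivariate normal density at x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    p = len(mu)
    diff = x - mu
    # slogdet returns (sign, log|det|); the sign is +1 for positive definite sigma
    _, logdet = np.linalg.slogdet(sigma)
    # Solve sigma z = diff instead of forming the explicit inverse
    maha = diff @ np.linalg.solve(sigma, diff)
    return -0.5 * (p * np.log(2 * np.pi) + logdet + maha)

# Sanity check: the standard normal density at 0 is 1/sqrt(2*pi)
val_1d = np.exp(gaussian_logpdf([0.0], [0.0], np.array([[1.0]])))
# At the mean of a 2-D Gaussian the density is 1 / (2*pi*sqrt(det(sigma)))
val_2d = np.exp(gaussian_logpdf([1.0, 2.0], [1.0, 2.0], np.diag([4.0, 1.0])))
print(val_1d, val_2d)
```

Because the logarithm is monotone, taking the `argmax` over log prior plus log density selects the same class as the product form.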

In [2]:
from __future__ import division
import numpy as np

# Compute the multivariate normal pdf with mean mu and covariance sigma
def Gaussian_function(x,mu,sigma):
    diff=x-mu
    norm_const=(2*np.pi)**(len(mu)/2)*np.sqrt(np.linalg.det(sigma))
    return np.exp(-np.dot(np.dot(diff,np.linalg.inv(sigma)),diff)/2)/norm_const
# Parametric inference of mu and sigma (use MLE)
# Assumes the class labels are the integers 0,1,...,K-1
def parametric_estimation(x,y):
    n_classes=len(np.unique(y))
    # mu is (classes x features); sigma holds one (features x features) covariance per class
    mu_trans=np.zeros((n_classes,x.shape[1]))
    sigma_trans=np.zeros((n_classes,x.shape[1],x.shape[1]))
    prior=np.zeros(n_classes)
    for i in range(n_classes):
        # For every class, compute the mean and covariance of the features
        mu_trans[i]=np.mean(x[y==i],axis=0)
        sigma_trans[i]=np.cov(x[y==i].T)
        # Fraction of samples in class i (np.where returns a tuple, so count the mask instead)
        prior[i]=np.sum(y==i)/len(y)
    return (mu_trans,sigma_trans,prior)
def Gaussian_NB(test_x,x,y):
    mu,sigma,prior=parametric_estimation(x,y)
    posterior=np.zeros(len(mu))
    for i in range(len(mu)):
        posterior[i]=Gaussian_function(test_x,mu[i],sigma[i])*prior[i]
    return np.argmax(posterior)

def test(sample,output,N):
# Shuffle the population
    randomize = np.arange(len(sample))
    np.random.shuffle(randomize)
    sample = sample[randomize]
    output = output[randomize]
# Divide the training sample and testing sample
    x=sample[:N]
    y=output[:N]
    x_1=sample[N:]
    y_1=output[N:]
    count=0
    for i in range(len(x_1)):
        if (Gaussian_NB(x_1[i],x,y)==y_1[i]):
            count=count+1
    print(count/len(x_1))

In [3]:
from sklearn.datasets import load_iris
data=load_iris().data
target=load_iris().target
test(data,target,100)

0.96


In [4]:
from sklearn.datasets import load_breast_cancer
data=load_breast_cancer().data
target=load_breast_cancer().target
test(data,target,320)

0.947791164659
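
The `Gaussian_NB` function above fits a full covariance matrix per class, whereas the product form in the continuous Bayes theorem corresponds to independent per-feature Gaussians, that is, a diagonal covariance. A minimal sketch of that naive variant follows. The function names, the small variance floor `1e-9` and the synthetic two-cluster data are assumptions for illustration, not part of the original notebook.

```python
import numpy as np

def fit_gaussian_nb(x, y):
    """Estimate priors and per-class, per-feature means and variances."""
    classes = np.unique(y)
    prior = np.array([np.mean(y == c) for c in classes])
    mu = np.array([x[y == c].mean(axis=0) for c in classes])
    # Small floor (an assumption) keeps variances strictly positive
    var = np.array([x[y == c].var(axis=0) for c in classes]) + 1e-9
    return classes, prior, mu, var

def predict_gaussian_nb(x_new, classes, prior, mu, var):
    """Pick the class maximizing log prior + sum of univariate Gaussian log densities."""
    x_new = np.atleast_2d(x_new)
    # diff has shape (n_samples, n_classes, n_features)
    diff = x_new[:, None, :] - mu[None, :, :]
    log_lik = -0.5 * (np.log(2 * np.pi * var)[None] + diff ** 2 / var[None]).sum(axis=2)
    return classes[np.argmax(np.log(prior)[None] + log_lik, axis=1)]

# Synthetic data: two well-separated clusters (invented for illustration)
rng = np.random.default_rng(0)
x_syn = np.vstack([rng.normal(loc=[0, 0], scale=1.0, size=(50, 2)),
                   rng.normal(loc=[5, 5], scale=1.0, size=(50, 2))])
y_syn = np.array([0] * 50 + [1] * 50)

params = fit_gaussian_nb(x_syn, y_syn)
pred = predict_gaussian_nb([[0.0, 0.0], [5.0, 5.0]], *params)
print(pred)
```

The diagonal model needs only two parameters per feature and class, which makes it more robust than a full covariance estimate when the number of features is large relative to the class size.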
