# Gaussian Naive Bayes Classifier from Scratch
***
## Table of Contents
***

## 1. Introduction
Naive Bayes classifiers are probabilistic classification models based on Bayes' theorem. They assume that the features are conditionally independent of one another given the class label. Naive Bayes is a general framework; choose the specific variant based on the nature of your data:

- **Categorical Naive Bayes**

    - **Features**: Categorical labels (e.g., colours, countries, product types).

    - **Use Case**: Classification with discrete, categorically distributed features.

- **Multinomial Naive Bayes**

    - **Features**: Counts or frequencies (e.g., word occurrences, event counts).

    - **Use Case**: Text classification, document classification, or any scenario where features are discrete counts.

- **Gaussian Naive Bayes**

    - **Features**: Continuous data (e.g., measurements, sensor readings).

    - **Use Case**: Classification with numerical features assumed to follow a Gaussian distribution.

- **Bernoulli Naive Bayes**

    - **Features**: Binary features (e.g., True/False, 0/1).

    - **Use Case**: Text classification (presence/absence of words), binary feature spaces.



### Bayes' Theorem
Bayes' theorem describes the probability of a class $C_{i}$ given a set of features $X = (x_{1}, x_{2},\ldots,x_{N})$:

\begin{align*}
P(C_{i}|X) = \dfrac{P(X|C_{i}) \cdot P(C_{i})}{P(X)}
\end{align*}

where:
- $P(C_{i}|X)$: Posterior probability of class $C_{i}$ given features $X$.
- $P(X|C_{i})$: Likelihood of features $X$ given class $C_{i}$.
- $P(C_{i})$: Prior probability of class $C_{i}$.
- $P(X)$: Evidence (normalising constant, the same for all classes).
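
As a small illustration of the formula, the sketch below computes posteriors for two classes. The function name `posterior` and the prior and likelihood values are made up for this example; the evidence $P(X)$ is obtained by summing the numerators over all classes.

```python
def posterior(priors, likelihoods):
    """Return P(C_i | X) for each class, given P(C_i) and P(X | C_i)."""
    # Numerator of Bayes' theorem for each class: P(X | C_i) * P(C_i)
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Evidence P(X): the sum of the numerators over all classes
    evidence = sum(joint)
    return [j / evidence for j in joint]


# Hypothetical values, chosen only for illustration
priors = [0.6, 0.4]        # P(C_1), P(C_2)
likelihoods = [0.2, 0.5]   # P(X | C_1), P(X | C_2)

posteriors = posterior(priors, likelihoods)
print(posteriors)  # approximately [0.375, 0.625]
```

Although $C_1$ has the larger prior, the larger likelihood under $C_2$ gives it the higher posterior.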

Gaussian Naive Bayes assumes that the features $X = (x_{1}, x_{2},\ldots,x_{N})$ are conditionally independent given the class $C_{i}$, so the joint likelihood factorises as $P(X|C_{i}) = \prod_{j=1}^{N} P(x_{j}|C_{i})$, and that each feature follows a Gaussian (normal) distribution within each class. The likelihood of a single feature $x_j$ is therefore:

\begin{align*}
P(x_j|C_i) = \frac{1}{\sqrt{2\pi\sigma_{ij}^2}} \exp\left(-\frac{(x_j - \mu_{ij})^2}{2\sigma_{ij}^2}\right)
\end{align*}

where:
- $\mu_{ij}$: Mean of feature $x_j$ in class $C_i$.
- $\sigma_{ij}^2$: Variance of feature $x_j$ in class $C_i$.
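
A minimal sketch of this per-feature likelihood in plain Python follows. The helper names `fit_mean_var` and `gaussian_likelihood` are hypothetical, and estimating the variance by dividing by $n$ (the population variance) is an assumption of this sketch rather than something fixed by the formula.

```python
import math


def fit_mean_var(values):
    """Estimate the mean and variance of one feature within one class."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: population variance (divide by n, not n - 1)
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


def gaussian_likelihood(x, mean, var):
    """Gaussian density of x for a feature with the given mean and variance."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * var)
    exponent = -((x - mean) ** 2) / (2.0 * var)
    return coeff * math.exp(exponent)


# Hypothetical samples of one feature taken from a single class
samples = [4.0, 5.0, 6.0]
mu, sigma2 = fit_mean_var(samples)

# The density is highest at the mean and falls off symmetrically around it
print(gaussian_likelihood(5.0, mu, sigma2))
print(gaussian_likelihood(7.0, mu, sigma2))
```

Note that the returned value is a probability density, so it can exceed 1 when the variance is small.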


Replacing $P(X|C_{i})$ in Bayes' theorem with the product of the per-feature Gaussian likelihoods, the equation becomes:

\begin{align*}
P(C_{i}|X) = \dfrac{P(C_{i}) \cdot \prod_{j=1}^{N} \frac{1}{\sqrt{2\pi\sigma_{ij}^2}} \exp\left(-\frac{(x_j - \mu_{ij})^2}{2\sigma_{ij}^2}\right)}{P(X)}
\end{align*}

Since $P(X)$ is constant for all classes,

\begin{align*}
P(C_{i}|X) \propto P(C_{i}) \cdot \prod_{j=1}^{N} \frac{1}{\sqrt{2\pi\sigma_{ij}^2}} \exp\left(-\frac{(x_j - \mu_{ij})^2}{2\sigma_{ij}^2}\right)
\end{align*}

The symbol $\propto$ denotes proportionality: dropping the denominator $P(X)$ rescales every class score by the same factor, so the ranking of the classes, and hence the predicted class, is unchanged.
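
The proportional form can be turned into a prediction rule by scoring each class and picking the largest score. The sketch below is one possible way to do this, not a definitive implementation. It works with the logarithm of the expression above, which is a swapped-in numerical convenience: the logarithm is monotonic, so the ranking of classes is the same, and summing logs avoids underflow when many small densities are multiplied. The function names, class labels, and parameter values are all hypothetical.

```python
import math


def log_gaussian(x, mean, var):
    """Logarithm of the Gaussian density of x."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def predict(x, priors, means, variances):
    """Return the class with the highest unnormalised log-posterior, plus all scores."""
    scores = {}
    for label in priors:
        # log P(C_i) + sum_j log P(x_j | C_i); the evidence P(X) is ignored
        score = math.log(priors[label])
        for x_j, mu, var in zip(x, means[label], variances[label]):
            score += log_gaussian(x_j, mu, var)
        scores[label] = score
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical parameters for two classes and two features
priors = {"a": 0.5, "b": 0.5}
means = {"a": [1.0, 2.0], "b": [4.0, 6.0]}
variances = {"a": [1.0, 1.0], "b": [1.0, 1.0]}

label, scores = predict([1.2, 2.1], priors, means, variances)
print(label)  # "a": the point lies much closer to the means of class "a"
```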