# Naive Bayes

Naive Bayes is one of the **simplest** yet most effective algorithms to try on a classification problem. It is a probabilistic model based on Bayes' theorem, an equation describing the **relationship between conditional probabilities of statistical quantities**.

The Naive Bayes algorithm has **hardly any hyperparameters**, which makes it a good first choice for classification problems. If it does not give satisfactory results, more complex algorithms can be tried.

## Conditional Probability

The probability of $A$ **given that we already know** that $B$ has occurred (with $P(B) > 0$) is defined as

$$
P(A|B) = \frac{P(A \cap B)}{P(B)}
$$

Only the _portion_ of $A$ that is contained in $B$ can still occur, hence the original probability of $A \cap B$ must be recalculated (or **scaled**) to reflect the fact that the _new_ sample space is $B$.

In a slightly redundant way, the conditional probability can also be written as

$$
P(A|B) = P(A \cap B\,|B)
$$

which makes it easier to see how we calculate the probability of an event $A$: writing $P(A \cap B\,|B)$ makes it obvious that we want the probability that the event $A \cap B$ happens, **scaled by the knowledge we already have** about the event $B$:

$$
P({\color{orange}{A \cap B}}\,|{\color{purple}B}) = \frac{P({\color{orange}{A \cap B}})}{P({\color{purple}B})}
$$
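
As a small illustration of the definition, the following sketch computes a conditional probability by counting outcomes. The fair six-sided die and the events `A` ("even number") and `B` ("greater than 3") are assumptions chosen only for this example, as are the helper names `prob` and `conditional`.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # illustrative event: the roll is even
B = {4, 5, 6}  # illustrative event: the roll is greater than 3


def prob(event):
    """Probability of an event when all outcomes in omega are equally likely."""
    return Fraction(len(event & omega), len(omega))


def conditional(a, b):
    """P(a | b) = P(a and b) / P(b), assuming P(b) > 0."""
    return prob(a & b) / prob(b)


p_a_given_b = conditional(A, B)
print(p_a_given_b)  # 2/3

# The redundant form P(A ∩ B | B) gives the same value
print(conditional(A & B, B) == p_a_given_b)  # True
```

Within the new sample space `B`, two of the three remaining outcomes (4 and 6) are even, which matches the scaled value $\frac{2/6}{3/6} = \frac{2}{3}$.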

## Bayes Theorem

Bayes' theorem describes the probability of an event, based on **prior knowledge** that might be related to the event. For example, if the risk of health problems is known to increase with age, Bayes' theorem allows the risk to an individual of a known age to be assessed more accurately than by simply assuming that the individual is typical of the population as a whole.

$$
\begin{aligned}
P(A|B) &= \frac{P(B|A) \cdot P(A)}{P(B)} \\[10pt]
\text{Posterior} &= \, \frac{\text{Likelihood} \cdot \text{Prior}}{\text{Evidence}}
\end{aligned}
$$

with

* the conditional probability $P(A|B)$ of event $A$ occurring given that $B$ is true. This is also called **posterior probability**.
* the conditional probability $P(B|A)$ of event $B$ occurring given that $A$ is true. This is also called the **likelihood**.
* the probability $P(A)$. This is also called the **prior probability**.
* the probability $P(B)$. This is also called the **evidence** which **normalizes** our probabilities.
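
A short derivation, using only the definition of conditional probability given earlier, shows where the formula comes from. Writing the joint probability $P(A \cap B)$ in two ways, and assuming $P(A) > 0$ and $P(B) > 0$:

$$
\begin{aligned}
P(A \cap B) &= P(A|B) \cdot P(B) = P(B|A) \cdot P(A) \\[10pt]
\Rightarrow \quad P(A|B) &= \frac{P(B|A) \cdot P(A)}{P(B)}
\end{aligned}
$$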

See also [Bayes' Theorem with Lego](https://www.countbayesie.com/blog/2015/2/18/bayes-theorem-with-lego).

### Alternative Form

Another form of Bayes' theorem for **two competing statements** or hypotheses is

$$
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B|A) \cdot P(A) + P(B|\neg A) \cdot P(\neg A)}
$$

For proposition $A$ and evidence or background $B$,

* $P(A)$ is the prior probability, the initial degree of belief in $A$.
* $P(\neg A)$ is the corresponding initial degree of belief in not $A$, that $A$ is false, where $P(\neg A) = 1 - P(A)$
* $P(B|A)$ is the conditional probability or likelihood, the degree of belief in $B$ given that proposition $A$ is true.
* $P(B|\neg A)$ is the conditional probability or likelihood, the degree of belief in $B$ given that proposition $A$ is false.
* $P(A|B)$ is the posterior probability, the probability of $A$ after taking into account $B$.
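
The two-hypothesis form translates directly into a small function. This is a minimal sketch: the function name `posterior`, its parameter names, and the numbers in the usage line are hypothetical and chosen only for illustration.

```python
def posterior(prior, likelihood, likelihood_if_false):
    """Posterior P(A|B) for two competing hypotheses A and not-A.

    prior               -- P(A)
    likelihood          -- P(B|A)
    likelihood_if_false -- P(B|not A)
    """
    # Evidence P(B), expanded over A and not-A, with P(not A) = 1 - P(A)
    evidence = likelihood * prior + likelihood_if_false * (1 - prior)
    return likelihood * prior / evidence


# Illustrative numbers only: P(A) = 0.2, P(B|A) = 0.9, P(B|not A) = 0.1
print(round(posterior(0.2, 0.9, 0.1), 3))  # 0.692
```

In this made-up case, observing $B$ raises the degree of belief in $A$ from 0.2 to about 0.69, because $B$ is much more likely when $A$ is true than when it is false.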

### Example

A medical test has a **99% accuracy** (for both true positives and true negatives). Knowing that **1 out of 10,000 people is sick**, what is the probability of being sick given a positive test result?

#### Solution

![false-positives](images/bayes-theorem.svg)

What we knew before the test came back positive are the prior probabilities $P(sick) = 0.0001$ and $P(healthy) = 0.9999$. As only the **positive tests** actually occurred, we scale the product of likelihood and prior by the total probability of a positive test.

The **posterior probability**, what we inferred once we knew that the test was positive, is:

$$
\begin{aligned}
P(sick|positive) &= \frac{P(sick) \cdot P(positive|sick)}{P(sick) \cdot P(positive|sick) + P(healthy) \cdot P(positive|healthy)} \\[10pt]
&= \frac{0.0001 \cdot 0.99}{0.0001 \cdot 0.99 + 0.9999 \cdot 0.01} \\[10pt]
&= 0.0098 \approx 1 \%
\end{aligned}
$$
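
The same result can be cross-checked by counting people instead of multiplying probabilities. The sketch below assumes a hypothetical population of 1,000,000, a size chosen only so that every count is a whole number; the variable names are illustrative.

```python
# Hypothetical population size, chosen so that all counts are integers
population = 1_000_000

sick = population // 10_000   # 1 out of 10,000 people is sick
healthy = population - sick

# 99% of the sick test positive (true positives)
true_positives = sick * 99 // 100
# 1% of the healthy test positive as well (false positives)
false_positives = healthy * 1 // 100

# Among all positive tests, the share that belongs to sick people
p_sick_given_positive = true_positives / (true_positives + false_positives)

print(true_positives, false_positives)   # 99 9999
print(round(p_sick_given_positive, 4))   # 0.0098
```

Only 99 of the 10,098 positive tests belong to sick people: because the disease is so rare, the false positives among the many healthy people far outnumber the true positives.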



## Naive Bayes