# Lesson 4: Implementing the Naive Bayes Classifier from Scratch in Python

## Introduction
Welcome to our tour of the Naive Bayes Classifier! This classification algorithm is known for its simplicity and effectiveness. We will implement it from scratch in Python, using pandas only for data handling and no prebuilt machine learning libraries. Let's get started!

## Recall
Let's do a quick recall of probability theory.

- **P(A)** denotes the probability that event A occurs.
- **P(A|B)** denotes the probability that event A occurs, given that event B has already happened.

For instance, imagine a bag holding three marbles: one red and two blue. Let A be the event that a red marble is drawn, and B the event that a blue marble is drawn. The probability of A, **P(A)**, is 1/3 in this case.

Now suppose a blue marble has already been drawn and not returned to the bag. This leaves one red and one blue marble. The probability of drawing a red marble next (event A), given that a blue marble was drawn first (event B), is denoted by **P(A|B)**. In this case, **P(A|B)** is 1/2, which is higher than 1/3 because removing a blue marble makes red more likely on the next draw.
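
The marble example can be checked by listing every possible pair of draws. The snippet below is a minimal sketch; the marble labels are made up for illustration, and it assumes the first marble is not put back in the bag.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical labels for the three marbles in the bag.
marbles = ["red", "blue_1", "blue_2"]

# Every ordered way to draw two marbles without replacement (all equally likely).
draws = list(permutations(marbles, 2))

# P(A): a single draw from the full bag is red.
p_a = Fraction(marbles.count("red"), len(marbles))

# P(A|B): among draws whose first marble is blue, the share whose second is red.
first_blue = [d for d in draws if d[0].startswith("blue")]
red_after_blue = [d for d in first_blue if d[1] == "red"]
p_a_given_b = Fraction(len(red_after_blue), len(first_blue))

print(p_a)          # 1/3
print(p_a_given_b)  # 1/2
```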

## The Principle of Naive Bayes
The Naive Bayes algorithms rely on Bayes' theorem. Let's recall it quickly. This theorem calculates the probability of an event based on prior knowledge of potentially related events. It is represented mathematically as:

\[
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
\]

Where:

- **P(A|B)** is the posterior probability of class **A** given predictor **B**. It's what we are trying to calculate.
- **P(B|A)** is the likelihood, which is the probability of the predictor given a class.
- **P(B)** is the marginal probability of the predictor.
- **P(A)** is the prior probability of the class.

This formula forms the backbone of the Naive Bayes classifier.
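
As a quick sanity check, the formula can be applied to the marble bag from the recall section. This is only an illustrative sketch: the helper name `bayes_posterior` is hypothetical, and here A is taken to mean "the second marble drawn is red" while B means "the first marble drawn is blue".

```python
from fractions import Fraction

def bayes_posterior(likelihood, prior, marginal):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / marginal

# A: the second marble drawn is red.  B: the first marble drawn is blue.
prior = Fraction(1, 3)       # P(A): by symmetry, any given draw is red with probability 1/3
marginal = Fraction(2, 3)    # P(B): two of the three marbles are blue
likelihood = Fraction(1, 1)  # P(B|A): if the second marble is red, the first must have been blue

posterior = bayes_posterior(likelihood, prior, marginal)
print(posterior)  # 1/2
```

The result matches the value of **P(A|B)** obtained by direct counting.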

The term 'naive' refers to the assumption that the features in a dataset are independent of each other given the class, which is often not the case in real-life data. Nonetheless, the classifier often performs well in practice and is easy to implement.

## Deriving the Naive Bayes Classifier Algorithm
In the context of machine learning, the Naive Bayes Classifier uses Bayes' theorem to compute the posterior probability of each class given a set of features and then predicts the class with the highest posterior probability.

Assuming a binary class variable **Y** (binary means it can be equal to either 0 or 1) and features **X<sub>1</sub>, X<sub>2</sub>, ..., X<sub>n</sub>**, our task is to compute the posterior probability **P(Y=1|X<sub>1</sub>=x<sub>1</sub>, X<sub>2</sub>=x<sub>2</sub>, ..., X<sub>n</sub>=x<sub>n</sub>)** and compare it with the same quantity for **Y=0**. By dropping the denominator from Bayes' theorem (it doesn't depend on **Y**, so it is the same for every class), we are left with the task of maximizing the joint probability **P(Y, X) = P(X|Y) · P(Y)**. The naive independence assumption then lets us split **P(X|Y)** into the product **P(X<sub>1</sub>|Y) · P(X<sub>2</sub>|Y) · ... · P(X<sub>n</sub>|Y)**, which forms the basis for Naive Bayes classification.
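
Written out, the step from Bayes' theorem to the decision rule looks roughly as follows. It assumes the features are conditionally independent given the class, and it writes \(\hat{y}\) for the predicted class, a symbol introduced here for illustration only:

\[
P(Y=y \mid X_1=x_1, \ldots, X_n=x_n) \propto P(Y=y) \prod_{i=1}^{n} P(X_i=x_i \mid Y=y)
\]

\[
\hat{y} = \arg\max_{y \in \{0, 1\}} \; P(Y=y) \prod_{i=1}^{n} P(X_i=x_i \mid Y=y)
\]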

## Implementing Naive Bayes Classifier
We approach the implementation of the Naive Bayes Classifier by first calculating the prior probabilities of each class, and then the likelihood of each feature given a class:

```python
def calculate_prior_probabilities(y):
    return y.value_counts(normalize=True)  # calculates the proportion of each class in the data

def calculate_likelihoods(X, y):
    likelihoods = {}
    for class_ in y.unique():
        for column in X.columns:
            likelihoods[column + "|" + str(class_)] = X[y == class_][column].value_counts(normalize=True)
    return likelihoods  # dict keyed by "feature|class", holding the likelihood of each feature value given that class
```
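
To see what these two functions compute, it may help to run the same pandas calls on a tiny table. The data below is hypothetical and exists only for this sketch; in the functions above, the second result would be stored under the key `"Outlook|Yes"`.

```python
import pandas as pd

# Hypothetical toy data: one feature column and one label per row.
X = pd.DataFrame({"Outlook": ["Clear", "Clear", "Cloudy", "Cloudy"]})
y = pd.Series(["Yes", "Yes", "Yes", "No"])

# Prior: share of rows belonging to each class.
priors = y.value_counts(normalize=True)
print(priors["Yes"])  # 0.75

# Likelihood: share of each Outlook value among the rows of class "Yes".
outlook_given_yes = X[y == "Yes"]["Outlook"].value_counts(normalize=True)
print(outlook_given_yes.to_dict())
```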

Armed with these utility functions, we can implement the Naive Bayes Classifier function:

```python
from collections import defaultdict

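def naive_bayes_classifier(X_test, priors, likelihoods):
    # Scores are summed over all rows of X_test and a single label is returned,
    # so pass one observation at a time.
    class_probabilities = defaultdict(float)

    for index, data_point in X_test.iterrows():
        for class_ in priors.index:
            # Product of P(feature value | class); values missing from the table count as 0
            class_likelihood = 1
            for feature in X_test.columns:
                class_likelihood *= likelihoods[feature + "|" + str(class_)].get(data_point[feature], 0)

            # Unnormalized posterior: prior times likelihood
            class_probabilities[class_] += priors[class_] * class_likelihood

    # Class with the highest score
    return max(class_probabilities, key=class_probabilities.get)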
```
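
Note that `naive_bayes_classifier` adds up the scores of every row in `X_test` and returns one label, so it suits a single observation. If one prediction per row is needed, a variant along the following lines could be used. This is a sketch, not part of the lesson's code: `predict_each_row` is a hypothetical helper, and the priors and likelihoods are made-up numbers written by hand in the same format as above.

```python
import pandas as pd

def predict_each_row(X_test, priors, likelihoods):
    """Return one predicted label per row of X_test (hypothetical helper)."""
    predictions = []
    for _, data_point in X_test.iterrows():
        scores = {}
        for class_ in priors.index:
            # Start from the prior and multiply in each feature's likelihood.
            score = priors[class_]
            for feature in X_test.columns:
                table = likelihoods[feature + "|" + str(class_)]
                score *= table.get(data_point[feature], 0)
            scores[class_] = score
        # Keep the best class for this row only.
        predictions.append(max(scores, key=scores.get))
    return predictions

# Hand-written priors and likelihoods (made-up numbers for illustration).
priors = pd.Series({"Yes": 0.5, "No": 0.5})
likelihoods = {
    "Outlook|Yes": pd.Series({"Clear": 0.8, "Cloudy": 0.2}),
    "Outlook|No": pd.Series({"Clear": 0.3, "Cloudy": 0.7}),
}
X_test = pd.DataFrame({"Outlook": ["Clear", "Cloudy"]})

print(predict_each_row(X_test, priors, likelihoods))  # ['Yes', 'No']
```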

## Understanding and Handling Data Issues in Naive Bayes
A recurring challenge in Naive Bayes is the handling of zero probabilities, i.e., when a category does not appear in the training data for a given class, resulting in a zero probability for that category. A known fix for this problem is applying Laplace or Add-1 smoothing, which adds a '1' to each category count to circumvent zero probabilities.

You can integrate Laplace smoothing into the `calculate_likelihoods` function as follows:

```python
def calculate_likelihoods_with_smoothing(X, y):
    likelihoods = {}
    for class_ in y.unique():
        for column in X.columns:
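            counts = X[y == class_][column].value_counts()
            # Include categories unseen in this class (count 0) so they are smoothed too
            counts = counts.reindex(X[column].unique(), fill_value=0)
            likelihoods[column + "|" + str(class_)] = (counts + 1) / (counts.sum() + len(counts))
    return likelihoods  # dict keyed by "feature|class", holding the smoothed likelihood of each feature value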
```

Each category count is increased by 1, and the denominator is increased by the number of unique categories to account for the added 1's.
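
The arithmetic of add-1 smoothing can be traced on a single feature within a single class. The values below are hypothetical, chosen so that one category never occurs in the class; the sketch fills in that category with a count of 0 before adding 1.

```python
import pandas as pd

# Hypothetical feature values observed for one class; "Cold" never occurs.
values_in_class = pd.Series(["Hot", "Hot", "Hot"])
all_categories = ["Hot", "Cold"]  # every category seen anywhere in the feature

# Raw counts, with unseen categories filled in as 0.
counts = values_in_class.value_counts().reindex(all_categories, fill_value=0)

# Add-1 smoothing: (count + 1) / (total + number of categories).
smoothed = (counts + 1) / (counts.sum() + len(all_categories))

print(smoothed["Hot"])   # 0.8
print(smoothed["Cold"])  # 0.2
```

Without smoothing, "Cold" would have probability 0 for this class and would zero out the whole product of likelihoods.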

## Using Naive Bayes Classifier
Here is a short example of predicting weather with our classifier:

```python
import pandas as pd

data = {
    'Temperature': ['Hot', 'Hot', 'Cold', 'Hot', 'Cold', 'Cold', 'Hot'],
    'Humidity': ['High', 'High', 'Normal', 'Normal', 'High', 'Normal', 'Normal'],
    'Weather': ['Sunny', 'Sunny', 'Snowy', 'Rainy', 'Snowy', 'Snowy', 'Sunny']
}
df = pd.DataFrame(data)

# Split features and labels
X = df[['Temperature', 'Humidity']]
y = df['Weather']

# Calculate prior probabilities
priors = calculate_prior_probabilities(y)

# Calculate likelihoods with smoothing
likelihoods = calculate_likelihoods_with_smoothing(X, y)

# New observation
X_test = pd.DataFrame([{'Temperature': 'Hot', 'Humidity': 'Normal'}])

# Make prediction
prediction = naive_bayes_classifier(X_test, priors, likelihoods)
print("Predicted Weather: ", prediction)  # Output: Predicted Weather:  Sunny
```

The Naive Bayes Classifier predicts a class label based on the observed features. Owing to its simplicity and speed, this classifier is commonly applied to tasks such as text classification, spam detection, and sentiment analysis.

## Lesson Summary and Practice
Superb work! You've mastered the essentials of the Naive Bayes Classifier, from understanding its theory to crafting a Naive Bayes Classifier from scratch. The next phase is practice, which will consolidate your newly acquired skills. Enjoy the hands-on exercises lined up next. Delve deeper into your machine learning journey with the forthcoming lessons!

