# Categorical Naive Bayes Classifier from Scratch
***
## Table of Contents
1. [Introduction](#1-introduction)
    - [Bayes' Theorem](#bayes-theorem)
2. [Loading Data](#2-loading-data)
3. [Prior Probability](#3-prior-probability)
4. [Likelihood](#4-likelihood)
5. [Posterior Probability](#5-posterior-probability)
6. [Prediction](#6-prediction)
7. [Evaluation Metrics](#7-evaluation-metrics)
    - [Binary Confusion Matrix](#binary-confusion-matrix)
    - [Multi-Class Confusion Matrix](#multi-class-confusion-matrix)
    - [Accuracy](#accuracy)
    - [Precision](#precision)
    - [Recall](#recall)
    - [F1-Score](#f1-score)
8. [Encapsulation](#8-encapsulation)
9. [Comparison with Scikit-Learn](#9-comparison-with-scikit-learn) 
***

In [1]:
import numpy as np
import pandas as pd
from typing import Tuple, List, Dict
from numpy.typing import NDArray

## 1. Introduction
Naive Bayes classifiers are probabilistic classification models based on Bayes' Theorem that assume the features are conditionally independent given the class label. Naive Bayes is a general framework; the specific variant should be chosen based on the nature of your data:

- **Categorical Naive Bayes**: Assumes each feature follows a categorical distribution; used for **discrete** features with a finite set of unordered values (e.g., Sunny/Overcast/Rain). This is the variant implemented here.

- **Multinomial Naive Bayes**: Assumes features follow a multinomial distribution; suited to discrete **count** features (e.g., word counts).

- **Gaussian Naive Bayes**: Assumes features follow a Gaussian (normal) distribution; used for **continuous** features. Fits the model by calculating the mean and standard deviation of each feature within each class.

- **Bernoulli Naive Bayes**: Works with **binary** features (e.g., True/False, 0/1).


### Bayes' Theorem
Bayes' theorem describes the probability of a class $C$ given a set of features $X = (x_{1}, x_{2},\ldots,x_{n})$:

\begin{align*}
P(C|X) = \dfrac{P(X|C) \cdot P(C)}{P(X)}
\end{align*}

where:
- $P(C|X)$: Posterior probability of class $C$ given features $X$.
- $P(X|C)$: Likelihood of features $X$ given class $C$.
- $P(C)$: Prior probability of class $C$.
- $P(X)$: Probability of features $X$ (acts as a normalising constant).

Naive Bayes assumes features $X = (x_{1}, x_{2},\ldots,x_{n})$ are conditionally independent given the class $C$, thus the likelihood is expressed as:
\begin{align*}
P(X|C) = P(x_{1}, x_{2}, \dots, x_{n}|C) = \prod_{i=1}^{n}P(x_{i}|C)
\end{align*}

Replacing $P(X|C)$ in Bayes' theorem, the equation becomes:

\begin{align*}
P(C|X) = \dfrac{P(C) \cdot \prod_{i=1}^{n} P(x_{i}|C)}{P(X)}
\end{align*}

Since $P(X)$ is constant for all classes,

\begin{align*}
P(C|X) \propto P(C) \cdot \prod_{i=1}^{n} P(x_{i}|C)
\end{align*}

The symbol $\propto$ denotes proportionality, meaning we ignore the denominator $P(X)$ when comparing probabilities across classes.
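
As a small numeric illustration of why $P(X)$ can be ignored, the sketch below uses made-up unnormalised scores $P(C) \cdot \prod_{i} P(x_{i}|C)$ for two classes (the numbers are hypothetical and not taken from the dataset). Normalising changes the values but not which class wins.

```python
# Hypothetical unnormalised scores P(C) * prod_i P(x_i | C) for two classes.
scores = {"Yes": 0.012, "No": 0.036}

# P(X) is the sum of the scores over all classes (law of total probability).
evidence = sum(scores.values())

# Dividing by P(X) turns the scores into posteriors that sum to 1.
posteriors = {c: s / evidence for c, s in scores.items()}

# The winning class is the same before and after normalisation.
best_unnormalised = max(scores, key=scores.get)
best_normalised = max(posteriors, key=posteriors.get)

print(posteriors)
print(best_unnormalised == best_normalised)
```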

## 2. Loading Data

In [2]:
df = pd.read_csv('../_datasets/weather_forecast.csv')
df.head()

Unnamed: 0,Outlook,Temperature,Humidity,Windy,Play
0,Sunny,Hot,High,Weak,No
1,Sunny,Hot,High,Strong,No
2,Overcast,Hot,High,Weak,Yes
3,Rain,Mild,High,Weak,Yes
4,Rain,Cool,Normal,Weak,Yes


In [3]:
X = df.drop('Play', axis=1)
y = df['Play']

## 3. Prior Probability
Class $C$ (`y`) takes only two discrete values, `Yes` and `No`:

\begin{align*}
P(C=\text{'Yes'}) = \dfrac{\text{Count(Yes)}}{\text{Total Count}}
\end{align*}

\begin{align*}
P(C=\text{'No'}) = \dfrac{\text{Count(No)}}{\text{Total Count}}
\end{align*}


In [4]:
print(f'Total count: {len(df)}')
print(f'Counts: {y.value_counts().to_dict()}')

Total count: 14
Counts: {'Yes': 9, 'No': 5}


\begin{align*}
P(\text{'Yes'}) = \dfrac{9}{14} = 0.6429
\end{align*}

\begin{align*}
P(\text{'No'}) = \dfrac{5}{14} = 0.3571
\end{align*}

In [5]:
def calculate_priors(y: pd.Series) -> Dict[str, float]:
    """
    Calculate prior probabilities for each class in the target variable.

    Args:
        y (pd.Series): Target variable containing class labels (strings).

    Returns:
        Dict[str, float]: Prior probabilities for each class.
    """
    return y.value_counts(normalize=True).to_dict()

In [6]:
calculate_priors(y)

{'Yes': 0.6428571428571429, 'No': 0.35714285714285715}

## 4. Likelihood

The likelihood quantifies how well a parameter $\theta$ explains the observed data. It is defined as:

\begin{align*}
\mathcal{L}(\theta|x) = f(x|\theta)
\end{align*}

where $f$ is the probability density/mass function.

For each feature value and class, we calculate:

\begin{align*}
P(\text{Feature = value|Class})
\end{align*}

For example:

\begin{align*}
P(\text{Outlook = 'Sunny'|Play = 'Yes'}) = \dfrac{\text{Count(Outlook = 'Sunny', Play = 'Yes')} + \alpha}{\text{Count(Play = 'Yes')} + k \cdot \alpha}
\end{align*}

where $k$ is the number of distinct values the feature takes (3 for `Outlook`) and $\alpha$ is the smoothing parameter that prevents zero probabilities (**Laplace Smoothing**).
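
To make the smoothing formula concrete, here is a minimal sketch for a single estimate. The helper name `smoothed_likelihood` is hypothetical; the counts come from the dataset above, where class `No` has 5 rows, none of them `Overcast`, and `Outlook` has 3 distinct values.

```python
def smoothed_likelihood(count: int, class_total: int,
                        n_categories: int, alpha: float = 1.0) -> float:
    """Laplace-smoothed estimate of P(feature = value | class)."""
    return (count + alpha) / (class_total + n_categories * alpha)


# P(Outlook = 'Overcast' | Play = 'No') with alpha = 1.
p_smoothed = smoothed_likelihood(count=0, class_total=5, n_categories=3)

# With alpha = 0 the same estimate collapses to zero.
p_unsmoothed = smoothed_likelihood(count=0, class_total=5, n_categories=3,
                                   alpha=0.0)

print(p_smoothed, p_unsmoothed)  # 0.125 0.0
```

A zero likelihood would force the whole product for that class to zero regardless of the other features, which is what the smoothing term avoids.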

In [7]:
def calculate_likelihoods(X: pd.DataFrame, y: pd.Series,
                          alpha: float = 1.0) -> Dict[str, Dict[str, Dict[str, float]]]:
    """
    Calculate conditional probabilities for feature values given each class.

    Args:
        X (pd.DataFrame): Feature matrix (DataFrame with categorical columns)
        y (pd.Series): Target variable (Series of class labels)
        alpha (float): Smoothing parameter for Laplace smoothing (default=1.0)

    Returns:
        Nested dictionary with structure:
        {feature_name: {class_label: {feature_value: probability}}}
    """

    likelihoods = {}
    for feature in X.columns:  # For each column of X
        likelihoods[feature] = {}
        # Unique feature values in each column
        unique_features = X[feature].unique()

        for c in y.unique():  # Unique target values of y
            class_subset = X[y == c]
            total = len(class_subset)  # Count(C)

            # Count frequencies (e.g., {'Sunny':3, 'Rain':2} for class 'No')
            value_counts = class_subset[feature].value_counts()

            # All feature values are included, even if missing in subset
            value_counts = value_counts.reindex(unique_features, fill_value=0)
            probas = round((value_counts + alpha) / \
                (total + len(value_counts) * alpha), 4)

            likelihoods[feature][c] = probas.to_dict()
    return likelihoods

In [8]:
calculate_likelihoods(X, y)

{'Outlook': {'No': {'Sunny': 0.5, 'Overcast': 0.125, 'Rain': 0.375},
  'Yes': {'Sunny': 0.25, 'Overcast': 0.4167, 'Rain': 0.3333}},
 'Temperature': {'No': {'Hot': 0.375, 'Mild': 0.375, 'Cool': 0.25},
  'Yes': {'Hot': 0.25, 'Mild': 0.4167, 'Cool': 0.3333}},
 'Humidity': {'No': {'High': 0.7143, 'Normal': 0.2857},
  'Yes': {'High': 0.3636, 'Normal': 0.6364}},
 'Windy': {'No': {'Weak': 0.4286, 'Strong': 0.5714},
  'Yes': {'Weak': 0.6364, 'Strong': 0.3636}}}
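
Because the denominator adds $\alpha$ once per distinct feature value, the smoothed probabilities for each feature and class should sum to 1. A minimal sanity check, using the rounded values printed above (the variable `sums` is introduced here only for illustration):

```python
# Likelihoods copied from the output above (values rounded to 4 decimals).
likelihoods = {
    "Outlook": {
        "No": {"Sunny": 0.5, "Overcast": 0.125, "Rain": 0.375},
        "Yes": {"Sunny": 0.25, "Overcast": 0.4167, "Rain": 0.3333},
    },
    "Temperature": {
        "No": {"Hot": 0.375, "Mild": 0.375, "Cool": 0.25},
        "Yes": {"Hot": 0.25, "Mild": 0.4167, "Cool": 0.3333},
    },
    "Humidity": {
        "No": {"High": 0.7143, "Normal": 0.2857},
        "Yes": {"High": 0.3636, "Normal": 0.6364},
    },
    "Windy": {
        "No": {"Weak": 0.4286, "Strong": 0.5714},
        "Yes": {"Weak": 0.6364, "Strong": 0.3636},
    },
}

# Sum the probabilities over all values of each feature, per class.
sums = {
    (feature, label): sum(probs.values())
    for feature, per_class in likelihoods.items()
    for label, probs in per_class.items()
}

for key, total in sums.items():
    # A tolerance of 1e-3 absorbs the rounding to 4 decimals.
    print(key, round(total, 4), abs(total - 1.0) < 1e-3)
```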

## 5. Posterior Probability
As discussed in the [Introduction](#1-introduction), the posterior probability satisfies:


\begin{align*}
P(C|X) \propto P(C) \prod_{i=1}^{n} P(x_{i}|C)
\end{align*}

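To prevent numerical underflow when many small probabilities are multiplied, we work with log probabilities, which turn the product into a sum:

\begin{align*}
\log P(C|X) = \log P(C) + \sum_{i=1}^{n} \log P(x_{i}|C) + \text{const}
\end{align*}

where the constant $-\log P(X)$ is the same for every class and is therefore omitted in the code.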
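
The underflow problem is easy to reproduce. The sketch below multiplies 1,000 hypothetical likelihoods of 0.01 each (an artificial example, far more features than the dataset here) and compares the direct product with the sum of logs.

```python
import math

# 1,000 hypothetical per-feature likelihoods, each equal to 0.01.
probs = [0.01] * 1000

# The true product is 1e-2000, far below the smallest positive float,
# so the direct product underflows to exactly 0.0.
product = math.prod(probs)

# Summing logs keeps the same information in a representable range.
log_sum = sum(math.log(p) for p in probs)

print(product)  # 0.0
print(round(log_sum, 2))
```

Once the product is 0.0 for every class, the classes can no longer be ranked, whereas the log sums remain comparable.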

In the following code, `.get(category, 1e-9)` looks up the probability of the value `category` in the dictionary for the given feature and class. Because `calculate_likelihoods()` gives every class an entry for each value seen in training, the lookup only fails for a value that never appeared in the training data for that feature; in that case it returns a very small default probability (1e-9) instead of raising an error.

In [9]:
def calculate_posterior(x: Dict[str, str], priors: Dict[str, float], likelihoods: Dict[str, Dict[str, Dict[str, float]]],
                        X_columns: List[str], classes: List[str]) -> Dict[str, float]:
    """
    Calculate log-posterior probabilities for all classes given a sample.

    Args:
        x (Dict[str, str]): Input sample as dictionary {feature: value}.
        priors (Dict[str, float]): Prior probabilities from calculate_priors().
        likelihoods (Dict[str, Dict[str, Dict[str, float]]]): Conditional probabilities from calculate_likelihoods().
        X_columns (List[str]): List of feature names.
        classes (List[str]): List of possible class labels.

    Returns:
        Dict[str, float]: Dictionary mapping each class to its log-posterior probability.
    """

    log_posteriors = {}
    for c in classes:
        log_proba = np.log(priors[c])  # Log of prior
        for feature in X_columns:  # Add the log-likelihood of each feature value given c
            category = x[feature]
            # Fall back to a tiny probability to avoid log(0) for unseen values
            proba = likelihoods[feature][c].get(category, 1e-9)
            log_proba += np.log(proba)
        log_posteriors[c] = round(float(log_proba), 4)
    return log_posteriors  # log-posterior probabilities for all classes

In [10]:
calculate_posterior(X.iloc[0], calculate_priors(
    y), calculate_likelihoods(X, y), X.columns, y.unique())

{'No': -3.8873, 'Yes': -4.6781}
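
These scores are unnormalised log-posteriors, so they do not exponentiate to probabilities that sum to 1. If actual probabilities are wanted, one common option is the log-sum-exp trick, sketched below on the two values printed above (this step is not part of the original functions):

```python
import math

# Unnormalised log-posterior scores copied from the output above.
log_posteriors = {"No": -3.8873, "Yes": -4.6781}

# Log-sum-exp: subtract the maximum before exponentiating for stability.
max_log = max(log_posteriors.values())
log_evidence = max_log + math.log(
    sum(math.exp(v - max_log) for v in log_posteriors.values())
)

# Subtracting log P(X) in log space is the same as dividing by P(X).
probabilities = {c: math.exp(v - log_evidence)
                 for c, v in log_posteriors.items()}

print({c: round(p, 3) for c, p in probabilities.items()})
```

For this sample the normalised probability of `No` comes out at roughly 0.69, so the ranking is unchanged and only the scale differs.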

## 6. Prediction


In [11]:
def predict(X: pd.DataFrame, y: pd.Series) -> List[str]:
    """
    Predict class labels using Categorical Naive Bayes.

    Args:
        X (pd.DataFrame): Feature matrix.
        y (pd.Series): Target variable.

    Returns:
        List[str]: Predicted class labels.
    """
    priors = calculate_priors(y)
    likelihoods = calculate_likelihoods(X, y)
    classes = y.unique()
    X_columns = X.columns

    predictions = []
    for row in X.itertuples(index=False):
        posterior = calculate_posterior(
            row._asdict(), priors, likelihoods, X_columns, classes)
        predictions.append(max(posterior, key=posterior.get))
    return predictions

In [12]:
predict(X, y)

['No',
 'No',
 'Yes',
 'Yes',
 'Yes',
 'Yes',
 'Yes',
 'No',
 'Yes',
 'Yes',
 'Yes',
 'Yes',
 'Yes',
 'No']

## 7. Evaluation Metrics
### Binary Confusion Matrix
In a confusion matrix, the terms True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) describe the classification performance for binary classification. 

|                     | Predicted Negative  | Predicted Positive  |
| ------------------- | ------------------- | ------------------- |
| **Actual Negative** | True Negative (TN)  | False Positive (FP) |
| **Actual Positive** | False Negative (FN) | True Positive (TP)  |


1. True Positive (TP): The number of instances correctly predicted as positive (e.g., a disease correctly identified).

2. True Negative (TN): The number of instances correctly predicted as negative (e.g., no disease correctly identified).

3. False Positive (FP): The number of instances incorrectly predicted as positive (e.g., predicting disease when there isn't any).

4. False Negative (FN): The number of instances incorrectly predicted as negative (e.g., missing a disease when it exists).
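
The four counts can be tallied directly from paired labels. The sketch below uses a hypothetical helper, `binary_confusion_counts`, and made-up labels with `Yes` treated as the positive class.

```python
def binary_confusion_counts(y_true, y_pred, positive):
    """Count TP, TN, FP and FN for one designated positive label."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            # Predicted positive: correct only if the actual label is positive.
            counts["TP" if actual == positive else "FP"] += 1
        else:
            # Predicted negative: correct only if the actual label is negative.
            counts["FN" if actual == positive else "TN"] += 1
    return counts


# Made-up labels for illustration only.
y_true = ["Yes", "No", "Yes", "Yes", "No", "No"]
y_pred = ["Yes", "No", "No", "Yes", "Yes", "No"]

counts = binary_confusion_counts(y_true, y_pred, positive="Yes")
print(counts)  # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```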

### Multi-Class Confusion Matrix
For multi-class classification, the concepts can be extended by treating one class as the "positive" class and all others as "negative" classes in a one-vs-all approach. Rows represent the actual classes (true labels), and columns represent the predicted classes. For a class $C$,
1. True Positive (TP): The count in the diagonal cell corresponding to class $C$ ($\text{matrix} [C][C]$).
2. False Positive (FP): The sum of the column for class $C$, excluding the diagonal ($\sum(\text{matrix} [:, C]) - \text{matrix} [C][C]$).
3. False Negative (FN): The sum of the row for class $C$, excluding the diagonal ($\sum(\text{matrix} [C, :]) - \text{matrix} [C][C]$).
4. True Negative (TN): All other cells not in the row or column for class $C$ ($\text{total} - (FP + FN + TP)$).

|                  | Predicted Class 0 | Predicted Class 1 | Predicted Class 2 |
| ---------------- | ----------------- | ----------------- | ----------------- |
| **True Class 0** | 5                 | 2                 | 0                 |
| **True Class 1** | 1                 | 6                 | 1                 |
| **True Class 2** | 0                 | 2                 | 7                 |


For Class 0:
- TP = 5 (diagonal element for Class 0)
- FP = 1 (sum of column 0 minus TP: 1 + 0)
- FN = 2 (sum of row 0 minus TP: 2 + 0)
- TN = 6 + 1 + 2 + 7 = 16 (all other cells not in row 0 or column 0)

For Class 1:
- TP = 6 (diagonal element for Class 1)
- FP = 4 (sum of column 1 minus TP: 2 + 2)
- FN = 2 (sum of row 1 minus TP: 1 + 1)
- TN = 5 + 0 + 0 + 7 = 12 (all other cells not in row 1 or column 1)
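
The same one-vs-all bookkeeping can be sketched in a few lines of plain Python, applied to the example matrix above. The helper name `one_vs_rest_counts` is hypothetical.

```python
# Example matrix from the table above: rows are true classes,
# columns are predicted classes.
matrix = [
    [5, 2, 0],
    [1, 6, 1],
    [0, 2, 7],
]


def one_vs_rest_counts(matrix, c):
    """Return TP, FP, FN and TN for class index c."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[c][c]
    fp = sum(row[c] for row in matrix) - tp  # column c minus the diagonal
    fn = sum(matrix[c]) - tp                 # row c minus the diagonal
    tn = total - (tp + fp + fn)
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


for c in range(len(matrix)):
    print(f"Class {c}: {one_vs_rest_counts(matrix, c)}")
```

The results for Class 0 and Class 1 match the hand calculation above; for Class 2 the same rules give TP = 7, FP = 1, FN = 2 and TN = 14.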

In [13]:
def confusion_matrix(y_true: pd.Series, y_pred: List[str],
                     class_names: List[str] = None) -> Tuple[NDArray[np.int64], List[str]]:
    """
    Calculate the confusion matrix.

    Args:
        y_true (pd.Series): True labels.
        y_pred (List[str]): Predicted labels.
        class_names (List[str], optional): List of class names. Defaults to None.

    Returns:
        Tuple: 
        - NDArray[np.int64]: Confusion matrix.
        - List[str]: List of class names.
    """
    # Encode labels as integers
    unique_classes = np.unique(y_true)
    if class_names is None:
        class_names = [str(cls) for cls in unique_classes]
    class_to_index = {cls: i for i, cls in enumerate(unique_classes)}

    n_classes = len(unique_classes)
    matrix = np.zeros((n_classes, n_classes), dtype=int)

    for true, pred in zip(y_true, y_pred):
        true_idx = class_to_index[true]
        pred_idx = class_to_index[pred]
        matrix[true_idx][pred_idx] += 1

    return matrix, class_names

### Accuracy
Accuracy is one of the most common evaluation metrics for classification problems. It is the proportion of correct predictions out of all predictions, giving a simple measure of how often the classifier is right across all classes.

\begin{align*}
\text{Accuracy} = \dfrac{\text{True Positives (TP)} + \text{True Negatives (TN)}}{\text{Total Samples}}
\end{align*}

In [14]:
def accuracy(y_true: pd.Series, y_pred: List[str]) -> float:
    """
    Calculate the accuracy of predictions by comparing true and predicted labels.

    Args:
        y_true (pd.Series): Ground truth target values. Contains the actual class labels for each sample.
        y_pred (List[str]): Estimated target as returned by a classifier. Contains the predicted class labels for each sample.

    Returns:
        float: Classification accuracy as a fraction (0.0 to 1.0).
    """
    return np.mean(y_true == y_pred)

### Precision
Precision measures the proportion of true positive predictions out of all positive predictions made by the classifier.

\begin{align*}
\text{Precision} = \dfrac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Positives (FP)}}
\end{align*}

In [15]:
def precision(y_true: pd.Series, y_pred: List[str]) -> NDArray[np.float64]:
    """
    Calculate precision for each class.

    Args:
        y_true (pd.Series): True labels.
        y_pred (List[str]): Predicted labels.

    Returns:
        NDArray[np.float64]: Precision values for each class.
    """
    cm, _ = confusion_matrix(y_true, y_pred)
    return np.diag(cm) / (np.sum(cm, axis=0) + 1e-7)

### Recall
Recall measures the proportion of true positive predictions out of all actual positive cases.

\begin{align*}
\text{Recall} = \dfrac{\text{True Positives (TP)} }{\text{True Positives (TP)} + \text{False Negatives (FN)}}
\end{align*}

In [16]:
def recall(y_true: pd.Series, y_pred: List[str]) -> NDArray[np.float64]:
    """
    Calculate recall for each class.

    Args:
        y_true (pd.Series): True labels.
        y_pred (List[str]): Predicted labels.

    Returns:
        NDArray[np.float64]: Recall values for each class.
    """
    cm, _ = confusion_matrix(y_true, y_pred)
    return np.diag(cm) / (np.sum(cm, axis=1) + 1e-7)

### F1-Score
The F1-Score is the harmonic mean of precision and recall.

\begin{align*}
\text{F1-Score} = 2 \times \dfrac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align*}

In [17]:
def f1_score(y_true: pd.Series, y_pred: List[str]) -> NDArray[np.float64]:
    """
    Calculate F1-score for each class.

    Args:
        y_true (pd.Series): True labels.
        y_pred (List[str]): Predicted labels.

    Returns:
        NDArray[np.float64]: F1-scores for each class.
    """
    prec = precision(y_true, y_pred)
    rec = recall(y_true, y_pred)
    return 2 * (prec * rec) / (prec + rec + 1e-7)

In [18]:
def evaluate(y_true: pd.Series, y_pred: List[str],
             class_names: List[str] = None) -> Tuple[float, float, float, float, NDArray[np.int64]]:
    """
    Calculate evaluation metrics including accuracy, precision, recall, and F1-score for each class.

    Args:
        y_true (pd.Series): True labels.
        y_pred (List[str]): Predicted labels.
        class_names (List[str], optional): List of class names. Defaults to None.

    Returns:
        Tuple:
        - float: Overall accuracy.
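        - float: Macro-averaged precision (unweighted mean over classes).
        - float: Macro-averaged recall (unweighted mean over classes).
        - float: Macro-averaged F1-score (unweighted mean over classes).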
        - NDArray[np.int64]: Confusion matrix.
    """
    cm, class_names = confusion_matrix(y_true, y_pred, class_names)
    acc = accuracy(y_true, y_pred)
    prec = precision(y_true, y_pred)
    rec = recall(y_true, y_pred)
    f1 = f1_score(y_true, y_pred)
    # print("Class\tPrecision\tRecall\tF1-Score")
    # for i, class_name in enumerate(class_names):
    #     print(f"{class_name}\t{prec[i]:.4f}\t\t{rec[i]:.4f}\t{f1[i]:.4f}")
    return acc, np.mean(prec), np.mean(rec), np.mean(f1), cm

## 8. Encapsulation

In [19]:
class CustomCategoricalNB:
    """
    Categorical Naive Bayes classifier for discrete features.

    This implementation handles categorical features directly without requiring label encoding.
    Uses Laplace smoothing to avoid zero probabilities; feature values never seen
    during training fall back to a small constant probability at prediction time.

    Attributes:
        alpha (float): Smoothing parameter (default=1.0).
        priors_ (Dict[str, float]): Class prior probabilities.
        likelihoods_ (Dict[str, Dict[str, Dict[str, float]]]): Feature likelihood probabilities.
        classes_ (NDArray[np.str_]): Unique class labels.
        feature_names_ (List[str]): Feature names from training data.
    """

    def __init__(self, alpha: float = 1.0) -> None:
        """
        Initialise the Categorical Naive Bayes classifier.

        Args:
            alpha (float): Smoothing parameter for Laplace smoothing (default=1.0).
        """
        self.alpha = alpha
        self.priors_ = None
        self.likelihoods_ = None
        self.classes_ = None
        self.feature_names_ = None

    def fit(self, X: pd.DataFrame, y: pd.Series) -> None:
        """
        Fit the model to the training data.

        Args:
            X (pd.DataFrame): Training data (categorical).
            y (pd.Series): Target values (class labels).

        Computes:
            - Class prior probabilities (priors_).
            - Feature likelihood probabilities (likelihoods_).
        """
        self.classes_ = np.unique(y)
        self.feature_names_ = X.columns.to_list()
        self.priors_ = self._calculate_priors(y)
        self.likelihoods_ = self._calculate_likelihoods(X, y)

    def predict(self, X: pd.DataFrame) -> List[str]:
        """
        Predict class labels using Categorical Naive Bayes.

        Args:
            X (pd.DataFrame): Feature matrix.

        Returns:
            List[str]: Predicted class labels.

        Raises:
            ValueError: If model hasn't been fitted.
        """
        if self.priors_ is None or self.likelihoods_ is None:
            raise ValueError('Model not fitted. Call .fit() first.')

        predictions = []
        for row in X.itertuples(index=False):
            log_posteriors = self._calculate_posteriors(row._asdict())
            predictions.append(max(log_posteriors, key=log_posteriors.get))
        return predictions

    def _calculate_priors(self, y: pd.Series) -> Dict[str, float]:
        """
        Calculate prior probabilities for each class in the target variable.

        Args:
            y (pd.Series): Target variable containing class labels (strings).

        Returns:
            Dict[str, float]: Prior probabilities for each class.
        """
        return y.value_counts(normalize=True).to_dict()

    def _calculate_likelihoods(self, X: pd.DataFrame, y: pd.Series) -> Dict[str, Dict[str, Dict[str, float]]]:
        """
        Calculate conditional probabilities for feature values given each class.

        Args:
            X (pd.DataFrame): Feature matrix (DataFrame with categorical columns)
            y (pd.Series): Target variable (Series of class labels)

        Returns:
            Nested dictionary with structure:
            {feature_name: {class_label: {feature_value: probability}}}
        """
        likelihoods = {}
        for feature in self.feature_names_:  # For each column of X
            likelihoods[feature] = {}

            # Unique feature values in each column
            unique_features = X[feature].unique()

            for c in self.classes_:  # Unique target values of y
                class_subset = X[y == c]
                total = len(class_subset)  # Count(C)

                # Count frequencies (e.g., {'Sunny':3, 'Rain':2} for class 'No')
                value_counts = class_subset[feature].value_counts()

                # All feature values are included, even if missing in subset
                value_counts = value_counts.reindex(
                    unique_features, fill_value=0)

                # Laplace smoothing: alpha is added once per distinct feature value
                probas = round((value_counts + self.alpha) / \
                    (total + len(unique_features) * self.alpha), 4)

                likelihoods[feature][c] = probas.to_dict()

        return likelihoods

    def _calculate_posteriors(self, x: Dict[str, str]) -> Dict[str, float]:
        """
        Calculate log-posterior probabilities for all classes given a sample.

        Args:
            x (Dict[str, str]): Input sample as dictionary {feature: value}.

        Returns:
            Dict[str, float]: Dictionary mapping each class to its log-posterior probability.
        """
        log_posteriors = {}
        for c in self.classes_:
            log_proba = np.log(self.priors_[c])  # Log of prior
            for feature in self.feature_names_:  # Add the log-likelihood of each feature value given c
                category = x[feature]
                # Fall back to a tiny probability to avoid log(0) for unseen values
                proba = self.likelihoods_[feature][c].get(category, 1e-9)
                log_proba += np.log(proba)
            log_posteriors[c] = round(log_proba, 4)
        return log_posteriors  # log-posterior probabilities for all classes

In [20]:
model = CustomCategoricalNB()
model.fit(X, y)

predictions = model.predict(X)
acc, prec, rec, f1, cm = evaluate(y, predictions)
print(f"Accuracy: {acc:.4f}")
print(f"Precision: {prec:.4f}")
print(f"Recall: {rec:.4f}")
print(f"F1-Score: {f1:.4f}")
print(f"Confusion Matrix:\n{cm}")

Accuracy: 0.9286
Precision: 0.9500
Recall: 0.9000
F1-Score: 0.9181
Confusion Matrix:
[[4 1]
 [0 9]]
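
The reported precision, recall and F1-score are unweighted (macro) averages over the two classes. As a cross-check, the sketch below recomputes them by hand from the confusion matrix printed above; the variable names are chosen here so they do not shadow the functions defined earlier.

```python
# Confusion matrix from the output above: rows are true classes
# ('No', 'Yes'), columns are predicted classes.
cm = [[4, 1],
      [0, 9]]
n = len(cm)

# Precision divides the diagonal by column sums, recall by row sums.
per_class_precision = [cm[i][i] / sum(cm[r][i] for r in range(n))
                       for i in range(n)]
per_class_recall = [cm[i][i] / sum(cm[i]) for i in range(n)]
per_class_f1 = [2 * p * r / (p + r)
                for p, r in zip(per_class_precision, per_class_recall)]

# Accuracy is the diagonal total over the grand total.
overall_accuracy = sum(cm[i][i] for i in range(n)) / sum(map(sum, cm))


def macro(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


print(f"Accuracy: {overall_accuracy:.4f}")
print(f"Precision: {macro(per_class_precision):.4f}")
print(f"Recall: {macro(per_class_recall):.4f}")
print(f"F1-Score: {macro(per_class_f1):.4f}")
```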


## 9. Comparison with Scikit-Learn
Scikit-Learn's `CategoricalNB` expects integer-encoded categorical features, so every feature column must be converted to integers first. Here `LabelEncoder` is applied to each column, including the target (scikit-learn classifiers also accept string targets, so encoding `Play` is optional).

In [21]:
from sklearn.naive_bayes import CategoricalNB
from sklearn.metrics import classification_report
from sklearn.preprocessing import LabelEncoder

# Encode categorical features and target
label_encoders = {}
for column in df.columns:
    le = LabelEncoder()
    df[column] = le.fit_transform(df[column])
    label_encoders[column] = le

# Separate features and target
X = df.drop('Play', axis=1)
y = df['Play']

X.head()

Unnamed: 0,Outlook,Temperature,Humidity,Windy
0,2,1,0,1
1,2,1,0,0
2,0,1,0,1
3,1,2,0,1
4,1,0,1,1
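
`LabelEncoder` assigns integer codes in sorted order of the distinct labels, which is why `Sunny` becomes 2 and `Overcast` becomes 0 in the table above. A plain-Python sketch of that mapping follows; the helper `label_encode` is hypothetical and only mimics the behaviour, it is not scikit-learn's implementation.

```python
def label_encode(values):
    """Map each distinct value to its index in sorted order."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


# First five Outlook values from the dataset shown earlier.
outlook = ["Sunny", "Sunny", "Overcast", "Rain", "Rain"]
codes, mapping = label_encode(outlook)

print(mapping)  # {'Overcast': 0, 'Rain': 1, 'Sunny': 2}
print(codes)    # [2, 2, 0, 1, 1]
```

The codes are arbitrary identifiers: their numeric order carries no meaning for a categorical model.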


In [22]:
model = CategoricalNB()
model.fit(X, y)

predictions = model.predict(X)
acc = model.score(X, y)  # Named acc to avoid shadowing the accuracy() function above
print(f'Predictions: {predictions}')
print(f'Accuracy: {acc:.4f}')
print(f'Classification report:\n{classification_report(y, predictions)}')

Predictions: [0 0 1 1 1 1 1 0 1 1 1 1 1 0]
Accuracy: 0.9286
Classification report:
              precision    recall  f1-score   support

           0       1.00      0.80      0.89         5
           1       0.90      1.00      0.95         9

    accuracy                           0.93        14
   macro avg       0.95      0.90      0.92        14
weighted avg       0.94      0.93      0.93        14

