# Bernoulli Naive Bayes Classifier from Scratch
***
## Table of Contents
1. [Introduction](#1-introduction)
    - [Bayes' Theorem](#bayes-theorem)
2. [Loading Data](#2-loading-data)
3. [Prior Probability](#3-prior-probability)
4. [Likelihood for Bernoulli NB](#4-likelihood-for-bernoulli-nb)
5. [Posterior Probability for Bernoulli NB](#5-posterior-probability-for-bernoulli-nb)
6. [Prediction](#6-prediction)
7. [Evaluation Metrics](#7-evaluation-metrics)
    - [Binary Confusion Matrix](#binary-confusion-matrix)
    - [Multi-Class Confusion Matrix](#multi-class-confusion-matrix)
    - [Accuracy](#accuracy)
    - [Precision](#precision)
    - [Recall](#recall)
    - [F1-Score](#f1-score)
8. [Encapsulation](#8-encapsulation)
9. [Comparison with Scikit-Learn](#9-comparison-with-scikit-learn)
***

In [1]:
import numpy as np
import pandas as pd
from typing import Tuple, List, Dict
from numpy.typing import NDArray

## 1. Introduction
Naive Bayes classifiers are probabilistic classification models based on Bayes' Theorem that assume the features are conditionally independent given the class label. Naive Bayes is a general framework; the specific variant should be chosen based on the nature of your data:

- **Categorical Naive Bayes**

    - **Features**: Categorical labels (e.g., colours, countries, product types).

    - **Use Case**: Classification with discrete, categorically distributed features.

- **Multinomial Naive Bayes**

    - **Features**: Counts or frequencies (e.g., word occurrences, event counts).

    - **Use Case**: Text classification, document classification, or any scenario where features are discrete counts.

- **Gaussian Naive Bayes**

    - **Features**: Continuous data (e.g., measurements, sensor readings).

    - **Use Case**: Classification with numerical features assumed to follow a Gaussian distribution.

- **Bernoulli Naive Bayes**

    - **Features**: Binary features (e.g., True/False, 0/1).

    - **Use Case**: Text classification (presence/absence of words), binary feature spaces.



### Bayes' Theorem
Bayes' theorem describes the probability of a class $C_{i}$ given a set of features $X = (x_{1}, x_{2},\ldots,x_{N})$:

\begin{align*}
P(C_{i}|X) = \dfrac{P(X|C_{i}) \cdot P(C_{i})}{P(X)}
\end{align*}

where:
- $P(C_{i}|X)$: Posterior probability of class $C_{i}$ given features $X$.
- $P(X|C_{i})$: Likelihood of features $X$ given class $C_{i}$.
- $P(C_{i})$: Prior probability of class $C_{i}$.
- $P(X)$: Probability of features $X$ (acts as a normalising constant).

Like every Naive Bayes variant, Bernoulli Naive Bayes assumes the features $X = (x_{1}, x_{2},\ldots,x_{N})$ are conditionally independent given the class $C_{i}$, so the likelihood factorises as:

\begin{align*}
P(X|C_{i}) = P(x_{1}, x_{2}, \dots, x_{N}|C_{i}) = \prod_{j=1}^{N}P(x_{j}|C_{i})
\end{align*}

Replacing $P(X|C_{i})$ in Bayes' theorem, the equation becomes:

\begin{align*}
P(C_{i}|X) = \dfrac{P(C_{i}) \cdot \prod_{j=1}^{N} P(x_{j}|C_{i})}{P(X)}
\end{align*}

Since $P(X)$ is constant for all classes,

\begin{align*}
P(C_{i}|X) \propto P(C_{i}) \cdot \prod_{j=1}^{N} P(x_{j}|C_{i})
\end{align*}

The symbol $\propto$ denotes proportionality, meaning we ignore the denominator $P(X)$ when comparing probabilities across classes.
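
A small numeric sketch of why the denominator can be dropped. The prior and likelihood values below are made up for illustration and are not taken from the dataset; the point is only that dividing every score by the same $P(X)$ leaves the winning class unchanged.

```python
# Hypothetical values, chosen only for illustration.
priors = {0: 0.4, 1: 0.6}
likelihoods = {0: 0.02, 1: 0.005}  # assumed values of prod_j P(x_j | C_i)

# Unnormalised scores P(C_i) * prod_j P(x_j | C_i)
unnormalised = {c: priors[c] * likelihoods[c] for c in priors}

# P(X) is the sum of the unnormalised scores over all classes
evidence = sum(unnormalised.values())
posteriors = {c: score / evidence for c, score in unnormalised.items()}

best_unnormalised = max(unnormalised, key=unnormalised.get)
best_posterior = max(posteriors, key=posteriors.get)

print(unnormalised)
print(posteriors)
print(best_unnormalised == best_posterior)  # True
```

Normalising rescales the scores so they sum to 1, but the ranking of the classes is identical.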

## 2. Loading Data
The dataset is retrieved from [Kaggle - Simple Weather Forecast](https://www.kaggle.com/datasets/dheemanthbhat/simple-weather-forecast?select=weather_forecast.csv).

In [2]:
df = pd.read_csv('../../_datasets/weather_forecast.csv')
df.head()

Unnamed: 0,Outlook,Temperature,Humidity,Windy,Play
0,Sunny,Hot,High,Weak,No
1,Sunny,Hot,High,Strong,No
2,Overcast,Hot,High,Weak,Yes
3,Rain,Mild,High,Weak,Yes
4,Rain,Cool,Normal,Weak,Yes


In [3]:
X = df.drop('Play', axis=1)
y = df['Play']

Bernoulli Naive Bayes requires features to have binary values, so we need to one-hot encode all categorical features.

In [4]:
X_binary = pd.get_dummies(X, drop_first=False)
X_binary.head()

Unnamed: 0,Outlook_Overcast,Outlook_Rain,Outlook_Sunny,Temperature_Cool,Temperature_Hot,Temperature_Mild,Humidity_High,Humidity_Normal,Windy_Strong,Windy_Weak
0,False,False,True,False,True,False,True,False,False,True
1,False,False,True,False,True,False,True,False,True,False
2,True,False,False,False,True,False,True,False,False,True
3,False,True,False,False,False,True,True,False,False,True
4,False,True,False,True,False,False,False,True,False,True
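
To make the encoding concrete, here is a toy sketch on a single made-up column (the frame `toy` is hypothetical, not part of the dataset). Passing `dtype=int` to `pd.get_dummies` yields 0/1 integers instead of booleans; both forms work with the code in this notebook.

```python
import pandas as pd

# Hypothetical single-feature frame, used only to illustrate one-hot encoding.
toy = pd.DataFrame({'Windy': ['Weak', 'Strong', 'Weak']})

# One binary column per category; dtype=int gives 0/1 rather than False/True
encoded = pd.get_dummies(toy, dtype=int)

print(encoded.columns.tolist())  # ['Windy_Strong', 'Windy_Weak']
print(encoded.values.tolist())   # [[0, 1], [1, 0], [0, 1]]
```

Each row has exactly one `1` among the columns derived from the same original feature, which is why the encoded columns of one feature are not truly independent of each other.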


We will also convert `y` into a binary format.

In [5]:
y_binary = pd.Series(np.where(y == 'Yes', 1, 0), index=y.index, name='Play')
y_binary

0     0
1     0
2     1
3     1
4     1
5     0
6     1
7     0
8     1
9     1
10    1
11    1
12    1
13    0
Name: Play, dtype: int64

## 3. Prior Probability
The class variable $C_{i}$ (`y_binary`) takes two values, `1` and `0` (`1` = 'Yes', `0` = 'No' in `y`), so the priors are:

\begin{align*}
P(C_{i}=1) = \dfrac{\text{Count(1)}}{\text{Total Count}}
\end{align*}

\begin{align*}
P(C_{i}=0) = \dfrac{\text{Count(0)}}{\text{Total Count}}
\end{align*}

In [6]:
print(f'Total count: {len(df)}')
print(f'Counts: {y_binary.value_counts().to_dict()}')

Total count: 14
Counts: {1: 9, 0: 5}


\begin{align*}
P(1) = \dfrac{9}{14} = 0.6429
\end{align*}

\begin{align*}
P(0) = \dfrac{5}{14} = 0.3571
\end{align*}

In [7]:
def calculate_priors(y: pd.Series) -> Dict[int, float]:
    """
    Calculate prior probabilities for each class in the target variable.

    Args:
        y: Target variable containing class labels (0/1).

    Returns:
        Prior probabilities for each class.
    """
    return y.value_counts(normalize=True).to_dict()

In [8]:
calculate_priors(y_binary)

{1: 0.6428571428571429, 0: 0.35714285714285715}

## 4. Likelihood for Bernoulli NB
For a feature vector $X = (x_{1}, x_{2},\ldots,x_{N})$ and class $C_{i}$, the likelihood is expressed as:

\begin{align*}
P(X|C_{i}) = P(x_{1}, x_{2}, \dots, x_{N}|C_{i}) = \prod_{j=1}^{N}P(x_{j}|C_{i})
\end{align*}

where each $P(x_{j}|C_{i})$ follows a **Bernoulli distribution**:

\begin{align*}
P(x_{j}|C_{i}) =
  \begin{cases}
    p_{ij}     & \text{if $x_{j} = 1$} \\
    1 - p_{ij} & \text{if $x_{j} = 0$}
  \end{cases}
\end{align*}

Here, $p_{ij}$ is the probability that feature $j$ is $1$ in class $C_{i}$:

\begin{align*}
p_{ij} = \dfrac{\text{Count}(x_{j} = 1 \mid C_{i}) + \alpha}{\text{Count}(C_{i}) + 2 \alpha}
\end{align*}

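where $\alpha$ is the Laplace smoothing parameter that prevents zero probabilities (default $\alpha = 1$). The $2\alpha$ in the denominator accounts for the two possible values (0 and 1) of each binary feature.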
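
As a quick worked example of the smoothing formula, the sketch below uses a hypothetical helper `smoothed_bernoulli_prob` and assumed counts: a feature that is never 1 among the 5 samples of a class. Without smoothing the estimate is exactly zero, which would wipe out the whole product of likelihoods.

```python
def smoothed_bernoulli_prob(count_1: int, n_class: int, alpha: float = 1.0) -> float:
    """Smoothed estimate of P(x_j = 1 | C_i) for a binary feature."""
    return (count_1 + alpha) / (n_class + 2 * alpha)

# Assumed counts: the feature equals 1 in 0 of the 5 samples of the class
unsmoothed = smoothed_bernoulli_prob(0, 5, alpha=0.0)
smoothed = smoothed_bernoulli_prob(0, 5, alpha=1.0)

print(unsmoothed)          # 0.0
print(round(smoothed, 4))  # 0.1429
```

With $\alpha = 1$ the estimate becomes $1/7$, and the complementary estimate for $x_{j} = 0$ is $6/7$, so the two still sum to 1.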

In [9]:
def calculate_likelihoods(X: pd.DataFrame, y: pd.Series,
                          alpha: float = 1.0) -> Dict[str, Dict[int, Dict[int, float]]]:
    """
    Calculate conditional probabilities for Bernoulli Naive Bayes.

    Args:
        X: Binary feature matrix (DataFrame with 0/1 values)
        y: Target variable (Series of class labels in 0/1)
        alpha: Smoothing parameter for Laplace smoothing (default=1.0)

    Returns:
        Nested dictionary with structure:
        {feature_name: {class_label: {feature_value: probability}}}
    """
    likelihoods = {}

    for feature in X.columns:
        likelihoods[feature] = {}

        for class_label in y.unique():
            c = int(class_label)
            class_mask = (y == c)
            class_subset = X.loc[class_mask, feature]
            total_in_class = class_mask.sum()  # Number of samples in class

            # Count occurrences of 1s (0s will be total - count_1)
            count_1 = class_subset.sum()
            count_0 = total_in_class - count_1

            # Apply Laplace smoothing for binary features
            # Denominator: total_in_class + 2 * alpha (for two possible values)
            prob_1 = (count_1 + alpha) / (total_in_class + 2 * alpha)
            prob_0 = (count_0 + alpha) / (total_in_class + 2 * alpha)

            # Store probabilities for both values
            likelihoods[feature][c] = {
                0: round(float(prob_0), 4),
                1: round(float(prob_1), 4)
            }

    return likelihoods

In [10]:
calculate_likelihoods(X_binary, y_binary)

{'Outlook_Overcast': {0: {0: 0.8571, 1: 0.1429}, 1: {0: 0.5455, 1: 0.4545}},
 'Outlook_Rain': {0: {0: 0.5714, 1: 0.4286}, 1: {0: 0.6364, 1: 0.3636}},
 'Outlook_Sunny': {0: {0: 0.4286, 1: 0.5714}, 1: {0: 0.7273, 1: 0.2727}},
 'Temperature_Cool': {0: {0: 0.7143, 1: 0.2857}, 1: {0: 0.6364, 1: 0.3636}},
 'Temperature_Hot': {0: {0: 0.5714, 1: 0.4286}, 1: {0: 0.7273, 1: 0.2727}},
 'Temperature_Mild': {0: {0: 0.5714, 1: 0.4286}, 1: {0: 0.5455, 1: 0.4545}},
 'Humidity_High': {0: {0: 0.2857, 1: 0.7143}, 1: {0: 0.6364, 1: 0.3636}},
 'Humidity_Normal': {0: {0: 0.7143, 1: 0.2857}, 1: {0: 0.3636, 1: 0.6364}},
 'Windy_Strong': {0: {0: 0.4286, 1: 0.5714}, 1: {0: 0.6364, 1: 0.3636}},
 'Windy_Weak': {0: {0: 0.5714, 1: 0.4286}, 1: {0: 0.3636, 1: 0.6364}}}

## 5. Posterior Probability for Bernoulli NB
As discussed [above](#1-introduction), the posterior probability satisfies:


\begin{align*}
P(C_{i}|X) \propto P(C_{i}) \prod_{j=1}^{N} P(x_{j}|C_{i})
\end{align*}

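Each feature follows a Bernoulli distribution, so $P(x_{j}|C_{i}) = p_{ij}^{x_{j}} (1-p_{ij})^{1-x_{j}}$. Because multiplying many probabilities smaller than 1 can underflow to zero in floating point, we work with log probabilities, which turn the product into a sum. The class-independent term $\log P(X)$ does not affect which class scores highest, so the code ignores it: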


\begin{align*}
\log P(C_{i}|X) = \log P(C_{i}) + \sum_{j=1}^{N} \left[ x_{j} \log p_{ij} + (1-x_{j}) \log (1-p_{ij}) \right] - \log P(X)
\end{align*}
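
The numerical motivation for working in log space can be sketched as follows. The 2000 features with likelihood 0.1 each are an assumption chosen to trigger the problem; the dataset in this notebook is far too small to underflow.

```python
import math

# Hypothetical likelihoods: 2000 features, each with P(x_j | C_i) = 0.1
probs = [0.1] * 2000

product = 1.0
for p in probs:
    product *= p  # shrinks towards zero and eventually underflows

# Summing logs keeps the value in a comfortable floating-point range
log_sum = sum(math.log(p) for p in probs)

print(product)  # 0.0
print(log_sum)
```

The direct product collapses to `0.0`, so every class would tie, whereas the sum of logs remains a finite negative number that can still be compared across classes.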

In the following code, the `.get(..., 1e-9)` call looks up the probability of the observed feature value in the likelihood dictionary for the given feature and class. If that value is missing from the dictionary, it returns a very small default probability (`1e-9`) instead of raising a `KeyError`, which also avoids taking $\log 0$. Because both values 0 and 1 are always stored, the fallback only applies to unexpected (non-binary) inputs.

In [11]:
def calculate_posterior(x: Dict[str, int], priors: Dict[int, float], likelihoods: Dict[str, Dict[int, Dict[int, float]]],
                        X_columns: List[str], classes: List[int]) -> Dict[int, float]:
    """
    Calculate log-posterior probabilities for all classes given a sample.

    Args:
        x: Input sample as dictionary {feature: value}.
        priors: Prior probabilities from calculate_priors().
        likelihoods: Conditional probabilities from calculate_likelihoods().
        X_columns: List of feature names.
        classes: List of possible class labels (0/1).

    Returns:
        Dictionary mapping each class to its log-posterior probability.
    """

    log_posteriors = {}
    for c in classes:
        log_proba = np.log(priors[c])  # Log of prior
        for feature in X_columns:  # Add the log-likelihood of each feature value given c
            value = x[feature]
            # Fall back to a tiny probability for unseen values to avoid log(0)
            proba = likelihoods[feature][c].get(value, 1e-9)
            log_proba += np.log(proba)
        log_posteriors[int(c)] = round(float(log_proba), 4)
    return log_posteriors  # log-posterior probabilities for all classes

In [12]:
calculate_posterior(X_binary.iloc[0], calculate_priors(
    y_binary), calculate_likelihoods(X_binary, y_binary), X_binary.columns, y_binary.unique())

{0: -6.4139, 1: -8.0838}
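
These values are unnormalised log-posteriors, so they do not exponentiate to probabilities that sum to 1. If normalised probabilities are wanted, one common option (not part of the original implementation) is the log-sum-exp trick, sketched here with the two values copied from the output above.

```python
import math

# Unnormalised log-posteriors for the first sample, copied from the output above
log_posteriors = {0: -6.4139, 1: -8.0838}

# Log-sum-exp: subtract the maximum before exponentiating for numerical stability
max_log = max(log_posteriors.values())
log_evidence = max_log + math.log(
    sum(math.exp(v - max_log) for v in log_posteriors.values())
)

# Normalised posterior probabilities
probabilities = {c: math.exp(v - log_evidence) for c, v in log_posteriors.items()}
print(probabilities)
```

For this sample the result is roughly 0.84 for class `0` and 0.16 for class `1`, which agrees with the class ranking given by the log-posteriors.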

## 6. Prediction
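The decision rule implemented in the next cell picks the class with the largest log-posterior. In the notation of the previous section it can be written as follows, where $\hat{y}$ is a symbol introduced here for the predicted label:

\begin{align*}
\hat{y} = \underset{C_{i}}{\arg\max} \left[ \log P(C_{i}) + \sum_{j=1}^{N} \left( x_{j} \log p_{ij} + (1-x_{j}) \log (1-p_{ij}) \right) \right]
\end{align*}

Note that `predict` estimates the priors and likelihoods from the same `X` and `y` it then classifies, so its output reflects predictions on the training set.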


In [13]:
def predict(X: pd.DataFrame, y: pd.Series) -> List[int]:
    """
    Predict class labels using Bernoulli Naive Bayes.

    Args:
        X: Feature matrix.
        y: Target variable.

    Returns:
        Predicted class labels.
    """
    priors = calculate_priors(y)
    likelihoods = calculate_likelihoods(X, y)
    classes = y.unique()
    X_columns = X.columns

    predictions = []
    for row in X.itertuples(index=False):
        posterior = calculate_posterior(
            row._asdict(), priors, likelihoods, X_columns, classes)
        predictions.append(max(posterior, key=posterior.get))
    return predictions

In [14]:
predict(X_binary, y_binary)[:10]

[0, 0, 1, 1, 1, 1, 1, 0, 1, 1]

## 7. Evaluation Metrics
### Binary Confusion Matrix
In a confusion matrix, the terms True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) describe the classification performance for binary classification. 

|                     | Predicted Negative  | Predicted Positive  |
| ------------------- | ------------------- | ------------------- |
| **Actual Negative** | True Negative (TN)  | False Positive (FP) |
| **Actual Positive** | False Negative (FN) | True Positive (TP)  |


1. True Positive (TP): The number of instances correctly predicted as positive (e.g., a disease correctly identified).

2. True Negative (TN): The number of instances correctly predicted as negative (e.g., no disease correctly identified).

3. False Positive (FP): The number of instances incorrectly predicted as positive (e.g., predicting disease when there isn't any).

4. False Negative (FN): The number of instances incorrectly predicted as negative (e.g., missing a disease when it exists).

### Multi-Class Confusion Matrix
For multi-class classification, the concepts can be extended by treating one class as the "positive" class and all others as "negative" classes in a one-vs-all approach. Rows represent the actual classes (true labels), and columns represent the predicted classes. For a class $C$,
1. True Positive (TP): The count in the diagonal cell corresponding to class $C$ ($\text{matrix} [C][C]$).
2. False Positive (FP): The sum of the column for class $C$, excluding the diagonal ($\sum(\text{matrix} [:, C]) - \text{matrix} [C][C]$).
3. False Negative (FN): The sum of the row for class $C$, excluding the diagonal ($\sum(\text{matrix} [C, :]) - \text{matrix} [C][C]$).
4. True Negative (TN): All other cells not in the row or column for class $C$ ($\text{total} - (FP + FN + TP)$).

|                  | Predicted Class 0 | Predicted Class 1 | Predicted Class 2 |
| ---------------- | ----------------- | ----------------- | ----------------- |
| **True Class 0** | 5                 | 2                 | 0                 |
| **True Class 1** | 1                 | 6                 | 1                 |
| **True Class 2** | 0                 | 2                 | 7                 |


For Class 0:
- TP = 5 (diagonal element for Class 0)
- FP = 1 (sum of column 0 minus TP: 1 + 0)
- FN = 2 (sum of row 0 minus TP: 2 + 0)
- TN = 6 + 1 + 2 + 7 = 16 (all other cells not in row 0 or column 0)

For Class 1:
- TP = 6 (diagonal element for Class 1)
- FP = 4 (sum of column 1 minus TP: 2 + 2)
- FN = 2 (sum of row 1 minus TP: 1 + 1)
- TN = 5 + 0 + 0 + 7 = 12 (all other cells not in row 1 or column 1)
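
The one-vs-all counts above can be checked for all classes at once. This is a minimal NumPy sketch using the example matrix from the table; the variable names are chosen here for illustration.

```python
import numpy as np

# Example 3-class confusion matrix from the table above
# (rows: true classes, columns: predicted classes)
matrix = np.array([
    [5, 2, 0],
    [1, 6, 1],
    [0, 2, 7],
])

tp = np.diag(matrix)                # diagonal cells
fp = matrix.sum(axis=0) - tp        # column sums minus the diagonal
fn = matrix.sum(axis=1) - tp        # row sums minus the diagonal
tn = matrix.sum() - (tp + fp + fn)  # everything else

for c in range(matrix.shape[0]):
    print(f'Class {c}: TP={tp[c]}, FP={fp[c]}, FN={fn[c]}, TN={tn[c]}')
```

The first two printed lines reproduce the values listed for Class 0 and Class 1; for Class 2 the same rules give TP = 7, FP = 1, FN = 2, TN = 14.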

In [15]:
def confusion_matrix(y_true: pd.Series, y_pred: List[str],
                     class_names: List[str] = None) -> Tuple[NDArray[np.int64], List[str]]:
    """
    Calculate the confusion matrix.

    Args:
        y_true: True labels.
        y_pred: Predicted labels.
        class_names: List of class names. Defaults to None.

    Returns:
        Tuple: 
        - Confusion matrix.
        - List of class names.
    """
    # Encode labels as integers
    unique_classes = np.unique(y_true)
    if class_names is None:
        class_names = [str(cls) for cls in unique_classes]
    class_to_index = {cls: i for i, cls in enumerate(unique_classes)}

    n_classes = len(unique_classes)
    matrix = np.zeros((n_classes, n_classes), dtype=int)

    for true, pred in zip(y_true, y_pred):
        true_idx = class_to_index[true]
        pred_idx = class_to_index[pred]
        matrix[true_idx][pred_idx] += 1

    return matrix, class_names

### Accuracy
Accuracy is the most common evaluation metric for classification problems, representing the percentage of correct predictions out of total predictions. It provides a simple measure of how often the classifier makes correct predictions across all classes.

\begin{align*}
\text{Accuracy} = \dfrac{\text{True Positives (TP)} + \text{True Negatives (TN)}}{\text{Total Samples}}
\end{align*}

In [16]:
def accuracy(y_true: pd.Series, y_pred: List[str]) -> float:
    """
    Calculate the accuracy of predictions by comparing true and predicted labels.

    Args:
        y_true: Ground truth target values. Contains the actual class labels for each sample.
        y_pred: Estimated target as returned by a classifier. Contains the predicted class labels for each sample.
    Returns:
        Classification accuracy (0.0 to 1.0).
    """
    return np.mean(y_true == y_pred)

### Precision
Precision measures the proportion of true positive predictions out of all positive predictions made by the classifier.

\begin{align*}
\text{Precision} = \dfrac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Positives (FP)}}
\end{align*}

In [17]:
def precision(y_true: pd.Series, y_pred: List[str]) -> NDArray[np.float64]:
    """
    Calculate precision for each class.

    Args:
        y_true: True labels.
        y_pred: Predicted labels.

    Returns:
        Precision values for each class.
    """
    cm, _ = confusion_matrix(y_true, y_pred)
    return np.diag(cm) / (np.sum(cm, axis=0) + 1e-7)

### Recall
Recall measures the proportion of true positive predictions out of all actual positive cases.

\begin{align*}
\text{Recall} = \dfrac{\text{True Positives (TP)} }{\text{True Positives (TP)} + \text{False Negatives (FN)}}
\end{align*}

In [18]:
def recall(y_true: pd.Series, y_pred: List[str]) -> NDArray[np.float64]:
    """
    Calculate recall for each class.

    Args:
        y_true: True labels.
        y_pred: Predicted labels.

    Returns:
        Recall values for each class.
    """
    cm, _ = confusion_matrix(y_true, y_pred)
    return np.diag(cm) / (np.sum(cm, axis=1) + 1e-7)

### F1-Score
The F1-Score is the harmonic mean of precision and recall.

\begin{align*}
\text{F1-Score} = 2 \times \dfrac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align*}

In [19]:
def f1_score(y_true: pd.Series, y_pred: List[str]) -> NDArray[np.float64]:
    """
    Calculate F1-score for each class.

    Args:
        y_true: True labels.
        y_pred: Predicted labels.

    Returns:
        F1-scores for each class.
    """
    prec = precision(y_true, y_pred)
    rec = recall(y_true, y_pred)
    return 2 * (prec * rec) / (prec + rec + 1e-7)
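
As a quick check of the three formulas, the sketch below plugs in the counts for Class 0 from the multi-class example (TP = 5, FP = 1, FN = 2). The variable names carry a `_0` suffix so they do not shadow the functions defined in this notebook.

```python
# Counts for Class 0 in the multi-class example: TP = 5, FP = 1, FN = 2
tp, fp, fn = 5, 1, 2

precision_0 = tp / (tp + fp)
recall_0 = tp / (tp + fn)
f1_0 = 2 * precision_0 * recall_0 / (precision_0 + recall_0)

# Equivalent form of the harmonic mean written directly in terms of counts
f1_from_counts = 2 * tp / (2 * tp + fp + fn)

print(round(precision_0, 4), round(recall_0, 4), round(f1_0, 4))  # 0.8333 0.7143 0.7692
```

Both expressions for the F1-score agree, and the result lies between precision and recall, closer to the smaller of the two.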

In [20]:
def evaluate(y_true: pd.Series, y_pred: List[str],
             class_names: List[str] = None) -> Tuple[float, float, float, float, NDArray[np.int64]]:
    """
    Calculate evaluation metrics including accuracy, precision, recall, and F1-score for each class.

    Args:
        y_true: True labels.
        y_pred: Predicted labels.
        class_names: List of class names. Defaults to None.

    Returns:
        Tuple:
        - Overall accuracy.
        - Average precision.
        - Average recall.
        - Average F1-score.
        - Confusion matrix.
    """
    cm, class_names = confusion_matrix(y_true, y_pred, class_names)
    acc = accuracy(y_true, y_pred)
    prec = precision(y_true, y_pred)
    rec = recall(y_true, y_pred)
    f1 = f1_score(y_true, y_pred)
    # print("Class\tPrecision\tRecall\tF1-Score")
    # for i, class_name in enumerate(class_names):
    #     print(f"{class_name}\t{prec[i]:.4f}\t\t{rec[i]:.4f}\t{f1[i]:.4f}")
    return acc, np.mean(prec), np.mean(rec), np.mean(f1), cm

## 8. Encapsulation

In [21]:
class CustomBernoulliNB:
    """
    Bernoulli Naive Bayes classifier for boolean features.

    This implementation expects binary (0/1) features, e.g. after one-hot encoding.
    Uses Laplace smoothing to avoid zero probabilities.

    Attributes:
        alpha (float): Smoothing parameter (default=1.0).
        priors_ (Dict[int, float]): Class prior probabilities.
        likelihoods_ (Dict[str, Dict[int, Dict[int, float]]]): Feature likelihood probabilities.
        classes_ (NDArray[np.int64]): Unique class labels.
        feature_names_ (List[str]): Feature names from training data.
    """

    def __init__(self, alpha: float = 1.0) -> None:
        """
        Initialise Bernoulli Naive Bayes classifier.

        Args:
            alpha: Smoothing parameter for Laplace smoothing (default=1.0).
        """
        self.alpha = alpha
        self.priors_ = None
        self.likelihoods_ = None
        self.classes_ = None
        self.feature_names_ = None

    def fit(self, X: pd.DataFrame, y: pd.Series) -> None:
        """
        Fit the model to the training data.

        Args:
            X: Training data (binary 0/1 features).
            y: Target values (integer class labels).

        Computes:
            - Class prior probabilities (priors_).
            - Feature likelihood probabilities (likelihoods_).
        """
        # Validate binary features
        if not X.isin([0, 1]).all().all():
            raise ValueError("All features must be binary (0/1)")

        self.classes_ = np.unique(y)
        self.feature_names_ = X.columns.to_list()
        self.priors_ = self._calculate_priors(y)
        self.likelihoods_ = self._calculate_likelihoods(X, y)

    def predict(self, X: pd.DataFrame) -> List[int]:
        """
        Predict class labels using Bernoulli Naive Bayes.

        Args:
            X: Feature matrix.

        Returns:
            Predicted class labels.

        Raises:
            ValueError: If model hasn't been fitted.
        """
        if self.priors_ is None or self.likelihoods_ is None:
            raise ValueError('Model not fitted. Call .fit() first.')

        predictions = []
        for row in X.itertuples(index=False):
            log_posteriors = self._calculate_posteriors(row._asdict())
            predictions.append(max(log_posteriors, key=log_posteriors.get))
        return predictions

    def _calculate_priors(self, y: pd.Series) -> Dict[int, float]:
        """
        Calculate prior probabilities for each class in the target variable.

        Args:
            y: Target variable containing class labels (0/1).

        Returns:
            Prior probabilities for each class.
        """
        return y.value_counts(normalize=True).to_dict()

    def _calculate_likelihoods(self, X: pd.DataFrame, y: pd.Series) -> Dict[str, Dict[int, Dict[int, float]]]:
        """
        Calculate conditional probabilities for feature values given each class.

        Args:
            X: Feature matrix (DataFrame with binary columns)
            y: Target variable (Series of binary class labels)

        Returns:
            Nested dictionary with structure:
            {feature_name: {class_label: {feature_value: probability}}}
        """
        likelihoods = {}

        for feature in self.feature_names_:
            likelihoods[feature] = {}

            for class_label in self.classes_:
                c = int(class_label)
                class_mask = (y == c)
                class_subset = X.loc[class_mask, feature]
                total_in_class = class_mask.sum()  # Number of samples in class

                # Count occurrences of 1s (0s will be total - count_1)
                count_1 = class_subset.sum()
                count_0 = total_in_class - count_1

                # Apply Laplace smoothing for binary features
                # Denominator: total_in_class + 2 * alpha (for two possible values)
                prob_1 = (count_1 + self.alpha) / \
                    (total_in_class + 2 * self.alpha)
                prob_0 = (count_0 + self.alpha) / \
                    (total_in_class + 2 * self.alpha)

                # Store probabilities for both values
                likelihoods[feature][c] = {
                    0: round(float(prob_0), 4),
                    1: round(float(prob_1), 4)
                }

        return likelihoods

    def _calculate_posteriors(self, x: Dict[str, int]) -> Dict[int, float]:
        """
        Calculate log-posterior probabilities for all classes given a sample.
        Args:
            x: Input sample as dictionary {feature: value}.

        Returns:
            Dictionary mapping each class to its log-posterior probability.
        """
        log_posteriors = {}
        for c in self.classes_:
            log_proba = np.log(self.priors_[c])  # Log of prior
            for feature in self.feature_names_:  # Add the log-likelihood of each feature value given c
                category = x[feature]
                # Fall back to a tiny probability for unseen values to avoid log(0)
                proba = self.likelihoods_[feature][c].get(category, 1e-9)
                log_proba += np.log(proba)
            log_posteriors[c] = round(log_proba, 4)
        return log_posteriors  # log-posterior probabilities for all classes

In [22]:
model = CustomBernoulliNB()
model.fit(X_binary, y_binary)

y_pred = model.predict(X_binary)
acc, prec, rec, f1, cm = evaluate(y_binary, y_pred)
print(f'Accuracy: {acc:.4f}')
print(f'Precision: {prec:.4f}')
print(f'Recall: {rec:.4f}')
print(f'F1-Score: {f1:.4f}')
print(f'Confusion Matrix:\n{cm}')

Accuracy: 0.9286
Precision: 0.9500
Recall: 0.9000
F1-Score: 0.9181
Confusion Matrix:
[[4 1]
 [0 9]]


## 9. Comparison with Scikit-Learn

In [23]:
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import classification_report

model = BernoulliNB()
model.fit(X_binary, y_binary)

y_pred = model.predict(X_binary)
sk_accuracy = model.score(X_binary, y_binary)  # Named to avoid shadowing the accuracy() function defined earlier
print(f'Predictions: {y_pred}')
print(f'Accuracy: {sk_accuracy:.4f}')
print(f'Classification report:\n{classification_report(y_binary, y_pred)}')

Predictions: [0 0 1 1 1 1 1 0 1 1 1 1 1 0]
Accuracy: 0.9286
Classification report:
              precision    recall  f1-score   support

           0       1.00      0.80      0.89         5
           1       0.90      1.00      0.95         9

    accuracy                           0.93        14
   macro avg       0.95      0.90      0.92        14
weighted avg       0.94      0.93      0.93        14
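
The `macro avg` row corresponds to the unweighted mean used by `evaluate`, while `weighted avg` weights each class by its support. A minimal sketch for precision, using the confusion matrix reported for the custom model (variable names are chosen here for illustration):

```python
import numpy as np

# Confusion matrix reported above (rows: true classes, columns: predicted classes)
cm = np.array([[4, 1],
               [0, 9]])

per_class_precision = np.diag(cm) / cm.sum(axis=0)
support = cm.sum(axis=1)  # number of true samples per class

macro_precision = per_class_precision.mean()
weighted_precision = np.average(per_class_precision, weights=support)

print(round(float(macro_precision), 2))     # 0.95
print(round(float(weighted_precision), 2))  # 0.94
```

Both values match the precision column of the classification report, which confirms that the custom metrics and scikit-learn agree on this dataset.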

