### Strengths
#### MultinomialNB and BernoulliNB
- Both algorithms have a single parameter, `alpha`, that controls model complexity: a larger `alpha` means more smoothing and a less complex model.
- Tuning `alpha` can improve accuracy somewhat, but a good model usually does not depend critically on its value.
- They are widely used for sparse count data, such as text classification.
- MultinomialNB usually performs better than BernoulliNB, particularly when the dataset has many non-zero features.
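
To make the effect of `alpha` concrete, here is a minimal numpy sketch of additive smoothing for binary features. The helper `smoothed_probabilities` and the toy counts are illustrative assumptions, not scikit-learn's API; the formula is the usual additive-smoothing estimate for a feature with two possible values.

```python
import numpy as np

# Toy counts of 1s per feature for one class (assumed example data):
# the class has 2 samples and 4 binary features.
feature_counts = np.array([0, 1, 0, 2])
n_samples_in_class = 2

def smoothed_probabilities(counts, n_samples, alpha):
    """Estimate P(feature = 1 | class) with additive smoothing (hypothetical helper)."""
    # The factor 2 in the denominator reflects the two possible values, 0 and 1.
    return (counts + alpha) / (n_samples + 2 * alpha)

for alpha in (0.01, 1.0, 100.0):
    probs = smoothed_probabilities(feature_counts, n_samples_in_class, alpha)
    print("alpha={:>6}: {}".format(alpha, np.round(probs, 3)))
```

With a small `alpha` the estimates stay close to the raw frequencies; with a large `alpha` they are all pulled toward 0.5, which is what "less complex model" means here.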
#### GaussianNB
- The algorithm is very fast and can be used for large datasets.
### Weaknesses
- Predictions are often less accurate than those of more flexible models, because the algorithms assume that the features are independent given the class, which is not true in most cases.

# Naive Bayes Classifiers

In [1]:
import sys
!{sys.executable} -m pip install numpy scipy matplotlib pandas mglearn


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


There are three kinds of Naive Bayes classifiers implemented in scikit-learn:
- Gaussian Naive Bayes
- Bernoulli Naive Bayes
- Multinomial Naive Bayes

### GaussianNB
Can be applied to any continuous data.

### BernoulliNB
Assumes binary data.

### MultinomialNB
Assumes count data (i.e., each feature represents an integer count of something, such as how often a word appears in a sentence).

*BernoulliNB and MultinomialNB are mostly used in text data classification*
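
The difference between count data and binary data can be sketched in a few lines. The array `X_counts` below is a made-up example of word counts, not data from this notebook.

```python
import numpy as np

# Hypothetical word counts: rows are documents, columns are words.
# This is the kind of input MultinomialNB assumes.
X_counts = np.array([
    [0, 3, 0, 1],
    [2, 0, 1, 5],
])

# BernoulliNB assumes binary data, so only "does the word occur at all?"
# is kept: every non-zero count becomes 1.
X_binary = (X_counts > 0).astype(int)
print(X_binary)
```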

In [2]:
import numpy as np

## BernoulliNB Classifier

For each class, it counts how often each feature is non-zero.

In [5]:
X = np.array([
            [0,1,0,1],
            [1,0,1,1],
            [0,0,0,1],
            [1,0,1,0]
            ])
y = np.array([0,1,0,1])

In [6]:
counts = {}
for label in np.unique(y):
    # iterate over each class
    # count (sum) entries of 1 per feature
    counts[label] = X[y == label].sum(axis=0)
print("Feature counts:\n{}".format(counts))

Feature counts:
{np.int64(0): array([0, 1, 0, 2]), np.int64(1): array([2, 0, 2, 1])}


*It was the worst hour of my life, but DeepSeek explained it to me <3*

We have two classes, class 0 and class 1. Class 0 has two vectors (the first and third rows of `X`). We separate the data by class and, for each class, count how many times each feature (for example, a word) takes the value 1.

For class 0 we have the vectors [0,1,0,1] and [0,0,0,1]. Analyzing them column by column gives the following breakdown:

Features:

- Col 0: [0,0] -> 0 appears two times and 1 appears zero times
- Col 1: [1,0] -> 0 appears one time and 1 appears one time
- Col 2: [0,0] -> 0 appears two times and 1 appears zero times
- Col 3: [1,1] -> 0 appears zero times and 1 appears two times

For class 1 we have the vectors [1,0,1,1] and [1,0,1,0], so the same analysis gives:

Features:

- Col 0: [1,1] -> 0 appears zero times and 1 appears two times
- Col 1: [0,0] -> 0 appears two times and 1 appears zero times
- Col 2: [1,1] -> 0 appears zero times and 1 appears two times
- Col 3: [1,0] -> 0 appears one time and 1 appears one time

Counting only how many times 1 appears in each feature, per class, gives the following table:

| Feature | Class 0 | Class 1 |
|---------|---------|---------|
| 0       | 0       | 2       |
| 1       | 1       | 0       |
| 2       | 0       | 2       |
| 3       | 2       | 1       |
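
These counts are what a Bernoulli Naive Bayes model builds its predictions on. The following is a rough sketch of that step under stated assumptions: `alpha = 1.0` and the sample `x_new` are chosen here for illustration, and the code is a hand-rolled version of the idea, not scikit-learn's implementation.

```python
import numpy as np

X = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 0, 0, 1],
              [1, 0, 1, 0]])
y = np.array([0, 1, 0, 1])

alpha = 1.0                      # assumed smoothing value
x_new = np.array([1, 0, 1, 1])   # hypothetical sample to classify

log_scores = {}
for label in np.unique(y):
    X_class = X[y == label]
    n_class = X_class.shape[0]
    # Smoothed estimate of P(feature = 1 | class) from the per-class counts.
    p = (X_class.sum(axis=0) + alpha) / (n_class + 2 * alpha)
    # Class prior: fraction of training samples in this class.
    log_prior = np.log(n_class / X.shape[0])
    # Bernoulli log-likelihood: features that are 1 use p, features that are 0 use 1 - p.
    log_likelihood = np.sum(x_new * np.log(p) + (1 - x_new) * np.log(1 - p))
    log_scores[label] = log_prior + log_likelihood

predicted = max(log_scores, key=log_scores.get)
print("Log scores:", log_scores)
print("Predicted class:", predicted)
```

The sample has 1s in features 0 and 2, which only occur in class 1 in the training data, so class 1 gets the higher score.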

Step by step, the code does the following:

1.
```python
for label in np.unique(y):
```
`np.unique(y)` returns the distinct values of the target vector. Since our vector is [0,1,0,1], we get [0,1] and iterate over those labels.

2.
```python
y == label
```
It returns a boolean mask that marks which rows belong to the current class.
e.g. if y = [0,1,0,1] and label = 0, then y == label = [True, False, True, False]

3.
```python
X[y == label]
```
It uses that mask to select the rows of `X` that belong to the current class.
e.g. if y = [0,1,0,1] and label = 0, then X[y == label] = [[0,1,0,1], [0,0,0,1]]

4.
```python
np.sum(X[y == label], axis=0)
```

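It sums the selected rows column by column (`axis=0`), which gives the number of times 1 appears in each feature for that class. This is equivalent to the `X[y == label].sum(axis=0)` call used in the cell.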
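
The four steps can be run one at a time for a single label to inspect each intermediate value. This sketch reuses the notebook's `X` and `y`; the names `mask`, `rows`, and `ones_per_feature` are introduced here only for illustration.

```python
import numpy as np

X = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 0, 0, 1],
              [1, 0, 1, 0]])
y = np.array([0, 1, 0, 1])

label = 0                                # one of the values from np.unique(y)
mask = (y == label)                      # step 2: boolean mask over the rows
rows = X[mask]                           # step 3: rows belonging to this class
ones_per_feature = np.sum(rows, axis=0)  # step 4: column-wise sum of those rows

print("mask:", mask)
print("rows:\n{}".format(rows))
print("ones per feature:", ones_per_feature)
```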

The other two Naive Bayes classifiers are similar to this one, but they compute different statistics from the features.
MultinomialNB takes into account the average value of each feature for each class.

GaussianNB is used for continuous data, so it stores the mean and the standard deviation of each feature for each class.
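
As a rough illustration of those per-class statistics, the same masking pattern can compute a mean and standard deviation per feature. The array `X_cont` is made-up continuous data, and the `stats` dictionary is a hypothetical structure for this sketch, not how scikit-learn stores its fitted values.

```python
import numpy as np

# Hypothetical continuous data: 4 samples, 2 features.
X_cont = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [1.5, 2.5],
                   [3.5, 4.5]])
y = np.array([0, 1, 0, 1])

stats = {}
for label in np.unique(y):
    X_class = X_cont[y == label]
    stats[label] = {
        "mean": X_class.mean(axis=0),  # per-feature average within the class
        "std": X_class.std(axis=0),    # per-feature population standard deviation
    }

for label, values in stats.items():
    print("class {}: mean={}, std={}".format(label, values["mean"], values["std"]))
```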