The Naive Bayes classifier separates data into classes using Bayes’ theorem, together with the assumption that all the predictors are independent of one another given the class. In other words, it assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.

For example, you might consider a fruit to be a watermelon if it is green, round, and about 10 inches in diameter. Even if these features depend on each other in reality, the classifier treats each of them as contributing independently to the probability that the fruit is a watermelon. This simplifying assumption is why the classifier has ‘Naive’ in its name.

This algorithm is popular because, despite its simplicity, it can sometimes outperform far more sophisticated classification techniques. It is also easy to implement and quick to train.

Here’s Bayes’ theorem, which is the basis of this algorithm:

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$

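In this equation:
- $c$ stands for the class.
- $x$ stands for the attributes (predictors).
- $P(c \mid x)$ is the posterior probability of the class given the predictor.
- $P(x)$ is the prior probability of the predictor (also called the evidence).
- $P(c)$ is the prior probability of the class.
- $P(x \mid c)$ is the likelihood, i.e. the probability of the predictor given the class.

With the naive independence assumption, the likelihood of an attribute vector $x = (x_1, \dots, x_n)$ factorizes into per-feature terms: $P(x \mid c) = \prod_{i=1}^{n} P(x_i \mid c)$.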
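
As a worked illustration of the formula and of the ‘naive’ step, here is a minimal hand computation for the watermelon example. The priors and per-feature likelihoods are invented numbers, not estimates from real data. The likelihood of the fruit is the product of its per-feature terms, and the evidence P(x) is obtained by summing P(x | c) P(c) over all classes.

```python
# Hypothetical priors P(c) and per-feature likelihoods P(x_i | c),
# invented purely to illustrate the arithmetic.
priors = {"watermelon": 0.3, "other": 0.7}
likelihoods = {
    "watermelon": {"green": 0.9, "round": 0.8},
    "other": {"green": 0.2, "round": 0.5},
}


def posterior(features):
    """Return P(c | x) for every class, assuming features are independent given c."""
    numerators = {}
    for c, prior in priors.items():
        p = prior
        for f in features:
            # Naive assumption: P(x | c) is the product of the P(x_i | c) terms
            p *= likelihoods[c][f]
        numerators[c] = p
    # Evidence P(x): sum of P(x | c) P(c) over all classes
    evidence = sum(numerators.values())
    return {c: num / evidence for c, num in numerators.items()}


print(posterior(["green", "round"]))
```

With these numbers the fruit is classified as a watermelon, with a posterior of 0.216 / 0.286, roughly 0.76.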

__Advantages of Naive Bayes__
- This algorithm is very fast and can easily predict the class of a test dataset.
- It handles multi-class prediction problems naturally.
- When the assumption of independence of features holds, a Naive Bayes classifier can perform better than other models while needing less training data.
- It performs especially well with categorical input variables compared with numerical ones, for which it has to assume a distribution (usually a normal distribution).

__Disadvantages of Naive Bayes__
- If a categorical variable in your test data set takes a category that wasn’t present in the training data set, the Naive Bayes model assigns it zero probability and won’t be able to make a meaningful prediction. This is known as the ‘Zero Frequency’ problem. To solve it, use a smoothing technique such as additive (Laplace) smoothing, which scikit-learn’s discrete Naive Bayes classifiers control through the `alpha` parameter.
- Naive Bayes is also known to be a poor probability estimator, so you shouldn’t take the probability outputs of `predict_proba` too seriously.
- It assumes that all the features are independent. This keeps the model simple, but in real life you’ll rarely find a set of completely independent features.
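
The zero-frequency problem and its fix can be seen with plain counts. The sketch below uses made-up word counts for a single class and the additive smoothing estimate (count + alpha) / (total + alpha × number of features), which is the form the scikit-learn documentation gives for `MultinomialNB`. The names `counts` and `smoothed_prob` are illustrative only.

```python
from math import prod

# Hypothetical word counts observed for one class in the training data
counts = {"cheap": 3, "offer": 2, "meeting": 0}  # "meeting" never appeared
n_features = len(counts)
total = sum(counts.values())


def smoothed_prob(word, alpha):
    """Additive smoothing: (N_yi + alpha) / (N_y + alpha * n)."""
    return (counts[word] + alpha) / (total + alpha * n_features)


message = ["cheap", "offer", "meeting"]
for alpha in (0.0, 1.0):
    # Naive Bayes multiplies the per-word probabilities together
    likelihood = prod(smoothed_prob(w, alpha) for w in message)
    print(f"alpha={alpha}: P(message | class) = {likelihood:.4f}")
```

With `alpha=0.0` the single unseen word forces the whole product to exactly 0. With `alpha=1.0` it becomes 0.5 × 0.375 × 0.125, about 0.023: small, but still usable for comparing classes.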

__Applications of Naive Bayes Algorithm__
Because of these advantages, Naive Bayes is used in many different sectors. Here are some of its applications:

- As this algorithm is fast and efficient, you can use it to make real-time predictions.
- This algorithm is popular for multi-class predictions, since it can easily give the probability of each of several target classes.
- Email services can use this algorithm to decide whether an email is spam or not; spam filtering is one of its classic applications.
- Its assumption of feature independence and its effectiveness on multi-class problems make it well suited to sentiment analysis, which is the identification of positive or negative sentiment in a target group (customers, audience, etc.).
- Collaborative filtering and the Naive Bayes algorithm can be combined to build recommendation systems, which use data mining and machine learning to predict whether a user would like a particular resource.
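
Spam filtering is a typical text-classification task. The following minimal sketch trains scikit-learn’s `MultinomialNB` on bag-of-words counts produced by `CountVectorizer`. The four training emails are made up and far too few for a real filter; they only show the shape of the workflow.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny hypothetical training set; a real filter needs thousands of emails
emails = [
    "win cash prize now",
    "cheap offer win money",
    "meeting agenda for monday",
    "project report attached for review",
]
labels = ["spam", "spam", "ham", "ham"]

# Turn each email into a vector of word counts
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(emails)

# alpha=1.0 (Laplace smoothing) is the default
model = MultinomialNB()
model.fit(X_counts, labels)

new_emails = ["win a cheap prize", "monday project meeting"]
print(model.predict(vectorizer.transform(new_emails)))
```

Both classes have the same prior here, so each new message goes to the class whose (smoothed) word counts it matches best: the first is labelled spam and the second ham.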

__Types of Naive Bayes Classifier__
There are several variants of this algorithm. The main ones, all available in scikit-learn’s `sklearn.naive_bayes` module, are:

- __Bernoulli Naive Bayes__
Here, the predictors are boolean variables, so the only values they take are ‘True’ and ‘False’ (or ‘Yes’ and ‘No’). We use it when the data follows a multivariate Bernoulli distribution. Like MultinomialNB, this classifier is suitable for discrete data. The difference is that while MultinomialNB works with occurrence counts, BernoulliNB is designed for binary/boolean features. scikit-learn’s BernoulliNB binarizes its input using the `binarize` threshold (0.0 by default), so count data is reduced to presence or absence.

- __Multinomial Naive Bayes__
People use this algorithm to solve document classification problems. For example, to decide whether a document belongs to the ‘Legal’ category or the ‘Human Resources’ category, you could use this algorithm. It uses the frequencies of the words present in the document as features. The multinomial Naive Bayes classifier is suitable for classification with discrete features (e.g., word counts for text classification). The multinomial distribution normally requires integer feature counts; however, in practice, fractional counts such as tf-idf may also work.

- __Gaussian Naive Bayes__
If the predictors take continuous rather than discrete values, we assume that, within each class, the values of each predictor are sampled from a Gaussian (normal) distribution.

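- __Complement Naive Bayes (ComplementNB)__
Complement Naive Bayes is an adaptation of Multinomial Naive Bayes designed to correct the “severe assumptions” made by the standard multinomial classifier. Instead of estimating each class’s feature probabilities from that class’s own training data, it estimates them from the data of all the other classes (the complement of the class). It is particularly suited to imbalanced data sets.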

- __Categorical Naive Bayes (CategoricalNB)__
CategoricalNB implements the categorical Naive Bayes algorithm for categorically distributed data. It assumes that each feature, described by the index $i$, has its own categorical distribution.
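
The three main variants differ mainly in how they model the per-feature likelihood P(x_i | c). The sketch below writes each model out by hand with hypothetical parameters: a normal density for GaussianNB, count-weighted log probabilities for MultinomialNB, and the present/absent rule that the scikit-learn documentation gives for BernoulliNB. It illustrates the formulas only. The scikit-learn classes also estimate these parameters from data and apply smoothing, and the function names here are illustrative.

```python
import math


def gaussian_likelihood(x, mean, var):
    """GaussianNB: continuous feature, normal density with the class mean/variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def multinomial_log_likelihood(counts, word_probs):
    """MultinomialNB: sum of count * log P(word | c).

    The multinomial coefficient is dropped because it is the same for every class.
    """
    return sum(n * math.log(p) for n, p in zip(counts, word_probs))


def bernoulli_likelihood(present, feature_probs):
    """BernoulliNB: each binary feature contributes p if present, 1 - p if absent."""
    result = 1.0
    for xi, p in zip(present, feature_probs):
        result *= p * xi + (1 - p) * (1 - xi)
    return result


# Hypothetical parameters for a single class
print(gaussian_likelihood(10.0, mean=10.0, var=1.0))
print(multinomial_log_likelihood([2, 0, 1], [0.5, 0.3, 0.2]))
print(bernoulli_likelihood([1, 0, 1], [0.8, 0.6, 0.1]))
```

Note the contrast between the last two models. The multinomial term ignores a word with a zero count (0 × log p = 0). The Bernoulli term explicitly penalizes the absent second feature through the factor 1 − 0.6.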

__Out-of-core NB Model fitting__
Naive Bayes models can be used to tackle large-scale classification problems for which the full training set might not fit in memory. To handle this case, MultinomialNB, BernoulliNB, and GaussianNB expose a __partial_fit__ method that can be used incrementally, as with other classifiers (see the scikit-learn example “Out-of-core classification of text documents”). All Naive Bayes classifiers support sample weighting.

Unlike the fit method, the first call to partial_fit must be passed the list of all expected class labels (through its `classes` parameter).
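
A minimal sketch of incremental fitting, assuming the data arrives in mini-batches of 50 rows. The iris data set stands in for a stream here, and the batch size is an arbitrary choice. The full list of labels is passed through `classes` on the first call only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
all_classes = np.unique(y)  # every label the model will ever see

# Shuffle so each mini-batch mixes classes (iris is sorted by class)
rng = np.random.RandomState(0)
order = rng.permutation(len(y))
X, y = X[order], y[order]

model = GaussianNB()
batch_size = 50
for start in range(0, len(y), batch_size):
    X_batch = X[start:start + batch_size]
    y_batch = y[start:start + batch_size]
    if start == 0:
        # `classes` is required on the first call to partial_fit
        model.partial_fit(X_batch, y_batch, classes=all_classes)
    else:
        # Later calls can omit it
        model.partial_fit(X_batch, y_batch)

print(model.classes_)
```

Because GaussianNB updates its per-class means incrementally, `model.theta_` should match (up to floating-point error) the means of a model fit on all 150 samples at once.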

#### 1. Gaussian Naive Bayes


In [1]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB



In [4]:
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5,
                                                    random_state=0)

In [5]:
gnb = GaussianNB()
y_pred = gnb.fit(X_train, y_train).predict(X_test)

In [6]:
print(f'Number of mislabeled points out of a total {X_test.shape[0]} points: {(y_test != y_pred).sum()}')

Number of mislabeled points out of a total 75 points: 4


In [28]:
# Show only the public attributes; fitted attributes end with an underscore
[attr for attr in dir(gnb) if not attr.startswith('_')]

['class_count_',
 'class_prior_',
 'classes_',
 'epsilon_',
 'fit',
 'get_params',
 'partial_fit',
 'predict',
 'predict_log_proba',
 'predict_proba',
 'priors',
 'score',
 'set_params',
 'sigma_',
 'theta_',
 'var_smoothing']

#### 2. MultinomialNB

In [16]:
import numpy as np
rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))
y = np.array([1, 2, 3, 4, 5, 6])

In [15]:
X.shape

(6, 100)

In [17]:
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB()
clf.fit(X, y)
print(clf.predict(X[2:3]))

[3]


In [27]:
# Show only the public attributes; fitted attributes end with an underscore
[attr for attr in dir(clf) if not attr.startswith('_')]


#### 3. BernoulliNB

In [20]:
import numpy as np
rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))
Y = np.array([1, 2, 3, 4, 4, 5])
from sklearn.naive_bayes import BernoulliNB
clf = BernoulliNB()
clf.fit(X, Y)
print(clf.predict(X[2:3]))

[3]


In [26]:
# Show only the public attributes; fitted attributes end with an underscore
[attr for attr in dir(clf) if not attr.startswith('_')]


#### 4. ComplementNB

In [22]:
from sklearn.naive_bayes import ComplementNB
clf = ComplementNB()
clf.fit(X,y).predict(X[2:3])

array([3])
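
To see what “complement” means, here is a minimal pure-Python sketch of the estimate described in the scikit-learn user guide. For each class c, word probabilities are estimated from the documents that are *not* in c, with additive smoothing `alpha`. A document is then assigned to the class whose complement is least likely to have produced it. The count matrix, labels and function names are hypothetical, and the sketch leaves out the optional weight normalization (`norm=True`).

```python
import math

# Hypothetical word-count matrix: one row per document, one column per word
counts_matrix = [
    [3, 0, 1],
    [2, 1, 0],
    [0, 4, 1],
]
labels = ["a", "a", "b"]
alpha = 1.0  # additive smoothing applied to every word count
n_words = len(counts_matrix[0])


def complement_log_probs(cls):
    """log theta_ci: word probabilities estimated from documents NOT in `cls`."""
    totals = [alpha] * n_words
    for row, label in zip(counts_matrix, labels):
        if label != cls:
            for i, count in enumerate(row):
                totals[i] += count
    denominator = sum(totals)  # alpha * n_words + all complement counts
    return [math.log(t / denominator) for t in totals]


weights = {c: complement_log_probs(c) for c in sorted(set(labels))}


def predict(doc):
    """Pick the class whose complement is LEAST likely to have produced `doc`."""
    scores = {c: sum(t * w for t, w in zip(doc, weights[c])) for c in weights}
    return min(scores, key=scores.get)


print(predict([0, 3, 1]))  # word counts of a new document
```

The new document is dominated by the second word, which is common in class "b"’s only document. It is therefore poorly explained by the complement of "b" (the class "a" documents) and is assigned to "b".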

In [25]:
# Show only the public attributes; fitted attributes end with an underscore
[attr for attr in dir(clf) if not attr.startswith('_')]


#### 5. CategoricalNB

In [29]:
from sklearn.naive_bayes import CategoricalNB
clf = CategoricalNB()
clf.fit(X,y).predict(X[2:3])

array([3])
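
The cell above works because `X` already contains small non-negative integers. With real categorical data stored as strings, each feature’s categories first have to be encoded as integers 0 to n_i - 1, and the scikit-learn documentation suggests `OrdinalEncoder` for this. Here is a minimal sketch with hypothetical fruit data:

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical categorical data: (colour, shape) for each fruit
X_raw = np.array([
    ["green", "round"],
    ["green", "round"],
    ["orange", "round"],
    ["yellow", "long"],
])
y_fruit = np.array(["watermelon", "watermelon", "orange", "banana"])

# Encode each feature's categories as integers 0 .. n_categories - 1
encoder = OrdinalEncoder()
X_encoded = encoder.fit_transform(X_raw)

# alpha=1.0 applies additive smoothing to the category counts
cat_clf = CategoricalNB(alpha=1.0)
cat_clf.fit(X_encoded, y_fruit)

print(cat_clf.predict(encoder.transform([["green", "round"]])))
```

Here a green, round fruit is predicted as ‘watermelon’, since both training examples with those values belong to that class.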

In [30]:
dir(clf)

['__abstractmethods__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_abc_cache',
 '_abc_negative_cache',
 '_abc_negative_cache_version',
 '_abc_registry',
 '_check_X',
 '_check_X_y',
 '_check_alpha',
 '_count',
 '_estimator_type',
 '_get_coef',
 '_get_intercept',
 '_get_param_names',
 '_get_tags',
 '_init_counters',
 '_joint_log_likelihood',
 '_more_tags',
 '_update_class_log_prior',
 '_update_feature_log_prob',
 'alpha',
 'category_count_',
 'class_count_',
 'class_log_prior_',
 'class_prior',
 'classes_',
 'coef_',
 'feature_log_prob_',
 'fit',
 'fit_prior',
 'get_params',
 'intercept_',
 'n_features_',
 'partial_fit',
 'predict