# Naive Bayes

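Naive Bayes is a generative learning method: it learns the joint probability distribution $P(X,Y)$ from the training data and then derives the posterior distribution $P(Y|X)$. Concretely, it estimates $P(X|Y)$ and $P(Y)$ from the training data, which together give the joint distribution

$$P(X,Y)=P(Y)P(X|Y)$$

The probabilities can be estimated by maximum likelihood estimation or by Bayesian estimation.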
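As an illustration of the two estimators, the sketch below counts frequencies on a small made-up categorical data set. The function name `estimate_categorical_nb`, the smoothing parameter `lam`, and the weather-style toy data are assumptions for this example and do not come from the notebook. Setting `lam=0` gives the maximum likelihood estimate; `lam=1` gives the Bayesian estimate commonly called Laplace smoothing.

```python
from collections import Counter


def estimate_categorical_nb(samples, labels, lam=0.0):
    """Estimate P(Y) and P(X^(j) | Y) for categorical features.

    lam = 0 is the maximum likelihood estimate; lam > 0 is the Bayesian
    (smoothed) estimate, with lam = 1 being Laplace smoothing.
    """
    n_samples = len(labels)
    n_features = len(samples[0])
    classes = sorted(set(labels))
    class_counts = Counter(labels)

    # Values each feature takes, collected from the training data.
    feature_values = [sorted({row[j] for row in samples}) for j in range(n_features)]

    # Prior: (count of class + lam) / (N + K * lam), K = number of classes.
    prior = {
        c: (class_counts[c] + lam) / (n_samples + lam * len(classes))
        for c in classes
    }

    # joint_counts[(j, value, c)] = number of samples with X^(j) = value and Y = c.
    joint_counts = Counter(
        (j, row[j], c) for row, c in zip(samples, labels) for j in range(n_features)
    )

    # Conditional: (joint count + lam) / (class count + S_j * lam),
    # where S_j is the number of distinct values of feature j.
    cond = {}
    for c in classes:
        for j in range(n_features):
            s_j = len(feature_values[j])
            for value in feature_values[j]:
                cond[(j, value, c)] = (joint_counts[(j, value, c)] + lam) / (
                    class_counts[c] + lam * s_j
                )
    return prior, cond


# Hypothetical toy data: (outlook, temperature) -> label.
samples = [
    ("sunny", "hot"),
    ("sunny", "mild"),
    ("rainy", "mild"),
    ("rainy", "cool"),
    ("sunny", "cool"),
]
labels = ["no", "no", "yes", "yes", "yes"]

prior_mle, cond_mle = estimate_categorical_nb(samples, labels, lam=0.0)
prior_bayes, cond_bayes = estimate_categorical_nb(samples, labels, lam=1.0)

print(cond_mle[(0, "rainy", "no")], cond_bayes[(0, "rainy", "no")])
```

In this toy data the combination "rainy" with label "no" never occurs, so the maximum likelihood estimate of that conditional probability is 0, which would zero out the whole product for that class. With `lam=1` the estimate becomes $(0+1)/(2+2)=0.25$.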

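The basic assumption of naive Bayes is conditional independence: given the class, the features are independent of one another.

$$
\begin{aligned}
P(X=x|Y=c_k) &= P(X^{(1)}=x^{(1)}, ..., X^{(n)}=x^{(n)} | Y=c_k) \\
             &= \prod\limits_{j=1}^n P(X^{(j)}=x^{(j)} | Y=c_k)
\end{aligned}
$$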
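A rough count of the probabilities to estimate shows why this assumption matters. Assume, for illustration only, that feature $X^{(j)}$ takes $S_j$ distinct values and that there are $K$ classes (these symbols are introduced here and are not used elsewhere in the notebook). The number of conditional probability entries is then

$$
\underbrace{K \prod_{j=1}^n S_j}_{\text{without the assumption}} \quad \text{versus} \quad \underbrace{K \sum_{j=1}^n S_j}_{\text{with conditional independence}}
$$

For example, with $n=4$ binary features and $K=2$ classes this is $2 \cdot 2^4 = 32$ entries versus $2 \cdot (2+2+2+2) = 16$, and the gap grows exponentially with $n$.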

Naive Bayes classifies by applying Bayes' theorem to the learned joint probability model:

$$
P(Y|X)=\frac{P(X|Y)P(Y)}{P(X)}=\frac{P(X|Y)P(Y)}{\sum\limits_{Y}{P(Y)P(X|Y)}}
$$
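A small numeric check of this formula may help. The sketch below multiplies a prior by a likelihood for each class and normalizes by their sum; the helper name `posterior` and the numbers are made up for illustration.

```python
def posterior(prior, likelihood):
    """Return P(Y | X = x) from P(Y) and P(X = x | Y), both keyed by class."""
    # Numerator of Bayes' theorem for each class: P(X = x | Y = c) * P(Y = c).
    joint = {c: prior[c] * likelihood[c] for c in prior}
    # Denominator: P(X = x), the sum of the numerators over all classes.
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}


# Hypothetical values for a two-class problem and one observed input x.
prior = {0: 0.5, 1: 0.5}
likelihood = {0: 0.2, 1: 0.6}

post = posterior(prior, likelihood)
print(post)
```

Here the joint terms are $0.1$ and $0.3$, so the posteriors are roughly $0.25$ and $0.75$ and they sum to 1.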

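The input $x$ is assigned to the class $y$ with the largest posterior probability. Because the denominator $P(X)$ is the same for every class, it can be dropped, leaving

$$
y = \arg\max\limits_{c_k} P(Y=c_k) \prod\limits_{j=1}^n P(X^{(j)}=x^{(j)} | Y = c_k)
$$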
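The decision rule above can be sketched from scratch for continuous features. This version assumes each feature is normally distributed within a class, which is the same modeling assumption behind scikit-learn's `GaussianNB`, although it is not meant to reproduce that implementation. The function names, the small variance floor `1e-9`, and the toy data are assumptions of this sketch. Scores are computed in log space so that the product of many small probabilities does not underflow.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Return {class: (prior, means, variances)} estimated from the data."""
    grouped = defaultdict(list)
    for row, label in zip(X, y):
        grouped[label].append(row)

    params = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))  # one tuple of values per feature
        means = [sum(col) / n for col in columns]
        # Variance floor keeps the variance strictly positive (an assumption here).
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        params[label] = (n / len(y), means, variances)
    return params


def log_gaussian(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict(params, x):
    """Return argmax_k [ log P(Y=c_k) + sum_j log P(X^(j)=x^(j) | Y=c_k) ]."""
    scores = {
        label: math.log(prior)
        + sum(log_gaussian(v, m, s) for v, m, s in zip(x, means, variances))
        for label, (prior, means, variances) in params.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy data: two well-separated classes, two features each.
X_toy = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 4.1], [3.2, 3.9], [2.9, 4.0]]
y_toy = [0, 0, 0, 1, 1, 1]

params = fit_gaussian_nb(X_toy, y_toy)
print(predict(params, [1.1, 2.0]), predict(params, [3.1, 4.0]))
```

Taking logarithms does not change which class wins, since the logarithm is monotonically increasing.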

In [1]:
import numpy as np
import pandas as pd

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from collections import Counter

In [4]:
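# With return_X_y=True and as_frame=True, load_iris returns a (features, target) pair
features, target = load_iris(as_frame=True, return_X_y=True)
# Keep only the first 100 samples (classes 0 and 1), which makes this a binary problem
x = np.array(features.iloc[:100])
y = np.array(target.iloc[:100])

# Hold out 20% of the samples for testing
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)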

In [5]:
x_test[0], y_test[0]

(array([6. , 2.7, 5.1, 1.6]), 1)

In [10]:
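from sklearn.naive_bayes import GaussianNB, BernoulliNB

# Gaussian naive Bayes: each feature is modeled as normally distributed within a class
clf = GaussianNB()
clf.fit(x_train, y_train)
clf.score(x_test, y_test)  # mean accuracy on the held-out test set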

1.0

In [11]:
clf.predict([[4.4, 3.2, 1.3, 0.2]])

array([0])

In [12]:
from sklearn.tree import DecisionTreeClassifier

# Decision tree trained on the same split, for comparison
clf = DecisionTreeClassifier()
clf.fit(x_train, y_train)
clf.score(x_test, y_test)  # mean accuracy on the held-out test set

1.0