# 13 Classification Task with Naive Bayes

# Bayes' Theorem

Bayes' theorem provides a formula for computing the probability of an event from prior knowledge of conditions related to that event; this is commonly known as conditional probability.
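For reference, the theorem in its standard form, written with $y$ for the class and $X$ for the observed data:

$$
P(y \mid X) = \frac{P(X \mid y)\, P(y)}{P(X)}
$$

Here $P(y)$ is the prior, $P(X \mid y)$ the likelihood, $P(X)$ the evidence, and $P(y \mid X)$ the posterior.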

# Introduction to Naive Bayes Classification

## Case Study 1

| Menu item | Asep | Joko |
| :---: | :---: | :---: |
| siomay | 0.1 | 0.5 |
| bakso | 0.8 | 0.2 |
| lumpia | 0.1 | 0.3 |

**Task:** Predict who placed the order, given that the order consists of **lumpia** and **bakso**.

***Prior Probability: P(y)***

- ***P(Asep)*** = 0.5
- ***P(Joko)*** = 0.5

***Likelihood: P(X|y)***

- Asep:
    ```bash
    P(lumpia, bakso|Asep) = (0.1 * 0.8)
                          = 0.08
    ```
- Joko:
    ```bash
    P(lumpia, bakso|Joko) = (0.3 * 0.2)
                          = 0.06
    ```

***Evidence or Normalizer***

```bash
        Evidence = ∑(Likelihood * Prior)
P(lumpia, bakso) = (0.08 * 0.5) + (0.06 * 0.5)
                 = 0.07
```

***Posterior Probability: P(y|X)***

- Formula:
    ```bash
    Posterior = (Likelihood * Prior) / Evidence
    ```
- Asep:
    ```bash
    P(Asep|lumpia, bakso) = (0.08 * 0.5) / 0.07
                          = 0.57
    ```
- Joko:
    ```bash
    P(Joko|lumpia, bakso) = (0.06 * 0.5) / 0.07
                          = 0.43
    ```
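The calculation above can be reproduced in a few lines of Python. This is a minimal sketch rather than part of the original notebook; the variable names (`menu_probs`, `priors`, `order`) are illustrative assumptions.

```python
# Per-customer ordering probabilities from the table in this case study.
menu_probs = {
    "Asep": {"siomay": 0.1, "bakso": 0.8, "lumpia": 0.1},
    "Joko": {"siomay": 0.5, "bakso": 0.2, "lumpia": 0.3},
}
priors = {"Asep": 0.5, "Joko": 0.5}  # P(y)
order = ["lumpia", "bakso"]          # observed X

# Likelihood P(X|y): product of the per-item probabilities.
likelihoods = {}
for customer, probs in menu_probs.items():
    value = 1.0
    for item in order:
        value *= probs[item]
    likelihoods[customer] = value

# Evidence P(X): sum of likelihood * prior over all customers.
evidence = sum(likelihoods[c] * priors[c] for c in priors)

# Posterior P(y|X) for each customer.
posteriors = {c: likelihoods[c] * priors[c] / evidence for c in priors}

for customer, p in posteriors.items():
    print(f"P({customer}|lumpia, bakso) = {p:.2f}")
```

Asep has the higher posterior, so he is the predicted customer for this order.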


## Case Study 2

| Menu item | Asep | Joko |
| :---: | :---: | :---: |
| siomay | 0.1 | 0.5 |
| bakso | 0.8 | 0.2 |
| lumpia | 0.1 | 0.3 |

**Task:** Predict which customer placed the order, given that the order consists of **siomay** and **bakso**.

***Posterior Probability: P(y|X) (Case 2)***

- Order: siomay, bakso
- Evidence: P(X)
    ```bash
    P(siomay, bakso) = (0.1 * 0.8 * 0.5) + (0.5 * 0.2 * 0.5)
                     = 0.09
    ```
- Asep:
    ```bash
    P(Asep|siomay, bakso) = ((0.1 * 0.8) * 0.5) / 0.09
                          = 0.444
    ```
- Joko:
    ```bash
    P(Joko|siomay, bakso) = ((0.5 * 0.2) * 0.5) / 0.09
                          = 0.556
    ```
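Since both case studies follow the same steps, they can be wrapped in one helper. The sketch below is an illustration only; `posterior` and `predict_customer` are hypothetical helper names, not functions from any library.

```python
from math import prod

MENU_PROBS = {
    "Asep": {"siomay": 0.1, "bakso": 0.8, "lumpia": 0.1},
    "Joko": {"siomay": 0.5, "bakso": 0.2, "lumpia": 0.3},
}
PRIORS = {"Asep": 0.5, "Joko": 0.5}


def posterior(order, menu_probs=MENU_PROBS, priors=PRIORS):
    """Return P(customer | order) for every customer as a dict."""
    # Unnormalized score per customer: likelihood * prior.
    joint = {
        customer: prod(menu_probs[customer][item] for item in order) * priors[customer]
        for customer in priors
    }
    # Evidence: the sum of all unnormalized scores.
    evidence = sum(joint.values())
    return {customer: score / evidence for customer, score in joint.items()}


def predict_customer(order):
    """Return the customer with the highest posterior probability."""
    scores = posterior(order)
    return max(scores, key=scores.get)


result = posterior(["siomay", "bakso"])
print({name: round(p, 3) for name, p in result.items()})
print(predict_customer(["siomay", "bakso"]))
```

For this order the prediction is Joko (about 0.556 versus 0.444), whereas `predict_customer(["lumpia", "bakso"])` returns Asep, as in the first case study.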

# Why Is It Called Naive?

- Because of how we define the likelihood ***P(lumpia, bakso|Asep)***.
- We assume that ordering ***lumpia*** and ordering ***bakso*** are conditionally independent given ***Asep***: knowing that Asep ordered one item tells us nothing more about whether he ordered the other.
- The likelihood can therefore be formulated as ***P(lumpia, bakso|Asep) = P(lumpia|Asep) * P(bakso|Asep)***.
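More generally, for an observation with $n$ features, the same conditional-independence assumption gives the standard Naive Bayes factorization. The symbols $x_1, \dots, x_n$ and $n$ are notation introduced here for illustration:

$$
P(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)
\quad\Longrightarrow\quad
P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$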

# Dataset: Breast Cancer Wisconsin (Diagnostic)

Reference: https://archive.ics.uci.edu/dataset/17/breast+cancer+wisconsin+diagnostic

## Load Dataset

In [34]:
from sklearn.datasets import load_breast_cancer

print(load_breast_cancer().DESCR)

.. _breast_cancer_dataset:

Breast cancer Wisconsin (diagnostic) dataset
--------------------------------------------

**Data Set Characteristics:**

:Number of Instances: 569

:Number of Attributes: 30 numeric, predictive attributes and the class

:Attribute Information:
    - radius (mean of distances from center to points on the perimeter)
    - texture (standard deviation of gray-scale values)
    - perimeter
    - area
    - smoothness (local variation in radius lengths)
    - compactness (perimeter^2 / area - 1.0)
    - concavity (severity of concave portions of the contour)
    - concave points (number of concave portions of the contour)
    - symmetry
    - fractal dimension ("coastline approximation" - 1)

    The mean, standard error, and "worst" or largest (mean of the three
    worst/largest values) of these features were computed for each image,
    resulting in 30 features.  For instance, field 0 is Mean Radius, field
    10 is Radius SE, field 20 is Worst Radius.


In [35]:
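# Uncomment in Jupyter/IPython to view the docstring:
# load_breast_cancer?
# Load the feature matrix X and the target vector y as NumPy arrays
X, y = load_breast_cancer(return_X_y=True)
X.shape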

(569, 30)

## Training & Testing Set

In [36]:
from sklearn.model_selection import train_test_split

# Hold out 20% of the data for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(f'X_train Shape: {X_train.shape}')
print(f'X_test Shape: {X_test.shape}')

X_train Shape: (455, 30)
X_test Shape: (114, 30)


# Naive Bayes with Scikit-Learn

In [37]:
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

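# Fit a Gaussian Naive Bayes classifier on the training set
model = GaussianNB()
model.fit(X_train, y_train)

# Predict the test-set labels and measure the accuracy
y_pred = model.predict(X_test)
accuracy_score(y_test, y_pred)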

0.9298245614035088

In [38]:
# For classifiers, score() returns the mean accuracy, so it matches accuracy_score
model.score(X_test, y_test)

0.9298245614035088
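To connect `GaussianNB` back to the hand calculations earlier: for continuous features, Gaussian Naive Bayes models each per-feature likelihood with a normal distribution whose mean and variance are estimated per class. The following is a simplified pure-Python sketch of that idea, not scikit-learn's actual implementation (which, among other things, applies variance smoothing and works with log probabilities). The toy data and all function names are made up for illustration.

```python
from math import exp, pi, sqrt
from statistics import fmean, pvariance

# Hypothetical toy data: two features per sample, two classes.
X_toy = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 4.1], [3.2, 3.9], [2.8, 4.0]]
y_toy = [0, 0, 0, 1, 1, 1]


def fit_gaussian_nb(X, y):
    """Estimate the prior and per-feature mean/variance for each class."""
    params = {}
    for label in sorted(set(y)):
        rows = [x for x, target in zip(X, y) if target == label]
        columns = list(zip(*rows))  # one tuple of values per feature
        params[label] = {
            "prior": len(rows) / len(X),
            "mean": [fmean(col) for col in columns],
            "var": [pvariance(col) for col in columns],
        }
    return params


def gaussian_pdf(x, mean, var):
    """Density of the normal distribution N(mean, var) at x."""
    return exp(-((x - mean) ** 2) / (2 * var)) / sqrt(2 * pi * var)


def predict_proba(params, sample):
    """Posterior = likelihood * prior / evidence, as in the case studies."""
    joint = {}
    for label, p in params.items():
        likelihood = 1.0
        for x, mean, var in zip(sample, p["mean"], p["var"]):
            likelihood *= gaussian_pdf(x, mean, var)  # naive independence
        joint[label] = likelihood * p["prior"]
    evidence = sum(joint.values())
    return {label: score / evidence for label, score in joint.items()}


params = fit_gaussian_nb(X_toy, y_toy)
proba = predict_proba(params, [1.1, 2.0])
print(max(proba, key=proba.get))
```

The structure mirrors the siomay/bakso examples: only the likelihood changes, from a lookup table to a Gaussian density. Multiplying many small densities can underflow on real data, which is one reason production implementations sum log probabilities instead.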