#### Naive Bayes

Naive Bayes is a supervised classification algorithm.

Before we jump into Naive Bayes, let's review a few probability concepts.



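If A and B are two events of an experiment, then:

1) $ 0 \leq P(A) \leq 1 $, and likewise $ 0 \leq P(B) \leq 1.$

2) $ P(A) + P(B) = 1 $ when A and B are mutually exclusive and together cover every possible outcome (for example, heads and tails on a single coin flip). In general, the probabilities of all outcomes in the sample space sum to 1.

3) $ P(A') = 1 - P(A) $, where $A'$ is the complement of A. Some authors write $A^{c}$ for the complement of A.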


If we flip two coins then what is the probability of:

1) Getting two heads.

2) One head and one tail.

3) At least one tail.
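
One way to check answers like these is to list every equally likely outcome and count the favorable ones. The snippet below is a minimal sketch of that idea, assuming two fair coins; the helper name `probability` is introduced here only for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coin flips: HH, HT, TH, TT (all equally likely).
outcomes = list(product("HT", repeat=2))

def probability(event):
    """Fraction of equally likely outcomes for which event(outcome) is True."""
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))

p_two_heads = probability(lambda o: o.count("H") == 2)
p_one_head_one_tail = probability(lambda o: o.count("H") == 1)
p_at_least_one_tail = probability(lambda o: "T" in o)

print(p_two_heads, p_one_head_one_tail, p_at_least_one_tail)  # 1/4 1/2 3/4
```

Note that "at least one tail" is the complement of "two heads", so its probability is $1 - 1/4 = 3/4$.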

In [None]:
"""
In-class activity: If you roll two dice, what is the probability of:

1) Getting two even numbers.

2) Getting a sum of 7.

3) Getting a sum divisible by 5.
"""

If two events A and B are independent, then

$ P(A \cap B) = P(A) \cdot P(B).$



Let's consider a few examples:

1) Two cards are drawn from a deck of 52 cards with replacement. What is the probability of choosing a king and then a nine?

2) A bowl contains 3 red, 4 green and 8 blue marbles. Three marbles are drawn from the bowl with replacement. What is the probability of choosing a blue, a red and a green?
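
Because each draw is made with replacement, the draws are independent and the probabilities simply multiply. The sketch below works through both examples with exact fractions; it assumes the marbles are drawn in the stated order (blue, then red, then green).

```python
from fractions import Fraction

# Example 1: a king, then a nine, drawn with replacement from 52 cards.
p_king = Fraction(4, 52)
p_nine = Fraction(4, 52)
p_king_then_nine = p_king * p_nine

# Example 2: blue, then red, then green, drawn with replacement.
marbles = {"red": 3, "green": 4, "blue": 8}
total = sum(marbles.values())  # 15 marbles in the bowl
p_blue_red_green = (
    Fraction(marbles["blue"], total)
    * Fraction(marbles["red"], total)
    * Fraction(marbles["green"], total)
)

print(p_king_then_nine)   # 1/169
print(p_blue_red_green)   # 32/1125
```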


In [None]:
"""
In-class activity: If you roll two dice, what is the probability of:

1) Getting a 4 on the first roll and a 3 on the second roll.

2) Getting an even number on the first roll and a number divisible by 3 on the second roll.

3) Getting a sum divisible by 5 or a sum divisible by 6.
"""

The conditional probability of A given B is

$ P(A|B) = \frac{P(A \cap B)}{P(B)}, $ provided that $ P(B) > 0. $

Let's consider a few examples:

1) If you pick a card from a deck of 52 cards, then what is the probability of getting an ace given it is a diamond?

2) Consider the table below. What is the probability that a person chosen at random from the group is a teacher, given that the person is female?

<img src="conditional1.png" width="300" height="200">
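
The card example can be checked by counting. The sketch below builds a standard 52-card deck and divides the probability of "ace and diamond" by the probability of the conditioning event "diamond"; the rank and suit labels are chosen here only for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

diamonds = [card for card in deck if card[1] == "diamonds"]
ace_and_diamond = [card for card in diamonds if card[0] == "A"]

p_diamond = Fraction(len(diamonds), len(deck))
p_ace_and_diamond = Fraction(len(ace_and_diamond), len(deck))

# P(ace | diamond) = P(ace and diamond) / P(diamond)
p_ace_given_diamond = p_ace_and_diamond / p_diamond
print(p_ace_given_diamond)  # 1/13
```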

In [None]:
"""
In-class activity: If you roll a die once, what is the probability of:

1) Getting a 4 given that the outcome is even.

2) Getting a 4 given that the outcome is odd.

3) Getting a 6 given that the outcome is divisible by 3.
"""

#### Naive Bayes

Naive Bayes is a probabilistic classification technique.

It is fast and scalable, and it can be used for both binary and multiclass classification.

It assumes that every feature is independent of the other features, given the class. This "naive" assumption is the main disadvantage of the model, because in real data the features are often related to each other.


Where is Naive Bayes used?

1) Text classification

2) Recommendation systems

3) Weather forecasting, and more.

#### Naive Bayes formula

<img src="bayes1.png" width="300" height="200">
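
For reference, the standard textbook statement of the Naive Bayes rule is written out below in LaTeX. This is the usual form and may differ in notation from the figure; the symbols $y$ (the class) and $x_1, \dots, x_n$ (the features) are introduced here as assumptions.

$$ P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)} $$

The product in the numerator is where the independence assumption enters: each feature contributes its own factor $P(x_i \mid y)$. Because the denominator is the same for every class, the predicted class is the $y$ that maximizes $P(y) \prod_{i=1}^{n} P(x_i \mid y)$.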


References:
https://www.machinelearningplus.com/predictive-modeling/how-naive-bayes-algorithm-works-with-example-and-full-code/

Let's consider the golf data set.

<img src="golf.png">

Let's create a likelihood table:

<img src="likelihood.png"> 

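We want to check the claim: players will play if the weather is rainy.

$ P(\text{yes} \mid \text{rainy}) = \frac{P(\text{rainy} \mid \text{yes}) \cdot P(\text{yes})}{P(\text{rainy})} $

From the table we know

$P(\text{rainy}) = 5/14 \approx 0.36$

$P(\text{yes}) = 9/14 \approx 0.64$

$P(\text{rainy} \mid \text{yes}) = 2/9 \approx 0.22$ (2 of the 9 "yes" days are rainy; $2/14$ would be the joint probability $P(\text{rainy} \cap \text{yes})$, not the conditional)

Let's plug these values into the Bayes equation.

$ P(\text{yes} \mid \text{rainy}) = \frac{(2/9) \cdot (9/14)}{5/14} = \frac{2}{5} = 0.4 = 40\% $

This probability is below 50%, so the claim is not supported: players are more likely not to play than to play when the weather is rainy.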
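
The same calculation can be done directly from counts. The sketch below assumes the counts implied by the worked example (14 days in total, 9 "yes" days, 5 rainy days, and 2 days that are both rainy and "yes"); the variable names are illustrative.

```python
from fractions import Fraction

# Counts assumed from the golf example's likelihood table.
total_days = 14
yes_days = 9
rainy_days = 5
rainy_and_yes_days = 2

p_yes = Fraction(yes_days, total_days)
p_rainy = Fraction(rainy_days, total_days)
# The conditional is taken over the "yes" days only, not over all 14 days.
p_rainy_given_yes = Fraction(rainy_and_yes_days, yes_days)

# Bayes' rule: P(yes | rainy) = P(rainy | yes) * P(yes) / P(rainy)
p_yes_given_rainy = p_rainy_given_yes * p_yes / p_rainy
print(p_yes_given_rainy, float(p_yes_given_rainy))  # 2/5 0.4
```

As a sanity check, the result equals the direct count of "yes" days among rainy days, 2 out of 5.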


In [None]:
"""
In-class activity: Using the table above, compute the probability
for each of the following claims:

1) Claim: Players will not play when the weather is overcast.

2) Claim: Players will play when the weather is sunny. 
"""


In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()

In [None]:
from sklearn import datasets

#Load dataset
iris = datasets.load_iris()

In [None]:
print(type(iris))

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size = 0.2, random_state=1)

In [None]:
#Import Gaussian Naive Bayes model
from sklearn.naive_bayes import GaussianNB

#Create a Gaussian Classifier
gnb = GaussianNB()

#Train the model using the training sets
gnb.fit(x_train, y_train)

#Predict the response for test dataset
y_pred = gnb.predict(x_test)

In [None]:
#Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics

# Model Accuracy, how often is the classifier correct?
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))
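
Accuracy is a single number, so it can hide which classes get confused with each other. As one possible follow-up, the sketch below repeats the same split and model in a self-contained form and prints a confusion matrix using scikit-learn's `confusion_matrix`; rows are true classes and columns are predicted classes.

```python
from sklearn import datasets
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Same data, split, and model as in the cells above.
iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=1
)
gnb = GaussianNB().fit(x_train, y_train)
y_pred = gnb.predict(x_test)

# Rows: true class, columns: predicted class (order given by labels).
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(iris.target_names)
print(cm)
```

Entries on the diagonal are correct predictions; any off-diagonal entry shows a pair of species the classifier mixed up.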

In [None]:
"""
In-class activity:

Apply Naive Bayes to the income dataset that you used for
homework 2 and find the accuracy.
"""