**Bayes' Theorem**

<br>In a Bayes classifier, we calculate the posterior of every class for each observation. In practice we only calculate the numerator of the posterior, because the denominator is the same for every class.
<br>We then assign the observation to the class with the largest posterior value.

$P(class\mid data)={\frac {P(data\mid class)\,P(class)}{P(data)}}$

where:
<br>- class is a particular class (e.g. play = yes)
<br>- data is an observation’s data
<br>- $P(class\mid data)$ is called the posterior
<br>- $P(data\mid class)$ is called the likelihood
<br>- $P(class)$ is called the prior
<br>- $P(data)$ is called the marginal probability
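
As a minimal sketch of the formula above, the snippet below computes the posteriors for two classes from made-up priors and likelihoods. The numbers and the helper name `posteriors` are illustrative assumptions, not taken from the dataset. It also shows why comparing numerators is enough: dividing every numerator by the same marginal does not change which class is largest.

```python
def posteriors(priors, likelihoods):
    """Return P(class | data) for each class via Bayes' theorem.

    priors:      dict mapping class -> P(class)
    likelihoods: dict mapping class -> P(data | class)
    """
    # Numerator of Bayes' theorem for every class
    numerators = {c: likelihoods[c] * priors[c] for c in priors}
    # Marginal P(data), obtained by summing the numerators over all classes
    marginal = sum(numerators.values())
    return {c: n / marginal for c, n in numerators.items()}


# Hypothetical numbers, chosen only for illustration
priors = {"yes": 0.6, "no": 0.4}
likelihoods = {"yes": 0.5, "no": 0.25}

post = posteriors(priors, likelihoods)
best = max(post, key=post.get)  # class with the largest posterior
print(post, best)
```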

**Why should we use Naive Bayes?**
<br>- It is easy to build and is particularly useful for very large data sets.
<br>- It is extremely fast for both training and prediction.
<br>- It provides straightforward probabilistic predictions.
<br>- It is often very easily interpretable.
<br>- It has very few (if any) tunable parameters.
<br>- It performs better with categorical input variables than with numerical ones. For numerical variables, a normal distribution (bell curve) is assumed, which is a strong assumption.

**Naive Bayes**
<br>Naive Bayes is a simple classifier known for doing well when only a small number of observations is available.
<br>In this tutorial, I use the [Tennis Weather](https://www.kaggle.com/pranavpandey2511/tennis-weather/downloads/tennis-weather.zip) dataset from [Kaggle](https://www.kaggle.com/).

In [1]:
import pandas as pd
import numpy as np

from sklearn.naive_bayes import GaussianNB

In [2]:
data_unlable = pd.read_csv("./tennis.csv")

In [3]:
data_unlable.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14 entries, 0 to 13
Data columns (total 5 columns):
outlook     14 non-null object
temp        14 non-null object
humidity    14 non-null object
windy       14 non-null bool
play        14 non-null object
dtypes: bool(1), object(4)
memory usage: 542.0+ bytes


In [4]:
data_unlable

Unnamed: 0,outlook,temp,humidity,windy,play
0,sunny,hot,high,False,no
1,sunny,hot,high,True,no
2,overcast,hot,high,False,yes
3,rainy,mild,high,False,yes
4,rainy,cool,normal,False,yes
5,rainy,cool,normal,True,no
6,overcast,cool,normal,True,yes
7,sunny,mild,high,False,no
8,sunny,cool,normal,False,yes
9,rainy,mild,normal,False,yes


In [5]:
data = data_unlable.copy()

In [6]:
outlook = {'sunny': 0, 'overcast': 1, 'rainy': 2}
temp = {'hot': 0, 'mild': 1, 'cool': 2}
humidity = {'high': 0, 'normal': 1}

# For a DataFrame with 100 rows, `apply` is marginally faster;
# for one with 10,000 rows, `map` is 26 times faster.
# Equivalent with map: data.outlook.map(outlook)

data['outlook'] = data.outlook.apply(lambda x: outlook[x])
data['temp'] = data.temp.apply(lambda x: temp[x])
data['humidity'] = data.humidity.apply(lambda x: humidity[x])
# Windy data is boolean True/False
data['windy'] = data.windy.astype(int)

In [7]:
data

Unnamed: 0,outlook,temp,humidity,windy,play
0,0,0,0,0,no
1,0,0,0,1,no
2,1,0,0,0,yes
3,2,1,0,0,yes
4,2,2,1,0,yes
5,2,2,1,1,no
6,1,2,1,1,yes
7,0,1,0,0,no
8,0,2,1,0,yes
9,2,1,1,0,yes


<br>We can then calculate the probability of playing tennis (play = yes) given temp, and likewise for play = no:

$P(play_{yes}\mid temp)={\frac{P(temp\mid play_{yes})\,P(play_{yes})}{P(temp)}}$

$P(play_{no}\mid temp)={\frac {P(temp\mid play_{no})\,P(play_{no})}{P(temp)}}$
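
To make the single-feature case concrete, here is a small sketch that estimates P(play = yes | temp = cool) by counting. It uses only the ten rows displayed above (the full file has 14 rows, so the numbers are illustrative), and the variable names are our own.

```python
# (temp, play) pairs copied from the ten rows displayed above
rows = [
    ("hot", "no"), ("hot", "no"), ("hot", "yes"), ("mild", "yes"),
    ("cool", "yes"), ("cool", "no"), ("cool", "yes"), ("mild", "no"),
    ("cool", "yes"), ("mild", "yes"),
]

n = len(rows)
n_yes = sum(1 for _, play in rows if play == "yes")
n_cool = sum(1 for temp, _ in rows if temp == "cool")
n_cool_and_yes = sum(1 for temp, play in rows if temp == "cool" and play == "yes")

prior_yes = n_yes / n                # P(play = yes)
likelihood = n_cool_and_yes / n_yes  # P(temp = cool | play = yes)
marginal = n_cool / n                # P(temp = cool)

posterior_yes = likelihood * prior_yes / marginal
print(posterior_yes)
```

The result agrees with the direct count: 3 of the 4 cool rows have play = yes.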

**Gaussian Naive Bayes Classifier**
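<br>In our example, we have one observation to predict and two possible classes (play = yes or play = no), so we calculate two posteriors: one for yes and one for no. Naive Bayes assumes the features are conditionally independent given the class, so the likelihood factorizes into one term per feature.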

$\text{posterior}(play_{yes})={\frac {p(play_{yes})\,p(outlook\mid play_{yes})\,p(temp\mid play_{yes})\,p(humidity\mid play_{yes})\,p(windy\mid play_{yes})}{\text{marginal probability}}}$

$\text{posterior}(play_{no})={\frac {p(play_{no})\,p(outlook\mid play_{no})\,p(temp\mid play_{no})\,p(humidity\mid play_{no})\,p(windy\mid play_{no})}{\text{marginal probability}}}$

In [8]:
# Calculate Priors
# Number of play=yes/play=no
yes = data['play'][data['play'] == 'yes'].count()
no = data['play'][data['play'] == 'no'].count()
total = data['play'].count()

p_yes = yes / total
p_no = no / total

# The means
play_means = data.groupby('play').mean()

# The variance
play_variances = data.groupby('play').var()

In [9]:
p_yes, p_no, play_means, play_variances

(0.6428571428571429,
 0.35714285714285715,
        outlook      temp  humidity     windy
 play                                        
 no    0.800000  0.800000  0.200000  0.600000
 yes   1.111111  1.111111  0.666667  0.333333,
        outlook      temp  humidity  windy
 play                                     
 no    1.200000  0.700000      0.20   0.30
 yes   0.611111  0.611111      0.25   0.25)
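
One detail worth flagging: pandas `var()` returns the sample variance (divisor n - 1) by default, whereas, as far as we know, scikit-learn's `GaussianNB` estimates the per-class variance with divisor n and then adds a small `var_smoothing` term. The manual numbers in this tutorial may therefore differ slightly from the library's internal ones. The sketch below illustrates the two estimators on a hypothetical column of five binary values, chosen to match the play = no humidity summary above (mean 0.2, sample variance 0.2).

```python
import statistics

# Hypothetical column: five binary values with mean 0.2
values = [0, 0, 0, 0, 1]

sample_var = statistics.variance(values)       # divisor n - 1, like pandas .var()
population_var = statistics.pvariance(values)  # divisor n

print(sample_var, population_var)
```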

In [10]:
# Means for (play=yes)
play_yes_outlook_mean = play_means.loc['yes', 'outlook']
play_yes_temp_mean = play_means.loc['yes', 'temp']
play_yes_humidity_mean = play_means.loc['yes', 'humidity']
play_yes_windy_mean = play_means.loc['yes', 'windy']

# Means for (play=no)
play_no_outlook_mean = play_means.loc['no', 'outlook']
play_no_temp_mean = play_means.loc['no', 'temp']
play_no_humidity_mean = play_means.loc['no', 'humidity']
play_no_windy_mean = play_means.loc['no', 'windy']

# Variances for (play=yes)
play_yes_outlook_variance = play_variances.loc['yes', 'outlook']
play_yes_temp_variance = play_variances.loc['yes', 'temp']
play_yes_humidity_variance = play_variances.loc['yes', 'humidity']
play_yes_windy_variance = play_variances.loc['yes', 'windy']

# Variances for (play=no)
play_no_outlook_variance = play_variances.loc['no', 'outlook']
play_no_temp_variance = play_variances.loc['no', 'temp']
play_no_humidity_variance = play_variances.loc['no', 'humidity']
play_no_windy_variance = play_variances.loc['no', 'windy']

In [11]:
# Create a function that calculates p(x|y):
def p_x_given_y(x, mean_y, variance_y):
    # Input the arguments into a probability density function
    p = 1/(np.sqrt(2*np.pi*variance_y)) * np.exp((-(x-mean_y)**2)/(2*variance_y))
    return p
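
The function above implements the Gaussian density $p(x\mid y)=\frac{1}{\sqrt{2\pi\sigma_y^2}}\exp\left(-\frac{(x-\mu_y)^2}{2\sigma_y^2}\right)$, where $\mu_y$ and $\sigma_y^2$ are the mean and variance of the feature within class $y$. As a sanity check, the sketch below re-implements it with the standard library only (the name `gaussian_pdf` is ours) and compares it against `statistics.NormalDist`, using the rounded play = yes outlook mean and variance printed earlier.

```python
import math
from statistics import NormalDist


def gaussian_pdf(x, mean, variance):
    """Gaussian density with the given mean and variance, evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)


# Rounded values from the play = yes row of the summary tables
mean, variance = 1.111111, 0.611111

ours = gaussian_pdf(1, mean, variance)
# NormalDist takes the standard deviation, so pass the square root of the variance
reference = NormalDist(mu=mean, sigma=math.sqrt(variance)).pdf(1)
print(ours, reference)
```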

**Apply Bayes Classifier To New Data Point**

In [12]:
# We predict with an observation
observation = pd.DataFrame()
observation['outlook'] = ['overcast']
observation['temp'] = ['mild']
observation['humidity'] = ['normal']
observation['windy'] = 1

observation['outlook'] = observation.outlook.apply(lambda x: outlook[x])
observation['temp'] = observation.temp.apply(lambda x: temp[x])
observation['humidity'] = observation.humidity.apply(lambda x: humidity[x])

observation

Unnamed: 0,outlook,temp,humidity,windy
0,1,1,1,1


In [13]:
# Numerator of the posterior if the unclassified observation is play = yes
p_yes * \
p_x_given_y(observation['outlook'][0], play_yes_outlook_mean, play_yes_outlook_variance) * \
p_x_given_y(observation['temp'][0], play_yes_temp_mean, play_yes_temp_variance) * \
p_x_given_y(observation['humidity'][0], play_yes_humidity_mean, play_yes_humidity_variance) *\
p_x_given_y(observation['windy'][0], play_yes_windy_mean, play_yes_windy_variance)

0.034385195514255944

In [14]:
# Numerator of the posterior if the unclassified observation is play = no
p_no * \
p_x_given_y(observation['outlook'][0], play_no_outlook_mean, play_no_outlook_variance) * \
p_x_given_y(observation['temp'][0], play_no_temp_mean, play_no_temp_variance) * \
p_x_given_y(observation['humidity'][0], play_no_humidity_mean, play_no_humidity_variance) *\
p_x_given_y(observation['windy'][0], play_no_windy_mean, play_no_windy_variance)

0.005955761194251448

Because the numerator of the posterior for yes is greater than the one for no, we predict play = yes for this observation.
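
The two values computed above are only numerators. As a small sketch (variable names are ours), dividing each by their sum, which is the marginal probability for this observation, turns them into posteriors that add up to 1:

```python
# Numerators printed by the two cells above
numerator_yes = 0.034385195514255944
numerator_no = 0.005955761194251448

# The marginal is the sum of the numerators over all classes
marginal = numerator_yes + numerator_no

posterior_yes = numerator_yes / marginal
posterior_no = numerator_no / marginal
print(posterior_yes, posterior_no)
```

The posterior for yes comes out at roughly 0.85, so normalizing does not change which class wins.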
<br>Next, we predict the same observation with scikit-learn's `GaussianNB` model.

In [15]:
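# The feature columns are already integer-encoded, so pd.get_dummies would leave them unchanged
X_train = data[['outlook', 'temp', 'humidity', 'windy']]
# Pass the labels as a 1-D Series; a one-column DataFrame makes scikit-learn emit a DataConversionWarning
y_train = data['play']

In [16]:
# Create a Gaussian classifier
model = GaussianNB()

# Train the model on the training set
model.fit(X_train, y_train)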

GaussianNB(priors=None, var_smoothing=1e-09)

In [17]:
# Predict the observation
predicted = model.predict(observation)
print(predicted)

['yes']
