**Aim: Implement a Naive Bayes classifier: Weather Example**

# Step 1: Import necessary libraries.
We will use the `preprocessing` and `naive_bayes` modules of scikit-learn, plus pandas to display the data as a table.

In [1]:
from sklearn import preprocessing
from sklearn.naive_bayes import GaussianNB, MultinomialNB
import pandas as pd

# Step 2: Prepare dataset.
Create the feature lists for weather and temperature, and the class label `play`.

In [2]:
weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy','Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']

temp = ['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild',
        'Cool','Mild','Mild','Mild','Hot','Mild']

play=['No','No','Yes','Yes','Yes','No','Yes','No','Yes',
      'Yes','Yes','Yes','Yes','No']

In [3]:
df = pd.DataFrame(list(zip(weather,temp,play)), columns=['Weather', 'Temperature', 'Play'])
df.head()

|   | Weather | Temperature | Play |
|---|---|---|---|
| 0 | Sunny | Hot | No |
| 1 | Sunny | Hot | No |
| 2 | Overcast | Hot | Yes |
| 3 | Rainy | Mild | Yes |
| 4 | Rainy | Cool | Yes |


# Step 3: Encode the categorical data as integers

In [4]:
# Create a LabelEncoder; each fit_transform call below refits it on the given column
le = preprocessing.LabelEncoder()

# Convert string labels into integer codes (assigned in sorted order of the labels)
weather_encoded=le.fit_transform(weather)
print("Weather:" ,weather_encoded)

Weather: [2 2 0 1 1 1 0 2 2 1 2 0 0 1]


In [5]:
temp_encoded=le.fit_transform(temp)
label=le.fit_transform(play)

print("Temp:",temp_encoded)
print("Play:",label)

Temp: [1 1 1 2 0 0 0 2 0 2 2 2 1 2]
Play: [0 0 1 1 1 0 1 0 1 1 1 1 1 0]
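
`LabelEncoder` assigns integer codes in sorted order of the distinct labels, which is where comments such as `0:Overcast` and `1:Hot` in the later cells come from. A minimal plain-Python sketch of that mapping follows; the helper name `encode` is hypothetical and only mimics the encoder for illustration.

```python
def encode(values):
    """Map each distinct label to its index in sorted order (mimics LabelEncoder)."""
    classes = sorted(set(values))
    mapping = {label: code for code, label in enumerate(classes)}
    return mapping, [mapping[v] for v in values]

# Same lists as in Step 2.
weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
temp = ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool', 'Mild',
        'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild']

weather_map, weather_codes = encode(weather)
temp_map, temp_codes = encode(temp)
print("Weather mapping:", weather_map)
print("Temp mapping:", temp_map)
```

The integer order is alphabetical, not meaningful: `Mild` (2) is not "greater than" `Hot` (1) in any physical sense.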


# Step 4: Merge different features to prepare dataset

In [6]:
# Combine weather and temp into a single tuple of (weather, temp) pairs
features=tuple(zip(weather_encoded,temp_encoded))
print("Features:",features)

Features: ((2, 1), (2, 1), (0, 1), (1, 2), (1, 0), (1, 0), (0, 0), (2, 2), (2, 0), (1, 2), (2, 2), (0, 2), (0, 1), (1, 2))


# Step 5: Train the Naive Bayes classifier

In [7]:
#Create a Classifier
model=MultinomialNB()
# Train the model using the training sets
model.fit(features,label)

MultinomialNB()
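
`MultinomialNB` is intended for count features, so it reads each encoded value (0, 1, 2) as a count rather than as a category label. The sketch below reproduces that computation by hand, assuming scikit-learn's default smoothing `alpha=1.0` and class priors taken from the class frequencies. The function names `fit_multinomial` and `predict` are hypothetical, written only for this illustration.

```python
import math

# Label-encoded data copied from the cells above.
weather_encoded = [2, 2, 0, 1, 1, 1, 0, 2, 2, 1, 2, 0, 0, 1]
temp_encoded = [1, 1, 1, 2, 0, 0, 0, 2, 0, 2, 2, 2, 1, 2]
label = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0]
rows = list(zip(weather_encoded, temp_encoded))

ALPHA = 1.0  # assumption: scikit-learn's default smoothing value


def fit_multinomial(rows, labels, alpha=ALPHA):
    """Return {class: (log prior, per-feature log probabilities)}."""
    params = {}
    n_features = len(rows[0])
    for c in sorted(set(labels)):
        class_rows = [r for r, y in zip(rows, labels) if y == c]
        # The encoded values are summed per feature as if they were counts.
        totals = [sum(r[j] for r in class_rows) for j in range(n_features)]
        denom = sum(totals) + alpha * n_features
        log_theta = [math.log((t + alpha) / denom) for t in totals]
        log_prior = math.log(len(class_rows) / len(rows))
        params[c] = (log_prior, log_theta)
    return params


def predict(params, x):
    """Pick the class with the highest joint log-likelihood for input x."""
    scores = {c: lp + sum(xi * lt for xi, lt in zip(x, log_theta))
              for c, (lp, log_theta) in params.items()}
    return max(scores, key=scores.get)


params = fit_multinomial(rows, label)
print("Predicted Value:", predict(params, [2, 2]))  # 2:Sunny, 2:Mild
```

Because the features enter as multipliers of the log probabilities, an input of `[2, 2]` is scored as "feature 0 seen twice, feature 1 seen twice", which is not the same quantity as the per-category frequency P(Weather=Sunny | Play).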

# Step 6: Predict output for new data

In [8]:
#Predict Output
predicted= model.predict([[0,2]]) # 0:Overcast, 2:Mild
print("Predicted Value:", predicted)

Predicted Value: [1]


In [9]:
predicted= model.predict([[0,1]]) # 0:Overcast, 1:Hot
print("Predicted Value:", predicted)

Predicted Value: [1]


In [10]:
predicted= model.predict([[2,2]]) # 2:Sunny, 2:Mild

print("Predicted Value:", predicted)

Predicted Value: [1]


# Exercise:

**Manually calculate the output for the following cases and compare it with the model's output.**

1. Will you play if the temperature is 'Hot' and the weather is 'Overcast'?

2. Will you play if the temperature is 'Mild' and the weather is 'Sunny'?










In [11]:
print(df.shape)
df.head(14)

(14, 3)


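|   | Weather | Temperature | Play |
|---|---|---|---|
| 0 | Sunny | Hot | No |
| 1 | Sunny | Hot | No |
| 2 | Overcast | Hot | Yes |
| 3 | Rainy | Mild | Yes |
| 4 | Rainy | Cool | Yes |
| 5 | Rainy | Cool | No |
| 6 | Overcast | Cool | Yes |
| 7 | Sunny | Mild | No |
| 8 | Sunny | Cool | Yes |
| 9 | Rainy | Mild | Yes |
| 10 | Sunny | Mild | Yes |
| 11 | Overcast | Mild | Yes |
| 12 | Overcast | Hot | Yes |
| 13 | Rainy | Mild | No |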


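Notation: P(X/Y) denotes the conditional probability of X given Y. All values below are relative frequencies in the 14 rows of the dataset.

- A - Weather
- B - Temperature
- C - Play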

- P(C=Yes) = 9/14
- P(C=No) = 5/14

- P(A=Overcast) = 4/14
- P(A=Rainy) = 5/14
- P(A=Sunny) = 5/14

- P(B=Hot) = 4/14
- P(B=Mild) = 6/14
- P(B=Cool) = 4/14

- P(A=Overcast/C=Yes) = 4/9
- P(A=Rainy/C=Yes) = 3/9
- P(A=Sunny/C=Yes) = 2/9

- P(A=Overcast/C=No) = 0
- P(A=Rainy/C=No) = 2/5
- P(A=Sunny/C=No) = 3/5

- P(B=Hot/C=Yes) = 2/9
- P(B=Mild/C=Yes) = 4/9
- P(B=Cool/C=Yes) = 3/9

- P(B=Hot/C=No) = 2/5
- P(B=Mild/C=No) = 2/5
- P(B=Cool/C=No) = 1/5
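
The priors and conditional probabilities listed above can be checked by counting. The following is a minimal sketch using exact fractions and no smoothing; the helper name `conditional` is hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Same lists as in Step 2.
weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
temp = ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool', 'Mild',
        'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild']
play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes',
        'Yes', 'Yes', 'Yes', 'Yes', 'No']

# Class priors P(C=c).
class_counts = Counter(play)
prior = {c: Fraction(n, len(play)) for c, n in class_counts.items()}


def conditional(feature, labels):
    """Return P(feature=v / class=c) as exact fractions, keyed by (v, c)."""
    joint = Counter(zip(feature, labels))  # missing pairs count as 0
    return {(v, c): Fraction(joint[(v, c)], class_counts[c])
            for v in set(feature) for c in class_counts}


p_weather = conditional(weather, play)
p_temp = conditional(temp, play)

print("P(C=Yes) =", prior['Yes'])
print("P(A=Overcast/C=No) =", p_weather[('Overcast', 'No')])
print("P(B=Mild/C=Yes) =", p_temp[('Mild', 'Yes')])
```

`Fraction` reduces automatically, so 3/9 is printed as 1/3.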

# Exercise-1

**P(C=Yes/A=Overcast,B=Hot) = ( P(A=Overcast/C=Yes) * P(B=Hot/C=Yes) * P(C=Yes) ) / (P(A=Overcast,B=Hot))**
- (4/9 * 2/9 * 9/14) / (P(A=Overcast,B=Hot))
- (0.0635) / (P(A=Overcast,B=Hot))

Compute the denominator by marginalizing over the class C, applying the same naive Bayes factorization to each term:

**P(A=Overcast,B=Hot) = P(A=Overcast,B=Hot,C=Yes) + P(A=Overcast,B=Hot,C=No)**
- (4/9 * 2/9 * 9/14) + (0 * 2/5 * 5/14)
- 0.0635

**Final Answer**
- (0.0635) / (P(A=Overcast,B=Hot))
- (0.0635) / (0.0635)
- 1
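
The calculation above is an instance of the general naive Bayes rule, with the denominator obtained by summing the numerator over both classes. Written in the notation of this section (A weather, B temperature, C play; the `/` used here corresponds to the usual conditioning bar), and assuming A and B are conditionally independent given C:

```latex
P(C=c \mid A=a, B=b)
  = \frac{P(A=a \mid C=c)\, P(B=b \mid C=c)\, P(C=c)}
         {\sum_{c' \in \{\mathrm{Yes},\,\mathrm{No}\}}
            P(A=a \mid C=c')\, P(B=b \mid C=c')\, P(C=c')}

% Exercise 1: a = Overcast, b = Hot, c = Yes
P(C=\mathrm{Yes} \mid A=\mathrm{Overcast}, B=\mathrm{Hot})
  = \frac{\tfrac{4}{9}\cdot\tfrac{2}{9}\cdot\tfrac{9}{14}}
         {\tfrac{4}{9}\cdot\tfrac{2}{9}\cdot\tfrac{9}{14}
          + 0\cdot\tfrac{2}{5}\cdot\tfrac{5}{14}}
  = 1
```

The result is exactly 1 because no Overcast row has Play = No, which makes the No term vanish.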

In [12]:
print("Expected O/P Probability = 1, Yes")
predicted = model.predict([[0,1]])
print("Predicted Value:", predicted)

Expected O/P Probability = 1, Yes
Predicted Value: [1]


# Exercise-2

**P(C=Yes/A=Sunny,B=Mild) = ( P(A=Sunny/C=Yes) * P(B=Mild/C=Yes) * P(C=Yes) ) / (P(A=Sunny,B=Mild))**
- (2/9 * 4/9 * 9/14) / (P(A=Sunny,B=Mild))
- (0.0635) / (P(A=Sunny,B=Mild))

Compute the denominator by marginalizing over the class C, applying the same naive Bayes factorization to each term:

**P(A=Sunny,B=Mild) = P(A=Sunny,B=Mild,C=Yes) + P(A=Sunny,B=Mild,C=No)**
- (2/9 * 4/9 * 9/14) + (3/5 * 2/5 * 5/14)
- (0.0635) + (0.0857)
- 0.1492

**Final Answer**
- (0.0635) / (P(A=Sunny,B=Mild))
- (0.0635) / (0.1492)
- 0.43 (below 0.5, so the predicted class is No)
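
Both exercises can be verified with exact arithmetic. The sketch below computes the unsmoothed naive Bayes posterior directly from the rows; the function name `posterior` is hypothetical, and it assumes the queried combination has a non-zero score for at least one class (otherwise the division fails).

```python
from fractions import Fraction

# Same lists as in Step 2.
weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
temp = ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool', 'Mild',
        'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild']
play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes',
        'Yes', 'Yes', 'Yes', 'Yes', 'No']
rows = list(zip(weather, temp, play))


def posterior(w, t, target='Yes'):
    """Exact P(Play=target / Weather=w, Temperature=t), without smoothing."""
    scores = {}
    for c in set(play):
        in_class = [r for r in rows if r[2] == c]
        n_c = len(in_class)
        p_w = Fraction(sum(r[0] == w for r in in_class), n_c)  # P(A=w / C=c)
        p_t = Fraction(sum(r[1] == t for r in in_class), n_c)  # P(B=t / C=c)
        scores[c] = p_w * p_t * Fraction(n_c, len(rows))       # times P(C=c)
    # Marginalize over the classes to get the denominator.
    return scores[target] / sum(scores.values())


ex1 = posterior('Overcast', 'Hot')
ex2 = posterior('Sunny', 'Mild')
print("Exercise 1:", ex1)
print("Exercise 2:", ex2, "=", round(float(ex2), 2))
```

Exercise 2 evaluates to 20/47, which rounds to 0.43.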

In [13]:
print("Expected O/P Probability = 0.43, No")
predicted= model.predict([[2,2]]) # 2:Sunny, 2:Mild
print("Predicted Value:", predicted)

Expected O/P Probability = 0.43, No
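Predicted Value: [1]

The model predicts Yes (1), whereas the manual calculation gives No. The two differ because `MultinomialNB` treats the encoded values 0, 1, 2 as counts rather than as category labels, and applies smoothing by default, so it does not estimate the per-category frequencies used in the manual calculation.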
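
One way to check the manual result against scikit-learn is `CategoricalNB`, which models each feature as a categorical variable. This is an alternative to the notebook's `MultinomialNB`, not what the cells above use. The sketch assumes a very small `alpha` to approximate the unsmoothed hand calculation; the default `alpha=1.0` would apply Laplace smoothing and give different probabilities.

```python
from sklearn.naive_bayes import CategoricalNB

# Label-encoded data copied from the cells above.
weather_encoded = [2, 2, 0, 1, 1, 1, 0, 2, 2, 1, 2, 0, 0, 1]
temp_encoded = [1, 1, 1, 2, 0, 0, 0, 2, 0, 2, 2, 2, 1, 2]
label = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0]
features = [list(pair) for pair in zip(weather_encoded, temp_encoded)]

# Assumption: alpha=1e-10 stands in for "no smoothing" so the result is
# comparable with the manual calculation.
cat_model = CategoricalNB(alpha=1e-10)
cat_model.fit(features, label)

proba = cat_model.predict_proba([[2, 2]])  # 2:Sunny, 2:Mild
print("P(No), P(Yes):", proba[0])
print("Predicted Value:", cat_model.predict([[2, 2]]))
```

With this setup the probability of Yes for a sunny, mild day comes out at about 0.43, in line with the manual result.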
