# Section 1: Naïve Bayes with Categorical Data - Manual Calculation

Return to the worked example in the slides.

In [25]:
import numpy as np
import pandas as pd
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

**1. Suppose tomorrow will be mild, rainy, and windy, with high humidity. Should we play golf tomorrow? Show how the answer was calculated.**

In [26]:
df1 = pd.read_csv('PlayGolfNext.csv')
df2 = pd.read_csv('PlayGolf.csv')

In [27]:
print(df1.head())

                  Day   Outlook Temperature Humidity  Windy
0  Day After Tomorrow  Overcast        Cool     High  False
1            Tomorrow     Rainy        Mild     High   True
2               Today     Sunny         Hot   Normal  False


In [28]:
print(df2.head())

    Outlook Temperature Humidity  Windy PlayGolf
0     Rainy         Hot     High  False       No
1     Rainy         Hot     High   True       No
2  Overcast         Hot     High  False      Yes
3     Sunny        Mild     High  False      Yes
4     Sunny        Cool   Normal  False      Yes


In [1]:
# Score for playing = P(Mild|Yes) * P(Rainy|Yes) * P(Windy|Yes) * P(High humidity|Yes) * P(Yes)
probP = (4/9) * (2/9) * (3/9) * (3/9) * (9/14)
print("Probability of playing:", probP)

# Score for not playing = P(Mild|No) * P(Rainy|No) * P(Windy|No) * P(High humidity|No) * P(No)
probNP = (2/5) * (3/5) * (3/5) * (4/5) * (5/14)
print("Probability of not playing:", probNP)

Probability of playing: 0.007054673721340387
Probability of not playing: 0.04114285714285714


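We should not play golf tomorrow: the score for not playing (about 0.041) is higher than the score for playing (about 0.007). These values are unnormalized, so they are proportional to the posterior probabilities rather than probabilities themselves, but the comparison between them is unaffected.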
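The calculation above can also be derived by counting directly from the data and then normalizing the two scores so they sum to 1. The following is a minimal sketch, not the method from the slides. The `rows` list is transcribed by hand from the PlayGolf.csv printouts in this notebook (rows 5 to 13 are decoded from the encoded table in Section 2), and the helper name `likelihood` is hypothetical.

```python
from fractions import Fraction

# (Outlook, Temperature, Humidity, Windy, PlayGolf), transcribed by hand
# from the notebook printouts; treat this transcription as an assumption.
rows = [
    ("Rainy", "Hot", "High", False, "No"),
    ("Rainy", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Sunny", "Mild", "High", False, "Yes"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Sunny", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Rainy", "Mild", "High", False, "No"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", True, "Yes"),
    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),
    ("Sunny", "Mild", "High", True, "No"),
]

def likelihood(label, query):
    """Class prior times the product of per-feature conditional frequencies."""
    subset = [r for r in rows if r[4] == label]
    score = Fraction(len(subset), len(rows))  # prior, e.g. 9/14 for "Yes"
    for i, value in enumerate(query):
        matches = sum(1 for r in subset if r[i] == value)
        score *= Fraction(matches, len(subset))
    return score

# Tomorrow: rainy, mild, high humidity, windy
tomorrow = ("Rainy", "Mild", "High", True)
score_yes = likelihood("Yes", tomorrow)
score_no = likelihood("No", tomorrow)

# Normalize so the two scores can be read as posterior probabilities
total = score_yes + score_no
post_yes = score_yes / total
post_no = score_no / total

print("Score (play):", float(score_yes))
print("Score (not play):", float(score_no))
print("Posterior (play):", round(float(post_yes), 3))
print("Posterior (not play):", round(float(post_no), 3))
```

Under these assumptions the normalized posterior for not playing is roughly 0.85, which leads to the same decision as comparing the raw scores.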

**2. Is the assumption that outlook and humidity are independent reasonable? Explain why or why not.**

Hint: Think about the different values for outlook.

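The assumption is not entirely reasonable, because outlook and humidity are physically related. Rainy and overcast conditions tend to coincide with high humidity, while sunny conditions tend to be drier, so knowing the outlook tells us something about the likely humidity.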
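One rough way to probe the independence assumption on this data set is to compare the overall share of high-humidity days with the share within each outlook. This is only an illustrative sketch: the `pairs` list is transcribed by hand from the PlayGolf.csv printouts in this notebook, and 14 rows are far too few to draw a firm conclusion either way.

```python
from collections import Counter

# (Outlook, Humidity) pairs, transcribed by hand from the notebook printouts
pairs = [
    ("Rainy", "High"), ("Rainy", "High"), ("Overcast", "High"),
    ("Sunny", "High"), ("Sunny", "Normal"), ("Sunny", "Normal"),
    ("Overcast", "Normal"), ("Rainy", "High"), ("Rainy", "Normal"),
    ("Sunny", "Normal"), ("Rainy", "Normal"), ("Overcast", "High"),
    ("Overcast", "Normal"), ("Sunny", "High"),
]

outlook_counts = Counter(outlook for outlook, _ in pairs)
high_counts = Counter(outlook for outlook, hum in pairs if hum == "High")

# Marginal P(Humidity = High)
p_high = sum(high_counts.values()) / len(pairs)

# Conditional P(Humidity = High | Outlook); under independence these
# would all equal the marginal value
p_high_given = {o: high_counts[o] / n for o, n in outlook_counts.items()}

print("P(High):", p_high)
for outlook, p in p_high_given.items():
    print(f"P(High | {outlook}):", p)
```

In this sample the conditional shares differ from the marginal only modestly, so the argument against independence rests mainly on how weather behaves in general rather than on these 14 rows.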

# Section 2: Naïve Bayes with Categorical Data - Sklearn

**Again, we are using the example in the slides. The data set for this problem is PlayGolf.csv. We will be using CategoricalNB from sklearn. The link is for sklearn documentation where sample code can be found.**

**1. Temperature and Outlook are ordinal variables. Instead of converting them to dummy variables, they need to be recoded as ordinal variables. Use this link for guidance.**

In [30]:
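# Explicit category order for each ordinal column (first label gets code 0)
orderT = ['Cool', 'Mild', 'Hot']
orderO = ['Sunny', 'Overcast', 'Rainy']

# The lists in `categories` must follow the same column order used in fit_transform
encoder = OrdinalEncoder(categories=[orderO, orderT])

df2[['Outlook', 'Temperature']] = encoder.fit_transform(df2[['Outlook', 'Temperature']])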
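To make the encoding concrete, the mapping amounts to replacing each label with its position in the explicitly ordered list. The sketch below reproduces that lookup in plain Python for a few rows. The helper `ordinal_code` is hypothetical and is not part of sklearn, and the conversion to float is only there to mirror the float codes that appear when the encoded data is printed.

```python
orderO = ['Sunny', 'Overcast', 'Rainy']
orderT = ['Cool', 'Mild', 'Hot']

def ordinal_code(label, order):
    """Return the position of `label` in `order` as a float code."""
    return float(order.index(label))

# (Outlook, Temperature) for rows 0, 2, 3 and 4 of PlayGolf.csv
sample = [('Rainy', 'Hot'), ('Overcast', 'Hot'), ('Sunny', 'Mild'), ('Sunny', 'Cool')]

codes = [(ordinal_code(o, orderO), ordinal_code(t, orderT)) for o, t in sample]
print(codes)
```

A label missing from the ordered list raises a `ValueError` in this sketch, which is a reminder that every category in the data needs a place in the order.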

**2. The other variables need to be recoded to binary variables. Because Windy is Boolean, it does not need to be recoded.**

In [31]:
df2['Humidity'] = df2['Humidity'].map({'Normal': 0, 'High': 1})

In [32]:
print(df2)

    Outlook  Temperature  Humidity  Windy PlayGolf
0       2.0          2.0         1  False       No
1       2.0          2.0         1   True       No
2       1.0          2.0         1  False      Yes
3       0.0          1.0         1  False      Yes
4       0.0          0.0         0  False      Yes
5       0.0          0.0         0   True       No
6       1.0          0.0         0   True      Yes
7       2.0          1.0         1  False       No
8       2.0          0.0         0  False      Yes
9       0.0          1.0         0  False      Yes
10      2.0          1.0         0   True      Yes
11      1.0          1.0         1   True      Yes
12      1.0          2.0         0  False      Yes
13      0.0          1.0         1   True       No


**3. Fit the data using CategoricalNB.**

In [33]:
# Features and target (No -> 0, Yes -> 1)
X = df2[['Outlook', 'Temperature', 'Humidity', 'Windy']]
y = df2['PlayGolf'].map({'No': 0, 'Yes': 1})

model = CategoricalNB()
model.fit(X, y)

# Predictions on the training data itself
model.predict(X)

array([0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0])
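As a quick sanity check, the training predictions can be compared with the true labels. The sketch below copies both lists by hand from the outputs in this notebook (predictions from `model.predict(X)`, labels from the PlayGolf column), so it runs without the model or the CSV file.

```python
# Predictions printed by model.predict(X)
predicted = [0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0]
# True labels from the PlayGolf column (No -> 0, Yes -> 1)
actual = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0]

# Row indices where the model disagrees with the label
mismatches = [i for i, (p, a) in enumerate(zip(predicted, actual)) if p != a]
accuracy = 1 - len(mismatches) / len(actual)

print("Misclassified rows:", mismatches)
print("Training accuracy:", round(accuracy, 3))
```

Only row 5 is misclassified, giving a training accuracy of 13/14. Because this is measured on the same data used for fitting, it says little about performance on new days.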

**4. Using the data set, PlayGolfNext.csv, use your Naïve Bayes model to predict the next few days. Which days should you play golf?**

In [34]:
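# Reuse the encoder fitted on the training data (transform, not fit_transform)
df1[['Outlook', 'Temperature']] = encoder.transform(df1[['Outlook', 'Temperature']])
df1['Humidity'] = df1['Humidity'].map({'Normal': 0, 'High': 1})

# Same feature columns, in the same order, as used for fitting
X_new = df1[['Outlook', 'Temperature', 'Humidity', 'Windy']]

model.predict(X_new)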

array([1, 0, 1])
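The predictions follow the row order of PlayGolfNext.csv, so pairing them with the `Day` column answers the question directly. A small sketch, with the day labels and predictions copied by hand from the outputs above:

```python
# Row order of PlayGolfNext.csv as printed by df1.head()
days = ['Day After Tomorrow', 'Tomorrow', 'Today']
# Output of model.predict(X_new)
predictions = [1, 0, 1]

labels = {0: 'No', 1: 'Yes'}
recommendation = {day: labels[p] for day, p in zip(days, predictions)}
play_days = [day for day, rec in recommendation.items() if rec == 'Yes']

print(recommendation)
print("Play golf on:", play_days)
```

According to the model, we should play golf today and the day after tomorrow, but not tomorrow.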

**5. Does the recommendation (Yes or No to play golf) for today and tomorrow match the class example and your manual prediction above?**

Yes, both approaches recommend playing golf today and not playing tomorrow.
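The decisions agree, but the underlying probabilities need not be identical: `CategoricalNB` applies additive (Laplace) smoothing with `alpha=1.0` by default, whereas the manual calculation uses raw frequencies. The sketch below is a plain-Python approximation, assuming the smoothing formula given in the sklearn documentation, (count + alpha) / (class count + alpha * number of categories), and using rows transcribed by hand from the notebook printouts. The helper names are hypothetical.

```python
from fractions import Fraction

# (Outlook, Temperature, Humidity, Windy, PlayGolf), transcribed by hand
rows = [
    ("Rainy", "Hot", "High", False, "No"),
    ("Rainy", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Sunny", "Mild", "High", False, "Yes"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Sunny", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Rainy", "Mild", "High", False, "No"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", True, "Yes"),
    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),
    ("Sunny", "Mild", "High", True, "No"),
]

n_categories = [3, 3, 2, 2]  # Outlook, Temperature, Humidity, Windy
alpha = 1                    # assumed default smoothing strength

def smoothed_score(label, query):
    """Class prior times Laplace-smoothed conditional frequencies."""
    subset = [r for r in rows if r[4] == label]
    score = Fraction(len(subset), len(rows))  # prior is not smoothed here
    for i, value in enumerate(query):
        count = sum(1 for r in subset if r[i] == value)
        score *= Fraction(count + alpha, len(subset) + alpha * n_categories[i])
    return score

def posterior_yes(query):
    """Normalized probability of playing for one day's conditions."""
    yes = smoothed_score("Yes", query)
    no = smoothed_score("No", query)
    return float(yes / (yes + no))

next_days = {
    "Day After Tomorrow": ("Overcast", "Cool", "High", False),
    "Tomorrow": ("Rainy", "Mild", "High", True),
    "Today": ("Sunny", "Hot", "Normal", False),
}

for day, query in next_days.items():
    print(day, round(posterior_yes(query), 3))
```

With smoothing, the probability of playing tomorrow comes out near 0.24 rather than the roughly 0.15 implied by the unsmoothed scores, yet it stays below 0.5, so the recommendation is unchanged. The values returned by `model.predict_proba(X_new)` should be close to these, though that has not been verified here.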