
# Naive Bayes Classifier from Scratch
The "naive" part comes from the assumption that **all features are conditionally independent of each other given the class**. This simplifies the likelihood calculation:

$$P(\text{Features} | \text{Class}) = P(f_1 | \text{Class}) \cdot P(f_2 | \text{Class}) \cdot \dots \cdot P(f_n | \text{Class})$$

where $f_1, f_2, \dots, f_n$ are the individual features.

Combining these, the Naive Bayes Classifier predicts the class $k$ that maximizes:

$$P(\text{Class } k) \cdot \prod_{i=1}^{n} P(f_i | \text{Class } k)$$

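The decision rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the notebook's implementation: the function name `naive_bayes_score`, the class names `A` and `B`, and all probabilities are hypothetical values chosen for the example.

```python
from math import prod

def naive_bayes_score(prior, likelihoods):
    """Unnormalized score: P(Class k) times the product of P(f_i | Class k)."""
    return prior * prod(likelihoods)

# Hypothetical priors and per-feature likelihoods for two classes, three features
scores = {
    "A": naive_bayes_score(0.6, [0.5, 0.2, 0.4]),
    "B": naive_bayes_score(0.4, [0.3, 0.7, 0.5]),
}

# The predicted class is the one with the largest score
predicted = max(scores, key=scores.get)
print(scores, predicted)
```

Class `A` has the larger prior here, yet class `B` wins because its likelihoods are larger overall.
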
In [6]:
import pandas as pd
import numpy as np 


In [7]:
data = {
    'Outlook': ['Rainy', 'Rainy', 'Overcast', 'Sunny', 'Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Sunny', 'Rainy', 'Overcast', 'Overcast', 'Sunny'],
    'Temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool', 'Mild', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild'],
    'Humidity': ['High', 'High', 'High', 'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal', 'High'],
    'Windy': ['False', 'True', 'False', 'False', 'False', 'True', 'True', 'False', 'False', 'False', 'True', 'True', 'False', 'True'],
    'Play Golf': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
}
df_full = pd.DataFrame(data)
df_full

Unnamed: 0,Outlook,Temperature,Humidity,Windy,Play Golf
0,Rainy,Hot,High,False,No
1,Rainy,Hot,High,True,No
2,Overcast,Hot,High,False,Yes
3,Sunny,Mild,High,False,Yes
4,Sunny,Cool,Normal,False,Yes
5,Sunny,Cool,Normal,True,No
6,Overcast,Cool,Normal,True,Yes
7,Rainy,Mild,High,False,No
8,Rainy,Cool,Normal,False,Yes
9,Sunny,Mild,Normal,False,Yes
10,Rainy,Mild,Normal,True,Yes
11,Overcast,Mild,High,True,Yes
12,Overcast,Hot,Normal,False,Yes
13,Sunny,Mild,High,True,No


In [8]:
# Rows 0-11 are training, rows 12-13 are test
df_train = df_full.iloc[0:12]
df_test = df_full.iloc[12:14]

print("--- Training Data ---")
print(df_train)
print("\n--- Test Data ---")
print(df_test)

# Define features and target
features = ['Outlook', 'Temperature', 'Humidity', 'Windy']
target = 'Play Golf'

--- Training Data ---
     Outlook Temperature Humidity  Windy Play Golf
0      Rainy         Hot     High  False        No
1      Rainy         Hot     High   True        No
2   Overcast         Hot     High  False       Yes
3      Sunny        Mild     High  False       Yes
4      Sunny        Cool   Normal  False       Yes
5      Sunny        Cool   Normal   True        No
6   Overcast        Cool   Normal   True       Yes
7      Rainy        Mild     High  False        No
8      Rainy        Cool   Normal  False       Yes
9      Sunny        Mild   Normal  False       Yes
10     Rainy        Mild   Normal   True       Yes
11  Overcast        Mild     High   True       Yes

--- Test Data ---
     Outlook Temperature Humidity  Windy Play Golf
12  Overcast         Hot   Normal  False       Yes
13     Sunny        Mild     High   True        No


In [9]:
yes_train_count = df_train[df_train[target] == 'Yes'].shape[0]
no_train_count = df_train[df_train[target] == 'No'].shape[0]
total_train_samples = df_train.shape[0]

# Calculate prior probabilities
P_Yes = yes_train_count / total_train_samples
P_No = no_train_count / total_train_samples


print(f"Prior Probability P(Play Golf=Yes) (from training): {P_Yes:.2f}")
print(f"Prior Probability P(Play Golf=No) (from training): {P_No:.2f}")




Prior Probability P(Play Golf=Yes) (from training): 0.67
Prior Probability P(Play Golf=No) (from training): 0.33


In [11]:
df_train_yes = df_train[df_train[target] == 'Yes']
df_train_no = df_train[df_train[target] == 'No']

In [12]:
df_train_no

Unnamed: 0,Outlook,Temperature,Humidity,Windy,Play Golf
0,Rainy,Hot,High,False,No
1,Rainy,Hot,High,True,No
5,Sunny,Cool,Normal,True,No
7,Rainy,Mild,High,False,No


In [12]:
df_train_yes

Unnamed: 0,Outlook,Temperature,Humidity,Windy,Play Golf
2,Overcast,Hot,High,False,Yes
3,Sunny,Mild,High,False,Yes
4,Sunny,Cool,Normal,False,Yes
6,Overcast,Cool,Normal,True,Yes
8,Rainy,Cool,Normal,False,Yes
9,Sunny,Mild,Normal,False,Yes
10,Rainy,Mild,Normal,True,Yes
11,Overcast,Mild,High,True,Yes


In [19]:
def calculate_likelihoods_smoothed(feature_col, class_df, class_count, df_overall):
    """Return add-one (Laplace) smoothed P(value | class) for every value of feature_col."""
    likelihoods = {}
    value_counts = class_df[feature_col].value_counts()
    # Number of distinct values the feature takes in the full dataset
    num_possible_values = df_overall[feature_col].nunique()

    # Iterate over every value in the full dataset so that values
    # never seen in this class still get a nonzero probability
    for value in df_overall[feature_col].unique():
        count = value_counts.get(value, 0)  # 0 if the value never occurs in this class
        likelihoods[value] = (count + 1) / (class_count + num_possible_values)
    return likelihoods
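
Written as a formula, the function above appears to implement add-one (Laplace) smoothing. The symbols $N_k$ and $V_i$ are introduced here only for illustration: $N_k$ stands for `class_count` (the number of training rows in class $k$) and $V_i$ for `num_possible_values` (the number of distinct values of feature $f_i$ in the full dataset).

$$P(f_i = v | \text{Class } k) = \frac{\text{count}(f_i = v, \text{Class } k) + 1}{N_k + V_i}$$

For example, no training row in class No has Outlook = Overcast, so with $N_k = 4$ and $V_i = 3$ the smoothed estimate is $\frac{0 + 1}{4 + 3} = \frac{1}{7} \approx 0.143$ rather than $0$.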

In [20]:
print("--- Calculated Likelihoods (P(Outlook | Play Golf=Yes)) ---")
print("P_Rainy",2/8)
print("P_Overcast",3/8)
print("P_Sunny",3/8)


calculate_likelihoods_smoothed("Outlook", df_train_yes, yes_train_count, df_full)

--- Calculated Likelihoods (P(Outlook | Play Golf=Yes)) ---
P_Rainy 0.25
P_Overcast 0.375
P_Sunny 0.375


{'Rainy': np.float64(0.2727272727272727),
 'Overcast': np.float64(0.36363636363636365),
 'Sunny': np.float64(0.36363636363636365)}

In [21]:
print("--- Calculated Likelihoods (P(Outlook | Play Golf=No)) ---")
print("P_Rainy",3/4)
print("P_Overcast",0)
print("P_Sunny",1/4)


calculate_likelihoods_smoothed("Outlook", df_train_no, no_train_count, df_full)

--- Calculated Likelihoods (P(Outlook | Play Golf=No)) ---
P_Rainy 0.75
P_Overcast 0
P_Sunny 0.25


{'Rainy': np.float64(0.5714285714285714),
 'Overcast': 0.14285714285714285,
 'Sunny': np.float64(0.2857142857142857)}
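
The zero in the hand-computed `P_Overcast` above hints at why smoothing is useful: a single zero likelihood wipes out the whole product for that class, whatever the other features say. Here is a small sketch for test row 12 (Overcast, Hot, Normal, False) and class No, with counts read off the training table. The variable names are illustrative only.

```python
from math import prod

prior_no = 4 / 12  # 4 of the 12 training rows are No

# P(Overcast|No), P(Hot|No), P(Normal|No), P(Windy=False|No)
unsmoothed = [0 / 4, 2 / 4, 1 / 4, 2 / 4]
smoothed = [(0 + 1) / (4 + 3), (2 + 1) / (4 + 3), (1 + 1) / (4 + 2), (2 + 1) / (4 + 2)]

score_unsmoothed = prior_no * prod(unsmoothed)
score_smoothed = prior_no * prod(smoothed)

print(f"Unsmoothed score for No: {score_unsmoothed}")
print(f"Smoothed score for No:   {score_smoothed:.6f}")
```

Without smoothing the score for No is exactly zero, so the other three features can never influence it. With smoothing it stays small but positive.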

### Training the Naive Bayes Algorithm

In [22]:

likelihoods_yes = {}
likelihoods_no = {}

print("--- Likelihoods (P(Feature_Value | Play Golf=Yes)) ---")
for feature in features:
    print(feature)
    likelihoods_yes[feature] = calculate_likelihoods_smoothed(feature, df_train_yes, yes_train_count, df_full)
    print(f"{feature}: {likelihoods_yes[feature]}")


print()
print("--- Likelihoods (P(Feature_Value | Play Golf=No)) ---")
for feature in features:
    likelihoods_no[feature] = calculate_likelihoods_smoothed(feature, df_train_no, no_train_count, df_full)
    print(f"{feature}: {likelihoods_no[feature]}")

--- Likelihoods (P(Feature_Value | Play Golf=Yes)) ---
Outlook
Outlook: {'Rainy': np.float64(0.2727272727272727), 'Overcast': np.float64(0.36363636363636365), 'Sunny': np.float64(0.36363636363636365)}
Temperature
Temperature: {'Hot': np.float64(0.18181818181818182), 'Mild': np.float64(0.45454545454545453), 'Cool': np.float64(0.36363636363636365)}
Humidity
Humidity: {'High': np.float64(0.4), 'Normal': np.float64(0.6)}
Windy
Windy: {'False': np.float64(0.6), 'True': np.float64(0.4)}

--- Likelihoods (P(Feature_Value | Play Golf=No)) ---
Outlook: {'Rainy': np.float64(0.5714285714285714), 'Overcast': 0.14285714285714285, 'Sunny': np.float64(0.2857142857142857)}
Temperature: {'Hot': np.float64(0.42857142857142855), 'Mild': np.float64(0.2857142857142857), 'Cool': np.float64(0.2857142857142857)}
Humidity: {'High': np.float64(0.6666666666666666), 'Normal': np.float64(0.3333333333333333)}
Windy: {'False': np.float64(0.5), 'True': np.float64(0.5)}


In [23]:
df_test

Unnamed: 0,Outlook,Temperature,Humidity,Windy,Play Golf
12,Overcast,Hot,Normal,False,Yes
13,Sunny,Mild,High,True,No


### Predictions

In [24]:
for index, row in df_test.iterrows():
    ith_data = row[features].to_dict()
    print("ith_data : ",ith_data)

    true_label = row[target]

    # Look up the smoothed likelihood of each feature value given Yes
    p_outlook_yes = likelihoods_yes['Outlook'][ith_data['Outlook']]
    p_temperature_yes = likelihoods_yes['Temperature'][ith_data['Temperature']]
    p_humidity_yes = likelihoods_yes['Humidity'][ith_data['Humidity']]
    p_windy_yes = likelihoods_yes['Windy'][ith_data['Windy']]

    # Look up the smoothed likelihood of each feature value given No
    p_outlook_no = likelihoods_no['Outlook'][ith_data['Outlook']]
    p_temperature_no = likelihoods_no['Temperature'][ith_data['Temperature']]
    p_humidity_no = likelihoods_no['Humidity'][ith_data['Humidity']]
    p_windy_no = likelihoods_no['Windy'][ith_data['Windy']]

    # Unnormalized posterior scores: prior times the product of the likelihoods
    P_Play_Golf_Yes = P_Yes * p_outlook_yes * p_temperature_yes * p_humidity_yes * p_windy_yes
    P_Play_Golf_No = P_No * p_outlook_no * p_temperature_no * p_humidity_no * p_windy_no



    if P_Play_Golf_Yes > P_Play_Golf_No:
        predicted_label = 'Yes'
    else:
        predicted_label = 'No'

    print(f"True label: {true_label}, Predicted label: {predicted_label}")

    

ith_data :  {'Outlook': 'Overcast', 'Temperature': 'Hot', 'Humidity': 'Normal', 'Windy': 'False'}
True label: Yes, Predicted label: Yes
ith_data :  {'Outlook': 'Sunny', 'Temperature': 'Mild', 'Humidity': 'High', 'Windy': 'True'}
True label: No, Predicted label: Yes
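
The prediction loop above spells out each feature by hand. As a possible generalization, the sketch below loops over classes and features and adds log-probabilities instead of multiplying probabilities, a common way to avoid numerical underflow when there are many features. The function `predict` and the nested-dictionary layout are assumptions made for this example, not part of the notebook. The likelihoods are the smoothed values printed during training, written as fractions.

```python
import math

def predict(sample, priors, likelihoods):
    """Return (best_class, log_scores) for one sample.

    priors:      {class: P(class)}
    likelihoods: {class: {feature: {value: P(value | class)}}}
    """
    log_scores = {}
    for cls, prior in priors.items():
        log_score = math.log(prior)
        for feature, value in sample.items():
            log_score += math.log(likelihoods[cls][feature][value])
        log_scores[cls] = log_score
    best = max(log_scores, key=log_scores.get)
    return best, log_scores

priors = {'Yes': 8 / 12, 'No': 4 / 12}
likelihoods = {
    'Yes': {
        'Outlook': {'Rainy': 3 / 11, 'Overcast': 4 / 11, 'Sunny': 4 / 11},
        'Temperature': {'Hot': 2 / 11, 'Mild': 5 / 11, 'Cool': 4 / 11},
        'Humidity': {'High': 4 / 10, 'Normal': 6 / 10},
        'Windy': {'False': 6 / 10, 'True': 4 / 10},
    },
    'No': {
        'Outlook': {'Rainy': 4 / 7, 'Overcast': 1 / 7, 'Sunny': 2 / 7},
        'Temperature': {'Hot': 3 / 7, 'Mild': 2 / 7, 'Cool': 2 / 7},
        'Humidity': {'High': 4 / 6, 'Normal': 2 / 6},
        'Windy': {'False': 3 / 6, 'True': 3 / 6},
    },
}

# Test row 13 from the table above
sample = {'Outlook': 'Sunny', 'Temperature': 'Mild', 'Humidity': 'High', 'Windy': 'True'}
label, log_scores = predict(sample, priors, likelihoods)
print(label, log_scores)
```

Because the logarithm is monotonic, the class with the largest log score is also the class with the largest product, so this sketch reproduces the (incorrect) Yes prediction for row 13. A feature value missing from the dictionaries would raise a `KeyError`; handling that case is left out here.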


### Why do we skip dividing by the overall feature probability $P(F)$?

Let's look at the full Bayes' Theorem for a given class $k$ and a set of features $F = \{f_1, f_2, \dots, f_n\}$:

$$P(\text{Class } k | F) = \frac{P(F | \text{Class } k) \cdot P(\text{Class } k)}{P(F)}$$

When we're trying to classify a new instance, we calculate this value for **every possible class** and then pick the class that yields the maximum value.

For example, if you have two classes, "Yes" and "No", you'd calculate:

$P(\text{Yes } | F) = \frac{P(F | \text{Yes}) \cdot P(\text{Yes})}{P(F)}$

$P(\text{No } | F) = \frac{P(F | \text{No}) \cdot P(\text{No})}{P(F)}$

Notice that the denominator, $P(F)$, is the **same for both equations**. It's a constant value for the given set of features $F$ you are trying to classify.

When you're comparing two values:

Is $\frac{A}{C} > \frac{B}{C}$?

This is equivalent to asking:

Is $A > B$? (assuming $C$ is positive, which $P(F)$ is whenever the feature combination $F$ has nonzero probability)

Since $P(F)$ is a positive constant, dividing by it doesn't change the **relative order** of the posterior probabilities. If $P(\text{Class } A | F)$ is greater than $P(\text{Class } B | F)$ with $P(F)$ included, it will still be greater without $P(F)$.

Therefore, to save computation (calculating $P(F)$ means summing $P(F | \text{Class } k) \cdot P(\text{Class } k)$ over all classes), we simplify the comparison to:

**Choose the Class $k$ that maximizes:**

$$P(\text{Class } k) \cdot P(f_1 | \text{Class } k) \cdot P(f_2 | \text{Class } k) \cdot \dots \cdot P(f_n | \text{Class } k)$$

This is why we use the proportionality symbol ($\propto$) in the simplified Naive Bayes formula. We are essentially maximizing the numerator of Bayes' Theorem, as the denominator doesn't affect the final class decision.
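
To make the argument concrete, the sketch below normalizes the two scores for test row 13 (Sunny, Mild, High, True). It assumes the smoothed likelihoods printed during training and takes $P(F)$ under the naive model to be the sum of the numerators over both classes. The variable names are illustrative.

```python
# Unnormalized scores: prior times the smoothed likelihoods for row 13
score_yes = (8 / 12) * (4 / 11) * (5 / 11) * (4 / 10) * (4 / 10)
score_no = (4 / 12) * (2 / 7) * (2 / 7) * (4 / 6) * (3 / 6)

# P(F) is the sum of the numerators over all classes
p_f = score_yes + score_no

posterior_yes = score_yes / p_f
posterior_no = score_no / p_f

print(f"Unnormalized: Yes={score_yes:.4f}, No={score_no:.4f}")
print(f"Posterior:    Yes={posterior_yes:.2f}, No={posterior_no:.2f}")
```

Dividing by `p_f` rescales both scores so they sum to 1 (roughly 0.66 for Yes and 0.34 for No), but Yes is the larger value both before and after, so the predicted class is unchanged.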