## Naive Bayes Classifier

Naive Bayes is a probabilistic classification technique based on Bayes' theorem. It is called "naive" because it assumes the features are conditionally independent of one another given the class.

Naive Bayes can work with both categorical and continuous features; the variant used depends on the feature type:

1. For categorical features, categorical Naive Bayes is applied, with likelihoods estimated from frequency counts.

2. For continuous features, Gaussian Naive Bayes is usually applied, with likelihoods taken from a normal probability density function fitted to each class.
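
To make the continuous case concrete, here is a minimal sketch of how a Gaussian likelihood could be computed for one feature and one class. It is not part of the original notebook: the function name `gaussian_likelihood` and the temperature readings in `temps_yes` are hypothetical, chosen only for illustration.

```python
import math

def gaussian_likelihood(x, mean, std):
    """Density of a normal distribution N(mean, std^2) evaluated at x."""
    coeff = 1.0 / (std * math.sqrt(2.0 * math.pi))
    exponent = -((x - mean) ** 2) / (2.0 * std ** 2)
    return coeff * math.exp(exponent)

# Hypothetical temperatures (in degrees Celsius) on days when tennis was played
temps_yes = [21.0, 24.0, 22.0, 25.0, 23.0]

# Fit the class-conditional Gaussian: sample mean and sample standard deviation
mean_yes = sum(temps_yes) / len(temps_yes)
var_yes = sum((t - mean_yes) ** 2 for t in temps_yes) / (len(temps_yes) - 1)
std_yes = math.sqrt(var_yes)

# Likelihood of observing 22.5 degrees given PlayTennis=Yes
print(gaussian_likelihood(22.5, mean_yes, std_yes))
```

The returned value is a density rather than a probability, so it can exceed 1 for narrow distributions; it is used in the product in the same way a frequency-based likelihood would be.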



![image.png](attachment:image.png)


P(h): Probability of hypothesis h being true. This is known as the prior probability of h. 

P(D): Probability of the data D, regardless of the hypothesis. This is known as the evidence (or marginal likelihood).

P(h|D): Probability of hypothesis h given the data D. This is known as posterior probability. 

P(D|h): Probability of the data D given that the hypothesis h is true. This is known as the likelihood.
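
In the notation defined above, Bayes' theorem ties the four quantities together. It is written out here for reference, on the assumption that this is the relation the attached figure shows:

$$
P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)}
$$

For a classifier with class $Y$ and features $X_1, \dots, X_n$, the naive conditional-independence assumption lets the likelihood factor into per-feature terms:

$$
P(Y \mid X_1, \dots, X_n) \propto P(Y) \prod_{i=1}^{n} P(X_i \mid Y)
$$

Because $P(D)$ is the same for every class, it can be dropped when only the most probable class is needed.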

### Steps for Implementation:

1. Prepare the dataset.
2. Calculate prior probabilities for each class (e.g., whether tennis can be played or not).
3. Calculate likelihoods: $P(X_i \mid Y)$ for each feature, assuming conditional independence.
4. Predict the class for a new instance using the product of the prior and likelihoods.
5. Evaluate the model.

In [None]:
#Importing Libraries

import pandas as pd
import numpy as np

### 1. Prepare the dataset

In [None]:
#Prepare the dataset

data = {
    'Outlook': ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain', 'Rain', 'Overcast', 'Sunny', 'Sunny', 'Rain', 'Sunny', 'Overcast', 'Overcast', 'Rain'],
    'Temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool', 'Mild', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild'],
    'Humidity': ['High', 'High', 'High', 'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal', 'High'],
    'Wind': ['Weak', 'Strong', 'Weak', 'Weak', 'Weak', 'Strong', 'Strong', 'Weak', 'Weak', 'Weak', 'Strong', 'Strong', 'Weak', 'Strong'],
    'PlayTennis': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
}

In [None]:
# Convert to a DataFrame
df = pd.DataFrame(data)
print(df)

### 2. Calculate Prior Probabilities

The prior probability is the probability of a class occurring regardless of the feature values. Here, that means the probability of playing tennis (Yes) or not playing tennis (No), estimated as the fraction of rows with each label.

In [None]:
# Prior probability P(PlayTennis)
prior_play_yes = df['PlayTennis'].value_counts()['Yes'] / len(df)
prior_play_no = df['PlayTennis'].value_counts()['No'] / len(df)

print(f"P(PlayTennis=Yes): {prior_play_yes}")
print(f"P(PlayTennis=No): {prior_play_no}")
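
As a cross-check on the priors, the same calculation can be sketched with exact fractions using only the standard library. The variable names `play_tennis`, `counts`, and `priors` are illustrative; the labels are copied from the `PlayTennis` column above.

```python
from collections import Counter
from fractions import Fraction

# Labels copied from the PlayTennis column of the dataset
play_tennis = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
               'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']

# Count each class, then divide by the number of rows
counts = Counter(play_tennis)
priors = {label: Fraction(n, len(play_tennis)) for label, n in counts.items()}

for label, p in priors.items():
    print(f"P(PlayTennis={label}) = {p} ≈ {float(p):.4f}")
# P(PlayTennis=No) = 5/14 ≈ 0.3571
# P(PlayTennis=Yes) = 9/14 ≈ 0.6429
```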

### 3. Calculate Likelihoods

For each feature value (e.g., Outlook=Sunny, Temperature=Hot, etc.), calculate the likelihood given the class (Yes or No), i.e., the fraction of rows in that class that have this feature value.

![image.png](attachment:image.png)

In [None]:
# Function to calculate likelihood P(feature_value | target_class)

def likelihood(feature, value, target_class):
    feature_given_class = df[(df[feature] == value) & (df['PlayTennis'] == target_class)]
    class_count = df['PlayTennis'].value_counts()[target_class]
    return len(feature_given_class) / class_count

# Calculate likelihoods for a sample instance with PlayTennis='Yes'
likelihood_outlook_yes = likelihood('Outlook', 'Sunny', 'Yes')
likelihood_temperature_yes = likelihood('Temperature', 'Cool', 'Yes')
likelihood_humidity_yes = likelihood('Humidity', 'Normal', 'Yes')
likelihood_wind_yes = likelihood('Wind', 'Weak', 'Yes')

print(f"P(Outlook=Sunny | PlayTennis=Yes): {likelihood_outlook_yes}")
print(f"P(Temperature=Cool | PlayTennis=Yes): {likelihood_temperature_yes}")
print(f"P(Humidity=Normal | PlayTennis=Yes): {likelihood_humidity_yes}")
print(f"P(Wind=Weak | PlayTennis=Yes): {likelihood_wind_yes}")
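
One caveat with raw frequency counts: a feature value that never occurs together with a class gets a likelihood of 0, which zeroes out the entire product for that class. In this dataset, Outlook=Overcast never occurs with PlayTennis=No. A common remedy is Laplace (add-one) smoothing. The sketch below is not part of the original notebook; `smoothed_likelihood` and its `alpha` parameter are illustrative names, and it uses plain lists rather than the DataFrame so that it runs on its own.

```python
from fractions import Fraction

# Columns copied from the dataset above
outlook = ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain', 'Rain', 'Overcast',
           'Sunny', 'Sunny', 'Rain', 'Sunny', 'Overcast', 'Overcast', 'Rain']
play_tennis = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
               'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']

def smoothed_likelihood(feature_values, labels, value, target_class, alpha=1):
    """Estimate P(value | target_class) with add-alpha (Laplace) smoothing."""
    n_categories = len(set(feature_values))  # distinct values of this feature
    class_count = sum(1 for y in labels if y == target_class)
    joint_count = sum(1 for x, y in zip(feature_values, labels)
                      if x == value and y == target_class)
    return Fraction(joint_count + alpha, class_count + alpha * n_categories)

# With smoothing, the unseen combination gets a small non-zero likelihood
print(smoothed_likelihood(outlook, play_tennis, 'Overcast', 'No'))           # 1/8
# With alpha=0 the estimate reduces to the raw frequency, which is zero here
print(smoothed_likelihood(outlook, play_tennis, 'Overcast', 'No', alpha=0))  # 0
```

The smoothed estimates for a feature still sum to 1 within each class, because `alpha` is added once per category in the denominator.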


### 4. Predict using Naive Bayes

Now, for a given set of feature values, we can multiply the prior by the likelihoods to get a score for each class (Yes or No) that is proportional to its posterior probability, and choose the class with the highest score.

In [None]:
# Function to predict the class (Yes or No) for new data
def predict(outlook, temperature, humidity, wind):
    # Likelihood for PlayTennis=Yes
    yes_likelihood = (likelihood('Outlook', outlook, 'Yes') *
                      likelihood('Temperature', temperature, 'Yes') *
                      likelihood('Humidity', humidity, 'Yes') *
                      likelihood('Wind', wind, 'Yes'))
    
    # Likelihood for PlayTennis=No
    no_likelihood = (likelihood('Outlook', outlook, 'No') *
                     likelihood('Temperature', temperature, 'No') *
                     likelihood('Humidity', humidity, 'No') *
                     likelihood('Wind', wind, 'No'))
    
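    # Unnormalized posterior for PlayTennis=Yes (the evidence P(D) is omitted,
    # since it is the same for both classes and does not affect the comparison)
    posterior_yes = prior_play_yes * yes_likelihood

    # Unnormalized posterior for PlayTennis=No
    posterior_no = prior_play_no * no_likelihood

    print(f"Unnormalized posterior for Yes: {posterior_yes}")
    print(f"Unnormalized posterior for No: {posterior_no}")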
    
    # Predict the class with the highest posterior probability
    if posterior_yes > posterior_no:
        return 'Yes'
    else:
        return 'No'

# Test the prediction on a new instance
prediction = predict('Sunny', 'Cool', 'Normal', 'Weak')
print(f"Prediction for Outlook=Sunny, Temperature=Cool, Humidity=Normal, Wind=Weak: {prediction}")
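
The two values compared in `predict` are products of a prior and likelihoods, so they do not sum to 1. Dividing each by their sum turns them into proper posterior probabilities. The sketch below works this through for the instance Outlook=Sunny, Temperature=Cool, Humidity=Normal, Wind=Weak, using counts read off the dataset above; the helper name `normalize` is illustrative and not part of the original notebook.

```python
from fractions import Fraction

def normalize(scores):
    """Rescale per-class scores so that they sum to 1."""
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}

# prior * P(Sunny|c) * P(Cool|c) * P(Normal|c) * P(Weak|c) for each class c
scores = {
    'Yes': Fraction(9, 14) * Fraction(2, 9) * Fraction(3, 9) * Fraction(6, 9) * Fraction(6, 9),
    'No': Fraction(5, 14) * Fraction(3, 5) * Fraction(1, 5) * Fraction(1, 5) * Fraction(2, 5),
}

posteriors = normalize(scores)
for label, p in posteriors.items():
    print(f"P(PlayTennis={label} | x) = {p} ≈ {float(p):.4f}")
# P(PlayTennis=Yes | x) = 500/581 ≈ 0.8606
# P(PlayTennis=No | x) = 81/581 ≈ 0.1394
```

Normalizing does not change which class wins; it only makes the numbers interpretable as probabilities.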


### 5. Evaluate the Model

In [None]:
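# Test data (note: these four instances also appear in the training set,
# so the accuracy below is a training accuracy, not a held-out estimate)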
test_data = [
    ['Sunny', 'Cool', 'Normal', 'Weak'],
    ['Overcast', 'Hot', 'High', 'Weak'],
    ['Rain', 'Mild', 'High', 'Strong'],
    ['Sunny', 'Mild', 'Normal', 'Strong'],
]

# True labels for the test data
test_labels = ['Yes', 'Yes', 'No', 'Yes']



In [None]:
# Evaluate the model: compare each prediction with its true label
correct_predictions = 0
for test_instance, actual in zip(test_data, test_labels):
    prediction = predict(*test_instance)
    print(f"Predicted: {prediction}, Actual: {actual}")
    if prediction == actual:
        correct_predictions += 1

accuracy = correct_predictions / len(test_data)
print(f"Model Accuracy: {accuracy * 100:.2f}%")
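
Accuracy alone does not show which kinds of mistakes a classifier makes. A small confusion table, counting each (actual, predicted) pair, is one way to see that. The sketch below is illustrative only: `confusion_counts` is a hypothetical helper, and the `predicted` list is made up (it deliberately contains one error) rather than taken from the model above.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))

actual = ['Yes', 'Yes', 'No', 'Yes']
predicted = ['Yes', 'No', 'No', 'Yes']  # hypothetical predictions with one mistake

counts = confusion_counts(actual, predicted)
for a in ('Yes', 'No'):
    for p in ('Yes', 'No'):
        # Counter returns 0 for pairs that never occurred
        print(f"Actual={a}, Predicted={p}: {counts[(a, p)]}")
```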