In [None]:
"""
Q1. A company conducted a survey of its employees and found that 70% of the employees use the
company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the
probability that an employee is a smoker given that he/she uses the health insurance plan?
"""

In [None]:
"""
No further calculation is needed to solve this problem, because the question states the answer directly.

Let A be the event that an employee is a smoker and B be the event that an employee uses the health insurance plan. We want to find P(A|B), the probability that an employee is a smoker given that he/she uses the health insurance plan.

From the problem, we know that:

P(B) = 0.7 (the proportion of employees who use the health insurance plan)
P(A|B) = 0.4 (the proportion of smokers among the employees who use the health insurance plan)

The second figure is exactly the quantity asked for. As a check, the definition of conditional probability gives the same value:

P(A and B) = P(A|B) * P(B) = 0.4 * 0.7 = 0.28
P(A|B) = P(A and B) / P(B) = 0.28 / 0.7 = 0.4

Bayes' theorem, P(A|B) = P(B|A) * P(A) / P(B), is not required here: P(B|A) and P(A) are not given in the problem, and P(A|B) is already known.

Therefore, the probability that an employee is a smoker given that he/she uses the health insurance plan is 0.4 or 40%.
"""
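
A quick numeric check of the conditional probability in Q1 can be sketched as follows. The head count of 1,000 employees is a hypothetical figure chosen only to turn the proportions into counts; it is not part of the problem.

```python
# Hypothetical workforce size, used only to turn proportions into counts.
total_employees = 1000

p_plan = 0.7                # P(B): employee uses the health insurance plan
p_smoker_given_plan = 0.4   # P(A|B): smokers among plan users (given in the problem)

plan_users = total_employees * p_plan
smokers_on_plan = plan_users * p_smoker_given_plan

# Definition of conditional probability: P(A|B) = P(A and B) / P(B)
p_joint = smokers_on_plan / total_employees
p_conditional = p_joint / p_plan

print(f"P(A and B) = {p_joint:.2f}")
print(f"P(A|B)     = {p_conditional:.2f}")
```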

In [None]:
"""
Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?
"""

In [None]:
"""
Bernoulli Naive Bayes and Multinomial Naive Bayes are two common variations of the Naive Bayes algorithm used in machine learning for classification tasks. Here are the key differences between them:

Data Representation: Bernoulli Naive Bayes is designed for binary data, where the features can take on only two values, typically 0 or 1, whereas Multinomial Naive Bayes is designed for count-based data, where the features represent the frequency of occurrence of a particular word or term in a document.

Feature Dependency: Both variants make the same naive assumption that features are conditionally independent of each other given the class variable. Neither one models dependencies between features; they differ in the likelihood assigned to each feature, not in the independence assumption.

Probability Distribution: Bernoulli Naive Bayes models each feature as a Bernoulli distribution, so it uses the probability that a term is present or absent and explicitly penalizes the absence of a term. Multinomial Naive Bayes models the vector of term counts in a document as a draw from a multinomial distribution, so every occurrence of a term contributes to the likelihood and absent terms contribute nothing.

Typical Use: Bernoulli Naive Bayes is typically used for text classification tasks where the presence or absence of a term is more important than its frequency, such as short documents. Multinomial Naive Bayes is better suited for tasks where the frequency of terms carries information, such as longer documents represented by word counts.

In summary, Bernoulli Naive Bayes and Multinomial Naive Bayes share the conditional independence assumption and differ in data representation, likelihood model, and typical use. The choice of which algorithm to use depends on the nature of the data and the requirements of the classification task.
"""
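
The difference in data representation can be illustrated with a minimal sketch. The term-count matrix and labels below are made up for illustration. With its default `binarize=0.0`, scikit-learn's `BernoulliNB` thresholds every feature, so it learns the same parameters from raw counts as from a 0/1 presence matrix, while `MultinomialNB` uses the counts themselves.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Toy term-count matrix (hypothetical): rows are documents, columns are terms.
X_counts = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [0, 4, 1],
    [0, 1, 2],
])
y = np.array([1, 1, 0, 0])

# BernoulliNB thresholds each feature at `binarize` (default 0.0),
# so it only sees whether a term is present.
bnb_on_counts = BernoulliNB().fit(X_counts, y)
bnb_on_binary = BernoulliNB().fit((X_counts > 0).astype(int), y)

# MultinomialNB estimates per-class term probabilities from the counts.
mnb = MultinomialNB().fit(X_counts, y)

same = np.allclose(bnb_on_counts.feature_log_prob_, bnb_on_binary.feature_log_prob_)
print("BernoulliNB ignores count magnitude:", same)
print("MultinomialNB feature log-probabilities:\n", mnb.feature_log_prob_)
```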

In [None]:
"""
Q3. How does Bernoulli Naive Bayes handle missing values?
"""

In [None]:
"""
Bernoulli Naive Bayes assumes that each feature is binary, taking on the values 0 or 1, and represents the presence or absence of a particular attribute or word in a document. The algorithm itself has no built-in mechanism for missing values. In scikit-learn, BernoulliNB raises an error if the input contains NaN, so missing values have to be dealt with before fitting.

A common shortcut is to encode a missing feature as 0, which treats it as if the corresponding feature were absent from the document. Because Bernoulli Naive Bayes explicitly penalizes absent features, this can distort the class probabilities and reduce the accuracy of the classification model when many values are missing.

To avoid this, one approach is to impute the missing values with a reasonable estimate, such as the mode (most frequent value) of the feature in the training set. The mean is less suitable because it is generally not 0 or 1. Another approach is to use more advanced imputation techniques, such as k-nearest neighbor (k-NN) or multiple imputation, to estimate the missing values based on the values of the other features.

It is important to note that the approach to handling missing values depends on the nature of the data and the specific requirements of the classification task. In some cases, imputing missing values may improve the accuracy of the model, while in other cases, it may introduce bias or overfitting.
"""
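
Mode imputation before Bernoulli Naive Bayes can be sketched with a scikit-learn pipeline. The small feature matrix and labels are hypothetical, and `SimpleImputer(strategy="most_frequent")` is one possible choice of imputer, not the only one.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Toy binary feature matrix (hypothetical) with missing entries marked as NaN.
X = np.array([
    [1.0, 0.0, np.nan],
    [1.0, np.nan, 0.0],
    [0.0, 1.0, 1.0],
    [np.nan, 1.0, 1.0],
])
y = np.array([1, 1, 0, 0])

# Fitting directly on data that contains NaN is expected to fail.
try:
    BernoulliNB().fit(X, y)
    raised = False
except ValueError:
    raised = True

# Replace each missing entry with the most frequent value of its column,
# then fit the classifier on the completed matrix.
model = make_pipeline(SimpleImputer(strategy="most_frequent"), BernoulliNB())
model.fit(X, y)
predictions = model.predict(X)

print("Direct fit raised ValueError:", raised)
print("Predictions after imputation:", predictions)
```

Keeping the imputer inside the pipeline means the fill values are learned from the training data only, which avoids leaking information from validation folds.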

In [None]:
"""
Q4. Can Gaussian Naive Bayes be used for multi-class classification?
"""

In [None]:
"""
Yes, Gaussian Naive Bayes can be used for multi-class classification. In multi-class classification, the goal is to predict a target variable that can take on more than two possible values. Gaussian Naive Bayes is a probabilistic algorithm that models the distribution of each feature in each class as a Gaussian (normal) distribution, and uses Bayes' theorem to calculate the probability of each class given the feature values.

Gaussian Naive Bayes handles any number of classes directly. For each class it estimates a prior probability and a per-feature mean and variance, computes the posterior probability of every class for a new instance, and predicts the class with the highest posterior. No decomposition into binary problems is needed.

Wrapper strategies such as one-vs-rest (one binary classifier per class) or one-vs-one (one classifier per pair of classes) exist for algorithms that are inherently binary. They can be applied to Gaussian Naive Bayes, but they add training cost and are not required for multi-class prediction.

In summary, Gaussian Naive Bayes can be used for multi-class classification by computing the posterior probability of each class and choosing the class with the highest value.
"""
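
A minimal sketch of multi-class prediction with `GaussianNB`, using the three-class Iris dataset that ships with scikit-learn as a stand-in example:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# Iris has three classes and is bundled with scikit-learn (no download needed).
X, y = load_iris(return_X_y=True)

gnb = GaussianNB().fit(X, y)

# One posterior probability per class for each sample.
proba = gnb.predict_proba(X[:5])
predicted = gnb.predict(X[:5])

print("Classes:", gnb.classes_)
print("Posterior shape:", proba.shape)
print("Predicted:", predicted)
```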

In [None]:
"""

Q5. Data preparation:
Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase).
This dataset contains email messages, where the goal is to predict whether a message
is spam or not based on several input features.
Implementation:
Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the
scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the
dataset. You should use the default hyperparameters for each classifier.
Results:
Report the following performance metrics for each classifier:
Accuracy
Precision
Recall
F1 score
Discussion:
Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is
the case? Are there any limitations of Naive Bayes that you observed?
Conclusion:
Summarise your findings and provide some suggestions for future work.
"""
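
The evaluation protocol described in Q5 (default hyperparameters, 10-fold cross-validation, four metrics) can be sketched with `cross_validate`. The helper name `evaluate_nb_variants` and the synthetic demo data are assumptions made so the block runs on its own; with the real dataset, the feature matrix and labels loaded from `spambase.data` would be passed instead.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB


def evaluate_nb_variants(X, y, folds=10):
    """Return the mean accuracy, precision, recall and F1 per classifier."""
    models = {
        "BernoulliNB": BernoulliNB(),
        "MultinomialNB": MultinomialNB(),
        "GaussianNB": GaussianNB(),
    }
    scoring = ["accuracy", "precision", "recall", "f1"]
    summary = {}
    for name, model in models.items():
        # An integer cv with a classifier uses stratified folds.
        scores = cross_validate(model, X, y, cv=folds, scoring=scoring)
        summary[name] = {m: float(np.mean(scores[f"test_{m}"])) for m in scoring}
    return summary


# Synthetic stand-in data; MultinomialNB needs non-negative features.
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)
X_demo = np.abs(X_demo)

results = evaluate_nb_variants(X_demo, y_demo)
for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```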

In [22]:
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB
# Load the dataset into a pandas DataFrame
df = pd.read_csv("spambase.data", header=None)

# Assign column names to the DataFrame
df.columns = [
    "word_freq_make", "word_freq_address", "word_freq_all", "word_freq_3d",
    "word_freq_our", "word_freq_over", "word_freq_remove", "word_freq_internet",
    "word_freq_order", "word_freq_mail", "word_freq_receive", "word_freq_will",
    "word_freq_people", "word_freq_report", "word_freq_addresses", "word_freq_free",
    "word_freq_business", "word_freq_email", "word_freq_you", "word_freq_credit",
    "word_freq_your", "word_freq_font", "word_freq_000", "word_freq_money",
    "word_freq_hp", "word_freq_hpl", "word_freq_george", "word_freq_650",
    "word_freq_lab", "word_freq_labs", "word_freq_telnet", "word_freq_857",
    "word_freq_data", "word_freq_415", "word_freq_85", "word_freq_technology",
    "word_freq_1999", "word_freq_parts", "word_freq_pm", "word_freq_direct",
    "word_freq_cs", "word_freq_meeting", "word_freq_original", "word_freq_project",
    "word_freq_re", "word_freq_edu", "word_freq_table", "word_freq_conference",
    "char_freq_;", "char_freq_(", "char_freq_[", "char_freq_!", "char_freq_$",
    "char_freq_#", "capital_run_length_average", "capital_run_length_longest",
    "capital_run_length_total", "is_spam"
]


In [28]:
df.head()

Unnamed: 0,word_freq_make,word_freq_address,word_freq_all,word_freq_3d,word_freq_our,word_freq_over,word_freq_remove,word_freq_internet,word_freq_order,word_freq_mail,...,char_freq_;,char_freq_(,char_freq_[,char_freq_!,char_freq_$,char_freq_#,capital_run_length_average,capital_run_length_longest,capital_run_length_total,is_spam
0,0.0,0.64,0.64,0.0,0.32,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.778,0.0,0.0,3.756,61,278,1
1,0.21,0.28,0.5,0.0,0.14,0.28,0.21,0.07,0.0,0.94,...,0.0,0.132,0.0,0.372,0.18,0.048,5.114,101,1028,1
2,0.06,0.0,0.71,0.0,1.23,0.19,0.19,0.12,0.64,0.25,...,0.01,0.143,0.0,0.276,0.184,0.01,9.821,485,2259,1
3,0.0,0.0,0.0,0.0,0.63,0.0,0.31,0.63,0.31,0.63,...,0.0,0.137,0.0,0.137,0.0,0.0,3.537,40,191,1
4,0.0,0.0,0.0,0.0,0.63,0.0,0.31,0.63,0.31,0.63,...,0.0,0.135,0.0,0.135,0.0,0.0,3.537,40,191,1


In [29]:
X = df.iloc[:,:-1]

In [30]:
y = df.iloc[:,-1]

In [31]:
y

0       1
1       1
2       1
3       1
4       1
       ..
4596    0
4597    0
4598    0
4599    0
4600    0
Name: is_spam, Length: 4601, dtype: int64

In [32]:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

In [36]:



# Create an instance of the Bernoulli Naive Bayes classifier
bnb = BernoulliNB()

# Define a grid of hyperparameters to search over
param_grid = {'alpha': [0.1, 1.0, 10.0],
              'binarize': [0.0, 0.5, 1.0]}

# Use GridSearchCV to exhaustively search over the hyperparameter grid and find the best combination of hyperparameters
grid1 = GridSearchCV(bnb, param_grid, cv=10, scoring='accuracy')
grid1.fit(X_train, y_train)




In [37]:
grid1.best_params_

{'alpha': 0.1, 'binarize': 0.5}
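
With its default `refit=True`, `GridSearchCV` refits the best parameter combination on all data passed to `fit`, so the tuned model is available as `best_estimator_` without constructing it again by hand. A minimal sketch on synthetic stand-in data (the `X_demo`, `X_tr` and related names are hypothetical):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import BernoulliNB

# Synthetic stand-in data so the example runs without spambase.data.
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=0)

param_grid = {"alpha": [0.1, 1.0, 10.0], "binarize": [0.0, 0.5, 1.0]}
grid = GridSearchCV(BernoulliNB(), param_grid, cv=10, scoring="accuracy")
grid.fit(X_tr, y_tr)

# The refitted model carries the selected hyperparameters.
best_model = grid.best_estimator_
test_accuracy = best_model.score(X_te, y_te)

print("Best parameters:", grid.best_params_)
print("Mean CV accuracy:", round(grid.best_score_, 3))
print("Held-out accuracy:", round(test_accuracy, 3))
```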

In [40]:
bernolli_clr =  BernoulliNB(alpha=0.1, binarize= 0.5)
bernolli_clr.fit(X_train, y_train)

In [41]:
y_pred_bernolli = bernolli_clr.predict(X_test)

In [43]:
from sklearn.metrics import accuracy_score, confusion_matrix

In [44]:
conf_mat = confusion_matrix(y_test,y_pred_bernolli)
conf_mat

array([[747,  75],
       [ 75, 484]])

In [45]:
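# Rows are true labels and columns are predicted labels, ordered [0, 1].
# Class 0 (not spam) is treated as the positive class here.
true_positive = conf_mat[0][0]
false_negative = conf_mat[0][1]
false_positive = conf_mat[1][0]
true_negative = conf_mat[1][1]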

In [46]:
Accuracy = (true_positive + true_negative) / (true_positive +false_positive + false_negative + true_negative)
Accuracy

0.8913830557566981

In [48]:
Precision = true_positive/(true_positive+false_positive)
Precision

0.9087591240875912

In [49]:
Recall = true_positive/(true_positive+false_negative)
Recall

0.9087591240875912

In [50]:
F1_Score = 2*(Recall * Precision) / (Recall + Precision)
F1_Score

0.9087591240875912
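
For comparison, the same metrics can be computed with spam (class 1) as the positive class. scikit-learn lays out a binary confusion matrix for labels [0, 1] as [[TN, FP], [FN, TP]], so `ravel()` unpacks it in that order. This sketch copies the Bernoulli Naive Bayes matrix shown in the output of In [44].

```python
import numpy as np

# Confusion matrix copied from the Bernoulli Naive Bayes output above.
conf_mat = np.array([[747, 75],
                     [75, 484]])

# scikit-learn layout for labels [0, 1]: [[TN, FP], [FN, TP]].
tn, fp, fn, tp = conf_mat.ravel()

accuracy = (tp + tn) / conf_mat.sum()
precision = tp / (tp + fp)   # of the messages flagged as spam, how many are spam
recall = tp / (tp + fn)      # of the spam messages, how many were flagged
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```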

In [58]:
# Create an instance of the Multinomial Naive Bayes classifier
mnb = MultinomialNB()

# Define a grid of hyperparameters to search over
param_grid = {'alpha': [0.1, 1.0, 10.0]
              }

# Use GridSearchCV to exhaustively search over the hyperparameter grid and find the best combination of hyperparameters
grid2 = GridSearchCV(mnb, param_grid, cv=10, scoring='accuracy')
grid2.fit(X_train, y_train)

In [59]:
grid2.best_params_

{'alpha': 0.1}

In [61]:
Multinomial_clr =  MultinomialNB(alpha=0.1)
Multinomial_clr.fit(X_train, y_train)

In [63]:
y_pred_Multinomial = Multinomial_clr.predict(X_test)

In [66]:
conf_mat = confusion_matrix(y_test,y_pred_Multinomial)
conf_mat

# Rows are true labels and columns are predicted labels, ordered [0, 1].
# Class 0 (not spam) is treated as the positive class here.
true_positive = conf_mat[0][0]
false_negative = conf_mat[0][1]
false_positive = conf_mat[1][0]
true_negative = conf_mat[1][1]

In [67]:
Accuracy = (true_positive + true_negative) / (true_positive +false_positive + false_negative + true_negative)
Accuracy

0.8110065170166546

In [68]:
Precision = true_positive/(true_positive+false_positive)
Precision

0.834326579261025

In [69]:
Recall = true_positive/(true_positive+false_negative)
Recall

0.851581508515815

In [70]:
F1_Score = 2*(Recall * Precision) / (Recall + Precision)
F1_Score

0.8428657435279953

In [71]:


# Create an instance of the Gaussian Naive Bayes classifier
gnb = GaussianNB()

# Define a grid of hyperparameters to search over
param_grid = {'var_smoothing': [1e-10, 1e-9, 1e-8, 1e-7]}

# Use GridSearchCV to exhaustively search over the hyperparameter grid and find the best value of var_smoothing
grid3 = GridSearchCV(gnb, param_grid, cv=10, scoring='accuracy')
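# Note: this search is fitted on the full X, y, unlike the two searches above,
# so the held-out test rows also influence the selected var_smoothing.
grid3.fit(X, y)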

In [72]:
grid3.best_params_

{'var_smoothing': 1e-09}

In [74]:
Gaussian_clr = GaussianNB(var_smoothing=1e-09)
Gaussian_clr.fit(X_train, y_train)

In [75]:
y_pred_Gaussian =  Gaussian_clr.predict(X_test)

In [79]:
conf_mat = confusion_matrix(y_test,y_pred_Gaussian)
conf_mat



array([[597, 225],
       [ 34, 525]])
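
Per-class precision, recall and F1 for this matrix can also be obtained from scikit-learn's own metric functions. The sketch below rebuilds label vectors that reproduce the Gaussian Naive Bayes confusion matrix above (the original `y_test` and predictions are not needed for this), then reports both classes.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Confusion matrix copied from the Gaussian Naive Bayes output above.
conf_mat = np.array([[597, 225],
                     [34, 525]])

# Rebuild label vectors that reproduce the matrix (rows: true, columns: predicted).
y_true = np.repeat([0, 0, 1, 1], conf_mat.ravel())
y_pred = np.repeat([0, 1, 0, 1], conf_mat.ravel())

precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1]
)
for label, name in enumerate(["not spam", "spam"]):
    print(f"{name}: precision={precision[label]:.4f} "
          f"recall={recall[label]:.4f} f1={f1[label]:.4f} n={support[label]}")
```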

In [80]:
# Rows are true labels and columns are predicted labels, ordered [0, 1].
# Class 0 (not spam) is treated as the positive class here.
true_positive = conf_mat[0][0]
false_negative = conf_mat[0][1]
false_positive = conf_mat[1][0]
true_negative = conf_mat[1][1]

In [81]:
Accuracy = (true_positive + true_negative) / (true_positive +false_positive + false_negative + true_negative)
Accuracy

0.8124547429398986

In [82]:
Precision = true_positive/(true_positive+false_positive)
Precision

0.9461172741679873

In [83]:
Recall = true_positive/(true_positive+false_negative)
Recall

0.7262773722627737

In [84]:
F1_Score = 2*(Recall * Precision) / (Recall + Precision)
F1_Score

0.8217481073640742