### Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?
Ans: \
To solve this problem, we can use the definition of **conditional probability**, the relation that Bayes' Theorem is built on. We're given the following information:

- \( P(H) = 0.70 \): The probability that an employee uses the company's health insurance plan.
- \( P(S|H) = 0.40 \): The probability that an employee is a smoker given that they use the health insurance plan.
- We need to find \( P(S|H) \), which is the probability that an employee is a smoker given that they use the health insurance plan. Note that this is the same quantity the problem already states as 40%.

### Applying the conditional probability formula:

The definition of **conditional probability**, which Bayes' Theorem is derived from, states:

$$
P(S|H) = \frac{P(S \cap H)}{P(H)}
$$

Where:
- \( P(S|H) \) is the probability that an employee is a smoker given that they use the health insurance plan (this is what we want to find).
- \( P(S \cap H) \) is the probability that an employee is both a smoker and uses the health insurance plan.
- \( P(H) \) is the probability that an employee uses the health insurance plan.

Given:
- \( P(S|H) = 0.40 \)
- \( P(H) = 0.70 \)

Now, we can rewrite this as:

$$
P(S \cap H) = P(S|H) \times P(H) = 0.40 \times 0.70 = 0.28
$$

So, the probability that an employee is both a smoker and uses the health insurance plan is \( 0.28 \) (28%).
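As a quick numeric check of the arithmetic above, here is a minimal Python sketch. The variable names are illustrative only; the two input values come from the problem statement. It computes the joint probability with the multiplication rule and then divides by \( P(H) \) to show that the conditional probability comes back out.

```python
# Values taken from the problem statement.
p_h = 0.70          # P(H): employee uses the health insurance plan
p_s_given_h = 0.40  # P(S|H): employee smokes, given that they use the plan

# Multiplication rule: P(S and H) = P(S|H) * P(H)
p_s_and_h = p_s_given_h * p_h

# Dividing the joint probability by P(H) recovers the conditional probability.
recovered = p_s_and_h / p_h

print(f"P(S and H) = {p_s_and_h:.2f}")
print(f"P(S|H)     = {recovered:.2f}")
```

The first line printed is the joint probability (0.28) and the second is the conditional probability (0.40).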

---

### Conclusion:
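The probability that an employee is a smoker given that they use the health insurance plan is **0.40** or **40%**, exactly as stated in the problem. The value **0.28** (28%) is the joint probability \( P(S \cap H) \) that an employee both smokes and uses the plan; dividing it by \( P(H) = 0.70 \) gives back 0.40.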

### Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?
Ans: \

###  **1. Bernoulli Naive Bayes**:
**Best for binary features (0/1)**

- **Features**: Assumes that features are binary (i.e., they take values of 0 or 1). This means each feature represents the presence or absence of a characteristic.
- **Model Assumption**: Each feature is treated as a Bernoulli random variable (which is a special case of a binomial distribution), where the outcome is either 0 or 1.
- **Likelihood**: For a given class, the likelihood of the features is computed based on whether each feature is present (1) or not (0).
- **Example Use Case**: Classifying spam/ham emails based on whether certain words appear in the email (presence/absence of words).

**Formula** for Bernoulli Naive Bayes:
$$
P(X_i = 1 \mid C) = P(\text{feature } i \text{ appears} \mid \text{class } C)
$$
$$
P(X_i = 0 \mid C) = 1 - P(\text{feature } i \text{ appears} \mid \text{class } C)
$$
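Under the naive independence assumption, the likelihood of a whole binary feature vector is the product of these per-feature terms, so absent features contribute \( 1 - p_i \) rather than being ignored. The sketch below evaluates that product in log space. The function name and the probabilities in `p_spam` are hypothetical and assumed to be already estimated.

```python
import math

def bernoulli_log_likelihood(x, p):
    """Return log P(x | C) for a binary vector x.

    p[i] is the (assumed, already estimated) probability that
    feature i is present in class C.
    """
    total = 0.0
    for x_i, p_i in zip(x, p):
        # Present features contribute log(p_i); absent ones log(1 - p_i).
        total += math.log(p_i) if x_i == 1 else math.log(1.0 - p_i)
    return total

# Hypothetical per-word presence probabilities for a "spam" class.
p_spam = [0.8, 0.1, 0.5]
doc = [1, 0, 1]  # words 0 and 2 present, word 1 absent

log_lik = bernoulli_log_likelihood(doc, p_spam)
print(math.exp(log_lik))  # equals 0.8 * (1 - 0.1) * 0.5
```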

---

###  **2. Multinomial Naive Bayes**:
**Best for count-based features**

- **Features**: Assumes that features are **counts** or **frequencies**. Features can take any non-negative integer value, usually representing the number of times a specific event occurred.
- **Model Assumption**: The likelihood of the features is modeled as a **multinomial distribution**. This distribution is used when we have categorical data with more than two possible outcomes and where the data represents counts of occurrences.
- **Likelihood**: For a given class, the likelihood of the features is computed based on the frequency of each feature (e.g., the count of words in a document).
- **Example Use Case**: Text classification tasks such as spam filtering, where features are word counts.

**Formula** for Multinomial Naive Bayes:
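$$
P(X_1, X_2, \dots, X_n \mid C) \propto \prod_{i=1}^{n} \theta_{C,i}^{X_i}
$$
where \( X_i \) is the count of feature \( i \) (e.g., word occurrences) in the document and \( \theta_{C,i} = P(\text{feature } i \mid C) \) is the probability of feature \( i \) in class \( C \). The proportionality sign hides the multinomial coefficient, which does not depend on the class and therefore does not affect which class is predicted.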
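The product above is usually evaluated in log space to avoid numerical underflow. A minimal sketch, assuming the per-class feature probabilities are already known: `theta_sports` and the function name are hypothetical, with `theta[i]` standing for the probability of feature \( i \) in the class.

```python
import math

def multinomial_log_likelihood(counts, theta):
    """Log-likelihood of a count vector, up to the class-independent
    multinomial coefficient.

    theta[i] is the (assumed) probability of feature i in the class.
    """
    # Features with a zero count contribute nothing to the sum.
    return sum(c * math.log(t) for c, t in zip(counts, theta) if c > 0)

# Hypothetical word probabilities for a "sports" class (they sum to 1).
theta_sports = [0.5, 0.3, 0.2]
counts = [2, 0, 1]  # word 0 appears twice, word 2 once

log_lik = multinomial_log_likelihood(counts, theta_sports)
print(math.exp(log_lik))  # equals 0.5**2 * 0.2**1
```

Unlike the Bernoulli model, a word that does not occur (count 0) leaves the likelihood unchanged instead of contributing a "probability of absence" term.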

---

###  **Key Differences**:

| Aspect                        | **Bernoulli Naive Bayes**                                | **Multinomial Naive Bayes**                              |
|-------------------------------|----------------------------------------------------------|----------------------------------------------------------|
| **Feature Type**               | Binary (0 or 1) - presence or absence of a feature.      | Count-based (integer values) - frequency of a feature.    |
| **Data Assumption**            | Assumes binary features (e.g., words appearing or not). | Assumes feature counts (e.g., word frequencies in text).  |
| **Likelihood Calculation**     | Uses Bernoulli distribution (success/failure).           | Uses Multinomial distribution (counts of events).         |
| **Typical Use Case**           | Email classification, sentiment analysis with binary features (e.g., presence/absence of words). | Text classification tasks with word frequency counts, document classification. |
| **Formula**                    | \( P(X_i = 1 \mid C) \) or \( P(X_i = 0 \mid C) \), based on presence or absence. | \( P(\text{feature } i \mid C)^{X_i} \), the per-feature probability raised to the count of occurrences. |

---

###  **When to Use Which?**
- **Bernoulli Naive Bayes**: Use when the features are binary. This is useful for text classification when you're interested in whether a word appears in a document or not (presence/absence).
- **Multinomial Naive Bayes**: Use when the features are count-based. This is useful when you have features that represent counts, such as the number of occurrences of words in a document.

---

### Example:

Consider a text classification problem where you want to classify documents as either "sports" or "politics":

- **Bernoulli Naive Bayes**: Treats each word in the document as a binary feature (e.g., "Is the word 'football' present in the document? Yes/No?").
- **Multinomial Naive Bayes**: Treats each word as a feature with a count (e.g., "How many times does the word 'football' appear in the document?").
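To make the two views of the same document concrete, here is a small sketch using only the Python standard library. The vocabulary and the document text are made up for illustration.

```python
from collections import Counter

# Hypothetical vocabulary and document, for illustration only.
vocabulary = ["football", "election", "goal"]
document = "football goal football team football"

word_counts = Counter(document.split())

# Multinomial-style features: how many times each vocabulary word occurs.
count_features = [word_counts[word] for word in vocabulary]

# Bernoulli-style features: whether each vocabulary word occurs at all.
binary_features = [1 if count > 0 else 0 for count in count_features]

print(count_features)   # [3, 0, 1]
print(binary_features)  # [1, 0, 1]
```

The Bernoulli representation discards the fact that "football" appears three times, which is exactly the information the Multinomial model uses.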

### Q3. How does **Bernoulli Naive Bayes** handle missing values?
Ans: \
**Bernoulli Naive Bayes** assumes that the features are binary, meaning each feature is either present (1) or absent (0). When it comes to **missing values**, there are a few approaches to handle them, though the algorithm does not have a direct built-in method for handling missing data.

Here are some common strategies to handle missing values when using **Bernoulli Naive Bayes**:

1. **Imputation**:
   - **Impute Missing Values**: You can replace missing values with a default value, typically the **mode** (most frequent value) of the feature. Mean imputation is a poor fit here because it generally produces non-binary values.
   - In the case of Bernoulli Naive Bayes, you could replace missing features with the most frequent value (either 0 or 1), assuming the missingness is not systematically related to the target class.

2. **Removing Instances**:
   - **Remove Instances with Missing Features**: If the number of missing values is small, a simple approach is to **drop** the instances (rows) with missing values.

3. **Using a "Missing" Category**:
   - **Treat Missing as a Separate Category**: You could also add an extra binary indicator feature for each affected column (1 if the original value was missing, 0 otherwise), which keeps all features binary. This helps when the fact that a value is missing is itself informative, but it may not make sense for every dataset.

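4. **Handling During Model Training**:
   - Because Naive Bayes multiplies independent per-feature terms, a custom implementation can leave a missing feature's term out of the product for that instance. If missing values are sparse this has little effect, but it is generally not recommended as a default because ignoring data that is not missing at random can result in biased estimates.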

In summary, **Bernoulli Naive Bayes** does not handle missing values automatically. You need to apply one of the above strategies to handle them before applying the classifier.
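The first three strategies can be sketched in plain Python as a preprocessing step. This is a minimal illustration, not a library API: the data, the use of `None` as the missing marker, and all function names are assumptions, and every column is assumed to have at least one observed value.

```python
from collections import Counter

# Hypothetical binary feature matrix; None marks a missing value.
rows = [
    [1, 0, None],
    [1, None, 1],
    [0, 0, 1],
    [1, 0, 0],
]

def impute_with_mode(data):
    """Replace each missing value with the most frequent observed value of its column."""
    modes = []
    for j in range(len(data[0])):
        observed = [row[j] for row in data if row[j] is not None]
        modes.append(Counter(observed).most_common(1)[0][0])
    return [[modes[j] if v is None else v for j, v in enumerate(row)] for row in data]

def drop_incomplete(data):
    """Keep only the rows that have no missing values."""
    return [row for row in data if None not in row]

def add_missing_indicators(data):
    """Fill missing values with 0 and append one 'was missing' flag per feature."""
    return [
        [0 if v is None else v for v in row] + [1 if v is None else 0 for v in row]
        for row in data
    ]

print(impute_with_mode(rows))
print(drop_incomplete(rows))
print(add_missing_indicators(rows))
```

Whichever variant is used, the result contains only 0/1 values and can be passed to a Bernoulli Naive Bayes classifier.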

---

### Q4. Can **Gaussian Naive Bayes** be used for **multi-class classification**?
Ans: \
Yes, **Gaussian Naive Bayes** can be used for **multi-class classification**.

- **Gaussian Naive Bayes** is a variant of Naive Bayes used when the features are **continuous** and assumed to follow a **Gaussian (normal) distribution** within each class. Each feature in a given class is modeled by its own Gaussian distribution, whose mean and variance are estimated from the training data for that class.

- **Multi-Class Classification**:
  - Naive Bayes classifiers (including **Gaussian Naive Bayes**) are naturally suited for **multi-class classification**: the algorithm computes a **posterior probability** for every class, however many there are, and predicts the class with the highest posterior.
  - No one-vs-rest or one-vs-one scheme is needed, because each additional class only adds one more prior and one more set of per-feature Gaussian parameters.

In the context of **Gaussian Naive Bayes**, the following happens during multi-class classification:
1. For each class, the model estimates a prior and, for every feature, a mean and a variance from the training data.
2. For a new instance, the likelihood of each feature value given the class is evaluated with that class's Gaussian probability density function (PDF); the per-feature densities are multiplied together and by the class prior.
3. The class with the highest posterior probability is chosen as the predicted class.

So, **Gaussian Naive Bayes** can handle **multi-class classification** by computing the posterior probabilities for each class and selecting the one with the highest probability.
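To illustrate that nothing changes when there are more than two classes, here is a small from-scratch sketch with three classes. It is a simplified illustration under stated assumptions, not a production implementation: the data, the function names, and the small constant added to each variance (to avoid division by zero) are all choices made for this example.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate a prior plus per-feature mean and variance for every class."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)

    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # 1e-9 is added to each variance to avoid division by zero (an assumption of this sketch).
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model

def predict(model, x):
    """Return the class with the highest (unnormalized) log posterior."""
    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            # Log of the Gaussian PDF for one feature.
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=lambda label: log_posterior(model[label]))

# Hypothetical two-feature data with three well-separated classes.
X = [[1.0, 1.1], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9], [9.0, 9.1], [9.2, 8.9]]
y = ["a", "a", "b", "b", "c", "c"]

model = fit_gaussian_nb(X, y)
print(predict(model, [5.05, 5.0]))  # b
```

The prediction step is a single `max` over all classes, so adding a fourth class only means adding more labeled rows to the training data.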