# Naive Bayes
## Bayes' Theorem & Its Building Block

In this module, you will learn about another supervised classification algorithm: **Naive Bayes**. Naive Bayes is a **probabilistic classifier**, which means it **returns the probability of a test point belonging to a class** rather than just a label for the test point.

### Conditional Probability

Before getting into Bayes’ theorem, let's look at conditional probability and build some intuition for it.

 ![91.png](attachment:3cce11fb-e35d-451a-93f1-539c2fc30a77.png)

 ![92.png](attachment:8b165111-3b18-42a9-9d1c-5ec0f8abcebe.png)

 

## Bayes' Theorem

In this section, you will learn how **Bayes’ theorem** is used to **calculate conditional probabilities**. The running example is one almost everyone can relate to: cricket.



While watching cricket matches on TV, you may have seen statistics like this: “India wins 70% of the matches in which Tendulkar scores a century.” Does that sound like conditional probability?

This is a classic example of how conditional probability can be used to estimate the chances of an event, given that certain other events have happened.



Suppose that India plays 100 matches, of which it wins 60 and loses 40. Sachin Tendulkar plays all 100 matches, scores a century in 12 of them and does not score a century in the remaining 88.



To make things interesting, you also have this additional information: of the 60 matches that India wins, Sachin scores a century in 10, and of the 40 matches that India loses, he scores a century in only two.



Here is the two-way contingency table for this case:

![93.png](attachment:51bedac4-5407-4654-96e7-6c0f6c5751a6.png)

Now, can you answer this question: **what is the probability that India wins, given that Sachin has scored a century?**

![94.png](attachment:468aa987-656d-4e33-a4e8-48223325dd33.png)
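As a quick check, the counts given in the text can be turned into P(Win | Century) in two ways: directly, by looking only at the 12 century matches, and through Bayes' theorem. The sketch below (variable names are illustrative) uses `fractions.Fraction` so that both routes give the exact same value.

```python
from fractions import Fraction

# Counts taken from the text: 100 matches in total
matches = 100
wins = 60
centuries = 12
wins_with_century = 10  # India won 10 of the 12 matches in which Sachin scored a century

# Direct route: restrict attention to the 12 century matches
p_win_given_century_direct = Fraction(wins_with_century, centuries)

# Bayes' theorem route: P(Win | Century) = P(Century | Win) * P(Win) / P(Century)
p_win = Fraction(wins, matches)
p_century = Fraction(centuries, matches)
p_century_given_win = Fraction(wins_with_century, wins)
p_win_given_century_bayes = p_century_given_win * p_win / p_century

print(p_win_given_century_direct, p_win_given_century_bayes)  # 5/6 5/6
```

Both routes give 5/6, or roughly 83%: when Sachin scores a century, India wins about 83% of the time.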


**Bayes' Theorem is a simple way to update what you know about something after getting new information.**

**Here’s how to think about it:**

- Imagine you have an initial belief about the chance of something happening (for example, a patient being sick).
- You get some new evidence (like a test result).
- Bayes’ Theorem helps you combine your initial belief with the new evidence to get an updated probability (how likely it is that the patient is sick after seeing the test result).

**In formula terms:**
$$
P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}
$$
- P(A): your initial belief (prior probability).
- P(B|A): how likely the new information is if A is true (likelihood).
- P(B): the overall probability of observing the new information (evidence).
- P(A|B): the updated probability after seeing the new information (posterior probability).

In simple words: **It’s a way to update your beliefs based on new evidence**.
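To make the patient example concrete, here is a minimal sketch. The numbers are **hypothetical**, chosen only for illustration (1% of patients are sick, the test flags 90% of sick patients and wrongly flags 5% of healthy ones), and P(B) is expanded with the law of total probability, P(B) = P(B|A)·P(A) + P(B|not A)·P(not A).

```python
def posterior(prior, p_evidence_given_a, p_evidence_given_not_a):
    """Return P(A | B) using Bayes' theorem.

    P(B) is expanded with the law of total probability:
    P(B) = P(B | A) * P(A) + P(B | not A) * P(not A)
    """
    p_evidence = p_evidence_given_a * prior + p_evidence_given_not_a * (1 - prior)
    return p_evidence_given_a * prior / p_evidence


# Hypothetical numbers for illustration only
p_sick_given_positive = posterior(
    prior=0.01,                  # P(sick)
    p_evidence_given_a=0.90,     # P(positive | sick)
    p_evidence_given_not_a=0.05, # P(positive | healthy)
)
print(round(p_sick_given_positive, 3))  # 0.154
```

Even after a positive test, the updated probability is only about 15%, because the prior (1%) was so low: the evidence raises the belief but does not replace it.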


### Naive Bayes for Categorical Data

You will learn the Naive Bayes classifier through a mushroom example, where the aim is to classify a new mushroom as edible or poisonous, and you will see how the algorithm uses probabilities to make that classification. This session covers the following steps:

- Bayes' theorem
- Naïve Bayes on categorical data

### Naive Bayes - With One Feature

**Naïve Bayes** is a **probabilistic classifier** that returns the probability of a test point belonging to a class, **using Bayes’ theorem**. 

Please find the Mushroom Dataset [here](https://ml-course2-upgrad.s3.amazonaws.com/Naive+Bayes/Naive+Bayes+For+Categorical+Data/Mushroom+Subset.xlsx) 

![95.png](attachment:a80b7502-ab3d-40ad-bab7-4492cc7be3ad.png)

You will now implement Naïve Bayes on this mushroom dataset and try to classify a new test point into either of the two classes – edible or poisonous. 

As the name suggests, the Naive Bayes algorithm uses Bayes’ theorem to classify new test points. 

But, how exactly does it do this? Let’s find out.

![96.png](attachment:ba6b8dfb-59be-420d-b307-a089dc6a6766.png)

- The **denominator P(x) is ignored** while comparing the classes: it is the same for both classes, so dropping it does not change which class comes out on top.

- The new test point is assigned the class for which the numerator, P(x | class) × P(class), is greater.
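The rule above can be sketched in a few lines of Python. This is an illustrative sketch, not the course's implementation: it hard-codes the 12-row cap-shape data from Table 1 in the exercise below, estimates every probability by counting, and keeps only the numerator P(x | class) × P(class) for each class. The function name `numerators` is hypothetical.

```python
from collections import Counter

# Table 1 (12 mushrooms): (type, cap shape)
TABLE_1 = [
    ("Poisonous", "Convex"), ("Edible", "Convex"), ("Poisonous", "Convex"),
    ("Edible", "Convex"), ("Edible", "Convex"), ("Poisonous", "Convex"),
    ("Edible", "Bell"), ("Edible", "Bell"), ("Edible", "Convex"),
    ("Poisonous", "Convex"), ("Edible", "Flat"), ("Edible", "Bell"),
]


def numerators(rows, x):
    """Return {class: P(x | class) * P(class)}, every probability estimated by counting."""
    n = len(rows)
    class_counts = Counter(label for label, _ in rows)
    joint_counts = Counter(rows)  # counts of (class, feature value) pairs
    return {
        c: (joint_counts[(c, x)] / class_counts[c]) * (class_counts[c] / n)
        for c in class_counts
    }


scores = numerators(TABLE_1, "Bell")
predicted = max(scores, key=scores.get)  # class with the larger numerator
print(scores)
print(predicted)  # Edible
```

No poisonous mushroom in Table 1 has a BELL cap, so P(BELL | poisonous) = 0 and the poisonous numerator is exactly 0; a BELL mushroom is therefore classified as edible.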

![97.png](attachment:f88f1bac-da16-4c24-bd91-1920e9744eaf.png)


## MCQ 

<div class="text_component"><p><strong><span style="font-size: 14px;">Comprehension - Part 1</span></strong><span style="font-size: 
      14px;"><br><br><br></span></p><p><span style="font-size: 
      14px;"><strong>Comprehension: Naive Bayes With One Feature</strong><br><br></span></p><div align="center"><table border="1" cellpadding="0" cellspacing="1"><tbody><tr><td colspan="3"><p><span style="font-size: 
      14px;">Table 1: Mushroom Dataset with One Feature</span></p></td></tr><tr><td><p><span style="font-size: 
      14px;"><strong>S.No</strong></span></p></td><td><p><span style="font-size: 
      14px;"><strong>Type of mushroom</strong></span></p></td><td><p><span style="font-size: 
      14px;"><strong>Cap shape</strong></span></p></td></tr><tr><td><p><span style="font-size: 
      14px;">1.</span></p></td><td><p><span style="font-size: 
      14px;">Poisonous</span></p></td><td><p><span style="font-size: 
      14px;">Convex</span></p></td></tr><tr><td><p><span style="font-size: 
      14px;">2.</span></p></td><td><p><span style="font-size: 
      14px;">Edible</span></p></td><td><p><span style="font-size: 
      14px;">Convex</span></p></td></tr><tr><td><p><span style="font-size: 
      14px;">3.</span></p></td><td><p><span style="font-size: 
      14px;">Poisonous</span></p></td><td><p><span style="font-size: 
      14px;">Convex</span></p></td></tr><tr><td><p><span style="font-size: 
      14px;">4.</span></p></td><td><p><span style="font-size: 
      14px;">Edible</span></p></td><td><p><span style="font-size: 
      14px;">Convex</span></p></td></tr><tr><td><p><span style="font-size: 
      14px;">5.</span></p></td><td><p><span style="font-size: 
      14px;">Edible</span></p></td><td><p><span style="font-size: 
      14px;">Convex</span></p></td></tr><tr><td><p><span style="font-size: 
      14px;">6.</span></p></td><td><p><span style="font-size: 
      14px;">Poisonous</span></p></td><td><p><span style="font-size: 
      14px;">Convex</span></p></td></tr><tr><td><p><span style="font-size: 
      14px;">7.</span></p></td><td><p><span style="font-size: 
      14px;">Edible</span></p></td><td><p><span style="font-size: 
      14px;">Bell</span></p></td></tr><tr><td><p><span style="font-size: 
      14px;">8.</span></p></td><td><p><span style="font-size: 
      14px;">Edible</span></p></td><td><p><span style="font-size: 
      14px;">Bell</span></p></td></tr><tr><td><p><span style="font-size: 
      14px;">9.</span></p></td><td><p><span style="font-size: 
      14px;">Edible</span></p></td><td><p><span style="font-size: 
      14px;">Convex</span></p></td></tr><tr><td><p><span style="font-size: 
      14px;">10.</span></p></td><td><p><span style="font-size: 
      14px;">Poisonous</span></p></td><td><p><span style="font-size: 
      14px;">Convex</span></p></td></tr><tr><td><p><span style="font-size: 
      14px;">11.</span></p></td><td><p><span style="font-size: 
      14px;">Edible</span></p></td><td><p><span style="font-size: 
      14px;">Flat</span></p></td></tr><tr><td><p><span style="font-size: 
      14px;">12.</span></p></td><td><p><span style="font-size: 
      14px;">Edible</span></p></td><td><p><span style="font-size: 
      14px;">Bell</span></p></td></tr></tbody></table></div><p><span style="font-size: 
      14px;">&nbsp;</span></p><p><span style="font-size: 
      14px;">Consider the table shown above. There are two types of mushrooms, edible and poisonous; the type is the target (dependent) variable. The mushrooms have various cap shapes. Of the 12 mushrooms, eight are edible and four are poisonous.</span></p><p><span style="font-size: 
      14px;">&nbsp;</span></p><p><span style="font-size: 
      14px;">You want to train Naive Bayes on this data so that it can predict whether a new mushroom is edible or poisonous.</span></p></div>


#### Q1. What is the feature in this task?

- [ ] Type of Mushroom
- [x] Cap-Shape

**Comprehension - Part 2**

Say you represent the two class labels as $C_k$, where k = 1 represents edible and k = 2 represents poisonous. The task is to predict the probability of a mushroom belonging to C1 or C2 using the feature ‘cap-shape’. You can write the class either as C1, C2 or as C = edible/poisonous.

The feature ‘cap-shape’ is represented by X, i.e. X can take the values CONVEX, FLAT, BELL, etc. In the following questions, you will break down each term of Bayes' theorem and understand it individually.


#### Q2. The probability of a CONVEX mushroom being edible, P(C = edible | X = CONVEX) is given by:

- [x] P( X = CONVEX | C = edible) . P(C = edible) / P(X = CONVEX)
- [ ] P( X = CONVEX | C = edible) . P(X = CONVEX) / P(C = edible)
- [ ] P(C = edible | X = CONVEX ) . P(X = CONVEX) / P(C = edible)
- [ ] None of the above

#### Q3. The value of P(C = edible) is simply the number of edible mushrooms in the dataset divided by the total observations. What is the value of P( C = edible)?

- [x] 8/12
- [ ] 7/14
- [ ] 8/14
- [ ] 7/12

**Comprehension - Part 3**

So P(C = edible) = 8/12 ≈ 66.67%, i.e. roughly two-thirds of all mushrooms in the dataset are edible. Note that P(C = edible) appears in the numerator of the Bayes expression, so the probability of a mushroom being edible is directly proportional to it. Let’s understand the other two terms in the Bayes expression.

#### Q4. Now let’s say you picked a new mushroom whose cap-shape is CONVEX. What are the chances of this happening, i.e. what is the value of P(X = CONVEX)?

- [ ] 2/12
- [x] 8/12
- [ ] 4/12
- [ ] Cannot be calculated


#### Q5. What is the probability of the mushroom being CONVEX given it is edible, i.e. P(X = CONVEX | C = edible)? This is the fraction of CONVEX mushrooms out of all the edible ones.

- [ ] 8/12
- [ ] 3/8
- [ ] 4/12
- [x] 4/8



#### Q6. In the previous questions, you have calculated that P(C = edible) is 8/12, P(X = CONVEX) is 8/12 and  P(X = CONVEX | C = edible) is 4/8. What is the probability that the CONVEX mushroom is edible, P(C = edible | X = CONVEX)?

- [x] 4/8
- [ ] 8/12
- [ ] 4/12
- [ ] 8/24


#### Q7. In the previous question, you found the probability of the CONVEX mushroom being edible. What is the probability of the CONVEX mushroom being poisonous, P(C = poisonous | X = CONVEX)?

- [ ] 8/12
- [ ] 4/12
- [x] 4/8
- [ ] 5/8

#### Q8. What are the chances of a random mushroom being poisonous, i.e. P(C = poisonous)?

- [ ] 8/12
- [ ] 4/8
- [x] 4/12
- [ ] 3/4

#### Q9. What are the chances of a mushroom being CONVEX given it is poisonous, i.e. P(X = CONVEX | C = poisonous)?

- [ ] 4/12
- [ ] 8/12
- [x] 1
- [ ] 6/12

**Comprehension - Part 4**


Let’s analyse the results of this problem:

The probabilities of a CONVEX mushroom being edible and being poisonous are both 50%. The probability of it being edible, `P(C = edible | X = CONVEX)`, is:

    P( X = CONVEX | C = edible) . P(C = edible) / P(X = CONVEX)
    
    = (4/8).(8/12) / (8/12)
    
    = 50%


Similarly, the probability of the mushroom being poisonous, `P(C = poisonous| X = CONVEX)` is
 
    = P( X = CONVEX | C = poisonous) . P(C = poisonous) / P(X = CONVEX)
    
    = (4/4).(4/12) / (8/12)
    
    = 50%


Note that the denominator, P(X = CONVEX) = 8/12, is common to both calculations, so you do not need to calculate it. You can simply compare the numerators and assign the class based on them:

- **Edible:** P( X = CONVEX | C = edible) . P(C = edible) =  (4/8).(8/12) = 4/12 = 33.33%

- **Poisonous:** P( X = CONVEX | C = poisonous) . P(C = poisonous) =  (4/4).(4/12) = 4/12 = 33.33%

Since both numerators are 4/12, you cannot classify the CONVEX mushroom as edible or poisonous (if you use 50% as the threshold probability for classification). The key idea is that you only need to compare the numerators for the two classes and assign the class with the larger one.

Let’s now break down Bayes' theorem. The 50% probability that the CONVEX mushroom is edible (or poisonous) results from three probabilities. P(edible | CONVEX) is:
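The arithmetic above can be reproduced exactly with Python's `fractions.Fraction`, using the counts from Table 1. The sketch also shows that dividing by the shared denominator P(X = CONVEX) only rescales the two numerators, so the comparison between classes is unchanged and the resulting posteriors sum to 1.

```python
from fractions import Fraction

# Counts from Table 1
n_total, n_edible, n_poisonous = 12, 8, 4
n_convex_edible, n_convex_poisonous = 4, 4

# Numerators: P(X = CONVEX | class) * P(class)
num_edible = Fraction(n_convex_edible, n_edible) * Fraction(n_edible, n_total)
num_poisonous = Fraction(n_convex_poisonous, n_poisonous) * Fraction(n_poisonous, n_total)

# The shared denominator P(X = CONVEX) rescales both numerators equally
p_convex = Fraction(n_convex_edible + n_convex_poisonous, n_total)
post_edible = num_edible / p_convex
post_poisonous = num_poisonous / p_convex

print(num_edible, num_poisonous)    # 1/3 1/3
print(post_edible, post_poisonous)  # 1/2 1/2
```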


- Proportional to P(edible), which tells you how abundant edible mushrooms are; if P(edible) is high, P(edible | CONVEX) will be high simply because edible mushrooms are common.
    - P(edible) is 66.67% and P(poisonous) is 33.33%.
    - This tips the balance towards edible, since edible mushrooms are more common.
- Proportional to P(CONVEX | edible), which tells you how likely a mushroom is to be CONVEX if you consider only the edible ones.
    - P(CONVEX | edible) is 50% and P(CONVEX | poisonous) is 100%.
    - This tips the balance towards poisonous, since all poisonous mushrooms are CONVEX.
- Inversely proportional to P(CONVEX); this term cancels out when comparing the two classes.
  
Thus, the numerators are equal because the two factors balance each other out:

**P(edible) × P(CONVEX | edible)** = 66.67% × 50% = 33.33%

**P(poisonous) × P(CONVEX | poisonous)** = 33.33% × 100% = 33.33%

## Multivariate
### Conditional Independence in Naive Bayes

In the previous segment, you saw the basic idea behind Naive Bayes and how it is applied to categorical data with one feature and one target variable. In that case the calculations are very simple, because every probability can be obtained by counting. In this segment, you will **understand how Naive Bayes works when the dataset has more than one feature**.

![98.png](attachment:5c25178f-0394-4f8b-bf35-d21c6ea757c2.png)

![99.png](attachment:4ebea705-fa36-4622-9bb6-dc5dac25bc64.png)


**Naïve Bayes** assumes that the **features are conditionally independent** given the class, i.e. `P(X = convex,smooth | C= edible)` can be written as `P(X=smooth | C=edible) × P(X=convex | C=edible)`. Each of the terms `P(X=smooth | C=edible)` and `P(X=convex | C=edible)` is calculated simply by counting the data points.

Hence the name **“Naïve”**: in most real-world situations the features are not conditionally independent given the class label, yet the algorithm often works well nonetheless.
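For a test point with n features $x_1, \dots, x_n$ and classes written as $C_k$, the assumption lets Bayes' theorem factor into one term per feature. Because the denominator is the same for every class, the classifier only needs the right-hand side:

$$
P(C_k \mid x_1, \dots, x_n) = \frac{P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k)}{P(x_1, \dots, x_n)} \;\propto\; P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k)
$$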

Say you are trying to compute `P(A and B | C)`. If `P(A | B, C) = P(A | C)`, i.e. once C is known, also knowing B tells you nothing more about A, then A and B are conditionally independent given C. In that case `P(A and B | C) = P(A | C) × P(B | C)`.
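You can test this condition on real counts. The sketch below hard-codes the 12 rows of Table 2 from the exercise that follows and compares the joint probability P(shape, surface | class) with the product P(shape | class) × P(surface | class). The helper `check_independence` is hypothetical, written only for this illustration.

```python
from fractions import Fraction

# Table 2 rows: (type, cap shape, cap surface)
TABLE_2 = [
    ("Poisonous", "Convex", "Scaly"), ("Edible", "Convex", "Scaly"),
    ("Poisonous", "Convex", "Smooth"), ("Edible", "Convex", "Smooth"),
    ("Edible", "Convex", "Fibrous"), ("Poisonous", "Convex", "Scaly"),
    ("Edible", "Bell", "Scaly"), ("Edible", "Bell", "Scaly"),
    ("Edible", "Convex", "Scaly"), ("Poisonous", "Convex", "Scaly"),
    ("Edible", "Flat", "Scaly"), ("Edible", "Bell", "Smooth"),
]


def check_independence(rows, label, shape, surface):
    """Return (P(shape, surface | label), P(shape | label) * P(surface | label))."""
    in_class = [r for r in rows if r[0] == label]
    n = len(in_class)
    joint = Fraction(sum(1 for r in in_class if r[1] == shape and r[2] == surface), n)
    p_shape = Fraction(sum(1 for r in in_class if r[1] == shape), n)
    p_surface = Fraction(sum(1 for r in in_class if r[2] == surface), n)
    return joint, p_shape * p_surface


print(check_independence(TABLE_2, "Edible", "Convex", "Smooth"))  # (Fraction(1, 8), Fraction(1, 8))
print(check_independence(TABLE_2, "Edible", "Convex", "Scaly"))   # (Fraction(1, 4), Fraction(5, 16))
```

On this small dataset the product matches the joint probability for (CONVEX, SMOOTH) but not for (CONVEX, SCALY), where the true joint is 1/4 and the naive product is 5/16. Naive Bayes uses the product in both cases, which is exactly the simplification its name refers to.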



Despite this assumption, Naive Bayes has proven to work very well in some cases, such as text classification. You'll study an example of classifying emails into spam/ham in the next session.

## MCQ

**Comprehension - Naive Bayes with Multiple Features**

Table 2: Mushroom Dataset
<table align="center" border="1" cellpadding="1" cellspacing="1"><tbody><tr><td><div style="text-align: justify;"><span style="font-size: 
      14px;"><strong>S.No</strong></span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;"><strong>Type of Mushroom</strong></span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;"><strong>Cap.shape</strong></span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;"><strong>Cap.surface</strong></span></div></td></tr><tr><td><div style="text-align: justify;"><span style="font-size: 
      14px;">1.</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Poisonous</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Convex</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Scaly</span></div></td></tr><tr><td><div style="text-align: justify;"><span style="font-size: 
      14px;">2.</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Edible</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Convex</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Scaly</span></div></td></tr><tr><td><div style="text-align: justify;"><span style="font-size: 
      14px;">3.</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Poisonous</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Convex</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Smooth</span></div></td></tr><tr><td><div style="text-align: justify;"><span style="font-size: 
      14px;">4.</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Edible</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Convex</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Smooth</span></div></td></tr><tr><td><div style="text-align: justify;"><span style="font-size: 
      14px;">5.</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Edible</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Convex</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Fibrous</span></div></td></tr><tr><td><div style="text-align: justify;"><span style="font-size: 
      14px;">6.</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Poisonous</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Convex</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Scaly</span></div></td></tr><tr><td><div style="text-align: justify;"><span style="font-size: 
      14px;">7.</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Edible</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Bell</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Scaly</span></div></td></tr><tr><td><div style="text-align: justify;"><span style="font-size: 
      14px;">8.</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Edible</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Bell</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Scaly</span></div></td></tr><tr><td><div style="text-align: justify;"><span style="font-size: 
      14px;">9.</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Edible</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Convex</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Scaly</span></div></td></tr><tr><td><div style="text-align: justify;"><span style="font-size: 
      14px;">10.</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Poisonous</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Convex</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Scaly</span></div></td></tr><tr><td><div style="text-align: justify;"><span style="font-size: 
      14px;">11.</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Edible</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Flat&nbsp;</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Scaly</span></div></td></tr><tr><td><div style="text-align: justify;"><span style="font-size: 
      14px;">12.</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Edible</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Bell</span></div></td><td><div style="text-align: justify;"><span style="font-size: 
      14px;">Smooth</span></div></td></tr></tbody></table>

      
Refer to the table above for the questions that follow. The S.No, Type of Mushroom and Cap.shape columns are the same as in Table 1; the new Cap.surface column is the second feature. The task is to **predict the type of mushroom given its two features**.



In the multivariate case, the feature X is written as X = (cap.shape, cap.surface). For a mushroom with cap.shape = CONVEX and cap.surface = SCALY, the probability of it being edible is:


    
    P(C = edible | X = CONVEX, SCALY)
    
    = P(X = CONVEX, SCALY | C = edible) × P(C = edible) / P(X = CONVEX, SCALY)



You can similarly write the expression for `P(C = poisonous | X = CONVEX, SCALY)`, compare it with `P(C = edible | X = CONVEX, SCALY)` and draw a conclusion. Recall that you do not need to calculate the denominator because it is the same for both the edible and the poisonous class.
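As a worked example for (CONVEX, SCALY), the counts below are read off Table 2: 4 of the 8 edible mushrooms are CONVEX and 5 of them are SCALY, while all 4 poisonous mushrooms are CONVEX and 3 of them are SCALY. Applying the conditional-independence assumption to each numerator:

$$
\begin{aligned}
P(\text{edible})\, P(\text{CONVEX} \mid \text{edible})\, P(\text{SCALY} \mid \text{edible}) &= \tfrac{8}{12} \cdot \tfrac{4}{8} \cdot \tfrac{5}{8} = \tfrac{5}{24} \\
P(\text{poisonous})\, P(\text{CONVEX} \mid \text{poisonous})\, P(\text{SCALY} \mid \text{poisonous}) &= \tfrac{4}{12} \cdot 1 \cdot \tfrac{3}{4} = \tfrac{6}{24}
\end{aligned}
$$

The poisonous numerator is larger, so a (CONVEX, SCALY) mushroom is classified as poisonous. Normalising the two numerators gives posteriors of 5/11 (about 45%) for edible and 6/11 (about 55%) for poisonous.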


**Useful numbers:**

Number of edible mushrooms = 8

Number of poisonous mushrooms = 4

#### Q1. Say you take a new mushroom which is (CONVEX, SMOOTH). What is the numerator of P(C = edible | X = CONVEX, SMOOTH)?

- [X] P(edible) x P(CONVEX | edible) x P(SMOOTH| edible)
- [ ] P(CONVEX) x P(CONVEX | edible) x P(SMOOTH| edible)
- [ ] P(SMOOTH) x P(CONVEX | edible) x P(SMOOTH| edible)
- [ ] None of these

#### Q2. What is P(CONVEX | edible)?

- [ ] 8/12
- [ ] 4/12
- [x] 4/8
- [ ] 3/12

#### Q3. What is P(SMOOTH | edible)?

- [x] 2/8
- [ ] 2/12
- [ ] 8/12
- [ ] 4/8

#### Q4.  What is P(CONVEX | poisonous)?

- [ ] 8/12
- [ ] 4/8
- [x] 1
- [ ] 4/12

Ans: Out of 4 poisonous mushrooms, all 4 are CONVEX.

#### Q5.  What is P(SMOOTH| poisonous)?

- [ ] 1
- [x] 1/4
- [ ] 4/8
- [ ] 1/12
      
#### Q6. In the previous questions, you have calculated that:

P(CONVEX | edible) = 4/8

P(SMOOTH| edible) = 2/8

P(CONVEX | poisonous) = 1 and

P(SMOOTH| poisonous) = 1/4


If every mushroom whose probability of being edible is above 50% is classified as edible, is the (CONVEX, SMOOTH) mushroom edible?

- [ ] Yes

- [ ] No

- [X] Cannot be decided, it is a tie

Answer: 

P(edible | CONVEX, SMOOTH) = P(edible) · P(CONVEX | edible) · P(SMOOTH | edible) / d = (8/12)(4/8)(2/8) / d = (1/12) / d

P(poisonous | CONVEX, SMOOTH) = P(poisonous) · P(CONVEX | poisonous) · P(SMOOTH | poisonous) / d = (4/12)(1)(1/4) / d = (1/12) / d

Here d = P(CONVEX, SMOOTH) is the common denominator. Since both numerators equal 1/12, each class has a posterior probability of exactly 50%, so this mushroom cannot be classified with a 50% threshold. However, if you use a higher threshold such as 60% (a reasonable choice, since you do not want to be responsible for people eating poisonous mushrooms), it will be classified as poisonous: with a 60% threshold, P(edible | CONVEX, SMOOTH) must be at least 60% for the mushroom to be called edible, and here it is only 50%.

## Deciphering Naive Bayes

You saw how conditional independence lets you calculate the class probability when there is more than one feature. In this segment, you will work on the original five-feature problem, where the new test point has the following features:

- Cap Shape = Convex
- Cap Surface = Smooth
- Cap Colour = White
- Bruises = Yes
- Odour = None

The objective is to classify it as edible or poisonous. Let's see how that can be done.
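The same counting procedure extends to any number of features: start from the prior P(class) and multiply in one conditional probability per feature. The sketch below is an illustrative implementation, not the course's code; the function name `naive_bayes_posteriors` is hypothetical. It returns exact posteriors by dividing each numerator by their sum, which plays the role of the shared denominator. Because the five-feature dataset is only shown as an image here, the demonstration uses the two-feature Table 2 rows, and the five-feature call is left as a commented, hypothetical example.

```python
from collections import Counter
from fractions import Fraction

# Table 2 rows: (type, cap shape, cap surface)
TABLE_2 = [
    ("Poisonous", "Convex", "Scaly"), ("Edible", "Convex", "Scaly"),
    ("Poisonous", "Convex", "Smooth"), ("Edible", "Convex", "Smooth"),
    ("Edible", "Convex", "Fibrous"), ("Poisonous", "Convex", "Scaly"),
    ("Edible", "Bell", "Scaly"), ("Edible", "Bell", "Scaly"),
    ("Edible", "Convex", "Scaly"), ("Poisonous", "Convex", "Scaly"),
    ("Edible", "Flat", "Scaly"), ("Edible", "Bell", "Smooth"),
]


def naive_bayes_posteriors(rows, x):
    """Return {class: P(class | x)} for categorical Naive Bayes, estimated by counting.

    rows: list of tuples (label, feature_1, ..., feature_n)
    x:    tuple (feature_1, ..., feature_n) describing the test point
    """
    n = len(rows)
    class_counts = Counter(row[0] for row in rows)
    numerators = {}
    for label, n_label in class_counts.items():
        score = Fraction(n_label, n)  # prior P(label)
        for i, value in enumerate(x, start=1):
            # P(feature_i = value | label); multiplying these is the naive assumption
            hits = sum(1 for row in rows if row[0] == label and row[i] == value)
            score *= Fraction(hits, n_label)
        numerators[label] = score
    total = sum(numerators.values())  # normalising constant shared by all classes
    if total == 0:
        raise ValueError("every class has a zero count for some feature value of x")
    return {label: s / total for label, s in numerators.items()}


print(naive_bayes_posteriors(TABLE_2, ("Convex", "Smooth")))
print(naive_bayes_posteriors(TABLE_2, ("Convex", "Scaly")))

# Hypothetical five-feature usage (five_feature_rows would come from the full dataset):
# naive_bayes_posteriors(five_feature_rows, ("Convex", "Smooth", "White", "Yes", "None"))
```

For (CONVEX, SMOOTH) this reproduces the 50/50 tie from the exercise, and for (CONVEX, SCALY) it gives 5/11 edible versus 6/11 poisonous. With five features, the only change is that each class's score is a product of five conditional probabilities instead of two.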

![100.png](attachment:3f4d9c24-e9ce-49c2-9ef4-184b1e957e6e.png)