In the previous course, we covered the fundamentals of probability and learned about:

- Theoretical and empirical probabilities

- Probability rules (the addition rule and the multiplication rule)

- Counting techniques (the rule of product, permutations, and combinations)


In this course, we'll build on what we've learned and develop new techniques that will enable us to better estimate probabilities. Our focus for the entire course will be on learning how to calculate probabilities based on certain conditions — hence the name conditional probability.

By the end of this course, we'll be able to:

- Assign probabilities to events based on certain conditions by using conditional probability rules.

- Assign probabilities to events based on whether or not they are statistically independent of other events.

- Assign probabilities to events based on prior knowledge by using Bayes' theorem.

- Create a spam filter for SMS messages using the multinomial Naive Bayes algorithm.


---------------------------------------------------------------------------------------------------------------------

Now suppose the die is rolled and we're told some new information: the die showed an odd number (1, 3, or 5) after landing. Is the probability of getting a 5 still P(5)=1/6? 

Or should we instead update the probability based on the information we have?

 
Ω = {1,2,3,4,5,6}  ==>   Ω = {1,3,5}

Therefore, knowing the die showed an odd number, the probability of getting a 5 becomes 1/3.


To simplify the notation, P(5 given the die showed an odd number) is written as **P(5|odd)**. The vertical bar character ( | ) is read as "given," so P(5|odd) reads as "the probability of getting a 5 given that the die showed an odd number."
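
The sample-space reduction described above can be checked with a few lines of Python. This is only a minimal sketch: the names `omega`, `odd`, and `five` are illustrative and do not come from the course, and the calculation assumes a fair die with equally likely outcomes.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}                # full sample space of one fair die
odd = {n for n in omega if n % 2 == 1}    # reduced sample space once we know the roll was odd
five = {5}                                # the event we care about

# Before any information: count favorable outcomes in the full sample space.
p_five = Fraction(len(five & omega), len(omega))

# After learning the roll was odd: count favorable outcomes in the reduced space.
p_five_given_odd = Fraction(len(five & odd), len(odd))

print(f"P(5) = {p_five}")                 # P(5) = 1/6
print(f"P(5|odd) = {p_five_given_odd}")   # P(5|odd) = 1/3
```

Using `Fraction` keeps the results exact, so they can be compared directly with the hand calculation.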


Say we roll a fair six-sided die and want to find the probability of getting an odd number, given the die showed a number greater than 1 after landing. Using probability notation, we want to find P(A|B) where:

- A is the event that the number is odd: A = {1, 3, 5}
- B is the event that the number is greater than 1: B = {2, 3, 4, 5, 6}


To find P(A|B), we need to use the following formula:

**P(A|B) = number of successful outcomes / total number of possible outcomes**


We know for sure event B happened (the number is greater than 1), so the sample space is reduced from {1, 2, 3, 4, 5, 6} to {2, 3, 4, 5, 6}.


The total number of possible outcomes above is given by the number of elements in the reduced sample space Ω={2,3,4,5,6} — there are five elements.

The number of elements in a set is called the **cardinal** of the set. Ω is a set, and the **cardinal of Ω={2,3,4,5,6} is**:

**cardinal(Ω) = 5**


In set notation, cardinal(Ω) is abbreviated as **card(Ω)**, so we have:

total number of possible outcomes = **card(Ω) = 5**



The only odd numbers we can get are 3 and 5, so the number of successful outcomes is given by the cardinal of the set {3, 5}:

number of successful outcomes = 2 = card({3,5})


### **P(A|B) = card(A∩B)/card(B) = 2/5**
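
The formula translates directly into Python sets, where `len()` plays the role of card and `&` computes the intersection. The helper name `conditional_probability` is hypothetical, and the sketch assumes all outcomes are equally likely.

```python
from fractions import Fraction

def conditional_probability(a, b):
    """Return P(A|B) = card(A ∩ B) / card(B) for equally likely outcomes."""
    if not b:
        raise ValueError("B must contain at least one outcome")
    return Fraction(len(a & b), len(b))

omega = set(range(1, 7))                  # outcomes of one fair six-sided die
a = {n for n in omega if n % 2 == 1}      # A: the number is odd
b = {n for n in omega if n > 1}           # B: the number is greater than 1

print(a & b)                              # {3, 5}
print(conditional_probability(a, b))      # 2/5
```

Raising an error for an empty B reflects that P(A|B) is undefined when B cannot happen.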



--------------------------------------------------
Two fair six-sided dice are simultaneously rolled, and the two numbers they show are added together. The diagram below shows all the possible results that we can get from adding the two numbers together.

![probability-pic-1](https://raw.githubusercontent.com/tongNJ/Dataquest-Online-Courses-2022/main/Pictures/probability-pic-1.PNG)


Find P(A|B), where A is the event where the sum is an even number, and B is the event that the sum is less than eight.

In [1]:
def two_dice_sum():
    """Return the 36 sums obtained by rolling two six-sided dice."""
    total = []
    for i in range(1, 7):          # first die
        for j in range(1, 7):      # second die
            total.append(i + j)
    return total


In [21]:
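card_b = 0          # outcomes where the sum is less than 8 (event B)
card_a_and_b = 0    # outcomes where the sum is less than 8 and even (A ∩ B)
for i in two_dice_sum():
    if i < 8:
        card_b += 1
        if i % 2 == 0:
            card_a_and_b += 1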

print('card(b) = ' + str(card_b))
print('card(a_and_b) = ' + str(card_a_and_b))
print(f'P(A|B) = card(a and b) / card(b) = {round(card_a_and_b/card_b*100,2)}%')

card(b) = 21
card(a_and_b) = 9
P(A|B) = card(a and b) / card(b) = 42.86%
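
As a cross-check on the loop above, the same counts can be obtained by enumerating the 36 ordered pairs and building the events as sets. This is an alternative sketch, not the course's solution; the names `outcomes`, `a`, and `b` are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) pairs.
outcomes = list(product(range(1, 7), repeat=2))

b = {o for o in outcomes if sum(o) < 8}        # B: the sum is less than eight
a = {o for o in outcomes if sum(o) % 2 == 0}   # A: the sum is even

p_a_given_b = Fraction(len(a & b), len(b))

print(len(b))                  # 21
print(len(a & b))              # 9
print(p_a_given_b)             # 3/7
print(f"{float(p_a_given_b):.2%}")   # 42.86%
```

Working with pairs rather than sums keeps every outcome distinct, which matters because different pairs can produce the same sum.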


----------------------------------------------------------------------------------------------------------
A team of biologists wants to measure the efficiency of a new HIV test they developed (HIV is a virus that causes AIDS, a disease which affects the immune system). They used the new method to test 53 people, and the results are summarized in the table below:

![probability-pic-8](https://raw.githubusercontent.com/tongNJ/Dataquest-Online-Courses-2022/main/Pictures/probability-pic-8.PNG)


By reading the table above, we can see that:

- 23 people are infected with HIV.
- 30 people are not infected with HIV (HIV^C means not infected with HIV; recall from the previous course that the superscript "C" indicates a set complement).
- 45 people tested positive for HIV.
- 8 people tested negative for HIV.
- Out of the 23 infected people, 21 tested positive (correct diagnosis).
- Out of the 30 not-infected people, 24 tested positive (wrong diagnosis).
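
The counts listed above can be stored in a small dictionary so that the totals are recomputed instead of copied by hand. This is a sketch under stated assumptions: the labels (`"HIV"`, `"HIV^C"`, `"T+"`, `"T-"`) and the helper `card` are illustrative, and the two negative-test cells are derived from the list (23 - 21 = 2 and 30 - 24 = 6) rather than read from the table.

```python
# Counts keyed by (infection status, test result).
# 21 and 24 come from the list above; 2 and 6 are derived as 23 - 21 and 30 - 24.
counts = {
    ("HIV", "T+"): 21,
    ("HIV", "T-"): 2,
    ("HIV^C", "T+"): 24,
    ("HIV^C", "T-"): 6,
}

def card(status=None, result=None):
    """Sum the counts matching the given status and/or result (None matches anything)."""
    return sum(
        n for (s, r), n in counts.items()
        if status in (None, s) and result in (None, r)
    )

print(card(status="HIV"))      # 23 infected
print(card(status="HIV^C"))    # 30 not infected
print(card(result="T+"))       # 45 tested positive
print(card(result="T-"))       # 8 tested negative
print(card())                  # 53 people in total
```

The recomputed totals match the figures in the list, which confirms the derived cells are consistent with it.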


The team now intends to use these results to calculate probabilities for new patients and figure out whether the test is reliable enough to use in hospitals. They want to know:

1. What is the probability of testing positive, given that a patient is infected with HIV?


2. What is the probability of testing negative, given that a patient is not infected with HIV?

In [24]:
#Q1 What is the probability of testing positive, given that a patient is infected with HIV?
# P(T+ | HIV+) = card(T+ ∩ HIV+) / card(HIV+)

card_HIV = 23 # there are 23 people infected with HIV
card_positive_and_HIV = 21 # out of 23 infected people, 21 people were tested positive

P_T_given_HIV = card_positive_and_HIV /card_HIV 

print(f'The probability of testing positive, given that a patient is infected with HIV is {round(P_T_given_HIV*100,2)}%')

The probability of testing positive, given that a patient is infected with HIV is 91.3%


In [25]:
#Q2 What is the probability of testing negative, given that a patient is not infected with HIV?
# P(T- | HIV-) = card(T- ∩ HIV-) / card(HIV-)
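card_no_HIV = 30 # there are 30 people not infected with HIV
card_negative_and_no_HIV = 6 # out of 30 not-infected people, 24 tested positive, so 30 - 24 = 6 tested negative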

P_T_given_no_HIV = card_negative_and_no_HIV / card_no_HIV

print(f'The probability of testing negative, given that a patient is not infected with HIV is {round(P_T_given_no_HIV*100,2)}%')

The probability of testing negative, given that a patient is not infected with HIV is 20.0%


In [26]:
'''
The probability of testing negative given that a patient is not
infected with HIV is 20%. This means that for every 10,000 healthy
patients, only about 2000 will get a correct diagnosis, while the
other 8000 will not. It looks like the test is almost completely
inefficient, and it could be dangerous to have it used in hospitals.
'''



--------------------------------------------------------------------------------------------------
A company offering a browser-based task manager tool intends to do some targeted advertising based on people's browsers. The data they collected about their users is described in the table below:

