# Probability Rules

In probability theory, the outcomes of a random experiment are usually represented as a set. For example, this is how we can represent the outcomes of a die roll as a set:

Outcomes = {1,2,3,4,5,6}

A set is a collection of distinct objects, which means each outcome must occur only once in a set:

- {Heads, Tails} is an example of a valid set because all the elements are distinct.
- {Heads, Heads} is not a proper set because two elements are identical.

Notice we also use curly braces to write a set: {Heads, Tails} is a set, while [Heads, Tails] is not a set.
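
Python's built-in `set` type follows the same convention, so it is a convenient way to experiment with these ideas. The snippet below is a minimal sketch; the variable names `coin`, `repeated`, and `as_list` are illustrative and not part of the exercise.

```python
# A set literal uses curly braces; duplicates collapse automatically.
coin = {"Heads", "Tails"}
repeated = {"Heads", "Heads"}

print(len(coin))              # 2
print(len(repeated))          # 1
print(repeated == {"Heads"})  # True

# A list uses square brackets and keeps duplicates, so it is not a set.
as_list = ["Heads", "Heads"]
print(len(as_list))           # 2
```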

In probability theory, the set of all possible outcomes is called a **sample space**. A sample space is often denoted by the capital Greek letter Ω (read "omega"). This is how we represent the sample space of a die roll:

**Ω={1,2,3,4,5,6}**

For the following exercise, we'll consider a random experiment where we roll a fair six-sided die two times ("fair" means all outcomes have equal chances of occurring). The sample space of this experiment has 36 possible outcomes (all the sequences of numbers we can get from the two throws):

**Ω={(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(2,1),...,(6,5),(6,6)}**

![probability-pic-1](https://raw.githubusercontent.com/tongNJ/Dataquest-Online-Courses-2022/main/Pictures/probability-pic-1.PNG)

In [2]:
import numpy as np

In [8]:
# Generate the sample space (the list of unique outcomes) for rolling two six-sided dice
sample_space = []
sample_space_sum = []
dice_outcome = np.arange(1,7)
for d1 in dice_outcome:
    for d2 in dice_outcome:
        if [d1,d2] in sample_space:
            pass
        else:
            sample_space.append([d1,d2])
            sample_space_sum.append(d1+d2)

print(len(sample_space))
print(sample_space)

print(len(sample_space_sum))
print(sample_space_sum)


36
[[1, 1], [1, 2], [1, 3], [1, 4], [1, 5], [1, 6], [2, 1], [2, 2], [2, 3], [2, 4], [2, 5], [2, 6], [3, 1], [3, 2], [3, 3], [3, 4], [3, 5], [3, 6], [4, 1], [4, 2], [4, 3], [4, 4], [4, 5], [4, 6], [5, 1], [5, 2], [5, 3], [5, 4], [5, 5], [5, 6], [6, 1], [6, 2], [6, 3], [6, 4], [6, 5], [6, 6]]
36
[2, 3, 4, 5, 6, 7, 3, 4, 5, 6, 7, 8, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10, 6, 7, 8, 9, 10, 11, 7, 8, 9, 10, 11, 12]
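
As an alternative sketch, the same sample space can be built as an actual Python set of ordered pairs, which mirrors the Ω notation used earlier. This assumes only the standard library; the names `faces`, `omega`, and `sums` are hypothetical and do not appear in the cells above.

```python
from itertools import product

faces = range(1, 7)

# Each outcome is an ordered pair (first roll, second roll).
omega = set(product(faces, repeat=2))

# The sum of the two rolls for every outcome, in a fixed order.
sums = [d1 + d2 for d1, d2 in sorted(omega)]

print(len(omega))        # 36
print((1, 2) in omega)   # True
print((1, 2) == (2, 1))  # False: order matters, so both pairs are kept
print(sums[:6])          # [2, 3, 4, 5, 6, 7]
```

Tuples are used here because they are hashable and can be stored in a set, whereas lists cannot.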


In [11]:
def cal_probability(criteria, value):
    # criteria = 0  -> count the sums equal to value (==)
    # criteria = 1  -> count the sums greater than value (>)
    # criteria = -1 -> count the sums lower than value (<)
    outcome=0
    for i in sample_space_sum:
        if criteria ==0:
            if i == value:
                outcome +=1
        elif criteria ==1:
            if i > value:
                outcome +=1
        else:
            if i < value:
                outcome +=1
    return outcome/len(sample_space_sum) 

    

In [18]:
#The sum of the two rolls is 6. Assign the probability to p_sum_6.
p_sum_6 = cal_probability(criteria=0,value=6)
print(f'P(The sum of the two rolls is 6) = {round(p_sum_6*100,2)}%')

#The sum of the two rolls is lower than 15. Assign the probability to p_lower_15.
p_lower_15 = cal_probability(criteria=-1,value=15)
print(f'P(The sum of the two rolls is lower than 15) = {round(p_lower_15*100,2)}%')

#The sum of the two rolls is greater than 13. Assign the probability to p_greater_13.
p_greater_13 = cal_probability(criteria=1,value=13)
print(f'P(The sum of the two rolls is greater than 13) = {round(p_greater_13*100,2)}%')

P(The sum of the two rolls is 6) = 13.89%
P(The sum of the two rolls is lower than 15) = 100.0%
P(The sum of the two rolls is greater than 13) = 0.0%


In [19]:
# The sum is either 2 or 4. Assign the probability as a proportion to p_2_or_4.
p_2_or_4 = cal_probability(criteria=0,value=2) + cal_probability(criteria=0,value=4)
print(f'P(The sum is either 2 or 4) = {round(p_2_or_4*100,2)}%')

# The sum is either 12 or 13. Assign the probability as a proportion to p_12_or_13.
p_12_or_13 = cal_probability(criteria=0,value=12) + cal_probability(criteria=0,value=13)
print(f'P(The sum is either 12 or 13) = {round(p_12_or_13*100,2)}%')

P(The sum is either 2 or 4) = 11.11%
P(The sum is either 12 or 13) = 2.78%


For rolling a fair six-sided die ("fair" means all outcomes have equal chances of occurring), consider the following two events, A and B:

- A = {3} — getting a 3
- B = {5} — getting a 5

Now, we'd like to find:

- P(A) — the probability of getting a 3 ==> 1/6
- P(B) — the probability of getting a 5 ==> 1/6
- P(A or B) — the probability of getting a 3 or a 5 ==> (1/6) + (1/6)

The sample space of rolling a fair six-sided die is:

Ω= {1,2,3,4,5,6}

To calculate P(A or B), we can also use the formula below, which is sometimes called the addition rule:

**P(A or B) = P(A) + P(B) = (1/6) + (1/6) = (1/3)**
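
The calculation above can be checked with exact fractions. This is a small sketch under the assumption that all outcomes are equally likely; the helper `prob` is a hypothetical name introduced only for illustration.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
A = {3}
B = {5}

def prob(event, sample_space):
    """Theoretical probability when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

p_a = prob(A, omega)
p_b = prob(B, omega)
p_a_or_b = prob(A | B, omega)  # count the outcomes in "A or B" directly

print(p_a, p_b, p_a_or_b)      # 1/6 1/6 1/3
print(p_a + p_b == p_a_or_b)   # True
```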



In [43]:
#The sum is either 5 or 9 — assign your answer to p_5_or_9.
p_5_or_9  = cal_probability(0,5)+cal_probability(0,9)
print(f'P(The sum is either 5 or 9) = {round(p_5_or_9*100,2)}%')

#The sum is either even or less than 2 — assign your answer to p_even_or_less_2.
p_even = 0
for i in set(sample_space_sum):
    if i%2==0:
        p_even += cal_probability(0,i)
    else:
        pass
p_even_or_less_2 = p_even + cal_probability(-1,2)
print(f'P(The sum is either even or less than 2 ) = {round(p_even_or_less_2*100,2)}%')

#The sum is either 4 or a multiple of 3 — assign your answer to p_4_or_3_multiple.
p_multiple_3 = 0
for i in set(sample_space_sum):
    if i % 3 == 0:
        p_multiple_3 += cal_probability(0, i)
p_4_or_3_multiple = p_multiple_3 + cal_probability(0, 4)
print(f'P(The sum is either 4 or a multiple of 3) = {round(p_4_or_3_multiple*100,2)}%')

P(The sum is either 5 or 9) = 22.22%
P(The sum is either even or less than 2 ) = 50.0%
P(The sum is either 4 or a multiple of 3) = 41.67%


So far, adding the individual probabilities has given the correct probability of the combined event. However, this simple form of the addition rule doesn't work in cases like the following.

Consider also the events C and D, which are:

- C = {2, 4, 6} — getting an even number
- D = {4, 5, 6} — getting a number greater than 3

Notice that two elements, 4 and 6, belong to both C and D. To account for these two common elements, we need to represent C and D on a Venn diagram with a point of intersection:

![probability_pic-2](https://raw.githubusercontent.com/tongNJ/Dataquest-Online-Courses-2022/main/Pictures/probability-pic-2.PNG)

In the previous exercise, we used two different ways to calculate P(C or D), and we expected both to lead to the same result. 

- First, we used the addition rule and got
  - ~~**P(C or D) = P(C) + P(D) = (3/6) + (3/6) = 1**~~

- Then, we used the theoretical probability formula and got
  - **P(C or D) = number of successful outcomes / total number of outcomes = 4/6 ≈ 0.67**

- Subtracting the probability of the shared outcomes from the addition rule gives the same result:
  - **P(C or D) = P(C) + P(D) - P(C and D) = (3/6) + (3/6) - (2/6) = 4/6**
      - where P(C and D) corresponds to the event where the number is both even and greater than 3.
  
The reason we got different results is that the simple addition rule, P(A or B) = P(A) + P(B), **doesn't work** for events that **share outcomes**, because the shared outcomes get counted twice. In the case of C and D, they have two outcomes in common: 4 and 6 (remember event C is getting an even number and event D is getting a number greater than 3).
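
The effect of the shared outcomes can be checked with Python sets. The sketch below assumes a fair die and uses a hypothetical helper `prob`; `&` is set intersection and `|` is set union.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
C = {2, 4, 6}  # even number
D = {4, 5, 6}  # number greater than 3

def prob(event):
    """Theoretical probability of an event on a fair die."""
    return Fraction(len(event), len(omega))

naive = prob(C) + prob(D)                    # counts 4 and 6 twice
corrected = prob(C) + prob(D) - prob(C & D)  # removes the double count
direct = prob(C | D)                         # counts each outcome once

print(sorted(C & D))  # [4, 6]
print(naive)          # 1
print(corrected)      # 2/3
print(direct)         # 2/3
```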

### Test

An online betting company offers customers the possibility of betting on a variety of games and events (football, tennis, hockey, horse races, car races, etc.). Based on historical data, the company knows the empirical probabilities of the following events:

- Event F (a new customer's first bet is on football) — the probability is 0.26.

- Event T (a new customer's first bet is on tennis) — the probability is 0.11.

- Event "T and F" (a new customer's first bet is on both football and tennis) — the probability is 0.03.

Find the probability that a new customer's first bet is either on football or tennis. Assign your answer to p_f_or_t. You can't use the theoretical probability formula to solve this, so you'll need to make use of the addition rule.

**P(A or B) = P(A) + P(B) - P(A and B)**


In [50]:
p_f_or_t = 0.26+0.11-0.03
print(p_f_or_t)



0.33999999999999997
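
The trailing digits in the output above come from binary floating-point arithmetic, not from the addition rule; the exact answer is 0.34. One way to confirm this, sketched here with the standard-library `fractions` module (the variable names are illustrative), is to do the arithmetic exactly or to round the float result.

```python
from fractions import Fraction

# Build exact fractions from the decimal strings given in the exercise.
p_f = Fraction("0.26")
p_t = Fraction("0.11")
p_t_and_f = Fraction("0.03")

p_f_or_t_exact = p_f + p_t - p_t_and_f
print(p_f_or_t_exact)         # 17/50
print(float(p_f_or_t_exact))  # 0.34

# Rounding the float computation gives the same value.
print(round(0.26 + 0.11 - 0.03, 2))  # 0.34
```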


Consider the following sets:

A = {1, 2, 6}

B = {1, 4, 5, 6}

C = {HH, HT, TH}

D = {Green, Yellow, Brown}

In set theory, when we say "**set A or set B**," we are referring to a single set that is the result of the **union** between set A and set B. The resulting set will have:
- The elements of set A that are not in B
- The elements of set B that are not in A
- The elements that occur in both A and B

A set can only contain unique elements, so the set resulting from a union cannot include the elements that occur in both A and B more than one time. Below, we see the result of a few unions between the four sets above (A, B, C, and D). In set theory, we use the symbol ∪ to represent union:


A ∪ B (A or B) = {1, 2, 4, 5, 6}

A ∪ D (A or D) = {1, 2, 6, Green, Yellow, Brown}

B ∪ C (B or C) = {1, 4, 5, 6, HH, HT, TH}
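
Python's `|` operator (or the `union` method) computes a union, so results like these can be verified directly. This is a minimal sketch in which the coin and color outcomes are represented as strings, which is an assumption made for illustration.

```python
A = {1, 2, 6}
B = {1, 4, 5, 6}
C = {"HH", "HT", "TH"}
D = {"Green", "Yellow", "Brown"}

# Shared elements (1 and 6) appear only once in the union.
print(sorted(A | B))  # [1, 2, 4, 5, 6]

# Sets with nothing in common simply pool their elements.
print(A | D == {1, 2, 6, "Green", "Yellow", "Brown"})  # True
print(B.union(C) == {1, 4, 5, 6, "HH", "HT", "TH"})    # True
```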


--------------------------------------------------------------------------------------------

When we say "**set A and set B**," we are referring to a single set that contains all the **unique elements** that occur in **both A and B**. In set theory, this "and" operation is called intersection and is represented by the symbol **∩**. Below, we see the results of the intersection of the various sets above (remember Ø means an empty set):

A ∩ B (A and B) = {1,6}

A ∩ D (A and D) = Ø

B ∩ C (B and C) = Ø
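
In Python, the `&` operator (or the `intersection` method) plays the role of ∩, and the empty set is written `set()`. A short sketch, again assuming the non-numeric outcomes are stored as strings:

```python
A = {1, 2, 6}
B = {1, 4, 5, 6}
C = {"HH", "HT", "TH"}
D = {"Green", "Yellow", "Brown"}

print(sorted(A & B))       # [1, 6]
print(A & D == set())      # True: no common elements
print(B.intersection(C))   # set()
```

Note that `{}` creates an empty dictionary in Python, which is why the empty set is spelled `set()`.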

### Consider the following sets:

M = {100, 22, 1, 2}

N = {22, car insurance, 2, house insurance}

O = {HHHH, TTTT, TH}

P = {Hockey, Cycling, Athletics, Swimming}

Consider the following set operations and their results:

In [52]:
# Q1. M ∪ P = Ø: If you think the result is correct, assign the boolean True to a variable named operation_1, otherwise assign False.
operation_1 = False 
# M ∪ P (M or P) = {100, 22, 1, 2, Hockey, Cycling, Athletics, Swimming}: M and P share no elements, so the union keeps every element of both sets and cannot be Ø.

# Q2. N ∩ M = {22, 2}: If you think the result is correct, assign the boolean True to a variable named operation_2, otherwise assign False.
operation_2 = True

# Q3 O ∪ M = {HHHH, TTTT, 100, 22, 2}: If you think the result is correct, assign the boolean True to a variable named operation_3, otherwise assign False.
operation_3 = False
# O ∪ M (O or M) = {HHHH, TTTT, TH, 100, 22, 1, 2}: the proposed result is missing TH and 1.

# Q4. P ∩ N = Ø: If you think the result is correct, assign the boolean True to a variable named operation_4, otherwise assign False.
operation_4 = True
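
The four answers above can be double-checked by letting Python evaluate each proposed result. This is a sketch only: the non-numeric elements are stored as strings, and the names `check_1` to `check_4` are hypothetical.

```python
M = {100, 22, 1, 2}
N = {22, "car insurance", 2, "house insurance"}
O = {"HHHH", "TTTT", "TH"}
P = {"Hockey", "Cycling", "Athletics", "Swimming"}

# Compare each proposed result with what the operation actually yields.
check_1 = (M | P) == set()
check_2 = (N & M) == {22, 2}
check_3 = (O | M) == {"HHHH", "TTTT", 100, 22, 2}
check_4 = (P & N) == set()

print(check_1, check_2, check_3, check_4)  # False True False True
```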



In [56]:
print('Two mutually exclusive events have no point of intersection on a Venn diagram. [True]')
print('\n')
print('The probability of an event must be greater than 0 and lower than 1. [False]')
print('\n')
print('Events A and B are mutually exclusive, so the probability that they happen both at the same time is 0. [True]')


Two mutually exclusive events have no point of intersection on a Venn diagram. [True]


The probability of an event must be greater than 0 and lower than 1. [False]


Events A and B are mutually exclusive, so the probability that they happen both at the same time is 0. [True]
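
These statements can be illustrated with the die events used earlier. The sketch below assumes a fair six-sided die; `isdisjoint` returns True when two sets have no element in common, and the helper `prob` is a hypothetical name.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
A, B = {3}, {5}              # mutually exclusive events
C, D = {2, 4, 6}, {4, 5, 6}  # events that share 4 and 6

def prob(event):
    """Theoretical probability of an event on a fair die."""
    return Fraction(len(event), len(omega))

print(A.isdisjoint(B))  # True
print(C.isdisjoint(D))  # False
print(prob(A & B))      # 0: mutually exclusive events cannot both happen

# A probability can be exactly 0 (impossible) or exactly 1 (certain).
print(prob(set()))      # 0
print(prob(omega))      # 1
```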


### Test
A travel agency analyzed the purchasing behavior of its customers and found out that out of 132 randomly selected people:

- 64 bought a summer vacation
- 21 bought a winter vacation
- 15 bought both a summer and a winter vacation

The travel agency provides customers with only two options: a summer vacation and a winter vacation. Calculate:

In [63]:
print(f'The probability that a customer buys both a summer and a winter vacation. Answer = {round(15/132*100,1)}%')

print(f'The probability that a customer buys a summer vacation or a winter vacation. Answer = {round((64+21-15)/132*100,1)}%')

print(f'The probability that a person does not buy anything at all. Answer = {round((1- ((64+21-15)/132))*100,1)}%')



The probability that a customer buys both a summer and a winter vacation. Answer = 11.4%
The probability that a customer buys a summer vacation or a winter vacation. Answer = 53.0%
The probability that a person does not buy anything at all. Answer = 47.0%
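
The same three results can be derived step by step with named quantities, which makes the use of the addition rule and of the complement explicit. This is a sketch with illustrative variable names, not part of the original exercise.

```python
total = 132
summer = 64
winter = 21
both = 15

# Addition rule on counts: subtract the customers counted twice.
either = summer + winter - both  # 70
neither = total - either         # 62

p_both = both / total
p_either = either / total
p_neither = 1 - p_either         # complement of "summer or winter"

print(f"{p_both:.1%}")     # 11.4%
print(f"{p_either:.1%}")   # 53.0%
print(f"{p_neither:.1%}")  # 47.0%
```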
