# Probability Axioms and Definitions

We will open by discussing the building-block ideas of probability, including joint probability, union probability, and conditional probability. We will also learn some basic probability notation along the way.

## What is Probability?

**Probability** is how strongly one believes an event will happen, often expressed as a proportion or odds. This is an abstract concept and probability can philosophically be derived from many places.

For example, let's say **I spill water on my Macbook and I believe it has a 60% chance of working again**. What exactly does this mean? 


One way we express a probability is as a percentage between 0% and 100%, or more technically as a decimal number between 0 and 1.

$ \Huge P(X) = .6 $

Does this mean... 

* If I took 100 Macbooks and spilled water on them, approximately 60 are expected to recover?
* In the multiverse of alternate realities, 60% of the spill incidents would result in recovery?
* The 60% probability is guessed solely on my expertise as an accident-prone person?

Where does this 60% probability come from? As you can see, probability is often based on data and/or beliefs. We attempt to quantify our belief based on our own judgment and/or data. Then we use this probability to make predictions on the future. 
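To make the first interpretation concrete, here is a minimal simulation sketch. It assumes each Macbook recovers independently with probability .6. The function name `simulate_recoveries` and the fixed seed are illustrative choices, not part of the original example.

```python
import random

def simulate_recoveries(n_laptops, p_recover, seed=0):
    """Count how many of n_laptops recover when each recovers with probability p_recover."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return sum(1 for _ in range(n_laptops) if rng.random() < p_recover)

recovered = simulate_recoveries(100, .6)
print(f"{recovered} of 100 simulated Macbooks recovered")

# With many more trials, the observed proportion settles close to .6
many = 100_000
print(simulate_recoveries(many, .6) / many)
```

With only 100 trials the count will usually land near 60 but rarely exactly on it, which is why the bullet above says "approximately."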

> Data is not always superior to beliefs. It is a common fallacy for people to think data is objective and unbiased, especially in opposition to "using your gut" or "subjective" beliefs. This is simply not true. Some expert opinions can be just as good as, or even superior to, what the data says. This is especially useful to know when data is hard to get. If an Apple engineer looks at my Macbook and tells me it has a 60% chance of recovery, I can take that assessment seriously and save myself from potentially destroying 100 Macbooks just to collect data!

To emphasize again, probability can be derived from data or beliefs. On its own, though, probability is pure math without data. When we introduce data to discover probability, we move into the arena of statistics more than probability.

## Percentages and Odds

The two conventional ways to express probability are percentages and odds. Take the case where we express the probability of event $ X $ as 60% or .6. We are expressing probability as a percentage between 0% and 100%, or as a number between 0 and 1.

$ \Huge P(X) = .6 $

To emphasize again, the probability $ P(X) $ must be between 0 and 1. It cannot be greater than 1, and it cannot be negative. Otherwise, it is not a probability.

$ \Huge 0 \le P(X) \le 1 $

Logically, this means that the probability $ P(X) $ and the probability of $ X $ not happening, $ P(X') $, must add up to 1.

$ \Huge P(X) + P(X') = 1 $

While trivial, let's briefly capture this idea in Python code, showing that if the probability of $ X $ is .6, then the probability of $ X $ not happening is .4.

In [None]:
p_x = .6
p_not_x = 1 - p_x 

print(f"P(X) = {p_x}, P(not X) = {p_not_x}") 

Alternatively, we can express the probability $ P(X) $ as an odds $ O(X) $. This expresses how many times we believe event $ X $ will happen versus not happen. A probability $ P(X) $ can be converted into an odds $ O(X) $ with this formula and vice versa.

$ \Huge O(X) = \frac{P(X)}{1 - P(X)} $ 

$ \Huge P(X) = \frac{O(X)}{1 + O(X)} $ 

If we have a $ P(X) $ of $ .6 $, then that means $ X $ is 1.5 times as likely to happen as not happen.

$ \Huge O(X) = \frac{.6}{1 - .6} = \frac{3}{2} = 1.5 $ 

Let's create these two conversion formulas in Python and calculate the above. 

In [None]:
def o(p_x): return p_x / (1 - p_x)
def p(o_x): return o_x / (1 + o_x)

p_x = .6
print(o(p_x)) # 1.4999999999999998
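The `1.4999999999999998` above is a floating-point rounding artifact, not a flaw in the formula. As a small sketch, the same two conversions can be run with Python's `fractions.Fraction` to get exact results and to confirm that the formulas undo each other. The longer function names here are illustrative stand-ins for `o` and `p`.

```python
from fractions import Fraction

def odds_from_probability(p_x):
    # O(X) = P(X) / (1 - P(X))
    return p_x / (1 - p_x)

def probability_from_odds(o_x):
    # P(X) = O(X) / (1 + O(X))
    return o_x / (1 + o_x)

p_x = Fraction(3, 5)  # .6 written as an exact fraction
o_x = odds_from_probability(p_x)

print(o_x)                         # 3/2
print(probability_from_odds(o_x))  # 3/5
```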

Odds can provide a helpful tool of quantifying subjective beliefs by means of "betting" money. For example, placing bets on a sports team winning/losing indicates how much a person believes that team will win. There is a reason the expression "putting money where your mouth is" exists!

## Joint Probability

Let's say I have the probability of my first flight's arrival $ A $ being late and the probability of my connecting flight's departure $ B $ being late.


I am interested in this because I know if flight $ A $ is late in arrival, I want my connecting flight $ B $ to be late in departure so I can sprint across the airport to make it. 

$ \Huge P(A) = .4 $

$ \Huge P(B) = .6 $ 

What is the probability of both flights being late? This is known as **joint probability**. When the two events do not affect each other (that is, they are independent), it is as simple as multiplying both probabilities together.

$ \Huge P(A \cap B) = P(A) \times P(B) $ 

Think of $ P(A \cap B) $ as reading "the probability of A *and* B." So the probability of both the first flight and the connecting flight being late is $ .4 \times .6  = .24 $. This means there is a 24% probability that both flights will be late.

$ \Huge P(A \cap B) = .4 \times .6 = .24 $ 

In [None]:
p_a = .4
p_b = .6

p_a_and_b = p_a * p_b 
print(p_a_and_b) # 0.24

Of course, I can make this a bit more nuanced and ask *how late* flight A's arrival and flight B's departure will be, rather than taking a binary *late*/*not late* approach. Tasks like this are better solved with probability distributions, which we will learn about later.

One way to see why this works is to list every combination of outcomes. Suppose we wanted to calculate the probability of getting a heads *H* in a coin flip and a *six* in a die roll. We can multiply these two probabilities together, but we can also count how many of the possible combined outcomes contain both.


In [None]:
coin_outcomes = ["H", "T"]
die_outcomes = [1,2,3,4,5,6]

combos = [(c,d) for c in coin_outcomes for d in die_outcomes]

outcome_ct = len(combos)
joint_outcome_ct = sum(1 for cd in combos if cd[0] == "H" and cd[1] == 6)
print(joint_outcome_ct / outcome_ct) # 0.08333333333333333
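As a quick cross-check, the sketch below compares the counted result against the product rule using exact fractions. It assumes a fair coin and a fair die, so every combination is equally likely.

```python
from fractions import Fraction

p_heads = Fraction(1, 2)
p_six = Fraction(1, 6)

# Enumerate all 12 equally likely (coin, die) combinations
combos = [(c, d) for c in ["H", "T"] for d in range(1, 7)]
counted = Fraction(sum(1 for c, d in combos if c == "H" and d == 6), len(combos))

print(p_heads * p_six)  # 1/12
print(counted)          # 1/12
```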

## Union Probability 

Let's revisit the problem of spilling some water on my Macbook. A hardware repair shop claims that the probability of screen damage is $ .6 $ and the probability of battery damage is $ .5 $. 

What is the probability of screen damage *or* battery damage? So this is where things get tricky. When we do an *or* operation $ P(A \cup B)$ between two or more events, we call it a **union probability**. 


Your first instinct might be to add the two probabilities together like so, but this is not exactly correct. Can you see why? 

$ \Huge P(A \cup B) = P(A) + P(B) $

$ \Huge P(A \cup B) = .6 + .5 = 1.1 $

Notice how the total probability is 1.1, which makes this an invalid probability, as it must be between $ 0 $ and $ 1 $. This happens because outcomes where both events occur are counted twice. To remedy this, we have to subtract the joint probability.

$ \Huge P(A \cup B) = P(A) + P(B) - P(A) \times P(B) $

$ \Huge P(A \cup B) = .6 + .5 - (.6)(.5) = .8 $
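As a sanity check on this arithmetic, the sketch below computes the union two ways. Like the formula above, it assumes screen damage and battery damage are independent. The second route uses the complement: the only way to avoid "screen *or* battery damage" is for neither to happen. The variable names are illustrative.

```python
import math

p_screen = .6
p_battery = .5

# Add both probabilities, then subtract the double-counted joint probability
union_by_subtraction = p_screen + p_battery - p_screen * p_battery

# Complement route: 1 minus the probability that neither kind of damage occurs
union_by_complement = 1 - (1 - p_screen) * (1 - p_battery)

print(round(union_by_subtraction, 10))  # 0.8
print(round(union_by_complement, 10))   # 0.8
```

The results are rounded only to hide floating-point noise; both routes agree.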

Alright, $ .8 $ is the correct answer! And this is a valid probability. But you probably have questions about why the joint probability needs to be subtracted. The most intuitive way to explain this is again with a coin flip and a die roll, asking "what is the probability of getting a heads *or* a six?" Notice how the outcome of heads *and* six is counted twice, once among the heads outcomes and once among the six outcomes, unless we subtract the joint probability. This effectively removes the double-counting of outcomes.


We can also get the correct answer by manually counting the combinations. 

In [None]:
coin_outcomes = ["H", "T"]
die_outcomes = [1,2,3,4,5,6]

combos = [(c,d) for c in coin_outcomes for d in die_outcomes]

outcome_ct = len(combos)
union_outcome_ct = sum(1 for cd in combos if cd[0] == "H" or cd[1] == 6)
print(union_outcome_ct / outcome_ct) # 0.5833333333333334
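The same value falls out of the union formula. This short sketch, again assuming a fair coin and a fair die, uses exact fractions so the result reads as 7 of the 12 combinations.

```python
from fractions import Fraction

p_heads = Fraction(1, 2)
p_six = Fraction(1, 6)

# P(heads or six) = P(heads) + P(six) - P(heads and six)
p_heads_or_six = p_heads + p_six - p_heads * p_six

print(p_heads_or_six)  # 7/12
```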

## Conditional Probability

When we talk about **conditional probability**, we talk about how the probability of event $ A $ changes given that event $ B $ has occurred. For example, the probability of a flood $ A $ is $ .01 $. But given that it rains $ B $, the probability of a flood $ A $ becomes $ .15 $.

$ \Huge P(A) = .01 $ 

$ \Huge P(A|B) = .15 $ 

If $ B $ has no effect on $ A $, then $ P(A) $ is going to be equal to $ P(A|B) $. Think of it as A not caring what B does.
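The coin and die from earlier give a simple case where $ B $ has no effect on $ A $. In this sketch, conditioning on heads just means restricting attention to the outcomes where heads occurred. The helper `probability` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction

combos = [(c, d) for c in ["H", "T"] for d in range(1, 7)]

def probability(event, outcomes):
    """Fraction of equally likely outcomes for which event(outcome) is true."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def is_six(cd):
    return cd[1] == 6

# Conditioning on B (heads): keep only the outcomes where heads occurred
heads_only = [cd for cd in combos if cd[0] == "H"]

p_six = probability(is_six, combos)                  # P(A)
p_six_given_heads = probability(is_six, heads_only)  # P(A|B)

print(p_six, p_six_given_heads)  # 1/6 1/6
```

Knowing the coin landed heads tells us nothing about the die, so the two values match.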

Now ask yourself this. Let's say you were asked to find the probability of a flood $ A $ *and* rain $ B $ occurring. What do you multiply in your joint probability, given the conditional probability?


$ \Huge P(\text{flood}) = .01 $ 

$ \Huge P(\text{rain}) = .3 $ 

$ \Huge P(\text{flood|rain}) = .15 $ 

$ \Huge P(\text{flood} \cap \text{rain} ) = \text{ ? } $ 


The answer is to calculate the joint probability with the conditional probability that applies. You always want to use the more specific conditional probability when it is available! 

$ \Huge P(\text{flood} \cap \text{rain} ) = P(\text{flood|rain}) \times P(\text{rain}) = .045 $ 

This should make sense: if we did not use the conditional probability, we would not be accounting for the fact that rain increases the probability of a flood, and the joint probability would be smaller.

$ \Huge P(\text{flood} \cap \text{rain} ) = P(\text{flood}) \times P(\text{rain}) = .003 $ 

It is not so much that the latter joint probability is wrong, but it does not account for the more specific information that is available as a conditional probability. Remember, a probability is an approximation reflecting uncertainty. We can decrease that uncertainty by using more specific information when it is available.
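Here is the comparison in Python, as a minimal sketch using the numbers from this example. The variable names are illustrative, and rounding is only there to hide floating-point noise.

```python
import math

p_flood = .01
p_rain = .3
p_flood_given_rain = .15

# Joint probability using the conditional probability
joint_with_conditional = p_flood_given_rain * p_rain

# Joint probability that ignores the effect of rain on flooding
joint_ignoring_rain = p_flood * p_rain

print(round(joint_with_conditional, 6))  # 0.045
print(round(joint_ignoring_rain, 6))     # 0.003
```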

We will learn how to flip conditional probabilities in the next section with Bayes' Theorem.

## Exercise 

You are waiting for a plane to arrive at the gate for your flight. Your flight-tracking app predicts flights at this time are 10% likely to be late. However, there is a 50% chance of a storm occurring, and the flight is 70% likely to be late during a storm.

What is the probability your flight will be late *and* it will storm? Fill out the Python code below, replacing the question mark (?), to calculate your answer.

In [None]:
p_late = .1
p_storm = .5
p_late_given_storm = .7

p_storm_and_late = ? 

### SCROLL DOWN FOR ANSWER
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
v 

The answer is .35 probability that you will be late and it will storm. Remember to use the conditional probability when it applies and is available. 

In [None]:
p_late = .1
p_storm = .5
p_late_given_storm = .7

p_storm_and_late = p_late_given_storm * p_storm

print(p_storm_and_late) # .35