# 6 Conditional Probability

Conditional probability is the probability of an event given that another event has already occurred.

Example{example}

Given two dice rolls, we want to calculate the probability of the sum of those rolls being 4.

Solution{solution}

The sample space of this problem is the set of all possible outcomes of two dice rolls:

$\Omega = \{(1,1), (1,2), \dots, (6,6)\}$

With

$|\Omega| = 6 \cdot 6 = 36$

The event space of the sum of the rolls being 4 is the set of all outcomes where the sum of the rolls is 4:

$A = \{(1,3), (2,2), (3,1)\}$

With

$|A| = 3$

Thus, the probability of the sum of the rolls being 4 is $\frac{|A|}{|\Omega|} = \frac{3}{36} = \frac{1}{12}$.
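
The counting argument above can be checked by brute-force enumeration. The following is a minimal sketch, assuming two fair six-sided dice; the variable names (`omega`, `event_a`, `p_sum_is_4`) are our own and only mirror the symbols $\Omega$ and $A$ used in the text.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs (first roll, second roll) of two six-sided dice.
omega = list(product(range(1, 7), repeat=2))

# Event A: the two rolls sum to 4.
event_a = [outcome for outcome in omega if sum(outcome) == 4]

# With equally likely outcomes, P(A) = |A| / |Omega|.
p_sum_is_4 = Fraction(len(event_a), len(omega))

print(len(omega))   # size of the sample space
print(event_a)      # outcomes in A
print(p_sum_is_4)   # exact probability as a fraction
```

Using `Fraction` keeps the result exact, so it can be compared directly with the hand-computed $\frac{1}{12}$.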

Now suppose we could look at the first die roll before the second die is rolled. If you had bet on the sum of the rolls being 4, which number would you want to see on the first die: 1, 2, 3, 4, 5, or 6?

Solution{solution}

You wouldn't want to see a 4, 5, or 6: the sum would then be at least 5 regardless of the second roll, so the probability of a sum of 4 would be 0. Let's consider the other cases:

- If you see a 1, now the sample space is reduced to $\Omega = \{(1,1), (1,2), (1,3), (1,4), (1,5), (1,6)\}$ and the event space is reduced to $A = \{(1,3)\}$. The probability of the sum of the rolls being 4 is $\frac{|A|}{|\Omega|} = \frac{1}{6}$.
- If you see a 2, now the sample space is reduced to $\Omega = \{(2,1), (2,2), (2,3), (2,4), (2,5), (2,6)\}$ and the event space is reduced to $A = \{(2,2)\}$. The probability of the sum of the rolls being 4 is $\frac{1}{6}$.
- If you see a 3, now the sample space is reduced to $\Omega = \{(3,1), (3,2), (3,3), (3,4), (3,5), (3,6)\}$ and the event space is reduced to $A = \{(3,1)\}$. The probability of the sum of the rolls being 4 is $\frac{1}{6}$.

**Note:** After seeing a 1, 2, or 3 on the first die roll, the probability of the sum of the rolls being 4 is always $\frac{1}{6}$. So you are equally happy to see a 1, 2, or 3 on the first die roll.

Conditional Probability{definition}

The conditional probability is defined as the probability of an event $A$ happening, given that another event $B$ has already occurred.

We write the conditional probability of $A$ given $B$ as $P(A|B)$.

We mean $P(A, \text{given that } B \text{ has occurred})$.

The sample space is now all possible outcomes that are consistent with $B$ (i.e., $\Omega \cap B = B$).

The event space is now all possible outcomes that are consistent with $A$ and $B$ (i.e., $A \cap B$).

For equally likely outcomes, the conditional probability is then:

$$
P(A|B) = \frac{|AB|}{|B|}
$$

**Note:** Here, as a shorthand for $A \cap B$, we often write $AB$.
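
To connect the counting definition back to the dice example, here is a small sketch that computes $P(A|B)$ as $|AB| / |B|$ for every possible first roll. It assumes equally likely outcomes; the helper name `conditional_probability` is hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space of two fair six-sided dice, stored as a set so we can intersect events.
omega = set(product(range(1, 7), repeat=2))


def conditional_probability(event_a, event_b):
    """Return P(A|B) = |A and B| / |B|, assuming equally likely outcomes."""
    if not event_b:
        raise ValueError("cannot condition on an event with no outcomes")
    return Fraction(len(event_a & event_b), len(event_b))


# Event A: the two rolls sum to 4.
sum_is_4 = {outcome for outcome in omega if sum(outcome) == 4}

# Condition on each possible value of the first roll (event B).
p_given_first = {}
for first in range(1, 7):
    first_is = {outcome for outcome in omega if outcome[0] == first}
    p_given_first[first] = conditional_probability(sum_is_4, first_is)

for first, p in p_given_first.items():
    print(f"P(sum = 4 | first roll = {first}) = {p}")
```

The first rolls 1, 2, and 3 each give $\frac{1}{6}$, while 4, 5, and 6 give 0, matching the case analysis of the dice example.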

Chain Rule of Probability{rule}

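From the definition of conditional probability, we can directly derive the chain rule of probability. Dividing the numerator and denominator of the definition by $|\Omega|$ gives $P(A|B) = \frac{P(AB)}{P(B)}$, and multiplying both sides by $P(B)$ yields: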

$$
P(AB) = P(A|B)P(B)
$$

Law of Total Probability{rule}

The law of total probability states that the probability of an event $A$ happening is the sum of the probability of $AB$ happening and the probability of $A B^{c}$ happening, where $B^c$ is the complement of $B$ (the event that $B$ does not occur):

$$
P(A) = P(AB) + P(A B^c)
$$

Applying the chain rule to each term, this can be rewritten as:

$$
P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)
$$

In other words, the probability of $A$ happening is the probability of $A$ given that $B$ has occurred times the probability of $B$ happening plus the probability of $A$ given that $B$ has not occurred times the probability of $B$ not happening.

Example{example}

You have bacteria in your gut that are causing a disease. $10\%$ of these bacteria have a mutation that makes them resistant to antibiotics. After you take half a course of antibiotics, the probability that a resistant bacterium has survived is $20\%$, and the probability that a non-resistant bacterium has survived is $1\%$. What is the probability that a given bacterium has survived?

Solution{solution}

Let $A$ be the event that a bacterium has survived and $B$ the event that the bacterium is resistant.

\begin{align*}
P(A) &= P(A|B)P(B) + P(A|B^c)P(B^c) \\
&= .2 \cdot .1 + .01 \cdot .9 \\
&= .029
\end{align*}
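
The same calculation can be written as a short function. This is a sketch only: the function name `total_probability` and the variable names are our own, and it covers just the two-case split into $B$ and $B^c$ used above.

```python
from fractions import Fraction


def total_probability(p_a_given_b, p_a_given_not_b, p_b):
    """Return P(A) = P(A|B) P(B) + P(A|B^c) P(B^c), with P(B^c) = 1 - P(B)."""
    return p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)


# Numbers from the bacteria example, kept as exact fractions.
p_resistant = Fraction(10, 100)                 # P(B)
p_survive_given_resistant = Fraction(20, 100)   # P(A|B)
p_survive_given_susceptible = Fraction(1, 100)  # P(A|B^c)

p_survive = total_probability(
    p_survive_given_resistant, p_survive_given_susceptible, p_resistant
)
print(p_survive)  # exact value of P(A) as a fraction
```

Exact fractions avoid floating-point noise, so the result can be compared directly with $.029 = \frac{29}{1000}$.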

Example{example}

What is the probability that a bacterium is resistant, given that it has survived?

Solution{solution}

$$
P(B|A) = \frac{P(AB)}{P(A)} = \frac{P(A|B)P(B)}{P(A)} = \frac{.2 \cdot .1}{.029} \approx .6897
$$

Bayes' Theorem{rule}

Bayes' theorem is a fundamental theorem in probability theory that describes how conditional probabilities can be reversed:

$$
P(B|A) = \frac{P(A|B)P(B)}{P(A)}
$$

(Proof: this follows directly from the definition of conditional probability and the chain rule of probability)

We can also expand $P(A)$ using the law of total probability:

$$
P(B|A) = \frac{P(A|B)P(B)}{P(A|B)P(B) + P(A|B^c)P(B^c)}
$$
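
As an illustration of the expanded form, the sketch below implements it as a function and applies it to the bacteria example. The name `bayes_posterior` and its parameter names are hypothetical, introduced only for this example.

```python
from fractions import Fraction


def bayes_posterior(p_a_given_b, p_a_given_not_b, p_b):
    """Return P(B|A) using Bayes' theorem with P(A) expanded over B and B^c."""
    numerator = p_a_given_b * p_b
    evidence = numerator + p_a_given_not_b * (1 - p_b)  # this is P(A)
    if evidence == 0:
        raise ValueError("P(A) is zero, so P(B|A) is undefined")
    return numerator / evidence


# Bacteria example: B = resistant, A = survived.
p_resistant_given_survived = bayes_posterior(
    p_a_given_b=Fraction(20, 100),
    p_a_given_not_b=Fraction(1, 100),
    p_b=Fraction(10, 100),
)
print(p_resistant_given_survived, float(p_resistant_given_survived))
```

The exact result is $\frac{20}{29}$, which rounds to the $.6897$ obtained by hand.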


Example{example}

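A test detects a disease (SARS) in $98\%$ of the people who have it. However, it has a false positive rate of $1\%$, meaning that it also returns a positive result for $1\%$ of the people who do not have SARS. The probability of a person having SARS is $0.5\%$.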

You test positive for SARS. What is the probability that you actually have SARS?

Solution{solution}

Let $T$ be the event that you test positive and $S$ the event that you have SARS.

We want to calculate $P(S|T)$. We can use Bayes' theorem:

$$
P(S|T) = \frac{P(T|S)P(S)}{P(T)}
$$

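We already have $P(T|S) = .98$ and $P(S) = .005$, and the false positive rate gives $P(T|S^c) = .01$ with $P(S^c) = 1 - .005 = .995$. We still need $P(T)$, which we can calculate with the law of total probability: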

$$
P(T) = P(T|S)P(S) + P(T|S^c)P(S^c) = .98 \cdot .005 + .01 \cdot .995 = .01485
$$

Now we can calculate $P(S|T)$:

$$
P(S|T) = \frac{.98 \cdot .005}{.01485} = \frac{.0049}{.01485} \approx .3300
$$

So the probability that you actually have SARS given that you tested positive is about $33.0\%$. This is much lower than the $98\%$ detection rate of the test and stems from the low base rate of the disease.
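
One way to build intuition for the base-rate effect is to count people instead of multiplying probabilities. The sketch below assumes a hypothetical population of 100,000 people (a number chosen only so that every group size is a whole number) and applies the rates from the example.

```python
from fractions import Fraction

# Hypothetical population size, chosen so all counts below are whole numbers.
population = 100_000

infected = population * 5 // 1000       # 0.5% base rate
healthy = population - infected

true_positives = infected * 98 // 100   # 98% of infected people test positive
false_positives = healthy * 1 // 100    # 1% of healthy people test positive
all_positives = true_positives + false_positives

# Among everyone who tests positive, the fraction who are actually infected.
p_infected_given_positive = Fraction(true_positives, all_positives)

print(infected, healthy)
print(true_positives, false_positives)
print(p_infected_given_positive, float(p_infected_given_positive))
```

Because healthy people vastly outnumber infected people, the 995 false positives outweigh the 490 true positives, so only about one in three positive results belongs to someone who actually has the disease.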

Example{example}

Since you are still not sure whether you have SARS after the first test, you decide to take a second test with the same accuracy and false positive rate. You test positive again. What is the probability that you actually have SARS?

Solution{solution}

Let $T_1$ be the event that you test positive in the first test (called $T$ above) and $T_2$ the event that you test positive in the second test.

We want to calculate $P(S|T_1, T_2)$. We can use Bayes' theorem:

$$
P(S|T_1, T_2) = \frac{P(T_1, T_2|S)P(S)}{P(T_1, T_2)}
$$

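Assuming the two test results are independent of each other given your disease status, we have $P(T_1, T_2|S) = .98 \cdot .98$ and $P(T_1, T_2|S^c) = .01 \cdot .01$, and as before $P(S) = .005$. We need to calculate $P(T_1, T_2)$. We can use the law of total probability: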

$$
P(T_1, T_2) = P(T_1, T_2|S)P(S) + P(T_1, T_2|S^c)P(S^c) = .98 \cdot .98 \cdot .005 + .01 \cdot .01 \cdot .995 = .0049015
$$

Now we can calculate $P(S|T_1, T_2)$:

$$
P(S|T_1, T_2) = \frac{.98 \cdot .98 \cdot .005}{.0049015} = \frac{.004802}{.0049015} \approx .98
$$

So the probability that you actually have SARS given that you tested positive twice is about $98\%$. This is a lot higher than the probability after only one test.
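
The two-test result can also be reached by applying Bayes' theorem twice, using the probability after the first positive test as the prior for the second. The sketch below assumes, as the product $.98 \cdot .98$ does, that the two test results are independent given your disease status. The function name `update_on_positive` is hypothetical, and exact fractions are used so that no intermediate rounding is involved.

```python
from fractions import Fraction

SENSITIVITY = Fraction(98, 100)         # P(T|S)
FALSE_POSITIVE_RATE = Fraction(1, 100)  # P(T|S^c)


def update_on_positive(prior):
    """Return P(S | positive test) given the current probability P(S) = prior."""
    numerator = SENSITIVITY * prior
    evidence = numerator + FALSE_POSITIVE_RATE * (1 - prior)
    return numerator / evidence


prior = Fraction(5, 1000)                        # base rate of the disease
after_first = update_on_positive(prior)          # P(S|T_1)
after_second = update_on_positive(after_first)   # P(S|T_1, T_2)

# Direct computation with both tests at once, for comparison.
joint = (SENSITIVITY**2 * prior) / (
    SENSITIVITY**2 * prior + FALSE_POSITIVE_RATE**2 * (1 - prior)
)

print(after_first, float(after_first))
print(after_second, float(after_second))
print(after_second == joint)
```

Both routes give the same exact fraction, $\frac{9604}{9803}$, which illustrates that under this independence assumption evidence can be incorporated one test at a time.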