# Chain Rules and Total Probability

In this section, we introduce techniques that use conditional probability to decompose events. The goal is to express unknown probabilities of events in terms of probabilities that we already know. 

## Using Conditional Probability to Decompose Events: Part 1 -- Chain Rules

From the definition of conditional probability, we can write $P(A|B)$ as
\begin{align}
  P(A|B)&= \frac{P(A \cap B)}{P(B)}  \\
  \Rightarrow P(A \cap B)&= P(A|B)P(B), 
\end{align}

and we can write $P(B|A)$ as

\begin{align}
  P(B|A)&=\frac{P(A \cap B)}{P(A)}  \\
  \Rightarrow P(A \cap B)&= P(B|A)P(A).
\end{align}

Rearranging the definitions in this way gives two *different* formulas for $P(A \cap B)$. These are **chain rules** for the probability of the intersection of two events. Such rules are often used when:
* Two events are dependent on each other, but the relationship is simple if the outcome of one of the experiments is known.
* The events occur at two different points in a system, such as the input and output of a system.
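
As a quick sanity check, the two chain rules can be compared on a small sample space. The sketch below assumes a toy experiment that is not part of this section (one roll of a fair six-sided die, with $A$ the event that the roll is even and $B$ the event that the roll is greater than 3); the helper `prob` is a hypothetical name introduced only for this illustration.

```python
from fractions import Fraction

# Assumed toy experiment (not from the text): one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # the roll is even
B = {4, 5, 6}  # the roll is greater than 3


def prob(event, given=None):
    """Probability of `event`, optionally conditioned on `given`,
    when all outcomes are equally likely."""
    if given is None:
        given = sample_space
    return Fraction(len(event & given), len(given))


p_direct = prob(A & B)                # P(A and B) by direct counting
p_via_B = prob(A, given=B) * prob(B)  # P(A|B) P(B)
p_via_A = prob(B, given=A) * prob(A)  # P(B|A) P(A)

print(p_direct, p_via_B, p_via_A)  # prints: 1/3 1/3 1/3
```

Both forms of the chain rule agree with the direct count, as they must. Which form is more useful in a given problem depends on which conditional probability is easier to determine.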


**Example**

A simple example of the former is in card games. Two cards are drawn (without replacement) from a deck of 52 cards (without jokers). What is the probability that they are both Aces? Let $A_i$ be the event that the card on draw $i$ is an Ace. Then the most natural way to apply the chain rule is to write

$$
P(A_1 \cap A_2) = P(A_2 | A_1) P(A_1).
$$

The probability of getting an Ace in draw 1 is 4/52=1/13 because there are 4 Aces in the deck of 52 cards.  The probability of getting an Ace on the second draw *given that the first draw was an Ace* is 3/51 = 1/17 because after the first draw, there are 3 Aces left in the remaining deck of 51 cards.  Thus,

$$
P(A_1 \cap A_2) = P(A_2 | A_1) P(A_1) = \left( \frac{1}{17} \right) \left( \frac{1}{13} \right) = \frac{1}{221}.
$$

As a check, we can compare with a solution using combinatorics. There are 

$$
\binom{4}{2} = 6
$$
ways to choose the two Aces from the four total Aces. There are 

$$
\binom{52}{2} = \frac{ 52!}{50! 2!} = 1326
$$
ways to choose two cards from 52. So,

$$
P( A_1 \cap A_2) = \frac{6}{1326} = \frac{1}{221},
$$
which matches our answer using conditional probability.
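
The same comparison can be carried out with exact arithmetic. This is a minimal sketch using Python's standard `fractions` and `math` modules; the variable names are chosen here for illustration.

```python
from fractions import Fraction
from math import comb

# Chain rule: P(A1 and A2) = P(A2 | A1) P(A1)
p_a1 = Fraction(4, 52)           # 4 Aces among 52 cards
p_a2_given_a1 = Fraction(3, 51)  # 3 Aces left among the remaining 51 cards
p_chain = p_a2_given_a1 * p_a1

# Counting: (ways to choose 2 of the 4 Aces) / (ways to choose 2 of the 52 cards)
p_count = Fraction(comb(4, 2), comb(52, 2))

print(p_chain, p_count)  # prints: 1/221 1/221
```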

The solution using conditional probability is usually much more intuitive for learners who are new to probability, but being able to use both techniques is a powerful way to check your work.
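
A third, independent check is to estimate the probability by simulation. The sketch below is one possible approach, not a method prescribed by this section: it draws two cards without replacement many times and records how often both are Aces. The function name `estimate_two_aces`, the number of trials, and the seed are all arbitrary choices made for this illustration.

```python
import random


def estimate_two_aces(num_trials, seed=0):
    """Estimate P(both cards are Aces) when drawing 2 cards without replacement."""
    rng = random.Random(seed)      # seeded so the run is repeatable
    deck = ["A"] * 4 + ["x"] * 48  # only Ace versus non-Ace matters here
    hits = 0
    for _ in range(num_trials):
        first, second = rng.sample(deck, 2)  # sampling without replacement
        if first == "A" and second == "A":
            hits += 1
    return hits / num_trials


estimate = estimate_two_aces(200_000)
print(f"simulated: {estimate:.5f}   exact: {1 / 221:.5f}")
```

The estimate should land close to $1/221 \approx 0.0045$, but it will not match exactly because it is based on a finite number of random trials.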

**To be added: Question on probability of getting two face cards (JQK) on consecutive draws from a deck of cards. Question on getting defective computers when sitting down at random computers in a lab.**

**Example** 

For the Magician's Coin problem, what is the probability of getting the Fair coin and it coming up heads on the first flip? This is an example of a system where there is an input that affects the future outputs. In this case, the input is the choice of coin. When we have such problems, we usually will need to decompose them in terms of the probabilities of the input and the conditional probabilities of the output given the input. Let $H_i$ denote the event that the coin comes up heads on flip $i$. Let $F$ be the event that the fair coin was chosen.  We are looking for $P(F \cap H_1)$, which we can write as 

$$
P(F \cap H_1) = P(H_1 | F) P(F).
$$

If there is one Fair coin and one two-headed coin, $P(F) =1/2$.  Given  that the coin is Fair, $P(H_1|F) = 1/2$. So,

$$
P(F \cap H_1) = \left( \frac 1 2  \right) \left( \frac 1 2  \right) = \frac 1 4.
$$
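
The calculation above can be written as a short sketch in which the input probability and the conditional output probabilities are stored separately and then combined with the chain rule. The dictionary layout and the names `p_coin`, `p_heads_given_coin`, and `joint_coin_and_heads` are assumptions made for this illustration.

```python
from fractions import Fraction

# Input: which coin is chosen (one Fair coin, one two-headed coin, equally likely).
p_coin = {"fair": Fraction(1, 2), "two-headed": Fraction(1, 2)}

# Output given input: probability of heads on a flip for each coin.
p_heads_given_coin = {"fair": Fraction(1, 2), "two-headed": Fraction(1)}


def joint_coin_and_heads(coin):
    """Chain rule: P(coin and H1) = P(H1 | coin) P(coin)."""
    return p_heads_given_coin[coin] * p_coin[coin]


print(joint_coin_and_heads("fair"))  # prints: 1/4
```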

Note that it is generally **not helpful to write the probability using the other form of the chain rule:**

$$
P(F \cap H_1) = P(F| H_1) P(H_1).
$$

We know neither $P(H_1)$ nor $P(F|H_1)$. Thus, although the expression is mathematically valid, it is not helpful in solving this problem because it depends on probabilities that cannot be easily inferred from the problem setup.

**To be added: Question on probability of getting the Fair coin and heads on first two flips.**

The chain rule can be easily generalized to more than two events. The easiest way is to write the probability as a product of fractions (as in the definition of conditional probability), chosen so that the denominator of each fraction cancels with the numerator of the next. The cancellation ensures that the value of the expression is not changed when it is rewritten. This will make more sense with an example that rewrites the probability of the intersection of 3 events ($A$, $B$, and $C$):
\begin{align}
  P(A \cap B \cap C) &= \frac{P(A \cap B \cap C)}{P(B \cap C)} \cdot
  \frac{P(B \cap C)}{P(C)} \cdot \frac{P(C)}{1} \\
  &= P(A|B \cap C) P(B | C) P(C)
\end{align}

This decomposition assumes we know the probability of $A$ given that $B$ and $C$ have occurred and we know the probability of $B$ given $C$. Such dependence occurs naturally in many systems, but the particular decomposition will depend on what we know about these probabilities. We could just as easily have written
\begin{align}
  P(A \cap B \cap C) 
  &= P(C|A \cap B)  P(B | A) P(A)
\end{align}
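
To illustrate the three-event chain rule numerically, the card example can be extended to three draws. This extension is an assumption made here for illustration and is not worked in the text: with $A_i$ the event that draw $i$ is an Ace, $P(A_1 \cap A_2 \cap A_3) = P(A_3 | A_1 \cap A_2) P(A_2 | A_1) P(A_1)$. The helper `prob_all_aces` is a hypothetical name, and the result is checked against a counting argument.

```python
from fractions import Fraction
from math import comb


def prob_all_aces(num_draws, aces=4, deck_size=52):
    """Chain rule for consecutive draws without replacement.

    Factor k is P(draw k+1 is an Ace | the first k draws were all Aces).
    """
    p = Fraction(1)
    for k in range(num_draws):
        p *= Fraction(aces - k, deck_size - k)
    return p


p_three_chain = prob_all_aces(3)  # (4/52) * (3/51) * (2/50)
p_three_count = Fraction(comb(4, 3), comb(52, 3))

print(p_three_chain, p_three_count)  # prints: 1/5525 1/5525
```

Calling `prob_all_aces(2)` reproduces the two-card result of $1/221$.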




## Using Conditional Probability to Decompose Events: Part 2 -- Partitions and Total Probability

(To be written)