# Definitions and Rules (DRAFT)

The intention here is not to give a comprehensive introduction to probability, but to provide a reminder of the basic definitions and rules. Every statistics textbook has a chapter on probability that is more complete than this section. We encourage readers who have not encountered the concept of probability to find a good introductory chapter. We start with some basic definitions:

**Random phenomenon**: a phenomenon whose individual outcomes are uncertain; for example, the number of boys in 100 births in a Chicago hospital (the outcome is uncertain because we do not know whether the number of boys will be 50, 40, or something else).

**Sample space, S**: the set of all possible outcomes of a phenomenon; in the above example, $S$ is the set of integers from 0 to 100 (possible outcomes for the number of boys are 0, 1, 2, ..., 100).

**Event, A**: an outcome or a set of outcomes of a random phenomenon; for example, let $A$ be the event that fewer than half of the babies are boys, so $A$ is the set of integers from 0 to 49.

**Mutually exclusive events**: Events $A$ and $B$ are mutually exclusive (or disjoint) if they have no outcomes in common. For example, let $B$ be the event that the number of boys is between 60 and 70, with $A$ as above; no outcome belongs to both events.

**Complement of an event**: The complement of an event $A$ is the event that $A$ does not occur, denoted by $A^C$. For the event $A$ defined above, $A^C$ is the event that at least half of the babies are boys, or the set of integers from 50 to 100.

![](./img/venn_comp.png)

**Compound events**: Events built from combinations of other events.

**Union:** ($A$ or $B$) = ($A\cup B$): set of all outcomes in $A$, or in $B$, or in both.


![](./img/venn.union.png)

**Intersection:**  ($A$ and $B$) = ($A\cap B$): set of all outcomes that are in $A$ and in $B$.

![](./img/venn_int.png)
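
The definitions above map directly onto set operations. The following is a minimal sketch in Python using the births example; the variable names are illustrative, and reading "between 60 and 70" as inclusive of both endpoints is an assumption.

```python
# Sample space for the number of boys in 100 births: the integers 0..100.
S = set(range(0, 101))

# A: fewer than half of the babies are boys (0..49).
A = {k for k in S if k < 50}
# B: the number of boys is between 60 and 70 (assumed inclusive).
B = {k for k in S if 60 <= k <= 70}

A_complement = S - A   # outcomes in S that are not in A
A_or_B = A | B         # union: outcomes in A, or in B, or in both
A_and_B = A & B        # intersection: outcomes in both A and B

print(min(A_complement), max(A_complement))  # 50 100
print(len(A_or_B))                           # 61 (50 outcomes in A plus 11 in B)
print(A.isdisjoint(B))                       # True: A and B are mutually exclusive
```

Because $A$ and $B$ share no outcomes, their intersection is the empty set.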



## Definition of Probability

Probabilities describe how likely events are, so a probability model consists of:
- A list of possible outcomes (the sample space $S$)
- An assignment of probabilities $P$ to events

The **frequentist interpretation of the probability** of an event $A$, $\mbox{P}(A)$, is the long-run relative frequency of the event $A$. Suppose you are interested in the probability of "Heads" when tossing a coin. In the frequentist interpretation, this probability is the limit of the relative frequency of "Heads" as the coin is tossed repeatedly. Note that while you can imagine repeating the coin toss a large number of times (and some people have done it!), there are other events where the intuition behind frequentist probabilities is not as evident. For example, what is the probability that it will rain next Sunday? This is where the **Bayesian interpretation** of probability, based on a subjective degree of belief, is more natural. In the Bayesian world, two people could have different viewpoints and assign different probabilities.
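
The long-run relative frequency can be illustrated with a small simulation. This is only a sketch: it assumes a fair coin, models it with Python's pseudo-random generator, and the function name and seed are arbitrary choices for illustration.

```python
import random

def relative_frequency_of_heads(n_tosses, seed=0):
    """Simulate n_tosses tosses of a fair coin and return the fraction of heads."""
    rng = random.Random(seed)  # fixed seed so that a run can be reproduced
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# The relative frequency tends to settle near 0.5 as the number of tosses grows.
for n in (10, 100, 1_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

With few tosses the relative frequency can be far from 0.5; with many tosses it is typically very close.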

Note that the rules below are universal: they hold under both the frequentist and the Bayesian interpretations.

## Basic Probability Rules

- $0 \le \mbox{P}(A) \le 1$, for any event $A$

- $\mbox{P}(S) = 1$

- **Equally likely outcomes**: if all outcomes in $S$ are equally likely,
$\mbox{P}(A)=\frac{\mbox{Number of outcomes in } A}{\mbox{Total number of outcomes in } S}$

-  $\mbox{P}(E^C) = 1 - \mbox{P}(E)$ for any event $E$

- $\mbox{P}(A \cup B) = \mbox{P}(A) +
\mbox{P}(B) - \mbox{P}(A \cap B)$
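
These rules can be checked on a small example. The sketch below assumes one roll of a fair six-sided die (an example chosen here for illustration) and uses exact fractions; the helper `prob` is a hypothetical name implementing the equally likely outcomes formula.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space: one roll of a fair die

def prob(event):
    """P(event) under equally likely outcomes: |event| / |S|."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}            # the roll is even
B = {4, 5, 6}            # the roll is greater than 3

assert 0 <= prob(A) <= 1
assert prob(S) == 1
assert prob(S - A) == 1 - prob(A)                       # complement rule
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)   # addition rule

print(prob(A | B))  # 2/3
```

Subtracting $\mbox{P}(A \cap B)$ in the last rule avoids counting the outcomes 4 and 6 twice.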

## Conditional Probability
If $\mbox{P}(A) \ne 0$, the conditional probability of event $B$
given that $A$ has occurred, denoted by $\mbox{P}(B|A)$, is defined by
$\mbox{P}(B|A) = \frac{\mbox{P}(A \mbox{ and } B)}{\mbox{P}(A)}$

![](./img/venn.condprob.png)

Example:

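- Select one subject at random in the US
- $B$ is the event that the subject spent more than 2 hours on Zoom last week
- $A$ is the event that the subject is a college student
- Compare $\mbox{P}(B|A)$ with $\mbox{P}(B)$: the first is the probability of $B$ among college students only, the second among everyone in the US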
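
To see how conditioning can change a probability in a case where everything can be computed exactly, here is a minimal sketch with a fair six-sided die. The die, the events, and the function names are assumptions made for illustration and are unrelated to the Zoom example.

```python
from fractions import Fraction

S = set(range(1, 7))               # one roll of a fair die
A = {k for k in S if k % 2 == 0}   # the roll is even: {2, 4, 6}
B = {k for k in S if k > 3}        # the roll is greater than 3: {4, 5, 6}

def prob(event):
    """P(event) under equally likely outcomes."""
    return Fraction(len(event), len(S))

def cond_prob(b, a):
    """P(b | a) = P(a and b) / P(a); only defined when P(a) != 0."""
    if prob(a) == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(a & b) / prob(a)

print(cond_prob(B, A))  # 2/3
print(prob(B))          # 1/2
```

Knowing that the roll is even raises the probability that it is greater than 3 from 1/2 to 2/3, because two of the three even outcomes exceed 3.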

## More Probability Rules

**Conditional probability**: If $\mbox{P}(B) \ne 0$, the conditional probability of event $A$ given that $B$ has occurred, denoted by $\mbox{P}(A|B)$, is defined by
$\mbox{P}(A|B) = \frac{\mbox{P}(A \mbox{ and } B)}{\mbox{P}(B)}$

**Multiplication rule**: $\mbox{P}(A \mbox{ and } B) = \mbox{P}(A|B) \mbox{P}(B)$

**Independence**: Events $A$ and $B$ are independent if $\mbox{P}(A|B) =
\mbox{P}(A)$ (or equivalently, $\mbox{P}(B|A) = \mbox{P}(B)$)

Equivalent condition for **independence**: 
$\mbox{P}(A \mbox{ and } B) = \mbox{P}(A) \mbox{P}(B)$
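
The product condition gives a direct way to test independence when the sample space can be enumerated. The sketch below assumes two rolls of a fair die (36 equally likely outcomes); the events and helper names are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (first roll, second roll): 36 equally likely outcomes.
S = set(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) under equally likely outcomes."""
    return Fraction(len(event), len(S))

def independent(a, b):
    """Check the product condition P(a and b) == P(a) * P(b)."""
    return prob(a & b) == prob(a) * prob(b)

A = {o for o in S if o[0] == 6}      # the first roll is a 6
B = {o for o in S if sum(o) == 7}    # the two rolls sum to 7
C = {o for o in S if sum(o) == 12}   # the two rolls sum to 12

print(independent(A, B))  # True
print(independent(A, C))  # False
```

A sum of 7 is equally likely whatever the first roll is, so $A$ and $B$ are independent; a sum of 12 requires the first roll to be a 6, so $A$ and $C$ are not.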

**Bayes' rule**:
$\mbox{P}(A|B) = \frac{\mbox{P}(B|A) \mbox{P}(A)}{\mbox{P}(B)}$
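
As a short derivation (a sketch that uses only the multiplication rule, and assumes $\mbox{P}(A) \ne 0$ and $\mbox{P}(B) \ne 0$), the probability of $A$ and $B$ can be written in two ways, once conditioning on $B$ and once conditioning on $A$:
$\mbox{P}(A \mbox{ and } B) = \mbox{P}(A|B) \mbox{P}(B) = \mbox{P}(B|A) \mbox{P}(A)$

Dividing the last two expressions by $\mbox{P}(B)$ gives the rule stated above.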


## The solution to the birthday problem

We will use the **equally likely outcomes** formula from above, treating each of the 365 days of the year as an equally likely birthday (leap years are ignored). For $n$ random subjects, the total number of outcomes (the number of possible combinations of birthdays) is
$365^n.$

The number of outcomes in which all $n$ birthdays are distinct is
$365\times364\times \cdots \times (365-n+1)$
and the intuition comes from counting these outcomes one person at a time:
- suppose you look at people sequentially;
- the first person can have any of the 365 birthdays without leading to matched birthdays;
- the second can have any birthday except that of the first person: so 364 possibilities;
- the $n$-th person can have any birthday except the $(n-1)$ different birthdays of the previous people: so $(365-n+1)$ possibilities.

So the probability of having $n$ distinct birthdays is:
$\frac{365\times364\times \cdots \times (365-n+1)}{365^n}$
The complement of this event is the event of interest (at least two people share a birthday), and so the probability of interest is:
$P_n ~=~ 1-\frac{365\times364\times \cdots \times (365-n+1)}{365^n}$
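
The formula for $P_n$ is easy to evaluate numerically. The following is a minimal sketch; the function name and the `days` parameter are illustrative, and it multiplies the ratios one factor at a time rather than forming the large numerator and denominator separately.

```python
def prob_shared_birthday(n, days=365):
    """P_n = 1 - (days * (days - 1) * ... * (days - n + 1)) / days**n."""
    p_distinct = 1.0
    for i in range(n):
        # Person i + 1 must avoid the i birthdays already taken.
        p_distinct *= (days - i) / days
    return 1.0 - p_distinct

for n in (10, 23, 50):
    print(n, round(prob_shared_birthday(n), 3))
# For n = 23 the probability is about 0.507.
```

With only 23 people the probability of a shared birthday already exceeds one half, and for $n > 365$ one of the factors is zero, so $P_n = 1$.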
