# Axiomatic Probability

Defining probabilities using relative frequencies for statistically regular experiments, or using simple math for fair experiments, is helpful for developing intuition about probability. However, these methods have limitations that restrict their usefulness for many real-world problems. Mathematicians working on probability recognized these limitations, which motivated them to develop an approach to probability that:
* is not based on a particular application or interpretation,
* agrees with models based on relative frequencies and fair experiments,
* agrees with our intuition (where appropriate), and
* is useful for solving real-world problems.

The approach they developed is called *Axiomatic Probability*. *Axiomatic* means that probability is built on a set of rules, called *axioms*, and that this set of rules is kept as small as possible.

## Probability Spaces

The first step in developing Axiomatic Probability is to define the core objects to which the axioms apply. We define a *Probability Space* as an ordered collection (tuple) of three objects, denoted by
$$
(S, \mathcal{F}, P)
$$

These objects are called the *sample space*, the *event class*, and the *probability measure*. 

**Sample Space**

We have already introduced the sample space in {doc}`outcomes-samplespaces-events`. It is a **set** containing all possible outcomes of an experiment.

**Event Class**

The second object, denoted by a calligraphic F ($\mathcal{F}$), is called the *event class*.
````{panels}
DEFINITION
^^^
event class:
For a sample space $S$ and a probability measure $P$, the event class, denoted by $\mathcal{F}$, is the collection of all subsets of $S$ to which we will assign probabilities (i.e., the subsets on which $P$ is defined). The sets in $\mathcal{F}$ are called events.
````
 

When $S$ is finite, $\mathcal{F}$ can be taken to be the set of all possible subsets of $S$, which is called the power set of $S$.
````{panels}
DEFINITION
^^^
power set: For a set $S$ with finite cardinality, $|S|=N < \infty$, the power set is the set of all possible subsets. We will use the notation $2^S$ to denote the power set.
````



Note that the power set includes both the empty set ($\emptyset$) and $S$.

**Example**
Consider flipping a coin and observing the top face. Then $S=\{H,T\}$ and 
$$
\mathcal{F} = \bigl\{ \emptyset, \{H\}, \{T\}, \{H,T\} = S \bigr\}
$$

Note that $|S| = 2$ and $|2^S| = 4 = 2^{|S|}$.
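
The coin example can be checked mechanically. The Python sketch below builds the power set of a finite set using the standard-library `itertools.combinations`; the helper name `power_set` is hypothetical and not part of the original text. It confirms that both $\emptyset$ and $S$ are in $2^S$ and that $|2^S| = 2^{|S|}$ for the coin.

```python
from itertools import chain, combinations


def power_set(s):
    """Return the power set of a finite set s as a list of frozensets."""
    items = list(s)
    # Subsets of each size r = 0, 1, ..., |s|; r = 0 yields the empty set
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(sub) for sub in subsets]


S = {"H", "T"}
F = power_set(S)

print(sorted(sorted(E) for E in F))  # [[], ['H'], ['H', 'T'], ['T']]
print(frozenset() in F, frozenset(S) in F)  # True True
print(len(F) == 2 ** len(S))  # True
```

Frozensets are used because ordinary Python sets are mutable and cannot be stored inside other sets, which would get in the way if the event class itself were kept as a set.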

**Exercise**

Consider rolling a standard six-sided die. Give the sample space, $S$, and the power set of the sample space, $2^S$. What is the cardinality of $2^S$?

When $|S| = \infty$, assigning a probability to every subset of $S$ can lead to problems: if $S$ is uncountably infinite (for example, an interval of real numbers), it is generally impossible to assign probabilities to all of its subsets in a way that is consistent with the axioms of probability. For typical data science applications, we can assume that any event we want to ask about will be in the event class, and we do not need to explicitly enumerate the event class.

**Probability Measure**

Until now, we have discussed the probabilities of outcomes. In a probability space, however, probabilities are assigned to events rather than to individual outcomes:

````{panels}
DEFINITION
^^^
probability measure:
The probability measure, $P$, is a real-valued set function that maps every element of the event class to the real line.
````

Note that in defining the probability measure, we do not specify the range of values for $P$, because at this point we are only defining the structure of the probability space through the types of elements that make it up.

Although $P$ assigns probabilities to events (as opposed to outcomes), for every outcome $s$ in $S$ the singleton set $\{s\}$ is typically an event in the event class, so we can still talk about the probability of a single outcome. Thus, $P$ is more general in its operation than the probabilities we have considered in our previous examples. As explained in {doc}`outcomes-samplespaces-events`, an event occurs if the experiment's outcome is one of the outcomes in that event's set.
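
To make the idea of a set function concrete, here is a minimal Python sketch for one roll of a fair six-sided die. It assumes equally likely outcomes (the fair-experiment model), so $P(E) = |E|/|S|$; the names `S`, `P`, `even`, and `outcome` are illustrative only. Note that `P` takes a whole event (a set of outcomes) as its input, a single outcome is handled as a singleton event, and an event occurs when the observed outcome is a member of it.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die
S = frozenset(range(1, 7))


def P(event):
    """Set function mapping an event (a subset of S) to a real number.

    Assumes equally likely outcomes, so P(E) = |E| / |S|.
    """
    event = frozenset(event)
    if not event <= S:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(S))


even = {2, 4, 6}        # the event "the roll is even"
print(P(even))          # 1/2
print(P({3}))           # 1/6, the probability of the singleton event {3}

outcome = 4             # suppose the experiment produced a 4
print(outcome in even)  # True: the event "even" occurred
```

`Fraction` keeps the probabilities exact, which avoids floating-point rounding when comparing them.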

## Axioms of Probability

As previously mentioned, axioms are a minimal set of rules. There are three Axioms of Probability that are specified in terms of the probability measure:



**The Axioms of Probability**

**I.** For every event $E$ in the event class $\mathcal{F}$, $P(E) \ge 0$
*(the event probabilities are non-negative)*

**II.** $P(S) = 1$ *(the probability that some outcome occurs is 1)*

**III.** For every pair of events $E$ and $F$ in the event class that are disjoint ($E \cap F = \emptyset$), $P(E \cup F) = P(E) + P(F)$
*(if two events are disjoint, then the probability that either one of them occurs is the sum of their probabilities)*
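
For a finite sample space with $\mathcal{F} = 2^S$, Axioms I, II, and III can be checked exhaustively by looping over every event. The Python sketch below does this; `satisfies_axioms`, `P_coin`, `P_count`, and the biased-coin weights are hypothetical examples chosen for illustration, not part of the original text.

```python
from fractions import Fraction
from itertools import chain, combinations


def power_set(s):
    """Return every subset of the finite set s as a frozenset."""
    items = list(s)
    return [
        frozenset(c)
        for c in chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    ]


def satisfies_axioms(S, P):
    """Check Axioms I-III for a set function P defined on the power set of a finite S."""
    events = power_set(S)
    # Axiom I: every event has non-negative probability
    if any(P(E) < 0 for E in events):
        return False
    # Axiom II: the sample space itself has probability 1
    if P(frozenset(S)) != 1:
        return False
    # Axiom III: additivity for every pair of disjoint events
    for A in events:
        for B in events:
            if not (A & B) and P(A | B) != P(A) + P(B):
                return False
    return True


# Hypothetical biased coin: an event's probability is the sum of its outcomes' weights
weights = {"H": Fraction(3, 4), "T": Fraction(1, 4)}


def P_coin(E):
    return sum((weights[s] for s in E), Fraction(0))


# Counting outcomes instead of weighting them breaks Axiom II, since P({H, T}) = 2
def P_count(E):
    return Fraction(len(E))


print(satisfies_axioms(weights.keys(), P_coin))   # True
print(satisfies_axioms(weights.keys(), P_count))  # False
```

The pairwise loop grows as $(2^{|S|})^2$, so this brute-force check is only practical for very small sample spaces.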
          
When dealing with infinite sample spaces, a stronger version of Axiom III is used:

**III'.** If $A_1, A_2, \ldots$ is a sequence of events in the event class that are pairwise disjoint ($A_i \cap A_j = \emptyset$ for all $i \ne j$), then
$$
P\left( \bigcup_{k=1}^{\infty} A_k \right) = \sum_{k=1}^{\infty} P\left( A_k \right).
$$
<!-- *(Note that these sums and unions are over countably infinite sequences of events.)* -->
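
Axiom III' matters when an event is made up of infinitely many disjoint pieces. As an illustrative assumption (this experiment is not from the original text), flip a fair coin until the first heads and record the number of flips, so $S = \{1, 2, 3, \ldots\}$, $A_k = \{k\}$, and $P(A_k) = (1/2)^k$. The $A_k$ are disjoint and their union is $S$, so Axioms II and III' together require $\sum_{k=1}^{\infty} (1/2)^k = 1$. Axiom III on its own only covers finitely many terms at a time; the Python sketch below checks that the partial sums equal $1 - 2^{-n}$, which approach 1 as $n$ grows. The function name `p_singleton` is hypothetical.

```python
from fractions import Fraction


def p_singleton(k):
    """P({k}) = (1/2)**k: probability that the first heads of a fair coin appears on flip k."""
    return Fraction(1, 2 ** k)


# Partial sums P(A_1) + ... + P(A_n) for the disjoint events A_k = {k}
for n in (1, 2, 5, 10, 20):
    partial = sum(p_singleton(k) for k in range(1, n + 1))
    # Each partial sum equals 1 - 2**(-n), so the infinite sum converges to 1
    assert partial == 1 - Fraction(1, 2 ** n)
    print(n, float(partial))
```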

Many students of probability wonder why Axiom I does not specify that $0 \le P(E) \le 1$. The answer is that the upper bound, $P(E) \le 1$, does not need to be an axiom because it can be proven from the other axioms. Anything that is not required is left out so that the axioms remain a minimal set of rules.
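
The omitted upper bound follows in two steps. Assuming that the complement $E^c = S \setminus E$ of an event $E$ is also in the event class (as it is when $\mathcal{F} = 2^S$), the events $E$ and $E^c$ are disjoint and $E \cup E^c = S$. Axioms II and III then give the first line below, and Axiom I gives the second:
$$
\begin{aligned}
1 = P(S) &= P(E \cup E^c) = P(E) + P(E^c), \\
P(E) &= 1 - P(E^c) \le 1 \quad \text{since } P(E^c) \ge 0.
\end{aligned}
$$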