# EE319 - Probability & Random Processes
## *Dr.-Ing. Mukhtar Ullah*, FAST NUCES, Spring 2020
<hr>

## **Lecture 14** (2020-03-06)
## Towards a Calculus of Probability

### Borel sets generated by right-closed intervals

In the last lecture, we established that Borel sets include all subsets of the real line encountered in practice. Borel sets are elements of the Borel field, an event space generated by right-closed intervals $\mathsf{B}_{x}=\left(-\infty,x\right]$. Consequently, provided our sample space is the real line, all we need to do is assign probabilities to the intervals $\mathsf{B}_{x}$. Probabilities of all the other Borel sets can then be determined by applying the Kolmogorov axioms and their corollaries. This suggests the following strategy for solving any probability problem.
 1. Identify events of interest in any problem.
 1. Find a representation of your events in the form of Borel sets on the real line.
 1. Assign probabilities to simpler Borel sets, that is, right-closed intervals.
 1. Determine probabilities of events of interest by application of the Kolmogorov axioms and their corollaries.
 
This strategy motivates the following definition.

### Distribution Function
The distribution function, also called the cumulative distribution function (CDF), is defined on the real line by
$$F\left(x\right)=P\left(\mathsf{B}_{x}\right)=P\left(\left(-\infty,x\right]\right)$$
It is very important to note that, despite the numerical equality of the final probability, the probability measure $P$ and the CDF $F$ are different functions. The former is a set function (defined on the Borel field over the real line) whereas the latter is a point function (defined on the real line itself).

To ensure applicability to the entire real line, the CDF is also required
to satisfy
$$F\left(-\infty^{+}\right)=\lim_{x\downarrow-\infty}F\left(x\right)=0,\quad F\left(\infty^{-}\right)=\lim_{x\uparrow\infty}F\left(x\right)=1$$

If you have understood this, you should never write expressions like $F\left(\infty\right)$ that make no sense.

- The CDF is always right-continuous.
$$F\left(a^{+}\right)=\lim_{x\downarrow a}F\left(x\right)=\lim_{x\downarrow a}P\left(\mathsf{B}_{x}\right)=P\left(\lim_{x\downarrow a}\mathsf{B}_{x}\right)=P\left(\mathsf{B}_{a^{+}}\right)=F\left(a\right)$$

- The CDF may or may not be left-continuous, though.
$$F\left(a^{-}\right)=\lim_{x\uparrow a}F\left(x\right)=\lim_{x\uparrow a}P\left(\mathsf{B}_{x}\right)=P\left(\lim_{x\uparrow a}\mathsf{B}_{x}\right)=P\left(\mathsf{B}_{a^{-}}\right)\le F\left(a\right)$$

- How to determine probabilities of other Borel sets?
\begin{align*}
P\left(\left(a,\infty\right)\right) & =P\left(\mathsf{B}_{a}^{\mathsf{c}}\right)=1-F\left(a\right)\\
P\left(\left(a,b\right]\right) & =P\left(\mathsf{B}_{b}\setminus\mathsf{B}_{a}\right)=F\left(b\right)-F\left(a\right)\\
P\left(\left(-\infty,b\right)\right) & =P\left(\mathsf{B}_{b^{-}}\right)=F\left(b^{-}\right)\\
P\left(\left[a,\infty\right)\right) & =P\left(\mathsf{B}_{a^{-}}^{\mathsf{c}}\right)=1-F\left(a^{-}\right)\\
P\left(\left(a,b\right)\right) & =P\left(\mathsf{B}_{b^{-}}\setminus\mathsf{B}_{a}\right)=F\left(b^{-}\right)-F\left(a\right)\\
P\left(\left[a,b\right)\right) & =P\left(\mathsf{B}_{b^{-}}\setminus\mathsf{B}_{a^{-}}\right)=F\left(b^{-}\right)-F\left(a^{-}\right)\\
P\left(\left[a,b\right]\right) & =P\left(\mathsf{B}_{b}\setminus\mathsf{B}_{a^{-}}\right)=F\left(b\right)-F\left(a^{-}\right)\\
P\left(\left\{ a\right\} \right) & =P\left(\mathsf{B}_{a}\setminus\mathsf{B}_{a^{-}}\right)=F\left(a\right)-F\left(a^{-}\right)
\end{align*}
The last case, that of the singleton, reveals that the CDF fails to be left-continuous at any point $a$ with $P\left(\left\{ a\right\} \right)>0$.
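
The interval formulas above can be checked on a small example. The following is a minimal sketch, assuming a fair six-sided die (an illustrative choice, not taken from the lecture) whose CDF jumps by $1/6$ at each of the points $1,\ldots,6$. The helper names `F` and `F_left` are hypothetical.

```python
from fractions import Fraction

# Assumed example: a fair die with mass 1/6 at each of the points 1, ..., 6.
points = range(1, 7)
mass = Fraction(1, 6)

def F(x):
    """CDF F(x) = P((-inf, x]): total mass at points <= x."""
    return sum(mass for k in points if k <= x)

def F_left(x):
    """Left limit F(x^-) = P((-inf, x)): total mass at points < x."""
    return sum(mass for k in points if k < x)

a, b = 2, 4
prob = {
    "(a, inf)": 1 - F(a),
    "(a, b]": F(b) - F(a),
    "(-inf, b)": F_left(b),
    "[a, inf)": 1 - F_left(a),
    "(a, b)": F_left(b) - F(a),
    "[a, b)": F_left(b) - F_left(a),
    "[a, b]": F(b) - F_left(a),
    "{a}": F(a) - F_left(a),
}
for interval, p in prob.items():
    print(interval, p)
```

Exact fractions are used so that the jump $F\left(a\right)-F\left(a^{-}\right)$ at each point comes out as exactly $1/6$ rather than a rounded float.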

### Integers in an interval
It is sometimes useful to have a convenient notation for the set of integers contained in an interval.
\begin{align*}
\left[m\cdot\cdot n\right] & =\left[m,n\right]\cap\mathbb{Z}, & \left(m\cdot\cdot n\right] & =\left(m,n\right]\cap\mathbb{Z}, & \left[m\cdot\cdot n\right) & =\left[m,n\right)\cap\mathbb{Z}, & \left(m\cdot\cdot n\right) & =\left(m,n\right)\cap\mathbb{Z}
\end{align*}

### Probability density function
Though the CDF is a point function, it does not capture the local behavior of the probability measure. We need another point function $f$, called a probability density function (PDF), to characterize the probability measure locally. The last case of the singleton above suggests one such choice
$$f\left(x\right)=P\left(\left\{ x\right\} \right)=F\left(x\right)-F\left(x^{-}\right)$$
This function has traditionally been referred to as the probability mass function (PMF). A more modern name, however, is *discrete probability density function*. It can be seen that the discrete PDF is nonzero only at points of discontinuity, where the CDF has jumps. If the CDF changes continuously in an interval (without any jumps), the discrete PDF is zero everywhere in that interval, and we need to replace it with a *continuous probability density function*
$$f\left(x\right)=\frac{\mathrm{d}F}{\mathrm{d}x^{-}}=\lim_{\varepsilon\uparrow0}\frac{F\left(x+\varepsilon\right)-F\left(x\right)}{\varepsilon}=\lim_{\varepsilon\downarrow0}\frac{F\left(x\right)-F\left(x-\varepsilon\right)}{\varepsilon}$$
Note that the limit here is the left-hand derivative of the CDF.

> **Spinning arrow in a wheel**
>Imagine a spinning arrow in a wheel divided into $n$ segments. In which segment will the arrow end up?
How do we represent events of our interest on the real line? The segments can be represented as points in the discrete subset $\mathcal{X}=\left[1\cdot\cdot n\right]$ of the real line. The total probability is distributed uniformly among subsets $\left\{ 1\right\} ,\left\{ 2\right\} ,\ldots,\left\{ n\right\} $ so that each subset has probability $1/n$. The remaining subset $\mathbb{R}\setminus\mathcal{X}$ is assigned probability $0$. The CDF takes the form
$$F\left(x\right)=\begin{cases}
0 & x<1\\
1/n & 1\le x<2\\
2/n & 2\le x<3\\
\vdots & \vdots\\
\left(n-1\right)/n & n-1\le x<n\\
1 & x\ge n
\end{cases}$$

### Iverson Bracket
Unwieldy expressions like the above can be written more succinctly with the help of the
Iverson bracket 
$$\left[S\left(x\right)\right]=\begin{cases}
1 & S\left(x\right)\ \text{true}\\
0 & S\left(x\right)\ \text{false}
\end{cases}$$
defined for a logical statement $S\left(x\right)$ in $x$. Any confusion with the normal usage of square brackets for grouping terms can be avoided by remembering that the Iverson bracket encloses logical expressions only.
 
> **Spinning arrow in a wheel (continued)**
The CDF can now be succinctly expressed by
$$F\left(x\right)=\frac{1}{n}\sum_{k=1}^{n}\left[x\ge k\right]=\frac{\min\left\{ n,\left\lfloor x\right\rfloor \right\} }{n}\left[x\ge1\right]$$
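
The two expressions for the wheel CDF can be compared numerically. This is a small sketch, assuming $n=5$ segments and a handful of test points chosen for illustration; the function names `iverson`, `cdf_sum` and `cdf_closed` are hypothetical.

```python
from fractions import Fraction
from math import floor

def iverson(statement):
    """Iverson bracket: 1 if the statement is true, 0 otherwise."""
    return 1 if statement else 0

def cdf_sum(x, n):
    """F(x) = (1/n) * sum_{k=1}^{n} [x >= k]."""
    return Fraction(sum(iverson(x >= k) for k in range(1, n + 1)), n)

def cdf_closed(x, n):
    """F(x) = (min{n, floor(x)} / n) * [x >= 1]."""
    return Fraction(min(n, floor(x)), n) * iverson(x >= 1)

n = 5  # assumed number of segments
test_points = (-1.0, 0.5, 1.0, 2.7, 5.0, 8.3)
for x in test_points:
    print(x, cdf_sum(x, n), cdf_closed(x, n))
```

The factor $\left[x\ge1\right]$ matters in the closed form: without it, $\left\lfloor x\right\rfloor$ would produce negative values for $x<0$.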

### Discrete Distribution
Generalizing the above example, imagine an increasing sequence $x_{1}<x_{2}<\cdots$ of points in some subset $\mathcal{X}$ of the real line and a discrete probability measure $P$ that assigns probabilities $p_{i}$ to singletons $\left\{ x_{i}\right\} $. The associated CDF takes the general form
$$F\left(x\right)=\sum_{i}p_{i}\left[x\ge x_{i}\right]$$
Recalling $P\left(\left\{ x\right\} \right)=F\left(x\right)-F\left(x^{-}\right)$
and noting that $\left[x\ge x_{i}\right]-\left[x>x_{i}\right]=\left[x=x_{i}\right]$
allows us to recover the discrete PDF, or PMF, 
$$f\left(x\right)=P\left(\left\{ x\right\} \right)=\sum_{i}p_{i}\left[x=x_{i}\right]$$
The set $\mathcal{X}$ over which $f$ assigns nonzero values is called its support.

We can then express the probability of any Borel set $\mathsf{B}$ by 
$$P\left(\mathsf{B}\right)=\sum_{x\in\mathsf{B}}f\left(x\right)=\sum_{x}f\left(x\right)\left[x\in\mathsf{B}\right]=\sum_{i}p_{i}\left[x_{i}\in\mathsf{B}\right]$$
Thus, a complete characterization of any discrete probability measure $P$ defined on Borel sets is achieved by only two sets
 - A countable set $\mathcal{X}$ of real numbers $x_{1}<x_{2}<\cdots$
 - A set of probabilities $p_i$ associated with $x_{i}$, in order, that sum to unity
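
The two sets above are all that a program needs in order to evaluate $F$, $f$ and $P\left(\mathsf{B}\right)$. The sketch below assumes a made-up three-point distribution, and represents a Borel set $\mathsf{B}$ by its indicator function; the names `xs`, `ps`, `cdf`, `pmf` and `prob` are hypothetical.

```python
from fractions import Fraction

# Assumed example: three support points with probabilities that sum to one.
xs = [-1, 0, 2]
ps = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
assert sum(ps) == 1

def cdf(x):
    """F(x) = sum_i p_i [x >= x_i]."""
    return sum(p * (x >= xi) for xi, p in zip(xs, ps))

def pmf(x):
    """f(x) = sum_i p_i [x = x_i]."""
    return sum(p * (x == xi) for xi, p in zip(xs, ps))

def prob(indicator):
    """P(B) = sum_i p_i [x_i in B], with B given by its indicator function."""
    return sum(p * indicator(xi) for xi, p in zip(xs, ps))

print(cdf(0), pmf(0), prob(lambda x: x > -1))
```

Python's `True` and `False` behave as the integers 1 and 0 in arithmetic, so a comparison such as `x >= xi` plays the role of the Iverson bracket directly.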
 
Let us identify these two sets for a few examples encountered so far in this course.

> **Bernoulli distribution**
>
>The countable set is
$\mathcal{X}=\left\{ 0,1\right\}$
>
>The probabilities, in order, are $1-p$ and $p$
>
> **Binomial distribution**
>
>The countable set is
$$\mathcal{K}=\left[0\cdot\cdot n\right]$$
>
>The probabilities are
$$p_{k}=\binom{n}{k}p^{k}\left(1-p\right)^{n-k}$$
>
> **Geometric distribution**
>
>The countable set is
$$\mathcal{K}=\mathbb{N}$$
The probabilities are 
$$p_{k}=\left(1-p\right)^{k-1}p$$
>
> **Negative binomial distribution**
>
>The countable set is
$$\mathcal{K}=\left[r\cdot\cdot\infty\right)$$
The probabilities are 
$$p_{k}=\binom{k-1}{r-1}p^{r}\left(1-p\right)^{k-r}$$
>
> **Hypergeometric distribution**
>
>The countable set is
$$\mathcal{K}=\left[\max\left\{ 0,n-m+r\right\}\cdot\cdot \min\left\{ n,r\right\}\right]$$
>
>The probabilities are
$$p_{k}=\dfrac{\dbinom{r}{k}\dbinom{m-r}{n-k}}{\dbinom{m}{n}}$$
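
As a sanity check, the probabilities listed for each distribution should sum to unity over the stated countable set. The sketch below uses assumed parameter values ($p=1/3$, $n=5$, $r=3$, $m=10$), takes $p$ to be the probability of the outcome $1$ in the Bernoulli case, and truncates the two infinite supports at an assumed cutoff `K`, so those two sums are only approximately one.

```python
from fractions import Fraction
from math import comb

p = Fraction(1, 3)   # assumed success probability
n, r, m = 5, 3, 10   # assumed parameters, reused across the examples
K = 200              # assumed cutoff for the infinite supports

# Bernoulli on {0, 1}, assuming p is the probability of the outcome 1
bernoulli = {0: 1 - p, 1: p}

# Binomial on [0..n]
binomial = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

# Geometric on {1, 2, ...}, truncated at K
geometric = {k: (1 - p)**(k - 1) * p for k in range(1, K + 1)}

# Negative binomial on [r..inf), truncated at K
neg_binomial = {k: comb(k - 1, r - 1) * p**r * (1 - p)**(k - r)
                for k in range(r, K + 1)}

# Hypergeometric on [max{0, n-m+r}..min{n, r}]
lo, hi = max(0, n - m + r), min(n, r)
hypergeometric = {k: Fraction(comb(r, k) * comb(m - r, n - k), comb(m, n))
                  for k in range(lo, hi + 1)}

totals = {
    "Bernoulli": sum(bernoulli.values()),
    "binomial": sum(binomial.values()),
    "geometric (truncated)": sum(geometric.values()),
    "negative binomial (truncated)": sum(neg_binomial.values()),
    "hypergeometric": sum(hypergeometric.values()),
}
for name, total in totals.items():
    print(name, float(total))
```

The finite supports sum to exactly one in rational arithmetic, whereas the truncated sums fall short of one by the probability of the discarded tail.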


### Continuous Distribution
First, we revisit the spinning arrow in a wheel for a finer treatment.
> **Spinning arrow in a wheel (continued)**
A more detailed representation of our events of interest is to reserve the interval $\mathcal{X}=\left[0,n\right]$ and map the segments, in order, to the (sub)intervals $\left(0,1\right],\left(1,2\right],\ldots,\left(n-1,n\right]$. For reasons that will become obvious soon, an even better choice is to reserve the interval $\mathcal{X}=\left[0,1\right]$ and map the segments, in order, to the intervals $\left(0,1/n\right],\left(1/n,2/n\right],\ldots,\left(\left(n-1\right)/n,1\right]$ among which the total probability is distributed uniformly so that each subset has probability $1/n$. The remaining subset $\mathbb{R}\setminus\mathcal{X}$ is assigned probability $0$. The CDF $F\left(x\right)$ assumes a value of zero in the open interval $\left(-\infty,0\right)$, rises gradually from $0$ to $1/n$ in $\left(0,1/n\right]$, from $1/n$ to $2/n$ in $\left(1/n,2/n\right]$, all the way up to a rise from $\left(n-1\right)/n$ to $1$ in $\left(\left(n-1\right)/n,1\right]$ and remains at that value in $\left(1,\infty\right)$. The pattern suggests a uniform increase of the CDF from $0$ to $1$ in the interval
$\mathcal{X}$. This becomes more obvious in the limiting case of infinitely large $n$. As $n$ gets larger and larger, each (sub)interval shrinks to a point and the probability increment in each interval approaches zero, and yet the CDF keeps increasing. That is exactly how continuous changes occur. The CDF takes the form
$$F\left(x\right)=\begin{cases}
0 & x<0\\
x & 0\le x<1\\
1 & x\ge1
\end{cases}$$
which can be expressed more succinctly as
$$F\left(x\right)=\min\left\{ 1,x\right\} \left[x\ge0\right]$$
It is interesting to note that the probability of any interval $\left(\left(k-1\right)/n,k/n\right]$
equals its length $1/n$. In other words, the probability-to-length
ratio of each interval is unity. In the limit, as $n\rightarrow\infty$,
both the probability and the interval length approach zero but the
probability-to-length ratio remains unity. In other words, the continuous
PDF is unity in the interval $\left(0,1\right]$ and zero elsewhere.
Mathematically, we write
$$f\left(x\right)=\frac{\mathrm{d}F}{\mathrm{d}x^{-}}=\left[0<x\le1\right]$$
We see that $\mathcal{X}$ is, apart from the single endpoint $0$, the support of $f$, as $f$ assigns nonzero values only therein.
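
The left-hand derivative can be approximated by a backward difference quotient. The following is a rough numerical sketch, assuming a step size `eps` of $10^{-6}$ chosen for illustration; the names `F` and `left_derivative` are hypothetical.

```python
def F(x):
    """CDF of the limiting wheel: F(x) = min{1, x} [x >= 0]."""
    return min(1.0, x) if x >= 0 else 0.0

def left_derivative(F, x, eps=1e-6):
    """Backward difference (F(x) - F(x - eps)) / eps, for a small eps > 0."""
    return (F(x) - F(x - eps)) / eps

for x in (-0.5, 0.0, 0.25, 1.0, 1.5):
    print(x, round(left_derivative(F, x), 6))
```

Because the quotient looks only to the left of $x$, it gives approximately zero at $x=0$ and approximately one at $x=1$, in agreement with $f\left(x\right)=\left[0<x\le1\right]$.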

Generalizing this example, imagine a subset $\mathcal{X}$ of the real line and a continuous probability measure $P$ that assigns probabilities according to a CDF $F$ that, itself, can be expressed in terms of a density by
$$F\left(x\right)=P\left(\left(-\infty,x\right]\right)=\intop_{-\infty}^{x}f\left(x^{*}\right)\mathrm{d}x^{*}$$
With a continuous PDF available, we can express probability of any Borel set $\mathsf{B}$ by 
$$P\left(\mathsf{B}\right)=\intop_{\mathsf{B}}f\left(x\right)\mathrm{d}x$$
Thus, a complete characterization of any continuous probability measure $P$ defined on Borel sets is achieved by only two objects
 - An uncountable set $\mathcal{X}$ of real numbers
 - A nonnegative continuous density function $f(x)$ with support $\mathcal{X}$ that integrates to unity
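
For the uniform density of the spinning arrow, the integral formula can be checked against $F\left(b\right)-F\left(a\right)$. This is a minimal sketch that assumes a midpoint Riemann sum as the integration method (an illustrative choice, not the lecture's); the names `f`, `F` and `integrate` are hypothetical.

```python
def f(x):
    """Uniform density of the limiting wheel: f(x) = [0 < x <= 1]."""
    return 1.0 if 0 < x <= 1 else 0.0

def F(x):
    """Corresponding CDF: F(x) = min{1, x} [x >= 0]."""
    return min(1.0, x) if x >= 0 else 0.0

def integrate(f, a, b, steps=100_000):
    """Midpoint Riemann sum approximating the integral of f from a to b."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

a, b = 0.2, 0.7
print(integrate(f, a, b), F(b) - F(a))   # P((a, b]) computed two ways
print(integrate(f, -2.0, 3.0))           # total probability over a wider range
```

Integrating over an interval wider than $\mathcal{X}$ changes nothing, since the density vanishes outside its support.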