# EE319 - Probability & Random Processes
## *Dr.-Ing. Mukhtar Ullah*, FAST NUCES, Spring 2020
<hr>

## **Lecture 16** (2020-03-13)
## Towards a Calculus of Probability (continued)

### Measures of size for (un)countable sets
The Lebesgue measure, denoted $\mu^{\wedge}$, generalizes the concept of length of an interval in 1-D, area of a rectangle in 2-D, volume of a cuboid in 3-D, and so on to a measure of any set. The Lebesgue measure is useful for uncountable sets but not for countable sets, because it assigns zero to every singleton. To deal with countable sets, we use the so-called counting measure, denoted $\mu^{\#}$, which generalizes the concept of cardinality.
In the context of probability distributions, the counting measure $\mu^{\#}\left(\mathsf{A}\right)$ of a finite set $\mathsf{A}$ is given by $\left|\mathsf{A}\cap\mathcal{X}^{\#}\right|$. The two measures of the infinitesimal interval
$$
\mathrm{d}\mathsf{B}_{x}=\left(x-\mathrm{d}x,x\right]=\left[x,x+\mathrm{d}x\right)
$$
are then
$$
\mu^{\#}\left(\mathrm{d}\mathsf{B}_{x}\right)=\left[x\in\mathcal{X}^{\#}\right],\quad\mu^{\wedge}\left(\mathrm{d}\mathsf{B}_{x}\right)=\mathrm{d}x
$$
In terms of these two measures, the probability of a Borel set $\mathsf{B}$ takes the form of a so-called Lebesgue-Stieltjes integral
$$
P\left(\mathsf{B}\right)=\intop_{\mathsf{B}}f^{\#}\left(x\right)\mu^{\#}\left(\mathrm{d}\mathsf{B}_{x}\right)+\intop_{\mathsf{B}}f^{\wedge}\left(x\right)\left[x\in\mathcal{X}\right]\mu^{\wedge}\left(\mathrm{d}\mathsf{B}_{x}\right)
$$
This justifies the usage of the names discrete probability density (derivative of the probability measure with respect to the counting measure) and continuous probability density (derivative of the probability measure with respect to the Lebesgue measure):
$$
f^{\#}=\frac{\mathrm{d}P}{\mathrm{d}\mu^{\#}},\quad f^{\wedge}=\frac{\mathrm{d}P}{\mathrm{d}\mu^{\wedge}}
$$
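
The decomposition above can be tried out numerically. The following is a minimal sketch, not part of the lecture: the helper `prob_interval`, the atom at $x=0$ with mass $0.3$, and the constant density $0.7$ on $(0,1)$ are all hypothetical choices made for illustration. The counting-measure integral becomes a sum over atoms, and the Lebesgue integral is approximated by a midpoint rule.

```python
def prob_interval(lo, hi, atoms, density, n=3000):
    """Approximate P((lo, hi]) for a mixed distribution.

    atoms   : dict mapping each jump point x to its mass f#(x)
    density : function x -> f^(x), the continuous density
    """
    # Integral with respect to the counting measure: a sum over atoms in (lo, hi].
    discrete = sum(mass for x, mass in atoms.items() if lo < x <= hi)
    # Integral with respect to the Lebesgue measure: midpoint rule with n cells.
    h = (hi - lo) / n
    continuous = sum(density(lo + (k + 0.5) * h) for k in range(n)) * h
    return discrete + continuous


# Hypothetical mixed distribution: mass 0.3 at x = 0, density 0.7 on (0, 1).
atoms = {0.0: 0.3}
density = lambda x: 0.7 if 0.0 < x < 1.0 else 0.0

total = prob_interval(-1.0, 2.0, atoms, density)   # whole support, close to 1
half = prob_interval(-1.0, 0.5, atoms, density)    # close to 0.3 + 0.7 * 0.5
print(total, half)
```

The number of cells `n` is chosen so that the cell boundaries fall on the discontinuities of the assumed density; for other densities the midpoint rule is only approximate.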

### The Random Variable
Recall our proposed strategy for solving any probability problem.
 1. Identify events of interest in any problem.
 1. Find a representation of your events in the form of Borel sets on the real line.
 1. Assign probabilities to simpler Borel sets, that is, right-bounded right-closed intervals.
 1. Determine probabilities of events of interest by application of the Kolmogorov axioms and their corollaries.
 
We need to focus on the second point: a way to map events of interest (in the original event space) to Borel sets (in the event space over the real line). Towards that end, we will employ the concept of a function, or a mapping. Before we get on that road, let us refresh a few concepts of functions.

#### The image of a set
Given a function $g\colon\mathsf{A}\rightarrow\mathsf{B}$, the value
$g\left(a\right)\in\mathsf{B}$ assigned to an element $a\in\mathsf{A}$ is
called the image of $a$ under $g$. The set $\left\{ g\left(a\right)\mid a\in\mathsf{A}_{1}\right\} $
of values assigned to elements of a set $\mathsf{A}_{1}\subseteq\mathsf{A}$
is called the image of a set and denoted by $g\left[\mathsf{A}_{1}\right]$.
If the inverse of $g$ exists, that is, if $g$ is one-to-one, then
we write $g^{-1}\left(b\right)$ for the pre-image of $b$. When $g$
is not one-to-one, the point-wise inverse does not exist. The inverse
can still be defined for sets though. For example, $g^{-1}\left[\mathsf{B}_{1}\right]$
is the pre-image of a set $\mathsf{B}_{1}\subseteq\mathsf{B}$.
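
For finite sets, the image and the pre-image of a set can be computed directly. The sketch below is illustrative only: the helper names `image` and `preimage` and the function $g(a)=a^{2}$ on a small integer domain are assumptions, chosen because $g$ is not one-to-one, so the point-wise inverse does not exist while set pre-images are still well defined.

```python
def image(g, subset):
    """g[A1]: the set of values g(a) for a in A1."""
    return {g(a) for a in subset}


def preimage(g, domain, subset):
    """g^{-1}[B1]: all points of the domain that g maps into B1."""
    return {a for a in domain if g(a) in subset}


# Hypothetical example: g(a) = a^2 on A = {-2, ..., 2} is not one-to-one.
A = {-2, -1, 0, 1, 2}
g = lambda a: a * a

print(sorted(image(g, {-1, 0, 1})))    # [0, 1]
print(sorted(preimage(g, A, {1, 4})))  # [-2, -1, 1, 2]
print(sorted(preimage(g, A, {3})))     # [] : no point maps to 3
```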

Given a probability space $\left(\Omega,\mathcal{F},P\right)$ as a model of an experiment, we need a function $X\colon\mathsf{\Omega}\rightarrow\mathbb{R}$ that assigns a real number $X(\omega)$ to every point $\omega$ of the outcome space in a way that preserves the event structure of the event space $\mathcal{F}$, so that a new probability space $\left(\mathbb{R},\mathfrak{B}\left(\mathbb{R}\right),P_{X}\right)$ can be conveniently used to assign probabilities. The event structure of $\mathcal{F}$ can be preserved only if the pre-image $X^{-1}\left[\mathsf{B}\right]$ of every Borel set $\mathsf{B}\in\mathfrak{B}\left(\mathbb{R}\right)$ belongs to $\mathcal{F}$. Checking all Borel sets in this way is not practical. Fortunately, we only need to check the right-bounded right-closed intervals of the form $\mathsf{B}_{x}=\left(-\infty,x\right]$. All this reasoning leads to the following definition of a random variable.

A *random variable (RV)* with respect to an event space $\mathcal{F}$ is a function $X\colon\mathsf{\Omega}\rightarrow\mathbb{R}$ that satisfies $X^{-1}\left[\mathsf{B}_{x}\right]\in\mathcal{F}$ for each real $x$. Note that the validity of a function as a RV is determined entirely by the event space $\mathcal{F}$. Neither the values assigned by the RV nor the probability measure $P$ plays any role in deciding whether a function qualifies as a RV. However, a RV is useful only if it assigns values that reflect our quantity of interest. The image set $X\left[\mathsf{\Omega}\right]$ of all the values assigned by $X$ will be referred to as the range of $X$ and denoted by $\mathcal{X}$.

To distinguish distributions from one another, we will use the subscript
$X$ when writing functions such as PDF and CDF. Thus we write $f_{X}$
for the density of $P_{X}$ and refer to it as the PDF of $X$ for
convenience. Similarly, we write $F_{X}$ for the distribution of
$P_{X}$ and refer to it as the CDF of $X$ for convenience. It can
be seen that the range $\mathcal{X}$ of $X$ is also the support
of the density $f_{X}$. The CDF of a discrete RV is a piecewise constant
function. The CDF of a continuous RV is a continuous function. The
CDF of a mixed RV is a piecewise continuous function. Only for a mixed
RV will we write $\mathcal{X}^{\#}$ for the set of points at which
the CDF has a jump discontinuity.

The probability of an event that has been represented as a Borel set
$\mathsf{B}$ is then expressed as
$$
P_{X}\left(\mathsf{B}\right)=P\left(X^{-1}\left[\mathsf{B}\right]\right)=P\left(X\in\mathsf{B}\right)
$$
The notation $X\in\mathsf{B}$ is a convenient alternative to the
more mathematical $X^{-1}\left[\mathsf{B}\right]$. Though you should
not confuse the event $X\in\mathsf{B}$ with the Boolean expression
$\left[x\in\mathsf{B}\right]$, an interesting analogy can be drawn
between probabilistic logic (plausible reasoning) and deterministic
logic (deductive reasoning). If the distinction is still unclear,
note that $\left[x\in\mathsf{B}\right]$ is either true or false,
whereas the event $X\in\mathsf{B}$ occurs with some probability.

Let us cast the definitions of CDF and PDF associated with a RV $X$
in our newly acquired language.
\begin{align*}
F_{X}\left(x\right) & =P_{X}\left(\left(-\infty,x\right]\right)=P\left(X\le x\right)\\
f_{X}\left(x\right) & =P_{X}\left(\left\{ x\right\} \right)\left[F_{X}\left(x\right)\ne F_{X}\left(x^{-}\right)\right]+\frac{\mathrm{d}F_{X}}{\mathrm{d}x^{-}}\left[F_{X}\left(x\right)=F_{X}\left(x^{-}\right)\right]
\end{align*}
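
On a finite outcome space these definitions can be evaluated by brute force. The following sketch assumes a model that is not fixed by the text above: two tosses of a fair coin with equally likely outcomes, and $X$ counting the heads. A Borel set is represented by a membership test, so that $P_{X}\left(\mathsf{B}\right)=P\left(X^{-1}\left[\mathsf{B}\right]\right)$ becomes a sum over the outcomes that $X$ maps into $\mathsf{B}$.

```python
from fractions import Fraction

# Assumed model: two tosses of a fair coin, all four outcomes equally likely.
P = {w: Fraction(1, 4) for w in ("HH", "HT", "TH", "TT")}
X = {w: w.count("H") for w in P}  # X = number of heads


def prob_X(borel_test):
    """P_X(B) = P(X^{-1}[B]); B is given as a membership test x -> bool."""
    return sum(p for w, p in P.items() if borel_test(X[w]))


def cdf(x):
    """F_X(x) = P_X((-inf, x]) = P(X <= x)."""
    return prob_X(lambda v: v <= x)


def pdf(x):
    """For this discrete RV the density is the jump of the CDF, P_X({x})."""
    return prob_X(lambda v: v == x)


print(cdf(-1), cdf(0.5), cdf(1), cdf(2))  # 0 1/4 3/4 1
print(pdf(0), pdf(1), pdf(2), pdf(0.5))   # 1/4 1/2 1/4 0
```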

>**Example**: Is it a RV?
>
>Consider the sample space
$$
\mathsf{\Omega}=\left\{ \text{HH},\text{HT},\text{TH},\text{TT}\right\} 
$$
and the following events of interest
$$
\mathsf{A}_{0}=\left\{ \text{TT}\right\} ,\ \mathsf{A}_{1}=\left\{ \text{HT},\text{TH}\right\} ,\ \mathsf{A}_{2}=\left\{ \text{HH}\right\} 
$$
The event space generated by this collection of events is 
$$
\mathcal{F}=\sigma\left(\left\{ \mathsf{A}_{0},\mathsf{A}_{1},\mathsf{A}_{2}\right\} \right)=\left\{ \emptyset,\mathsf{\Omega},\mathsf{A}_{0},\mathsf{A}_{0}^{\mathsf{c}},\mathsf{A}_{1},\mathsf{A}_{1}^{\mathsf{c}},\mathsf{A}_{2},\mathsf{A}_{2}^{\mathsf{c}}\right\} 
$$
Check whether these two point functions qualify as RVs with respect to $\mathcal{F}$
\begin{align*}
X\left(\text{TT}\right) & =0, & X\left(\text{TH}\right) & =1, & X\left(\text{HT}\right) & =1, & X\left(\text{HH}\right) & =2\\
Y\left(\text{TT}\right) & =0, & Y\left(\text{TH}\right) & =-1, & Y\left(\text{HT}\right) & =1, & Y\left(\text{HH}\right) & =0
\end{align*}
The range of $X$ is
$$
\mathcal{X}=\left\{ 0,1,2\right\} 
$$
Checking a few pre-images
$$
X^{-1}\left[\left\{ 0\right\} \right]=\mathsf{A}_{0}\in\mathcal{F},\ X^{-1}\left[\left\{ 1\right\} \right]=\mathsf{A}_{1}\in\mathcal{F},\ X^{-1}\left[\left\{ 2\right\} \right]=\mathsf{A}_{2}\in\mathcal{F},\ X^{-1}\left[\mathbb{R}\setminus\mathcal{X}\right]=\emptyset\in\mathcal{F}
$$
Do you need to test any other events?
>
>The range of $Y$ is
$$
\mathcal{Y}=\left\{ -1,0,1\right\} 
$$
Checking a few pre-images
$$
Y^{-1}\left[\left\{ -1\right\} \right]=\left\{ \text{TH}\right\} \notin\mathcal{F},\,Y^{-1}\left[\left\{ 0\right\} \right]=\mathsf{A}_{1}^{\mathsf{c}}\in\mathcal{F},\,Y^{-1}\left[\left\{ 1\right\} \right]=\left\{ \text{HT}\right\} \notin\mathcal{F},\,Y^{-1}\left[\mathbb{R}\setminus\mathcal{Y}\right]=\emptyset\in\mathcal{F}
$$
Do you need to test any other events?
>
>Clearly, $X$ qualifies as a RV with respect to $\mathcal{F}$ whereas $Y$ does not. Note, however, that $Y$ could be a RV with respect to some other event space. Could you find that event space?
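
The check in this example can also be automated. The sketch below is one possible way to do it on a finite outcome space, with hypothetical helper names: `generate_sigma_algebra` closes the generating events under complement and pairwise union, and `is_random_variable` tests the pre-images of the intervals $\left(-\infty,x\right]$. Testing only the values $x$ in the range is enough here, because for any other real $x$ the pre-image is either empty or equal to the pre-image at the largest range value below $x$.

```python
from itertools import combinations

OMEGA = frozenset({"HH", "HT", "TH", "TT"})


def generate_sigma_algebra(generators):
    """Close the generators under complement and union (finite OMEGA only)."""
    events = {frozenset(), OMEGA} | {frozenset(g) for g in generators}
    while True:
        new = {OMEGA - a for a in events}
        new |= {a | b for a, b in combinations(events, 2)}
        if new <= events:  # nothing new was produced: closure reached
            return events
        events |= new


def is_random_variable(f, events):
    """Check that the pre-image of (-inf, x] is an event for each x in the range."""
    return all(
        frozenset(w for w in OMEGA if f[w] <= x) in events
        for x in set(f.values())
    )


F = generate_sigma_algebra([{"TT"}, {"HT", "TH"}, {"HH"}])
X = {"TT": 0, "TH": 1, "HT": 1, "HH": 2}
Y = {"TT": 0, "TH": -1, "HT": 1, "HH": 0}

print(len(F))                    # 8
print(is_random_variable(X, F))  # True
print(is_random_variable(Y, F))  # False
```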

#### A RV must respect the partition
How would you define a RV for these partitions?
<img src="images/two_dice_partition_wrt_sum.png" width="70%" />


#### Dirac Measure
What could be the simplest RV? Consider the smallest event space $\mathcal{F}=\left\{ \emptyset,\mathsf{\Omega}\right\} $. Any RV $X$ definable on $\mathsf{\Omega}$ with respect to $\mathcal{F}$ must assign a constant value $a$ to all the outcomes. The probability of any Borel set $\mathsf{B}$ is then
$$
P_{X}\left(\mathsf{B}\right)=P\left(X^{-1}\left[\mathsf{B}\right]\right)=\left[a\in\mathsf{B}\right]
$$
This probability measure is an example of what is called a Dirac measure.
A Dirac measure centered at a point $a$, denoted by $\delta_{a}$,
is the probability measure that assigns all the probability,
$1$, to any Borel set $\mathsf{B}$ containing $a$ and zero probability to any
Borel set not containing $a$. Mathematically, 
$$
\delta_{a}\left(\mathsf{B}\right)=\left[a\in\mathsf{B}\right]
$$
The associated probability distribution is called the Dirac distribution.
That $X$ is Dirac distributed at $a$ is written as $X\sim\delta_{a}$ with
the following range, strictly positive density, and strictly increasing distribution:
$$
\mathcal{X}=\left\{ a\right\} ,\ f_{X}^{\#}\left(a\right)=F_{X}^{\#}\left(a\right)=1
$$
The PDF and CDF can then be recovered.
$$
f_{X}\left(x\right)=\left[x=a\right],\ F_{X}\left(x\right)=\left[x\ge a\right]
$$
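
Since the Dirac measure is nothing more than the Iverson bracket $\left[a\in\mathsf{B}\right]$, it takes only a few lines to express. In the sketch below a Borel set is again represented by a membership test; the function name `dirac` and the center $a=2$ are hypothetical choices for illustration.

```python
def dirac(a):
    """Return delta_a as a function of a Borel set given as a membership test."""
    return lambda borel_test: int(borel_test(a))


delta = dirac(2.0)  # hypothetical center a = 2

# F_X(x) = delta_a((-inf, x]) = [x >= a]
dirac_cdf = lambda x: delta(lambda v: v <= x)
# f_X(x) = delta_a({x}) = [x = a]
dirac_pdf = lambda x: delta(lambda v: v == x)

print(dirac_cdf(1.9), dirac_cdf(2.0), dirac_cdf(5.0))  # 0 1 1
print(dirac_pdf(2.0), dirac_pdf(3.0))                  # 1 0
```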

#### Bernoulli Measure
The Dirac RV is too simple to represent uncertainty because concentrating all the probability at a single point makes it deterministic. What could be the simplest RV representing uncertainty? That turns out to be a Bernoulli distributed RV. Say the event $\mathsf{A}$ is the only event of interest among all the subsets of an outcome space $\mathsf{\Omega}$. The event space generated by $\mathsf{A}$ is $\mathcal{F}=\left\{ \emptyset,\mathsf{\Omega},\mathsf{A},\mathsf{A^{c}}\right\} $. Any RV $X$ definable on $\mathsf{\Omega}$ with respect to $\mathcal{F}$
must assign the same value to all points $\omega\in\mathsf{A}$ and the same (typically different) value to all points $\omega\in\mathsf{A^{c}}$. A RV $X$ is Bernoulli distributed if
$$
X\left[\mathsf{A}\right]=\left\{ 1\right\} ,\ X\left[\mathsf{A^{c}}\right]=\left\{ 0\right\} 
$$
Remember that we treated $\mathsf{A}$ as success and $\mathsf{A^{c}}$ as failure. That $X$ is Bernoulli distributed with probability $p$ of success is written as $X\sim\mathrm{Ber}_{p}$ with the following range, strictly positive density, and strictly increasing distribution:
$$
\mathcal{X}=\left\{ 0,1\right\} ,\ f_{X}^{\#}\left(x\right)=\mathrm{Ber}\left(x\mid p\right)=p^{x}\left(1-p\right)^{1-x},\ F_{X}^{\#}\left(x\right)=\left(1-p\right)^{1-x}
$$
The PDF and CDF can then be recovered.
$$
f_{X}\left(x\right)=f_{X}^{\#}\left(x\right)\left[x\in\left\{ 0,1\right\} \right],\ F_{X}\left(x\right)=F_{X}^{\#}\left(\min\left\{ 1,\left\lfloor x\right\rfloor \right\} \right)\left[x\ge0\right]
$$
Could we express a Bernoulli measure in terms of Dirac measures? The answer is affirmative.
$$
\mathrm{Ber}_{p}=\left(1-p\right)\delta_{0}+p\delta_{1}
$$
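
The mixture identity can be checked against the density and distribution formulas given above. This is a sketch under assumptions: the function names are hypothetical, and $p=3/10$ is an arbitrary value, with exact fractions used to avoid rounding.

```python
from fractions import Fraction
from math import floor


def dirac(a, borel_test):
    """delta_a(B) = [a in B], with B given as a membership test."""
    return int(borel_test(a))


def bernoulli_measure(p, borel_test):
    """Ber_p(B) = (1 - p) * delta_0(B) + p * delta_1(B)."""
    return (1 - p) * dirac(0, borel_test) + p * dirac(1, borel_test)


def ber_pmf(x, p):
    """f#(x) = p^x (1 - p)^(1 - x) for x in {0, 1}."""
    return p**x * (1 - p) ** (1 - x)


def ber_cdf(x, p):
    """F_X(x) = F#(min{1, floor(x)}) [x >= 0], with F#(k) = (1 - p)^(1 - k)."""
    if x < 0:
        return 0
    k = min(1, floor(x))
    return (1 - p) ** (1 - k)


p = Fraction(3, 10)  # hypothetical success probability

# The mixture of Dirac measures reproduces the density on {0, 1} ...
pmf_ok = all(bernoulli_measure(p, lambda v: v == x) == ber_pmf(x, p) for x in (0, 1))
# ... and the CDF at a few test points on the real line.
cdf_ok = all(
    bernoulli_measure(p, lambda v: v <= x) == ber_cdf(x, p)
    for x in (-1, 0, 0.5, 1, 2.5)
)
print(pmf_ok, cdf_ok)  # True True
```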
