# Basic Probability Review
## From *Probabilistic Methods of Signal and System Analysis* by Cooper & McGillem

### Chapter 1. Introduction to Probability
#### Engineering Applications of Probability
- Probability concerns itself with situations that involve uncertainty in some form (e.g., dice, drawing cards, roulette).
- Probability is prevalent even in physical "laws" at the microscopic level.
    - Even v(t) = Ri(t) is not exactly true at every instant of time, as evidenced by observing the current through a high resistance with a high-gain amplifier.
    - Physical laws are likely idealized cases that never arise in nature exactly.
##### Types of Situations Requiring Application of Probability
- **Random Input Signals** - In practice, input signals acting on a system usually have some uncertainty and unpredictability, which gives them *randomness*.
- **Random Disturbances** - Unwanted disturbances applied to a system's input or output in addition to the desired signals. Examples include snaps, pops, and crackles heard through speakers driven by high-gain amplifiers, and noise received by antennas.
- **Random System Characteristics** - The system itself has characteristics that are unknown and vary randomly, e.g., an electric power system whose load fluctuates randomly with usage.
- **System Reliability** - Components of a system fail, causing random outages and losses.
- **Quality Control** - Instead of inspecting every piece of a large system, inspect elements selected at random.
- **Information Theory** - Provides a quantitative measure for the information content of messages.

#### Random Experiments and Events
An **Experiment** is some action that results in an *outcome*. A random experiment is one in which the outcome is uncertain before the experiment is performed.
- Equally Likely - All outcomes have the same probability, e.g., heads and tails on every toss of a fair coin, regardless of previous tosses.
- Elementary Event - An event consisting of a single outcome: tossing a coin to obtain a particular side, or rolling a die to obtain a particular integer.
- Composite Event - An event that can be achieved in more than one way: rolling a die to obtain an even number (three possibilities: 2, 4, 6).

When the number of possible outcomes is countable, the outcomes are said to be **discrete**. When it is not countable, as when reading a continuously varying voltage on a voltmeter, they are said to be **continuous**.

#### Definitions of Probability
##### Relative-Frequency Probability
Closely linked to the frequency of occurrence of defined events (i.e., the frequency of an event defines its probability). Consider an experiment that is performed N times and has four possible outcomes (A, B, C, D), where N<sub>A</sub> is the number of times A occurs, and so on:
$$ N_{A}+N_{B}+N_{C}+N_{D} = N $$
You can define relative frequency of A, r(A) as:
$$ r(A) = N_{A}/N $$
and $$ r(A)+r(B)+r(C)+r(D)=1 $$
Imagining that N increases without limit, a phenomenon known as *statistical regularity* applies: the relative frequencies tend to stabilize and approach a number, Pr(A). Pr(A) can be taken as the probability of elementary event A, where $$ Pr(A) = \lim_{N \rightarrow \infty} r(A) $$
Therefore $$ Pr(A)+Pr(B)+...+Pr(M)=1 $$
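Statistical regularity can be illustrated with a short simulation. This is a minimal sketch, assuming a fair six-sided die simulated by Python's seeded pseudo-random generator; `relative_frequencies` is a hypothetical helper, not something from the text. It computes r(k) = N<sub>k</sub>/N for each face and shows each value settling near 1/6 while the set always sums to 1.

```python
import random
from collections import Counter

def relative_frequencies(n_trials, seed=0):
    """Roll a fair die n_trials times and return r(k) = N_k / N for each face k.

    Hypothetical helper for illustration; the die is simulated, not measured.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    counts = Counter(rng.randint(1, 6) for _ in range(n_trials))
    return {face: counts[face] / n_trials for face in range(1, 7)}

# As N grows, each r(k) settles near Pr(k) = 1/6, and the r(k) always sum to 1.
for n in (10, 1_000, 100_000):
    r = relative_frequencies(n)
    print(n, round(r[6], 4), round(sum(r.values()), 12))
```

With small N the estimate of r(6) can be far from 1/6; the spread shrinks roughly as 1/sqrt(N).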
A summary of concepts of relative frequency:
1. $$ 0\leq Pr(A) \leq 1 $$
2. $$ Pr(A) + Pr(B) + ... + Pr(M) = 1 $$
3. An impossible event is represented by Pr(A) = 0.
4. A certain event is represented by Pr(A) = 1.

When interested in more than one event at a time, you are interested in **joint probability**. If successive events do not depend on one another (**statistical independence**), the probability that both A and B occur is the product of their probabilities; for two events that each occur with probability 1/2 (e.g., heads on each of two coin tosses): $$ Pr(A,B) = Pr(A)Pr(B) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4} $$
**Conditional probability** is the probability of one event, A, occurring given that another event, B, has occurred, and is designated as $$ Pr(A|B) = \frac{Pr(A,B)}{Pr(B)} $$
It is interesting to note that in terms of joint probability: $$ Pr(A,B) = Pr(A|B)Pr(B) = Pr(B|A)Pr(A) $$
If A and B are statistically independent, then $$ Pr(A|B) = Pr(A) $$

##### Elementary Set Theory
Another definition of probability can be demonstrated using **set theory**, which represents data as sets of elements. A set is a collection of objects known as *elements*. It can be written as $$ A = \{\alpha_{1}, \alpha_{2},...,\alpha_{n}\} $$ where A is the set and the α's are its elements.

If B is a subset of A, the notation is $$ B\subset A $$
A helpful note is that a set with n elements has 2<sup>n</sup> subsets (including the empty set and the set itself).
Set Arithmetic (Many More Rules and Help in Cooper Chapter 1 pgs. 13-18)
- Equality: Set A equals set B *iff*(if and only if) every element of A is an element of B and every element of B is an element of A $$ A=B\ iff\ A \subset B\ and\ B \subset A $$
- Sums: (Also union) A set consisting of all the elements that are elements of A or of B or both. $$ A\cup B $$
- Products: (Also intersection) The set containing only the elements common to both sets. $$ A\cap B $$
- Complement: Set containing all elements of S (aka all possible elements) that are not in A. $$ A \cup \bar A = S  $$ $$ A \cap \bar A = \emptyset $$
- Difference: The difference A - B is the set of elements of A that are not in B. $$ A - B = A \cap \bar B = A - (A \cap B) $$
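The set arithmetic above maps directly onto Python's built-in `set` operators. A minimal sketch, assuming S is the set of die faces and A and B are example events chosen here for illustration; `subsets` is a hypothetical helper that enumerates the power set to confirm the 2<sup>n</sup> count.

```python
from itertools import chain, combinations

S = {1, 2, 3, 4, 5, 6}   # all possible elements (faces of a die)
A = {2, 4, 6}            # example event: even face
B = {1, 2, 3}            # example event: face at most 3

union = A | B            # A ∪ B
intersection = A & B     # A ∩ B
complement_A = S - A     # complement of A relative to S
difference = A - B       # A - B

# A - B = A ∩ (complement of B) = A - (A ∩ B)
assert difference == A & (S - B) == A - (A & B)
# A ∪ Ā = S and A ∩ Ā = ∅
assert A | complement_A == S and A & complement_A == set()

def subsets(s):
    """All subsets of s, from the empty set up to s itself (hypothetical helper)."""
    items = list(s)
    return [set(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]

print(union, intersection, difference, len(subsets(S)))  # 2**6 = 64 subsets
```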

##### Axiomatic Approach
Begin by defining a **probability space** whose elements are all the outcomes of an experiment. The probability space for rolling a six-sided die is S = {1,2,3,4,5,6}. Each subset is defined as an **event**: for example, event {2} corresponds to the die landing on the two-dot face, while {1,2,3} corresponds to it landing on the 1, 2, or 3 face. S corresponds to the *certain event* (the die lands on some face), the null set ∅ is the *impossible event*, and any event consisting of a single outcome is an *elementary event*. Probability is assigned to events as long as the assignment satisfies the following conditions, or *axioms*:
$$ Pr(A) \geq 0 $$
$$ Pr(S) = 1 $$
$$ If\ A \cap B = \emptyset, then\ Pr(A\cup B) = Pr(A) + Pr(B) $$
*These axioms are postulates, do not try to prove them.*
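Corollaries follow from the set identities $$ S \cap \emptyset = \emptyset \ and\ S \cup \emptyset = S $$ For example, since S and ∅ are disjoint, the third axiom gives $$ Pr(S) = Pr(S \cup \emptyset) = Pr(S) + Pr(\emptyset) $$ so Pr(∅) = 0.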
By using the Axiomatic approach, you can apply concepts of Relative Frequency Probability to subsets and elements derived using elementary set theory.

#### Conditional Probability
If an event B has nonzero probability, then the conditional probability of A given B is:
$$ Pr(A|B) = {Pr(A\cap B)\over Pr(B)} $$
This simplifies in some cases:
$$ \text{if } A \subset B, \text{ then } A \cap B = A \text{ and } Pr(A|B) = \frac{Pr(A)}{Pr(B)} \geq Pr(A) $$
$$ \text{if } B \subset A, \text{ then } A \cap B = B \text{ and } Pr(A|B) = \frac{Pr(B)}{Pr(B)} = 1 $$
For multiple sets, where A and C are mutually exclusive ($$ A \cap C = \emptyset $$), apply the following:
$$ Pr[(A \cup C) \cap B] = Pr[(A \cap B) \cup (C \cap B)] = Pr(A \cap B) + Pr(C \cap B) $$
$$ Pr[(A \cup C) | B] = \frac{Pr[(A \cup C) \cap B]}{Pr(B)} = \frac{Pr(A \cap B)}{Pr(B)} + \frac{Pr(C \cap B)}{Pr(B)} = Pr(A|B) + Pr(C|B) $$
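These rules can be checked by exact enumeration. A minimal sketch, assuming a fair die with equally likely faces and using `fractions.Fraction` so the probabilities are exact; `pr` and `pr_given` are hypothetical helpers. A = {2} is a subset of B = {2, 4, 6}, and C = {4} is mutually exclusive with A.

```python
from fractions import Fraction

S = range(1, 7)  # fair die: each face has probability 1/6

def pr(event):
    """Probability of an event (a set of faces) under equally likely outcomes."""
    return Fraction(len(set(event) & set(S)), 6)

def pr_given(a, b):
    """Pr(A|B) = Pr(A ∩ B) / Pr(B), defined only when Pr(B) > 0."""
    if pr(b) == 0:
        raise ValueError("conditioning event must have nonzero probability")
    return pr(set(a) & set(b)) / pr(b)

A = {2}        # roll a 2
C = {4}        # roll a 4 (mutually exclusive with A)
B = {2, 4, 6}  # roll an even number

print(pr_given(A, B))      # A ⊂ B: Pr(A)/Pr(B) = (1/6)/(1/2) = 1/3
print(pr_given(B, A))      # conditioning on A, which lies inside B: equals 1
print(pr_given(A | C, B))  # Pr(A|B) + Pr(C|B) = 2/3
```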

#### Independence
Two events, A and B, are independent iff $$ Pr(A \cap B) = Pr(A)Pr(B) $$
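This extends to n events: A<sub>1</sub>, ..., A<sub>n</sub> are mutually independent iff, for every subset of distinct indices i, j, ..., k,
$$ Pr(A_{i}\cap A_{j}\cap ... \cap A_{k}) = Pr(A_{i})Pr(A_{j})...Pr(A_{k}) $$
Keep in mind that mutually exclusive events with nonzero probabilities can never be statistically independent, since Pr(A ∩ B) = 0 while Pr(A)Pr(B) > 0.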
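The product test for independence, and the failure of mutually exclusive events to pass it, can be seen by enumerating two fair dice. A minimal sketch, assuming 36 equally likely ordered pairs; the event predicates are hypothetical examples chosen for illustration.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))  # two fair dice: 36 ordered pairs

def pr(event):
    """Probability that an outcome (d1, d2) satisfies the predicate `event`."""
    return Fraction(sum(1 for o in S if event(o)), len(S))

def first_even(o):
    return o[0] % 2 == 0

def second_above_four(o):
    return o[1] > 4

def first_odd(o):
    return o[0] % 2 == 1

def both(a, b):
    """The event A ∩ B expressed as a predicate."""
    return lambda o: a(o) and b(o)

# Independent: Pr(A ∩ B) equals Pr(A) Pr(B) = (1/2)(1/3) = 1/6
print(pr(both(first_even, second_above_four)), pr(first_even) * pr(second_above_four))
# Mutually exclusive with nonzero probabilities: Pr(A ∩ B) = 0 but Pr(A) Pr(B) = 1/4
print(pr(both(first_even, first_odd)), pr(first_even) * pr(first_odd))
```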

#### Combined Experiments
Sometimes two experiments with entirely different probability spaces are run together. The two spaces can be combined into a *Cartesian product space* whose elements are all ordered pairs of outcomes from the two experiments. Thus if S1 has n elements and S2 has m elements, the Cartesian product space S = S1 × S2 has mn elements. If you run an experiment consisting of a die roll and a coin toss, the die's space is S1={1,2,3,4,5,6}, the coin's is S2={H,T}, and the new space S={(1,H), (1,T), ..., (6,H), (6,T)} has twelve elements. For an event A = A1 × A2 in the product space, with A1 ⊂ S1, A2 ⊂ S2, and the two experiments independent, the probability of A is:
$$ Pr(A) = Pr(A_{1} \times A_{2}) = Pr(A_{1})Pr(A_{2}) $$
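The die-and-coin product space can be built with `itertools.product`. A minimal sketch, assuming a fair die and a fair coin tossed independently, so each of the 12 ordered pairs has probability 1/12; the events A1 (even face) and A2 (heads) are illustrative choices.

```python
from fractions import Fraction
from itertools import product

S1 = [1, 2, 3, 4, 5, 6]    # die
S2 = ["H", "T"]            # coin
S = list(product(S1, S2))  # Cartesian product space S1 x S2
assert len(S) == len(S1) * len(S2) == 12

def pr(event):
    """Each ordered pair is assumed equally likely (fair, independent experiments)."""
    return Fraction(len(event), len(S))

A1 = {2, 4, 6}  # even face
A2 = {"H"}      # heads
A = {(d, c) for d, c in S if d in A1 and c in A2}  # A = A1 x A2

# Pr(A1 x A2) computed on the product space matches Pr(A1) Pr(A2) = (1/2)(1/2)
print(pr(A), Fraction(len(A1), len(S1)) * Fraction(len(A2), len(S2)))
```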

#### Bernoulli Trials
Situations in which an experiment is repeated n times in order to find the probability that a particular event occurs exactly k times are known as *Bernoulli trials*. Consider an experiment for which an event A has probability Pr(A) = p and the probability that it does not occur is q = 1 - p. For event A to happen k times, the event "not A" must occur the other n-k times. For one specific ordering of these outcomes, the probability is the product:
$$ p^{k} q^{n-k} $$
For non-specific ordering, count all the ways event A can occur k times in n trials, otherwise known as n *choose* k:
$$ _{n}C_{k} = {{n}\choose{k}} = {n!\over k!(n-k)!} $$
Keep in mind that this gives only the **number** of orderings in which the event can occur k times. To find the probability, multiply it by the probability of each ordering:
$$ p_{n}(k) = Pr(A \text{ occurs exactly } k \text{ times}) = {{n}\choose{k}} p^{k}q^{n-k} $$

When *n* is very large, binomial coefficients and powers of p and q become difficult to work with. An approximate way to evaluate the probability is the *DeMoivre-Laplace theorem*, which applies if npq >> 1 and $$ |k-np| < \sqrt{npq} $$. The approximation is:
$$ p_{n}(k) \approx {1\over \sqrt{2\pi npq}} e^{-(k-np)^{2}/2npq} $$
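The exact Bernoulli-trials formula and the DeMoivre-Laplace approximation can be compared numerically. A minimal sketch, assuming Python 3.8+ for `math.comb`; `binomial_pmf` and `demoivre_laplace` are hypothetical helper names. The example uses n = 1000 and p = 0.5, so npq = 250 >> 1 and the chosen k values satisfy |k - np| < sqrt(npq).

```python
import math

def binomial_pmf(n, k, p):
    """Exact Bernoulli-trials probability: C(n, k) p^k q^(n-k)."""
    q = 1.0 - p
    return math.comb(n, k) * p**k * q**(n - k)

def demoivre_laplace(n, k, p):
    """Gaussian approximation; intended for npq >> 1 and |k - np| < sqrt(npq)."""
    q = 1.0 - p
    npq = n * p * q
    return math.exp(-(k - n * p) ** 2 / (2 * npq)) / math.sqrt(2 * math.pi * npq)

n, p = 1000, 0.5  # npq = 250
for k in (490, 500, 510):
    print(k, binomial_pmf(n, k, p), demoivre_laplace(n, k, p))
```

For much larger n the exact form can overflow a float (the binomial coefficient alone exceeds about 1.8e308), which is exactly the regime where the approximation is useful.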

### Chapter 2: Random Variables
Probability concepts can be applied to random variables, which is needed when outcomes are numerical values that need not come from a finite set. A typical random time function is denoted x(t). The time function being analyzed is one of an infinite number of time functions that might have occurred. The set of all possible time functions is designated {x(t)}; when probability functions are also specified, this collection is an **ensemble**. Any particular member of the ensemble is a *sample function*, and its value at a given time, say t<sub>1</sub>, is denoted X(t<sub>1</sub>) or X<sub>1</sub>. A random variable is a numerical description of the outcome of a random experiment. If a random variable can assume any value within a specified range, it is *continuous*.

#### Distribution Functions
The **probability distribution function** is the probability of the event that the observed random variable, X, is less than or equal to the allowed value x:
$$ F_{x}(x)= Pr(X \leq x) $$
Since it is a probability, it must satisfy the basic axioms for all values of x.
$$ 0 \leq F_{x}(x) \leq 1 \ for \ -\infty < x < \infty $$
$$ F_{x}(-\infty) = 0 \ and \ F_{x}(\infty) = 1 $$
$$ F_{x}(x) \text{ is nondecreasing as } x \text{ increases} $$
$$ Pr(x_{1} < X \leq x_{2}) = F_{x}(x_{2}) - F_{x}(x_{1}) $$

#### Density Functions
For some calculations of interest, it is preferable to use the derivative of F(x) rather than F(x) itself. This is called the **probability density function** (*pdf*) and, when it exists, is defined by:
$$ f_{x}(x) = \lim_{\epsilon\rightarrow 0} {{F_{x}(x+\epsilon)-F_{x}(x)}\over \epsilon} = {dF_{x}(x)\over dx} $$
The significance of the pdf is best described in terms of the probability element f<sub>x</sub>(x)dx:
$$ f_{x}(x)dx=Pr(x<X \leq x + dx) $$
This states that the probability element f<sub>x</sub>(x)dx is the probability of the event that the random variable X lies in the range of possible values between *x* and *x + dx*.
Properties that follow for probability density functions:
$$ f_{x}(x) \geq 0 \ -\infty < x < \infty $$
$$ \int_{-\infty}^{\infty} f_{x}(x)dx = 1 $$
$$ F_{x}(x) = \int_{-\infty}^{x} f_{x}(u)du $$
$$ \int_{x_{1}}^{x_{2}} f_{x}(x)dx = Pr(x_{1}< X \leq x_{2}) $$
**Note:** The density function for a discrete random variable consists of a set of delta functions, each having an area equal to the magnitude of the corresponding discontinuity in the distribution function. It is also possible to have density functions that contain both a continuous part and one or more delta functions.
Situations may occur where one random variable is functionally related to another whose pdf is known, and it is desired to determine the pdf of the first RV. Let this RV, Y, be a single-valued, real function of another RV, X, so that Y=g(X), where the pdf of X is known and denoted by f<sub>x</sub>(x) and the pdf of Y is denoted by f<sub>y</sub>(y). For a monotonically increasing function g, whenever the RV X lies between x and x+dx, the RV Y lies between y and y+dy. Given the pdfs previously mentioned, this relationship can be written as:
$$ f_{y}(y)dy = f_{x}(x)dx $$
from which the desired pdf becomes (with x replaced by its corresponding function of y):
$$ f_{y}(y) = f_{x}(x) {dx \over dy} $$
Since pdfs must be nonnegative, and dx/dy is negative when g is monotonically decreasing, the general result uses the absolute value:
$$ f_{y}(y) = f_{x}(x) \mid\frac{dx}{dy} \mid $$
Assuming RV Y is related to X by Y=AX, it follows that:
$$ {dy \over dx }= A $$
$$ f_{y}(y)= {1\over \mid A \mid} f_{x}({y \over A}) $$
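The scaling result can be checked against a case with a known answer. A minimal sketch, assuming X is a standard Gaussian (so Y = AX should be Gaussian with standard deviation |A|); `gaussian_pdf` and `pdf_of_scaled` are hypothetical helpers. A negative A is used to show why the absolute value is needed.

```python
import math

def gaussian_pdf(x, mean=0.0, sigma=1.0):
    """Gaussian density with the given mean and standard deviation."""
    return math.exp(-((x - mean) ** 2) / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

def pdf_of_scaled(f_x, a):
    """Return f_Y for Y = aX using f_Y(y) = f_X(y / a) / |a| (requires a != 0)."""
    if a == 0:
        raise ValueError("a must be nonzero")
    return lambda y: f_x(y / a) / abs(a)

# X standard Gaussian, Y = -3X: the formula should reproduce a Gaussian with sigma = 3.
f_y = pdf_of_scaled(gaussian_pdf, -3.0)
for y in (-4.0, 0.0, 2.5):
    print(y, f_y(y), gaussian_pdf(y, sigma=3.0))
```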

#### Mean Values and Moments
Averages are found for a single RV by integrating over the range of possible values that the random variable may assume (*ensemble averaging*), with the result being the mean value. One common notation is E[X], representing the "expected value of X":
$$ \bar{X} = E[X] = \int_{-\infty}^{\infty} xf(x)dx $$
The same averaging applies to any function g(X):
$$ E[g(X)] = \int_{-\infty}^{\infty} g(x)f(x)dx $$
The function g(x)=x<sup>n</sup> is particularly important since it leads to the general moments of the RV:
$$ \bar{X^{n}} = E[X^{n}] = \int_{-\infty}^{\infty} x^{n}f(x)dx $$
The most important moments of X are those for n=1, which is the mean value, and n=2, which is the **mean-square value**. The mean-square value is important because it may be interpreted as the time average of the square of a random voltage or current, and is therefore proportional to the average power (in a resistor); its square root is the rms, or effective, value of the random voltage or current.
You can also define the central moments, which are the moments of the difference between a random variable and its mean value. The nth central moment is:
$$ \overline{(X - \bar{X})^{n}} = E[(X - \bar{X})^{n}] = \int_{-\infty}^{\infty}(x - \bar{X})^{n}f(x)dx  $$
In this case, the central moment for n=1 is zero, while n=2 is actually called the *variance* and can be represented by:
$$ \sigma^{2} = \overline{(X - \bar{X})^{2}} = \int_{-\infty}^{\infty}(x - \bar{X})^{2}f(x)dx $$
The variance can also be expressed in an alternative form by using the rules for the expectation of sums:
$$ E[X_{1}+X_{2}+...+X_{m}] = E[X_{1}] + E[X_{2}] + ... + E[X_{m}] $$
Applying this rule gives the following (proof in 2-13, pg. 61):
$$ \sigma^{2} = \bar{X^{2}} - (\bar{X})^{2} $$
It is seen that the variance is the difference between the mean square value and the square of the mean value. The square root of the variance would be the *standard deviation*.
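The moment definitions can be evaluated numerically for any density. A minimal sketch, assuming a uniform density on (2, 5] as the example and a midpoint-rule integral; `expect` and `uniform_pdf` are hypothetical helpers. It confirms that the variance computed as the mean-square value minus the squared mean agrees with the direct second central moment.

```python
X1, X2 = 2.0, 5.0  # example support (assumed for illustration)

def uniform_pdf(x):
    """Uniform density on (X1, X2]; zero elsewhere."""
    return 1.0 / (X2 - X1) if X1 < x <= X2 else 0.0

def expect(g, f, lo, hi, steps=100_000):
    """E[g(X)] = ∫ g(x) f(x) dx over [lo, hi], approximated with the midpoint rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += g(x) * f(x)
    return total * h

mean = expect(lambda x: x, uniform_pdf, X1, X2)             # first moment
mean_square = expect(lambda x: x * x, uniform_pdf, X1, X2)  # second moment
variance = mean_square - mean**2                            # sigma^2 = E[X^2] - (E[X])^2
variance_direct = expect(lambda x: (x - mean) ** 2, uniform_pdf, X1, X2)
print(mean, mean_square, variance, variance_direct)  # approximately 3.5, 13.0, 0.75, 0.75
```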

#### The Gaussian Random Variable
The Gaussian, or *normal* density function is important because:
- It's a good mathematical model for many physically observed phenomena, and its fit can be justified theoretically in many cases.
- It can be extended to handle an arbitrarily large number of RVs conveniently.
- Linear combinations of Gaussian RVs lead to new RVs that are also Gaussian. Not found in other density functions!
- The random process from which Gaussian RVs are derived can be specified from knowledge of the first and second moments only. Not found in other density functions!
- In system analysis, the Gaussian process is often the only one for which a complete statistical analysis can be carried out, in either linear or nonlinear situations.

$$ f(x) = \frac{1}{\sqrt{2\pi}\sigma} exp[\frac{-(x-\bar{X})^{2}}{2\sigma^{2}}] \ \ -\infty < x < \infty $$

A few points worth noting about this density function:
 1. There is only one maximum and it occurs at the mean value.
 2. The density function is symmetrical about the mean value.
 3. The width of the density function is directly proportional to the *standard deviation*. The width of 2 stdevs spans the points $$ x = \bar{X} \pm \sigma $$ where the height is 0.607 of the maximum value; these are also the points of maximum absolute slope.
 4. The maximum value of the density function is inversely proportional to the standard deviation. Since the density function has unit area, it can be used as a representation of the impulse or delta function by letting stdev approach zero (this representation of the delta function is infinitely differentiable):
 $$ \delta(x-\bar{X}) = \lim_{\sigma \rightarrow 0} \frac{1}{\sqrt{2\pi}\sigma} exp[\frac{-(x-\bar{X})^{2}}{2\sigma^{2}}] $$

The general Gaussian distribution function (relation between density and distribution) is:
$$ F(x) = \int^{x}_{-\infty}f(u)du = \frac{1}{\sqrt{2\pi}\sigma} \int^{x}_{-\infty} exp[\frac{-(u-\bar{X})^{2}}{2\sigma^{2}}]du $$

The function that is tabulated is the distribution function for a Gaussian RV that has a mean value of zero and a variance of unity (Xbar = 0, sigma = 1). This distribution function is defined by:
$$ \Phi(x) = \frac{1}{\sqrt{2\pi}} \int^{x}_{-\infty} exp[\frac{-u^{2}}{2}]du $$

By a change of variable, it can be shown that:
$$ F(x) = \Phi(\frac{x -\bar{X}}{\sigma}) $$

Since only positive values of x are tabulated, it can be necessary to use the relationship:
$$ \Phi(-x) = 1 - \Phi(x) $$

Another function that is closely related to Phi(x) is the Q-function defined by:
$$ Q(x) = \frac{1}{\sqrt{2\pi}} \int^{\infty}_{x} exp[\frac{-u^{2}}{2}]du \ \ for \ which$$
$$ Q(-x) = 1 - Q(x) $$
Therefore:
$$ Q(x) = 1 - \Phi(x) $$
$$ F(x) = 1 - Q(\frac{x - \bar{X}}{\sigma}) $$

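Some authors call Phi(x) the *error function*, erf(x), and Q(x) the *complementary error function*, erfc(x). Beware that the standard definitions of erf and erfc (as used in most math libraries) differ by a scaling of the argument: $$ \Phi(x) = \frac{1}{2}[1 + erf(x/\sqrt{2})] $$ and $$ Q(x) = \frac{1}{2} erfc(x/\sqrt{2}) $$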
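With the standard-library error functions, Phi(x), Q(x), and a general Gaussian F(x) take only a few lines. A minimal sketch using `math.erf` and `math.erfc` with the scaling relations above; `phi`, `q_func`, and `gaussian_cdf` are hypothetical names. `erfc` is used for Q(x) because subtracting 1 - Phi(x) loses precision when Q(x) is tiny.

```python
import math

def phi(x):
    """Standard Gaussian distribution function Φ(x) = ½[1 + erf(x/√2)]."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def q_func(x):
    """Q(x) = 1 - Φ(x) = ½ erfc(x/√2); erfc keeps accuracy for large x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def gaussian_cdf(x, mean, sigma):
    """F(x) = Φ((x - mean) / sigma) for a Gaussian RV."""
    return phi((x - mean) / sigma)

print(phi(1.0), q_func(1.0))                    # about 0.8413 and 0.1587
print(phi(-1.0), 1 - phi(1.0))                  # Φ(-x) = 1 - Φ(x)
print(gaussian_cdf(7.0, mean=5.0, sigma=2.0))   # same as Φ(1)
```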

Q(x) can also be calculated by hand using a continued fraction:
$$ Q(x) = \frac{exp(-x^{2}/2)}{\sqrt{2\pi}}G(x) $$
$$ G(x) = \cfrac{1}{x + \cfrac{1}{x + \cfrac{2}{x + \cfrac{3}{x + \cdots}}}} $$
To decide how many terms of G(x) to use, choose the number of terms n so that the product nx is at least 30. **Note:** The Q function is useful in calculating the probability of events that occur very rarely.
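The continued fraction is easiest to evaluate from the innermost term outward. A minimal sketch, assuming x > 0 and the "n·x at least 30" rule of thumb for the number of terms; `q_continued_fraction` is a hypothetical name, and `math.erfc` is used only as a reference value for comparison.

```python
import math

def q_continued_fraction(x, n_terms=None):
    """Approximate Q(x), x > 0, as exp(-x^2/2)/sqrt(2π) * G(x), where
    G(x) = 1/(x + 1/(x + 2/(x + 3/(x + ...)))) is truncated after n_terms."""
    if x <= 0:
        raise ValueError("this sketch needs x > 0; use Q(-x) = 1 - Q(x) otherwise")
    if n_terms is None:
        n_terms = math.ceil(30 / x)  # rule of thumb: n * x >= 30
    t = x                            # innermost tail of the fraction
    for k in range(n_terms, 0, -1):  # work outward: t <- x + k / t
        t = x + k / t
    g = 1.0 / t
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi) * g

for x in (1.0, 3.0, 6.0):
    print(x, q_continued_fraction(x), 0.5 * math.erfc(x / math.sqrt(2)))
```

Convergence is fast for large x (the rare-event region where Q is most useful) and slower near zero, which is why the suggested number of terms grows as x shrinks.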

The *central limit theorem* concerns the sum of a large number of independent RVs that have the same pdf. For n such RVs X<sub>k</sub>, all with the same mean value m and the same variance σ², define a normalized sum as:
$$ Y = \frac{1}{\sqrt{n}} \sum^{n}_{k=1}(X_{k}-m) $$
The theorem states that, under conditions weak enough to be satisfied by almost any RV encountered in practice, the pdf of Y approaches a Gaussian density function as n becomes large, regardless of the density functions of the individual X RVs. Also, because of the normalization, Y has zero mean and the same variance σ² as the X RVs.
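The normalized sum can be simulated directly. A minimal sketch, assuming the X<sub>k</sub> are independent uniform(0, 1) RVs (m = 1/2, variance 1/12) drawn from a seeded generator; `normalized_sum` is a hypothetical helper. The checks compare the sample mean and variance of Y with 0 and 1/12, and the fraction of samples within one standard deviation with the Gaussian value of about 68.3%.

```python
import random
import statistics

def normalized_sum(n, rng):
    """Y = (1/sqrt(n)) * sum(X_k - m) for n independent uniform(0, 1) RVs (m = 1/2)."""
    m = 0.5
    return sum(rng.random() - m for _ in range(n)) / n**0.5

rng = random.Random(1)
samples = [normalized_sum(30, rng) for _ in range(20_000)]

# Each X_k has variance 1/12, so Y should have mean 0 and variance near 1/12.
print(statistics.fmean(samples), statistics.pvariance(samples))
# For a Gaussian, about 68.3% of samples fall within one standard deviation.
sigma = (1 / 12) ** 0.5
print(sum(abs(y) <= sigma for y in samples) / len(samples))
```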

**Index for Density Functions related to Gaussian:**
- Distribution of Power - (pg. 71-72)
- Rayleigh distribution - (pg. 73-75)
- Maxwell distribution - (pg. 75-77)
- Chi-square distribution - (pg. 77-78)
- Log-normal distribution - (pg. 78-80)

#### Other Probability Density Functions
##### Uniform Distribution
Usually arises in physical situations in which there is no preferred value for the RV. It can generally be represented as:
$$ f(x) = \frac{1}{x_{2}-x_{1}} \ \ x_{1}<x<x_{2}; \ \ = 0 \ otherwise $$
It is straightforward to show that:
$$ \bar{X} = \frac{1}{2}(x_{1}+x_{2}) $$
and
$$ \sigma^{2}_{x} = \frac{1}{12}(x_{2}-x_{1})^{2} $$
The distribution function of a uniformly distributed random variable is obtained easily from the density function by integration:
$$ F_{x}(x)=0 \ x\leq x_{1} $$
$$ = \frac{x-x_{1}}{x_{2}-x_{1}} \ x_{1} < x \leq x_{2} $$
$$ = 1 \ x>x_{2} $$

##### Exponential and Related Distributions
Events occurring at random time instants are often *assumed* to occur at times that are equally probable. If the average time interval between events is denoted $$ \bar{\tau} $$, then the probability that an event will occur in a time interval $$ \Delta t $$ that is short compared to $$ \bar{\tau} $$ is just $$ \Delta t/\bar{\tau} $$, regardless of where that interval is located. Using this assumption, it is possible to derive the probability distribution function (and hence the density function) for the time interval between events. Since
$$ 1 - F(\tau) = \text{probability that an event did not occur between } t_{0} \text{ and } t_{0} + \tau $$
$$ \frac{\Delta t}{\bar{\tau}} = \text{probability that it did occur in } \Delta t $$
it follows that
$$ F(\tau + \Delta t) - F(\tau) = [1-F(\tau)](\frac{\Delta t}{\bar{\tau}}) $$
Dividing both sides by $$ \Delta t $$ and letting $$ \Delta t $$ approach zero gives
$$ \lim_{\Delta t \rightarrow 0} \frac{F(\tau + \Delta t) - F(\tau)}{\Delta t} = \frac{dF(\tau)}{d\tau} = \frac{1}{\bar{\tau}} [1-F(\tau)] $$
The last two terms form a first-order differential equation that, with the initial condition F(0) = 0, can be solved to yield:
$$ F(\tau) = 1 - exp(\frac{-\tau}{\bar{\tau}}) \ \ \tau \geq 0 $$
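The derivation's assumption (probability Δt/τ̄ of an event in each short interval, independent of every other interval) can be simulated directly. A minimal sketch, assuming τ̄ = 1, Δt = 0.01, and a seeded generator; `simulate_intervals` is a hypothetical helper. The empirical distribution of the gaps between events should approach 1 - exp(-τ/τ̄); small differences remain because the simulated time is discretized.

```python
import math
import random

def simulate_intervals(tau_bar, dt, n_slots, seed=2):
    """Let an event occur in each slot of width dt with probability dt / tau_bar,
    independently of every other slot, and return the gaps between successive events."""
    rng = random.Random(seed)
    p = dt / tau_bar
    intervals, last = [], None
    for i in range(n_slots):
        if rng.random() < p:
            if last is not None:
                intervals.append((i - last) * dt)
            last = i
    return intervals

tau_bar, dt = 1.0, 0.01
gaps = simulate_intervals(tau_bar, dt, n_slots=2_000_000)
for tau in (0.5, 1.0, 2.0):
    empirical = sum(g <= tau for g in gaps) / len(gaps)
    print(tau, empirical, 1 - math.exp(-tau / tau_bar))
```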

##### Delta Distributions
When the possible events can assume only a discrete set of values, the appropriate pdf is a set of delta functions. For example, when a RV can assume only two possible values, x1 or x2, taking x1 with probability p1 and x2 with probability p2 = 1 - p1, the pdf for X is:
$$ f(x) = p_{1} \delta (x - x_{1}) + p_{2} \delta (x - x_{2}) $$
The mean value associated with the random variable is evaluated easily as:
$$ \bar{X} = \int_{-\infty}^{\infty}x[p_{1} \delta (x - x_{1}) + p_{2} \delta (x - x_{2})]dx $$
$$ = p_{1}x_{1} + p_{2}x_{2} $$
The mean-square value is determined similarly from:
$$ \bar{X^{2}} = \int_{-\infty}^{\infty}x^{2}[p_{1} \delta (x - x_{1}) + p_{2} \delta (x - x_{2})]dx $$
$$ = p_{1}x_{1}^{2} + p_{2}x_{2}^{2} $$
Hence variance can be defined as:
$$ \sigma_{x}^{2} = \bar{X^{2}} - (\bar{X})^{2} = p_{1}x_{1}^{2} + p_{2}x_{2}^{2} - (p_{1}x_{1} + p_{2}x_{2})^{2} $$
$$ = p_{1}p_{2}(x_{1}-x_{2})^{2}  $$
It should be noted that similar delta-function densities apply to random variables that can assume any number of discrete levels. Thus if there are n possible levels x<sub>i</sub> with corresponding probabilities p<sub>i</sub>, then the pdf is:
$$ f(x) = \sum_{i=1}^{n} p_{i} \delta (x-x_{i}) $$
in which
$$ \sum_{i=1}^{n} p_{i} = 1 $$
Using this technique, the mean value of an RV can be shown to be:
$$ \bar{X} = \sum_{i=1}^{n} p_{i}x_{i} $$
and the mean-square value is:
$$ \bar{X^{2}} = \sum_{i=1}^{n} p_{i}x_{i}^{2} $$
Finally, with the variance being:
$$ \sigma_{x}^{2} = \sum_{i=1}^{n} p_{i}x_{i}^{2}- (\sum_{i=1}^{n} p_{i}x_{i})^{2} = \frac{1}{2}\sum_{i=1}^{n} \sum_{j=1}^{n} p_{i} p_{j} (x_{i} - x_{j})^{2} $$
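The discrete-level formulas, including the double-sum form of the variance, can be checked exactly with rational arithmetic. A minimal sketch, assuming the probabilities are given as `fractions.Fraction` values so that equalities hold exactly; `discrete_moments` is a hypothetical helper. The two-level example reproduces p1 p2 (x1 - x2)².

```python
from fractions import Fraction

def discrete_moments(values, probs):
    """Mean, mean-square value, and variance of an RV with pdf sum_i p_i δ(x - x_i)."""
    if sum(probs) != 1:
        raise ValueError("probabilities must sum to 1")
    mean = sum(p * x for p, x in zip(probs, values))
    mean_square = sum(p * x * x for p, x in zip(probs, values))
    variance = mean_square - mean**2
    # Equivalent double-sum form: (1/2) sum_i sum_j p_i p_j (x_i - x_j)^2
    double_sum = Fraction(1, 2) * sum(
        pi * pj * (xi - xj) ** 2
        for pi, xi in zip(probs, values)
        for pj, xj in zip(probs, values)
    )
    assert variance == double_sum
    return mean, mean_square, variance

# Two-level case: x1 = -1 with p1 = 1/4, x2 = 3 with p2 = 3/4.
# Mean 2, mean-square 7, variance p1 p2 (x1 - x2)^2 = (1/4)(3/4)(16) = 3.
print(discrete_moments([-1, 3], [Fraction(1, 4), Fraction(3, 4)]))
```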

#### Conditional Probability Distribution and Density Functions
Conditional probability expresses the probability of one event given the occurrence of another event in the *same probability space*. This extends to random variables, including continuous ones.
First, define the conditional distribution function for a RV X given that an event M has taken place. It is denoted and defined by:
$$ F(x|M) = Pr[X \leq x|M] $$
$$ = \frac{Pr [X \leq x,M]}{Pr(M)} \ \ {Pr(M) > 0} $$
where $$ \{X \leq x, M\} $$ is the event of all outcomes $$ \xi $$ such that $$ X(\xi) \leq x \ and \ \xi \in M $$ and $$ X(\xi) $$ is the value of the random variable X when the outcome of the experiment is $$ \xi $$.
It can be shown that F(x|M) is a valid distribution function, with the same properties as any other distribution function:
$$ 1. \ 0\leq F(x|M) \leq 1 \ \ \ -\infty < x < \infty $$
$$ 2. \ F(-\infty|M) = 0 \ \ \ F(\infty|M) = 1 $$
$$ 3. \ F(x|M) \text{ is nondecreasing as x increases} $$
$$ 4. \ Pr[x_{1} < X \leq x_{2}|M] = F(x_{2}|M) - F(x_{1}|M) \geq 0 $$

It is necessary to say something about the event M upon which probability is conditioned:
1. Event M may be an event that can be expressed in terms of the RV X.
2. Event M may be an event that depends upon some other RV which may be continuous or discrete.
3. Event M may be an event that depends upon both the RV X and another RV.

For the case M = {X ≤ m} (an event of type 1), the conditional pdf is:
$$ f(x|M) = \frac{dF(x|M)}{dx} = \frac{f(x)}{F(m)} = \frac{f(x)}{\int^{m}_{-\infty}f(x)dx} \ \ x< m $$
$$ = 0 \ \ x \geq m $$
The conditional pdf can also be used to find conditional means and expectations. Conditional mean is:
$$ E[X|M] = \int^{\infty}_{-\infty} xf(x|M)dx $$
While conditional expectation of any g(x) is:
$$ E[g(X)|M] = \int^{\infty}_{-\infty} g(x)f(x|M)dx $$
You can apply Gaussian pdfs to these conditionals, an example is found on pg. 91.
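As an illustration of a conditional mean, the formula f(x|M) = f(x)/F(m) for x < m can be integrated numerically for a standard Gaussian and compared with the known closed form E[X | X ≤ m] = -φ(m)/Φ(m), where φ is the standard Gaussian density. A minimal sketch, assuming a midpoint-rule integral truncated at x = -12 (the Gaussian tail below that is negligible); all function names are hypothetical, and this is not the book's worked example.

```python
import math

def std_gaussian_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def std_gaussian_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def conditional_mean_below(m, lo=-12.0, steps=200_000):
    """E[X | X <= m] for a standard Gaussian X, using f(x|M) = f(x)/F(m) for x < m
    (zero otherwise), integrated with the midpoint rule from lo to m."""
    f_m = std_gaussian_cdf(m)  # F(m) = Pr(M), computed once
    h = (m - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += x * std_gaussian_pdf(x) / f_m
    return total * h

m = 0.5
print(conditional_mean_below(m), -std_gaussian_pdf(m) / std_gaussian_cdf(m))
```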

#### Examples and Applications (pgs. 93-101)

### Chapter 3: Several Random Variables
#### Two Random Variables
In order to deal with situations involving two random variables, it is necessary to extend the concepts of probability distribution and density functions. For two random variables X and Y, the *joint probability distribution function* is the probability of the event that RV X is less than or equal to x and RV Y is less than or equal to y:
$$ F(x,y) = Pr[X \leq x, Y \leq y] $$
The joint distribution function has properties analogous to those of a single variable:
$$ 1. \ 0\leq F(x,y)\leq 1 \ \ \ -\infty < x < \infty \ \ \ -\infty < y < \infty $$
$$ 2. \ F(-\infty, y) = F(x, -\infty ) = F(-\infty , -\infty) = 0 $$
$$ 3. \ F(\infty , \infty) = 1 $$
$$ 4. \ F(x,y) \text{ is a nondecreasing function as either x or y, or both, increase} $$
$$ 5. \ F(\infty , y) = F_{y}(y) \ \ \ F(x, \infty ) = F_{x}(x) $$
It is also possible to define a joint pdf by differentiating the distribution function. Since there are two independent variables, this must be done partially, thus:
$$ f(x,y) = \frac{\partial^{2}F(x,y)}{ \partial x \partial y} $$
The probability element is:
$$ f(x,y)dxdy = Pr[x<X \leq x+dx, y<Y \leq y+dy] $$
Properties of joint pdfs are:
$$ 1. \ f(x,y) \geq 0 \ \ \ -\infty < x < \infty \ \ -\infty < y < \infty $$
$$ 2. \ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x,y)dxdy = 1 $$
$$ 3. \ F(x,y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(u,v)dvdu $$
$$ 4. \ f_{x}(x) = \int_{-\infty}^{\infty} f(x,y)dy \ \ \ f_{y}(y) = \int_{-\infty}^{\infty} f(x,y)dx $$
$$ 5. \ Pr[x_{1} < X \leq x_{2}, y_{1} < Y \leq y_{2}] = \int_{x_{1}}^{x_{2}} \int_{y_{1}}^{y_{2}} f(x,y)dydx $$
If a pair of RVs has a joint density function that is constant for x1 < x ≤ x2 and y1 < y ≤ y2, then:
$$ f(x,y) = \frac{1}{(x_{2}-x_{1})(y_{2}-y_{1})} \ \ \ for \ x_{1}< x \leq x_{2}; \ y_{1} < y \leq y_{2} $$
$$ = 0 \ elsewhere $$
The joint pdf can be used to find the expected value of functions of two random variables. The expected value of any function g(X,Y) can be found from:
$$ E[g(X,Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x,y)f(x,y)dx dy $$
When the function g(X,Y) = XY the expected value is known as the *correlation* and can be determined by:
$$ E[XY] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} xyf(x,y)dxdy $$

#### Conditional Probability Revisited
From the basic definition of the conditional distribution function, taking M to be the event {Y ≤ y}, it follows that:
$$ F_{x}(x|Y \leq y) = \frac{Pr[X \leq x, Y \leq y]}{Pr(Y \leq y)} = \frac{F(x,y)}{F_{y}(y)} $$
Another possible definition of M is when RV Y is greater than y1 but less than or equal to y2:
$$ F_{x}(x|y_{1}< Y \leq y_{2}) = \frac{F(x,y_{2})-F(x,y_{1})}{F_{y}(y_{2})-F_{y}(y_{1})} $$
In both situations, M has a nonzero probability Pr(M) > 0.

#### Statistical Independence
Recall that RVs arising from different physical sources are almost always statistically independent (SI). The joint pdf of SI RVs can always be factored into the two marginal density functions:
$$ f(x,y) = f_{x}(x)f_{y}(y) $$
The expected value of the product of two SI RVs is simply the product of their mean values. If either mean is zero, the result is zero.
If X and Y are SI, the joint density function is factorable and the conditional densities become:
$$ f(x|y) = \frac{f_{x}(x)f_{y}(y)}{f_{y}(y)} = f_{x}(x) $$
$$ f(y|x) = \frac{f_{x}(x)f_{y}(y)}{f_{x}(x)} = f_{y}(y) $$

#### Correlation Between Random Variables
One of the important applications of joint pdfs is specifying the *correlation* of two random variables (i.e., the dependence of one RV on another). If two RVs, X and Y, have possible values x and y, then the expected value of their product is known as the correlation:
$$ E[XY] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} xyf(x,y)dxdy = \overline{XY} $$
If both RVs have nonzero means, it can be more convenient to find the correlation with the mean values subtracted out, which is known as the *covariance*:
$$ E[(X-\bar{X})(Y-\bar{Y})] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x-\bar{X})(y-\bar{Y})f(x,y)dxdy $$
The *correlation coefficient*, or *normalized covariance*, can be used to determine correlation without regard to the magnitude of either RV:
$$ \rho = E[[\frac{X-\bar{X}}{\sigma_{x}}][\frac{Y-\bar{Y}}{\sigma_{y}}]] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} [\frac{x-\bar{X}}{\sigma_{x}}][\frac{y-\bar{Y}}{\sigma_{y}}]f(x,y)dxdy $$
The RV obtained by subtracting an RV's mean and dividing by its standard deviation is called the *standardized variable*; it has zero mean and unit variance. The correlation coefficient can also be simplified by multiplying out the terms and carrying out the integration to yield:
$$ \rho = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \frac{xy - \bar{X}y - \bar{Y}x + \bar{X}\bar{Y}}{\sigma_{x}\sigma_{y}} f(x,y)dxdy $$
$$ \rho = \frac{E(XY) - \bar{X}\bar{Y}}{\sigma_{x}\sigma_{y}} $$
To investigate further properties of correlation b/t RVs, see pgs. 123-126.
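The simplified formula for ρ translates directly into a sample estimate, with ensemble averages replaced by sample averages. A minimal sketch, assuming Y = X + W where X has mean 2 and unit variance and W is independent unit-variance noise, so the true ρ is 1/sqrt(2); `correlation_coefficient` is a hypothetical helper built on the standard `statistics` module.

```python
import random
import statistics

def correlation_coefficient(xs, ys):
    """Sample estimate of rho = (E[XY] - X̄Ȳ) / (sigma_x sigma_y)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    exy = statistics.fmean(x * y for x, y in zip(xs, ys))
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    return (exy - mx * my) / (sx * sy)

rng = random.Random(3)
n = 50_000
xs = [rng.gauss(2.0, 1.0) for _ in range(n)]     # X with nonzero mean
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]  # W, independent of X
ys = [x + w for x, w in zip(xs, noise)]          # Y = X + W

# Cov(X, Y) = Var(X) = 1 and sigma_y = sqrt(2), so rho = 1/sqrt(2).
print(correlation_coefficient(xs, ys), 1 / 2**0.5)
print(correlation_coefficient(xs, noise))  # near 0 for independent RVs
```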

#### Density Functions of the Sum of Two Random Variables
Let X and Y be statistically independent RVs with density functions of fx(x) and fy(y) and a sum of Z = X + Y. To obtain the pdf of Z, fz(z), start with the probability distribution function of:
$$ F_{z}(z) = Pr(Z \leq z) = Pr(X+Y \leq z) $$
Integrate the joint pdf f(x,y) over the region below the line x+y=z. For every fixed y, x must satisfy $$ -\infty < x \leq z-y $$:
$$ F_{z}(z) = \int^{\infty}_{-\infty}\int^{z-y}_{-\infty}f(x,y)dxdy $$
For cases in which X and Y are SI, the joint pdf is factorable as:
$$ F_{z}(z) = \int^{\infty}_{-\infty}f_{y}(y)\int^{z-y}_{-\infty}f_{x}(x)dxdy $$
The pdf of Z is obtained by differentiating F<sub>z</sub>(z) with respect to z:
$$ f_{z}(z) = \frac{dF_{z}(z)}{dz} = \int^{\infty}_{-\infty}f_{y}(y)f_{x}(z-y)dy $$
**Note:** The above derivation integrated over x first (the inner integral). Integrating over y first instead leads to the equivalent form:
$$ f_{z}(z) = \frac{dF_{z}(z)}{dz} = \int^{\infty}_{-\infty}f_{x}(x)f_{y}(z-x)dx $$
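The convolution integral can be evaluated numerically. A minimal sketch, assuming X and Y are independent uniform(0, 1) RVs, for which the sum has the triangular density f<sub>z</sub>(z) = z on [0, 1] and 2 - z on [1, 2]; `convolve_pdfs` and `uniform01` are hypothetical helpers, and the integral is a midpoint-rule approximation over the support of f<sub>y</sub>.

```python
def convolve_pdfs(f_x, f_y, z, lo, hi, steps=20_000):
    """f_Z(z) = ∫ f_Y(y) f_X(z - y) dy for Z = X + Y with X, Y independent,
    approximated by the midpoint rule over y in [lo, hi] (the support of f_Y)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h
        total += f_y(y) * f_x(z - y)
    return total * h

def uniform01(x):
    """Uniform density on (0, 1]."""
    return 1.0 if 0.0 < x <= 1.0 else 0.0

# Sum of two independent uniform(0, 1) RVs: triangular pdf peaking at z = 1.
for z in (0.25, 0.5, 1.0, 1.5):
    print(z, convolve_pdfs(uniform01, uniform01, z, 0.0, 1.0))
```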

#### The Characteristic Function
Characteristic functions play a role similar to transform methods: they simplify the convolutions that arise when finding the pdfs of sums of many random variables with various probability density functions. The *characteristic function* of an RV is defined to be:
$$ \phi(u) = E[e^{juX}] $$
and this expected value can be obtained from:
$$ \phi(u) = \int^{\infty}_{-\infty}f(x)e^{jux}dx $$
The right side of this equation is the Fourier transform of the density function f(x), except for the sign of the exponent. By analogy to the inverse Fourier transform, the density function can be obtained from:
$$ f(x) = \frac{1}{2\pi} \int^{\infty}_{-\infty}\phi(u)e^{-jux}du $$
Consider the problem of finding the pdf of the sum of two independent RVs, X and Y, where Z=X+Y. The characteristic functions for these RVs are:
$$ \phi_{x}(u) = \int^{\infty}_{-\infty}f_{x}(x)e^{jux}dx $$
$$ \phi_{y}(u) = \int^{\infty}_{-\infty}f_{y}(y)e^{juy}dy $$
Since convolution corresponds to multiplication of transforms (here, characteristic functions), the characteristic function of Z is:
$$ \phi_{z}(u) = \phi_{x}(u)\phi_{y}(u) $$
with the resulting density function becoming:
$$ f_{z}(z) = \frac{1}{2\pi} \int^{\infty}_{-\infty}\phi_{x}(u)\phi_{y}(u)e^{-juz}du $$
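The multiplication property can be checked numerically for two independent Gaussians, whose sum is known to be Gaussian with the means added and the variances added. A minimal sketch, assuming a midpoint-rule approximation of φ(u) = ∫ f(x)e<sup>jux</sup>dx over a range wide enough to contain essentially all of each density, and using the standard Gaussian closed form exp(ju·mean - σ²u²/2) as the reference; all function names are hypothetical.

```python
import cmath
import math

def char_fn_numeric(f, u, lo, hi, steps=20_000):
    """phi(u) = ∫ f(x) e^{jux} dx, approximated by the midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) * cmath.exp(1j * u * (lo + (i + 0.5) * h))
                   for i in range(steps))

def gaussian(mean, sigma):
    """Gaussian density with the given mean and standard deviation."""
    return lambda x: math.exp(-((x - mean) ** 2) / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

def gaussian_char_fn(mean, sigma, u):
    """Closed form for a Gaussian RV: exp(j u mean - sigma^2 u^2 / 2)."""
    return cmath.exp(1j * u * mean - (sigma * u) ** 2 / 2)

u = 0.7
phi_x = char_fn_numeric(gaussian(1.0, 0.5), u, -10, 12)   # X: mean 1, sigma 0.5
phi_y = char_fn_numeric(gaussian(-2.0, 1.5), u, -20, 16)  # Y: mean -2, sigma 1.5
# Z = X + Y: the product should match a Gaussian with mean -1 and variance 0.25 + 2.25.
print(phi_x * phi_y, gaussian_char_fn(-1.0, math.hypot(0.5, 1.5), u))
```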