# Basic Probability Review
## From *Probabilistic Methods of Signal and System Analysis* by Cooper & McGillem

### Chapter 1. Introduction to Probability
#### Engineering Applications of Probability
- One feature of probability: it concerns itself with situations that involve uncertainty in some form (e.g. dice, drawing cards, roulette).
- Probability is prevalent even in physical "laws" at the microscopic level.
    - Even Ohm's law, v(t) = Ri(t), is not exactly true at every instant of time, as evidenced by observing the current through a high resistance with a high-gain amplifier.
    - Physical laws are likely idealized cases that never arise in nature exactly.
##### Types of Situations Requiring Application of Probability
- **Random Input Signals** - In practice, the input signals acting on a system usually have some uncertainty and unpredictability, which gives them *randomness*.
- **Random Disturbances** - Unwanted disturbances applied to a system's input or output in addition to the desired signals. Examples include snaps, pops, and crackles heard through speakers driven by high-gain amplifiers, and noise received by antennae.
- **Random System Characteristics** - The system itself has characteristics that are unknown and vary in a random fashion, e.g. an electric power system in which the load fluctuates randomly with usage.
- **System Reliability** - Components of a system fail, causing random outages and losses.
- **Quality Control** - Instead of inspecting every piece of a large system, inspect elements selected at random.
- **Information Theory** - Provides a quantitative measure of the information content of messages.

#### Random Experiments and Events
An **experiment** is some action that results in an *outcome*. A random experiment is one in which the outcome is uncertain before the experiment is performed.
- Equally Likely - All outcomes have the same probability, as in a fair coin toss, and that probability does not depend on previous experiments.
- Elementary Event - An event consisting of exactly one outcome: tossing a coin to obtain a particular side, or rolling a die to obtain a particular integer.
- Composite Event - An event that can be achieved in more than one way: rolling a die to obtain an even number (3 possibilities: 2, 4, 6).

When the number of possible outcomes is countable, the outcomes are said to be **discrete**. When the outcomes are not countable, such as the reading of an ever-changing voltage on a voltmeter, they are said to be **continuous**.

#### Definitions of Probability
##### Relative-Frequency Probability
Probability is closely linked to the frequency of occurrence of defined events (i.e., the frequency of an event defines its probability). Consider an experiment that is performed N times and has four possible outcomes (A, B, C, D), where N<sub>A</sub> is the number of times outcome A occurs, and so on:
$$ N_{A}+N_{B}+N_{C}+N_{D} = N $$
You can define the relative frequency of A, r(A), as:
$$ r(A) = N_{A}/N $$
and $$ r(A)+r(B)+r(C)+r(D)=1 $$
Imagine that N increases without limit. A phenomenon known as *statistical regularity* then applies: the relative frequencies tend to stabilize and approach a number, Pr(A), which can be taken as the probability of the elementary event A:
$$ Pr(A) = \lim_{N \rightarrow \infty} r(A) $$
Therefore, for an experiment whose possible outcomes are A, B, ..., M: $$ Pr(A)+Pr(B)+\cdots+Pr(M)=1 $$
A summary of the concepts of relative frequency:
1. $$ 0\leq Pr(A) \leq 1 $$
2. $$ Pr(A) + Pr(B) + ... + Pr(M) = 1 $$
3. An impossible event is represented by Pr(A) = 0.
4. A certain event is represented by Pr(A) = 1.
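
Statistical regularity can be illustrated with a short simulation. This is a minimal sketch, assuming a fair six-sided die and Python's standard `random` module; `relative_frequency` is a hypothetical helper name, not something from the text. It estimates r(A) = N<sub>A</sub>/N for growing N.

```python
import random

def relative_frequency(n_trials, event, seed=0):
    """Estimate r(A) = N_A / N by simulating n_trials rolls of a fair six-sided die."""
    rng = random.Random(seed)  # seeded generator so runs are reproducible
    n_a = sum(1 for _ in range(n_trials) if rng.randint(1, 6) in event)
    return n_a / n_trials

A = {2}  # elementary event: the die shows a 2 (assumed true probability 1/6)
for n in (10, 1_000, 100_000):
    # As N grows, r(A) tends to settle near Pr(A) = 1/6, about 0.1667.
    print(n, relative_frequency(n, A))
```

Small N gives noticeably scattered values; only the large-N estimates should sit close to 1/6.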

When interested in more than one event at a time, you are interested in **joint probability**. Consider two events, A and B, each equally likely to occur with probability 1/2, in a scenario where a successive event does not depend on the previous one (**statistical independence**). Their joint probability is: $$ Pr(A,B) = Pr(A)Pr(B) = 1/2 \cdot 1/2 = 1/4 $$
**Conditional probability** is the probability of one event, A, occurring given that another event, B, has occurred, and is designated as $$ Pr(A|B) = Pr(A,B)/Pr(B) $$
It is interesting to note that in terms of joint probability: $$ Pr(A,B) = Pr(A|B)Pr(B) = Pr(B|A)Pr(A) $$
If A and B are statistically independent, then $$ Pr(A|B) = Pr(A) $$
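
Joint and conditional probabilities can be checked by enumerating a small sample space. This is a minimal sketch, assuming two tosses of a fair coin with equally likely outcomes; the function names are hypothetical. It computes Pr(A,B) directly and Pr(A|B) as Pr(A,B)/Pr(B), using exact fractions.

```python
from fractions import Fraction
from itertools import product

# Two tosses of a fair coin: four equally likely ordered outcomes.
S = list(product("HT", repeat=2))

def pr(event):
    """Probability of an event (a predicate on outcomes), assuming equally likely outcomes."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

def first_heads(s):   # event A: first toss is heads
    return s[0] == "H"

def second_heads(s):  # event B: second toss is heads
    return s[1] == "H"

def both_heads(s):    # joint event (A, B)
    return first_heads(s) and second_heads(s)

joint = pr(both_heads)           # Pr(A,B)
cond = joint / pr(second_heads)  # Pr(A|B) = Pr(A,B) / Pr(B)
print(joint, cond, pr(first_heads))  # 1/4 1/2 1/2
```

Because the tosses are independent, Pr(A|B) comes out equal to Pr(A).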

##### Elementary Set Theory
Another definition of probability can be demonstrated using **set theory**, which represents data as sets of elements. A set is a collection of objects known as *elements*. It can be written as: $$ A = \{\alpha_{1}, \alpha_{2},...,\alpha_{n}\} $$ where A is the set and the α<sub>i</sub> are its elements.

If B is a subset of A, the general notation is $$ B\subset A $$
A helpful note is that if a set has n elements, then it has 2<sup>n</sup> subsets (including the empty set and the set itself).
Set Arithmetic (Many More Rules and Help in Cooper Chapter 1 pgs. 13-18)
- Equality: Set A equals set B *iff* (if and only if) every element of A is an element of B and every element of B is an element of A. $$ A=B \text{ iff } A \subset B \text{ and } B \subset A $$
- Sums: (Also called the union) The set consisting of all the elements that are elements of A or of B or both. $$ A\cup B $$
- Products: (Also called the intersection) The set containing only the elements common to both sets. $$ A\cap B $$
- Complement: Set containing all elements of S (aka all possible elements) that are not in A. $$ A \cup \bar A = S  $$ $$ A \cap \bar A = \emptyset $$
- Difference: Difference of B from A is a set consisting of the elements of A that are not in B. $$ A - B = A \cap \bar B = A - (A \cap B) $$
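
Python's built-in `set` type maps directly onto this arithmetic. The sketch below assumes the die space S = {1,...,6} and two example events chosen for illustration; `all_subsets` is a hypothetical helper used to confirm the 2<sup>n</sup> subset count.

```python
from itertools import combinations

S = {1, 2, 3, 4, 5, 6}      # all possible outcomes of one die roll
A = {2, 4, 6}               # even faces
B = {1, 2, 3}               # faces 1 through 3

union = A | B               # sum (union): A ∪ B
intersection = A & B        # product (intersection): A ∩ B
complement_A = S - A        # complement of A relative to S
difference = A - B          # A - B = A ∩ B̄

# Complement rules: A ∪ Ā = S and A ∩ Ā = ∅
assert A | complement_A == S and not (A & complement_A)
# Difference identities: A - B = A ∩ B̄ = A - (A ∩ B)
assert difference == A & (S - B) == A - (A & B)

def all_subsets(s):
    """Return every subset of s, including the empty set and s itself."""
    items = list(s)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]

print(union, intersection, complement_A, difference)
print(len(all_subsets(A)))  # 2**3 = 8
```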

##### Axiomatic Approach
Begin by defining a **probability space** whose elements are all the possible outcomes of an experiment. The probability space for the rolling of a six-sided die would be S = {1,2,3,4,5,6}. Each subset is defined as an **event**: the event {2} corresponds to the die landing on the face with two dots, while {1,2,3} corresponds to the die landing on a face with 1, 2, or 3 dots. S corresponds to a *certain event* (the die landing on some face), the null set ∅ is an *impossible event*, and any event consisting of a single outcome is an *elementary event*. Probabilities are assigned to events as long as they satisfy the following conditions, or *axioms*:
$$ Pr(A) \geq 0 $$
$$ Pr(S) = 1 $$
$$ \text{If } A \cap B = \emptyset, \text{ then } Pr(A\cup B) = Pr(A) + Pr(B) $$
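*These axioms are postulates; do not try to prove them.*
Corollaries follow from set identities such as $$ S \cap \emptyset = \emptyset \text{ and } S \cup \emptyset = S $$ For example, since S and ∅ are mutually exclusive, the third axiom gives Pr(S) = Pr(S) + Pr(∅), so Pr(∅) = 0.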
By using the Axiomatic approach, you can apply concepts of Relative Frequency Probability to subsets and elements derived using elementary set theory.
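
The axioms can be checked exhaustively on a small space. This is a minimal sketch, assuming a fair die so that Pr(E) = |E|/|S| (an assumption for illustration, not a general rule); it enumerates all 2<sup>6</sup> events and tests each axiom on them.

```python
from fractions import Fraction
from itertools import combinations

S = frozenset(range(1, 7))   # probability space for one roll of a fair die

def pr(event):
    """Assumed equally likely outcomes: Pr(E) = |E| / |S|."""
    return Fraction(len(event & S), len(S))

# Every subset of S is an event.
events = [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

# Axiom 1: Pr(A) >= 0 for every event.
assert all(pr(a) >= 0 for a in events)
# Axiom 2: Pr(S) = 1.
assert pr(S) == 1
# Axiom 3: additivity for every pair of mutually exclusive events.
assert all(pr(a | b) == pr(a) + pr(b) for a in events for b in events if not (a & b))
# The impossible event (empty set) has probability zero.
assert pr(frozenset()) == 0
```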

#### Conditional Probability
If an event, B, is assumed to have nonzero probability, then the conditional probability of A given B is:
$$ Pr(A|B) = {Pr(A\cap B)\over Pr(B)} $$
This can be simplified in some cases:
$$ \text{if } A \subset B \text{ then } A \cap B = A \text{ and } Pr(A|B) = {Pr(A)\over Pr(B)} \geq Pr(A) $$
$$ \text{if } B \subset A \text{ then } A \cap B = B \text{ and } Pr(A|B) = {Pr(B)\over Pr(B)} = 1 $$
For multiple sets, where A and C are mutually exclusive (A ∩ C = ∅), apply the following:
$$ Pr[(A \cup C) \cap B] = Pr[(A \cap B) \cup (C \cap B)] = Pr(A \cap B) + Pr(C \cap B) $$
$$ Pr[(A \cup C) | B] = {Pr[(A \cup C) \cap B]\over Pr(B)} = {Pr(A \cap B)\over Pr(B)} + {Pr(C \cap B)\over Pr(B)} = Pr(A|B) + Pr(C|B) $$
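
The special cases above can be verified numerically. This is a minimal sketch, assuming a fair die with equally likely faces; `pr` and `pr_given` are hypothetical helper names, and the example events are chosen only for illustration.

```python
from fractions import Fraction

S = frozenset(range(1, 7))        # one roll of a fair die

def pr(event):
    """Assumed equally likely outcomes: Pr(E) = |E| / |S|."""
    return Fraction(len(event & S), len(S))

def pr_given(a, b):
    """Pr(A|B) = Pr(A ∩ B) / Pr(B); B must have nonzero probability."""
    if pr(b) == 0:
        raise ValueError("conditioning event must have nonzero probability")
    return pr(a & b) / pr(b)

even, low = frozenset({2, 4, 6}), frozenset({1, 2, 3})
print(pr_given(even, low))        # 1/3

# A ⊂ B: Pr(A|B) = Pr(A)/Pr(B) >= Pr(A)
two = frozenset({2})
assert pr_given(two, even) == pr(two) / pr(even) >= pr(two)
# B ⊂ A: Pr(A|B) = 1
assert pr_given(even, two) == 1
# A and C mutually exclusive: Pr(A ∪ C | B) = Pr(A|B) + Pr(C|B)
one, three = frozenset({1}), frozenset({3})
assert pr_given(one | three, low) == pr_given(one, low) + pr_given(three, low)
```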

#### Independence
Two events, A and B, are independent iff $$ Pr(A \cap B) = Pr(A)Pr(B) $$
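This extends to n events: A<sub>1</sub>, A<sub>2</sub>, ..., A<sub>n</sub> are independent iff, for every collection of distinct indices i, j, ..., k,
$$ Pr(A_{i}\cap A_{j}\cap ... \cap A_{k}) = Pr(A_{i})Pr(A_{j})...Pr(A_{k}) $$
Keep in mind that mutually exclusive events with nonzero probabilities can never be statistically independent: Pr(A ∩ B) = Pr(∅) = 0, while Pr(A)Pr(B) > 0.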
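
The independence test Pr(A ∩ B) = Pr(A)Pr(B) is easy to apply by direct computation. This is a minimal sketch, assuming a fair die; `independent` is a hypothetical helper, and the events are illustrative.

```python
from fractions import Fraction

S = frozenset(range(1, 7))        # one roll of a fair die

def pr(event):
    """Assumed equally likely outcomes: Pr(E) = |E| / |S|."""
    return Fraction(len(event & S), len(S))

def independent(a, b):
    """A and B are statistically independent iff Pr(A ∩ B) = Pr(A)Pr(B)."""
    return pr(a & b) == pr(a) * pr(b)

even = frozenset({2, 4, 6})
low = frozenset({1, 2})
print(independent(even, low))     # True: 1/6 == 1/2 * 1/3

# Mutually exclusive events with nonzero probability: Pr(A ∩ B) = 0 but Pr(A)Pr(B) = 1/36.
one, two = frozenset({1}), frozenset({2})
print(independent(one, two))      # False
```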

#### Combined Experiments
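Sometimes two experiments may be run with entirely different probability spaces. You can combine the two spaces, S<sub>1</sub> and S<sub>2</sub>, into a *Cartesian product space* whose elements are all ordered pairs of outcomes from the two experiments. Thus, if S<sub>1</sub> has n elements and S<sub>2</sub> has m elements, the Cartesian product space S = S<sub>1</sub> × S<sub>2</sub> has mn elements. If you were to run an experiment consisting of a die roll and a coin toss, the elements of the die experiment would be S<sub>1</sub> = {1,2,3,4,5,6}, those of the coin toss would be S<sub>2</sub> = {H,T}, and the new S = {(1,H), (1,T), ..., (6,H), (6,T)} would have twelve elements. For an event in the Cartesian product space of the form A = A<sub>1</sub> × A<sub>2</sub>, where A<sub>1</sub> is an event of the first experiment and A<sub>2</sub> an event of the second, and assuming the two experiments are independent, the probability of A is:
$$ Pr(A) = Pr(A_{1} \times A_{2}) = Pr(A_{1})Pr(A_{2}) $$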
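
The die-and-coin example can be built with `itertools.product`. This is a minimal sketch, assuming a fair die and a fair coin tossed independently, so every ordered pair is equally likely; `pr` and the event names are hypothetical.

```python
from fractions import Fraction
from itertools import product

S1 = [1, 2, 3, 4, 5, 6]          # die
S2 = ["H", "T"]                  # coin
S = list(product(S1, S2))        # Cartesian product space S = S1 x S2
print(len(S))                    # 12 = 6 * 2

def pr(event):
    """Assumes every ordered pair is equally likely (fair, independent die and coin)."""
    return Fraction(sum(1 for s in S if s in event), len(S))

A1 = {2, 4, 6}                   # die shows an even number
A2 = {"H"}                       # coin shows heads
A = set(product(A1, A2))         # the event A = A1 x A2
# Pr(A1 x A2) = Pr(A1) Pr(A2)
assert pr(A) == Fraction(len(A1), len(S1)) * Fraction(len(A2), len(S2))
```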

#### Bernoulli Trials
The situation in which an experiment is repeated n times to find the probability that a particular event occurs exactly k times is known as *Bernoulli trials*. Consider an experiment for which an event A has probability Pr(A) = p, and the probability of the event not occurring is q = 1 - p. For event A to happen k times, the event "not A" must occur the remaining n - k times. Because the trials are independent, the probability of one specific ordering of these outcomes is the product:
$$ p^{k} q^{n-k} $$
For non-specific ordering (i.e., all possible combinations in which event A can occur k times in n experiments, otherwise known as n *choose* k), the number of orderings is:
$$ _{n}C_{k} = {{n}\choose{k}} = {n!\over k!(n-k)!} $$
Keep in mind that this only gives you the **number** of ways the event can occur k times. To find the probability, multiply it by the probability of a single specific ordering, p<sup>k</sup>q<sup>n-k</sup>:
$$ p_{n}(k) = Pr(A \text{ occurs exactly } k \text{ times}) = {{n}\choose{k}} p^{k}q^{n-k} $$
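
The binomial formula translates directly into code using `math.comb` (Python 3.8+). This is a minimal sketch; `bernoulli_prob` is a hypothetical name, and the fair-coin example is chosen for illustration.

```python
from math import comb

def bernoulli_prob(n, k, p):
    """Probability that an event with per-trial probability p occurs exactly k times in n independent trials."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

# Example: exactly 3 heads in 5 tosses of a fair coin = C(5,3) / 2**5 = 10/32.
print(bernoulli_prob(5, 3, 0.5))     # 0.3125
```

Summing `bernoulli_prob(n, k, p)` over k = 0, ..., n should give 1, which is a quick sanity check on any implementation.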

When *n* is very large, the binomial coefficients and the powers of p and q become difficult to work with. An approximate way to carry out the calculation is the *DeMoivre-Laplace theorem*, which applies if npq >> 1 and |k - np| is less than √(npq). The approximation is:
$$ p_{n}(k) \approx {1\over \sqrt{2\pi npq}} e^{-(k-np)^{2}/(2npq)} $$
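
Comparing the exact binomial value with the approximation shows how close it gets in the stated regime. This is a minimal sketch, assuming n = 1000 and p = 0.5 (so npq = 250 and √(npq) is about 15.8); the function names are hypothetical.

```python
from math import comb, exp, pi, sqrt

def binomial_exact(n, k, p):
    """Exact Bernoulli-trials probability C(n,k) p^k q^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def demoivre_laplace(n, k, p):
    """Gaussian approximation, intended for npq >> 1 and |k - np| < sqrt(npq)."""
    q = 1 - p
    npq = n * p * q
    return exp(-(k - n * p) ** 2 / (2 * npq)) / sqrt(2 * pi * npq)

n, p = 1000, 0.5                     # npq = 250
for k in (500, 510):                 # both satisfy |k - np| < sqrt(npq)
    print(k, binomial_exact(n, k, p), demoivre_laplace(n, k, p))
```

For much larger n, `comb(n, k)` becomes too large to convert to a float, which is exactly the difficulty the approximation avoids.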

### Chapter 2: Random Variables
Probability can also be applied to random variables, i.e., situations in which events are not part of a finite set. A typical random time function is denoted x(t). The particular time function being analyzed is one of an infinite number of time functions that might have occurred. The collection of all possible time functions is designated {x(t)}; when probability functions are also specified, this collection is an **ensemble**. Any particular member of the ensemble is a *sample function*, and the value of the sample functions at any given time, say t<sub>1</sub>, is X(t<sub>1</sub>) or X<sub>1</sub>. A random variable is a numerical description of the outcome of a random experiment. If a random variable can assume any value within a specified range, then it is *continuous*.

#### Distribution Functions
The **probability distribution function** is the probability of the event that the observed random variable, X, is less than or equal to the allowed value x:
$$ F_{x}(x)= Pr(X \leq x) $$
Since it is a probability, it must satisfy the basic axioms, which lead to the following properties for all values of x:
$$ 0 \leq F_{x}(x) \leq 1 \text{ for } -\infty < x < \infty $$
$$ F_{x}(-\infty) = 0 \text{ and } F_{x}(\infty) = 1 $$
$$ F_{x}(x) \text{ is nondecreasing as } x \text{ increases} $$
$$ Pr(x_{1} < X \leq x_{2}) = F_{x}(x_{2}) - F_{x}(x_{1}) $$
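
A discrete example makes these properties concrete. This is a minimal sketch, assuming X is the outcome of a fair six-sided die; `die_cdf` is a hypothetical helper implementing F(x) = Pr(X ≤ x), a staircase that jumps by 1/6 at each integer from 1 to 6.

```python
from fractions import Fraction
from math import floor

def die_cdf(x):
    """F(x) = Pr(X <= x) for a fair six-sided die (a discrete random variable)."""
    if x < 1:
        return Fraction(0)
    return Fraction(min(floor(x), 6), 6)

# Pr(x1 < X <= x2) = F(x2) - F(x1): e.g. Pr(2 < X <= 5) = 3/6
print(die_cdf(5) - die_cdf(2))       # 1/2
```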

#### Density Functions
For some calculations of interest, it is preferable to use the derivative of F(x) rather than F(x) itself. This is called the **probability density function** and, when it exists, is defined by:
$$ f_{x}(x) = lim