# Continuous Random Variables and Probability Distributions

Last time we talked about **discrete** random variables, which attain a discrete (i.e. separated) set of values.

### Two ways to see them:
- As values on a big Probability space (universe-centric)
- Via their Probability distributions (value-centric)

Today we'll focus on **probability distributions** as a way of understanding random variables!

## Probability Density Functions (PDFs)

For a discrete random variable, we have a picture for the probability of each of the values.

<img src="images/one_die--as_dots.png">
<img src="images/two_dice--as_dots.png">

However, these are usually normalized so that the **probability** of attaining a value is the **area** of the corresponding bar.
<img src="images/one_die.png">
<img src="images/two_dice.png">

A **continuous** random variable $X$ is one that attains a continuum of values.  Such a variable can be described by a probability density function $\rho_X(x)$, which tells us the **probability density** at each value $x$.

This means if we look at a small interval $[x, x+dx]$ around $x$ and ask about the probability that the continuous random variable $X$ takes values in there, we have that 

$$
P(x \leq X \leq x + dx) \approx \rho_X(x) dx
$$

**Warning:** This means that $\rho_X(x)$ doesn't tell us the probability $P(X = x)$ of the random variable $X$ attaining a specific value $x$ -- this is (almost) always zero!  Instead we must ask about an **interval** of values!
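
To make the approximation concrete, here is a minimal sketch using a made-up density, $\rho(x) = 2x$ on $[0,1]$ (an assumption chosen only because its interval probabilities can be written down exactly). It compares the exact probability of a small interval with $\rho_X(x)\,dx$.

```python
# Hypothetical example density (not from the lecture): rho(x) = 2x on [0, 1].
def rho(x):
    """Probability density at x."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def cdf(x):
    """P(X <= x) for the density above, i.e. x^2 on [0, 1]."""
    if x < 0.0:
        return 0.0
    if x > 1.0:
        return 1.0
    return x * x

x, dx = 0.5, 1e-3
exact = cdf(x + dx) - cdf(x)   # P(x <= X <= x + dx)
approx = rho(x) * dx           # rho_X(x) * dx
print("exact: ", exact)
print("approx:", approx)
```

The two numbers differ only by a term of size $dx^2$, which is why the approximation improves as the interval shrinks. Note that `rho(0.5)` is `1.0`, which is a density and not the probability that $X = 0.5$.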

## Some great examples:

### Uniform random variables

A **uniform random variable** on the interval $[a,b]$ is any random variable $X$ with a probability density function satisfying
$$ $$
$$
\rho_X(x) = \begin{cases}
\frac{1}{b-a} &\qquad \text{if $a \leq x \leq b$,} \\
0 & \qquad \text{otherwise.}
\end{cases}
$$

<img width=400 
src="https://upload.wikimedia.org/wikipedia/commons/9/96/Uniform_Distribution_PDF_SVG.svg">

This is the most common distribution that vanishes outside of a closed interval.

### Gaussian / Normal Distribution

A **normal random variable** $\mathcal{N}(\mu, \sigma^2)$ with mean $\mu$ and variance $\sigma^2>0$ is any random variable $X$ with a probability density function satisfying
$$ $$
$$
\rho_X(x) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}} = \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).
$$

<img width=400  src="https://upload.wikimedia.org/wikipedia/commons/7/74/Normal_Distribution_PDF.svg">

This is really important -- and describes a certain kind of "random" noise.
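
As a quick sanity check on the formula, the sketch below integrates the normal density numerically with a simple midpoint rule. The helper names `normal_pdf` and `integrate`, and the values of $\mu$ and $\sigma$, are illustrative choices rather than anything fixed by the notes.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def integrate(f, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

mu, sigma = 1.0, 2.0  # example parameters (assumed)

def density(x):
    return normal_pdf(x, mu, sigma)

# Total area under the density (the tails beyond 10 sigma are negligible).
total = integrate(density, mu - 10 * sigma, mu + 10 * sigma)
# Probability of landing within one standard deviation of the mean.
within_one_sigma = integrate(density, mu - sigma, mu + sigma)
print(total, within_one_sigma)
```

The total area should come out very close to 1, and roughly 68% of the probability lies within one standard deviation of the mean, whatever $\mu$ and $\sigma$ are.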

### Log-Gaussian
A **log-normal random variable** with "mean" $\mu$ and "variance" $\sigma^2 > 0$ is any random variable $X$ with a probability density function satisfying
$$ $$
$$
\rho_X(x) 
= \begin{cases}
\frac{1}{\sqrt{2\pi}x \sigma} 
e^{-\frac{(\ln(x)-\mu)^2}{2\sigma^2}} & \qquad \text{if $x > 0$,}
\\
0 & \qquad \text{otherwise.}
\end{cases}
$$

<img width="400" height="400"
src="https://upload.wikimedia.org/wikipedia/commons/a/ae/PDF-log_normal_distributions.svg">

This is a common non-negative random variable that comes up when we have "multiplicative noise".
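
The quotation marks around "mean" and "variance" are there because $\mu$ and $\sigma^2$ describe $\ln(X)$, not $X$ itself. The sketch below (a seeded simulation with arbitrarily chosen parameters) builds log-normal samples as $e^Z$ with $Z$ normal, and compares the sample averages with the standard formula $E(X) = e^{\mu + \sigma^2/2}$.

```python
import math
import random

random.seed(0)            # fixed seed so the run is reproducible
mu, sigma = 0.5, 0.75     # example parameters (assumed)
n = 100_000

# A log-normal sample is exp(Z) where Z ~ N(mu, sigma^2).
samples = [math.exp(random.gauss(mu, sigma)) for _ in range(n)]

mean_of_logs = sum(math.log(s) for s in samples) / n   # should be near mu
mean_of_x = sum(samples) / n                           # NOT near mu
expected_mean_of_x = math.exp(mu + sigma ** 2 / 2)     # E(X) for a log-normal

print(mean_of_logs, mean_of_x, expected_mean_of_x)
```

The average of the logarithms lands near $\mu$, while the average of $X$ itself lands near $e^{\mu + \sigma^2/2}$, which is noticeably larger.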

### Exponential distribution
An **exponential random variable** with parameter $\lambda > 0$ is any random variable $X$ with a probability density function satisfying
$$ $$
$$
\rho_X(x) = 
\begin{cases}
\lambda e^{-\lambda x} = \lambda \exp(-\lambda x) &\qquad \text{if $x \geq 0$,} \\
0 & \qquad \text{otherwise.}
\end{cases}
$$

<img width="400" height="400"
src="https://upload.wikimedia.org/wikipedia/commons/e/ec/Exponential_pdf.svg">

Here $\lambda$ is called the **rate parameter**, and this is often used to describe the time between occurrences of events.
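
Here is a small seeded simulation of waiting times, using Python's standard `random.expovariate`. The rate and the threshold `t` are arbitrary example values. It checks two standard consequences of the density: the mean waiting time is $1/\lambda$, and $P(X > t) = e^{-\lambda t}$.

```python
import math
import random

random.seed(0)   # fixed seed so the run is reproducible
lam = 2.0        # example rate: on average 2 events per unit time
n = 100_000

waits = [random.expovariate(lam) for _ in range(n)]

sample_mean = sum(waits) / n                    # should be near 1 / lam
t = 1.0
tail_fraction = sum(w > t for w in waits) / n   # empirical P(X > t)
tail_exact = math.exp(-lam * t)                 # integral of lam*exp(-lam*x) over x > t

print(sample_mean, 1 / lam)
print(tail_fraction, tail_exact)
```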

### Poisson distribution -- a useful discrete random variable
A **Poisson random variable** with parameter $\lambda > 0$ is any discrete random variable $X$ with a probability mass function (the discrete analogue of a density) satisfying
$$ $$
$$
\rho_X(k) = \begin{cases}
\frac{\lambda^k}{e^\lambda \cdot k!} &\qquad \text{if $k$ is a non-negative integer,} \\
0 & \qquad \text{otherwise.}
\end{cases}
$$

<img width="400" height="400"
src="https://upload.wikimedia.org/wikipedia/commons/1/16/Poisson_pmf.svg">


This is often used to describe the number of events that occur in a given amount of time.
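
Since this is a discrete distribution, sums replace integrals. The sketch below evaluates the formula directly for an example value of $\lambda$ and checks that the probabilities add up to 1 and that the mean is $\lambda$. The function name `poisson_pmf` and the cutoff of 60 terms are illustrative choices.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with parameter lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 3.0          # example parameter (assumed)
ks = range(60)     # terms beyond k = 60 are negligible for lam = 3

total = sum(poisson_pmf(k, lam) for k in ks)       # should be ~1
mean = sum(k * poisson_pmf(k, lam) for k in ks)    # should be ~lam

print(total, mean)
```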

## Independent random variables

Two random variables $X$ and $Y$ are called **independent** if for any two intervals $I_1$ and $I_2$ we have that 
$$ $$
$$
P(X \in I_1 \text{ and } Y \in I_2) 
= \int_{x \in I_1} \int_{y \in I_2} \rho_X(x) \rho_Y(y)\, dy\, dx.
$$

In other words the random variables $X$ and $Y$ are independent if their **joint probability density** $\rho_{X,Y}(x,y)$ has the form
$$ $$
$$\rho_{X,Y}(x,y) =\rho_X(x) \rho_Y(y).$$

Otherwise we say that $X$ and $Y$ are **dependent**.
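
One way to get a feel for this definition is a quick simulation. The sketch below assumes two uniform random variables on $[0,1]$ and two arbitrarily chosen intervals. It estimates the joint probability for an independent pair, and then for the fully dependent pair $Y = X$.

```python
import random

random.seed(0)   # fixed seed so the run is reproducible
n = 200_000

I1 = (0.0, 0.5)  # interval for X (example choice)
I2 = (0.2, 0.5)  # interval for Y (example choice)

def in_interval(v, interval):
    lo, hi = interval
    return lo <= v <= hi

indep_hits = 0
dep_hits = 0
for _ in range(n):
    x = random.random()
    y = random.random()                      # drawn independently of x
    if in_interval(x, I1) and in_interval(y, I2):
        indep_hits += 1
    if in_interval(x, I1) and in_interval(x, I2):   # dependent case: Y := X
        dep_hits += 1

joint_indep = indep_hits / n
joint_dep = dep_hits / n
product = 0.5 * 0.3   # P(X in I1) * P(Y in I2) for uniform variables on [0, 1]

print(joint_indep, joint_dep, product)
```

For the independent pair the joint probability matches the product $0.15$. For $Y = X$ it is about $0.3$, the length of the overlap of the two intervals, so the product rule fails.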

## Useful quantities

- **Expected Value / Mean**
$$ $$
$$ E(X) := \int_{x \in \mathbb{R}} x \cdot \rho_X(x) \, dx $$
    
- **Variance**
$$ $$
\begin{align} 
Var(X) := (\sigma_X)^2 &:= E\left((X - E(X))^2\right) 
\\
&= \int_{x \in \mathbb{R}} (x - E(X))^2 \cdot \rho_X(x) \, dx \geq 0
\end{align}

- **Standard Deviation**
$$ $$
$$\sigma_X := \sqrt{Var(X)} \geq 0$$


The transition from discrete to continuous random variables is essentially to replace
$$ $$
$$\sum_\text{values $x$ of $X$} P(X=x) \times (\text{Blah})$$ 
with 
$$ $$
$$\int_{x\in \mathbb{R}} (\text{Blah}) \cdot \rho_X(x)\, dx$$
in all of the formulas!
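
As an illustration of the integral formulas, the sketch below computes the mean, variance, and standard deviation of an exponential random variable by numerical integration. The midpoint-rule helper, the rate $\lambda = 2$, and the cutoff at $50/\lambda$ are all assumptions made for the example. The known answers are $E(X) = 1/\lambda$ and $Var(X) = 1/\lambda^2$.

```python
import math

def integrate(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

lam = 2.0                 # example rate parameter (assumed)
upper = 50.0 / lam        # the density is negligible beyond this point

def rho(x):
    """Exponential density for x >= 0."""
    return lam * math.exp(-lam * x)

# E(X) = integral of x * rho(x) dx
mean = integrate(lambda x: x * rho(x), 0.0, upper)
# Var(X) = integral of (x - E(X))^2 * rho(x) dx
variance = integrate(lambda x: (x - mean) ** 2 * rho(x), 0.0, upper)
std = math.sqrt(variance)

print(mean, variance, std)
```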

For two random variables $X$ and $Y$, their **covariance** is given by the formula
$$ $$
$$
\int_{x \in \mathbb{R}} \int_{y \in \mathbb{R}}
(x - E(X))(y - E(Y)) \cdot \rho_{X,Y}(x,y) \, dy \, dx
$$
$$ $$
which can also be written more simply as 
$$ $$
$$
E\left((X - E(X))(Y - E(Y))\right).
$$
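
The expected-value form of the covariance is easy to estimate from samples. In the sketch below, $X$ and $Z$ are independent standard normals and $Y = X + Z$. These are assumptions chosen so that the true values are known: $Cov(X, Y) = Var(X) = 1$ and $Cov(X, Z) = 0$.

```python
import random

random.seed(0)   # fixed seed so the run is reproducible
n = 100_000

def sample_cov(xs, ys):
    """Sample estimate of E((X - E(X)) * (Y - E(Y)))."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

xs = [random.gauss(0.0, 1.0) for _ in range(n)]
zs = [random.gauss(0.0, 1.0) for _ in range(n)]   # independent of xs
ys = [x + z for x, z in zip(xs, zs)]              # depends on xs

cov_dependent = sample_cov(xs, ys)     # should be near 1
cov_independent = sample_cov(xs, zs)   # should be near 0

print(cov_dependent, cov_independent)
```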


### Another important theorem in Probability

Last time we talked about the **Law of Large Numbers**, which gave tangible meaning 
to the expected value $E(X)$ of a random variable $X$ as the **"expected average value"** 
of many independent trials of $X$.

Another important theorem is the **Central Limit Theorem**, which gives a similar tangible meaning to the variance $(\sigma_X)^2$ of a random variable $X$.

**Central Limit Theorem (Informal):** Suppose that $X$ is a random variable with mean zero and finite variance.  Then the sum of $n$ trials of $X$ divided by $\sqrt{n}$ approaches the normal distribution with mean zero and the same variance as $X$.

**Central Limit Theorem (Formal):** Suppose that $X$ is a random variable with $E(X) = 0$ and variance $\sigma^2 < \infty$, and that $X_1, X_2, \dots$ are independent random variables with the same probability distribution as $X$.  Then
$$ $$
$$
\frac{X_1 + X_2 + \cdots + X_n}{\sqrt{n}} 
\longrightarrow \mathcal{N}\left(0, \sigma^2\right)
\qquad \text{as } n \rightarrow \infty,
$$
where the convergence is in distribution.


<img src="images/16_dice.png">

**Moral Point of CLT:** If we have a lot of small independent things happening with no bias, then their cumulative effect will be given by something very close to a normal distribution!
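
The theorem can be checked with a small seeded simulation. The sketch below assumes fair six-sided dice, shifts each roll by $3.5$ so that it has mean zero, sums 16 rolls, and divides by $\sqrt{16}$. A single centered die has variance $35/12$, so that is the variance the scaled sums should show.

```python
import math
import random

random.seed(0)       # fixed seed so the run is reproducible
n_dice = 16
trials = 50_000
sigma2 = 35 / 12     # variance of one fair die
sigma = math.sqrt(sigma2)

def scaled_sum():
    """Sum of n_dice centered die rolls, divided by sqrt(n_dice)."""
    total = sum(random.randint(1, 6) - 3.5 for _ in range(n_dice))
    return total / math.sqrt(n_dice)

values = [scaled_sum() for _ in range(trials)]

mean = sum(values) / trials
var = sum((v - mean) ** 2 for v in values) / trials
# Fraction of scaled sums within one sigma of zero; a normal
# distribution would put about 68% of its mass there.
frac_within_sigma = sum(abs(v) <= sigma for v in values) / trials

print(mean, var, sigma2, frac_within_sigma)
```

The sample mean should be close to zero and the sample variance close to $35/12$. The fraction within one standard deviation is only approximately the normal value, because 16 dice still give a discrete distribution.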

$$ $$
**Other Comments about the CLT:**
- It tells us that normal distributions $\mathcal{N}(\mu, \sigma^2)$ are really important!
- We can almost see this from our previous numerical examples with LLN. (Look!)
- It implies the Law of Large Numbers (just shift to mean zero and divide by $\sqrt{n}$ again).


## Some Common Confusions:

- The sum of two random variables sums their outcomes, not their probability distributions!  (Look at the sum of two dice to see this!)
$$ $$
- Not all random variables are independent!
$$ $$
- We've been assuming that a nice probability density function $\rho_X(x)$ exists, and that the mean $E(X)$ and variance $(\sigma_X)^2$ of $X$ exist -- this is not always true, and it leads to lots of technical difficulties/assumptions in studying Probability!  In practice it's fine to assume this most of the time!
$$ $$
- Expected values are your friend, Variances are nice, but probability densities really help you to see what's going on.  Don't be afraid to look! =)
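
To see the first point in the list above concretely, the sketch below computes the exact distribution of the sum of two fair dice, assuming the dice are independent. It uses exact fractions so that no rounding is involved.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Distribution of a single fair die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Distribution of the sum: add the OUTCOMES, multiply the probabilities
# (multiplying is valid because the two dice are assumed independent).
sum_pmf = Counter()
for (a, pa), (b, pb) in product(die.items(), repeat=2):
    sum_pmf[a + b] += pa * pb

for total in sorted(sum_pmf):
    print(total, sum_pmf[total])
```

The result is the triangular shape on the values 2 through 12, peaking at 7. Adding the two probability distributions instead would give $1/3$ on each of the faces 1 through 6, which is not even a probability distribution.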
