## 4. Expectation

### 4.1 Expectation of a Random Variable

The **expected value**, **mean** or **first moment** of $X$ is defined to be

$$ \mathbb{E}(X) = \int x \; dF(x) = \begin{cases}
\sum_x x f(x) &\text{if } X \text{ is discrete} \\
\int x f(x)\; dx &\text{if } X \text{ is continuous}
\end{cases} $$

assuming that the sum (or integral) is well defined.  The following notations are all used for the expected value of $X$:

$$ \mathbb{E}(X) = \mathbb{E}X = \int x\; dF(x) = \mu = \mu_X $$
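Both branches of the definition can be evaluated directly. Here is a minimal Python sketch, assuming a fair six-sided die for the discrete case and the density $f(x) = 2x$ on $[0, 1]$ for the continuous case. The helper names `expectation_discrete` and `expectation_continuous` are illustrative, and the continuous integral is only approximated with a midpoint rule.

```python
from fractions import Fraction


def expectation_discrete(pmf):
    """Illustrative helper: sum_x x * f(x) for a pmf given as {x: f(x)}."""
    return sum(x * p for x, p in pmf.items())


def expectation_continuous(pdf, a, b, n=100_000):
    """Illustrative helper: midpoint-rule approximation of the integral of x * f(x) over [a, b]."""
    h = (b - a) / n
    midpoints = (a + (i + 0.5) * h for i in range(n))
    return sum(x * pdf(x) for x in midpoints) * h


# Fair die: f(x) = 1/6 for x = 1, ..., 6 (exact arithmetic with Fraction).
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(expectation_discrete(die))  # 7/2

# Density f(x) = 2x on [0, 1]; the exact mean is 2/3.
print(expectation_continuous(lambda x: 2 * x, 0.0, 1.0))  # approximately 0.6667
```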

The expectation is a one-number summary of the distribution.  Think of $\mathbb{E}(X)$ as the value you would get by computing the numeric average $n^{-1} \sum_{i=1}^n X_i$ of a large number of IID draws $X_1, \dots, X_n$.  The fact that $\mathbb{E}(X) \approx n^{-1} \sum_{i=1}^n X_i$ is a theorem, the law of large numbers, which we will discuss later.  We use $\int x \; dF(x)$ as a convenient notation that unifies the discrete case $\sum_x x f(x)$ and the continuous case $\int x f(x) \; dx$, but you should be aware that $\int x \; dF(x)$ has a precise meaning discussed in real analysis courses.
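A quick simulation sketch of the averaging interpretation, assuming a fair six-sided die (so $\mathbb{E}(X) = 3.5$). The function name, the seed, and the sample sizes are arbitrary choices for illustration; the exact printed averages depend on the seed, but they should drift toward 3.5 as $n$ grows.

```python
import random


def sample_mean_of_die_rolls(n, seed=0):
    """Illustrative helper: average of n simulated fair-die rolls."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    rolls = [rng.randint(1, 6) for _ in range(n)]  # randint includes both endpoints
    return sum(rolls) / n


for n in (10, 1_000, 100_000):
    print(n, sample_mean_of_die_rolls(n))
```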

To ensure that $\mathbb{E}(X)$ is well defined, we say that $\mathbb{E}(X)$ exists if $\int |x| \; dF_X(x) < \infty$.  Otherwise, we say that the expectation does not exist.  From now on, whenever we discuss expectations, we implicitly assume they exist.
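As a worked example of an expectation that does not exist, take the Cauchy density $f(x) = \frac{1}{\pi(1 + x^2)}$ (used here purely as an illustration of the condition above). Truncating the integral at $\pm T$ gives

$$ \int_{-T}^{T} |x| \frac{1}{\pi(1 + x^2)} \; dx = \frac{2}{\pi} \int_0^T \frac{x}{1 + x^2} \; dx = \frac{1}{\pi} \ln\left(1 + T^2\right) \to \infty \quad \text{as } T \to \infty $$

so $\int |x| \; dF_X(x) = \infty$ and $\mathbb{E}(X)$ does not exist.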

**Theorem 4.6 (The rule of the lazy statistician)**.  Let $Y = r(X)$.  Then

$$ \mathbb{E}(Y) = \mathbb{E}(r(X)) = \int r(x) \; dF_X(x) $$

As a special case, let $A$ be an event and let $r(x) = I_A(x)$, where $I_A(x) = 1$ if $x \in A$ and $I_A(x) = 0$ otherwise.  Then

$$ \mathbb{E}(I_A(X)) = \int I_A(x) f_X(x) dx = \int_A f_X(x) dx = \mathbb{P}(X \in A) $$

In other words, probability is a special case of expectation.
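The point of the theorem is that $\mathbb{E}(r(X))$ can be computed from the distribution of $X$ alone, without first finding the distribution of $Y$. A minimal sketch, assuming a fair die and an illustrative helper `expect`; the same helper returns a probability when $r$ is an indicator, here of the event $A = \{2, 4, 6\}$.

```python
from fractions import Fraction


def expect(r, pmf):
    """Illustrative helper: E(r(X)) = sum_x r(x) f(x); the pmf of r(X) is never built."""
    return sum(r(x) * p for x, p in pmf.items())


die = {x: Fraction(1, 6) for x in range(1, 7)}

second_moment = expect(lambda x: x**2, die)             # E(X^2)
p_even = expect(lambda x: 1 if x % 2 == 0 else 0, die)  # E(I_A(X)) = P(X in A)
print(second_moment, p_even)  # 91/6 1/2
```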

Functions of several variables are handled in a similar way.  If $Z = r(X, Y)$ then

$$ \mathbb{E}(Z) = \mathbb{E}(r(X, Y)) = \int \int r(x, y) \; dF(x, y) $$
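In the discrete case the double integral is a double sum over the joint pmf. A minimal sketch, assuming two independent fair dice (so $f(x, y) = 1/36$) and $r(x, y) = \max(x, y)$; the helper name `expect_joint` is illustrative.

```python
from fractions import Fraction
from itertools import product


def expect_joint(r, joint_pmf):
    """Illustrative helper: E(r(X, Y)) = sum over (x, y) of r(x, y) f(x, y)."""
    return sum(r(x, y) * p for (x, y), p in joint_pmf.items())


# Joint pmf of two independent fair dice: each of the 36 pairs has probability 1/36.
joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}
print(expect_joint(max, joint))  # 161/36
```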

The **$k$-th moment** of $X$ is defined to be $\mathbb{E}(X^k)$, assuming that $\mathbb{E}(|X|^k) < \infty$.  We shall rarely make much use of moments beyond $k = 2$.
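For example, assuming $X \sim \text{Uniform}(0, 1)$ (a choice made here for illustration), every moment exists because $|X| \le 1$, and

$$ \mathbb{E}(X^k) = \int_0^1 x^k \; dx = \frac{1}{k + 1} $$

so $\mathbb{E}(X) = 1/2$ and $\mathbb{E}(X^2) = 1/3$.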

### 4.2 Properties of Expectations

**Theorem 4.10**.  If $X_1, \dots, X_n$ are random variables and $a_1, \dots, a_n$ are constants, then

$$ \mathbb{E}\left( \sum_i a_i X_i \right) = \sum_i a_i \mathbb{E}(X_i) $$
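Linearity holds even when the variables are strongly dependent. A small exact check, assuming a fair die $X$ and the fully dependent variable $Y = 7 - X$, with the arbitrary coefficients $a_1 = 2$, $a_2 = 3$:

```python
from fractions import Fraction

die = {x: Fraction(1, 6) for x in range(1, 7)}

# Y = 7 - X is a deterministic function of X, so X and Y are far from independent.
lhs = sum((2 * x + 3 * (7 - x)) * p for x, p in die.items())  # E(2X + 3Y)
ex = sum(x * p for x, p in die.items())                       # E(X)
ey = sum((7 - x) * p for x, p in die.items())                 # E(Y)
rhs = 2 * ex + 3 * ey
print(lhs, rhs)  # 35/2 35/2
```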

**Theorem 4.12**.  Let $X_1, \dots, X_n$ be independent random variables.  Then,

$$ \mathbb{E}\left(\prod_i X_i \right) = \prod_i \mathbb{E}(X_i) $$

Notice that the summation rule does not require independence, but the product rule does.
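To see why the product rule needs independence, compare two independent fair dice with the fully dependent case where the "second" variable is $X$ itself. This is a sketch under those assumptions, using exact fractions:

```python
from fractions import Fraction
from itertools import product

die = {x: Fraction(1, 6) for x in range(1, 7)}
mean = sum(x * p for x, p in die.items())  # E(X) = 7/2

# Independent X1, X2: the joint pmf is the product of the marginals.
e_indep = sum(x1 * x2 * die[x1] * die[x2] for x1, x2 in product(die, repeat=2))

# Dependent case: E(X * X) = E(X^2), which differs from E(X) * E(X).
e_dep = sum(x * x * p for x, p in die.items())

print(e_indep, mean * mean)  # 49/4 49/4
print(e_dep)                 # 91/6
```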

### 4.3 Variance and Covariance

Let $X$ be a random variable with mean $\mu$.  The **variance** of $X$, denoted by $\sigma^2$, $\sigma_X^2$, $\mathbb{V}(X)$ or $\mathbb{V}X$, is defined by

$$ \sigma^2 = \mathbb{E}\left[(X - \mu)^2\right] = \int (x - \mu)^2\; dF(x) $$

assuming this expectation exists.  The **standard deviation** is $\text{sd}(X) = \sqrt{\mathbb{V}(X)}$ and is also denoted by $\sigma$ and $\sigma_X$.
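As a worked example, assume $X \sim \text{Bernoulli}(p)$, so $\mathbb{P}(X = 1) = p$, $\mathbb{P}(X = 0) = 1 - p$ and $\mu = p$. Summing over the two values,

$$ \mathbb{V}(X) = (0 - p)^2 (1 - p) + (1 - p)^2 p = p(1 - p)\left[p + (1 - p)\right] = p(1 - p) $$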

**Theorem 4.14**.  Assuming the variance is well defined, it has the following properties:

1.  $\mathbb{V}(X) = \mathbb{E}(X^2) - \left(\mathbb{E}(X)\right)^2$
2.  If $a$ and $b$ are constants then $\mathbb{V}(aX + b) = a^2 \mathbb{V}(X)$
3.  If $X_1, \dots, X_n$ are independent and $a_1, \dots, a_n$ are constants then

    $$ \mathbb{V}\left( \sum_{i=1}^n a_iX_i \right) = \sum_{i=1}^n a_i^2 \mathbb{V}(X_i) $$
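The three properties can be checked exactly on a small discrete example. This sketch assumes a fair die, the arbitrary constants $a = -3$, $b = 10$ for property 2, and the combination $2X_1 - X_2$ of two independent dice for property 3; the helpers `E` and `V` are illustrative.

```python
from fractions import Fraction
from itertools import product

die = {x: Fraction(1, 6) for x in range(1, 7)}


def E(r, pmf):
    """Illustrative helper: E(r(X)) for a pmf given as {x: f(x)}."""
    return sum(r(x) * p for x, p in pmf.items())


def V(pmf):
    """Illustrative helper: variance from the definition E[(X - mu)^2]."""
    mu = E(lambda x: x, pmf)
    return E(lambda x: (x - mu) ** 2, pmf)


var_x = V(die)

# Property 1: V(X) = E(X^2) - (E(X))^2
shortcut = E(lambda x: x**2, die) - E(lambda x: x, die) ** 2

# Property 2: pmf of aX + b (the map is one-to-one because a != 0)
a, b = -3, 10
var_affine = V({a * x + b: p for x, p in die.items()})

# Property 3: build the pmf of 2*X1 - X2 for independent dice X1, X2
pmf_combo = {}
for x1, x2 in product(die, repeat=2):
    z = 2 * x1 - x2
    pmf_combo[z] = pmf_combo.get(z, 0) + die[x1] * die[x2]
var_combo = V(pmf_combo)

print(var_x, shortcut)                # 35/12 35/12
print(var_affine, a**2 * var_x)       # 105/4 105/4
print(var_combo, (2**2 + 1) * var_x)  # 175/12 175/12
```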

If $X_1, \dots, X_n$ are random variables then we define the **sample mean** to be

$$ \overline{X}_n = \frac{1}{n} \sum_{i=1}^n X_i  $$

and the **sample variance** to be

$$ S_n^2 = \frac{1}{n - 1} \sum_{i=1}^n \left(X_i - \overline{X}_n\right)^2 $$
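A direct translation of the two formulas, assuming a small made-up data list; the helper names are illustrative. Python's standard `statistics.variance` also divides by $n - 1$, so it should agree with `sample_variance`.

```python
import statistics


def sample_mean(xs):
    """Illustrative helper: (1/n) * sum of xs."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Illustrative helper: sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_mean(data), sample_variance(data))  # 5.0 and 32/7 (about 4.5714)
print(statistics.mean(data), statistics.variance(data))
```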

**Theorem 4.16**.  Let $X_1, \dots, X_n$ be IID and let $\mu = \mathbb{E}(X_i)$, $\sigma^2 = \mathbb{V}(X_i)$.  Then

$$ 
\mathbb{E}\left(\overline{X}_n\right) = \mu,
\quad
\mathbb{V}\left(\overline{X}_n\right) = \frac{\sigma^2}{n},
\quad \text{and} \quad
\mathbb{E}\left(S_n^2\right) = \sigma^2
$$
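Because a die has finitely many outcomes, Theorem 4.16 can be checked exactly rather than by simulation. This sketch assumes IID fair dice and enumerates all $6^n$ equally likely samples for $n = 3$; the helper name `sample_stats_moments` is illustrative, and $\mathbb{V}(\overline{X}_n)$ is computed with property 1 of Theorem 4.14.

```python
from fractions import Fraction
from itertools import product

die = {x: Fraction(1, 6) for x in range(1, 7)}
mu = sum(x * p for x, p in die.items())                  # 7/2
sigma2 = sum((x - mu) ** 2 * p for x, p in die.items())  # 35/12


def sample_stats_moments(n):
    """Illustrative helper: exact E(Xbar), V(Xbar) and E(S^2) for n IID dice."""
    e_xbar = Fraction(0)
    e_xbar_sq = Fraction(0)
    e_s2 = Fraction(0)
    for sample in product(die, repeat=n):
        prob = Fraction(1)
        for x in sample:
            prob *= die[x]  # independence: multiply the marginal probabilities
        xbar = Fraction(sum(sample), n)
        s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
        e_xbar += xbar * prob
        e_xbar_sq += xbar**2 * prob
        e_s2 += s2 * prob
    # V(Xbar) = E(Xbar^2) - (E(Xbar))^2
    return e_xbar, e_xbar_sq - e_xbar**2, e_s2


e_xbar, v_xbar, e_s2 = sample_stats_moments(3)
print(e_xbar == mu, v_xbar == sigma2 / 3, e_s2 == sigma2)  # True True True
```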