# Overview

In this notebook we discuss expectation. Expectation is a precursor to a number of statistical topics, including the mean, variance, correlation, etc. A basic understanding of probability is assumed.

# Definition

The expected value $\mathbb{E}[X]$ of a random variable $X$ is the weighted average of the values $X$ can take, with each value weighted by its probability.

The calculation can be written either with the expected value operator $\mathbb{E}$ or by expanding the operator into summation (discrete case) or integral (continuous case) notation.

Recall that, in the context of probability, these sums and integrals may run over infinitely many values. We will see later that the convergence of such infinite sums is an important property.

## Discrete Variables

$$ \mathbb{E}\left[ X \right] = \sum_i x_i f_X(x_i) $$

Where $f_X$ is the probability mass function of $X$, so that $f_X(x_i) = P(X = x_i)$.
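As a small illustration of the discrete formula (the example distribution is an assumption, not taken from the text above), the sketch below computes $\mathbb{E}[X]$ for a fair six-sided die by weighting each value by its probability $1/6$. Python's `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def expected_value(pmf):
    """Discrete expectation: the sum of x_i * f_X(x_i) over the support."""
    return sum(x * p for x, p in pmf.items())


# Hypothetical example: a fair six-sided die, f_X(x_i) = 1/6 for x_i = 1..6
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}

print(expected_value(die_pmf))  # 7/2
```

A constant is just a pmf with a single value of probability 1, so `expected_value({5: Fraction(1)})` returns 5, matching $\mathbb{E}[c] = c$.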

## Continuous Variables

$$ \mathbb{E}\left[ X \right] = \int_{-\infty}^{\infty} x f_X(x) \ dx $$

Where $f_X$ is the probability density function of $X$.
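The continuous formula $\mathbb{E}[X] = \int x f_X(x)\,dx$ can also be approximated numerically. The following is a minimal sketch, assuming an exponential density $f_X(x) = \lambda e^{-\lambda x}$ with a hypothetical rate $\lambda = 2$ (mean $1/\lambda = 0.5$), that evaluates the integral with a simple midpoint rule. Truncating the integral at $x = 50$ is a choice made for this example.

```python
import math


def expected_value_continuous(pdf, lower, upper, n=100_000):
    """Approximate E[X] = integral of x * f_X(x) dx with the midpoint rule."""
    dx = (upper - lower) / n
    total = 0.0
    for i in range(n):
        x = lower + (i + 0.5) * dx  # midpoint of the i-th sub-interval
        total += x * pdf(x)
    return total * dx


lam = 2.0  # hypothetical rate parameter


def exp_pdf(x):
    """Exponential density with rate lam (only evaluated for x >= 0 here)."""
    return lam * math.exp(-lam * x)


# The density is 0 for x < 0 and negligible beyond x = 50, so we truncate there.
print(expected_value_continuous(exp_pdf, 0.0, 50.0))  # approximately 0.5
```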

## Constant Variables

In both the discrete and continuous cases, if $X$ is a constant such that $x_i = c, \ \forall i$, we have:

$$ \mathbb{E}[c] = c $$

# A Note On Implications Of Sample Statistics
It is often the case that sample statistics or population parameters need to be calculated. The formulas in the definition section still apply, and we will see derivations of expanded equations for different distributions in the example sections that follow.

Note that observations in an experimental setting are often regarded as equiprobable, independent realizations, which is why the formulas used resemble those of the (discrete) uniform distribution; we will see more on this later. It is important to note that whenever an arithmetic mean is used as the expected value, we have made this assumption.

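For example, we may encounter the arithmetic mean as the sample mean, and the average squared deviation as the sample variance. In both cases each of the $n$ observations is given the weight $f_X(x_i) = \frac{1}{n}$, i.e. $X$ follows a discrete uniform distribution over the $n$ observed values:

$$ \mathbb{E}[X] = \frac{1}{n} \sum_{i=1}^{n} x_i \ \ \rightarrow \ \ f_X(x_i) = \frac{1}{n}, \ \ i = 1, \ldots, n $$

$$ \mathbb{E} \left[ (X - \mu)^2 \right] = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2 \ \ \rightarrow \ \ f_X(x_i) = \frac{1}{n}, \ \ i = 1, \ldots, n $$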
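To make the equal-weight assumption concrete, here is a sketch with a hypothetical sample of eight observations. Both the sample mean and the $1/n$ sample variance are computed as probability-weighted averages with every weight equal to $1/n$.

```python
def weighted_mean(values, weights):
    """Probability-weighted average: sum of w_i * v_i."""
    return sum(w * v for v, w in zip(values, weights))


# Hypothetical sample of n = 8 observations
x = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(x)
weights = [1 / n] * n  # equiprobable: f_X(x_i) = 1/n

mean = weighted_mean(x, weights)                              # E[X] = (1/n) sum x_i
var = weighted_mean([(xi - mean) ** 2 for xi in x], weights)  # E[(X - mu)^2]

print(mean, var)  # 5.0 4.0
```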

# Series Expansions

Series expansions are a technique used to approximate a given value. The basic idea is that a function (and hence a value such as $e^2$) can be expressed as an infinite series. If we calculate the first few terms of that series, we get an approximation that is close enough to the desired value.

There are a number of series that can be selected to perform the approximation. Wikipedia has a [list of such series](https://en.wikipedia.org/wiki/List_of_mathematical_series).

The selection of the series depends on the structure of the function or variable being approximated.

<center>
    <table>
        <tr>
            <th>Series Name</th>
            <th>Value</th>
            <th>Series</th>
            <th>Summation Notation</th>
        </tr>
        <tr>
            <td>Exponential</td>
            <td>$$e^x$$</td>
            <td>$$1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$</td>
            <td>$$\sum_{n=0}^{\infty}\frac{x^n}{n!}$$</td>
        </tr>
        <tr>
            <td>Geometric</td>
            <td>$$\frac{1}{1 - x}, \ |x| < 1$$</td>
            <td>$$1 + x + x^2 + x^3 + \cdots $$</td>
            <td>$$\sum_{n=0}^{\infty} x^n$$</td>
        </tr>
        <tr>
            <td>Sine</td>
            <td>$$ \sin(x) $$</td>
            <td>$$ x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots $$</td>
            <td>$$ \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{(2n + 1)!}$$</td>
        </tr>
    </table>
</center>

https://bookdown.org/probability/beta/moment-generating-functions.html


## Exponential Series Expansion Example
Let's look at an example. Assume we want to approximate $e^2 = 7.389056...$ using the exponential series. The expansion is shown below; notice that the accuracy improves with each additional term in the series.

<center><img src='images/taylor_series_expansion.png' height='300px' width='300px'></center>

https://www.mathsisfun.com/algebra/taylor-series.html
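The partial sums in the figure can be reproduced with a few lines of Python. This sketch (the helper name `exp_partial_sums` is ours) adds one term $x^n/n!$ at a time and prints each approximation of $e^2$ alongside its error.

```python
import math


def exp_partial_sums(x, n_terms):
    """Return the running partial sums of sum_{n>=0} x^n / n!."""
    sums, total = [], 0.0
    for n in range(n_terms):
        total += x ** n / math.factorial(n)
        sums.append(total)
    return sums


approximations = exp_partial_sums(2.0, 10)
for k, approx in enumerate(approximations, start=1):
    print(f"{k:2d} terms: {approx:.6f}  (error {math.exp(2.0) - approx:.6f})")
```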

# Characteristic Function
https://www.randomservices.org/random/expect/Generating.html

# Connection With Moments
In descriptive statistics, we use summary statistics to describe the shape and behavior of a distribution and of the random variables that follow it. Many of these descriptive statistics are [moments](https://en.wikipedia.org/wiki/Moment_(mathematics)); said another way, moments describe the shape of the graph of a distribution function mathematically.

The [method of moments](https://en.wikipedia.org/wiki/Method_of_moments_(statistics)) estimates population parameters by equating the population moments (written as functions of those parameters) to the corresponding empirical sample moments and solving for the parameters.
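As a hedged illustration of the idea, assume the data come from an exponential distribution with rate $\lambda$. Its first population moment is $\mathbb{E}[X] = 1/\lambda$, so equating it to the sample mean $\bar x$ gives the estimator $\hat\lambda = 1/\bar x$. The sketch below simulates data with NumPy (the true rate of 2 and the seed are arbitrary choices) and applies that estimator.

```python
import numpy as np


def mom_exponential_rate(sample):
    """Method-of-moments estimate of an exponential rate.

    The first population moment is E[X] = 1 / lam; equating it to the
    first sample moment (the sample mean) and solving gives lam_hat = 1 / mean.
    """
    return 1.0 / np.mean(sample)


rng = np.random.default_rng(0)
true_rate = 2.0
# NumPy parameterizes the exponential by its scale, which is 1 / rate.
sample = rng.exponential(scale=1.0 / true_rate, size=100_000)

print(mom_exponential_rate(sample))  # close to 2.0
```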

As we will see, moments of a (frequency) distribution can be taken about the origin (raw moments) or about the mean (central moments); standardized moments are central moments rescaled by the standard deviation.

## Definition Of Moment

### Generic Moment
The $n^{th}$ moment of a variable/function $X$ about a point $c$ is denoted as $\mu_n$ and expressed as:

$$ \mu_n := \mathbb{E} \left[ (X - c)^n \right] $$

In some contexts we may also see it expressed as:

$$ \mu_n := \mathbb{E}[X^n] $$

I tend to avoid this notation because it silently assumes $c = 0$ (a raw moment) and is easily misinterpreted when taken out of context.

### Raw Moments
Raw moments are moments taken about the origin (i.e. $c=0$).

$$ \mu_n := \mathbb{E} \left[ (X - 0)^n \right] = \mathbb{E} \left[ X^n \right] $$

The mean is the most common raw moment of a distribution. Often denoted as $\mu$ (or $\bar x$ for a sample), the mean is the first raw moment $\mu_1$.

### Central Moments
Central moments $\bar \mu_n$ are moments taken about the distribution's mean, i.e. its first raw moment $\mu_1$ (so $c = \mu_1$).

As such we will define them as:

$$ \bar \mu_n := \mathbb{E} \left[ (X - \mu_1)^n \right] $$

The variance is the most common central moment of a distribution. Often denoted as $\sigma^2$, the variance is the second central moment $\bar \mu_2$.

### Standardized Moments
A standardized moment $\tilde \mu_n$ is computed from the standardized values (i.e. z-scores) of a distribution rather than from its raw deviations from the mean.

The typical z-score is expressed as: 

$$z := \left( \frac{x - \mu}{\sigma} \right)$$

If we rewrite this in the notation used here, with $\sigma = \sqrt{\bar \mu_2}$, and incorporate it into our moment definition, we have:

$$ \tilde \mu_n := \mathbb{E} \left[ \left( \frac{X - \mu_1}{\sqrt{\bar \mu_2}} \right) ^n \right] = \mathbb{E} \left[ \left( \frac{X - \mu}{ \sigma} \right) ^n \right] = \frac{\bar \mu_n}{\sigma^n} $$

The skewness $\tilde \mu_3$ and kurtosis $\tilde \mu_4$ are common standardized moments of a distribution.
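The three kinds of moments can be computed from data by treating each observation as equally likely (the $1/n$ convention discussed earlier). This is a minimal NumPy sketch; the function names are illustrative, and the small symmetric sample is chosen so the results are easy to check by hand.

```python
import numpy as np


def raw_moment(x, n):
    """n-th raw moment about the origin: E[X^n]."""
    return np.mean(x ** n)


def central_moment(x, n):
    """n-th central moment about the mean: E[(X - mu_1)^n]."""
    return np.mean((x - np.mean(x)) ** n)


def standardized_moment(x, n):
    """n-th standardized moment: bar(mu)_n / bar(mu)_2^(n/2) = E[((X - mu) / sigma)^n]."""
    return central_moment(x, n) / central_moment(x, 2) ** (n / 2)


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(raw_moment(x, 1))           # 3.0 (mean)
print(central_moment(x, 2))       # 2.0 (variance)
print(standardized_moment(x, 3))  # 0.0 (skewness of symmetric data)
print(standardized_moment(x, 4))  # 1.7 (kurtosis)
```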

## Common Moments And Symbols
There is a set of "common" moments which are typically referred to as "the first four moments"; we see them, for example, when working with the generalized lambda distribution. These moments are as follows:

- mean := $\mu_1$
- variance := $\bar \mu_2$
- skewness := $\tilde \mu_3$
- kurtosis := $\tilde \mu_4$


## Moment Generation Function

As the name suggests, the moment generating function (MGF), denoted $M_X$, is a parameterized function from which the moments of a random variable's distribution can be derived. When it exists in a neighborhood of $0$, the MGF uniquely determines the distribution, so it can be regarded as an alternative representation of that distribution.

[Wikipedia article](https://en.wikipedia.org/wiki/Moment-generating_function)

### Definition

If $X$ is a random variable with cumulative distribution function $F_X$, then the corresponding moment generating function $M_X(c)$ is defined as:

$$M_X(c) := \mathbb{E}[e^{cX}]$$

provided the expectation exists (is finite) for all $c$ in some neighborhood of $0$.

Taking the $n^{th}$ derivative of the MGF and evaluating it at $c = 0$ yields the $n^{th}$ raw moment: $M_X^{(n)}(0) = \mathbb{E}[X^n]$.
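We can check this numerically before deriving it. The sketch below assumes a fair six-sided die as the example distribution, writes its MGF $M_X(c) = \frac{1}{6}\sum_{k=1}^{6} e^{ck}$ directly, and approximates the first two derivatives at $0$ with central finite differences (the step size `h` is an arbitrary choice). The results should be close to $\mathbb{E}[X] = 3.5$ and $\mathbb{E}[X^2] = 91/6$.

```python
import math


def mgf_die(c):
    """M_X(c) = E[e^{cX}] for a fair six-sided die."""
    return sum(math.exp(c * k) for k in range(1, 7)) / 6


def derivative_at_zero(f, order, h=1e-4):
    """Central finite-difference estimate of the 1st or 2nd derivative at 0."""
    if order == 1:
        return (f(h) - f(-h)) / (2 * h)
    if order == 2:
        return (f(h) - 2 * f(0.0) + f(-h)) / h ** 2
    raise ValueError("only orders 1 and 2 are implemented")


m1 = derivative_at_zero(mgf_die, 1)  # approx E[X]   = 3.5
m2 = derivative_at_zero(mgf_die, 2)  # approx E[X^2] = 91/6
print(m1, m2)
```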

### Intuition

Seeing is believing. In this section we build our intuition or faith in the definition by looking at clear examples.

For any distribution there may be infinitely many moments to derive. As we will see, the exponential function $e^{cX}$ is a natural choice for the definition of the MGF because of its mathematical properties.

The key tool is the Taylor (Maclaurin) series of the exponential function about $0$.

#### Connection with infinite series

The first important property is that the expression can be decomposed as an infinite series:

$$ e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$

$$ \Rightarrow e^{ax} = \sum_{n=0}^{\infty} \frac{(ax)^n}{n!} = 1 + ax + \frac{a^2x^2}{2!} + \frac{a^3x^3}{3!} + \cdots$$

#### Differentiating the infinite series
The second is that differentiating the expected value of this series $n$ times and then evaluating at $a = 0$ produces the moment corresponding to the order of differentiation (here $a$ plays the role of the MGF argument $c$). Below we can see the relationship between the $n^{th}$ moment $m_n$ and the $n^{th}$ derivative $\frac{d^n}{da^n}$. The second step exchanges the expectation and the sum, which is valid when the MGF exists in a neighborhood of $0$.

$$ m_n = M_X^{(n)}(0), \qquad M_X^{(n)}(a) := \frac{d^n}{da^n} \left[ \mathbb{E}[e^{aX}] \right] $$

$$ M_X^{(n)}(a) = \frac{d^n}{da^n} \left[ \mathbb{E}\left[\sum_{k=0}^{\infty} \frac{(aX)^k}{k!} \right]\right] $$

$$ = \frac{d^n}{da^n} \left[ 1 + a\mathbb{E}[X] + \frac{a^2 \mathbb{E}[X^2]}{2!} + \frac{a^3 \mathbb{E}[X^3]}{3!} + \cdots \right]$$

#### Derevations Of Common Expectations

##### Derive The Mean
Let's start with the mean, which is the first raw moment. We differentiate the series once with respect to $a$:

$$ \mu := m_1 = M_X'(0), \qquad M_X'(a) = \frac{d}{da} \mathbb{E}[e^{aX}] $$

$$ = \frac{d}{da} \left[ 1 + a\mathbb{E}[X] + \frac{a^2 \mathbb{E}[X^2]}{2!} + \frac{a^3 \mathbb{E}[X^3]}{3!} + \cdots \right] $$

$$ = \frac{d}{da} \left[ 1 \right] + \frac{d}{da} \left[ a\mathbb{E}[X] \right] + \frac{d}{da} \left[ \frac{a^2 \mathbb{E}[X^2]}{2!} \right] + \frac{d}{da} \left[ \frac{a^3 \mathbb{E}[X^3]}{3!} \right] + \cdots $$

$$ = \frac{d}{da} \left[ 1 \right] + \mathbb{E}[X] \frac{d}{da} \left[ a \right] + \mathbb{E}[X^2]\frac{d}{da} \left[ \frac{a^2}{2!} \right] + \mathbb{E}[X^3]\frac{d}{da} \left[ \frac{a^3 }{3!} \right] + \cdots $$

$$ = 0 + \mathbb{E}[X]  + \mathbb{E}[X^2]\frac{d}{da} \left[ \frac{a^2}{2!} \right] + \mathbb{E}[X^3]\frac{d}{da} \left[ \frac{a^3 }{3!} \right] + \cdots $$

$$ = 0 + \mathbb{E}[X]  + a\mathbb{E}[X^2]  + \frac{a^2}{2!}\mathbb{E}[X^3] + \cdots $$

$$ = \sum_{n=1}^{\infty} \frac{a^{n-1}}{(n-1)!} \mathbb{E}[X^n] $$

Evaluating at $a = 0$ gives the first moment:

$$ m_1 = \sum_{n=1}^{\infty} \frac{0^{n-1}}{(n-1)!} \mathbb{E}[X^n] $$

Every term with $n > 1$ contains a positive power of $a$ and therefore vanishes at $a = 0$. Only the $n = 1$ term survives, since by convention $0^0 = 1$ (and $0! = 1$), so we have:

$$ = \mathbb{E}[X] $$


##### Derive the Variance

Deriving the variance is a bit more complicated, as the variance is a central moment and not a raw moment. The derivation relies on the fact that:

$$ var(X) = \mathbb{E}[X^2] - \mathbb{E}[X]^2 $$

We begin by taking the second derivative of our infinite series. After evaluating at $a = 0$, every term will again vanish except the one corresponding to the order of the derivative.

$$ M_X''(a) = \frac{d^2}{da^2} \left[ 1 + a\mathbb{E}[X] + \frac{a^2 \mathbb{E}[X^2]}{2!} + \frac{a^3 \mathbb{E}[X^3]}{3!} + \cdots \right] $$

$$ = \frac{d^2}{da^2} \left[ 1 \right] + \frac{d^2}{da^2} \left[ a\mathbb{E}[X] \right] + \frac{d^2}{da^2} \left[ \frac{a^2 \mathbb{E}[X^2]}{2!} \right] + \frac{d^2}{da^2} \left[ \frac{a^3 \mathbb{E}[X^3]}{3!} \right] + \cdots $$

$$ = \frac{d^2}{da^2} \left[ 1 \right] + \mathbb{E}[X] \frac{d^2}{da^2} \left[ a \right] + \mathbb{E}[X^2]\frac{d^2}{da^2} \left[ \frac{a^2}{2!} \right] + \mathbb{E}[X^3]\frac{d^2}{da^2} \left[ \frac{a^3 }{3!} \right] + \cdots $$

$$ = 0 + 0  + \mathbb{E}[X^2] + a \mathbb{E}[X^3] + \cdots $$

$$ = \sum_{n=2}^{\infty} \frac{a^{n-2}}{(n-2)!} \mathbb{E}[X^n] $$

Evaluating at $a = 0$, we have:

$$ m_2 = M_X''(0) = \sum_{n=2}^{\infty} \frac{0^{n-2}}{(n-2)!} \mathbb{E}[X^n] $$

Every term with $n > 2$ contains a positive power of $a$ and therefore vanishes at $a = 0$. Only the $n = 2$ term survives, since $0^0 = 1$ and $0! = 1$, so we have:

$$ m_2 = \mathbb{E}[X^2] $$

Putting this all together:

$$ m_1 = \mathbb{E}[X] $$
$$ m_2 = \mathbb{E}[X^2] $$
$$ m_2 - m_1^2 = \mathbb{E}[X^2] - \mathbb{E}[X]^2 $$ 
$$ var(X) = \bar m_2 =  m_2 - m_1^2$$
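As a numerical sanity check of $var(X) = m_2 - m_1^2$, the sketch below assumes $X \sim \text{Bernoulli}(p)$ with a hypothetical $p = 0.3$, whose MGF is $M_X(a) = 1 - p + pe^a$. It estimates $m_1$ and $m_2$ by finite differences at $a = 0$ and compares the result with the variance computed directly from the definition; both should be close to $p(1-p)$.

```python
import math

p = 0.3  # hypothetical success probability


def mgf_bernoulli(a):
    """M_X(a) = E[e^{aX}] = (1 - p) * e^0 + p * e^a for X ~ Bernoulli(p)."""
    return (1 - p) + p * math.exp(a)


h = 1e-4  # finite-difference step (arbitrary choice)
m1 = (mgf_bernoulli(h) - mgf_bernoulli(-h)) / (2 * h)  # M'(0) ~ E[X]
m2 = (mgf_bernoulli(h) - 2 * mgf_bernoulli(0.0) + mgf_bernoulli(-h)) / h ** 2  # M''(0) ~ E[X^2]
var_from_mgf = m2 - m1 ** 2

# Direct definition for comparison: E[(X - mu)^2] over the outcomes 0 and 1.
var_direct = (1 - p) * (0 - p) ** 2 + p * (1 - p) ** 2

print(var_from_mgf, var_direct)  # both approximately p(1 - p) = 0.21
```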


For more on the variance see the [variance notebook](./Variance.ipynb).

### Apply MGF for Specific Distribution

#### Examples with different distributions
https://www.statlect.com/fundamentals-of-probability/moment-generating-function

#### Exponential Distribution
Calculating an exponential random variable's MGF.
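As a sketch, assuming the rate parameterization $f_X(x) = \lambda e^{-\lambda x}$ for $x \ge 0$, the MGF follows from a single integral that converges only for $c < \lambda$ (a neighborhood of $0$, as the definition requires). Differentiating and evaluating at $c = 0$ then gives the first two moments and the variance:

$$ M_X(c) = \mathbb{E}[e^{cX}] = \int_0^{\infty} e^{cx} \lambda e^{-\lambda x} \ dx = \lambda \int_0^{\infty} e^{-(\lambda - c)x} \ dx = \frac{\lambda}{\lambda - c}, \quad c < \lambda $$

$$ M_X'(c) = \frac{\lambda}{(\lambda - c)^2} \Rightarrow m_1 = M_X'(0) = \frac{1}{\lambda}, \qquad M_X''(c) = \frac{2\lambda}{(\lambda - c)^3} \Rightarrow m_2 = M_X''(0) = \frac{2}{\lambda^2} $$

$$ var(X) = m_2 - m_1^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2} $$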



https://www.randomservices.org/random/expect/Generating.html

https://bookdown.org/probability/beta/moment-generating-functions.html

#### Poisson Distribution

https://www.math.ucdavis.edu/~gravner/MAT135B/materials/ch10.pdf
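A similar sketch for the Poisson distribution, assuming the pmf $f_X(k) = e^{-\lambda}\lambda^k / k!$ for $k = 0, 1, 2, \ldots$. The sum collapses through the exponential series from the table above:

$$ M_X(c) = \sum_{k=0}^{\infty} e^{ck} \frac{e^{-\lambda}\lambda^k}{k!} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda e^c)^k}{k!} = e^{-\lambda} e^{\lambda e^c} = e^{\lambda(e^c - 1)} $$

$$ M_X'(c) = \lambda e^c \, e^{\lambda(e^c - 1)} \Rightarrow m_1 = M_X'(0) = \lambda $$

$$ M_X''(c) = \left( \lambda e^c + \lambda^2 e^{2c} \right) e^{\lambda(e^c - 1)} \Rightarrow m_2 = M_X''(0) = \lambda + \lambda^2, \qquad var(X) = m_2 - m_1^2 = \lambda $$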



#### Univariate Normal Distribution

We start from the PDF of the normal distribution:

$$ \phi(x) = \frac{1}{\sigma \sqrt{2\pi}}e^{ - \frac{1}{2}\left( \frac{x-\mu}{\sigma} \right)^2} $$

Now insert into the MGF

$$M_X(c) := \mathbb{E}[e^{cX}]$$

$$ = \int e^{cx} \phi(x) \ dx$$

$$ = \int e^{cx} \frac{1}{\sigma \sqrt{2\pi}}e^{ - \frac{1}{2}\left( \frac{x-\mu}{\sigma} \right)^2} \ dx$$

$$ = \frac{1}{\sigma \sqrt{2\pi}} \int e^{cx} e^{ - \frac{1}{2}\left( \frac{x-\mu}{\sigma} \right)^2} \ dx$$


$$ = \frac{1}{\sigma \sqrt{2\pi}} \int e^{cx} e^{- \frac{1}{2\sigma^2}(x - \mu)^2} \ dx$$

$$ = \frac{1}{\sigma \sqrt{2\pi}} \int  e^{cx - \frac{1}{2\sigma^2}(x - \mu)^2} \ dx$$




Note: This integral can in fact be evaluated in closed form for general $\mu$ and $\sigma$ by completing the square in the exponent (neither u-substitution nor integration by parts is needed), which gives $M_X(c) = e^{\mu c + \frac{1}{2}\sigma^2 c^2}$.

The algebra is easiest to follow with the standard parameters, so below we plug in $\mu = 0$ and $\sigma = 1$ and complete the square explicitly.

https://www.itl.nist.gov/div898/handbook/eda/section3/eda3661.htm

#### Univariate Standard Normal

We pick up where we left off with the generic MGF for the univariate normal distribution and introduce constraints on the population parameters ($\mu=0$ and $\sigma=1$) to simplify the equation:

$$ = \frac{1}{ \sqrt{2\pi}} \int  e^{cx - \frac{x^2}{2}} \ dx$$

$$ = \frac{1}{ \sqrt{2\pi}} \int  e^{\frac{2cx - x^2}{2}} \ dx$$

$$ = \frac{1}{ \sqrt{2\pi}} \int  e^{\frac{- x^2 + 2cx }{2}} \ dx$$

$$ = \frac{1}{ \sqrt{2\pi}} \int  e^{\frac{- x^2 + 2cx - c^2 + c^2 }{2}} \ dx \tag{complete the square}$$

$$ = \frac{1}{ \sqrt{2\pi}} \int  e^{\frac{- (x^2 - 2cx + c^2 ) + c^2 }{2}} \ dx $$

$$ = \frac{1}{ \sqrt{2\pi}} \int  e^{\frac{- (x - c)^2 + c^2 }{2}} \ dx $$

$$ = \frac{1}{ \sqrt{2\pi}} \int e^{\frac{- (x - c)^2 }{2}} e^{\frac{c^2 }{2}} \ dx $$

$$ = \frac{1}{ \sqrt{2\pi}}e^{\frac{c^2 }{2}} \int e^{\frac{- (x - c)^2 }{2}}  \ dx $$

$$ = \frac{1}{ \sqrt{2\pi}}e^{\frac{c^2 }{2}} \int e^{-\frac{1 }{2} (x - c)^2}  \ dx $$

We can now manipulate the equation so that it resembles a normal distribution density function with $\sigma=1$ and $\mu=c$.

In doing so, we can evaluate the integral using the fact that a density function integrates to 1.

$$ = e^{\frac{c^2 }{2}} \int \frac{1}{ \sqrt{2\pi}} e^{-\frac{1 }{2} (x - c)^2}  \ dx $$

$$ = e^{\frac{c^2 }{2}} $$
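The closed form can be checked numerically. This sketch approximates $\int e^{cx}\phi(x)\,dx$ for the standard normal with a midpoint rule on $[-40, 40]$ (the interval and number of sub-intervals are arbitrary choices that make the truncated tails negligible) and compares it with $e^{c^2/2}$.

```python
import math


def std_normal_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def mgf_numeric(c, lower=-40.0, upper=40.0, n=100_000):
    """Midpoint-rule approximation of the integral of e^{cx} * phi(x) dx."""
    dx = (upper - lower) / n
    total = 0.0
    for i in range(n):
        x = lower + (i + 0.5) * dx
        total += math.exp(c * x) * std_normal_pdf(x)
    return total * dx


for c in (0.0, 0.5, 1.0, 2.0):
    print(c, mgf_numeric(c), math.exp(c ** 2 / 2))
```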

https://courses.cs.washington.edu/courses/cse312/19sp/schedule/lecture23.pdf

### History

https://hsm.stackexchange.com/questions/3420/what-is-the-history-of-moment-generating-functions-and-the-more-general-charact

### Alternatives
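A problem with moment generating functions is that the MGF may not exist. $\mathbb{E}[e^{cX}]$ can be infinite for all $c > 0$ (or all $c < 0$), in which case there is no neighborhood of $0$ on which it is finite; this happens for heavy-tailed distributions such as the lognormal and the Cauchy. Additionally, the desired moments themselves may not exist, as with the Cauchy distribution, whose mean is undefined.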

https://en.wikipedia.org/wiki/Moment-generating_function

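There are ways to guarantee convergence. One such way is to replace the real argument with an imaginary one, which gives the characteristic function $\varphi_X(t) = \mathbb{E}[e^{itX}]$. Because $|e^{itX}| = 1$, this expectation always exists.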

https://www.cs.toronto.edu/~yuvalf/CLT.pdf
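To illustrate, the standard Cauchy distribution has no MGF (and no mean), but its characteristic function is $\varphi_X(t) = e^{-|t|}$. The sketch below estimates $\mathbb{E}[e^{itX}]$ by a Monte Carlo average over simulated Cauchy draws (sample size and seed are arbitrary). Because $|e^{itX}| = 1$, the average is well behaved even though the draws themselves are heavy tailed.

```python
import numpy as np


def empirical_cf(sample, t):
    """Monte Carlo estimate of the characteristic function E[e^{itX}]."""
    return np.mean(np.exp(1j * t * sample))


rng = np.random.default_rng(0)
sample = rng.standard_cauchy(size=200_000)

for t in (0.5, 1.0, 2.0):
    # For the standard Cauchy distribution, E[e^{itX}] = e^{-|t|}.
    print(t, empirical_cf(sample, t).real, np.exp(-abs(t)))
```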

## Properties

https://bookdown.org/probability/beta/moment-generating-functions.html

Taylor Series
https://en.wikipedia.org/wiki/Taylor_series

examples

https://www.probabilitycourse.com/chapter6/6_1_3_moment_functions.php


# Expectations Of Linear Combinations
Given random variables $X_1, \ldots, X_n$, we define a random variable $Y$ as a linear combination of them such that $Y = a_1X_1 + \cdots + a_nX_n + b$. In vector notation we have $Y = a^TX + b$.

Linear combinations often arise while studying joint probability. For more information see the [joint probability notebook](../Probability/Joint%20Probability.ipynb).

We can derive moments for these linear combinations:

## Expected Value

Let $Y = a_1X_1 + \cdots + a_nX_n + b$ as above. By linearity of expectation, which holds whether or not the $X_i$ are independent, the expected value is:

$$ \mathbb{E}[Y] = \mathbb{E}[a_1X_1 + \cdots + a_nX_n + b] $$
$$ = \mathbb{E}[a_1X_1] + \cdots + \mathbb{E}[a_nX_n] + \mathbb{E}[b] $$
$$ = a_1\mathbb{E}[X_1] + \cdots + a_n\mathbb{E}[X_n] + b $$

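If, in addition, the $X_i$ are identically distributed, then $\mathbb{E}[X_i] = \mathbb{E}[X_j]$ for all $i, j$, and writing $\mathbb{E}[X]$ for this common mean gives:

$$ = \left( \sum_{i=1}^{n} a_i \right) \mathbb{E}[X] + b $$

In the single-variable case ($n = 1$, $Y = aX + b$) this is $\mathbb{E}[Y] = a\mathbb{E}[X] + b$. Depending on the values of $a$ and $b$, this expression can be simplified further. For example, if $b=0$, $\mathbb{E}[Y] = a\mathbb{E}[X]$. If $a=1$ and $b=0$, we have $\mathbb{E}[Y] = \mathbb{E}[X]$.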
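A quick numerical check of linearity: in this sketch $X_2 = X_1^2$ is deliberately dependent on $X_1$, and the coefficients are arbitrary. Because the sample mean is itself a linear operation, the mean of $Y$ matches $a_1\bar x_1 + a_2\bar x_2 + b$ up to floating-point error, with no independence required.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=10_000)
x2 = x1 ** 2  # deliberately dependent on x1 (hypothetical example)

a1, a2, b = 2.0, -3.0, 5.0
y = a1 * x1 + a2 * x2 + b

# Linearity holds for sample means up to floating-point error,
# whether or not the variables are independent.
print(np.mean(y), a1 * np.mean(x1) + a2 * np.mean(x2) + b)
```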

## Variance
For the single-variable case $Y = aX + b$, writing $\mu_Y = \mathbb{E}[Y]$:

$$ Var[Y] = \mathbb{E} \left[ (Y-\mu_Y)^2 \right] $$

$$ = \mathbb{E}[Y^2] - \mathbb{E}[Y]^2 $$

$$ = \mathbb{E}[(aX + b)^2] - \mathbb{E}[aX + b]^2 $$

$$ = \mathbb{E}[a^2X^2 + 2abX + b^2] - (a\mathbb{E}[X] + b)^2 $$

$$ = a^2\mathbb{E}[X^2] + 2ab\mathbb{E}[X] + b^2 - a^2\mathbb{E}[X]^2 - 2ab\mathbb{E}[X] - b^2 $$

$$ = a^2\mathbb{E}[X^2]  - a^2\mathbb{E}[X]^2 $$

$$ = a^2(\mathbb{E}[X^2]  - \mathbb{E}[X]^2) $$

$$ = a^2Var[X] $$
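The identity $Var[aX + b] = a^2 Var[X]$ also holds exactly for sample variances computed with the $1/n$ convention. This sketch uses an arbitrary exponential sample and arbitrary constants $a$ and $b$; note that the shift $b$ has no effect on the variance.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=10_000)  # any distribution works here

a, b = -4.0, 7.0
y = a * x + b

# np.var uses the 1/n (population) convention by default (ddof=0).
print(np.var(y), a ** 2 * np.var(x))  # equal up to floating-point error
```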


# Expectations Of Common Distributions

Put links here to other notebooks...

# Conditional Expectation

Conditional expectation is founded on the notion of conditional probability. If you are not familiar with this topic, please review the [conditional probability notebook](../Probability/Conditional%20Probability.ipynb).

## Conditional Mean

$$ \mu_{Y|X} = \mathbb{E}\left[ Y|X \right] $$

$$ \mu_{X|Y} = \mathbb{E}\left[ X|Y \right] $$

## Conditional Variance

expand the formulas...

$$ \sigma^2_{Y|X} = \mathbb{E} \left[ \left( Y - \mu_{Y|X} \right)^2 \mid X \right] $$

When conditioning on a specific value $X = x$, we have:

$$ \sigma^2_{Y|X=x} = \mathbb{E} \left[ \left( Y - \mu_{Y|X=x} \right)^2 \mid X = x \right] $$


https://online.stat.psu.edu/stat414/lesson/19/19.3

https://online.stat.psu.edu/stat414/book/export/html/734

Expanding these equations, with $Y = (y_1, \ldots, y_n)^T$ written as a vector of observations:

$$ = \mathbb{E} \left[ \left( Y - \mu_{Y|X=x} \right)^T\left( Y - \mu_{Y|X=x} \right) \mid X = x \right] $$

$$ = \mathbb{E}\left[
\begin{bmatrix}
y_1 - \mu_{Y|X=x} &
y_2 - \mu_{Y|X=x} &
\cdots &
y_n - \mu_{Y|X=x}
\end{bmatrix}
\begin{bmatrix}
y_1 - \mu_{Y|X=x} \\
y_2 - \mu_{Y|X=x} \\
\vdots \\
y_n - \mu_{Y|X=x}
\end{bmatrix}
\mid X = x \right]$$

$$ = \mathbb{E}\left[ (y_1 - \mu_{Y|X=x})^2 + (y_2 - \mu_{Y|X=x})^2 + \cdots + (y_n - \mu_{Y|X=x})^2 \mid X = x \right] $$


Assume that two variables $X$ and $Y$ are jointly distributed.

# Properties Of Expectations
## Expected Value
### Linear Rescaling
$$ \mathbb{E}[aX + b] = a\mathbb{E}[X] + b$$
### Linearity
$$ \mathbb{E}[X +Y] = \mathbb{E}[X] + \mathbb{E}[Y] $$
## Variance
$$ Var[X + Y] = Var[X] + Var[Y] + 2Cov[X,Y] $$
$$ Var[X - Y] = Var[X] + Var[Y] - 2Cov[X,Y] $$
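Both variance identities can be checked on simulated, correlated data. In this sketch (the data-generating choices are arbitrary), `np.cov(..., bias=True)` is used so the covariance uses the same $1/n$ normalization as `np.var`'s default; with matching conventions the identities hold up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=10_000)
y = 0.5 * x + rng.normal(size=10_000)  # y is correlated with x

# bias=True makes np.cov normalize by 1/n, matching np.var's default ddof=0.
cov_xy = np.cov(x, y, bias=True)[0, 1]

print(np.var(x + y), np.var(x) + np.var(y) + 2 * cov_xy)
print(np.var(x - y), np.var(x) + np.var(y) - 2 * cov_xy)
```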

https://bookdown.org/kevin_davisross/probsim-book/expected-values-of-linear-combinations-of-random-variables.html