# 1.1 Introduction Prob Stats #

### Primers on Probability and Statistics ###

For a random variable $X$, we define the mean $\mu$ as the expectation, $E(X)$.

The expectation operator is defined as:

$$
E(X) = \int_{-\infty}^{\infty} x f(x) dx
$$

where $f(x)$ is the density function of the random variable. The expectation operator is linear in its arguments. Given a sample of size $n$, an unbiased estimator of the expected value is the sample mean $\bar{x}$:

$$
\bar{x} = n^{-1} \sum_{i=1}^n x_i .
$$

Saying that the estimator is unbiased means that $E(\bar{X}) = E(X) = \mu$.

We next define the variance:

$$
V(X) = \sigma^2 = \int_{-\infty}^{\infty} (x-\mu)^2 f(x) dx = E\left(\left(X - E(X)\right)^2\right).
$$

A property of the variance is that, for two constants $a$ and $b$ and two random variables $X$ and $Y$,

$$
V(aX + bY) = a^2 V(X) + b^2 V(Y) + 2 a b Cov(X,Y)
$$

where $Cov(X,Y)$ is the covariance between the 2 random variables,

$$
Cov(X,Y) = E\left((X-\mu_X)(Y-\mu_Y)\right).
$$
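
The variance identity above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the seed, the constants `a` and `b`, and the way `y` is constructed from `x` are illustrative choices, not part of the original text. The identity holds exactly for sample moments as long as the same normalization ($1/n$ here) is used throughout.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

# Two correlated samples: y depends partly on x (illustrative construction).
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)
a, b = 2.0, -3.0  # arbitrary constants

# Left-hand side: variance of the linear combination.
lhs = np.var(a * x + b * y)

# Right-hand side: a^2 V(X) + b^2 V(Y) + 2ab Cov(X, Y), all with the 1/n convention.
cov_xy = np.cov(x, y, ddof=0)[0, 1]
rhs = a**2 * np.var(x) + b**2 * np.var(y) + 2 * a * b * cov_xy

print(lhs, rhs)  # the two values agree up to floating-point rounding
```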

Note that an unbiased estimator for the variance is:

$$
s^2 = (n-1)^{-1} \sum_{i=1}^n (x_i - \bar{x})^2.
$$

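To see where the factor $(n-1)^{-1}$ comes from, consider the estimator $s_n^2 = n^{-1}\sum_{i=1}^n (x_i - \bar{x})^2$, which uses $n^{-1}$ instead, and calculate the expected difference between the true variance and this estimator:

$$
E(\sigma^2 - s_n^2) = E\left[\frac{1}{n}\sum_{i=1}^n(x_i -\mu)^2 - \frac{1}{n}\sum_{i=1}^n(x_i -\bar{x})^2\right]\\
= E\left[(\bar{x} - \mu)^2\right] = V(\bar{x}) = \frac{\sigma^2}{n}.
$$

Hence $E(s_n^2) = \frac{n-1}{n}\sigma^2$, and rescaling $s_n^2$ by $n/(n-1)$ gives the unbiased estimator $s^2$.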
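
A small simulation can illustrate the size of this bias. This is a sketch under stated assumptions: the samples are drawn from a normal distribution, and the sample size `n`, number of trials, true variance `sigma2`, and seed are all illustrative. NumPy's `ddof` argument selects the divisor $n - \text{ddof}$.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
n, trials, sigma2 = 5, 200_000, 4.0  # illustrative values

# Each row is one sample of size n from N(0, sigma2).
samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))

# Average each estimator over many independent samples.
biased = samples.var(axis=1, ddof=0).mean()    # divides by n
unbiased = samples.var(axis=1, ddof=1).mean()  # divides by n - 1

print(biased, sigma2 * (n - 1) / n)  # both close to 3.2
print(unbiased, sigma2)              # both close to 4.0
```

With a small sample size such as $n = 5$ the $1/n$ estimator underestimates the variance by about 20%, which is why the correction matters most for small samples.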

This can be generalized. The $k$th central moment is defined as:

$$
\mu_k = E\left((X-\mu)^k\right).
$$

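Skewness is the third central moment standardized by $\sigma^3$, that is $\mu_3/\sigma^3$, and kurtosis is the fourth central moment standardized by $\sigma^4$, that is $\mu_4/\sigma^4$. There are several definitions of sample skewness and kurtosis, which differ in the corrections applied to reduce bias; the differences matter mostly for small samples.

The normal distribution has skewness 0 and kurtosis 3.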



In [2]:
'''Basic demonstration in Python'''

import numpy as np
import scipy.stats

# Random data, uniformly distributed between -100 and 100
data = (np.random.rand(10000) - 0.5) * 200
n = len(data)

# Sample moments about the mean
samplemean = np.average(data)
# scipy.stats.moment divides by n; rescale by n/(n-1) for the unbiased variance
samplevar = scipy.stats.moment(data, moment=2) * n / (n - 1)
sampleskew = scipy.stats.skew(data)
# scipy.stats.kurtosis returns excess kurtosis (kurtosis minus 3) by default
samplekurt = scipy.stats.kurtosis(data)
print(samplemean)
print(samplevar)
print(sampleskew)
print(samplekurt)

0.11725329018321863
3325.138867811026
0.005586148793054809
-1.2005744528917148


### Univariate Distributions ###

#### Binomial Distribution ####

This distribution describes the number of successes in $N$ independent trials of an experiment that has only 2 outcomes. If $p$ is the probability of obtaining a success in a single trial,

$$
P(X=m) = {N \choose m} p ^m (1-p)^{N-m}, \\
E(X) = Np, \\
V(X) = Np(1-p).
$$
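
The probability formula and its two moments can be verified directly by summing over all outcomes. The sketch below uses only the standard library; the function name `binomial_pmf` and the values of `N` and `p` are illustrative assumptions.

```python
from math import comb

def binomial_pmf(m: int, N: int, p: float) -> float:
    """P(X = m) for N independent trials with success probability p."""
    return comb(N, m) * p**m * (1 - p) ** (N - m)

N, p = 10, 0.3  # illustrative values
pmf = [binomial_pmf(m, N, p) for m in range(N + 1)]

# Normalization, mean, and variance computed from the probabilities themselves.
total = sum(pmf)
mean = sum(m * q for m, q in enumerate(pmf))
var = sum((m - mean) ** 2 * q for m, q in enumerate(pmf))

print(total, mean, var)  # approximately 1, N*p = 3, N*p*(1-p) = 2.1
```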

#### Poisson Distribution ####

The Poisson distribution describes the number of events occurring during a fixed time interval when the events occur independently of one another. It can be derived from the binomial distribution. We let $\lambda t = N p$, where $\lambda$ is the expected number of events per unit time and $t$ is the length of the interval, so that the expected number of events in time $t$, $\lambda t$, equals the binomial mean $N p$. We now substitute $p = \lambda t / N$ into the binomial probability and take the limit $N \rightarrow \infty$. This describes the scenario where the chance of success $p$ goes to $0$ and the number of trials goes to infinity, while the expected number of successes stays constant. This gives:

$$
P(X=x) = \frac{\exp(-\lambda t)(\lambda t)^x}{x!}.
$$
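
The limiting argument can be illustrated numerically by holding the mean fixed and increasing the number of trials. This is a minimal sketch using only the standard library; the helper names and the values of `rate` (standing for $\lambda t$) and `x` are illustrative assumptions.

```python
from math import comb, exp, factorial

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Binomial probability of x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x: int, rate: float) -> float:
    """Poisson probability of x events when the expected count is rate."""
    return exp(-rate) * rate**x / factorial(x)

rate, x = 2.0, 3  # rate plays the role of lambda * t
target = poisson_pmf(x, rate)

# Keep n * p = rate fixed while n grows; the gap to the Poisson value shrinks.
errors = []
for n in (10, 100, 10_000):
    approx = binomial_pmf(x, n, rate / n)
    errors.append(abs(approx - target))
    print(n, approx, target)
```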

This distribution implies that the waiting time between successive events of a Poisson process is exponentially distributed. We note that $P(X=0) = \exp(-\lambda t)$ is the probability of having no events before time $t$. The probability of having at least one event before time $t$ is thus $1-\exp(-\lambda t)$. In other words, this is the cumulative distribution function of the continuous random variable describing the waiting time between events. By differentiating it with respect to $t$, we get the associated density function $f(t) = \lambda \exp (-\lambda t)$.
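
The exponential waiting-time result can be checked by simulation. The sketch below assumes NumPy and relies on a standard property of Poisson processes that is not stated in the text: given the number of events in $[0, T]$, the event times are independently uniform on that interval. The rate `lam`, horizon `T`, threshold `t`, and seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
lam, T = 3.0, 10_000.0  # events per unit time, total observation time

# Draw the event count, then place the events uniformly on [0, T].
n_events = rng.poisson(lam * T)
times = np.sort(rng.uniform(0.0, T, size=n_events))
gaps = np.diff(times)  # waiting times between consecutive events

t = 0.5
empirical = np.mean(gaps > t)   # fraction of waits longer than t
theoretical = np.exp(-lam * t)  # P(no event within time t)

print(empirical, theoretical)
print(gaps.mean(), 1 / lam)     # mean waiting time is close to 1/lambda
```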

#### Uniform Distribution ####

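This is a distribution whose density is constant everywhere in its range: for a continuous uniform distribution on $[a,b]$, $f(x) = 1/(b-a)$ for $a \le x \le b$ and $0$ otherwise.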

#### Normal Distribution ####

This is a distribution with the density function:

$$
f(x) = \frac{1}{\sqrt{2 \pi \sigma^2}}\exp\left(- \frac{(x-\mu )^2}{2 \sigma^2}\right).
$$

If the random variable $X$ is normally distributed with expectation $\mu$ and standard deviation $\sigma$, we denote it as $X \sim N(\mu,\sigma^2)$. Any normal random variable can be transformed into the standard normal distribution with mean 0 and standard deviation 1 by using the transform $Z = \frac{X-\mu}{\sigma}$.
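
As a quick illustration of the transform, a probability for $X$ can be computed from the standard normal distribution alone. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8+); the values of `mu`, `sigma`, and `x` are illustrative.

```python
from statistics import NormalDist

mu, sigma = 5.0, 2.0  # illustrative parameters of X
x = 8.0

# Standardize: Z = (X - mu) / sigma
z = (x - mu) / sigma

# P(X <= x) computed directly and via the standard normal distribution.
p_direct = NormalDist(mu, sigma).cdf(x)
p_standard = NormalDist().cdf(z)

print(z, p_direct, p_standard)  # the two probabilities coincide
```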

#### Stable Distributions ####

A distribution is stable if a linear combination of two independent random variables with that distribution has the same distribution, up to location and scale parameters. The normal distribution is stable, which is extremely useful in typical analysis. For example, if $X$ and $Y$ are two independent normal random variables, $X+Y$ is also normally distributed. Other than the normal distribution, the other stable distributions with closed-form densities are the Cauchy distribution and the Lévy distribution.

#### Lognormal Distribution ####

This is a distribution with the density function:

$$
f(x) = \frac{1}{x\sqrt{2 \pi \sigma^2}}\exp\left(- \frac{(\ln (x)-\mu )^2}{2 \sigma^2}\right).
$$

One useful property of this distribution is that $\exp (X)$ is lognormally distributed if and only if $X$ is normally distributed. This makes it very useful for modelling asset prices: if the log return over a period is a normal random variable, then the asset price at the end of the period is lognormally distributed.
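
The link between normal log returns and lognormal prices can be sketched with a simulation. The parameter values and seed below are illustrative, and the closed-form mean $\exp(\mu + \sigma^2/2)$ and median $\exp(\mu)$ of the lognormal distribution are standard results that the text above does not state.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed
mu, sigma = 0.05, 0.2  # illustrative mean and volatility of the log return

# Normal log returns imply a lognormal price ratio S_T / S_0 = exp(X).
log_returns = rng.normal(mu, sigma, size=500_000)
price_ratio = np.exp(log_returns)

print(price_ratio.min() > 0)                           # prices stay positive
print(price_ratio.mean(), np.exp(mu + sigma**2 / 2))   # mean of a lognormal
print(np.median(price_ratio), np.exp(mu))              # median of a lognormal
```

Note that the mean of the price ratio exceeds $\exp(\mu)$: exponentiating a symmetric distribution produces a right-skewed one.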

#### More Exotic Distributions ####

While normal distributions are convenient to work with, they are usually not very accurate for modelling real financial data, which tend to be more skewed and leptokurtic than the normal distribution. One remedy is to use a mixture of two normal densities $f_1$ and $f_2$ with weight $0 \le a \le 1$:

$$
g(x) = a f_1(x) + (1-a) f_2(x).
$$

For a mixture of 2 normal distributions with the same mean but different variances, the resulting distribution has a greater kurtosis than the normal distribution with the same variance.
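
The kurtosis of a normal mixture can be computed in closed form in the simplest case. The sketch below assumes both components have mean zero, so that the mixture's moments are weighted averages of the component moments, and uses the fact that the fourth central moment of $N(0,\sigma^2)$ is $3\sigma^4$. The function name and parameter values are illustrative.

```python
def mixture_kurtosis(a: float, sigma1: float, sigma2: float) -> float:
    """Kurtosis of a * N(0, sigma1^2) + (1 - a) * N(0, sigma2^2)."""
    # Second and fourth moments are weighted averages of the component moments.
    var = a * sigma1**2 + (1 - a) * sigma2**2
    m4 = 3 * (a * sigma1**4 + (1 - a) * sigma2**4)
    return m4 / var**2

# A mostly calm regime (sigma = 1) with an occasional volatile one (sigma = 3).
k = mixture_kurtosis(0.9, 1.0, 3.0)
print(k)  # well above the normal value of 3

# With equal variances the mixture is just a normal distribution.
print(mixture_kurtosis(0.5, 1.0, 1.0))
```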

One other distribution we note here is the Student $t$ distribution, with a parameter $\nu$ called the degrees of freedom:

$$
f_\nu(t) = (\nu \pi)^{-1/2} \Gamma (\nu/2)^{-1} \Gamma ((\nu+1)/2) (1 + t^2/\nu)^{-\frac{\nu + 1}{2}},
$$

where $\Gamma$ is the gamma function. Like the normal mixture, this distribution is leptokurtic: its tails are heavier than those of the normal distribution, increasingly so as $\nu$ decreases.
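
The heavy tails of the Student $t$ distribution can be seen by evaluating its density directly. The sketch below uses the standard form of the density, with normalizing constant $\Gamma((\nu+1)/2) / (\sqrt{\nu\pi}\,\Gamma(\nu/2))$ and a negative exponent $-(\nu+1)/2$; the function names, the choice $\nu = 5$, and the integration grid are illustrative assumptions.

```python
from math import exp, gamma, pi, sqrt

def student_t_pdf(t: float, nu: float) -> float:
    """Density of the Student t distribution with nu degrees of freedom."""
    norm = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))
    return norm * (1 + t * t / nu) ** (-(nu + 1) / 2)

def normal_pdf(t: float) -> float:
    """Density of the standard normal distribution."""
    return exp(-t * t / 2) / sqrt(2 * pi)

nu = 5

# Rough normalization check: Riemann sum of the density over [-100, 100].
n_steps = 20_000
lo, hi = -100.0, 100.0
step = (hi - lo) / n_steps
area = step * sum(student_t_pdf(lo + i * step, nu) for i in range(n_steps + 1))
print(area)  # close to 1

# Four units from the centre, the t density is far larger than the normal one.
print(student_t_pdf(4.0, nu), normal_pdf(4.0))
```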

There are many other types of distributions used.

### Multivariate Distributions ###

#### Bivariate Distributions ####

A bivariate density function $f(x,y)$ is a function that is integrable, never negative, and normalized so that it integrates to 1.
The corresponding joint distribution function $F(x,y)$ is:

$$
F(x,y) = P(X \le x,Y \le y) = \int_{- \infty}^y\int_{- \infty}^x f(u,v) du dv, \\
f(x,y) = \frac{\partial^2 F(x,y)}{\partial x \partial y}.
$$

We also define the marginal distributions of $X$ and $Y$:

$$
H(x) = F(x,\infty),\\
G(y) = F(\infty,y).
$$

The marginal densities are:

$$
h(x) =H^\prime (x)= \int_{- \infty}^{\infty} f(x,y) dy,\\
g(y) =G^\prime (y)= \int_{- \infty}^{\infty} f(x,y) dx.
$$

Also, the conditional distribution $F(x|y)$ denotes the distribution of $X$ given that $Y$ takes the fixed value $y$:

$$
F(x|y) = P(X \le x \mid Y = y) = \frac{1}{g(y)}\frac{\partial F(x,y)}{\partial y}.
$$

The conditional density is:

$$
f(x|y) = \frac{f(x,y)}{g(y)}.
$$

#### Independent Random Variables ####

Two variables $X$ and $Y$ are independent if and only if $F(x,y) = H(x)G(y)$. One way of looking at this is that the conditional distributions for $X$ are all the same, and equal to the marginal distribution $H(x)$. The density functions obey a similar rule, $f(x,y) = h(x)g(y)$.

#### Covariance ####

The covariance was described earlier. It is the first mixed central moment of the joint distribution of $X$ and $Y$:

$$
Cov(X,Y) = E\left((X-\mu_X)(Y-\mu_Y)\right).
$$

It is also given by:

$$
Cov(X,Y) = E(XY) - E(X)E(Y).
$$

In parameter notation:

$$
\sigma_{XY} = \mu_{XY} - \mu_X \mu_Y.
$$

This is very important, because of a property discussed above:

$$
V(aX + bY) = a^2 V(X) + b^2 V(Y) + 2 a b Cov(X,Y).
$$

If $X$ and $Y$ are independent, then their covariance and correlation will be 0.
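
Zero covariance does not by itself imply independence. A small exact example, not taken from the original text, shows this with the identity $Cov(X,Y) = E(XY) - E(X)E(Y)$: let $X$ take the values $-1, 0, 1$ with equal probability and set $Y = X^2$, so that $Y$ is fully determined by $X$.

```python
from fractions import Fraction

values = [-1, 0, 1]
prob = Fraction(1, 3)  # X is uniform on {-1, 0, 1}; Y = X**2

# Exact expectations computed by summing over the three outcomes.
E_X = sum(prob * x for x in values)
E_Y = sum(prob * x**2 for x in values)
E_XY = sum(prob * x * x**2 for x in values)

cov = E_XY - E_X * E_Y
print(cov)  # covariance is exactly zero

# Independence would require P(X=0, Y=0) = P(X=0) * P(Y=0).
p_joint = prob            # X = 0 forces Y = 0
p_product = prob * prob   # P(X=0) = 1/3 and P(Y=0) = 1/3
print(p_joint, p_product)  # these differ, so X and Y are dependent
```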

#### Correlation ####

This is a standardized form of covariance, independent of the units of measurement:

$$
Corr(X,Y) = \frac{Cov(X,Y)}{\sqrt{V(X)V(Y)}},\\
Corr(aX,bY) = \begin{cases} Corr(X,Y) & \text{if } ab>0\\ -Corr(X,Y) & \text{if } ab<0 \end{cases}.
$$
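
The invariance of correlation under rescaling can be checked on sample data. This is a minimal sketch assuming NumPy; the helper `corr`, the seed, the data construction, and the scale factors are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)  # arbitrary seed
x = rng.normal(size=500)
y = 0.7 * x + rng.normal(size=500)  # illustrative correlated data

def corr(u, v):
    """Sample correlation: covariance over the product of standard deviations."""
    return np.cov(u, v)[0, 1] / np.sqrt(np.var(u, ddof=1) * np.var(v, ddof=1))

base = corr(x, y)
same_sign = corr(100.0 * x, 0.01 * y)  # ab > 0: a change of units only
flipped = corr(100.0 * x, -0.01 * y)   # ab < 0: the sign reverses

print(base, same_sign, flipped)
```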

#### Multivariate Normal Distribution ####

This describes a scenario where there are multiple normally distributed variables with non-zero pairwise correlations. Suppose there are $k$ variables $\vec{x}$ with expectations $\vec{\mu}$, and a symmetric covariance matrix

$$
\hat{V}_{ij} = \begin{cases} \sigma_i^2 & \text{if } i=j \\ \rho_{ij} \sigma_i \sigma_j & \text{otherwise} \end{cases},
$$

where $\rho_{ij}$ is the correlation between variables $i$ and $j$.

The multivariate normal density function is given as:

$$
\phi(\vec{x}) = (2 \pi)^{-k/2} |\hat{V}|^{-1/2} \exp(-\frac{1}{2} (\vec{x} - \vec{\mu})^T \hat{V}^{-1} (\vec{x} - \vec{\mu})),
$$

and we write $\vec{X} \sim N_k(\vec{\mu}, \hat{V})$ for the vector random variable.

One important property is that, if the $x_i$ are asset returns, the return $R = \sum_i \omega_i x_i$ of every portfolio with weights $\vec{\omega}$ is normally distributed: $R\sim N(\mu, \sigma^2)$, with $\mu = \vec{\omega}^T\vec{\mu}$ and $\sigma^2 = \vec{\omega}^T \hat{V} \vec{\omega}$.
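
The portfolio mean and variance formulas can be evaluated with a few lines of linear algebra and compared against simulated returns. This sketch assumes NumPy and a two-asset example; the expected returns, volatilities, correlation, weights, and seed are illustrative values.

```python
import numpy as np

rng = np.random.default_rng(4)  # arbitrary seed

mu = np.array([0.05, 0.08])     # expected returns (illustrative)
sigma = np.array([0.10, 0.20])  # standard deviations (illustrative)
rho = 0.3                       # pairwise correlation (illustrative)

# Covariance matrix: variances on the diagonal, rho * sigma_i * sigma_j elsewhere.
V = np.array([
    [sigma[0] ** 2, rho * sigma[0] * sigma[1]],
    [rho * sigma[0] * sigma[1], sigma[1] ** 2],
])
w = np.array([0.6, 0.4])        # portfolio weights

port_mean = w @ mu              # omega^T mu
port_var = w @ V @ w            # omega^T V omega

# Monte Carlo check: simulate joint returns and form the portfolio return.
returns = rng.multivariate_normal(mu, V, size=200_000) @ w

print(port_mean, returns.mean())
print(port_var, returns.var(ddof=1))
```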

When two random variables have a bivariate normal distribution, they are independent if and only if their correlation is 0. More generally, if $X$ and $Y$ are bivariate normal variables with correlation $\rho_{XY}$, then $E(Y|X) = \mu_Y + \rho_{XY} \frac{\sigma_Y}{\sigma_X} (X-\mu_X)$ and $V(Y|X) = \sigma_Y^2(1 - \rho_{XY}^2)$.
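
The conditional mean of a bivariate normal pair is a straight line in $X$ with slope $\rho_{XY}\,\sigma_Y/\sigma_X$, and the conditional variance is $\sigma_Y^2(1-\rho_{XY}^2)$. The sketch below checks both by fitting a least-squares line to simulated data. It assumes NumPy; the parameter values and seed are illustrative, with $\sigma_X \neq \sigma_Y$ chosen so that the direction of the ratio matters.

```python
import numpy as np

rng = np.random.default_rng(5)  # arbitrary seed
mu_x, mu_y = 1.0, 2.0
sigma_x, sigma_y, rho = 2.0, 0.5, 0.6  # illustrative parameters

cov = [
    [sigma_x**2, rho * sigma_x * sigma_y],
    [rho * sigma_x * sigma_y, sigma_y**2],
]
x, y = rng.multivariate_normal([mu_x, mu_y], cov, size=400_000).T

# Least-squares line y ~ intercept + slope * x estimates E(Y | X).
slope, intercept = np.polyfit(x, y, 1)
resid_var = np.var(y - (intercept + slope * x))

print(slope, rho * sigma_y / sigma_x)                 # both close to 0.15
print(intercept, mu_y - rho * sigma_y / sigma_x * mu_x)  # both close to 1.85
print(resid_var, sigma_y**2 * (1 - rho**2))           # both close to 0.16
```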