# 1: Introduction and Multivariate Normals

# Why sampling random vectors is hard
So far, our primary focus has been sampling sets of independent random numbers. Now suppose we instead want to sample a random vector in $\mathbb{R}^d$ with an arbitrary joint pdf (i.e., its components may not be independent).

Obviously, if the components are independent, we can use all of our previous techniques once we have factored the pdf/pmf/cdf into its marginals.

However, the first thing we will notice is that for a random vector there is generally no such thing as an inverse transform. Suppose it has multidimensional cdf F(x1,...,xd). Then the equation F(x1,...,xd)=u for u∈(0,1) does not have a unique solution; its solution set is typically a (d−1)-dimensional surface in $\mathbb{R}^d$. For example, for two independent Uniform(0,1) variables, F(x1,x2)=x1x2 on the unit square, and F(x1,x2)=0.5 holds along the entire curve x2=0.5/x1 for x1∈[0.5,1].

What, then, about rejection sampling?

Well, the tricky part with rejection sampling in higher dimensions is that (1) we need to be able to sample from a proposal density in $\mathbb{R}^d$ in the first place, which we don't yet have a good method for, and (2) the constant c (the bound on the ratio of target to proposal density, which is also the expected number of proposals per accepted sample) typically grows very quickly with d (the curse of dimensionality).
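To get a feel for point (2), here is a small sketch under one specific, assumed setup that is not from the notes: the target is uniform on the unit ball in d dimensions and the proposal is uniform on the cube [-1, 1]^d. In that case c is the ratio of the two volumes, and the acceptance probability is 1/c. The function name `ball_in_cube_acceptance` is hypothetical.

```python
import math

def ball_in_cube_acceptance(d):
    """Probability that a uniform point in [-1, 1]^d lands in the unit ball."""
    ball_volume = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    cube_volume = 2.0 ** d
    return ball_volume / cube_volume

for d in (2, 5, 10, 20):
    p = ball_in_cube_acceptance(d)
    # With a uniform proposal on the cube, c = 1 / p is the expected
    # number of proposals needed per accepted sample.
    print(f"d={d:2d}  acceptance={p:.3e}  c={1 / p:.3e}")
```

Even in this very benign example the acceptance probability collapses as d grows, which is the behavior described above.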

In this section, we'll start with probably the simplest distribution to sample from: the multivariate normal.

## Sampling from the multivariate normal distribution
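
The cell below relies on the Cholesky factorization of the covariance matrix. As a short justification (standard linear algebra, written here with $L$ for the lower-triangular factor, which the code calls `c`): if $\Sigma = LL^\top$ and $Z \sim N(0, I_d)$, define $X = \mu + LZ$. Then

$$E[X] = \mu + L\,E[Z] = \mu, \qquad \mathrm{Cov}(X) = L\,\mathrm{Cov}(Z)\,L^\top = L I_d L^\top = \Sigma.$$

Since an affine transformation of a Gaussian vector is again Gaussian, $X \sim N(\mu, \Sigma)$.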

In [1]:
import numpy as np
from numpy.linalg import cholesky, eig

from scipy.stats import multivariate_normal, norm

## Part 1

n = 1000

mu = np.array([0.25, 0.5, -1])
Sigma = [[2., 0.9, 1.2],
         [0.9, 1.4, 1.],
         [1.2 , 1., 1.]]
Sigma = np.array(Sigma) 

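# Eigenvalue decomposition of the covariance matrix
# (all eigenvalues are positive, so Sigma is positive definite)
w, v = eig(Sigma)
print(w)
# Cholesky decomposition: Sigma = c @ c.T, with c lower triangular
c = cholesky(Sigma)

# Each column of Z holds three independent standard normals
Z = norm.rvs(size=(3, n))

# X = mu + c Z has mean mu and covariance c c^T = Sigma
X = mu[:, None] + c.dot(Z)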

## Part 2

mvn = multivariate_normal(mean=mu, cov=Sigma)
X = mvn.rvs(size=n)

[3.58828805 0.76275282 0.04895913]
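
As a sanity check, the following sketch draws samples as mu + L Z and compares the sample mean and sample covariance with mu and Sigma. The seed, the larger sample size, and the use of `np.random.default_rng` are choices made for this illustration only; mu and Sigma are the values from the cell above.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

mu = np.array([0.25, 0.5, -1.0])
Sigma = np.array([[2.0, 0.9, 1.2],
                  [0.9, 1.4, 1.0],
                  [1.2, 1.0, 1.0]])

n = 200_000
L = np.linalg.cholesky(Sigma)          # Sigma = L @ L.T
Z = rng.standard_normal(size=(3, n))   # independent N(0, 1) entries
X = mu[:, None] + L @ Z                # each column is one sample

sample_mean = X.mean(axis=1)
sample_cov = np.cov(X)                 # rows are treated as variables

print(np.round(sample_mean, 2))
print(np.round(sample_cov, 2))
```

Both printed quantities should be close to mu and Sigma, with the remaining gap shrinking as n grows.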


# 2: Copulas

## Motivating Example
Suppose we have a data set X1,...,Xn of d-dimensional vectors, and we wish to generate more such samples.

Perhaps these are the arrival times of a customer, how long it takes them to place an order, how long it takes for their order to be made once they are in service, and how much they spent. It makes sense that the last three of these would be correlated - the longer someone spends ordering, the more likely they are to have ordered more items, and thereby spent more money. Perhaps customers arriving at certain times of day might have longer orders - dinner patrons at a restaurant might order multiple courses for multiple people, whereas morning patrons may just be buying a coffee.

In this case, the marginal distributions are likely non-normal, but perhaps they have a joint structure that is similar to a normal distribution. So, if we change these into something that is (approximately) multivariate normal, we can estimate their covariance matrix and work backwards.

### Sampling from the Gaussian copula
The Gaussian copula with parameter Σ (a positive definite matrix with Σi,i=1, i.e. a correlation matrix) is the copula of a multivariate normal with covariance Σ and μ=0 (the mean is actually not important). We can sample (U1,...,Ud) as follows:

- Generate a sample (X1,...,Xd)∼N(0,Σ) using the previously discussed method with the Cholesky decomposition

- Let (U1,...,Ud)=(Φ(X1),...,Φ(Xd)), where Φ is the standard normal cdf

We can also use the scipy.stats.multivariate_normal object with method .rvs() to generate the normal sample in the first step.
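
The two steps above can be sketched directly with a Cholesky factor. This is a minimal illustration, not the exercise solution given later: the function name `gaussian_copula_cholesky`, the `rng` argument, and the 2×2 matrix are assumptions made for the example, and Sigma is assumed to have a unit diagonal.

```python
import numpy as np
from scipy.stats import norm

def gaussian_copula_cholesky(Sigma, n, rng=None):
    """Return an (n, d) array of Gaussian-copula samples.

    Assumes Sigma is a correlation matrix (unit diagonal, positive definite).
    """
    rng = np.random.default_rng() if rng is None else rng
    L = np.linalg.cholesky(Sigma)
    Z = rng.standard_normal(size=(n, Sigma.shape[0]))
    X = Z @ L.T            # each row is one N(0, Sigma) sample
    return norm.cdf(X)     # apply Phi coordinate-wise

Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
U = gaussian_copula_cholesky(Sigma, 50_000, rng=np.random.default_rng(1))
print(U.min(), U.max())          # all values lie in (0, 1)
print(U.mean(axis=0))            # each marginal is Uniform(0, 1)
print(np.corrcoef(U.T)[0, 1])    # the coordinates remain dependent
```

Each column of U is marginally uniform, while the dependence between columns is inherited from Sigma.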

## Gaussian Copula

### Part I

Write a function that generates random vectors from the Gaussian Copula given a matrix Sigma.

### Part II

Suppose $X_i$ is a random vector with d=3, a Gaussian copula, and $\mathrm{Exp}(1)$ marginals. Let $Y_i = \sum_{j=1}^3 X_i(j)$.

Calculate the variance of $Y_i$ when $\Sigma=I_3$ (the identity matrix) and when it is the matrix given below. Calculate the covariance matrix of $X_i$.

In [4]:
import numpy as np
from scipy.stats import multivariate_normal, norm, expon

# part I

def gaussian_copula(Sigma, n):
    d = len(np.diag(Sigma))
    X = multivariate_normal.rvs(cov=Sigma, size=n)
    C = np.zeros((n,d))
    for j in range(0, d):
        C[:,j] = norm.cdf(X[:,j], scale=np.sqrt(Sigma[j,j]))
    return C.T

# part II

C0 = gaussian_copula(np.eye(3), 1000)
X0 = expon.ppf(C0)
Y0 = np.sum(X0, axis=0)

Sigma = np.array([[1, 0.8, 0.6,], [0.8, 1, 0.7], [0.6, 0.7, 1]])
C1 = gaussian_copula(Sigma, 1000)
X1 = expon.ppf(C1)
Y1 = np.sum(X1, axis=0)

print(np.var(Y0))
print(np.var(Y1))

print(np.cov(X0))
print(np.cov(X1))

3.057390168962806
7.159829864691528
[[ 0.97720046  0.03737646 -0.01648467]
 [ 0.03737646  0.99180865  0.0178858 ]
 [-0.01648467  0.0178858   1.01388633]]
[[0.99119078 0.78858536 0.59409564]
 [0.78858536 1.02239611 0.68206927]
 [0.59409564 0.68206927 1.02390941]]
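
The two kinds of output above are linked by a standard identity, written out here as a consistency check: the variance of a sum is the sum of all entries of the covariance matrix.

$$\mathrm{Var}(Y_i) = \sum_{j=1}^{3}\sum_{k=1}^{3} \mathrm{Cov}\big(X_i(j), X_i(k)\big)$$

With $\Sigma = I_3$ the coordinates are independent with $\mathrm{Exp}(1)$ marginals, each of variance 1, so the exact answer is $\mathrm{Var}(Y_i) = 3$, close to the simulated 3.06. Summing the entries of the second printed covariance matrix gives about 7.17, which matches the simulated 7.16 up to the different normalizations (np.var divides by n, np.cov by n−1).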


## Sampling from the t-copula
A common downside of the Gaussian copula is that tail dependence is very weak. Consider the following toy example: suppose the daily returns of AAPL and SPX have a joint multivariate normal distribution, each with a standard deviation of 1%, an expected return of roughly 0, and a fairly high correlation of 0.7. If the SPX is down 1% today, we would expect AAPL to be down about 0.7% on average, which seems reasonable.

But what if the SPX crashed 50% tomorrow? Under this model, we would expect AAPL to be down only about 35%. In reality, in a crash we essentially expect correlations to go to one, i.e. AAPL should be down about 50% as well.

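The t-copula is an alternative copula with much stronger tail dependence. It is, as you might expect, the copula of a multivariate t-distribution. What exactly is a multivariate t-distribution? If Z∼N(0,Σ) and $Y\sim\chi^2_n$ are independent, then $X = Z/\sqrt{Y/n}$ has a multivariate t-distribution with n degrees of freedom.

Dividing every coordinate by the same random factor induces fatter tails in the marginals and greater tail dependence, because outliers occur when Y is small, and a small Y inflates all coordinates at once.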

We can use the following algorithm to sample from the t-copula with parameters Σ and n:
- Sample Z∼N(0,Σ) and, independently, $Y\sim\chi^2_n$
- Let $X = Z/\sqrt{Y/n}$
- Return U=(F(X1),...,F(Xd)), where F is the cdf of the t-distribution with n degrees of freedom
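
To make the tail-dependence contrast concrete, the sketch below estimates P(U2 > q | U1 > q) for a Gaussian copula and for a t-copula built from the same normal draws. The correlation 0.7 is taken from the toy example above; the choices df=4, q=0.99, the sample size, the seed, and the helper name `joint_tail` are assumptions for this illustration.

```python
import numpy as np
from scipy.stats import norm, t

rng = np.random.default_rng(2)
rho, df, n, q = 0.7, 4, 400_000, 0.99

L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
Z = rng.standard_normal(size=(n, 2)) @ L.T   # rows ~ N(0, Sigma)
Y = rng.chisquare(df, size=n)                # one chi-square draw per row
X = Z / np.sqrt(Y / df)[:, None]             # rows ~ multivariate t

U_gauss = norm.cdf(Z)        # Gaussian copula sample
U_t = t.cdf(X, df=df)        # t-copula sample

def joint_tail(U, q):
    """Estimate P(U2 > q | U1 > q) from an (n, 2) copula sample."""
    above = U[:, 0] > q
    return np.mean(U[above, 1] > q)

print(joint_tail(U_gauss, q))
print(joint_tail(U_t, q))
```

The second estimate should come out noticeably larger than the first: given that one coordinate is extreme, the other is more likely to be extreme too under the t-copula.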


## T-copula

### Part I

Write code that samples from a t-copula.

### Part II

Suppose $X_i$ is a random vector with d=3, and the marginals are $\mathrm{Exp}(1)$. Let $Y_i = \sum_{j=1}^3 X_i(j)$.

Let Sigma be the matrix below. Estimate the probability that $Y_i>10$ if we have a Gaussian copula, and if we have a t-copula with 4 degrees of freedom.

In [2]:
import numpy as np
from scipy.stats import chi2, expon, multivariate_normal, norm, t

def gaussian_copula(Sigma, n):
    d = len(np.diag(Sigma))
    X = multivariate_normal.rvs(cov=Sigma, size=n)
    C = np.zeros((n,d))
    for j in range(0, d):
        C[:,j] = norm.cdf(X[:,j], scale=np.sqrt(Sigma[j,j]))
    return C.T

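def t_copula(Sigma, df, n):
    d = len(np.diag(Sigma))
    # Z ~ N(0, Sigma), one sample per row
    Z = multivariate_normal.rvs(cov=Sigma, size=n)
    # One chi-square draw per sample, shared by all coordinates of that sample
    v = chi2.rvs(df, size=n)
    # Multivariate t sample (a new array, so Z is not modified in place)
    T = Z / np.sqrt(v / df)[:, None]
    C = np.zeros((n,d))
    for j in range(0, d):
        C[:,j] = t.cdf(T[:,j], df=df, scale=np.sqrt(Sigma[j,j]))
    return C.T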

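# Note: this run uses a 2x2 Sigma (d=2) rather than the d=3 of Part II
Sigma = np.array([[1, 0.8], [0.8, 1.0]])

n = 10_000

C0 = gaussian_copula(Sigma, n)
X0 = expon.ppf(C0)
Y0 = np.sum(X0, axis=0)

# Note: df=1 here; pass df=4 for the t-copula asked for in Part II
C1 = t_copula(Sigma, 1, n)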
X1 = expon.ppf(C1)
Y1 = np.sum(X1, axis=0)

print(np.mean(Y0 > 10))
print(np.mean(Y1 > 10))

0.0055
0.0056


## Other Copulas
There are, of course, infinitely many other copulas (every joint distribution has a copula). There are relatively few other copulas, though, that have neat parametric forms and are simple to simulate from.

The most common of these are the so-called Archimedean copulas; however, working with them is beyond the scope of this course.

# 3: Markov Chain Monte Carlo

## Introduction
While copulas can be useful for many applications, ultimately they only work exactly if the distribution we are looking to sample from happens to have a copula we know how to work with.

Let's return to the basic question: how do we sample from a d-dimensional density f(x1,...,xd)?

Markov Chain Monte Carlo gives us a way of generating samples that closely approximate this distribution. Moreover, it turns out that we often don't even have to know the exact density; we just need to know it up to a constant of proportionality.

## A note about Bayesian Statistics
This kind of problem frequently arises in Bayesian statistics. The Bayesian approach works as follows. Our data X1,...,Xn is assumed to come from a parametric model $P(X_1,...,X_n|\Theta)$. We have some parameters Θ about which we have some prior information, which can be expressed in terms of a prior distribution $P(\Theta)$; this could be the outcome of some previous experiments, some knowledge that the investigator has, or just the application of certain heuristics. Given the data and the prior, we can apply Bayes' rule to get the posterior distribution for Θ, and our estimates will be the expectation, mode, or median of this distribution:

$$P(\Theta|X_1,...,X_n) = \frac{P(X_1,...,X_n|\Theta)\,P(\Theta)}{P(X_1,...,X_n)} \propto P(X_1,...,X_n|\Theta)\,P(\Theta)$$

The denominator is a normalizing constant that is often intractable, which is why it matters that MCMC only needs the density up to a constant of proportionality.

## The basic idea
As you presumably know, a Markov chain is a sequence of random variables X1,X2,... with shared support S, a starting probability distribution p1 for X1, and a transition kernel p(x∣y), i.e. the conditional density (or mass function) of Xi given Xi−1=y. Notably, the transition kernel depends only on the value of the previous element in the sequence; the history of the path before that does not matter.

Generally speaking, the unconditional distribution pj of Xj will not be the same as pi of Xi; however, given some technical conditions, there is a choice of p1, which we'll call π, that makes the Markov chain stationary; that is, every random variable has the same distribution π.

Moreover, regardless of the choice of p1, $\lim_{t\to\infty} p_t = \pi$.
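
The convergence claim can be illustrated on a small discrete chain. The 3-state transition matrix below is hypothetical (chosen with all entries positive so that the technical conditions hold), and `distribution_after` is a helper name introduced for this sketch.

```python
import numpy as np

# Hypothetical 3-state transition matrix; row y holds p(. | y)
P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])

def distribution_after(p1, P, steps):
    """Distribution of the chain after `steps` transitions, starting from p1."""
    p = np.asarray(p1, dtype=float)
    for _ in range(steps):
        p = p @ P
    return p

# Stationary distribution: left eigenvector of P with eigenvalue 1
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

print(pi)
print(distribution_after([1, 0, 0], P, 50))
print(distribution_after([0, 0, 1], P, 50))
```

Starting from two very different initial distributions, the chain ends up at the same π.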

So the idea for MCMC is that we are going to choose a transition kernel on our support in a clever manner, such that the limiting/stationary distribution is f(x1,...,xd). Then, regardless of how we choose X1, Xj for j large will have approximately the distribution f(x1,...,xd), and our sequence will be a (correlated) sample from our target distribution.
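
As one concrete instance of this idea, here is a minimal random-walk Metropolis sketch. It is one common way to build such a kernel, not necessarily the construction these notes go on to use. The one-dimensional target, the step size, the burn-in length, and the function names are all assumptions made for the illustration. Note that only an unnormalized density is supplied.

```python
import numpy as np

def unnormalized_density(x):
    # Proportional to the N(0, 1) density; the 1/sqrt(2*pi) constant is dropped on purpose
    return np.exp(-0.5 * x ** 2)

def random_walk_metropolis(f, x0, n_steps, step=1.0, rng=None):
    """Run a random-walk Metropolis chain targeting a density proportional to f."""
    rng = np.random.default_rng() if rng is None else rng
    chain = np.empty(n_steps)
    x = x0
    for i in range(n_steps):
        proposal = x + step * rng.standard_normal()
        # Accept with probability min(1, f(proposal) / f(x)); any constant cancels
        if rng.uniform() < f(proposal) / f(x):
            x = proposal
        chain[i] = x
    return chain

chain = random_walk_metropolis(unnormalized_density, x0=5.0, n_steps=50_000,
                               rng=np.random.default_rng(3))
kept = chain[5_000:]   # drop early draws influenced by the starting point
print(kept.mean(), kept.var())
```

Even though the chain starts far from the bulk of the target at x0=5, the retained draws should have mean near 0 and variance near 1.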

In [6]:
import numpy as np

# Cholesky decomposition L
L = np.array([[1, 0], 
              [1, 1]])

# Calculating the covariance matrix Sigma
Sigma = np.dot(L, L.T)

# Extracting the variances and covariance
var_X = Sigma[0, 0]  # Variance of X
var_Y = Sigma[1, 1]  # Variance of Y
cov_XY = Sigma[0, 1]  # Covariance between X and Y

# Calculating the correlation
correlation_XY = cov_XY / np.sqrt(var_X * var_Y)
correlation_XY

0.7071067811865475

In [11]:
Sigma

array([[1, 1],
       [1, 2]])

In [10]:
1/np.sqrt(2)

0.7071067811865475