# Joint Probability Distributions

## Linear Combination
Let $Y$ be a joint random variable, which we define as a linear combination of a set of random variables $X=\{X_i\}$. There are a number of ways (and notations) to represent the linear combination. In the simplest form:

$$Y = X_1 + X_2 + \cdots + X_n $$

Adding some complexity, we may attach a coefficient to each term of the expression:

$$Y = c_1 X_1 + c_2 X_2 + \cdots + c_n X_n $$

Summation notation and matrix notation are the most succinct ways of representing this structure:

$$ Y = \sum_{i=1}^{n} c_iX_i $$

$$ Y = cX $$

And going a step further for completeness:

$$ Y = cX + d $$

Where $d$ is an arbitrary constant.
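The notations above all describe the same computation. A minimal sketch, assuming hypothetical values for the coefficients `c`, the realizations `x`, and the offset `d` (none of these numbers come from the text):

```python
# Hypothetical coefficients, realizations, and offset (illustrative only).
c = [0.5, -1.0, 2.0]
x = [4.0, 3.0, 1.5]
d = 10.0

def linear_combination(coeffs, values, offset=0.0):
    """Return sum_i coeffs[i] * values[i] + offset."""
    if len(coeffs) != len(values):
        raise ValueError("coeffs and values must have the same length")
    return sum(ci * xi for ci, xi in zip(coeffs, values)) + offset

y_plain = linear_combination([1.0] * len(x), x)   # X_1 + ... + X_n
y_weighted = linear_combination(c, x)             # sum of c_i X_i
y_affine = linear_combination(c, x, d)            # cX + d
```

The plain sum is simply the special case where every coefficient is 1 and the offset is 0.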

With this formula for $Y$ in terms of $X$ we can make statements about the probability and thus expectations related to the random variable $Y$.

To do so, we must find the joint density function that expresses the probability of the realization of the intersection of a set of events.

$$ \mathcal{P}\left( Y = Y_i \right) = \mathcal{P}(\cap Y_i) $$

$$ = \mathcal{P}\left( Y = \begin{bmatrix}y_{i,1}, \cdots, y_{i,n}\end{bmatrix} \right) $$

$$ = \mathcal{P}(y_{i,1} \cap \cdots \cap y_{i,n}) $$

Depending on how much we know about the set of variables $\{X_i\}$, these statements will range in specificity. Generally, as $Y$ is dependent on $X$, the process of deriving the density function to make these statements centers around exploiting known relationships to state the density function in terms of $X$.

## Independent and Identically Distributed
This situation is typically where the study of probability starts, as it is the simplest.

Assume $Y$ is a multivariate normal variable consisting of independent, identically distributed random variables $X_i$.

$$ X_i \perp X_j \ \ \forall i \neq j$$
$$ X_i \sim \mathcal{N}(\mu, \sigma^2)$$



Using the definition of joint probability

$$ \mathbb{P}(A \cap B) = \mathbb{P}(A|B) \mathbb{P}(B)$$

Given that $X_i \perp X_j$, we thus have

$$ \mathbb{P}(X_i \cap X_j) = \mathbb{P}(X_i|X_j) \mathbb{P}(X_j)$$
$$ = \mathbb{P}(X_i) \mathbb{P}(X_j)$$

We can thus derive the joint density function as
$$ f(X_i, X_j) = f_{X_i}(X_i) f_{X_j}(X_j)$$

$$ f(X) = \prod_{i=1}^{n} f_{X_i}(X_i) $$
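The factorization under independence can be checked empirically. The sketch below assumes two independent standard normal variables and hypothetical thresholds `a` and `b`. It estimates the joint probability $\mathbb{P}(X_1 \le a \cap X_2 \le b)$ by simulation and compares it with the product of the marginal probabilities.

```python
import math
import random

def std_normal_cdf(z):
    """CDF of the standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

random.seed(0)          # fixed seed so the run is repeatable
n_samples = 100_000
a, b = 0.3, -0.5        # hypothetical thresholds

# Draw X_1 and X_2 independently and count how often both events occur.
both = 0
for _ in range(n_samples):
    x1 = random.gauss(0.0, 1.0)
    x2 = random.gauss(0.0, 1.0)
    if x1 <= a and x2 <= b:
        both += 1

empirical_joint = both / n_samples
product_of_marginals = std_normal_cdf(a) * std_normal_cdf(b)
```

The two numbers should agree up to Monte Carlo noise. With dependent draws they generally would not.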

The general definition of the univariate normal density function is stated as follows:

$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}}
e^{-\frac{1}{2}\left( \frac{x - \mu}{\sigma} \right)^2}
$$

And thus the product of this expression becomes

$$ \prod f_{X_i}(X_i) = \prod \frac{1}{\sigma_{X_i}\sqrt{2\pi}}
e^{-\frac{1}{2}\left( \frac{X_i - \mu_{X_i}}{\sigma_{X_i}} \right)^2}$$

We can use the properties of exponents to turn the product of exponentials into a sum in the exponent:

$$ = e^{ 
-\frac{1}{2} \left( \frac{X_1 - \mu_{X_1}}{\sigma_{X_1}} \right)^2
- \cdots
-\frac{1}{2} \left( \frac{X_n - \mu_{X_n}}{\sigma_{X_n}} \right)^2
}
\prod \frac{1}{\sigma_{X_i}\sqrt{2\pi}}
$$

Rewriting the exponent in matrix algebra, with $\Sigma = \mathrm{diag}(\sigma_{X_1}^2, \ldots, \sigma_{X_n}^2)$, we then have:

$$e^{-\frac{1}{2}(X - \mu)^T\Sigma^{-1}(X - \mu)} \prod \frac{1}{\sigma_{X_i}\sqrt{2\pi}}
$$

Now we can simplify the product using a few tricks. First, we can raise the constant to a power rather than using a product operator:

$$e^{-\frac{1}{2}(X - \mu)^T\Sigma^{-1}(X - \mu)} \frac{1}{\sqrt{(2\pi)^n}}\prod \frac{1}{\sigma_{X_i}}
$$

Next we can replace the product of the sigma terms by the determinant of the square root of the covariance matrix. Because the variables are independent, the covariance matrix is diagonal, so its determinant is the product of its diagonal entries and $|\sqrt{\Sigma}| = \prod \sigma_{X_i}$. For more see the notebook on determinants.

$$e^{-\frac{1}{2}(X - \mu)^T\Sigma^{-1}(X - \mu)} 
\frac{1}{\sqrt{(2\pi)^n}}
\frac{1}{|\sqrt{\Sigma}|}
$$

Rearranging the equation into its classical form we have

$$\frac{1}{\sqrt{(2\pi)^n|\Sigma|}}
e^{-\frac{1}{2}(X - \mu)^T\Sigma^{-1}(X - \mu)} 
$$
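As a numerical check on this derivation, the sketch below compares the product of univariate normal densities with the matrix form of the density for a diagonal covariance matrix. The evaluation point and parameters are hypothetical, and the helper `mvn_pdf_diagonal` is an illustration that only handles the diagonal (independent) case.

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mvn_pdf_diagonal(x, mu, variances):
    """Multivariate normal density for a diagonal covariance matrix.

    With Sigma = diag(variances), |Sigma| is the product of the variances
    and Sigma^{-1} is diag(1 / variances).
    """
    n = len(x)
    det_sigma = math.prod(variances)
    quad_form = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mu, variances))
    norm_const = 1.0 / math.sqrt((2.0 * math.pi) ** n * det_sigma)
    return norm_const * math.exp(-0.5 * quad_form)

# Hypothetical evaluation point and parameters.
x = [0.2, -1.0, 3.5]
mu = [0.0, 0.5, 3.0]
sigma = [1.0, 2.0, 0.5]

product_form = math.prod(normal_pdf(xi, mi, si) for xi, mi, si in zip(x, mu, sigma))
matrix_form = mvn_pdf_diagonal(x, mu, [s ** 2 for s in sigma])
```

The two values should match to floating-point precision, which is the content of the derivation: for independent components the joint density is just the product of the marginals written in matrix notation.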

This simple proof is not the most useful, however; we will need to derive a more general case.

## Non-independent and Identically Distributed

We can represent a multivariate random variable $Y$ as a linear combination of independent random variables $X_i$.

$$ Y = cX + d $$

$$ = c_1 X_1 + \cdots + c_nX_n + d $$

$$ = Y_1 + \cdots + Y_n + d $$

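In other words, $c$ is the transformation that introduces dependence between the components of $Y$, and $d$ is an arbitrary set of constant terms. If $X$ has covariance matrix $\Sigma_X$, then the covariance matrix of $Y$ is $c\Sigma_X c^T$.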
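A small simulation can illustrate how the transformation shapes the covariance of $Y$. The sketch below assumes independent standard normal components for $X$ (so $\Sigma_X = I$) and a hypothetical $2 \times 2$ matrix `c` and offset `d`. Under those assumptions the standard result is that the covariance of $Y$ is $cc^T$.

```python
import random

random.seed(1)
n_samples = 50_000

# Hypothetical 2x2 transformation and offset (illustrative values only).
c = [[2.0, 0.0],
     [1.0, 1.0]]
d = [1.0, -1.0]

# Draw independent standard normal X and map each draw to Y = cX + d.
ys = []
for _ in range(n_samples):
    x = [random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)]
    y = [c[r][0] * x[0] + c[r][1] * x[1] + d[r] for r in range(2)]
    ys.append(y)

mean = [sum(y[k] for y in ys) / n_samples for k in range(2)]

def sample_cov(j, k):
    """Unbiased sample covariance between components j and k of Y."""
    return sum((y[j] - mean[j]) * (y[k] - mean[k]) for y in ys) / (n_samples - 1)

empirical = [[sample_cov(j, k) for k in range(2)] for j in range(2)]

# With Sigma_X = I, the expected covariance is c c^T.
expected = [[sum(c[j][m] * c[k][m] for m in range(2)) for k in range(2)]
            for j in range(2)]
```

The off-diagonal entries of `empirical` are non-zero even though the components of $X$ are independent, and the offset `d` shifts the mean without affecting the covariance.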

The definition of a joint probability distribution gives us that:

$$ F_Y(y_i) = \int \cdots \int f_Y(y_i) \ \ \partial_{Y_{i,1}} \cdots \partial_{Y_{i,n}}$$

Thinking about this, in the univariate case, the cdf is expressed as an integral of the pdf over its domain.

In the bivariate case we have an integral of an integral. This is because our pdf takes two variables and returns the density at the corresponding point of the two-dimensional domain. As such we need to sum across two dimensions, which produces the double integral (integral of integrals).

Extending this to the multi-dimensional scenario we start to see a stacking of integrals (integral of integral of ... etc). One integral for each dimension.
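The stacking of integrals maps directly onto nested loops. The sketch below assumes two independent standard normal variables, approximates the bivariate cdf with a midpoint rule (one loop per dimension), and compares the result with the product of the univariate cdfs. The lower cutoff of `-8.0` and the step count are arbitrary choices for illustration.

```python
import math

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def joint_pdf(x1, x2):
    """Joint density of two independent standard normals."""
    return std_normal_pdf(x1) * std_normal_pdf(x2)

def joint_cdf(a, b, lower=-8.0, steps=400):
    """Approximate F(a, b) with a midpoint rule: one loop per dimension."""
    h1 = (a - lower) / steps
    h2 = (b - lower) / steps
    total = 0.0
    for i in range(steps):              # outer integral over x1
        x1 = lower + (i + 0.5) * h1
        inner = 0.0
        for j in range(steps):          # inner integral over x2
            x2 = lower + (j + 0.5) * h2
            inner += joint_pdf(x1, x2) * h2
        total += inner * h1
    return total

a, b = 0.5, 1.0
numeric = joint_cdf(a, b)
closed_form = std_normal_cdf(a) * std_normal_cdf(b)
```

Each extra dimension would add one more nested loop, which is why direct numerical integration becomes expensive quickly as the dimension grows.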

Given that $ Y = cX + d $, we have $ F_Y(Y_i) = F_Y(cX_i + d) $, so we can restate the equation in terms of $X$ rather than $Y$ by doing a u-substitution (i.e. a change of variable). Because we are changing variables, we must also introduce the determinant of the inverse Jacobian.

$$ = \int \cdots \int f_Y(cX_i + d) \ \ 
\partial_{Y_{i,1}} \cdots \partial_{Y_{i,n}}$$

$$ = \int \cdots \int f_Y(c_1 X_1 + c_2 X_2 + \cdots + c_n X_n + d) \ \
|\mathbb{J}_{f_Y}|$$

$$ = \int \cdots \int \mathbb{P}(c_1 X_1 \cap c_2 X_2 \cap \cdots \cap c_n X_n) \ \ 
|\mathbb{J}_{f_Y}|$$

And thus the joint distribution can be stated as:

$$ = \int \cdots \int 
f_{X_1}(c_1 X_1) + f_{X_2}(c_2 X_2) + \cdots + f_{X_n}(c_n X_n + d)
|\mathbb{J}^{-1}_{F_Y}| \ \ 
\ \ \partial_{Y_{i,1}} \cdots \partial_{Y_{i,n}}
$$

$$ = 
\int 
\left[
\cdots
+
\int 
\left[
f_{X_1}(c_1 X_1) \ \ \partial_{Y_{i,1}}
\right]
\cdots
+
\int 
\left[
f_{X_2}(c_2 X_2) \ \ \partial_{Y_{i,2}}
\right]
\cdots
\right]
|\mathbb{J}^{-1}_{F_Y}| \ \ 
 \ \ \partial_{Y_{i,n}}
$$

$$ = 
(|\mathbb{J}^{-1}_{F_Y}|)^n \ \ 
\int 
\left[
\cdots
+
\int 
\left[
f_{X_1}(c_1 X_1) \ \ \partial_{Y_{i,1}}
\right]
\cdots
+
\int 
\left[
f_{X_2}(c_2 X_2) \ \ \partial_{Y_{i,2}}
\right]
\cdots
\right]
 \ \ \partial_{Y_{i,n}}
$$

$$ = 
|\mathbb{J}^{-n}_{F_Y}| \ \ 
\int 
\left[
\cdots
+
\int 
\left[
f_{X_1}(c_1 X_1) \ \ \partial_{Y_{i,1}}
\right]
\cdots
+
\int 
\left[
f_{X_2}(c_2 X_2) \ \ \partial_{Y_{i,2}}
\right]
\cdots
\right]
 \ \ \partial_{Y_{i,n}}
$$

Because all the $X_i$ are identically distributed we have

$$ = 
|\mathbb{J}^{-n}_{F_Y}| \ \ 
\int 
\left[
\cdots 
\int 
\left[
c_1 f_{X}(X_1) \ \ \partial_{Y_{i,1}}
\right]
\cdots
\right]
 \ \ \partial_{Y_{i,n}}
$$

If we take the mixed partial derivative of this expression with respect to each variable, we cancel out the integrals and are left with the marginal density function of each jointly distributed random variable in the set:

$$ f_Y(Y_i)= 
c_i |\mathbb{J}^{-n}_{F_Y}| \ \ 
f_{X}(X)
$$
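The role of the Jacobian factor is easiest to see in one dimension. The sketch below assumes a scalar map $Y = cX + d$ with hypothetical parameters, where the standard change-of-variable formula reads $f_Y(y) = f_X\left(\frac{y - d}{c}\right)\frac{1}{|c|}$. It compares that formula with the normal density obtained directly from the mean and standard deviation of $Y$.

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical parameters for X and for the map Y = c X + d.
mu_x, sigma_x = 1.0, 2.0
c, d = -3.0, 4.0

def pdf_y_change_of_variable(y):
    """f_Y(y) = f_X((y - d) / c) * |dx/dy|, with |dx/dy| = 1 / |c|."""
    x = (y - d) / c
    return normal_pdf(x, mu_x, sigma_x) / abs(c)

def pdf_y_direct(y):
    """Y is normal with mean c*mu_x + d and standard deviation |c|*sigma_x."""
    return normal_pdf(y, c * mu_x + d, abs(c) * sigma_x)

ys = [-10.0, -2.0, 1.0, 5.5]
pairs = [(pdf_y_change_of_variable(y), pdf_y_direct(y)) for y in ys]
```

Dropping the `1 / abs(c)` factor would leave a function that no longer integrates to one, which is why the inverse Jacobian has to accompany the substitution.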

The general definition of the univariate normal density function is stated as follows:

$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}}
e^{-\frac{1}{2}\left( \frac{x - \mu}{\sigma} \right)^2}
$$

We can restate this equation in terms of one of the random variables $X_i$. In doing so we will need to convert into matrix algebra. To do this there are two major changes we need to make.

The first is the quadratic in the exponential. We will restructure $\left( \frac{x - \mu}{\sigma} \right)^2$ to

$$ \left(\frac{X_i - \mu_{X_i}}{\sigma_{X_i}} \right)^2$$

$$ = \frac{(X_i - \mu_{X_i})^T(X_i - \mu_{X_i})}{\sigma_{X_i}\sigma_{X_i}^T} $$

$$ = \frac{(X_i - \mu_{X_i})^T(X_i - \mu_{X_i})}{\Sigma_{X_i}}$$

$$ = (X_i - \mu_{X_i})^T\Sigma_{X_i}^{-1}(X_i - \mu_{X_i})$$


The second change is in the denominator of the constant multiplying the exponential function (i.e. $\sigma$). We need to restate this value in a way that is consistent with matrix algebra and more elegant for our final equation. We will use $|\Sigma_{X_i}^\frac{1}{2}|$, and thus the equation becomes:


$$ f_{X_i}(X_i) = \frac{1}{|\Sigma_{X_i}^{\frac{1}{2}}|\sqrt{2\pi}}
e^{-\frac{1}{2} \left( X_i - \mu_{X_i} \right)^T \Sigma^{-1}_{X_i} \left( X_i - \mu_{X_i}\right)}
$$

$$ = \frac{1}{\sqrt{2\pi|\Sigma_{X_i}|}}
e^{-\frac{1}{2} \left( X_i - \mu_{X_i} \right)^T \Sigma^{-1}_{X_i} \left( X_i - \mu_{X_i}\right)}
$$

Plugging this back into the equation we then have:

$$ = 
|\mathbb{J}^{-1}_{F_Y}| \ \ 
\int 
\left[
\cdots 
\int 
\left[
    \frac{1}{\sqrt{2\pi}}
    \Sigma^{-1}_{X_i}
    e^{-\frac{1}{2} \left( x - \mu \right) \Sigma^{-1}_{X_i} \left( x - \mu \right)}
\ \ \partial_{Y_{i,1}}
\right]
\cdots
\right]
 \ \ \partial_{Y_{i,n}}
$$

Removing the complexities of the nested integrals by differentiating:

$$ \frac{\partial}{\partial X}F_Y = f_Y$$

$$ = 
|\mathbb{J}^{-1}_{F_Y}| \ \ 
\frac{1}{\sqrt{2\pi}}
\Sigma^{-1}_{X_i}
e^{-\frac{1}{2} \left( x - \mu \right) \Sigma^{-1}_{X_i} \left( x - \mu \right)}
$$

As the Jacobian of $F$ is defined as $\mathbb{J}_F = \frac{\partial Y}{\partial X}$, the inverse Jacobian $\mathbb{J}^{-1}$ is equal to $\frac{\partial X}{\partial Y}$. Looking at these terms, $\partial X$ can be thought of as the deviation $X - \mu_X$, and therefore the inverse Jacobian can be represented by the standard deviation. Thus $\mathbb{J}^{-1}_{F_Y} = \Sigma_X^{\frac{1}{2}}$.

$$ = 
\Sigma_X^{\frac{1}{2}} \ \ 
\frac{1}{\sqrt{2\pi}}
\Sigma^{-1}_{X_i}
e^{-\frac{1}{2} \left( x - \mu \right) \Sigma^{-1}_{X_i} \left( x - \mu \right)}
$$

$$ = 
\frac{1}{\sqrt{2\pi}}
\Sigma^{-\frac{1}{2}}_{X_i}
e^{-\frac{1}{2} \left( x - \mu \right) \Sigma^{-1}_{X_i} \left( x - \mu \right)}
$$

$$ = 
\frac{1}{\sqrt{2\pi \Sigma_{X_i}}}
e^{-\frac{1}{2} \left( x - \mu \right) \Sigma^{-1}_{X_i} \left( x - \mu \right)}
$$

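The same structure can be studied through the moment generating function of the sum $X = X_1 + \cdots + X_n$. Here $c$ is the scalar argument of the moment generating function rather than the coefficient matrix used earlier:

$$ M_X(c) = \mathbb{E}\left[ e^{cX} \right] $$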

$$ = \mathbb{E}\left[ e^{ c(X_1 + \cdots + X_n)} \right] $$

$$ = \mathbb{E}\left[ e^{cX_1 + \cdots + c X_n} \right] $$

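By the properties of exponents the exponential of a sum is a product, and because of independence the expectation of the product is the product of the expectations:

$$ = \mathbb{E}\left[ \prod e^{cX_i} \right] $$

$$ = \prod \mathbb{E}\left[ e^{cX_i} \right] $$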

$$ = \prod M_{X_i}(c) $$
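The product rule for moment generating functions can be checked by simulation. The sketch below assumes two independent normal variables with hypothetical parameters and uses `t` for the argument written $c$ above. It estimates the moment generating function of the sum and compares it with the product of the individual estimates.

```python
import math
import random

random.seed(2)
n_samples = 100_000
t = 0.5   # MGF argument (written c in the text above)

# Two independent normal variables with hypothetical parameters.
x1 = [random.gauss(0.0, 1.0) for _ in range(n_samples)]
x2 = [random.gauss(1.0, 0.5) for _ in range(n_samples)]

def sample_mgf(values, arg):
    """Monte Carlo estimate of E[exp(arg * X)]."""
    return sum(math.exp(arg * v) for v in values) / len(values)

mgf_of_sum = sample_mgf([a + b for a, b in zip(x1, x2)], t)
product_of_mgfs = sample_mgf(x1, t) * sample_mgf(x2, t)
```

The two estimates should agree up to sampling noise. The agreement relies on independence: for correlated draws the expectation of the product no longer factors.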


Focusing on the moment generating function for an individual variable

$$ e^{cX_i} = \sum_{j=0}^{\infty} \frac{(cX_i)^j}{j!}  = \left[ 1 + cX_i + \frac{c^2 X^2_i}{2!} + \frac{c^3 X^3_i}{3!} + \cdots \right]$$

$$ M_{X_i}(c) = \mathbb{E}\left[ e^{cX_i} \right] $$

$$ = \left[ 1 + c\mathbb{E}[X_i] + \frac{c^2 \mathbb{E}[X_i^2]}{2!} + \frac{c^3 \mathbb{E}[X_i^3]}{3!} + \cdots \right]$$
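To see the series of moments at work, the sketch below assumes a standard normal variable, whose moments are known in closed form (zero for odd orders, the double factorial $(k-1)!!$ for even orders). It sums the truncated series and compares it with the known closed form $e^{c^2/2}$ of the standard normal moment generating function, which is a standard result rather than something derived in this text.

```python
import math

def std_normal_moment(k):
    """E[X^k] for X ~ N(0, 1): zero for odd k, (k - 1)!! for even k."""
    if k % 2 == 1:
        return 0.0
    result = 1.0
    for odd in range(1, k, 2):   # 1 * 3 * ... * (k - 1)
        result *= odd
    return result

def mgf_series(c, terms):
    """Truncated series: sum over j < terms of c^j E[X^j] / j!."""
    return sum(c ** j * std_normal_moment(j) / math.factorial(j)
               for j in range(terms))

c = 0.5
approx = mgf_series(c, terms=12)
exact = math.exp(0.5 * c * c)   # known closed form for the standard normal
```

Reading the series the other way round, the $j$-th moment is the coefficient of $c^j / j!$, which is where the name "moment generating function" comes from.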

Now we evaluate this expectation using the normal pdf:

$$ M_{X_i}(c) = \mathbb{E}\left[ e^{cX_i} \right] $$


$$ M_{X_i}(c) = \int e^{cX_i} \frac{1}{\sigma_{X_i} \sqrt{2\pi}}
e^{-\frac{1}{2}\left( \frac{X_i - \mu_{X_i}}{\sigma_{X_i}} \right)^2}  \ dx
$$

$$ = \frac{1}{\sigma_{X_i}\sqrt{2\pi}} \int e^{cX_i}
e^{-\frac{1}{2}\left( \frac{X_i - \mu_{X_i}}{\sigma_{X_i}} \right)^2}  \ dx
$$

Now we do a u-substitution.

Let $u = X_i - \mu_{X_i}$, so that $X_i = u + \mu_{X_i}$ and $e^{cX_i} = e^{c\mu_{X_i}} e^{cu}$.

$$ \frac{du}{dx} =  1\Rightarrow du = dx$$

$$ = \frac{1}{\sigma_{X_i}\sqrt{2\pi}} e^{c\mu_{X_i}} \int e^{cu}
e^{-\frac{1}{2}\left( \frac{u}{\sigma_{X_i}} \right)^2}  \ du
$$
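The remaining integral can be evaluated numerically as a check. The sketch below assumes hypothetical values for $\mu$, $\sigma$ and $c$, integrates in the shifted variable $u = X_i - \mu_{X_i}$ with a midpoint rule, and compares the result with $e^{\mu c + \sigma^2 c^2 / 2}$, the standard closed form for the normal moment generating function (quoted here as a known result, not derived above).

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mgf_numeric(c, mu, sigma, half_width=12.0, steps=20_000):
    """Midpoint-rule approximation of the integral of e^{c x} f(x) dx.

    Integrates in u = x - mu over [-half_width * sigma, half_width * sigma],
    mirroring the substitution u = X_i - mu used in the text.
    """
    lo = -half_width * sigma
    h = 2.0 * half_width * sigma / steps
    total = 0.0
    for k in range(steps):
        u = lo + (k + 0.5) * h
        total += math.exp(c * (u + mu)) * normal_pdf(u + mu, mu, sigma) * h
    return total

# Hypothetical parameters (illustrative only).
mu, sigma, c = 1.0, 2.0, 0.3
numeric = mgf_numeric(c, mu, sigma)
closed_form = math.exp(mu * c + 0.5 * sigma ** 2 * c ** 2)
```

The integration window of twelve standard deviations is an arbitrary but generous choice. It needs to be wide enough to cover the shifted peak of the integrand, which sits at $u = \sigma^2 c$.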