Conditional Distribution
=============

#### Definition

Let $X$ and $Y$ be two **discrete** random variables, with probability mass functions $P_X$ and $P_Y$. Then, for every $x$ with $P_X(x) > 0$, the conditional probability mass function of $Y$ given $X$ is the following:

$$P_{Y|X} (y|x) = \mathbf{P}[Y = y | X = x] = \frac{\mathbf{P}[Y=y \text{ and }X = x]}{\mathbf{P}[X=x]} = \frac{P_{X,Y}(x,y)}{P_X(x)}$$

where $P_{X,Y}(x,y)$ is the joint-probability mass function of $X,Y$.
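
As a quick illustration of the definition, here is a minimal Python sketch that computes $P_{Y|X}$ from a joint pmf stored as a dictionary. The joint table and the function names `marginal_x` and `conditional_y_given_x` are hypothetical, chosen only for this example.

```python
# Hypothetical joint pmf P_{X,Y}(x, y), stored as {(x, y): probability}.
joint = {
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
    (1, 0): 0.30, (1, 1): 0.15, (1, 2): 0.15,
}


def marginal_x(joint, x):
    """P_X(x): sum the joint pmf over every value of y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)


def conditional_y_given_x(joint, y, x):
    """P_{Y|X}(y | x) = P_{X,Y}(x, y) / P_X(x); requires P_X(x) > 0."""
    px = marginal_x(joint, x)
    if px == 0:
        raise ValueError("cannot condition on an event of probability zero")
    return joint.get((x, y), 0.0) / px


# Conditional pmf of Y given X = 0; the printed values should sum to 1.
for y in (0, 1, 2):
    print(y, conditional_y_given_x(joint, y, x=0))
```

For each fixed $x$, the conditional pmf is just the corresponding slice of the joint table, rescaled so that it sums to 1.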

#### Definition

Assume $X$ and $Y$ are **continuous** random variables, where $f_X$ is the marginal density of $X$ and $f_{X,Y}$ is the joint density of $(X,Y)$. Then the conditional density of $Y$ given $X$ is

$$f_{Y|X}(y|x) = \frac{f_{X,Y}(x,y)}{f_X(x)}$$

###### Intuition: why does it make sense?

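We re-interpret the left-hand side as $\mathbf{P}[Y\in dy \mid X\in dx]/dy$, the conditional probability that $Y$ falls in a small interval of length $dy$ around $y$, per unit length. Then $\displaystyle \frac{\mathbf{P}[Y\in dy \cap X\in dx]/(dx~dy)}{\mathbf{P}[X \in dx]/dx} = \frac{f_{X,Y}(x,y)~dx~dy/(dx~dy)}{f_X(x) dx / dx} = \frac{f_{X,Y}(x,y)}{f_X(x)}$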

**Note:** It is easy to find the joint density if we know the conditional density and marginal density. 

Indeed: $f_{X,Y}(x,y) = f_X(x) f_{Y|X}(y|x)$
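
As a small worked example, take the hypothetical joint density $f_{X,Y}(x,y) = x + y$ on the unit square $0\le x\le 1,\ 0\le y\le 1$ (this density is an assumption made for illustration only). The marginal density of $X$ is

$$f_X(x) = \int_0^1 (x+y)~dy = x + \frac{1}{2}$$

so the conditional density of $Y$ given $X$ is

$$f_{Y|X}(y|x) = \frac{x+y}{x+\frac{1}{2}}, \qquad 0\le y\le 1$$

Multiplying back, $f_X(x) f_{Y|X}(y|x) = x + y$ recovers the joint density, and for each fixed $x$ the conditional density integrates to 1 over $y$.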


_______________

For now, let's concentrate on the discrete case.

##### Example:

$X_1 \sim Poi(\lambda_1)$ and $X_2 \sim Poi(\lambda_2)$ and $X_1,X_2$ are independent.

Let $T=X_1+X_2$. We have $T \sim Poi(\lambda_1 + \lambda_2)$

Find the conditional distribution of $X_1$ given $T$.



**This question applies to any scenario where we observe the total number of events but we don't know exactly how many events fall into each of several categories.**

We compute $\displaystyle P_{X_1|T}(x_1 | t) = \mathbf{P}[X_1 = x_1 | T = t]$

$$= \frac{P_{X_1,T}(x_1,t)}{P_T(t)}$$

  * The denominator is the Poisson pmf with parameter $\lambda_1 + \lambda_2$, evaluated at $t$.
  

$$=\frac{\mathbf{P}[X_1 = x_1 \& X_1+X_2 = t]}{\displaystyle e^{-(\lambda_1 + \lambda_2)} \frac{(\lambda_1 + \lambda_2)^t}{t!}}$$

A small but powerful observation: on the event $X_1 = x_1$, the condition $X_1 + X_2 = t$ is equivalent to $X_2 = t - x_1$, so we can replace one with the other:

$$=\frac{\mathbf{P}[X_1 = x_1 \text{ AND } X_2 = t - x_1]}{\displaystyle e^{-(\lambda_1 + \lambda_2)} \frac{(\lambda_1 + \lambda_2)^t}{t!}}$$

And remember that $X_1$ and $X_2$ are independent, so we write:

$$=\frac{\mathbf{P}[X_1 = x_1] \times \mathbf{P}[X_2 = t - x_1]}{\displaystyle e^{-(\lambda_1 + \lambda_2)} \frac{(\lambda_1 + \lambda_2)^t}{t!}}$$

And we know the probability mass functions of $X_1$ and $X_2$, which are both Poisson:

$$=\frac{ \displaystyle  e^{-\lambda_1} \frac{\lambda_1^{x_1}}{x_1!} \times e^{-\lambda_2} \frac{\lambda_2^{t-x_1}}{(t-x_1)!}}{\displaystyle e^{-(\lambda_1 + \lambda_2)} \frac{(\lambda_1 + \lambda_2)^t}{t!}}$$

$$=\frac{t!}{x_1! (t-x_1)!} \times \frac{\lambda_1^{x_1} \lambda_2^{t-x_1}}{(\lambda_1+\lambda_2)^t}$$


The last result looks very similar to a **binomial** distribution. Let's simplify it further:

$$\binom{t}{x_1} \times \left(\frac{\lambda_1}{\lambda_1 + \lambda_2}\right)^{x_1} \times \left(\frac{\lambda_2}{\lambda_1 + \lambda_2}\right)^{t-x_1}$$

This is a binomial pmf with $t$ trials, success probability $p=\displaystyle \frac{\lambda_1}{\lambda_1 + \lambda_2}$ and failure probability $1-p=\displaystyle \frac{\lambda_2}{\lambda_1 + \lambda_2}$.

We have shown that, given $T=t$, $~X_1 \sim Binomial(n=t, p=\frac{\lambda_1}{\lambda_1 + \lambda_2})$
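
The result can be checked numerically. The sketch below computes the conditional pmf directly from the definition (joint probability divided by the pmf of $T$) and compares it with the binomial pmf. The parameter values and the helper names are illustrative assumptions, not part of the derivation.

```python
from math import comb, exp, factorial


def poisson_pmf(k, lam):
    """Poisson pmf: e^{-lam} * lam^k / k!."""
    return exp(-lam) * lam**k / factorial(k)


def binomial_pmf(k, n, p):
    """Binomial pmf: C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def conditional_pmf(x1, t, lam1, lam2):
    """P[X1 = x1 | T = t], using independence of X1 and X2 in the numerator."""
    numerator = poisson_pmf(x1, lam1) * poisson_pmf(t - x1, lam2)
    return numerator / poisson_pmf(t, lam1 + lam2)


# Illustrative values (assumptions for this check only).
lam1, lam2, t = 2.0, 3.0, 6
p = lam1 / (lam1 + lam2)

# Largest discrepancy between the two pmfs over x1 = 0, ..., t.
max_gap = max(
    abs(conditional_pmf(x1, t, lam1, lam2) - binomial_pmf(x1, t, p))
    for x1 in range(t + 1)
)
print(max_gap)  # should be at floating-point rounding level
```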

##### Example:

Similar to the previous one (opposite in some sense): **mixture distributions**

Assume $T\sim Poi(\lambda)$, and that, given $T=t$, $~X\sim Binom(t, p)$

Question: What is the (non-conditional) distribution of $X$ by itself?

Answer: $X\sim Poi(p\lambda)$
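
One way to see this, writing $p$ for the success probability of the binomial, is to sum the joint pmf over all values of $T$ (the law of total probability). This is a sketch of the standard argument:

$$\mathbf{P}[X=x] = \sum_{t=x}^{\infty} \mathbf{P}[T=t]~\mathbf{P}[X=x | T=t] = \sum_{t=x}^{\infty} e^{-\lambda}\frac{\lambda^t}{t!} \times \frac{t!}{x!(t-x)!}~p^x (1-p)^{t-x}$$

$$= e^{-\lambda}\frac{(p\lambda)^x}{x!} \sum_{t=x}^{\infty} \frac{\left(\lambda(1-p)\right)^{t-x}}{(t-x)!} = e^{-\lambda}\frac{(p\lambda)^x}{x!}~e^{\lambda(1-p)} = e^{-p\lambda}\frac{(p\lambda)^x}{x!}$$

which is the Poisson pmf with parameter $p\lambda$.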

The reason we say that $X$ has a mixture distribution is that one of the parameters in its (conditional) distribution is itself a random variable. Symbolically, we can write: $$X\sim Binom(T, p)\text{ where }T\sim Poi(\lambda)$$
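
The mixing can be made concrete in code: the pmf of $X$ is the conditional binomial pmf averaged over the Poisson distribution of $T$. The sketch below truncates the infinite sum at a hypothetical cutoff `t_max` and compares the result with the $Poi(p\lambda)$ pmf. The parameter values and function names are assumptions for illustration.

```python
from math import comb, exp, factorial


def poisson_pmf(k, lam):
    """Poisson pmf: e^{-lam} * lam^k / k!."""
    return exp(-lam) * lam**k / factorial(k)


def binomial_pmf(k, n, p):
    """Binomial pmf: C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def mixture_pmf(x, lam, p, t_max=100):
    """P[X = x] = sum over t of P[T = t] * P[X = x | T = t].

    The sum starts at t = x (the binomial pmf is zero for t < x) and is
    truncated at t_max, which is an assumption: the neglected Poisson
    tail must be negligible for the chosen lam.
    """
    return sum(
        poisson_pmf(t, lam) * binomial_pmf(x, t, p) for t in range(x, t_max + 1)
    )


# Illustrative values (assumptions for this check only).
lam, p = 4.0, 0.3

max_gap = max(
    abs(mixture_pmf(x, lam, p) - poisson_pmf(x, p * lam)) for x in range(10)
)
print(max_gap)  # should be at floating-point rounding level
```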

### Another General Use of Conditional Distributions: Discrete Markov Chains


 * Construction: A Markov chain is a stochastic process $\{X(t): ~ t\in \mathbb{N}\}$.
We want to define the joint distribution of all the $X(t)$'s, for all $t$, simultaneously.

 * Prescription: We specify the conditional distribution of $X(t+1)$ given $X(t)$, and we insist that it is the same as the conditional distribution of $X(t+1)$ given all the $X(s)$'s for $s\le t$ (the Markov property).
 
 * Notation: the $X(t)$'s all take values in the ***"state space"*** $I=\{x_1,x_2,...x_m\}$
 
 We only need to prescribe these values:
 
 $$\mathbf{P}[X(t+1) = x_j | X(t)=x_i] = P_{ij}$$
 
These $P_{ij}$ are called ***transition probabilities***. These are arranged in a matrix $\mathbf{P}$ which is the ***transition matrix***.
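
A minimal sketch of how a transition matrix is used: each row $i$ is the conditional pmf of $X(t+1)$ given $X(t)=x_i$, and the distribution of $X(t+1)$ is obtained from the distribution of $X(t)$ by the law of total probability. The 3-state matrix and the function name `step` are hypothetical, chosen only for this example.

```python
# Hypothetical 3-state chain: P[i][j] = P[X(t+1) = x_j | X(t) = x_i].
P = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]

# Each row is a conditional pmf, so it must sum to 1.
assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)


def step(dist, P):
    """Advance one time step: new[j] = sum_i dist[i] * P[i][j]."""
    m = len(P)
    return [sum(dist[i] * P[i][j] for i in range(m)) for j in range(m)]


# Start in the first state with probability 1, then take two steps.
dist = [1.0, 0.0, 0.0]
for _ in range(2):
    dist = step(dist, P)

print(dist)  # [0.375, 0.5, 0.125]
```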

##### Example: Random Walk

Let $Y_0,Y_1,Y_2,...,Y_n,...$ be i.i.d., with $\displaystyle Y_i = \left\{\begin{array}{lrr}1 & \text{with probability} & p\\-1 & \text{with probability} & 1-p\end{array}\right.$

Define $X(0)=0$ and $\forall t\ge 0, X(t+1) = X(t) + Y_{t}$. Therefore, we notice that $X(t) = Y_0+Y_1 + Y_2 + ... +Y_{t-1}$

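From this definition, we see that $X(t)$ is a Markov chain: since $Y_t$ is independent of $X(0), ..., X(t)$, the conditional distribution of $X(t+1)$ given the whole past depends only on $X(t)$. Its transition probabilities are $\mathbf{P}[X(t+1) = i+1 | X(t) = i] = p$ and $\mathbf{P}[X(t+1) = i-1 | X(t) = i] = 1-p$.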
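
The recursion $X(t+1) = X(t) + Y_t$ translates directly into a simulation. The following is a minimal sketch using Python's standard `random` module. The function name `simulate_random_walk`, the seed, and the parameter values are assumptions for illustration.

```python
import random


def simulate_random_walk(n_steps, p, rng):
    """Return [X(0), X(1), ..., X(n_steps)] with X(0) = 0 and
    X(t+1) = X(t) + Y_t, where Y_t = +1 with probability p, else -1."""
    path = [0]
    for _ in range(n_steps):
        y = 1 if rng.random() < p else -1
        # The next state depends on the past only through the current state.
        path.append(path[-1] + y)
    return path


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
path = simulate_random_walk(10, p=0.5, rng=rng)
print(path)
```

Note that each new value is computed from `path[-1]` alone, which is the Markov property in code form.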