# Sampling from Distributions

Let us start by recapping where we are: We have met both generic distributions and some particular *named* distributions that model common situations. The *named* distributions have enough structure or symmetry that we can compute with them.

A couple of problems then confront us: In reality we rarely know the distribution of a random variable, or we may know the *type* of distribution but not its exact parameters, and our only way to gather information about the distribution is to sample from it.

An analogy:  We are trying to identify a movie. Maybe we know it is a spy thriller. But otherwise all we have are some still images from it. How many images would we need to identify the movie? 

We will take class time to sample from our *named* distributions and see what happens. I'll give one example here to illustrate the method: Consider a shopping center where on average 2 customers arrive at the cashiers every minute. The number of customers $Y$ that arrive at the cashiers in a given minute can then be modeled as a random variable with the Poisson distribution:

$$ P(Y=x) = \frac{2^x}{x!} e^{-2} $$
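
As a quick sanity check, the formula above can be compared with R's built-in Poisson probability function `dpois`. This is only a sketch: the range `0:5` and the variable names are choices made for illustration.

```r
# Poisson probabilities with mean 2, written out from the formula above.
lambda <- 2
x <- 0:5
by_formula <- lambda^x / factorial(x) * exp(-lambda)

# dpois() is base R's Poisson probability function; the two columns should agree.
built_in <- dpois(x, lambda)
print(data.frame(x = x, by_formula = by_formula, built_in = built_in))
```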

Suppose we measure the number of customers arriving each minute for 5 minutes:

```r
# Draw 5 observations from a Poisson distribution with mean 2
sample = rpois(5, 2)
sample
```

Some questions:

- What is the mean number of customers in our sample?
- What do we get as we increase the size of the sample?


```r
# Draw 200 observations from the same distribution and compute their mean
sample = rpois(200, 2)
mean(sample)
```
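
One way to explore the second question is to repeat the draw for several sample sizes and tabulate the means. The sizes and the seed below are arbitrary choices for illustration, not part of the original experiment.

```r
set.seed(1)  # arbitrary seed, only so that a run can be repeated exactly
sizes <- c(5, 50, 500, 5000)

# For each sample size, draw one Poisson sample with mean 2 and record its mean.
sample_means <- sapply(sizes, function(n) mean(rpois(n, 2)))
print(data.frame(size = sizes, sample_mean = sample_means))
```

Rerunning with a different seed gives a feel for how much each row moves around.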

Suppose we were to treat the mean of the sample, $\bar{Y}$, as a random variable itself. Fix the size of the sample and run the experiment many times. What shape do you see in the histogram of the values of $\bar{Y}$, and what happens to that shape as the size of the sample is increased?
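
A minimal sketch of this repeated experiment, assuming the Poisson example with mean 2; the sample size `n`, the number of repetitions, and the seed are illustrative choices that you should vary.

```r
set.seed(1)       # arbitrary seed for repeatability
n <- 5            # size of each sample (try larger values)
repeats <- 1000   # number of times the experiment is repeated

# Each repetition draws a fresh sample of size n and keeps only its mean.
y_bar <- replicate(repeats, mean(rpois(n, 2)))

# Histogram of the sample means.
hist(y_bar, main = paste("Sample means, n =", n), xlab = "sample mean")
```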

## Sampling

Run the experiment above with a variety of our named distributions including the normal distribution. What is happening?  Can you develop a hypothesis?

## Multivariate Distributions

To make the hypothesis we have developed precise, we need to build up some language for modeling this situation.

A *multivariate distribution* is a distribution on multiple random variables that describes the likelihood that particular values of them occur together. Explicitly, given a set of discrete random variables $Y_1, Y_2, \dots, Y_n$, we define their joint probability distribution $P(Y_1 = x_1, Y_2 = x_2, \dots, Y_n = x_n)$ to be the probability that every $Y_i$ simultaneously takes the given value $x_i$.

In the continuous case, we need to proceed as we did before: We define the joint cumulative distribution by:

$$ F(x_1, x_2, \dots, x_n) = P( Y_1 \leq x_1, Y_2 \leq x_2, \dots, Y_n \leq x_n) $$

and then the joint probability density function is given by:

$$ f(x_1, x_2, \dots, x_n) = \frac{\partial^n F}{\partial x_1 \partial x_2 \dots \partial x_n} $$

Again, do not fall into the trap of thinking of the density as the distribution: the values of $f$ do not give the likelihood of a particular outcome.

### Marginal Probability

Given two discrete random variables $Y_1$ and $Y_2$, we define the marginal probability of $Y_1$ to be the total likelihood that $Y_1$ takes a given value, whatever the value of $Y_2$:

$$ P(Y_1 = x_1) = \sum_{x_2} P(Y_1 = x_1, Y_2 = x_2) $$ 

In the continuous case the marginal PDF for $Y_1$ is found by:

$$ f_1(x_1) = \int f(x_1, x_2) dx_2 $$
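
For the discrete case, the sum can be carried out directly on a table of joint probabilities. The table below is hypothetical, made up only to illustrate the computation.

```r
# Hypothetical joint probabilities: rows are values of Y1, columns are values of Y2.
joint <- matrix(c(0.10, 0.20, 0.10,
                  0.30, 0.10, 0.20),
                nrow = 2, byrow = TRUE,
                dimnames = list(Y1 = c("0", "1"), Y2 = c("0", "1", "2")))

# A valid joint distribution sums to 1.
stopifnot(isTRUE(all.equal(sum(joint), 1)))

marginal_y1 <- rowSums(joint)  # P(Y1 = x1): sum over the values of Y2
marginal_y2 <- colSums(joint)  # P(Y2 = x2): sum over the values of Y1
print(marginal_y1)
print(marginal_y2)
```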

### Conditional Probability

Consider two discrete random variables. We define the conditional probability $P(Y_1=x_1 | Y_2 = x_2)$ to be the probability that $Y_1 = x_1$ given that we have observed $Y_2 = x_2$. This can be computed by recognizing that the likelihood of $Y_1 = x_1$ given a value for $Y_2$ is the likelihood that both values occur together divided by the marginal probability of $Y_2$ (which must be nonzero):

$$ P(Y_1 = x_1 | Y_2 = x_2) = \frac{P(Y_1 =x_1, Y_2 = x_2)}{P(Y_2 = x_2)}$$

In other words, the conditional probability given $Y_2=x_2$ is the proportion of the time that $Y_1=x_1$ occurred out of all cases where $Y_2 = x_2$ occurred.

In the continuous case, the conditional PDF is found by:

$$ f(x_1 | x_2) = \frac{f(x_1, x_2)}{ f_2(x_2) } $$
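
In the discrete case, conditioning on $Y_2$ amounts to dividing each column of the joint table by its column total. The table here is hypothetical and serves only as an illustration.

```r
# Hypothetical joint probabilities: rows are values of Y1, columns are values of Y2.
joint <- matrix(c(0.10, 0.20, 0.10,
                  0.30, 0.10, 0.20),
                nrow = 2, byrow = TRUE,
                dimnames = list(Y1 = c("0", "1"), Y2 = c("0", "1", "2")))

# Divide every column by its marginal probability P(Y2 = x2).
cond_y1_given_y2 <- sweep(joint, 2, colSums(joint), "/")
print(cond_y1_given_y2)

# Each column is now a probability distribution for Y1.
print(colSums(cond_y1_given_y2))
```

For example, the entry in row `"0"` and column `"0"` is $0.10 / 0.40 = 0.25$.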

### Independent Random Variables

Consider the case of two discrete random variables with distribution $P(Y_1 = x_1, Y_2 = x_2)$. The two variables are independent if the likelihood of both values occurring together is just the product of the likelihoods of each one occurring individually (the marginal probabilities of each):

$$ P(Y_1 = x_1, Y_2 = x_2) = P(Y_1 = x_1) P(Y_2 = x_2) $$

Equivalently, the variables are independent if conditioning on one of them just gives the marginal likelihood of the other:

$$ P(Y_1 = x_1 | Y_2 = x_2) = P(Y_1 = x_1) $$

In the continuous case, independent random variables have a joint PDF that is just a product of marginal PDFs:

$$ f(x_1, x_2) = f_1(x_1) f_2(x_2) $$

or in terms of the conditional PDF we will have:

$$ f(x_1 | x_2) = f_1(x_1) $$

Again the idea is that if the variables are independent, knowing the value we got for one of them should not change our probability density of the other one. 
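
For discrete variables the product rule can be checked mechanically: build the table of products of the marginals and compare it with the joint table. The helper `is_independent` and both tables below are hypothetical, written only for this illustration.

```r
# Returns TRUE when a joint table equals the outer product of its marginals.
is_independent <- function(joint) {
  product <- outer(rowSums(joint), colSums(joint))
  isTRUE(all.equal(unname(joint), unname(product)))
}

# A hypothetical table that does not factor into its marginals.
dependent <- matrix(c(0.10, 0.20, 0.10,
                      0.30, 0.10, 0.20),
                    nrow = 2, byrow = TRUE)

# A table constructed as a product of marginals, so independent by design.
independent <- outer(c(0.4, 0.6), c(0.5, 0.3, 0.2))

print(is_independent(dependent))
print(is_independent(independent))
```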

#### Example

Find the constant $C$ such that

$$ f(x_1, x_2) = \left\{ \begin{matrix} C x_1 x_2 & 0 \leq x_1, x_2 \leq 1 \\ 0 & \mbox{otherwise} \end{matrix} \right. $$

is a valid PDF. Are $Y_1$ and $Y_2$ independent?

#### Example 2

Find the constant $C$ such that

$$ f(x_1, x_2) = \left\{ \begin{matrix} C x_1 x_2 & 0 \leq x_1 < x_2 \leq 1 \\ 0 & \mbox{otherwise} \end{matrix} \right. $$

is a valid PDF. Are $Y_1$ and $Y_2$ independent?

Reminder: You can use Wolfram Alpha to compute the integral, but you may need to think about how to set it up correctly. You could determine independence or dependence without actually doing the integral if you appeal to the meaning of the definitions above.
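
If you prefer to stay in R, a double integral over a triangular region can be set up as two nested calls to `integrate`. The sketch below deliberately uses a different integrand, $g(x_1, x_2) = x_1 + x_2$ on $0 \leq x_1 < x_2 \leq 1$, so that the examples above are left for you; the function names are illustrative.

```r
# Illustrative integrand (not one of the examples above).
g <- function(x1, x2) x1 + x2

# Inner integral over x1 from 0 to x2. integrate() passes a vector of x2 values
# to its integrand, so the inner integral is evaluated one value at a time.
inner <- function(x2) {
  sapply(x2, function(b) {
    integrate(function(x1) g(x1, b), lower = 0, upper = b)$value
  })
}

# Outer integral over x2 from 0 to 1.
total <- integrate(inner, lower = 0, upper = 1)$value

# The constant that makes C * g integrate to 1 over the region.
normalizing_constant <- 1 / total
print(c(total = total, normalizing_constant = normalizing_constant))
```

By hand, the inner integral is $\tfrac{3}{2} x_2^2$, so the total is $\tfrac{1}{2}$ and the constant is $2$, which the numerical result should match closely.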

## Correlation and Covariance

Given a pair of jointly distributed continuous random variables $Y_1$ and $Y_2$ one measure of their dependence is to ask how far they jointly move from the means. Let 

$$ \mu_1 = E(Y_1) \qquad \mbox{and} \qquad \mu_2 = E(Y_2) $$ 

we have the variances

$$ \sigma_1^2 = E( (Y_1 - \mu_1)^2) \qquad \mbox{and} \qquad \sigma_2^2 = E( (Y_2 - \mu_2)^2 ) $$ 

which again measure the extent to which $Y_1$ (resp. $Y_2$) is likely to be far from its mean.

However, it is also interesting to ask how far $Y_1$ will stray from its mean while $Y_2$ is simultaneously measured from its mean. We define the covariance of $Y_1$ and $Y_2$ to be

$$ \mbox{Cov}(Y_1, Y_2) = E( (Y_1 - \mu_1) (Y_2 - \mu_2) ) $$

Note that if $Y_1$ and $Y_2$ are independent then 

$$ \mbox{Cov}(Y_1, Y_2) = E( (Y_1 -\mu_1) ) E( (Y_2 - \mu_2) ) = 0 \cdot 0 = 0 $$

The units of $\mbox{Cov}(Y_1, Y_2)$ are the product of the units of the two variables, which means that covariance is not an absolute measure of the correlation between two variables; for example, if one of the variables has a large variance, the covariance will be larger even if the correlation is weak. We can get something that is unitless by dividing the covariance by the square root of the product of the variances. The *coefficient of correlation* is given by

$$ \rho = \frac{\mbox{Cov}(Y_1, Y_2)}{\sqrt{ V(Y_1) V(Y_2) }}  $$
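
The effect of units can be seen in a small simulation. This is a sketch under assumptions: the two variables are simulated normals built to be related, and R's `cov`, `var`, and `cor` compute sample estimates rather than the exact expectations above.

```r
set.seed(1)                  # arbitrary seed for repeatability
n <- 10000
y1 <- rnorm(n)               # simulated values for the first variable
y2 <- 0.5 * y1 + rnorm(n)    # second variable, built to depend on the first
y2_rescaled <- 100 * y2      # the same quantity expressed in smaller units

# Covariance scales with the units; correlation does not.
print(c(cov = cov(y1, y2), cov_rescaled = cov(y1, y2_rescaled)))
print(c(cor = cor(y1, y2), cor_rescaled = cor(y1, y2_rescaled)))

# The correlation coefficient computed from the definition.
rho_by_hand <- cov(y1, y2) / sqrt(var(y1) * var(y2))
print(rho_by_hand)
```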

### Algebra

Some algebra of the expected values gives:

$$ \mbox{Cov}(Y_1, Y_2) = E( Y_1 Y_2) - E(Y_1) E(Y_2) $$

and so one way to think about covariance is that it is measuring how far the expected value of the product varies from what it would be if the variables were not correlated.
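
For reference, the algebra only uses linearity of expectation and the definitions of $\mu_1$ and $\mu_2$:

$$ \begin{aligned} \mbox{Cov}(Y_1, Y_2) &= E( (Y_1 - \mu_1)(Y_2 - \mu_2) ) \\ &= E( Y_1 Y_2 - \mu_2 Y_1 - \mu_1 Y_2 + \mu_1 \mu_2 ) \\ &= E(Y_1 Y_2) - \mu_2 E(Y_1) - \mu_1 E(Y_2) + \mu_1 \mu_2 \\ &= E(Y_1 Y_2) - \mu_1 \mu_2 \end{aligned} $$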


### Example

Be careful: zero correlation does not imply independence. Consider the joint density:

$$ f(x_1, x_2) = \left\{ \begin{matrix} C & -1 < x_1 < 0; \qquad 0 < x_2 < 1 + x_1 \\ C & 0 < x_1 < 1; \qquad 0 < x_2 < 1-x_1 \\ 0 & \mbox{otherwise} \end{matrix} \right. $$

Show that $\mbox{Cov}(Y_1, Y_2) = 0$ but that $Y_1$ and $Y_2$ are dependent. 
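
The same phenomenon can be seen by simulation in a different, simpler case: take $Y_1$ uniform on $(-1, 1)$ and set $Y_2 = Y_1^2$. Here $\mbox{Cov}(Y_1, Y_2) = E(Y_1^3) - E(Y_1) E(Y_1^2) = 0$, yet $Y_2$ is completely determined by $Y_1$. This is an illustrative sketch, not the density above; the sample size and seed are arbitrary.

```r
set.seed(1)                       # arbitrary seed for repeatability
n <- 100000
y1 <- runif(n, min = -1, max = 1)
y2 <- y1^2                        # fully determined by y1

# The sample correlation should be close to zero.
print(cor(y1, y2))

# Knowing y1 still changes what we expect of y2: compare the average of y2
# when y1 is far from zero with the average when y1 is near zero.
mean_far  <- mean(y2[abs(y1) > 0.5])
mean_near <- mean(y2[abs(y1) < 0.5])
print(c(far = mean_far, near = mean_near))
```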

### Linear Combinations of Independent Random Variables

Suppose that $Y_1, \dots, Y_n$ are independent random variables with $E(Y_i) = \mu_i$ and $V(Y_i) = \sigma_i^2$. Let $U = \sum a_i Y_i $ and $V = \sum b_i Y_i$ for some coefficients $a_i$ and $b_i$. What can you say about

- $ E(U)$ and $E(V)$?
- $ V(U)$ and $V(V)$?
- $\mbox{Cov}(U, V)$?

- when will $U$ and $V$ have no correlation?
- when will $U$ and $V$ be independent?
- when will $U$ and $V$ have no correlation but be dependent?

## Random Sampling

This then brings us back to our sampling question. We can think of a random sample from a distribution, $Y_1, Y_2, \dots, Y_n$, as a multivariate distribution of independent, identically distributed random variables.

If the $Y_i$ are discrete with distribution $P(Y = x) = p(x)$ then:

$$ P(Y_1 = x_1, Y_2 = x_2, \dots, Y_n = x_n) = p(x_1) p(x_2) \dots p(x_n) $$ 

If the $Y_i$ are continuous with PDF $f(x)$ then:

$$ f(x_1, x_2, \dots, x_n) = f(x_1) f(x_2) \dots f(x_n)$$
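
For the Poisson example with mean 2, the discrete product formula can be evaluated directly. The observed counts below are hypothetical, chosen only to illustrate the computation.

```r
# Hypothetical counts of arrivals in five separate one-minute intervals.
observed <- c(1, 3, 2, 0, 2)

# Joint probability of the whole sample: the product of the individual probabilities.
joint_probability <- prod(dpois(observed, lambda = 2))
print(joint_probability)

# The same value computed on the log scale, which avoids underflow for large samples.
joint_probability_log <- exp(sum(dpois(observed, lambda = 2, log = TRUE)))
print(joint_probability_log)
```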

