```python
import pandas as pd
import numpy as np
```

# Conditional Expectations

There are some neat tricks we can play with conditional probabilities and expectations that are worth discussing here.

Given two random variables $Y_1$ and $Y_2$, we define the *conditional expectation* of $g(Y_1)$ given $Y_2=y_2$ to be

$$ E(g(Y_1) | Y_2=y_2) = \int g(y_1) f(y_1 | y_2 ) dy_1 $$

where $f(y_1 | y_2)$ is the conditional PDF of $Y_1$ given $Y_2 = y_2$. (For discrete random variables the integral becomes a sum over $y_1$, weighted by the conditional probability function.)

## Expected Values 

Note that $E(Y_1 | Y_2)$ is itself a random variable (a function of $Y_2$), so it has its own expected value. Taking that expected value gets us back to the ordinary expected value of $Y_1$:

$$ E( E(Y_1 | Y_2) ) = E( Y_1 )$$

## Variances

More interestingly, the variance of $Y_1$ splits into two pieces:

$$ V(Y_1) = E[ V(Y_1 | Y_2) ] + V[ E( Y_1 | Y_2 ) ] $$

where $$ V( Y_1 | Y_2) = E( Y_1^2 | Y_2) - [ E(Y_1 | Y_2 ) ]^2$$
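As a quick numerical check of both identities, the sketch below uses a small, made-up discrete joint probability table for $Y_1$ and $Y_2$ (the numbers are arbitrary and chosen only for illustration) and computes each side exactly with NumPy. In the discrete case the conditional expectation is a weighted sum rather than an integral.

```python
import numpy as np

# Hypothetical joint pmf p(y1, y2): rows index y1 in {0, 1, 2}, columns index y2 in {0, 1}
y1_vals = np.array([0.0, 1.0, 2.0])
joint = np.array([[0.10, 0.20],
                  [0.25, 0.15],
                  [0.05, 0.25]])  # entries sum to 1

p_y2 = joint.sum(axis=0)   # marginal pmf of Y2
cond = joint / p_y2        # column j holds the conditional pmf f(y1 | y2 = j)

# Conditional mean and variance of Y1 for each value of Y2
cond_mean = y1_vals @ cond                      # E(Y1 | Y2)
cond_var = (y1_vals**2) @ cond - cond_mean**2   # E(Y1^2 | Y2) - [E(Y1 | Y2)]^2

# Direct (unconditional) moments of Y1 from its marginal pmf
p_y1 = joint.sum(axis=1)
mean_y1 = y1_vals @ p_y1
var_y1 = (y1_vals**2) @ p_y1 - mean_y1**2

# E[E(Y1 | Y2)] should equal E(Y1)
total_mean = cond_mean @ p_y2
# E[V(Y1 | Y2)] + V[E(Y1 | Y2)] should equal V(Y1)
total_var = cond_var @ p_y2 + (cond_mean**2) @ p_y2 - total_mean**2

print(mean_y1, total_mean)
print(var_y1, total_var)
```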



### Example

It may not be clear why these conditional expectations and the related theorems are useful. They come up frequently in models where the parameters of a distribution are themselves unknown and are treated as random. Consider:

The viral load $Y$ of a person with COVID-19, measured in virus particles per mg of saliva, follows an exponential distribution whose parameter $\beta$ is itself a random variable, uniformly distributed on $(0, 200)$. That is, the evidence is that the $\beta$ parameter changes from infection to infection.

For a given $\beta$ the expected value and variance of $Y$ are known: $$ E(Y | \beta) = \beta $$ and $$ V(Y | \beta) = \beta^2 $$

Find $E(Y)$ and $V(Y)$.
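One way to check an answer is to simulate the two-stage model directly: draw $\beta$ first, then draw $Y$ given that $\beta$. The sketch below does this with NumPy and compares the result with the values given by $E(Y) = E[E(Y|\beta)] = E(\beta)$ and $V(Y) = E[V(Y|\beta)] + V[E(Y|\beta)] = E(\beta^2) + V(\beta)$, using the standard uniform moments $E(\beta) = 100$ and $V(\beta) = 200^2/12$. The seed and number of draws are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
n_sims = 1_000_000

# Two-stage model: beta ~ Uniform(0, 200), then Y | beta ~ Exponential with mean beta
beta = rng.uniform(0.0, 200.0, size=n_sims)
y = rng.exponential(scale=beta)  # NumPy's `scale` is the mean, so E(Y | beta) = beta

# Closed forms from the conditional expectation and variance identities
mean_beta = 200.0 / 2
var_beta = 200.0**2 / 12
exact_mean = mean_beta                            # E(Y) = E(beta)
exact_var = (var_beta + mean_beta**2) + var_beta  # E(beta^2) + V(beta)

print(f"E(Y): simulated {y.mean():.1f}, exact {exact_mean:.1f}")
print(f"V(Y): simulated {y.var():.0f}, exact {exact_var:.0f}")
```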

# Multinomial Distributions

Now that we have the language of covariance, this is a good place to discuss the multinomial distribution, a generalization of the binomial distribution. Consider the following example:

### Example

The population of Weld County has the following distribution by race/ethnicity according to the 2010 Census (with one adjustment so that the proportions sum to 1.000). Note that demography is hard: race and ethnicity are distinct categories, and in particular people of any race can be Hispanic.

```python
# Weld County proportions by race/ethnicity (2010 Census, adjusted to sum to 1.000)
weld = pd.DataFrame(
    [
        ['White, not Hispanic', 0.649],
        ['Hispanic or Latino', 0.257],
        ['Black or African American', 0.016],
        ['American Indian or Alaskan Native', 0.017],
        ['Asian', 0.018],
        ['Native Hawaiian or Pacific Islander', 0.02],
        ['Two or more races', 0.023],
    ],
    columns=['Category', 'Proportion'],
)
# Append a total row as a check that the proportions sum to 1
weld.loc[len(weld)] = ['Total', weld['Proportion'].sum()]
weld
```

| | Category | Proportion |
|---|---|---|
| 0 | White, not Hispanic | 0.649 |
| 1 | Hispanic or Latino | 0.257 |
| 2 | Black or African American | 0.016 |
| 3 | American Indian or Alaskan Native | 0.017 |
| 4 | Asian | 0.018 |
| 5 | Native Hawaiian or Pacific Islander | 0.020 |
| 6 | Two or more races | 0.023 |
| 7 | Total | 1.000 |


Juries are made up of 12 people. Suppose we select 12 people at random from the county. What is the distribution of the jury's demographics?

If the jury pool assembled for a major case has 100 people in it, how likely is it that there will be 2 or fewer people in each of the five smallest race categories?
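To get a feel for the first question before doing any algebra, we can simulate juries. The sketch below treats the 12 selections as independent draws with replacement and uses NumPy's `Generator.multinomial`, which returns the count in each category for one set of draws; this is exactly the multinomial distribution defined in this section. The seed and the number of juries are arbitrary.

```python
import numpy as np

# Weld County proportions from the table above (same row order, excluding Total)
categories = [
    'White, not Hispanic', 'Hispanic or Latino', 'Black or African American',
    'American Indian or Alaskan Native', 'Asian',
    'Native Hawaiian or Pacific Islander', 'Two or more races',
]
p = np.array([0.649, 0.257, 0.016, 0.017, 0.018, 0.02, 0.023])

rng = np.random.default_rng(1)  # arbitrary seed

# Each row is one simulated 12-person jury: the number of jurors in each category
juries = rng.multinomial(12, p, size=5)
for jury in juries:
    print(dict(zip(categories, jury.tolist())))
```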

### With or Without Replacement

Sampling with replacement means we ignore the fact that each person we choose changes the probabilities for the people who remain. This is a valid approximation when the number of people we choose is much smaller than the total population.

The total population of Weld County is 324,429. The City of Greeley has a population of 105,888, so if we ask about the demographics of Greeley, rather than of juries of 12 or 100 people, we need to think about sampling without replacement: Greeley is a large enough fraction of the county to noticeably affect the probabilities involved.
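One way to quantify "big enough" is the finite population correction. When a group of size $n$ is drawn without replacement from a population of size $N$, the variance of the count in any one category is the with-replacement (binomial) variance multiplied by $(N - n)/(N - 1)$. The sketch below evaluates this factor for the group sizes mentioned above, using the county population as $N$: for juries it is essentially 1, while for a group the size of Greeley it drops to roughly two thirds.

```python
# Population figures quoted in the text
N_COUNTY = 324_429
N_GREELEY = 105_888

def finite_population_correction(n, N):
    """Factor (N - n) / (N - 1) by which sampling n of N without replacement
    shrinks the variance of a category count relative to sampling with replacement."""
    return (N - n) / (N - 1)

for n in (12, 100, N_GREELEY):
    print(f"n = {n:>7}: correction factor = {finite_population_correction(n, N_COUNTY):.4f}")
```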

## Definition of a Multinomial Distribution

A multinomial experiment consists of $n$ independent trials, where each trial has $k$ possible outcomes with probabilities $p_1, p_2, \dots, p_k$ that are the same on every trial. Note that $\sum p_i = 1$. The random variables are then $Y_i$, the number of times outcome $i$ occurred in the $n$ trials.

Note that the probability is zero unless $\sum_{i=1}^k Y_i = n$, the total number of trials.

Where it is nonzero, the joint probability function is:

$$ p(y_1, y_2, \dots, y_k) = \frac{n!}{y_1! y_2! \dots y_k!} p_1^{y_1} p_2^{y_2} \dots p_k^{y_k} $$

I remember this by noting that when $k=2$ this gives us the binomial distribution.
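The formula translates directly into code. The sketch below (the function names are illustrative, not from any library) computes the multinomial coefficient exactly with integer arithmetic, takes $n$ to be the sum of the counts, and checks the $k=2$ case against the binomial probability.

```python
from math import comb, factorial, prod

def multinomial_pmf(counts, probs):
    """p(y_1, ..., y_k) for n = sum(counts) trials with outcome probabilities probs."""
    n = sum(counts)
    coeff = factorial(n)
    for y in counts:
        coeff //= factorial(y)  # stays an exact integer: n! / (y_1! ... y_k!)
    return coeff * prod(p**y for p, y in zip(probs, counts))

def binomial_pmf(y, n, p):
    """Binomial probability of y successes in n trials."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

# With k = 2 the multinomial reduces to the binomial
print(multinomial_pmf([3, 7], [0.2, 0.8]), binomial_pmf(3, 10, 0.2))
```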

Note the key idea here, and why we did not introduce this distribution earlier in the course: the $Y_i$ have a joint (multivariate) distribution, and they are dependent, since they must sum to $n$.

## 1. Expected Value of the $Y_i$

Show that $E(Y_i) = n p_i$.


## 2. Variance of the $Y_i$

Show that $V(Y_i) = n p_i (1 - p_i)$.
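A simulation does not replace the proofs asked for above, but it is a quick sanity check of both formulas. The sketch below draws many groups of 100 people using the Weld County proportions and compares the sample mean and variance of each count with $n p_i$ and $n p_i (1 - p_i)$. The seed and number of simulated groups are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
n = 100
p = np.array([0.649, 0.257, 0.016, 0.017, 0.018, 0.02, 0.023])  # Weld County proportions

# 200,000 simulated groups of n people; each row holds the counts (Y_1, ..., Y_k)
samples = rng.multinomial(n, p, size=200_000)

print("simulated means:    ", samples.mean(axis=0).round(2))
print("n p_i:              ", (n * p).round(2))
print("simulated variances:", samples.var(axis=0).round(2))
print("n p_i (1 - p_i):    ", (n * p * (1 - p)).round(2))
```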



## 3. Covariance of $Y_s$ and $Y_t$

We will show that if $s \neq t$, then $\mbox{Cov}(Y_s, Y_t) = -n p_s p_t$.

The negative covariance makes sense: the larger $Y_s$ is, the smaller the other counts have to be, since all the counts sum to $n$.

I wrote this proof up ahead of time to try to get it right.

The trick here is to define some new random variables. Let:

$$ U_i = \left\{ \begin{matrix} 1 & \mbox{if the $i$th trial results in outcome $s$} \\ 0 & \mbox{otherwise} \end{matrix} \right. $$

$$ W_i = \left\{ \begin{matrix} 1 & \mbox{if the $i$th trial results in outcome $t$} \\ 0 & \mbox{otherwise} \end{matrix} \right. $$

*This may look a little strange*, but it is actually a fairly common trick. $U_i$ and $W_i$ are *indicator* random variables. They are discrete analogues of $\delta$ functions, zero everywhere except in one place, and they are used here much the way $\delta$ functions appear in results about integral transforms.

We then note that 

$$ Y_s = \sum_{i=1}^n U_i $$ and $$ Y_t = \sum_{j=1}^n W_j$$

We then need a series of results about these variables:

1. The $U_i$ are mutually independent, as are the $W_i$, because the trials are independent.

2. $U_i$ and $W_i$ cannot both be 1, as trial $i$ has exactly one outcome and so cannot be both outcome $s$ and outcome $t$. *It could be neither*, in which case both $U_i$ and $W_i$ are 0.

3. Result 2 does imply that $U_i$ and $W_i$ are dependent. **Why?**

4. Because the product $U_i W_i = 0$ (see 2), we have $E(U_i W_i) = 0$ for each $i$.

5. $E(U_i)$ is the probability that trial $i$ results in outcome $s$, so $E(U_i) = p_s$.

6. Likewise $E(W_i) = p_t$.

7. $\mbox{Cov}(U_i, W_j) = 0$ if $i \neq j$, because the trials are independent.

8. $\mbox{Cov}(U_i, W_i) = E(U_i W_i) - E(U_i) E(W_i) = 0 - p_s p_t = -p_s p_t$.



Putting all this together, and using the fact that covariance is linear in each argument, the results above give:

$$ \mbox{Cov}( Y_s, Y_t) = \sum_{i, j} \mbox{Cov}(U_i, W_j) $$
$$ = \sum_{i=1}^n \mbox{Cov}(U_i, W_i) $$
$$ = \sum_{i=1}^n (-p_s p_t) = - n p_s p_t $$
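The proof can also be checked numerically by building the indicator variables $U_i$ and $W_i$ directly from simulated trials. The sketch below draws the outcome of every individual trial, forms the indicators, confirms that $U_i W_i = 0$, sums them to get $Y_s$ and $Y_t$, and compares the sample covariance with $-n p_s p_t$. The choice of the two largest categories for $s$ and $t$ is only to make the covariance large relative to simulation noise; the seed and sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed
n, n_experiments = 100, 50_000
p = np.array([0.649, 0.257, 0.016, 0.017, 0.018, 0.02, 0.023])  # Weld County proportions
s, t = 0, 1  # illustrative choice: 'White, not Hispanic' and 'Hispanic or Latino'

# Outcome (category index) of every trial in every simulated experiment
outcomes = rng.choice(len(p), size=(n_experiments, n), p=p)

U = outcomes == s  # U_i: True where trial i results in outcome s
W = outcomes == t  # W_i: True where trial i results in outcome t
assert not (U & W).any()  # U_i W_i = 0: no trial is both outcome s and outcome t

Y_s = U.sum(axis=1)  # Y_s = sum of the U_i in each experiment
Y_t = W.sum(axis=1)  # Y_t = sum of the W_j in each experiment

print("simulated Cov(Y_s, Y_t):", np.cov(Y_s, Y_t)[0, 1])
print("-n p_s p_t:             ", -n * p[s] * p[t])
```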

# Example

Consider our jury pool of 100 people. Our question is: do we have evidence about whether this pool was randomly selected from the county?

1. In a group of 100 randomly selected people from Weld County, what is the expected number of people from each of the race/ethnicity categories?

2. What is the covariance between the number of Black or African American people and the number of American Indian or Alaskan Native people in the jury pool?

3. How likely is it that in a group of 100 people randomly selected from the county that there are 3 or fewer people in each of the five smallest race categories?
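The first two questions follow directly from $E(Y_i) = n p_i$ and $\mbox{Cov}(Y_s, Y_t) = -n p_s p_t$. For the third, one approach (a Monte Carlo estimate rather than an exact sum over the multinomial probabilities) is to simulate many pools of 100 and count how often all five smallest categories are at or below the threshold. Here the five smallest categories are everything except "White, not Hispanic" and "Hispanic or Latino". The helper `prob_all_small_at_most` is hypothetical, and its `max_count` parameter covers both the "2 or fewer" and "3 or fewer" versions of the question.

```python
import numpy as np

n = 100
categories = [
    'White, not Hispanic', 'Hispanic or Latino', 'Black or African American',
    'American Indian or Alaskan Native', 'Asian',
    'Native Hawaiian or Pacific Islander', 'Two or more races',
]
p = np.array([0.649, 0.257, 0.016, 0.017, 0.018, 0.02, 0.023])

# 1. Expected counts E(Y_i) = n p_i
expected = dict(zip(categories, (n * p).round(1).tolist()))

# 2. Cov(Y_s, Y_t) = -n p_s p_t for Black (index 2) and American Indian or Alaskan Native (index 3)
cov_black_aian = -n * p[2] * p[3]

# 3. Monte Carlo estimate of P(every one of the five smallest categories has <= max_count people)
def prob_all_small_at_most(max_count, n_sims=200_000, seed=4):
    rng = np.random.default_rng(seed)
    counts = rng.multinomial(n, p, size=n_sims)
    smallest_five = np.argsort(p)[:5]  # indices of the five smallest proportions
    return (counts[:, smallest_five] <= max_count).all(axis=1).mean()

print(expected)
print(cov_black_aian)
print(prob_all_small_at_most(3), prob_all_small_at_most(2))
```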

This example mirrors an analysis done for an actual discrimination case in a rural county of California, where the US Department of Justice showed that juries in the county did not reflect the county's demographics; the case resulted in an injunction changing the way juries are recruited in California.

