In [1]:
import pandas as pd
import numpy as np

# Multinomial Distributions

Now that we have the language of covariance, this is a good place to discuss the multinomial distribution, a generalization of the binomial distribution. Consider the following example:

### Example

The University of Northern Colorado has the following populations of majors in its five colleges. (Note I'm coding this in a Pandas DataFrame because it makes some of the arithmetic we need to do easier. If you are running this in Python/Jupyter on your own computer, you may have to install the Pandas module.)

In [4]:
# Student counts for each of the five colleges
unc = pd.DataFrame([['CEBS', 1460], ['HSS', 1500], ['MCB', 834], ['NHS', 2297], ['PVA', 723]],
                   columns=['College', 'Students'])
# Proportion of all students in each college
unc.loc[:, 'Proportion'] = unc.loc[:, 'Students'] / unc.loc[:, 'Students'].sum()
# Append a totals row (the proportions should sum to 1)
unc.loc[5, :] = ['Total', unc.loc[:, 'Students'].sum(), unc.loc[:, 'Proportion'].sum()]
unc

|   | College | Students | Proportion |
|---|---------|----------|------------|
| 0 | CEBS    | 1460.0   | 0.214265   |
| 1 | HSS     | 1500.0   | 0.220135   |
| 2 | MCB     | 834.0    | 0.122395   |
| 3 | NHS     | 2297.0   | 0.3371     |
| 4 | PVA     | 723.0    | 0.106105   |
| 5 | Total   | 6814.0   | 1.0        |


Suppose we randomly select a group of 5 students, assuming for now that we select with replacement. How likely is it that exactly one of them is from each college?

### With or Without Replacement

With replacement means we are going to ignore the fact that each choice we make has an effect on the probabilities for the remaining students. This is a reasonable approximation when the number we are choosing for our group (five) is much smaller than the total of 6814.

On the other hand, if we were asking how likely it is that a group of 1000 students has a given distribution among the five colleges, that number is big enough that the choices already made would affect the probabilities involved.
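To get a feel for the size of this effect, here is a small sketch that looks at the most extreme case: every student picked so far has come from NHS. The helper name `next_pick_prob` is mine, and the "all previous picks from NHS" scenario is just an illustrative worst case, not something we expect to happen.

```python
nhs, total = 2297, 6814  # NHS majors and all students, from the table above

def next_pick_prob(already_picked_nhs, already_picked_total):
    """P(next student is from NHS) when sampling without replacement."""
    return (nhs - already_picked_nhs) / (total - already_picked_total)

# With replacement the probability never changes.
with_replacement = nhs / total

# Without replacement, worst case: all earlier picks were NHS students.
fifth_pick = next_pick_prob(4, 4)           # group of 5
thousandth_pick = next_pick_prob(999, 999)  # group of 1000

print(with_replacement, fifth_pick, thousandth_pick)
```

For a group of five the probability barely moves, while for a group of 1000 it can shift by more than ten percentage points.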

## Definition of a Multinomial Distribution

A multinomial distribution is composed of n independent trials, where each individual trial has k possible outcomes with probabilities $p_1, p_2, \dots, p_k$. Note that $\sum p_i = 1$. The random variables are then the $Y_i$, the number of times outcome i occurred in the n trials.

Note that the $Y_i$ must satisfy $\sum_{i=1}^k Y_i = n$, the total number of trials. For any values $y_1, y_2, \dots, y_k$ that do not sum to n, the probability is zero.

When it is non-zero, the distribution is given by:

$$ p(y_1, y_2, \dots, y_k) = \frac{n!}{y_1! y_2! \dots y_k!} p_1^{y_1} p_2^{y_2} \dots p_k^{y_k} $$

I remember this by noting that when $k=2$ this gives us the binomial distribution.
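As a sketch, the formula can be coded directly with the standard library. The function name `multinomial_pmf` is my own, and the binomial check uses arbitrary illustrative values (n = 10, p = 0.3). The last part applies the formula to the question from the opening example: five students, exactly one from each college, selected with replacement.

```python
from math import comb, factorial, prod

def multinomial_pmf(ys, ps):
    """P(Y_1 = y_1, ..., Y_k = y_k) for n = sum(ys) trials."""
    coefficient = factorial(sum(ys))
    for y in ys:
        coefficient //= factorial(y)  # exact: each partial quotient is an integer
    return coefficient * prod(p ** y for p, y in zip(ps, ys))

# k = 2 should reproduce the binomial distribution.
two_outcome = multinomial_pmf([3, 7], [0.3, 0.7])
binomial_check = comb(10, 3) * 0.3**3 * 0.7**7

# Opening example: one student from each of the five colleges.
students = [1460, 1500, 834, 2297, 723]
ps = [s / sum(students) for s in students]
one_from_each = multinomial_pmf([1, 1, 1, 1, 1], ps)

print(two_outcome, binomial_check, one_from_each)
```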

Note the pertinent idea here, and why we did not introduce this distribution earlier in the course: the $Y_i$ form a multivariate distribution with dependence among them, since they must sum to n.

## 1. Expected Value of the $Y_i$

Show that $E( Y_i) = n p_i $


## 2. Variance of the $Y_i$ 

Show that $ V(Y_i) = n p_i (1- p_i) $
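Before (or after) proving results 1 and 2, it can help to check them numerically. The following is a simulation sketch, not a proof. It assumes NumPy's `Generator.multinomial`, and the choices n = 100, 200,000 repetitions, and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility

students = np.array([1460, 1500, 834, 2297, 723])
p = students / students.sum()
n = 100  # trials per experiment (illustrative choice)

# Each row is one experiment: the counts (Y_1, ..., Y_5) from n trials.
samples = rng.multinomial(n, p, size=200_000)

print("sample mean:    ", samples.mean(axis=0))
print("n p_i:          ", n * p)
print("sample variance:", samples.var(axis=0))
print("n p_i (1 - p_i):", n * p * (1 - p))
```

The sample means and variances should land close to the formulas, with small differences due to sampling noise.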



## 3. Covariance of $Y_s$ and $Y_t$

We will show that if $s\neq t$ then $\mbox{Cov}(Y_s, Y_t) = - n p_s p_t $ 

Note that the negative covariance makes sense: the larger $Y_s$ is, the smaller the other variables will have to be. I wrote this up ahead of time to try and get it right.

The trick here is to define some new random variables. Let:

$$ U_i = \left\{ \begin{matrix} 1 & \mbox{if the ith trial results in outcome s} \\ 0 & \mbox{otherwise} \end{matrix} \right. $$

$$ W_i = \left\{ \begin{matrix} 1 & \mbox{if the ith trial results in outcome t} \\ 0 & \mbox{otherwise} \end{matrix} \right. $$

*This may look a little strange*, but it is actually a fairly common trick. $U_i$ and $W_i$ are discrete analogues of $\delta$ functions, which are zero everywhere except at one place, and they are used here in a similar way to how $\delta$ functions appear in results about integral transforms.

We then note that 

$$ Y_s = \sum_{i=1}^n U_i $$ and $$ Y_t = \sum_{j=1}^n W_j$$

We then need a series of results about these variables:

1. The $U_i$ are all independent and the $W_i$ are all independent.

2. $U_i$ and $W_i$ cannot both be 1 as trial i can only be one of outcome s or t and not both. *It could be neither* in which case both $U_i$ and $W_i$ are 0.

3. Result 2. does imply that $U_i$ and $W_i$ are dependent. **Why?**

4. Because the product $U_i W_i = 0$ (see 2.) we have that $E( U_i W_i) = 0 $ for each i.

5. $E(U_i)$ is the probability that trial i results in outcome $s$, and so is $p_s$.

6. Likewise $E(W_i) = p_t$.

7. $ \mbox{Cov}(U_i, W_j) = 0 $ if $i\neq j$ because the trials are independent.

8. $\mbox{Cov}(U_i, W_i) = E( U_i W_i) - E(U_i) E(W_i) = 0 - p_s p_t $ 
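The results in this list can be seen concretely by simulating the individual trials and building the indicator variables. This is only a sketch: taking s and t to be CEBS and PVA, using n = 20 trials and 50,000 repetitions, and fixing the seed are all arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
students = np.array([1460, 1500, 834, 2297, 723])
p = students / students.sum()
s, t = 0, 4            # CEBS and PVA, by position in the table
n, reps = 20, 50_000   # illustrative sizes

# outcomes[r, i] is the outcome (0..4) of trial i in repetition r.
outcomes = rng.choice(len(p), size=(reps, n), p=p)
U = (outcomes == s).astype(int)  # U_i = 1 if trial i gave outcome s
W = (outcomes == t).astype(int)  # W_i = 1 if trial i gave outcome t

never_both = not np.any(U * W)                    # result 2: U_i W_i = 0
mean_U, mean_W = U.mean(), W.mean()               # results 5, 6: near p_s, p_t
same_trial_cov = np.cov(U[:, 0], W[:, 0])[0, 1]   # result 8: near -p_s p_t
cross_trial_cov = np.cov(U[:, 0], W[:, 1])[0, 1]  # result 7: near 0

print(never_both, mean_U, mean_W, same_trial_cov, cross_trial_cov)
```

Summing `U` and `W` across each row recovers $Y_s$ and $Y_t$ for that repetition.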



Putting this all together then, using our results from 2-18 we have that:

$$ \mbox{Cov}( Y_s, Y_t) = \sum_{i, j} \mbox{Cov}(U_i, W_j) $$
$$ = \sum_{i=1}^n \mbox{Cov}(U_i, W_i) $$
$$ = \sum_{i=1}^n (-p_s p_t) = - n p_s p_t $$
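As a numerical sanity check on $-n p_s p_t$, we can compare the sample covariance matrix of simulated counts against the formula. Again this is a sketch rather than a proof. It assumes NumPy's `Generator.multinomial` and `np.cov`, with n = 100, the number of repetitions, and the seed chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
students = np.array([1460, 1500, 834, 2297, 723])
p = students / students.sum()
n = 100

samples = rng.multinomial(n, p, size=200_000)

# Rows of `samples` are observations, so rowvar=False gives a 5 x 5 matrix.
sample_cov = np.cov(samples, rowvar=False)

# Theory: -n p_s p_t off the diagonal, n p_i (1 - p_i) on the diagonal.
theory_cov = -n * np.outer(p, p)
np.fill_diagonal(theory_cov, n * p * (1 - p))

print(np.round(sample_cov, 2))
print(np.round(theory_cov, 2))
```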

# Example

A student club is supposed to be open to all students. However, out of the 100 students in the club, there are only 5 from each of PVA and CEBS.

1. In a group of 100 randomly selected students from campus, what is the expected number of students from each of these two colleges?

2. What is the covariance between the number of students in the group from these two colleges? (In other words, how strongly do we expect the number of students from PVA to affect the number of students from CEBS?)

3. How likely is it that in a group of 100 students randomly selected from campus we will have 5 or fewer students from each of these two colleges?
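One way to set up the three computations is sketched below. It rests on two assumptions of mine: the group is modeled as multinomial (selection with replacement), and question 3 is taken to mean at most 5 students from each of PVA and CEBS. Lumping the other three colleges into a single "other" outcome turns this into a three-outcome multinomial.

```python
from math import comb

students = {'CEBS': 1460, 'HSS': 1500, 'MCB': 834, 'NHS': 2297, 'PVA': 723}
total = sum(students.values())
n = 100

p_pva = students['PVA'] / total
p_cebs = students['CEBS'] / total
p_other = 1 - p_pva - p_cebs  # HSS, MCB and NHS lumped together

# 1. Expected counts: E(Y_i) = n p_i
expected_pva = n * p_pva
expected_cebs = n * p_cebs

# 2. Covariance: Cov(Y_s, Y_t) = -n p_s p_t
cov = -n * p_pva * p_cebs

# 3. P(Y_PVA <= 5 and Y_CEBS <= 5): sum the three-outcome multinomial pmf.
#    comb(n, a) * comb(n - a, b) equals n! / (a! b! (n - a - b)!).
prob = 0.0
for a in range(6):
    for b in range(6):
        c = n - a - b
        prob += comb(n, a) * comb(n - a, b) * p_pva**a * p_cebs**b * p_other**c

print(expected_pva, expected_cebs, cov, prob)
```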



## Example

The question above appears innocuous; however, consider asking a similar question about the demographic makeup of juries in Weld County court. If jurors are randomly selected, the multinomial distribution tells us how much we expect the makeup of a jury to vary.