In [1]:
import numpy as np
import sympy as sp

# Review

Last class we learned how to produce the distribution of a single random variable $U$ obtained by applying some formula to a set of random variables. One specific problem we wanted to solve was finding the distribution of a statistic computed from a sample of a single-variate random variable, for example the mean of the sample.

We came up with three methods; let's illustrate one of them with an example:

### Example

The City of Greeley finds that their 10 most dangerous intersections have a similar distribution for the number of accidents per day: they are Poisson with a mean of 0.5 accidents per day. Let $Y_i$ be the number of accidents at intersection $i$ in one day, treating the intersections as independent, and let $U$ be the total number of accidents at the 10 intersections that day:

$$ U = Y_1 + Y_2 + \dots + Y_{10} $$

(The average number of accidents per intersection is then $U/10$.)

Find the distribution of $U$. 

#### Poisson Distribution 

The distributions of the $Y_i$ satisfy:  

$$ p(y) = \frac{1}{2^y y!} e^{-1/2} $$ 

#### Method of Distributions

Then the method of distributions gives:

$$ p_U(u) = \sum_{y_1 + y_2 + \dots + y_{10} = u} p(y_1) p(y_2) \cdots p(y_{10}) $$

(Here I cheated a little bit: for a discrete random variable, the formula for the cumulative distribution turns into the probability function by just changing the inequality to an equality.)

Since every factor has the same form, the powers of $2$ combine to $2^u$ and the ten exponentials combine to $e^{-5}$; multiplying and dividing by $u!$ then gives:

$$ p_U(u) = \frac{1}{2^u u!} e^{-5} \sum_{y_1 + y_2 + \dots + y_{10} = u} \frac{u!}{y_1! y_2! \cdots y_{10}!} $$

The sum then collapses to $10^u$ by the multinomial theorem. [See here](https://en.wikipedia.org/wiki/Multinomial_theorem). You could also think of this combinatorially as the number of ways of putting $u$ distinguishable objects into $10$ bins. Since $10^u / 2^u = 5^u$, we get:

$$ p_U(u) = \frac{5^u}{u!} e^{-5} $$

This is the Poisson distribution with a mean of $5$ accidents per day.
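As a sanity check, here is a minimal numerical sketch of the same computation. It builds the probability function of the sum $Y_1 + \dots + Y_{10}$ by repeated convolution (the discrete version of the sum in the method of distributions) and compares it with a Poisson of mean $5$. The names `poisson_pmf` and `cutoff` are just illustrative choices, and the code assumes the intersections are independent.

```python
import math
import numpy as np

lam = 0.5       # mean accidents per day at one intersection
n_sites = 10    # number of intersections
cutoff = 60     # we only track the values 0, 1, ..., cutoff - 1 (illustrative choice)

def poisson_pmf(mean, size):
    """Return [p(0), ..., p(size - 1)] for a Poisson variable with the given mean."""
    return np.array([mean**y * math.exp(-mean) / math.factorial(y)
                     for y in range(size)])

single = poisson_pmf(lam, cutoff)

# The probability function of a sum of independent variables is the
# convolution of their probability functions.
total = np.array([1.0])
for _ in range(n_sites):
    # Entries below cutoff are unaffected by the truncation.
    total = np.convolve(total, single)[:cutoff]

target = poisson_pmf(n_sites * lam, cutoff)
max_abs_diff = float(np.max(np.abs(total - target)))
print(max_abs_diff)
```

The printed difference should be at the level of floating point round-off, which agrees with the algebra above.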


# Order Statistics

However, statistics computed by applying a formula to the sample are just one type of statistic we might care about. Another type arises from the order of the values in the sample:

- The largest or smallest value from our sample.
- The *median* of our sample, which depends on the ordering of the values.

### Example

In the example above we may ask what the distribution of the maximum number of accidents at a single intersection is on a given day.

## Approach

So the approach is the following: given a set of independent, identically distributed random variables $Y_1, Y_2, \dots, Y_n$, we put them in order from smallest to largest:

$$ Y_{(1)}, Y_{(2)}, Y_{(3)}, \dots, Y_{(n)} $$

Because the $Y_i$ are independent, their joint probability density function is the product of the density applied to each:
$$ f(y_1, y_2, \dots, y_n) = f(y_1) f(y_2) \cdots f(y_n)$$


### Example - Maximum $Y_{(n)}$

Now we would like to find the density or distribution of the maximum $Y_{(n)}$. We use the method of distributions and compute the cumulative distribution:

$$ F_{(n)}(y) = P( Y_{(n)} \leq y ) $$

This happens exactly when all of the $Y_i$ are at most $y$, and so we have that

$$ F_{(n)}(y) = P( Y_1 \leq y, Y_2 \leq y, \dots, Y_n \leq y) = F(y)^n $$

where the last step holds because the $Y_i$ are independent and share the same distribution function $F$.

We then get the density of the maximum value by taking a derivative:

$$ g_n(y) = n F(y)^{n-1} f(y) $$
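To connect this back to the Greeley example, here is a small sketch that applies $F_{(n)}(y) = F(y)^n$ to the 10 intersections. Since the Poisson is discrete there is no derivative to take; instead the probability that the maximum equals $m$ is the jump $F(m)^n - F(m-1)^n$. The helper names `poisson_cdf` and `max_pmf` are made up for this illustration, and the intersections are assumed independent.

```python
import math

lam, n = 0.5, 10   # Poisson mean per intersection, number of intersections

def poisson_cdf(m, mean):
    """P(Y <= m) for a Poisson variable with the given mean (0 when m < 0)."""
    return sum(mean**y * math.exp(-mean) / math.factorial(y)
               for y in range(m + 1))

def max_pmf(m):
    """P(max of the n counts equals m), as a jump of F(m)**n."""
    return poisson_cdf(m, lam)**n - poisson_cdf(m - 1, lam)**n

pmf = [max_pmf(m) for m in range(8)]
for m, p in enumerate(pmf):
    print(m, round(p, 4))
```

Note that the probability the maximum is $0$ is $e^{-5}$, the same as the probability that there are no accidents at all, as it should be.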



### Example - Uniform Random Variables

Let $Y_1, Y_2, \dots, Y_5$ be a sample pulled from the uniform random variable on $[0, 1]$.  How likely is it that the largest value in the sample is bigger than $0.9$?
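One way to work this out, sketched below: for the uniform distribution on $[0, 1]$ we have $F(y) = y$, so $P(Y_{(5)} > 0.9) = 1 - F(0.9)^5 = 1 - 0.9^5$. The code computes this exactly with `sympy` and then compares it against a quick simulation; the seed and the number of simulated samples are arbitrary choices.

```python
import numpy as np
import sympy as sp

y = sp.symbols("y")
n = 5
F = y                      # CDF of the uniform on [0, 1], valid for 0 <= y <= 1
F_max = F**n               # CDF of the maximum of n independent draws

# Exact answer: P(max > 0.9) = 1 - F(0.9)**n
exact = 1 - F_max.subs(y, sp.Rational(9, 10))
print(exact, float(exact))

# Monte Carlo check: draw many samples of size n and look at each maximum.
rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 1.0, size=(200_000, n))
estimate = float(np.mean(samples.max(axis=1) > 0.9))
print(estimate)
```

So the largest of five values exceeds $0.9$ about $41\%$ of the time, even though any single value does so only $10\%$ of the time.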


### Example - Minimum $Y_{(1)}$

We compute this one by analogy with the previous one:

$$ F_{(1)}(y) = 1 - P( Y_{(1)} > y ) $$

The event $Y_{(1)} > y$ happens exactly when every $Y_i$ is greater than $y$, so the previous argument gives

$$ F_{(1)}(y) = 1 - (1 - F(y) )^n $$

taking the derivative gives:  

$$ g_1(y) = n \left[ 1 - F(y) \right]^{n-1} f(y) $$
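Here is a short symbolic check of that derivative, using the uniform distribution on $[0, 1]$ with $n = 5$ as a concrete (assumed) test case. It also confirms that $g_1$ integrates to $1$ and computes the expected value of the minimum.

```python
import sympy as sp

y = sp.symbols("y", nonnegative=True)
n = 5
F = y                      # uniform CDF on [0, 1]
f = sp.diff(F, y)          # uniform density, equal to 1

F_min = 1 - (1 - F)**n     # CDF of the minimum
g_min = sp.diff(F_min, y)  # density obtained by differentiating

# Compare with the formula n * (1 - F)**(n - 1) * f from the notes.
formula = n * (1 - F)**(n - 1) * f
difference = sp.simplify(g_min - formula)

total_mass = sp.integrate(g_min, (y, 0, 1))
mean_min = sp.integrate(y * g_min, (y, 0, 1))
print(difference, total_mass, mean_min)
```

For this test case the expected minimum is $1/(n+1) = 1/6$.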



## Going Further $Y_{(k)}$

Going further, what can we say about the values away from the extremes?

The idea here is to recognize that the definition of the density as the derivative of the cumulative distribution implies that

$$ P( y - dy/2 \leq Y \leq y + dy/2 ) \approx f(y) dy $$

That is, while it is not true that $f(y)$ is the probability that the random variable equals $y$, it is the constant of proportionality for the probability that the random variable lies in a small interval about $y$; this statement becomes more accurate as the width of that interval decreases.

If we are then wondering how likely it is that $Y_{(k)}$ is near $y$ we note there are three things we need to have happen:

1. We need $k-1$ of the $Y_i$ to be less than $y$.
2. We need one of the $Y_i$ to be near $y$.
3. We need $n-k$ of the $Y_i$ to be greater than $y$.

Within each class the $Y_i$ are independent, so the probability for each class is a product of probabilities.
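Putting the three conditions together, and counting the $\frac{n!}{(k-1)!\,1!\,(n-k)!}$ ways of assigning the $Y_i$ to the three classes, gives the standard formula

$$ g_k(y) = \frac{n!}{(k-1)!\,(n-k)!} F(y)^{k-1} \left[ 1 - F(y) \right]^{n-k} f(y) $$

The notes above do not write this formula out, so treat the sketch below as one way to check it: the hypothetical helper `order_stat_density` implements the formula, and we confirm that for uniform variables it reproduces the maximum and minimum densities found earlier and integrates to $1$ for every $k$.

```python
import sympy as sp

y = sp.symbols("y", nonnegative=True)

def order_stat_density(k, n, F, f):
    """Density of the k-th smallest of n i.i.d. values with CDF F and density f."""
    # Number of ways to pick which Y_i fall below, near, and above y.
    ways = sp.factorial(n) / (sp.factorial(k - 1) * sp.factorial(n - k))
    return ways * F**(k - 1) * (1 - F)**(n - k) * f

# Test case (an assumption for illustration): uniform on [0, 1], n = 5.
F_unif, f_unif = y, sp.Integer(1)
n = 5

g_max = order_stat_density(n, n, F_unif, f_unif)   # should equal n * F**(n-1) * f
g_min = order_stat_density(1, n, F_unif, f_unif)   # should equal n * (1-F)**(n-1) * f

masses = [sp.integrate(order_stat_density(k, n, F_unif, f_unif), (y, 0, 1))
          for k in range(1, n + 1)]
print(g_max, g_min, masses)
```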



### Example - Median

For a sample of five values from the uniform distribution on $[0, 1]$ the *median* is the middle value corresponding to $Y_{(3)}$.  Find the distribution of $Y_{(3)}$. 
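A sketch of one way to answer this: we need $2$ values below $y$, one near $y$, and $2$ above, which can be arranged in $\frac{5!}{2!\,1!\,2!} = 30$ ways, so with $F(y) = y$ and $f(y) = 1$ the density is $g_3(y) = 30 y^2 (1-y)^2$ on $[0, 1]$. The code below checks that this integrates to $1$ and compares the probability that the median lands in $[0.4, 0.6]$ with a simulation; the interval, seed, and sample count are arbitrary choices.

```python
import numpy as np
import sympy as sp

y = sp.symbols("y")
g_median = 30 * y**2 * (1 - y)**2      # candidate density of Y_(3) for n = 5

total_mass = sp.integrate(g_median, (y, 0, 1))
exact = sp.integrate(g_median, (y, sp.Rational(2, 5), sp.Rational(3, 5)))
print(total_mass, float(exact))

# Monte Carlo check: the median of each simulated sample of five uniforms.
rng = np.random.default_rng(1)
samples = rng.uniform(0.0, 1.0, size=(200_000, 5))
medians = np.median(samples, axis=1)
estimate = float(np.mean((medians >= 0.4) & (medians <= 0.6)))
print(estimate)
```

The two printed probabilities should agree to within simulation error.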

## Joint Distributions of $Y_{(j)}, Y_{(k)}$ where $j<k$

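We can extend our computation of the marginal distribution of $Y_{(k)}$ above to the case of two of the order statistics by adding two more cases to the list of three: one of the $Y_i$ must be near the second value, and $k-j-1$ of the $Y_i$ must fall between the two values.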
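Carrying out that count with five classes leads to the standard joint density, for $y_j < y_k$,

$$ g_{(j)(k)}(y_j, y_k) = \frac{n!}{(j-1)!\,(k-j-1)!\,(n-k)!} F(y_j)^{j-1} \left[ F(y_k) - F(y_j) \right]^{k-j-1} \left[ 1 - F(y_k) \right]^{n-k} f(y_j) f(y_k) $$

This formula is not written out in the notes, so the following is only a sketch to check it in one assumed case: uniform variables on $[0, 1]$ with $n = 5$, $j = 2$, $k = 4$. The helper `joint_density` and the symbols `u`, `v` (standing in for $y_j$, $y_k$) are names chosen for this illustration.

```python
import sympy as sp

u, v = sp.symbols("u v", nonnegative=True)   # u plays y_j, v plays y_k

def joint_density(j, k, n, F_u, F_v, f_u, f_v):
    """Joint density of the j-th and k-th order statistics (j < k) on u < v."""
    ways = sp.factorial(n) / (
        sp.factorial(j - 1) * sp.factorial(k - j - 1) * sp.factorial(n - k)
    )
    return (ways * F_u**(j - 1) * (F_v - F_u)**(k - j - 1)
            * (1 - F_v)**(n - k) * f_u * f_v)

n, j, k = 5, 2, 4
g = joint_density(j, k, n, u, v, 1, 1)       # uniform: F(y) = y, f(y) = 1

# Integrating out u over 0 < u < v should recover the marginal of Y_(k).
marginal_k = sp.integrate(g, (u, 0, v))
total_mass = sp.integrate(marginal_k, (v, 0, 1))
print(sp.factor(marginal_k), total_mass)
```

Integrating out $y_j$ recovers the marginal density of $Y_{(4)}$, namely $20 y^3 (1 - y)$, which matches the three-class computation for a single order statistic.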

