

# Problem

_You’re thinking about doing a backpacking trip through Rainier National Park, but there’s one big concern: that’s Samsquanch territory! After thinking it over for a moment, you realize you have one big advantage: Samsquanch’s broad shoulders make it more difficult to navigate dense trees, while your small, nerdy frame can fit quite easily. In order to make sure you have a chance to escape, you want to make sure the forest you’re hiking through has sufficient tree density._

_Let’s model our forest as a 10x10 square. The top will be the entrance and the bottom will be the exit. For simplicity, we will assume the sides are impassable barriers. Each tree will be modeled by a point and Samsquanch will be modeled by a circle of radius R = 1. Trees will be distributed uniformly in the 10x10 square. Let’s further assume that Samsquanch can’t fit between any two trees that are less than 2R apart. We need to figure out what density the trees need to be in order for you to evade Samsquanch._

...

# Solution

So we have a number of trees that exist on this line, and we need to figure out the probability that there's a gap between neighboring trees that's greater than length 2. Right off the bat, the 'neighboring' trees description should clue us in that we'll want to use order statistics. Instead of jumping right to this, though, let's start a bit simpler. With just one tree at a uniform position $X$, the two segment lengths $X$ and $1 - X$ are, of course, each uniformly distributed. With two trees, things get a bit more complex. Let $X_1$ and $X_2$ be the locations of the trees. Then we can describe the lengths of the three resulting segments as

$
\begin{align}
L_1 &= \min(X_1, X_2) \\
L_2 &= |X_2 - X_1| \\
L_3 &= 1 - \max(X_1, X_2)
\end{align}
$
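The three definitions above extend to any number of trees: sort the positions, then take differences between consecutive edges of $[0, 1]$. A minimal Python sketch, assuming positions are already rescaled to the unit interval; `segment_lengths` is a hypothetical helper name.

```python
def segment_lengths(points):
    """Hypothetical helper: split [0, 1] at the given tree positions.

    Returns [L_1, ..., L_{n+1}]: the distance from 0 to the first tree,
    the gaps between consecutive trees, and the distance from the last tree to 1.
    """
    edges = [0.0] + sorted(points) + [1.0]
    return [right - left for left, right in zip(edges, edges[1:])]


# Two trees: L_1 = min(X_1, X_2), L_2 = |X_2 - X_1|, L_3 = 1 - max(X_1, X_2)
lengths = segment_lengths([0.7, 0.3])
print(lengths)  # approximately [0.3, 0.4, 0.3], up to floating-point rounding
```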

You may already know that $L_1$ and $L_3$ have distributions

$ 
f_{L_1} = 2 - 2x \text{ where } 0 \leq x \leq 1 \\
f_{L_3} = 2 - 2x \text{ where } 0 \leq x \leq 1
$ 

but let's quickly verify.

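$ F_{L_1}(l_1) = P (L_1 \leq l_1) = P (\min(X_1, X_2) \leq l_1) = 1 - P (X_1 > l_1 \text{ and } X_2 > l_1) = 1 - P (X_1 > l_1) P(X_2 > l_1) = 1 - (1 - l_1)^2 $

The minimum is at most $l_1$ unless both points land above $l_1$, and independence lets us split that joint probability into a product. Now we just need to take a derivative (chain rule)

$ f_{L_1}(l_1) = \frac{d}{dl_1} F_{L_1}(l_1) = \frac{d}{dl_1} \left[ 1 - (1 - l_1)^2 \right] = 2(1 - l_1) = 2 - 2l_1 $

A similar procedure can be used for $L_3$, starting from the maximum: $P(\max(X_1, X_2) \leq m) = P(X_1 \leq m) P(X_2 \leq m) = F_{X_1}(m) F_{X_2}(m) = m^2$, whose derivative (product rule) is $f_{X_1} F_{X_2} + f_{X_2} F_{X_1} = m + m = 2m$. Since $L_3 = 1 - \max(X_1, X_2)$, substituting $m = 1 - l_3$ gives $f_{L_3}(l_3) = 2(1 - l_3) = 2 - 2l_3$.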
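A quick Monte Carlo sanity check of this derivation, as a sketch using only Python's standard library (the seed, trial count, and test points are arbitrary choices): both $L_1$ and $L_3$ should have empirical CDFs close to $1 - (1 - l)^2$.

```python
import random


def empirical_cdf(samples, l):
    """Fraction of samples that are <= l."""
    return sum(s <= l for s in samples) / len(samples)


random.seed(0)  # arbitrary seed, for reproducibility
trials = 100_000
l1_samples, l3_samples = [], []
for _ in range(trials):
    x1, x2 = random.random(), random.random()
    l1_samples.append(min(x1, x2))      # L_1 = min(X_1, X_2)
    l3_samples.append(1 - max(x1, x2))  # L_3 = 1 - max(X_1, X_2)

for l in (0.1, 0.3, 0.5, 0.8):
    exact = 1 - (1 - l) ** 2  # CDF derived above, shared by L_1 and L_3
    print(f"l={l}: L_1 {empirical_cdf(l1_samples, l):.3f}, "
          f"L_3 {empirical_cdf(l3_samples, l):.3f}, exact {exact:.3f}")
```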

Now for $L_2$.

$ \begin{align}
F_{L_2}(x) &= P(|X_1 - X_2| \leq x) \\
&= 1 - P(|X_1 - X_2| > x)
\end{align}
$

Since $X_1$ and $X_2$ are independent and uniformly distributed, their joint distribution is uniform on the 1x1 square, so we can just calculate the area of the part of the square lying outside the band between the lines $X_2 = X_1 + x$ and $X_2 = X_1 - x$. That region is two right triangles, each with legs of length $1 - x$, so we get

$ P(|X_1 - X_2| > x) = 2 \times \left( \frac{(1 - x)^2}{2} \right) = (1 - x)^2 \\
\rightarrow F_{L_2}(x) = 1 - (1 - x)^2 = 2x - x^2 $

And so the PDF is 

$ f_{L_2}(x) = \frac{d}{dx} F_{L_2}(x) = 2 - 2x $
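The same kind of simulation works for the middle segment, assuming the two trees are independent Uniform(0, 1) draws as above. The empirical CDF should track $2x - x^2$, and the sample mean should sit near $1/3$, the mean of the density $2 - 2x$. Seed and trial count are arbitrary.

```python
import random

random.seed(1)  # arbitrary seed, for reproducibility
trials = 100_000
# L_2 = |X_2 - X_1| for two independent Uniform(0, 1) points
gaps = [abs(random.random() - random.random()) for _ in range(trials)]

for x in (0.1, 0.25, 0.5, 0.9):
    empirical = sum(g <= x for g in gaps) / trials
    exact = 2 * x - x ** 2  # F_{L_2}(x) derived above
    print(f"x={x}: empirical {empirical:.3f}, exact {exact:.3f}")

# The mean of the density 2 - 2x on [0, 1] is 1/3
print(f"mean gap: {sum(gaps) / trials:.3f} (exact 1/3)")
```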

It makes sense that they're all the same: by symmetry, none of the three pieces is special, so they should share one distribution.

Now we could proceed in a similar manner iteratively, using $\max$, $\min$ and convolutions, but if we reframe the problem, we can more easily get the general solution for any $k$-th order statistic.

Suppose we have random variables $X_1, X_2, \cdots , X_n$ (let's think of them as points on the line $[0,1]$) drawn independently from a standard uniform distribution, and let $X_{(1)} \leq X_{(2)} \leq \cdots \leq X_{(n)}$ denote the same points in sorted order. If we want the distribution of the location of the $k$-th order statistic $X_{(k)}$, notice that $X_{(k)} \leq x$ exactly when at least $k$ of the $n$ points land in $[0, x]$. Each point either lands there (with probability $x$) or doesn't, so the outcome is binary and we can formulate the CDF with the help of the binomial distribution. Recall that the binomial distribution gives the probability of observing exactly $j$ 'successes' in $n$ independent trials:

$ P(\text{exactly } j \text{ points} \leq x) = \binom{n}{j} x^j (1-x)^{n-j} $

In our case, we want the cumulative distribution of the order statistic $X_{(k)}$, so we need the sum over $j$ from $k$ to $n$.

$ F_{X_{(k)}}(x) = \sum_{j=k}^{n} \binom{n}{j} x^j (1-x)^{n-j} $
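A sketch that checks this CDF against simulation: the function sums the probabilities that at least $k$ of the $n$ points land in $[0, x]$, which is the event $X_{(k)} \leq x$. `order_stat_cdf` is a hypothetical helper name, and the choice $n = 7$, $k = 3$ is arbitrary.

```python
import math
import random


def order_stat_cdf(x, k, n):
    """P(X_(k) <= x) for n iid Uniform(0, 1) points: at least k of them land in [0, x]."""
    return sum(math.comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))


random.seed(2)  # arbitrary seed, for reproducibility
n, k, trials = 7, 3, 50_000
# Simulated k-th smallest of n uniforms (index k - 1 after sorting)
kth = [sorted(random.random() for _ in range(n))[k - 1] for _ in range(trials)]

for x in (0.2, 0.4, 0.6):
    empirical = sum(v <= x for v in kth) / trials
    print(f"x={x}: empirical {empirical:.3f}, formula {order_stat_cdf(x, k, n):.3f}")
```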

And there it is! To get the PDF, all we need to do is take the derivative with respect to $x$. I omit the details here, but we get

$ f_{X_{(k)}}(x) = \frac{n!}{(k - 1)! \, (n - k)!} \, x^{k - 1} (1 - x)^{n - k}, \quad \text{for } 0 \leq x \leq 1 $

It should be noted that this also happens to be the Beta distribution, specifically $\text{Beta}(k, n - k + 1)$, which crops up in a lot of interesting places.
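Rather than working through the omitted derivative by hand, one can check it numerically: a central finite difference of the binomial-sum CDF should agree with the closed-form density for every $k$. Both function names are hypothetical, and $n = 7$ and the step `h` are arbitrary choices.

```python
import math


def order_stat_cdf(x, k, n):
    """P(X_(k) <= x): at least k of n Uniform(0, 1) points land in [0, x]."""
    return sum(math.comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))


def order_stat_pdf(x, k, n):
    """Closed form n! / ((k-1)! (n-k)!) x^(k-1) (1-x)^(n-k), i.e. Beta(k, n-k+1)."""
    coeff = math.factorial(n) / (math.factorial(k - 1) * math.factorial(n - k))
    return coeff * x ** (k - 1) * (1 - x) ** (n - k)


n, h = 7, 1e-6
for k in range(1, n + 1):
    for x in (0.15, 0.5, 0.85):
        # Central finite difference of the CDF approximates its derivative
        numeric = (order_stat_cdf(x + h, k, n) - order_stat_cdf(x - h, k, n)) / (2 * h)
        assert abs(numeric - order_stat_pdf(x, k, n)) < 1e-5
print("finite-difference derivative of the CDF matches the closed-form PDF")
```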

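With a general form for the distribution, we next need to compute the probabilities below. The positions live on the unit interval, so the gap of 2 on the original length-10 scale becomes 0.2, and the list is written for seven trees.

$
\begin{align}
P(X_{(1)} &\geq 0.2) \\
P(X_{(2)} - X_{(1)} &\geq 0.2) \\
P(X_{(3)} - X_{(2)} &\geq 0.2) \\
&\vdots \\
P(X_{(7)} - X_{(6)} &\geq 0.2) \\
P(1 - X_{(7)} &\geq 0.2)
\end{align}
$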

Here again we could use convolutions, but these integrals get a bit unruly, so I will instead use the well-documented joint density of two order statistics (honestly, the spacing distribution for order statistics is also reasonably well documented, but let's not take all the fun out of it).

$ f_{X_{(i)}, X_{(j)}}(x, y) = \frac{n!}{(i - 1)! \, (j - i - 1)! \, (n - j)!} \, x^{i - 1} (y - x)^{j - i - 1} (1 - y)^{n - j}, \quad \text{for } 0 \leq x \leq y \leq 1 $
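As a sanity check on the joint density, it should integrate to 1 over the triangle $0 \leq x \leq y \leq 1$ for any valid $i < j$. This is a rough midpoint-rule sketch (a hand-rolled quadrature chosen to avoid extra dependencies, not a recommended integrator); the function names and the $(i, j)$ pairs are arbitrary, hypothetical choices.

```python
import math


def joint_order_density(x, y, i, j, n):
    """Joint density of (X_(i), X_(j)), i < j, for n iid Uniform(0, 1) points, 0 <= x <= y <= 1."""
    coeff = math.factorial(n) / (
        math.factorial(i - 1) * math.factorial(j - i - 1) * math.factorial(n - j)
    )
    return coeff * x ** (i - 1) * (y - x) ** (j - i - 1) * (1 - y) ** (n - j)


def integrate_over_triangle(f, m=200):
    """Midpoint rule on {0 <= x <= y <= 1}: outer grid in x, inner grid on [x, 1]."""
    dx = 1.0 / m
    total = 0.0
    for a in range(m):
        x = (a + 0.5) * dx
        dy = (1.0 - x) / m
        inner = sum(f(x, x + (b + 0.5) * dy) for b in range(m)) * dy
        total += inner * dx
    return total


n = 7
masses = {}
for i, j in [(1, 2), (3, 4), (2, 6), (1, 7)]:
    masses[(i, j)] = integrate_over_triangle(
        lambda x, y: joint_order_density(x, y, i, j, n)
    )
    print(f"(i, j) = ({i}, {j}): total probability ~ {masses[(i, j)]:.4f}")
```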

We want the distance $D$ from one order statistic to the next, so we can set $j=i+1$ and express the joint distribution of $X_{(i)}$ and $D$ as follows

$ f_{X_{(i)}, D}(x, d) = f_{X_{(i)}, X_{(i+1)}}(x, x + d) = \frac{n!}{(i - 1)! \, (n - i - 1)!} \, x^{i - 1} (1 - x - d)^{n - i - 1} $ 

Using this, we can find the marginal of $D$ by integrating over $x$. Note the $1-d$ in the upper limit of the integral, which comes from the change of variables $y=x+d$: the original region $0 \leq x \leq y \leq 1$ becomes $x+d = y \leq 1$, implying $x \leq 1-d$.

$ f_{D}(d) = \int_{0}^{1 - d} f_{X_{(i)}, D}(x, d) \, dx = \frac{n!}{(i - 1)! \, (n - i - 1)!} \int_{0}^{1 - d} x^{i - 1} (1 - x - d)^{n - i - 1} \, dx $

Evaluating this with software we get

$ f_D(d) = n(1-d)^{(n-1)}$
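The post does not say which software evaluated the integral. As a stand-in, here is a plain midpoint-rule evaluation of $\int_0^{1-d} f_{X_{(i)}, D}(x, d)\,dx$ compared with $n(1-d)^{n-1}$ for every interior spacing; the grid size, $n = 7$, and the helper name `spacing_density` are assumptions for illustration.

```python
import math


def spacing_density(d, i, n, m=2000):
    """Midpoint-rule value of the integral of f_{X_(i), D}(x, d) over x in [0, 1 - d]."""
    coeff = math.factorial(n) / (math.factorial(i - 1) * math.factorial(n - i - 1))
    h = (1.0 - d) / m
    total = 0.0
    for a in range(m):
        x = (a + 0.5) * h
        total += x ** (i - 1) * (1 - x - d) ** (n - i - 1)
    return coeff * total * h


n = 7
for i in range(1, n):  # spacing between X_(i) and X_(i+1), so i runs from 1 to n - 1
    for d in (0.05, 0.2, 0.5):
        numeric = spacing_density(d, i, n)
        closed_form = n * (1 - d) ** (n - 1)
        print(f"i={i}, d={d}: numeric {numeric:.5f}, n(1-d)^(n-1) {closed_form:.5f}")
```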

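Notice that this too is a Beta distribution, $B(1,n)$ (shape parameters $1$ and $n$)! It also doesn't depend on $i$, so every interior spacing has the same distribution.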

Of course, we could have just employed a symmetry argument to establish that each of these spacings must have the same distribution. The easiest one to calculate is the spacing from 0 to $X_{(1)}$, which is just the $k = 1$ case of the order-statistic density above, $n(1-x)^{n-1}$, i.e. $B(1,n)$.
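The symmetry claim can also be checked by brute force: simulate $n$ uniform trees, look at all $n + 1$ spacings (including the two end pieces), and estimate how often each one is at least 0.2 wide. If each spacing is $B(1, n)$, every estimate should be close to $(1 - 0.2)^n$. This sketch assumes $n = 7$ and a threshold of 0.2, matching the example above; the seed and trial count are arbitrary.

```python
import random

random.seed(3)  # arbitrary seed, for reproducibility
n, threshold, trials = 7, 0.2, 50_000
exceed = [0] * (n + 1)  # one counter per spacing, from the left end to the right end
for _ in range(trials):
    edges = [0.0] + sorted(random.random() for _ in range(n)) + [1.0]
    for s, (left, right) in enumerate(zip(edges, edges[1:])):
        if right - left >= threshold:
            exceed[s] += 1

print("B(1, n) tail (1 - 0.2)^n =", round((1 - threshold) ** n, 4))
for s, count in enumerate(exceed):
    print(f"spacing {s}: P(>= {threshold}) ~ {count / trials:.4f}")
```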

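So if we compute the probability that a single spacing is too narrow for Samsquanch to pass,

$ P(X_{(1)} < 0.2) = \int_0^{0.2} n(1-x)^{n-1} \, dx = 1 - 0.8^n $

and raise it to the eighth power (one factor for each of the eight spacings created by seven trees), we have our answer...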
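For concreteness, the independent-product answer for $n = 7$ trees and a threshold of 0.2 is quick arithmetic; this sketch only evaluates the expressions above.

```python
n = 7            # number of trees, matching X_(1) ... X_(7) above
threshold = 0.2  # gap of 2 on the length-10 scale, rescaled to [0, 1]

# P(one spacing < 0.2) = integral from 0 to 0.2 of n(1-x)^(n-1) dx = 1 - 0.8^n
p_single = 1 - (1 - threshold) ** n
# Product over the n + 1 = 8 spacings, *assuming* they are independent
naive = p_single ** (n + 1)

print(f"P(single spacing < {threshold}) = {p_single:.4f}")
print(f"naive product over {n + 1} spacings = {naive:.4f}")
```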

Except that DOESN'T WORK, because the spacing variables are NOT independent (sigh): they always sum to 1, so one wide gap forces the others to be narrower. I feel a little stupid. I should have realized way sooner that the approach wasn't valid :)
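A brute-force simulation makes the problem visible. It is not the analytic solution the post is still working toward, just a check: estimate the probability that every spacing is below 0.2 directly and compare it with the naive product. The sketch again assumes $n = 7$ trees and a threshold of 0.2 (seed and trial count arbitrary), and it asserts in every trial that the spacings sum to 1.

```python
import random

random.seed(4)  # arbitrary seed, for reproducibility
n, threshold, trials = 7, 0.2, 100_000
all_narrow = 0  # trials in which every spacing is too narrow for Samsquanch
for _ in range(trials):
    edges = [0.0] + sorted(random.random() for _ in range(n)) + [1.0]
    spacings = [right - left for left, right in zip(edges, edges[1:])]
    assert abs(sum(spacings) - 1.0) < 1e-9  # the constraint that ties the spacings together
    if max(spacings) < threshold:
        all_narrow += 1

simulated = all_narrow / trials
naive = (1 - (1 - threshold) ** n) ** (n + 1)  # product that assumes independence
print(f"simulated P(all spacings < {threshold}) ~ {simulated:.4f}")
print(f"naive independent product          = {naive:.4f}")
```

Any sizable gap between the two printed numbers is the cost of the independence assumption.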

So how should we do this?

I need some more time to figure this out! Updates will be forthcoming!