# Expected Area of Region on Random Binary Image

Suppose we have an infinitely large 2D image whose pixels are independently black or white with equal probability. What is the expected area of a connected region of same-colored pixels?

Let's first think about 1D images and simulate the expected length of a run of contiguous same-valued pixels. (Note: the simulation runs on a finite array, so the result is only approximate.)


In [12]:
import numpy as np
from typing import List

def make_binary_image_1d(n: int):
    return np.random.randint(0, 2, (n,))

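def get_lengths(line):
    """Return the lengths of the runs of equal consecutive pixels in `line`."""
    n = len(line)  # use the array's own length instead of the global N
    i, j = 1, 0    # i scans the line; j marks the start of the current run
    lengths: List[float] = []
    while i < n:
        if line[i] != line[j]:
            lengths.append(float(i - j))
            j = i
        i += 1
    lengths.append(float(i - j))  # close the final run
    return lengths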

N = 10
line = make_binary_image_1d(N)
print('binary image 1d:', line)
lengths = get_lengths(line)
print('lengths of each area:', lengths)
avg = np.mean(lengths)
print('average lengths:', avg)

binary image 1d: [1 0 0 1 0 0 1 1 1 0]
lengths of each area: [1.0, 2.0, 1.0, 2.0, 3.0, 1.0]
average lengths: 1.6666666666666667


In [14]:
N = 10000
M = 1000
avg_lengths: List[float] = []
for _ in range(M):
    line = make_binary_image_1d(N)
    lengths = get_lengths(line)
    avg_lengths.append(np.mean(lengths))
print('expected length:', np.mean(avg_lengths))

expected length: 2.0000178422341426


# Expected Length of Region on 1D Random Binary Image

The simulated result is close to $2$, which is easy to derive. Let $l$ be the length of a run, and suppose its first pixel is black. Then
$$
\begin{aligned}
E[l] &= P(\text{second is white}) \times 1 + P(\text{second is black}) \times (E[l] + 1) \\
     &= \frac 1 2 + \frac 1 2 \times (E[l] + 1) \\
E[l] &= 2
\end{aligned}
$$
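
The same value can be cross-checked by summing the distribution directly. This is a minimal sketch, assuming fair and independent pixels so that a run has length exactly $k$ with probability $(1/2)^k$; the function name and the truncation point `max_len` are illustrative choices, not part of the original notebook.

```python
def truncated_expected_run_length(max_len: int = 60) -> float:
    """Sum k * P(l = k) for k = 1..max_len, with P(l = k) = (1/2)**k.

    Assumption: pixels are fair and independent, so a run that has started
    continues with probability 1/2 at every step.
    """
    return sum(k * 0.5 ** k for k in range(1, max_len + 1))


expected_run_length = truncated_expected_run_length()
print('series estimate of E[l]:', expected_run_length)
```

The terms dropped by the truncation are far below floating-point precision, so the printed value agrees with $E[l] = 2$ to many digits.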


# Expected Area of Region on Random Binary Image

Maybe we can do similar math on the 2D problem. Let $A$ be the area of a region (say it's black). Then
$$
A = \sum_{i=1}^{n} s_i
$$
where $n$ is the number of lines the region spans and $s_i$ is the number of black pixels on the $i$-th line of the region. And
$$
s_i = \sum_{j=1}^{m_i} l_{ij}
$$
where $l_{ij}$ is the length of the $j$-th contiguous black segment on the $i$-th line. With $m_i$ black segments there are $m_i - 1$ white gaps between them, so the $i$-th line has a total of $2m_i - 1$ segments between its first and last black pixels.

We know that $E[l] = 2$. Therefore $E[s] = 2E[m]$.
However, it is hard to tell how many lines a region is expected to have, or how many segments a line is expected to have.

Consider one line of a black region. If every black pixel on this line has a white pixel directly below it on the next line, the region ends here. Otherwise the region continues for at least one more line. Approximating $s$ by its expectation $E[s]$, we get
$$
\begin{aligned}
E[n] &= 2^{-E[s]} \times 1 + (1-2^{-E[s]}) \times (E[n] + 1) \\
     &= 1 + (1-2^{-E[s]}) E[n] \\
E[n] &= 2^{E[s]} = 2^{2E[m]} \\
     &= 4^{E[m]}
\end{aligned}
$$
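
The recurrence above treats every line as ending the region with the same probability $2^{-E[s]}$, which makes $n$ a geometric random variable. The sketch below checks only that step, assuming a fixed, hypothetical number of black pixels per line (`s_fixed`); it is not a simulation of real 2D regions, and the seed and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

s_fixed = 2                      # hypothetical constant number of black pixels per line
stop_prob = 2.0 ** (-s_fixed)    # all s_fixed pixels below are white

# Number of lines until the region stops, under the constant-probability assumption.
line_counts = rng.geometric(stop_prob, size=200_000)

simulated_mean = line_counts.mean()
predicted_mean = 2.0 ** s_fixed  # the closed form E[n] = 2^s from the recurrence
print('simulated E[n]:', simulated_mean, 'predicted E[n]:', predicted_mean)
```

The two numbers should agree closely, which confirms the algebra of the recurrence but says nothing about whether replacing $s$ by $E[s]$ is justified.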

That leaves the final question: how do we calculate $E[m]$?


# Wrong Solution

At first I thought the two dimensions of the image were symmetric, so the total length of a line of the region (black segments plus the white gaps between them) should be expected to equal the number of lines $n$. Writing $l_{ij}'$ for the length of the $j$-th white gap on the $i$-th line, that gives
$$
\begin{aligned}
E[n] &= E\left[\sum_{j=1}^{m_i} l_{ij} + \sum_{j=1}^{m_i-1} l_{ij}'\right] \\
     &= (2E[m] - 1) \times 2 \\
     &= 4E[m] - 2
\end{aligned}
$$

Combining this with $E[n] = 4^{E[m]}$ gives an equation for $E[m]$:
$$
4^{E[m]} - 4E[m] + 2 = 0
$$

Yet this equation has no real solution: the left-hand side is positive for every real $E[m]$.
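
A quick numerical check of that claim, as a sketch: evaluate the left-hand side $4^x - 4x + 2$ on a grid and look at its minimum. The function name `lhs` and the grid range are illustrative choices.

```python
import numpy as np


def lhs(x):
    """Left-hand side of 4^x - 4x + 2 = 0, with x standing in for E[m]."""
    return 4.0 ** x - 4.0 * x + 2.0


# Outside [-5, 5] the function is clearly positive: for x < -5 the linear
# term gives -4x + 2 > 22, and for x > 5 the exponential 4^x dominates 4x.
xs = np.linspace(-5.0, 5.0, 100_001)
values = lhs(xs)

min_index = int(np.argmin(values))
min_x = float(xs[min_index])
min_value = float(values[min_index])
print('minimum of the left-hand side:', min_value, 'at x =', min_x)
```

The minimum is about $1.83$, reached near $x \approx 0.76$, so the curve never touches zero.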

Then I realized that, by symmetry, $n$ should match the length of the longest line of the region, and maybe we could find that from the distribution of line lengths.

But that is incorrect too. $n$ should be larger than the length of the longest line, because there might be several long lines offset from one another.

At this point I'm out of ideas. Let's run some simulations; maybe they will suggest new ones.
