<h3>Geometric distribution</h3>
<h4>Javier A. Tiniaco Leyba</h4>

<p>A simple application of the <a href="https://en.wikipedia.org/wiki/Geometric_distribution">geometric distribution</a>.<br/>
You can explore different distributions interactively with the <a href="http://mathlets.org/mathlets/probability-distributions/">MIT Mathlets</a>.</p>

<strong>On a certain island, each couple keeps having babies until they have a girl. Suppose that a girl and a boy are equally likely (probability 0.5 each) and that each baby's gender is independent of the previous ones. Under this policy, what is the ratio of boys to girls on the island?</strong>

<p>The process can be modeled as a geometric random variable. Either of the two common formulations of the geometric distribution leads to the same answer, and the process can also be simulated:</p>

<h4>Simulation</h4>

<p>X is the random variable that represents the number of trials (babies) needed to get a "success" (girl).</p>

In [10]:
from scipy.stats import geom

In [11]:
# Draw 100,000 samples; scipy's geom counts trials up to and including the first success
x = geom.rvs(p=0.5, size=100000)
x_mean = x.mean()
x_mean

1.99824

<p>The average number of trials (babies per couple) until a "success" (girl) is approximately two. Each couple has exactly one girl, so on average each couple has about one boy and one girl under this policy. As a result, the ratio of boys to girls is close to one.</p>
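<p>As a cross-check, the policy can also be simulated baby by baby instead of sampling from the geometric distribution. The following is a minimal sketch; the names <code>n_couples</code>, <code>p_girl</code> and the fixed seed are illustrative choices, not part of the original notebook.</p>

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed, chosen only for reproducibility
n_couples = 100_000
p_girl = 0.5

girls = 0
boys = 0
for _ in range(n_couples):
    # Each draw is one baby: a value below p_girl is a girl, otherwise a boy.
    # The couple keeps having babies until the first girl arrives.
    while rng.random() >= p_girl:
        boys += 1
    girls += 1

ratio = boys / girls
print(ratio)  # expected to be close to 1
```

<p>Every couple contributes exactly one girl, so <code>girls</code> equals <code>n_couples</code> and only the number of boys is random.</p>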

<h4>Mathematical formulation 1</h4>

<p>X represents the number of babies up to and including the first girl. We can use the analytical formula to calculate the expected number of trials:</p>

$$ E[X] = \mu_x = \frac{1}{p} = \frac{1}{0.5} = 2 $$

<p>This is the same result as the one obtained by simulation. By the same logic as before, the ratio of boys to girls is one.</p>
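<p>For reference, the formula 1/p can be recovered from the definition of the expected value. This is a standard derivation sketch, assuming the probability mass function P(X = k) = (1-p)^(k-1) p for k = 1, 2, ... and using the series identity \( \sum_{k \ge 1} k q^{k-1} = 1/(1-q)^2 \) for |q| &lt; 1, with q = 1 - p:</p>

$$ E[X] = \sum_{k=1}^{\infty} k \, (1-p)^{k-1} \, p = p \cdot \frac{1}{\left(1-(1-p)\right)^2} = \frac{p}{p^2} = \frac{1}{p} $$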

<h4>Mathematical formulation 2</h4>

<p>X represents the number of <strong>boys</strong> born before the first girl. The expected value in this case is:</p>

$$ E[X] = \mu_x = \frac{1-p}{p} = \frac{1-0.5}{0.5} = 1 $$

<p>Notice that in this case we are not counting the last baby, which we know is a girl. So even though the formulation and the result are different, we are still modeling the same process, and the ratio of boys to girls is one as in the other cases, since on average there is one boy for every girl.</p>
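<p>This second formulation can also be simulated. The sketch below assumes that passing <code>loc=-1</code> to <code>scipy.stats.geom</code> is an acceptable way to shift its support from {1, 2, ...} to {0, 1, ...}; the seed and sample size are arbitrary.</p>

```python
import numpy as np
from scipy.stats import geom

p = 0.5
rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# loc=-1 shifts the support down by one, so each sample is the
# number of boys born before the first girl (0, 1, 2, ...).
boys = geom.rvs(p, loc=-1, size=100_000, random_state=rng)

print(boys.min())            # 0 occurs when the first baby is a girl
print(boys.mean())           # expected to be close to (1 - p) / p = 1
print(geom.mean(p, loc=-1))  # analytical mean of the shifted distribution
```

<p>Subtracting one from samples drawn with the first formulation would give an equivalent result, since the two variables differ only by the final girl.</p>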



<h4>Conclusions</h4>

<p>Care must be taken when using the geometric distribution, since it can be formulated in different ways and the result must be interpreted accordingly. Both formulations lead to the same final answer here, but their expected values differ (2 versus 1). If the formulas are used without knowing the assumptions behind them, one could be seriously misled by the results. It is critically important to understand which formulation a software implementation uses in order to interpret its results.</p>
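<p>One quick way to check which formulation a library uses is to evaluate the probability of zero: it is 0 when the variable counts trials and p when it counts failures. The sketch below applies this check to <code>scipy.stats.geom</code> and, for comparison, to <code>scipy.stats.nbinom</code> with one required success, which counts failures before the first success. The choice of these two distributions is illustrative only.</p>

```python
from scipy.stats import geom, nbinom

p = 0.5

# scipy's geom counts trials, so zero is outside its support.
print(geom.pmf(0, p))   # probability of zero trials
print(geom.mean(p))     # 1 / p

# nbinom with n=1 counts failures before the first success,
# so zero failures has probability p.
print(nbinom.pmf(0, 1, p))
print(nbinom.mean(1, p))  # (1 - p) / p
```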