## Motivating Example

In the casino game craps, after the “point” has been set, two dice are rolled repeatedly until either the “point” or a 7 comes up, at which time the round ends. Suppose the point is 4. What is the probability that it takes more than 6 rolls for the round to end?

We can calculate the probability that the round ends on a given roll by directly counting the 36 possible outcomes of two dice: $P(\text{roll a 4 or a 7})=\frac{9}{36}$
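A brute-force count over all 36 ordered outcomes confirms the $\frac{9}{36}$ figure. This is a minimal sketch that treats the two dice as an ordered pair (die 1, die 2):

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 ordered outcomes (die 1, die 2)
outcomes = list(product(range(1, 7), repeat=2))

# Outcomes whose total ends the round when the point is 4
ending = [o for o in outcomes if sum(o) in (4, 7)]

p_end = Fraction(len(ending), len(outcomes))
print(len(ending), len(outcomes), p_end)  # 9 36 1/4
```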
 
But how do we use this to determine the probability that it takes more than 6 rolls for the round to end?

## Theory

- The setup is the same as for the binomial: we draw tickets from a box of 0s and 1s with replacement, but now we keep drawing **until** we get a 1. The random variable $X$ is the number of draws needed.

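- Theorem 14.1: Unsurprisingly, the PMF of the geometric distribution looks almost identical to the binomial PMF. For a box with $N_0$ zeros and $N_1$ ones ($N = N_0 + N_1$ tickets), the probability that the first 1 appears on draw $x = 1, 2, 3, \dots$ is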
    - $$
        f(x) = \frac{N_0^{x-1} N_1}{N^x}
        $$
    - Main difference: there is no binomial coefficient counting the possible orderings of the 1s
    - This is because the sequence ends the moment you get a 1, so the 1 must be the last draw and only one ordering is possible
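The PMF translates directly into code. The sketch below uses a hypothetical helper `geometric_pmf(x, n0, n1)`, with `n0` and `n1` standing in for $N_0$ and $N_1$, and checks it against SciPy's `geom.pmf`, which is parameterised by the success probability $p = N_1/N$ and counts trials starting at 1. The example box of 27 zeros and 9 ones mirrors the craps example (27 non-ending totals, 9 ending totals out of 36).

```python
from fractions import Fraction

from scipy import stats


def geometric_pmf(x: int, n0: int, n1: int) -> Fraction:
    """Hypothetical helper: f(x) = N0^(x-1) * N1 / N^x, first 1 on draw x."""
    if x < 1:
        return Fraction(0)  # X is at least 1: you always make one draw
    n = n0 + n1
    return Fraction(n0 ** (x - 1) * n1, n ** x)


# Craps box: 27 totals that do not end the round, 9 that do
n0, n1 = 27, 9
exact = [geometric_pmf(x, n0, n1) for x in range(1, 6)]
print(exact)  # [Fraction(1, 4), Fraction(3, 16), Fraction(9, 64), Fraction(27, 256), Fraction(81, 1024)]

# Compare with SciPy's geometric distribution (floating point)
scipy_vals = stats.geom.pmf(range(1, 6), n1 / (n0 + n1))
print(all(abs(float(e) - s) < 1e-12 for e, s in zip(exact, scipy_vals)))  # True
```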

### Proving the theorem

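- Total number of ways to make $x$ draws from $N$ tickets with replacement: $N^x$
- The first $x-1$ draws must be 0s, otherwise you would not get to make the $x$-th draw: $N_0^{x-1}$ ways
- The last draw must be a 1 by definition: $N_1$ ways
- Dividing the favourable count $N_0^{x-1} N_1$ by the total $N^x$ gives $f(x)$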
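The counting argument can be checked by brute force for a small box. In the sketch below the box contents and the number of draws `x` are arbitrary choices for illustration; it enumerates every ordered sequence of `x` draws with replacement and counts those whose first `x - 1` draws are 0 and whose last draw is 1.

```python
from itertools import product

# A small box: each ticket is a distinct object, so repeated values are listed separately
box = [0, 0, 0, 1, 1]          # N0 = 3 zeros, N1 = 2 ones, N = 5
n0, n1, n = box.count(0), box.count(1), len(box)
x = 4                          # number of draws

# All ordered sequences of x draws with replacement: N^x of them
sequences = list(product(box, repeat=x))

# Sequences in which the first 1 appears exactly on draw x
first_one_at_x = [s for s in sequences if all(t == 0 for t in s[:-1]) and s[-1] == 1]

print(len(sequences), n ** x)                   # 625 625
print(len(first_one_at_x), n0 ** (x - 1) * n1)  # 54 54
```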

### Visualising the distribution

- What happens when $\frac{N_1}{N}$ increases?
    - The higher the proportion of 1s, the more likely the sequence is to end early, so more of the probability mass sits at small values of $X$

```python
import numpy as np
from scipy import stats

# geom.pmf(k, p) = (1 - p)**(k - 1) * p for k = 1, 2, ...
# k = 0 is outside the support, so its probability is 0
x = np.arange(10)
for p in (0.2, 0.5, 0.8):
    print(repr(stats.geom.pmf(x, p)))
```

```
array([0.        , 0.2       , 0.16      , 0.128     , 0.1024    ,
       0.08192   , 0.065536  , 0.0524288 , 0.04194304, 0.03355443])
array([0.        , 0.5       , 0.25      , 0.125     , 0.0625    ,
       0.03125   , 0.015625  , 0.0078125 , 0.00390625, 0.00195312])
array([0.000e+00, 8.000e-01, 1.600e-01, 3.200e-02, 6.400e-03, 1.280e-03,
       2.560e-04, 5.120e-05, 1.024e-05, 2.048e-06])
```
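One way to put a number on "more mass at small values" is the probability that the sequence ends within the first three draws, $P(X \le 3) = 1 - (1 - p)^3$. The sketch below uses SciPy's `geom.cdf`; the cutoff of 3 is an arbitrary choice.

```python
from scipy import stats

ps = (0.2, 0.5, 0.8)

# P(X <= 3): chance that the first 1 appears within three draws
within_three = [stats.geom.cdf(3, p) for p in ps]
for p, prob in zip(ps, within_three):
    print(p, round(float(prob), 3))
# 0.2 0.488
# 0.5 0.875
# 0.8 0.992
```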

## Solving the example

- When rolling 2 dice, there are 36 possible outcomes
- There is a $\frac{6}{36}$ chance of a 7 and a $\frac{3}{36}$ chance of a 4, so each roll has a $\frac{9}{36} = \frac{1}{4}$ chance of ending the round
- We want the probability that it takes more than 6 rolls, $P(X > 6)$, which is $1 - F(6)$
- To find this, we take $1 - F(6) = 1 - f(1) - f(2) - \dots - f(6)$
- Simulation and calculation below
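Because $X > 6$ means the first six rolls all fail to end the round, $P(X > 6)$ also has the closed form $(1 - p)^6$. A short exact check with `fractions.Fraction`, alongside SciPy's survival function `geom.sf`, which computes $1 - F(k)$ directly:

```python
from fractions import Fraction

from scipy import stats

p = Fraction(9, 36)              # chance a single roll ends the round

# P(X > 6) = P(first six rolls are neither 4 nor 7) = (1 - p)^6
exact = (1 - p) ** 6
print(exact, float(exact))       # 729/4096 0.177978515625

# Same quantity via SciPy's survival function, sf(k) = 1 - F(k)
print(stats.geom.sf(6, float(p)))
```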

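```python
import numpy as np
from scipy import stats

# Exact: P(X > 6) = 1 - F(6); geom.pmf is 0 at k = 0, so range(7) sums f(1)..f(6)
print(1 - np.sum(stats.geom.pmf(range(7), 9/36)))

# Simulation: roll two dice 100,000 times; 1 marks a roll totalling 4 or 7
population = range(1, 7)
samples = np.array([int(np.sum(np.random.choice(population, size=2, replace=True)) in [4, 7])
                    for _ in range(100_000)])

# Split the roll sequence into rounds, each ending with the roll that was a 4 or 7
split_index = np.where(samples == 1)[0]
split_samples = np.split(samples, split_index + 1)

# The final piece is an unfinished (possibly empty) round, so drop it
rounds = split_samples[:-1]
print(np.mean([len(r) > 6 for r in rounds]))
```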

```
0.177978515625
0.17955212479395327
```
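The list-comprehension simulation is slow because it draws one pair of dice at a time. A vectorised alternative (a different simulation design from the one above, not a drop-in replacement) uses NumPy's `Generator` API: simulate many independent rounds, look only at their first six rolls, and count the rounds in which none of those rolls totals 4 or 7. The seed and the number of rounds are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_rounds = 200_000

# Shape (rounds, 6 rolls, 2 dice): only the first six rolls matter for X > 6
dice = rng.integers(1, 7, size=(n_rounds, 6, 2))
totals = dice.sum(axis=2)

# A round lasts more than 6 rolls iff none of its first six totals is 4 or 7
survives = ~np.isin(totals, [4, 7]).any(axis=1)
estimate = survives.mean()
print(estimate)  # should be close to 729/4096 ≈ 0.178
```

Because it never builds whole rounds, this version avoids the bookkeeping of splitting a long roll sequence and discarding the unfinished tail.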