Question:
* If we roll five distinct six-sided dice, what is the probability that the number of 1s rolled is equal to the number of 2s rolled?

This combinatorics question can be solved directly without a computer. (If you're interested, try to figure out where these numbers come from, or try to come up with an alternative method for getting the same answer.)
$$
\text{prob} = \left(\frac{4}{6}\right)^5 + 20 \, \left(\frac{1}{6}\right)^2 \left(\frac{4}{6}\right)^3 + 30 \,\left(\frac{1}{6}\right)^4 \left(\frac{4}{6}\right) = \frac{101}{324} \approx 0.311728395
$$
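One way to read the formula (a spoiler for the exercise above): the three terms correspond to rolling zero, one, or two each of 1s and 2s, and the coefficients 1, 20, 30 count the ways to choose which dice show the 1s and which show the 2s. The sketch below checks that reading with exact rational arithmetic. The names `formula_prob` and `n_dice` are illustrative and not part of the notebook.

```python
from fractions import Fraction
from math import comb

def formula_prob(n_dice=5):
    """Sum over k = number of 1s = number of 2s (hypothetical helper)."""
    total = Fraction(0)
    for k in range(n_dice // 2 + 1):
        # Choose k dice to show 1, then k of the remaining dice to show 2.
        ways = comb(n_dice, k) * comb(n_dice - k, k)
        # Each chosen die has probability 1/6; every other die avoids 1 and 2 (4/6).
        total += ways * Fraction(1, 6) ** (2 * k) * Fraction(4, 6) ** (n_dice - 2 * k)
    return total

print(formula_prob())         # 101/324
print(float(formula_prob()))
```

For five dice the coefficients come out as 1, 20, and 30, matching the displayed formula.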
In this notebook, we'll solve the question in several different ways in Python.
* [A probability estimate using NumPy](#A-probability-estimate-using-NumPy)
* [A probability estimate using pandas](#A-probability-estimate-using-pandas)
* [The exact probability using itertools.product](#The-exact-probability-using-itertools.product)

In [1]:
import numpy as np
import pandas as pd
from itertools import product

## A probability estimate using NumPy

In [2]:
rng = np.random.default_rng()

In [3]:
rng.integers?
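The help text for `rng.integers` is not reproduced here, but the detail that matters for dice is that the upper bound is excluded by default. A minimal sketch of that behavior follows; the seed and sample size are arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for repeatability

# Default: the upper bound is excluded, so integers(1, 7) yields values 1..6.
rolls = rng.integers(1, 7, size=1000)
print(rolls.min(), rolls.max())

# endpoint=True makes the upper bound inclusive, so integers(1, 6, ...) is an equivalent die.
rolls_inclusive = rng.integers(1, 6, size=1000, endpoint=True)
print(rolls_inclusive.min(), rolls_inclusive.max())
```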

In [6]:
A = rng.integers(1,7,size=(10,5))
A

array([[1, 4, 2, 6, 6],
       [5, 6, 4, 3, 5],
       [4, 4, 3, 4, 3],
       [2, 1, 3, 2, 6],
       [2, 4, 2, 6, 4],
       [6, 3, 3, 1, 4],
       [4, 2, 3, 4, 1],
       [1, 5, 5, 3, 3],
       [6, 5, 1, 3, 5],
       [3, 1, 6, 2, 5]])

In [7]:
A == 1

array([[ True, False, False, False, False],
       [False, False, False, False, False],
       [False, False, False, False, False],
       [False,  True, False, False, False],
       [False, False, False, False, False],
       [False, False, False,  True, False],
       [False, False, False, False,  True],
       [ True, False, False, False, False],
       [False, False,  True, False, False],
       [False,  True, False, False, False]])

In [10]:
(A == 1).sum(axis=1)

array([1, 0, 0, 1, 0, 1, 1, 1, 1, 1])
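To see what `axis=1` is doing, it may help to compare it with the other choices on a small fixed array. The array `B` below is made up for this illustration and is not one of the random arrays above.

```python
import numpy as np

# A small fixed array: 3 rolls (rows) of 5 dice (columns).
B = np.array([[1, 4, 2, 1, 6],
              [5, 6, 4, 3, 5],
              [2, 1, 1, 1, 2]])

mask = (B == 1)          # Boolean array, True where a die shows 1
print(mask.sum(axis=1))  # count per row (per roll): [2 0 3]
print(mask.sum(axis=0))  # count per column (per die position): [1 1 1 2 0]
print(mask.sum())        # total over the whole array: 5
```

Since each row is one roll of five dice, `axis=1` is the version that gives the number of 1s per roll.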

In [11]:
(A == 2).sum(axis=1)

array([1, 0, 0, 2, 2, 0, 1, 0, 0, 1])

In [12]:
((A == 1).sum(axis=1)) == ((A == 2).sum(axis=1))

array([ True,  True,  True, False, False, False,  True, False, False,
        True])

In [14]:
(((A == 1).sum(axis=1)) == ((A == 2).sum(axis=1))).sum()

5

In [17]:
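def est_prob(n):
    # Simulate n rolls of five dice, one roll per row.
    A = rng.integers(1,7,size=(n,5))
    # Proportion of rows in which the number of 1s equals the number of 2s.
    return (((A == 1).sum(axis=1)) == ((A == 2).sum(axis=1))).sum()/n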

In [21]:
est_prob(10**7)

0.3117939
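How close should we expect a simulated estimate to be? A rough sketch: each roll is a success with probability p, so the proportion of successes in n rolls has standard error sqrt(p(1-p)/n). The code below applies that standard formula to the run above; the variable names are illustrative.

```python
from math import sqrt

p_exact = 101 / 324    # exact value from the formula at the top
n = 10**7              # number of simulated rolls used above
estimate = 0.3117939   # the value returned by est_prob(10**7) in the run above

# Standard error of a sample proportion.
std_err = sqrt(p_exact * (1 - p_exact) / n)
print(std_err)

# How many standard errors the estimate is from the exact value.
print(abs(estimate - p_exact) / std_err)
```

The standard error is roughly 1.5e-4, and the estimate above lands within one standard error of 101/324.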

In [22]:
%%timeit
est_prob(10**7)

629 ms ± 7.78 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


## A probability estimate using pandas

In [23]:
A = rng.integers(1,7,size=(10,5))
A

array([[5, 6, 3, 6, 5],
       [1, 2, 6, 4, 3],
       [3, 3, 5, 2, 6],
       [5, 5, 2, 6, 5],
       [2, 1, 5, 3, 2],
       [4, 4, 5, 3, 1],
       [5, 4, 3, 3, 1],
       [4, 4, 5, 4, 1],
       [4, 6, 6, 5, 4],
       [1, 1, 6, 1, 4]])

In [25]:
df = pd.DataFrame(A)
df

   0  1  2  3  4
0  5  6  3  6  5
1  1  2  6  4  3
2  3  3  5  2  6
3  5  5  2  6  5
4  2  1  5  3  2
5  4  4  5  3  1
6  5  4  3  3  1
7  4  4  5  4  1
8  4  6  6  5  4
9  1  1  6  1  4


In [27]:
(df == 1).sum(axis=1)

0    0
1    1
2    0
3    0
4    1
5    1
6    1
7    1
8    0
9    3
dtype: int64

In [28]:
df2 = df.copy()

In [36]:
df2["ones"] = (df == 1).sum(axis=1)
df2["twos"] = (df == 2).sum(axis=1)
df2["equal"] = (df2["ones"] == df2["twos"])

In [38]:
b = True

In [39]:
type(b)

bool

In [40]:
int(b)

1

In [41]:
int("one")

ValueError: invalid literal for int() with base 10: 'one'
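The `int(b)` cell above is the reason summing a Boolean column works as a count: `True` behaves like 1 and `False` like 0. A minimal sketch with a made-up Series:

```python
import pandas as pd

flags = pd.Series([True, True, False, False, True])

print(flags.sum())   # number of True entries: 3
print(flags.mean())  # proportion of True entries: 0.6
```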

In [42]:
df2

   0  1  2  3  4  ones  twos  equal
0  5  6  3  6  5     0     0   True
1  1  2  6  4  3     1     1   True
2  3  3  5  2  6     0     1  False
3  5  5  2  6  5     0     1  False
4  2  1  5  3  2     1     2  False
5  4  4  5  3  1     1     0  False
6  5  4  3  3  1     1     0  False
7  4  4  5  4  1     1     0  False
8  4  6  6  5  4     0     0   True
9  1  1  6  1  4     3     0  False


In [43]:
df2["equal"]

0     True
1     True
2    False
3    False
4    False
5    False
6    False
7    False
8     True
9    False
Name: equal, dtype: bool

In [44]:
df2["equal"].sum()

3

In [48]:
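def est_prob2(n):
    # Simulate n rolls of five dice and wrap them in a DataFrame (one roll per row).
    A = rng.integers(1,7,size=(n,5))
    df = pd.DataFrame(A)
    # Work on a copy so the new columns are not counted as dice.
    df2 = df.copy()
    df2["ones"] = (df == 1).sum(axis=1)
    df2["twos"] = (df == 2).sum(axis=1)
    df2["equal"] = (df2["ones"] == df2["twos"])
    # Summing a Boolean column counts the True entries.
    return (df2["equal"].sum())/n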

In [49]:
est_prob2(1000)

0.315

In [50]:
%%timeit
est_prob2(10**7)

1.04 s ± 160 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
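The copy and the extra columns are convenient for inspecting the DataFrame but are not needed to get the estimate. A possible leaner variant is sketched below; the name `est_prob2_lean` and the optional `rng` parameter are assumptions made for this illustration, and no timing claim is made about it.

```python
import numpy as np
import pandas as pd

def est_prob2_lean(n, rng=None):
    """Estimate the probability without adding columns to a copy (hypothetical variant)."""
    rng = np.random.default_rng() if rng is None else rng
    df = pd.DataFrame(rng.integers(1, 7, size=(n, 5)))
    ones = (df == 1).sum(axis=1)   # number of 1s in each roll
    twos = (df == 2).sum(axis=1)   # number of 2s in each roll
    # The mean of a Boolean Series is the proportion of True entries.
    return (ones == twos).mean()

print(est_prob2_lean(100_000))
```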


## The exact probability using itertools.product

In [51]:
product(["a","b","c"],[0,1,10])

<itertools.product at 0x7fefc8ac7340>

In [52]:
list(product(["a","b","c"],[0,1,10]))

[('a', 0),
 ('a', 1),
 ('a', 10),
 ('b', 0),
 ('b', 1),
 ('b', 10),
 ('c', 0),
 ('c', 1),
 ('c', 10)]

In [53]:
for x in product(["a","b","c"],[0,1,10]):
    print(x)

('a', 0)
('a', 1)
('a', 10)
('b', 0)
('b', 1)
('b', 10)
('c', 0)
('c', 1)
('c', 10)
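The next cell uses the `repeat` keyword of `product`. As a small sketch of what it means, `product(x, repeat=2)` produces the same tuples as `product(x, x)`. The three-sided die here is a made-up example chosen to keep the output short.

```python
from itertools import product

faces = range(1, 4)  # a hypothetical three-sided die, to keep the output short

pairs_explicit = list(product(faces, faces))
pairs_repeat = list(product(faces, repeat=2))

print(pairs_repeat)
print(pairs_explicit == pairs_repeat)  # True
print(len(pairs_repeat))               # 9
```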


In [56]:
for x in product(range(1,7),repeat=5):
    print(x)

(1, 1, 1, 1, 1)
(1, 1, 1, 1, 2)
(1, 1, 1, 1, 3)
(1, 1, 1, 1, 4)
(1, 1, 1, 1, 5)
(1, 1, 1, 1, 6)
(1, 1, 1, 2, 1)
(1, 1, 1, 2, 2)
[... output truncated: 7776 tuples in total ...]
(6, 6, 6, 6, 3)
(6, 6, 6, 6, 4)
(6, 6, 6, 6, 5)
(6, 6, 6, 6, 6)


In [57]:
i = product(range(1,7),repeat=5)

In [58]:
next(i)

(1, 1, 1, 1, 1)

In [59]:
next(i)

(1, 1, 1, 1, 2)

In [60]:
tup = next(i)
tup

(1, 1, 1, 1, 3)

In [61]:
tup.count(1)

4

In [62]:
tup.count(2)

0

In [63]:
tup.count(1) == tup.count(2)

False

In [64]:
type(i)

itertools.product

In [65]:
len(i)

TypeError: object of type 'itertools.product' has no len()
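An `itertools.product` object is an iterator: it has no length, and each element is produced only once. One way to count what it contains without building a list is sketched below, using two dice to keep things small.

```python
from itertools import product

it = product(range(1, 7), repeat=2)  # two dice, to keep things small
first = next(it)                     # consumes (1, 1)

remaining = sum(1 for _ in it)       # counts what is left and exhausts the iterator
print(first, remaining)              # (1, 1) 35
print(list(it))                      # [] because the iterator is now empty
```

This is also why the loop further down creates a fresh `product(...)` rather than reusing `i`.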

In [66]:
t = 0
s = 0
for tup in product(range(1,7),repeat=5):
    t = t+1
    if tup.count(1) == tup.count(2):
        s = s+1
p = s/t
p

0.3117283950617284

In [67]:
101/324

0.3117283950617284
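The two floats above agree, but the comparison can also be made exactly. A short sketch using `fractions.Fraction` from the standard library (the variable names are illustrative):

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=5))

# True counts as 1, so this sums the favorable outcomes.
favorable = sum(tup.count(1) == tup.count(2) for tup in rolls)

exact = Fraction(favorable, len(rolls))  # reduced automatically
print(favorable, len(rolls))             # 2424 7776
print(exact)                             # 101/324
```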

In [68]:
t

7776

In [69]:
6**5

7776

In [70]:
len(list(product(range(1,7),repeat=5)))

7776

In [71]:
def exact_prob():
    t = 0  # total number of outcomes
    s = 0  # outcomes with as many 1s as 2s
    for tup in product(range(1,7),repeat=5):
        t = t+1
        if tup.count(1) == tup.count(2):
            s = s+1
    p = s/t
    return p

In [72]:
%%timeit
# Very fast in this case, because 6**5 = 7776 is a small number, but this method
# doesn't generalize as well as the NumPy and pandas methods.
exact_prob()

2.22 ms ± 189 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
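To make the scaling concern concrete: with n dice the loop visits 6**n tuples, so the work grows by a factor of 6 with each added die. The sketch below generalizes the function to n dice; the name `exact_prob_n` and its parameter are assumptions made for this illustration.

```python
from itertools import product

def exact_prob_n(n_dice):
    """Exact probability by brute force over all 6**n_dice outcomes (hypothetical helper)."""
    total = 0
    favorable = 0
    for tup in product(range(1, 7), repeat=n_dice):
        total += 1
        if tup.count(1) == tup.count(2):
            favorable += 1
    return favorable / total

# Fine for a handful of dice.
print(exact_prob_n(5))

# Number of outcomes the loop would have to visit as the number of dice grows.
for n_dice in (5, 10, 15):
    print(n_dice, 6 ** n_dice)
```

At 10 dice the loop already has over 60 million tuples to visit, whereas the simulation cost is set by the number of sampled rolls rather than by the size of the outcome space.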
