# The Birthday Paradox
## *How Big Does a Group Need to Be?*

---

Imagine you walk into a room with 22 other people — 23 total. Nobody knows each other. Nothing was planned.

What are the chances that at least two people in that room share the same birthday? Not the same birth *year* — just the same month and day.

Take a guess before you read any further. Write it down if you can.

Most people's intuition is wildly off on this one — and understanding *why* is the whole point.

## A Clue: Think About Pairs

Here's where your intuition likely goes wrong. When you hear "23 people," your brain thinks: *"There are 365 days in a year and only 23 people — the odds must be tiny."*

But the question isn't whether someone shares *your* birthday. It's whether *any two people* in the room match. And 23 people can be arranged into $\binom{23}{2} = 253$ different pairs. That's a lot more comparisons than you might expect.
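To see how quickly the number of pairs outpaces the number of people, here is a small sketch using the standard library's `math.comb`. The helper name `count_pairs` and the group sizes are illustrative choices, not part of the original notebook.

```python
import math


def count_pairs(n: int) -> int:
    """Number of distinct pairs in a group of n people: n choose 2."""
    return math.comb(n, 2)


# Illustrative group sizes (arbitrary picks)
for n in [5, 10, 23, 50]:
    print(f"{n:>3} people -> {count_pairs(n):>5,} pairs")
```

Doubling the group size roughly quadruples the number of pairs, which is why the match probability climbs so much faster than a people-by-people intuition suggests.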

Assuming 365 equally likely birthdays and ignoring leap years, the exact calculation starts with the chance that *nobody* in a group of $n$ people matches. Each new person must land on a birthday not yet taken:

$$P(\text{no match}) = \frac{365}{365} \times \frac{364}{365} \times \frac{363}{365} \times \cdots \times \frac{365-n+1}{365}$$
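
As a small worked instance of this product, take $n = 3$ (an arbitrary illustrative choice). The first person can have any birthday, the second must avoid one day, and the third must avoid two:

$$P(\text{no match}) = \frac{365}{365} \times \frac{364}{365} \times \frac{363}{365} = \frac{132132}{133225} \approx 0.9918$$

With only three people, all-distinct birthdays are nearly certain. Each additional person multiplies in another factor below 1, so the product keeps shrinking.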

The chance of at least one match is then $1 - P(\text{no match})$. But don't take the formula's word for it — let's watch it play out.

In [None]:
import random
import math
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import seaborn as sns
import ipywidgets as widgets

sns.set_theme(style="ticks", font_scale=1.2)

In [None]:
def exact_birthday_probability(n: int) -> float:
    """Exact probability that at least 2 people in a group of n share a birthday."""
    if n > 365:
        return 1.0
    p_no_match = 1.0
    for k in range(n):
        p_no_match *= (365 - k) / 365
    return 1 - p_no_match


def simulate_birthday_probability(n: int, trials: int = 5000) -> float:
    """Monte Carlo estimate of the birthday match probability."""
    matches = 0
    for _ in range(trials):
        birthdays = [random.randint(1, 365) for _ in range(n)]
        if len(birthdays) != len(set(birthdays)):
            matches += 1
    return matches / trials


# Pre-compute the full curve (exact)
group_sizes  = list(range(1, 81))
exact_probs  = [exact_birthday_probability(n) for n in group_sizes]

Drag the slider to change the group size. Start small and watch the curve climb. *How many people does it take to cross 50%?* Keep going — how fast does it reach near-certainty?

Was your original guess close?

In [None]:
@widgets.interact(n=widgets.IntSlider(
    value=23, min=2, max=80, step=1,
    description="Group size:",
    style={"description_width": "initial"},
    layout=widgets.Layout(width="500px"),
))
def plot_birthday(n):
    p_exact = exact_birthday_probability(n)
    pairs   = n * (n - 1) // 2

    fig, ax = plt.subplots(figsize=(9, 5))

    ax.plot(group_sizes, [p * 100 for p in exact_probs],
            color="#5B8FB9", linewidth=2.5, zorder=2, label="Exact probability")

    ax.fill_between(group_sizes[:n], [p * 100 for p in exact_probs[:n]],
                     alpha=0.15, color="#5B8FB9")

    ax.scatter([n], [p_exact * 100], color="#E8575A", s=120, zorder=5,
               label=f"n = {n}  →  {p_exact*100:.1f}%")

    ax.axhline(50, color="gray", linestyle="--", linewidth=1, label="50% threshold")

    ax.set_xlabel("Number of People in the Group", fontsize=12)
    ax.set_ylabel("Probability of a Shared Birthday (%)", fontsize=12)
    ax.set_title("Birthday Paradox — Probability vs. Group Size",
                 fontsize=14, weight="bold")
    ax.set_ylim(0, 105)
    ax.set_xlim(1, 80)
    ax.legend(fontsize=11)

    sns.despine()
    plt.tight_layout()
    plt.show()

    print(f"Group of {n} people")
    print(f"  Possible birthday pairs: {pairs:,}")
    print(f"  Probability of a shared birthday: {p_exact*100:.1f}%")
    if p_exact >= 0.5:
        print("  ✅ More likely than not!")
    else:
        print("  Still less than 50%. Keep adding people!")

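If the slider is not available (for example, in a static export of this notebook), the thresholds can also be found programmatically. The sketch below is self-contained and repeats the exact formula from above; the names `shared_birthday_probability` and `smallest_group_for`, and the target values, are hypothetical choices for illustration.

```python
def shared_birthday_probability(n: int, days: int = 365) -> float:
    """Exact chance that at least two of n people share a birthday."""
    p_no_match = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays already taken
        p_no_match *= (days - k) / days
    return 1 - p_no_match


def smallest_group_for(target: float, days: int = 365) -> int:
    """Smallest group size whose match probability reaches `target` (0 to 1)."""
    n = 1
    while shared_birthday_probability(n, days) < target:
        n += 1
    return n


# Illustrative targets: even odds, 90%, and 99%
for target in (0.50, 0.90, 0.99):
    print(f"{target:.0%} reached at n = {smallest_group_for(target)}")
```

The search is a plain linear scan, which is fine here because the probability reaches 1 once the group exceeds the number of days.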

The curve looks convincing, but it came from a formula. Can we trust it? Let's verify by actually running the experiment: generating thousands of random groups and counting how many contain a birthday match.

In [None]:
# Sample a few checkpoints and compare exact vs simulated
checkpoints = [10, 23, 30, 50, 70]

print(f"{'Group Size':>12} | {'Exact %':>10} | {'Simulated %':>12}")
print("-" * 40)
for n in checkpoints:
    exact = exact_birthday_probability(n) * 100
    sim   = simulate_birthday_probability(n, trials=10000) * 100
    print(f"{n:>12} | {exact:>9.1f}% | {sim:>11.1f}%")

  Group Size |    Exact % |  Simulated %
----------------------------------------
          10 |      11.7% |        12.3%
          23 |      50.7% |        50.9%
          30 |      70.6% |        70.9%
          50 |      97.0% |        97.3%
          70 |      99.9% |        99.9%
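The simulated column will not match the exact column digit for digit, and it will change from run to run. A rough way to judge whether a gap is just sampling noise is the standard error of a proportion, $\sqrt{p(1-p)/\text{trials}}$, which assumes the trials are independent. The helper name `monte_carlo_standard_error` below is a hypothetical one, and the probabilities are the rounded exact values from the table.

```python
import math


def monte_carlo_standard_error(p: float, trials: int) -> float:
    """Standard error of a simulated proportion, assuming independent trials."""
    return math.sqrt(p * (1 - p) / trials)


# Rounded exact probabilities for group sizes 10, 23, and 50
for p in (0.117, 0.507, 0.970):
    se = monte_carlo_standard_error(p, trials=10_000)
    print(f"p = {p:.3f}  ->  typical noise of about ±{se * 100:.2f} percentage points")
```

Gaps of a few tenths of a percentage point at 10,000 trials are therefore what random variation alone would produce, and quadrupling the number of trials only halves the noise.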


## Why This Matters

Think about what just happened. You had an intuition, and it was probably wrong. Not because you're bad at math, but because human brains tend to reason about probability in a linear, one-at-a-time way. We imagine adding people one by one. But the chance of a match doesn't grow with the number of people; it compounds through *pairs*, and the number of pairs, $n(n-1)/2$, grows roughly with the square of the group size.
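
One way to make the "pairs" argument concrete is a common back-of-the-envelope approximation that this notebook does not otherwise use: pretend each pair is an independent 1-in-365 chance of matching. Pairs are not truly independent, so this is only a rough sketch, and the function names are hypothetical.

```python
import math


def exact_probability(n: int, days: int = 365) -> float:
    """Exact chance of at least one shared birthday among n people."""
    p_no_match = 1.0
    for k in range(n):
        p_no_match *= (days - k) / days
    return 1 - p_no_match


def pairs_approximation(n: int, days: int = 365) -> float:
    """Approximation that treats every pair as an independent 1/days chance."""
    pairs = math.comb(n, 2)
    # Chance that none of the pairs match, if pairs were independent
    return 1 - ((days - 1) / days) ** pairs


for n in (10, 23, 50):
    print(f"n = {n:>2}: exact {exact_probability(n):.3f}, "
          f"pairs approximation {pairs_approximation(n):.3f}")
```

The approximation tracks the exact curve closely for these group sizes, which supports the idea that the number of pairs, not the number of people, drives the result.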

That gap between intuition and reality is exactly where data science lives. When your gut says one thing and the data says another, the data wins. And sometimes the best way to believe the math is to run the experiment yourself.

---

*I know someone from my elementary school class who shares my birthday.*