# [Can You Sweep the Series?](https://thefiddler.substack.com/p/can-you-sweep-the-series)
## May 9 2025

## Problem
_On the ESPN show, “First Take,” the discussion turned to the NBA’s New York Knicks, who would be facing the favored Boston Celtics in the best-of-seven Eastern Conference Semifinals. (Note that this segment aired prior to the beginning of the series.) The question was whether the Knicks were more likely to be “swept” (i.e., lose the series in four games) or for the series to go to seven games. Here’s what Stephen had to say:_

_'I got [the Knicks] losing this in five games, which means they’re closer to a sweep than a seven-game series. That’s how I’m looking at it right now.'_

_Let’s look at the first part of Stephen’s statement, that he believed the Knicks would lose to the Celtics in five games._

_Let p represent the probability the Celtics win any given game in the series. You should assume that p is constant (which means there’s no home-court advantage) and that games are independent._

_For certain values of p, the likeliest outcome is indeed that the Celtics will win the series in exactly five games. While this probability is always less than 50 percent, this outcome is more likely than the Celtics winning or losing in some other specific number of games. In particular, this range can be specified as a < p < b._

_Determine the values of a and b._

## Solution

Let's first establish an expression for the probability that the Celtics win the series in 4, 5, 6, or 7 games. In every case, the Celtics win 4 games, and the last game must be one of those wins for the series to end at exactly that number of games. The first $n-1$ games therefore contain exactly 3 Celtics wins in any order, so we can employ a modification of the binomial distribution to determine the probability.

$$
\begin{align*}
&p_4 = \binom{3}{3} p^4 (1-p)^0 \\
&p_5 = \binom{4}{3} p^4 (1-p)^1 \\
&\vdots \\
&p_n = \binom{n-1}{3} p^4 (1-p)^{n-4}
\end{align*}
$$
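As a quick sanity check on the formula, here is a minimal sketch in Python. The function name `win_in_n` and the value `p = 0.65` are illustrative choices, not part of the puzzle. Applying the same formula to the Knicks with $1-p$ in place of $p$, the eight possible outcomes should have probabilities that sum to 1.

```python
from math import comb

def win_in_n(p, n):
    """Probability that a team with per-game win probability p
    wins a best-of-seven series in exactly n games (n = 4..7)."""
    # The team wins game n and exactly 3 of the first n - 1 games.
    return comb(n - 1, 3) * p**4 * (1 - p) ** (n - 4)

p = 0.65  # illustrative value only
celtics = {n: win_in_n(p, n) for n in range(4, 8)}
knicks = {n: win_in_n(1 - p, n) for n in range(4, 8)}

# Every series ends with one team winning in 4 to 7 games.
total = sum(celtics.values()) + sum(knicks.values())
print(celtics)
print(total)
```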

Now we need to find the range of $p$ for which a five-game Celtics win is the most probable outcome. One boundary occurs where a five-game win is exactly as likely as a four-game win, and the other where a five-game win is exactly as likely as a six-game win. Setting up and solving these equations, we get

$$
\begin{align*}
\binom{3}{3} p^4 &= \binom{4}{3} p^4 (1-p) \\
1 &= 4 (1-p) \\
p &= \frac{3}{4}
\end{align*}
$$

$$
\begin{align*}
\binom{4}{3} p^4 (1-p)^1 &= \binom{5}{3} p^4 (1-p)^2 \\
4 &= 10 (1-p) \\
p &= \frac{3}{5}
\end{align*}
$$

And there we have the bounds $\boxed{\frac{3}{5} < p < \frac{3}{4}}$
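The derivation only compares a five-game Celtics win against four-game and six-game Celtics wins. The sketch below checks the remaining outcomes as well, including every Knicks series win, using exact rational arithmetic on a grid of $p$ values. The helper names and the grid spacing of 1/1000 are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def outcome_probs(p):
    """Map (winner, length) -> probability for all eight outcomes,
    where p is the Celtics' per-game win probability."""
    q = 1 - p
    probs = {}
    for n in range(4, 8):
        probs[("Celtics", n)] = comb(n - 1, 3) * p**4 * q ** (n - 4)
        probs[("Knicks", n)] = comb(n - 1, 3) * q**4 * p ** (n - 4)
    return probs

def five_is_strictly_likeliest(p):
    """True if 'Celtics in 5' beats every other outcome outright."""
    probs = outcome_probs(p)
    target = probs.pop(("Celtics", 5))
    return all(target > other for other in probs.values())

# Exact grid p = 1/1000, 2/1000, ..., 999/1000.
grid = [Fraction(k, 1000) for k in range(1, 1000)]
five = [p for p in grid if five_is_strictly_likeliest(p)]
print(float(five[0]), float(five[-1]), len(five))
```

The endpoints $p = 3/5$ and $p = 3/4$ are ties, so the strict comparison excludes them, which matches the open interval in the answer.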

## Extra Credit

_Now that you’ve determined the values of a and b, let’s analyze the rest of Stephen’s statement. Is it true that losing in five games is “closer to a sweep than a seven-game series”?_

_Let p4 represent the probability that the Celtics sweep the Knicks in four games. And let p7 represent the probability that the series goes to seven games (with either team winning)._

_Suppose p is randomly and uniformly selected from the interval (a, b), meaning we take it as a given that the most likely outcome is that the Knicks will lose the series in five games. How likely is it that p4 is greater than p7? In other words, how often will it be the case that probably losing in five games means a sweep is more likely than a seven-game series?_

We want the probability that $p_4 > p_7$. We already have an expression for $p_4$, but $p_7$ must count a seven-game series won by either team.

$$
\begin{align*}
p^4 &> \binom{6}{3} p^4 (1-p)^3 + \binom{6}{3} p^3 (1-p)^4 \\
p &> 20 (1-p)^3 (p + 1-p) \\
p &> 20 (1-p)^3
\end{align*}
$$

Solving $p = 20(1-p)^3$ numerically gives $p \approx 0.67658$. So $p_4 > p_7$ when $p > 0.67658$, and $p_7 > p_4$ when $p < 0.67658$. Given that $p$ is uniformly distributed between 0.6 and 0.75, the probability that $p_4 > p_7$ is the fraction of the interval lying above the crossover:

$$\frac{0.75 - 0.67658}{0.75 - 0.6} \approx \boxed{0.4895}$$

So for slightly more than half of the values of $p$ in this range, a seven-game series is more likely than a sweep.
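The crossover point has no tidy closed form, so one way to reproduce it is a simple bisection on $f(p) = p - 20(1-p)^3$, which is negative at $p = 0.6$ and positive at $p = 0.75$. This is a minimal sketch; the helper name `bisect_root` and the tolerance are illustrative assumptions.

```python
def f(p):
    # p4 > p7 reduces to p > 20 (1 - p)^3, so the crossover is the root of f.
    return p - 20 * (1 - p) ** 3

def bisect_root(func, lo, hi, tol=1e-12):
    """Bisection, assuming func(lo) < 0 < func(hi) with a single root between."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

a, b = 0.6, 0.75                  # bounds from the first part
p_star = bisect_root(f, a, b)     # crossover where p4 == p7
prob = (b - p_star) / (b - a)     # share of (a, b) where p4 > p7
print(p_star, prob)
```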

```python
# some simulation verifications

import numpy as np
import pandas as pd

# np.random.seed(492)

def simulate_best_of_seven(p, n_series):
    """Simulate n_series best-of-seven series where team A wins each game with probability p.

    Returns the observed frequency of A winning in exactly 4, 5, 6, or 7 games,
    and whether sweeps by A were more frequent than series that reached game 7.
    """
    # Play all 7 games of every series; games after the clinching win are ignored below.
    games = np.random.rand(n_series, 7) < p
    cumwins = games.cumsum(axis=1)

    # 0-based index of the game in which A gets its 4th win, or 7 if A loses the series.
    a_won_series = cumwins[:, -1] >= 4
    first4_idx = np.where(a_won_series, (cumwins >= 4).argmax(axis=1), 7)

    win_len_freq = {}
    for k in (4, 5, 6, 7):
        win_len_freq[k] = (first4_idx == k - 1).mean()

    # A series reaches game 7 exactly when it is tied 3-3 after six games.
    a_wins_6 = cumwins[:, 5]
    b_wins_6 = 6 - a_wins_6
    reaches7 = (a_wins_6 <= 3) & (b_wins_6 <= 3)

    p4 = win_len_freq[4]
    p7 = reaches7.mean()

    return win_len_freq, (p4 > p7)

# PART 1: Where is 5-game win most likely?
p_grid = np.arange(0.05, 0.951, 0.005)
series_per_p = 50000

records = []
for p in p_grid:
    win_len, _ = simulate_best_of_seven(p, series_per_p)
    records.append({"p": p, **{f"P(win in {k})": win_len[k] for k in (4, 5, 6, 7)}})

df_grid = pd.DataFrame(records)
mask_5_most_likely = (
    (df_grid["P(win in 5)"] > df_grid["P(win in 4)"]) &
    (df_grid["P(win in 5)"] > df_grid["P(win in 6)"]) &
    (df_grid["P(win in 5)"] > df_grid["P(win in 7)"])
)

p_values_5_most = df_grid.loc[mask_5_most_likely, "p"]
lower_cutoff = p_values_5_most.min()
upper_cutoff = p_values_5_most.max()

# PART 2: Compare sweep vs. 7-game series
# np.random.seed(12345)
N = 10000
# Draw p uniformly between the simulated cutoffs from Part 1.
p_samples = lower_cutoff + (upper_cutoff-lower_cutoff) * np.random.rand(N)
sweep_more_likely = []

for p in p_samples:
    _, is_sweep_gt_g7 = simulate_best_of_seven(p, 100000)
    sweep_more_likely.append(is_sweep_gt_g7)

prob_sweep_gt_g7 = np.mean(sweep_more_likely)

# Results
print("\nSimulation Results:")
print(f"  Lower cutoff where 5G is most likely: {lower_cutoff:.3f}")
print(f"  Upper cutoff where 5G is most likely: {upper_cutoff:.3f}")
print(f"  P(sweep > 7-game series) when p ~ U(0.6, 0.75): {prob_sweep_gt_g7:.3f}")
```



```
Simulation Results:
  Lower cutoff where 5G is most likely: 0.600
  Upper cutoff where 5G is most likely: 0.750
  P(sweep > 7-game series) when p ~ U(0.6, 0.75): 0.479
```
