Hey! This repository deals with fantasy basketball. If you are unfamiliar with how it works, here are some useful links
Fantasy basketball has a standard way of quantifying player value across categories, called 'Z-scoring', and it is used to make objective rankings of players. However, as far as I know, nobody has ever laid out exactly why Z-scores should work. They just seem intuitively sensible, so people use them.
I looked into the math and did manage to derive a justification for Z-scores. However, the justification is only appropriate for the "Rotisserie" format. When the math is modified for head-to-head formats, a different metric that I call "G-score" pops out as the optimal way to rank players instead. I wrote a paper to that effect a few months ago which is available here.
I realize that the paper's explanation is incomprehensible to anyone without a background in math. To that end, I am providing a simplified version of the argument in this readme, which hopefully will be easier to follow
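You may have come across Z-scores in a stats 101 class. In that context, they are what happens to a set of numbers after subtracting the mean (average), signified by $\mu$, and dividing by the standard deviation, signified by $\sigma$: $Z(x) = \frac{x - \mu}{\sigma}$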
For use in fantasy basketball, a few modifications are made to basic Z-scores
- The percentage categories are adjusted by volume. This is necessary because players who shoot more matter more; if a team has one player who goes $9$ for $9$ ($100\%$) and another who goes $0$ for $1$ ($0\%$), their aggregate average is $90\%$ rather than $50\%$. The fix is to multiply scores by the player's volume, relative to average volume
- $\mu$ and $\sigma$ are calculated based on the $\approx 156$ players expected to be on fantasy rosters, rather than the entire NBA
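The $9$ for $9$ and $0$ for $1$ example can be checked with a few lines of Python. This is only a minimal sketch; the function names `aggregate_percentage` and `naive_average` are made up for illustration and do not come from the repository.

```python
def aggregate_percentage(lines):
    """Team shooting percentage from (makes, attempts) pairs."""
    makes = sum(m for m, _ in lines)
    attempts = sum(a for _, a in lines)
    return makes / attempts


def naive_average(lines):
    """Unweighted mean of each player's own percentage (ignores volume)."""
    return sum(m / a for m, a in lines) / len(lines)


team = [(9, 9), (0, 1)]  # one player goes 9 for 9, the other 0 for 1
print(aggregate_percentage(team))  # 0.9
print(naive_average(team))         # 0.5
```

The gap between the two numbers is the reason the percentage categories need a volume adjustment.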
Denoting
- Player $p$'s weekly average as $m_p$
- $\mu$ of $m_p$ across players expected to be on fantasy rosters as $m_\mu$
- $\sigma$ of $m_p$ across players expected to be on fantasy rosters as $m_\sigma$
Z-scores for standard categories (points, rebounds, assists, steals, blocks, three-pointers, and sometimes turnovers) are $\frac{m_p - m_\mu}{m_\sigma}$
The same definition can be extended to the percentage categories (field goal % and free throw %). With
See below for an animation of weekly blocking numbers going through the Z-score transformation step by step. First the mean is subtracted out, centering the distribution around zero, then the standard deviation is divided through to make the distribution more narrow. Note that a set of
BlockVisv2.mp4
Adding up the results for all categories yields an aggregate Z-score
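As a rough illustration of the counting-category calculation, the sketch below computes $\frac{m_p - m_\mu}{m_\sigma}$ per category and sums the results. The player names and numbers are invented, the helper `z_scores` is hypothetical, and using the population standard deviation is an assumption. The volume adjustment for percentage categories is not covered here.

```python
from statistics import fmean, pstdev


def z_scores(weekly_averages):
    """Map each player's weekly average m_p to (m_p - m_mu) / m_sigma."""
    values = list(weekly_averages.values())
    m_mu = fmean(values)
    m_sigma = pstdev(values)  # assumption: population standard deviation
    return {p: (m_p - m_mu) / m_sigma for p, m_p in weekly_averages.items()}


# Made-up weekly averages for a tiny pool of rostered players
blocks = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 6.0}
assists = {"A": 20.0, "B": 10.0, "C": 15.0, "D": 15.0}

per_category = [z_scores(blocks), z_scores(assists)]
total_z = {p: sum(cat[p] for cat in per_category) for p in blocks}

for player, score in sorted(total_z.items(), key=lambda kv: -kv[1]):
    print(player, round(score, 3))
```

Because each category is centered on the pool mean, the aggregate scores across the pool sum to zero.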
It is impractical to calculate a truly optimal solution for Rotisserie or any other format, since they are so complex. However, if we simplify the Rotisserie format, we can at least demonstrate that Z-scores are a reasonable heuristic for it
Consider this problem: Team one has
This problem statement makes a few implicit simplifications about Rotisserie drafts
- The goal is to maximize the expected value of the number of categories won against an arbitrary opponent in a single week, where all players perform at their season-long means. Using weekly means instead of season-long totals is just a convenience, to align with the definition of Z-scores. And optimizing for victory against an arbitrary opponent is equivalent to optimizing for total score at the end of a season, since each category victory over any opponent is worth one point
- Besides the player being drafted, all others are assumed to be chosen randomly from a pool of top players. This assumption is obviously not exactly true. However, it is somewhat necessary because we are trying to make a ranking system which does not depend on which other players have been drafted. It is also not as radical as it may seem, since real teams have a mix of strong and some weak players chosen from a variety of positions, making them random-ish in aggregate
- Position requirements, waiver wires, injury slots, etc. are ignored. Drafters use their drafted players the whole season
The simplified problem can be approached by calculating the probability for team one to win each category, then optimizing for their sum
The difference in category score between two teams tells us which team is winning the category and by how much. By randomly selecting the
SimulationVisv2.mp4
You may notice that the result looks a lot like a Bell curve even though the raw block numbers look nothing like a Bell curve. This happens because of the surprising "Central Limit Theorem", which says that when many independent random numbers are added together, their sum tends to look a lot like a Bell curve, even if the individual numbers do not. This applies to all the other categories as well
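The effect is easy to reproduce with made-up data. The sketch below draws heavily skewed values (exponential draws standing in for weekly block numbers, which is an assumption, not the repository's data) and compares the skewness of single draws with the skewness of sums of 25 draws. A Bell curve has skewness zero.

```python
import random
from statistics import fmean


def skewness(xs):
    """Skewness of a sample: 0 for a symmetric shape such as a Bell curve."""
    mu = fmean(xs)
    var = fmean((x - mu) ** 2 for x in xs)
    return fmean((x - mu) ** 3 for x in xs) / var ** 1.5


rng = random.Random(0)


def draw():
    """One made-up, right-skewed 'weekly blocks' value."""
    return rng.expovariate(1.0)


single = [draw() for _ in range(10000)]
sums = [sum(draw() for _ in range(25)) for _ in range(10000)]

print(skewness(single))  # strongly skewed
print(skewness(sums))    # much closer to zero
```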
The mean and standard deviation of the Bell curves for category differences can be calculated via probability theory. Including the unchosen player with category average
- The mean is $m_\mu - m_p$
- The standard deviation is $\sqrt{2N-1} \cdot m_\sigma$ (the square root in the formula comes from the fact that $STD(X + Y) = \sqrt{STD(X)^2 + STD(Y)^2}$ for independent $X$ and $Y$, where $STD(X)$ is the standard deviation of $X$)
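These two formulas can be sanity-checked by simulation. The sketch below assumes that team two holds $N$ randomly drawn players, that team one holds $N-1$ randomly drawn players plus the unchosen player, and that the difference is team two's total minus team one's. The pool values, $N = 13$, and $m_p = 5$ are all made up for illustration.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

rng = random.Random(1)
pool = [0.5, 1.0, 1.5, 2.0, 4.0, 7.0]  # made-up weekly averages of the player pool
m_mu, m_sigma = fmean(pool), pstdev(pool)
N, m_p = 13, 5.0  # hypothetical roster size and unchosen player's average


def category_difference():
    """Team two's total minus team one's total for one random draw."""
    team_two = sum(rng.choice(pool) for _ in range(N))
    team_one = sum(rng.choice(pool) for _ in range(N - 1)) + m_p
    return team_two - team_one


diffs = [category_difference() for _ in range(20000)]
print(fmean(diffs), m_mu - m_p)                   # simulated vs. predicted mean
print(pstdev(diffs), sqrt(2 * N - 1) * m_sigma)   # simulated vs. predicted standard deviation
```

Players are drawn with replacement here so that the draws are independent, which is what the standard deviation formula requires.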
When the category difference is below zero, team one will win the category
The probability of this happening can be calculated using something called a cumulative distribution function.
The
We already know
And analogously for the percentage statistics
The last two equations included Z-scores. Adding up all the probabilities to get the expected number of categories won by team one, with
It is clear that the expected number of category victories is directly proportional to the sum of the unchosen player's Z-scores. This tells us that under the aforementioned assumptions, the higher a player's total Z-score is, the better they are for Rotisserie
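A small numerical sketch of that relationship, under assumptions: if the category difference is normal with mean $m_\mu - m_p$ and standard deviation $\sqrt{2N-1} \cdot m_\sigma$, the probability that it falls below zero is the standard normal CDF evaluated at the Z-score divided by $\sqrt{2N-1}$. Near zero the CDF is close to a straight line, which is one way to see why the win probability ends up roughly proportional to the Z-score. The function names and $N = 13$ are hypothetical.

```python
from math import pi, sqrt
from statistics import NormalDist


def win_probability(z, n_players):
    """P(category difference < 0) for a player with Z-score z."""
    return NormalDist().cdf(z / sqrt(2 * n_players - 1))


def linear_approximation(z, n_players):
    """First-order expansion of the normal CDF around zero."""
    return 0.5 + z / sqrt(2 * pi * (2 * n_players - 1))


for z in (-1.0, 0.0, 1.0, 2.0):
    print(z, win_probability(z, 13), linear_approximation(z, 13))
```

For Z-scores of ordinary size the two columns agree closely, so ranking by total Z-score and ranking by expected categories won give nearly the same order under these assumptions.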
"Head-to-Head: Each Category" is deceptively similar to Rotisserie, in the sense that winning one category against one opponent is worth one point. Yes, head-to-head matchups are one at a time rather than simultaneous, but that doesn't matter when the goal is just to do as well as possible against an arbitrary opponent. The main substantive difference between the two formats is that head-to-head matchups occur over a single week, rather than over an entire season. This is important because it means that players don't necessarily perform at their season-long averages for any given matchup. Instead, their performances are somewhat random, depending on how they happen to perform during the week of the matchup.
For Rotisserie, we handled uncertainty about which other players would be chosen by assuming they were chosen randomly. To extend this for head-to-head, we can assume that their performances are randomly chosen as well. Effectively, the revamped assumption is that we are randomly choosing player/weekly performance combos from a set of top players and their performances for a season, rather than just choosing a player and taking their average. Below, see how metrics for blocks change when we look at every weekly performance of the top
ComparisonVis.mp4
Although the mean remains the same, the standard deviation gets larger. This makes sense, because week-to-week "noise" adds more volatility, which is reflected in the additional
Most of the logic from section 2 can also be applied to Head-to-Head: Each Category. The only difference is that we need to use metrics from the pool of players and performances, as laid out in section 3, rather than just players as we did in section 2. The mean is still
And analogously for the percentage statistics,
I call these G-scores, and it turns out that these are quite different from Z-scores. For example, steals have a very high week-to-week standard deviation, and carry less weight in G-scores than Z-scores as a result.
Intuitively, why does this happen? The way I think about it is that investing heavily into a volatile category will lead to only a flimsy advantage, and so is likely less worthwhile than investing into a robust category. Many drafters have this intuition already, de-prioritizing unpredictable categories like steals relative to what Z-scores would suggest. The G-score idea just converts that intuition into mathematical rigor
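The sketch below illustrates the idea with invented weekly steals numbers. It assumes that a G-score keeps the Z-score numerator and divides by the standard deviation of the pooled player/week performances instead of $m_\sigma$. The name `m_tau` for the week-to-week spread is hypothetical, as is all of the data.

```python
from math import sqrt
from statistics import fmean, pstdev, pvariance

# Made-up weekly performances: one list of weekly totals per player
weekly_steals = {
    "A": [2, 9, 4, 11, 5],
    "B": [6, 1, 8, 3, 7],
    "C": [10, 4, 12, 6, 9],
}

player_means = {p: fmean(w) for p, w in weekly_steals.items()}
means = list(player_means.values())
m_mu = fmean(means)
m_sigma = pstdev(means)  # player-to-player spread
# Week-to-week spread, averaged over players (hypothetical name m_tau)
m_tau = sqrt(fmean(pvariance(w) for w in weekly_steals.values()))

# Spread of every player/week combination pooled together
pooled = [x for w in weekly_steals.values() for x in w]
pooled_sigma = pstdev(pooled)

z = {p: (m - m_mu) / m_sigma for p, m in player_means.items()}
g = {p: (m - m_mu) / pooled_sigma for p, m in player_means.items()}

print(pooled_sigma, sqrt(m_sigma ** 2 + m_tau ** 2))  # equal when every player has the same number of weeks
print(z)
print(g)
```

The noisier a category is from week to week, the larger the pooled standard deviation becomes relative to $m_\sigma$, and the more the category's scores shrink toward zero.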
Our logic relies on many assumptions, so we can't be sure that G-scores work in practice. What we can do is simulate actual head-to-head drafts and see how G-score does against Z-score.
The code in this repository simulates a simplistic version of head-to-head fantasy basketball, via a
The expected win rate if all strategies are equally good is
| | G-score vs. 11 Z-score | Z-score vs. 11 G-score |
|---|---|---|
| 9-Cat | | |
| 2021 | | |
| 2022 | | |
| 2023 | | |
| Overall | | |
| 8-Cat | | |
| 2021 | | |
| 2022 | | |
| 2023 | | |
| Overall | | |
When interpreting these results, it is important to remember that they are for an idealized version of fantasy basketball. Still, the dominance displayed by G-scores in the simulations suggests that the G-score modification really is appropriate.
To confirm the intuition about why the G-score works, take a look at its win rates by category against
| | G-score win rate |
|---|---|
| Points | |
| Rebounds | |
| Assists | |
| Steals | |
| Blocks | |
| Three-pointers | |
| Turnovers | |
| Field goal % | |
| Free throw % | |
| Overall | |
The G-score drafter performs well in stable/high-volume categories like assists and poorly in volatile categories like turnovers, netting to an average win rate of slightly above
Simulations also suggest that G-scores work better than Z-scores in the Head-to-Head: Most Categories format. I chose not to include the results here because it is a very strategic format, and expecting other drafters to go straight off ranking lists is probably unrealistic for it.
Another possible use-case is auctions. There is a well-known procedure for translating player value to auction value, outlined e.g. in this article. If the auction is for a head-to-head format, it is reasonable to use G-scores to quantify value rather than Z-scores
Any situation-agnostic value quantification system is suboptimal, since a truly optimal strategy would adapt to the circumstances of the draft/auction.
In the paper, I outline a methodology called H-scoring that dynamically chooses players based on the drafting situation. It performs significantly better than going straight off G-score and Z-score. However, it is far from perfect, particularly because it does not fully understand the consequences of punting. There is a lot of room for improvement, and I hope that I, or someone else, can make a better version in the future!