![meme](https://i.imgflip.com/6x0g1f.jpg)

# Optimizing fantasy basketball

Fantasy basketball is an extremely popular pastime for NBA fans. Participants 'draft' players before the season, then get rewarded for their proxies' in-game performances during the season.

One popular format is 'Head-to-Head: Most Categories'. Quoting from ESPN's description of the rules:

>Head-to-Head: Most Categories: H2H Most Categories allows you to set "X" number of statistic categories. For each scoring period (usually Monday through Sunday) team totals are accumulated in each of the categories. At the end of the scoring period, the winner is determined by which team wins the most number of categories. The end result is a win (1-0-0), loss (0-1-0) or tie (0-0-1). These results correspond directly to each team's overall record.

Common settings specify 9 categories: points, rebounds, assists, steals, blocks, 3-pointers, field goal %, free throw %, and turnovers. 
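
To make the scoring rule concrete, here is a minimal sketch of how one matchup could be decided under the nine-category settings above. The function name `matchup_result`, the dictionary keys, and the sample numbers are illustrative assumptions, not ESPN's implementation; turnovers are treated as the one category where the lower total wins.

```python
# Categories where the higher total wins, and those where the lower total wins.
HIGHER_IS_BETTER = ["pts", "trb", "ast", "stl", "blk", "fg3", "fg%", "ft%"]
LOWER_IS_BETTER = ["tov"]


def matchup_result(team_a, team_b):
    """Return 'win', 'loss', or 'tie' from team_a's point of view."""
    a_cats = b_cats = 0
    for cat in HIGHER_IS_BETTER + LOWER_IS_BETTER:
        a, b = team_a[cat], team_b[cat]
        if cat in LOWER_IS_BETTER:
            a, b = -a, -b  # flip sign so that "larger is better" holds everywhere
        if a > b:
            a_cats += 1
        elif b > a:
            b_cats += 1
        # equal totals award the category to neither side
    if a_cats > b_cats:
        return "win"
    if a_cats < b_cats:
        return "loss"
    return "tie"


# Made-up weekly totals for two hypothetical teams.
team_a = {"pts": 610, "trb": 240, "ast": 130, "stl": 40, "blk": 22,
          "fg3": 60, "fg%": 0.47, "ft%": 0.78, "tov": 85}
team_b = {"pts": 580, "trb": 255, "ast": 120, "stl": 40, "blk": 30,
          "fg3": 55, "fg%": 0.45, "ft%": 0.81, "tov": 90}

print(matchup_result(team_a, team_b))
```

With an odd number of categories a tie can only happen when at least one category is itself tied, which is why the simulation code later in this notebook can get away with a simple "five or more categories" test.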

Participants are paired up to compete week by week, and at the end of the season the participant with the best record wins.

Understanding the game of basketball helps win fantasy drafts. However, it's not the whole ballgame: even if we had precise probability distributions for player performance beforehand, or exact numbers with no uncertainty, it would not be obvious how to draft correctly. Do we want to optimize for all 9 categories, or just some of them? Do we try to compete on the categories that other drafters are going for, or ones they are not? Do we embrace high-volatility players or low-volatility players? The problem becomes a rich mathematical one, more familiar to data scientists than to basketball enthusiasts.

There is plenty of speculation about all this within the fantasy basketball community. However, few in the community realize that their strategies are leveraging high-level mathematical intuition. If we lean into the math, and treat this as an optimization problem, can we derive a method for fantasy drafting that delivers consistently high performance?

Our approach will be to break down the drafting problem into three mathematical steps:
- Retrospectively drafting a full previous season, with full knowledge of player performance. Weekly totals will be randomly sampled from actual weekly totals for each player. Fantasy drafting this way becomes a purely mathematical problem, and an optimal strategy definitely exists. We will explore different ways of looking for it.
- Incorporating uncertainty into the priors of retrospective drafting. We will explore how the strategy for retrospective drafting changes when there is uncertainty about the underlying probability distributions of player statistics.
- Using predictive data science methods to make estimates of priors/posteriors for actual performance. This last step will allow us to design an actual drafting algorithm that could be used for a season of fantasy basketball.
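
The first step relies on sampling from each player's actual weekly totals. As a rough sketch of what that could look like, the snippet below aggregates a tiny made-up game log into per-week totals and then draws one observed week at random. The `games` frame, the `week` column, and the helper `sample_week` are hypothetical names introduced for illustration, and the scoring week is assumed to run Monday through Sunday.

```python
import pandas as pd

# Tiny made-up game log; a real run would load game-level data from a CSV.
games = pd.DataFrame({
    "player": ["A", "A", "A", "B", "B", "B"],
    "game_date": pd.to_datetime([
        "2022-01-03", "2022-01-05", "2022-01-11",
        "2022-01-04", "2022-01-12", "2022-01-14",
    ]),
    "pts": [20, 30, 25, 10, 12, 8],
    "trb": [5, 7, 6, 10, 11, 9],
})

# Label each game with the Monday that starts its scoring week
# (assumption: weeks end on Sunday).
games["week"] = games["game_date"].dt.to_period("W-SUN").dt.start_time

# One row per (player, week) holding that week's summed counting stats.
weekly_totals = games.groupby(["player", "week"])[["pts", "trb"]].sum()


def sample_week(player, seed=None):
    """Draw one of the player's observed weekly stat lines at random."""
    return weekly_totals.loc[player].sample(1, random_state=seed).iloc[0]


print(weekly_totals)
print(sample_week("A", seed=0))
```

Percentage categories would need the underlying makes and attempts summed per week first, since averaging per-game percentages weights low-volume games too heavily.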

# Retrospective drafting

In [274]:
import random

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from itertools import combinations
from collections import Counter
pd.set_option('display.max_rows', 100)

We will use the raw game-level data from [Advanced Sports Analytics](https://www.advancedsportsanalytics.com/nba-raw-data).

In [5]:
stat_df = pd.read_csv('../data/ASA All NBA Raw Data.csv')
essential_info = stat_df[['player','game_date','pts','trb','ast','stl','blk','fg3','fg','fga','ft','fta','tov']]

In [33]:
player_totals = essential_info.drop(columns= 'game_date').groupby(['player']).sum()
player_totals.loc[:,'ft%'] = player_totals['ft']/player_totals['fta']
player_totals.loc[:,'fg%'] = player_totals['fg']/player_totals['fga']
player_totals = player_totals.drop(columns = ['ft','fta','fg','fga'])

In [362]:
scaler = StandardScaler()
player_totals_scaled = pd.DataFrame(scaler.fit_transform(player_totals),
                                    index = player_totals.index,
                                    columns = player_totals.columns)
# Turnovers hurt a team, so flip the sign to make higher always better
player_totals_scaled['tov'] = - player_totals_scaled['tov']
# Rank players by the sum of their z-scores across all nine categories
naive_guess_1 = player_totals_scaled.sum(axis = 1).sort_values(ascending = False)
# Same ranking, but ignoring ("punting") turnovers
punt_strat_1 = player_totals_scaled.drop(columns = ['tov']).sum(axis = 1).sort_values(ascending = False)

In [353]:
stats_by_player = essential_info.groupby('player')

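def run_many_head_to_heads(t1,t2):
    # Fraction of 1000 simulated matchups won by t1
    return np.mean([run_single_head_to_head(t1,t2) for i in range(1000)])

def run_single_head_to_head(t1, t2):
    # Sample one logged stat line per player and sum the numeric columns per team
    total_stats_t1 = pd.concat([stats_by_player.get_group(player).sample(1) for player in t1]).sum(axis = 0, numeric_only = True)
    total_stats_t2 = pd.concat([stats_by_player.get_group(player).sample(1) for player in t2]).sum(axis = 0, numeric_only = True)

    # winner_by_stats_1win (defined below) is the only evaluator in this notebook
    res = winner_by_stats_1win(total_stats_t1,total_stats_t2)
    return res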

def run_many_seasons(teams, n_seasons):
    return Counter([run_season(teams) for i in range(n_seasons)])

def run_season(teams):
    combos = combinations(range(len(teams)), 2)
    team_points = [0 for i in range(len(teams))]
    for combo in combos:
        t1 = teams[combo[0]]
        t2 = teams[combo[1]]

        res = run_single_head_to_head(t1,t2)
        if res:
            team_points[combo[0]] += 1
        else: 
            team_points[combo[1]] += 1
    return team_points.index(max(team_points))
    
def winner_by_stats_1win(total_stats_t1, total_stats_t2):
    pts = total_stats_t1['pts'] > total_stats_t2['pts']
    trb = total_stats_t1['trb'] > total_stats_t2['trb'] 
    ast = total_stats_t1['ast'] > total_stats_t2['ast'] 
    blk = total_stats_t1['blk'] > total_stats_t2['blk'] 
    stl = total_stats_t1['stl'] > total_stats_t2['stl']
    fg3 = total_stats_t1['fg3'] > total_stats_t2['fg3'] 
    
    # Guard the percentage categories against zero attempts. A plain conditional
    # skips the division entirely, whereas np.where evaluates both branches
    # and emits a divide-by-zero RuntimeWarning.
    fgp_1 = total_stats_t1['fg']/total_stats_t1['fga'] if total_stats_t1['fga'] > 0 else 0
    fgp_2 = total_stats_t2['fg']/total_stats_t2['fga'] if total_stats_t2['fga'] > 0 else 0
    fgp = fgp_1 > fgp_2

    ftp_1 = total_stats_t1['ft']/total_stats_t1['fta'] if total_stats_t1['fta'] > 0 else 0
    ftp_2 = total_stats_t2['ft']/total_stats_t2['fta'] if total_stats_t2['fta'] > 0 else 0
    ftp = ftp_1 > ftp_2
    
    tov = total_stats_t1['tov'] < total_stats_t2['tov']
   
    return (int(pts) + int(trb) + int(ast) + int(blk) + int(stl) + int(fg3) + int(fgp) + int(ftp) + int(tov)) >= 5



In [344]:
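def run_draft(agents, number_of_rounds):
    # Snake draft: each iteration is a forward pass followed by a reverse pass,
    # so every team ends up with 2 * number_of_rounds players
    players_available = set(player_totals.index)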
    
    teams = [[] for j in range(len(agents))]
    for i in range(number_of_rounds):
        for j in range(len(agents)):
            a = agents[j]
            chosen_player = a(players_available)
            players_available.remove(chosen_player)
            teams[j].append(chosen_player)
        for j in reversed(range(len(agents))):
            a = agents[j]
            chosen_player = a(players_available)
            players_available.remove(chosen_player)
            teams[j].append(chosen_player)

    return teams

In [363]:
def naive_agent_1(players_available):
    # Pick the highest-ranked player who is still available
    f = pd.Series(naive_guess_1.index.isin(players_available), index = naive_guess_1.index)
    chosen_player = next(i for i,v in f.items() if v)
    return chosen_player

def punt_agent_1(players_available):
    # Same as naive_agent_1, but using the ranking that ignores turnovers
    f = pd.Series(punt_strat_1.index.isin(players_available), index = punt_strat_1.index)
    chosen_player = next(i for i,v in f.items() if v)
    
    return chosen_player

def random_agent(players_available):
    return random.choice(tuple(players_available))


In [368]:
res = [[] for i in range(12)]
for i in range(12):
    agents = [naive_agent_1]* (11-i) + [punt_agent_1] + [naive_agent_1]*i
    teams = run_draft(agents ,3)
    res[i] = run_many_seasons(teams, 100)

