This notebook is a Monte Carlo simulation of a LearnedLeague season. 

It ingests the following information:

1. The recent stats of all players in the league, defined as average TCA and DE in the three most recent seasons they have played in the past twelve months
2. The league's remaining schedule

Remaining TCA is projected based on a weighted average of the recent history and to-date performance that season, with an assumption towards mean regression that diminishes as the season continues.

Individual matchups are determined by projecting a sequence of integers 0-6 that sum to the remaining estimated TCA, running that number through a function assigning a random score weighted by your opponent's defensive efficiency, and compared to your opponent's score.

Final output is a frame sorted by median final placement, with additional promotion/relegation percentage chances added for public leagues.

*Model Limits*

1. Newer players are likely disadvantaged by this system, as it is to be expected that they will outperform their previous seasons as they learn to "read" the questions.
2. Players who have significantly improved themsleves between seasons will not be recognized as such until midseason or later. Additionally, because the model does not give special weight to the most recent season, recent improvement is not assumed to be permanent until repeated in multiple seasons.
3. Individual matchups are blind to the category stats of the individual players; it is my belief that a matchup between players with very different relative strengths is more likely to yield an upset than two players of equivalent relative ability.
4. The "defensive table" and weighting curve used here are not based on meaningful research, and are merely my attempts to make reasonable assumptions about proper inputs.

Example output below is based on a run of Tundra B as of MD0, LL100.

## User Inputs

In [176]:
league_type = 'public' #private #public

season = 100
league_name = 'Tundra'
rundle = 'B'
division = 0 #1, 2, 3 #zero for non-divided rundles

players = 28

#for public leagues; will be ignored in private leagues

promotion = 3
relegation = 7


### Package Installs + Setup

In [160]:
import matplotlib.pyplot as plt
import numpy as np

import pandas as pd
import collections
from functools import reduce

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

from webdriver_manager.chrome import ChromeDriverManager

import time
import random
from statistics import median

pd.set_option('display.max_columns', 30)

In [177]:
def formatted_lg(rundle, string, division):
    if (league_type == 'public') & (division == 0):
        return(rundle + '_' + string.capitalize())
    elif (league_type == 'public') & (division != 0):
        return(rundle + '_' + string.capitalize() + '_Div_' + str(division))
    elif league_type == 'private':
        return(string.title().replace(' ', '_'))
    
formatted_league = formatted_lg(rundle, league_name, division)

formatted_league

'B_Tundra'

In [178]:
def league_string(league_type, season, league_name, rundle, division):
    '''
    function that sets up the url for the league
    '''
    
    league_name = formatted_lg(rundle, league_name, division)
    
    if (league_type == 'public'):
        final_string = str(season) + '&' + league_name
        return(final_string)
        #if (division == 0):
        #    return(final_string)
        #else:
        #    return(final_string + '_Div_' + str(division))
    elif (league_type == 'private'):
        final_string = str(season) + '&' + league_name
        return(final_string)
    
def standings_url(string):
    return('https://learnedleague.com/standings.php?' + string)

In [179]:
league = league_string(league_type, season, league_name, rundle, division)
standings_url(league)

'https://learnedleague.com/standings.php?100&B_Tundra'

In [180]:
service = Service('chromedriver')
driver = webdriver.Chrome()
driver.get('https://learnedleague.com/')

In [181]:
credentials = pd.read_json('credentials.json')

login = driver.find_element(By.NAME, 'username')
login.send_keys(credentials['learnedleague']['username'])

pw = driver.find_element(By.NAME, 'password')
pw.send_keys(credentials['learnedleague']['password'])

clickable = driver.find_element(By.NAME, 'login')
clickable.click()

time.sleep(3)

driver.get(standings_url(league))

## Getting Baseline Info

In [182]:
def profile_getter(players):
    '''
    be sure to be navigated to the standings page already in selenium
    '''
    urls = []
    for i in range(1, players + 1):
        y = driver.find_element(By.XPATH, '//*[@id="lft"]/div[1]/table/tbody/tr[' + str(i) + ']/td[3]/a')
        url = y.get_attribute('href')
        urls.append(url)
        
    return(urls)

def schedule_generator():
    #run once at start of season to get schedule
    
    sched = pd.DataFrame(columns=['MD', 'P1', 'P2'])
    for k in range(1,26): #number of weeks in reg season less one
        time.sleep(1)
        if league_type == 'private':
            driver.get('https://learnedleague.com/schedule.php?' + str(season) + '&' + str(k) + '&' + formatted_league)
        elif league_type == 'public':
            driver.get('https://learnedleague.com/schedule.php?' + str(season) + '&' + str(k) + '&' + formatted_league)
            
        row_count = int((players / 2) + 1) #stops after the last row of players
        
        for i in range(1,row_count):
                x1 = driver.find_element(By.XPATH, '//*[@id="main"]/div/div[1]/div[2]/table/tbody/tr[' + str(i) + ']/td[2]').text
                x2 = driver.find_element(By.XPATH, '//*[@id="main"]/div/div[1]/div[2]/table/tbody/tr[' + str(i) + ']/td[4]').text
                print(k, x1, x2)
                
                sched.loc[len(sched.index)] = [k, x1, x2]  
    
    sched.to_csv('LL' + str(season) + '_' + league_name + '_' + formatted_league + '_rundle_sched.csv')
    return sched

In [None]:
schedule_generator() # only need to run if you haven't before, can comment out otherwise

In [184]:
driver.get(standings_url(league))
urls = profile_getter(players)

In [187]:
def last_five_seasons(number):
    minus_one = 'LL' + str(season - 1)
    minus_two = 'LL' + str(season - 2)
    minus_three = 'LL' + str(season - 3)
    minus_four = 'LL' + str(season - 4)
    minus_five = 'LL' + str(season - 5)
    return([minus_one, minus_two, minus_three, minus_four, minus_five])

def average_stats(urls):
    
    # pulls average TCA and DE for the most recent three seasons, with a tolerance of one missed season this year; 
    # you can mess with the "last four seasons" function above if you want to use a longer range of recent seasons
        
    stats = pd.DataFrame(columns=['Player', 'TCA', 'DE'])
    time.sleep(1)
    for url in urls:
        driver.get(url + "&2")
        time.sleep(1)
        
        try:
            name = driver.find_element(By.XPATH, '//*[@id="main"]/div/div[1]/div[3]/div[2]/h1').text
        except:
            name = driver.find_element(By.XPATH, '//*[@id="main"]/div/div[1]/div[2]/div[2]/h1').text
        
        df = pd.read_html(driver.page_source)[4]
        df = df[df.Season.str.contains('LL')]
        df = df[df.Season.isin(last_five_seasons(season))]
        df = df[:3] #if they've played all four seasons, only take most recent three
        df['DE'] = [float(i) for i in df.DE]
        
        stats.loc[len(stats.index)] = [name, round(df.TCA.mean()), round(df.DE.mean(),3)]  
    return(stats)

In [188]:
player_stats = average_stats(urls)

player_stats.sort_values('TCA', ascending=False)

Unnamed: 0,Player,TCA,DE
9,DrennanJ,109,0.694
6,CookeFB,107,0.678
26,VelkerJ,105,0.665
22,SchultzT,100,0.714
17,KrugK,99,0.661
21,ReillyCM,98,0.62
16,JungL,97,0.627
4,CarterK,96,0.664
23,SilverbergS,95,0.712
24,StilgenbauerA,95,0.63


### Matchup Sim Functions

In [198]:
def defensive_table(de):
    '''
    Just a little something I cooked up to make defense kinda-sorta matter
    and convert correct answers into point totals
    '''
    probs_table = [{0: 1, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9:0},
               {0: (2*.1667)*de, 1: (2*.3333)*de, 2: (2*.3333)*(1-de), 3: (2*.1667)*(1-de), 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9:0},
               {0: 0, 1: (2*.1111)*de, 2: (2*.2222)*de, 3: .3333, 4: (2*.2222)*(1-de), 5: (2*.1111)*(1-de), 6: 0, 7: 0, 8: 0, 9:0},
               {0: 0, 1: 0, 2: (2*.08333)*de, 3: (2*.1667)*de, 4: (2*.25)*de, 5: (2*.25)*(1-de), 6: (2*.1667)*(1-de), 7: (2*.08333)*(1-de), 8: 0, 9:0},
               {0: 0, 1: 0, 2: 0, 3: 0, 4: (2*.1111)*de, 5: (2*.2222)*de, 6: .3333, 7: (2*.2222)*(1-de), 8: (2*.1111)*(1-de), 9:0},
               {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: (2*.1667)*de, 7: (2*.3333)*(de), 8: (2*.3333)*(1-de), 9:(2*.1667)*(1-de)},
               {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9:1}]
    return pd.DataFrame(probs_table)

def heads_up_points(score, DE):
    '''
    random points total for one shot
    '''
    
    probs = list(defensive_table(DE).loc[score])
    points = random.choices([0,1,2,3,4,5,6,7,8,9], weights=probs, k=1)
    
    return(points[0])

def heads_up_points_noisy(score, DE):
    '''
    random points total for one match with a little bit of bullshit
    because sometimes nonsense happens and we gotta account for that
    '''
    
    noise = random.randint(1,100)
    points = heads_up_points(score, DE)
    
    if (noise < 6) and (points > 0) and (score != 6):
        return(points - 1)
    if (noise > 95) and (points < 9) and (score != 0):
        return(points + 1)
    else:
        return(points)
    
def set_of_results(m, n):
    '''
    Generates a set of numbers of length (n) between 1 and 6, normally distributed,
    that sums up to total (m) 
    
    To be used to produce a random set of results for the rest of the season
    '''
    flag = 0
    while flag == 0:
        s = np.random.normal(m/n, 1, n) #mu = 1 is a number I made up as well that seemed to produce a coherent set of #s

        value_list = []

        for i in range(0,len(s)):
            j = round(s[i])
            if j > 6:
                j = 6
            if j < 0:
                j = 0
            value_list.append(j)

        if (sum(value_list) < (m + 1)) and (sum(value_list) > (m - 1)):
            flag = 1
        
    return(value_list)

def get_medians(df):
    '''sort that i like the most for this'''
    median_col = []
    
    for player in df.index:
        sim_pos = list(df.loc[player])
        all_results = []
        
        for i in range(1,len(sim_pos)+1):
            rel_list = list([i] * sim_pos[i-1])
            all_results = all_results + rel_list

        med = median(all_results)
        median_col.append(med)
        
    return(median_col)

### The Loop

The top commented out portion pulls the current standings and creates a stats that remain constant and unmodified in the following loops.

The i loop can be killed at any time to continue down the notebook. To continue appending after, run the "loop" cell again.

In [190]:
driver.get(standings_url(league))

sched_main = pd.read_csv('LL' + str(season) + '_' + league_name + '_' + formatted_league + '_rundle_sched.csv')
sched_main = sched_main[['MD', 'P1', 'P2']]

all_sims = pd.DataFrame(columns=['win', 'lose', 'tie', 'points', 'TCA', 'score', 'differential','place'])

#driver.get(standings_url(league))

to_date = pd.read_html(driver.page_source)[0][['Player', 'W', 'L', 'T', 'PTS', 'MPD', 'TMP', 'TCA', 'FL']]

## example of ad-hoc fix if player name is too long
#to_date = to_date.replace({'Evaskis-Garre.': 'Evaskis-GarrettC'})

for col in ['W', 'L', 'T', 'PTS', 'MPD', 'TMP', 'TCA', 'FL']:
    to_date[col] = [float(i) for i in to_date[col]]

completed = int(to_date['W'][0] + to_date['L'][0] + to_date['T'][0])

if completed > 0:
    to_date['PER_DAY'] = round((to_date['TCA'] / (to_date['W'] + to_date['L'] + to_date['T'] - to_date['FL'])), 2)
    to_date['PER_DAY_PACE'] = to_date['PER_DAY'] * 25
    
else:
    to_date['PER_DAY'] = 0
    to_date['PER_DAY_PACE'] = 0

MD_weight = [0,	0,	0,	0,	0,	0,	0.03,	0.07,	0.11,	0.17,\
             0.24,	0.32,	0.4,	0.5,	0.6,	0.66,	0.76,\
             0.83,	0.89,	0.93,	0.97,	1,	1,	1,	1,	1]


In [208]:
#begin generating sims

session_sims = pd.DataFrame(columns=['win', 'lose', 'tie', 'points', 'TCA', 'score', 'differential','place'])

for i in range(0, 10000): #you can lower this if you want, or pause this cell and continue running after
    seed = random.randint(0,5000000000) #move to higher level when processing

    sim = player_stats
    sim = sim.merge(to_date, on='Player', how='left', suffixes=['','_todate'])
    
    sim['weighted_TCA'] = (sim['TCA'] * (1- MD_weight[completed])) + (sim['PER_DAY_PACE'] * (MD_weight[completed]))
    sim['weighted_TCA'] = [round(i, 0) for i in sim.weighted_TCA]        
    sim['TCA_reversion'] = [round(i - (j*completed),0) for i, j in zip(sim.weighted_TCA, sim.PER_DAY)] #round(i - (j*completed),0)
    sim['random_matches'] = [set_of_results(i, 25 - completed) for i in sim.TCA_reversion]

    sched = sched_main

    sched = sched_main[sched_main['MD'] > completed]
    sched = sched.reset_index()

    sched['P1_TCA'] = [sim[sim.Player == i]['random_matches'].iloc[0][j-completed-1] for i, j in zip(sched.P1, sched.MD)]
    sched['P2_TCA'] = [sim[sim.Player == i]['random_matches'].iloc[0][j-completed-1] for i, j in zip(sched.P2, sched.MD)]

    sched['P1_DE'] = [player_stats[player_stats.Player == i]['DE'].iloc[0] for i in sched.P1]
    sched['P2_DE'] = [player_stats[player_stats.Player == i]['DE'].iloc[0] for i in sched.P2]

    sched['P1_score'] = [heads_up_points_noisy(i,j) for i,j in zip(sched.P1_TCA, sched.P2_DE)]
    sched['P2_score'] = [heads_up_points_noisy(i,j) for i,j in zip(sched.P2_TCA, sched.P1_DE)]

    P1_win = []
    P2_win = []

    P1_tie = []
    P2_tie = []

    P1_lose = []
    P2_lose = []

    for i in range(0, len(sched)):
        if sched.P1_score[i] > sched.P2_score[i]:
            P1_win.append(1)
            P2_win.append(0)
            P1_tie.append(0)
            P2_tie.append(0)
            P1_lose.append(0)
            P2_lose.append(1)
        elif sched.P1_score[i] < sched.P2_score[i]:
            P1_win.append(0)
            P2_win.append(1)
            P1_tie.append(0)
            P2_tie.append(0)
            P1_lose.append(1)
            P2_lose.append(0)
        else:
            P1_win.append(0)
            P2_win.append(0)
            P1_tie.append(1)
            P2_tie.append(1)
            P1_lose.append(0)
            P2_lose.append(0)

    sched['P1_win'] = P1_win
    sched['P2_win'] = P2_win

    sched['P1_lose'] = P1_lose
    sched['P2_lose'] = P2_lose

    sched['P1_tie'] = P1_tie
    sched['P2_tie'] = P2_tie

    sched['differential'] = sched.P1_score - sched.P2_score

    long = pd.DataFrame(columns=['MD', 'Player', 'TCA', 'score', 'win', 'lose', 'tie', 'differential'])

    for i in range(0, len(sched)):
        long.loc[len(long.index)] = [sched.MD[i], sched.P1[i], sched.P1_TCA[i], sched.P1_score[i], sched.P1_win[i], \
                                     sched.P1_lose[i], sched.P1_tie[i],sched.differential[i]]
        
        long.loc[len(long.index)] = [sched.MD[i], sched.P2[i], sched.P2_TCA[i], sched.P2_score[i], sched.P2_win[i], \
                                     sched.P2_lose[i], sched.P2_tie[i],(sched.differential[i] * -1)]
        


    simmed_standings = long.groupby(by="Player").sum()

    simmed_standings['points'] = (simmed_standings['win'] * 2) + simmed_standings['tie']
    simmed_standings = simmed_standings[['win', 'lose', 'tie', 'points', 'TCA', 'score', 'differential']]

    simmed_standings = simmed_standings.sort_values(['points', 'differential', 'score', 'TCA'], ascending=False)
    simmed_standings['place'] = range(1,len(simmed_standings)+1)

    simmed_standings.merge(to_date, on='Player', how='left', suffixes=['','_todate'])

    spt = simmed_standings.merge(to_date, on='Player', how='left', suffixes=['','_todate']) #simmed plus todate

    simmed_latest = pd.DataFrame(columns=['Player', 'W', 'L', 'T', 'Pts', 'TCA', 'TMP', 'Diff', 'FL'])

    #simmed_standings['place'] = range(1,len(simmed_standings)+1)

    simmed_latest['Player'] = spt['Player']
    simmed_latest['W'] = spt['win'] + spt['W']
    simmed_latest['L'] = spt['lose'] + spt['L']
    simmed_latest['T'] = spt['tie'] + spt['T']
    simmed_latest['Pts'] = spt['points'] + spt['PTS']
    simmed_latest['TCA'] = spt['TCA'] + spt['TCA_todate']
    simmed_latest['TMP'] = spt['score'] + spt['TMP']
    simmed_latest['Diff'] = spt['differential'] + spt['MPD']
    simmed_latest['FL'] = spt['FL']

    simmed_latest = simmed_latest.sort_values(['Pts', 'Diff', 'TMP', 'TCA'], ascending=False)
    simmed_latest['place'] = range(1,len(simmed_latest)+1)

    simmed_latest = simmed_latest.set_index(['Player'])

    print(simmed_latest[['place']])

    session_sims = pd.concat([session_sims, simmed_latest])

KeyboardInterrupt: 

In [209]:
all_sims = pd.concat([all_sims, session_sims])
sim_count = len(all_sims) / players #number of iterations run
sim_count

53.0

In [210]:
all_sims['player'] = all_sims.index
wide = pd.pivot_table(all_sims, values = ['place'], index = ['player'], columns=all_sims['place'].values, aggfunc='count', fill_value=0)

In [211]:
if league_type == 'public':
    promotion_frame = wide['place'][range(1,promotion+1)]
    promotion_frame['promoted'] = [promotion_frame.iloc[i].sum(axis=0) for i in range(0, len(promotion_frame))]
    relegation_frame = wide['place'][range(players-relegation + 1, players + 1)]
    relegation_frame['relegated'] = [relegation_frame.iloc[i].sum(axis=0) for i in range(0, len(relegation_frame))]
    wide['promoted'] = promotion_frame['promoted']
    wide['relegated'] = relegation_frame['relegated']

In [212]:
wide['median_pos'] = get_medians(wide) #this is acting weird as of 2.10.24 but is still useful as a sort

In [213]:
wide = wide.sort_values('median_pos', ascending=True)

wide[['median_pos', 'place']]

Unnamed: 0_level_0,median_pos,place,place,place,place,place,place,place,place,place,place,place,place,place,place,place,place,place,place,place,place,place,place,place,place,place,place,place,place
Unnamed: 0_level_1,Unnamed: 1_level_1,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28
player,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2,Unnamed: 22_level_2,Unnamed: 23_level_2,Unnamed: 24_level_2,Unnamed: 25_level_2,Unnamed: 26_level_2,Unnamed: 27_level_2,Unnamed: 28_level_2,Unnamed: 29_level_2
DrennanJ,6.0,17,9,13,2,4,2,0,3,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
VelkerJ,7.0,5,6,4,5,1,8,6,3,6,0,2,2,1,1,0,0,3,0,0,0,0,0,0,0,0,0,0,0
CookeFB,8.0,11,10,6,6,3,0,4,2,1,2,1,0,1,0,0,2,1,1,0,0,0,0,1,1,0,0,0,0
SilverbergS,10.0,1,5,1,2,4,5,5,3,6,2,2,1,3,1,0,0,1,0,2,1,1,4,2,1,0,0,0,0
JungL,11.0,2,3,3,1,4,3,2,7,1,3,3,6,3,2,0,3,0,1,0,3,1,0,0,1,0,0,1,0
SchultzT,11.0,5,4,5,6,4,2,4,4,0,1,3,1,2,1,0,2,1,0,1,0,1,1,1,0,0,2,0,2
ReillyCM,12.0,0,1,3,6,2,1,3,2,5,3,3,3,6,2,3,1,0,3,1,0,0,0,1,1,2,0,1,0
McNittS,13.0,2,3,3,3,6,1,6,0,1,3,1,3,5,4,3,0,2,0,1,0,1,1,1,2,0,0,1,0
KrugK,14.0,3,1,2,4,6,3,2,1,1,0,3,5,2,3,1,2,2,1,0,2,0,1,3,2,1,2,0,0
CarterK,14.5,3,3,3,3,3,1,1,2,2,5,2,2,1,2,2,1,2,4,4,2,1,1,0,2,1,0,0,0


In [214]:
wide = wide.apply(lambda x: x/sim_count)

if league_type == 'public':
    wide[['promoted', 'relegated', 'place']].to_csv('probs_LL' + league + '_md' + str(completed) + '.csv')
elif league_type == 'private':
    wide[['promoted']].to_csv('probs_LL' + str(season) +'_' + league + '_md' + str(completed) + '.csv')