###### Overview

In this notebook I extract Glicko-2 ratings and rating deviations at the end of every time batch or epoch, whichever is shorter (an epoch lasts at most 1 year). This data is converted into DataFrames and stored for use in the Glicko2Elo notebook.

###### Imports

In [1]:
import pandas as pd
from functions import assembleDf, epochG, epochsG, ceildiv, splitBatch
from datetime import datetime, timedelta
import numpy as np
import sys
sys.path.append('..')
from pyglicko2.glicko2_tests import exampleCase
from pyglicko2.glicko2 import Player
import glicko2
import time

Read in the data.

In [2]:
matches = pd.read_csv('../Data/matches_glicko2.csv',
                      parse_dates = ['tourney_date'], 
                      infer_datetime_format = True)

In [3]:
matches.head(2)

Unnamed: 0,tourney_date,winner_id,loser_id,tourney_id
0,1877-07-09,113987,114149,1877-540
1,1877-07-09,113987,113999,1877-540


In [4]:
# 32577 unique player ids in the matches DataFrame
len(set(list(matches['winner_id'].unique())+list(matches['loser_id'].unique())))

32577

Because memory constraints limit the batch size that epochsG can handle, I process the matches incrementally. This can take a few minutes, and running it is optional because the results are already stored in the Data folder.

In [5]:
# Base batches as (start, stop) row positions; each can be broken up further by splitBatch
eg_batches_50 = [(s, f) for s, f in zip(range(0, 876_978 - 50_000, 50_000),
                                        range(50_000, 876_978, 50_000))] + [(850_000, 876_979)]
eg_batches_100 = [(s, f) for s, f in zip(range(0, 876_978 - 100_000, 100_000),
                                         range(100_000, 876_978, 100_000))] + [(800_000, 876_979)]

ratings_histories = []
players_dict = {}
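The hard-coded boundaries above could also be derived from the row count. A minimal sketch, assuming 876,979 rows (taken from the stop value of the last batch) and a hypothetical helper `make_batches` that is not part of the `functions` module:

```python
def make_batches(n_rows, size):
    """Return consecutive (start, stop) row slices of at most `size` rows."""
    return [(start, min(start + size, n_rows)) for start in range(0, n_rows, size)]


# 876_979 is an assumption read off the last hard-coded batch
batches_50 = make_batches(876_979, 50_000)
batches_100 = make_batches(876_979, 100_000)

print(len(batches_50), batches_50[3], batches_50[-1])
```

Under that assumption the helper yields the same 18 and 9 slices as the two comprehensions, with the short final batch handled by `min`.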

In [6]:
%%time
for batch in eg_batches_50[0:3]:
    players_dict,rh = epochsG(matches[batch[0]:batch[1]],players_dict,365)
    ratings_histories += [rh]

CPU times: user 19 s, sys: 403 ms, total: 19.4 s
Wall time: 19.3 s
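The incremental processing can also be driven by one function instead of one cell per slice. The sketch below is only an illustration: `run_batch` and `toy_process` are hypothetical names, and it assumes that an oversized slice fails with `MemoryError` before the state is modified, which may not be how epochsG actually behaves.

```python
def run_batch(rows, start, stop, state, process, histories):
    """Process rows[start:stop]; on MemoryError, retry as two halves in order."""
    try:
        state, history = process(rows[start:stop], state)
    except MemoryError:
        if stop - start <= 1:
            raise  # a single row cannot be split further
        mid = (start + stop) // 2
        state = run_batch(rows, start, mid, state, process, histories)
        return run_batch(rows, mid, stop, state, process, histories)
    histories.append(history)
    return state


# Toy stand-in for epochsG: refuses slices longer than 3 rows
def toy_process(chunk, state):
    if len(chunk) > 3:
        raise MemoryError
    return state + sum(chunk), (chunk[0], chunk[-1])


histories = []
final_state = run_batch(list(range(10)), 0, 10, 0, toy_process, histories)
print(final_state, histories)
```

In this notebook, `process` would presumably be something like `lambda chunk, state: epochsG(chunk, state, 365)`, with `players_dict` as the state. Because every slice is visited exactly once and in order, a driver of this kind also avoids skipping or repeating a slice by hand.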


In [7]:
eg_batches_50[3]

(150000, 200000)

In [8]:
b = splitBatch(eg_batches_50[3],2); b0 = b[0]; b1 = b[1]
b

[(150000, 175000), (175000, 200000)]

In [9]:
%%time
players_dict,rh = epochsG(matches[b0[0]:b0[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 1.98 s, sys: 38.3 ms, total: 2.02 s
Wall time: 2.02 s


In [10]:
%%time
players_dict,rh = epochsG(matches[b1[0]:b1[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 1.96 s, sys: 37.1 ms, total: 2 s
Wall time: 2 s


In [11]:
splitBatch(eg_batches_50[4],3)

[(200000, 216667), (216667, 233334), (233334, 250000)]
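The uneven split above is consistent with a step size obtained by ceiling division. The sketch below uses the hypothetical names `ceil_div` and `split_batch`; it reproduces the outputs shown in this notebook but is not necessarily how `splitBatch` and `ceildiv` are written in `functions`.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def split_batch(batch, n):
    """Split a (start, stop) slice into n consecutive sub-slices."""
    start, stop = batch
    step = ceil_div(stop - start, n)
    # Clamp every edge to `stop` so the last sub-slice absorbs the remainder
    edges = [min(start + i * step, stop) for i in range(n + 1)]
    return list(zip(edges[:-1], edges[1:]))


print(split_batch((200000, 250000), 3))
print(split_batch((850000, 876979), 4))
```

When `n` does not divide the slice length, the last sub-slice is slightly shorter than the others, as in the three-way split above.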

In [12]:
b = splitBatch(eg_batches_50[4],4); 
b0 = b[0]; b1 = b[1]; b2 = b[2]; b3=b[3]
b

[(200000, 212500), (212500, 225000), (225000, 237500), (237500, 250000)]

In [13]:
%%time
players_dict,rh = epochsG(matches[b0[0]:b0[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 1.08 s, sys: 21.8 ms, total: 1.1 s
Wall time: 1.1 s


In [14]:
%%time
players_dict,rh = epochsG(matches[b1[0]:b1[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 1.04 s, sys: 22.7 ms, total: 1.06 s
Wall time: 1.06 s


In [15]:
%%time
players_dict,rh = epochsG(matches[b2[0]:b2[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 1.04 s, sys: 23.6 ms, total: 1.06 s
Wall time: 1.06 s


In [16]:
%%time
players_dict,rh = epochsG(matches[b3[0]:b3[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 1.13 s, sys: 24.4 ms, total: 1.15 s
Wall time: 1.15 s


In [17]:
%%time
players_dict,rh = epochsG(matches[eg_batches_50[5][0]:eg_batches_50[5][1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 3.83 s, sys: 59.2 ms, total: 3.89 s
Wall time: 3.89 s


In [18]:
b = splitBatch(eg_batches_50[6],8); 
b0 = b[0]; b1 = b[1]; b2 = b[2]; b3 = b[3]
b4 = b[4]; b5 = b[5]; b6 = b[6]; b7 = b[7]
b

[(300000, 306250),
 (306250, 312500),
 (312500, 318750),
 (318750, 325000),
 (325000, 331250),
 (331250, 337500),
 (337500, 343750),
 (343750, 350000)]

In [19]:
%%time
players_dict,rh = epochsG(matches[b0[0]:b0[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 819 ms, sys: 18.6 ms, total: 837 ms
Wall time: 836 ms


In [20]:
%%time
players_dict,rh = epochsG(matches[b1[0]:b1[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 873 ms, sys: 16.3 ms, total: 890 ms
Wall time: 888 ms


In [21]:
%%time
players_dict,rh = epochsG(matches[b2[0]:b2[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 750 ms, sys: 21.1 ms, total: 772 ms
Wall time: 765 ms


In [22]:
%%time
players_dict,rh = epochsG(matches[b3[0]:b3[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 829 ms, sys: 19 ms, total: 848 ms
Wall time: 846 ms


In [23]:
%%time
players_dict,rh = epochsG(matches[b4[0]:b4[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 879 ms, sys: 31.1 ms, total: 910 ms
Wall time: 896 ms


In [24]:
%%time
players_dict,rh = epochsG(matches[b5[0]:b5[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 875 ms, sys: 17.7 ms, total: 892 ms
Wall time: 890 ms


In [25]:
%%time
players_dict,rh = epochsG(matches[b6[0]:b6[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 746 ms, sys: 17.7 ms, total: 764 ms
Wall time: 763 ms


In [26]:
%%time
players_dict,rh = epochsG(matches[b7[0]:b7[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 842 ms, sys: 22.2 ms, total: 864 ms
Wall time: 860 ms


In [27]:
b = splitBatch(eg_batches_50[7],7); 
b0 = b[0]; b1 = b[1]; b2 = b[2]; b3 = b[3]; b4 = b[4]; b5 = b[5];
b6 = b[6]


In [28]:
%%time
players_dict,rh = epochsG(matches[b0[0]:b0[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 904 ms, sys: 17.5 ms, total: 921 ms
Wall time: 920 ms


In [29]:
%%time
players_dict,rh = epochsG(matches[b1[0]:b1[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 889 ms, sys: 21.9 ms, total: 911 ms
Wall time: 906 ms


In [30]:
%%time
players_dict,rh = epochsG(matches[b2[0]:b2[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 935 ms, sys: 20.2 ms, total: 955 ms
Wall time: 952 ms


In [31]:
%%time
# Caution: this repeats the b2 slice already processed in In [30]
players_dict,rh = epochsG(matches[b2[0]:b2[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 940 ms, sys: 22.1 ms, total: 962 ms
Wall time: 956 ms


In [32]:
%%time
players_dict,rh = epochsG(matches[b3[0]:b3[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 947 ms, sys: 21 ms, total: 968 ms
Wall time: 965 ms


In [33]:
%%time
players_dict,rh = epochsG(matches[b4[0]:b4[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 851 ms, sys: 17.2 ms, total: 868 ms
Wall time: 867 ms


In [34]:
%%time
players_dict,rh = epochsG(matches[b5[0]:b5[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 1 s, sys: 24.8 ms, total: 1.03 s
Wall time: 1.02 s


In [35]:
%%time
players_dict,rh = epochsG(matches[b6[0]:b6[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 966 ms, sys: 17.4 ms, total: 984 ms
Wall time: 983 ms


In [36]:
b = splitBatch(eg_batches_50[8],2); b0 = b[0]; b1 = b[1]
b

[(400000, 425000), (425000, 450000)]

In [37]:
%%time
players_dict,rh = epochsG(matches[b0[0]:b0[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 2.12 s, sys: 51.5 ms, total: 2.17 s
Wall time: 2.16 s


In [38]:
%%time
players_dict,rh = epochsG(matches[b1[0]:b1[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 1.92 s, sys: 49.1 ms, total: 1.97 s
Wall time: 1.96 s


In [39]:
b = splitBatch(eg_batches_50[9],8); b0 = b[0]; b1 = b[1]; 
b2 = b[2];b3 = b[3];b4 =b[4]; b5 = b[5]; b6 = b[6]; b7 = b[7]
b

[(450000, 456250),
 (456250, 462500),
 (462500, 468750),
 (468750, 475000),
 (475000, 481250),
 (481250, 487500),
 (487500, 493750),
 (493750, 500000)]

In [40]:
%%time
players_dict,rh = epochsG(matches[b0[0]:b0[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 850 ms, sys: 18.9 ms, total: 869 ms
Wall time: 868 ms


In [41]:
%%time
players_dict,rh = epochsG(matches[b1[0]:b1[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 994 ms, sys: 23.5 ms, total: 1.02 s
Wall time: 1.01 s


In [42]:
b = splitBatch(b2,2)
b2_0 = b[0]; b2_1 = b[1];

In [43]:
%%time
players_dict,rh = epochsG(matches[b2_0[0]:b2_0[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 759 ms, sys: 23.8 ms, total: 783 ms
Wall time: 774 ms


In [44]:
%%time
players_dict,rh = epochsG(matches[b2_1[0]:b2_1[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 781 ms, sys: 27 ms, total: 808 ms
Wall time: 793 ms


In [45]:
%%time
players_dict,rh = epochsG(matches[b3[0]:b3[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 909 ms, sys: 19.9 ms, total: 929 ms
Wall time: 926 ms


In [46]:
%%time
players_dict,rh = epochsG(matches[b4[0]:b4[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 820 ms, sys: 16.7 ms, total: 837 ms
Wall time: 835 ms


In [47]:
%%time
players_dict,rh = epochsG(matches[b5[0]:b5[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 960 ms, sys: 22.6 ms, total: 983 ms
Wall time: 980 ms


In [48]:
%%time
players_dict,rh = epochsG(matches[b6[0]:b6[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 1.01 s, sys: 22.1 ms, total: 1.03 s
Wall time: 1.03 s


In [49]:
%%time
players_dict,rh = epochsG(matches[b7[0]:b7[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 977 ms, sys: 19.6 ms, total: 997 ms
Wall time: 996 ms


In [50]:
b = splitBatch(eg_batches_50[10],2); b0 = b[0]; b1 = b[1]
b

[(500000, 525000), (525000, 550000)]

In [51]:
%%time
players_dict,rh = epochsG(matches[b0[0]:b0[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 1.85 s, sys: 27.8 ms, total: 1.88 s
Wall time: 1.88 s


In [52]:
%%time
players_dict,rh = epochsG(matches[b1[0]:b1[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 1.93 s, sys: 41.2 ms, total: 1.97 s
Wall time: 1.97 s


In [53]:
b = splitBatch(eg_batches_50[11],5); b0 = b[0]; b1 = b[1];
b2 = b[2]; b3 = b[3];b4 = b[4];
b

[(550000, 560000),
 (560000, 570000),
 (570000, 580000),
 (580000, 590000),
 (590000, 600000)]

In [54]:
%%time
players_dict,rh = epochsG(matches[b0[0]:b0[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 1.09 s, sys: 25.9 ms, total: 1.11 s
Wall time: 1.11 s


In [55]:
b = splitBatch(b1,2)
b1_0 = b[0]; b1_1 = b[1];

In [56]:
%%time
players_dict,rh = epochsG(matches[b1_0[0]:b1_0[1]],players_dict,365)
ratings_histories += [rh]

CPU times: user 937 ms, sys: 26.4 ms, total: 963 ms
Wall time: 957 ms


In [57]:
%%time
players_dict,rh = epochsG(matches[b1_1[0]:b1_1[1]],players_dict,365)
ratings_histories += [rh]

ZeroDivisionError
105613
<pyglicko2.glicko2.Player object at 0x7fee706a7eb0>
CPU times: user 879 ms, sys: 23.9 ms, total: 903 ms
Wall time: 897 ms


In [58]:
%%time
players_dict,rh = epochsG(matches[b2[0]:b2[1]],players_dict,365)
ratings_histories += [rh]

ZeroDivisionError
105613
<pyglicko2.glicko2.Player object at 0x7fee706a7eb0>
CPU times: user 1.13 s, sys: 24.7 ms, total: 1.15 s
Wall time: 1.15 s


In [59]:
%%time
players_dict,rh = epochsG(matches[b3[0]:b3[1]],players_dict,365)
ratings_histories += [rh]

ZeroDivisionError
105613
<pyglicko2.glicko2.Player object at 0x7fee706a7eb0>
CPU times: user 1.13 s, sys: 27.5 ms, total: 1.16 s
Wall time: 1.16 s


In [60]:
%%time
players_dict,rh = epochsG(matches[b4[0]:b4[1]],players_dict,365)
ratings_histories += [rh]

ZeroDivisionError
105613
<pyglicko2.glicko2.Player object at 0x7fee706a7eb0>
CPU times: user 1.17 s, sys: 24.8 ms, total: 1.2 s
Wall time: 1.2 s


In [61]:
%%time
players_dict,rh = epochsG(matches[eg_batches_50[12][0]:eg_batches_50[12][1]],players_dict,365)
ratings_histories += [rh]

ZeroDivisionError
105613
<pyglicko2.glicko2.Player object at 0x7fee706a7eb0>
ZeroDivisionError
105613
<pyglicko2.glicko2.Player object at 0x7fee706a7eb0>
CPU times: user 3.54 s, sys: 72.5 ms, total: 3.61 s
Wall time: 3.61 s


In [62]:
%%time
players_dict,rh = epochsG(matches[eg_batches_50[13][0]:eg_batches_50[13][1]],players_dict,365)
ratings_histories += [rh]

OverflowError
111786
<pyglicko2.glicko2.Player object at 0x7fee61af8b20>
ZeroDivisionError
105613
<pyglicko2.glicko2.Player object at 0x7fee706a7eb0>
CPU times: user 3.69 s, sys: 77.5 ms, total: 3.77 s
Wall time: 3.76 s


In [63]:
b = splitBatch(eg_batches_50[14],2); b0 = b[0]; b1 = b[1]
b

[(700000, 725000), (725000, 750000)]

In [64]:
%%time
players_dict,rh = epochsG(matches[b0[0]:b0[1]],players_dict,365)
ratings_histories += [rh]

ZeroDivisionError
105613
<pyglicko2.glicko2.Player object at 0x7fee706a7eb0>
CPU times: user 1.74 s, sys: 34.7 ms, total: 1.78 s
Wall time: 1.78 s


In [65]:
b_1 = splitBatch(b1,2)
b_10=b_1[0];b_11=b_1[1]

In [66]:
%%time
players_dict,rh = epochsG(matches[b_10[0]:b_10[1]],players_dict,365)
ratings_histories += [rh]

ZeroDivisionError
105613
<pyglicko2.glicko2.Player object at 0x7fee706a7eb0>
CPU times: user 1.28 s, sys: 28.6 ms, total: 1.31 s
Wall time: 1.31 s


In [67]:
%%time
players_dict,rh = epochsG(matches[b_11[0]:b_11[1]],players_dict,365)
ratings_histories += [rh]

ZeroDivisionError
105613
<pyglicko2.glicko2.Player object at 0x7fee706a7eb0>
CPU times: user 1.33 s, sys: 26 ms, total: 1.35 s
Wall time: 1.35 s


In [68]:
# Caution: eg_batches_50[14] was already processed in In [64], In [66] and In [67]
b = splitBatch(eg_batches_50[14],2); b0 = b[0]; b1 = b[1]
b

[(700000, 725000), (725000, 750000)]

In [69]:
%%time
players_dict,rh = epochsG(matches[b0[0]:b0[1]],players_dict,365)
ratings_histories += [rh]

ZeroDivisionError
105613
<pyglicko2.glicko2.Player object at 0x7fee706a7eb0>
CPU times: user 1.74 s, sys: 33 ms, total: 1.78 s
Wall time: 1.78 s


In [70]:
%%time
players_dict,rh = epochsG(matches[b1[0]:b1[1]],players_dict,365)
ratings_histories += [rh]

ZeroDivisionError
105613
<pyglicko2.glicko2.Player object at 0x7fee706a7eb0>
CPU times: user 1.87 s, sys: 34.9 ms, total: 1.9 s
Wall time: 1.9 s


In [71]:
b = splitBatch(eg_batches_50[15],4); b0 = b[0]; b1 = b[1];
b2 = b[2]; b3 = b[3]


In [72]:
%%time
players_dict,rh = epochsG(matches[b0[0]:b0[1]],players_dict,365)
ratings_histories += [rh]

ZeroDivisionError
105613
<pyglicko2.glicko2.Player object at 0x7fee706a7eb0>
CPU times: user 1.23 s, sys: 23.9 ms, total: 1.26 s
Wall time: 1.26 s


In [73]:
%%time
players_dict,rh = epochsG(matches[b1[0]:b1[1]],players_dict,365)
ratings_histories += [rh]

ZeroDivisionError
105613
<pyglicko2.glicko2.Player object at 0x7fee706a7eb0>
CPU times: user 1.33 s, sys: 28.4 ms, total: 1.36 s
Wall time: 1.36 s


In [74]:
%%time
players_dict,rh = epochsG(matches[b2[0]:b2[1]],players_dict,365)
ratings_histories += [rh]

ZeroDivisionError
105613
<pyglicko2.glicko2.Player object at 0x7fee706a7eb0>
CPU times: user 1.3 s, sys: 26 ms, total: 1.33 s
Wall time: 1.33 s


In [75]:
%%time
players_dict,rh = epochsG(matches[b3[0]:b3[1]],players_dict,365)
ratings_histories += [rh]

ZeroDivisionError
105613
<pyglicko2.glicko2.Player object at 0x7fee706a7eb0>
CPU times: user 1.08 s, sys: 21.7 ms, total: 1.1 s
Wall time: 1.1 s


In [76]:
%%time
players_dict,rh = epochsG(matches[eg_batches_50[16][0]:eg_batches_50[16][1]],players_dict,365)
ratings_histories += [rh]

ZeroDivisionError
105613
<pyglicko2.glicko2.Player object at 0x7fee706a7eb0>
ZeroDivisionError
105613
<pyglicko2.glicko2.Player object at 0x7fee706a7eb0>
ZeroDivisionError
105613
<pyglicko2.glicko2.Player object at 0x7fee706a7eb0>
CPU times: user 3.66 s, sys: 73.9 ms, total: 3.73 s
Wall time: 3.73 s


In [77]:
b = splitBatch(eg_batches_50[17],4)
b0 = b[0]; b1 = b[1]
b

[(850000, 856745), (856745, 863490), (863490, 870235), (870235, 876979)]

In [78]:
%%time
players_dict,rh = epochsG(matches[b0[0]:b0[1]],players_dict,365)
ratings_histories += [rh]

ZeroDivisionError
105613
<pyglicko2.glicko2.Player object at 0x7fee706a7eb0>
CPU times: user 886 ms, sys: 19.3 ms, total: 905 ms
Wall time: 903 ms


In [79]:
b = splitBatch(eg_batches_50[17],4)
b0 = b[0]; b1 = b[1]; b2 = b[2]; b3 = b[3];
b

[(850000, 856745), (856745, 863490), (863490, 870235), (870235, 876979)]

In [80]:
%%time
# Caution: this repeats the b0 slice already processed in In [78]
players_dict,rh = epochsG(matches[b0[0]:b0[1]],players_dict,365)
ratings_histories += [rh]

ZeroDivisionError
105613
<pyglicko2.glicko2.Player object at 0x7fee706a7eb0>
CPU times: user 852 ms, sys: 21.4 ms, total: 874 ms
Wall time: 868 ms


In [81]:
%%time
players_dict,rh = epochsG(matches[b1[0]:b1[1]],players_dict,365)
ratings_histories += [rh]

ZeroDivisionError
105613
<pyglicko2.glicko2.Player object at 0x7fee706a7eb0>
CPU times: user 1.01 s, sys: 24.1 ms, total: 1.04 s
Wall time: 1.03 s


In [82]:
%%time
players_dict,rh = epochsG(matches[b2[0]:b2[1]],players_dict,365)
ratings_histories += [rh]

ZeroDivisionError
105613
<pyglicko2.glicko2.Player object at 0x7fee706a7eb0>
CPU times: user 1.04 s, sys: 22.1 ms, total: 1.06 s
Wall time: 1.06 s


In [83]:
%%time
players_dict,rh = epochsG(matches[b3[0]:b3[1]],players_dict,365)
ratings_histories += [rh]

ZeroDivisionError
105613
<pyglicko2.glicko2.Player object at 0x7fee706a7eb0>
CPU times: user 984 ms, sys: 20.3 ms, total: 1 s
Wall time: 1 s


With the ratings histories accumulated, I now store them in DataFrames for quick access later on.

In [84]:
ratingsrdHistory = {}
for rh in ratings_histories:
    ratingsrdHistory.update(rh)

In [85]:
# Each value is a (ratings, rating deviations) pair; split it into two dicts
rdHistory = {key: pair[1] for key, pair in ratingsrdHistory.items()}
ratingsHistory = {key: pair[0] for key, pair in ratingsrdHistory.items()}
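A toy illustration of the merge-and-split step. The layout is an assumption inferred from the indexing above: each history maps a key (taken here to be an epoch end date) to a pair of dicts, ratings first and rating deviations second. All values are made up.

```python
# Two per-batch histories; the second repeats a key from the first
histories = [
    {"1878-07-09": ({1: 1510.0}, {1: 340.0})},
    {"1878-07-09": ({1: 1512.0}, {1: 338.0}),
     "1879-07-09": ({1: 1525.0, 2: 1490.0}, {1: 330.0, 2: 345.0})},
]

merged = {}
for history in histories:
    merged.update(history)  # later batches overwrite earlier entries with the same key

ratings = {key: pair[0] for key, pair in merged.items()}
rds = {key: pair[1] for key, pair in merged.items()}

print(ratings["1878-07-09"], rds["1879-07-09"])
```

Note that `dict.update` keeps only the last value for a repeated key, so if two batches report the same key, the later one wins.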

In [86]:
ratingsHistory_df = assembleDf(ratingsHistory)
rdHistory_df = assembleDf(rdHistory)

In [87]:
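# Pad rows dated at the first match: every player starts at the
# default rating (1500) and default rating deviation (350)
padRow = pd.DataFrame({col: 1500 for col in ratingsHistory_df.columns}, 
                      index = [pd.Timestamp('1877-07-09T00')])
padRowrd = pd.DataFrame({col: 350 for col in ratingsHistory_df.columns}, 
                      index = [pd.Timestamp('1877-07-09T00')])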

In [88]:
ratingsHistory_df= pd.concat([padRow,ratingsHistory_df],axis=0)
rdHistory_df = pd.concat([padRowrd,rdHistory_df],axis=0)

Finally, time gaps are filled with the most recently seen value, and any values that are still unknown are set to the defaults: 1500 for ratings and 350 for rating deviations.


In [89]:
ratingsHistory_df = ratingsHistory_df.ffill(axis=0).fillna(1500)
rdHistory_df = rdHistory_df.ffill(axis=0).fillna(350)
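A small illustration of how the pad row and the two fill steps interact, using a made-up two-player frame (the column names `p1` and `p2` and all values are hypothetical):

```python
import numpy as np
import pandas as pd

# Toy ratings history with gaps; p2 is unseen until the last epoch
ratings = pd.DataFrame(
    {"p1": [1520.0, np.nan, 1555.0], "p2": [np.nan, np.nan, 1480.0]},
    index=pd.to_datetime(["1878-07-09", "1879-07-09", "1880-07-09"]),
)

# Pad row at the start date so every player begins at the default rating
pad = pd.DataFrame({col: 1500.0 for col in ratings.columns},
                   index=[pd.Timestamp("1877-07-09")])

# Forward-fill carries the last known rating down; fillna catches anything left
filled = pd.concat([pad, ratings], axis=0).ffill(axis=0).fillna(1500)
print(filled)
```

With the pad row in place, the forward fill already supplies the default to players who have not yet appeared, so `fillna` acts as a safety net for any column the pad row does not cover.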

In [93]:
# Store the DataFrames as CSV files for use in Glicko2Elo
ratingsHistory_df.to_csv('../Data/ratings_histories_glicko2.csv')
rdHistory_df.to_csv('../Data/rd_histories_glicko2.csv')
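When these files are read back, the date index and the column labels need some care. The round trip below is a sketch on toy data written to a temporary directory; it assumes the columns are integer player ids, which is not confirmed by the output of assembleDf shown here.

```python
import os
import tempfile

import pandas as pd

# Toy frame: integer player ids as columns, epoch end dates as index
ratings = pd.DataFrame(
    {113987: [1500.0, 1540.5], 114149: [1500.0, 1462.25]},
    index=pd.to_datetime(["1877-07-09", "1878-07-09"]),
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ratings_histories_glicko2.csv")
    ratings.to_csv(path)
    # index_col=0 restores the index; parse_dates=True parses it as dates
    restored = pd.read_csv(path, index_col=0, parse_dates=True)

# Column labels come back as strings, so convert them to integers again
restored.columns = restored.columns.astype(int)
print(restored)
```

Without `index_col=0`, the saved index shows up as an extra `Unnamed: 0` column, as in the `matches.head(2)` output earlier in this notebook.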