### Introduction to Preprocessing: Decay-Time Weighting Experiments

In this preprocessing step, we look for good (if not optimal) puzzle day-specific decay function parameters for a feature capturing IS2's 'recent' performance prior to a given solve. This means searching over combinations of the number of past puzzles to include (e.g., 10, 20, 25, 50, 75, 100) and the decay weights applied to those past puzzles (ranging from no decay weighting to a steep curve), and keeping the combination that yields the best univariate prediction of solve time (training RMSE, in minutes) over the 15x15 puzzle set.
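The selection criterion described above can be sketched as follows. This is a minimal illustration, not the notebook's actual evaluation code: the helper name `univariate_rmse` is hypothetical, and `np.polyfit` is used here as a stand-in for a one-feature ordinary least squares fit, which is assumed to be how the univariate prediction is made.

```python
import numpy as np

def univariate_rmse(feature, target):
    """Fit target ~ intercept + slope * feature by least squares; return training RMSE."""
    x = np.asarray(feature, dtype=float)
    y = np.asarray(target, dtype=float)
    # Rows without enough prior puzzles have an undefined feature; drop them.
    keep = ~(np.isnan(x) | np.isnan(y))
    x, y = x[keep], y[keep]
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    return float(np.sqrt(np.mean(residuals ** 2)))

# Toy data (made up for illustration): two candidate features for the same target.
solve_time = np.array([8.2, 6.8, 7.0, 11.8, 16.2, 9.5])
candidates = {
    "candidate_a": np.array([10.9, 11.7, 11.6, 11.5, 11.2, np.nan]),
    "candidate_b": np.array([9.0, 7.5, 7.2, 10.9, 14.8, np.nan]),
}
scores = {name: univariate_rmse(col, solve_time) for name, col in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Each candidate (window size plus weight scheme) would be scored this way, and the one with the lowest RMSE in minutes kept.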

In [1]:
import pandas as pd
import numpy as np
import os
import pickle
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import __version__ as sklearn_version
from sklearn.preprocessing import scale
from sklearn.model_selection import train_test_split, cross_validate, GridSearchCV, learning_curve
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.feature_selection import SelectKBest, f_regression
import datetime
#from library.sb_utils import save_file

### Load Data

In [2]:
df = pd.read_csv('../../data/df3.csv')
df.head()

| Unnamed: 0 | P_Date | P_Date_str | IS2_Completed | Comp_Date | Comp_Date_str | DOW | DOW_num | Grid Size | IS2_ST(m) | IS_pds_l10_ndw | ... | Circle_Count | Shade_Count | Unusual_Sym | Black_Square_Fill | Outside_Grid | Unchecked_Sq | Uniclue | Duplicate_Answers | Quantum | Wordplay |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-05-11 00:00:00 | 2022-05-11 | 1.0 | 2024-02-29 19:58:44 | 2024-02-29 | Wednesday | 4.0 | 1 | 8.166667 | 10.995 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3.0 |
| 1 | 2022-05-18 00:00:00 | 2022-05-18 | 1.0 | 2024-02-29 17:34:25 | 2024-02-29 | Wednesday | 4.0 | 1 | 6.783333 | 11.678333 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3.0 |
| 2 | 2024-02-28 00:00:00 | 2024-02-28 | 1.0 | 2024-02-28 18:02:10 | 2024-02-28 | Wednesday | 4.0 | 1 | 7.033333 | 11.625 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3.0 |
| 3 | 2022-05-25 00:00:00 | 2022-05-25 | 1.0 | 2024-02-27 20:57:43 | 2024-02-27 | Wednesday | 4.0 | 1 | 11.75 | 11.531667 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4.0 |
| 4 | 2022-06-01 00:00:00 | 2022-06-01 | 1.0 | 2024-02-24 21:13:46 | 2024-02-24 | Wednesday | 4.0 | 1 | 16.2 | 11.161667 | ... | 16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.0 |


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1230 entries, 0 to 1229
Data columns (total 43 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   P_Date                  1230 non-null   object 
 1   P_Date_str              1230 non-null   object 
 2   IS2_Completed           1228 non-null   float64
 3   Comp_Date               1230 non-null   object 
 4   Comp_Date_str           1230 non-null   object 
 5   DOW                     1230 non-null   object 
 6   DOW_num                 1230 non-null   float64
 7   Grid Size               1230 non-null   int64  
 8   IS2_ST(m)               1230 non-null   float64
 9   IS_pds_l10_ndw          1223 non-null   float64
 10  IS_pds_l10_stdev        1216 non-null   float64
 11  IS_pds_l10_ndw_SOS_adj  1223 non-null   float64
 12  GMST(m)                 1230 non-null   float64
 13  Constructors            1230 non-null   object 
 14  Words                   1230 non-null   

### Create Feature Variants for Testing

### Filter Data

In [4]:
# strip down df to just the columns we need to evaluate decay function and number of day-specific puzzles to include
df1 = df[['DOW', 'Comp_Date', 'Comp_Date_str', 'IS2_ST(m)']]

In [5]:
#Filter out Sunday
df1 =df1[df1["DOW"]!="Sunday"]

In [6]:
#Remove the first solve period (2018-2019) to calculate sample averages by day
df1 = df1[df1['Comp_Date_str'].str.contains("2020|2021|2022|2023|2024")]

In [7]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 979 entries, 0 to 1219
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   DOW            979 non-null    object 
 1   Comp_Date      979 non-null    object 
 2   Comp_Date_str  979 non-null    object 
 3   IS2_ST(m)      979 non-null    float64
dtypes: float64(1), object(3)
memory usage: 38.2+ KB


In [69]:
#IS_pds_l5_dw
# Provides the decay-weighted (dw), puzzle day-specific (pds) mean solve time for IS2 over the previous 5 puzzles relative to a given puzzle
# Note that the sort is by IS2 completion date, as completion date was available
# Note also that, unlike the 10-puzzle moving average, this weighted average does NOT include the "puzzle at hand" itself

df1 = df1.sort_values(by=['DOW', 'Comp_Date'], ascending = False)

# Gradual decay
# w = np.arange(1,6)
# w = list(w)

# No decay
w = np.ones(5)
w = list(w)

df1["IS_pds_l5_dw_1"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-1)*w[0]
df1["IS_pds_l5_dw_2"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-2)*w[1]
df1["IS_pds_l5_dw_3"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-3)*w[2]
df1["IS_pds_l5_dw_4"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-4)*w[3]
df1["IS_pds_l5_dw_5"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-5)*w[4]

df1["IS_pds_l5_dw_1_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-1)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-1))*w[0]
df1["IS_pds_l5_dw_2_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-2)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-2))*w[1]
df1["IS_pds_l5_dw_3_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-3)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-3))*w[2]
df1["IS_pds_l5_dw_4_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-4)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-4))*w[3]
df1["IS_pds_l5_dw_5_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-5)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-5))*w[4]

df1["IS_pds_l5_ws"] = df1[["IS_pds_l5_dw_1", "IS_pds_l5_dw_2", "IS_pds_l5_dw_3", "IS_pds_l5_dw_4", "IS_pds_l5_dw_5"]].sum(axis=1)
df1["IS_pds_l5_ws_ct"] = df1[["IS_pds_l5_dw_1_ct", "IS_pds_l5_dw_2_ct", "IS_pds_l5_dw_3_ct", "IS_pds_l5_dw_4_ct", "IS_pds_l5_dw_5_ct"]].sum(axis=1)
df1["IS_pds_l5_dw"] = df1["IS_pds_l5_ws"]/df1["IS_pds_l5_ws_ct"]

# Deleting transient columns
df1 = df1.drop(["IS_pds_l5_dw_1", "IS_pds_l5_dw_2", "IS_pds_l5_dw_3", "IS_pds_l5_dw_4", "IS_pds_l5_dw_5", "IS_pds_l5_ws", "IS_pds_l5_dw_1_ct", "IS_pds_l5_dw_2_ct", "IS_pds_l5_dw_3_ct", "IS_pds_l5_dw_4_ct", "IS_pds_l5_dw_5_ct", "IS_pds_l5_ws_ct"], axis = 1)
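The cell above follows a pattern that repeats for every window size: shift the solve times within each day of the week, weight them, and divide the weighted sum by the sum of the weights that were actually available. A compact sketch of the same computation is given below. The function name `decay_weighted_mean` and its parameters are hypothetical, not part of the notebook, and it assumes the DataFrame index is unique. One small difference: it counts a weight whenever the prior solve time is non-null, whereas the `x/x` trick above would also drop a solve time of exactly zero.

```python
import numpy as np
import pandas as pd

def decay_weighted_mean(df, n_prev, weights=None, value_col="IS2_ST(m)",
                        group_col="DOW", date_col="Comp_Date"):
    """Weighted mean of the previous n_prev solves within each group.

    weights[0] applies to the most recent prior solve, weights[-1] to the oldest.
    The puzzle at hand is never included in its own average.
    """
    if weights is None:
        weights = np.ones(n_prev)          # no decay
    weights = np.asarray(weights, dtype=float)
    if len(weights) != n_prev:
        raise ValueError("need exactly one weight per prior puzzle")

    # Newest first within each group, so shift(-k) reaches k solves back in time.
    ordered = df.sort_values(by=[group_col, date_col], ascending=False)
    grouped = ordered.groupby(group_col)[value_col]

    weighted_sum = pd.Series(0.0, index=ordered.index)
    weight_total = pd.Series(0.0, index=ordered.index)
    for k, w in enumerate(weights, start=1):
        prev = grouped.shift(-k)
        weighted_sum = weighted_sum + (prev * w).fillna(0.0)
        weight_total = weight_total + prev.notna() * w

    # Rows with no prior solves get NaN rather than a division by zero.
    return weighted_sum / weight_total.where(weight_total > 0)

# Toy example (made-up values): four Monday solves, window of 2, weights 2 and 1.
demo = pd.DataFrame({
    "DOW": ["Monday"] * 4,
    "Comp_Date": ["2024-01-01 10:00:00", "2024-01-08 10:00:00",
                  "2024-01-15 10:00:00", "2024-01-22 10:00:00"],
    "IS2_ST(m)": [10.0, 20.0, 30.0, 40.0],
})
demo["IS_pds_l2_dw"] = decay_weighted_mean(demo, n_prev=2, weights=[2, 1])
print(demo)
```

With a helper of this shape, each window size becomes a single call, and no transient columns need to be created and dropped.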

In [136]:
#IS_pds_l7_dw
# Provides the decay-weighted (dw), puzzle day-specific (pds) mean solve time for IS2 over the previous 7 puzzles relative to a given puzzle
# Note that the sort is by IS2 completion date, as completion date was available
# Note also that, unlike the 10-puzzle moving average, this weighted average does NOT include the "puzzle at hand" itself

df1 = df1.sort_values(by=['DOW', 'Comp_Date'], ascending = False)

# Gradual decay
# w = np.arange(1,8)
# w = list(w)

# No decay
w = np.ones(7)
w = list(w)

df1["IS_pds_l7_dw_1"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-1)*w[0]
df1["IS_pds_l7_dw_2"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-2)*w[1]
df1["IS_pds_l7_dw_3"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-3)*w[2]
df1["IS_pds_l7_dw_4"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-4)*w[3]
df1["IS_pds_l7_dw_5"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-5)*w[4]
df1["IS_pds_l7_dw_6"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-6)*w[5]
df1["IS_pds_l7_dw_7"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-7)*w[6]

df1["IS_pds_l7_dw_1_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-1)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-1))*w[0]
df1["IS_pds_l7_dw_2_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-2)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-2))*w[1]
df1["IS_pds_l7_dw_3_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-3)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-3))*w[2]
df1["IS_pds_l7_dw_4_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-4)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-4))*w[3]
df1["IS_pds_l7_dw_5_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-5)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-5))*w[4]
df1["IS_pds_l7_dw_6_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-6)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-6))*w[5]
df1["IS_pds_l7_dw_7_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-7)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-7))*w[6]


df1["IS_pds_l7_ws"] = df1[["IS_pds_l7_dw_1", "IS_pds_l7_dw_2", "IS_pds_l7_dw_3", "IS_pds_l7_dw_4", "IS_pds_l7_dw_5", "IS_pds_l7_dw_6", "IS_pds_l7_dw_7"]].sum(axis=1)
df1["IS_pds_l7_ws_ct"] = df1[["IS_pds_l7_dw_1_ct", "IS_pds_l7_dw_2_ct", "IS_pds_l7_dw_3_ct", "IS_pds_l7_dw_4_ct", "IS_pds_l7_dw_5_ct", "IS_pds_l7_dw_6_ct", "IS_pds_l7_dw_7_ct"]].sum(axis=1)
df1["IS_pds_l7_dw"] = df1["IS_pds_l7_ws"]/df1["IS_pds_l7_ws_ct"]

# Deleting transient columns
df1 = df1.drop(["IS_pds_l7_dw_1", "IS_pds_l7_dw_2", "IS_pds_l7_dw_3", "IS_pds_l7_dw_4", "IS_pds_l7_dw_5", "IS_pds_l7_dw_6", "IS_pds_l7_dw_7", "IS_pds_l7_ws", "IS_pds_l7_dw_1_ct", "IS_pds_l7_dw_2_ct", "IS_pds_l7_dw_3_ct", "IS_pds_l7_dw_4_ct", "IS_pds_l7_dw_5_ct", "IS_pds_l7_dw_6_ct", "IS_pds_l7_dw_7_ct", "IS_pds_l7_ws_ct"], axis = 1)

In [36]:
#IS_pds_l8_dw
# Provides the decay-weighted (dw), puzzle day-specific (pds) mean solve time for IS2 over the previous 8 puzzles relative to a given puzzle
# Note that the sort is by IS2 completion date, as completion date was available
# Note also that, unlike the 10-puzzle moving average, this weighted average does NOT include the "puzzle at hand" itself

df1 = df1.sort_values(by=['DOW', 'Comp_Date'], ascending = False)

# Gradual decay
# Weights are listed most recent first (w[0] scales shift(-1)), so they descend from 8 to 1
w = np.arange(8, 0, -1)
w = list(w)

# No decay
# w = np.ones(8)
# w = list(w)

# Partial decay
#w=[8,7,6,5,4,4,4,4]

# Partial decay v2
#w=[8,7,6,5,4,4,2,2]

# Partial decay v3
# w=[8,7,6,5,4,4,3,3]

# Partial decay v4
#w=[8,7,6,5,4,4,3,2]

# Partial decay v5
# w=[8,7,6,4,4,4,4,4]

# Partial decay v6
# w=[8,7,4,4,4,4,4,4]

# Partial decay v7
# w=[8,6,6,4,4,4,4,4]

df1["IS_pds_l8_dw_1"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-1)*w[0]
df1["IS_pds_l8_dw_2"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-2)*w[1]
df1["IS_pds_l8_dw_3"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-3)*w[2]
df1["IS_pds_l8_dw_4"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-4)*w[3]
df1["IS_pds_l8_dw_5"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-5)*w[4]
df1["IS_pds_l8_dw_6"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-6)*w[5]
df1["IS_pds_l8_dw_7"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-7)*w[6]
df1["IS_pds_l8_dw_8"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-8)*w[7]

df1["IS_pds_l8_dw_1_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-1)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-1))*w[0]
df1["IS_pds_l8_dw_2_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-2)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-2))*w[1]
df1["IS_pds_l8_dw_3_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-3)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-3))*w[2]
df1["IS_pds_l8_dw_4_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-4)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-4))*w[3]
df1["IS_pds_l8_dw_5_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-5)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-5))*w[4]
df1["IS_pds_l8_dw_6_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-6)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-6))*w[5]
df1["IS_pds_l8_dw_7_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-7)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-7))*w[6]
df1["IS_pds_l8_dw_8_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-8)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-8))*w[7]

df1["IS_pds_l8_ws"] = df1[["IS_pds_l8_dw_1", "IS_pds_l8_dw_2", "IS_pds_l8_dw_3", "IS_pds_l8_dw_4", "IS_pds_l8_dw_5", "IS_pds_l8_dw_6", "IS_pds_l8_dw_7", "IS_pds_l8_dw_8"]].sum(axis=1)
df1["IS_pds_l8_ws_ct"] = df1[["IS_pds_l8_dw_1_ct", "IS_pds_l8_dw_2_ct", "IS_pds_l8_dw_3_ct", "IS_pds_l8_dw_4_ct", "IS_pds_l8_dw_5_ct", "IS_pds_l8_dw_6_ct", "IS_pds_l8_dw_7_ct", "IS_pds_l8_dw_8_ct"]].sum(axis=1)
df1["IS_pds_l8_dw"] = df1["IS_pds_l8_ws"]/df1["IS_pds_l8_ws_ct"]

# Deleting transient columns
df1 = df1.drop(["IS_pds_l8_dw_1", "IS_pds_l8_dw_2", "IS_pds_l8_dw_3", "IS_pds_l8_dw_4", "IS_pds_l8_dw_5", "IS_pds_l8_dw_6", "IS_pds_l8_dw_7", "IS_pds_l8_dw_8", "IS_pds_l8_ws", "IS_pds_l8_dw_1_ct", "IS_pds_l8_dw_2_ct", "IS_pds_l8_dw_3_ct", "IS_pds_l8_dw_4_ct", "IS_pds_l8_dw_5_ct", "IS_pds_l8_dw_6_ct", "IS_pds_l8_dw_7_ct", "IS_pds_l8_dw_8_ct", "IS_pds_l8_ws_ct"], axis = 1)
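Because the feature divides the weighted sum by the sum of the weights, only the relative size of the weights matters, not their scale. The sketch below normalizes several of the 8-puzzle schemes listed in the cell above so their shapes can be compared directly. The dictionary keys and the helper `weight_shares` are hypothetical names, and the "gradual" entry assumes weights that descend from the most recent puzzle to the oldest.

```python
import numpy as np

# Candidate weight vectors, most recent prior puzzle first.
schemes = {
    "no_decay": np.ones(8),
    "gradual": np.arange(8, 0, -1),          # assumed ordering: 8 for newest, 1 for oldest
    "partial": [8, 7, 6, 5, 4, 4, 4, 4],
    "partial_v2": [8, 7, 6, 5, 4, 4, 2, 2],
    "partial_v6": [8, 7, 4, 4, 4, 4, 4, 4],
}

def weight_shares(weights):
    """Return each weight as a fraction of the total, so schemes are comparable."""
    w = np.asarray(weights, dtype=float)
    return w / w.sum()

for name, w in schemes.items():
    shares = weight_shares(w)
    print(f"{name:>10}: newest={shares[0]:.3f}  oldest={shares[-1]:.3f}")
```

Comparing the share given to the newest and oldest puzzle makes it easier to see how steep each candidate curve really is before running the RMSE comparison.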

In [92]:
#IS_pds_l9_dw
# Provides the decay-weighted (dw), puzzle day-specific (pds) mean solve time for IS2 over the previous 9 puzzles relative to a given puzzle
# Note that the sort is by IS2 completion date, as completion date was available
# Note also that, unlike the 10-puzzle moving average, this weighted average does NOT include the "puzzle at hand" itself

df1 = df1.sort_values(by=['DOW', 'Comp_Date'], ascending = False)

# Gradual decay
# w = np.arange(1,10)
# w = list(w)

# No decay
w = np.ones(9)
w = list(w)

df1["IS_pds_l9_dw_1"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-1)*w[0]
df1["IS_pds_l9_dw_2"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-2)*w[1]
df1["IS_pds_l9_dw_3"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-3)*w[2]
df1["IS_pds_l9_dw_4"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-4)*w[3]
df1["IS_pds_l9_dw_5"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-5)*w[4]
df1["IS_pds_l9_dw_6"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-6)*w[5]
df1["IS_pds_l9_dw_7"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-7)*w[6]
df1["IS_pds_l9_dw_8"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-8)*w[7]
df1["IS_pds_l9_dw_9"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-9)*w[8]

df1["IS_pds_l9_dw_1_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-1)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-1))*w[0]
df1["IS_pds_l9_dw_2_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-2)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-2))*w[1]
df1["IS_pds_l9_dw_3_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-3)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-3))*w[2]
df1["IS_pds_l9_dw_4_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-4)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-4))*w[3]
df1["IS_pds_l9_dw_5_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-5)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-5))*w[4]
df1["IS_pds_l9_dw_6_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-6)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-6))*w[5]
df1["IS_pds_l9_dw_7_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-7)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-7))*w[6]
df1["IS_pds_l9_dw_8_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-8)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-8))*w[7]
df1["IS_pds_l9_dw_9_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-9)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-9))*w[8]

df1["IS_pds_l9_ws"] = df1[["IS_pds_l9_dw_1", "IS_pds_l9_dw_2", "IS_pds_l9_dw_3", "IS_pds_l9_dw_4", "IS_pds_l9_dw_5", "IS_pds_l9_dw_6", "IS_pds_l9_dw_7", "IS_pds_l9_dw_8", "IS_pds_l9_dw_9"]].sum(axis=1)
df1["IS_pds_l9_ws_ct"] = df1[["IS_pds_l9_dw_1_ct", "IS_pds_l9_dw_2_ct", "IS_pds_l9_dw_3_ct", "IS_pds_l9_dw_4_ct", "IS_pds_l9_dw_5_ct", "IS_pds_l9_dw_6_ct", "IS_pds_l9_dw_7_ct", "IS_pds_l9_dw_8_ct", "IS_pds_l9_dw_9_ct"]].sum(axis=1)
df1["IS_pds_l9_dw"] = df1["IS_pds_l9_ws"]/df1["IS_pds_l9_ws_ct"]

# Deleting transient columns
df1 = df1.drop(["IS_pds_l9_dw_1", "IS_pds_l9_dw_2", "IS_pds_l9_dw_3", "IS_pds_l9_dw_4", "IS_pds_l9_dw_5", "IS_pds_l9_dw_6", "IS_pds_l9_dw_7", "IS_pds_l9_dw_8", "IS_pds_l9_dw_9", "IS_pds_l9_ws", "IS_pds_l9_dw_1_ct", "IS_pds_l9_dw_2_ct", "IS_pds_l9_dw_3_ct", "IS_pds_l9_dw_4_ct", "IS_pds_l9_dw_5_ct", "IS_pds_l9_dw_6_ct", "IS_pds_l9_dw_7_ct", "IS_pds_l9_dw_8_ct", "IS_pds_l9_dw_9_ct", "IS_pds_l9_ws_ct"], axis = 1)

In [148]:
#IS_pds_l10_dw
# Provides the decay-weighted (dw), puzzle day-specific (pds) mean solve time for IS2 over the previous 10 puzzles relative to a given puzzle
# Note that the sort is by IS2 completion date, as completion date was available
# Note also that, unlike the 10-puzzle moving average, this weighted average does NOT include the "puzzle at hand" itself

df1 = df1.sort_values(by=['DOW', 'Comp_Date'], ascending = False)

# Gradual decay
# w = np.arange(1,11)
# w = list(w)

# No decay
w = np.ones(10)
w = list(w)

df1["IS_pds_l10_dw_1"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-1)*w[0]
df1["IS_pds_l10_dw_2"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-2)*w[1]
df1["IS_pds_l10_dw_3"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-3)*w[2]
df1["IS_pds_l10_dw_4"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-4)*w[3]
df1["IS_pds_l10_dw_5"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-5)*w[4]
df1["IS_pds_l10_dw_6"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-6)*w[5]
df1["IS_pds_l10_dw_7"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-7)*w[6]
df1["IS_pds_l10_dw_8"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-8)*w[7]
df1["IS_pds_l10_dw_9"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-9)*w[8]
df1["IS_pds_l10_dw_10"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-10)*w[9]

df1["IS_pds_l10_dw_1_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-1)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-1))*w[0]
df1["IS_pds_l10_dw_2_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-2)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-2))*w[1]
df1["IS_pds_l10_dw_3_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-3)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-3))*w[2]
df1["IS_pds_l10_dw_4_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-4)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-4))*w[3]
df1["IS_pds_l10_dw_5_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-5)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-5))*w[4]
df1["IS_pds_l10_dw_6_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-6)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-6))*w[5]
df1["IS_pds_l10_dw_7_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-7)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-7))*w[6]
df1["IS_pds_l10_dw_8_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-8)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-8))*w[7]
df1["IS_pds_l10_dw_9_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-9)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-9))*w[8]
df1["IS_pds_l10_dw_10_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-10)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-10))*w[9]

df1["IS_pds_l10_ws"] = df1[["IS_pds_l10_dw_1", "IS_pds_l10_dw_2", "IS_pds_l10_dw_3", "IS_pds_l10_dw_4", "IS_pds_l10_dw_5", "IS_pds_l10_dw_6", "IS_pds_l10_dw_7", "IS_pds_l10_dw_8", "IS_pds_l10_dw_9", "IS_pds_l10_dw_10"]].sum(axis=1)
df1["IS_pds_l10_ws_ct"] = df1[["IS_pds_l10_dw_1_ct", "IS_pds_l10_dw_2_ct", "IS_pds_l10_dw_3_ct", "IS_pds_l10_dw_4_ct", "IS_pds_l10_dw_5_ct", "IS_pds_l10_dw_6_ct", "IS_pds_l10_dw_7_ct", "IS_pds_l10_dw_8_ct", "IS_pds_l10_dw_9_ct", "IS_pds_l10_dw_10_ct"]].sum(axis=1)
df1["IS_pds_l10_dw"] = df1["IS_pds_l10_ws"]/df1["IS_pds_l10_ws_ct"]

# Deleting transient columns
df1 = df1.drop(["IS_pds_l10_dw_1", "IS_pds_l10_dw_2", "IS_pds_l10_dw_3", "IS_pds_l10_dw_4", "IS_pds_l10_dw_5", "IS_pds_l10_dw_6", "IS_pds_l10_dw_7", "IS_pds_l10_dw_8", "IS_pds_l10_dw_9", "IS_pds_l10_dw_10", "IS_pds_l10_ws", "IS_pds_l10_dw_1_ct", "IS_pds_l10_dw_2_ct", "IS_pds_l10_dw_3_ct", "IS_pds_l10_dw_4_ct", "IS_pds_l10_dw_5_ct", "IS_pds_l10_dw_6_ct", "IS_pds_l10_dw_7_ct", "IS_pds_l10_dw_8_ct", "IS_pds_l10_dw_9_ct", "IS_pds_l10_dw_10_ct", "IS_pds_l10_ws_ct"], axis = 1)

In [232]:
# #IS_pds_l11_dw
# # Provides the decay-weighted (dw), puzzle day-specific (pds) mean solve time for IS2 over the previous 11 puzzles relative to a given puzzle
# # Note that the sort is by IS2 completion date, as completion date was available
# # Note also that, unlike the 10-puzzle moving average, this weighted average does NOT include the "puzzle at hand" itself

df1 = df1.sort_values(by=['DOW', 'Comp_Date'], ascending = False)

# Gradual decay
# w = np.arange(1,12)
# w = list(w)

# No decay
w = np.ones(11)
w = list(w)

df1["IS_pds_l11_dw_1"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-1)*w[0]
df1["IS_pds_l11_dw_2"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-2)*w[1]
df1["IS_pds_l11_dw_3"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-3)*w[2]
df1["IS_pds_l11_dw_4"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-4)*w[3]
df1["IS_pds_l11_dw_5"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-5)*w[4]
df1["IS_pds_l11_dw_6"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-6)*w[5]
df1["IS_pds_l11_dw_7"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-7)*w[6]
df1["IS_pds_l11_dw_8"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-8)*w[7]
df1["IS_pds_l11_dw_9"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-9)*w[8]
df1["IS_pds_l11_dw_10"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-10)*w[9]
df1["IS_pds_l11_dw_11"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-11)*w[10]

df1["IS_pds_l11_dw_1_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-1)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-1))*w[0]
df1["IS_pds_l11_dw_2_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-2)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-2))*w[1]
df1["IS_pds_l11_dw_3_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-3)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-3))*w[2]
df1["IS_pds_l11_dw_4_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-4)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-4))*w[3]
df1["IS_pds_l11_dw_5_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-5)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-5))*w[4]
df1["IS_pds_l11_dw_6_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-6)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-6))*w[5]
df1["IS_pds_l11_dw_7_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-7)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-7))*w[6]
df1["IS_pds_l11_dw_8_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-8)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-8))*w[7]
df1["IS_pds_l11_dw_9_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-9)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-9))*w[8]
df1["IS_pds_l11_dw_10_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-10)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-10))*w[9]
df1["IS_pds_l11_dw_11_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-11)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-11))*w[10]

df1["IS_pds_l11_ws"] = df1[["IS_pds_l11_dw_1", "IS_pds_l11_dw_2", "IS_pds_l11_dw_3", "IS_pds_l11_dw_4", "IS_pds_l11_dw_5", "IS_pds_l11_dw_6", "IS_pds_l11_dw_7", "IS_pds_l11_dw_8", "IS_pds_l11_dw_9", "IS_pds_l11_dw_10", "IS_pds_l11_dw_11"]].sum(axis=1)
df1["IS_pds_l11_ws_ct"] = df1[["IS_pds_l11_dw_1_ct", "IS_pds_l11_dw_2_ct", "IS_pds_l11_dw_3_ct", "IS_pds_l11_dw_4_ct", "IS_pds_l11_dw_5_ct", "IS_pds_l11_dw_6_ct", "IS_pds_l11_dw_7_ct", "IS_pds_l11_dw_8_ct", "IS_pds_l11_dw_9_ct", "IS_pds_l11_dw_10_ct", "IS_pds_l11_dw_11_ct"]].sum(axis=1)
df1["IS_pds_l11_dw"] = df1["IS_pds_l11_ws"]/df1["IS_pds_l11_ws_ct"]

# Deleting transient columns
df1 = df1.drop(["IS_pds_l11_dw_1", "IS_pds_l11_dw_2", "IS_pds_l11_dw_3", "IS_pds_l11_dw_4", "IS_pds_l11_dw_5", "IS_pds_l11_dw_6", "IS_pds_l11_dw_7", "IS_pds_l11_dw_8", "IS_pds_l11_dw_9", "IS_pds_l11_dw_10", "IS_pds_l11_dw_11", "IS_pds_l11_dw_1_ct", "IS_pds_l11_dw_2_ct", "IS_pds_l11_dw_3_ct", "IS_pds_l11_dw_4_ct", "IS_pds_l11_dw_5_ct", "IS_pds_l11_dw_6_ct", "IS_pds_l11_dw_7_ct", "IS_pds_l11_dw_8_ct", "IS_pds_l11_dw_9_ct", "IS_pds_l11_dw_10_ct", "IS_pds_l11_dw_11_ct", "IS_pds_l11_ws", "IS_pds_l11_ws_ct"], axis = 1)

In [204]:
# #IS_pds_l12_dw
# # Provides the decay-weighted (dw), puzzle day-specific (pds) mean solve time for IS2 over the previous 12 puzzles relative to a given puzzle
# # Note that the sort is by IS2 completion date, as completion date was available
# # Note also that, unlike the 10-puzzle moving average, this weighted average does NOT include the "puzzle at hand" itself

df1 = df1.sort_values(by=['DOW', 'Comp_Date'], ascending = False)

# Gradual decay
# w = np.arange(1,13)
# w = list(w)

# No decay
w = np.ones(12)
w = list(w)

df1["IS_pds_l12_dw_1"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-1)*w[0]
df1["IS_pds_l12_dw_2"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-2)*w[1]
df1["IS_pds_l12_dw_3"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-3)*w[2]
df1["IS_pds_l12_dw_4"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-4)*w[3]
df1["IS_pds_l12_dw_5"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-5)*w[4]
df1["IS_pds_l12_dw_6"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-6)*w[5]
df1["IS_pds_l12_dw_7"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-7)*w[6]
df1["IS_pds_l12_dw_8"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-8)*w[7]
df1["IS_pds_l12_dw_9"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-9)*w[8]
df1["IS_pds_l12_dw_10"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-10)*w[9]
df1["IS_pds_l12_dw_11"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-11)*w[10]
df1["IS_pds_l12_dw_12"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-12)*w[11]

df1["IS_pds_l12_dw_1_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-1)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-1))*w[0]
df1["IS_pds_l12_dw_2_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-2)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-2))*w[1]
df1["IS_pds_l12_dw_3_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-3)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-3))*w[2]
df1["IS_pds_l12_dw_4_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-4)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-4))*w[3]
df1["IS_pds_l12_dw_5_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-5)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-5))*w[4]
df1["IS_pds_l12_dw_6_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-6)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-6))*w[5]
df1["IS_pds_l12_dw_7_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-7)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-7))*w[6]
df1["IS_pds_l12_dw_8_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-8)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-8))*w[7]
df1["IS_pds_l12_dw_9_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-9)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-9))*w[8]
df1["IS_pds_l12_dw_10_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-10)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-10))*w[9]
df1["IS_pds_l12_dw_11_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-11)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-11))*w[10]
df1["IS_pds_l12_dw_12_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-12)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-12))*w[11]

df1["IS_pds_l12_ws"] = df1[["IS_pds_l12_dw_1", "IS_pds_l12_dw_2", "IS_pds_l12_dw_3", "IS_pds_l12_dw_4", "IS_pds_l12_dw_5", "IS_pds_l12_dw_6", "IS_pds_l12_dw_7", "IS_pds_l12_dw_8", "IS_pds_l12_dw_9", "IS_pds_l12_dw_10", "IS_pds_l12_dw_11", "IS_pds_l12_dw_12"]].sum(axis=1)
df1["IS_pds_l12_ws_ct"] = df1[["IS_pds_l12_dw_1_ct", "IS_pds_l12_dw_2_ct", "IS_pds_l12_dw_3_ct", "IS_pds_l12_dw_4_ct", "IS_pds_l12_dw_5_ct", "IS_pds_l12_dw_6_ct", "IS_pds_l12_dw_7_ct", "IS_pds_l12_dw_8_ct", "IS_pds_l12_dw_9_ct", "IS_pds_l12_dw_10_ct", "IS_pds_l12_dw_11_ct", "IS_pds_l12_dw_12_ct"]].sum(axis=1)
df1["IS_pds_l12_dw"] = df1["IS_pds_l12_ws"]/df1["IS_pds_l12_ws_ct"]

# Deleting transient columns
df1 = df1.drop(["IS_pds_l12_dw_1", "IS_pds_l12_dw_2", "IS_pds_l12_dw_3", "IS_pds_l12_dw_4", "IS_pds_l12_dw_5", "IS_pds_l12_dw_6", "IS_pds_l12_dw_7", "IS_pds_l12_dw_8", "IS_pds_l12_dw_9", "IS_pds_l12_dw_10", "IS_pds_l12_dw_11", "IS_pds_l12_dw_12", "IS_pds_l12_dw_1_ct", "IS_pds_l12_dw_2_ct", "IS_pds_l12_dw_3_ct", "IS_pds_l12_dw_4_ct", "IS_pds_l12_dw_5_ct", "IS_pds_l12_dw_6_ct", "IS_pds_l12_dw_7_ct", "IS_pds_l12_dw_8_ct", "IS_pds_l12_dw_9_ct", "IS_pds_l12_dw_10_ct", "IS_pds_l12_dw_11_ct", "IS_pds_l12_dw_12_ct", "IS_pds_l12_ws", "IS_pds_l12_ws_ct"], axis = 1)

In [176]:
# #IS_pds_l15_dw
# # Provides the decay-weighted (dw), puzzle day-specific (pds) mean solve time for IS2 over the previous 15 puzzles relative to a given puzzle
# # Note that the sort is by IS2 completion date, as completion date was available
# # Note also that, unlike the 10-puzzle moving average, this weighted average does NOT include the "puzzle at hand" itself

df1 = df1.sort_values(by=['DOW', 'Comp_Date'], ascending = False)

# Gradual decay
# w = np.arange(1,16)
# w = list(w)

# No decay
w = np.ones(15)
w = list(w)

df1["IS_pds_l15_dw_1"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-1)*w[0]
df1["IS_pds_l15_dw_2"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-2)*w[1]
df1["IS_pds_l15_dw_3"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-3)*w[2]
df1["IS_pds_l15_dw_4"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-4)*w[3]
df1["IS_pds_l15_dw_5"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-5)*w[4]
df1["IS_pds_l15_dw_6"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-6)*w[5]
df1["IS_pds_l15_dw_7"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-7)*w[6]
df1["IS_pds_l15_dw_8"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-8)*w[7]
df1["IS_pds_l15_dw_9"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-9)*w[8]
df1["IS_pds_l15_dw_10"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-10)*w[9]
df1["IS_pds_l15_dw_11"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-11)*w[10]
df1["IS_pds_l15_dw_12"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-12)*w[11]
df1["IS_pds_l15_dw_13"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-13)*w[12]
df1["IS_pds_l15_dw_14"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-14)*w[13]
df1["IS_pds_l15_dw_15"] = df1.groupby(['DOW'])['IS2_ST(m)'].shift(-15)*w[14]

df1["IS_pds_l15_dw_1_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-1)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-1))*w[0]
df1["IS_pds_l15_dw_2_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-2)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-2))*w[1]
df1["IS_pds_l15_dw_3_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-3)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-3))*w[2]
df1["IS_pds_l15_dw_4_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-4)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-4))*w[3]
df1["IS_pds_l15_dw_5_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-5)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-5))*w[4]
df1["IS_pds_l15_dw_6_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-6)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-6))*w[5]
df1["IS_pds_l15_dw_7_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-7)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-7))*w[6]
df1["IS_pds_l15_dw_8_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-8)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-8))*w[7]
df1["IS_pds_l15_dw_9_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-9)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-9))*w[8]
df1["IS_pds_l15_dw_10_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-10)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-10))*w[9]
df1["IS_pds_l15_dw_11_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-11)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-11))*w[10]
df1["IS_pds_l15_dw_12_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-12)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-12))*w[11]
df1["IS_pds_l15_dw_13_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-13)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-13))*w[12]
df1["IS_pds_l15_dw_14_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-14)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-14))*w[13]
df1["IS_pds_l15_dw_15_ct"] = (df1.groupby(['DOW'])['IS2_ST(m)'].shift(-15)/df1.groupby(['DOW'])['IS2_ST(m)'].shift(-15))*w[14]

df1["IS_pds_l15_ws"] = df1[["IS_pds_l15_dw_1", "IS_pds_l15_dw_2", "IS_pds_l15_dw_3", "IS_pds_l15_dw_4", "IS_pds_l15_dw_5", "IS_pds_l15_dw_6", "IS_pds_l15_dw_7", "IS_pds_l15_dw_8", "IS_pds_l15_dw_9", "IS_pds_l15_dw_10", "IS_pds_l15_dw_11", "IS_pds_l15_dw_12", "IS_pds_l15_dw_13", "IS_pds_l15_dw_14", "IS_pds_l15_dw_15"]].sum(axis=1)
df1["IS_pds_l15_ws_ct"] = df1[["IS_pds_l15_dw_1_ct", "IS_pds_l15_dw_2_ct", "IS_pds_l15_dw_3_ct", "IS_pds_l15_dw_4_ct", "IS_pds_l15_dw_5_ct", "IS_pds_l15_dw_6_ct", "IS_pds_l15_dw_7_ct", "IS_pds_l15_dw_8_ct", "IS_pds_l15_dw_9_ct", "IS_pds_l15_dw_10_ct", "IS_pds_l15_dw_11_ct", "IS_pds_l15_dw_12_ct", "IS_pds_l15_dw_13_ct", "IS_pds_l15_dw_14_ct", "IS_pds_l15_dw_15_ct"]].sum(axis=1)
df1["IS_pds_l15_dw"] = df1["IS_pds_l15_ws"]/df1["IS_pds_l15_ws_ct"]

# Deleting transient columns
df1 = df1.drop(["IS_pds_l15_dw_1", "IS_pds_l15_dw_2", "IS_pds_l15_dw_3", "IS_pds_l15_dw_4", "IS_pds_l15_dw_5", "IS_pds_l15_dw_6", "IS_pds_l15_dw_7", "IS_pds_l15_dw_8", "IS_pds_l15_dw_9", "IS_pds_l15_dw_10", "IS_pds_l15_dw_11", "IS_pds_l15_dw_12", "IS_pds_l15_dw_13", "IS_pds_l15_dw_14", "IS_pds_l15_dw_15", "IS_pds_l15_dw_1_ct", "IS_pds_l15_dw_2_ct", "IS_pds_l15_dw_3_ct", "IS_pds_l15_dw_4_ct", "IS_pds_l15_dw_5_ct", "IS_pds_l15_dw_6_ct", "IS_pds_l15_dw_7_ct", "IS_pds_l15_dw_8_ct", "IS_pds_l15_dw_9_ct", "IS_pds_l15_dw_10_ct", "IS_pds_l15_dw_11_ct", "IS_pds_l15_dw_12_ct", "IS_pds_l15_dw_13_ct", "IS_pds_l15_dw_14_ct", "IS_pds_l15_dw_15_ct", "IS_pds_l15_ws", "IS_pds_l15_ws_ct"], axis = 1)

In [104]:
# IS_pds_l20_dw
# Provides the decay-weighted (dw), puzzle day-specific (pds) mean solve time for IS1 over the 20 puzzles preceding a given puzzle
# Note that the sort is by completion date for IS1, as completion date was available
# Note also that, unlike the 10-puzzle moving average, this weighted average does NOT include the "puzzle at hand" itself

df1 = df1.sort_values(by=['DOW', 'Comp_Date'], ascending = False)

# Gradual decay
# w = np.arange(1,21)
# w = list(w)

# No decay
w = np.ones(20)
w = list(w)

df1["IS_pds_l20_dw_1"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-1)*w[0]
df1["IS_pds_l20_dw_2"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-2)*w[1]
df1["IS_pds_l20_dw_3"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-3)*w[2]
df1["IS_pds_l20_dw_4"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-4)*w[3]
df1["IS_pds_l20_dw_5"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-5)*w[4]
df1["IS_pds_l20_dw_6"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-6)*w[5]
df1["IS_pds_l20_dw_7"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-7)*w[6]
df1["IS_pds_l20_dw_8"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-8)*w[7]
df1["IS_pds_l20_dw_9"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-9)*w[8]
df1["IS_pds_l20_dw_10"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-10)*w[9]
df1["IS_pds_l20_dw_11"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-11)*w[10]
df1["IS_pds_l20_dw_12"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-12)*w[11]
df1["IS_pds_l20_dw_13"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-13)*w[12]
df1["IS_pds_l20_dw_14"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-14)*w[13]
df1["IS_pds_l20_dw_15"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-15)*w[14]
df1["IS_pds_l20_dw_16"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-16)*w[15]
df1["IS_pds_l20_dw_17"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-17)*w[16]
df1["IS_pds_l20_dw_18"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-18)*w[17]
df1["IS_pds_l20_dw_19"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-19)*w[18]
df1["IS_pds_l20_dw_20"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-20)*w[19]

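# x/x is 1 where a lagged solve time exists and NaN where it is missing, so each _ct column
# holds the weight only for lags that are present; their row sum is the weight total used as the denominator
df1["IS_pds_l20_dw_1_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-1)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-1))*w[0]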
df1["IS_pds_l20_dw_2_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-2)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-2))*w[1]
df1["IS_pds_l20_dw_3_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-3)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-3))*w[2]
df1["IS_pds_l20_dw_4_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-4)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-4))*w[3]
df1["IS_pds_l20_dw_5_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-5)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-5))*w[4]
df1["IS_pds_l20_dw_6_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-6)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-6))*w[5]
df1["IS_pds_l20_dw_7_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-7)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-7))*w[6]
df1["IS_pds_l20_dw_8_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-8)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-8))*w[7]
df1["IS_pds_l20_dw_9_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-9)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-9))*w[8]
df1["IS_pds_l20_dw_10_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-10)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-10))*w[9]
df1["IS_pds_l20_dw_11_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-11)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-11))*w[10]
df1["IS_pds_l20_dw_12_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-12)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-12))*w[11]
df1["IS_pds_l20_dw_13_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-13)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-13))*w[12]
df1["IS_pds_l20_dw_14_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-14)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-14))*w[13]
df1["IS_pds_l20_dw_15_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-15)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-15))*w[14]
df1["IS_pds_l20_dw_16_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-16)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-16))*w[15]
df1["IS_pds_l20_dw_17_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-17)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-17))*w[16]
df1["IS_pds_l20_dw_18_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-18)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-18))*w[17]
df1["IS_pds_l20_dw_19_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-19)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-19))*w[18]
df1["IS_pds_l20_dw_20_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-20)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-20))*w[19]

df1["IS_pds_l20_ws"] = df1[["IS_pds_l20_dw_1", "IS_pds_l20_dw_2", "IS_pds_l20_dw_3", "IS_pds_l20_dw_4", "IS_pds_l20_dw_5", "IS_pds_l20_dw_6", "IS_pds_l20_dw_7", "IS_pds_l20_dw_8", "IS_pds_l20_dw_9", "IS_pds_l20_dw_10", "IS_pds_l20_dw_11", "IS_pds_l20_dw_12", "IS_pds_l20_dw_13", "IS_pds_l20_dw_14", "IS_pds_l20_dw_15", "IS_pds_l20_dw_16", "IS_pds_l20_dw_17", "IS_pds_l20_dw_18", "IS_pds_l20_dw_19", "IS_pds_l20_dw_20"]].sum(axis=1)
df1["IS_pds_l20_ws_ct"] = df1[["IS_pds_l20_dw_1_ct", "IS_pds_l20_dw_2_ct", "IS_pds_l20_dw_3_ct", "IS_pds_l20_dw_4_ct", "IS_pds_l20_dw_5_ct", "IS_pds_l20_dw_6_ct", "IS_pds_l20_dw_7_ct", "IS_pds_l20_dw_8_ct", "IS_pds_l20_dw_9_ct", "IS_pds_l20_dw_10_ct", "IS_pds_l20_dw_11_ct", "IS_pds_l20_dw_12_ct", "IS_pds_l20_dw_13_ct", "IS_pds_l20_dw_14_ct", "IS_pds_l20_dw_15_ct", "IS_pds_l20_dw_16_ct", "IS_pds_l20_dw_17_ct", "IS_pds_l20_dw_18_ct", "IS_pds_l20_dw_19_ct", "IS_pds_l20_dw_20_ct"]].sum(axis=1)
df1["IS_pds_l20_dw"] = df1["IS_pds_l20_ws"]/df1["IS_pds_l20_ws_ct"]

# Deleting transient columns
df1 = df1.drop(["IS_pds_l20_dw_1", "IS_pds_l20_dw_2", "IS_pds_l20_dw_3", "IS_pds_l20_dw_4", "IS_pds_l20_dw_5", "IS_pds_l20_dw_6", "IS_pds_l20_dw_7", "IS_pds_l20_dw_8", "IS_pds_l20_dw_9", "IS_pds_l20_dw_10", "IS_pds_l20_dw_11", "IS_pds_l20_dw_12", "IS_pds_l20_dw_13", "IS_pds_l20_dw_14", "IS_pds_l20_dw_15", "IS_pds_l20_dw_16", "IS_pds_l20_dw_17", "IS_pds_l20_dw_18", "IS_pds_l20_dw_19", "IS_pds_l20_dw_20", "IS_pds_l20_dw_1_ct", "IS_pds_l20_dw_2_ct", "IS_pds_l20_dw_3_ct", "IS_pds_l20_dw_4_ct", "IS_pds_l20_dw_5_ct", "IS_pds_l20_dw_6_ct", "IS_pds_l20_dw_7_ct", "IS_pds_l20_dw_8_ct", "IS_pds_l20_dw_9_ct", "IS_pds_l20_dw_10_ct", "IS_pds_l20_dw_11_ct", "IS_pds_l20_dw_12_ct", "IS_pds_l20_dw_13_ct", "IS_pds_l20_dw_14_ct", "IS_pds_l20_dw_15_ct", "IS_pds_l20_dw_16_ct", "IS_pds_l20_dw_17_ct", "IS_pds_l20_dw_18_ct", "IS_pds_l20_dw_19_ct", "IS_pds_l20_dw_20_ct", "IS_pds_l20_ws", "IS_pds_l20_ws_ct"], axis = 1)
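
With the "no decay" weights (`np.ones`), the weighted average reduces to a plain mean of whichever prior solves exist. As a hedged sanity check, the sketch below compares the shift-based calculation against a grouped rolling mean on a small made-up frame. The `demo` data and the window size `n = 2` are illustrative assumptions, not values from the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, entered in chronological order within each day
demo = pd.DataFrame({
    "DOW": ["Mon", "Mon", "Mon", "Mon", "Tue", "Tue", "Tue"],
    "Comp_Date": pd.to_datetime([
        "2024-01-01", "2024-01-08", "2024-01-15", "2024-01-22",
        "2024-01-02", "2024-01-09", "2024-01-16",
    ]),
    "IS1_ST(m)": [10.0, 8.0, 6.0, 5.0, 12.0, 11.0, 9.0],
})

n = 2  # window size for the illustration

# Shift-based version: newest-first sort, then gather the n previous solves
newest_first = demo.sort_values(by=["DOW", "Comp_Date"], ascending=False)
g = newest_first.groupby("DOW")["IS1_ST(m)"]
lags = pd.concat([g.shift(-k) for k in range(1, n + 1)], axis=1)
shift_mean = lags.mean(axis=1)  # skips missing lags, NaN if none exist

# Rolling version: oldest-first sort, exclude the current row via shift(1)
chrono = demo.sort_values(by=["DOW", "Comp_Date"])
rolling_mean = chrono.groupby("DOW")["IS1_ST(m)"].transform(
    lambda s: s.shift(1).rolling(n, min_periods=1).mean()
)

# Both results keep the original index labels, so align on them and compare
same = np.allclose(
    shift_mean.sort_index().to_numpy(),
    rolling_mean.sort_index().to_numpy(),
    equal_nan=True,
)
```

If `same` is `True`, the unweighted case of the cell above can be cross-checked against pandas' built-in rolling window before moving on to non-uniform weights.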

In [9]:
# IS_pds_l25_dw
# Provides the decay-weighted (dw), puzzle day-specific (pds) mean solve time for IS1 over the 25 puzzles preceding a given puzzle
# Note that the sort is by completion date for IS1, as completion date was available
# Note also that, unlike the 10-puzzle moving average, this weighted average does NOT include the "puzzle at hand" itself

df1 = df1.sort_values(by=['DOW', 'Comp_Date'], ascending = False)

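# Gradual decay
# w[0] multiplies the most recent prior solve (shift(-1)) and w[24] the oldest (shift(-25)),
# so np.arange(1,26) as written puts the largest weight on the oldest of the 25 solves
w = np.arange(1,26)
w = list(w)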

# No decay
# w = np.ones(25)
# w = list(w)

df1["IS_pds_l25_dw_1"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-1)*w[0]
df1["IS_pds_l25_dw_2"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-2)*w[1]
df1["IS_pds_l25_dw_3"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-3)*w[2]
df1["IS_pds_l25_dw_4"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-4)*w[3]
df1["IS_pds_l25_dw_5"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-5)*w[4]
df1["IS_pds_l25_dw_6"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-6)*w[5]
df1["IS_pds_l25_dw_7"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-7)*w[6]
df1["IS_pds_l25_dw_8"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-8)*w[7]
df1["IS_pds_l25_dw_9"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-9)*w[8]
df1["IS_pds_l25_dw_10"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-10)*w[9]
df1["IS_pds_l25_dw_11"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-11)*w[10]
df1["IS_pds_l25_dw_12"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-12)*w[11]
df1["IS_pds_l25_dw_13"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-13)*w[12]
df1["IS_pds_l25_dw_14"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-14)*w[13]
df1["IS_pds_l25_dw_15"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-15)*w[14]
df1["IS_pds_l25_dw_16"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-16)*w[15]
df1["IS_pds_l25_dw_17"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-17)*w[16]
df1["IS_pds_l25_dw_18"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-18)*w[17]
df1["IS_pds_l25_dw_19"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-19)*w[18]
df1["IS_pds_l25_dw_20"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-20)*w[19]
df1["IS_pds_l25_dw_21"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-21)*w[20]
df1["IS_pds_l25_dw_22"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-22)*w[21]
df1["IS_pds_l25_dw_23"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-23)*w[22]
df1["IS_pds_l25_dw_24"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-24)*w[23]
df1["IS_pds_l25_dw_25"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-25)*w[24]

df1["IS_pds_l25_dw_1_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-1)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-1))*w[0]
df1["IS_pds_l25_dw_2_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-2)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-2))*w[1]
df1["IS_pds_l25_dw_3_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-3)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-3))*w[2]
df1["IS_pds_l25_dw_4_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-4)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-4))*w[3]
df1["IS_pds_l25_dw_5_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-5)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-5))*w[4]
df1["IS_pds_l25_dw_6_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-6)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-6))*w[5]
df1["IS_pds_l25_dw_7_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-7)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-7))*w[6]
df1["IS_pds_l25_dw_8_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-8)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-8))*w[7]
df1["IS_pds_l25_dw_9_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-9)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-9))*w[8]
df1["IS_pds_l25_dw_10_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-10)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-10))*w[9]
df1["IS_pds_l25_dw_11_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-11)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-11))*w[10]
df1["IS_pds_l25_dw_12_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-12)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-12))*w[11]
df1["IS_pds_l25_dw_13_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-13)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-13))*w[12]
df1["IS_pds_l25_dw_14_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-14)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-14))*w[13]
df1["IS_pds_l25_dw_15_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-15)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-15))*w[14]
df1["IS_pds_l25_dw_16_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-16)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-16))*w[15]
df1["IS_pds_l25_dw_17_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-17)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-17))*w[16]
df1["IS_pds_l25_dw_18_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-18)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-18))*w[17]
df1["IS_pds_l25_dw_19_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-19)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-19))*w[18]
df1["IS_pds_l25_dw_20_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-20)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-20))*w[19]
df1["IS_pds_l25_dw_21_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-21)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-21))*w[20]
df1["IS_pds_l25_dw_22_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-22)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-22))*w[21]
df1["IS_pds_l25_dw_23_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-23)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-23))*w[22]
df1["IS_pds_l25_dw_24_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-24)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-24))*w[23]
df1["IS_pds_l25_dw_25_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-25)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-25))*w[24]

df1["IS_pds_l25_ws"] = df1[["IS_pds_l25_dw_1", "IS_pds_l25_dw_2", "IS_pds_l25_dw_3", "IS_pds_l25_dw_4", "IS_pds_l25_dw_5", "IS_pds_l25_dw_6", "IS_pds_l25_dw_7", "IS_pds_l25_dw_8", "IS_pds_l25_dw_9", "IS_pds_l25_dw_10", "IS_pds_l25_dw_11", "IS_pds_l25_dw_12", "IS_pds_l25_dw_13", "IS_pds_l25_dw_14", "IS_pds_l25_dw_15", "IS_pds_l25_dw_16", "IS_pds_l25_dw_17", "IS_pds_l25_dw_18", "IS_pds_l25_dw_19", "IS_pds_l25_dw_20", "IS_pds_l25_dw_21", "IS_pds_l25_dw_22", "IS_pds_l25_dw_23", "IS_pds_l25_dw_24", "IS_pds_l25_dw_25"]].sum(axis=1)
df1["IS_pds_l25_ws_ct"] = df1[["IS_pds_l25_dw_1_ct", "IS_pds_l25_dw_2_ct", "IS_pds_l25_dw_3_ct", "IS_pds_l25_dw_4_ct", "IS_pds_l25_dw_5_ct", "IS_pds_l25_dw_6_ct", "IS_pds_l25_dw_7_ct", "IS_pds_l25_dw_8_ct", "IS_pds_l25_dw_9_ct", "IS_pds_l25_dw_10_ct", "IS_pds_l25_dw_11_ct", "IS_pds_l25_dw_12_ct", "IS_pds_l25_dw_13_ct", "IS_pds_l25_dw_14_ct", "IS_pds_l25_dw_15_ct", "IS_pds_l25_dw_16_ct", "IS_pds_l25_dw_17_ct", "IS_pds_l25_dw_18_ct", "IS_pds_l25_dw_19_ct", "IS_pds_l25_dw_20_ct", "IS_pds_l25_dw_21_ct", "IS_pds_l25_dw_22_ct", "IS_pds_l25_dw_23_ct", "IS_pds_l25_dw_24_ct", "IS_pds_l25_dw_25_ct"]].sum(axis=1)
df1["IS_pds_l25_dw"] = df1["IS_pds_l25_ws"]/df1["IS_pds_l25_ws_ct"]

# Deleting transient columns
df1 = df1.drop(["IS_pds_l25_dw_1", "IS_pds_l25_dw_2", "IS_pds_l25_dw_3", "IS_pds_l25_dw_4", "IS_pds_l25_dw_5", "IS_pds_l25_dw_6", "IS_pds_l25_dw_7", "IS_pds_l25_dw_8", "IS_pds_l25_dw_9", "IS_pds_l25_dw_10", "IS_pds_l25_dw_11", "IS_pds_l25_dw_12", "IS_pds_l25_dw_13", "IS_pds_l25_dw_14", "IS_pds_l25_dw_15", "IS_pds_l25_dw_16", "IS_pds_l25_dw_17", "IS_pds_l25_dw_18", "IS_pds_l25_dw_19", "IS_pds_l25_dw_20", "IS_pds_l25_dw_21", "IS_pds_l25_dw_22", "IS_pds_l25_dw_23", "IS_pds_l25_dw_24", "IS_pds_l25_dw_25", "IS_pds_l25_dw_1_ct", "IS_pds_l25_dw_2_ct", "IS_pds_l25_dw_3_ct", "IS_pds_l25_dw_4_ct", "IS_pds_l25_dw_5_ct", "IS_pds_l25_dw_6_ct", "IS_pds_l25_dw_7_ct", "IS_pds_l25_dw_8_ct", "IS_pds_l25_dw_9_ct", "IS_pds_l25_dw_10_ct", "IS_pds_l25_dw_11_ct", "IS_pds_l25_dw_12_ct", "IS_pds_l25_dw_13_ct", "IS_pds_l25_dw_14_ct", "IS_pds_l25_dw_15_ct", "IS_pds_l25_dw_16_ct", "IS_pds_l25_dw_17_ct", "IS_pds_l25_dw_18_ct", "IS_pds_l25_dw_19_ct", "IS_pds_l25_dw_20_ct", "IS_pds_l25_dw_21_ct", "IS_pds_l25_dw_22_ct", "IS_pds_l25_dw_23_ct", "IS_pds_l25_dw_24_ct", "IS_pds_l25_dw_25_ct", "IS_pds_l25_ws", "IS_pds_l25_ws_ct"], axis = 1)
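
Because `w[0]` pairs with `shift(-1)`, the order of the weight vector determines which end of the window counts most. The sketch below shows one way to build weight vectors of different steepness in most-recent-first order. The helper `make_decay_weights`, its `kind` options, and the `rate` parameter are hypothetical and not taken from the notebook. It also assumes the intent is to weight recent solves more heavily, which is the reverse of the ordering produced by `np.arange(1, n + 1)`.

```python
import numpy as np


def make_decay_weights(n, kind="linear", rate=0.9):
    """Return n weights ordered most-recent-first.

    Index 0 is meant to pair with shift(-1) (the most recent prior
    puzzle) and index n-1 with shift(-n) (the oldest in the window).
    """
    lags = np.arange(n)  # 0 = most recent prior puzzle
    if kind == "none":
        return np.ones(n)  # every prior puzzle counts equally
    if kind == "linear":
        return (n - lags).astype(float)  # n, n-1, ..., 1
    if kind == "exponential":
        return rate ** lags  # 1, rate, rate**2, ...
    raise ValueError(f"unknown kind: {kind!r}")


w_none = make_decay_weights(4, kind="none")
w_linear = make_decay_weights(4, kind="linear")
w_steep = make_decay_weights(3, kind="exponential", rate=0.5)
```

Any of these arrays could stand in for `w` in the cells here, since only `w[0]` through `w[n-1]` are indexed. A smaller `rate` gives a steeper curve.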

In [10]:
# #GMS_pds_l35_dw
# #Provides decay-weighted(dw), puzzle day-specific (pds) mean solve time performance for GMS over the previous 35 puzzles relative to a given puzzle
# # Note also that, unlike the 10-puzzle moving average, this weighted average does NOT include the "puzzle at hand" itself

# df1 = df1.sort_values(by=['DOW', 'P_Date'], ascending = False)

# # Gradual decay
# # w = np.arange(1,36)
# # w = list(w)

# # No decay
# w = np.ones(35)
# w = list(w)

# df1["GMS_pds_l35_dw_1"] = df1.groupby(['DOW'])['GMST(m)'].shift(-1)*w[0]
# df1["GMS_pds_l35_dw_2"] = df1.groupby(['DOW'])['GMST(m)'].shift(-2)*w[1]
# df1["GMS_pds_l35_dw_3"] = df1.groupby(['DOW'])['GMST(m)'].shift(-3)*w[2]
# df1["GMS_pds_l35_dw_4"] = df1.groupby(['DOW'])['GMST(m)'].shift(-4)*w[3]
# df1["GMS_pds_l35_dw_5"] = df1.groupby(['DOW'])['GMST(m)'].shift(-5)*w[4]
# df1["GMS_pds_l35_dw_6"] = df1.groupby(['DOW'])['GMST(m)'].shift(-6)*w[5]
# df1["GMS_pds_l35_dw_7"] = df1.groupby(['DOW'])['GMST(m)'].shift(-7)*w[6]
# df1["GMS_pds_l35_dw_8"] = df1.groupby(['DOW'])['GMST(m)'].shift(-8)*w[7]
# df1["GMS_pds_l35_dw_9"] = df1.groupby(['DOW'])['GMST(m)'].shift(-9)*w[8]
# df1["GMS_pds_l35_dw_10"] = df1.groupby(['DOW'])['GMST(m)'].shift(-10)*w[9]
# df1["GMS_pds_l35_dw_11"] = df1.groupby(['DOW'])['GMST(m)'].shift(-11)*w[10]
# df1["GMS_pds_l35_dw_12"] = df1.groupby(['DOW'])['GMST(m)'].shift(-12)*w[11]
# df1["GMS_pds_l35_dw_13"] = df1.groupby(['DOW'])['GMST(m)'].shift(-13)*w[12]
# df1["GMS_pds_l35_dw_14"] = df1.groupby(['DOW'])['GMST(m)'].shift(-14)*w[13]
# df1["GMS_pds_l35_dw_15"] = df1.groupby(['DOW'])['GMST(m)'].shift(-15)*w[14]
# df1["GMS_pds_l35_dw_16"] = df1.groupby(['DOW'])['GMST(m)'].shift(-16)*w[15]
# df1["GMS_pds_l35_dw_17"] = df1.groupby(['DOW'])['GMST(m)'].shift(-17)*w[16]
# df1["GMS_pds_l35_dw_18"] = df1.groupby(['DOW'])['GMST(m)'].shift(-18)*w[17]
# df1["GMS_pds_l35_dw_19"] = df1.groupby(['DOW'])['GMST(m)'].shift(-19)*w[18]
# df1["GMS_pds_l35_dw_20"] = df1.groupby(['DOW'])['GMST(m)'].shift(-20)*w[19]
# df1["GMS_pds_l35_dw_21"] = df1.groupby(['DOW'])['GMST(m)'].shift(-21)*w[20]
# df1["GMS_pds_l35_dw_22"] = df1.groupby(['DOW'])['GMST(m)'].shift(-22)*w[21]
# df1["GMS_pds_l35_dw_23"] = df1.groupby(['DOW'])['GMST(m)'].shift(-23)*w[22]
# df1["GMS_pds_l35_dw_24"] = df1.groupby(['DOW'])['GMST(m)'].shift(-24)*w[23]
# df1["GMS_pds_l35_dw_25"] = df1.groupby(['DOW'])['GMST(m)'].shift(-25)*w[24]
# df1["GMS_pds_l35_dw_26"] = df1.groupby(['DOW'])['GMST(m)'].shift(-26)*w[25]
# df1["GMS_pds_l35_dw_27"] = df1.groupby(['DOW'])['GMST(m)'].shift(-27)*w[26]
# df1["GMS_pds_l35_dw_28"] = df1.groupby(['DOW'])['GMST(m)'].shift(-28)*w[27]
# df1["GMS_pds_l35_dw_29"] = df1.groupby(['DOW'])['GMST(m)'].shift(-29)*w[28]
# df1["GMS_pds_l35_dw_30"] = df1.groupby(['DOW'])['GMST(m)'].shift(-30)*w[29]
# df1["GMS_pds_l35_dw_31"] = df1.groupby(['DOW'])['GMST(m)'].shift(-31)*w[30]
# df1["GMS_pds_l35_dw_32"] = df1.groupby(['DOW'])['GMST(m)'].shift(-32)*w[31]
# df1["GMS_pds_l35_dw_33"] = df1.groupby(['DOW'])['GMST(m)'].shift(-33)*w[32]
# df1["GMS_pds_l35_dw_34"] = df1.groupby(['DOW'])['GMST(m)'].shift(-34)*w[33]
# df1["GMS_pds_l35_dw_35"] = df1.groupby(['DOW'])['GMST(m)'].shift(-35)*w[34]

# df1["GMS_pds_l35_dw_1_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-1)/df1.groupby(['DOW'])['GMST(m)'].shift(-1))*w[0]
# df1["GMS_pds_l35_dw_2_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-2)/df1.groupby(['DOW'])['GMST(m)'].shift(-2))*w[1]
# df1["GMS_pds_l35_dw_3_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-3)/df1.groupby(['DOW'])['GMST(m)'].shift(-3))*w[2]
# df1["GMS_pds_l35_dw_4_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-4)/df1.groupby(['DOW'])['GMST(m)'].shift(-4))*w[3]
# df1["GMS_pds_l35_dw_5_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-5)/df1.groupby(['DOW'])['GMST(m)'].shift(-5))*w[4]
# df1["GMS_pds_l35_dw_6_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-6)/df1.groupby(['DOW'])['GMST(m)'].shift(-6))*w[5]
# df1["GMS_pds_l35_dw_7_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-7)/df1.groupby(['DOW'])['GMST(m)'].shift(-7))*w[6]
# df1["GMS_pds_l35_dw_8_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-8)/df1.groupby(['DOW'])['GMST(m)'].shift(-8))*w[7]
# df1["GMS_pds_l35_dw_9_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-9)/df1.groupby(['DOW'])['GMST(m)'].shift(-9))*w[8]
# df1["GMS_pds_l35_dw_10_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-10)/df1.groupby(['DOW'])['GMST(m)'].shift(-10))*w[9]
# df1["GMS_pds_l35_dw_11_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-11)/df1.groupby(['DOW'])['GMST(m)'].shift(-11))*w[10]
# df1["GMS_pds_l35_dw_12_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-12)/df1.groupby(['DOW'])['GMST(m)'].shift(-12))*w[11]
# df1["GMS_pds_l35_dw_13_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-13)/df1.groupby(['DOW'])['GMST(m)'].shift(-13))*w[12]
# df1["GMS_pds_l35_dw_14_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-14)/df1.groupby(['DOW'])['GMST(m)'].shift(-14))*w[13]
# df1["GMS_pds_l35_dw_15_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-15)/df1.groupby(['DOW'])['GMST(m)'].shift(-15))*w[14]
# df1["GMS_pds_l35_dw_16_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-16)/df1.groupby(['DOW'])['GMST(m)'].shift(-16))*w[15]
# df1["GMS_pds_l35_dw_17_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-17)/df1.groupby(['DOW'])['GMST(m)'].shift(-17))*w[16]
# df1["GMS_pds_l35_dw_18_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-18)/df1.groupby(['DOW'])['GMST(m)'].shift(-18))*w[17]
# df1["GMS_pds_l35_dw_19_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-19)/df1.groupby(['DOW'])['GMST(m)'].shift(-19))*w[18]
# df1["GMS_pds_l35_dw_20_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-20)/df1.groupby(['DOW'])['GMST(m)'].shift(-20))*w[19]
# df1["GMS_pds_l35_dw_21_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-21)/df1.groupby(['DOW'])['GMST(m)'].shift(-21))*w[20]
# df1["GMS_pds_l35_dw_22_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-22)/df1.groupby(['DOW'])['GMST(m)'].shift(-22))*w[21]
# df1["GMS_pds_l35_dw_23_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-23)/df1.groupby(['DOW'])['GMST(m)'].shift(-23))*w[22]
# df1["GMS_pds_l35_dw_24_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-24)/df1.groupby(['DOW'])['GMST(m)'].shift(-24))*w[23]
# df1["GMS_pds_l35_dw_25_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-25)/df1.groupby(['DOW'])['GMST(m)'].shift(-25))*w[24]
# df1["GMS_pds_l35_dw_26_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-26)/df1.groupby(['DOW'])['GMST(m)'].shift(-26))*w[25]
# df1["GMS_pds_l35_dw_27_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-27)/df1.groupby(['DOW'])['GMST(m)'].shift(-27))*w[26]
# df1["GMS_pds_l35_dw_28_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-28)/df1.groupby(['DOW'])['GMST(m)'].shift(-28))*w[27]
# df1["GMS_pds_l35_dw_29_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-29)/df1.groupby(['DOW'])['GMST(m)'].shift(-29))*w[28]
# df1["GMS_pds_l35_dw_30_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-30)/df1.groupby(['DOW'])['GMST(m)'].shift(-30))*w[29]
# df1["GMS_pds_l35_dw_31_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-31)/df1.groupby(['DOW'])['GMST(m)'].shift(-31))*w[30]
# df1["GMS_pds_l35_dw_32_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-32)/df1.groupby(['DOW'])['GMST(m)'].shift(-32))*w[31]
# df1["GMS_pds_l35_dw_33_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-33)/df1.groupby(['DOW'])['GMST(m)'].shift(-33))*w[32]
# df1["GMS_pds_l35_dw_34_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-34)/df1.groupby(['DOW'])['GMST(m)'].shift(-34))*w[33]
# df1["GMS_pds_l35_dw_35_ct"] = (df1.groupby(['DOW'])['GMST(m)'].shift(-35)/df1.groupby(['DOW'])['GMST(m)'].shift(-35))*w[34]


# df1["GMS_pds_l35_ws"] = df1[["GMS_pds_l35_dw_1", "GMS_pds_l35_dw_2", "GMS_pds_l35_dw_3", "GMS_pds_l35_dw_4", "GMS_pds_l35_dw_5", "GMS_pds_l35_dw_6", "GMS_pds_l35_dw_7", "GMS_pds_l35_dw_8", "GMS_pds_l35_dw_9", "GMS_pds_l35_dw_10", "GMS_pds_l35_dw_11", "GMS_pds_l35_dw_12", "GMS_pds_l35_dw_13", "GMS_pds_l35_dw_14", "GMS_pds_l35_dw_15", "GMS_pds_l35_dw_16", "GMS_pds_l35_dw_17", "GMS_pds_l35_dw_18", "GMS_pds_l35_dw_19", "GMS_pds_l35_dw_20", "GMS_pds_l35_dw_21", "GMS_pds_l35_dw_22", "GMS_pds_l35_dw_23", "GMS_pds_l35_dw_24", "GMS_pds_l35_dw_25",
#                             "GMS_pds_l35_dw_26", "GMS_pds_l35_dw_27", "GMS_pds_l35_dw_28", "GMS_pds_l35_dw_29", "GMS_pds_l35_dw_30", "GMS_pds_l35_dw_31", "GMS_pds_l35_dw_32", "GMS_pds_l35_dw_33", "GMS_pds_l35_dw_34", "GMS_pds_l35_dw_35"]].sum(axis=1)
# df1["GMS_pds_l35_ws_ct"] = df1[["GMS_pds_l35_dw_1_ct", "GMS_pds_l35_dw_2_ct", "GMS_pds_l35_dw_3_ct", "GMS_pds_l35_dw_4_ct", "GMS_pds_l35_dw_5_ct", "GMS_pds_l35_dw_6_ct", "GMS_pds_l35_dw_7_ct", "GMS_pds_l35_dw_8_ct", "GMS_pds_l35_dw_9_ct", "GMS_pds_l35_dw_10_ct", "GMS_pds_l35_dw_11_ct", "GMS_pds_l35_dw_12_ct", "GMS_pds_l35_dw_13_ct", "GMS_pds_l35_dw_14_ct", "GMS_pds_l35_dw_15_ct", "GMS_pds_l35_dw_16_ct", "GMS_pds_l35_dw_17_ct", "GMS_pds_l35_dw_18_ct", "GMS_pds_l35_dw_19_ct", "GMS_pds_l35_dw_20_ct", "GMS_pds_l35_dw_21_ct", "GMS_pds_l35_dw_22_ct", "GMS_pds_l35_dw_23_ct", "GMS_pds_l35_dw_24_ct", "GMS_pds_l35_dw_25_ct",
#                                "GMS_pds_l35_dw_26_ct", "GMS_pds_l35_dw_27_ct", "GMS_pds_l35_dw_28_ct", "GMS_pds_l35_dw_29_ct", "GMS_pds_l35_dw_30_ct", "GMS_pds_l35_dw_31_ct", "GMS_pds_l35_dw_32_ct", "GMS_pds_l35_dw_33_ct", "GMS_pds_l35_dw_34_ct", "GMS_pds_l35_dw_35_ct"]].sum(axis=1)
# df1["GMS_pds_l35_dw"] = df1["GMS_pds_l35_ws"]/df1["GMS_pds_l35_ws_ct"]

# # Deleting transient columns (only lags 1-35 are created above, so lags 36-50 are not listed)
# # df1 = df1.drop([f"GMS_pds_l35_dw_{i}" for i in range(1, 36)] + [f"GMS_pds_l35_dw_{i}_ct" for i in range(1, 36)] + ["GMS_pds_l35_ws", "GMS_pds_l35_ws_ct"], axis = 1)

In [66]:
# IS_pds_l40_dw
# Provides the decay-weighted (dw), puzzle day-specific (pds) mean solve time for IS1 over the 40 puzzles preceding a given puzzle
# Note also that, unlike the 10-puzzle moving average, this weighted average does NOT include the "puzzle at hand" itself

df1 = df1.sort_values(by=['DOW', 'Comp_Date'], ascending = False)

# Gradual decay
# w[0] multiplies the most recent prior solve (shift(-1)) and w[39] the oldest (shift(-40))
w = np.arange(1,41)
w = list(w)

# No decay
# w = np.ones(40)
# w = list(w)

df1["IS_pds_l40_dw_1"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-1)*w[0]
df1["IS_pds_l40_dw_2"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-2)*w[1]
df1["IS_pds_l40_dw_3"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-3)*w[2]
df1["IS_pds_l40_dw_4"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-4)*w[3]
df1["IS_pds_l40_dw_5"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-5)*w[4]
df1["IS_pds_l40_dw_6"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-6)*w[5]
df1["IS_pds_l40_dw_7"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-7)*w[6]
df1["IS_pds_l40_dw_8"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-8)*w[7]
df1["IS_pds_l40_dw_9"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-9)*w[8]
df1["IS_pds_l40_dw_10"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-10)*w[9]
df1["IS_pds_l40_dw_11"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-11)*w[10]
df1["IS_pds_l40_dw_12"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-12)*w[11]
df1["IS_pds_l40_dw_13"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-13)*w[12]
df1["IS_pds_l40_dw_14"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-14)*w[13]
df1["IS_pds_l40_dw_15"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-15)*w[14]
df1["IS_pds_l40_dw_16"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-16)*w[15]
df1["IS_pds_l40_dw_17"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-17)*w[16]
df1["IS_pds_l40_dw_18"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-18)*w[17]
df1["IS_pds_l40_dw_19"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-19)*w[18]
df1["IS_pds_l40_dw_20"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-20)*w[19]
df1["IS_pds_l40_dw_21"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-21)*w[20]
df1["IS_pds_l40_dw_22"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-22)*w[21]
df1["IS_pds_l40_dw_23"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-23)*w[22]
df1["IS_pds_l40_dw_24"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-24)*w[23]
df1["IS_pds_l40_dw_25"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-25)*w[24]
df1["IS_pds_l40_dw_26"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-26)*w[25]
df1["IS_pds_l40_dw_27"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-27)*w[26]
df1["IS_pds_l40_dw_28"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-28)*w[27]
df1["IS_pds_l40_dw_29"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-29)*w[28]
df1["IS_pds_l40_dw_30"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-30)*w[29]
df1["IS_pds_l40_dw_31"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-31)*w[30]
df1["IS_pds_l40_dw_32"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-32)*w[31]
df1["IS_pds_l40_dw_33"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-33)*w[32]
df1["IS_pds_l40_dw_34"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-34)*w[33]
df1["IS_pds_l40_dw_35"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-35)*w[34]
df1["IS_pds_l40_dw_36"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-36)*w[35]
df1["IS_pds_l40_dw_37"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-37)*w[36]
df1["IS_pds_l40_dw_38"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-38)*w[37]
df1["IS_pds_l40_dw_39"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-39)*w[38]
df1["IS_pds_l40_dw_40"] = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-40)*w[39]


df1["IS_pds_l40_dw_1_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-1)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-1))*w[0]
df1["IS_pds_l40_dw_2_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-2)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-2))*w[1]
df1["IS_pds_l40_dw_3_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-3)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-3))*w[2]
df1["IS_pds_l40_dw_4_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-4)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-4))*w[3]
df1["IS_pds_l40_dw_5_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-5)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-5))*w[4]
df1["IS_pds_l40_dw_6_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-6)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-6))*w[5]
df1["IS_pds_l40_dw_7_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-7)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-7))*w[6]
df1["IS_pds_l40_dw_8_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-8)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-8))*w[7]
df1["IS_pds_l40_dw_9_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-9)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-9))*w[8]
df1["IS_pds_l40_dw_10_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-10)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-10))*w[9]
df1["IS_pds_l40_dw_11_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-11)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-11))*w[10]
df1["IS_pds_l40_dw_12_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-12)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-12))*w[11]
df1["IS_pds_l40_dw_13_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-13)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-13))*w[12]
df1["IS_pds_l40_dw_14_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-14)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-14))*w[13]
df1["IS_pds_l40_dw_15_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-15)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-15))*w[14]
df1["IS_pds_l40_dw_16_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-16)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-16))*w[15]
df1["IS_pds_l40_dw_17_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-17)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-17))*w[16]
df1["IS_pds_l40_dw_18_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-18)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-18))*w[17]
df1["IS_pds_l40_dw_19_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-19)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-19))*w[18]
df1["IS_pds_l40_dw_20_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-20)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-20))*w[19]
df1["IS_pds_l40_dw_21_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-21)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-21))*w[20]
df1["IS_pds_l40_dw_22_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-22)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-22))*w[21]
df1["IS_pds_l40_dw_23_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-23)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-23))*w[22]
df1["IS_pds_l40_dw_24_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-24)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-24))*w[23]
df1["IS_pds_l40_dw_25_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-25)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-25))*w[24]
df1["IS_pds_l40_dw_26_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-26)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-26))*w[25]
df1["IS_pds_l40_dw_27_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-27)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-27))*w[26]
df1["IS_pds_l40_dw_28_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-28)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-28))*w[27]
df1["IS_pds_l40_dw_29_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-29)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-29))*w[28]
df1["IS_pds_l40_dw_30_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-30)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-30))*w[29]
df1["IS_pds_l40_dw_31_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-31)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-31))*w[30]
df1["IS_pds_l40_dw_32_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-32)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-32))*w[31]
df1["IS_pds_l40_dw_33_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-33)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-33))*w[32]
df1["IS_pds_l40_dw_34_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-34)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-34))*w[33]
df1["IS_pds_l40_dw_35_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-35)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-35))*w[34]
df1["IS_pds_l40_dw_36_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-36)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-36))*w[35]
df1["IS_pds_l40_dw_37_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-37)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-37))*w[36]
df1["IS_pds_l40_dw_38_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-38)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-38))*w[37]
df1["IS_pds_l40_dw_39_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-39)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-39))*w[38]
df1["IS_pds_l40_dw_40_ct"] = (df1.groupby(['DOW'])['IS1_ST(m)'].shift(-40)/df1.groupby(['DOW'])['IS1_ST(m)'].shift(-40))*w[39]


# Column names for the 40 weighted solve times and their weight counters
l40_val_cols = [f"IS_pds_l40_dw_{k}" for k in range(1, 41)]
l40_ct_cols = [f"{col}_ct" for col in l40_val_cols]

# Weighted sum of the available solve times, sum of the weights actually used, and their ratio
df1["IS_pds_l40_ws"] = df1[l40_val_cols].sum(axis=1)
df1["IS_pds_l40_ws_ct"] = df1[l40_ct_cols].sum(axis=1)
df1["IS_pds_l40_dw"] = df1["IS_pds_l40_ws"]/df1["IS_pds_l40_ws_ct"]

# Deleting transient columns
df1 = df1.drop(l40_val_cols + l40_ct_cols + ["IS_pds_l40_ws", "IS_pds_l40_ws_ct"], axis=1)

In [652]:
# IS_pds_l50_dw
# Provides the decay-weighted (dw), puzzle day-specific (pds) mean solve time for GMS over the previous 50 puzzles relative to a given puzzle
# Note that, unlike the 10-puzzle moving average, this weighted average does NOT include the "puzzle at hand" itself

df1 = df1.sort_values(by=['DOW', 'Comp_Date'], ascending = False)

# Gradual decay
# w = np.arange(1,51)
# w = list(w)

# No decay
w = np.ones(50)
w = list(w)

n_prev = 50
l50_val_cols = [f"IS_pds_l50_dw_{k}" for k in range(1, n_prev + 1)]
l50_ct_cols = [f"{col}_ct" for col in l50_val_cols]

for k in range(1, n_prev + 1):
    # k-th previous same-day puzzle (rows are sorted newest first within each DOW)
    prev_solve = df1.groupby(['DOW'])['IS1_ST(m)'].shift(-k)
    # Weighted solve time; NaN when fewer than k previous puzzles exist
    df1[l50_val_cols[k-1]] = prev_solve*w[k-1]
    # Weight counter: w[k-1] when the solve exists, NaN otherwise
    df1[l50_ct_cols[k-1]] = (prev_solve/prev_solve)*w[k-1]

# Weighted sum of the available solve times, sum of the weights actually used, and their ratio
df1["IS_pds_l50_ws"] = df1[l50_val_cols].sum(axis=1)
df1["IS_pds_l50_ws_ct"] = df1[l50_ct_cols].sum(axis=1)
df1["IS_pds_l50_dw"] = df1["IS_pds_l50_ws"]/df1["IS_pds_l50_ws_ct"]

# Deleting transient columns
df1 = df1.drop(l50_val_cols + l50_ct_cols + ["IS_pds_l50_ws", "IS_pds_l50_ws_ct"], axis=1)

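The window-specific cells above all follow one pattern: shift the solve times within each day of the week, weight each lag, and divide by the sum of the weights that were actually available. A minimal sketch of that pattern as a reusable function is shown here. The function name `decay_weighted_prev_mean`, its parameters, and the toy data are hypothetical, and it assumes solve times are strictly positive (the cells above treat a zero solve time as missing, which this sketch does not).

```python
import numpy as np
import pandas as pd


def decay_weighted_prev_mean(df, n_prev, weights, value_col="IS1_ST(m)",
                             group_col="DOW", date_col="Comp_Date"):
    """Weighted mean of the previous `n_prev` same-group solves, per row.

    Hypothetical helper: weights[k-1] is applied to the k-th most recent
    earlier solve, mirroring `shift(-k) * w[k-1]` in the cells above.
    Missing solves drop out of both the numerator and the denominator.
    """
    df = df.sort_values(by=[group_col, date_col], ascending=False)
    grouped = df.groupby(group_col)[value_col]
    weighted_sum = pd.Series(0.0, index=df.index)
    weight_total = pd.Series(0.0, index=df.index)
    for k in range(1, n_prev + 1):
        prev = grouped.shift(-k)  # k-th earlier solve within the group
        weighted_sum += prev.fillna(0.0) * weights[k - 1]
        weight_total += prev.notna() * weights[k - 1]
    # Rows with no earlier solves get NaN instead of a division by zero.
    return weighted_sum / weight_total.replace(0.0, np.nan)


# Toy data (made up): four Monday solves and two Tuesday solves.
toy = pd.DataFrame({
    "DOW": ["Mon"] * 4 + ["Tue"] * 2,
    "Comp_Date": pd.to_datetime(["2024-01-01", "2024-01-08", "2024-01-15",
                                 "2024-01-22", "2024-01-02", "2024-01-09"]),
    "IS1_ST(m)": [10.0, 8.0, 6.0, 4.0, 20.0, 30.0],
})
toy["prev2_flat"] = decay_weighted_prev_mean(toy, 2, [1, 1])
toy["prev2_decay"] = decay_weighted_prev_mean(toy, 2, [2, 1])
print(toy)
```

For the latest Monday solve, the two earlier Monday times are 6.0 and 8.0, so the flat weighting gives 7.0 and the (2, 1) weighting gives 20/3. The first solve in each group has no history and comes out as NaN.
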
Create DataFrame variants containing only the columns needed to build the benchmark models.

In [233]:
df_filter=df1.copy()

In [234]:
#df_model1 = df_filter[["IS2_ST(m)", "IS_pds_l5_dw"]]
#df_model1 = df_filter[["IS2_ST(m)", "IS_pds_l7_dw"]]
#df_model1 = df_filter[["IS2_ST(m)", "IS_pds_l8_dw"]]
#df_model1 = df_filter[["IS2_ST(m)", "IS_pds_l9_dw"]]
#df_model1 = df_filter[["IS2_ST(m)", "IS_pds_l10_dw"]]
df_model1 = df_filter[["IS2_ST(m)", "IS_pds_l11_dw"]]
#df_model1 = df_filter[["IS2_ST(m)", "IS_pds_l12_dw"]]
#df_model1 = df_filter[["IS2_ST(m)", "IS_pds_l15_dw"]]
#df_model1 = df_filter[["IS2_ST(m)", "IS_pds_l20_dw"]]
#df_model1 = df_filter[["IS2_ST(m)", "IS_pds_l25_dw"]]
#df_model1 = df_filter[["IS2_ST(m)", "IS_pds_l40_dw"]]
#df_model1 = df_filter[["IS2_ST(m)", "IS_pds_l50_dw"]]
#df_model1 = df_filter[["IS2_ST(m)", "IS_pds_l35_dw"]]
#df_model1 = df_filter[["IS2_ST(m)", "IS_pds_l45_dw"]]
#df_model1 = df_filter[["IS2_ST(m)", "overall_day_mean_IST_ST(m)"]]
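
The cell above selects one candidate feature at a time by commenting lines in and out. As a sketch of how the same comparison could be run in one pass, the loop below cross-validates a single-feature linear model for each candidate column and tabulates the RMSE. The helper name `cv_rmse_by_feature` and the synthetic `demo` frame are hypothetical stand-ins. Note also that the notebook cross-validates on the training split only, whereas this sketch uses whatever frame it is given.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cv_rmse_by_feature(df, target, candidates, cv=5):
    """Mean and std of cross-validated RMSE for each single-feature model."""
    rows = []
    for col in candidates:
        pipe = make_pipeline(SimpleImputer(strategy="mean"),
                             StandardScaler(), LinearRegression())
        # The scorer returns negative RMSE, so flip the sign.
        scores = -cross_val_score(pipe, df[[col]], df[target], cv=cv,
                                  scoring="neg_root_mean_squared_error")
        rows.append({"feature": col, "rmse_mean": scores.mean(),
                     "rmse_std": scores.std()})
    return pd.DataFrame(rows).sort_values("rmse_mean").reset_index(drop=True)


# Synthetic stand-in for df_filter; the values are made up for illustration.
rng = np.random.default_rng(0)
signal = rng.normal(18, 8, size=200)
demo = pd.DataFrame({
    "IS2_ST(m)": signal + rng.normal(0, 3, size=200),
    "IS_pds_l10_dw": signal + rng.normal(0, 2, size=200),
    "IS_pds_l50_dw": signal + rng.normal(0, 6, size=200),
})
demo.loc[:4, "IS_pds_l50_dw"] = np.nan  # early rows lack history

summary = cv_rmse_by_feature(demo, "IS2_ST(m)",
                             ["IS_pds_l10_dw", "IS_pds_l50_dw"])
print(summary)
```
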

In [235]:
df_model1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 979 entries, 0 to 1219
Data columns (total 2 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   IS2_ST(m)      979 non-null    float64
 1   IS_pds_l11_dw  973 non-null    float64
dtypes: float64(2)
memory usage: 22.9 KB


### Train Test Split

In [236]:
len(df_model1) * .80, len(df_model1) * .20

(783.2, 195.8)

In [237]:
X_train, X_test, y_train, y_test = train_test_split(df_model1.drop(columns='IS2_ST(m)'), 
                                                    df_model1["IS2_ST(m)"], test_size=0.20, 
                                                    random_state=12)

In [238]:
y_train.shape, y_test.shape

((783,), (196,))

In [239]:
y_train

916      4.283333
460     15.900000
970      6.066667
22       6.250000
461     29.033333
          ...    
259      7.033333
271      7.500000
426     18.350000
891      4.200000
1084    14.283333
Name: IS2_ST(m), Length: 783, dtype: float64

In [240]:
X_train.shape, X_test.shape

((783, 1), (196, 1))

In [241]:
y_train.mean()

18.280757769263513
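
The results log at the end of this notebook notes that the random state was chosen for a good target-mean balance between training and testing. One way such a check could be done is sketched below: split the target under several seeds and compare the two means. The target here is synthetic and the seed range is arbitrary, so the numbers say nothing about the real data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic right-skewed target standing in for df_model1["IS2_ST(m)"].
rng = np.random.default_rng(1)
y = pd.Series(rng.gamma(shape=3.0, scale=6.0, size=979), name="IS2_ST(m)")

rows = []
for seed in range(20):
    y_tr, y_te = train_test_split(y, test_size=0.20, random_state=seed)
    rows.append({"seed": seed, "train_mean": y_tr.mean(),
                 "test_mean": y_te.mean()})

balance = pd.DataFrame(rows)
balance["abs_gap"] = (balance["train_mean"] - balance["test_mean"]).abs()

# Seed whose train and test target means are closest.
best = balance.sort_values("abs_gap").iloc[0]
print(best)
```
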

### Benchmark Linear Model Based on Last N Day-Specific Puzzles With X Decay Weighting

In [242]:
X_train.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 783 entries, 916 to 1084
Data columns (total 1 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   IS_pds_l11_dw  777 non-null    float64
dtypes: float64(1)
memory usage: 12.2 KB


In [243]:
lr_pipe = make_pipeline(
    SimpleImputer(strategy='median'), 
    StandardScaler(),
    SelectKBest(f_regression),
    LinearRegression()
)

In [244]:
#Dict of available parameters for linear regression pipe
lr_pipe.get_params().keys()

dict_keys(['memory', 'steps', 'verbose', 'simpleimputer', 'standardscaler', 'selectkbest', 'linearregression', 'simpleimputer__add_indicator', 'simpleimputer__copy', 'simpleimputer__fill_value', 'simpleimputer__missing_values', 'simpleimputer__strategy', 'simpleimputer__verbose', 'standardscaler__copy', 'standardscaler__with_mean', 'standardscaler__with_std', 'selectkbest__k', 'selectkbest__score_func', 'linearregression__copy_X', 'linearregression__fit_intercept', 'linearregression__n_jobs', 'linearregression__normalize', 'linearregression__positive'])

In [245]:
#Define search grid parameters
k = [k+1 for k in range(len(X_train.columns))]

grid_params = {
    'standardscaler': [StandardScaler(), None],
    'simpleimputer__strategy': ['mean', 'median'],
    'selectkbest__k': k
}

In [246]:
#Call `GridSearchCV` with linear regression pipeline, passing in the above `grid_params`
#dict for parameters to evaluate with 5-fold cross-validation
lr_grid_cv = GridSearchCV(lr_pipe, param_grid=grid_params, cv=5)

In [247]:
#Conduct grid search for this model
lr_grid_cv.fit(X_train, y_train)

GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('simpleimputer',
                                        SimpleImputer(strategy='median')),
                                       ('standardscaler', StandardScaler()),
                                       ('selectkbest',
                                        SelectKBest(score_func=<function f_regression at 0x0000027F7B5E2310>)),
                                       ('linearregression',
                                        LinearRegression())]),
             param_grid={'selectkbest__k': [1],
                         'simpleimputer__strategy': ['mean', 'median'],
                         'standardscaler': [StandardScaler(), None]})

In [248]:
#Best params from grid search for this model
lr_grid_cv.best_params_

{'selectkbest__k': 1,
 'simpleimputer__strategy': 'mean',
 'standardscaler': StandardScaler()}
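
With a single input column, two of the three grid dimensions cannot change the fit: `selectkbest__k` can only be 1, and standardizing is an affine rescaling of the feature, which leaves the predictions of a least-squares line with an intercept unchanged. The `StandardScaler()` entry in the best parameters is therefore best read as a tie, up to floating-point noise, rather than a preference. The sketch below illustrates the invariance with NumPy only; the data are synthetic and `fit_predict` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.normal(12.0, 4.0, size=100)
y_train = 1.5 * x_train + rng.normal(0.0, 2.0, size=100)
x_test = rng.normal(12.0, 4.0, size=25)


def fit_predict(x_fit, y_fit, x_new):
    """Least-squares line with intercept, fitted on x_fit, applied to x_new."""
    design = np.column_stack([np.ones_like(x_fit), x_fit])
    coef, *_ = np.linalg.lstsq(design, y_fit, rcond=None)
    return coef[0] + coef[1] * x_new


# Raw feature.
pred_raw = fit_predict(x_train, y_train, x_test)

# Standardized feature, using training-set statistics for both splits,
# which is what StandardScaler does inside a pipeline.
mu, sigma = x_train.mean(), x_train.std()
pred_scaled = fit_predict((x_train - mu) / sigma, y_train,
                          (x_test - mu) / sigma)

# Largest difference between the two sets of predictions.
print(np.max(np.abs(pred_raw - pred_scaled)))
```
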

### Linear Model Metrics From RPB Variant

#### R-squared (COD)

In [249]:
#Cross-validation defaults to R^2 metric for scoring regression
lr_best_cv_results = cross_validate(lr_grid_cv.best_estimator_, X_train, y_train, cv=5)
lr_best_scores = lr_best_cv_results['test_score']
lr_best_scores

array([0.56315828, 0.56447104, 0.51935066, 0.59466409, 0.37971054])

In [250]:
#Training set CV mean and std
np.mean(lr_best_scores), np.std(lr_best_scores)

(0.524270921237854, 0.0761648054135865)

#### Mean Absolute Error (MAE)

In [251]:
lr_neg_mae = cross_validate(lr_grid_cv.best_estimator_, X_train, y_train, 
                            scoring='neg_mean_absolute_error', cv=5, n_jobs=-1)

In [252]:
# Training set CV MAE mean and std
lr_mae_mean = np.mean(-1 * lr_neg_mae['test_score'])
lr_mae_std = np.std(-1 * lr_neg_mae['test_score'])
MAE_LR_train = lr_mae_mean, lr_mae_std
MAE_LR_train

(5.950515259625978, 0.44571858899386857)

In [253]:
# Test set MAE
MAE_LR_test = mean_absolute_error(y_test, lr_grid_cv.best_estimator_.predict(X_test))
MAE_LR_test

6.15798386523689

#### Mean Squared Error (MSE)

In [254]:
lr_neg_mse = cross_validate(lr_grid_cv.best_estimator_, X_train, y_train, 
                            scoring='neg_mean_squared_error', cv=5)

In [255]:
#Training set CV mean and std
lr_mse_mean = np.mean(-1 * lr_neg_mse['test_score'])
lr_mse_std = np.std(-1 * lr_neg_mse['test_score'])
MSE_LR_train = lr_mse_mean, lr_mse_std
MSE_LR_train

(91.85462561839417, 21.305270250360547)

In [256]:
# Test set MSE
MSE_LR_test = mean_squared_error(y_test, lr_grid_cv.best_estimator_.predict(X_test))
MSE_LR_test

99.57740064486778

#### Root Mean Square Error (RMSE)

In [257]:
lr_neg_rmse = cross_validate(lr_grid_cv.best_estimator_, X_train, y_train, 
                            scoring='neg_root_mean_squared_error', cv=5)

In [258]:
#Training set CV mean and std
lr_rmse_mean = np.mean(-1 * lr_neg_rmse['test_score'])
lr_rmse_std = np.std(-1 * lr_neg_rmse['test_score'])
RMSE_LR_train = lr_rmse_mean, lr_rmse_std
RMSE_LR_train

(9.51710437905382, 1.1310834878950187)

In [259]:
# Test set RMSE
RMSE_LR_test = np.sqrt(mean_squared_error(y_test, lr_grid_cv.best_estimator_.predict(X_test)))
RMSE_LR_test

9.978847661171493
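
A short arithmetic check on how the MSE and RMSE figures above relate. On the test set, RMSE is simply the square root of MSE. Across CV folds the two summaries differ: the mean of the per-fold RMSEs can never exceed the square root of the mean per-fold MSE (Jensen's inequality), which is why the CV RMSE of about 9.52 sits below the square root of the CV MSE. The numbers are copied from the outputs above.

```python
import math

# Values reported in the cells above.
mse_test = 99.57740064486778
rmse_test = 9.978847661171493
mse_cv_mean = 91.85462561839417
rmse_cv_mean = 9.51710437905382

# On a single set of predictions, RMSE is the square root of MSE.
print(math.sqrt(mse_test) - rmse_test)

# Across folds, averaging per-fold RMSEs is not the same as taking the
# square root of the averaged per-fold MSEs; the latter is never smaller.
print(math.sqrt(mse_cv_mean), rmse_cv_mean)
```
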

IS2 Updated run (03/02/2024)
Note: Random state is now 12, which gives a good target-mean balance between training and testing

Previous 8-day-specific puzzles weighting: (8,7,6,5,4,3,2,1)
Training: (9.798881746609208, 1.2134187618799477), Testing: 10.964549688578016

Previous 8-day-specific puzzles NO decay weighting: (1,1,1,1,1,1,1,1)
Training: (9.60054896054275, 1.159448916762027), Testing: 10.578549319186436

Previous 9-day-specific puzzles weighting: (9,8,7,6,5,4,3,2,1)
Training: (9.734837015637146, 1.151007242076803), Testing: 10.745443894396468

Previous 9-day-specific puzzles NO decay weighting: (1,1,1,1,1,1,1,1,1)
Training: (9.550125901307718, 1.1049239093196783), Testing: 10.452699792245369

Previous 10-day-specific puzzles weighting: (10,9,8,7,6,5,4,3,2,1)
Training: (9.655857747402898, 1.2016019547341474), Testing: 10.190371260283799

******************************************************************************
Previous 10-day-specific puzzles NO decay weighting: (1,1,1,1,1,1,1,1,1,1)
Training: (9.497224208481256, 1.125336347282818), Testing: 10.132585279402853
********************************************************************************

Previous 11-day-specific puzzles NO decay weighting: (1,1,1,1,1,1,1,1,1,1,1)
Training: (9.51710437905382, 1.1310834878950187), Testing: 9.978847661171493

Previous 15-day-specific puzzles NO decay weighting: (1,1,1...)
Training: (9.507868456445484, 1.1620422128668335), Testing: 9.947347911135095

IS2

Results for different decay variants (RMSE, in minutes)
In this series, 2018-2019 solves HAVE been removed

Previous 5-day-specific puzzles weighting: (5,4,3,2,1)
Training: (9.786545944069342, 0.4609918425623554), Testing: 11.222689117530397

Previous 5-day-specific puzzles NO decay weighting: (1,1,1,1,1)
Training: (9.548456581346477, 0.4431892030651494), Testing: 11.00065563982726

Previous 7-day-specific puzzles weighting: (7,6,5,4,3,2,1)
Training: (9.806541380298015, 0.5664850693731814), Testing: 10.817209017061433

Previous 7-day-specific puzzles NO decay weighting: (1,1,1,1,1,1,1)
Training: (9.492289552498136, 0.5192018228288098), Testing: 10.761158393428632

Previous 8-day-specific puzzles weighting: (8,7,6,5,4,3,2,1)
Training: (9.802542304702078, 0.6417049871979194), Testing: 10.915138758233237

Previous 8-day-specific puzzles NO decay weighting: (1,1,1,1,1,1,1,1)
Training: (9.474008660477427, 0.573193097902938), Testing: 10.803890029991948

Previous 9-day-specific puzzles weighting: (9,8,7,6,5,4,3,2,1)
Training: (9.75857267908731, 0.6931573002748672), Testing: 10.572225170046044

Previous 9-day-specific puzzles NO decay weighting: (1,1,1,1,1,1,1,1,1)
Training: (9.436479947578825, 0.6038620330153149), Testing: 10.614684380656712


Previous 10-day-specific puzzles weighting: (10,9,8,7,6,5,4,3,2,1)
Training: (9.615157259904759, 0.7242625018536559), Testing: 10.18868615738438

*******************************************************************************
Previous 10-day-specific puzzles NO decay weighting: (1,1,1,1,1,1,1,1,1,1)
Training: (9.34929335656066, 0.5849791869617799), Testing: 10.391901127775267
*******************************************************************************

Previous 11-day-specific puzzles weighting: (11,10,9,8,7,6,5,4,3,2,1)
Training: (9.664292058681028, 0.8196239562890933), Testing: 10.029637439856753

Previous 11-day-specific puzzles NO decay weighting: (1,1,1,1,1,1,1,1,1,1,1)
Training: (9.380887036399496, 0.6437357395388599), Testing: 10.25666393612052

Previous 12-day-specific puzzles weighting: (12,11,10,9,8,7,6,5,4,3,2,1)
Training: (9.653058986014615, 0.8218199404432892), Testing: 9.926979615308175

Previous 12-day-specific puzzles NO decay weighting: (1,1,1,1,1,1,1,1,1,1,1,1)
Training: (9.38024249929179, 0.6604850961509687), Testing: 10.14766341005752

Previous 15-day-specific puzzles weighting: (15,14,13...)
Training: (9.689432705603844, 0.8340069034156191), Testing: 10.275485603646484

Previous 15-day-specific puzzles NO decay weighting: (1,1,1...)
Training: (9.410895620301494, 0.6567257107625311), Testing: 10.253007599276069
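
To make the comparison in this second series easier to scan, the logged values can be collected into a small lookup and ranked. This is only a sketch: the figures are copied from the log above and rounded to four decimals, and it assumes the unlabeled third number on the 15-puzzle rows is the test RMSE, like the other rows.

```python
# (window size, weighting) -> (CV training RMSE mean, test RMSE), in minutes,
# copied from the second series above and rounded to four decimals.
results = {
    (5, "decay"): (9.7865, 11.2227),
    (5, "none"): (9.5485, 11.0007),
    (7, "decay"): (9.8065, 10.8172),
    (7, "none"): (9.4923, 10.7612),
    (8, "decay"): (9.8025, 10.9151),
    (8, "none"): (9.4740, 10.8039),
    (9, "decay"): (9.7586, 10.5722),
    (9, "none"): (9.4365, 10.6147),
    (10, "decay"): (9.6152, 10.1887),
    (10, "none"): (9.3493, 10.3919),
    (11, "decay"): (9.6643, 10.0296),
    (11, "none"): (9.3809, 10.2567),
    (12, "decay"): (9.6531, 9.9270),
    (12, "none"): (9.3802, 10.1477),
    (15, "decay"): (9.6894, 10.2755),
    (15, "none"): (9.4109, 10.2530),
}

# Configurations with the lowest training and test RMSE.
best_train = min(results, key=lambda cfg: results[cfg][0])
best_test = min(results, key=lambda cfg: results[cfg][1])
print(best_train, best_test)

# Does flat weighting beat linear decay on training RMSE at every window size?
windows = sorted({n for n, _ in results})
flat_always_better = all(
    results[(n, "none")][0] < results[(n, "decay")][0] for n in windows
)
print(flat_always_better)
```

On training RMSE the minimum falls on the 10-puzzle, no-decay row, which is the one set off by asterisks in the log, and flat weighting is lower than linear decay at every window size. The test RMSE ordering is less uniform, with the 12-puzzle decay row lowest.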