## Dataset Construction and Train/Test Split

To estimate how the model will perform in coming seasons, we hold out the last two years of data as a test set, which is used only for the final evaluation of the model. One concern with this approach is that the most recent years may be distributed differently from the training data. Our data exploration shows exactly this: the data exhibits some covariate shift over time as the player pool changes. We accept this because we assume the trend will continue, so that the most recent data is the most representative of future data. Whether that assumption is fair will only become clear with time.

We hold out whole years rather than sampling games at random because data within a year is related. A quarterback is always throwing to a receiver on his own team, so if we train on how a quarterback performed in a particular game, we also learn something about how that team's receivers performed in the same game. Whether the model can actually exploit this is a separate question, but splitting related games across the training and test sets would leak information and contaminate the evaluation.

This notebook produces two CSV files, train.csv and test.csv. train.csv holds the training and validation data, and test.csv holds the test data. The test data must never be used for model selection or training.
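
The year-based hold-out described above can be sketched as follows. This is a minimal illustration, not the notebook's actual implementation: `split_by_year` and `toy_df` are hypothetical names, and the sketch assumes only that the player table has a `year` column, as the DataFrame header printed later in this notebook suggests.

```python
import pandas as pd


def split_by_year(df, n_test_years=2, year_col="year"):
    """Hold out the most recent `n_test_years` seasons as the test set.

    Hypothetical helper: whole seasons go to one side of the split,
    so games from the same year never appear in both sets.
    """
    years = sorted(df[year_col].unique())
    test_years = years[-n_test_years:]
    is_test = df[year_col].isin(test_years)
    return df[~is_test].copy(), df[is_test].copy()


# Toy frame standing in for the real player data (values are made up).
toy_df = pd.DataFrame({
    "full_name": ["A", "A", "B", "B", "C", "C"],
    "year": [2017, 2018, 2018, 2019, 2019, 2020],
    "fantasy_points": [10.0, 12.5, 3.0, 8.0, 15.0, 7.5],
})

train_df, test_df = split_by_year(toy_df, n_test_years=2)

# Writing the two files would then be:
# train_df.to_csv("train.csv", index=False)
# test_df.to_csv("test.csv", index=False)
```

Splitting on the season rather than on individual rows keeps teammates' games from the same year on the same side of the split, which is the leakage concern raised above.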

In [1]:
import pandas as pd
import numpy as np
from models.model_help import last_n_weeks, construct_model_vector

In [2]:
# Set dataset parameters
# number of previous games to include in analysis
n = 7

In [3]:
# Read in saved player data
players_df = pd.read_csv('data/game_pro_ftb_ref_top_220.csv')
# Drop rows whose position is recorded as '0'
players_df = players_df[players_df['pos'] != '0']

In [4]:
# Construct dataset vectors
unstandardized = construct_model_vector(players_df, n)

NEW LEN:
110
final vec: [ 0.     1.     0.     0.    27.191  0.     0.     0.     0.     0.
  0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
  0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
  0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
  0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
  0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
  0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
  0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
  0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
  0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
  0.     0.     0.     0.     0.     0.     0.     0.     0.     0.   ]
pos: [0. 1. 0. 0.]
age: [27.191]
games: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0

KeyboardInterrupt: 

In [None]:
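# Standardize stat vectors
# Select the rows whose position flag in column 3 is set.
# np.where takes either one argument or three, so use a boolean mask instead.
stat_vectors = unstandardized[0]
stat_vectors[stat_vectors[:, 3] == 1]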
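
One way the standardization step could look is sketched below. It is an assumption, not the notebook's method: `standardize` and the toy arrays are hypothetical, and the sketch uses plain z-scoring with statistics computed on the training rows only, so that nothing from the test set influences training. Columns that are constant in the training data (such as the all-zero entries printed above) are left unscaled to avoid dividing by zero.

```python
import numpy as np


def standardize(train, test, eps=1e-8):
    """Z-score both arrays using the training set's column statistics.

    Hypothetical helper: the mean and standard deviation come from
    `train` only and are then applied unchanged to `test`.
    """
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    # Constant columns have zero spread; divide by 1 instead of 0.
    std = np.where(std < eps, 1.0, std)
    return (train - mean) / std, (test - mean) / std


# Toy stat vectors (made-up values); the last column is constant.
train = np.array([[1.0, 10.0, 0.0],
                  [3.0, 30.0, 0.0],
                  [5.0, 50.0, 0.0]])
test = np.array([[3.0, 70.0, 0.0]])

train_z, test_z = standardize(train, test)
```

If standardization is meant to be done per position, the same function could be applied separately to the rows selected by each position flag.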

In [57]:
vals = players_df.loc[(players_df['full_name'] == 'Taysom Hill') & (players_df['pos'] == 'QB')]

In [58]:
# Set the player-page stub on every Taysom Hill QB row (a scalar broadcasts across the selection)
players_df.loc[(players_df['full_name'] == 'Taysom Hill') & (players_df['pos'] == 'QB'), 'stub'] = '/players/H/HillTa00.htm'

In [59]:
players_df.loc[(players_df['full_name'] == 'Taysom Hill') & (players_df['pos'] == 'QB'), 'stub']

68771    /players/H/HillTa00.htm
68772    /players/H/HillTa00.htm
68773    /players/H/HillTa00.htm
68774    /players/H/HillTa00.htm
68775    /players/H/HillTa00.htm
68776    /players/H/HillTa00.htm
68777    /players/H/HillTa00.htm
68778    /players/H/HillTa00.htm
68779    /players/H/HillTa00.htm
68780    /players/H/HillTa00.htm
68781    /players/H/HillTa00.htm
68782    /players/H/HillTa00.htm
68783    /players/H/HillTa00.htm
68784    /players/H/HillTa00.htm
68785    /players/H/HillTa00.htm
68786    /players/H/HillTa00.htm
71415    /players/H/HillTa00.htm
71416    /players/H/HillTa00.htm
71417    /players/H/HillTa00.htm
71418    /players/H/HillTa00.htm
71419    /players/H/HillTa00.htm
71420    /players/H/HillTa00.htm
71421    /players/H/HillTa00.htm
71422    /players/H/HillTa00.htm
71423    /players/H/HillTa00.htm
71424    /players/H/HillTa00.htm
71425    /players/H/HillTa00.htm
71426    /players/H/HillTa00.htm
71427    /players/H/HillTa00.htm
71428    /players/H/HillTa00.htm
71429    /

In [60]:
# List any rows whose stub is still '0'
players_df[players_df['stub'] == '0']

Unnamed: 0,full_name,pos,year,game_date,game_num,week_num,age,team,game_location,opp,...,defense,def_pct,special_teams,st_pct,unique_id,stub,reason,fantasy_points,5years,career_game


In [25]:
len(['/players/L/LewiMa00.htm'] * vals.shape[0])

112