# Sandbox to create prototype models for game prediction

Model list:
1. Linear regression using OLS

In [28]:
import sys
sys.path.append('/Users/tanyatang/Documents/Code/python/50_in_07/app')
sys.path.append('/Users/tanyatang/Documents/Code/python/50_in_07/venv/lib/python3.7/site-packages')

In [94]:
import app.data_preparation.database_connector as db
import pickle

## Model 1: Linear regression

Use player per game information to predict player goals in future games

In [95]:
from sklearn import linear_model as lm
db_name = '50_in_07'
current_db, current_cursor = db.connect_db(db_name)

Connecting to SQL database...
Connected to SQL database


Get player information to encode player id's to integers

In [40]:
player_dict = {}
current_cursor.execute("""
SELECT player_id FROM player_info
""")
for i, player in enumerate(current_cursor.fetchall()):
    player_dict[player[0]] = i

Get relevant game information\
Dictionary keys are game ID's, values are 7-item tuples with elements:
1. Player ID (integer encoding)
2. Team ID (integer encoding)
3. Month of game
4. Away team
5. Home team
6. Goals scored by player

In [83]:
game_dict = {}
current_cursor.execute("""
SELECT a.game_id, player_id, team_id, MONTH(date_time), away_team_id, home_team_id, goals
FROM (
SELECT game_id, player_id, team_id, goals
FROM skater
) AS a
JOIN (
SELECT game_id, date_time, away_team_id, home_team_id
FROM game
) AS b
ON a.game_id = b.game_id
""")
for result in current_cursor.fetchall():
    player = player_dict[result[1]]
    game_dict[result[0]] = (player, result[2], result[3], result[4], result[5], result[6])

Need to only include goals for player's own team, features are 3-item tuples with elements:
1. Player ID (integer encoding)
2. Team ID (integer encoding)
3. Month of game
4. Away or home team (0-1 encoding)

Target is goals scored:
1. Goals scored by player

In [84]:
features = []
targets = []
for game_id in game_dict.keys():
    current_sample = game_dict[game_id]
    targets.append(current_sample[5])
    if current_sample[1] == current_sample[3]:
        features.append((current_sample[0], current_sample[1], current_sample[2], 0))
    elif current_sample[1] == current_sample[4]:
        features.append((current_sample[0], current_sample[1], current_sample[2], 1))

In [85]:
linear_ols = lm.LinearRegression().fit(features, targets)

Test model on Febuary games for Mitch Marner of the Toronto Maple Leafs

In [89]:
marner_id = 8478483
mini_test = [[player_dict[marner_id], 10, 2, 1],
            [player_dict[marner_id], 10, 2, 0]]
mini_test_results = linear_ols.predict(mini_test)
print('Predicted goals for home game in February by Mitch Marner:', mini_test_results[0])
print('Predicted goals for away game in February by Mitch Marner:', mini_test_results[1])

Predicted goals for home game in February by Mitch Marner: 0.15973675864986914
Predicted goals for away game in February by Mitch Marner: -0.00801852635517842


Save model to archive

In [92]:
pickle.dump(linear_ols, open("/Users/tanyatang/Documents/Code/python/50_in_07/app/models/archived_models/linear_ols_1.sav", 'wb'))

Disconnect from database

In [96]:
db.disconnect_db(current_db, current_cursor)

Committing database changes...
Disconnecting from SQL database...
Disconnected from SQL database
