# NHL API Shot Location Data

Tyler Young, May 2019

This notebook continues my work with shot location data, but now looks at the entire 2018-2019 NHL season rather than just Sharks games.

---

## Collect Game IDs from the 2018-2019 Regular Season

In [1]:
import requests
import json
import pandas as pd
pd.options.display.max_rows = 9999

In [2]:
response_teams = requests.get("https://statsapi.web.nhl.com/api/v1/teams")
teams = response_teams.json()

The 2018-2019 regular season ran from October 3, 2018 to April 6, 2019.

In [3]:
response = requests.get("https://statsapi.web.nhl.com/api/v1/schedule?startDate=2018-10-03&endDate=2019-04-06")
games = response.json()
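Instead of hand-building the query string, the date range can be passed as query parameters. Below is a minimal sketch using the standard library's `urlencode`. The helper name `schedule_url` is hypothetical. With `requests`, passing the same dict as `params=` to `requests.get` builds an equivalent query.

```python
from urllib.parse import urlencode

SCHEDULE_URL = "https://statsapi.web.nhl.com/api/v1/schedule"

def schedule_url(start_date, end_date):
    # Hypothetical helper: build the schedule query for a date range given as YYYY-MM-DD strings.
    query = urlencode({"startDate": start_date, "endDate": end_date})
    return SCHEDULE_URL + "?" + query

url = schedule_url("2018-10-03", "2019-04-06")
print(url)
```

This makes it easy to pull a different season by changing only the two dates.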

In [4]:
def days_games(day):
    # Return the game IDs (gamePk) for every game scheduled on a single date.
    return [game['gamePk'] for game in day]

In [5]:
def season_games(dates):
    # Flatten the per-date game lists into one list of game IDs for the whole season.
    list_of_seasons_game_ids = []
    for date in dates:
        list_of_seasons_game_ids.extend(days_games(date['games']))
    return list_of_seasons_game_ids

In [6]:
list_of_game_ids = season_games(games['dates'])
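The schedule JSON nests each day's games under `dates`, so collecting game IDs is a two-level flatten. The sketch below runs the same flattening offline on a small hypothetical payload. It assumes only the `dates`, `games` and `gamePk` keys used above.

```python
# Hypothetical, trimmed-down schedule payload with the same nesting as games['dates'].
sample_schedule = {
    "dates": [
        {"games": [{"gamePk": 2018020001}, {"gamePk": 2018020002}]},
        {"games": [{"gamePk": 2018020005}]},
    ]
}

# Flatten dates -> games -> gamePk in a single comprehension.
sample_ids = [game["gamePk"] for day in sample_schedule["dates"] for game in day["games"]]
print(sample_ids)  # [2018020001, 2018020002, 2018020005]
```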

In [7]:
df_game_ids = pd.DataFrame(list_of_game_ids, columns = ['game_ids'])

The schedule also includes game IDs from the All-Star event and one preseason game. Let's drop these:

2018010110

2018040641

2018040642

2018040643

In [8]:
excluded_ids = [2018010110, 2018040641, 2018040642, 2018040643]  # preseason + All-Star games
df_game_ids = df_game_ids[~df_game_ids.game_ids.isin(excluded_ids)]
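Hard-coding the IDs works for one season. A more general sketch keeps only regular-season games by reading the game type encoded in each ID. It assumes the NHL's usual 10-digit layout: a 4-digit season start year, a 2-digit game type where `02` is regular season, `01` is preseason and `04` is All-Star, then a 4-digit game number. The helper `regular_season_only` is hypothetical.

```python
import pandas as pd

# Assumed ID layout: SSSS TT NNNN, e.g. 2018 02 0001, where type "02" = regular season.
def regular_season_only(df, column="game_ids"):
    # Characters 4-5 of the ID string hold the (assumed) game type code.
    game_type = df[column].astype(str).str[4:6]
    return df[game_type == "02"]

sample = pd.DataFrame({"game_ids": [2018010110, 2018020001, 2018020002, 2018040641]})
print(regular_season_only(sample)["game_ids"].tolist())  # [2018020001, 2018020002]
```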

---

## Scale up to get data from all games in the season

First, I pull the list of game IDs collected above.

In [9]:
list_of_gameIDs = df_game_ids['game_ids']
list_of_gameIDs[0:5]

1    2018020001
2    2018020002
3    2018020003
4    2018020004
5    2018020005
Name: game_ids, dtype: int64

This function loops through every game ID in the list created above and builds one data frame containing all of the plays from the season.

In [10]:
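def game_data(list_of_gameIDs):
    # Fetch the live feed for each game and collect its plays.
    # DataFrame.append was removed in pandas 2.0, so gather frames in a list and concat once.
    frames = []
    for ID in list_of_gameIDs:
        r = requests.get("https://statsapi.web.nhl.com/api/v1/game/" + str(ID) + "/feed/live")
        r.raise_for_status()  # fail loudly on a bad response instead of parsing an error page
        s_game = r.json()
        frames.append(pd.DataFrame(s_game['liveData']['plays']['allPlays']))
    return pd.concat(frames, ignore_index=True)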
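Looping over more than a thousand requests takes a while. In the sketch below, the HTTP call is moved into a `fetch_json` callable, which is a hypothetical parameter, so the loop can be tried offline with stub data. The sketch also adds an optional pause between requests. It tags each play with the game ID it came from, which is a design choice here and not part of the original function.

```python
import time
import pandas as pd

FEED_URL = "https://statsapi.web.nhl.com/api/v1/game/{}/feed/live"

def collect_plays(game_ids, fetch_json, pause_seconds=0.0):
    # fetch_json(url) must return the parsed JSON dict, e.g. lambda url: requests.get(url).json()
    frames = []
    for game_id in game_ids:
        feed = fetch_json(FEED_URL.format(game_id))
        plays = pd.DataFrame(feed["liveData"]["plays"]["allPlays"])
        plays["game_id"] = game_id  # remember which game each play came from
        frames.append(plays)
        time.sleep(pause_seconds)  # optional pause to go easy on the API
    return pd.concat(frames, ignore_index=True)

# Offline usage: a stub stands in for the real HTTP call.
def fake_fetch(url):
    return {"liveData": {"plays": {"allPlays": [{"result": {"event": "Shot"}}]}}}

demo = collect_plays([2018020001, 2018020002], fake_fetch)
print(len(demo))  # 2
```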

Collect the live game data for the entire 2018-2019 regular season: 1,271 games in total.
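The 1,271 figure follows from the league's size that season. There were 31 teams, each playing 82 games, and every game is shared by two teams:

```python
# 2018-2019 regular season: 31 teams, 82 games each, two teams per game.
teams_in_league = 31
games_per_team = 82
total_games = teams_in_league * games_per_team // 2
print(total_games)  # 1271
```

Against the real data, checking `len(list_of_gameIDs) == total_games` confirms that no games were dropped or duplicated.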

In [132]:
df_play = game_data(list_of_gameIDs)

In [203]:
#Creating variable to remove events that do not have coordinates. 
#These 'events' without coordinates are things like start and end of a period. For this analysis we do not need those data points.
df_play['has_coordinates'] = df_play['coordinates'].apply(lambda x: 1 if all(k in x for k in ("x", "y")) else 0)
df_play = df_play[df_play['has_coordinates']==1].reset_index(drop = True)
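Here is what the filter keeps, shown on a few hypothetical rows. An event with both coordinates survives. An empty `coordinates` dict, as for a period start, is dropped, and so is a partial one. The sketch also shows that a boolean mask can be used directly without the helper column, and the `isinstance` check guards against missing values.

```python
import pandas as pd

# Hypothetical sample rows; only the "coordinates" column matters for the filter.
sample_plays = pd.DataFrame({
    "coordinates": [{"x": 78.0, "y": -19.0}, {}, {"x": 0.0}],
    "event": ["Shot", "Period Start", "Hypothetical partial"],
})

# True only when the coordinates value is a dict containing both "x" and "y".
mask = sample_plays["coordinates"].apply(lambda c: isinstance(c, dict) and "x" in c and "y" in c)
with_coords = sample_plays[mask].reset_index(drop=True)
print(with_coords["event"].tolist())  # ['Shot']
```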


Create variables we want to analyze by extracting data from the dictionaries within our columns.

In [206]:
df_play['date'] = df_play['about'].apply(lambda x: x['dateTime'].split('T')[0])
df_play['event'] = df_play['result'].apply(lambda x: x['event'])
df_play['eventTypeId'] = df_play['result'].apply(lambda x: x['eventTypeId'])
df_play['description'] = df_play['result'].apply(lambda x: x['description'])
df_play['period'] = df_play['about'].apply(lambda x: x['period'])
df_play['periodType'] = df_play['about'].apply(lambda x: x['periodType'])
df_play['periodTimeRemaining'] = df_play['about'].apply(lambda x: x['periodTimeRemaining'])
df_play['xcoord'] = df_play['coordinates'].apply(lambda x: x['x'])
df_play['ycoord'] = df_play['coordinates'].apply(lambda x: x['y'])
df_play['player1_team'] = df_play['team'].apply(lambda x: x['name'])
df_play['player1_name'] = df_play['players'].apply(lambda x: x[0]['player']['fullName'])
df_play['player1_type'] = df_play['players'].apply(lambda x: x[0]['playerType'])
df_play['player2_name'] = df_play['players'].apply(lambda x: x[1]['player']['fullName'] if len(x)>1 else None)
df_play['player2_type'] = df_play['players'].apply(lambda x: x[1]['playerType'] if len(x)>1 else None)
df_play['player3_name'] = df_play['players'].apply(lambda x: x[2]['player']['fullName'] if len(x)>2 else None)
df_play['player3_type'] = df_play['players'].apply(lambda x: x[2]['playerType'] if len(x)>2 else None)
df_play['player4_name'] = df_play['players'].apply(lambda x: x[3]['player']['fullName'] if len(x)>3 else None)
df_play['player4_type'] = df_play['players'].apply(lambda x: x[3]['playerType'] if len(x)>3 else None)
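Most of these fields can also be pulled out in one call with `pandas.json_normalize`. It flattens nested dictionaries into dot-separated columns such as `result.event` and `coordinates.x`, and leaves lists such as `players` untouched. The sketch below uses one hypothetical play record shaped like the feed's `allPlays` entries, with placeholder description and timestamp values. The player columns still need a short loop.

```python
import pandas as pd

# One hypothetical play record with the same nesting as the live feed.
sample_records = [{
    "result": {"event": "Shot", "eventTypeId": "SHOT", "description": "hypothetical description"},
    "about": {"period": 1, "periodType": "REGULAR", "periodTimeRemaining": "19:31",
              "dateTime": "2018-10-03T23:00:00Z"},
    "coordinates": {"x": 78.0, "y": -19.0},
    "team": {"name": "Montréal Canadiens"},
    "players": [
        {"player": {"fullName": "Artturi Lehkonen"}, "playerType": "Shooter"},
        {"player": {"fullName": "Frederik Andersen"}, "playerType": "Goalie"},
    ],
}]

# Nested dicts become columns like "result.event", "about.period", "coordinates.x", "team.name".
flat = pd.json_normalize(sample_records)
flat["date"] = flat["about.dateTime"].str.split("T").str[0]

# Expand up to four players into playerN_name / playerN_type columns, padding with None.
for i in range(4):
    flat[f"player{i + 1}_name"] = flat["players"].apply(
        lambda p: p[i]["player"]["fullName"] if len(p) > i else None)
    flat[f"player{i + 1}_type"] = flat["players"].apply(
        lambda p: p[i]["playerType"] if len(p) > i else None)
```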

Create a clean version of the data frame by dropping the columns that are no longer needed.

In [208]:
df_all_plays_reg_season = df_play.copy()
df_all_plays_reg_season.drop(['about','players','result','team','has_coordinates'], inplace = True, axis = 1)
df_all_plays_reg_season.head(3)

| | coordinates | date | event | eventTypeId | description | period | periodType | periodTimeRemaining | player1_team | player1_name | player1_type | player2_name | player2_type | player3_name | player3_type | player4_name | player4_type | has_coord | xcoord | ycoord |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | {'x': 0.0, 'y': 0.0} | 2018-10-03 | Faceoff | FACEOFF | Max Domi faceoff won against Auston Matthews | 1 | REGULAR | 20:00 | Montréal Canadiens | Max Domi | Winner | Auston Matthews | Loser | | | | | 1 | 0.0 | 0.0 |
| 1 | {'x': 78.0, 'y': -19.0} | 2018-10-03 | Shot | SHOT | Artturi Lehkonen Backhand saved by Frederik An... | 1 | REGULAR | 19:31 | Montréal Canadiens | Artturi Lehkonen | Shooter | Frederik Andersen | Goalie | | | | | 1 | 78.0 | -19.0 |
| 2 | {'x': -37.0, 'y': -10.0} | 2018-10-03 | Shot | SHOT | Morgan Rielly Snap Shot saved by Carey Price | 1 | REGULAR | 19:11 | Toronto Maple Leafs | Morgan Rielly | Shooter | Carey Price | Goalie | | | | | 1 | -37.0 | -10.0 |


In [209]:
len(df_all_plays_reg_season)
# Over 300,000 plays with coordinates were collected from the season to analyze.

333288

In [210]:
#number of plays where Joe Thornton scored a goal.
len(df_all_plays_reg_season[(df_all_plays_reg_season['player1_name']=='Joe Thornton')&
                           (df_all_plays_reg_season['player1_type']=='Scorer')])

16

Just to check, I counted the plays during the season where Joe Thornton was the goal scorer and found 16. That matches his goal total on the Sharks website, so this checks out!
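The same filter extends to every player at once. This sketch counts goals per scorer with `value_counts`, using the column names built above, on a few hypothetical rows. "Player A" is a placeholder.

```python
import pandas as pd

# Hypothetical rows using the player1_name / player1_type columns built above.
sample = pd.DataFrame({
    "player1_name": ["Joe Thornton", "Joe Thornton", "Player A", "Joe Thornton"],
    "player1_type": ["Scorer", "Scorer", "Scorer", "Shooter"],
})

# Keep only rows where player1 is the goal scorer, then count goals per name.
goals_by_player = sample.loc[sample["player1_type"] == "Scorer", "player1_name"].value_counts()
print(goals_by_player["Joe Thornton"])  # 2
```

On the real data, swapping `sample` for `df_all_plays_reg_season` gives a season goal table that can be spot-checked against official totals in the same way.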

In [212]:
df_all_plays_reg_season.to_csv('2018-2019_reg_season_plays_all_nhl.csv')
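By default, `to_csv` writes the index as an unnamed first column. Reading the file back with a plain `read_csv` therefore produces an extra `Unnamed: 0` column. Passing `index_col=0` when reading, or `index=False` when writing, avoids this. The sketch below uses a small hypothetical frame and a temporary file.

```python
import os
import tempfile
import pandas as pd

sample = pd.DataFrame({"event": ["Faceoff", "Shot"], "xcoord": [0.0, 78.0]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "plays.csv")
    sample.to_csv(path)                        # index written as an unnamed first column
    naive = pd.read_csv(path)                  # that column comes back as "Unnamed: 0"
    restored = pd.read_csv(path, index_col=0)  # index restored, no extra column

print(list(naive.columns))     # ['Unnamed: 0', 'event', 'xcoord']
print(list(restored.columns))  # ['event', 'xcoord']
```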

---