In [65]:
import requests
import pandas as pd

pd.options.display.max_columns = None

For more helpful information on navigating the NHL Hockey API, see:

https://gitlab.com/dword4/nhlapi/tree/master/

## Play-by-Play Data

`GET https://statsapi.web.nhl.com/api/v1/game/ID/feed/live` returns all data about a specified game ID, including play data with on-ice coordinates and post-game details such as the first, second and third stars and any details about shootouts. The response is large, often over 30k lines, and is best explored with a JSON viewer.

### Game ID Dictionary
* First 4 digits signify the season start year, ex: `2018` (for the 2018-2019 season)
* Next 2 digits signify the following:
    - 01: Preseason
    - 02: Regular Season
    - 03: Post-Season (Playoffs)
    - 04: All-Star Games
* The final 4 digits signify the game number. Valid range is `0001`-`1271` (until 2020, when the NHL will mandate that there will be 1312 games per season)
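
As a quick illustration of the layout above, here is a minimal sketch that splits a game ID into its three parts and builds one back up. The helper names `parse_game_id` and `build_game_id` are hypothetical and are not part of the NHL API.

```python
from typing import NamedTuple

# Game-type codes taken from the list above.
GAME_TYPES = {
    "01": "Preseason",
    "02": "Regular Season",
    "03": "Post-Season (Playoffs)",
    "04": "All-Star Games",
}


class GameId(NamedTuple):
    season_start_year: int
    game_type: str
    game_number: int


def parse_game_id(game_id: str) -> GameId:
    """Split a 10-digit game ID into its three documented parts."""
    if len(game_id) != 10 or not game_id.isdigit():
        raise ValueError(f"expected a 10-digit game ID, got {game_id!r}")
    type_code = game_id[4:6]
    if type_code not in GAME_TYPES:
        raise ValueError(f"unknown game type code {type_code!r}")
    return GameId(int(game_id[:4]), GAME_TYPES[type_code], int(game_id[6:]))


def build_game_id(season_start_year: int, type_code: str, game_number: int) -> str:
    """Assemble a game ID, zero-padding the game number to 4 digits."""
    return f"{season_start_year}{type_code}{game_number:04d}"


print(parse_game_id("2018020683"))
print(build_game_id(2018, "02", 683))
```

The zero-padding matters: game number 683 must be written as `0683` for the ID to have the expected 10 digits.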

Note: for help visualizing JSON, see http://jsonviewer.stack.hu

## Buffalo Sabres vs Carolina Hurricanes

In [2]:
year = '2018'
season = '02'
game_number = '0683'
game_id = year + season + game_number

url = f'https://statsapi.web.nhl.com/api/v1/game/{game_id}/feed/live'
response = requests.get(url)

In [3]:
url

'https://statsapi.web.nhl.com/api/v1/game/2018020683/feed/live'

In [4]:
json = response.json()
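
The request above assumes the call succeeds. A slightly more defensive sketch is shown below: it sets a timeout and raises on HTTP error codes before parsing. The function name `fetch_game_feed` and the `session` parameter are our own additions for illustration. Storing the result under a name such as `feed` instead of `json` would also avoid clashing with Python's standard `json` module if it is imported later.

```python
def fetch_game_feed(game_id, session=None, timeout=10):
    """Return the parsed live feed for one game, raising on HTTP errors."""
    if session is None:
        # Imported lazily so that a stand-in object can be passed for testing.
        import requests
        session = requests
    url = f"https://statsapi.web.nhl.com/api/v1/game/{game_id}/feed/live"
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return response.json()


# Example call (performs a network request, so it is left commented out):
# feed = fetch_game_feed("2018020683")
```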

We will first explore the data without any specific goal in mind to get familiar with the format that the data is provided in.

In [5]:
plays = json.get('liveData').get('plays').get('allPlays')

In [6]:
faceoffs = [play.get('result') for play in plays if play.get('result').get('event') == 'Faceoff']

In [7]:
events = [play.get('result').get('event') for play in plays]
event_ids = [play.get('about').get('eventId') for play in plays]
periods = [play.get('about').get('period') for play in plays]

unique_events = list(set(events))
unique_event_ids = list(set(event_ids))

In [8]:
event_dict = {}

for event, eid, period in zip(events, event_ids, periods):
    if event in event_dict:
        # key exists, add event id and period to the dict
        event_dict[event]['eventId'].append(eid)
        event_dict[event]['period'].append(period)
    else:
        # key does not exist, create the nested dict with this first entry
        event_dict[event] = {'eventId' : [eid],
                             'period' : [period]}

In [18]:
event_dict = {}

for event, eid in zip(events, event_ids):
    if event in event_dict:
        # key exists, append to list
        event_dict[event].append(eid)
    else:
        # key does not exist, create an empty list
        # (note: the first eid for each event is not stored here)
        event_dict[event] = []

In [19]:
event_dict

{'Blocked Shot': [19,
  24,
  28,
  30,
  45,
  48,
  49,
  312,
  325,
  329,
  331,
  335,
  341,
  349,
  502,
  505],
 'Faceoff': [66,
  71,
  25,
  29,
  84,
  88,
  91,
  34,
  98,
  201,
  211,
  42,
  43,
  47,
  218,
  221,
  222,
  226,
  234,
  314,
  317,
  240,
  243,
  244,
  247,
  328,
  334,
  336,
  337,
  339,
  413,
  415,
  347,
  422,
  503],
 'Game Scheduled': [],
 'Giveaway': [57,
  59,
  63,
  89,
  99,
  206,
  214,
  228,
  229,
  238,
  239,
  250,
  408,
  409,
  419,
  420,
  426],
 'Goal': [303, 320, 321, 324, 346],
 'Hit': [58,
  60,
  61,
  62,
  64,
  67,
  68,
  69,
  72,
  75,
  76,
  79,
  80,
  85,
  92,
  94,
  95,
  202,
  205,
  216,
  223,
  224,
  227,
  230,
  242,
  249,
  402,
  404,
  406,
  410,
  411,
  414,
  417,
  418],
 'Missed Shot': [10,
  11,
  17,
  18,
  21,
  31,
  37,
  38,
  44,
  311,
  315,
  316,
  323,
  330,
  342,
  345,
  348,
  350,
  504],
 'Penalty': [219, 232],
 'Period End': [],
 'Period Official': [],
 'Period Re
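
Notice that events which occur only once, such as `Game Scheduled`, map to an empty list in the output above. The loop creates the list the first time a key is seen but does not store that first ID. A minimal sketch of a grouping that keeps every ID, using `collections.defaultdict`, is shown below. The sample values are a handful of (event, eventId) pairs from this game, used here as a small stand-in for the full lists.

```python
from collections import defaultdict

# Small stand-in for the `events` and `event_ids` lists built earlier.
events = ["Game Scheduled", "Faceoff", "Hit", "Faceoff"]
event_ids = [1, 53, 56, 66]

# A missing key is created as an empty list on first access,
# so every ID is appended, including the first one.
event_dict = defaultdict(list)
for event, eid in zip(events, event_ids):
    event_dict[event].append(eid)

print(dict(event_dict))
# {'Game Scheduled': [1], 'Faceoff': [53, 66], 'Hit': [56]}
```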

In [9]:
[(play.get('about').get('eventIdx'), 
  play.get('about').get('eventId'), 
  play.get('result').get('event')) for play in plays]

[(0, 1, 'Game Scheduled'),
 (1, 4, 'Period Ready'),
 (2, 52, 'Period Start'),
 (3, 53, 'Faceoff'),
 (4, 54, 'Giveaway'),
 (5, 55, 'Takeaway'),
 (6, 56, 'Hit'),
 (7, 57, 'Giveaway'),
 (8, 7, 'Shot'),
 (9, 12, 'Shot'),
 (10, 14, 'Missed Shot'),
 (11, 8, 'Shot'),
 (12, 58, 'Hit'),
 (13, 9, 'Shot'),
 (14, 59, 'Giveaway'),
 (15, 60, 'Hit'),
 (16, 61, 'Hit'),
 (17, 62, 'Hit'),
 (18, 10, 'Missed Shot'),
 (19, 11, 'Missed Shot'),
 (20, 63, 'Giveaway'),
 (21, 64, 'Hit'),
 (22, 65, 'Stoppage'),
 (23, 66, 'Faceoff'),
 (24, 67, 'Hit'),
 (25, 68, 'Hit'),
 (26, 69, 'Hit'),
 (27, 13, 'Shot'),
 (28, 70, 'Stoppage'),
 (29, 71, 'Faceoff'),
 (30, 15, 'Blocked Shot'),
 (31, 16, 'Shot'),
 (32, 17, 'Missed Shot'),
 (33, 72, 'Hit'),
 (34, 18, 'Missed Shot'),
 (35, 21, 'Missed Shot'),
 (36, 19, 'Blocked Shot'),
 (37, 73, 'Takeaway'),
 (38, 74, 'Takeaway'),
 (39, 20, 'Shot'),
 (40, 75, 'Hit'),
 (41, 22, 'Shot'),
 (42, 76, 'Hit'),
 (43, 23, 'Shot'),
 (44, 24, 'Blocked Shot'),
 (45, 77, 'Stoppage'),
 (46, 25, 'F

In [10]:
[ (play.get('players')[0].get('player').get('fullName'),
   play.get('players')[1].get('player').get('fullName') )
 for play in plays
 if play.get('result').get('event') == 'Faceoff' ]

[('Jack Eichel', 'Sebastian Aho'),
 ('Vladimir Sobotka', 'Lucas Wallmark'),
 ('Greg McKegg', 'Evan Rodrigues'),
 ('Lucas Wallmark', 'Kyle Okposo'),
 ('Victor Rask', 'Johan Larsson'),
 ('Evan Rodrigues', 'Victor Rask'),
 ('Jack Eichel', 'Sebastian Aho'),
 ('Sebastian Aho', 'Jack Eichel'),
 ('Lucas Wallmark', 'Vladimir Sobotka'),
 ('Johan Larsson', 'Justin Williams'),
 ('Johan Larsson', 'Victor Rask'),
 ('Vladimir Sobotka', 'Greg McKegg'),
 ('Vladimir Sobotka', 'Greg McKegg'),
 ('Greg McKegg', 'Sam Reinhart'),
 ('Lucas Wallmark', 'Jeff Skinner'),
 ('Victor Rask', 'Jack Eichel'),
 ('Jeff Skinner', 'Lucas Wallmark'),
 ('Johan Larsson', 'Greg McKegg'),
 ('Jack Eichel', 'Sebastian Aho'),
 ('Jack Eichel', 'Sebastian Aho'),
 ('Lucas Wallmark', 'Jeff Skinner'),
 ('Kyle Okposo', 'Sebastian Aho'),
 ('Johan Larsson', 'Victor Rask'),
 ('Jack Eichel', 'Sebastian Aho'),
 ('Vladimir Sobotka', 'Greg McKegg'),
 ('Johan Larsson', 'Lucas Wallmark'),
 ('Lucas Wallmark', 'Johan Larsson'),
 ('Sebastian Aho',

Now that we've become familiar with the data, we are going to extract some information that will be helpful in our analysis.

First, we will start simple and extract the names and codes of the home and visiting teams.

In [21]:
away_team = json.get('gameData').get('teams').get('away').get('name')
home_team = json.get('gameData').get('teams').get('home').get('name')

away_tri_code = json.get('gameData').get('teams').get('away').get('triCode')
home_tri_code = json.get('gameData').get('teams').get('home').get('triCode')

print(f'Home: {home_team} ({home_tri_code})')
print(f'Away: {away_team} ({away_tri_code})')

Home: Carolina Hurricanes (CAR)
Away: Buffalo Sabres (BUF)


Next, we will create a pandas DataFrame of all player information, both home and visitors, for the given game. This will be useful down the line when analyzing each event or play.

In [40]:
players = json.get('gameData').get('players')

In [68]:
player_data = {}

for player, data in players.items():
    for key, val in data.items():
        if key in player_data:
            player_data[key].append(val)
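        else:
            # first time this key is seen, create an empty list
            # (note: the first value for each key is not stored here)
            player_data[key] = []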

It looks like we have captured all the information we needed. However, the `currentTeam` and the `primaryPosition` keys don't contain the data in the format we need, so we will need to clean those two columns a little bit more.

In [69]:
for key, val in player_data.items():
    print(key, val[:5])

id [8477839, 8471436, 8477998, 8478427, 8475753]
fullName ['Conor Sheary', 'Matt Hunwick', 'Warren Foegele', 'Sebastian Aho', 'Justin Faulk']
link ['/api/v1/people/8477839', '/api/v1/people/8471436', '/api/v1/people/8477998', '/api/v1/people/8478427', '/api/v1/people/8475753']
firstName ['Conor', 'Matt', 'Warren', 'Sebastian', 'Justin']
lastName ['Sheary', 'Hunwick', 'Foegele', 'Aho', 'Faulk']
primaryNumber ['43', '48', '13', '20', '27']
birthDate ['1992-06-08', '1985-05-21', '1996-04-01', '1997-07-26', '1992-03-20']
currentAge [26, 33, 22, 21, 26]
birthCity ['Winchester', 'Warren', 'Markham', 'Rauma', 'South St.Paul']
birthStateProvince ['MA', 'MI', 'ON', 'MN', 'NY']
birthCountry ['USA', 'USA', 'CAN', 'FIN', 'USA']
nationality ['USA', 'USA', 'CAN', 'FIN', 'USA']
height ['5\' 8"', '5\' 11"', '6\' 2"', '6\' 0"', '6\' 0"']
weight [176, 194, 198, 176, 217]
active [True, True, True, True, True]
alternateCaptain [False, False, False, False, True]
captain [False, False, False, False, False]


In [70]:
current_team = [x.get('name') for x in player_data['currentTeam']]
current_team_code = [x.get('triCode') for x in player_data['currentTeam']]
primary_position = [x.get('abbreviation') for x in player_data['primaryPosition']]

We will now re-assign the cleaned data back to our player data dictionary, and then use it to construct a pandas DataFrame so that we can explore the data more easily.

In [71]:
player_data['currentTeam'] = current_team
player_data['currentTeamCode'] = current_team_code
player_data['primaryPosition'] = primary_position

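We notice that because some players were not born in the US or Canada, there is no `birthStateProvince` field for them. As a result, that list is shorter than the others and its values no longer line up with the players they belong to. Therefore we will just drop this key from our data.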

In [72]:
for key, val in player_data.items():
    print(key, ' : ', len(val))

id  :  41
fullName  :  41
link  :  41
firstName  :  41
lastName  :  41
primaryNumber  :  41
birthDate  :  41
currentAge  :  41
birthCity  :  41
birthStateProvince  :  29
birthCountry  :  41
nationality  :  41
height  :  41
weight  :  41
active  :  41
alternateCaptain  :  41
captain  :  41
rookie  :  41
shootsCatches  :  41
rosterStatus  :  41
currentTeam  :  41
primaryPosition  :  41
currentTeamCode  :  41


In [73]:
del player_data['birthStateProvince']
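
An alternative to dropping the key is to build one flat row per player and let missing fields become `None`, which keeps every column aligned. The sketch below is an illustration only: the two records are made up in the shape of the entries in `gameData.players`, and `flatten_player` is a hypothetical helper.

```python
# Two made-up records shaped like entries of gameData.players (illustrative only).
players = {
    "ID1": {
        "id": 1,
        "fullName": "Player One",
        "birthStateProvince": "MA",
        "currentTeam": {"id": 7, "name": "Buffalo Sabres", "triCode": "BUF"},
        "primaryPosition": {"code": "L", "abbreviation": "LW"},
    },
    "ID2": {
        "id": 2,
        "fullName": "Player Two",
        # no birthStateProvince for this player
        "currentTeam": {"id": 12, "name": "Carolina Hurricanes", "triCode": "CAR"},
        "primaryPosition": {"code": "C", "abbreviation": "C"},
    },
}


def flatten_player(data):
    """Return one flat row per player; missing keys become None."""
    team = data.get("currentTeam", {})
    position = data.get("primaryPosition", {})
    return {
        "player_id": data.get("id"),
        "full_name": data.get("fullName"),
        "birth_state_province": data.get("birthStateProvince"),
        "current_team": team.get("name"),
        "current_team_code": team.get("triCode"),
        "primary_position": position.get("abbreviation"),
    }


rows = [flatten_player(data) for data in players.values()]
print(rows[1]["birth_state_province"])  # None, rather than a value shifted from another player
```

A list of rows like this can be passed directly to `pd.DataFrame(rows)`, where the missing values show up as `None`/`NaN`.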

Construct a pandas DataFrame

In [74]:
columns = [
    'player_id',
    'full_name',
    'link',
    'first_name',
    'last_name',
    'primary_number',
    'birth_date',
    'current_age',
    'birth_city',
    'birth_country',
    'nationality',
    'height',
    'weight',
    'active',
    'alternate_captain',
    'captain',
    'rookie',
    'shoots_catches',
    'roster_status',
    'current_team',
    'primary_position',
    'current_team_code'
]

player_data = pd.DataFrame(player_data)
player_data.columns = columns

In [75]:
player_data.head()

Unnamed: 0,player_id,full_name,link,first_name,last_name,primary_number,birth_date,current_age,birth_city,birth_country,nationality,height,weight,active,alternate_captain,captain,rookie,shoots_catches,roster_status,current_team,primary_position,current_team_code
0,8477839,Conor Sheary,/api/v1/people/8477839,Conor,Sheary,43,1992-06-08,26,Winchester,USA,USA,"5' 8""",176,True,False,False,False,L,Y,Buffalo Sabres,LW,BUF
1,8471436,Matt Hunwick,/api/v1/people/8471436,Matt,Hunwick,48,1985-05-21,33,Warren,USA,USA,"5' 11""",194,True,False,False,False,L,Y,Buffalo Sabres,D,BUF
2,8477998,Warren Foegele,/api/v1/people/8477998,Warren,Foegele,13,1996-04-01,22,Markham,CAN,CAN,"6' 2""",198,True,False,False,True,L,Y,Carolina Hurricanes,LW,CAR
3,8478427,Sebastian Aho,/api/v1/people/8478427,Sebastian,Aho,20,1997-07-26,21,Rauma,FIN,FIN,"6' 0""",176,True,False,False,False,L,Y,Carolina Hurricanes,C,CAR
4,8475753,Justin Faulk,/api/v1/people/8475753,Justin,Faulk,27,1992-03-20,26,South St.Paul,USA,USA,"6' 0""",217,True,True,False,False,R,Y,Carolina Hurricanes,D,CAR


The data looks good, but we notice that there is no column that explicitly states whether the player is on the home or away team, so we will add that to our data. This will be helpful when joining "players on ice" data from a different source.

In [77]:
home_away_map = {
    home_tri_code : 'Home',
    away_tri_code : 'Away'
}

player_data['home_away'] = player_data['current_team_code'].map(home_away_map)

In [78]:
player_data.head()

Unnamed: 0,player_id,full_name,link,first_name,last_name,primary_number,birth_date,current_age,birth_city,birth_country,nationality,height,weight,active,alternate_captain,captain,rookie,shoots_catches,roster_status,current_team,primary_position,current_team_code,home_away
0,8477839,Conor Sheary,/api/v1/people/8477839,Conor,Sheary,43,1992-06-08,26,Winchester,USA,USA,"5' 8""",176,True,False,False,False,L,Y,Buffalo Sabres,LW,BUF,Away
1,8471436,Matt Hunwick,/api/v1/people/8471436,Matt,Hunwick,48,1985-05-21,33,Warren,USA,USA,"5' 11""",194,True,False,False,False,L,Y,Buffalo Sabres,D,BUF,Away
2,8477998,Warren Foegele,/api/v1/people/8477998,Warren,Foegele,13,1996-04-01,22,Markham,CAN,CAN,"6' 2""",198,True,False,False,True,L,Y,Carolina Hurricanes,LW,CAR,Home
3,8478427,Sebastian Aho,/api/v1/people/8478427,Sebastian,Aho,20,1997-07-26,21,Rauma,FIN,FIN,"6' 0""",176,True,False,False,False,L,Y,Carolina Hurricanes,C,CAR,Home
4,8475753,Justin Faulk,/api/v1/people/8475753,Justin,Faulk,27,1992-03-20,26,South St.Paul,USA,USA,"6' 0""",217,True,True,False,False,R,Y,Carolina Hurricanes,D,CAR,Home


Now that we have a table of all the player information, let's further explore the live game data.

In [95]:
plays = json.get('liveData').get('plays').get('allPlays')

In [108]:
play = plays[0]

In [109]:
play

{'result': {'event': 'Game Scheduled',
  'eventCode': 'CAR1',
  'eventTypeId': 'GAME_SCHEDULED',
  'description': 'Game Scheduled'},
 'about': {'eventIdx': 0,
  'eventId': 1,
  'period': 1,
  'periodType': 'REGULAR',
  'ordinalNum': '1st',
  'periodTime': '00:00',
  'periodTimeRemaining': '20:00',
  'dateTime': '2019-01-11T23:43:12Z',
  'goals': {'away': 0, 'home': 0}},
 'coordinates': {}}

In [118]:
faceoffs = {
    'event_id' : [],
    'period' : [],
    'period_time' : [],
    'winning_player_id' : [],
    'winning_player_name' : [],
    'losing_player_id' : [],
    'losing_player_name' : [],
}

for play in plays:
    if play.get('result').get('event') != 'Faceoff':
        continue
        
    faceoffs['event_id'].append(play.get('about').get('eventId'))
    faceoffs['period'].append(play.get('about').get('period'))
    faceoffs['period_time'].append(play.get('about').get('periodTime'))
    
    winning_player = play.get('players')[0]
    losing_player = play.get('players')[1]
    
    faceoffs['winning_player_id'].append(winning_player.get('player').get('id'))
    faceoffs['winning_player_name'].append(winning_player.get('player').get('fullName'))
    
    faceoffs['losing_player_id'].append(losing_player.get('player').get('id'))
    faceoffs['losing_player_name'].append(losing_player.get('player').get('fullName'))
    
faceoffs = pd.DataFrame(faceoffs)

In [119]:
faceoffs.head()

Unnamed: 0,event_id,period,period_time,winning_player_id,winning_player_name,losing_player_id,losing_player_name
0,53,1,00:00,8478403,Jack Eichel,8478427,Sebastian Aho
1,66,1,04:15,8471743,Vladimir Sobotka,8478027,Lucas Wallmark
2,71,1,04:56,8475735,Greg McKegg,8478542,Evan Rodrigues
3,25,1,08:04,8478027,Lucas Wallmark,8473449,Kyle Okposo
4,29,1,09:09,8476437,Victor Rask,8475728,Johan Larsson
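
The loop above relies on the winner being listed first in each faceoff's `players` list. As a hedged alternative, the sketch below reads the role from each entry instead. It assumes that every entry carries a `playerType` field with the values `'Winner'` and `'Loser'` for faceoffs, which should be checked against the actual feed. The sample play is hand-written, and `faceoff_roles` is a hypothetical helper.

```python
def faceoff_roles(play):
    """Return (winner_name, loser_name) using each entry's playerType."""
    by_role = {
        entry.get("playerType"): entry.get("player", {}).get("fullName")
        for entry in play.get("players", [])
    }
    return by_role.get("Winner"), by_role.get("Loser")


# Hand-written play in the assumed shape, with the loser listed first on purpose.
sample_play = {
    "result": {"event": "Faceoff"},
    "players": [
        {"player": {"id": 8478427, "fullName": "Sebastian Aho"}, "playerType": "Loser"},
        {"player": {"id": 8478403, "fullName": "Jack Eichel"}, "playerType": "Winner"},
    ],
}

print(faceoff_roles(sample_play))  # ('Jack Eichel', 'Sebastian Aho')
```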


Now we have a clean DataFrame with all of the data that we are interested in for the purpose of this analysis, namely faceoffs. Let's dig into it a bit and see if we can visualize any trends.

Let's start off by seeing how many faceoffs occurred during this game.

In [122]:
faceoffs.shape

(62, 7)

Of those 62 faceoffs, who was the most common winner, and who was the most common loser?

In [121]:
faceoffs['winning_player_name'].value_counts()

Sebastian Aho       13
Vladimir Sobotka    10
Jack Eichel          8
Johan Larsson        8
Lucas Wallmark       7
Victor Rask          6
Greg McKegg          3
Justin Williams      2
Kyle Okposo          1
Evan Rodrigues       1
Jeff Skinner         1
Micheal Ferland      1
Conor Sheary         1
Name: winning_player_name, dtype: int64

In [123]:
faceoffs['losing_player_name'].value_counts()

Jack Eichel         10
Greg McKegg          8
Sebastian Aho        7
Lucas Wallmark       6
Evan Rodrigues       5
Johan Larsson        5
Jeff Skinner         4
Vladimir Sobotka     4
Justin Williams      4
Victor Rask          3
Kyle Okposo          2
Sam Reinhart         2
Micheal Ferland      1
Brock McGinn         1
Name: losing_player_name, dtype: int64
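
Raw counts favor players who simply take more faceoffs, so a win percentage is often easier to compare. The sketch below combines a few of the counts from the two outputs above. The helper `win_pct` is our own illustration, not something defined earlier in the notebook.

```python
from collections import Counter

# A few counts copied from the two value_counts() outputs above.
wins = Counter({"Sebastian Aho": 13, "Vladimir Sobotka": 10, "Jack Eichel": 8})
losses = Counter({"Sebastian Aho": 7, "Vladimir Sobotka": 4, "Jack Eichel": 10})


def win_pct(wins, losses, name):
    """Share of faceoffs won by `name`; a Counter returns 0 for missing names."""
    taken = wins[name] + losses[name]
    return wins[name] / taken if taken else 0.0


for name in wins:
    taken = wins[name] + losses[name]
    print(f"{name}: {wins[name]}/{taken} = {win_pct(wins, losses, name):.1%}")
# Sebastian Aho: 13/20 = 65.0%
# Vladimir Sobotka: 10/14 = 71.4%
# Jack Eichel: 8/18 = 44.4%
```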

Now, if we want to analyze faceoff outcomes by team, we'll need to attach team information to each player. Luckily we have the player ID, so we can use that as the join key. Although we could use `pd.merge()` or `df.merge()`, we will instead use a `dict` with a `map` to simplify the process.
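
The lookup idea can be sketched in plain Python first. The rows below are stand-ins for a few rows of `player_data` and `faceoffs`, and the names `team_by_player` and `winner_teams` are hypothetical.

```python
# Stand-ins for a few rows of `player_data` and `faceoffs` shown earlier.
player_rows = [
    {"player_id": 8478427, "current_team_code": "CAR"},  # Sebastian Aho
    {"player_id": 8477839, "current_team_code": "BUF"},  # Conor Sheary
]
faceoff_winner_ids = [8478427, 8477839, 8478427]

# Build the lookup once, then apply it to every ID.
# .get() returns None for an ID that is missing from the lookup.
team_by_player = {row["player_id"]: row["current_team_code"] for row in player_rows}
winner_teams = [team_by_player.get(pid) for pid in faceoff_winner_ids]

print(winner_teams)  # ['CAR', 'BUF', 'CAR']
```

With pandas, the same dictionary can be passed to `Series.map`, for example `faceoffs['winning_player_id'].map(team_by_player)`. IDs that are missing from the dictionary come back as `NaN`, which makes unmatched players easy to spot.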