<div id="container" style="position:relative;">
<div style="float:left"><h1> Predicting NHL Goal Scoring - API Debugging </h1></div>

***BrainStation Data Science Capstone Project*** <br/>
***Author:***  &ensp;    **Taylor Gallivan** <br/>
***Date:*** &ensp; **Sep-Nov 2023** 

### Introduction

This workbook has been prepared to document the process of writing the scripts used to retrieve my desired data from the NHL's public API.  

Each player on the NHL site is assigned a 7-digit ID number, which is needed to access their statistics through API requests. I retrieved an up-to-date list of player ID numbers from https://hockey-statistics.com/. From there, I created a list of all active and inactive players who played in the NHL in the last ~3 decades.

My API calls pull information from two endpoints: `people` and `stats`. The `people` endpoint contains player bio information, such as first name, last name, weight, height, date of birth, and position played. The `stats` endpoint, queried with the `stats=yearByYear` parameter, returns season-by-season statistics for every season a player played, including seasons in development leagues as well as in the NHL. Statistics include features such as games played, team played for, goals/assists/points scored, time on ice, shots, and so on.
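
To make the two endpoints concrete, here is a minimal sketch of how their URLs are built and where the useful data sits in each JSON response. The base URL is the one used throughout this notebook; the helper names (`people_url`, `stats_url`, `unpack_people`, `unpack_splits`) are hypothetical, and the payloads are trimmed-down stand-ins shaped like the Rob Blake responses shown later, not live API output.

```python
BASE_URL = "https://statsapi.web.nhl.com/api/v1/people/"


def people_url(player_id: int) -> str:
    """URL of the `people` endpoint (player bio) for one player ID."""
    return f"{BASE_URL}{player_id}"


def stats_url(player_id: int) -> str:
    """URL of the `stats` endpoint, asking for season-by-season splits."""
    return f"{BASE_URL}{player_id}/stats/?stats=yearByYear"


def unpack_people(payload: dict) -> dict:
    """The `people` response wraps a one-element list of player records."""
    return payload["people"][0]


def unpack_splits(payload: dict) -> list:
    """The `stats` response holds one entry per requested stat type; its
    'splits' list has one element per season/team/league combination."""
    return payload["stats"][0]["splits"]


# Hypothetical trimmed-down payloads shaped like the responses in this notebook
people_payload = {"people": [{"id": 8445550, "firstName": "Rob", "lastName": "Blake"}]}
stats_payload = {
    "stats": [{
        "type": {"displayName": "yearByYear"},
        "splits": [
            {"season": "19851986", "league": {"name": "OHA-B"}, "stat": {"games": 39}},
            {"season": "19891990", "league": {"name": "National Hockey League"}, "stat": {"games": 4}},
        ],
    }]
}

print(stats_url(8445550))
print(unpack_people(people_payload)["firstName"])
print([s["league"]["name"] for s in unpack_splits(stats_payload)])
```

The split for the OHA-B season is why the NHL-league filter appears in every `stats` cell below.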

The working code that I deployed first calls the `people` endpoint to retrieve player bio information, then calls the `stats` endpoint to retrieve player statistics, which are immediately converted into a pandas DataFrame. Next, specific attributes of interest from the `people` endpoint are added to the statistics DataFrame. A for loop iterates over the entire process so that I can collect data for all of the player IDs of interest. Now that the data-collection architecture is in place, I could go farther back into the 1980s if I wanted to increase the size of my training dataset. I will make that decision once the dataset has been partially cleaned.
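
The step of adding bio attributes to the statistics DataFrame amounts to broadcasting a handful of scalar values across every season row. A minimal sketch using `DataFrame.assign`, assuming the bio has already been flattened into a plain dict; the `bio` dict and the three-row `stats` frame are small hypothetical samples built from the Rob Blake output later in this notebook.

```python
import pandas as pd

# Hypothetical sample: three NHL season rows for one player
stats = pd.DataFrame({
    "season": ["19891990", "19901991", "19911992"],
    "stat_games": [4, 75, 57],
    "stat_goals": [0, 12, 7],
})

# Hypothetical sample: bio fields already pulled from the `people` response
bio = {"player_id": 8445550, "first_name": "Rob", "last_name": "Blake", "position_code": "D"}

# assign() broadcasts each scalar to every row and returns a new DataFrame,
# leaving `stats` itself unchanged
player_seasons = stats.assign(**bio)
print(player_seasons)
```

Because `assign` returns a new frame, it can sidestep the chained-assignment warnings that sometimes appear when columns are added one at a time to a filtered DataFrame.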

Below, I document how I familiarized myself with the API and tested my code piece by piece, building up to the final code block that is actively deployed for data collection. Where I encountered bugs, I have documented how I resolved them.

I would like to give credit to Nick Paul's personal blog (https://www.nickpaul.info/), as his clear documentation of his personal experience accessing the NHL API was an immense help to my efforts. 

#### Data Dictionary

Below is a partial data dictionary for my project, currently covering only the player bio information. The statistics returned by my API requests include some features that I plan to omit and some that I am still undecided on. The statistical features will be added to the data dictionary once that list is refined.

- `player_id`:  a player's 7-digit ID number, as assigned by the NHL (int)
- `first_name`:  player's first name (str)
- `last_name`:  player's last name (str)
- `position_code`:  single-letter code of a player's position: L, C, R, or D (str)
- `position_name`:  full name of a player's position:  Left Wing, Center, Right Wing, Defenseman (str)
- `position_type`:  the category of a player's position:  either Forward or Defenseman (str)
- `weight`:  player's weight in lbs (int)
- `height`:  player's height in feet and inches, e.g. 6' 4" (str)
- `shot_dir`:  which direction a player shoots: left (L) or right (R) (str)
- `birth_date`:  player's date of birth, converted with `pd.to_datetime` (datetime)
- `birth_country`:  country a player was born in (str)
- `nationality`:  country a player represents in international competition (str)
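
One way to keep this data dictionary and the extraction code in sync is to write the dictionary as a mapping from the API's field names to the column names above. A minimal sketch, assuming the nested field paths seen in the `people` responses in this notebook (for example `primaryPosition.code`); `BIO_FIELDS` and `extract_bio` are hypothetical names, and the sample record is built from the Rob Blake output below.

```python
from typing import Any

# API field path (dot-separated for nested keys) -> data dictionary column name
BIO_FIELDS = {
    "id": "player_id",
    "firstName": "first_name",
    "lastName": "last_name",
    "primaryPosition.code": "position_code",
    "primaryPosition.name": "position_name",
    "primaryPosition.type": "position_type",
    "weight": "weight",
    "height": "height",
    "shootsCatches": "shot_dir",
    "birthDate": "birth_date",
    "birthCountry": "birth_country",
    "nationality": "nationality",
}


def extract_bio(person: dict) -> dict:
    """Pull the data-dictionary fields out of one `people` record.
    Missing fields come back as None instead of raising KeyError."""
    bio = {}
    for path, column in BIO_FIELDS.items():
        value: Any = person
        for key in path.split("."):
            # walk one level deeper, or give up if this level is not a dict
            value = value.get(key) if isinstance(value, dict) else None
        bio[column] = value
    return bio


# Sample record shaped like the `people` response for Rob Blake
rob_blake = {
    "id": 8445550, "firstName": "Rob", "lastName": "Blake",
    "birthDate": "1969-12-10", "birthCountry": "CAN", "nationality": "CAN",
    "height": "6' 4\"", "weight": 220, "shootsCatches": "R",
    "primaryPosition": {"code": "D", "name": "Defenseman", "type": "Defenseman", "abbreviation": "D"},
}
print(extract_bio(rob_blake))
```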


In [27]:
# import key packages
import requests
import json
import pandas as pd
from datetime import datetime
import numpy as np

### Exploring NHL API Calls

The code below documents how I became familiar with accessing the NHL API, retrieving the required data, and transforming it into a DataFrame.
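
Much of the code below relies on `pd.json_normalize` flattening nested JSON into columns. By default, nested keys are joined with a dot, which is why the bio columns are named like `primaryPosition.code`; passing `sep="_"` joins them with an underscore instead, which is where columns such as `stat_goals` and `league_name` in the `stats` DataFrames come from. A small illustration on hand-built sample records:

```python
import pandas as pd

# Sample records shaped like one `people` record and one `stats` split
record = {
    "id": 8445550,
    "primaryPosition": {"code": "D", "name": "Defenseman"},
}
split = {
    "season": "19891990",
    "stat": {"goals": 0, "games": 4},
    "league": {"name": "National Hockey League"},
}

bio_df = pd.json_normalize(record)              # default separator "."
stats_df = pd.json_normalize([split], sep="_")  # nested keys joined with "_"

print(list(bio_df.columns))    # ['id', 'primaryPosition.code', 'primaryPosition.name']
print(list(stats_df.columns))  # ['season', 'stat_goals', 'stat_games', 'league_name']
```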

In [28]:
# requesting data from 'people' endpoint
# flatten the player record into a one-row DataFrame and display it
url = 'https://statsapi.web.nhl.com/api/v1/people/8445550'
response = requests.get(url)
suggestions = json.loads(response.content)['people']
bio = suggestions[0]
series = (pd.json_normalize(bio))
series

| | id | fullName | link | firstName | lastName | primaryNumber | birthDate | birthCity | birthStateProvince | birthCountry | ... | height | weight | active | rookie | shootsCatches | rosterStatus | primaryPosition.code | primaryPosition.name | primaryPosition.type | primaryPosition.abbreviation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8445550 | Rob Blake | /api/v1/people/8445550 | Rob | Blake | 4 | 1969-12-10 | Simcoe | ON | CAN | ... | 6' 4" | 220 | False | False | R | Y | D | Defenseman | Defenseman | D |


In [29]:
# same as above but for another player
url = 'https://statsapi.web.nhl.com/api/v1/people/8471214'
response = requests.get(url)
suggestions = json.loads(response.content)['people']
bio = suggestions[0]
series = (pd.json_normalize(bio))
series

| | id | fullName | link | firstName | lastName | primaryNumber | birthDate | currentAge | birthCity | birthCountry | ... | rookie | shootsCatches | rosterStatus | currentTeam.id | currentTeam.name | currentTeam.link | primaryPosition.code | primaryPosition.name | primaryPosition.type | primaryPosition.abbreviation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8471214 | Alex Ovechkin | /api/v1/people/8471214 | Alex | Ovechkin | 8 | 1985-09-17 | 38 | Moscow | RUS | ... | False | R | Y | 15 | Washington Capitals | /api/v1/teams/15 | L | Left Wing | Forward | LW |


In [30]:
# examining the data types of the features in the 'people' endpoint
url = 'https://statsapi.web.nhl.com/api/v1/people/8445550'
response = requests.get(url)
suggestions = json.loads(response.content)['people']
bio = suggestions[0]
series = (pd.json_normalize(bio))
series.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 21 columns):
 #   Column                        Non-Null Count  Dtype 
---  ------                        --------------  ----- 
 0   id                            1 non-null      int64 
 1   fullName                      1 non-null      object
 2   link                          1 non-null      object
 3   firstName                     1 non-null      object
 4   lastName                      1 non-null      object
 5   primaryNumber                 1 non-null      object
 6   birthDate                     1 non-null      object
 7   birthCity                     1 non-null      object
 8   birthStateProvince            1 non-null      object
 9   birthCountry                  1 non-null      object
 10  nationality                   1 non-null      object
 11  height                        1 non-null      object
 12  weight                        1 non-null      int64 
 13  active                  

In [31]:
# working out how to pass a value from the 'people' data series
url = 'https://statsapi.web.nhl.com/api/v1/people/8445550'
response = requests.get(url)
suggestions = json.loads(response.content)['people']
#bio = suggestions[0]
series = (pd.json_normalize(suggestions))
#player_id = series['id']
#print(player_id[0])
#first_name = player['firstName']
#first_name

#player_series = series.loc[:, ['id', 'firstName', 'lastName', 'birthDate', 'height', 'weight', 'primaryPosition.code']]
print(series['firstName'][0])

Rob


In [32]:
# testing output of year by year stats
response = requests.get('https://statsapi.web.nhl.com/api/v1/people/8445550/stats/?stats=yearByYear')
suggestions = json.loads(response.content)
suggestions

{'copyright': 'NHL and the NHL Shield are registered trademarks of the National Hockey League. NHL and NHL team marks are the property of the NHL and its teams. © NHL 2023. All Rights Reserved.',
 'stats': [{'type': {'displayName': 'yearByYear', 'gameType': None},
   'splits': [{'season': '19851986',
     'stat': {'timeOnIce': '00:00',
      'assists': 13,
      'goals': 3,
      'pim': 43,
      'games': 39,
      'penaltyMinutes': '43',
      'points': 16},
     'team': {'name': 'Brantford', 'link': '/api/v1/teams/null'},
     'league': {'name': 'OHA-B', 'link': '/api/v1/league/null'},
     'sequenceNumber': 1},
    {'season': '19861987',
     'stat': {'assists': 20,
      'goals': 11,
      'pim': 115,
      'games': 31,
      'penaltyMinutes': '115',
      'points': 31},
     'team': {'name': 'Stratford', 'link': '/api/v1/teams/null'},
     'league': {'name': 'OHA-B', 'link': '/api/v1/league/null'},
     'sequenceNumber': 1},
    {'season': '19871988',
     'stat': {'assists': 8,
 

In [33]:
# transforming json output into a pandas dataframe
url = 'https://statsapi.web.nhl.com/api/v1/people/8445550/stats/?stats=yearByYear'
response = requests.get(url)
content = json.loads(response.content)['stats']
splits = content[0]['splits']

# the NHL API also stores player stats from development leagues; keep only NHL seasons
df_splits = (pd.json_normalize(splits, sep = "_" )
             .query('league_name == "National Hockey League"')  
            )
df_splits

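| | season | sequenceNumber | stat_timeOnIce | stat_assists | stat_goals | stat_pim | stat_games | stat_penaltyMinutes | stat_points | team_name | ... | stat_shortHandedGoals | stat_shortHandedPoints | stat_plusMinus | stat_hits | stat_powerPlayTimeOnIce | stat_evenTimeOnIce | stat_faceOffPct | stat_shortHandedTimeOnIce | stat_blocked | stat_shifts |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 19891990 | 1 | NaN | 0 | 0 | 4 | 4 | 4 | 0 | Los Angeles Kings | ... | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 6 | 19901991 | 1 | NaN | 34 | 12 | 125 | 75 | 125 | 46 | Los Angeles Kings | ... | 0.0 | 0.0 | 3.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 8 | 19911992 | 1 | NaN | 13 | 7 | 102 | 57 | 102 | 20 | Los Angeles Kings | ... | 0.0 | 0.0 | -5.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 9 | 19921993 | 1 | NaN | 43 | 16 | 152 | 76 | 152 | 59 | Los Angeles Kings | ... | 0.0 | 2.0 | 18.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 10 | 19931994 | 1 | NaN | 48 | 20 | 137 | 84 | 137 | 68 | Los Angeles Kings | ... | 0.0 | 3.0 | -7.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 12 | 19941995 | 1 | NaN | 7 | 4 | 38 | 24 | 38 | 11 | Los Angeles Kings | ... | 0.0 | 0.0 | -16.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 13 | 19951996 | 1 | NaN | 2 | 1 | 8 | 6 | 8 | 3 | Los Angeles Kings | ... | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 15 | 19961997 | 1 | NaN | 23 | 8 | 82 | 62 | 82 | 31 | Los Angeles Kings | ... | 0.0 | 0.0 | -28.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 18 | 19971998 | 1 | 2141:39 | 27 | 23 | 94 | 81 | 94 | 50 | Los Angeles Kings | ... | 0.0 | 1.0 | -3.0 | 183.0 | 389:53 | 1426:44 | 0.0 | 325:02 | 0.0 | 2353.0 |
| 21 | 19981999 | 1 | 1541:13 | 23 | 12 | 128 | 62 | 128 | 35 | Los Angeles Kings | ... | 1.0 | 2.0 | -7.0 | 132.0 | 252:19 | 1078:25 | 0.0 | 210:29 | 0.0 | 1667.0 |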


In [34]:
# view the columns outputted by 'stats' endpoint
url = 'https://statsapi.web.nhl.com/api/v1/people/8445550/stats/?stats=yearByYear'
response = requests.get(url)
content = json.loads(response.content)['stats']
splits = content[0]['splits']

df_splits = (pd.json_normalize(splits, sep = "_" )
             .query('league_name == "National Hockey League"')
            )

df_splits.columns

Index(['season', 'sequenceNumber', 'stat_timeOnIce', 'stat_assists',
       'stat_goals', 'stat_pim', 'stat_games', 'stat_penaltyMinutes',
       'stat_points', 'team_name', 'team_link', 'league_name', 'league_link',
       'team_id', 'league_id', 'stat_shots', 'stat_powerPlayGoals',
       'stat_powerPlayPoints', 'stat_shotPct', 'stat_gameWinningGoals',
       'stat_overTimeGoals', 'stat_shortHandedGoals', 'stat_shortHandedPoints',
       'stat_plusMinus', 'stat_hits', 'stat_powerPlayTimeOnIce',
       'stat_evenTimeOnIce', 'stat_faceOffPct', 'stat_shortHandedTimeOnIce',
       'stat_blocked', 'stat_shifts'],
      dtype='object')

In [35]:
# check the data types in 'stats' endpoint
url = 'https://statsapi.web.nhl.com/api/v1/people/8445550/stats/?stats=yearByYear'
response = requests.get(url)
content = json.loads(response.content)['stats']
splits = content[0]['splits']

df_splits = (pd.json_normalize(splits, sep = "_" )
             .query('league_name == "National Hockey League"')
            )

df_splits.info()

<class 'pandas.core.frame.DataFrame'>
Index: 21 entries, 4 to 35
Data columns (total 31 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   season                     21 non-null     object 
 1   sequenceNumber             21 non-null     int64  
 2   stat_timeOnIce             13 non-null     object 
 3   stat_assists               21 non-null     int64  
 4   stat_goals                 21 non-null     int64  
 5   stat_pim                   21 non-null     int64  
 6   stat_games                 21 non-null     int64  
 7   stat_penaltyMinutes        21 non-null     object 
 8   stat_points                21 non-null     int64  
 9   team_name                  21 non-null     object 
 10  team_link                  21 non-null     object 
 11  league_name                21 non-null     object 
 12  league_link                21 non-null     object 
 13  team_id                    21 non-null     float64
 14  l

In [36]:
# testing how to pass player bio info into player stats DataFrame
url = 'https://statsapi.web.nhl.com/api/v1/people/8445550'
response = requests.get(url)
suggestions = json.loads(response.content)['people']
player = (pd.json_normalize(suggestions))

url = 'https://statsapi.web.nhl.com/api/v1/people/8445550/stats/?stats=yearByYear'
response = requests.get(url)
content = json.loads(response.content)['stats']
splits = content[0]['splits']

df_splits = (pd.json_normalize(splits, sep = "_" )
             .query('league_name == "National Hockey League"')
            )

df_splits['player_id'] = player['id'][0]
df_splits['first_name'] = player['firstName'][0]
df_splits['last_name'] = player['lastName'][0]
df_splits['bday'] = pd.to_datetime(player['birthDate'][0])
# Extract the season end year from column 'season' (e.g. '19891990' -> '1990')
df_splits['season_end'] = [x[4:8] for x in df_splits['season']]
# Extract the season start year from column 'season' (e.g. '19891990' -> '1989')
df_splits['season_start_yr'] = [x[0:4] for x in df_splits['season']]
# Set the season start date, approximated as September 30 of the start year
df_splits['season_start_dt'] =  [datetime.strptime(x + '0930', "%Y%m%d") for x in df_splits['season_start_yr']] 
# Calculates the player's age at the start of a given season
df_splits['age'] = (np.floor((df_splits['season_start_dt'] - df_splits['bday'])/ np.timedelta64(1,'Y') ))
# Convert age to int datatype
df_splits['age'] = df_splits['age'].astype(int)

df_splits

| | season | sequenceNumber | stat_timeOnIce | stat_assists | stat_goals | stat_pim | stat_games | stat_penaltyMinutes | stat_points | team_name | ... | stat_blocked | stat_shifts | player_id | first_name | last_name | bday | season_end | season_start_yr | season_start_dt | age |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 19891990 | 1 | NaN | 0 | 0 | 4 | 4 | 4 | 0 | Los Angeles Kings | ... | NaN | NaN | 8445550 | Rob | Blake | 1969-12-10 | 1990 | 1989 | 1989-09-30 | 19 |
| 6 | 19901991 | 1 | NaN | 34 | 12 | 125 | 75 | 125 | 46 | Los Angeles Kings | ... | NaN | NaN | 8445550 | Rob | Blake | 1969-12-10 | 1991 | 1990 | 1990-09-30 | 20 |
| 8 | 19911992 | 1 | NaN | 13 | 7 | 102 | 57 | 102 | 20 | Los Angeles Kings | ... | NaN | NaN | 8445550 | Rob | Blake | 1969-12-10 | 1992 | 1991 | 1991-09-30 | 21 |
| 9 | 19921993 | 1 | NaN | 43 | 16 | 152 | 76 | 152 | 59 | Los Angeles Kings | ... | NaN | NaN | 8445550 | Rob | Blake | 1969-12-10 | 1993 | 1992 | 1992-09-30 | 22 |
| 10 | 19931994 | 1 | NaN | 48 | 20 | 137 | 84 | 137 | 68 | Los Angeles Kings | ... | NaN | NaN | 8445550 | Rob | Blake | 1969-12-10 | 1994 | 1993 | 1993-09-30 | 23 |
| 12 | 19941995 | 1 | NaN | 7 | 4 | 38 | 24 | 38 | 11 | Los Angeles Kings | ... | NaN | NaN | 8445550 | Rob | Blake | 1969-12-10 | 1995 | 1994 | 1994-09-30 | 24 |
| 13 | 19951996 | 1 | NaN | 2 | 1 | 8 | 6 | 8 | 3 | Los Angeles Kings | ... | NaN | NaN | 8445550 | Rob | Blake | 1969-12-10 | 1996 | 1995 | 1995-09-30 | 25 |
| 15 | 19961997 | 1 | NaN | 23 | 8 | 82 | 62 | 82 | 31 | Los Angeles Kings | ... | NaN | NaN | 8445550 | Rob | Blake | 1969-12-10 | 1997 | 1996 | 1996-09-30 | 26 |
| 18 | 19971998 | 1 | 2141:39 | 27 | 23 | 94 | 81 | 94 | 50 | Los Angeles Kings | ... | 0.0 | 2353.0 | 8445550 | Rob | Blake | 1969-12-10 | 1998 | 1997 | 1997-09-30 | 27 |
| 21 | 19981999 | 1 | 1541:13 | 23 | 12 | 128 | 62 | 128 | 35 | Los Angeles Kings | ... | 0.0 | 1667.0 | 8445550 | Rob | Blake | 1969-12-10 | 1999 | 1998 | 1998-09-30 | 28 |

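The age calculation above divides a timedelta by a nominal year (`np.timedelta64(1,'Y')`), which pandas treats as a fixed, average-length year, so in principle it can be off by one for birthdays falling right around the season-start date. An alternative sketch that compares calendar dates directly, assuming the same September 30 season-start convention used in the cell above; `parse_season` and `age_at_season_start` are hypothetical helper names.

```python
from datetime import date


def parse_season(season: str) -> tuple:
    """Split an API season string like '19891990' into (start_year, end_year)."""
    return int(season[:4]), int(season[4:8])


def age_at_season_start(birth_date: date, season: str, month: int = 9, day: int = 30) -> int:
    """Whole years of age on the assumed season-start date (Sept 30 by default)."""
    start_year, _ = parse_season(season)
    start = date(start_year, month, day)
    # Subtract one if the birthday has not happened yet by the start date
    had_birthday = (start.month, start.day) >= (birth_date.month, birth_date.day)
    return start.year - birth_date.year - (0 if had_birthday else 1)


print(age_at_season_start(date(1969, 12, 10), "19891990"))  # 19 (Rob Blake)
print(age_at_season_start(date(1985, 9, 17), "20052006"))   # 20 (Alex Ovechkin)
```

Since a pandas `Timestamp` also has `.month`, `.day`, and `.year`, the parsed `birthDate` could be passed in directly, e.g. `[age_at_season_start(bday, s) for s in df_splits['season']]`.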

In [37]:
# increasing DataFrame complexity
url = 'https://statsapi.web.nhl.com/api/v1/people/8445550'
response = requests.get(url)
suggestions = json.loads(response.content)['people']
player = (pd.json_normalize(suggestions))

url = 'https://statsapi.web.nhl.com/api/v1/people/8445550/stats/?stats=yearByYear'
response = requests.get(url)
content = json.loads(response.content)['stats']
splits = content[0]['splits']

df_splits = (pd.json_normalize(splits, sep = "_" )
             .query('league_name == "National Hockey League"')
            )

df_splits['player_id'] = player['id'][0]
df_splits['first_name'] = player['firstName'][0]
df_splits['last_name'] = player['lastName'][0]
df_splits['position_code'] = player['primaryPosition.code'][0]

df_splits['season_start_yr'] = [x[0:4] for x in df_splits['season']]
df_splits['season_start_dt'] =  [datetime.strptime(x + '0930', "%Y%m%d") for x in df_splits['season_start_yr']] 
df_splits['season_end'] = [x[4:8] for x in df_splits['season']]

df_splits['weight'] = player['weight'][0]
df_splits['height'] = player['height'][0]
df_splits['shot_dir'] = player['shootsCatches'][0]
df_splits['birth_date'] = pd.to_datetime(player['birthDate'][0])
df_splits['age'] = (np.floor((df_splits['season_start_dt'] - df_splits['birth_date'])/ np.timedelta64(1,'Y') ))
df_splits['age'] = df_splits['age'].astype(int)
df_splits['position_name'] = player['primaryPosition.name'][0]
df_splits['position_type'] = player['primaryPosition.type'][0]
df_splits['birth_country'] = player['birthCountry'][0]
df_splits['nationality'] = player['nationality'][0]


df_splits

| | season | sequenceNumber | stat_timeOnIce | stat_assists | stat_goals | stat_pim | stat_games | stat_penaltyMinutes | stat_points | team_name | ... | season_end | weight | height | shot_dir | birth_date | age | position_name | position_type | birth_country | nationality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 19891990 | 1 | NaN | 0 | 0 | 4 | 4 | 4 | 0 | Los Angeles Kings | ... | 1990 | 220 | 6' 4" | R | 1969-12-10 | 19 | Defenseman | Defenseman | CAN | CAN |
| 6 | 19901991 | 1 | NaN | 34 | 12 | 125 | 75 | 125 | 46 | Los Angeles Kings | ... | 1991 | 220 | 6' 4" | R | 1969-12-10 | 20 | Defenseman | Defenseman | CAN | CAN |
| 8 | 19911992 | 1 | NaN | 13 | 7 | 102 | 57 | 102 | 20 | Los Angeles Kings | ... | 1992 | 220 | 6' 4" | R | 1969-12-10 | 21 | Defenseman | Defenseman | CAN | CAN |
| 9 | 19921993 | 1 | NaN | 43 | 16 | 152 | 76 | 152 | 59 | Los Angeles Kings | ... | 1993 | 220 | 6' 4" | R | 1969-12-10 | 22 | Defenseman | Defenseman | CAN | CAN |
| 10 | 19931994 | 1 | NaN | 48 | 20 | 137 | 84 | 137 | 68 | Los Angeles Kings | ... | 1994 | 220 | 6' 4" | R | 1969-12-10 | 23 | Defenseman | Defenseman | CAN | CAN |
| 12 | 19941995 | 1 | NaN | 7 | 4 | 38 | 24 | 38 | 11 | Los Angeles Kings | ... | 1995 | 220 | 6' 4" | R | 1969-12-10 | 24 | Defenseman | Defenseman | CAN | CAN |
| 13 | 19951996 | 1 | NaN | 2 | 1 | 8 | 6 | 8 | 3 | Los Angeles Kings | ... | 1996 | 220 | 6' 4" | R | 1969-12-10 | 25 | Defenseman | Defenseman | CAN | CAN |
| 15 | 19961997 | 1 | NaN | 23 | 8 | 82 | 62 | 82 | 31 | Los Angeles Kings | ... | 1997 | 220 | 6' 4" | R | 1969-12-10 | 26 | Defenseman | Defenseman | CAN | CAN |
| 18 | 19971998 | 1 | 2141:39 | 27 | 23 | 94 | 81 | 94 | 50 | Los Angeles Kings | ... | 1998 | 220 | 6' 4" | R | 1969-12-10 | 27 | Defenseman | Defenseman | CAN | CAN |
| 21 | 19981999 | 1 | 1541:13 | 23 | 12 | 128 | 62 | 128 | 35 | Los Angeles Kings | ... | 1999 | 220 | 6' 4" | R | 1969-12-10 | 28 | Defenseman | Defenseman | CAN | CAN |


In [38]:
# confirming that the shape matches what is expected
url = 'https://statsapi.web.nhl.com/api/v1/people/8445550'
response = requests.get(url)
suggestions = json.loads(response.content)['people']
player = (pd.json_normalize(suggestions))

url = 'https://statsapi.web.nhl.com/api/v1/people/8445550/stats/?stats=yearByYear'
response = requests.get(url)
content = json.loads(response.content)['stats']
splits = content[0]['splits']

df_splits = (pd.json_normalize(splits, sep = "_" )
             .query('league_name == "National Hockey League"')
            )

df_splits['player_id'] = player['id'][0]
df_splits['first_name'] = player['firstName'][0]
df_splits['last_name'] = player['lastName'][0]
df_splits['position_code'] = player['primaryPosition.code'][0]

df_splits['season_start_yr'] = [x[0:4] for x in df_splits['season']]
df_splits['season_start_dt'] =  [datetime.strptime(x + '0930', "%Y%m%d") for x in df_splits['season_start_yr']] 
df_splits['season_end'] = [x[4:8] for x in df_splits['season']]

df_splits['weight'] = player['weight'][0]
df_splits['height'] = player['height'][0]
df_splits['shot_dir'] = player['shootsCatches'][0]
df_splits['birth_date'] = pd.to_datetime(player['birthDate'][0])
df_splits['age'] = (np.floor((df_splits['season_start_dt'] - df_splits['birth_date'])/ np.timedelta64(1,'Y') ))
df_splits['age'] = df_splits['age'].astype(int)
df_splits['position_name'] = player['primaryPosition.name'][0]
df_splits['position_type'] = player['primaryPosition.type'][0]
df_splits['birth_country'] = player['birthCountry'][0]
df_splits['nationality'] = player['nationality'][0]


df_splits.shape

(21, 47)

### Implementing Conditionals

Now that we are confident in the API call process, we can set up a for loop to iterate over the list of player IDs. Our initial range of player IDs will be 8445550 to 8482246.

Within the loop, we will need to make two endpoint calls:  one to `people` and one to `stats`.  The following conditions will be included in the code:
- if the player's position code is 'G' (goaltender), pass
- if a player has played fewer than 3 seasons in the NHL, pass
- if a player has played fewer than 200 career games in the NHL, pass

Another filter to consider, possibly as part of a later data-cleaning pass, would be to keep only seasons in which a player played more than, say, 15 games, which is close to 20% of a full season. Small samples are more sensitive to noise (e.g., a misleading hot or cold streak).

For any player who meets all of the criteria above, the DataFrame of their statistics will be added to the larger DataFrame containing all qualifying players.
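
The pass conditions can be written as a single predicate over a player's NHL splits. A minimal pure-Python sketch, assuming each split is a dict shaped like the raw API splits (a `season` string plus a `stat` dict with `games`); `keep_player`, `distinct_seasons`, and `drop_short_seasons` are hypothetical names, and the sample splits use hypothetical values. One design choice worth flagging: the output further down shows a player with two rows for the same season (sequence numbers 1 and 2 in 2000-01, presumably a mid-season trade), so counting rows can overstate the number of seasons; this sketch counts distinct season strings instead. The optional per-season minimum is included as a separate helper that works per split (row).

```python
def games(split: dict) -> int:
    """Games played in one split; missing values count as 0."""
    return split.get("stat", {}).get("games") or 0


def distinct_seasons(splits: list) -> int:
    """Number of different seasons, so a traded player's two rows count once."""
    return len({s["season"] for s in splits})


def keep_player(position_code: str, splits: list,
                min_seasons: int = 3, min_games: int = 200) -> bool:
    """Skip goaltenders, players with fewer than `min_seasons` NHL seasons,
    and players with fewer than `min_games` career NHL games."""
    if position_code == "G":
        return False
    if distinct_seasons(splits) < min_seasons:
        return False
    return sum(games(s) for s in splits) >= min_games


def drop_short_seasons(splits: list, min_games_per_season: int = 15) -> list:
    """Optional later filter: keep only splits with more than N games."""
    return [s for s in splits if games(s) > min_games_per_season]


# Hypothetical sample: the last two rows share a season
splits = [
    {"season": "19981999", "stat": {"games": 62}},
    {"season": "19992000", "stat": {"games": 77}},
    {"season": "20002001", "stat": {"games": 54}},
    {"season": "20002001", "stat": {"games": 13}},
]
print(keep_player("D", splits))         # True: 3 distinct seasons, 206 games
print(keep_player("G", splits))         # False: goaltender
print(len(drop_short_seasons(splits)))  # 3: the 13-game row is dropped
```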

In [39]:
# testing URL concatenation, with very simple for loop

# set base URL, which player ID number will be added to
base_url = 'https://statsapi.web.nhl.com/api/v1/people/'

# Set a range for loop
range1 = range(8445549, 8445551)  

for num in range1:
    url = f'{base_url}{num}'  # f-strings used to concatenate the player ID onto the URL for the API call
    print(url)

https://statsapi.web.nhl.com/api/v1/people/8445549
https://statsapi.web.nhl.com/api/v1/people/8445550


One thing I identified early in this stage is that some player IDs have no information stored. In that case, an error is thrown when we try to pull information from the `people` endpoint. Below is the code I used to find the response code of the error. After some web searching, I found the `.status_code` attribute, which can be used to make logical decisions when an error response code is returned (i.e., to skip the ID rather than break the code).
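
The `.status_code` check can be wrapped in a small helper so that the loop only ever sees parsed JSON or `None`. A minimal sketch, not the code used in this notebook: `get_json_or_none` is a hypothetical name, and the HTTP getter is passed in (`requests.get` in practice, or a stub so the example runs offline). It treats any non-200 status as "no data", which would also cover the 500 responses described later, and it passes a `timeout` (an argument `requests.get` accepts) so a stalled request cannot hang the loop.

```python
from typing import Callable, Optional


def get_json_or_none(url: str, get: Callable, timeout: float = 10.0) -> Optional[dict]:
    """Return the parsed JSON body, or None when the server reports an error
    (e.g. 404 for a player ID with no information stored)."""
    response = get(url, timeout=timeout)
    if response.status_code != 200:
        return None
    return response.json()


# Hypothetical stand-ins for requests.get so the example runs offline
class FakeResponse:
    def __init__(self, status_code: int, body: dict):
        self.status_code = status_code
        self._body = body

    def json(self) -> dict:
        return self._body


def fake_get(url: str, timeout: float = 10.0) -> FakeResponse:
    if url.endswith("8445552"):
        return FakeResponse(404, {})
    return FakeResponse(200, {"people": [{"id": 8445550}]})


print(get_json_or_none("https://statsapi.web.nhl.com/api/v1/people/8445552", fake_get))  # None
print(get_json_or_none("https://statsapi.web.nhl.com/api/v1/people/8445550", fake_get))
```

Against the live API the call would be `get_json_or_none(url, requests.get)`.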

In [40]:
# this player ID broke my code - check response code
url = 'https://statsapi.web.nhl.com/api/v1/people/8445552'
response = requests.get(url)
print(response)

<Response [404]>


In [41]:
# testing the .status_code attribute
url = 'https://statsapi.web.nhl.com/api/v1/people/8445552'
response = requests.get(url)
if response.status_code == 404:
    print('yes')
else:  
    print('no')

yes


For the next block of code, I pulled together my two debugged API calls for `people` and `stats`, along with a conditional statement that passes over any ID returning a 404 response.

After calling the `people` endpoint, the code checks whether the player is a goaltender (and passes if so). After calling the `stats` endpoint, the code checks whether the resulting DataFrame has fewer than 3 rows, which would mean the player played in fewer than 3 NHL seasons (and passes if so).

To keep the code simple, I did not add all of my desired variables from `people` to the output DataFrame. Once debugging is complete, all of the desired variables will be included.
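
The `if ...: pass / else:` pattern in the cells below nests one level deeper with every new condition. The same control flow can be flattened with early `continue` statements. A sketch of the loop's shape in that style, assuming a `fetch_json(url)` callable that returns parsed JSON or `None` (hypothetical; here a dictionary of canned responses with made-up IDs stands in for the API), collecting the IDs that survive the checks:

```python
from typing import Callable, Iterable, List, Optional

BASE_URL = "https://statsapi.web.nhl.com/api/v1/people/"


def qualifying_ids(ids: Iterable[int], fetch_json: Callable[[str], Optional[dict]],
                   min_rows: int = 3) -> List[int]:
    kept = []
    for num in ids:
        bio = fetch_json(f"{BASE_URL}{num}")
        if bio is None:                      # no player behind this ID
            continue
        if bio["people"][0]["primaryPosition"]["code"] == "G":
            continue                         # skip goaltenders
        stats = fetch_json(f"{BASE_URL}{num}/stats/?stats=yearByYear")
        if stats is None:                    # profile exists but no stats
            continue
        nhl_rows = [s for s in stats["stats"][0]["splits"]
                    if s["league"]["name"] == "National Hockey League"]
        if len(nhl_rows) < min_rows:         # too few NHL rows
            continue
        kept.append(num)
    return kept


def nhl_split(season: str) -> dict:
    return {"season": season, "league": {"name": "National Hockey League"}}


# Hypothetical canned responses: ID 1 qualifies, 2 is a goalie,
# 3 does not exist, 4 has a single NHL row
CANNED = {
    f"{BASE_URL}1": {"people": [{"primaryPosition": {"code": "D"}}]},
    f"{BASE_URL}1/stats/?stats=yearByYear": {"stats": [{"splits": [nhl_split("1"), nhl_split("2"), nhl_split("3")]}]},
    f"{BASE_URL}2": {"people": [{"primaryPosition": {"code": "G"}}]},
    f"{BASE_URL}4": {"people": [{"primaryPosition": {"code": "R"}}]},
    f"{BASE_URL}4/stats/?stats=yearByYear": {"stats": [{"splits": [nhl_split("1")]}]},
}
print(qualifying_ids(range(1, 5), CANNED.get))  # [1]
```

Each new condition becomes one more guard at the same indentation level rather than another nested `else`.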

In [42]:
# Pulling pieces together and testing conditional statements
base_url = 'https://statsapi.web.nhl.com/api/v1/people/'

# modest range of 10 player IDs
range1 = range(8445550, 8445560)

for num in range1:
    url = f'{base_url}{num}'
    response = requests.get(url)
    
    # evaluate response code, pass if 404
    if response.status_code == 404:
        pass
    else:
        bio = json.loads(response.content)['people']
        player = (pd.json_normalize(bio))
        
        # evaluate player position, pass if 'G' (goaltender)
        if player['primaryPosition.code'][0] == 'G':
            pass
        else:
            url = f'{base_url}{num}/stats/?stats=yearByYear'
            response = requests.get(url)
            content = json.loads(response.content)['stats']
            splits = content[0]['splits']
            
            df_splits = (pd.json_normalize(splits, sep = "_" )
                         .query('league_name == "National Hockey League"')
                        )
            
            # evaluate how many seasons a player has played, pass if less than 3
            if df_splits.shape[0] < 3:
                pass
            else:
                df_splits['player_id'] = player['id'][0]
                df_splits['first_name'] = player['firstName'][0]
                df_splits['last_name'] = player['lastName'][0]
                df_splits['position_code'] = player['primaryPosition.code'][0]
                
                # print the DataFrame to check output is correct
                print(df_splits)

      season  sequenceNumber stat_timeOnIce  stat_assists  stat_goals  \
4   19891990               1            NaN             0           0   
6   19901991               1            NaN            34          12   
8   19911992               1            NaN            13           7   
9   19921993               1            NaN            43          16   
10  19931994               1            NaN            48          20   
12  19941995               1            NaN             7           4   
13  19951996               1            NaN             2           1   
15  19961997               1            NaN            23           8   
18  19971998               1        2141:39            27          23   
21  19981999               1        1541:13            23          12   
23  19992000               1        2193:52            39          18   
24  20002001               1        1521:30            32          17   
25  20002001               2         338:37        

Next, I needed to add a second conditional on the `stats` response to check whether a player has played at least 200 games.
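
Since each `df_splits` holds a single player, the `groupby(...).sum()` used below collapses to one row; summing the column directly gives the same total as a plain number. A sketch, assuming a `stat_games` column like the one in this notebook; `career_games` is a hypothetical helper name, and the `pd.to_numeric(...).fillna(0)` step is a defensive assumption on my part, guarding against float or missing values like those possibly behind John Blum's total printing as `250.0`.

```python
import pandas as pd


def career_games(df_splits: pd.DataFrame) -> int:
    """Total NHL games across one player's season rows."""
    games = pd.to_numeric(df_splits["stat_games"], errors="coerce").fillna(0)
    return int(games.sum())


# Hypothetical sample: one missing value makes the column float64
sample = pd.DataFrame({"season": ["19891990", "19901991", "19911992"],
                       "stat_games": [4, 75.0, None]})
total = career_games(sample)
print(total)         # 79
print(total >= 200)  # False
```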

In [43]:
# Add conditional evaluating number of games played by the current player
base_url = 'https://statsapi.web.nhl.com/api/v1/people/'
range1 = range(8445550, 8445560)

for num in range1:
    people_url = f'{base_url}{num}'
    response = requests.get(people_url)
    
    if response.status_code == 404:
        pass
    else:
        bio = json.loads(response.content)['people']
        player = (pd.json_normalize(bio))
        
        if player['primaryPosition.code'][0] == 'G':
            pass
        else:
            stats_url = f'{base_url}{num}/stats/?stats=yearByYear'
            response = requests.get(stats_url)
            content = json.loads(response.content)['stats']
            splits = content[0]['splits']
            
            df_splits = (pd.json_normalize(splits, sep = "_" )
                         .query('league_name == "National Hockey League"')
                        )
            
            if df_splits.shape[0] < 3:
                pass
            else:
                df_splits['player_id'] = player['id'][0]
                df_splits['first_name'] = player['firstName'][0]
                df_splits['last_name'] = player['lastName'][0]
                df_splits['position_code'] = player['primaryPosition.code'][0]
                
                # create a new DF that stores the total games played by the current player
                total_games = df_splits.groupby(['player_id', 'first_name', 'last_name'])['stat_games'].sum().reset_index()
                
                # evaluate if current player has played at least 200 games; if so, keep in filtered DF
                filtered_total_games = total_games[total_games['stat_games'] >= 200]
                
                # if the filtered DF has an entry in it, continue
                if not filtered_total_games.empty:
                    # print the filtered DF
                    print(filtered_total_games)
                else:
                    pass               
                

   player_id first_name last_name  stat_games
0    8445550        Rob     Blake        1270
   player_id first_name  last_name  stat_games
0    8445557       Timo  Blomqvist         243
   player_id first_name last_name  stat_games
0    8445558       Rick    Blight         326
   player_id first_name last_name  stat_games
0    8445559       John      Blum       250.0


In [44]:
# Sanity check - print response codes & total games played for each ID 
base_url = 'https://statsapi.web.nhl.com/api/v1/people/'
range1 = range(8445550, 8445560)

for num in range1:
    people_url = f'{base_url}{num}'
    response = requests.get(people_url)
    
    # PRINT 1
    print(response)
    
    if response.status_code == 404:
        pass
    else:
        bio = json.loads(response.content)['people']
        player = (pd.json_normalize(bio))
        
        # PRINT 2
        print(player['primaryPosition.code'][0])
        
        if player['primaryPosition.code'][0] == 'G':
            pass
        else:
            stats_url = f'{base_url}{num}/stats/?stats=yearByYear'
            response = requests.get(stats_url)
            content = json.loads(response.content)['stats']
            splits = content[0]['splits']
            
            df_splits = (pd.json_normalize(splits, sep = "_" )
                         .query('league_name == "National Hockey League"')
                        )
            
            # PRINT 3
            print(df_splits.shape[0])
            
            if df_splits.shape[0] < 3:
                pass
            else:
                df_splits['player_id'] = player['id'][0]
                df_splits['first_name'] = player['firstName'][0]
                df_splits['last_name'] = player['lastName'][0]
                df_splits['position_code'] = player['primaryPosition.code'][0]
                total_games = df_splits.groupby(['player_id', 'first_name', 'last_name'])['stat_games'].sum().reset_index()
                filtered_total_games = total_games[total_games['stat_games'] >= 200]
                
                # PRINT 4
                print(total_games)
                
                if not filtered_total_games.empty:
                    print('GOOD PLAYER!')
                else:
                    pass           
                

<Response [200]>
D
21
   player_id first_name last_name  stat_games
0    8445550        Rob     Blake        1270
GOOD PLAYER!
<Response [200]>
G
<Response [404]>
<Response [404]>
<Response [404]>
<Response [200]>
D
4
   player_id first_name  last_name  stat_games
0    8445555       Jeff  Bloemberg          43
<Response [404]>
<Response [200]>
D
5
   player_id first_name  last_name  stat_games
0    8445557       Timo  Blomqvist         243
GOOD PLAYER!
<Response [200]>
R
7
   player_id first_name last_name  stat_games
0    8445558       Rick    Blight         326
GOOD PLAYER!
<Response [200]>
D
9
   player_id first_name last_name  stat_games
0    8445559       John      Blum       250.0
GOOD PLAYER!


The output above shows that the conditionals are functioning as designed.

### Implementing DataFrame Concatenation

The last step is to take the DataFrame of each player who meets all conditions and concatenate it to a main DataFrame of all such players. I tested the code on 20 player IDs and returned the resulting DataFrame.

During this step, I encountered another error response code (500), which is returned when a player has an NHL profile but no stats associated with it.
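
When combining the qualifying players, one common pattern is to collect the per-player frames in a list and call `pd.concat` once after the loop, rather than growing the main DataFrame on every iteration (each repeated concatenation copies all of the rows accumulated so far). A minimal sketch, with small hypothetical frames and made-up player IDs standing in for each `df_splits`:

```python
import pandas as pd

frames = []  # one DataFrame per qualifying player

# Hypothetical stand-ins for two players' df_splits
for player_id, seasons in [(1, ["19891990", "19901991"]), (2, ["19891990"])]:
    df_splits = pd.DataFrame({"season": seasons})
    df_splits["player_id"] = player_id
    frames.append(df_splits)

# Concatenate once; ignore_index gives a clean 0..n-1 index instead of the
# per-player row labels left over from the NHL-league filter.
# pd.concat raises ValueError on an empty list, hence the fallback.
main_df = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
print(main_df)
```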

In [46]:
import numpy as np
import pandas as pd
import requests

# Final piece of code to test functionality - 20 player IDs
main_df_test = pd.DataFrame()
base_url = 'https://statsapi.web.nhl.com/api/v1/people/'
range1 = range(8445550, 8445570)

for num in range1:
    # people endpoint: player bio information
    people_url = f'{base_url}{num}'
    response = requests.get(people_url)

    # skip IDs with no NHL profile
    if response.status_code == 404:
        continue
    suggestions = response.json()['people']
    player = pd.json_normalize(suggestions)

    # skip goaltenders
    if player['primaryPosition.code'][0] == 'G':
        continue

    # stats endpoint: season-by-season statistics
    stats_url = f'{base_url}{num}/stats/?stats=yearByYear'
    response = requests.get(stats_url)

    # skip players that have a profile but no stats (code 500)
    if response.status_code == 500:
        continue
    content = response.json()['stats']
    splits = content[0]['splits']

    # keep NHL rows only; .copy() so the columns added below go on an independent frame
    df_splits = (pd.json_normalize(splits, sep='_')
                 .query('league_name == "National Hockey League"')
                 .copy())

    # require at least 3 NHL season rows
    if df_splits.shape[0] < 3:
        continue

    df_splits['player_id'] = player['id'][0]
    df_splits['first_name'] = player['firstName'][0]
    df_splits['last_name'] = player['lastName'][0]
    df_splits['position_code'] = player['primaryPosition.code'][0]
    df_splits['stat_games'] = df_splits['stat_games'].astype(int)

    # require more than 200 career NHL games
    total_games = (df_splits
                   .groupby(['player_id', 'first_name', 'last_name', 'position_code'])['stat_games']
                   .sum()
                   .reset_index())
    filtered_total_games = total_games[total_games['stat_games'] > 200]
    if filtered_total_games.empty:
        continue

    # season years, with each season's start approximated as Sep 30
    df_splits['season_start_yr'] = df_splits['season'].str[0:4]
    df_splits['season_start_dt'] = pd.to_datetime(df_splits['season_start_yr'] + '0930', format='%Y%m%d')
    df_splits['season_end'] = df_splits['season'].str[4:8]

    # player bio attributes from the people endpoint
    df_splits['weight'] = player['weight'][0]
    df_splits['height'] = player['height'][0]
    df_splits['shot_dir'] = player['shootsCatches'][0]
    df_splits['birth_date'] = pd.to_datetime(player['birthDate'][0])
    # age at season start in whole years; 365.2425 days is the average Gregorian year
    # (same length as np.timedelta64(1, 'Y'), whose year unit recent pandas versions reject)
    df_splits['age'] = np.floor(
        (df_splits['season_start_dt'] - df_splits['birth_date']).dt.days / 365.2425
    ).astype(int)
    df_splits['position_name'] = player['primaryPosition.name'][0]
    df_splits['position_type'] = player['primaryPosition.type'][0]
    df_splits['birth_country'] = player['birthCountry'][0]
    df_splits['nationality'] = player['nationality'][0]

    # concatenate the current player's DF to the main DF
    main_df_test = pd.concat([main_df_test, df_splits], sort=False).reset_index(drop=True)
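The loop derives each season's start year, end year, and the player's age at an approximate season start (Sep 30) from the 8-character `season` string. The function below is a standalone, testable sketch of the same idea using only the standard library. It counts completed calendar years rather than dividing a day count by an average year length, so the two methods can disagree when the season start falls on or right around a player's birthday. The function name `season_start_and_age` is hypothetical.

```python
from datetime import date


def season_start_and_age(season: str, birth_date: date,
                         start_month: int = 9, start_day: int = 30):
    """Parse a season string like '19891990' and return
    (start_year, end_year, age in completed years at the assumed season start)."""
    start_year, end_year = int(season[:4]), int(season[4:8])
    start = date(start_year, start_month, start_day)
    # subtract one year if the birthday has not yet occurred by the season start
    had_birthday = (start.month, start.day) >= (birth_date.month, birth_date.day)
    age = start.year - birth_date.year - (0 if had_birthday else 1)
    return start_year, end_year, age


print(season_start_and_age('19891990', date(1969, 12, 10)))  # (1989, 1990, 19)
```

The printed result matches the first row of the DataFrame displayed in this section (season 19891990, birth date 1969-12-10, age 19).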

In [49]:
# show the resulting DataFrame
main_df_test

|  | season | sequenceNumber | stat_timeOnIce | stat_assists | stat_goals | stat_pim | stat_games | stat_penaltyMinutes | stat_points | team_name | ... | season_end | weight | height | shot_dir | birth_date | age | position_name | position_type | birth_country | nationality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 19891990 | 1 |  | 0.0 | 0.0 | 4.0 | 4 | 4 | 0.0 | Los Angeles Kings | ... | 1990 | 220 | 6' 4" | R | 1969-12-10 | 19 | Defenseman | Defenseman | CAN | CAN |
| 1 | 19901991 | 1 |  | 34.0 | 12.0 | 125.0 | 75 | 125 | 46.0 | Los Angeles Kings | ... | 1991 | 220 | 6' 4" | R | 1969-12-10 | 20 | Defenseman | Defenseman | CAN | CAN |
| 2 | 19911992 | 1 |  | 13.0 | 7.0 | 102.0 | 57 | 102 | 20.0 | Los Angeles Kings | ... | 1992 | 220 | 6' 4" | R | 1969-12-10 | 21 | Defenseman | Defenseman | CAN | CAN |
| 3 | 19921993 | 1 |  | 43.0 | 16.0 | 152.0 | 76 | 152 | 59.0 | Los Angeles Kings | ... | 1993 | 220 | 6' 4" | R | 1969-12-10 | 22 | Defenseman | Defenseman | CAN | CAN |
| 4 | 19931994 | 1 |  | 48.0 | 20.0 | 137.0 | 84 | 137 | 68.0 | Los Angeles Kings | ... | 1994 | 220 | 6' 4" | R | 1969-12-10 | 23 | Defenseman | Defenseman | CAN | CAN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 56 | 19961997 | 1 |  | 15.0 | 1.0 | 64.0 | 81 | 64 | 16.0 | San Jose Sharks | ... | 1997 | 210 | 6' 2" | L | 1966-06-18 | 30 | Defenseman | Defenseman | CAN | CAN |
| 57 | 19971998 | 1 | 625:33 | 6.0 | 4.0 | 32.0 | 28 | 32 | 10.0 | San Jose Sharks | ... | 1998 | 210 | 6' 2" | L | 1966-06-18 | 31 | Defenseman | Defenseman | CAN | CAN |
| 58 | 19971998 | 2 | 831:41 | 5.0 | 5.0 | 25.0 | 49 | 25 | 10.0 | New Jersey Devils | ... | 1998 | 210 | 6' 2" | L | 1966-06-18 | 31 | Defenseman | Defenseman | CAN | CAN |
| 59 | 19981999 | 1 | 1264:08 | 11.0 | 3.0 | 34.0 | 65 | 34 | 14.0 | Los Angeles Kings | ... | 1999 | 210 | 6' 2" | L | 1966-06-18 | 32 | Defenseman | Defenseman | CAN | CAN |
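A quick way to confirm that the concatenated DataFrame respects the filters is to total the games per player and re-check the thresholds. This is a hypothetical sketch: `summarize_players` is not part of the notebook, and the sample uses only the first five rows shown above (Rob Blake, player ID 8445550, seasons 1989-90 to 1993-94), so its total covers those seasons only.

```python
import pandas as pd


def summarize_players(df: pd.DataFrame, min_games: int = 200) -> pd.DataFrame:
    """Hypothetical check: total NHL games per player and flag whether each
    player passes the filters (not a goaltender, more than `min_games` games)."""
    totals = (df.groupby(['player_id', 'position_code'])['stat_games']
                .sum()
                .rename('career_games')
                .reset_index())
    totals['passes'] = (totals['position_code'] != 'G') & (totals['career_games'] > min_games)
    return totals


# sample: the first five rows of the output above
sample = pd.DataFrame({
    'player_id': [8445550] * 5,
    'position_code': ['D'] * 5,
    'stat_games': [4, 75, 57, 76, 84],
})
print(summarize_players(sample))
```

On the full `main_df_test`, every row of the summary should have `passes` equal to `True`.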
