# Test Notebook for the NHL Linemate Scraper
---
### Purpose
The purpose of this ipynb file is to verify that the data created by the NHL Linemate Scraper is accurate. This scraper presents a unique challenge: we don't have a data set to compare it to directly, so we have to get creative.
### Methods
We need to do the following to ensure our data is actually correct:

1. Verify our function to extract the shift data from the html reports works properly
2. Verify that our function to add each player on the ice at every second works properly
3. Verify that our function that tells us how long each line/pair was on the ice for at 5v5 strength works properly

To reach these conclusions, we must take the following steps:

1. Compare the player shifts report Data Frame that the scraper is extracting from the html reports to the actual html reports for a handful of games.
    - Acceptance Criteria - They should match 1:1, and we should confirm that no data is lost from the bs4 scraping function.
2. Add up each player's All Sit, EV, and PP TOI.
    - Acceptance Criteria - Every player's TOI in our calculations matches the NHL's html report for each game state scenario (with the exception of when a goalie is pulled/too many men).
3. Compare the linemate data frame (the one that contains who is on the ice at every second of the game) to the play-by-play html file and match up game seconds to confirm who was on the ice at each event.
    - Acceptance Criteria - We should be able to index an event to the linemate df, and the players on ice in our df should match those in the html play-by-play report.
4. Manually loop through each game second and check the TOI for the top 10 forward combos and top 10 defensive pairs according to the 5v5 reports
    - Acceptance Criteria - The TOIs of these lines and pairs should match the reports when we manually look up how long they were on the ice

If the cases above are valid, then our data will be correct.

I will be testing 3 games: a game where a goalie gets pulled, a "normal" game, and a game with a too many men penalty. I chose these test cases to show that the scraper can handle each of the situations where its totals may differ from the NHL's.

# Prepare Notebook

In [130]:
# Import
import pandas as pd

In [131]:
# Functions I'll use throughout
# Helper: boolean mask of the rows (game seconds) where a player fills any of a team's six on-ice slots
def on_ice_mask(df,player,team):
    cols = ['{}_player_{}_name'.format(team,i) for i in range(1,7)]
    return df[cols].eq(player).any(axis=1)

# Function to extract a player's TOI (in minutes) at any strength:
def extract_toi(player,team,strength):
    if strength=="all":
        df = linemate_data
    else:
        df = linemate_data[linemate_data['strength_cat']==strength]
    df = df[on_ice_mask(df,player,team)]
    # One row per game second, so row count / 60 = minutes
    toi = len(df)/60
    return toi

# Function to calculate each line's 5v5 TOI (in minutes)
def calculate_forward_line_toi(p1,p2,p3,team):
    df = linemate_data[linemate_data['strength']=="5v5"]
    # Keep only the seconds where all three forwards are on the ice together
    df = df[on_ice_mask(df,p1,team)&on_ice_mask(df,p2,team)&on_ice_mask(df,p3,team)]
    toi = len(df)/60
    return toi

# Function to calculate each pair's 5v5 TOI (in minutes)
def calculate_defender_pair_toi(p1,p2,team):
    df = linemate_data[linemate_data['strength']=="5v5"]
    # Keep only the seconds where both defenders are on the ice together
    df = df[on_ice_mask(df,p1,team)&on_ice_mask(df,p2,team)]
    toi = len(df)/60
    return toi


# Test Game 1 - 2023020017 - A normal game
What we expect - This game features no goalie pulls or too many men calls, so we should see exact matches for all cases

### Test Case 1 - Compare Extracted Shift Data to html Shift Data
Shifts for game 2023020017 can be found at https://www.nhl.com/scores/htmlreports/20232024/TV020017.HTM and https://www.nhl.com/scores/htmlreports/20232024/TH020017.HTM

In [236]:
# Read in my extracted shifts from this game. This is the df of extracted shifts from the bs4 function. It's what is used to build the by second df
extracted_shift_data = pd.read_csv("data/shift-data-for-testing-2023020017.csv")
extracted_shift_data

Unnamed: 0.1,Unnamed: 0,shift_number,period,shift_start_time,shift_end_time,duration,Event,player_number,last_name,first_name,shift_start_time_seconds,shift_end_time_seconds,duration_seconds,team,playerId,positionCode,firstName.default,lastName.default,position,full_name
0,0,1,1,1:28,2:18,00:50,,2,SMITH,BRENDAN,88,138,50,NJD,8474090,D,Brendan,Smith,D,Brendan Smith
1,1,2,1,4:33,5:26,00:53,,2,SMITH,BRENDAN,273,326,53,NJD,8474090,D,Brendan,Smith,D,Brendan Smith
2,2,3,1,7:26,8:16,00:50,,2,SMITH,BRENDAN,446,496,50,NJD,8474090,D,Brendan,Smith,D,Brendan Smith
3,3,4,1,9:28,9:57,00:29,P,2,SMITH,BRENDAN,568,597,29,NJD,8474090,D,Brendan,Smith,D,Brendan Smith
4,4,5,1,12:54,13:34,00:40,,2,SMITH,BRENDAN,774,814,40,NJD,8474090,D,Brendan,Smith,D,Brendan Smith
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
797,401,18,3,12:44,13:55,01:11,G,92,COOLEY,LOGAN,3164,3235,71,ARI,8483431,C,Logan,Cooley,F,Logan Cooley
798,402,19,3,15:09,16:10,01:01,,92,COOLEY,LOGAN,3309,3370,61,ARI,8483431,C,Logan,Cooley,F,Logan Cooley
799,403,20,3,17:06,17:34,00:28,,92,COOLEY,LOGAN,3426,3454,28,ARI,8483431,C,Logan,Cooley,F,Logan Cooley
800,404,21,4,1:04,1:50,00:46,,92,COOLEY,LOGAN,3664,3710,46,ARI,8483431,C,Logan,Cooley,F,Logan Cooley


#### Test Case 1A - Compare Shift Counts for Every Player in Game 2023020017

In [237]:
# This is the number of shifts for each player. Go through each html doc and compare
extracted_shift_data.groupby('full_name').count()[['shift_number']]

Unnamed: 0_level_0,shift_number
full_name,Unnamed: 1_level_1
Akira Schmid,4
Alex Kerfoot,18
Alexander Holtz,14
Barrett Hayton,28
Brendan Smith,22
Clayton Keller,26
Dawson Mercer,20
Dougie Hamilton,29
Erik Haula,20
J.J. Moser,24


#### Test Case 1B - Compare shifts match for a few players on the HTML reports

In [238]:
# John Marino
extracted_shift_data[extracted_shift_data['full_name']=="John Marino"][['shift_number','period','shift_start_time','shift_end_time','duration','Event','full_name']] # Passed

Unnamed: 0,shift_number,period,shift_start_time,shift_end_time,duration,Event,full_name
22,1,1,0:00,0:26,00:26,,John Marino
23,2,1,2:18,3:09,00:51,,John Marino
24,3,1,6:22,7:28,01:06,P,John Marino
25,4,1,8:15,8:41,00:26,,John Marino
26,5,1,9:54,9:57,00:03,P,John Marino
27,6,1,11:53,12:54,01:01,,John Marino
28,7,1,15:26,16:25,00:59,,John Marino
29,8,1,17:21,18:15,00:54,,John Marino
30,9,2,0:00,0:27,00:27,,John Marino
31,10,2,3:00,4:23,01:23,G,John Marino


In [239]:
# Clayton Keller
extracted_shift_data[extracted_shift_data['full_name']=="Clayton Keller"][['shift_number','period','shift_start_time','shift_end_time','duration','Event','full_name']] # Passed

Unnamed: 0,shift_number,period,shift_start_time,shift_end_time,duration,Event,full_name
464,1,1,0:00,0:44,00:44,,Clayton Keller
465,2,1,2:49,3:33,00:44,,Clayton Keller
466,3,1,6:13,7:50,01:37,GP,Clayton Keller
467,4,1,9:17,9:57,00:40,P,Clayton Keller
468,5,1,12:15,12:59,00:44,,Clayton Keller
469,6,1,15:22,16:07,00:45,,Clayton Keller
470,7,1,17:21,18:10,00:49,,Clayton Keller
471,8,2,0:00,0:29,00:29,,Clayton Keller
472,9,2,3:00,4:23,01:23,G,Clayton Keller
473,10,2,6:08,6:40,00:32,,Clayton Keller


In [240]:
# Karel Vejmelka
extracted_shift_data[extracted_shift_data['full_name']=="Karel Vejmelka"][['shift_number','period','shift_start_time','shift_end_time','duration','Event','full_name']] # Passed

Unnamed: 0,shift_number,period,shift_start_time,shift_end_time,duration,Event,full_name
752,1,1,0:00,20:00,20:00,GP,Karel Vejmelka
753,2,2,0:00,20:00,20:00,GP,Karel Vejmelka
754,3,3,0:00,20:00,20:00,GP,Karel Vejmelka
755,4,4,0:00,5:00,05:00,P,Karel Vejmelka


Shift counts and shift times/durations match for each tested case.

Test Case 1 Passed ✅
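
As an extra check on the extracted shift data, the `*_seconds` columns can be recomputed from the period and clock strings. The sketch below is a minimal example, assuming 20-minute periods (1200 seconds each, with overtime as period 4) and that the seconds columns count elapsed game time. `clock_to_game_seconds` is a hypothetical helper, not part of the scraper, and the sample rows are copied by hand from the extracted shift table above.

```python
def clock_to_game_seconds(period, clock):
    """Convert a period number and an 'M:SS' clock string to elapsed game seconds.

    Assumes every period before the current one lasted 1200 seconds.
    """
    minutes, seconds = clock.split(":")
    return (int(period) - 1) * 1200 + int(minutes) * 60 + int(seconds)


# (period, shift_start_time, shift_end_time,
#  shift_start_time_seconds, shift_end_time_seconds, duration_seconds)
sample_shifts = [
    (1, "1:28", "2:18", 88, 138, 50),        # Brendan Smith, shift 1
    (3, "12:44", "13:55", 3164, 3235, 71),   # Logan Cooley, shift 18
    (4, "1:04", "1:50", 3664, 3710, 46),     # Logan Cooley, shift 21 (overtime)
]

for period, start, end, start_s, end_s, dur_s in sample_shifts:
    # The clock strings and the seconds columns should describe the same moment
    assert clock_to_game_seconds(period, start) == start_s
    assert clock_to_game_seconds(period, end) == end_s
    # The duration should be the gap between the two seconds columns
    assert end_s - start_s == dur_s
```

Running the same loop over every row of `extracted_shift_data` would flag any shift whose seconds columns drifted from the clock strings in the html report.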

### Test Case 2 - Compare Each Player's All Sit, EV, PP, SH TOI
Shifts for game 2023020017 can be found at https://www.nhl.com/scores/htmlreports/20232024/TV020017.HTM and https://www.nhl.com/scores/htmlreports/20232024/TH020017.HTM

In [241]:
# Read in linemate_data 
linemate_data = pd.read_csv("data/linemate-data-for-testing-2023020017.csv")
linemate_data

Unnamed: 0.1,Unnamed: 0,home_player_1_name,home_player_1_id,home_player_1_position,home_player_2_name,home_player_2_id,home_player_2_position,home_player_3_name,home_player_3_id,home_player_3_position,...,home_skaters_on_ice,away_skaters_on_ice,strength,strength_cat,home_team,away_team,game_date,game_season,game_id,game_type
0,0,John Marino,8478507,D,Kevin Bahl,8480860,D,Nico Hischier,8480002,F,...,5,5,5v5,even,NJD,ARI,2023-10-13,20232024,2023020017,regular-season
1,1,John Marino,8478507,D,Kevin Bahl,8480860,D,Nico Hischier,8480002,F,...,5,5,5v5,even,NJD,ARI,2023-10-13,20232024,2023020017,regular-season
2,2,John Marino,8478507,D,Kevin Bahl,8480860,D,Nico Hischier,8480002,F,...,5,5,5v5,even,NJD,ARI,2023-10-13,20232024,2023020017,regular-season
3,3,John Marino,8478507,D,Kevin Bahl,8480860,D,Nico Hischier,8480002,F,...,5,5,5v5,even,NJD,ARI,2023-10-13,20232024,2023020017,regular-season
4,4,John Marino,8478507,D,Kevin Bahl,8480860,D,Nico Hischier,8480002,F,...,5,5,5v5,even,NJD,ARI,2023-10-13,20232024,2023020017,regular-season
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3895,3895,John Marino,8478507,D,Kevin Bahl,8480860,D,Nico Hischier,8480002,F,...,3,4,3v4,away_advantage,NJD,ARI,2023-10-13,20232024,2023020017,regular-season
3896,3896,John Marino,8478507,D,Kevin Bahl,8480860,D,Nico Hischier,8480002,F,...,3,4,3v4,away_advantage,NJD,ARI,2023-10-13,20232024,2023020017,regular-season
3897,3897,John Marino,8478507,D,Kevin Bahl,8480860,D,Nico Hischier,8480002,F,...,3,4,3v4,away_advantage,NJD,ARI,2023-10-13,20232024,2023020017,regular-season
3898,3898,John Marino,8478507,D,Kevin Bahl,8480860,D,Nico Hischier,8480002,F,...,3,4,3v4,away_advantage,NJD,ARI,2023-10-13,20232024,2023020017,regular-season


In [242]:
# Matias Maccelli
all_toi = extract_toi("Matias Maccelli","away","all")
ev_toi = extract_toi("Matias Maccelli","away","even")
pp_toi = extract_toi("Matias Maccelli","away","away_advantage")
sh_toi = extract_toi("Matias Maccelli","away","home_advantage")
print("All:",all_toi,"EV:",ev_toi,"PP:",pp_toi,"SH:",sh_toi)

All: 16.633333333333333 EV: 13.833333333333334 PP: 2.7333333333333334 SH: 0.06666666666666667


In [243]:
# Nick Bjugstad
all_toi = extract_toi("Nick Bjugstad","away","all")
ev_toi = extract_toi("Nick Bjugstad","away","even")
pp_toi = extract_toi("Nick Bjugstad","away","away_advantage")
sh_toi = extract_toi("Nick Bjugstad","away","home_advantage")
print("All:",all_toi,"EV:",ev_toi,"PP:",pp_toi,"SH:",sh_toi)

All: 15.95 EV: 12.116666666666667 PP: 2.566666666666667 SH: 1.2666666666666666


In [244]:
# Brendan Smith
all_toi = extract_toi("Brendan Smith","home","all")
ev_toi = extract_toi("Brendan Smith","home","even")
pp_toi = extract_toi("Brendan Smith","home","home_advantage")
sh_toi = extract_toi("Brendan Smith","home","away_advantage")
print("All:",all_toi,"EV:",ev_toi,"PP:",pp_toi,"SH:",sh_toi)

All: 15.466666666666667 EV: 12.516666666666667 PP: 0.0 SH: 2.95


In [245]:
# Dougie Hamilton
all_toi = extract_toi("Dougie Hamilton","home","all")
ev_toi = extract_toi("Dougie Hamilton","home","even")
pp_toi = extract_toi("Dougie Hamilton","home","home_advantage")
sh_toi = extract_toi("Dougie Hamilton","home","away_advantage")
print("All:",all_toi,"EV:",ev_toi,"PP:",pp_toi,"SH:",sh_toi)

All: 22.166666666666668 EV: 19.8 PP: 2.3666666666666667 SH: 0.0


In [246]:
# Akira Schmid
all_toi = extract_toi("Akira Schmid","home","all")
ev_toi = extract_toi("Akira Schmid","home","even")
pp_toi = extract_toi("Akira Schmid","home","home_advantage")
sh_toi = extract_toi("Akira Schmid","home","away_advantage")
print("All:",all_toi,"EV:",ev_toi,"PP:",pp_toi,"SH:",sh_toi)

All: 65.0 EV: 49.016666666666666 PP: 6.216666666666667 SH: 9.766666666666667


Player TOIs match exactly for the test cases

Test Case 2 Passed ✅
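
One more sanity check that does not need the html reports: a player's EV, PP, and SH minutes should add up to their All Sit minutes. The sketch below assumes every game second falls into exactly one of the three `strength_cat` values used above (`even`, `home_advantage`, `away_advantage`). `splits_add_up` is a hypothetical helper, and the numbers are copied from the printed outputs above.

```python
import math

# (player, all, ev, pp, sh) in minutes, copied from the cell outputs above
toi_outputs = [
    ("Matias Maccelli", 16.633333333333333, 13.833333333333334, 2.7333333333333334, 0.06666666666666667),
    ("Nick Bjugstad", 15.95, 12.116666666666667, 2.566666666666667, 1.2666666666666666),
    ("Brendan Smith", 15.466666666666667, 12.516666666666667, 0.0, 2.95),
    ("Dougie Hamilton", 22.166666666666668, 19.8, 2.3666666666666667, 0.0),
    ("Akira Schmid", 65.0, 49.016666666666666, 6.216666666666667, 9.766666666666667),
]


def splits_add_up(all_toi, ev_toi, pp_toi, sh_toi, tol_minutes=1 / 120):
    """True if EV + PP + SH equals All Sit to within half a second."""
    return math.isclose(all_toi, ev_toi + pp_toi + sh_toi, abs_tol=tol_minutes)


for player, all_toi, ev_toi, pp_toi, sh_toi in toi_outputs:
    print(player, splits_add_up(all_toi, ev_toi, pp_toi, sh_toi))
```

The half-second tolerance is only there to absorb floating-point noise from dividing by 60. A real gap of one second or more would still fail the check.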

### Test Case 3 - Compare html PBP On-Ice for Events to linemate_data Data Frame
Should show the same players on ice at each event. PBP data can be found at https://www.nhl.com/scores/htmlreports/20232024/PL020017.HTM

In [247]:
# Goal scored at 6:22 of the 1st period
linemate_data[linemate_data['second']==382][['home_player_1_name','home_player_2_name','home_player_3_name','home_player_4_name','home_player_5_name','home_player_6_name',
                                             'away_player_1_name','away_player_2_name','away_player_3_name','away_player_4_name','away_player_5_name','away_player_6_name']] # Exact match

Unnamed: 0,home_player_1_name,home_player_2_name,home_player_3_name,home_player_4_name,home_player_5_name,home_player_6_name,away_player_1_name,away_player_2_name,away_player_3_name,away_player_4_name,away_player_5_name,away_player_6_name
381,Dougie Hamilton,Jonas Siegenthaler,Alexander Holtz,Ondrej Palat,Erik Haula,Akira Schmid,Matt Dumba,J.J. Moser,Nick Schmaltz,Clayton Keller,Barrett Hayton,Karel Vejmelka


In [248]:
# Random shot at 1:21 of the 2nd
linemate_data[linemate_data['second']==1281][['home_player_1_name','home_player_2_name','home_player_3_name','home_player_4_name','home_player_5_name','home_player_6_name',
                                             'away_player_1_name','away_player_2_name','away_player_3_name','away_player_4_name','away_player_5_name','away_player_6_name']] # Exact match

Unnamed: 0,home_player_1_name,home_player_2_name,home_player_3_name,home_player_4_name,home_player_5_name,home_player_6_name,away_player_1_name,away_player_2_name,away_player_3_name,away_player_4_name,away_player_5_name,away_player_6_name
1280,Brendan Smith,Luke Hughes,Jesper Bratt,Tyler Toffoli,Jack Hughes,Akira Schmid,Josh Brown,Juuso Valimaki,Matias Maccelli,Lawson Crouse,Logan Cooley,Karel Vejmelka


In [249]:
# Random shot at 2:17 of OT
linemate_data[linemate_data['second']==3738][['home_player_1_name','home_player_2_name','home_player_3_name','home_player_4_name','home_player_5_name','home_player_6_name',
                                             'away_player_1_name','away_player_2_name','away_player_3_name','away_player_4_name','away_player_5_name','away_player_6_name']] # Exact match

Unnamed: 0,home_player_1_name,home_player_2_name,home_player_3_name,home_player_4_name,home_player_5_name,home_player_6_name,away_player_1_name,away_player_2_name,away_player_3_name,away_player_4_name,away_player_5_name,away_player_6_name
3737,John Marino,Jesper Bratt,Jack Hughes,Akira Schmid,,,Sean Durzi,Jason Zucker,Barrett Hayton,Karel Vejmelka,,


On-Ice players match exactly for test cases

Test Case 3 Passed ✅
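
The on-ice comparison was done by eye, but it could also be done as a set comparison so that slot order and empty slots (as in overtime) do not matter. The sketch below is illustrative only: `on_ice_set` is a hypothetical helper, the row is typed in from the overtime output above rather than read from `linemate_data`, and the `pbp_*` sets stand in for names read off the play-by-play report.

```python
def on_ice_set(row, team):
    """Return the set of names in a team's six on-ice slots, skipping empty slots."""
    names = (row.get("{}_player_{}_name".format(team, i)) for i in range(1, 7))
    # Empty slots may be None or NaN, so keep only non-empty strings
    return {name for name in names if isinstance(name, str) and name}


# Row for second 3738 (overtime), copied from the output above
linemate_row = {
    "home_player_1_name": "John Marino",
    "home_player_2_name": "Jesper Bratt",
    "home_player_3_name": "Jack Hughes",
    "home_player_4_name": "Akira Schmid",
    "home_player_5_name": None,
    "home_player_6_name": None,
    "away_player_1_name": "Sean Durzi",
    "away_player_2_name": "Jason Zucker",
    "away_player_3_name": "Barrett Hayton",
    "away_player_4_name": "Karel Vejmelka",
    "away_player_5_name": None,
    "away_player_6_name": None,
}

# Hand-typed stand-ins for the players listed on the play-by-play report
pbp_home = {"Akira Schmid", "Jack Hughes", "Jesper Bratt", "John Marino"}
pbp_away = {"Barrett Hayton", "Jason Zucker", "Karel Vejmelka", "Sean Durzi"}

print(on_ice_set(linemate_row, "home") == pbp_home)
print(on_ice_set(linemate_row, "away") == pbp_away)
```

Because the helper only relies on `.get`, it should work on a plain dict or on a single row taken from the data frame.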

### Test Case 4 - Ensure Forward Line and Defensive Pair 5v5 TOI is Correct
Manually loop through to make sure the functions that calculate forward line and defensive pair TOI work

In [251]:
# Read in forward 5v5 report DF
forward_5v5 = pd.read_csv("data/5v5-forward-report-for-testing-2023020017.csv").head(10)
forward_5v5

Unnamed: 0.1,Unnamed: 0,forward_line_id,forward_1_name,forward_1_id,forward_2_name,forward_2_id,forward_3_name,forward_3_id,toi_secs,toi_mins,team,date,season,game_id,game_type
0,0,8475726-8479407-8481559,Jesper Bratt,8479407,Tyler Toffoli,8475726,Jack Hughes,8481559.0,622,10.366667,NJD,2023-10-13,20232024,2023020017,regular-season
1,1,8477951-8479343-8480849,Nick Schmaltz,8477951,Clayton Keller,8479343,Barrett Hayton,8480849.0,584,9.733333,ARI,2023-10-13,20232024,2023020017,regular-season
2,2,8475760-8478474-8481711,Nick Bjugstad,8475760,Matias Maccelli,8481711,Lawson Crouse,8478474.0,573,9.55,ARI,2023-10-13,20232024,2023020017,regular-season
3,3,8478414-8480002-8482110,Nico Hischier,8480002,Timo Meier,8478414,Dawson Mercer,8482110.0,533,8.883333,NJD,2023-10-13,20232024,2023020017,regular-season
4,4,8475722-8477021-8483431,Alex Kerfoot,8477021,Jason Zucker,8475722,Logan Cooley,8483431.0,500,8.333333,ARI,2023-10-13,20232024,2023020017,regular-season
5,5,8475287-8476292-8482125,Alexander Holtz,8482125,Ondrej Palat,8476292,Erik Haula,8475287.0,376,6.266667,NJD,2023-10-13,20232024,2023020017,regular-season
6,6,8477070-8479619-8480855,Jack McBain,8480855,Liam O'Brien,8477070,Michael Carcone,8479619.0,354,5.9,ARI,2023-10-13,20232024,2023020017,regular-season
7,7,8477931-8479414-8479415,Nathan Bastian,8479414,Michael McLeod,8479415,Tomas Nosek,8477931.0,337,5.616667,NJD,2023-10-13,20232024,2023020017,regular-season
8,8,8478414-8481559-8482110,Timo Meier,8478414,Jack Hughes,8481559,Dawson Mercer,8482110.0,125,2.083333,NJD,2023-10-13,20232024,2023020017,regular-season
9,9,8475287-8476292-8479414,Nathan Bastian,8479414,Ondrej Palat,8476292,Erik Haula,8475287.0,74,1.233333,NJD,2023-10-13,20232024,2023020017,regular-season


In [252]:
# For the top 10 most common lines in the game, look up how many 5v5 mins they played together through the linemate_data df. If they match exactly, our function to build forward line toi reports works perfectly
for line in forward_5v5['forward_line_id'].tolist():
    p1 = forward_5v5[forward_5v5['forward_line_id']==line].iloc[0]['forward_1_name']
    p2 = forward_5v5[forward_5v5['forward_line_id']==line].iloc[0]['forward_2_name']
    p3 = forward_5v5[forward_5v5['forward_line_id']==line].iloc[0]['forward_3_name']
    print(p1,p2,p3)
    if forward_5v5[forward_5v5['forward_line_id']==line].iloc[0]['team']=="NJD":
        print("Manual Calculation TOI:",calculate_forward_line_toi(p1,p2,p3,'home')," - Linemate Report TOI:",forward_5v5[forward_5v5['forward_line_id']==line].iloc[0]['toi_mins'])
    else:
        print("Manual Calculation TOI:",calculate_forward_line_toi(p1,p2,p3,'away')," - Linemate Report TOI:",forward_5v5[forward_5v5['forward_line_id']==line].iloc[0]['toi_mins'])
    print()
# Looks good. Let's do defenders now

Jesper Bratt Tyler Toffoli Jack Hughes
Manual Calculation TOI: 10.366666666666667  - Linemate Report TOI: 10.366666666666667

Nick Schmaltz Clayton Keller Barrett Hayton
Manual Calculation TOI: 9.733333333333333  - Linemate Report TOI: 9.733333333333333

Nick Bjugstad Matias Maccelli Lawson Crouse
Manual Calculation TOI: 9.55  - Linemate Report TOI: 9.55

Nico Hischier Timo Meier Dawson Mercer
Manual Calculation TOI: 8.883333333333333  - Linemate Report TOI: 8.883333333333333

Alex Kerfoot Jason Zucker Logan Cooley
Manual Calculation TOI: 8.333333333333334  - Linemate Report TOI: 8.333333333333334

Alexander Holtz Ondrej Palat Erik Haula
Manual Calculation TOI: 6.266666666666667  - Linemate Report TOI: 6.266666666666667

Jack McBain Liam O'Brien Michael Carcone
Manual Calculation TOI: 5.9  - Linemate Report TOI: 5.9

Nathan Bastian Michael McLeod Tomas Nosek
Manual Calculation TOI: 5.616666666666666  - Linemate Report TOI: 5.616666666666666

Timo Meier Jack Hughes Dawson Mercer
Manual 

In [253]:
# Read in defender report
defender_5v5 = pd.read_csv("data/5v5-defender-report-for-testing-2023020017.csv").head(10)
defender_5v5

Unnamed: 0.1,Unnamed: 0,defensemen_pair_id,defensemen_1_name,defensemen_1_id,defensemen_2_name,defensemen_2_id,toi_secs,toi_mins,team,date,season,game_id,game_type
0,0,8476462-8478399,Dougie Hamilton,8476462,Jonas Siegenthaler,8478399,841,14.016667,NJD,2023-10-13,20232024,2023020017,regular-season
1,1,8478507-8480860,John Marino,8478507,Kevin Bahl,8480860,717,11.95,NJD,2023-10-13,20232024,2023020017,regular-season
2,2,8474090-8482684,Brendan Smith,8474090,Luke Hughes,8482684,685,11.416667,NJD,2023-10-13,20232024,2023020017,regular-season
3,3,8476856-8482655,Matt Dumba,8476856,J.J. Moser,8482655,643,10.716667,ARI,2023-10-13,20232024,2023020017,regular-season
4,4,8478408-8479976,Juuso Valimaki,8479976,Travis Dermott,8478408,591,9.85,ARI,2023-10-13,20232024,2023020017,regular-season
5,5,8477384-8480434,Josh Brown,8477384,Sean Durzi,8480434,466,7.766667,ARI,2023-10-13,20232024,2023020017,regular-season
6,6,8476856-8478408,Matt Dumba,8476856,Travis Dermott,8478408,340,5.666667,ARI,2023-10-13,20232024,2023020017,regular-season
7,7,8479976-8480434,Juuso Valimaki,8479976,Sean Durzi,8480434,201,3.35,ARI,2023-10-13,20232024,2023020017,regular-season
8,8,8476462-8480860,Dougie Hamilton,8476462,Kevin Bahl,8480860,101,1.683333,NJD,2023-10-13,20232024,2023020017,regular-season
9,9,8477384-8482655,Josh Brown,8477384,J.J. Moser,8482655,93,1.55,ARI,2023-10-13,20232024,2023020017,regular-season


In [254]:
# For the top 10 most common defensive pairs in the game, look up how many 5v5 mins they played together through the linemate_data df. If they match exactly, our function to build defender pair toi reports works perfectly
for pair in defender_5v5['defensemen_pair_id'].tolist():
    p1 = defender_5v5[defender_5v5['defensemen_pair_id']==pair].iloc[0]['defensemen_1_name']
    p2 = defender_5v5[defender_5v5['defensemen_pair_id']==pair].iloc[0]['defensemen_2_name']
    print(p1,p2)
    if defender_5v5[defender_5v5['defensemen_pair_id']==pair].iloc[0]['team']=="NJD":
        print("Manual Calculation TOI:",calculate_defender_pair_toi(p1,p2,'home')," - Linemate Report TOI:",defender_5v5[defender_5v5['defensemen_pair_id']==pair].iloc[0]['toi_mins'])
    else:
        print("Manual Calculation TOI:",calculate_defender_pair_toi(p1,p2,'away')," - Linemate Report TOI:",defender_5v5[defender_5v5['defensemen_pair_id']==pair].iloc[0]['toi_mins'])
    print()

Dougie Hamilton Jonas Siegenthaler
Manual Calculation TOI: 14.016666666666667  - Linemate Report TOI: 14.016666666666667

John Marino Kevin Bahl
Manual Calculation TOI: 11.95  - Linemate Report TOI: 11.95

Brendan Smith Luke Hughes
Manual Calculation TOI: 11.416666666666666  - Linemate Report TOI: 11.416666666666666

Matt Dumba J.J. Moser
Manual Calculation TOI: 10.716666666666667  - Linemate Report TOI: 10.716666666666669

Juuso Valimaki Travis Dermott
Manual Calculation TOI: 9.85  - Linemate Report TOI: 9.85

Josh Brown Sean Durzi
Manual Calculation TOI: 7.766666666666667  - Linemate Report TOI: 7.766666666666667

Matt Dumba Travis Dermott
Manual Calculation TOI: 5.666666666666667  - Linemate Report TOI: 5.666666666666667

Juuso Valimaki Sean Durzi
Manual Calculation TOI: 3.35  - Linemate Report TOI: 3.35

Dougie Hamilton Kevin Bahl
Manual Calculation TOI: 1.6833333333333333  - Linemate Report TOI: 1.6833333333333331

Josh Brown J.J. Moser
Manual Calculation TOI: 1.55  - Linemate Rep

5v5 TOIs match (the only differences are in the last floating-point digit)

Test Case 4 Passed ✅
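
A note on comparing these values in code rather than by eye: the Matt Dumba / J.J. Moser pair prints 10.716666666666667 for the manual calculation and 10.716666666666669 for the report, so a strict `==` on the minute values would report a mismatch even though both represent 643 seconds. The sketch below shows two ways around that. `toi_matches` is a hypothetical helper, not part of the scraper.

```python
import math


def toi_matches(manual_mins, report_mins):
    """Compare two TOI values in whole seconds so float noise cannot cause a false mismatch."""
    return round(manual_mins * 60) == round(report_mins * 60)


# Values copied from the Matt Dumba / J.J. Moser output above
manual_toi = 10.716666666666667
report_toi = 10.716666666666669

print(manual_toi == report_toi)              # strict float equality
print(math.isclose(manual_toi, report_toi))  # tolerance-based comparison
print(toi_matches(manual_toi, report_toi))   # comparison in whole seconds
```

Comparing in whole seconds is probably the safer choice here, since the reports already carry an integer `toi_secs` column.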

#### Game 2023020017 Passed ✅

# Test Game 2 - 2023020337 - A Game Where a Goalie Gets Pulled
What we expect - A slight difference in a few players' TOIs between my data and the html shift data totals. From my understanding, the NHL counts power play time rather than man advantage time. This scraper instead counts man advantage time (and likewise man disadvantage time in place of short-handed time).

### Test Case 1 - Compare Extracted Shift Data to html Shift Data
Shifts for game 2023020337 can be found at https://www.nhl.com/scores/htmlreports/20232024/TV020337.HTM and https://www.nhl.com/scores/htmlreports/20232024/TH020337.HTM

In [170]:
# Read in my extracted shifts from this game. This is the df of extracted shifts from the bs4 function. It's what is used to build the by second df
extracted_shift_data = pd.read_csv("data/shift-data-for-testing-2023020337.csv")
extracted_shift_data

Unnamed: 0.1,Unnamed: 0,shift_number,period,shift_start_time,shift_end_time,duration,Event,player_number,last_name,first_name,shift_start_time_seconds,shift_end_time_seconds,duration_seconds,team,playerId,positionCode,firstName.default,lastName.default,position,full_name
0,0,1,1,1:35,2:14,00:39,,4,SILLINGER,COLE,95,134,39,CBJ,8482705,C,Cole,Sillinger,F,Cole Sillinger
1,1,2,1,4:55,5:36,00:41,,4,SILLINGER,COLE,295,336,41,CBJ,8482705,C,Cole,Sillinger,F,Cole Sillinger
2,2,3,1,8:59,9:56,00:57,,4,SILLINGER,COLE,539,596,57,CBJ,8482705,C,Cole,Sillinger,F,Cole Sillinger
3,3,4,1,13:35,14:26,00:51,,4,SILLINGER,COLE,815,866,51,CBJ,8482705,C,Cole,Sillinger,F,Cole Sillinger
4,4,5,1,17:00,18:05,01:05,,4,SILLINGER,COLE,1020,1085,65,CBJ,8482705,C,Cole,Sillinger,F,Cole Sillinger
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
645,321,13,3,3:56,4:13,00:17,,91,MONAHAN,SEAN,2636,2653,17,MTL,8477497,C,Sean,Monahan,F,Sean Monahan
646,322,14,3,6:57,7:53,00:56,,91,MONAHAN,SEAN,2817,2873,56,MTL,8477497,C,Sean,Monahan,F,Sean Monahan
647,323,15,3,12:47,14:01,01:14,,91,MONAHAN,SEAN,3167,3241,74,MTL,8477497,C,Sean,Monahan,F,Sean Monahan
648,324,16,3,16:44,17:42,00:58,G,91,MONAHAN,SEAN,3404,3462,58,MTL,8477497,C,Sean,Monahan,F,Sean Monahan


#### Test Case 1A - Compare Shift Counts for Every Player in Game 2023020337

In [171]:
# This is the number of shifts for each player. Go through each html doc and compare
extracted_shift_data.groupby('full_name').count()[['shift_number']]

Unnamed: 0_level_0,shift_number
full_name,Unnamed: 1_level_1
Adam Boqvist,20
Adam Fantilli,15
Alex Newhook,18
Alexandre Texier,14
Boone Jenner,18
Brendan Gallagher,17
Christian Dvorak,17
Cole Caufield,15
Cole Sillinger,19
David Jiricek,18


#### Test Case 1B - Compare shifts match for a few players on the HTML reports

In [172]:
# Cole Sillinger
extracted_shift_data[extracted_shift_data['full_name']=="Cole Sillinger"][['shift_number','period','shift_start_time','shift_end_time','duration','Event','full_name']] # Passed

Unnamed: 0,shift_number,period,shift_start_time,shift_end_time,duration,Event,full_name
0,1,1,1:35,2:14,00:39,,Cole Sillinger
1,2,1,4:55,5:36,00:41,,Cole Sillinger
2,3,1,8:59,9:56,00:57,,Cole Sillinger
3,4,1,13:35,14:26,00:51,,Cole Sillinger
4,5,1,17:00,18:05,01:05,,Cole Sillinger
5,6,2,0:00,1:23,01:23,,Cole Sillinger
6,7,2,4:02,4:08,00:06,,Cole Sillinger
7,8,2,4:49,5:52,01:03,,Cole Sillinger
8,9,2,9:47,10:40,00:53,,Cole Sillinger
9,10,2,12:09,12:53,00:44,,Cole Sillinger


In [173]:
# Yegor Chinakhov
extracted_shift_data[extracted_shift_data['full_name']=="Yegor Chinakhov"][['shift_number','period','shift_start_time','shift_end_time','duration','Event','full_name']] # Passed

Unnamed: 0,shift_number,period,shift_start_time,shift_end_time,duration,Event,full_name
284,1,1,1:37,2:14,00:37,,Yegor Chinakhov
285,2,1,4:54,5:33,00:39,,Yegor Chinakhov
286,3,1,8:30,9:39,01:09,,Yegor Chinakhov
287,4,1,13:44,14:24,00:40,,Yegor Chinakhov
288,5,1,16:54,17:59,01:05,,Yegor Chinakhov
289,6,2,0:00,1:25,01:25,,Yegor Chinakhov
290,7,2,4:55,5:54,00:59,,Yegor Chinakhov
291,8,2,9:16,10:13,00:57,G,Yegor Chinakhov
292,9,2,12:09,12:53,00:44,,Yegor Chinakhov
293,10,2,15:00,15:57,00:57,,Yegor Chinakhov


In [174]:
# Sam Montembeault
extracted_shift_data[extracted_shift_data['full_name']=="Sam Montembeault"][['shift_number','period','shift_start_time','shift_end_time','duration','Event','full_name']] # Passed

Unnamed: 0,shift_number,period,shift_start_time,shift_end_time,duration,Event,full_name
531,1,1,0:00,20:00,20:00,,Sam Montembeault
532,2,2,0:00,20:00,20:00,GP,Sam Montembeault
533,3,3,0:00,20:00,20:00,GP,Sam Montembeault


Shift counts and shift times/durations match for each tested case.

Test Case 1 Passed ✅

### Test Case 2 - Compare Each Player's All Sit, EV, PP, SH TOI
Shifts for game 2023020337 can be found at https://www.nhl.com/scores/htmlreports/20232024/TV020337.HTM and https://www.nhl.com/scores/htmlreports/20232024/TH020337.HTM

In [175]:
# Read in linemate_data 
linemate_data = pd.read_csv("data/linemate-data-for-testing-2023020337.csv")
linemate_data

Unnamed: 0.1,Unnamed: 0,home_player_1_name,home_player_1_id,home_player_1_position,home_player_2_name,home_player_2_id,home_player_2_position,home_player_3_name,home_player_3_id,home_player_3_position,...,home_skaters_on_ice,away_skaters_on_ice,strength,strength_cat,home_team,away_team,game_date,game_season,game_id,game_type
0,0,Zach Werenski,8478460,D,Adam Boqvist,8480871,D,Johnny Gaudreau,8476346,F,...,5,5,5v5,even,CBJ,MTL,2023-11-29,20232024,2023020337,regular-season
1,1,Zach Werenski,8478460,D,Adam Boqvist,8480871,D,Johnny Gaudreau,8476346,F,...,5,5,5v5,even,CBJ,MTL,2023-11-29,20232024,2023020337,regular-season
2,2,Zach Werenski,8478460,D,Adam Boqvist,8480871,D,Johnny Gaudreau,8476346,F,...,5,5,5v5,even,CBJ,MTL,2023-11-29,20232024,2023020337,regular-season
3,3,Zach Werenski,8478460,D,Adam Boqvist,8480871,D,Johnny Gaudreau,8476346,F,...,5,5,5v5,even,CBJ,MTL,2023-11-29,20232024,2023020337,regular-season
4,4,Zach Werenski,8478460,D,Adam Boqvist,8480871,D,Johnny Gaudreau,8476346,F,...,5,5,5v5,even,CBJ,MTL,2023-11-29,20232024,2023020337,regular-season
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3595,3595,Jake Bean,8479402,D,Adam Boqvist,8480871,D,Sean Kuraly,8476374,F,...,5,5,5v5,even,CBJ,MTL,2023-11-29,20232024,2023020337,regular-season
3596,3596,Jake Bean,8479402,D,Adam Boqvist,8480871,D,Sean Kuraly,8476374,F,...,5,5,5v5,even,CBJ,MTL,2023-11-29,20232024,2023020337,regular-season
3597,3597,Jake Bean,8479402,D,Adam Boqvist,8480871,D,Sean Kuraly,8476374,F,...,5,5,5v5,even,CBJ,MTL,2023-11-29,20232024,2023020337,regular-season
3598,3598,Jake Bean,8479402,D,Adam Boqvist,8480871,D,Sean Kuraly,8476374,F,...,5,5,5v5,even,CBJ,MTL,2023-11-29,20232024,2023020337,regular-season


In [185]:
# Nick Suzuki
all_toi = extract_toi("Nick Suzuki","away","all")
ev_toi = extract_toi("Nick Suzuki","away","even")
pp_toi = extract_toi("Nick Suzuki","away","away_advantage")
sh_toi = extract_toi("Nick Suzuki","away","home_advantage")
print("All:",all_toi,"EV:",ev_toi,"PP:",pp_toi,"SH:",sh_toi)
'''
This is a test case I wanted to bring up.
If you compare to the html shifts, we see that Suzuki is credited with less EV TOI and more SH TOI by the scraper than by the html shift report.
The html shift report does not count a pulled goalie as an advantage/disadvantage like the scraper does.
If we look at the html play-by-play report, we can see that at the end of the game, CBJ pulled their goalie. Suzuki was on the ice during this time, as we can see from a shot taken at 18:44 in the 3rd period on the html pbp report.
If we look into the linemate_data df, we can confirm this is true as well.
The html pbp shows this as an EV shot, which technically it's not, as CBJ has a skater advantage.

I will keep this as it is because, again, this is technically not even strength play, in my opinion. I would hear people out for changing this, but I think this is correct.
'''

All: 16.45 EV: 15.533333333333333 PP: 0.0 SH: 0.9166666666666666


"\nThis is a test case I wanted to bring up. \nIf you compare to the html shifts, we see that Fantilli is crtedited with less EV TOI and more PP TOI form the scraper compared to the html shift report\nThe html shift report does not count when a goalie is pulled as an advtange/disadvanage like the scraper does.\nIf we look at the html play py play report, we can see that at the end of the game, CBJ pulled their goalie. Suzuki was on the ice during this time, as we can see from a shot taken at 18:44 in the 3rd.\nThe html pbp shows this as an EV shot, which technially it's not as CBJ has a skater advantage\n\nI will keep this as it is because again, this is technially not even strength play, in my opinion. I would hear people out for changing this, but I think this is correct.\n"

In [178]:
# Josh Anderson
all_toi = extract_toi("Josh Anderson","away","all")
ev_toi = extract_toi("Josh Anderson","away","even")
pp_toi = extract_toi("Josh Anderson","away","away_advantage")
sh_toi = extract_toi("Josh Anderson","away","home_advantage")
print("All:",all_toi,"EV:",ev_toi,"PP:",pp_toi,"SH:",sh_toi)

All: 16.216666666666665 EV: 16.216666666666665 PP: 0.0 SH: 0.0


In [179]:
# Eric Robinson
all_toi = extract_toi("Eric Robinson","home","all")
ev_toi = extract_toi("Eric Robinson","home","even")
pp_toi = extract_toi("Eric Robinson","home","home_advantage")
sh_toi = extract_toi("Eric Robinson","home","away_advantage")
print("All:",all_toi,"EV:",ev_toi,"PP:",pp_toi,"SH:",sh_toi)

All: 13.166666666666666 EV: 13.166666666666666 PP: 0.0 SH: 0.0


In [184]:
# Adam Fantilli
all_toi = extract_toi("Adam Fantilli","home","all")
ev_toi = extract_toi("Adam Fantilli","home","even")
pp_toi = extract_toi("Adam Fantilli","home","home_advantage")
sh_toi = extract_toi("Adam Fantilli","home","away_advantage")
print("All:",all_toi,"EV:",ev_toi,"PP:",pp_toi,"SH:",sh_toi)
'''
Same case as Suzuki
'''

All: 12.5 EV: 11.116666666666667 PP: 1.3833333333333333 SH: 0.0



In [183]:
# Elvis Merzlikins
all_toi = extract_toi("Elvis Merzlikins","home","all")
ev_toi = extract_toi("Elvis Merzlikins","home","even")
pp_toi = extract_toi("Elvis Merzlikins","home","home_advantage")
sh_toi = extract_toi("Elvis Merzlikins","home","away_advantage")
print("All:",all_toi,"EV:",ev_toi,"PP:",pp_toi,"SH:",sh_toi)

All: 58.93333333333333 EV: 56.93333333333333 PP: 2.0 SH: 0.0


Player TOIs match the html reports, and the only differences appear where we expect them (while the goalie is pulled).

Test Case 2 Passed ✅
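
The html reports list TOI as minutes and seconds, while `extract_toi` prints decimal minutes, so a small conversion helper makes the comparison easier. This is a minimal sketch; `minutes_to_clock` is a hypothetical name and is not part of the scraper.

```python
def minutes_to_clock(toi_minutes):
    """Convert decimal minutes (as printed by extract_toi) to an m:ss string."""
    # Round to the nearest whole second to absorb floating-point noise
    total_seconds = round(toi_minutes * 60)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"

# Values taken from the Suzuki output above
suzuki_all = minutes_to_clock(16.45)
suzuki_sh = minutes_to_clock(0.9166666666666666)
print(suzuki_all, suzuki_sh)
```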

### Test Case 3 - Compare html PBP On-Ice for Events to linemate_data Data Frame
The linemate_data df should show the same players on the ice at each event. PBP data can be found at https://www.nhl.com/scores/htmlreports/20232024/PL020337.HTM

In [190]:
# Hit at 3 minutes of the 1st period
linemate_data[linemate_data['second']==180][['home_player_1_name','home_player_2_name','home_player_3_name','home_player_4_name','home_player_5_name','home_player_6_name',
                                             'away_player_1_name','away_player_2_name','away_player_3_name','away_player_4_name','away_player_5_name','away_player_6_name']] # Exact match

Unnamed: 0,home_player_1_name,home_player_2_name,home_player_3_name,home_player_4_name,home_player_5_name,home_player_6_name,away_player_1_name,away_player_2_name,away_player_3_name,away_player_4_name,away_player_5_name,away_player_6_name
179,Zach Werenski,Adam Boqvist,Sean Kuraly,Justin Danforth,Eric Robinson,Elvis Merzlikins,Mike Matheson,Kaiden Guhle,Jesse Ylönen,Tanner Pearson,Jake Evans,Sam Montembeault


In [192]:
# Shot at 2:44 of the 2nd period
linemate_data[linemate_data['second']==1364][['home_player_1_name','home_player_2_name','home_player_3_name','home_player_4_name','home_player_5_name','home_player_6_name',
                                             'away_player_1_name','away_player_2_name','away_player_3_name','away_player_4_name','away_player_5_name','away_player_6_name']] # Exact match

Unnamed: 0,home_player_1_name,home_player_2_name,home_player_3_name,home_player_4_name,home_player_5_name,home_player_6_name,away_player_1_name,away_player_2_name,away_player_3_name,away_player_4_name,away_player_5_name,away_player_6_name
1363,Ivan Provorov,David Jiricek,Sean Kuraly,Justin Danforth,Eric Robinson,Elvis Merzlikins,Johnathan Kovacevic,Jayden Struble,Juraj Slafkovsky,Cole Caufield,Christian Dvorak,Sam Montembeault


In [193]:
# Shot at 18:44 of the 3rd period
linemate_data[linemate_data['second']==3524][['home_player_1_name','home_player_2_name','home_player_3_name','home_player_4_name','home_player_5_name','home_player_6_name',
                                             'away_player_1_name','away_player_2_name','away_player_3_name','away_player_4_name','away_player_5_name','away_player_6_name']] # Exact match

Unnamed: 0,home_player_1_name,home_player_2_name,home_player_3_name,home_player_4_name,home_player_5_name,home_player_6_name,away_player_1_name,away_player_2_name,away_player_3_name,away_player_4_name,away_player_5_name,away_player_6_name
3523,Zach Werenski,Adam Fantilli,Johnny Gaudreau,Patrik Laine,Boone Jenner,Kirill Marchenko,Kaiden Guhle,Justin Barron,Nick Suzuki,Christian Dvorak,Joel Armia,Sam Montembeault
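
The `second` values used in the lookups above come from converting the period and elapsed clock time on the html pbp report into elapsed game seconds. A minimal sketch of that arithmetic, assuming 20-minute periods and an elapsed (not remaining) clock; `game_second` is a hypothetical helper name:

```python
def game_second(period, clock):
    """Convert a period number and an elapsed 'm:ss' clock into elapsed game seconds.

    Assumes every earlier period lasted 20 minutes (1200 seconds).
    """
    minutes, seconds = (int(part) for part in clock.split(":"))
    return (period - 1) * 1200 + minutes * 60 + seconds

# The three events checked above
for period, clock in [(1, "3:00"), (2, "2:44"), (3, "18:44")]:
    print(period, clock, game_second(period, clock))
```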


### Test Case 4 - Ensure Forward Line and Defensive Pair 5v5 TOI is Correct
Manually loop through the linemate_data df to confirm that the functions that calculate forward line and defensive pair 5v5 TOI work.
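
For intuition, a manual check of this kind boils down to counting the game seconds in which all members of a line are on the ice together at 5v5. The sketch below is not the notebook's `calculate_forward_line_toi`. It uses made-up rows, and the `strength` and `home_players` keys are hypothetical simplifications of the real one-column-per-player layout.

```python
def line_toi_minutes(rows, players, side):
    """Count the 5v5 seconds in which every player in `players` is on the ice for `side`."""
    wanted = set(players)
    seconds_together = sum(
        1 for row in rows
        if row["strength"] == "5v5" and wanted <= set(row[f"{side}_players"])
    )
    return seconds_together / 60

# Made-up rows standing in for linemate_data: one dict per game second
line = ["Johnny Gaudreau", "Boone Jenner", "Kirill Marchenko"]
rows = (
    [{"strength": "5v5", "home_players": line + ["Zach Werenski", "Adam Boqvist"]}] * 90
    + [{"strength": "5v4", "home_players": line + ["Zach Werenski", "Patrik Laine"]}] * 30
    + [{"strength": "5v5", "home_players": ["Sean Kuraly", "Justin Danforth", "Eric Robinson"]}] * 60
)
toi = line_toi_minutes(rows, line, "home")
print(toi)
```

Only the 90 seconds where the full line is out at 5v5 count. The 30 power-play seconds and the 60 seconds played by a different line are ignored.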

In [195]:
# Read in forward 5v5 report DF
forward_5v5 = pd.read_csv("data/5v5-forward-report-for-testing-2023020337.csv").head(10)
forward_5v5

Unnamed: 0.1,Unnamed: 0,forward_line_id,forward_1_name,forward_1_id,forward_2_name,forward_2_id,forward_3_name,forward_3_id,toi_secs,toi_mins,team,date,season,game_id,game_type
0,0,8476346-8476432-8480893,Johnny Gaudreau,8476346,Boone Jenner,8476432,Kirill Marchenko,8480893,831,13.85,CBJ,2023-11-29,20232024,2023020337,regular-season
1,1,8476374-8479941-8480762,Sean Kuraly,8476374,Justin Danforth,8479941,Eric Robinson,8480762,747,12.45,CBJ,2023-11-29,20232024,2023020337,regular-season
2,2,8475848-8480018-8481618,Brendan Gallagher,8475848,Nick Suzuki,8480018,Alex Newhook,8481618,738,12.3,MTL,2023-11-29,20232024,2023020337,regular-season
3,3,8477989-8481540-8483515,Juraj Slafkovsky,8483515,Cole Caufield,8481540,Christian Dvorak,8477989,710,11.833333,MTL,2023-11-29,20232024,2023020337,regular-season
4,4,8481716-8482475-8482705,Cole Sillinger,8482705,Dmitri Voronkov,8481716,Yegor Chinakhov,8482475,679,11.316667,CBJ,2023-11-29,20232024,2023020337,regular-season
5,5,8476469-8476981-8477497,Josh Anderson,8476981,Joel Armia,8476469,Sean Monahan,8477497,647,10.783333,MTL,2023-11-29,20232024,2023020337,regular-season
6,6,8479339-8480074-8484166,Adam Fantilli,8484166,Patrik Laine,8479339,Alexandre Texier,8480074,464,7.733333,CBJ,2023-11-29,20232024,2023020337,regular-season
7,7,8476871-8478133-8481058,Jesse Ylönen,8481058,Tanner Pearson,8476871,Jake Evans,8478133,457,7.616667,MTL,2023-11-29,20232024,2023020337,regular-season
8,8,8476981-8477497-8478133,Josh Anderson,8476981,Jake Evans,8478133,Sean Monahan,8477497,149,2.483333,MTL,2023-11-29,20232024,2023020337,regular-season
9,9,8476346-8476374-8480893,Sean Kuraly,8476374,Johnny Gaudreau,8476346,Kirill Marchenko,8480893,71,1.183333,CBJ,2023-11-29,20232024,2023020337,regular-season


In [196]:
# For the top 10 most common forward lines in the game, look up how many 5v5 minutes they played together in the linemate_data df. If the values match, our function to build the forward line TOI report works
for _, row in forward_5v5.iterrows():
    p1, p2, p3 = row['forward_1_name'], row['forward_2_name'], row['forward_3_name']
    print(p1,p2,p3)
    # CBJ is the home team in this game, so any other team is the away team
    side = 'home' if row['team']=="CBJ" else 'away'
    print("Manual Calculation TOI:",calculate_forward_line_toi(p1,p2,p3,side)," - Linemate Report TOI:",row['toi_mins'])
    print()
# Looks good

Johnny Gaudreau Boone Jenner Kirill Marchenko
Manual Calculation TOI: 13.85  - Linemate Report TOI: 13.85

Sean Kuraly Justin Danforth Eric Robinson
Manual Calculation TOI: 12.45  - Linemate Report TOI: 12.45

Brendan Gallagher Nick Suzuki Alex Newhook
Manual Calculation TOI: 12.3  - Linemate Report TOI: 12.3

Juraj Slafkovsky Cole Caufield Christian Dvorak
Manual Calculation TOI: 11.833333333333334  - Linemate Report TOI: 11.833333333333334

Cole Sillinger Dmitri Voronkov Yegor Chinakhov
Manual Calculation TOI: 11.316666666666666  - Linemate Report TOI: 11.316666666666666

Josh Anderson Joel Armia Sean Monahan
Manual Calculation TOI: 10.783333333333333  - Linemate Report TOI: 10.783333333333331

Adam Fantilli Patrik Laine Alexandre Texier
Manual Calculation TOI: 7.733333333333333  - Linemate Report TOI: 7.733333333333333

Jesse Ylönen Tanner Pearson Jake Evans
Manual Calculation TOI: 7.616666666666666  - Linemate Report TOI: 7.616666666666666

Josh Anderson Jake Evans Sean Monahan
Man

In [201]:
# Read in defender report
defender_5v5 = pd.read_csv("data/5v5-defender-report-for-testing-2023020337.csv").head(10)
defender_5v5

Unnamed: 0.1,Unnamed: 0,defensemen_pair_id,defensemen_1_name,defensemen_1_id,defensemen_2_name,defensemen_2_id,toi_secs,toi_mins,team,date,season,game_id,game_type
0,0,8482087-8482111,Kaiden Guhle,8482087,Justin Barron,8482111,1125,18.75,MTL,2023-11-29,20232024,2023020337,regular-season
1,1,8478460-8480871,Zach Werenski,8478460,Adam Boqvist,8480871,1000,16.666667,CBJ,2023-11-29,20232024,2023020337,regular-season
2,2,8475790-8479402,Jake Bean,8479402,Erik Gudbranson,8475790,947,15.783333,CBJ,2023-11-29,20232024,2023020337,regular-season
3,3,8476875-8480184,Mike Matheson,8476875,Gustav Lindström,8480184,936,15.6,MTL,2023-11-29,20232024,2023020337,regular-season
4,4,8478500-8483460,Ivan Provorov,8478500,David Jiricek,8483460,918,15.3,CBJ,2023-11-29,20232024,2023020337,regular-season
5,5,8480192-8481593,Johnathan Kovacevic,8480192,Jayden Struble,8481593,694,11.566667,MTL,2023-11-29,20232024,2023020337,regular-season
6,6,8476875-8481593,Mike Matheson,8476875,Jayden Struble,8481593,109,1.816667,MTL,2023-11-29,20232024,2023020337,regular-season
7,7,8475790-8478500,Ivan Provorov,8478500,Erik Gudbranson,8475790,108,1.8,CBJ,2023-11-29,20232024,2023020337,regular-season
8,8,8475790-8478460,Zach Werenski,8478460,Erik Gudbranson,8475790,108,1.8,CBJ,2023-11-29,20232024,2023020337,regular-season
9,9,8476875-8480192,Mike Matheson,8476875,Johnathan Kovacevic,8480192,92,1.533333,MTL,2023-11-29,20232024,2023020337,regular-season


In [202]:
# For the top 10 most common defensive pairs in the game, look up how many 5v5 minutes they played together in the linemate_data df. If the values match, our function to build the defensive pair TOI report works
for _, row in defender_5v5.iterrows():
    p1, p2 = row['defensemen_1_name'], row['defensemen_2_name']
    print(p1,p2)
    # CBJ is the home team in this game, so any other team is the away team
    side = 'home' if row['team']=="CBJ" else 'away'
    print("Manual Calculation TOI:",calculate_defender_pair_toi(p1,p2,side)," - Linemate Report TOI:",row['toi_mins'])
    print()

Kaiden Guhle Justin Barron
Manual Calculation TOI: 18.75  - Linemate Report TOI: 18.75

Zach Werenski Adam Boqvist
Manual Calculation TOI: 16.666666666666668  - Linemate Report TOI: 16.666666666666668

Jake Bean Erik Gudbranson
Manual Calculation TOI: 15.783333333333333  - Linemate Report TOI: 15.783333333333331

Mike Matheson Gustav Lindström
Manual Calculation TOI: 15.6  - Linemate Report TOI: 15.6

Ivan Provorov David Jiricek
Manual Calculation TOI: 15.3  - Linemate Report TOI: 15.3

Johnathan Kovacevic Jayden Struble
Manual Calculation TOI: 11.566666666666666  - Linemate Report TOI: 11.566666666666666

Mike Matheson Jayden Struble
Manual Calculation TOI: 1.8166666666666667  - Linemate Report TOI: 1.8166666666666669

Ivan Provorov Erik Gudbranson
Manual Calculation TOI: 1.8  - Linemate Report TOI: 1.8

Zach Werenski Erik Gudbranson
Manual Calculation TOI: 1.8  - Linemate Report TOI: 1.8

Mike Matheson Johnathan Kovacevic
Manual Calculation TOI: 1.5333333333333334  - Linemate Report 

5v5 TOIs match (any difference is floating-point rounding in the last digit)

Test Case 4 Passed ✅
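
A few of the printed pairs above differ in the last digit (for example `10.783333333333333` vs `10.783333333333331`). That is floating-point rounding, not a real TOI difference. If this comparison were automated, a tolerance-based check would be safer than `==`. A minimal sketch using the two values printed for the Anderson / Armia / Monahan line:

```python
import math

manual_toi = 10.783333333333333   # printed by the manual calculation
report_toi = 10.783333333333331   # printed from the linemate report

# Exact equality is too strict for values that went through different arithmetic
exactly_equal = manual_toi == report_toi
# A relative tolerance treats last-digit noise as a match
close_enough = math.isclose(manual_toi, report_toi, rel_tol=1e-9)
print(exactly_equal, close_enough)
```

Comparing the integer `toi_secs` column instead would avoid the issue entirely, since whole seconds have no rounding error.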

#### Game 2023020337 Passed ✅

# Test Game 3 - 2023020028 - A Game With a Too Many Men Penalty
What we expect - This is another slightly odd case that the scraper will encounter, so including it in the testing notebook made sense. I expect the same behavior as with a goalie pull: there will be a small TOI discrepancy between the html shifts file and what the scraper reports.
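
One way to locate the affected seconds is to look for rows where a team has an extra player slot filled. The sketch below assumes that the `home_player_7_name` column in linemate_data is only populated during a too-many-men situation. That is an assumption about the data, not something confirmed here, and `too_many_men_seconds` is a hypothetical helper operating on made-up rows.

```python
def too_many_men_seconds(rows, side):
    """Return the game seconds in which `side` has a name in its 7th player slot."""
    slot = f"{side}_player_7_name"
    return [row["second"] for row in rows if row.get(slot)]

# Made-up rows standing in for linemate_data (None stands in for an empty cell)
rows = [
    {"second": 100, "home_player_7_name": None},
    {"second": 101, "home_player_7_name": "Example Skater"},
    {"second": 102, "home_player_7_name": "Example Skater"},
    {"second": 103, "home_player_7_name": None},
]
flagged = too_many_men_seconds(rows, "home")
print(flagged)
```

With a real DataFrame, empty cells are `NaN` rather than `None`, so the filter would need `notna()` instead of a truthiness check.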

### Test Case 1 - Compare Extracted Shift Data to html Shift Data
Shifts for game 2023020028 can be found at https://www.nhl.com/scores/htmlreports/20232024/TV020028.HTM and https://www.nhl.com/scores/htmlreports/20232024/TH020028.HTM

In [203]:
# Read in my extracted shifts from this game. This is the df of extracted shifts from the bs4 function. It's what is used to build the by second df
extracted_shift_data = pd.read_csv("data/shift-data-for-testing-2023020028.csv")
extracted_shift_data

Unnamed: 0.1,Unnamed: 0,shift_number,period,shift_start_time,shift_end_time,duration,Event,player_number,last_name,first_name,shift_start_time_seconds,shift_end_time_seconds,duration_seconds,team,playerId,positionCode,firstName.default,lastName.default,position,full_name
0,0,1,1,0:00,0:34,00:34,,4,LEDDY,NICK,0,34,34,STL,8475181,D,Nick,Leddy,D,Nick Leddy
1,1,2,1,2:14,2:57,00:43,,4,LEDDY,NICK,134,177,43,STL,8475181,D,Nick,Leddy,D,Nick Leddy
2,2,3,1,3:27,4:17,00:50,,4,LEDDY,NICK,207,257,50,STL,8475181,D,Nick,Leddy,D,Nick Leddy
3,3,4,1,6:24,6:55,00:31,P,4,LEDDY,NICK,384,415,31,STL,8475181,D,Nick,Leddy,D,Nick Leddy
4,4,5,1,9:18,10:07,00:49,,4,LEDDY,NICK,558,607,49,STL,8475181,D,Nick,Leddy,D,Nick Leddy
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
763,393,16,3,13:06,13:38,00:32,,95,BURAKOVSKY,ANDRE,3186,3218,32,SEA,8477444,L,Andre,Burakovsky,F,Andre Burakovsky
764,394,17,3,14:36,15:35,00:59,,95,BURAKOVSKY,ANDRE,3276,3335,59,SEA,8477444,L,Andre,Burakovsky,F,Andre Burakovsky
765,395,18,3,18:03,18:39,00:36,,95,BURAKOVSKY,ANDRE,3483,3519,36,SEA,8477444,L,Andre,Burakovsky,F,Andre Burakovsky
766,396,19,4,0:55,2:00,01:05,,95,BURAKOVSKY,ANDRE,3655,3720,65,SEA,8477444,L,Andre,Burakovsky,F,Andre Burakovsky
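
Before comparing against the html reports by hand, a cheap internal consistency check can catch parsing slips: each shift's end time minus its start time should equal its duration. This is a minimal sketch on three rows copied from the output above. `clock_to_seconds` and `shift_is_consistent` are hypothetical helpers, not part of the scraper.

```python
def clock_to_seconds(clock):
    """Convert an 'm:ss' or 'mm:ss' string to seconds."""
    minutes, seconds = (int(part) for part in clock.split(":"))
    return minutes * 60 + seconds

def shift_is_consistent(shift):
    """Check that start, end, and both duration fields agree for one shift row."""
    elapsed = clock_to_seconds(shift["shift_end_time"]) - clock_to_seconds(shift["shift_start_time"])
    return elapsed == clock_to_seconds(shift["duration"]) == shift["duration_seconds"]

# Rows copied from the extracted shift data shown above
shifts = [
    {"shift_start_time": "0:00", "shift_end_time": "0:34", "duration": "00:34", "duration_seconds": 34},
    {"shift_start_time": "2:14", "shift_end_time": "2:57", "duration": "00:43", "duration_seconds": 43},
    {"shift_start_time": "0:55", "shift_end_time": "2:00", "duration": "01:05", "duration_seconds": 65},
]
inconsistent = [shift for shift in shifts if not shift_is_consistent(shift)]
print(len(inconsistent))
```

This does not replace the comparison against the html reports. It only confirms that the extracted columns agree with each other.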


#### Test Case 1A - Compare Shift Counts for Every Player in Game 2023020028

In [204]:
# This is the number of shifts for each player. Go through each html doc and compare
extracted_shift_data.groupby('full_name').count()[['shift_number']]

Unnamed: 0_level_0,shift_number
full_name,Unnamed: 1_level_1
Adam Larsson,30
Alex Wennberg,25
Alexey Toropchenko,19
Andre Burakovsky,20
Brandon Saad,23
Brayden Schenn,20
Brian Dumoulin,23
Colton Parayko,30
Eeli Tolvanen,16
Jaden Schwartz,22


#### Test Case 1B - Confirm Shifts Match the HTML Reports for a Few Players

In [208]:
# Torey Krug
extracted_shift_data[extracted_shift_data['full_name']=="Torey Krug"][['shift_number','period','shift_start_time','shift_end_time','duration','Event','full_name']] # Passed

Unnamed: 0,shift_number,period,shift_start_time,shift_end_time,duration,Event,full_name
207,1,1,0:34,1:19,00:45,,Torey Krug
208,2,1,2:57,3:27,00:30,,Torey Krug
209,3,1,5:02,6:23,01:21,,Torey Krug
210,4,1,6:55,8:17,01:22,,Torey Krug
211,5,1,10:59,11:47,00:48,,Torey Krug
212,6,1,12:36,13:06,00:30,,Torey Krug
213,7,1,14:52,15:56,01:04,,Torey Krug
214,8,1,16:28,17:24,00:56,,Torey Krug
215,9,1,19:16,19:40,00:24,,Torey Krug
216,10,2,0:40,1:20,00:40,,Torey Krug


In [209]:
# Tye Kartye
extracted_shift_data[extracted_shift_data['full_name']=="Tye Kartye"][['shift_number','period','shift_start_time','shift_end_time','duration','Event','full_name']] # Passed

Unnamed: 0,shift_number,period,shift_start_time,shift_end_time,duration,Event,full_name
723,1,1,2:07,2:57,00:50,,Tye Kartye
724,2,1,5:05,6:10,01:05,,Tye Kartye
725,3,1,11:55,12:11,00:16,,Tye Kartye
726,4,1,13:58,14:55,00:57,,Tye Kartye
727,5,2,2:46,3:15,00:29,,Tye Kartye
728,6,2,6:18,7:17,00:59,,Tye Kartye
729,7,2,13:29,14:09,00:40,,Tye Kartye
730,8,3,2:37,3:37,01:00,,Tye Kartye
731,9,3,12:12,12:25,00:13,,Tye Kartye
732,10,3,14:11,14:30,00:19,,Tye Kartye


In [210]:
# Joey Daccord
extracted_shift_data[extracted_shift_data['full_name']=="Joey Daccord"][['shift_number','period','shift_start_time','shift_end_time','duration','Event','full_name']] # Passed

Unnamed: 0,shift_number,period,shift_start_time,shift_end_time,duration,Event,full_name
680,1,1,0:00,20:00,20:00,P,Joey Daccord
681,2,2,0:00,20:00,20:00,GP,Joey Daccord
682,3,3,0:00,20:00,20:00,P,Joey Daccord
683,4,4,0:00,5:00,05:00,,Joey Daccord


Shift counts and shift times/durations match for each tested case.

Test Case 1 Passed ✅

### Test Case 2 - Compare Each Player's All Sit, EV, PP, SH TOI
Shifts for game 2023020028 can be found at https://www.nhl.com/scores/htmlreports/20232024/TV020028.HTM and https://www.nhl.com/scores/htmlreports/20232024/TH020028.HTM

In [212]:
# Read in linemate_data 
linemate_data = pd.read_csv("data/linemate-data-for-testing-2023020028.csv")
linemate_data

Unnamed: 0.1,Unnamed: 0,home_player_1_name,home_player_1_id,home_player_1_position,home_player_2_name,home_player_2_id,home_player_2_position,home_player_3_name,home_player_3_id,home_player_3_position,...,strength_cat,home_player_7_name,home_player_7_id,home_player_7_position,home_team,away_team,game_date,game_season,game_id,game_type
0,0,Nick Leddy,8475181,D,Colton Parayko,8476892,D,Robert Thomas,8480023,F,...,even,,,,STL,SEA,2023-10-14,20232024,2023020028,regular-season
1,1,Nick Leddy,8475181,D,Colton Parayko,8476892,D,Robert Thomas,8480023,F,...,even,,,,STL,SEA,2023-10-14,20232024,2023020028,regular-season
2,2,Nick Leddy,8475181,D,Colton Parayko,8476892,D,Robert Thomas,8480023,F,...,even,,,,STL,SEA,2023-10-14,20232024,2023020028,regular-season
3,3,Nick Leddy,8475181,D,Colton Parayko,8476892,D,Robert Thomas,8480023,F,...,even,,,,STL,SEA,2023-10-14,20232024,2023020028,regular-season
4,4,Nick Leddy,8475181,D,Colton Parayko,8476892,D,Robert Thomas,8480023,F,...,even,,,,STL,SEA,2023-10-14,20232024,2023020028,regular-season
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3895,3895,Torey Krug,8476792,D,Robert Thomas,8480023,F,Jordan Kyrou,8479385,F,...,even,,,,STL,SEA,2023-10-14,20232024,2023020028,regular-season
3896,3896,Torey Krug,8476792,D,Robert Thomas,8480023,F,Jordan Kyrou,8479385,F,...,even,,,,STL,SEA,2023-10-14,20232024,2023020028,regular-season
3897,3897,Torey Krug,8476792,D,Robert Thomas,8480023,F,Jordan Kyrou,8479385,F,...,even,,,,STL,SEA,2023-10-14,20232024,2023020028,regular-season
3898,3898,Torey Krug,8476792,D,Robert Thomas,8480023,F,Jordan Kyrou,8479385,F,...,even,,,,STL,SEA,2023-10-14,20232024,2023020028,regular-season


In [214]:
# Eeli Tolvanen
all_toi = extract_toi("Eeli Tolvanen","away","all")
ev_toi = extract_toi("Eeli Tolvanen","away","even")
pp_toi = extract_toi("Eeli Tolvanen","away","away_advantage")
sh_toi = extract_toi("Eeli Tolvanen","away","home_advantage")
print("All:",all_toi,"EV:",ev_toi,"PP:",pp_toi,"SH:",sh_toi)

All: 13.216666666666667 EV: 12.55 PP: 0.6666666666666666 SH: 0.0


In [215]:
# Vince Dunn
all_toi = extract_toi("Vince Dunn","away","all")
ev_toi = extract_toi("Vince Dunn","away","even")
pp_toi = extract_toi("Vince Dunn","away","away_advantage")
sh_toi = extract_toi("Vince Dunn","away","home_advantage")
print("All:",all_toi,"EV:",ev_toi,"PP:",pp_toi,"SH:",sh_toi)

All: 24.166666666666668 EV: 23.25 PP: 0.9166666666666666 SH: 0.0


In [217]:
# Adam Larsson
all_toi = extract_toi("Adam Larsson","away","all")
ev_toi = extract_toi("Adam Larsson","away","even")
pp_toi = extract_toi("Adam Larsson","away","away_advantage")
sh_toi = extract_toi("Adam Larsson","away","home_advantage")
print("All:",all_toi,"EV:",ev_toi,"PP:",pp_toi,"SH:",sh_toi)
'''
This is the case I really wanted to test for this game.
Larsson has 4:46 of SH TOI on the html shifts report
but over 5 minutes here, and that's because he was on the ice while
STL had too many men on the ice for 6 seconds. The difference
is exactly what we expect

I will keep this in, and it is not a defect, just like when a goalie is pulled
'''

All: 29.866666666666667 EV: 24.833333333333332 PP: 0.0 SH: 5.033333333333333



In [220]:
# Kasperi Kapanen
all_toi = extract_toi("Kasperi Kapanen","home","all")
ev_toi = extract_toi("Kasperi Kapanen","home","even")
pp_toi = extract_toi("Kasperi Kapanen","home","home_advantage")
sh_toi = extract_toi("Kasperi Kapanen","home","away_advantage")
print("All:",all_toi,"EV:",ev_toi,"PP:",pp_toi,"SH:",sh_toi)

All: 17.883333333333333 EV: 14.583333333333334 PP: 2.15 SH: 1.15


In [223]:
# Pavel Buchnevich
all_toi = extract_toi("Pavel Buchnevich","home","all")
ev_toi = extract_toi("Pavel Buchnevich","home","even")
pp_toi = extract_toi("Pavel Buchnevich","home","home_advantage")
sh_toi = extract_toi("Pavel Buchnevich","home","away_advantage")
print("All:",all_toi,"EV:",ev_toi,"PP:",pp_toi,"SH:",sh_toi) # off by 1 second

All: 5.45 EV: 3.5166666666666666 PP: 1.9333333333333333 SH: 0.0


Player TOIs match the html reports, and the only differences appear where we expect them (while a team has too many men on the ice).

Test Case 2 Passed ✅

### Test Case 3 - Compare html PBP On-Ice for Events to linemate_data Data Frame
The linemate_data df should show the same players on the ice at each event. PBP data can be found at https://www.nhl.com/scores/htmlreports/20232024/PL020028.HTM

In [226]:
# Hit at 1:03 of the 1st period
linemate_data[linemate_data['second']==63][['home_player_1_name','home_player_2_name','home_player_3_name','home_player_4_name','home_player_5_name','home_player_6_name',
                                             'away_player_1_name','away_player_2_name','away_player_3_name','away_player_4_name','away_player_5_name','away_player_6_name']] # Exact match

Unnamed: 0,home_player_1_name,home_player_2_name,home_player_3_name,home_player_4_name,home_player_5_name,home_player_6_name,away_player_1_name,away_player_2_name,away_player_3_name,away_player_4_name,away_player_5_name,away_player_6_name
62,Torey Krug,Justin Faulk,Brayden Schenn,Brandon Saad,Kasperi Kapanen,Jordan Binnington,Will Borgen,Jamie Oleksiak,Jaden Schwartz,Alex Wennberg,Andre Burakovsky,Joey Daccord


In [227]:
# Goal at :40 of the 2nd period
linemate_data[linemate_data['second']==1240][['home_player_1_name','home_player_2_name','home_player_3_name','home_player_4_name','home_player_5_name','home_player_6_name',
                                             'away_player_1_name','away_player_2_name','away_player_3_name','away_player_4_name','away_player_5_name','away_player_6_name']] # Exact match

Unnamed: 0,home_player_1_name,home_player_2_name,home_player_3_name,home_player_4_name,home_player_5_name,home_player_6_name,away_player_1_name,away_player_2_name,away_player_3_name,away_player_4_name,away_player_5_name,away_player_6_name
1239,Nick Leddy,Colton Parayko,Kasperi Kapanen,Oskar Sundqvist,Jordan Binnington,,Vince Dunn,Jaden Schwartz,Eeli Tolvanen,Oliver Bjorkstrand,Kailer Yamamoto,Joey Daccord


In [228]:
# Shot at :26 of the 3rd period
linemate_data[linemate_data['second']==2426][['home_player_1_name','home_player_2_name','home_player_3_name','home_player_4_name','home_player_5_name','home_player_6_name',
                                             'away_player_1_name','away_player_2_name','away_player_3_name','away_player_4_name','away_player_5_name','away_player_6_name']] # Exact match

Unnamed: 0,home_player_1_name,home_player_2_name,home_player_3_name,home_player_4_name,home_player_5_name,home_player_6_name,away_player_1_name,away_player_2_name,away_player_3_name,away_player_4_name,away_player_5_name,away_player_6_name
2425,Nick Leddy,Colton Parayko,Robert Thomas,Brandon Saad,Jordan Kyrou,Jordan Binnington,Adam Larsson,Vince Dunn,Jordan Eberle,Matty Beniers,Jared McCann,Joey Daccord


On-Ice players match exactly for test cases

Test Case 3 Passed ✅

### Test Case 4 - Ensure Forward Line and Defensive Pair 5v5 TOI is Correct
Manually loop through the linemate_data df to confirm that the functions that calculate forward line and defensive pair 5v5 TOI work.

In [230]:
# Read in forward 5v5 report DF
forward_5v5 = pd.read_csv("data/5v5-forward-report-for-testing-2023020028.csv").head(10)
forward_5v5

Unnamed: 0.1,Unnamed: 0,forward_line_id,forward_1_name,forward_1_id,forward_2_name,forward_2_id,forward_3_name,forward_3_id,toi_secs,toi_mins,team,date,season,game_id,game_type
0,0,8474586-8477955-8482665,Jordan Eberle,8474586,Matty Beniers,8482665,Jared McCann,8477955.0,708,11.8,SEA,2023-10-14,20232024,2023020028,regular-season
1,1,8475768-8477444-8477505,Jaden Schwartz,8475768,Alex Wennberg,8477505,Andre Burakovsky,8477444.0,703,11.716667,SEA,2023-10-14,20232024,2023020028,regular-season
2,2,8476438-8479385-8480023,Robert Thomas,8480023,Brandon Saad,8476438,Jordan Kyrou,8479385.0,589,9.816667,STL,2023-10-14,20232024,2023020028,regular-season
3,3,8476826-8477416-8480009,Eeli Tolvanen,8480009,Oliver Bjorkstrand,8477416,Yanni Gourde,8476826.0,584,9.733333,SEA,2023-10-14,20232024,2023020028,regular-season
4,4,8476897-8480281-8482089,Alexey Toropchenko,8480281,Jake Neighbours,8482089,Oskar Sundqvist,8476897.0,481,8.016667,STL,2023-10-14,20232024,2023020028,regular-season
5,5,8475763-8477944-8478104,Kevin Hayes,8475763,Jakub Vrana,8477944,Sammy Blais,8478104.0,445,7.416667,STL,2023-10-14,20232024,2023020028,regular-season
6,6,8477930-8479977-8481789,Pierre-Edouard Bellemare,8477930,Tye Kartye,8481789,Kailer Yamamoto,8479977.0,386,6.433333,SEA,2023-10-14,20232024,2023020028,regular-season
7,7,8475170-8477953-8480281,Brayden Schenn,8475170,Alexey Toropchenko,8480281,Kasperi Kapanen,8477953.0,245,4.083333,STL,2023-10-14,20232024,2023020028,regular-season
8,8,8477402-8479385-8480023,Robert Thomas,8480023,Jordan Kyrou,8479385,Pavel Buchnevich,8477402.0,198,3.3,STL,2023-10-14,20232024,2023020028,regular-season
9,9,8475170-8476438-8477953,Brayden Schenn,8475170,Brandon Saad,8476438,Kasperi Kapanen,8477953.0,184,3.066667,STL,2023-10-14,20232024,2023020028,regular-season


In [231]:
# For the top 10 most common forward lines in the game, look up how many 5v5 minutes they played together in the linemate_data df. If the values match, our function to build the forward line TOI report works
for _, row in forward_5v5.iterrows():
    p1, p2, p3 = row['forward_1_name'], row['forward_2_name'], row['forward_3_name']
    print(p1,p2,p3)
    # STL is the home team in this game, so any other team is the away team
    side = 'home' if row['team']=="STL" else 'away'
    print("Manual Calculation TOI:",calculate_forward_line_toi(p1,p2,p3,side)," - Linemate Report TOI:",row['toi_mins'])
    print()
# Looks good. Let's do defenders now

Jordan Eberle Matty Beniers Jared McCann
Manual Calculation TOI: 11.8  - Linemate Report TOI: 11.8

Jaden Schwartz Alex Wennberg Andre Burakovsky
Manual Calculation TOI: 11.716666666666667  - Linemate Report TOI: 11.716666666666669

Robert Thomas Brandon Saad Jordan Kyrou
Manual Calculation TOI: 9.816666666666666  - Linemate Report TOI: 9.816666666666666

Eeli Tolvanen Oliver Bjorkstrand Yanni Gourde
Manual Calculation TOI: 9.733333333333333  - Linemate Report TOI: 9.733333333333333

Alexey Toropchenko Jake Neighbours Oskar Sundqvist
Manual Calculation TOI: 8.016666666666667  - Linemate Report TOI: 8.016666666666667

Kevin Hayes Jakub Vrana Sammy Blais
Manual Calculation TOI: 7.416666666666667  - Linemate Report TOI: 7.416666666666667

Pierre-Edouard Bellemare Tye Kartye Kailer Yamamoto
Manual Calculation TOI: 6.433333333333334  - Linemate Report TOI: 6.433333333333334

Brayden Schenn Alexey Toropchenko Kasperi Kapanen
Manual Calculation TOI: 4.083333333333333  - Linemate Report TOI: 4

In [234]:
# Read in defender report
defender_5v5 = pd.read_csv("data/5v5-defender-report-for-testing-2023020028.csv").head(10)
defender_5v5

Unnamed: 0.1,Unnamed: 0,defensemen_pair_id,defensemen_1_name,defensemen_1_id,defensemen_2_name,defensemen_2_id,toi_secs,toi_mins,team,date,season,game_id,game_type
0,0,8476457-8478407,Adam Larsson,8476457,Vince Dunn,8478407,1129,18.816667,SEA,2023-10-14,20232024,2023020028,regular-season
1,1,8475181-8476892,Nick Leddy,8475181,Colton Parayko,8476892,1127,18.783333,STL,2023-10-14,20232024,2023020028,regular-season
2,2,8475753-8476792,Torey Krug,8476792,Justin Faulk,8475753,946,15.766667,STL,2023-10-14,20232024,2023020028,regular-season
3,3,8476467-8478840,Will Borgen,8478840,Jamie Oleksiak,8476467,803,13.383333,SEA,2023-10-14,20232024,2023020028,regular-season
4,4,8474602-8475208,Justin Schultz,8474602,Brian Dumoulin,8475208,690,11.5,SEA,2023-10-14,20232024,2023020028,regular-season
5,5,8474618-8481006,Marco Scandella,8474618,Tyler Tucker,8481006,505,8.416667,STL,2023-10-14,20232024,2023020028,regular-season
6,6,8476457-8478840,Will Borgen,8478840,Adam Larsson,8476457,122,2.033333,SEA,2023-10-14,20232024,2023020028,regular-season
7,7,8475753-8481006,Justin Faulk,8475753,Tyler Tucker,8481006,104,1.733333,STL,2023-10-14,20232024,2023020028,regular-season
8,8,8476467-8478407,Jamie Oleksiak,8476467,Vince Dunn,8478407,91,1.516667,SEA,2023-10-14,20232024,2023020028,regular-season
9,9,8476457-8476467,Adam Larsson,8476457,Jamie Oleksiak,8476467,77,1.283333,SEA,2023-10-14,20232024,2023020028,regular-season


In [235]:
# For the top 10 most common defensive pairs in the game, look up how many 5v5 minutes they played together in the linemate_data df. If the values match, our function to build the defensive pair TOI report works
for _, row in defender_5v5.iterrows():
    p1, p2 = row['defensemen_1_name'], row['defensemen_2_name']
    print(p1,p2)
    # STL is the home team in this game, so any other team is the away team
    side = 'home' if row['team']=="STL" else 'away'
    print("Manual Calculation TOI:",calculate_defender_pair_toi(p1,p2,side)," - Linemate Report TOI:",row['toi_mins'])
    print()

Adam Larsson Vince Dunn
Manual Calculation TOI: 18.816666666666666  - Linemate Report TOI: 18.816666666666663

Nick Leddy Colton Parayko
Manual Calculation TOI: 18.783333333333335  - Linemate Report TOI: 18.78333333333333

Torey Krug Justin Faulk
Manual Calculation TOI: 15.766666666666667  - Linemate Report TOI: 15.766666666666667

Will Borgen Jamie Oleksiak
Manual Calculation TOI: 13.383333333333333  - Linemate Report TOI: 13.383333333333333

Justin Schultz Brian Dumoulin
Manual Calculation TOI: 11.5  - Linemate Report TOI: 11.5

Marco Scandella Tyler Tucker
Manual Calculation TOI: 8.416666666666666  - Linemate Report TOI: 8.416666666666666

Will Borgen Adam Larsson
Manual Calculation TOI: 2.033333333333333  - Linemate Report TOI: 2.033333333333333

Justin Faulk Tyler Tucker
Manual Calculation TOI: 1.7333333333333334  - Linemate Report TOI: 1.7333333333333334

Jamie Oleksiak Vince Dunn
Manual Calculation TOI: 1.5166666666666666  - Linemate Report TOI: 1.5166666666666666

Adam Larsson 

5v5 TOIs match (any difference is floating-point rounding in the last digit)

Test Case 4 Passed ✅

#### Game 2023020028 Passed ✅

# ALL GAMES PASSED ✅
Each game we tested presented a different scenario that the scraper will encounter over a full season. I know this notebook only covers 3 games, but I think that is enough to show that the scraper works. That said, I encourage anyone who took the time to read this, or anyone who uses the scraper, to let me know if there is any other way you want me to test it, or if you find something wrong or concerning. I would really love feedback and any comments you may have.