# Developing a Women's Pro Hockey Expected Goals (xG) Model

#### Brian Johns, February 2026

## Notebook #5: Final Summary

### Overview

This project built an Expected Goals (xG) model for Women's professional hockey using data scraped from the PWHL HockeyTech API and supplemented with historical PWHPA data. An xG model assigns each shot a probability of scoring based on contextual factors like location, shot type, and game situation — providing a more stable measure of performance than raw goals.

The Women's game presents a unique analytical challenge: the PWHL is only in its third season, public analytics tooling is virtually non-existent, and the dataset (~16,700 shots) is a fraction of what NHL xG models typically train on.

### Process
#### Notebook 1 — Data Acquisition & Cleaning:
PWHL shot data was scraped across all available seasons (271 games). Coordinates were rescaled from the API's 600x300 grid to real-world rink dimensions. PWHPA data was merged after confirming a nearly identical goal rate (8.0% vs 8.2%). Penalty shots, empty-net shots, overtime shots, and shots from outside the attacking zone were excluded.
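
The coordinate rescaling can be sketched as a simple linear map. This is a minimal illustration rather than the notebook's actual code: it assumes the 600x300 grid spans the full ice surface with its origin in one corner, and that the rink is 200 ft by 85 ft. The input column names `x` and `y` and the helper `rescale_to_feet` are hypothetical.

```python
import pandas as pd

# Assumptions: the API grid is 600 x 300 units covering the whole rink,
# and the rink is 200 ft x 85 ft with the origin in one corner.
GRID_X, GRID_Y = 600, 300
RINK_LENGTH_FT, RINK_WIDTH_FT = 200.0, 85.0


def rescale_to_feet(shots: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `shots` with x_ft / y_ft columns measured in feet."""
    out = shots.copy()
    out["x_ft"] = out["x"] * RINK_LENGTH_FT / GRID_X
    out["y_ft"] = out["y"] * RINK_WIDTH_FT / GRID_Y
    return out


# Corner, centre ice, and opposite corner of the grid.
example = pd.DataFrame({"x": [0, 300, 600], "y": [0, 150, 300]})
rescaled = rescale_to_feet(example)
```

If the API origin or orientation differs from this assumption, an offset or a flip would be needed before scaling.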

#### Notebook 2 — Feature Engineering & EDA:
Features engineered include `shot_dist`, `angle_deg`, `arc_length`, `slot` shots, `rebound` shots, score state, strength state, shot type encoding, and shooter/goalie quality using Bayesian priors. EDA confirmed that shot location was the strongest predictor, with rebounds and shooter quality also showing meaningful correlations.
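
As an illustration of the location features, the sketch below derives `shot_dist` and `angle_deg` from coordinates in feet. It is not the notebook's implementation: the net position (centred at x = 189 ft, y = 42.5 ft, i.e. a goal line 11 ft from the end boards on a 200 ft x 85 ft rink) and the helper name `add_location_features` are assumptions.

```python
import numpy as np
import pandas as pd


def add_location_features(shots: pd.DataFrame,
                          net_x: float = 189.0,
                          net_y: float = 42.5) -> pd.DataFrame:
    """Add distance (ft) and absolute angle (degrees) to an assumed net position."""
    out = shots.copy()
    dx = net_x - out["x_ft"]          # distance up-ice to the goal line
    dy = out["y_ft"] - net_y          # lateral offset from the centre of the net
    out["shot_dist"] = np.hypot(dx, dy)
    # 0 degrees is straight in front of the net; 90 degrees is along the goal line.
    out["angle_deg"] = np.degrees(np.arctan2(dy.abs(), dx))
    return out


# One shot from straight on, one from 20 ft out and 20 ft wide.
example = pd.DataFrame({"x_ft": [159.0, 169.0], "y_ft": [42.5, 62.5]})
features = add_location_features(example)
```

Because distance and angle are both functions of the same two coordinates, features built this way are correlated by construction.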

#### Notebook 3 — Modelling Part 1:
Logistic Regression and XGBoost were explored across the full dataset. Feature scaling was required. Class balancing techniques (SMOTE, `class_weight`) hurt Log Loss without improving AUC. The best models reached AUC 0.775 and Log Loss 0.244. Coefficient analysis revealed multicollinearity among the location features and showed that mixing strength states was distorting the model.

#### Notebook 4 — Modelling Part 2 (Even Strength Only):
The model was rebuilt using only Even Strength shots with properly labeled shot types, and with known redundant features removed. This produced a dataset of approximately 10,000 shots. The final XGBoost model achieved AUC 0.742 and Log Loss 0.270. Both figures are slightly worse than in Notebook 3 (AUC 0.775, Log Loss 0.244), but the model is trained on cleaner data and yields more interpretable and trustworthy probabilities. xG values were applied to all Even Strength shots for player and team analysis.

#### Final Model
The deployed model is a tuned XGBoost classifier trained without class weighting. It had an **AUC of 0.7421** and a **Log Loss of 0.2701**.
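
One way to put a Log Loss figure in context is to compare it with the Log Loss of a constant prediction equal to the base goal rate. The sketch below uses 8% purely as an illustrative rate, close to the overall goal rates quoted for Notebook 1; the Even Strength rate in the final dataset may differ, and the helper name is hypothetical.

```python
import math


def baseline_log_loss(goal_rate: float) -> float:
    """Log Loss of always predicting `goal_rate` when goals occur at that rate."""
    p = goal_rate
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


# Illustrative base rate only; substitute the observed rate of the evaluation set.
baseline = baseline_log_loss(0.08)  # roughly 0.279
```

A model is only adding information to the extent that its Log Loss falls below this no-skill value on the same data.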

### Key Challenges

- Shot type data quality: Approximately 30% of PWHL shots were labeled `Default` rather than a meaningful shot type, which sharply limited this feature's predictive value.
- Dataset size: ~10,000 Even Strength shots after filtering is small relative to the 100,000+ used in NHL models.
- Multicollinearity: `x_ft`, `dist_ft`, and `arclength` are all correlated location measures.
- Mixed strength states: Combining strength states in a single model created noise that could not be filtered well.
- Shooter/goalie quality leakage: Career quality features use a player's final value applied retroactively, so each shot's features include information from later shots. This is a known limitation to address in future iterations.
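
One possible way to remove the career quality leakage is to compute each shooter's quality from earlier shots only, shrunk toward the league rate. The sketch below is an assumption about how that could look, not the project's code: the ordering column `shot_time`, the helper name, and the prior weight are all hypothetical.

```python
import pandas as pd


def prior_shooter_quality(shots: pd.DataFrame,
                          league_rate: float,
                          prior_weight: float = 50.0) -> pd.DataFrame:
    """Shrunken shooting rate built only from shots taken before each shot."""
    # `shot_time` is a hypothetical column that orders shots chronologically.
    out = shots.sort_values("shot_time", kind="stable").copy()
    is_goal = out["is_goal"].astype(float)
    grouped = is_goal.groupby(out["player_name"])
    prior_goals = grouped.cumsum() - is_goal   # goals before the current shot
    prior_shots = grouped.cumcount()           # shots before the current shot
    # Pseudo-count prior: behaves like `prior_weight` shots at the league rate.
    out["shooter_quality"] = (
        (prior_goals + league_rate * prior_weight) / (prior_shots + prior_weight)
    )
    return out


example = pd.DataFrame({
    "player_name": ["A", "A", "B", "A"],
    "shot_time": [1, 2, 3, 4],
    "is_goal": [1, 0, 0, 1],
})
result = prior_shooter_quality(example, league_rate=0.1, prior_weight=10.0)
```

A player's first shot receives exactly the league rate, and the value moves toward the player's own rate only as earlier shots accumulate. The same pattern could be applied to goalies by grouping on `goalie_name`.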

### Future Improvements

- More data: As the PWHL matures, a larger dataset will be the single biggest improvement.
- Separate strength state models (Even Strength, Power Play, Shorthanded) are needed. The Even Strength model already needs more data, so reliable Power Play and Shorthanded models will require even more games.
- Fix the career quality leakage by using only the data that was available at the time of each shot.
- Apply calibration correction via `CalibratedClassifierCV` so that predicted probabilities match observed goal rates.
- Better shot type recording from the league would unlock one of the strongest known xG predictors.
- Data that is not currently publicly accessible could greatly improve this model:
    - Tracking data could help this model tremendously. Accurate on-ice player counts, goalie position, and the location of players between the shooter and the net would all strengthen it significantly.
    - Records of all shot attempts (not just shots reaching the net) make NHL xG models much more accurate and would help offset the small dataset here.
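
The calibration idea from the list above can be sketched with scikit-learn's `CalibratedClassifierCV`. This is an illustration on synthetic data, not the project's pipeline: a class-weighted Logistic Regression stands in for the XGBoost model, and the roughly 8% positive rate is chosen only to resemble a goal rate.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with roughly 8% positives (not real shot data).
X, y = make_classification(n_samples=5000, n_features=6, n_informative=4,
                           weights=[0.92], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Class weighting inflates predicted probabilities, which makes the
# effect of calibration easy to see.
base = LogisticRegression(class_weight="balanced", max_iter=1000)

# Isotonic calibration fitted with 5-fold cross-validation on the training set.
calibrated = CalibratedClassifierCV(base, method="isotonic", cv=5)
calibrated.fit(X_train, y_train)

raw_mean = base.fit(X_train, y_train).predict_proba(X_test)[:, 1].mean()
cal_mean = calibrated.predict_proba(X_test)[:, 1].mean()
observed = y_test.mean()
print(f"raw={raw_mean:.3f} calibrated={cal_mean:.3f} observed={observed:.3f}")
```

For an xG model, the useful check is whether summed predictions track actual goals, so the calibrated mean should sit close to the observed rate on held-out shots.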

### Preliminary Findings

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
import seaborn as sns

pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 100)

In [18]:
df = pd.read_csv('xg_model.csv')

In [35]:
xg_summary = df.groupby('player_name').agg(
    goals  = ('is_goal', 'sum'),
    xG     = ('xG',      'sum'),
    games  = ('game_id', 'nunique')
).sort_values('xG', ascending=False).round({'xG': 2})

In [36]:
# Per-game rates; a positive xg_diff means fewer goals scored than expected.
xg_summary['xg/g'] = xg_summary['xG']/xg_summary['games']
xg_summary['goals/g'] = xg_summary['goals']/xg_summary['games']
xg_summary['xg_diff'] = xg_summary['xg/g'] - xg_summary['goals/g']

In [39]:
xg_summary[xg_summary['games'] > 10].sort_values('xg_diff', ascending = False).head(15)

| player_name | goals | xG | games | xg/g | goals/g | xg_diff |
|---|---|---|---|---|---|---|
| Natalie Spooner | 10 | 15.97 | 38 | 0.420263 | 0.263158 | 0.157105 |
| Brittyn Fleming | 0 | 1.48 | 12 | 0.123333 | 0.0 | 0.123333 |
| Emma Woods | 1 | 3.1 | 21 | 0.147619 | 0.047619 | 0.1 |
| Claire Butorac | 1 | 2.61 | 18 | 0.145 | 0.055556 | 0.089444 |
| Izzy Daniel | 2 | 3.55 | 18 | 0.197222 | 0.111111 | 0.086111 |
| Loren Gabel | 1 | 2.7 | 20 | 0.135 | 0.05 | 0.085 |
| Sarah Potomak | 2 | 2.89 | 12 | 0.240833 | 0.166667 | 0.074167 |
| Dara Greig | 0 | 1.36 | 19 | 0.071579 | 0.0 | 0.071579 |
| Mikyla Grant-Mentis | 5 | 7.58 | 38 | 0.199474 | 0.131579 | 0.067895 |
| Alexandra Poznikoff | 3 | 3.78 | 12 | 0.315 | 0.25 | 0.065 |


Natalie Spooner appears to be under-producing relative to her Expected Goals: she has scored 10 goals on 15.97 xG over 38 games.

In [42]:
g_summary = df.groupby('goalie_name').agg(
    goals  = ('is_goal', 'sum'),
    xG     = ('xG',      'sum'),
    games  = ('game_id', 'nunique')
).sort_values('xG', ascending=False).round({'xG': 2}).head(25)

In [45]:
# Per-game rates against; a positive xg_diff means fewer goals allowed than expected.
g_summary['xg/g'] = g_summary['xG']/g_summary['games']
g_summary['goals/g'] = g_summary['goals']/g_summary['games']
g_summary['xg_diff'] = g_summary['xg/g'] - g_summary['goals/g']

In [50]:
g_summary[g_summary['games'] > 10].sort_values('xg_diff', ascending = False).head(15)

| goalie_name | goals | xG | games | xg/g | goals/g | xg_diff |
|---|---|---|---|---|---|---|
| Maddie Rooney | 85 | 92.69 | 48 | 1.931042 | 1.770833 | 0.160208 |
| Ann-Renée Desbiens | 114 | 119.23 | 66 | 1.806515 | 1.727273 | 0.079242 |
| Elaine Chuli | 39 | 39.85 | 23 | 1.732609 | 1.695652 | 0.036957 |
| Kayle Osborne | 45 | 45.3 | 25 | 1.812 | 1.8 | 0.012 |
| Gwyneth Philips | 43 | 42.92 | 32 | 1.34125 | 1.34375 | -0.0025 |
| Emerance Maschmeyer | 81 | 80.48 | 47 | 1.71234 | 1.723404 | -0.011064 |
| Aerin Frankel | 77 | 74.92 | 51 | 1.46902 | 1.509804 | -0.040784 |
| Nicole Hensley | 77 | 74.23 | 45 | 1.649556 | 1.711111 | -0.061556 |
| Corinne Schroeder | 61 | 57.11 | 38 | 1.502895 | 1.605263 | -0.102368 |
| Raygan Kirk | 30 | 26.87 | 21 | 1.279524 | 1.428571 | -0.149048 |


Maddie Rooney and Ann-Renée Desbiens have both allowed fewer goals than the Expected Goals against them, suggesting that they are two of the best goaltenders in the league.

In [54]:
t_summary = df.groupby('player_team').agg(
    goals  = ('is_goal', 'sum'),
    xG     = ('xG',      'sum'),
    games  = ('game_id', 'nunique')
).sort_values('xG', ascending=False).round({'xG': 2}).head(25)

In [55]:
t_summary['xg/g'] = t_summary['xG']/t_summary['games']
t_summary['goals/g'] = t_summary['goals']/t_summary['games']
t_summary['xg_diff'] = t_summary['xg/g'] - t_summary['goals/g']

In [56]:
t_summary.sort_values('xg_diff', ascending = False)

| player_team | goals | xG | games | xg/g | goals/g | xg_diff |
|---|---|---|---|---|---|---|
| VAN | 23 | 31.03 | 16 | 1.939375 | 1.4375 | 0.501875 |
| SEA | 24 | 28.83 | 16 | 1.801875 | 1.5 | 0.301875 |
| Sonnet | 36 | 40.68 | 20 | 2.034 | 1.8 | 0.234 |
| Scotiabank | 50 | 52.24 | 20 | 2.612 | 2.5 | 0.112 |
| TOR | 79 | 78.22 | 56 | 1.396786 | 1.410714 | -0.013929 |
| MTL | 130 | 127.8 | 73 | 1.750685 | 1.780822 | -0.030137 |
| OTT | 94 | 91.83 | 65 | 1.412769 | 1.446154 | -0.033385 |
| BOS | 88 | 85.23 | 58 | 1.469483 | 1.517241 | -0.047759 |
| Harvey's | 64 | 63.02 | 20 | 3.151 | 3.2 | -0.049 |
| MIN | 148 | 142.97 | 73 | 1.958493 | 2.027397 | -0.068904 |


Vancouver and Seattle have scored notably fewer goals than expected. Both are halfway through their inaugural season, so their samples are small and prone to fluctuation. However, if they continue to accrue Expected Goals at this rate, the goals should follow.
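
To gauge how much of such a gap could be sampling noise, one rough approach treats each shot as an independent Bernoulli trial with probability equal to its xG, so the variance of total goals is the sum of p(1 - p) over shots. The sketch below is a back-of-envelope check under that independence assumption, not part of the original analysis; the helper name `goal_std` is hypothetical.

```python
import numpy as np


def goal_std(shot_xg) -> float:
    """Standard deviation of total goals if shots are independent Bernoulli trials."""
    p = np.asarray(shot_xg, dtype=float)
    return float(np.sqrt(np.sum(p * (1.0 - p))))


# With only the table totals available, p * (1 - p) <= p gives an upper
# bound on the standard deviation: std <= sqrt(total xG).
van_xg, van_goals = 31.03, 23
van_std_bound = float(np.sqrt(van_xg))
van_gap_in_sd = (van_xg - van_goals) / van_std_bound
print(f"Vancouver gap is at least {van_gap_in_sd:.2f} standard deviations")
```

With the per-shot values, `goal_std(df.loc[df['player_team'] == 'VAN', 'xG'])` would give the tighter figure. A gap on the order of one to two standard deviations is consistent with ordinary fluctuation over 16 games.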