# ⚽ Wage-Performance Relationship in English Football (2013–2025)
**Leagues:** Premier League & Championship (2013–2025)  
**Purpose:** Analyse the relationship between team wage bills and on-pitch performance using log-linear and non-linear regression models  
**Author:** [Victoria Friss de Kereki](https://www.linkedin.com/in/victoria-friss-de-kereki/)  


**Notebook first written:** `12/01/2026`  
**Last updated:** `15/01/2026`  

> This notebook studies how team wage expenditure relates to league performance in English football. Using log-linear and log–quadratic regression models, it quantifies diminishing returns to spending while avoiding mechanical league effects introduced by constructed performance measures. The analysis focuses on interpretability and robustness rather than maximising goodness-of-fit.

In [1]:
import numpy as np
import pandas as pd

In [6]:
# Step 1: Load the wage and performance data from CSV
df = pd.read_csv("wage-performance data.csv")
df.tail()

Unnamed: 0,position,team,gp,pts,season,league,league_level,pts_per_game,Gross_PY_GBP,Adj_Gross_GBP,pyramid_position,pts_absolute,Adj_Gross_GBP_z,pts_absolute_z,pyramid_position_z
523,20,Preston North End,46,50,2024-2025,Championship,2,1.086957,14354800.0,14354800.0,40,50,-0.732854,-1.132417,1.378124
524,21,Hull City,46,49,2024-2025,Championship,2,1.065217,23694400.0,23694400.0,41,49,-0.544819,-1.155784,1.456874
525,22,Luton Town,46,49,2024-2025,Championship,2,1.065217,27716000.0,27716000.0,42,49,-0.463852,-1.155784,1.535624
526,23,Plymouth Argyle,46,46,2024-2025,Championship,2,1.0,12313600.0,12313600.0,43,46,-0.77395,-1.225885,1.614373
527,24,Cardiff City,46,44,2024-2025,Championship,2,0.956522,23506000.0,23506000.0,44,44,-0.548613,-1.272619,1.693123
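
Before modelling, it can help to confirm that the loaded frame has the columns the later steps rely on and that every wage bill is positive, since the prediction step takes a logarithm. The sketch below does this on an in-memory copy of the five rows shown above, so it runs without the CSV. `REQUIRED` and `check_frame` are hypothetical names introduced for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Rows copied from the df.tail() output above (subset of columns).
sample = pd.DataFrame({
    "team": ["Preston North End", "Hull City", "Luton Town",
             "Plymouth Argyle", "Cardiff City"],
    "season": ["2024-2025"] * 5,
    "gp": [46, 46, 46, 46, 46],
    "pts": [50, 49, 49, 46, 44],
    "pts_per_game": [1.086957, 1.065217, 1.065217, 1.0, 0.956522],
    "Adj_Gross_GBP": [14354800.0, 23694400.0, 27716000.0, 12313600.0, 23506000.0],
    "pts_absolute": [50, 49, 49, 46, 44],
})

# Columns used by the later steps of the notebook (assumed list).
REQUIRED = ["team", "season", "pts_absolute", "Adj_Gross_GBP"]

def check_frame(frame):
    """Raise if a required column is missing or a wage bill is not positive.

    Returns True when pts_per_game agrees with pts / gp (to display precision).
    """
    missing = [c for c in REQUIRED if c not in frame.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    # log(wage) is only defined for strictly positive wage bills.
    if not (frame["Adj_Gross_GBP"] > 0).all():
        raise ValueError("Adj_Gross_GBP must be strictly positive")
    return bool(np.allclose(frame["pts_per_game"],
                            frame["pts"] / frame["gp"], atol=1e-6))

print(check_frame(sample))
```

The same function could be called on the full `df` right after loading. Whether a failed consistency check should stop the analysis or only be logged is a design choice left open here.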


In [7]:
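# Step 2: Define the prediction function
# Coefficients for the log-quadratic curve:
#   expected points = INTERCEPT + B1 * log(wage) + B2 * log(wage)^2
INTERCEPT = 402.8828232111315
B1 = -75.52443902274617      # coefficient on log(wage)
B2 = 3.337091313790353       # coefficient on log(wage)^2

def predict_points(wage):
    """Return expected points for a wage bill in GBP (scalar or array-like, must be > 0)."""
    lw = np.log(wage)  # natural log of the wage bill
    return INTERCEPT + B1 * lw + B2 * (lw ** 2)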
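
To make the shape of this curve concrete, the sketch below evaluates the same formula at a few illustrative wage levels. Because the curve is quadratic in log wage, its slope with respect to log(wage) is `B1 + 2 * B2 * log(wage)`, and its turning point sits at `exp(-B1 / (2 * B2))`. The helper names `slope_wrt_log_wage` and `gain_from_extra_spend`, and the £20m, £100m and £10m figures, are assumptions chosen for illustration and do not come from the original notebook.

```python
import numpy as np

# Coefficients copied from Step 2.
INTERCEPT = 402.8828232111315
B1 = -75.52443902274617
B2 = 3.337091313790353

def predict_points(wage):
    lw = np.log(wage)
    return INTERCEPT + B1 * lw + B2 * (lw ** 2)

def slope_wrt_log_wage(wage):
    """Derivative of predicted points with respect to log(wage) (hypothetical helper)."""
    return B1 + 2 * B2 * np.log(wage)

def gain_from_extra_spend(wage, extra):
    """Predicted points gained by adding `extra` GBP to a bill of `wage` GBP (hypothetical helper)."""
    return predict_points(wage + extra) - predict_points(wage)

# Illustrative wage bills and spending increment, in GBP.
low, high, extra = 20e6, 100e6, 10e6

gain_low = gain_from_extra_spend(low, extra)
gain_high = gain_from_extra_spend(high, extra)

# Wage level at which the quadratic in log(wage) reaches its minimum.
turning_point = np.exp(-B1 / (2 * B2))

print(f"slope wrt log(wage) at {low:,.0f}: {slope_wrt_log_wage(low):.2f}")
print(f"gain from +{extra:,.0f} at {low:,.0f}: {gain_low:.2f}")
print(f"gain from +{extra:,.0f} at {high:,.0f}: {gain_high:.2f}")
print(f"turning point (GBP): {turning_point:,.0f}")
```

Under these coefficients the same extra £10m is worth fewer predicted points at the larger wage bill, which is the diminishing-returns pattern in pound terms. The turning point lies far below the wage bills displayed in this notebook, so the curve is increasing over that range.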

In [10]:
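# Step 3: Compute expected points and residuals
# predict_points is built on np.log, so it accepts the whole column at once
df['expected_points'] = predict_points(df['Adj_Gross_GBP'])
df['residual'] = df['pts_absolute'] - df['expected_points']  # actual - expected; positive = more points than the wage bill predicts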
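
As a check on Step 3, the sketch below rebuilds the calculation on a small in-memory frame instead of the CSV. The five rows are copied from the 2013-2014 output shown in Step 4, and the `sample` frame is an illustration only. The function is called on the whole column and also row by row, to show that both routes give the same values.

```python
import numpy as np
import pandas as pd

# Coefficients and function copied from Step 2.
INTERCEPT = 402.8828232111315
B1 = -75.52443902274617
B2 = 3.337091313790353

def predict_points(wage):
    lw = np.log(wage)
    return INTERCEPT + B1 * lw + B2 * (lw ** 2)

# Rows copied from the Step 4 output (2013-2014 season).
sample = pd.DataFrame({
    "team": ["Manchester City", "Liverpool", "Chelsea", "Arsenal", "Everton"],
    "season": ["2013-2014"] * 5,
    "pts_absolute": [175, 173, 171, 168, 161],
    "Adj_Gross_GBP": [147132001.0, 84652806.0, 136750953.0,
                      95121913.0, 68247394.0],
})

# Vectorised call: np.log operates on the whole Series.
sample["expected_points"] = predict_points(sample["Adj_Gross_GBP"])
sample["residual"] = sample["pts_absolute"] - sample["expected_points"]

# Row-by-row call for comparison.
row_by_row = sample["Adj_Gross_GBP"].apply(predict_points)

# Largest positive residuals first.
print(sample.sort_values("residual", ascending=False)[
    ["team", "expected_points", "residual"]
])
```

Sorting by `residual` puts the teams that most exceeded their wage-implied points at the top.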

In [17]:
# Step 4: Quick check
df[['team', 'season', 'pts_absolute', 'Adj_Gross_GBP', 'expected_points', 'residual']]

Unnamed: 0,team,season,pts_absolute,Adj_Gross_GBP,expected_points,residual
0,Manchester City,2013-2014,175,147132001.0,162.826774,12.173226
1,Liverpool,2013-2014,173,84652806.0,136.210144,36.789856
2,Chelsea,2013-2014,171,136750953.0,159.186507,11.813493
3,Arsenal,2013-2014,168,95121913.0,141.654910,26.345090
4,Everton,2013-2014,161,68247394.0,126.389692,34.610308
...,...,...,...,...,...,...
523,Preston North End,2024-2025,50,14354800.0,64.548073,-14.548073
524,Hull City,2024-2025,49,23694400.0,82.657679,-33.657679
525,Luton Town,2024-2025,49,27716000.0,88.666926,-39.666926
526,Plymouth Argyle,2024-2025,46,12313600.0,59.340583,-13.340583
