# ⚽ Capology Team Salaries Web Scraping

**Leagues:** Premier League & Championship (2013–2026)  
**Purpose:** Scrape raw team payroll data from [Capology](https://www.capology.com/) using [Selenium](https://www.selenium.dev/)  
**Author:** [Victoria Friss de Kereki](https://www.linkedin.com/in/victoria-friss-de-kereki/)  

---

**Notebook first written:** `11/01/2026`  
**Last updated:** `11/01/2026`  

> This notebook collects and processes team payrolls for English football leagues. It was originally developed for my upcoming analytics project, but the code is available for anyone who wants to gather similar data. To adapt it to other competitions or countries, change the league (and, if needed, the country) part of the Capology URL.
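
Adapting the notebook mostly means changing the URL that `scrape_seasons()` builds. The sketch below isolates that logic. It assumes other competitions follow the same `/<country>/<league>/payrolls/<season>/` path as the English pages; `build_payroll_url` and its `country` parameter are hypothetical, not part of this notebook or of any documented Capology API, so check the real slugs on the site.

```python
# Hypothetical helper mirroring the URL logic in scrape_seasons().
# ASSUMPTION: other leagues use the same path pattern as the "uk" pages;
# verify country and league slugs on capology.com before relying on them.
BASE = "https://www.capology.com"


def build_payroll_url(league: str, season: str = "current", country: str = "uk") -> str:
    """Return the payroll page URL for one league and season."""
    url = f"{BASE}/{country}/{league}/payrolls/"
    if season != "current":
        url += f"{season}/"  # past seasons live under their own sub-path
    return url


print(build_payroll_url("championship", "2019-2020"))
print(build_payroll_url("premier-league"))
```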


<div style="text-align: left;">
    <img src="Images and others for Medium/Capology web scraping.png" alt="Capology illustration" width="600">
</div>



## 1) Import packages

In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
import time

## 2) Define `fetch_payrolls_for_season()`

This function opens one season's payroll page in headless Chrome, waits for the JavaScript-rendered salary table to load, and returns the table as a DataFrame.

In [45]:
def fetch_payrolls_for_season(url):
    options = Options()
    options.add_argument("--headless=new")
    options.add_argument("--disable-gpu")

    service = Service(ChromeDriverManager().install())

    driver = webdriver.Chrome(
        service=service,
        options=options
    )

    try:
        driver.get(url)

        # Wait until data rows exist (the table is rendered by JavaScript)
        WebDriverWait(driver, 20).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "#table tbody tr"))
        )

        table = driver.find_element(By.ID, "table")

        # HEADERS (the 2nd header row holds the column names)
        header_rows = table.find_elements(By.CSS_SELECTOR, "thead tr")
        columns = [
            th.text.strip()
            for th in header_rows[1].find_elements(By.TAG_NAME, "th")
        ]

        # ROWS: one list of cell texts per table row
        rows = []
        for tr in table.find_elements(By.CSS_SELECTOR, "tbody tr"):
            rows.append([
                td.text.strip()
                for td in tr.find_elements(By.TAG_NAME, "td")
            ])
    finally:
        # Always close the browser, even if the wait times out
        driver.quit()

    return pd.DataFrame(rows, columns=columns)
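
`fetch_payrolls_for_season` takes the column names from the second `<thead>` row and one list of cell texts per `<tbody>` row. The same extraction logic can be tried offline on static markup. This is a minimal sketch using the standard-library `html.parser`; the `CapologyTableParser` class and the toy table (including its grouping row above the column names) are illustrative assumptions. It does not replace Selenium, since the live table is only rendered after JavaScript runs.

```python
from html.parser import HTMLParser


class CapologyTableParser(HTMLParser):
    """Collect header rows and body rows from <table id="table"> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_table = False
        self.section = None      # "thead" or "tbody"
        self.header_rows = []
        self.body_rows = []
        self._row = None         # cells of the row being read
        self._cell = None        # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == "table":
            self.in_table = True
        elif not self.in_table:
            return
        elif tag in ("thead", "tbody"):
            self.section = tag
        elif tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if not self.in_table:
            return
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())  # empty <th> gives ""
            self._cell = None
        elif tag == "tr" and self._row is not None:
            target = self.header_rows if self.section == "thead" else self.body_rows
            target.append(self._row)
            self._row = None
        elif tag == "table":
            self.in_table = False

    def handle_data(self, data):
        if self._cell is not None:  # ignore whitespace between tags
            self._cell.append(data)


# Toy markup: a grouping row, then the real column names (one of them empty)
html = """
<table id="table">
  <thead>
    <tr><th colspan="2">Club</th><th colspan="2">Payroll</th></tr>
    <tr><th>CLUB</th><th></th><th>GROSS P/W</th><th>GROSS P/Y</th></tr>
  </thead>
  <tbody>
    <tr><td>Example FC</td><td>EXA</td><td>£ 1,000</td><td>£ 52,000</td></tr>
  </tbody>
</table>
"""
parser = CapologyTableParser()
parser.feed(html)
columns = parser.header_rows[1]  # 2nd header row, as in fetch_payrolls_for_season
rows = parser.body_rows
print(columns)
print(rows)
```

The empty second header mirrors the unnamed abbreviation column that shows up in the scraped DataFrames.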


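## 3) Define `scrape_seasons()`

This function loops over a list of seasons, scrapes the payroll table for each one, and returns the results as a dictionary of DataFrames keyed by season. For convenience, it also creates one global DataFrame per season (e.g. `salaries_2013`).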


In [150]:
def scrape_seasons(seasons, league, sleep_sec=60):
    """
    Scrape Capology payroll data for a list of seasons for a given league.

    Parameters:
        seasons (list of str): Seasons to scrape (e.g., ['2013-2014', ..., 'current'])
        league (str): 'premier-league' or 'championship'
        sleep_sec (int): Seconds to wait between requests to avoid blocking

    Returns:
        dict: Season label (with 'current' resolved to '2025-2026') mapped to its DataFrame.
              Each DataFrame is also stored as a global named salaries_<start year>.
    """
    all_dfs = {}

    for season in seasons:
        base = f"https://www.capology.com/uk/{league}/payrolls/"
        url = base if season == "current" else f"{base}{season}/"
        print(f"Fetching {season} → {url}")

        df = fetch_payrolls_for_season(url)

        if df.empty:
            print(f"⚠️ No data for {season}, skipping")
            continue

        # Determine actual season
        actual_season = "2025-2026" if season == "current" else season
        df["Season"] = actual_season

        # Store in dictionary
        all_dfs[actual_season] = df

        # Also expose the DataFrame as a global (e.g. salaries_2013) for convenience.
        # Note: scraping a second league reuses, and overwrites, these names.
        start_year = actual_season.split("-")[0]
        globals()[f"salaries_{start_year}"] = df

        print(f"✅ salaries_{start_year} created for season {actual_season}")

        time.sleep(sleep_sec)

    return all_dfs
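
`scrape_seasons()` already pauses between requests, but a single timeout (for example when `WebDriverWait` gives up) still stops the whole loop. One common remedy is to retry each page with exponential backoff. This is a sketch with a hypothetical `fetch_with_retry` helper; the attempt count and delays are assumptions, not values tested against Capology.

```python
import time


def fetch_with_retry(fetch, url, attempts=3, base_delay=30.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff (hypothetical helper).

    Waits base_delay, then 2 * base_delay, ... between attempts and re-raises
    the last exception once all attempts are used up. Note that it catches
    every Exception, so genuine bugs are retried too before surfacing.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception as exc:
            if attempt == attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc!r}); retrying in {delay:.0f}s")
            sleep(delay)


# Usage with a fake fetcher that fails twice, and a sleep that only records waits
calls = []


def flaky_fetch(url):
    calls.append(url)
    if len(calls) < 3:
        raise TimeoutError("table did not load")
    return f"data from {url}"


waits = []
result = fetch_with_retry(flaky_fetch, "https://example.org", sleep=waits.append)
print(result, waits)  # data from https://example.org [30.0, 60.0]
```

Inside `scrape_seasons()`, the call would become `df = fetch_with_retry(fetch_payrolls_for_season, url)`. Passing `sleep` in as a parameter keeps the helper testable without real waiting.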

## 4) Download Premier League Salaries

##### Choose `premier-league` for the league parameter.
##### If the scraping process is interrupted or blocked, set `start_year` to the first season that was not downloaded and re-run the cell below. DataFrames already created as `salaries_<year>` are kept.
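
Instead of editing `start_year` by hand, the seasons still missing can be computed from what has already been downloaded. This is a sketch with a hypothetical `remaining_seasons` helper; like `scrape_seasons()`, it assumes that `"current"` stands for the season starting in `end_year` (2025-2026 here).

```python
def remaining_seasons(downloaded, start_year=2013, end_year=2025):
    """Seasons still to scrape, in the format scrape_seasons() expects (hypothetical helper).

    downloaded: season labels already fetched, e.g. "2013-2014".
    ASSUMPTION (as in scrape_seasons): "current" means f"{end_year}-{end_year + 1}".
    """
    done = set(downloaded)
    seasons = [f"{y}-{y + 1}" for y in range(start_year, end_year)]
    todo = [s for s in seasons if s not in done]
    if f"{end_year}-{end_year + 1}" not in done:
        todo.append("current")
    return todo


already = ["2013-2014", "2014-2015"]
print(remaining_seasons(already))
```

In the notebook, `downloaded` could be built from the globals that already exist, e.g. `[f"{y}-{y + 1}" for y in range(2013, 2026) if f"salaries_{y}" in globals()]`, as long as those names still hold the league currently being scraped.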

In [None]:
start_year = 2013  # 2013-2014 is the first season available
end_year = 2025    # exclusive: past seasons run up to 2024-2025; "current" adds 2025-2026
seasons = [f"{y}-{y+1}" for y in range(start_year, end_year)]
seasons.append("current")

# Scrape Premier League
all_premier = scrape_seasons(seasons, league="premier-league", sleep_sec=120) # league="championship" or "premier-league"

In [None]:
payrolls_all_premierleague = pd.concat(
    # 2026 is exclusive, so 2025 (the "current" 2025-2026 season) is included
    [globals()[f"salaries_{y}"] for y in range(2013, 2026)],
    ignore_index=True
)

In [137]:
payrolls_all_premierleague.shape

(260, 10)
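
The shape is a quick sanity check: 13 seasons × 20 clubs = 260 Premier League rows, and 13 × 24 = 312 for the Championship. Counting rows per season shows which season is incomplete if a scrape was cut short. This is a sketch with a hypothetical `check_clubs_per_season` helper; the toy frame only mimics the `Season` column.

```python
import pandas as pd


def check_clubs_per_season(df, expected_clubs):
    """Return {season: row count} for seasons that do not have expected_clubs rows.

    An empty dict means every season has the expected number of clubs.
    """
    counts = df.groupby("Season").size()
    return counts[counts != expected_clubs].to_dict()


# Toy data: one complete season and one with a club missing
toy = pd.DataFrame({"Season": ["2013-2014"] * 20 + ["2014-2015"] * 19})
print(check_clubs_per_season(toy, expected_clubs=20))
```

In the notebook, `check_clubs_per_season(payrolls_all_premierleague, 20)` should return an empty dict.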

In [138]:
payrolls_all_premierleague.tail()

| | CLUB | | GROSS P/W (GBP) | GROSS P/Y (GBP) | ADJ. GROSS (GBP) | KEEPER (GBP) | DEFENSE (GBP) | MIDFIELD (GBP) | FORWARD (GBP) | Season |
|---|---|---|---|---|---|---|---|---|---|---|
| 255 | Bournemouth | BOU | £ 1,207,000 | £ 62,764,000 | £ 62,764,000 | £ 4,160,000 | £ 17,160,000 | £ 17,264,000 | £ 24,180,000 | 2025-2026 |
| 256 | Brighton | BHA | £ 1,166,250 | £ 60,645,000 | £ 60,645,000 | £ 3,640,000 | £ 24,102,000 | £ 9,906,000 | £ 22,997,000 | 2025-2026 |
| 257 | Leeds | LEE | £ 1,140,500 | £ 59,306,000 | £ 59,306,000 | £ 6,240,000 | £ 18,200,000 | £ 8,580,000 | £ 26,286,000 | 2025-2026 |
| 258 | Burnley | RNL | £ 1,061,346 | £ 55,190,000 | £ 55,190,000 | £ 4,004,000 | £ 22,490,000 | £ 9,100,000 | £ 19,596,000 | 2025-2026 |
| 259 | Brentford | BTD | £ 1,044,250 | £ 54,301,000 | £ 54,301,000 | £ 4,680,000 | £ 14,456,000 | £ 12,506,000 | £ 22,659,000 | 2025-2026 |


## 5) Download Championship Salaries

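##### Choose `championship` for the league parameter.
##### If the scraping process is interrupted or blocked, set `start_year` to the first season that was not downloaded and re-run the cell below.
##### Run the Premier League concatenation in Section 4 first: `scrape_seasons()` reuses the same `salaries_<year>` names, so this scrape overwrites the Premier League DataFrames.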
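
Because both leagues write to the same `salaries_<year>` globals, an alternative that avoids them is to keep the dictionaries returned by each `scrape_seasons()` call (including partial runs after an interruption) and concatenate those. This is a sketch with a hypothetical `combine_runs` helper; the toy DataFrames only mimic the `Season` column.

```python
import pandas as pd


def combine_runs(*runs):
    """Merge dicts returned by several scrape_seasons() calls and stack them.

    Later runs win if the same season appears twice; seasons are sorted, so the
    result does not depend on the order in which the runs are passed.
    """
    merged = {}
    for run in runs:
        merged.update(run)
    return pd.concat([merged[s] for s in sorted(merged)], ignore_index=True)


# Toy results of an interrupted run and the resumed run
first = {"2013-2014": pd.DataFrame({"CLUB": ["A"], "Season": ["2013-2014"]})}
second = {"2014-2015": pd.DataFrame({"CLUB": ["B"], "Season": ["2014-2015"]})}
print(combine_runs(second, first)["Season"].tolist())  # ['2013-2014', '2014-2015']
```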

In [None]:
start_year = 2013  # 2013-2014 is the first season available
end_year = 2025    # exclusive: past seasons run up to 2024-2025; "current" adds 2025-2026
seasons = [f"{y}-{y+1}" for y in range(start_year, end_year)]
seasons.append("current")

# Scrape Championship
all_championship = scrape_seasons(seasons, league="championship", sleep_sec=120) # league="championship" or "premier-league"

In [135]:
payrolls_all_championship = pd.concat(
    [globals()[f"salaries_{y}"] for y in range(2013, 2026)],
    ignore_index=True
)

payrolls_all_championship.shape

(312, 10)

In [136]:
payrolls_all_championship.tail()

| | CLUB | | GROSS P/W (GBP) | GROSS P/Y (GBP) | ADJ. GROSS (GBP) | KEEPER (GBP) | DEFENSE (GBP) | MIDFIELD (GBP) | FORWARD (GBP) | Season |
|---|---|---|---|---|---|---|---|---|---|---|
| 307 | Coventry | COV | £ 249,000 | £ 12,948,000 | £ 12,948,000 | £ 910,000 | £ 4,758,000 | £ 3,276,000 | £ 4,004,000 | 2025-2026 |
| 308 | Swansea | SWA | £ 242,500 | £ 12,610,000 | £ 12,610,000 | £ 1,560,000 | £ 3,718,000 | £ 3,432,000 | £ 3,900,000 | 2025-2026 |
| 309 | Sheffield Wednesday | SHW | £ 221,450 | £ 11,515,400 | £ 11,515,400 | £ 1,024,400 | £ 4,680,000 | £ 3,029,000 | £ 2,782,000 | 2025-2026 |
| 310 | Portsmouth | PTM | £ 184,000 | £ 9,568,000 | £ 9,568,000 | £ 1,378,000 | £ 2,392,000 | £ 1,196,000 | £ 4,602,000 | 2025-2026 |
| 311 | Charlton | CHA | £ 175,062 | £ 9,103,200 | £ 9,103,200 | £ 1,350,000 | £ 3,432,000 | £ 2,085,200 | £ 2,236,000 | 2025-2026 |


## 6) Combine the Premier League and Championship data, clean the headers, and export to CSV

In [139]:
# Add a League column to each dataframe
payrolls_all_championship["League"] = "Championship"
payrolls_all_premierleague["League"] = "Premier League"

# Concatenate them into a single dataframe
payrolls_all_england = pd.concat(
    [payrolls_all_championship, payrolls_all_premierleague],
    ignore_index=True
)

print(payrolls_all_england.shape)
payrolls_all_england.head()

(572, 11)


| | CLUB | | GROSS P/W (GBP) | GROSS P/Y (GBP) | ADJ. GROSS (GBP) | KEEPER (GBP) | DEFENSE (GBP) | MIDFIELD (GBP) | FORWARD (GBP) | Season | League |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | QPR | QPR | £ 785,981 | £ 40,871,000 | £ 52,771,625 | £ 312,000 | £ 11,528,000 | £ 10,792,000 | £ 18,239,000 | 2013-2014 | Championship |
| 1 | Nottingham Forest | NTG | £ 385,654 | £ 20,054,000 | £ 25,893,228 | £ 811,000 | £ 6,681,000 | £ 6,755,000 | £ 5,807,000 | 2013-2014 | Championship |
| 2 | Reading | RDG | £ 353,077 | £ 18,360,000 | £ 23,705,974 | £ 1,079,000 | £ 6,783,400 | £ 4,171,800 | £ 6,325,800 | 2013-2014 | Championship |
| 3 | Wigan | WIG | £ 338,404 | £ 17,597,000 | £ 22,720,810 | £ 2,387,000 | £ 6,219,000 | £ 5,039,000 | £ 3,952,000 | 2013-2014 | Championship |
| 4 | Blackburn | BBR | £ 335,146 | £ 17,427,600 | £ 22,502,084 | £ 806,000 | £ 3,296,800 | £ 6,128,000 | £ 7,196,800 | 2013-2014 | Championship |
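
After concatenating, a quick check that no club appears twice in the same season catches two likely mistakes: a season scraped twice, or Championship DataFrames picked up in the Premier League concatenation because both scrapes write to the same `salaries_<year>` names. This is a sketch with a hypothetical `duplicated_club_seasons` helper, run here on toy rows.

```python
import pandas as pd


def duplicated_club_seasons(df, club_col="CLUB", season_col="Season"):
    """Return every row whose (club, season) pair occurs more than once.

    A club plays in only one of the two leagues in a given season, so any
    hit points to duplicated or mislabelled data.
    """
    mask = df.duplicated([club_col, season_col], keep=False)
    return df[mask]


toy = pd.DataFrame({
    "CLUB": ["QPR", "Reading", "QPR"],
    "Season": ["2013-2014", "2013-2014", "2013-2014"],
    "League": ["Championship", "Championship", "Premier League"],
})
print(duplicated_club_seasons(toy))
```

On the real data, `duplicated_club_seasons(payrolls_all_england).empty` should be `True`.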


In [147]:
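# Rename columns (positional: relies on the scraped order CLUB, abbreviation,
# GROSS P/W, GROSS P/Y, ADJ. GROSS, KEEPER, DEFENSE, MIDFIELD, FORWARD, Season, League)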
payrolls_all_england.columns = [
    "Club", 
    "Abbreviation",
    "Gross_PW_GBP",
    "Gross_PY_GBP",
    "Adj_Gross_GBP",
    "Keeper_GBP",
    "Defense_GBP",
    "Midfield_GBP",
    "Forward_GBP",
    "Season",
    "League"
]

# Columns to convert
salary_cols = [
    "Gross_PW_GBP",
    "Gross_PY_GBP",
    "Adj_Gross_GBP",
    "Keeper_GBP",
    "Defense_GBP",
    "Midfield_GBP",
    "Forward_GBP"
]

# Remove the £ sign, thousands separators and spaces, then convert to float
for col in salary_cols:
    payrolls_all_england[col] = payrolls_all_england[col].replace(r'[£,\s]', '', regex=True).astype(float)

# Check
payrolls_all_england.tail()

| | Club | Abbreviation | Gross_PW_GBP | Gross_PY_GBP | Adj_Gross_GBP | Keeper_GBP | Defense_GBP | Midfield_GBP | Forward_GBP | Season | League |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 567 | Bournemouth | BOU | 1207000.0 | 62764000.0 | 62764000.0 | 4160000.0 | 17160000.0 | 17264000.0 | 24180000.0 | 2025-2026 | Premier League |
| 568 | Brighton | BHA | 1166250.0 | 60645000.0 | 60645000.0 | 3640000.0 | 24102000.0 | 9906000.0 | 22997000.0 | 2025-2026 | Premier League |
| 569 | Leeds | LEE | 1140500.0 | 59306000.0 | 59306000.0 | 6240000.0 | 18200000.0 | 8580000.0 | 26286000.0 | 2025-2026 | Premier League |
| 570 | Burnley | RNL | 1061346.0 | 55190000.0 | 55190000.0 | 4004000.0 | 22490000.0 | 9100000.0 | 19596000.0 | 2025-2026 | Premier League |
| 571 | Brentford | BTD | 1044250.0 | 54301000.0 | 54301000.0 | 4680000.0 | 14456000.0 | 12506000.0 | 22659000.0 | 2025-2026 | Premier League |
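
The conversion above ends in `astype(float)`, which raises a `ValueError` as soon as one cell is not a plain number. A more forgiving variant uses `pd.to_numeric(..., errors="coerce")`, which turns unparseable cells into `NaN` so they can be inspected with `.isna()`. This is a sketch with a hypothetical `parse_gbp` helper; the `"-"` placeholder is an assumption, since the tables shown here contain no missing values.

```python
import pandas as pd


def parse_gbp(series):
    """Turn strings like '£ 1,207,000' into floats (hypothetical helper).

    Strips the pound sign, thousands separators and whitespace; anything that
    is still not a number becomes NaN instead of raising an error.
    """
    cleaned = series.astype(str).str.replace(r"[£,\s]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")


print(parse_gbp(pd.Series(["£ 1,207,000", "£ 9,103,200", "-"])).tolist())
# [1207000.0, 9103200.0, nan]
```

The trade-off: coercion never stops the notebook, so a follow-up check such as `payrolls_all_england[salary_cols].isna().sum()` is needed to notice bad cells.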


In [148]:
# Save payrolls_all_england to CSV in the current working directory
payrolls_all_england.to_csv("payrolls_all_england.csv", index=False)
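
When the CSV is loaded again for analysis, a couple of choices help: `League` has only two values and fits a categorical dtype, and a numeric season start year is easier to sort and plot than the `"2025-2026"` label. This is a sketch under assumptions: the inline text stands in for the file (with a subset of columns and values taken from this notebook's output), and `Season_Start` is a hypothetical derived column.

```python
import io

import pandas as pd

# Stand-in for payrolls_all_england.csv: two rows, subset of the columns
csv_text = """Club,Abbreviation,Gross_PW_GBP,Gross_PY_GBP,Season,League
Bournemouth,BOU,1207000.0,62764000.0,2025-2026,Premier League
Charlton,CHA,175062.0,9103200.0,2025-2026,Championship
"""

# In the notebook this would be pd.read_csv("payrolls_all_england.csv", ...)
payrolls = pd.read_csv(io.StringIO(csv_text), dtype={"League": "category"})

# Hypothetical derived column: first calendar year of the season, as an int
payrolls["Season_Start"] = payrolls["Season"].str[:4].astype(int)

print(payrolls.dtypes)
```

Because the file was written with `index=False`, no extra index column appears when it is read back.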