## 📈 Predicting Final League Positions Using Betting Odds, Probabilistic Modelling & Simulation

**Competition:** Premier League, Serie A, La Liga, Bundesliga, Ligue 1 (Season 2025/26)  
**Purpose:** Estimate probabilities of final league positions using betting market information and simulation  
**Methods:** Odds-implied probabilities, Monte Carlo simulation, scenario analysis  
**Author:** [Victoria Friss de Kereki](https://www.linkedin.com/in/victoria-friss-de-kereki/)  
**Medium Articles:**  TBC  

---

**Notebook first written:** `07/02/2026`  
**Last updated:** `09/02/2026`  

> This notebook develops a probabilistic framework to predict final league positions across Europe’s top five leagues (England, Italy, Spain, Germany, France) using betting odds as market-based expectations.
>
> Betting odds are transformed into implied probabilities and adjusted for bookmaker margins. These probabilities are then used to simulate the remainder of each season via Monte Carlo methods, generating distributions over final points totals and league positions.
>
> The analysis estimates the likelihood of key outcomes such as league titles, European qualification, relegation, and mid-table finishes. Results are presented at the team level with uncertainty intervals. The framework can be extended to incorporate additional inputs such as recent form, fixture difficulty, or alternative predictive models beyond betting markets.


<div style="text-align: left;">
    <img src="Images and others/Predicting Premier League Final Positions Using Betting Odds, Probabilistic Modelling & Simulation.png" alt="Predicting Premier League Final Positions Using Betting Odds, Probabilistic Modelling & Simulation" width="600">
</div>
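
As a rough illustration of the simulation step described in the summary above, the sketch below draws each remaining match from its home/draw/away probabilities and tallies how often each team ends in each position. The function name `simulate_final_positions`, the toy teams and probabilities, and the random tie-break are assumptions made for illustration; they are not taken from the notebook's own simulation code.

```python
import numpy as np


def simulate_final_positions(points, fixtures, n_sims=10_000, seed=0):
    """Hypothetical helper: Monte Carlo over the remaining fixtures.

    points   : dict mapping team -> current points
    fixtures : list of (home, away, p_home, p_draw, p_away)
    Returns a dict mapping team -> array of finishing-position probabilities
    (index 0 = first place).
    """
    rng = np.random.default_rng(seed)
    teams = list(points)
    idx = {team: i for i, team in enumerate(teams)}
    n_teams = len(teams)

    # One row of running points totals per simulated season
    totals = np.tile(np.array([points[t] for t in teams], dtype=float), (n_sims, 1))

    for home, away, p_home, p_draw, p_away in fixtures:
        # 0 = home win, 1 = draw, 2 = away win
        outcome = rng.choice(3, size=n_sims, p=[p_home, p_draw, p_away])
        totals[:, idx[home]] += np.where(outcome == 0, 3, np.where(outcome == 1, 1, 0))
        totals[:, idx[away]] += np.where(outcome == 2, 3, np.where(outcome == 1, 1, 0))

    # Assumption: ties on points are broken at random (real leagues use
    # goal difference, head-to-head records, etc.). The noise is below one
    # point, so it never reorders teams with different totals.
    tie_break = rng.random(totals.shape) * 1e-3
    order = np.argsort(-(totals + tie_break), axis=1)  # order[s, k] = team at position k

    position_probs = np.zeros((n_teams, n_teams))
    for k in range(n_teams):
        position_probs[:, k] = np.bincount(order[:, k], minlength=n_teams) / n_sims

    return {team: position_probs[idx[team]] for team in teams}


# Toy league: three teams, two matches left (made-up numbers)
points = {"A": 10, "B": 9, "C": 0}
fixtures = [
    ("A", "B", 0.5, 0.3, 0.2),
    ("C", "A", 0.1, 0.2, 0.7),
]
probs = simulate_final_positions(points, fixtures, n_sims=2_000)
print({team: p.round(3) for team, p in probs.items()})
```

Each team's array sums to 1, and so does each position across teams, which is a quick sanity check worth running on any real simulation output.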

In [1]:
# Core
from datetime import datetime, timedelta
import os

# Data manipulation
import numpy as np
import pandas as pd

# APIs & environment
import requests
from dotenv import load_dotenv
import time  # for delaying requests
from datetime import date

# Statistics
from scipy.stats import poisson

# Visualisation
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
import dataframe_image as dfi # To download the final images

# Nicer printing of tables, no wrapping
pd.set_option("display.width", None)
pd.set_option("display.max_columns", None)
pd.set_option("display.expand_frame_repr", False)

## 1. Current Standings (ESPN Scraping) for the 5 leagues

In [2]:
year = 2025  # current season start year

leagues = {
    "ENG.1": "premierleague_england",
    "ITA.1": "seriea_italy",
    "ESP.1": "laliga_spain",
    "GER.1": "bundesliga_germany",
    "FRA.1": "ligue1_france",
}

for league_code, df_name in leagues.items():
    url = f"https://www.espn.com/soccer/standings/_/league/{league_code}/season/{year}"
    tables = pd.read_html(url)

    teams_raw = tables[0]
    stats = tables[1]

    teams = pd.DataFrame()
    teams["position"] = teams_raw.iloc[:, 0].str.extract(r"^(\d+)").astype(int)
    teams["team"] = (
        teams_raw.iloc[:, 0]
        .str.replace(r"^\d+", "", regex=True)
        .str.replace(r"^[A-Z]{2,3}", "", regex=True)
        .str.strip()
    )

    stats.columns = ["gp", "w", "d", "l", "gf", "ga", "gd", "pts"]
    stats = stats.apply(
        lambda c: c.astype(str)
                  .str.replace("+", "", regex=False)
                  .astype(int)
    )

    globals()[df_name] = pd.concat([teams, stats], axis=1)

In [3]:
TEAM_NAME_MAPPING = {
    # Italy
    "AAS Roma": "AS Roma",
    "OComo": "Como",

    # Germany
    "B04Bayer Leverkusen": "Bayer Leverkusen",
    "M05Mainz": "Mainz",

    # France
    "NLyon": "Lyon",
    "LLille": "Lille",
    "ENice": "Nice",
    "ZMetz": "Metz",
}

def clean_team_names(df, column="team"):
    df = df.copy()
    df[column] = df[column].replace(TEAM_NAME_MAPPING)
    return df

premierleague_england = clean_team_names(premierleague_england)
seriea_italy = clean_team_names(seriea_italy)
laliga_spain = clean_team_names(laliga_spain)
bundesliga_germany = clean_team_names(bundesliga_germany)
ligue1_france = clean_team_names(ligue1_france)
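
The manual mapping above is needed because the prefix-stripping regex in the scraping cell removes at most three uppercase letters and no digits. The sketch below replays the same two `str.replace` calls on a few made-up raw strings. The format "position + abbreviation + name" and the abbreviations `ARS`, `ROMA` and `B04` are assumptions about what ESPN returns, inferred from the keys of `TEAM_NAME_MAPPING`.

```python
import pandas as pd

# Hypothetical raw cells: league position, club abbreviation, club name
raw = pd.Series(["1ARSArsenal", "4ROMAAS Roma", "6B04Bayer Leverkusen"])

cleaned = (
    raw.str.replace(r"^\d+", "", regex=True)          # drop the leading position
       .str.replace(r"^[A-Z]{2,3}", "", regex=True)   # drop a 2-3 letter abbreviation
       .str.strip()
)

print(cleaned.tolist())
# ['Arsenal', 'AAS Roma', 'B04Bayer Leverkusen']
```

A four-letter abbreviation leaves one stray letter behind, and an abbreviation containing digits is not matched at all, which is consistent with entries such as `"AAS Roma"` and `"B04Bayer Leverkusen"` in the mapping.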

In [4]:
print("\nPremier League (England)")
print(premierleague_england.head(3))

print("\nSerie A (Italy)")
print(seriea_italy.head(3))

print("\nLa Liga (Spain)")
print(laliga_spain.head(3))

print("\nBundesliga (Germany)")
print(bundesliga_germany.head(3))

print("\nLigue 1 (France)")
print(ligue1_france.head(3))


Premier League (England)
   position             team  gp   w  d  l  gf  ga  gd  pts
0         1          Arsenal  25  17  5  3  49  17  32   56
1         2  Manchester City  25  15  5  5  51  24  27   50
2         3      Aston Villa  25  14  5  6  36  27   9   47

Serie A (Italy)
   position            team  gp   w  d  l  gf  ga  gd  pts
0         1  Internazionale  24  19  1  4  57  19  38   58
1         2        AC Milan  23  14  8  1  38  17  21   50
2         3          Napoli  24  15  4  5  36  23  13   49

La Liga (Spain)
   position             team  gp   w  d  l  gf  ga  gd  pts
0         1        Barcelona  23  19  1  3  63  23  40   58
1         2      Real Madrid  23  18  3  2  49  18  31   57
2         3  Atlético Madrid  23  13  6  4  38  18  20   45

Bundesliga (Germany)
   position               team  gp   w  d  l  gf  ga  gd  pts
0         1      Bayern Munich  21  17  3  1  79  19  60   54
1         2  Borussia Dortmund  21  14  6  1  43  20  23   48
2         3    

In [5]:
leagues_data = {
    "Premier League (England)": premierleague_england,
    "Serie A (Italy)": seriea_italy,
    "La Liga (Spain)": laliga_spain,
    "Bundesliga (Germany)": bundesliga_germany,
    "Ligue 1 (France)": ligue1_france,
}

matches_unplayed_ = {}

for league_name, df in leagues_data.items():
    num_teams = len(df)
    total_matches = num_teams * (num_teams - 1)  # double round-robin: every team hosts every other team once
    matches_played = df["gp"].sum() / 2          # each match adds one game played to both teams, so halve the sum
    matches_unplayed = total_matches - matches_played
    
    matches_unplayed_[league_name] = matches_unplayed
    print(f"{league_name}: {matches_unplayed} matches unplayed")

Premier League (England): 130.0 matches unplayed
Serie A (Italy): 141.0 matches unplayed
La Liga (Spain): 152.0 matches unplayed
Bundesliga (Germany): 118.0 matches unplayed
Ligue 1 (France): 117.0 matches unplayed
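
The counting logic above can be checked by hand with a small helper. This is a minimal sketch; `remaining_matches` is a hypothetical name, and the example assumes a 20-team league in which every team has played 25 games, which reproduces the Premier League figure printed above.

```python
def remaining_matches(num_teams, games_played):
    """Matches still to play in a double round-robin league.

    num_teams    : number of teams in the league
    games_played : iterable with each team's games-played count
    """
    total_matches = num_teams * (num_teams - 1)   # each ordered (home, away) pair once
    matches_played = sum(games_played) // 2       # every match is counted by both teams
    return total_matches - matches_played


# 20 teams, 25 games each: 380 - 250 = 130
print(remaining_matches(20, [25] * 20))
```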


## 2. Get betting odds using API

In [6]:
# Load variables from API_KEY.env
load_dotenv("API_KEY.env")

API_KEY = os.getenv("ODDS_DATA_API_KEY")

if API_KEY is None:
    raise ValueError("ODDS_DATA_API_KEY not found. Check API_KEY.env")

print("API key loaded successfully")

API key loaded successfully


In [7]:
# API_KEY was loaded from API_KEY.env in the previous cell

leagues = {
    "soccer_epl": "odds_premierleague_england",
    "soccer_italy_serie_a": "odds_seriea_italy",
    "soccer_spain_la_liga": "odds_laliga_spain",
    "soccer_germany_bundesliga": "odds_bundesliga_germany",
    "soccer_france_ligue_one": "odds_ligue1_france",
}

base_url = "https://api.the-odds-api.com/v4/sports/{}/odds"

params = {
    "apiKey": API_KEY,
    "regions": "uk",
    "markets": "h2h",
    "oddsFormat": "decimal",
    "dateFormat": "iso",
    "days": 365
}

for sport_key, var_name in leagues.items():
    url = base_url.format(sport_key)

    response = requests.get(url, params=params)
    response.raise_for_status()

    globals()[var_name] = response.json()

In [8]:
print("Premier League (England):", len(odds_premierleague_england))
print("Serie A (Italy):", len(odds_seriea_italy))
print("La Liga (Spain):", len(odds_laliga_spain))
print("Bundesliga (Germany):", len(odds_bundesliga_germany))
print("Ligue 1 (France):", len(odds_ligue1_france))

Premier League (England): 21
Serie A (Italy): 21
La Liga (Spain): 22
Bundesliga (Germany): 18
Ligue 1 (France): 18


In [9]:
def flatten_odds(data):
    rows = []

    for match in data:
        match_id = match["id"]
        home = match["home_team"]
        away = match["away_team"]
        commence_time = match["commence_time"]  # avoid shadowing the `time` module

        for book in match["bookmakers"]:
            bookmaker = book["title"]

            # Find the head-to-head market (key == "h2h": home win / draw / away win odds).
            # Skip this bookmaker if it does not offer one.
            h2h = next((m for m in book["markets"] if m["key"] == "h2h"), None)
            if not h2h:
                continue

            outcomes = {o["name"]: o["price"] for o in h2h["outcomes"]}

            rows.append({
                "match_id": match_id,
                "commence_time": commence_time,
                "home_team": home,
                "away_team": away,
                "bookmaker": bookmaker,
                "home_odds": outcomes.get(home),
                "draw_odds": outcomes.get("Draw"),
                "away_odds": outcomes.get(away),
            })

    return pd.DataFrame(rows)

In [10]:
# Flatten odds into DataFrames
df_premierleague_england = flatten_odds(odds_premierleague_england)
df_seriea_italy = flatten_odds(odds_seriea_italy)
df_laliga_spain = flatten_odds(odds_laliga_spain)
df_bundesliga_germany = flatten_odds(odds_bundesliga_germany)
df_ligue1_france = flatten_odds(odds_ligue1_france)

In [11]:
print("\nPremier League (England)")
print(df_premierleague_england.head(3))

print("\nSerie A (Italy)")
print(df_seriea_italy.head(3))

print("\nLa Liga (Spain)")
print(df_laliga_spain.head(3))

print("\nBundesliga (Germany)")
print(df_bundesliga_germany.head(3))

print("\nLigue 1 (France)")
print(df_ligue1_france.head(3))


Premier League (England)
                           match_id         commence_time home_team    away_team    bookmaker  home_odds  draw_odds  away_odds
0  5b5fa92a38c5a46da05ac81d003f6f1e  2026-02-10T19:30:00Z   Everton  Bournemouth  Unibet (UK)       2.45        3.4       2.90
1  5b5fa92a38c5a46da05ac81d003f6f1e  2026-02-10T19:30:00Z   Everton  Bournemouth        Coral       2.37        3.4       2.90
2  5b5fa92a38c5a46da05ac81d003f6f1e  2026-02-10T19:30:00Z   Everton  Bournemouth    Ladbrokes       2.37        3.4       2.87

Serie A (Italy)
                           match_id         commence_time home_team away_team     bookmaker  home_odds  draw_odds  away_odds
0  388f80ebfbf19ee68d444d4998706b02  2026-02-13T19:45:00Z      Pisa  AC Milan      888sport       5.75        4.0       1.50
1  388f80ebfbf19ee68d444d4998706b02  2026-02-13T19:45:00Z      Pisa  AC Milan  William Hill       5.80        4.0       1.50
2  388f80ebfbf19ee68d444d4998706b02  2026-02-13T19:45:00Z      Pisa  AC Mi

In [12]:
def bookmaker_implied_probs(df):
    # Convert odds to implied probabilities per bookmaker
    df = df.assign(
        p_home_raw=1 / df["home_odds"],
        p_draw_raw=1 / df["draw_odds"],
        p_away_raw=1 / df["away_odds"],
    )

    # Remove bookmaker margin (normalise)
    total = (
        df["p_home_raw"] +
        df["p_draw_raw"] +
        df["p_away_raw"]
    )

    df = df.assign(
        p_home_book=df["p_home_raw"] / total,
        p_draw_book=df["p_draw_raw"] / total,
        p_away_book=df["p_away_raw"] / total,
    )

    # Average normalised probabilities across bookmakers
    betting_odds_avg = (
        df.groupby(["home_team", "away_team"], as_index=False)
          .agg(
              p_home_book=("p_home_book", "mean"),
              p_draw_book=("p_draw_book", "mean"),
              p_away_book=("p_away_book", "mean"),
          )
    )

    # Keep only required fields
    betting_odds_avg = betting_odds_avg[
        [
            "home_team",
            "away_team",
            "p_home_book",
            "p_draw_book",
            "p_away_book",
        ]
    ]

    return betting_odds_avg
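
To make the margin removal concrete, here is the same arithmetic for a single bookmaker line, using the Everton v Bournemouth odds from the Unibet (UK) row printed earlier (2.45 / 3.40 / 2.90). This is a minimal sketch of proportional normalisation, the approach used in `bookmaker_implied_probs`; other de-margining methods exist and may give slightly different numbers.

```python
odds = {"home": 2.45, "draw": 3.40, "away": 2.90}

# Raw implied probabilities: the reciprocal of each decimal price
raw = {outcome: 1 / price for outcome, price in odds.items()}

# The raw probabilities sum to more than 1; the excess is the bookmaker margin
overround = sum(raw.values())

# Proportional normalisation: scale so the three probabilities sum to 1
fair = {outcome: p / overround for outcome, p in raw.items()}

print(f"overround: {overround:.4f}")
print({outcome: round(p, 4) for outcome, p in fair.items()})
```

Here the raw probabilities add up to roughly 1.047, so the margin on this line is a little under 5%.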

In [13]:
betting_odds_premierleague_england = bookmaker_implied_probs(df_premierleague_england)
betting_odds_seriea_italy = bookmaker_implied_probs(df_seriea_italy)
betting_odds_laliga_spain = bookmaker_implied_probs(df_laliga_spain)
betting_odds_bundesliga_germany = bookmaker_implied_probs(df_bundesliga_germany)
betting_odds_ligue1_france = bookmaker_implied_probs(df_ligue1_france)

In [14]:
leagues_odds = {
    "Premier League (England)": betting_odds_premierleague_england,
    "Serie A (Italy)": betting_odds_seriea_italy,
    "La Liga (Spain)": betting_odds_laliga_spain,
    "Bundesliga (Germany)": betting_odds_bundesliga_germany,
    "Ligue 1 (France)": betting_odds_ligue1_france
}

for league_name, df in leagues_odds.items():
    # Check duplicates based on home_team + away_team
    # Check duplicates on home_team + away_team only: match_id is excluded because
    # the source can assign different ids to the same match.
    duplicates = df.duplicated(subset=["home_team", "away_team"], keep=False)
    num_duplicates = duplicates.sum()
    
    print(f"\n{league_name}")
    print("Sample data:")
    display(df.head(3))
    
    if num_duplicates == 0:
        print("✅ No duplicate matches found.")
    else:
        print(f"🚨 Found {num_duplicates} duplicate row(s)!")
        display(df[duplicates])


Premier League (England)
Sample data:


     home_team                 away_team  p_home_book  p_draw_book  p_away_book
0  Aston Villa  Brighton and Hove Albion     0.490589     0.260151     0.249260
1  Aston Villa              Leeds United     0.529842     0.254751     0.215407
2    Brentford                   Arsenal     0.191587     0.244187     0.564227

✅ No duplicate matches found.

Serie A (Italy)
Sample data:


   home_team  away_team  p_home_book  p_draw_book  p_away_book
0   AC Milan       Como     0.453931     0.277651     0.268417
1   AC Milan      Parma     0.697038     0.194560     0.108402
2    AS Roma  Cremonese     0.677538     0.208465     0.113997

✅ No duplicate matches found.

La Liga (Spain)
Sample data:


         home_team  away_team  p_home_book  p_draw_book  p_away_book
0           Alavés     Girona     0.399041     0.318725     0.282234
1  Athletic Bilbao   Elche CF     0.585168     0.244477     0.170355
2  Atlético Madrid   Espanyol     0.674930     0.205931     0.119139

✅ No duplicate matches found.

Bundesliga (Germany)
Sample data:


          home_team         away_team  p_home_book  p_draw_book  p_away_book
0  1. FC Heidenheim     VfB Stuttgart     0.217662     0.228337     0.554002
1        1. FC Köln    TSG Hoffenheim     0.318531     0.266114     0.415355
2          Augsburg  1. FC Heidenheim     0.510229     0.256177     0.233593

✅ No duplicate matches found.

Ligue 1 (France)
Sample data:


   home_team  away_team  p_home_book  p_draw_book  p_away_book
0  AS Monaco     Nantes     0.652237     0.199513     0.148250
1     Angers      Lille     0.211448     0.267925     0.520627
2    Auxerre     Rennes     0.321061     0.288559     0.390380

✅ No duplicate matches found.


## 3. Get upcoming fixtures for the 5 leagues

In [15]:
# Load variables from API_KEY.env
load_dotenv("API_KEY.env")

API_KEY = os.getenv("FOOTBALL_DATA_API_KEY")

if API_KEY is None:
    raise ValueError("FOOTBALL_DATA_API_KEY not found. Check API_KEY.env")

print("API key loaded successfully")

API key loaded successfully


In [16]:
competitions = {
    "PL": "fixtures_premierleague_england",
    "SA": "fixtures_seriea_italy",
    "PD": "fixtures_laliga_spain",
    "BL1": "fixtures_bundesliga_germany",
    "FL1": "fixtures_ligue1_france",
}

headers = {
    "X-Auth-Token": API_KEY
}

today = pd.Timestamp.now(tz="UTC").date()  # datetime.utcnow() is deprecated since Python 3.12
end_of_season = today + timedelta(days=365)

params = {
    "status": "SCHEDULED",
    "dateFrom": today.isoformat(),
    "dateTo": end_of_season.isoformat()
}

for comp_code, df_name in competitions.items():
    url = f"https://api.football-data.org/v4/competitions/{comp_code}/matches"

    response = requests.get(url, headers=headers, params=params)
    response.raise_for_status()

    data = response.json()
    fixtures = data["matches"]

    df_fixtures = pd.DataFrame(fixtures)

    df_fixtures_clean = df_fixtures[
        ["utcDate", "status", "homeTeam", "awayTeam"]
    ].copy()  # copy avoids SettingWithCopyWarning

    # Extract team names
    df_fixtures_clean["homeTeam"] = df_fixtures_clean["homeTeam"].apply(lambda x: x["name"])
    df_fixtures_clean["awayTeam"] = df_fixtures_clean["awayTeam"].apply(lambda x: x["name"])

    globals()[df_name] = df_fixtures_clean



In [17]:
print("Premier League (England):", len(fixtures_premierleague_england))
print("Serie A (Italy):", len(fixtures_seriea_italy))
print("La Liga (Spain):", len(fixtures_laliga_spain))
print("Bundesliga (Germany):", len(fixtures_bundesliga_germany))
print("Ligue 1 (France):", len(fixtures_ligue1_france))

Premier League (England): 129
Serie A (Italy): 141
La Liga (Spain): 151
Bundesliga (Germany): 118
Ligue 1 (France): 117
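
It is worth comparing these fixture counts with the unplayed-match counts derived from the standings in section 1. The sketch below hard-codes both sets of figures exactly as printed in this notebook; the variable names are illustrative only.

```python
# Unplayed matches implied by the standings (section 1 output)
expected = {
    "Premier League (England)": 130,
    "Serie A (Italy)": 141,
    "La Liga (Spain)": 152,
    "Bundesliga (Germany)": 118,
    "Ligue 1 (France)": 117,
}

# Fixtures actually returned by the fixtures API (output above)
fetched = {
    "Premier League (England)": 129,
    "Serie A (Italy)": 141,
    "La Liga (Spain)": 151,
    "Bundesliga (Germany)": 118,
    "Ligue 1 (France)": 117,
}

# Keep only the leagues where the two sources disagree
shortfall = {
    league: expected[league] - fetched[league]
    for league in expected
    if expected[league] != fetched[league]
}
print(shortfall)
```

The Premier League and La Liga are each one fixture short. A plausible explanation, not confirmed by the data shown here, is a match that is postponed or not yet scheduled and therefore excluded by the status filter; section 6 looks for such gaps.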


In [18]:
print("Premier League (England):", fixtures_premierleague_england.head(3))
print("Serie A (Italy):", fixtures_seriea_italy.head(3))
print("La Liga (Spain):", fixtures_laliga_spain.head(3))
print("Bundesliga (Germany):", fixtures_bundesliga_germany.head(3))
print("Ligue 1 (France):", fixtures_ligue1_france.head(3))

Premier League (England):                 utcDate status              homeTeam             awayTeam
0  2026-02-10T19:30:00Z  TIMED            Chelsea FC      Leeds United FC
1  2026-02-10T19:30:00Z  TIMED            Everton FC      AFC Bournemouth
2  2026-02-10T19:30:00Z  TIMED  Tottenham Hotspur FC  Newcastle United FC
Serie A (Italy):                 utcDate status      homeTeam        awayTeam
0  2026-02-13T19:45:00Z  TIMED  AC Pisa 1909        AC Milan
1  2026-02-14T14:00:00Z  TIMED     Como 1907  ACF Fiorentina
2  2026-02-14T17:00:00Z  TIMED      SS Lazio     Atalanta BC
La Liga (Spain):                 utcDate status                   homeTeam          awayTeam
0  2026-02-13T20:00:00Z  TIMED                   Elche CF        CA Osasuna
1  2026-02-14T13:00:00Z  TIMED  RCD Espanyol de Barcelona  RC Celta de Vigo
2  2026-02-14T15:15:00Z  TIMED                  Getafe CF     Villarreal CF
Bundesliga (Germany):                 utcDate status             homeTeam           awayTeam
0  

## 4. Get this season (2025/26) and last season (2024/25) results

In [19]:
competitions = {
    "PL": "premierleague_england",
    "SA": "seriea_italy",
    "PD": "laliga_spain",
    "BL1": "bundesliga_germany",
    "FL1": "ligue1_france",
}

seasons = [2025, 2024]  # season start years: 2025/26 (in progress) and 2024/25 (finished)
headers = {
    "X-Auth-Token": API_KEY
}

for comp_code, league_name in competitions.items():
    for season in seasons:
        print(f"Fetching {league_name} season {season}...")
        url = f"https://api.football-data.org/v4/competitions/{comp_code}/matches"
        params = {
            "season": season,
            "status": "FINISHED"
        }

        response = requests.get(url, headers=headers, params=params)
        response.raise_for_status()

        matches = response.json()["matches"]

        clean_rows = []
        for m in matches:
            clean_rows.append({
                "utcDate": m["utcDate"],
                "matchday": m["matchday"],
                "status": m["status"],
                "homeTeam": m["homeTeam"]["name"],
                "awayTeam": m["awayTeam"]["name"],
                "homeGoals": m["score"]["fullTime"]["home"],
                "awayGoals": m["score"]["fullTime"]["away"],
                "winner": m["score"]["winner"],
            })

        df_clean = pd.DataFrame(clean_rows)

        globals()[f"past_matches_{league_name}_{season}_clean"] = df_clean
        
        # Pause between requests to stay under the API's rate limit
        time.sleep(10)

print("Done fetching all matches.")

Fetching premierleague_england season 2025...
Fetching premierleague_england season 2024...
Fetching seriea_italy season 2025...
Fetching seriea_italy season 2024...
Fetching laliga_spain season 2025...
Fetching laliga_spain season 2024...
Fetching bundesliga_germany season 2025...
Fetching bundesliga_germany season 2024...
Fetching ligue1_france season 2025...
Fetching ligue1_france season 2024...
Done fetching all matches.


In [20]:
for league in [
    "premierleague_england",
    "seriea_italy",
    "laliga_spain",
    "bundesliga_germany",
    "ligue1_france",
]:
    for season in [2025, 2024]:
        df = globals()[f"past_matches_{league}_{season}_clean"]
        print(f"\n{league.replace('_', ' ').title()} – Season {season}")
        print(df.tail(2))


Premierleague England – Season 2025
                  utcDate  matchday    status                   homeTeam            awayTeam  homeGoals  awayGoals     winner
248  2026-02-08T14:00:00Z        25  FINISHED  Brighton & Hove Albion FC   Crystal Palace FC          0          1  AWAY_TEAM
249  2026-02-08T16:30:00Z        25  FINISHED               Liverpool FC  Manchester City FC          1          2  AWAY_TEAM

Premierleague England – Season 2024
                  utcDate  matchday    status                    homeTeam                   awayTeam  homeGoals  awayGoals     winner
378  2025-05-25T15:00:00Z        38  FINISHED        Tottenham Hotspur FC  Brighton & Hove Albion FC          1          4  AWAY_TEAM
379  2025-05-25T15:00:00Z        38  FINISHED  Wolverhampton Wanderers FC               Brentford FC          1          1       DRAW

Seriea Italy – Season 2025
                  utcDate  matchday    status     homeTeam         awayTeam  homeGoals  awayGoals     winner
237

In [21]:
# Combine datasets into past_matches_ and future_matches_

leagues = [
    "premierleague_england",
    "seriea_italy",
    "laliga_spain",
    "bundesliga_germany",
    "ligue1_france",
]

for league in leagues:
    # Load DataFrames
    df_current = globals()[f"past_matches_{league}_2025_clean"]
    df_prev = globals()[f"past_matches_{league}_2024_clean"]
    df_future = globals()[f"fixtures_{league}"]

    # Combine all past fixtures together
    df_all = pd.concat([df_prev, df_current], ignore_index=True)

    # Store results
    globals()[f"past_matches_{league}_all"] = df_all
    globals()[f"future_matches_{league}"] = df_future

## 5. Unify team names across datasets

In [22]:
# === Premier League ===
mapping_premierleague_england = {
    # df_probabilities
    "Aston Villa FC": "Aston Villa",
    "Leeds United FC": "Leeds United",
    "Newcastle United FC": "Newcastle United",
    "Crystal Palace FC": "Crystal Palace",
    "Chelsea FC": "Chelsea",
    "Arsenal FC": "Arsenal",
    "Everton FC": "Everton",
    "Burnley FC": "Burnley",
    "Brighton & Hove Albion FC": "Brighton & Hove Albion",
    "Sunderland AFC": "Sunderland",
    "West Ham United FC": "West Ham United",
    "Manchester City FC": "Manchester City",
    "Manchester United FC": "Manchester United",
    "Fulham FC": "Fulham",
    "Liverpool FC": "Liverpool",
    "Brentford FC": "Brentford",
    "Wolverhampton Wanderers FC": "Wolverhampton Wanderers",
    "Nottingham Forest FC": "Nottingham Forest",
    "Tottenham Hotspur FC": "Tottenham Hotspur",
    # betting_odds
    "Brighton and Hove Albion": "Brighton & Hove Albion",
    "Bournemouth": "AFC Bournemouth"
}

# === Serie A ===
mapping_seriea_italy = {
    # df_probabilities
    "US Sassuolo Calcio": "Sassuolo",
    "Cagliari Calcio": "Cagliari",
    "Atalanta BC": "Atalanta",
    "SS Lazio": "Lazio",
    "Genoa CFC": "Genoa",
    "Udinese Calcio": "Udinese",
    "FC Internazionale Milano": "Internazionale",
    "Torino FC": "Torino",
    "AC Pisa 1909": "Pisa",
    "ACF Fiorentina": "Fiorentina",
    "AS Roma": "AS Roma",
    "Juventus FC": "Juventus",
    "Como 1907": "Como",
    "US Cremonese": "Cremonese",
    "Bologna FC 1909": "Bologna",
    "Parma Calcio 1913": "Parma",
    "Hellas Verona FC": "Hellas Verona",
    "SSC Napoli": "Napoli",
    "US Lecce": "Lecce",
    # betting_odds
    "Inter Milan": "Internazionale",
    "Como": "Como"
}

# === La Liga ===
mapping_laliga_spain = {
    # df_probabilities
    "Club Atlético de Madrid": "Atlético Madrid",
    "Rayo Vallecano de Madrid": "Rayo Vallecano",
    "Valencia CF": "Valencia",
    "Deportivo Alavés": "Alavés",
    "CA Osasuna": "Osasuna",
    "RCD Espanyol de Barcelona": "Espanyol",
    "Getafe CF": "Getafe",
    "Real Sociedad de Fútbol": "Real Sociedad",
    "Levante UD": "Levante",
    "Real Betis Balompié": "Real Betis",
    "RCD Mallorca": "Mallorca",
    "Girona FC": "Girona",
    "Villarreal CF": "Villarreal",
    "FC Barcelona": "Barcelona",
    "Elche CF": "Elche",
    "Sevilla FC": "Sevilla",
    "Real Madrid CF": "Real Madrid",
    "RC Celta de Vigo": "Celta Vigo",
    # betting_odds ("Elche CF" and "CA Osasuna" are already mapped above)
    "Oviedo": "Real Oviedo",
    "Athletic Bilbao": "Athletic Club"
}

# === Bundesliga ===
mapping_bundesliga_germany = {
    # df_probabilities
    "1. FC Köln": "FC Cologne",
    "TSG 1899 Hoffenheim": "TSG Hoffenheim",
    "1. FSV Mainz 05": "Mainz",
    "SV Werder Bremen": "Werder Bremen",
    "Hamburger SV": "Hamburg SV",
    "Bayer 04 Leverkusen": "Bayer Leverkusen",
    "FC St. Pauli 1910": "St. Pauli",
    "FC Bayern München": "Bayern Munich",
    # betting_odds
    "1. FC Heidenheim": "1. FC Heidenheim 1846",
    "Union Berlin": "1. FC Union Berlin",
    "Borussia Monchengladbach": "Borussia Mönchengladbach",
    "FSV Mainz 05": "Mainz",
    "Bayer Leverkusen": "Bayer Leverkusen",
    "Augsburg": "FC Augsburg",
    "FC St. Pauli": "St. Pauli"
}

# === Ligue 1 ===
mapping_ligue1_france = {
    # df_probabilities
    "Racing Club de Lens": "Lens",
    "OGC Nice": "Nice",
    "FC Metz": "Metz",
    "Angers SCO": "Angers",
    "Stade Brestois 29": "Brest",
    "Olympique Lyonnais": "Lyon",
    "Paris Saint-Germain FC": "Paris Saint-Germain",
    "AS Monaco FC": "AS Monaco",
    "Lille OSC": "Lille",
    "Toulouse FC": "Toulouse",
    "FC Nantes": "Nantes",
    "RC Strasbourg Alsace": "Strasbourg",
    "FC Lorient": "Lorient",
    "Olympique de Marseille": "Marseille",
    "Stade Rennais FC 1901": "Stade Rennais",
    # betting_odds
    "RC Lens": "Lens",
    "Paris Saint Germain": "Paris Saint-Germain",
    "Auxerre": "AJ Auxerre",
    "Lyon": "Lyon",
    "Le Havre": "Le Havre AC",
    "Rennes": "Stade Rennais",
    "Metz": "Metz",
    "Nice": "Nice",
    "Lille": "Lille"
}

In [23]:
# Mapping dictionaries per league
mappings = {
    "premierleague_england": mapping_premierleague_england,
    "seriea_italy": mapping_seriea_italy,
    "laliga_spain": mapping_laliga_spain,
    "bundesliga_germany": mapping_bundesliga_germany,
    "ligue1_france": mapping_ligue1_france,
}

# Datasets to unify
datasets_templates = [
    "past_matches_{}_all",
    "future_matches_{}",
    "betting_odds_{}"
]

for league, mapping in mappings.items():
    print(f"Applying mapping for {league}...")
    
    for ds_template in datasets_templates:
        ds_name = ds_template.format(league)
        df = globals()[ds_name]
        
        # Replace team names using the mapping
        df.replace(mapping, inplace=True)

        print(f"  Updated {ds_name} team names.")

Applying mapping for premierleague_england...
  Updated past_matches_premierleague_england_all team names.
  Updated future_matches_premierleague_england team names.
  Updated betting_odds_premierleague_england team names.
Applying mapping for seriea_italy...
  Updated past_matches_seriea_italy_all team names.
  Updated future_matches_seriea_italy team names.
  Updated betting_odds_seriea_italy team names.
Applying mapping for laliga_spain...
  Updated past_matches_laliga_spain_all team names.
  Updated future_matches_laliga_spain team names.
  Updated betting_odds_laliga_spain team names.
Applying mapping for bundesliga_germany...
  Updated past_matches_bundesliga_germany_all team names.
  Updated future_matches_bundesliga_germany team names.
  Updated betting_odds_bundesliga_germany team names.
Applying mapping for ligue1_france...
  Updated past_matches_ligue1_france_all team names.
  Updated future_matches_ligue1_france team names.
  Updated betting_odds_ligue1_france team names.
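
One property of `DataFrame.replace` with a dictionary makes this mapping safe to apply to every dataset: without `regex=True` it only replaces cells whose entire value equals a key, never substrings. The toy frame below is invented to illustrate this; the two mapping entries are taken from the dictionaries above.

```python
import pandas as pd

df = pd.DataFrame({
    "home_team": ["Bournemouth", "AFC Bournemouth"],
    "away_team": ["Lyon", "Olympique Lyonnais"],
})

mapping = {
    "Bournemouth": "AFC Bournemouth",
    "Olympique Lyonnais": "Lyon",
}

# Only whole-cell matches are replaced, so "AFC Bournemouth" is left alone
# rather than becoming "AFC AFC Bournemouth".
unified = df.replace(mapping)
print(unified)
```

Both rows end up as `AFC Bournemouth` at home and `Lyon` away, regardless of which naming convention the source used.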


In [24]:
def get_team_columns(df):
    if {"homeTeam", "awayTeam"}.issubset(df.columns):
        return "homeTeam", "awayTeam"
    if {"home_team", "away_team"}.issubset(df.columns):
        return "home_team", "away_team"
    raise KeyError("No home/away team columns found")

def extract_teams(df):
    home_col, away_col = get_team_columns(df)
    return set(df[home_col]).union(set(df[away_col]))

print("\n🔎 TEAM NAME CONSISTENCY CHECK - Only relegated teams should appear here.\n")

for league in mappings.keys():

    print(f"=== {league.replace('_', ' ').title()} ===")

    past = globals()[f"past_matches_{league}_all"]
    future = globals()[f"future_matches_{league}"]
    odds = globals()[f"betting_odds_{league}"]

    teams_past = extract_teams(past)
    teams_future = extract_teams(future)
    teams_odds = extract_teams(odds)

    all_sets = {
        "past_matches": teams_past,
        "future_matches": teams_future,
        "betting_odds": teams_odds,
    }

    union = set.union(*all_sets.values())
    ok = True

    for name, team_set in all_sets.items():
        # Teams that appear in another dataset but not in this one.
        # (No "extra" check is needed: a subset of the union cannot contain
        # teams outside the union.)
        missing = union - team_set

        if missing:
            ok = False
            print(f"\n⚠️ {name}:")
            print(f"  Missing teams ({len(missing)}): {sorted(missing)}")

    if ok:
        print("✅ All datasets use identical team names")

    print()


🔎 TEAM NAME CONSISTENCY CHECK - Only relegated teams should appear here.

=== Premierleague England ===

⚠️ future_matches:
  Missing teams (3): ['Ipswich Town FC', 'Leicester City FC', 'Southampton FC']

⚠️ betting_odds:
  Missing teams (3): ['Ipswich Town FC', 'Leicester City FC', 'Southampton FC']

=== Seriea Italy ===

⚠️ future_matches:
  Missing teams (3): ['AC Monza', 'Empoli FC', 'Venezia FC']

⚠️ betting_odds:
  Missing teams (3): ['AC Monza', 'Empoli FC', 'Venezia FC']

=== Laliga Spain ===

⚠️ future_matches:
  Missing teams (3): ['CD Leganés', 'Real Valladolid CF', 'UD Las Palmas']

⚠️ betting_odds:
  Missing teams (3): ['CD Leganés', 'Real Valladolid CF', 'UD Las Palmas']

=== Bundesliga Germany ===

⚠️ future_matches:
  Missing teams (2): ['Holstein Kiel', 'VfL Bochum 1848']

⚠️ betting_odds:
  Missing teams (2): ['Holstein Kiel', 'VfL Bochum 1848']

=== Ligue1 France ===

⚠️ future_matches:
  Missing teams (3): ['AS Saint-Étienn

## 6. Check for missing fixtures (rescheduled matches)

In [25]:
for league in leagues:
    future_matches = globals()[f"future_matches_{league}"]
    betting_odds = globals()[f"betting_odds_{league}"]

    # Identify bookmaker matches missing from future_matches
    future_set = set(zip(future_matches["homeTeam"], future_matches["awayTeam"]))
    book_set = set(zip(betting_odds["home_team"], betting_odds["away_team"]))
    missing_matches = book_set - future_set

    # If any are missing, append them to the future_matches DataFrame
    if missing_matches:
        print(f"{league}: Adding {len(missing_matches)} missing matches from bookmaker to future_matches")
        rows_to_add = []
        for home_team, away_team in missing_matches:
            row = betting_odds[
                (betting_odds["home_team"] == home_team) &
                (betting_odds["away_team"] == away_team)
            ].iloc[0]  # take first row if duplicates
            rows_to_add.append({
                # betting_odds holds no kickoff time after aggregation, so this falls back to NaT
                "utcDate": row.get("utcDate", pd.NaT),
                "homeTeam": home_team,
                "awayTeam": away_team,
            })

        future_matches = pd.concat([future_matches, pd.DataFrame(rows_to_add)], ignore_index=True)
        globals()[f"future_matches_{league}"] = future_matches
        print(f"✅ Added {len(rows_to_add)} matches to future_matches")
    else:
        print(f"{league}: No missing matches from bookmaker odds")

premierleague_england: No missing matches from bookmaker odds
seriea_italy: No missing matches from bookmaker odds
laliga_spain: No missing matches from bookmaker odds
bundesliga_germany: No missing matches from bookmaker odds
ligue1_france: No missing matches from bookmaker odds
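
The check above reduces to a set difference over ordered `(home, away)` pairs. A minimal sketch with invented fixtures shows what a non-empty result looks like; the column names follow the two naming schemes used in this notebook (`homeTeam`/`awayTeam` for fixtures, `home_team`/`away_team` for odds).

```python
import pandas as pd

# Toy data: the odds feed lists one match that the fixtures feed lacks
future = pd.DataFrame({
    "homeTeam": ["Everton", "Chelsea"],
    "awayTeam": ["AFC Bournemouth", "Leeds United"],
})
odds = pd.DataFrame({
    "home_team": ["Everton", "Arsenal"],
    "away_team": ["AFC Bournemouth", "Brentford"],
})

# Pairs are ordered, so (A, B) and (B, A) count as different fixtures
future_pairs = set(zip(future["homeTeam"], future["awayTeam"]))
odds_pairs = set(zip(odds["home_team"], odds["away_team"]))

missing_from_fixtures = odds_pairs - future_pairs
print(missing_from_fixtures)
```

This comparison only works once team names are unified across datasets, which is why it runs after section 5.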


In [26]:
SEASON_START_DATE = pd.Timestamp("2025-08-01")

leagues = [
    "premierleague_england",
    "seriea_italy",
    "laliga_spain",
    "bundesliga_germany",
    "ligue1_france",
]

# HELPERS

def filter_current_season(past_matches):
    """
    Keep only matches from 2025/2026 season using utcDate
    """
    df = past_matches.copy()

    if "utcDate" not in df.columns:
        raise ValueError("Expected column 'utcDate' not found in past matches")

    df["utcDate"] = pd.to_datetime(df["utcDate"], utc=True).dt.tz_localize(None)
    return df[df["utcDate"] >= SEASON_START_DATE]


def season_fixtures(past_matches, future_matches):
    """
    All fixtures (home, away) for this season only
    """
    return pd.concat(
        [
            past_matches[["homeTeam", "awayTeam"]],
            future_matches[["homeTeam", "awayTeam"]],
        ],
        ignore_index=True,
    )


def find_missing_reverse_fixture(team, opponent, fixtures):
    """
    Detect missing home/away direction for a team-opponent pair
    """
    team_home = (
        (fixtures.homeTeam == team)
        & (fixtures.awayTeam == opponent)
    ).any()

    team_away = (
        (fixtures.homeTeam == opponent)
        & (fixtures.awayTeam == team)
    ).any()

    if team_home and not team_away:
        return opponent, team   # opponent HOME vs team AWAY

    if team_away and not team_home:
        return team, opponent   # team HOME vs opponent AWAY

    return None


# MAIN LOGIC

missing_fixtures = []

for league in leagues:
    print(f"\n=== {league.replace('_',' ').title()} ===")

    league_table = globals()[league]
    future_matches = globals()[f"future_matches_{league}"]
    past_matches_all = globals()[f"past_matches_{league}_all"]

    # FILTER PAST MATCHES TO THIS SEASON ONLY
    past_matches = filter_current_season(past_matches_all)

    fixtures = season_fixtures(past_matches, future_matches)

    total_teams = len(league_table)
    matches_per_team = (total_teams - 1) * 2

    played_counts = (
        fixtures.homeTeam.value_counts()
        .add(fixtures.awayTeam.value_counts(), fill_value=0)
    )

    missing_teams = {
        team: matches_per_team - played_counts.get(team, 0)
        for team in league_table["team"]
        if matches_per_team - played_counts.get(team, 0) > 0
    }

    print(
        f"Past matches (this season): {len(past_matches)} | "
        f"Future matches: {len(future_matches)} | "
        f"Total fixtures: {len(fixtures)}"
    )

    if not missing_teams:
        print("✅ All teams have complete fixtures")
        continue

    print("⚠️ Teams missing fixtures:")
    for team, missing in missing_teams.items():
        print(f"  {team}: {missing}")

    league_teams = set(league_table["team"])

    for team in missing_teams:
        for opponent in league_teams - {team}:
            result = find_missing_reverse_fixture(team, opponent, fixtures)
            if result:
                home, away = result
                missing_fixtures.append(
                    {
                        "league": league,
                        "homeTeam": home,
                        "awayTeam": away,
                    }
                )


# FINAL OUTPUT

missing_df = (
    pd.DataFrame(missing_fixtures)
    .drop_duplicates()
    .sort_values(["league", "homeTeam"])
)

print("\n🚨 Missing fixtures detected:")
display(missing_df)


=== Premierleague England ===
Past matches (this season): 250 | Future matches: 129 | Total fixtures: 379
⚠️ Teams missing fixtures:
  Manchester City: 1
  Crystal Palace: 1

=== Seriea Italy ===
Past matches (this season): 239 | Future matches: 141 | Total fixtures: 380
✅ All teams have complete fixtures

=== Laliga Spain ===
Past matches (this season): 228 | Future matches: 151 | Total fixtures: 379
⚠️ Teams missing fixtures:
  Rayo Vallecano: 1
  Real Oviedo: 1

=== Bundesliga Germany ===
Past matches (this season): 188 | Future matches: 118 | Total fixtures: 306
✅ All teams have complete fixtures

=== Ligue1 France ===
Past matches (this season): 189 | Future matches: 117 | Total fixtures: 306
✅ All teams have complete fixtures

🚨 Missing fixtures detected:


| | league | homeTeam | awayTeam |
|---|---|---|---|
| 2 | laliga_spain | Rayo Vallecano | Real Oviedo |
| 0 | premierleague_england | Manchester City | Crystal Palace |
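
The reverse-fixture check above can also be read as a set difference: in a double round-robin, every ordered (home, away) pair of distinct teams should appear exactly once. A minimal sketch of that idea, assuming a hypothetical four-team league and plain tuples instead of the notebook's DataFrames:

```python
from itertools import permutations

# Hypothetical four-team league, for illustration only
teams = ["A", "B", "C", "D"]

# Every ordered (home, away) pair should occur once in a double round-robin
expected = set(permutations(teams, 2))

# Fixtures seen so far (past + future); D hosting A is dropped on purpose
seen = expected - {("D", "A")}

# Whatever is expected but not seen is a missing fixture
missing = sorted(expected - seen)
print(missing)
```

This variant does not need a team to be flagged as short of matches first, but it does assume team names are already consistent across data sources.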


In [27]:
# APPEND MISSING FIXTURES (IN-PLACE, WITH BACKUP)

# Optional safety backup (one-liner per league)
future_matches_backup = {}

for league in missing_df["league"].unique():
    print(f"\n➕ Appending missing fixtures to {league.replace('_',' ').title()}")

    # Backup original
    future_matches_backup[league] = globals()[f"future_matches_{league}"].copy()

    future_matches = globals()[f"future_matches_{league}"]

    league_missing = missing_df[missing_df["league"] == league]

    for _, row in league_missing.iterrows():
        # Create an empty row with same columns
        new_match = {col: pd.NA for col in future_matches.columns}
        new_match["homeTeam"] = row["homeTeam"]
        new_match["awayTeam"] = row["awayTeam"]

        # Append IN PLACE
        future_matches.loc[len(future_matches)] = new_match

        print(f"  Added: {row['homeTeam']} vs {row['awayTeam']}")

    print(f"  Total future matches now: {len(future_matches)}")


➕ Appending missing fixtures to Laliga Spain
  Added: Rayo Vallecano vs Real Oviedo
  Total future matches now: 152

➕ Appending missing fixtures to Premierleague England
  Added: Manchester City vs Crystal Palace
  Total future matches now: 130


In [28]:
# LEAGUE MATCH COUNT VALIDATION

league_map = {
    "Premier League (England)": "premierleague_england",
    "Serie A (Italy)": "seriea_italy",
    "La Liga (Spain)": "laliga_spain",
    "Bundesliga (Germany)": "bundesliga_germany",
    "Ligue 1 (France)": "ligue1_france",
}

print("\n📊 League match count validation (CORRECT)\n")

for league_name, league_key in league_map.items():
    league_table = globals()[league_key]

    past_matches = filter_current_season(
        globals()[f"past_matches_{league_key}_all"]
    )
    future_matches = globals()[f"future_matches_{league_key}"]

    num_teams = len(league_table)

    # Total matches in a double round-robin (each fixture counted once)
    total_matches = num_teams * (num_teams - 1)

    # Matches played
    matches_played_gp = league_table["gp"].sum() / 2
    matches_played_data = len(past_matches)

    # Matches unplayed
    matches_unplayed_gp = total_matches - matches_played_gp
    matches_unplayed_data = len(future_matches)

    print(f"{league_name}")
    print(f"  Teams: {num_teams}")
    print(f"  Total matches (theoretical): {total_matches}")
    print(f"  Matches played (GP-based):   {int(matches_played_gp)}")
    print(f"  Matches played (data):       {matches_played_data}")
    print(f"  Matches unplayed (GP-based): {int(matches_unplayed_gp)}")
    print(f"  Matches unplayed (data):     {matches_unplayed_data}")

    if (
        matches_played_gp == matches_played_data
        and matches_unplayed_gp == matches_unplayed_data
    ):
        print("  ✅ League is consistent\n")
    else:
        print("  🚨 MISMATCH DETECTED\n")


📊 League match count validation (CORRECT)

Premier League (England)
  Teams: 20
  Total matches (theoretical): 380
  Matches played (GP-based):   250
  Matches played (data):       250
  Matches unplayed (GP-based): 130
  Matches unplayed (data):     130
  ✅ League is consistent

Serie A (Italy)
  Teams: 20
  Total matches (theoretical): 380
  Matches played (GP-based):   239
  Matches played (data):       239
  Matches unplayed (GP-based): 141
  Matches unplayed (data):     141
  ✅ League is consistent

La Liga (Spain)
  Teams: 20
  Total matches (theoretical): 380
  Matches played (GP-based):   228
  Matches played (data):       228
  Matches unplayed (GP-based): 152
  Matches unplayed (data):     152
  ✅ League is consistent

Bundesliga (Germany)
  Teams: 18
  Total matches (theoretical): 306
  Matches played (GP-based):   188
  Matches played (data):       188
  Matches unplayed (GP-based): 118
  Matches unplayed (data):     118
  ✅ League is consistent

Ligue 1 (France
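
The counts compared in this validation follow from the double round-robin format. As a worked check, with $n$ the number of teams and $\mathrm{GP}_i$ the games played by team $i$ (notation introduced here for illustration):

$$
M_{\text{total}} = n(n-1), \qquad
M_{\text{played}} = \frac{1}{2}\sum_{i=1}^{n} \mathrm{GP}_i, \qquad
M_{\text{unplayed}} = M_{\text{total}} - M_{\text{played}}
$$

For the Premier League output above, $n = 20$ gives $20 \times 19 = 380$ matches; with 250 played, 130 remain, which matches the 130 rows in `future_matches` after the missing fixture was appended. For the Bundesliga, $18 \times 17 = 306$.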

## 7. Combine and calculate probabilities of W/D/L for each match

In [29]:
leagues = [
    "premierleague_england",
    "seriea_italy",
    "laliga_spain",
    "bundesliga_germany",
    "ligue1_france",
]

for league in leagues:
    df_all = globals()[f"past_matches_{league}_all"].copy()

    # Convert date
    df_all["date"] = pd.to_datetime(df_all["utcDate"])

    # Sort so newer matches get higher weight
    df_all = df_all.sort_values("date").reset_index(drop=True)

    # Add linear weights (oldest → newest)
    df_all["weight"] = np.linspace(1, 2, len(df_all))

    # Store weighted dataset
    globals()[f"past_matches_{league}_weighted"] = df_all
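
The recency weighting above runs linearly from 1 for the oldest match to 2 for the newest, so the latest result counts twice as much as the earliest one. A small sketch of the effect on an average, using hypothetical goal counts and the standard library only (the list comprehension mirrors `np.linspace(1, 2, n)`):

```python
# Hypothetical goals scored by one team in five matches, oldest first
goals = [0, 1, 1, 2, 3]

n = len(goals)
# Linear weights from 1 (oldest) to 2 (newest), mirroring np.linspace(1, 2, n)
weights = [1 + i / (n - 1) for i in range(n)]

plain_mean = sum(goals) / n
weighted_mean = sum(g * w for g, w in zip(goals, weights)) / sum(weights)

print(weights)
print(plain_mean, weighted_mean)
```

Because the team's scoring improves over time in this toy data, the weighted mean sits above the plain mean.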

In [30]:
for league in [
    "premierleague_england",
    "seriea_italy",
    "laliga_spain",
    "bundesliga_germany",
    "ligue1_france",
]:
    df = globals()[f"past_matches_{league}_weighted"]
    print(f"\n{league.replace('_', ' ').title()} – weighted past matches")
    print(df.tail(2))


Premierleague England – weighted past matches
                  utcDate  matchday    status                homeTeam         awayTeam  homeGoals  awayGoals     winner                      date   weight
628  2026-02-08T14:00:00Z        25  FINISHED  Brighton & Hove Albion   Crystal Palace          0          1  AWAY_TEAM 2026-02-08 14:00:00+00:00  1.99841
629  2026-02-08T16:30:00Z        25  FINISHED               Liverpool  Manchester City          1          2  AWAY_TEAM 2026-02-08 16:30:00+00:00  2.00000

Seriea Italy – weighted past matches
                  utcDate  matchday    status  homeTeam   awayTeam  homeGoals  awayGoals     winner                      date    weight
617  2026-02-09T17:30:00Z        24  FINISHED  Atalanta  Cremonese          3          1  HOME_TEAM 2026-02-09 17:30:00+00:00  1.998382
618  2026-02-09T19:45:00Z        24  FINISHED   AS Roma   Cagliari          2          0  HOME_TEAM 2026-02-09 19:45:00+00:00  2.000000

Laliga Spain – weighted past matche

In [31]:
# compute home advantage per league and save it to globals().

leagues = [
    "premierleague_england",
    "seriea_italy",
    "laliga_spain",
    "bundesliga_germany",
    "ligue1_france",
]

home_advantage_by_league = {}

for league in leagues:
    # Access the weighted past matches for this league
    df_all = globals()[f"past_matches_{league}_weighted"]

    # Compute home advantage: unweighted mean home goals minus mean away goals
    home_adv = df_all["homeGoals"].mean() - df_all["awayGoals"].mean()

    # Save to dictionary
    home_advantage_by_league[league] = home_adv

    # Save to globals (for your Poisson model)
    globals()[f"home_advantage_{league}"] = home_adv

    # Print nicely
    print(f"{league.replace('_', ' ').title()}: {home_adv:.3f}")

Premierleague England: 0.183
Seriea Italy: 0.121
Laliga Spain: 0.329
Bundesliga Germany: 0.183
Ligue1 France: 0.328
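
The home advantage above is a difference in mean goals, while the Poisson model later adds it inside the exponent, where it acts as a multiplier `exp(home_advantage)` on the home side's expected goals. The sketch below compares that with a log-ratio definition. The log ratio is an alternative shown for comparison only, not the notebook's method, and the two league averages are hypothetical:

```python
import math

# Hypothetical league averages (goals per match), for illustration only
mean_home_goals = 1.55
mean_away_goals = 1.25

# Difference in goals, as computed in the cell above
diff_advantage = mean_home_goals - mean_away_goals

# Log-ratio alternative: an additive term that lives on the log scale
log_ratio_advantage = math.log(mean_home_goals / mean_away_goals)

# Multiplier applied to the home side's expected goals under each definition
print(round(math.exp(diff_advantage), 3))
print(round(math.exp(log_ratio_advantage), 3))
```

With these numbers the difference-based term scales home expected goals by more than the observed home/away ratio, which is worth keeping in mind when reading the model's home-win probabilities.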


In [32]:
leagues = [
    "premierleague_england",
    "seriea_italy",
    "laliga_spain",
    "bundesliga_germany",
    "ligue1_france",
]

for league in leagues:
    df_all = globals()[f"past_matches_{league}_weighted"]

    # All teams in the league
    teams = pd.unique(df_all[["homeTeam", "awayTeam"]].values.ravel("K"))

    attack = pd.Series(1.0, index=teams)
    defense = pd.Series(1.0, index=teams)

    team_stats = {}

    for team in teams:
        home_games = df_all[df_all["homeTeam"] == team]
        away_games = df_all[df_all["awayTeam"] == team]

        goals_scored = (
            (home_games["homeGoals"] * home_games["weight"]).sum() +
            (away_games["awayGoals"] * away_games["weight"]).sum()
        )

        goals_against = (
            (home_games["awayGoals"] * home_games["weight"]).sum() +
            (away_games["homeGoals"] * away_games["weight"]).sum()
        )

        matches = home_games["weight"].sum() + away_games["weight"].sum()

        team_stats[team] = {
            "scored": goals_scored / matches,
            "against": goals_against / matches
        }

    # League average goals per team per match
    league_avg_scored = (
        df_all["homeGoals"].mean() + df_all["awayGoals"].mean()
    ) / 2

    for team in teams:
        attack[team] = team_stats[team]["scored"] / league_avg_scored
        defense[team] = team_stats[team]["against"] / league_avg_scored

    # Store outputs
    globals()[f"attack_{league}"] = attack
    globals()[f"defense_{league}"] = defense
    globals()[f"league_avg_scored_{league}"] = league_avg_scored

🔥 Summary

The `match_probabilities_league` function below:
+ Calculates expected goals for each team
+ Uses the Poisson distribution to compute goal probabilities
+ Converts score probabilities into match outcome probabilities
+ Returns probabilities for:
  + home win
  + draw
  + away win
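
The steps listed above can be sketched without SciPy. This is a standalone illustration: `poisson_pmf` and `outcome_probabilities` are hypothetical helper names, and the expected-goal values are made up. It also shows that cutting the score grid at `max_goals` leaves the three probabilities summing to slightly less than 1:

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def outcome_probabilities(exp_home, exp_away, max_goals=6):
    """Sum the score grid into home-win, draw and away-win probabilities."""
    p_home = p_draw = p_away = 0.0
    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            prob = poisson_pmf(i, exp_home) * poisson_pmf(j, exp_away)
            if i > j:
                p_home += prob
            elif i == j:
                p_draw += prob
            else:
                p_away += prob
    return p_home, p_draw, p_away

# Hypothetical expected goals for one match
p_home, p_draw, p_away = outcome_probabilities(1.6, 1.1)
total = p_home + p_draw + p_away
print(round(total, 4))  # just under 1: scores above max_goals are cut off
```

The small shortfall is why the rows are rescaled to sum to 1 before the simulation step.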

In [33]:
def match_probabilities_league(
    home,
    away,
    attack,
    defense,
    league_avg_scored,
    home_advantage,
    max_goals=6,
):
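    # Expected goals: league average x attack x opponent defence,
    # with home_advantage added on the log scale for the home side only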
    exp_home = np.exp(
        np.log(league_avg_scored)
        + np.log(attack[home])
        + np.log(defense[away])
        + home_advantage
    )

    exp_away = np.exp(
        np.log(league_avg_scored)
        + np.log(attack[away])
        + np.log(defense[home])
    )

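    # Score probabilities for 0..max_goals; the tail above max_goals is dropped,
    # so the three outcome probabilities sum to slightly less than 1
    p_home = poisson.pmf(range(max_goals + 1), exp_home)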
    p_away = poisson.pmf(range(max_goals + 1), exp_away)

    p_win = p_draw = p_loss = 0.0

    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            prob = p_home[i] * p_away[j]
            if i > j:
                p_win += prob
            elif i == j:
                p_draw += prob
            else:
                p_loss += prob

    return p_win, p_draw, p_loss

In [34]:
leagues = [
    "premierleague_england",
    "seriea_italy",
    "laliga_spain",
    "bundesliga_germany",
    "ligue1_france",
]

for league in leagues:
    df_future = globals()[f"fixtures_{league}"] 

    attack = globals()[f"attack_{league}"]
    defense = globals()[f"defense_{league}"]
    league_avg_scored = globals()[f"league_avg_scored_{league}"]
    home_advantage = globals()[f"home_advantage_{league}"]

    results = []

    for _, row in df_future.iterrows():
        home = row["homeTeam"]
        away = row["awayTeam"]

        p_win, p_draw, p_loss = match_probabilities_league(
            home,
            away,
            attack,
            defense,
            league_avg_scored,
            home_advantage,
        )

        results.append({
            "utcDate": row["utcDate"],
            "homeTeam": home,
            "awayTeam": away,
            "p_home_win": p_win,
            "p_draw": p_draw,
            "p_away_win": p_loss,
        })

    globals()[f"df_probabilities_{league}"] = pd.DataFrame(results)

In [35]:
for league in leagues:
    print(f"\n=== {league.upper()} ===")
    print(globals()[f"df_probabilities_{league}"].head(2))


=== PREMIERLEAGUE_ENGLAND ===
                utcDate homeTeam         awayTeam  p_home_win    p_draw  p_away_win
0  2026-02-10T19:30:00Z  Chelsea     Leeds United    0.664383  0.175078    0.147392
1  2026-02-10T19:30:00Z  Everton  AFC Bournemouth    0.404917  0.262530    0.331692

=== SERIEA_ITALY ===
                utcDate homeTeam    awayTeam  p_home_win    p_draw  p_away_win
0  2026-02-13T19:45:00Z     Pisa    AC Milan    0.101870  0.178025    0.712704
1  2026-02-14T14:00:00Z     Como  Fiorentina    0.513967  0.236934    0.246825

=== LALIGA_SPAIN ===
                utcDate  homeTeam    awayTeam  p_home_win    p_draw  p_away_win
0  2026-02-13T20:00:00Z     Elche     Osasuna    0.461037  0.224675    0.310362
1  2026-02-14T13:00:00Z  Espanyol  Celta Vigo    0.388553  0.243007    0.366588

=== BUNDESLIGA_GERMANY ===
                utcDate           homeTeam       awayTeam  p_home_win    p_draw  p_away_win
0  2026-02-13T19:30:00Z  Borussia Dortmund          Mainz    0.620008  0.193

## 8. Compare calculated probabilities to bookmaker ones

In [36]:
rmse_results = {}

for league in leagues:
    print(f"\n=== {league.replace('_', ' ').title()} ===")
    
    # Load model probabilities and bookmaker odds
    df_model = globals()[f"df_probabilities_{league}"]
    df_book = globals()[f"betting_odds_{league}"]

    # Merge on home/away team names
    df_compare = df_model.merge(
        df_book,
        left_on=["homeTeam", "awayTeam"],
        right_on=["home_team", "away_team"],
        how="inner"
    )

    print("Matched rows:", len(df_compare))

    # RMSE per outcome
    rmse_home = np.sqrt(np.mean((df_compare["p_home_win"] - df_compare["p_home_book"])**2))
    rmse_draw = np.sqrt(np.mean((df_compare["p_draw"] - df_compare["p_draw_book"])**2))
    rmse_away = np.sqrt(np.mean((df_compare["p_away_win"] - df_compare["p_away_book"])**2))

    # Total RMSE across all outcomes
    rmse_total = np.sqrt(np.mean(
        (df_compare["p_home_win"] - df_compare["p_home_book"])**2 +
        (df_compare["p_draw"] - df_compare["p_draw_book"])**2 +
        (df_compare["p_away_win"] - df_compare["p_away_book"])**2
    ))

    rmse_results[league] = {
        "rmse_home": rmse_home,
        "rmse_draw": rmse_draw,
        "rmse_away": rmse_away,
        "rmse_total": rmse_total
    }

    print(f"RMSE Home: {rmse_home:.4f}, Draw: {rmse_draw:.4f}, Away: {rmse_away:.4f}")
    print(f"Total RMSE: {rmse_total:.4f}")


=== Premierleague England ===
Matched rows: 21
RMSE Home: 0.0546, Draw: 0.0290, Away: 0.0547
Total RMSE: 0.0825

=== Seriea Italy ===
Matched rows: 21
RMSE Home: 0.0483, Draw: 0.0410, Away: 0.0444
Total RMSE: 0.0773

=== Laliga Spain ===
Matched rows: 21
RMSE Home: 0.0925, Draw: 0.0611, Away: 0.0672
Total RMSE: 0.1296

=== Bundesliga Germany ===
Matched rows: 18
RMSE Home: 0.0578, Draw: 0.0441, Away: 0.0632
Total RMSE: 0.0963

=== Ligue1 France ===
Matched rows: 18
RMSE Home: 0.0795, Draw: 0.0567, Away: 0.0444
Total RMSE: 0.1072
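
As a worked example of the metric above: RMSE is the square root of the mean squared gap between model and bookmaker probabilities. The first part uses hypothetical probabilities for three matches. The second part uses the Premier League figures printed above to show that the total RMSE, which sums the three squared errors per match before averaging, equals the root of the sum of the squared per-outcome RMSEs:

```python
import math

# Hypothetical (model, bookmaker) home-win probabilities for three matches
model = [0.60, 0.40, 0.25]
book = [0.55, 0.45, 0.30]

squared_errors = [(m - b) ** 2 for m, b in zip(model, book)]
rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
print(round(rmse, 4))

# Per-outcome RMSEs printed above for the Premier League
rmse_home, rmse_draw, rmse_away = 0.0546, 0.0290, 0.0547
rmse_total = math.sqrt(rmse_home**2 + rmse_draw**2 + rmse_away**2)
print(round(rmse_total, 4))
```

The second value reproduces the reported total of 0.0825 up to rounding.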


## 9. Replace model-estimated probabilities with odds-implied ones to create the final match probabilities

In [37]:
df_final_probabilities_all = {}

for league in leagues:
    print(f"\n=== {league.replace('_', ' ').title()} ===")
    
    df_probabilities = globals()[f"df_probabilities_{league}"]
    betting_odds_avg = globals()[f"betting_odds_{league}"]

    # Drop duplicate bookmaker odds for the same fixture (keep first)
    betting_odds_avg = betting_odds_avg.drop_duplicates(
        subset=["home_team", "away_team"], keep="first"
    )

    # Merge model and bookmaker probabilities
    df_final_probabilities = df_probabilities.merge(
        betting_odds_avg,
        left_on=["homeTeam", "awayTeam"],
        right_on=["home_team", "away_team"],
        how="left"
    )

    # Keep relevant columns
    df_final_probabilities = df_final_probabilities[[
        "utcDate",
        "homeTeam",
        "awayTeam",
        "p_home_win",
        "p_draw",
        "p_away_win",
        "p_home_book",
        "p_draw_book",
        "p_away_book",
    ]]

    # Replace model probabilities with bookmaker odds where available
    df_final_probabilities["p_home_final"] = np.where(
        df_final_probabilities["p_home_book"].notna(),
        df_final_probabilities["p_home_book"],
        df_final_probabilities["p_home_win"]
    )
    df_final_probabilities["p_draw_final"] = np.where(
        df_final_probabilities["p_draw_book"].notna(),
        df_final_probabilities["p_draw_book"],
        df_final_probabilities["p_draw"]
    )
    df_final_probabilities["p_away_final"] = np.where(
        df_final_probabilities["p_away_book"].notna(),
        df_final_probabilities["p_away_book"],
        df_final_probabilities["p_away_win"]
    )

    # Count totals
    total_rows = len(df_final_probabilities)
    used_book = df_final_probabilities["p_home_book"].notna().sum()
    used_model = df_final_probabilities["p_home_book"].isna().sum()

    print(f"Total matches: {total_rows}")
    print(f"Used betting odds: {used_book}")
    print(f"Used model: {used_model}")

    # Keep only final probabilities
    df_final_probabilities = df_final_probabilities[[
        "utcDate",
        "homeTeam",
        "awayTeam",
        "p_home_final",
        "p_draw_final",
        "p_away_final"
    ]]

    df_final_probabilities_all[league] = df_final_probabilities
    display(df_final_probabilities.head(2))


=== Premierleague England ===
Total matches: 130
Used betting odds: 21
Used model: 109


| | utcDate | homeTeam | awayTeam | p_home_final | p_draw_final | p_away_final |
|---|---|---|---|---|---|---|
| 0 | 2026-02-10T19:30:00Z | Chelsea | Leeds United | 0.615089 | 0.222499 | 0.162413 |
| 1 | 2026-02-10T19:30:00Z | Everton | AFC Bournemouth | 0.395363 | 0.281382 | 0.323255 |



=== Seriea Italy ===
Total matches: 141
Used betting odds: 21
Used model: 120


| | utcDate | homeTeam | awayTeam | p_home_final | p_draw_final | p_away_final |
|---|---|---|---|---|---|---|
| 0 | 2026-02-13T19:45:00Z | Pisa | AC Milan | 0.145364 | 0.227572 | 0.627064 |
| 1 | 2026-02-14T14:00:00Z | Como | Fiorentina | 0.542297 | 0.251768 | 0.205935 |



=== Laliga Spain ===
Total matches: 152
Used betting odds: 21
Used model: 131


| | utcDate | homeTeam | awayTeam | p_home_final | p_draw_final | p_away_final |
|---|---|---|---|---|---|---|
| 0 | 2026-02-13T20:00:00Z | Elche | Osasuna | 0.331665 | 0.308252 | 0.360083 |
| 1 | 2026-02-14T13:00:00Z | Espanyol | Celta Vigo | 0.388042 | 0.293391 | 0.318567 |



=== Bundesliga Germany ===
Total matches: 118
Used betting odds: 18
Used model: 100


| | utcDate | homeTeam | awayTeam | p_home_final | p_draw_final | p_away_final |
|---|---|---|---|---|---|---|
| 0 | 2026-02-13T19:30:00Z | Borussia Dortmund | Mainz | 0.604747 | 0.220201 | 0.175052 |
| 1 | 2026-02-14T14:30:00Z | Werder Bremen | Bayern Munich | 0.125752 | 0.169257 | 0.70499 |



=== Ligue1 France ===
Total matches: 117
Used betting odds: 18
Used model: 99


| | utcDate | homeTeam | awayTeam | p_home_final | p_draw_final | p_away_final |
|---|---|---|---|---|---|---|
| 0 | 2026-02-13T18:00:00Z | Stade Rennais | Paris Saint-Germain | 0.156318 | 0.193416 | 0.650266 |
| 1 | 2026-02-13T20:05:00Z | AS Monaco | Nantes | 0.652237 | 0.199513 | 0.14825 |
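
The replacement rule used above is a simple fallback: take the bookmaker probabilities when odds exist for a fixture, otherwise keep the model's estimate. A minimal sketch of that rule with plain dictionaries (team names and probabilities are hypothetical, and the notebook itself does this with a left merge and `np.where`):

```python
# Hypothetical model probabilities (home, draw, away) for three fixtures
model = {
    ("A", "B"): (0.50, 0.27, 0.23),
    ("C", "D"): (0.35, 0.30, 0.35),
    ("E", "F"): (0.62, 0.21, 0.17),
}

# Bookmaker probabilities exist only for fixtures with published odds
book = {("A", "B"): (0.47, 0.28, 0.25)}

# Prefer the bookmaker triple; fall back to the model when it is absent
final = {fixture: book.get(fixture, probs) for fixture, probs in model.items()}

used_book = sum(fixture in book for fixture in model)
print(final)
print(used_book, len(model) - used_book)
```

Swapping the whole triple at once, rather than column by column, keeps the three probabilities of a match from the same source.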


In [38]:
# Dictionary to store simulation dataframes for each league
df_simulation_all = {}

# Columns for probability normalisation
prob_cols = ["p_home_final", "p_draw_final", "p_away_final"]

for league in leagues:
    df = df_final_probabilities_all[league].copy()
    
    # Normalise probabilities so each row sums to 1
    df[prob_cols] = df[prob_cols].div(df[prob_cols].sum(axis=1), axis=0)
    
    # Store in dictionary
    df_simulation_all[league] = df
    
    # Preview last 3 rows
    print(f"\n=== {league.replace('_', ' ').title()} ===")
    display(df.tail(3))

    # Number of matches
    print("Number of matches to be played:")
    display(len(df))


=== Premierleague England ===


| | utcDate | homeTeam | awayTeam | p_home_final | p_draw_final | p_away_final |
|---|---|---|---|---|---|---|
| 127 | 2026-05-24T15:00:00Z | Tottenham Hotspur | Everton | 0.40408 | 0.259912 | 0.336008 |
| 128 | 2026-05-24T15:00:00Z | West Ham United | Leeds United | 0.407228 | 0.225595 | 0.367177 |
| 129 | | Manchester City | Crystal Palace | 0.644858 | 0.202868 | 0.152274 |


Number of matches to be played:


130


=== Seriea Italy ===


| | utcDate | homeTeam | awayTeam | p_home_final | p_draw_final | p_away_final |
|---|---|---|---|---|---|---|
| 138 | 2026-05-24T00:00:00Z | Napoli | Udinese | 0.680036 | 0.198017 | 0.121947 |
| 139 | 2026-05-24T00:00:00Z | Cremonese | Como | 0.176671 | 0.242216 | 0.581113 |
| 140 | 2026-05-24T00:00:00Z | Lecce | Genoa | 0.255853 | 0.294036 | 0.450111 |


Number of matches to be played:


141


=== Laliga Spain ===


| | utcDate | homeTeam | awayTeam | p_home_final | p_draw_final | p_away_final |
|---|---|---|---|---|---|---|
| 149 | 2026-05-24T00:00:00Z | Valencia | Barcelona | 0.11825 | 0.142567 | 0.739183 |
| 150 | 2026-05-24T00:00:00Z | Girona | Elche | 0.404766 | 0.227832 | 0.367402 |
| 151 | | Rayo Vallecano | Real Oviedo | 0.623809 | 0.250318 | 0.125873 |


Number of matches to be played:


152


=== Bundesliga Germany ===


| | utcDate | homeTeam | awayTeam | p_home_final | p_draw_final | p_away_final |
|---|---|---|---|---|---|---|
| 115 | 2026-05-16T13:30:00Z | Borussia Mönchengladbach | TSG Hoffenheim | 0.398375 | 0.219275 | 0.38235 |
| 116 | 2026-05-16T13:30:00Z | Bayern Munich | FC Cologne | 0.888154 | 0.073197 | 0.038649 |
| 117 | 2026-05-16T13:30:00Z | Bayer Leverkusen | Hamburg SV | 0.674556 | 0.189251 | 0.136193 |


Number of matches to be played:


118


=== Ligue1 France ===


| | utcDate | homeTeam | awayTeam | p_home_final | p_draw_final | p_away_final |
|---|---|---|---|---|---|---|
| 114 | 2026-05-16T19:00:00Z | Lille | AJ Auxerre | 0.648499 | 0.19951 | 0.151991 |
| 115 | 2026-05-16T19:00:00Z | Strasbourg | AS Monaco | 0.527277 | 0.209085 | 0.263638 |
| 116 | 2026-05-16T19:00:00Z | Brest | Angers | 0.551229 | 0.224203 | 0.224567 |


Number of matches to be played:


117
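
Model-based rows come from a truncated score grid, so they can sum to slightly less than 1, and the normalisation cell above rescales each row. A worked example using the Chelsea v Leeds United model probabilities printed in section 7 (the variable names are illustrative):

```python
# Model probabilities (home, draw, away) printed in section 7 for Chelsea v Leeds United
row = [0.664383, 0.175078, 0.147392]

total = sum(row)
# Divide each probability by the row total so the three values sum to 1
normalised = [p / total for p in row]

print(round(total, 6))
print([round(p, 6) for p in normalised])
```

Dividing by the row total preserves the ratios between the three outcomes. This is required because `np.random.choice` raises an error when the probabilities passed to `p` do not sum to 1.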

## 10. Run simulations to build the league table probabilities

In [39]:
def simulate_once(fixtures, table):
    """
    Simulate the remaining fixtures once and return the updated league table with positions.
    Assumes:
    - table has columns: 'team', 'pts', 'gd'
    - fixtures has columns: 'homeTeam', 'awayTeam', 'p_home_final', 'p_draw_final', 'p_away_final'
    """
    table_sim = table.copy()

    # Initialize points dict
    points = dict(zip(table_sim["team"], table_sim["pts"]))

    for _, row in fixtures.iterrows():
        home = row["homeTeam"]
        away = row["awayTeam"]

        # Outcome probabilities
        probs = [row["p_home_final"], row["p_draw_final"], row["p_away_final"]]
        outcome = np.random.choice(["H", "D", "A"], p=probs)

        if outcome == "H":
            points[home] += 3
        elif outcome == "D":
            points[home] += 1
            points[away] += 1
        else:
            points[away] += 3

    # Update table
    result_df = table_sim.copy()
    result_df["pts"] = result_df["team"].map(points)

    # Sort by points, then by current goal difference
    # (gd is not simulated, so ties are broken on the present value)
    result_df = result_df.sort_values(["pts", "gd"], ascending=[False, False])
    result_df["position"] = np.arange(1, len(result_df)+1)

    return result_df


def run_simulations(fixtures, table, n_sim=10000):
    """
    Run multiple simulations and return a DataFrame of position counts per team.
    """
    position_counts = {team: np.zeros(len(table)) for team in table["team"]}

    for _ in range(n_sim):
        final_table = simulate_once(fixtures, table)
        for _, row in final_table.iterrows():
            position_counts[row["team"]][row["position"]-1] += 1

    pos_df = pd.DataFrame(position_counts, index=np.arange(1, len(table)+1))
    pos_df.index.name = "position"
    return pos_df
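
The simulation loop above draws each remaining match from its three outcome probabilities and adds the points. A compact, seeded sketch of the same idea with the standard library only, assuming a hypothetical three-team mini league. `simulate_points` is an illustrative name, and the fixed seed is an addition for reproducibility that the notebook itself does not set:

```python
import random
from collections import Counter

def simulate_points(fixtures, points, rng):
    """Play each remaining fixture once and return final points per team."""
    final = dict(points)
    for home, away, probs in fixtures:
        outcome = rng.choices(["H", "D", "A"], weights=probs)[0]
        if outcome == "H":
            final[home] += 3
        elif outcome == "D":
            final[home] += 1
            final[away] += 1
        else:
            final[away] += 3
    return final

# Hypothetical mini league: current points and remaining fixtures
points = {"A": 10, "B": 9, "C": 4}
fixtures = [("A", "B", (0.45, 0.25, 0.30)), ("C", "A", (0.20, 0.25, 0.55))]

rng = random.Random(42)  # fixed seed so repeated runs give the same counts
n_sim = 2000
winners = Counter()
for _ in range(n_sim):
    final = simulate_points(fixtures, points, rng)
    # Ties on points would be broken by dict order here, a simplification
    winners[max(final, key=final.get)] += 1

title_prob = {team: winners[team] / n_sim for team in points}
print(title_prob)
```

Team C cannot reach 10 points with one match left, so its title share is exactly zero, which gives a quick sanity check on the counting logic.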


In [40]:
n_sim = 10000  # total simulations

position_distribution_all = {}
position_distribution_pct_all = {}

for league in leagues:
    print(f"\n=== {league.replace('_', ' ').title()} ===")
    
    fixtures = df_simulation_all[league].copy()  # simulated/future fixtures
    table = globals()[league].copy()             # league table

    # Count positions for each team
    position_counts = {team: np.zeros(len(table)) for team in table["team"]}
    
    # Run simulations
    for i in range(n_sim):
        final_table = simulate_once(fixtures, table)
        for _, row in final_table.iterrows():
            position_counts[row["team"]][row["position"]-1] += 1
        
        # Optional progress print
        if (i+1) % 1000 == 0:
            print(f"{i+1}/{n_sim} simulations done...")
    
    pos_df = pd.DataFrame(position_counts, index=np.arange(1, len(table)+1))
    # pos_df.index.name = "position"
    pos_df_t = pos_df.T
    pos_df_pct = pos_df_t.div(pos_df_t.sum(axis=1), axis=0) * 100
    
    position_distribution_all[league] = pos_df
    position_distribution_pct_all[league] = pos_df_pct
    
    print(f"Finished simulations for {league} ✅")


=== Premierleague England ===
1000/10000 simulations done...
2000/10000 simulations done...
3000/10000 simulations done...
4000/10000 simulations done...
5000/10000 simulations done...
6000/10000 simulations done...
7000/10000 simulations done...
8000/10000 simulations done...
9000/10000 simulations done...
10000/10000 simulations done...
Finished simulations for premierleague_england ✅

=== Seriea Italy ===
1000/10000 simulations done...
2000/10000 simulations done...
3000/10000 simulations done...
4000/10000 simulations done...
5000/10000 simulations done...
6000/10000 simulations done...
7000/10000 simulations done...
8000/10000 simulations done...
9000/10000 simulations done...
10000/10000 simulations done...
Finished simulations for seriea_italy ✅

=== Laliga Spain ===
1000/10000 simulations done...
2000/10000 simulations done...
3000/10000 simulations done...
4000/10000 simulations done...
5000/10000 simulations done...
6000/10000 simulations done...
7000/10000 simulations d

## 11. Preview and present the results graphically

In [41]:
# Build MultiIndex row labels (POS, TEAM, GP, PTS)

position_distribution_pct_all_labeled = {}

for league in leagues:
    table = globals()[league].copy()  # league table with 'team', 'position', 'gp', 'pts'
    pos_pct = position_distribution_pct_all[league].copy()  # simulation % table

    # Build metadata aligned with pos_pct
    meta = (
        table[["team", "position", "gp", "pts"]]
        .set_index("team")
        .rename(columns={
            "position": "POS",
            "gp": "GP",
            "pts": "PTS"
        })
    )

    # Align order with pos_pct
    meta = meta.loc[pos_pct.index]

    # Create MultiIndex: POS → TEAM → GP → PTS
    pos_pct.index = pd.MultiIndex.from_arrays(
        [
            meta["POS"].astype(int),
            meta.index,              # team names
            meta["GP"].astype(int),
            meta["PTS"].astype(int)
        ],
        names=["POS", "TEAM", "GP", "PTS"]
    )

    # Drop position column if it exists (usually not needed now)
    pos_pct = pos_pct.drop(columns=["position"], errors="ignore")

    # Save labeled version
    position_distribution_pct_all_labeled[league] = pos_pct

    print(f"Labeled MultiIndex created for {league} ✅")

Labeled MultiIndex created for premierleague_england ✅
Labeled MultiIndex created for seriea_italy ✅
Labeled MultiIndex created for laliga_spain ✅
Labeled MultiIndex created for bundesliga_germany ✅
Labeled MultiIndex created for ligue1_france ✅
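
Once each team has a distribution over finishing positions, headline outcomes such as the title, a top-four finish or relegation are sums over the relevant positions. A minimal sketch for one team, using hypothetical percentages (in the notebook these would come from a row of the labelled position table):

```python
# Hypothetical finishing-position percentages for one team in a 20-team league
# (index 0 is 1st place); values are illustrative and sum to 100
position_pct = [22.0, 30.0, 21.0, 12.0, 8.0, 4.0, 2.0, 1.0] + [0.0] * 12

title = position_pct[0]
top_four = sum(position_pct[:4])
relegation = sum(position_pct[-3:])  # bottom three, as in the Premier League

print(title, top_four, relegation)
```

The number of places that count as European qualification or relegation differs by league, so those slices are assumptions to adjust per competition.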


In [73]:
# Soft green colormap
greens = plt.cm.Greens
green_cmap = LinearSegmentedColormap.from_list(
    "Greens_soft",
    greens(np.linspace(0.05, 0.65, 256))
)

def zero_style(val):
    if val < 1:
        return "background-color: white !important;"
    return ""

# Set midpoint and max for visual scaling (as fractions of the table maximum)
mid_pct = 0.14   # e.g., 14% of the maximum is mid-green
max_pct = 0.75   # e.g., 75% of the maximum and above is full green

def color_scale(val, mid=mid_pct, max_val=max_pct):
    """
    Scale values 0–max_val so that:
    - small values → very light
    - mid values (mid_pct) → mid-green
    - >= max_val → full green
    """
    """
    if val >= max_val:
        return 1.0
    elif val <= mid:
        return val / mid * 0.5
    else:
        return 0.5 + (val - mid) / (max_val - mid) * 0.5

styled_position_pct_all = {}

for league in leagues:
    pos_pct = position_distribution_pct_all_labeled[league].copy()
    
    if isinstance(pos_pct.index, pd.MultiIndex):
        pos_pct = pos_pct.reset_index()

    display_df = pos_pct.copy()
    display_df.index = range(len(display_df))
    display_df = display_df.reset_index(drop=True)

    text_cols = ["POS", "TEAM", "GP", "PTS"]
    num_cols = display_df.columns.difference(text_cols)

    vmax = max(display_df[num_cols].max().max(), 1)

    # Apply custom scaling to numeric columns with cap
    color_data = display_df[num_cols].divide(vmax).apply(lambda s: s.map(color_scale)) * vmax

    styled = (
        display_df.style
        # Gradient on numeric columns
        .background_gradient(cmap=green_cmap, vmin=0, vmax=vmax, gmap=color_data, axis=None)
        .map(zero_style, subset=num_cols)
        .format({col: "{:.2f}%" for col in num_cols})
        # Text columns formatting
        .set_properties(subset=["POS", "GP", "PTS"], **{
            "text-align": "center",
            "font-family": "Inter, Roboto, Arial, sans-serif",
            "font-size": "12px",
            "font-weight": "600",
            "color": "#000",
            "white-space": "nowrap"
        })
        .set_properties(subset=["TEAM"], **{
            "text-align": "left",
            "font-family": "Inter, Roboto, Arial, sans-serif",
            "font-size": "12px",
            "font-weight": "600",
            "color": "#000",
            "white-space": "nowrap"
        })
        # Numeric columns formatting
        .set_properties(subset=num_cols, **{
            "text-align": "center",
            "font-family": "Inter, Roboto, Arial, sans-serif",
            "font-size": "12px",
            "font-weight": "500",
            "color": "#000"
        })
        .hide(axis="index")
        # Table headers, row height, borders, zebra striping
        .set_table_styles([
            {"selector": "th", "props": [
                ("background-color", "#e6edf4"),
                ("color", "#333"),
                ("text-align", "center"),
                ("font-family", "Inter, Roboto, Arial, sans-serif"),
                ("font-size", "13px"),
                ("font-weight", "600")
            ]},
            {"selector": "tr", "props": [("height", "25px")]},
            {"selector": "th:nth-child(4), td:nth-child(4)", "props": [
                ("border-right", "2px solid #999")
            ]},
            {"selector": "td:nth-child(-n+4)", "props": [
                ("border-bottom", "1px solid #ccc")
            ]},
            {"selector": "tr:nth-child(odd) td:nth-child(-n+4)", "props": [
                ("background-color", "#f9f9f9")
            ]},
            {"selector": "tr:nth-child(even) td:nth-child(-n+4)", "props": [
                ("background-color", "#f2f2f2")
            ]}
        ])
    )

    styled_position_pct_all[league] = styled
    print(f"Styled table ready for {league} ✅")

Styled table ready for premierleague_england ✅
Styled table ready for seriea_italy ✅
Styled table ready for laliga_spain ✅
Styled table ready for bundesliga_germany ✅
Styled table ready for ligue1_france ✅
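
The piecewise scaling used for the cell colours can be checked in isolation. The sketch below is a standalone copy of `color_scale` with the same `mid` and `max_val` defaults. The sample inputs are illustrative only and are assumed to be already divided by `vmax`, as in the loop above.

```python
def color_scale(val, mid=0.14, max_val=0.75):
    """Piecewise-linear map from a normalised value to a colour intensity in [0, 1]."""
    if val >= max_val:
        return 1.0                      # cap: everything above max_val is full green
    if val <= mid:
        return val / mid * 0.5          # lower segment: 0..mid -> 0..0.5
    return 0.5 + (val - mid) / (max_val - mid) * 0.5   # upper segment: mid..max_val -> 0.5..1

# Illustrative inputs (fractions of the table maximum), not values from the notebook
samples = [0.0, 0.07, 0.14, 0.445, 0.75, 0.9]
intensities = [color_scale(v) for v in samples]

for v, c in zip(samples, intensities):
    print(f"{v:.3f} -> {c:.3f}")
```

The lower segment spends half of the colour range on the first 14% of the scale, so small probabilities remain visually distinguishable from zero instead of fading into white.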


In [74]:
# Display all 5 leagues
for league in leagues:
    print(f"\n=== {league.replace('_',' ').title()} ===")
    display(styled_position_pct_all[league])


=== Premierleague England ===


POS,TEAM,GP,PTS,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
1,Arsenal,25,56,89.90%,9.60%,0.49%,0.01%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
2,Manchester City,25,50,9.77%,74.00%,12.78%,2.62%,0.67%,0.13%,0.02%,0.01%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
3,Aston Villa,25,47,0.26%,10.52%,40.54%,24.57%,13.70%,7.05%,2.49%,0.63%,0.15%,0.05%,0.03%,0.00%,0.01%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
4,Manchester United,25,44,0.00%,0.92%,10.21%,16.76%,22.63%,22.48%,14.17%,6.69%,3.38%,1.67%,0.67%,0.28%,0.07%,0.04%,0.03%,0.00%,0.00%,0.00%,0.00%,0.00%
5,Chelsea,25,43,0.07%,3.61%,22.35%,27.38%,21.53%,13.72%,6.92%,2.76%,1.02%,0.39%,0.16%,0.04%,0.05%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
6,Liverpool,25,39,0.00%,1.23%,11.51%,21.04%,23.77%,20.99%,12.07%,4.96%,2.40%,1.13%,0.46%,0.28%,0.09%,0.04%,0.02%,0.01%,0.00%,0.00%,0.00%,0.00%
7,Brentford,25,39,0.00%,0.12%,1.81%,6.31%,12.20%,19.70%,26.38%,14.45%,8.03%,4.67%,2.90%,1.79%,0.98%,0.40%,0.21%,0.05%,0.00%,0.00%,0.00%,0.00%
8,Everton,25,37,0.00%,0.00%,0.09%,0.48%,1.63%,4.33%,9.54%,15.64%,16.22%,13.53%,12.41%,9.27%,7.28%,4.88%,2.76%,1.58%,0.34%,0.02%,0.00%,0.00%
9,Sunderland,25,36,0.00%,0.00%,0.10%,0.27%,1.16%,2.80%,6.06%,11.76%,13.02%,13.46%,12.16%,11.97%,9.78%,7.92%,5.33%,3.15%,1.00%,0.06%,0.00%,0.00%
10,Fulham,25,34,0.00%,0.00%,0.01%,0.14%,0.48%,1.45%,4.13%,8.13%,10.83%,12.19%,13.01%,13.21%,12.42%,10.50%,7.54%,3.95%,1.80%,0.21%,0.00%,0.00%



=== Seriea Italy ===


POS,TEAM,GP,PTS,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
1,Internazionale,24,58,92.07%,6.55%,1.16%,0.19%,0.03%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
2,AC Milan,23,50,5.40%,48.19%,25.87%,12.69%,5.64%,1.80%,0.40%,0.01%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
3,Napoli,24,49,2.02%,29.84%,33.11%,20.45%,10.00%,3.80%,0.78%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
4,Juventus,24,46,0.33%,9.05%,20.24%,30.18%,24.53%,11.54%,4.04%,0.09%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
5,AS Roma,24,46,0.15%,5.21%,14.48%,24.13%,30.32%,17.69%,7.60%,0.41%,0.01%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
6,Como,23,41,0.02%,0.94%,4.00%,8.59%,19.17%,36.14%,28.87%,2.10%,0.16%,0.00%,0.01%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
7,Atalanta,24,39,0.01%,0.22%,1.14%,3.76%,10.23%,27.99%,50.49%,5.35%,0.71%,0.07%,0.01%,0.01%,0.01%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
8,Lazio,24,33,0.00%,0.00%,0.00%,0.01%,0.07%,0.84%,5.50%,53.59%,23.02%,9.57%,4.41%,1.79%,0.70%,0.28%,0.17%,0.05%,0.00%,0.00%,0.00%,0.00%
9,Udinese,24,32,0.00%,0.00%,0.00%,0.00%,0.00%,0.02%,0.36%,6.39%,14.93%,21.08%,19.68%,14.77%,10.74%,6.29%,3.60%,1.63%,0.48%,0.03%,0.00%,0.00%
10,Bologna,24,30,0.00%,0.00%,0.00%,0.00%,0.01%,0.17%,1.60%,22.76%,34.61%,19.29%,10.59%,5.79%,2.84%,1.38%,0.72%,0.18%,0.05%,0.01%,0.00%,0.00%



=== Laliga Spain ===


POS,TEAM,GP,PTS,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
1,Barcelona,23,58,73.10%,26.37%,0.52%,0.01%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
2,Real Madrid,23,57,26.77%,71.13%,2.03%,0.07%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
3,Atlético Madrid,23,45,0.05%,1.37%,53.89%,42.45%,2.18%,0.05%,0.01%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
4,Villarreal,22,45,0.08%,1.13%,43.12%,52.40%,3.18%,0.08%,0.01%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
5,Real Betis,23,38,0.00%,0.00%,0.44%,4.75%,77.76%,11.69%,3.24%,1.36%,0.51%,0.17%,0.06%,0.02%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
6,Espanyol,23,34,0.00%,0.00%,0.00%,0.04%,3.20%,16.63%,19.21%,18.74%,15.96%,10.84%,6.76%,3.74%,2.31%,1.27%,0.69%,0.37%,0.21%,0.03%,0.00%,0.00%
7,Celta Vigo,23,33,0.00%,0.00%,0.00%,0.20%,8.23%,36.24%,22.76%,14.13%,8.29%,5.12%,2.45%,1.24%,0.71%,0.32%,0.17%,0.06%,0.06%,0.02%,0.00%,0.00%
8,Real Sociedad,23,31,0.00%,0.00%,0.00%,0.05%,2.52%,14.34%,20.18%,18.27%,15.97%,10.39%,7.41%,4.19%,2.90%,1.72%,0.94%,0.54%,0.43%,0.14%,0.01%,0.00%
9,Osasuna,23,29,0.00%,0.00%,0.00%,0.00%,1.27%,8.80%,13.36%,16.01%,15.47%,13.98%,10.35%,6.77%,5.31%,3.46%,2.06%,1.58%,0.81%,0.59%,0.17%,0.01%
10,Athletic Club,23,28,0.00%,0.00%,0.00%,0.03%,1.36%,9.39%,13.64%,15.97%,16.11%,14.04%,10.42%,7.01%,4.40%,3.20%,1.94%,1.25%,0.74%,0.34%,0.16%,0.00%



=== Bundesliga Germany ===


POS,TEAM,GP,PTS,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18
1,Bayern Munich,21,54,99.35%,0.64%,0.01%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
2,Borussia Dortmund,21,48,0.61%,84.63%,10.63%,2.91%,0.94%,0.26%,0.02%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
3,TSG Hoffenheim,21,42,0.01%,6.16%,33.30%,25.99%,19.43%,13.78%,1.25%,0.07%,0.01%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
4,RB Leipzig,21,39,0.01%,2.93%,20.83%,24.66%,26.06%,22.27%,2.74%,0.43%,0.04%,0.02%,0.01%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
5,VfB Stuttgart,21,39,0.00%,2.06%,13.68%,20.28%,26.64%,32.88%,3.97%,0.45%,0.04%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
6,Bayer Leverkusen,20,36,0.02%,3.58%,21.50%,25.83%,25.03%,21.49%,2.23%,0.29%,0.02%,0.01%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
7,SC Freiburg,21,30,0.00%,0.00%,0.03%,0.11%,0.61%,3.27%,35.24%,34.86%,14.13%,6.03%,3.05%,1.64%,0.79%,0.16%,0.06%,0.02%,0.00%,0.00%
8,Eintracht Frankfurt,21,28,0.00%,0.00%,0.02%,0.22%,1.26%,5.65%,44.23%,29.15%,10.96%,4.41%,2.22%,1.09%,0.48%,0.23%,0.04%,0.03%,0.01%,0.00%
9,1. FC Union Berlin,21,25,0.00%,0.00%,0.00%,0.00%,0.02%,0.17%,3.80%,11.02%,20.91%,17.86%,14.99%,10.97%,8.27%,5.95%,3.66%,1.88%,0.48%,0.02%
10,FC Cologne,21,23,0.00%,0.00%,0.00%,0.00%,0.01%,0.08%,2.15%,7.20%,14.64%,16.69%,14.92%,13.12%,11.50%,8.68%,6.02%,3.49%,1.39%,0.11%



=== Ligue1 France ===


POS,TEAM,GP,PTS,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18
1,Paris Saint-Germain,21,51,96.95%,2.91%,0.13%,0.01%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
2,Lens,21,49,2.84%,71.37%,19.44%,5.95%,0.38%,0.02%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
3,Lyon,21,42,0.16%,13.66%,36.28%,43.80%,5.06%,0.82%,0.19%,0.03%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
4,Marseille,21,39,0.05%,11.98%,42.34%,38.43%,5.92%,1.02%,0.19%,0.05%,0.02%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
5,Lille,21,33,0.00%,0.08%,1.34%,7.66%,44.58%,22.87%,12.31%,5.86%,2.85%,1.54%,0.64%,0.18%,0.09%,0.00%,0.00%,0.00%,0.00%,0.00%
6,Stade Rennais,21,31,0.00%,0.00%,0.08%,0.65%,6.46%,14.39%,19.68%,20.18%,16.21%,10.89%,5.93%,3.32%,1.73%,0.41%,0.07%,0.00%,0.00%,0.00%
7,Strasbourg,21,30,0.00%,0.00%,0.33%,2.48%,22.21%,27.66%,18.35%,11.45%,7.82%,4.71%,2.98%,1.34%,0.53%,0.14%,0.00%,0.00%,0.00%,0.00%
8,Toulouse,21,30,0.00%,0.00%,0.04%,0.58%,8.94%,14.81%,19.12%,17.39%,15.18%,10.89%,6.95%,3.69%,1.86%,0.43%,0.12%,0.00%,0.00%,0.00%
9,Angers,21,29,0.00%,0.00%,0.00%,0.01%,0.25%,1.21%,2.68%,4.90%,8.76%,13.47%,16.89%,19.58%,17.78%,10.48%,3.42%,0.52%,0.05%,0.00%
10,AS Monaco,21,28,0.00%,0.00%,0.02%,0.34%,4.80%,11.84%,16.75%,19.88%,17.55%,11.90%,8.36%,4.77%,2.58%,1.00%,0.20%,0.01%,0.00%,0.00%
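
Each row in the tables above is a distribution over finishing positions, so its entries should sum to roughly 100%, up to rounding of the displayed values. A minimal sanity check follows, using three rows copied from the tables with their 0.00% entries omitted. The helper name `row_total` and the 0.05 tolerance are assumptions for illustration, not part of the notebook.

```python
import math

# Non-zero position probabilities (in %) copied from the displayed tables
rows = {
    "Arsenal": [89.90, 9.60, 0.49, 0.01],
    "Bayern Munich": [99.35, 0.64, 0.01],
    "Paris Saint-Germain": [96.95, 2.91, 0.13, 0.01],
}

def row_total(probs):
    """Sum the displayed percentages for one team (hypothetical helper)."""
    return sum(probs)

totals = {team: row_total(probs) for team, probs in rows.items()}

for team, total in totals.items():
    # Allow a small tolerance because the table shows values rounded to 2 decimals
    ok = math.isclose(total, 100.0, abs_tol=0.05)
    print(f"{team}: {total:.2f}% {'OK' if ok else 'CHECK'}")
```

The same check could be run on the full `position_distribution_pct_all_labeled` frames before styling, to catch a row that lost probability mass during relabelling.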


In [75]:
today = date.today().strftime("%Y-%m-%d")

os.makedirs("league_tables", exist_ok=True)

for league in leagues:
    filename = f"league_tables/{league}_{today}.jpg"
    dfi.export(
        styled_position_pct_all[league],
        filename,
        table_conversion="chrome"
    )
    print(f"Saved: {filename}")

Saved: league_tables/premierleague_england_2026-02-09.jpg
Saved: league_tables/seriea_italy_2026-02-09.jpg
Saved: league_tables/laliga_spain_2026-02-09.jpg
Saved: league_tables/bundesliga_germany_2026-02-09.jpg
Saved: league_tables/ligue1_france_2026-02-09.jpg
