## 📈 Predicting Premier League Final Positions Using Betting Odds & Simulation

**Competition:** English Premier League 2025/26  
**Purpose:** Estimate probabilities of final league positions using betting market information and simulation  
**Methods:** Odds-implied probabilities, Monte Carlo simulation, scenario analysis  
**Author:** [Victoria Friss de Kereki](https://www.linkedin.com/in/victoria-friss-de-kereki/)  

---

**Notebook first written:** `17/01/2026`  
**Last updated:** `17/01/2026`  

> This notebook develops a probabilistic framework to predict final Premier League positions, using betting odds as market-based expectations.
>
> Betting odds are transformed into implied probabilities and adjusted for bookmaker margin. These probabilities are then used to simulate the remainder of the season via Monte Carlo methods, generating distributions over final points totals and league positions.
>
> The analysis focuses on estimating the likelihood of key outcomes such as title wins, top-four finishes, relegation, and mid-table placements. Results are presented at team level with uncertainty intervals, and the framework can be extended to incorporate form, fixture difficulty, or alternative predictive inputs beyond betting markets.


In [54]:
import os
from datetime import datetime, timedelta
import numpy as np
import pandas as pd
import requests
from scipy.stats import poisson
from dotenv import load_dotenv
import soccerdata as sd
import matplotlib.pyplot as plt

## 1. Premier League Current Standings (ESPN Scraping)
##### Using the ESPN scraper I built in my previous project.

In [2]:
year = 2025  # current Premier League season start year

url = f"https://www.espn.com/soccer/standings/_/league/ENG.1/season/{year}"
tables = pd.read_html(url)

teams_raw = tables[0]
stats = tables[1]

teams = pd.DataFrame()
teams["position"] = teams_raw.iloc[:, 0].str.extract(r"^(\d+)").astype(int)
teams["team"] = (
    teams_raw.iloc[:, 0]
    .str.replace(r"^\d+", "", regex=True)
    .str.replace(r"^[A-Z]{2,3}", "", regex=True)
    .str.strip()
)

stats.columns = ["gp", "w", "d", "l", "gf", "ga", "gd", "pts"]
stats = stats.apply(lambda c: c.astype(str)
                              .str.replace("+", "", regex=False)
                              .astype(int))

premierleague = pd.concat([teams, stats], axis=1)
# premierleague["season"] = f"{year}-{year+1}"

premierleague


Unnamed: 0,position,team,gp,w,d,l,gf,ga,gd,pts
0,1,Arsenal,22,15,5,2,40,14,26,50
1,2,Manchester City,22,13,4,5,45,21,24,43
2,3,Aston Villa,22,13,4,5,33,25,8,43
3,4,Liverpool,22,10,6,6,33,29,4,36
4,5,Manchester United,22,9,8,5,38,32,6,35
5,6,Chelsea,22,9,7,6,36,24,12,34
6,7,Brentford,22,10,3,9,35,30,5,33
7,8,Newcastle United,22,9,6,7,32,27,5,33
8,9,Sunderland,22,8,9,5,23,23,0,33
9,10,Everton,22,9,5,8,24,25,-1,32
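The two regex steps in the scraping cell can be checked on a few raw cells. This is a minimal sketch, assuming the first ESPN column concatenates the position, a 2-3 letter club abbreviation and the team name (for example `1ARSArsenal`). The sample strings and the `split_cell` helper are hypothetical, not scraped output.

```python
import re

def split_cell(cell: str) -> tuple[int, str]:
    """Split a raw standings cell into (position, team name)."""
    position = int(re.match(r"^(\d+)", cell).group(1))
    name = re.sub(r"^\d+", "", cell)          # drop the leading position
    name = re.sub(r"^[A-Z]{2,3}", "", name)   # drop the club abbreviation
    return position, name.strip()

# Hypothetical raw cells in the assumed "positionABBRName" layout
samples = ["1ARSArsenal", "2MNCManchester City", "15BOUAFC Bournemouth"]
parsed = [split_cell(s) for s in samples]
print(parsed)
```

Because `{2,3}` is greedy, a two-letter abbreviation followed by a capitalised name would lose the name's first letter, so it is worth eyeballing the parsed names after each scrape.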


## 2. Get betting odds using API

In [3]:
# Load variables from API_KEY.env
load_dotenv("API_KEY.env")

API_KEY = os.getenv("ODDS_DATA_API_KEY")

if API_KEY is None:
    raise ValueError("ODDS_DATA_API_KEY not found. Check API_KEY.env")

print("API key loaded successfully")

API key loaded successfully


In [4]:
url = "https://api.the-odds-api.com/v4/sports/soccer_epl/odds"

params = {
    "apiKey": API_KEY,
    "regions": "uk",
    "markets": "h2h",
    "oddsFormat": "decimal",
    "dateFormat": "iso",
    "days": 365  # get all upcoming matches for the next year
}

response = requests.get(url, params=params)
response.raise_for_status()

odds_data = response.json()
print("Total upcoming matches:", len(odds_data))

Total upcoming matches: 20


In [5]:
def flatten_odds(data):
    rows = []

    for match in data:
        match_id = match["id"]
        home = match["home_team"]
        away = match["away_team"]
        time = match["commence_time"]

        for book in match["bookmakers"]:
            bookmaker = book["title"]

            # Find the head-to-head (h2h) market, i.e. home win/draw/away win odds; skip this bookmaker if it has none.
            h2h = next((m for m in book["markets"] if m["key"] == "h2h"), None)
            if not h2h:
                continue

            outcomes = {o["name"]: o["price"] for o in h2h["outcomes"]}

            rows.append({
                "match_id": match_id,
                "commence_time": time,
                "home_team": home,
                "away_team": away,
                "bookmaker": bookmaker,
                "home_odds": outcomes.get(home),
                "draw_odds": outcomes.get("Draw"),
                "away_odds": outcomes.get(away),
            })

    return pd.DataFrame(rows)

df = flatten_odds(odds_data)
df.head()

Unnamed: 0,match_id,commence_time,home_team,away_team,bookmaker,home_odds,draw_odds,away_odds
0,a396966205b21287109ea8cae42e44c1,2026-01-24T12:30:00Z,West Ham United,Sunderland,Unibet (UK),2.45,3.3,2.88
1,a396966205b21287109ea8cae42e44c1,2026-01-24T12:30:00Z,West Ham United,Sunderland,Paddy Power,2.38,3.2,3.0
2,a396966205b21287109ea8cae42e44c1,2026-01-24T12:30:00Z,West Ham United,Sunderland,Betway,2.5,3.2,2.8
3,a396966205b21287109ea8cae42e44c1,2026-01-24T12:30:00Z,West Ham United,Sunderland,Smarkets,2.56,3.35,2.98
4,a396966205b21287109ea8cae42e44c1,2026-01-24T12:30:00Z,West Ham United,Sunderland,Sky Bet,2.38,3.1,3.0
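For reference, `flatten_odds` reads a nested structure: each match holds a list of bookmakers, each bookmaker a list of markets, and each market a list of outcomes. This is a minimal sketch with a hand-written payload in that shape. The id, bookmaker titles and prices are made up, and `flatten_one` is a hypothetical single-match helper that returns plain dicts rather than a DataFrame.

```python
# Hand-written payload in the shape flatten_odds reads; all values are made up.
sample_match = {
    "id": "demo-1",
    "commence_time": "2026-01-24T12:30:00Z",
    "home_team": "West Ham United",
    "away_team": "Sunderland",
    "bookmakers": [
        {"title": "Demo Book", "markets": [
            {"key": "h2h", "outcomes": [
                {"name": "West Ham United", "price": 2.45},
                {"name": "Sunderland", "price": 2.88},
                {"name": "Draw", "price": 3.30},
            ]},
        ]},
        # A bookmaker without an h2h market is skipped
        {"title": "No H2H Book", "markets": [{"key": "totals", "outcomes": []}]},
    ],
}

def flatten_one(match):
    """Hypothetical single-match version of flatten_odds."""
    rows = []
    for book in match["bookmakers"]:
        h2h = next((m for m in book["markets"] if m["key"] == "h2h"), None)
        if h2h is None:
            continue  # no win/draw/win market for this bookmaker
        prices = {o["name"]: o["price"] for o in h2h["outcomes"]}
        rows.append({
            "bookmaker": book["title"],
            "home_odds": prices.get(match["home_team"]),
            "draw_odds": prices.get("Draw"),
            "away_odds": prices.get(match["away_team"]),
        })
    return rows

rows = flatten_one(sample_match)
print(rows)
```

Outcomes are looked up by team name, so the order in which a bookmaker lists them does not matter.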


In [6]:
betting_odds_avg = (
    df.groupby(["match_id", "home_team", "away_team"])
      .agg({
          "home_odds": "mean",
          "draw_odds": "mean",
          "away_odds": "mean"
      })
      .reset_index()
)

betting_odds_avg.head()

Unnamed: 0,match_id,home_team,away_team,home_odds,draw_odds,away_odds
0,1ca6d3d9cde3e58a39211feb9188530c,Newcastle United,Aston Villa,1.953333,3.666667,3.563889
1,1e811fa7ead0a3e6ef920b15b2bbb95d,Burnley,Tottenham Hotspur,3.683333,3.386111,2.021667
2,36820753efb36739a83c6e5e440827b2,Brighton and Hove Albion,Everton,1.808571,3.607143,4.059286
3,38a3cb5e295f55e274d589fc646cf2dd,Tottenham Hotspur,Manchester City,4.411429,3.946429,1.69
4,4d6c0ac53b2b38aeb5bc8c8163f3f8b9,Crystal Palace,Chelsea,3.455556,3.622222,2.015556


In [7]:
# 1) Convert decimal odds -> raw implied probabilities
betting_odds_avg["p_home_raw"] = 1 / betting_odds_avg["home_odds"]
betting_odds_avg["p_draw_raw"] = 1 / betting_odds_avg["draw_odds"]
betting_odds_avg["p_away_raw"] = 1 / betting_odds_avg["away_odds"]

# 2) Normalize (remove bookmaker margin)
betting_odds_avg["total_raw"] = (
    betting_odds_avg["p_home_raw"] +
    betting_odds_avg["p_draw_raw"] +
    betting_odds_avg["p_away_raw"]
)

betting_odds_avg["p_home_book"] = betting_odds_avg["p_home_raw"] / betting_odds_avg["total_raw"]
betting_odds_avg["p_draw_book"] = betting_odds_avg["p_draw_raw"] / betting_odds_avg["total_raw"]
betting_odds_avg["p_away_book"] = betting_odds_avg["p_away_raw"] / betting_odds_avg["total_raw"]
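To make the margin removal concrete, here is a worked example using the Unibet (UK) prices for West Ham United v Sunderland from the odds table above. The helper name `implied_probabilities` is hypothetical. Proportional normalization is only one of several ways to remove the margin; it is used here because it mirrors the cell above.

```python
def implied_probabilities(home_odds, draw_odds, away_odds):
    """Return margin-free probabilities and the bookmaker margin (overround - 1)."""
    raw = [1 / home_odds, 1 / draw_odds, 1 / away_odds]
    overround = sum(raw)                    # > 1 whenever the bookmaker takes a margin
    fair = [p / overround for p in raw]     # proportional normalization
    return fair, overround - 1

fair, margin = implied_probabilities(2.45, 3.30, 2.88)
print([round(p, 4) for p in fair], round(margin, 4))
```

The raw inverse odds sum to more than 1, and dividing by that sum rescales the three outcomes so they form a proper probability distribution.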

In [8]:
# Keep only useful columns
betting_odds_avg = betting_odds_avg[[
    # "match_id",
    "home_team",
    "away_team",
    "p_home_book",
    "p_draw_book",
    "p_away_book"
]]

betting_odds_avg.head()

Unnamed: 0,home_team,away_team,p_home_book,p_draw_book,p_away_book
0,Newcastle United,Aston Villa,0.48058,0.256018,0.263401
1,Burnley,Tottenham Hotspur,0.255774,0.278225,0.466002
2,Brighton and Hove Albion,Everton,0.51363,0.257527,0.228843
3,Tottenham Hotspur,Manchester City,0.2115,0.23642,0.55208
4,Crystal Palace,Chelsea,0.272596,0.260053,0.467351


## 3. Get fixtures for upcoming EPL games

In [9]:
# Load variables from API_KEY.env
load_dotenv("API_KEY.env")

API_KEY = os.getenv("FOOTBALL_DATA_API_KEY")

if API_KEY is None:
    raise ValueError("FOOTBALL_DATA_API_KEY not found. Check API_KEY.env")

print("API key loaded successfully")

API key loaded successfully


In [10]:
url = "https://api.football-data.org/v4/competitions/PL/matches"

headers = {
    "X-Auth-Token": API_KEY
}

today = pd.Timestamp.now(tz="UTC").date()  # datetime.utcnow() is deprecated since Python 3.12
end_of_season = today + timedelta(days=365)  # big range to cover all remaining games

params = {
    "status": "SCHEDULED",
    "dateFrom": today.isoformat(),
    "dateTo": end_of_season.isoformat()
}

response = requests.get(url, headers=headers, params=params)
response.raise_for_status()

data = response.json()
fixtures = data["matches"]

df_fixtures = pd.DataFrame(fixtures)

df_fixtures_clean = df_fixtures[[
    "utcDate",
    "status",
    "homeTeam",
    "awayTeam"
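]].copy()  # work on a copy so the column assignments in the next cell do not raise SettingWithCopyWarning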

df_fixtures_clean.head()
print("Total scheduled matches:", len(df_fixtures_clean))


Total scheduled matches: 160


In [11]:
df_fixtures_clean["homeTeam"] = df_fixtures_clean["homeTeam"].apply(lambda x: x["name"])
df_fixtures_clean["awayTeam"] = df_fixtures_clean["awayTeam"].apply(lambda x: x["name"])

In [12]:
df_fixtures_clean

Unnamed: 0,utcDate,status,homeTeam,awayTeam
0,2026-01-24T12:30:00Z,TIMED,West Ham United FC,Sunderland AFC
1,2026-01-24T15:00:00Z,TIMED,Burnley FC,Tottenham Hotspur FC
2,2026-01-24T15:00:00Z,TIMED,Fulham FC,Brighton & Hove Albion FC
3,2026-01-24T15:00:00Z,TIMED,Manchester City FC,Wolverhampton Wanderers FC
4,2026-01-24T17:30:00Z,TIMED,AFC Bournemouth,Liverpool FC
...,...,...,...,...
155,2026-05-24T15:00:00Z,TIMED,Liverpool FC,Brentford FC
156,2026-05-24T15:00:00Z,TIMED,Manchester City FC,Aston Villa FC
157,2026-05-24T15:00:00Z,TIMED,Nottingham Forest FC,AFC Bournemouth
158,2026-05-24T15:00:00Z,TIMED,Tottenham Hotspur FC,Everton FC


## 4. Get this season (2025/26) and last season (2024/25) results

In [13]:
url = "https://api.football-data.org/v4/competitions/PL/matches"
params = {
    "season": 2025,   # season year
    "status": "FINISHED"
}

response = requests.get(url, headers=headers, params=params)
response.raise_for_status()
past_matches = response.json()["matches"]

In [14]:
clean_rows = []

for m in past_matches:
    row = {
        "utcDate": m["utcDate"],
        "matchday": m["matchday"],
        "status": m["status"],
        "homeTeam": m["homeTeam"]["name"],
        "awayTeam": m["awayTeam"]["name"],
        "homeGoals": m["score"]["fullTime"]["home"],
        "awayGoals": m["score"]["fullTime"]["away"],
        "winner": m["score"]["winner"]
    }
    clean_rows.append(row)

past_matches_25_clean = pd.DataFrame(clean_rows)
past_matches_25_clean.tail()

Unnamed: 0,utcDate,matchday,status,homeTeam,awayTeam,homeGoals,awayGoals,winner
215,2026-01-17T15:00:00Z,22,FINISHED,Tottenham Hotspur FC,West Ham United FC,1,2,AWAY_TEAM
216,2026-01-17T17:30:00Z,22,FINISHED,Nottingham Forest FC,Arsenal FC,0,0,DRAW
217,2026-01-18T14:00:00Z,22,FINISHED,Wolverhampton Wanderers FC,Newcastle United FC,0,0,DRAW
218,2026-01-18T16:30:00Z,22,FINISHED,Aston Villa FC,Everton FC,0,1,AWAY_TEAM
219,2026-01-19T20:00:00Z,22,FINISHED,Brighton & Hove Albion FC,AFC Bournemouth,1,1,DRAW


In [15]:
url = "https://api.football-data.org/v4/competitions/PL/matches"
params = {
    "season": 2024,   # season year
    "status": "FINISHED"
}

response = requests.get(url, headers=headers, params=params)
response.raise_for_status()
past_matches_24 = response.json()["matches"]

In [16]:
clean_rows = []

for m in past_matches_24:
    row = {
        "utcDate": m["utcDate"],
        "matchday": m["matchday"],
        "status": m["status"],
        "homeTeam": m["homeTeam"]["name"],
        "awayTeam": m["awayTeam"]["name"],
        "homeGoals": m["score"]["fullTime"]["home"],
        "awayGoals": m["score"]["fullTime"]["away"],
        "winner": m["score"]["winner"]
    }
    clean_rows.append(row)

past_matches_24_clean = pd.DataFrame(clean_rows)
past_matches_24_clean.head()

Unnamed: 0,utcDate,matchday,status,homeTeam,awayTeam,homeGoals,awayGoals,winner
0,2024-08-16T19:00:00Z,1,FINISHED,Manchester United FC,Fulham FC,1,0,HOME_TEAM
1,2024-08-17T11:30:00Z,1,FINISHED,Ipswich Town FC,Liverpool FC,0,2,AWAY_TEAM
2,2024-08-17T14:00:00Z,1,FINISHED,Arsenal FC,Wolverhampton Wanderers FC,2,0,HOME_TEAM
3,2024-08-17T14:00:00Z,1,FINISHED,Everton FC,Brighton & Hove Albion FC,0,3,AWAY_TEAM
4,2024-08-17T14:00:00Z,1,FINISHED,Newcastle United FC,Southampton FC,1,0,HOME_TEAM


## 5. Combine and calculate probabilities of W/D/L for each match

In [17]:
# Load Dataframes
df_current = past_matches_25_clean
df_prev = past_matches_24_clean
df_future = df_fixtures_clean

# Combine all past fixtures together
df_all = pd.concat([df_prev, df_current], ignore_index=True)

In [18]:
# Add weights: more recent games = more weight
df_all["date"] = pd.to_datetime(df_all["utcDate"])
df_all["weight"] = np.linspace(1, 2, len(df_all))  # linear weighting by row order, from 1 (oldest) to 2 (newest); assumes rows are chronological

In [19]:
df_all.tail()

Unnamed: 0,utcDate,matchday,status,homeTeam,awayTeam,homeGoals,awayGoals,winner,date,weight
595,2026-01-17T15:00:00Z,22,FINISHED,Tottenham Hotspur FC,West Ham United FC,1,2,AWAY_TEAM,2026-01-17 15:00:00+00:00,1.993322
596,2026-01-17T17:30:00Z,22,FINISHED,Nottingham Forest FC,Arsenal FC,0,0,DRAW,2026-01-17 17:30:00+00:00,1.994992
597,2026-01-18T14:00:00Z,22,FINISHED,Wolverhampton Wanderers FC,Newcastle United FC,0,0,DRAW,2026-01-18 14:00:00+00:00,1.996661
598,2026-01-18T16:30:00Z,22,FINISHED,Aston Villa FC,Everton FC,0,1,AWAY_TEAM,2026-01-18 16:30:00+00:00,1.998331
599,2026-01-19T20:00:00Z,22,FINISHED,Brighton & Hove Albion FC,AFC Bournemouth,1,1,DRAW,2026-01-19 20:00:00+00:00,2.0


In [20]:
# Compute home advantage
# Home advantage = average home goals - average away goals
home_avg = df_all["homeGoals"].mean()
away_avg = df_all["awayGoals"].mean()
home_advantage = home_avg - away_avg
home_advantage

0.19333333333333336
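`home_advantage` is measured in goals, yet the later `match_probabilities` function adds it inside an exponent, where it acts as the multiplier `exp(home_advantage)` on the home side's expected goals. This minimal sketch compares that multiplier with a log-ratio form. The league averages are hypothetical values chosen to give a difference near 0.19, and the log-ratio is shown only as an alternative, not as what the notebook uses.

```python
import math

home_avg, away_avg = 1.55, 1.36   # hypothetical league averages (goals per match)

diff = home_avg - away_avg                    # goal-difference form used in the cell above
multiplier_from_diff = math.exp(diff)         # its effect once added inside exp(...)
log_ratio = math.log(home_avg / away_avg)     # alternative: advantage defined on the log scale
multiplier_from_ratio = math.exp(log_ratio)   # equals home_avg / away_avg

print(round(diff, 3), round(multiplier_from_diff, 3), round(multiplier_from_ratio, 3))
```

With these assumed averages the goal-difference form boosts home expected goals more than the ratio form does, which is one possible source of disagreement with bookmaker home-win probabilities.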

In [21]:
#  Calculate attack & defense strengths
teams = pd.unique(df_all[["homeTeam", "awayTeam"]].values.ravel("K"))

attack = pd.Series(1.0, index=teams)
defense = pd.Series(1.0, index=teams)

# Weighted goals scored and conceded per match, for each team
team_stats = {}

for team in teams:
    home_games = df_all[df_all["homeTeam"] == team]
    away_games = df_all[df_all["awayTeam"] == team]

    goals_scored = (home_games["homeGoals"] * home_games["weight"]).sum() + \
                   (away_games["awayGoals"] * away_games["weight"]).sum()

    goals_against = (home_games["awayGoals"] * home_games["weight"]).sum() + \
                    (away_games["homeGoals"] * away_games["weight"]).sum()

    matches = home_games["weight"].sum() + away_games["weight"].sum()

    team_stats[team] = {
        "scored": goals_scored / matches,
        "against": goals_against / matches
    }

# Strengths = relative to the (unweighted) league average goals per team per match
league_avg_scored = df_all["homeGoals"].mean() + df_all["awayGoals"].mean()
league_avg_scored /= 2

for team in teams:
    attack[team] = team_stats[team]["scored"] / league_avg_scored
    defense[team] = team_stats[team]["against"] / league_avg_scored
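The strength calculation is easier to follow on a tiny example. This is a minimal sketch on a hypothetical three-match mini-league in plain Python. The teams, scores and weights are made up, and `weighted_rates` is a hypothetical helper that mirrors the loop above.

```python
# Hypothetical mini-league: (home, away, home_goals, away_goals, weight)
matches = [
    ("A", "B", 2, 0, 1.0),
    ("B", "C", 1, 1, 1.5),
    ("C", "A", 0, 3, 2.0),
]

def weighted_rates(team):
    """Weighted goals scored and conceded per match for one team."""
    scored = conceded = total_w = 0.0
    for home, away, hg, ag, w in matches:
        if team == home:
            scored += hg * w
            conceded += ag * w
            total_w += w
        elif team == away:
            scored += ag * w
            conceded += hg * w
            total_w += w
    return scored / total_w, conceded / total_w

# Unweighted league average goals per team per match, as in the cell above
league_avg = sum(hg + ag for _, _, hg, ag, _ in matches) / (2 * len(matches))

strengths = {
    team: tuple(rate / league_avg for rate in weighted_rates(team))
    for team in "ABC"
}
print(strengths)  # team -> (attack, defense)
```

An attack value above 1 means the team scores more than the league average. A defense value above 1 means it concedes more than average, so lower defense values indicate a stronger defense.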

🔥 Summary

The `match_probabilities` function below:
+ Calculates expected goals for each team
+ Uses the Poisson distribution to compute goal probabilities
+ Converts score probabilities into match outcome probabilities
+ Returns probabilities for:
  + home win
  + draw
  + away win

The Poisson distribution models the number of goals a team scores in a match, given an expected goal rate (λ). Using the formula \(P(X=k)=e^{-\lambda}\lambda^k/k!\), it gives the probability of scoring k = 0, 1, 2, … goals, where λ is estimated from team attack/defense strengths and league averages. In the model, I compute separate Poisson probabilities for home and away goals (treated as independent), then combine them to get the probability of each possible scoreline and therefore the probabilities of a home win, draw, or away win.
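As a quick illustration of the formula, the sketch below evaluates the Poisson probabilities for 0 to 6 goals using only the standard library. The rate `lam = 1.5` is a hypothetical expected-goals value, and `poisson_pmf` is a hand-rolled helper rather than the `scipy.stats.poisson` call used in the notebook.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson variable with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 1.5                                   # hypothetical expected goals
pmf = [poisson_pmf(k, lam) for k in range(7)]
tail = 1 - sum(pmf)                         # probability of 7 or more goals

for k, p in enumerate(pmf):
    print(k, round(p, 4))
print("tail beyond 6 goals:", round(tail, 4))
```

The tail is small but not zero, which is why probabilities built from scorelines capped at 6 goals add up to slightly less than 1.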


In [22]:
# Calculate probabilities for each future match

def match_probabilities(home, away):
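    # expected goals (log-linear form); home_advantage is a goal difference, but because
    # it is added inside the exponent it scales home expected goals by exp(home_advantage)
    exp_home = np.exp(np.log(league_avg_scored) + np.log(attack[home]) + np.log(defense[away]) + home_advantage)
    exp_away = np.exp(np.log(league_avg_scored) + np.log(attack[away]) + np.log(defense[home]))

    # compute probabilities for 0..6 goals; higher scorelines are dropped, so the three
    # outcome probabilities sum to slightly less than 1 (they are renormalized before simulating)
    max_goals = 6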
    p_home = poisson.pmf(range(max_goals + 1), exp_home)
    p_away = poisson.pmf(range(max_goals + 1), exp_away)

    # result probabilities
    p_win = 0
    p_draw = 0
    p_loss = 0

    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            prob = p_home[i] * p_away[j]
            if i > j:
                p_win += prob
            elif i == j:
                p_draw += prob
            else:
                p_loss += prob

    return p_win, p_draw, p_loss
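The double loop can also be written as a scoreline grid. This is a minimal sketch, assuming independent Poisson goal counts as above. The function name `outcome_probabilities` and the expected-goals inputs are hypothetical. Unlike the notebook function, this version renormalizes immediately and also reports the probability mass lost to the 6-goal cap.

```python
from math import factorial

import numpy as np

def outcome_probabilities(exp_home, exp_away, max_goals=6):
    """Return (home win, draw, away win, truncated mass) from a scoreline grid."""
    goals = np.arange(max_goals + 1)
    fact = np.array([factorial(k) for k in range(max_goals + 1)], dtype=float)
    p_home = np.exp(-exp_home) * exp_home**goals / fact
    p_away = np.exp(-exp_away) * exp_away**goals / fact

    grid = np.outer(p_home, p_away)       # grid[i, j] = P(home scores i, away scores j)
    p_win = np.tril(grid, k=-1).sum()     # below the diagonal: i > j
    p_draw = np.trace(grid)               # diagonal: i == j
    p_loss = np.triu(grid, k=1).sum()     # above the diagonal: i < j

    total = p_win + p_draw + p_loss       # < 1 because scorelines above max_goals are dropped
    return p_win / total, p_draw / total, p_loss / total, 1 - total

p = outcome_probabilities(1.6, 1.1)       # hypothetical expected goals
print([round(float(x), 4) for x in p])
```

The grid form makes the truncation explicit: the last value returned is the share of probability that the cap leaves out.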

In [23]:
# Apply to all fixtures

results = []

for _, row in df_future.iterrows():
    home = row["homeTeam"]
    away = row["awayTeam"]

    p_win, p_draw, p_loss = match_probabilities(home, away)

    results.append({
        "utcDate": row["utcDate"],
        "homeTeam": home,
        "awayTeam": away,
        "p_home_win": p_win,
        "p_draw": p_draw,
        "p_away_win": p_loss,
    })

df_odds = pd.DataFrame(results)
df_odds.head()


Unnamed: 0,utcDate,homeTeam,awayTeam,p_home_win,p_draw,p_away_win
0,2026-01-24T12:30:00Z,West Ham United FC,Sunderland AFC,0.300547,0.279822,0.41916
1,2026-01-24T15:00:00Z,Burnley FC,Tottenham Hotspur FC,0.264293,0.215537,0.514841
2,2026-01-24T15:00:00Z,Fulham FC,Brighton & Hove Albion FC,0.405496,0.22847,0.362613
3,2026-01-24T15:00:00Z,Manchester City FC,Wolverhampton Wanderers FC,0.775213,0.122029,0.071776
4,2026-01-24T17:30:00Z,AFC Bournemouth,Liverpool FC,0.304759,0.214773,0.474532


## 6. Compare calculated probabilities to bookmaker ones

In [24]:
unique_bet_home = betting_odds_avg["home_team"].unique()
unique_model_home = df_odds["homeTeam"].unique()

In [25]:
print(unique_bet_home)
print(unique_model_home)

['Newcastle United' 'Burnley' 'Brighton and Hove Albion'
 'Tottenham Hotspur' 'Crystal Palace' 'Sunderland' 'Arsenal' 'Bournemouth'
 'Brentford' 'Liverpool' 'Aston Villa' 'West Ham United' 'Chelsea'
 'Manchester City' 'Wolverhampton Wanderers' 'Nottingham Forest' 'Fulham'
 'Manchester United' 'Leeds United' 'Everton']
['West Ham United FC' 'Burnley FC' 'Fulham FC' 'Manchester City FC'
 'AFC Bournemouth' 'Crystal Palace FC' 'Brentford FC'
 'Newcastle United FC' 'Arsenal FC' 'Everton FC'
 'Brighton & Hove Albion FC' 'Leeds United FC'
 'Wolverhampton Wanderers FC' 'Chelsea FC' 'Liverpool FC' 'Aston Villa FC'
 'Manchester United FC' 'Nottingham Forest FC' 'Tottenham Hotspur FC'
 'Sunderland AFC']


In [26]:
def normalize_team(name):
    name = name.lower()
    name = name.replace(" fc", "")
    name = name.replace(" afc", "")
    name = name.replace("&", "and")
    name = name.replace("afc ", "")   # strips "afc " wherever it occurs, which handles the leading AFC in AFC Bournemouth
    name = name.strip()
    return name
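A set comparison only answers yes or no; a symmetric difference shows which names fail to match. This is a minimal sketch using team names from the two printed lists above. The function body repeats the same replacement steps so the example runs on its own.

```python
def normalize_team(name):
    # Same replacement steps as the notebook function, repeated so the sketch is self-contained
    name = name.lower()
    for old, new in [(" fc", ""), (" afc", ""), ("&", "and"), ("afc ", "")]:
        name = name.replace(old, new)
    return name.strip()

model_names = ["AFC Bournemouth", "Brighton & Hove Albion FC", "Sunderland AFC"]
book_names = ["Bournemouth", "Brighton and Hove Albion", "Sunderland"]

model_norm = {normalize_team(n) for n in model_names}
book_norm = {normalize_team(n) for n in book_names}

unmatched = model_norm ^ book_norm   # names present in only one source
print(sorted(model_norm), unmatched)
```

An empty `unmatched` set means every name lines up; anything left in it points directly at the spelling that needs another rule.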


In [27]:
df_odds["home_norm"] = df_odds["homeTeam"].apply(normalize_team)
df_odds["away_norm"] = df_odds["awayTeam"].apply(normalize_team)

betting_odds_avg["home_norm"] = betting_odds_avg["home_team"].apply(normalize_team)
betting_odds_avg["away_norm"] = betting_odds_avg["away_team"].apply(normalize_team)


In [28]:
unique_model_norm = df_odds["home_norm"].unique()
unique_bet_norm = betting_odds_avg["home_norm"].unique()

set(unique_model_norm) == set(unique_bet_norm)

True

In [29]:
df_compare = df_odds.merge(
    betting_odds_avg,
    on=["home_norm", "away_norm"],
    how="inner"
)

print("Matched rows:", len(df_compare))
df_compare.head()

Matched rows: 20


Unnamed: 0,utcDate,homeTeam,awayTeam,p_home_win,p_draw,p_away_win,home_norm,away_norm,home_team,away_team,p_home_book,p_draw_book,p_away_book
0,2026-01-24T12:30:00Z,West Ham United FC,Sunderland AFC,0.300547,0.279822,0.41916,west ham united,sunderland,West Ham United,Sunderland,0.386261,0.289448,0.324291
1,2026-01-24T15:00:00Z,Burnley FC,Tottenham Hotspur FC,0.264293,0.215537,0.514841,burnley,tottenham hotspur,Burnley,Tottenham Hotspur,0.255774,0.278225,0.466002
2,2026-01-24T15:00:00Z,Fulham FC,Brighton & Hove Albion FC,0.405496,0.22847,0.362613,fulham,brighton and hove albion,Fulham,Brighton and Hove Albion,0.376212,0.280847,0.342941
3,2026-01-24T15:00:00Z,Manchester City FC,Wolverhampton Wanderers FC,0.775213,0.122029,0.071776,manchester city,wolverhampton wanderers,Manchester City,Wolverhampton Wanderers,0.795969,0.134538,0.069493
4,2026-01-24T17:30:00Z,AFC Bournemouth,Liverpool FC,0.304759,0.214773,0.474532,bournemouth,liverpool,Bournemouth,Liverpool,0.257024,0.243672,0.499304


In [30]:
df_compare["diff_home"] = df_compare["p_home_win"] - df_compare["p_home_book"]
df_compare["diff_draw"] = df_compare["p_draw"] - df_compare["p_draw_book"]
df_compare["diff_away"] = df_compare["p_away_win"] - df_compare["p_away_book"]

df_compare[["homeTeam", "awayTeam", "diff_home", "diff_draw", "diff_away"]].head()

Unnamed: 0,homeTeam,awayTeam,diff_home,diff_draw,diff_away
0,West Ham United FC,Sunderland AFC,-0.085714,-0.009627,0.094869
1,Burnley FC,Tottenham Hotspur FC,0.008519,-0.062688,0.04884
2,Fulham FC,Brighton & Hove Albion FC,0.029283,-0.052376,0.019672
3,Manchester City FC,Wolverhampton Wanderers FC,-0.020756,-0.012509,0.002283
4,AFC Bournemouth,Liverpool FC,0.047735,-0.028899,-0.024772


In [31]:
# RMSE between model and bookmaker probabilities, per outcome
rmse_home = np.sqrt(np.mean((df_compare["p_home_win"] - df_compare["p_home_book"])**2))
rmse_draw = np.sqrt(np.mean((df_compare["p_draw"] - df_compare["p_draw_book"])**2))
rmse_away = np.sqrt(np.mean((df_compare["p_away_win"] - df_compare["p_away_book"])**2))

rmse_home, rmse_draw, rmse_away


(0.05503852146323975, 0.03854795251723141, 0.048998304404854046)

In [32]:
rmse_total = np.sqrt(np.mean((
    df_compare["p_home_win"] - df_compare["p_home_book"]
)**2 + (
    df_compare["p_draw"] - df_compare["p_draw_book"]
)**2 + (
    df_compare["p_away_win"] - df_compare["p_away_book"]
)**2 ))

rmse_total

0.08316259569470502
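Because the mean of a sum equals the sum of the means, the combined RMSE is the square root of the sum of the squared per-outcome RMSEs. A quick check using the three values printed above:

```python
import math

# Per-outcome RMSEs as printed by the previous cell
rmse_home = 0.05503852146323975
rmse_draw = 0.03854795251723141
rmse_away = 0.048998304404854046

# sqrt(mean(a^2 + b^2 + c^2)) == sqrt(mean(a^2) + mean(b^2) + mean(c^2))
rmse_total = math.sqrt(rmse_home**2 + rmse_draw**2 + rmse_away**2)
print(rmse_total)
```

So the combined figure adds the three outcomes' errors together rather than averaging them, which is why it is larger than any single per-outcome RMSE.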

In [33]:
df_compare["abs_diff"] = (
    abs(df_compare["diff_home"]) +
    abs(df_compare["diff_draw"]) +
    abs(df_compare["diff_away"])
)

df_compare.sort_values("abs_diff", ascending=False).head(10)[
    ["homeTeam", "awayTeam", "diff_home", "diff_draw", "diff_away"]
]


Unnamed: 0,homeTeam,awayTeam,diff_home,diff_draw,diff_away
8,Arsenal FC,Manchester United FC,0.108539,-0.062273,-0.053795
16,Manchester United FC,Fulham FC,-0.110446,0.001523,0.106377
0,West Ham United FC,Sunderland AFC,-0.085714,-0.009627,0.094869
9,Everton FC,Leeds United FC,0.092224,-0.049945,-0.043653
12,Wolverhampton Wanderers FC,AFC Bournemouth,-0.029397,-0.048658,0.073666
19,Sunderland AFC,Burnley FC,0.066898,-0.033451,-0.03514
6,Brentford FC,Nottingham Forest FC,0.063797,-0.060344,-0.007581
1,Burnley FC,Tottenham Hotspur FC,0.008519,-0.062688,0.04884
13,Chelsea FC,West Ham United FC,0.05359,-0.039107,-0.027231
17,Nottingham Forest FC,Crystal Palace FC,-0.032759,-0.02061,0.052612


## 7. Replace my estimated probabilities with odds-implied ones where available, creating the final match probabilities

In [34]:
df_odds.head(2)

Unnamed: 0,utcDate,homeTeam,awayTeam,p_home_win,p_draw,p_away_win,home_norm,away_norm
0,2026-01-24T12:30:00Z,West Ham United FC,Sunderland AFC,0.300547,0.279822,0.41916,west ham united,sunderland
1,2026-01-24T15:00:00Z,Burnley FC,Tottenham Hotspur FC,0.264293,0.215537,0.514841,burnley,tottenham hotspur


In [35]:
betting_odds_avg.head(2)

Unnamed: 0,home_team,away_team,p_home_book,p_draw_book,p_away_book,home_norm,away_norm
0,Newcastle United,Aston Villa,0.48058,0.256018,0.263401,newcastle united,aston villa
1,Burnley,Tottenham Hotspur,0.255774,0.278225,0.466002,burnley,tottenham hotspur


In [36]:
df_final_probabilities = df_odds.merge(
    betting_odds_avg,
    on=["home_norm", "away_norm"],
    how="left"
)

In [37]:
df_final_probabilities = df_final_probabilities[[
    "utcDate",
    "homeTeam",
    "awayTeam",
    "p_home_win",
    "p_draw",
    "p_away_win",
    "p_home_book",
    "p_draw_book",
    "p_away_book",
]]

df_final_probabilities

Unnamed: 0,utcDate,homeTeam,awayTeam,p_home_win,p_draw,p_away_win,p_home_book,p_draw_book,p_away_book
0,2026-01-24T12:30:00Z,West Ham United FC,Sunderland AFC,0.300547,0.279822,0.419160,0.386261,0.289448,0.324291
1,2026-01-24T15:00:00Z,Burnley FC,Tottenham Hotspur FC,0.264293,0.215537,0.514841,0.255774,0.278225,0.466002
2,2026-01-24T15:00:00Z,Fulham FC,Brighton & Hove Albion FC,0.405496,0.228470,0.362613,0.376212,0.280847,0.342941
3,2026-01-24T15:00:00Z,Manchester City FC,Wolverhampton Wanderers FC,0.775213,0.122029,0.071776,0.795969,0.134538,0.069493
4,2026-01-24T17:30:00Z,AFC Bournemouth,Liverpool FC,0.304759,0.214773,0.474532,0.257024,0.243672,0.499304
...,...,...,...,...,...,...,...,...,...
155,2026-05-24T15:00:00Z,Liverpool FC,Brentford FC,0.564764,0.197342,0.228107,,,
156,2026-05-24T15:00:00Z,Manchester City FC,Aston Villa FC,0.582085,0.210743,0.201947,,,
157,2026-05-24T15:00:00Z,Nottingham Forest FC,AFC Bournemouth,0.411739,0.237440,0.348489,,,
158,2026-05-24T15:00:00Z,Tottenham Hotspur FC,Everton FC,0.420254,0.258064,0.320653,,,


In [38]:
df_final_probabilities["p_home_final"] = np.where(
    df_final_probabilities["p_home_book"].notna(),
    df_final_probabilities["p_home_book"],
    df_final_probabilities["p_home_win"]
)

df_final_probabilities["p_draw_final"] = np.where(
    df_final_probabilities["p_draw_book"].notna(),
    df_final_probabilities["p_draw_book"],
    df_final_probabilities["p_draw"]
)

df_final_probabilities["p_away_final"] = np.where(
    df_final_probabilities["p_away_book"].notna(),
    df_final_probabilities["p_away_book"],
    df_final_probabilities["p_away_win"]
)

In [39]:
print("Used betting odds:", df_final_probabilities["p_home_book"].notna().sum())
print("Used model:", df_final_probabilities["p_home_book"].isna().sum())


Used betting odds: 20
Used model: 140
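The same fallback can be written with `fillna`, which also makes it easy to record which source each row used. This is a minimal sketch on a hypothetical two-row frame; the numbers and the `source` column are illustrative assumptions.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    "p_home_win":  [0.30, 0.56],     # model estimate, always present
    "p_home_book": [0.39, np.nan],   # bookmaker estimate, missing where no odds are quoted yet
})

# Prefer the bookmaker value and fall back to the model value where it is missing
demo["p_home_final"] = demo["p_home_book"].fillna(demo["p_home_win"])
demo["source"] = np.where(demo["p_home_book"].notna(), "book", "model")
print(demo)
```

Since the three bookmaker columns come from the same merged row, they are either all present or all missing, so each fixture takes its home, draw and away probabilities from a single source.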


In [40]:
df_final_probabilities = df_final_probabilities[[
    "utcDate",
    "homeTeam",
    "awayTeam",
    "p_home_final",
    "p_draw_final",
    "p_away_final"
]]

In [41]:
df_final_probabilities

Unnamed: 0,utcDate,homeTeam,awayTeam,p_home_final,p_draw_final,p_away_final
0,2026-01-24T12:30:00Z,West Ham United FC,Sunderland AFC,0.386261,0.289448,0.324291
1,2026-01-24T15:00:00Z,Burnley FC,Tottenham Hotspur FC,0.255774,0.278225,0.466002
2,2026-01-24T15:00:00Z,Fulham FC,Brighton & Hove Albion FC,0.376212,0.280847,0.342941
3,2026-01-24T15:00:00Z,Manchester City FC,Wolverhampton Wanderers FC,0.795969,0.134538,0.069493
4,2026-01-24T17:30:00Z,AFC Bournemouth,Liverpool FC,0.257024,0.243672,0.499304
...,...,...,...,...,...,...
155,2026-05-24T15:00:00Z,Liverpool FC,Brentford FC,0.564764,0.197342,0.228107
156,2026-05-24T15:00:00Z,Manchester City FC,Aston Villa FC,0.582085,0.210743,0.201947
157,2026-05-24T15:00:00Z,Nottingham Forest FC,AFC Bournemouth,0.411739,0.237440,0.348489
158,2026-05-24T15:00:00Z,Tottenham Hotspur FC,Everton FC,0.420254,0.258064,0.320653


In [42]:
df_final_probabilities["homeTeam"].unique()

array(['West Ham United FC', 'Burnley FC', 'Fulham FC',
       'Manchester City FC', 'AFC Bournemouth', 'Crystal Palace FC',
       'Brentford FC', 'Newcastle United FC', 'Arsenal FC', 'Everton FC',
       'Brighton & Hove Albion FC', 'Leeds United FC',
       'Wolverhampton Wanderers FC', 'Chelsea FC', 'Liverpool FC',
       'Aston Villa FC', 'Manchester United FC', 'Nottingham Forest FC',
       'Tottenham Hotspur FC', 'Sunderland AFC'], dtype=object)

In [43]:
name_map = {
    "Aston Villa FC": "Aston Villa",
    "Brighton & Hove Albion FC": "Brighton & Hove Albion",
    "AFC Bournemouth": "AFC Bournemouth",   # keep as is
    "Bournemouth": "AFC Bournemouth",
    "Sunderland AFC": "Sunderland",
    "Newcastle United FC": "Newcastle United",
    "Manchester City FC": "Manchester City",
    "Manchester United FC": "Manchester United",
    "West Ham United FC": "West Ham United",
    "Wolverhampton Wanderers FC": "Wolverhampton Wanderers",
    "Tottenham Hotspur FC": "Tottenham Hotspur",
    "Crystal Palace FC": "Crystal Palace",
    "Brentford FC": "Brentford",
    "Everton FC": "Everton",
    "Leeds United FC": "Leeds United",
    "Chelsea FC": "Chelsea",
    "Liverpool FC": "Liverpool",
    "Nottingham Forest FC": "Nottingham Forest",
    "Burnley FC": "Burnley",
    "Fulham FC": "Fulham",
    "Arsenal FC": "Arsenal"
}

df_final_probabilities["home_team_norm"] = df_final_probabilities["homeTeam"].replace(name_map)
df_final_probabilities["away_team_norm"] = df_final_probabilities["awayTeam"].replace(name_map)

premierleague["team_norm"] = premierleague["team"].replace({
    "Brighton & Hove Albion": "Brighton & Hove Albion",
    "AFC Bournemouth": "AFC Bournemouth"
})


In [45]:
df_simulation = df_final_probabilities.copy()

In [46]:
# Normalize probabilities so they sum to 1
prob_cols = ["p_home_final", "p_draw_final", "p_away_final"]
df_simulation[prob_cols] = df_simulation[prob_cols].div(df_simulation[prob_cols].sum(axis=1), axis=0)

In [47]:
df_simulation.head()

Unnamed: 0,utcDate,homeTeam,awayTeam,p_home_final,p_draw_final,p_away_final,home_team_norm,away_team_norm
0,2026-01-24T12:30:00Z,West Ham United FC,Sunderland AFC,0.386261,0.289448,0.324291,West Ham United,Sunderland
1,2026-01-24T15:00:00Z,Burnley FC,Tottenham Hotspur FC,0.255774,0.278225,0.466002,Burnley,Tottenham Hotspur
2,2026-01-24T15:00:00Z,Fulham FC,Brighton & Hove Albion FC,0.376212,0.280847,0.342941,Fulham,Brighton & Hove Albion
3,2026-01-24T15:00:00Z,Manchester City FC,Wolverhampton Wanderers FC,0.795969,0.134538,0.069493,Manchester City,Wolverhampton Wanderers
4,2026-01-24T17:30:00Z,AFC Bournemouth,Liverpool FC,0.257024,0.243672,0.499304,AFC Bournemouth,Liverpool


## 8. Run simulations to build the Premier League table probabilities

In [48]:
def simulate_once(fixtures, table):
    table_sim = table.copy()

    # Use normalized team name column
    points = dict(zip(table_sim["team_norm"], table_sim["pts"]))

    for _, row in fixtures.iterrows():
        home = row["home_team_norm"]
        away = row["away_team_norm"]

        # choose outcome
        probs = [row["p_home_final"], row["p_draw_final"], row["p_away_final"]]
        outcome = np.random.choice(["H", "D", "A"], p=probs)

        if outcome == "H":
            points[home] += 3
        elif outcome == "D":
            points[home] += 1
            points[away] += 1
        else:
            points[away] += 3

    result_df = table_sim.copy()
    result_df["pts"] = result_df["team_norm"].map(points)

    # sort by points, then by current goal difference (goal difference itself is not simulated)
    result_df = result_df.sort_values(["pts", "gd"], ascending=[False, False])
    result_df["position"] = np.arange(1, len(result_df)+1)

    return result_df


def run_simulations(fixtures, table, n_sim=10000):
    position_counts = {team: np.zeros(len(table)) for team in table["team_norm"]}

    for _ in range(n_sim):
        final_table = simulate_once(fixtures, table)

        for _, row in final_table.iterrows():
            position_counts[row["team_norm"]][row["position"]-1] += 1

    pos_df = pd.DataFrame(position_counts, index=np.arange(1, len(table)+1))
    pos_df.index.name = "position"
    return pos_df
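`simulate_once` draws one match at a time with `np.random.choice`, which becomes slow over 20,000 seasons. This is a minimal vectorized sketch that draws every match of every simulated season in one array operation. The function name, the three-team league, the probabilities and the starting points are all hypothetical, and ties on points are not broken by goal difference here.

```python
import numpy as np

def simulate_points(probs, home_idx, away_idx, start_pts, n_sim=10_000, seed=0):
    """Return an (n_sim, n_teams) array of simulated final points."""
    rng = np.random.default_rng(seed)
    cum = np.cumsum(probs, axis=1)                 # cumulative [home, home+draw, 1]
    u = rng.random((n_sim, len(probs)))            # one uniform draw per simulated match

    home_win = u < cum[:, 0]
    draw = (u >= cum[:, 0]) & (u < cum[:, 1])
    away_win = ~home_win & ~draw

    pts = np.tile(np.asarray(start_pts, dtype=float), (n_sim, 1))
    for m in range(len(probs)):                    # loop over matches, not simulations
        pts[:, home_idx[m]] += 3 * home_win[:, m] + draw[:, m]
        pts[:, away_idx[m]] += 3 * away_win[:, m] + draw[:, m]
    return pts

# Hypothetical three-team league with three remaining fixtures
probs = np.array([[0.5, 0.25, 0.25], [0.4, 0.3, 0.3], [0.2, 0.3, 0.5]])
home_idx, away_idx = [0, 1, 2], [1, 2, 0]
start_pts = [10, 8, 5]

pts = simulate_points(probs, home_idx, away_idx, start_pts)
title_share = (pts.argmax(axis=1) == 0).mean()     # argmax takes the first team on ties
print(round(float(title_share), 3))
```

Passing a seed makes the run reproducible, which helps when comparing position distributions between notebook updates.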

In [49]:
# RUN
position_distribution = run_simulations(df_simulation, premierleague, n_sim=20000)

In [50]:
position_distribution_t = position_distribution.T

In [51]:
position_distribution_pct = position_distribution_t.div(
    position_distribution_t.sum(axis=1),
    axis=0
) * 100


## 9. Preview and present the results graphically

In [52]:
# Build row labels of the form "position team"; single-digit positions get extra &nbsp; padding so names line up
team_labels = (
    premierleague[["team", "position"]]
    .assign(
        label=lambda df: df.apply(
            lambda r: f"{r['position']}{'&nbsp;&nbsp;&nbsp;&nbsp;' if r['position'] < 10 else '&nbsp;&nbsp;'}{r['team']}",
            axis=1
        )
    )
    .set_index("team")["label"]
)


# Apply labels to your table index
position_distribution_pct.index = position_distribution_pct.index.map(team_labels)

# Drop position column if present
position_distribution_pct = position_distribution_pct.drop(columns=["position"], errors="ignore")

# Remove index name
position_distribution_pct.index.name = None
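The styling cell that follows uses `green_cmap`, `vmax` and `zero_style`, none of which are defined in the cells shown here. This sketch shows one plausible set of definitions. All three are assumptions about what the missing cells contained; the colours and the cap are illustrative choices.

```python
from matplotlib.colors import LinearSegmentedColormap

# Assumed definitions: the cells that originally created these names are not shown.
green_cmap = LinearSegmentedColormap.from_list("green_cmap", ["#ffffff", "#2e7d32"])

vmax = 100  # percentages run from 0 to 100; a lower cap would saturate the colour sooner

def zero_style(value):
    """Return CSS that greys out cells which display as 0.00%."""
    return "color: #bbb;" if round(float(value), 2) == 0 else ""

print(zero_style(0.0), zero_style(12.5), green_cmap(0.5))
```

Greying out zero cells keeps the eye on the positions a team can realistically finish in.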


In [74]:
position_distribution_pct.style\
    .background_gradient(
        cmap=green_cmap,
        vmin=0,
        vmax=vmax
    )\
    .applymap(zero_style)\
    .format("{:.2f}%")\
    .set_table_styles([
        {"selector": "th", "props": [
            ("background-color", "#e6edf4"),
            ("color", "#333"),
            ("text-align", "center"),
            ("font-family", "Inter, Roboto, Arial, sans-serif"),
            ("font-size", "13px"),
            ("font-weight", "600")
        ]},

        # <-- LEFT align only the "position" column header
        {"selector": "th.col_heading:nth-child(1)", "props": [
            ("text-align", "left")
        ]},

        {"selector": "th.row_heading", "props": [
            ("text-align", "left"),
            ("font-size", "13px"),
            ("font-weight", "600"),
            ("white-space", "nowrap"),
            ("max-width", "250px"),
            ("overflow", "hidden"),
            ("text-overflow", "ellipsis")
        ]},

        {"selector": "tr:nth-child(odd) th.row_heading", "props": [
            ("background-color", "#fbfcfe")
        ]},
        {"selector": "tr:nth-child(even) th.row_heading", "props": [
            ("background-color", "#e6edf4")
        ]},

        {"selector": "td", "props": [
            ("text-align", "center"),
            ("font-family", "Inter, Roboto, Arial, sans-serif"),
            ("font-size", "12px"),
            ("font-weight", "500"),
            ("color", "#000")
        ]}
    ])


position,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
1 Arsenal,89.58%,9.51%,0.81%,0.08%,0.01%,0.01%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
2 Manchester City,9.12%,65.23%,18.25%,5.27%,1.51%,0.43%,0.14%,0.02%,0.01%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
3 Aston Villa,1.01%,16.43%,42.09%,22.10%,10.02%,4.41%,2.23%,0.98%,0.40%,0.21%,0.08%,0.03%,0.03%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%
4 Liverpool,0.21%,6.42%,21.79%,30.12%,18.52%,10.13%,5.64%,3.55%,1.75%,0.88%,0.44%,0.33%,0.16%,0.06%,0.01%,0.00%,0.00%,0.00%,0.00%,0.00%
5 Manchester United,0.01%,0.15%,1.36%,4.71%,9.10%,12.81%,13.69%,13.39%,11.83%,10.03%,7.71%,5.90%,4.44%,2.53%,1.48%,0.69%,0.16%,0.00%,0.00%,0.00%
6 Chelsea,0.03%,1.63%,9.98%,18.84%,23.36%,17.06%,10.56%,6.87%,4.76%,3.09%,1.81%,0.98%,0.53%,0.26%,0.17%,0.05%,0.01%,0.00%,0.00%,0.00%
7 Brentford,0.01%,0.24%,2.04%,6.30%,11.09%,14.82%,15.40%,13.49%,10.77%,8.45%,6.45%,4.62%,3.06%,1.82%,1.01%,0.32%,0.11%,0.00%,0.00%,0.00%
8 Newcastle United,0.01%,0.33%,2.47%,7.32%,13.05%,15.47%,15.30%,12.83%,10.34%,8.29%,5.66%,3.64%,2.61%,1.60%,0.74%,0.30%,0.04%,0.00%,0.00%,0.00%
9 Sunderland,0.00%,0.03%,0.40%,1.74%,4.28%,7.15%,9.50%,11.87%,12.16%,12.29%,11.39%,9.88%,7.76%,5.41%,3.65%,1.85%,0.61%,0.02%,0.00%,0.00%
10 Everton,0.00%,0.01%,0.15%,0.87%,2.23%,4.32%,6.44%,8.70%,10.37%,11.65%,12.35%,11.76%,10.45%,9.06%,6.49%,3.89%,1.23%,0.05%,0.00%,0.00%
