## 📈 Predicting Premier League Final Positions Using Betting Odds & Simulation

**Competition:** English Premier League 2025/26  
**Purpose:** Estimate probabilities of final league positions using betting market information and simulation  
**Methods:** Odds-implied probabilities, Monte Carlo simulation, scenario analysis  
**Author:** [Victoria Friss de Kereki](https://www.linkedin.com/in/victoria-friss-de-kereki/)  

---

**Notebook first written:** `17/01/2026`  
**Last updated:** `17/01/2026`  

> This notebook develops a probabilistic framework to predict final Premier League positions, using betting odds as market-based expectations.
>
> Betting odds are transformed into implied probabilities and adjusted for bookmaker margin. These probabilities are then used to simulate the remainder of the season via Monte Carlo methods, generating distributions over final points totals and league positions.
>
> The analysis focuses on estimating the likelihood of key outcomes such as title wins, top-four finishes, relegation, and mid-table placements. Results are presented at team level with uncertainty intervals, and the framework can be extended to incorporate form, fixture difficulty, or alternative predictive inputs beyond betting markets.


In [152]:
import os
from datetime import datetime, timedelta
import numpy as np
import pandas as pd
import requests
from scipy.stats import poisson
from dotenv import load_dotenv
import soccerdata as sd

## 1. Premier League Current Standings (ESPN Scraping)
##### Using the ESPN scraper I built in my previous project.

In [182]:
year = 2025  # current Premier League season start year

url = f"https://www.espn.com/soccer/standings/_/league/ENG.1/season/{year}"
tables = pd.read_html(url)

teams_raw = tables[0]
stats = tables[1]

teams = pd.DataFrame()
teams["position"] = teams_raw.iloc[:, 0].str.extract(r"^(\d+)").astype(int)
teams["team"] = (
    teams_raw.iloc[:, 0]
    .str.replace(r"^\d+", "", regex=True)
    .str.replace(r"^[A-Z]{2,3}", "", regex=True)
    .str.strip()
)

stats.columns = ["gp", "w", "d", "l", "gf", "ga", "gd", "pts"]
stats = stats.apply(lambda c: c.astype(str)
                              .str.replace("+", "", regex=False)
                              .astype(int))

premierleague = pd.concat([teams, stats], axis=1)
# premierleague["season"] = f"{year}-{year+1}"

premierleague


Unnamed: 0,position,team,gp,w,d,l,gf,ga,gd,pts
0,1,Arsenal,22,15,5,2,40,14,26,50
1,2,Manchester City,22,13,4,5,45,21,24,43
2,3,Aston Villa,22,13,4,5,33,25,8,43
3,4,Liverpool,22,10,6,6,33,29,4,36
4,5,Manchester United,22,9,8,5,38,32,6,35
5,6,Chelsea,22,9,7,6,36,24,12,34
6,7,Brentford,22,10,3,9,35,30,5,33
7,8,Newcastle United,22,9,6,7,32,27,5,33
8,9,Sunderland,22,8,9,5,23,23,0,33
9,10,Everton,22,9,5,8,24,25,-1,32
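
The two regex passes in the scraping cell can be checked on a few strings. The sample cells below are an assumption about how ESPN fuses position, club abbreviation and club name into one column; they are not copied from the scraped table. A minimal, dependency-free sketch:

```python
import re

# Hypothetical raw cells: position + abbreviation + club name, fused together.
raw_cells = ["1ARSArsenal", "2MNCManchester City", "10EVEEverton"]

def split_cell(cell: str) -> tuple[int, str]:
    """Split a fused standings cell into (position, team name)."""
    position = int(re.match(r"^(\d+)", cell).group(1))
    name = re.sub(r"^\d+", "", cell)         # drop the leading position
    name = re.sub(r"^[A-Z]{2,3}", "", name)  # drop the 2-3 letter abbreviation
    return position, name.strip()

parsed = [split_cell(c) for c in raw_cells]
print(parsed)
```

One caveat of this pattern: `{2,3}` is greedy, so a two-letter abbreviation followed by a capitalised name would lose the first letter of the name.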


## 2. Get betting odds using API

In [234]:
# Load variables from API_KEY.env
load_dotenv("API_KEY.env")

API_KEY = os.getenv("ODDS_DATA_API_KEY")

if API_KEY is None:
    raise ValueError("ODDS_DATA_API_KEY not found. Check API_KEY.env")

print("API key loaded successfully")

API key loaded successfully


In [235]:
url = "https://api.the-odds-api.com/v4/sports/soccer_epl/odds"

params = {
    "apiKey": API_KEY,
    "regions": "uk",
    "markets": "h2h",
    "oddsFormat": "decimal",
    "dateFormat": "iso",
    "days": 365  # get all upcoming matches for the next year
}

response = requests.get(url, params=params)
response.raise_for_status()

odds_data = response.json()
print("Total upcoming matches:", len(odds_data))

Total upcoming matches: 21


In [236]:
def flatten_odds(data):
    rows = []

    for match in data:
        match_id = match["id"]
        home = match["home_team"]
        away = match["away_team"]
        time = match["commence_time"]

        for book in match["bookmakers"]:
            bookmaker = book["title"]

            # Find the head-to-head (h2h) market, i.e. home/draw/away odds.
            # If this bookmaker does not offer it, skip the bookmaker.
            h2h = next((m for m in book["markets"] if m["key"] == "h2h"), None)
            if not h2h:
                continue

            outcomes = {o["name"]: o["price"] for o in h2h["outcomes"]}

            rows.append({
                "match_id": match_id,
                "commence_time": time,
                "home_team": home,
                "away_team": away,
                "bookmaker": bookmaker,
                "home_odds": outcomes.get(home),
                "draw_odds": outcomes.get("Draw"),
                "away_odds": outcomes.get(away),
            })

    return pd.DataFrame(rows)

df = flatten_odds(odds_data)
df.head()

Unnamed: 0,match_id,commence_time,home_team,away_team,bookmaker,home_odds,draw_odds,away_odds
0,342788786c22e570ed2da53a9608113f,2026-01-19T20:00:00Z,Brighton and Hove Albion,Bournemouth,Unibet (UK),1.88,4.0,3.75
1,342788786c22e570ed2da53a9608113f,2026-01-19T20:00:00Z,Brighton and Hove Albion,Bournemouth,Paddy Power,1.83,3.8,3.8
2,342788786c22e570ed2da53a9608113f,2026-01-19T20:00:00Z,Brighton and Hove Albion,Bournemouth,Sky Bet,1.83,3.8,3.75
3,342788786c22e570ed2da53a9608113f,2026-01-19T20:00:00Z,Brighton and Hove Albion,Bournemouth,Smarkets,1.91,4.1,4.0
4,342788786c22e570ed2da53a9608113f,2026-01-19T20:00:00Z,Brighton and Hove Albion,Bournemouth,Betway,1.88,4.0,3.6


In [237]:
betting_odds_avg = (
    df.groupby(["match_id", "home_team", "away_team"])
      .agg({
          "home_odds": "mean",
          "draw_odds": "mean",
          "away_odds": "mean"
      })
      .reset_index()
)

betting_odds_avg.head()

Unnamed: 0,match_id,home_team,away_team,home_odds,draw_odds,away_odds
0,1ca6d3d9cde3e58a39211feb9188530c,Newcastle United,Aston Villa,1.975789,3.618421,3.523684
1,1e811fa7ead0a3e6ef920b15b2bbb95d,Burnley,Tottenham Hotspur,3.786842,3.423684,1.974211
2,342788786c22e570ed2da53a9608113f,Brighton and Hove Albion,Bournemouth,1.843684,3.907895,3.815789
3,36820753efb36739a83c6e5e440827b2,Brighton and Hove Albion,Everton,1.844444,3.158889,3.6
4,38a3cb5e295f55e274d589fc646cf2dd,Tottenham Hotspur,Manchester City,4.635,3.57875,1.6
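
Averaging decimal odds and then inverting is not the same as averaging the implied probabilities, because 1/x is convex. The sketch below uses only the five Brighton v Bournemouth home-win prices shown in the bookmaker-level table (the notebook's average covers more bookmakers), so the numbers are illustrative rather than a reproduction of the table above.

```python
from statistics import mean

# Home-win odds for Brighton v Bournemouth from the five bookmaker rows shown earlier.
home_odds = [1.88, 1.83, 1.83, 1.91, 1.88]

p_from_mean_odds = 1 / mean(home_odds)            # average the odds, then invert
p_mean_of_probs = mean(1 / o for o in home_odds)  # invert each price, then average

print(round(p_from_mean_odds, 4), round(p_mean_of_probs, 4))
```

The gap is small when bookmakers agree closely, which is the case here, so either choice should give similar inputs to the simulation.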


In [238]:
# 1) Convert odds -> raw probabilities
betting_odds_avg["p_home_raw"] = 1 / betting_odds_avg["home_odds"]
betting_odds_avg["p_draw_raw"] = 1 / betting_odds_avg["draw_odds"]
betting_odds_avg["p_away_raw"] = 1 / betting_odds_avg["away_odds"]

# 2) Normalize (remove bookmaker margin)
betting_odds_avg["total_raw"] = (
    betting_odds_avg["p_home_raw"] +
    betting_odds_avg["p_draw_raw"] +
    betting_odds_avg["p_away_raw"]
)

betting_odds_avg["p_home_book"] = betting_odds_avg["p_home_raw"] / betting_odds_avg["total_raw"]
betting_odds_avg["p_draw_book"] = betting_odds_avg["p_draw_raw"] / betting_odds_avg["total_raw"]
betting_odds_avg["p_away_book"] = betting_odds_avg["p_away_raw"] / betting_odds_avg["total_raw"]

Unnamed: 0,home_team,away_team,p_home_book,p_draw_book,p_away_book,home_odds,draw_odds,away_odds
0,Newcastle United,Aston Villa,0.474664,0.259184,0.266152,1.975789,3.618421,3.523684
1,Burnley,Tottenham Hotspur,0.248495,0.274853,0.476652,3.786842,3.423684,1.974211
2,Brighton and Hove Albion,Bournemouth,0.51152,0.241327,0.247152,1.843684,3.907895,3.815789
3,Brighton and Hove Albion,Everton,0.477046,0.278542,0.244412,1.844444,3.158889,3.6
4,Tottenham Hotspur,Manchester City,0.192603,0.249449,0.557948,4.635,3.57875,1.6
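
As a worked check of the margin removal, the first row above can be recomputed by hand. The sketch takes the averaged odds for Newcastle United v Aston Villa from the table and applies the same proportional normalisation; the dictionary names are illustrative.

```python
# Averaged decimal odds for Newcastle United v Aston Villa (first row above).
odds = {"home": 1.975789, "draw": 3.618421, "away": 3.523684}

raw = {k: 1 / v for k, v in odds.items()}          # raw implied probabilities
overround = sum(raw.values())                      # exceeds 1 because of the margin
fair = {k: p / overround for k, p in raw.items()}  # proportional normalisation

print(f"margin: {overround - 1:.2%}")
print({k: round(p, 6) for k, p in fair.items()})
```

Proportional scaling is the simplest of several margin-removal methods; it spreads the margin across the three outcomes in proportion to their raw probabilities.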


In [240]:
# Keep only useful columns.
# .copy() avoids a SettingWithCopyWarning when columns are added to this frame later.
betting_odds_avg = betting_odds_avg[[
    # "match_id",
    "home_team",
    "away_team",
    "p_home_book",
    "p_draw_book",
    "p_away_book"
]].copy()

betting_odds_avg.head()

Unnamed: 0,home_team,away_team,p_home_book,p_draw_book,p_away_book
0,Newcastle United,Aston Villa,0.474664,0.259184,0.266152
1,Burnley,Tottenham Hotspur,0.248495,0.274853,0.476652
2,Brighton and Hove Albion,Bournemouth,0.51152,0.241327,0.247152
3,Brighton and Hove Albion,Everton,0.477046,0.278542,0.244412
4,Tottenham Hotspur,Manchester City,0.192603,0.249449,0.557948


## 3. Get fixtures for upcoming EPL games

In [241]:
# Load variables from API_KEY.env
load_dotenv("API_KEY.env")

API_KEY = os.getenv("FOOTBALL_DATA_API_KEY")

if API_KEY is None:
    raise ValueError("FOOTBALL_DATA_API_KEY not found. Check API_KEY.env")

print("API key loaded successfully")

API key loaded successfully


In [242]:
url = "https://api.football-data.org/v4/competitions/PL/matches"

headers = {
    "X-Auth-Token": API_KEY
}

today = datetime.utcnow().date()
end_of_season = today + timedelta(days=365)  # big range to cover all remaining games

params = {
    "status": "SCHEDULED",
    "dateFrom": today.isoformat(),
    "dateTo": end_of_season.isoformat()
}

response = requests.get(url, headers=headers, params=params)
response.raise_for_status()

data = response.json()
fixtures = data["matches"]

df_fixtures = pd.DataFrame(fixtures)

# .copy() so the team columns can be overwritten below without a SettingWithCopyWarning
df_fixtures_clean = df_fixtures[[
    "utcDate",
    "status",
    "homeTeam",
    "awayTeam"
]].copy()

print("Total scheduled matches:", len(df_fixtures_clean))


Total scheduled matches: 161


In [243]:
df_fixtures_clean["homeTeam"] = df_fixtures_clean["homeTeam"].apply(lambda x: x["name"])
df_fixtures_clean["awayTeam"] = df_fixtures_clean["awayTeam"].apply(lambda x: x["name"])

In [244]:
df_fixtures_clean

Unnamed: 0,utcDate,status,homeTeam,awayTeam
0,2026-01-19T20:00:00Z,TIMED,Brighton & Hove Albion FC,AFC Bournemouth
1,2026-01-24T12:30:00Z,TIMED,West Ham United FC,Sunderland AFC
2,2026-01-24T15:00:00Z,TIMED,Burnley FC,Tottenham Hotspur FC
3,2026-01-24T15:00:00Z,TIMED,Fulham FC,Brighton & Hove Albion FC
4,2026-01-24T15:00:00Z,TIMED,Manchester City FC,Wolverhampton Wanderers FC
...,...,...,...,...
156,2026-05-24T15:00:00Z,TIMED,Liverpool FC,Brentford FC
157,2026-05-24T15:00:00Z,TIMED,Manchester City FC,Aston Villa FC
158,2026-05-24T15:00:00Z,TIMED,Nottingham Forest FC,AFC Bournemouth
159,2026-05-24T15:00:00Z,TIMED,Tottenham Hotspur FC,Everton FC


## 4. Get this season (2025/26) and last season (2024/25) results

In [245]:
url = "https://api.football-data.org/v4/competitions/PL/matches"
params = {
    "season": 2025,   # season year
    "status": "FINISHED"
}

response = requests.get(url, headers=headers, params=params)
response.raise_for_status()
past_matches = response.json()["matches"]

In [247]:
clean_rows = []

for m in past_matches:
    row = {
        "utcDate": m["utcDate"],
        "matchday": m["matchday"],
        "status": m["status"],
        "homeTeam": m["homeTeam"]["name"],
        "awayTeam": m["awayTeam"]["name"],
        "homeGoals": m["score"]["fullTime"]["home"],
        "awayGoals": m["score"]["fullTime"]["away"],
        "winner": m["score"]["winner"]
    }
    clean_rows.append(row)

past_matches_25_clean = pd.DataFrame(clean_rows)
past_matches_25_clean.tail()

Unnamed: 0,utcDate,matchday,status,homeTeam,awayTeam,homeGoals,awayGoals,winner
214,2026-01-17T15:00:00Z,22,FINISHED,Leeds United FC,Fulham FC,1,0,HOME_TEAM
215,2026-01-17T15:00:00Z,22,FINISHED,Tottenham Hotspur FC,West Ham United FC,1,2,AWAY_TEAM
216,2026-01-17T17:30:00Z,22,FINISHED,Nottingham Forest FC,Arsenal FC,0,0,DRAW
217,2026-01-18T14:00:00Z,22,FINISHED,Wolverhampton Wanderers FC,Newcastle United FC,0,0,DRAW
218,2026-01-18T16:30:00Z,22,FINISHED,Aston Villa FC,Everton FC,0,1,AWAY_TEAM


In [248]:
url = "https://api.football-data.org/v4/competitions/PL/matches"
params = {
    "season": 2024,   # season year
    "status": "FINISHED"
}

response = requests.get(url, headers=headers, params=params)
response.raise_for_status()
past_matches_24 = response.json()["matches"]

In [249]:
clean_rows = []

for m in past_matches_24:
    row = {
        "utcDate": m["utcDate"],
        "matchday": m["matchday"],
        "status": m["status"],
        "homeTeam": m["homeTeam"]["name"],
        "awayTeam": m["awayTeam"]["name"],
        "homeGoals": m["score"]["fullTime"]["home"],
        "awayGoals": m["score"]["fullTime"]["away"],
        "winner": m["score"]["winner"]
    }
    clean_rows.append(row)

past_matches_24_clean = pd.DataFrame(clean_rows)
past_matches_24_clean.head()

Unnamed: 0,utcDate,matchday,status,homeTeam,awayTeam,homeGoals,awayGoals,winner
0,2024-08-16T19:00:00Z,1,FINISHED,Manchester United FC,Fulham FC,1,0,HOME_TEAM
1,2024-08-17T11:30:00Z,1,FINISHED,Ipswich Town FC,Liverpool FC,0,2,AWAY_TEAM
2,2024-08-17T14:00:00Z,1,FINISHED,Arsenal FC,Wolverhampton Wanderers FC,2,0,HOME_TEAM
3,2024-08-17T14:00:00Z,1,FINISHED,Everton FC,Brighton & Hove Albion FC,0,3,AWAY_TEAM
4,2024-08-17T14:00:00Z,1,FINISHED,Newcastle United FC,Southampton FC,1,0,HOME_TEAM


## 5. Combine and calculate probabilities of W/D/L for each match

In [250]:
# Load Dataframes
df_current = past_matches_25_clean
df_prev = past_matches_24_clean
df_future = df_fixtures_clean

# Combine all past fixtures together
df_all = pd.concat([df_prev, df_current], ignore_index=True)

In [251]:
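# Add weights: more recent games = more weight.
# The ramp follows row order, so it assumes df_all is already in chronological order.
df_all["date"] = pd.to_datetime(df_all["utcDate"])
df_all["weight"] = np.linspace(1, 2, len(df_all))  # simple linear weighting: first row = 1, last row = 2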

In [252]:
df_all.tail()

Unnamed: 0,utcDate,matchday,status,homeTeam,awayTeam,homeGoals,awayGoals,winner,date,weight
594,2026-01-17T15:00:00Z,22,FINISHED,Leeds United FC,Fulham FC,1,0,HOME_TEAM,2026-01-17 15:00:00+00:00,1.993311
595,2026-01-17T15:00:00Z,22,FINISHED,Tottenham Hotspur FC,West Ham United FC,1,2,AWAY_TEAM,2026-01-17 15:00:00+00:00,1.994983
596,2026-01-17T17:30:00Z,22,FINISHED,Nottingham Forest FC,Arsenal FC,0,0,DRAW,2026-01-17 17:30:00+00:00,1.996656
597,2026-01-18T14:00:00Z,22,FINISHED,Wolverhampton Wanderers FC,Newcastle United FC,0,0,DRAW,2026-01-18 14:00:00+00:00,1.998328
598,2026-01-18T16:30:00Z,22,FINISHED,Aston Villa FC,Everton FC,0,1,AWAY_TEAM,2026-01-18 16:30:00+00:00,2.0
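
The weights in the table follow directly from the linear ramp: row `i` of `n` rows gets `1 + i / (n - 1)`. A small sketch that reproduces two of the values above without NumPy (the helper name `linear_weight` is illustrative):

```python
def linear_weight(i: int, n: int, lo: float = 1.0, hi: float = 2.0) -> float:
    """Weight of row i (0-based) out of n rows, matching np.linspace(lo, hi, n)."""
    return lo + (hi - lo) * i / (n - 1)

n_matches = 599  # rows 0..598 in the combined table above
print(round(linear_weight(594, n_matches), 6))
print(linear_weight(598, n_matches))
```

Under this scheme the most recent match counts exactly twice as much as the first match of 2024/25.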


In [253]:
# Compute home advantage
# Home advantage = average home goals - average away goals
home_avg = df_all["homeGoals"].mean()
away_avg = df_all["awayGoals"].mean()
home_advantage = home_avg - away_avg
home_advantage

0.1936560934891487

In [254]:
#  Calculate attack & defense strengths
teams = pd.unique(df_all[["homeTeam", "awayTeam"]].values.ravel("K"))

attack = pd.Series(1.0, index=teams)
defense = pd.Series(1.0, index=teams)

# Initialize with goals per match
team_stats = {}

for team in teams:
    home_games = df_all[df_all["homeTeam"] == team]
    away_games = df_all[df_all["awayTeam"] == team]

    goals_scored = (home_games["homeGoals"] * home_games["weight"]).sum() + \
                   (away_games["awayGoals"] * away_games["weight"]).sum()

    goals_against = (home_games["awayGoals"] * home_games["weight"]).sum() + \
                    (away_games["homeGoals"] * away_games["weight"]).sum()

    matches = home_games["weight"].sum() + away_games["weight"].sum()

    team_stats[team] = {
        "scored": goals_scored / matches,
        "against": goals_against / matches
    }

# Strengths = relative to league average
league_avg_scored = df_all["homeGoals"].mean() + df_all["awayGoals"].mean()
league_avg_scored /= 2

for team in teams:
    attack[team] = team_stats[team]["scored"] / league_avg_scored
    defense[team] = team_stats[team]["against"] / league_avg_scored

In [255]:
# Predict probabilities for each future match

def match_probabilities(home, away):
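    # Expected goals. home_advantage sits inside the exponent, so it scales the
    # home expectation by exp(home_advantage) rather than adding goals directly.
    exp_home = np.exp(np.log(league_avg_scored) + np.log(attack[home]) + np.log(defense[away]) + home_advantage)
    exp_away = np.exp(np.log(league_avg_scored) + np.log(attack[away]) + np.log(defense[home]))

    # Scoreline probabilities for 0-6 goals per side. The tail above 6 goals is
    # dropped, so the three outcome probabilities sum to slightly less than 1.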
    max_goals = 6
    p_home = poisson.pmf(range(max_goals + 1), exp_home)
    p_away = poisson.pmf(range(max_goals + 1), exp_away)

    # result probabilities
    p_win = 0
    p_draw = 0
    p_loss = 0

    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            prob = p_home[i] * p_away[j]
            if i > j:
                p_win += prob
            elif i == j:
                p_draw += prob
            else:
                p_loss += prob

    return p_win, p_draw, p_loss

In [256]:
# Apply to all fixtures

results = []

for _, row in df_future.iterrows():
    home = row["homeTeam"]
    away = row["awayTeam"]

    p_win, p_draw, p_loss = match_probabilities(home, away)

    results.append({
        "utcDate": row["utcDate"],
        "homeTeam": home,
        "awayTeam": away,
        "p_home_win": p_win,
        "p_draw": p_draw,
        "p_away_win": p_loss,
    })

df_odds = pd.DataFrame(results)
df_odds.head()


Unnamed: 0,utcDate,homeTeam,awayTeam,p_home_win,p_draw,p_away_win
0,2026-01-19T20:00:00Z,Brighton & Hove Albion FC,AFC Bournemouth,0.48977,0.209468,0.293454
1,2026-01-24T12:30:00Z,West Ham United FC,Sunderland AFC,0.300618,0.279891,0.419021
2,2026-01-24T15:00:00Z,Burnley FC,Tottenham Hotspur FC,0.264408,0.215611,0.514667
3,2026-01-24T15:00:00Z,Fulham FC,Brighton & Hove Albion FC,0.405525,0.227518,0.363391
4,2026-01-24T15:00:00Z,Manchester City FC,Wolverhampton Wanderers FC,0.775258,0.12204,0.07175
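
The three probabilities in each row above do not quite sum to 1 (row 0 sums to about 0.9927) because the scoreline grid stops at 6 goals per side. The sketch below reproduces the grid calculation with the standard library only and shows how much mass the grid covers; the expected-goals values are hypothetical and chosen purely for illustration.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def outcome_probs(lam_home: float, lam_away: float, max_goals: int = 6):
    """Home win / draw / away win probabilities from independent Poisson goals."""
    p_win = p_draw = p_loss = 0.0
    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            p = poisson_pmf(i, lam_home) * poisson_pmf(j, lam_away)
            if i > j:
                p_win += p
            elif i == j:
                p_draw += p
            else:
                p_loss += p
    return p_win, p_draw, p_loss

# Hypothetical expected goals, not taken from the fitted strengths.
probs = outcome_probs(1.6, 1.2)
covered = sum(probs)                         # mass inside the 0..6 x 0..6 grid
renormalised = [p / covered for p in probs]  # rescale so the outcomes sum to 1
print(round(covered, 4), [round(p, 4) for p in renormalised])
```

The missing mass grows with the expected goals, which is why high-scoring fixtures lose more probability to truncation than low-scoring ones.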


## 6. Compare calculated probabilities to bookmaker ones

In [257]:
unique_bet_home = betting_odds_avg["home_team"].unique()
unique_model_home = df_odds["homeTeam"].unique()

In [258]:
print(unique_bet_home)
print(unique_model_home)

['Newcastle United' 'Burnley' 'Brighton and Hove Albion'
 'Tottenham Hotspur' 'Crystal Palace' 'Sunderland' 'Arsenal' 'Bournemouth'
 'Brentford' 'Liverpool' 'Aston Villa' 'West Ham United' 'Chelsea'
 'Manchester City' 'Wolverhampton Wanderers' 'Nottingham Forest' 'Fulham'
 'Manchester United' 'Leeds United' 'Everton']
['Brighton & Hove Albion FC' 'West Ham United FC' 'Burnley FC' 'Fulham FC'
 'Manchester City FC' 'AFC Bournemouth' 'Crystal Palace FC' 'Brentford FC'
 'Newcastle United FC' 'Arsenal FC' 'Everton FC' 'Leeds United FC'
 'Wolverhampton Wanderers FC' 'Chelsea FC' 'Liverpool FC' 'Aston Villa FC'
 'Manchester United FC' 'Nottingham Forest FC' 'Tottenham Hotspur FC'
 'Sunderland AFC']


In [259]:
def normalize_team(name):
    name = name.lower()
    name = name.replace(" fc", "")
    name = name.replace(" afc", "")
    name = name.replace("&", "and")
    name = name.replace("afc ", "")   # strip a leading "afc " (e.g. AFC Bournemouth)
    name = name.strip()
    return name


In [260]:
df_odds["home_norm"] = df_odds["homeTeam"].apply(normalize_team)
df_odds["away_norm"] = df_odds["awayTeam"].apply(normalize_team)

betting_odds_avg["home_norm"] = betting_odds_avg["home_team"].apply(normalize_team)
betting_odds_avg["away_norm"] = betting_odds_avg["away_team"].apply(normalize_team)


In [262]:
unique_model_norm = df_odds["home_norm"].unique()
unique_bet_norm = betting_odds_avg["home_norm"].unique()

set(unique_model_norm) == set(unique_bet_norm)

True
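
The chained `replace` calls work for the current 20 clubs, but they match `" fc"` and `"afc "` anywhere in the string. An alternative sketch, assuming the suffixes only ever appear at the start or end of a name, anchors the patterns with regular expressions. `normalize_team_anchored` is a hypothetical name, not the function used in this notebook.

```python
import re

def normalize_team_anchored(name: str) -> str:
    """Lower-case a club name and strip FC/AFC only at the start or end."""
    name = name.lower().replace("&", "and")
    name = re.sub(r"^afc\s+", "", name)   # leading "afc " (AFC Bournemouth)
    name = re.sub(r"\s+a?fc$", "", name)  # trailing " fc" or " afc"
    return name.strip()

samples = [
    "Brighton & Hove Albion FC",
    "AFC Bournemouth",
    "Sunderland AFC",
    "Brighton and Hove Albion",
]
print([normalize_team_anchored(s) for s in samples])
```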

In [264]:
df_compare = df_odds.merge(
    betting_odds_avg,
    on=["home_norm", "away_norm"],
    how="inner"
)

print("Matched rows:", len(df_compare))
df_compare.head()

Matched rows: 21


Unnamed: 0,utcDate,homeTeam,awayTeam,p_home_win,p_draw,p_away_win,home_norm,away_norm,home_team,away_team,p_home_book,p_draw_book,p_away_book
0,2026-01-19T20:00:00Z,Brighton & Hove Albion FC,AFC Bournemouth,0.48977,0.209468,0.293454,brighton and hove albion,bournemouth,Brighton and Hove Albion,Bournemouth,0.51152,0.241327,0.247152
1,2026-01-24T12:30:00Z,West Ham United FC,Sunderland AFC,0.300618,0.279891,0.419021,west ham united,sunderland,West Ham United,Sunderland,0.392545,0.287363,0.320091
2,2026-01-24T15:00:00Z,Burnley FC,Tottenham Hotspur FC,0.264408,0.215611,0.514667,burnley,tottenham hotspur,Burnley,Tottenham Hotspur,0.248495,0.274853,0.476652
3,2026-01-24T15:00:00Z,Fulham FC,Brighton & Hove Albion FC,0.405525,0.227518,0.363391,fulham,brighton and hove albion,Fulham,Brighton and Hove Albion,0.372327,0.280594,0.347079
4,2026-01-24T15:00:00Z,Manchester City FC,Wolverhampton Wanderers FC,0.775258,0.12204,0.07175,manchester city,wolverhampton wanderers,Manchester City,Wolverhampton Wanderers,0.801188,0.131999,0.066813


In [265]:
df_compare["diff_home"] = df_compare["p_home_win"] - df_compare["p_home_book"]
df_compare["diff_draw"] = df_compare["p_draw"] - df_compare["p_draw_book"]
df_compare["diff_away"] = df_compare["p_away_win"] - df_compare["p_away_book"]

df_compare[["homeTeam", "awayTeam", "diff_home", "diff_draw", "diff_away"]].head()

Unnamed: 0,homeTeam,awayTeam,diff_home,diff_draw,diff_away
0,Brighton & Hove Albion FC,AFC Bournemouth,-0.021751,-0.03186,0.046302
1,West Ham United FC,Sunderland AFC,-0.091928,-0.007472,0.098929
2,Burnley FC,Tottenham Hotspur FC,0.015913,-0.059242,0.038015
3,Fulham FC,Brighton & Hove Albion FC,0.033198,-0.053076,0.016312
4,Manchester City FC,Wolverhampton Wanderers FC,-0.025931,-0.009958,0.004937


In [266]:
rmse_home = np.sqrt(np.mean((df_compare["p_home_win"] - df_compare["p_home_book"])**2))
rmse_draw = np.sqrt(np.mean((df_compare["p_draw"] - df_compare["p_draw_book"])**2))
rmse_away = np.sqrt(np.mean((df_compare["p_away_win"] - df_compare["p_away_book"])**2))

rmse_home, rmse_draw, rmse_away


(0.05100487260647347, 0.04102906855472363, 0.04520684042346975)

In [267]:
rmse_total = np.sqrt(np.mean((
    df_compare["p_home_win"] - df_compare["p_home_book"]
)**2 + (
    df_compare["p_draw"] - df_compare["p_draw_book"]
)**2 + (
    df_compare["p_away_win"] - df_compare["p_away_book"]
)**2 ))

rmse_total


0.07955212075830448
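
The combined figure is consistent with the three per-outcome values: because the mean of a sum equals the sum of the means, the total RMSE is the square root of the summed squared per-outcome RMSEs. A quick check using the numbers printed above:

```python
import math

# Per-outcome RMSEs copied from the output above.
rmse_home, rmse_draw, rmse_away = 0.05100487260647347, 0.04102906855472363, 0.04520684042346975

# Root of the summed per-outcome mean squared errors.
rmse_total_check = math.sqrt(rmse_home**2 + rmse_draw**2 + rmse_away**2)
print(rmse_total_check)
```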

In [268]:
df_compare["abs_diff"] = (
    abs(df_compare["diff_home"]) +
    abs(df_compare["diff_draw"]) +
    abs(df_compare["diff_away"])
)

df_compare.sort_values("abs_diff", ascending=False).head(10)[
    ["homeTeam", "awayTeam", "diff_home", "diff_draw", "diff_away"]
]


Unnamed: 0,homeTeam,awayTeam,diff_home,diff_draw,diff_away
9,Arsenal FC,Manchester United FC,0.098318,-0.055831,-0.050009
1,West Ham United FC,Sunderland AFC,-0.091928,-0.007472,0.098929
17,Manchester United FC,Fulham FC,-0.071928,-0.026557,0.095943
10,Everton FC,Leeds United FC,0.094614,-0.047619,-0.048367
7,Brentford FC,Nottingham Forest FC,0.083309,-0.062134,-0.025298
12,Leeds United FC,Arsenal FC,-0.018449,-0.05788,0.070279
20,Sunderland AFC,Burnley FC,0.069955,-0.045358,-0.026288
2,Burnley FC,Tottenham Hotspur FC,0.015913,-0.059242,0.038015
13,Wolverhampton Wanderers FC,AFC Bournemouth,-0.008271,-0.0478,0.051499
14,Chelsea FC,West Ham United FC,0.047286,-0.041664,-0.018358


## 7. Replace my estimated probabilities with the odds-implied ones where available

In [282]:
df_odds.head(2)

Unnamed: 0,utcDate,homeTeam,awayTeam,p_home_win,p_draw,p_away_win,home_norm,away_norm
0,2026-01-19T20:00:00Z,Brighton & Hove Albion FC,AFC Bournemouth,0.48977,0.209468,0.293454,brighton and hove albion,bournemouth
1,2026-01-24T12:30:00Z,West Ham United FC,Sunderland AFC,0.300618,0.279891,0.419021,west ham united,sunderland


In [283]:
betting_odds_avg.head(2)

Unnamed: 0,home_team,away_team,p_home_book,p_draw_book,p_away_book,home_norm,away_norm
0,Newcastle United,Aston Villa,0.474664,0.259184,0.266152,newcastle united,aston villa
1,Burnley,Tottenham Hotspur,0.248495,0.274853,0.476652,burnley,tottenham hotspur


In [284]:
df_final_probabilities = df_odds.merge(
    betting_odds_avg,
    on=["home_norm", "away_norm"],
    how="left"
)

In [285]:
df_final_probabilities = df_final_probabilities[[
    "utcDate",
    "homeTeam",
    "awayTeam",
    "p_home_win",
    "p_draw",
    "p_away_win",
    "p_home_book",
    "p_draw_book",
    "p_away_book",
]].copy()  # independent copy so the *_final columns can be added safely

df_final_probabilities

Unnamed: 0,utcDate,homeTeam,awayTeam,p_home_win,p_draw,p_away_win,p_home_book,p_draw_book,p_away_book
0,2026-01-19T20:00:00Z,Brighton & Hove Albion FC,AFC Bournemouth,0.489770,0.209468,0.293454,0.511520,0.241327,0.247152
1,2026-01-24T12:30:00Z,West Ham United FC,Sunderland AFC,0.300618,0.279891,0.419021,0.392545,0.287363,0.320091
2,2026-01-24T15:00:00Z,Burnley FC,Tottenham Hotspur FC,0.264408,0.215611,0.514667,0.248495,0.274853,0.476652
3,2026-01-24T15:00:00Z,Fulham FC,Brighton & Hove Albion FC,0.405525,0.227518,0.363391,0.372327,0.280594,0.347079
4,2026-01-24T15:00:00Z,Manchester City FC,Wolverhampton Wanderers FC,0.775258,0.122040,0.071750,0.801188,0.131999,0.066813
...,...,...,...,...,...,...,...,...,...
156,2026-05-24T15:00:00Z,Liverpool FC,Brentford FC,0.564805,0.197370,0.228051,,,
157,2026-05-24T15:00:00Z,Manchester City FC,Aston Villa FC,0.582138,0.210767,0.201876,,,
158,2026-05-24T15:00:00Z,Nottingham Forest FC,AFC Bournemouth,0.412269,0.236427,0.348870,,,
159,2026-05-24T15:00:00Z,Tottenham Hotspur FC,Everton FC,0.420289,0.258114,0.320570,,,


In [286]:
df_final_probabilities["p_home_final"] = np.where(
    df_final_probabilities["p_home_book"].notna(),
    df_final_probabilities["p_home_book"],
    df_final_probabilities["p_home_win"]
)

df_final_probabilities["p_draw_final"] = np.where(
    df_final_probabilities["p_draw_book"].notna(),
    df_final_probabilities["p_draw_book"],
    df_final_probabilities["p_draw"]
)

df_final_probabilities["p_away_final"] = np.where(
    df_final_probabilities["p_away_book"].notna(),
    df_final_probabilities["p_away_book"],
    df_final_probabilities["p_away_win"]
)

In [287]:
print("Used betting odds:", df_final_probabilities["p_home_book"].notna().sum())
print("Used model:", df_final_probabilities["p_home_book"].isna().sum())


Used betting odds: 21
Used model: 140


In [288]:
df_final_probabilities = df_final_probabilities[[
    "utcDate",
    "homeTeam",
    "awayTeam",
    "p_home_final",
    "p_draw_final",
    "p_away_final"
]].copy()  # independent copy so the normalized name columns can be added safely

In [289]:
df_final_probabilities

Unnamed: 0,utcDate,homeTeam,awayTeam,p_home_final,p_draw_final,p_away_final
0,2026-01-19T20:00:00Z,Brighton & Hove Albion FC,AFC Bournemouth,0.511520,0.241327,0.247152
1,2026-01-24T12:30:00Z,West Ham United FC,Sunderland AFC,0.392545,0.287363,0.320091
2,2026-01-24T15:00:00Z,Burnley FC,Tottenham Hotspur FC,0.248495,0.274853,0.476652
3,2026-01-24T15:00:00Z,Fulham FC,Brighton & Hove Albion FC,0.372327,0.280594,0.347079
4,2026-01-24T15:00:00Z,Manchester City FC,Wolverhampton Wanderers FC,0.801188,0.131999,0.066813
...,...,...,...,...,...,...
156,2026-05-24T15:00:00Z,Liverpool FC,Brentford FC,0.564805,0.197370,0.228051
157,2026-05-24T15:00:00Z,Manchester City FC,Aston Villa FC,0.582138,0.210767,0.201876
158,2026-05-24T15:00:00Z,Nottingham Forest FC,AFC Bournemouth,0.412269,0.236427,0.348870
159,2026-05-24T15:00:00Z,Tottenham Hotspur FC,Everton FC,0.420289,0.258114,0.320570


In [290]:
df_final_probabilities["homeTeam"].unique()

array(['Brighton & Hove Albion FC', 'West Ham United FC', 'Burnley FC',
       'Fulham FC', 'Manchester City FC', 'AFC Bournemouth',
       'Crystal Palace FC', 'Brentford FC', 'Newcastle United FC',
       'Arsenal FC', 'Everton FC', 'Leeds United FC',
       'Wolverhampton Wanderers FC', 'Chelsea FC', 'Liverpool FC',
       'Aston Villa FC', 'Manchester United FC', 'Nottingham Forest FC',
       'Tottenham Hotspur FC', 'Sunderland AFC'], dtype=object)

In [291]:
name_map = {
    "Aston Villa FC": "Aston Villa",
    "Brighton & Hove Albion FC": "Brighton & Hove Albion",
    "AFC Bournemouth": "AFC Bournemouth",   # keep as is
    "Bournemouth": "AFC Bournemouth",
    "Sunderland AFC": "Sunderland",
    "Newcastle United FC": "Newcastle United",
    "Manchester City FC": "Manchester City",
    "Manchester United FC": "Manchester United",
    "West Ham United FC": "West Ham United",
    "Wolverhampton Wanderers FC": "Wolverhampton Wanderers",
    "Tottenham Hotspur FC": "Tottenham Hotspur",
    "Crystal Palace FC": "Crystal Palace",
    "Brentford FC": "Brentford",
    "Everton FC": "Everton",
    "Leeds United FC": "Leeds United",
    "Chelsea FC": "Chelsea",
    "Liverpool FC": "Liverpool",
    "Nottingham Forest FC": "Nottingham Forest",
    "Burnley FC": "Burnley",
    "Fulham FC": "Fulham",
    "Arsenal FC": "Arsenal"
}

df_final_probabilities["home_team_norm"] = df_final_probabilities["homeTeam"].replace(name_map)
df_final_probabilities["away_team_norm"] = df_final_probabilities["awayTeam"].replace(name_map)

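# The ESPN names already match the targets in name_map, so this mapping is an
# identity placeholder kept for any future mismatches.
premierleague["team_norm"] = premierleague["team"].replace({
    "Brighton & Hove Albion": "Brighton & Hove Albion",
    "AFC Bournemouth": "AFC Bournemouth"
})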


In [292]:
set(df_final_probabilities["home_team_norm"].unique()) - set(premierleague["team_norm"].unique())


set()

## 8. Simulate the rest of the season (Monte Carlo)

In [293]:
df_simulation = df_final_probabilities.copy()

In [297]:
# Normalize probabilities so they sum to 1
prob_cols = ["p_home_final", "p_draw_final", "p_away_final"]
df_simulation[prob_cols] = df_simulation[prob_cols].div(df_simulation[prob_cols].sum(axis=1), axis=0)

In [298]:
df_simulation.head()

Unnamed: 0,utcDate,homeTeam,awayTeam,p_home_final,p_draw_final,p_away_final,home_team_norm,away_team_norm
0,2026-01-19T20:00:00Z,Brighton & Hove Albion FC,AFC Bournemouth,0.51152,0.241327,0.247152,Brighton & Hove Albion,AFC Bournemouth
1,2026-01-24T12:30:00Z,West Ham United FC,Sunderland AFC,0.392545,0.287363,0.320091,West Ham United,Sunderland
2,2026-01-24T15:00:00Z,Burnley FC,Tottenham Hotspur FC,0.248495,0.274853,0.476652,Burnley,Tottenham Hotspur
3,2026-01-24T15:00:00Z,Fulham FC,Brighton & Hove Albion FC,0.372327,0.280594,0.347079,Fulham,Brighton & Hove Albion
4,2026-01-24T15:00:00Z,Manchester City FC,Wolverhampton Wanderers FC,0.801188,0.131999,0.066813,Manchester City,Wolverhampton Wanderers


In [299]:
def simulate_once(fixtures, table):
    table_sim = table.copy()

    # Use normalized team name column
    points = dict(zip(table_sim["team_norm"], table_sim["pts"]))

    for _, row in fixtures.iterrows():
        home = row["home_team_norm"]
        away = row["away_team_norm"]

        # choose outcome
        probs = [row["p_home_final"], row["p_draw_final"], row["p_away_final"]]
        outcome = np.random.choice(["H", "D", "A"], p=probs)

        if outcome == "H":
            points[home] += 3
        elif outcome == "D":
            points[home] += 1
            points[away] += 1
        else:
            points[away] += 3

    result_df = table_sim.copy()
    result_df["pts"] = result_df["team_norm"].map(points)

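    # Sort by simulated points, then by goal difference. Goals are not simulated,
    # so the tiebreaker is each team's current goal difference.
    result_df = result_df.sort_values(["pts", "gd"], ascending=[False, False])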
    result_df["position"] = np.arange(1, len(result_df)+1)

    return result_df


def run_simulations(fixtures, table, n_sim=10000):
    position_counts = {team: np.zeros(len(table)) for team in table["team_norm"]}

    for _ in range(n_sim):
        final_table = simulate_once(fixtures, table)

        for _, row in final_table.iterrows():
            position_counts[row["team_norm"]][row["position"]-1] += 1

    pos_df = pd.DataFrame(position_counts, index=np.arange(1, len(table)+1))
    pos_df.index.name = "position"
    return pos_df
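
The simulation draws outcomes from an unseeded generator, so repeated runs give slightly different tallies. The sketch below is a toy, standard-library version of the same loop with a fixed seed; the three-team league, its points and its fixture probabilities are hypothetical, and ties are broken by input order rather than goal difference.

```python
import random
from collections import Counter

def simulate_positions(points, fixtures, n_sim=2000, seed=42):
    """Tally finishing positions over n_sim simulated seasons (toy version)."""
    rng = random.Random(seed)  # seeded so repeated runs give the same tallies
    tallies = {team: Counter() for team in points}
    for _ in range(n_sim):
        pts = dict(points)
        for home, away, p_home, p_draw, p_away in fixtures:
            outcome = rng.choices("HDA", weights=[p_home, p_draw, p_away])[0]
            if outcome == "H":
                pts[home] += 3
            elif outcome == "D":
                pts[home] += 1
                pts[away] += 1
            else:
                pts[away] += 3
        ranking = sorted(pts, key=pts.get, reverse=True)  # ties keep input order
        for pos, team in enumerate(ranking, start=1):
            tallies[team][pos] += 1
    return tallies

# Hypothetical league table and remaining fixtures: (home, away, pH, pD, pA).
toy_points = {"A": 50, "B": 48, "C": 45}
toy_fixtures = [
    ("A", "B", 0.45, 0.27, 0.28),
    ("B", "C", 0.40, 0.30, 0.30),
    ("C", "A", 0.30, 0.30, 0.40),
]

toy_tallies = simulate_positions(toy_points, toy_fixtures)
print({team: dict(c) for team, c in toy_tallies.items()})
```

Every team's tallies add up to `n_sim`, and so do the tallies for every position, which is a useful sanity check on any version of this loop.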

In [None]:
# Run 20,000 simulated seasons
position_distribution = run_simulations(df_simulation, premierleague, n_sim=20000)

In [None]:
position_distribution_t = position_distribution.T

In [None]:
position_distribution_pct = position_distribution_t.div(
    position_distribution_t.sum(axis=1),
    axis=0
) * 100
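
The headline outcomes named in the introduction (title, top four, relegation) can be read off a team's row of the percentage table by summing the relevant positions. A minimal sketch, assuming a 20-position row in percent and the Premier League's bottom-three relegation rule; `outcome_shares` and the example numbers are hypothetical.

```python
def outcome_shares(position_pct, top_n=4, bottom_n=3):
    """Collapse a per-position distribution (in %) into headline probabilities."""
    return {
        "title": position_pct[0],
        "top_n": sum(position_pct[:top_n]),
        "relegation": sum(position_pct[-bottom_n:]),
    }

# Hypothetical 20-position distribution for one team, in percent.
example_pct = [35.0, 30.0, 15.0, 10.0, 5.0, 3.0, 2.0] + [0.0] * 13
shares = outcome_shares(example_pct)
print(shares)
```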


In [None]:
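vmax = 40  # cap the colour scale at 40% so smaller probabilities remain visible

# Note: Styler.applymap was renamed Styler.map in pandas 2.1.
position_distribution_pct.style \
    .background_gradient(
        cmap="Greens",  # built-in matplotlib colormap; green_cmap is never defined in this notebook
        vmin=0,
        vmax=vmax
    ) \
    .applymap(lambda x: "background-color: #ffdddd" if x == 0 else "") \
    .format("{:.2f}")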
