## 📈 Predicting Premier League Final Positions Using Betting Odds & Simulation

**Competition:** English Premier League 2025/26  
**Purpose:** Estimate probabilities of final league positions using betting market information and simulation  
**Methods:** Odds-implied probabilities, Monte Carlo simulation, scenario analysis  
**Author:** [Victoria Friss de Kereki](https://www.linkedin.com/in/victoria-friss-de-kereki/)  

---

**Notebook first written:** `17/01/2026`  
**Last updated:** `17/01/2026`  

> This notebook develops a probabilistic framework to predict final Premier League positions, using betting odds as market-based expectations.
>
> Betting odds are transformed into implied probabilities and adjusted for bookmaker margin. These probabilities are then used to simulate the remainder of the season via Monte Carlo methods, generating distributions over final points totals and league positions.
>
> The analysis focuses on estimating the likelihood of key outcomes such as title wins, top-four finishes, relegation, and mid-table placements. Results are presented at team level with uncertainty intervals, and the framework can be extended to incorporate form, fixture difficulty, or alternative predictive inputs beyond betting markets.
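
A minimal sketch of the simulation step described above, assuming each remaining fixture comes with margin-free home/draw/away probabilities. The function name `simulate_positions`, the toy teams and points, and the random tie-break are illustrative assumptions rather than the notebook's implementation; the real league separates ties on goal difference, which this sketch does not model.

```python
import random

def simulate_positions(points, fixtures, n_sims=10_000, seed=0):
    """Estimate each team's distribution over final positions.

    points:   dict mapping team -> current points
    fixtures: list of (home, away, p_home, p_draw, p_away), probabilities summing to 1
    Returns a dict mapping team -> list of probabilities, index 0 = first place.
    """
    rng = random.Random(seed)
    teams = list(points)
    counts = {team: [0] * len(teams) for team in teams}

    for _ in range(n_sims):
        table = dict(points)
        for home, away, p_home, p_draw, _p_away in fixtures:
            u = rng.random()
            if u < p_home:                 # home win
                table[home] += 3
            elif u < p_home + p_draw:      # draw
                table[home] += 1
                table[away] += 1
            else:                          # away win
                table[away] += 3
        # Rank by points; equal points are ordered at random (simplification)
        ranked = sorted(teams, key=lambda t: (table[t], rng.random()), reverse=True)
        for pos, team in enumerate(ranked):
            counts[team][pos] += 1

    return {team: [c / n_sims for c in counts[team]] for team in teams}

# Toy example with hypothetical teams, points and probabilities
demo = simulate_positions(
    {"Team A": 50, "Team B": 43, "Team C": 43},
    [
        ("Team A", "Team B", 0.50, 0.25, 0.25),
        ("Team B", "Team C", 0.40, 0.30, 0.30),
        ("Team C", "Team A", 0.30, 0.30, 0.40),
    ],
    n_sims=2000,
)
```

Each list in `demo` sums to 1, so `demo["Team A"][0]` can be read as an estimated title probability and the tail entries as relegation-zone probabilities once the full 20-team table is used.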


In [4]:
import soccerdata as sd

## 1. Premier League Current Standings (ESPN Scraping)
##### Using the ESPN scraper I built in my previous project.

In [7]:
import pandas as pd

year = 2025  # current Premier League season start year

url = f"https://www.espn.com/soccer/standings/_/league/ENG.1/season/{year}"
tables = pd.read_html(url)

teams_raw = tables[0]
stats = tables[1]

teams = pd.DataFrame()
teams["position"] = teams_raw.iloc[:, 0].str.extract(r"^(\d+)").astype(int)
teams["team"] = (
    teams_raw.iloc[:, 0]
    .str.replace(r"^\d+", "", regex=True)
    .str.replace(r"^[A-Z]{2,3}", "", regex=True)
    .str.strip()
)

stats.columns = ["gp", "w", "d", "l", "gf", "ga", "gd", "pts"]
stats = stats.apply(lambda c: c.astype(str)
                              .str.replace("+", "", regex=False)
                              .astype(int))

premierleague = pd.concat([teams, stats], axis=1)
# premierleague["season"] = f"{year}-{year+1}"

premierleague


Unnamed: 0,position,team,gp,w,d,l,gf,ga,gd,pts
0,1,Arsenal,22,15,5,2,40,14,26,50
1,2,Manchester City,22,13,4,5,45,21,24,43
2,3,Aston Villa,21,13,4,4,33,24,9,43
3,4,Liverpool,22,10,6,6,33,29,4,36
4,5,Manchester United,22,9,8,5,38,32,6,35
5,6,Chelsea,22,9,7,6,36,24,12,34
6,7,Brentford,22,10,3,9,35,30,5,33
7,8,Sunderland,22,8,9,5,23,23,0,33
8,9,Newcastle United,21,9,5,7,32,27,5,32
9,10,Fulham,22,9,4,9,30,31,-1,31
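
The two regular expressions in the cell above can be tried on single strings. This is a small sketch, assuming the first ESPN column fuses rank, club abbreviation and club name into one cell; the sample strings and the helper name `split_rank_and_team` are hypothetical.

```python
import re

def split_rank_and_team(cell: str) -> tuple[int, str]:
    """Split a fused cell such as '1ARSArsenal' into (1, 'Arsenal')."""
    position = int(re.match(r"^(\d+)", cell).group(1))
    name = re.sub(r"^\d+", "", cell)           # drop the leading rank
    name = re.sub(r"^[A-Z]{2,3}", "", name)    # drop the 2-3 letter abbreviation
    return position, name.strip()

examples = ["1ARSArsenal", "2MNCManchester City"]  # hypothetical raw cells
parsed = [split_rank_and_team(cell) for cell in examples]
```

Because `{2,3}` is greedy, a two-letter abbreviation followed by a capitalised name would lose the first letter of the name, so the pattern is only safe if every abbreviation has three letters.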


## 2. Get betting odds using API

In [24]:
from dotenv import load_dotenv
import os

# Load variables from API_KEY.env
load_dotenv("API_KEY.env")

API_KEY = os.getenv("ODDS_DATA_API_KEY")

if API_KEY is None:
    raise ValueError("ODDS_DATA_API_KEY not found. Check API_KEY.env")

print("API key loaded successfully")

API key loaded successfully


In [26]:
import requests
import pandas as pd
from datetime import datetime

url = "https://api.the-odds-api.com/v4/sports/soccer_epl/odds"

params = {
    "apiKey": API_KEY,
    "regions": "uk",
    "markets": "h2h",
    "oddsFormat": "decimal",
    "dateFormat": "iso",
    "days": 365  # get all upcoming matches for the next year
}

response = requests.get(url, params=params)
response.raise_for_status()

odds_data = response.json()
print("Total upcoming matches:", len(odds_data))

Total upcoming matches: 23


In [27]:
import pandas as pd

def flatten_odds(data):
    rows = []

    for match in data:
        match_id = match["id"]
        home = match["home_team"]
        away = match["away_team"]
        time = match["commence_time"]

        for book in match["bookmakers"]:
            bookmaker = book["title"]

            # Find h2h market
            h2h = next((m for m in book["markets"] if m["key"] == "h2h"), None)
            if not h2h:
                continue

            outcomes = {o["name"]: o["price"] for o in h2h["outcomes"]}

            rows.append({
                "match_id": match_id,
                "commence_time": time,
                "home_team": home,
                "away_team": away,
                "bookmaker": bookmaker,
                "home_odds": outcomes.get(home),
                "draw_odds": outcomes.get("Draw"),
                "away_odds": outcomes.get(away),
            })

    return pd.DataFrame(rows)

df = flatten_odds(odds_data)
df.head()

Unnamed: 0,match_id,commence_time,home_team,away_team,bookmaker,home_odds,draw_odds,away_odds
0,e15eb2362921b16a6b6a0397ce607a11,2026-01-18T14:00:00Z,Wolverhampton Wanderers,Newcastle United,Smarkets,5.9,3.4,1.81
1,e15eb2362921b16a6b6a0397ce607a11,2026-01-18T14:00:00Z,Wolverhampton Wanderers,Newcastle United,Paddy Power,5.0,3.25,1.73
2,e15eb2362921b16a6b6a0397ce607a11,2026-01-18T14:00:00Z,Wolverhampton Wanderers,Newcastle United,Sky Bet,5.0,3.3,1.75
3,e15eb2362921b16a6b6a0397ce607a11,2026-01-18T14:00:00Z,Wolverhampton Wanderers,Newcastle United,Betway,4.75,3.1,1.83
4,e15eb2362921b16a6b6a0397ce607a11,2026-01-18T14:00:00Z,Wolverhampton Wanderers,Newcastle United,888sport,4.6,3.25,1.75


In [57]:
betting_odds_avg = (
    df.groupby(["match_id", "home_team", "away_team"])
      .agg({
          "home_odds": "mean",
          "draw_odds": "mean",
          "away_odds": "mean"
      })
      .reset_index()
)

betting_odds_avg.head()

Unnamed: 0,match_id,home_team,away_team,home_odds,draw_odds,away_odds
0,1ca6d3d9cde3e58a39211feb9188530c,Newcastle United,Aston Villa,2.017647,3.608824,3.435294
1,1e811fa7ead0a3e6ef920b15b2bbb95d,Burnley,Tottenham Hotspur,3.802778,3.45,1.965556
2,342788786c22e570ed2da53a9608113f,Brighton and Hove Albion,Bournemouth,1.855,3.933333,3.808333
3,36820753efb36739a83c6e5e440827b2,Brighton and Hove Albion,Everton,1.8,3.11,3.49625
4,38a3cb5e295f55e274d589fc646cf2dd,Tottenham Hotspur,Manchester City,4.61,3.58125,1.6
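
One design choice in the aggregation above is that decimal odds are averaged across bookmakers first and inverted afterwards. Because `1/x` is convex, this gives a slightly lower raw probability than averaging each bookmaker's `1/odds`. The sketch below illustrates the gap using only the five Wolverhampton home prices visible in `df.head()`; it is an illustration of the effect, not a recommendation for either method.

```python
from statistics import mean

# Home prices for Wolverhampton Wanderers from the five rows shown in df.head()
home_odds = [5.9, 5.0, 5.0, 4.75, 4.6]

p_from_mean_odds = 1 / mean(home_odds)               # what averaging odds implies
p_mean_of_probs = mean([1 / o for o in home_odds])   # averaging implied probabilities

gap = p_mean_of_probs - p_from_mean_odds             # non-negative by Jensen's inequality
```

The gap is small for these prices, so either aggregation is likely fine here; it grows when bookmakers disagree more strongly.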


In [58]:
# 1) Convert odds -> raw probabilities
betting_odds_avg["p_home_raw"] = 1 / betting_odds_avg["home_odds"]
betting_odds_avg["p_draw_raw"] = 1 / betting_odds_avg["draw_odds"]
betting_odds_avg["p_away_raw"] = 1 / betting_odds_avg["away_odds"]

# 2) Normalize (remove bookmaker margin)
betting_odds_avg["total_raw"] = (
    betting_odds_avg["p_home_raw"] +
    betting_odds_avg["p_draw_raw"] +
    betting_odds_avg["p_away_raw"]
)

betting_odds_avg["p_home_book"] = betting_odds_avg["p_home_raw"] / betting_odds_avg["total_raw"]
betting_odds_avg["p_draw_book"] = betting_odds_avg["p_draw_raw"] / betting_odds_avg["total_raw"]
betting_odds_avg["p_away_book"] = betting_odds_avg["p_away_raw"] / betting_odds_avg["total_raw"]

# 3) Keep only useful columns
betting_odds_avg = betting_odds_avg[[
    # "match_id",
    "home_team",
    "away_team",
    "p_home_book",
    "p_draw_book",
    "p_away_book",
    "home_odds",
    "draw_odds",
    "away_odds"
]]

betting_odds_avg.head()

Unnamed: 0,home_team,away_team,p_home_book,p_draw_book,p_away_book,home_odds,draw_odds,away_odds
0,Newcastle United,Aston Villa,0.465893,0.260475,0.273632,2.017647,3.608824,3.435294
1,Burnley,Tottenham Hotspur,0.247711,0.27304,0.479249,3.802778,3.45,1.965556
2,Brighton and Hove Albion,Bournemouth,0.510543,0.240777,0.24868,1.855,3.933333,3.808333
3,Brighton and Hove Albion,Everton,0.477643,0.276449,0.245908,1.8,3.11,3.49625
4,Tottenham Hotspur,Manchester City,0.193479,0.249058,0.557462,4.61,3.58125,1.6
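
As a worked check of the normalisation step, the sketch below reproduces the Brighton and Hove Albion vs Bournemouth row from the averaged odds. The helper name `implied_probabilities` is an assumption for illustration. Proportional scaling is what the cell above does; other margin-removal methods exist and are not shown here.

```python
def implied_probabilities(home_odds, draw_odds, away_odds):
    """Convert decimal odds to margin-free probabilities by proportional scaling."""
    raw = [1 / home_odds, 1 / draw_odds, 1 / away_odds]
    total = sum(raw)                    # exceeds 1 by the bookmaker margin
    probs = [p / total for p in raw]    # rescale so the three outcomes sum to 1
    return probs, total - 1

# Averaged odds for Brighton and Hove Albion vs Bournemouth (row 2 above)
probs, margin = implied_probabilities(1.855, 3.933333, 3.808333)
# probs is approximately [0.5105, 0.2408, 0.2487]; margin is roughly 5.6%
```

The returned `margin` is the overround, i.e. how far the raw inverse odds sum above 1 before rescaling.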


## 3. Get fixtures.

In [29]:
# Load variables from API_KEY.env
load_dotenv("API_KEY.env")

API_KEY = os.getenv("FOOTBALL_DATA_API_KEY")

if API_KEY is None:
    raise ValueError("FOOTBALL_DATA_API_KEY not found. Check API_KEY.env")

print("API key loaded successfully")

API key loaded successfully


In [32]:
import requests
import pandas as pd
from datetime import datetime, timedelta

url = "https://api.football-data.org/v4/competitions/PL/matches"

headers = {
    "X-Auth-Token": API_KEY
}

today = datetime.utcnow().date()
end_of_season = today + timedelta(days=365)  # big range to cover all remaining games

params = {
    "status": "SCHEDULED",
    "dateFrom": today.isoformat(),
    "dateTo": end_of_season.isoformat()
}

response = requests.get(url, headers=headers, params=params)
response.raise_for_status()

data = response.json()
fixtures = data["matches"]

df_fixtures = pd.DataFrame(fixtures)

df_fixtures_clean = df_fixtures[[
    "utcDate",
    "status",
    "homeTeam",
    "awayTeam"
]].copy()  # copy so later column assignments do not act on a view of df_fixtures

df_fixtures_clean.head()
print("Total scheduled matches:", len(df_fixtures_clean))


Total scheduled matches: 162


In [33]:
df_fixtures_clean["homeTeam"] = df_fixtures_clean["homeTeam"].apply(lambda x: x["name"])
df_fixtures_clean["awayTeam"] = df_fixtures_clean["awayTeam"].apply(lambda x: x["name"])

In [34]:
df_fixtures_clean

Unnamed: 0,utcDate,status,homeTeam,awayTeam
0,2026-01-18T16:30:00Z,TIMED,Aston Villa FC,Everton FC
1,2026-01-19T20:00:00Z,TIMED,Brighton & Hove Albion FC,AFC Bournemouth
2,2026-01-24T12:30:00Z,TIMED,West Ham United FC,Sunderland AFC
3,2026-01-24T15:00:00Z,TIMED,Burnley FC,Tottenham Hotspur FC
4,2026-01-24T15:00:00Z,TIMED,Fulham FC,Brighton & Hove Albion FC
...,...,...,...,...
157,2026-05-24T15:00:00Z,TIMED,Liverpool FC,Brentford FC
158,2026-05-24T15:00:00Z,TIMED,Manchester City FC,Aston Villa FC
159,2026-05-24T15:00:00Z,TIMED,Nottingham Forest FC,AFC Bournemouth
160,2026-05-24T15:00:00Z,TIMED,Tottenham Hotspur FC,Everton FC


## Get this season's results (2025/26)

In [35]:
url = "https://api.football-data.org/v4/competitions/PL/matches"
params = {
    "season": 2025,   # season year
    "status": "FINISHED"
}

response = requests.get(url, headers=headers, params=params)
response.raise_for_status()
past_matches = response.json()["matches"]

In [39]:
clean_rows = []

for m in past_matches:
    row = {
        "utcDate": m["utcDate"],
        "matchday": m["matchday"],
        "status": m["status"],
        "homeTeam": m["homeTeam"]["name"],
        "awayTeam": m["awayTeam"]["name"],
        "homeGoals": m["score"]["fullTime"]["home"],
        "awayGoals": m["score"]["fullTime"]["away"],
        "winner": m["score"]["winner"]
    }
    clean_rows.append(row)

past_matches_clean = pd.DataFrame(clean_rows)
past_matches_clean.head()

Unnamed: 0,utcDate,matchday,status,homeTeam,awayTeam,homeGoals,awayGoals,winner
0,2025-08-15T19:00:00Z,1,FINISHED,Liverpool FC,AFC Bournemouth,4,2,HOME_TEAM
1,2025-08-16T11:30:00Z,1,FINISHED,Aston Villa FC,Newcastle United FC,0,0,DRAW
2,2025-08-16T14:00:00Z,1,FINISHED,Brighton & Hove Albion FC,Fulham FC,1,1,DRAW
3,2025-08-16T14:00:00Z,1,FINISHED,Sunderland AFC,West Ham United FC,3,0,HOME_TEAM
4,2025-08-16T14:00:00Z,1,FINISHED,Tottenham Hotspur FC,Burnley FC,3,0,HOME_TEAM


## Get last season's results (2024/25)

In [40]:
url = "https://api.football-data.org/v4/competitions/PL/matches"
params = {
    "season": 2024,   # season year
    "status": "FINISHED"
}

response = requests.get(url, headers=headers, params=params)
response.raise_for_status()
past_matches_24 = response.json()["matches"]

In [41]:
clean_rows = []

for m in past_matches_24:
    row = {
        "utcDate": m["utcDate"],
        "matchday": m["matchday"],
        "status": m["status"],
        "homeTeam": m["homeTeam"]["name"],
        "awayTeam": m["awayTeam"]["name"],
        "homeGoals": m["score"]["fullTime"]["home"],
        "awayGoals": m["score"]["fullTime"]["away"],
        "winner": m["score"]["winner"]
    }
    clean_rows.append(row)

past_matches_24_clean = pd.DataFrame(clean_rows)
past_matches_24_clean.head()

Unnamed: 0,utcDate,matchday,status,homeTeam,awayTeam,homeGoals,awayGoals,winner
0,2024-08-16T19:00:00Z,1,FINISHED,Manchester United FC,Fulham FC,1,0,HOME_TEAM
1,2024-08-17T11:30:00Z,1,FINISHED,Ipswich Town FC,Liverpool FC,0,2,AWAY_TEAM
2,2024-08-17T14:00:00Z,1,FINISHED,Arsenal FC,Wolverhampton Wanderers FC,2,0,HOME_TEAM
3,2024-08-17T14:00:00Z,1,FINISHED,Everton FC,Brighton & Hove Albion FC,0,3,AWAY_TEAM
4,2024-08-17T14:00:00Z,1,FINISHED,Newcastle United FC,Southampton FC,1,0,HOME_TEAM


## Combine and predict

In [43]:
import pandas as pd
import numpy as np
from scipy.stats import poisson

# ----------------------------
# 1. Load your dataframes
# ----------------------------
df_current = past_matches_clean
df_prev = past_matches_24_clean
df_future = df_fixtures_clean


# ----------------------------
# 2. Combine and weight games
# ----------------------------
df_all = pd.concat([df_prev, df_current], ignore_index=True)

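# Add weights: more recent games = more weight
df_all["date"] = pd.to_datetime(df_all["utcDate"])
# The linear weights below assume chronological order, so sort explicitly
df_all = df_all.sort_values("date").reset_index(drop=True)
df_all["weight"] = np.linspace(1, 2, len(df_all))  # simple linear weighting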


# ----------------------------
# 3. Compute home advantage
# ----------------------------
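# Home advantage = average home goals - average away goals.
# Note: this is a raw goal difference; match_probabilities() adds it inside exp(),
# so it acts as a multiplier exp(home_advantage) on the home team's expected goals.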
home_avg = df_all["homeGoals"].mean()
away_avg = df_all["awayGoals"].mean()
home_advantage = home_avg - away_avg


# ----------------------------
# 4. Calculate attack & defense strengths
# ----------------------------
teams = pd.unique(df_all[["homeTeam", "awayTeam"]].values.ravel("K"))

attack = pd.Series(1.0, index=teams)
defense = pd.Series(1.0, index=teams)

# Initialize with goals per match
team_stats = {}

for team in teams:
    home_games = df_all[df_all["homeTeam"] == team]
    away_games = df_all[df_all["awayTeam"] == team]

    goals_scored = (home_games["homeGoals"] * home_games["weight"]).sum() + \
                   (away_games["awayGoals"] * away_games["weight"]).sum()

    goals_against = (home_games["awayGoals"] * home_games["weight"]).sum() + \
                    (away_games["homeGoals"] * away_games["weight"]).sum()

    matches = home_games["weight"].sum() + away_games["weight"].sum()

    team_stats[team] = {
        "scored": goals_scored / matches,
        "against": goals_against / matches
    }

# Strengths = relative to league average
league_avg_scored = df_all["homeGoals"].mean() + df_all["awayGoals"].mean()
league_avg_scored /= 2

for team in teams:
    attack[team] = team_stats[team]["scored"] / league_avg_scored
    defense[team] = team_stats[team]["against"] / league_avg_scored


# ----------------------------
# 5. Predict probabilities for each future match
# ----------------------------
def match_probabilities(home, away):
    # expected goals
    exp_home = np.exp(np.log(league_avg_scored) + np.log(attack[home]) + np.log(defense[away]) + home_advantage)
    exp_away = np.exp(np.log(league_avg_scored) + np.log(attack[away]) + np.log(defense[home]))

    # compute probabilities up to 6 goals
    max_goals = 6
    p_home = poisson.pmf(range(max_goals + 1), exp_home)
    p_away = poisson.pmf(range(max_goals + 1), exp_away)

    # result probabilities
    p_win = 0
    p_draw = 0
    p_loss = 0

    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            prob = p_home[i] * p_away[j]
            if i > j:
                p_win += prob
            elif i == j:
                p_draw += prob
            else:
                p_loss += prob

    return p_win, p_draw, p_loss


# ----------------------------
# 6. Apply to all fixtures
# ----------------------------
results = []

for _, row in df_future.iterrows():
    home = row["homeTeam"]
    away = row["awayTeam"]

    p_win, p_draw, p_loss = match_probabilities(home, away)

    results.append({
        "utcDate": row["utcDate"],
        "homeTeam": home,
        "awayTeam": away,
        "p_home_win": p_win,
        "p_draw": p_draw,
        "p_away_win": p_loss,
        "odds_home_win": 1 / p_win,
        "odds_draw": 1 / p_draw,
        "odds_away_win": 1 / p_loss
    })

df_odds = pd.DataFrame(results)
df_odds.head()


Unnamed: 0,utcDate,homeTeam,awayTeam,p_home_win,p_draw,p_away_win,odds_home_win,odds_draw,odds_away_win
0,2026-01-18T16:30:00Z,Aston Villa FC,Everton FC,0.50007,0.258856,0.240025,1.999718,3.863149,4.166234
1,2026-01-19T20:00:00Z,Brighton & Hove Albion FC,AFC Bournemouth,0.490528,0.209531,0.292664,2.03862,4.772555,3.416887
2,2026-01-24T12:30:00Z,West Ham United FC,Sunderland AFC,0.301156,0.280243,0.418137,3.320543,3.568332,2.391558
3,2026-01-24T15:00:00Z,Burnley FC,Tottenham Hotspur FC,0.265143,0.215993,0.513622,3.771552,4.629779,1.946958
4,2026-01-24T15:00:00Z,Fulham FC,Brighton & Hove Albion FC,0.406227,0.227696,0.362539,2.461677,4.391822,2.758323
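
Because the score grid in `match_probabilities` stops at 6 goals per side, the three outcome probabilities add up to slightly less than 1 (the first row above sums to roughly 0.999). The sketch below shows one way to rescale them, using only the standard library. The expected-goal values 1.6 and 1.1 and the name `outcome_probs` are hypothetical, and the rescaling is an optional extra rather than what the notebook does.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of exactly k goals with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def outcome_probs(exp_home, exp_away, max_goals=6, renormalize=True):
    """Home win / draw / away win probabilities from independent Poisson scores."""
    p_win = p_draw = p_loss = 0.0
    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            prob = poisson_pmf(i, exp_home) * poisson_pmf(j, exp_away)
            if i > j:
                p_win += prob
            elif i == j:
                p_draw += prob
            else:
                p_loss += prob
    covered = p_win + p_draw + p_loss   # probability mass inside the truncated grid
    if renormalize:
        p_win, p_draw, p_loss = [p / covered for p in (p_win, p_draw, p_loss)]
    return p_win, p_draw, p_loss, covered

p_w, p_d, p_l, covered = outcome_probs(1.6, 1.1)  # hypothetical expected goals
```

Rescaling matters mostly for consistency: the bookmaker probabilities are normalised to sum to 1, so comparing them against unnormalised model probabilities adds a small systematic offset.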


## Compare calculated probabilities to bookmaker ones

In [68]:
unique_bet_home = betting_odds_avg["home_team"].unique()
unique_model_home = df_odds["homeTeam"].unique()

In [69]:
print(unique_bet_home)
print(unique_model_home)

['Newcastle United' 'Burnley' 'Brighton and Hove Albion'
 'Tottenham Hotspur' 'Crystal Palace' 'Sunderland' 'Arsenal' 'Bournemouth'
 'Aston Villa' 'Brentford' 'Liverpool' 'West Ham United' 'Chelsea'
 'Manchester City' 'Wolverhampton Wanderers' 'Nottingham Forest' 'Fulham'
 'Manchester United' 'Leeds United' 'Everton']
['Aston Villa FC' 'Brighton & Hove Albion FC' 'West Ham United FC'
 'Burnley FC' 'Fulham FC' 'Manchester City FC' 'AFC Bournemouth'
 'Crystal Palace FC' 'Brentford FC' 'Newcastle United FC' 'Arsenal FC'
 'Everton FC' 'Leeds United FC' 'Wolverhampton Wanderers FC' 'Chelsea FC'
 'Liverpool FC' 'Manchester United FC' 'Nottingham Forest FC'
 'Tottenham Hotspur FC' 'Sunderland AFC']


In [85]:
def normalize_team(name):
    name = name.lower()
    name = name.replace(" fc", "")
    name = name.replace(" afc", "")
    name = name.replace("&", "and")
    name = name.replace("afc ", "")   # drops the "afc " prefix, e.g. "afc bournemouth"
    name = name.strip()
    return name


In [86]:
df_odds["home_norm"] = df_odds["homeTeam"].apply(normalize_team)
df_odds["away_norm"] = df_odds["awayTeam"].apply(normalize_team)

betting_odds_avg["home_norm"] = betting_odds_avg["home_team"].apply(normalize_team)
betting_odds_avg["away_norm"] = betting_odds_avg["away_team"].apply(normalize_team)


In [87]:
unique_model_norm = df_odds["home_norm"].unique()
unique_bet_norm = betting_odds_avg["home_norm"].unique()

print(unique_model_norm)
print(unique_bet_norm)


['aston villa' 'brighton and hove albion' 'west ham united' 'burnley'
 'fulham' 'manchester city' 'bournemouth' 'crystal palace' 'brentford'
 'newcastle united' 'arsenal' 'everton' 'leeds united'
 'wolverhampton wanderers' 'chelsea' 'liverpool' 'manchester united'
 'nottingham forest' 'tottenham hotspur' 'sunderland']
['newcastle united' 'burnley' 'brighton and hove albion'
 'tottenham hotspur' 'crystal palace' 'sunderland' 'arsenal' 'bournemouth'
 'aston villa' 'brentford' 'liverpool' 'west ham united' 'chelsea'
 'manchester city' 'wolverhampton wanderers' 'nottingham forest' 'fulham'
 'manchester united' 'leeds united' 'everton']


In [88]:
df_compare = df_odds.merge(
    betting_odds_avg,
    on=["home_norm", "away_norm"],
    how="inner"
)

print("Matched rows:", len(df_compare))
df_compare.head()


Matched rows: 22


Unnamed: 0,utcDate,homeTeam,awayTeam,p_home_win,p_draw,p_away_win,odds_home_win,odds_draw,odds_away_win,home_norm,...,home_team,away_team,p_home_book,p_draw_book,p_away_book,home_odds,draw_odds,away_odds,home_team_norm,away_team_norm
0,2026-01-18T16:30:00Z,Aston Villa FC,Everton FC,0.50007,0.258856,0.240025,1.999718,3.863149,4.166234,aston villa,...,Aston Villa,Everton,0.589267,0.239674,0.171059,1.611111,3.961111,5.55,,
1,2026-01-19T20:00:00Z,Brighton & Hove Albion FC,AFC Bournemouth,0.490528,0.209531,0.292664,2.03862,4.772555,3.416887,brighton and hove albion,...,Brighton and Hove Albion,Bournemouth,0.510543,0.240777,0.24868,1.855,3.933333,3.808333,,
2,2026-01-24T12:30:00Z,West Ham United FC,Sunderland AFC,0.301156,0.280243,0.418137,3.320543,3.568332,2.391558,west ham united,...,West Ham United,Sunderland,0.395125,0.286174,0.3187,2.379412,3.285294,2.95,,
3,2026-01-24T15:00:00Z,Burnley FC,Tottenham Hotspur FC,0.265143,0.215993,0.513622,3.771552,4.629779,1.946958,burnley,...,Burnley,Tottenham Hotspur,0.247711,0.27304,0.479249,3.802778,3.45,1.965556,,
4,2026-01-24T15:00:00Z,Fulham FC,Brighton & Hove Albion FC,0.406227,0.227696,0.362539,2.461677,4.391822,2.758323,fulham,...,Fulham,Brighton and Hove Albion,0.373931,0.279367,0.346702,2.511111,3.361111,2.708333,,
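
The odds feed listed 23 upcoming matches while the inner merge kept 22, so it may be worth checking which (home, away) pairs drop out on each side. A minimal sketch using plain sets follows; the helper name `unmatched_pairs` and the toy inputs are assumptions for illustration.

```python
def unmatched_pairs(model_pairs, book_pairs):
    """Return (only_in_model, only_in_book) for iterables of (home, away) tuples."""
    model, book = set(model_pairs), set(book_pairs)
    return sorted(model - book), sorted(book - model)

# Toy inputs with normalized names; the second bookmaker pair is hypothetical
model_pairs = [("aston villa", "everton"), ("burnley", "tottenham hotspur")]
book_pairs = [("aston villa", "everton"), ("leeds united", "fulham")]

only_model, only_book = unmatched_pairs(model_pairs, book_pairs)
```

With the notebook's frames, the inputs could be built as `zip(df_odds["home_norm"], df_odds["away_norm"])` and the equivalent for `betting_odds_avg`. Pairs that appear only on the bookmaker side point to either a naming mismatch or a fixture missing from the scheduled list.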


In [89]:
df_compare["diff_home"] = df_compare["p_home_win"] - df_compare["p_home_book"]
df_compare["diff_draw"] = df_compare["p_draw"] - df_compare["p_draw_book"]
df_compare["diff_away"] = df_compare["p_away_win"] - df_compare["p_away_book"]

df_compare[["homeTeam", "awayTeam", "diff_home", "diff_draw", "diff_away"]].head()

Unnamed: 0,homeTeam,awayTeam,diff_home,diff_draw,diff_away
0,Aston Villa FC,Everton FC,-0.089197,0.019182,0.068966
1,Brighton & Hove Albion FC,AFC Bournemouth,-0.020015,-0.031246,0.043984
2,West Ham United FC,Sunderland AFC,-0.09397,-0.005931,0.099437
3,Burnley FC,Tottenham Hotspur FC,0.017432,-0.057047,0.034373
4,Fulham FC,Brighton & Hove Albion FC,0.032296,-0.051671,0.015837


In [90]:
import numpy as np

rmse_home = np.sqrt(np.mean((df_compare["p_home_win"] - df_compare["p_home_book"])**2))
rmse_draw = np.sqrt(np.mean((df_compare["p_draw"] - df_compare["p_draw_book"])**2))
rmse_away = np.sqrt(np.mean((df_compare["p_away_win"] - df_compare["p_away_book"])**2))

rmse_home, rmse_draw, rmse_away


(0.0540078651504647, 0.040807298118621556, 0.04646685926216694)

In [91]:
rmse_total = np.sqrt(np.mean((
    df_compare["p_home_win"] - df_compare["p_home_book"]
)**2 + (
    df_compare["p_draw"] - df_compare["p_draw_book"]
)**2 + (
    df_compare["p_away_win"] - df_compare["p_away_book"]
)**2 ))

rmse_total


0.0821051404453026
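
The combined figure above is not an average of the three per-outcome RMSEs. The squared errors are summed per match before the mean is taken, so it equals the square root of the sum of the three squared RMSEs. The check below uses the values printed above; the variable names are illustrative.

```python
import math

# Per-outcome RMSEs as printed by the previous cell
rmse_home = 0.0540078651504647
rmse_draw = 0.040807298118621556
rmse_away = 0.04646685926216694

# Root of the summed squares reproduces the combined figure
rmse_total_check = math.sqrt(rmse_home**2 + rmse_draw**2 + rmse_away**2)

# Dividing by sqrt(3) gives an RMSE per individual probability instead
rmse_per_probability = rmse_total_check / math.sqrt(3)
```

The per-probability version is the one that is directly comparable to `rmse_home`, `rmse_draw` and `rmse_away`.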

In [92]:
df_compare["abs_diff"] = (
    abs(df_compare["diff_home"]) +
    abs(df_compare["diff_draw"]) +
    abs(df_compare["diff_away"])
)

df_compare.sort_values("abs_diff", ascending=False).head(10)[
    ["homeTeam", "awayTeam", "diff_home", "diff_draw", "diff_away"]
]


Unnamed: 0,homeTeam,awayTeam,diff_home,diff_draw,diff_away
10,Arsenal FC,Manchester United FC,0.096537,-0.05582,-0.048222
2,West Ham United FC,Sunderland AFC,-0.09397,-0.005931,0.099437
11,Everton FC,Leeds United FC,0.09777,-0.048794,-0.050379
18,Manchester United FC,Fulham FC,-0.071474,-0.027814,0.09676
0,Aston Villa FC,Everton FC,-0.089197,0.019182,0.068966
8,Brentford FC,Nottingham Forest FC,0.083126,-0.061765,-0.02547
13,Leeds United FC,Arsenal FC,-0.018294,-0.059596,0.071927
21,Sunderland AFC,Burnley FC,0.06598,-0.044644,-0.023023
15,Chelsea FC,West Ham United FC,0.059098,-0.041742,-0.030061
14,Wolverhampton Wanderers FC,AFC Bournemouth,-0.007951,-0.051655,0.054515
