## 📈 Predicting English Rugby Premiership Final Positions Using Probabilistic Modelling & Simulation

**Competition:** English Rugby Premiership 2025/26  
**Purpose:** Estimate probabilities of final league positions using simulation  
**Methods:** Monte Carlo simulation  
**Author:** [Victoria Friss de Kereki](https://www.linkedin.com/in/victoria-friss-de-kereki/)   
**Medium Article:** 

---

**Notebook first written:** `17/01/2026`  
**Last updated:** `01/02/2026`  

> This notebook develops a probabilistic framework for predicting final English Rugby Premiership positions. A breakdown of past matches (tries, conversions, penalties, and drop goals) is used to estimate the probability of each outcome for every remaining match. These probabilities are then used to simulate the remainder of the season via Monte Carlo methods, generating distributions over final points totals and league positions.
>
> The analysis focuses on estimating the likelihood of key outcomes such as title wins and top-four finishes (qualification for the play-offs). Results are presented at team level with uncertainty intervals, and the framework can be extended to incorporate form, fixture difficulty, or alternative predictive inputs beyond betting markets.


<div style="text-align: left;">
    <img src="Images and others for Medium/Predicting English Rugby Premiership Final Positions Using Probabilistic Modelling & Simulation.png"  
         alt="Predicting English Rugby Premiership Final Positions Using Probabilistic Modelling & Simulation"  
         width="600">
</div>

In [1]:
# Core
from datetime import datetime, timedelta
import os

# Data manipulation
import numpy as np
import pandas as pd

# APIs & environment
import requests
from dotenv import load_dotenv

# Statistics
from scipy.stats import poisson

# Visualisation
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap

## 1. Premiership Current Standings (ESPN Scraping)
##### Adapting the ESPN scraper from my previous project for the Rugby Premiership instead of the football Premier League.

In [2]:
url = "https://www.espn.com/rugby/standings/_/league/267979"
tables = pd.read_html(url)

# Fix ESPN header issue
teams_raw = tables[0].copy()

# Move column names into first row
teams_raw.loc[-1] = teams_raw.columns
teams_raw.index = teams_raw.index + 1
teams_raw = teams_raw.sort_index().reset_index(drop=True)

# Stats table is fine as-is
stats = tables[1]

# Parse team table
teams = pd.DataFrame()

teams["position"] = (
    teams_raw.iloc[:, 0]
    .astype(str)
    .str.extract(r"^(\d+)")
    .astype(int)
)

teams["team"] = (
    teams_raw.iloc[:, 0]
    .astype(str)
    .str.replace(r"^\d+", "", regex=True)        # remove position
    .str.replace(r"^[A-Z]{2,3}", "", regex=True) # remove abbreviation
    .str.strip()
)

# Parse stats table
stats.columns = [
    "gp", "w", "d", "l", "bye",
    "pf", "pa", "tf", "ta",
    "tbp", "lbp", "bp",
    "pd", "pts"
]

stats = stats.apply(
    lambda c: (
        c.astype(str)
         .str.replace("+", "", regex=False)
         .astype(int)
    )
)

# Combine
premiership = pd.concat([teams, stats], axis=1)

premiership

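| | position | team | gp | w | d | l | bye | pf | pa | tf | ta | tbp | lbp | bp | pd | pts |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Northampton Saints | 10 | 8 | 1 | 1 | 0 | 390 | 282 | 58 | 39 | 9 | 0 | 9 | 108 | 43 |
| 1 | 2 | Bath Rugby | 10 | 8 | 0 | 2 | 0 | 343 | 237 | 50 | 34 | 8 | 1 | 9 | 106 | 41 |
| 2 | 3 | Bristol Rugby | 10 | 8 | 0 | 2 | 0 | 290 | 235 | 41 | 34 | 5 | 0 | 5 | 55 | 37 |
| 3 | 4 | Leicester Tigers | 10 | 7 | 0 | 3 | 0 | 301 | 238 | 41 | 31 | 7 | 1 | 8 | 63 | 36 |
| 4 | 5 | Exeter Chiefs | 10 | 6 | 1 | 3 | 0 | 272 | 179 | 37 | 24 | 6 | 3 | 9 | 93 | 35 |
| 5 | 6 | Saracens | 10 | 5 | 0 | 5 | 0 | 383 | 248 | 56 | 35 | 9 | 3 | 12 | 135 | 32 |
| 6 | 7 | Sale Sharks | 10 | 3 | 0 | 7 | 0 | 285 | 297 | 39 | 43 | 5 | 3 | 8 | -12 | 20 |
| 7 | 8 | Gloucester Rugby | 10 | 1 | 0 | 9 | 0 | 214 | 335 | 32 | 45 | 4 | 3 | 7 | -121 | 11 |
| 8 | 9 | Harlequins | 10 | 2 | 0 | 8 | 0 | 196 | 351 | 26 | 52 | 2 | 0 | 2 | -155 | 10 |
| 9 | 10 | Newcastle Falcons | 10 | 1 | 0 | 9 | 0 | 167 | 439 | 23 | 66 | 1 | 0 | 1 | -272 | 5 |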


In [466]:
url = "https://www.espn.com/rugby/standings/_/league/267979"
tables = pd.read_html(url)

# Inspect the raw team table: pandas reads ESPN's first data row as the column header
teams_raw = tables[0].copy()
teams_raw

| | 1NORNorthampton Saints |
|---|---|
| 0 | 2BATBath Rugby |
| 1 | 3BRIBristol Rugby |
| 2 | 4LEILeicester Tigers |
| 3 | 5EXEExeter Chiefs |
| 4 | 6SARSaracens |
| 5 | 7SALSale Sharks |
| 6 | 8GLOGloucester Rugby |
| 7 | 9HARHarlequins |
| 8 | 10NEWNewcastle Falcons |
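
Each raw cell fuses the position, the team abbreviation, and the team name. As a cross-check on the two chained `str.replace` calls used in the parser, the sketch below splits a cell with a single regular expression. `split_espn_cell` is a hypothetical helper, not part of the notebook, and the pattern assumes a 2 to 4 letter upper-case abbreviation followed by a capitalised team name.

```python
import re

# Position digits, a lazy 2-4 letter abbreviation, then a name starting Capital + lowercase
ESPN_CELL = re.compile(r"^(\d+)([A-Z]{2,4}?)([A-Z][a-z].*)$")

def split_espn_cell(cell: str) -> tuple[int, str, str]:
    """Split e.g. '1NORNorthampton Saints' into (1, 'NOR', 'Northampton Saints')."""
    m = ESPN_CELL.match(cell.strip())
    if m is None:
        raise ValueError(f"Unexpected ESPN cell format: {cell!r}")
    position, abbreviation, name = m.groups()
    return int(position), abbreviation, name.strip()

for raw in ["1NORNorthampton Saints", "6SARSaracens", "10NEWNewcastle Falcons"]:
    print(split_espn_cell(raw))
```

Because the abbreviation is matched lazily and must be followed by a capital plus a lowercase letter, a two-letter abbreviation would not swallow the first letter of the team name, which a greedy `^[A-Z]{2,3}` could do.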


## 2. Get betting odds using API

#### No betting odds are available yet, as the next match is still over a month away.

## 3. Get fixtures for upcoming Premiership Rugby games
#### No API is available for Premiership Rugby, so fixtures are scraped from https://www.skysports.com/rugby-union/competitions/gallagher-prem/fixtures instead.

In [280]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://www.skysports.com/rugby-union/competitions/gallagher-prem/fixtures"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
}

response = requests.get(url, headers=headers)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")

fixtures_container = soup.find("div", class_="fixres__body")

month = None
date = None

dates = []
months = []
home_teams = []
away_teams = []
times = []

for elem in fixtures_container.children:
    # Skip empty elements or strings
    if not hasattr(elem, "name"):
        continue

    # Month header
    if elem.name == "h3" and "fixres__header1" in elem.get("class", []):
        month = elem.get_text(strip=True)

    # Date header
    elif elem.name == "h4" and "fixres__header2" in elem.get("class", []):
        date = elem.get_text(strip=True)

    # Match item
    elif elem.name == "div" and "fixres__item" in elem.get("class", []):
        home = elem.select_one("span.matches__participant--side1 span.swap-text__target").get_text(strip=True)
        away = elem.select_one("span.matches__participant--side2 span.swap-text__target").get_text(strip=True)
        # Named kickoff rather than time, so the time module used for time.sleep is not shadowed
        kickoff = elem.select_one("span.matches__date").get_text(strip=True)

        months.append(month)
        dates.append(date)
        home_teams.append(home)
        away_teams.append(away)
        times.append(kickoff)

# Build DataFrame
df_future_1 = pd.DataFrame({
    "month": months,
    "date": dates,
    "time": times,
    "home_team": home_teams,
    "away_team": away_teams
})

print(df_future_1.head())
print("Total fixtures scraped:", len(df_future_1))

        month                 date   time         home_team  \
0  March 2026    Friday 20th March  19:45              Bath   
1  March 2026  Saturday 21st March  15:00        Harlequins   
2  March 2026  Saturday 21st March  15:00       Northampton   
3  March 2026  Saturday 21st March  15:05            Exeter   
4  March 2026    Sunday 22nd March  15:00  Leicester Tigers   

             away_team  
0             Saracens  
1           Gloucester  
2  Newcastle Red Bulls  
3          Sale Sharks  
4        Bristol Bears  
Total fixtures scraped: 43


In [321]:
# Remove fixtures with TBC in either team
df_future_1 = df_future_1[~df_future_1["home_team"].str.contains("TBC") & ~df_future_1["away_team"].str.contains("TBC")].reset_index(drop=True)
df_future = df_future_1.copy()
print("Total fixtures after removing TBC matches:", len(df_future))

Total fixtures after removing TBC matches: 40


In [322]:
df_future

| | month | date | time | home_team | away_team |
|---|---|---|---|---|---|
| 0 | March 2026 | Friday 20th March | 19:45 | Bath | Saracens |
| 1 | March 2026 | Saturday 21st March | 15:00 | Harlequins | Gloucester |
| 2 | March 2026 | Saturday 21st March | 15:00 | Northampton | Newcastle Red Bulls |
| 3 | March 2026 | Saturday 21st March | 15:05 | Exeter | Sale Sharks |
| 4 | March 2026 | Sunday 22nd March | 15:00 | Leicester Tigers | Bristol Bears |
| 5 | March 2026 | Friday 27th March | 19:45 | Newcastle Red Bulls | Exeter |
| 6 | March 2026 | Saturday 28th March | 13:00 | Gloucester | Leicester Tigers |
| 7 | March 2026 | Saturday 28th March | 15:30 | Bristol Bears | Harlequins |
| 8 | March 2026 | Saturday 28th March | 18:00 | Saracens | Northampton |
| 9 | March 2026 | Sunday 29th March | 15:00 | Sale Sharks | Bath |
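
The scraped `month`, `date`, and `time` columns are plain strings. If a sortable timestamp is needed, they could be combined roughly as follows. `parse_fixture_datetime` is a hypothetical helper, and the sketch assumes English day and month names and the exact string layout shown in the table above.

```python
import re
from datetime import datetime

def parse_fixture_datetime(month: str, date: str, time: str) -> datetime:
    """Combine 'March 2026', 'Friday 20th March', '19:45' into one datetime."""
    year = month.split()[-1]  # 'March 2026' -> '2026'
    # Drop the ordinal suffix: '20th' -> '20', '21st' -> '21', '22nd' -> '22'
    clean_date = re.sub(r"(?<=\d)(st|nd|rd|th)\b", "", date)
    return datetime.strptime(f"{clean_date} {year} {time}", "%A %d %B %Y %H:%M")

print(parse_fixture_datetime("March 2026", "Friday 20th March", "19:45"))
# 2026-03-20 19:45:00
```

Applied row by row (for example with `DataFrame.apply`), this would give a single kickoff column that sorts chronologically.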


## 4. Get results for this season (2025/26) and the last 2 seasons (2024/25 & 2023/24)
### Again, no API is available, so match pages are scraped from https://www.tntsports.co.uk/rugby/premiership/calendar-results.shtml to get tries and the full scoring breakdown.

In [209]:
import re

def scrape_match_stats(url):
    response = requests.get(url)
    if response.status_code != 200:
        print(f"Failed to load {url}")
        return None
    
    soup = BeautifulSoup(response.text, "html.parser")
    
    # Extract season from URL
    match = re.search(r'/premiership/(\d{4}-\d{4})/', url)
    season = match.group(1) if match else None
    
    # Team names
    team_tags = soup.select('a[data-testid="atom-participant-banner"] div.overflow-hidden')
    if len(team_tags) != 2:
        print(f"Couldn't find team names for {url}")
        return None
    
    home_team = team_tags[0].text.strip()
    away_team = team_tags[1].text.strip()
    
    # Team scores
    score_tags = soup.select('div[data-testid="team-match-header-score-atom-container"] div[data-testid="match-score-atom-score-content-box-score"]')
    if len(score_tags) == 2:
        home_score = int(score_tags[0].text.strip())
        away_score = int(score_tags[1].text.strip())
    else:
        home_score = away_score = None
    
    # Match date
    date_tag = soup.select_one('div[data-testid="molecule-match-header-top"] div.caption-2.text-neutral-05')
    if date_tag:
        date_text = date_tag.text.strip()
        match_date = date_text.split("/")[-1].strip() if "/" in date_text else None
    else:
        match_date = None
    
    # Stats rows
    stats_rows = soup.select('div[data-testid="molecule-team-action-row"]')
    stats_dict = {}
    
    for row in stats_rows:
        stat_name_tag = row.find('span')
        if not stat_name_tag:
            continue
        stat_name = stat_name_tag.text.strip()
        
        participants = row.select('div[data-testid="molecule-team-action-row-participant"]')
        if len(participants) != 2:
            continue
        
        home_val = int(participants[0].find('div', class_='caps-s6-fx').text.strip())
        away_val = int(participants[1].find('div', class_='caps-s6-fx').text.strip())
        
        stats_dict[stat_name] = [home_val, away_val]
    
    return {
        "season": season,
        "date": match_date,
        "home_team": home_team,
        "away_team": away_team,
        "home_score": home_score,
        "away_score": away_score,
        **stats_dict
    }

##### Generate list of URLs

In [194]:
season_match_ranges = {
    "2023-2024": {"start": 1457678, "end": 1457767},  # 90
    "2024-2025": {"start": 1548080, "end": 1548169},  # 90
    "2025-2026": {"start": 1628800, "end": 1628844},  # 45 - first half of current season
}


urls = []
season_counters = {s: 0 for s in season_match_ranges}

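# all_results is assumed to be loaded earlier: one row per played match,
# with a "season" column in short form (e.g. "2023-24")
for _, row in all_results.iterrows():
    season = row["season"]
    start_year, end_year = season.split("-")
    season_str = f"{start_year}-20{end_year}"  # "2023-24" -> "2023-2024"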

    if season_str not in season_match_ranges:
        continue

    start = season_match_ranges[season_str]["start"]
    end = season_match_ranges[season_str]["end"]

    match_number = start + season_counters[season_str]

    # stop once season is full
    if match_number > end:
        continue

    url = (
        f"https://www.tntsports.co.uk/rugby/premiership/"
        f"{season_str}/live-match_mtc{match_number}/live.shtml"
    )

    urls.append(url)
    season_counters[season_str] += 1
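
The loop above depends on an `all_results` DataFrame that is not defined in this part of the notebook. Since each match URL is determined only by the season and a sequential match ID, the same list can be sketched directly from `season_match_ranges`. This is an alternative under the same assumption the original makes (match IDs are contiguous within each range); `build_match_urls` and `BASE` are hypothetical names.

```python
season_match_ranges = {
    "2023-2024": {"start": 1457678, "end": 1457767},  # 90 matches
    "2024-2025": {"start": 1548080, "end": 1548169},  # 90 matches
    "2025-2026": {"start": 1628800, "end": 1628844},  # 45 matches
}

BASE = "https://www.tntsports.co.uk/rugby/premiership"

def build_match_urls(ranges: dict) -> list[str]:
    """One URL per match ID in each season's inclusive [start, end] range."""
    urls = []
    for season, ids in ranges.items():
        for match_id in range(ids["start"], ids["end"] + 1):
            urls.append(f"{BASE}/{season}/live-match_mtc{match_id}/live.shtml")
    return urls

urls_direct = build_match_urls(season_match_ranges)
print(len(urls_direct))  # 225
print(urls_direct[0])
```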

In [212]:
import time

all_matches_stats = []

total = len(urls)

for i, url in enumerate(urls, start=1):
    print(f"[{i}/{total}] Scraping: {url}")

    try:
        match = scrape_match_stats(url)

        if match:
            all_matches_stats.append(match)
            print(f"    ✅ Success: {match['home_team']} vs {match['away_team']}")
        else:
            print("    ⚠️  No data returned")

    except Exception as e:
        print(f"    ❌ Error scraping {url}: {e}")

    # polite delay to avoid hammering the site
    time.sleep(1)

print("\nFinished scraping.")
print(f"Total successful matches: {len(all_matches_stats)}")

[1/225] Scraping: https://www.tntsports.co.uk/rugby/premiership/2023-2024/live-match_mtc1457678/live.shtml
    ✅ Success: Bristol Bears vs Leicester Tigers
[2/225] Scraping: https://www.tntsports.co.uk/rugby/premiership/2023-2024/live-match_mtc1457679/live.shtml
    ✅ Success: Exeter Chiefs vs Saracens
[3/225] Scraping: https://www.tntsports.co.uk/rugby/premiership/2023-2024/live-match_mtc1457680/live.shtml
    ✅ Success: Bath Rugby vs Newcastle Red Bulls
[4/225] Scraping: https://www.tntsports.co.uk/rugby/premiership/2023-2024/live-match_mtc1457681/live.shtml
    ✅ Success: Gloucester Rugby vs Harlequins
[5/225] Scraping: https://www.tntsports.co.uk/rugby/premiership/2023-2024/live-match_mtc1457682/live.shtml
    ✅ Success: Sale Sharks vs Northampton Saints
[6/225] Scraping: https://www.tntsports.co.uk/rugby/premiership/2023-2024/live-match_mtc1457683/live.shtml
    ✅ Success: Newcastle Red Bulls vs Gloucester Rugby
[7/225] Scraping: https://www.tntsports.co.uk/rugby/premi

In [214]:
# Convert list of dicts to DataFrame
df_matches = pd.DataFrame(all_matches_stats)

# Split stats lists into separate home/away columns
for stat in ['Tries', 'Conversions', 'Penalties', 'Drops']:
    df_matches[[f'{stat}_home', f'{stat}_away']] = pd.DataFrame(df_matches[stat].tolist(), index=df_matches.index)
    df_matches.drop(columns=[stat], inplace=True)

# Reorder columns
df_matches = df_matches[
    ['season', 'date', 'home_team', 'away_team', 
     'home_score', 'away_score',
     'Tries_home', 'Tries_away',
     'Conversions_home', 'Conversions_away',
     'Penalties_home', 'Penalties_away',
     'Drops_home', 'Drops_away']
]

# Save to CSV
df_matches.to_csv('rugby_matches_stats.csv', index=False)

df_matches.head()

| | season | date | home_team | away_team | home_score | away_score | Tries_home | Tries_away | Conversions_home | Conversions_away | Penalties_home | Penalties_away | Drops_home | Drops_away |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-2024 | 13.10.2023 | Bristol Bears | Leicester Tigers | 25 | 14 | 3 | 2 | 2 | 2 | 2 | 0 | 0 | 0 |
| 1 | 2023-2024 | 14.10.2023 | Exeter Chiefs | Saracens | 65 | 10 | 11 | 2 | 5 | 0 | 0 | 0 | 0 | 0 |
| 2 | 2023-2024 | 14.10.2023 | Bath Rugby | Newcastle Red Bulls | 34 | 26 | 6 | 3 | 2 | 1 | 0 | 3 | 0 | 0 |
| 3 | 2023-2024 | 14.10.2023 | Gloucester Rugby | Harlequins | 29 | 28 | 4 | 3 | 3 | 2 | 1 | 3 | 0 | 0 |
| 4 | 2023-2024 | 15.10.2023 | Sale Sharks | Northampton Saints | 20 | 15 | 3 | 2 | 1 | 1 | 1 | 1 | 0 | 0 |


##### Some basic quality checks

In [217]:
df_matches['season'].value_counts().sort_index()

season
2023-2024    90
2024-2025    90
2025-2026    45
Name: count, dtype: int64

In [218]:
# Count home matches
home_counts = df_matches['home_team'].value_counts()

# Count away matches
away_counts = df_matches['away_team'].value_counts()

# Combine them
total_counts = home_counts.add(away_counts, fill_value=0).astype(int).sort_values(ascending=False)

print(total_counts)

Bath Rugby             45
Bristol Bears          45
Exeter Chiefs          45
Gloucester Rugby       45
Harlequins             45
Leicester Tigers       45
Newcastle Red Bulls    45
Northampton Saints     45
Sale Sharks            45
Saracens               45
Name: count, dtype: int32
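
One further check that may be worth running is whether each scraped score can be rebuilt from its components (5 points per try, 2 per conversion, 3 per penalty or drop goal). The sketch below applies that rule to the first five scraped rows shown earlier; `reconstructed_score` is a hypothetical helper. Matches involving penalty tries (7 points with no conversion) may not reconcile, depending on how the source records them.

```python
def reconstructed_score(tries: int, conversions: int, penalties: int, drops: int) -> int:
    """Rebuild a team's score from its scoring events."""
    return 5 * tries + 2 * conversions + 3 * penalties + 3 * drops

# (home_score, away_score, tries_h, tries_a, conv_h, conv_a, pen_h, pen_a, drop_h, drop_a)
sample_rows = [
    (25, 14, 3, 2, 2, 2, 2, 0, 0, 0),   # Bristol Bears v Leicester Tigers
    (65, 10, 11, 2, 5, 0, 0, 0, 0, 0),  # Exeter Chiefs v Saracens
    (34, 26, 6, 3, 2, 1, 0, 3, 0, 0),   # Bath Rugby v Newcastle Red Bulls
    (29, 28, 4, 3, 3, 2, 1, 3, 0, 0),   # Gloucester Rugby v Harlequins
    (20, 15, 3, 2, 1, 1, 1, 1, 0, 0),   # Sale Sharks v Northampton Saints
]

mismatches = []
for i, (hs, as_, th, ta, ch, ca, ph, pa, dh, da) in enumerate(sample_rows):
    if reconstructed_score(th, ch, ph, dh) != hs or reconstructed_score(ta, ca, pa, da) != as_:
        mismatches.append(i)

print("Rows that do not reconcile:", mismatches)  # []
```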


## 5. Combine and calculate probabilities of TRIES, CONVERSIONS, PENALTIES, TRY BONUS POINT, & LOSING BONUS POINT

In [225]:
past_premierships = df_matches.copy()

In [226]:
# Add weights: more recent games = more weight
past_premierships["weight"] = np.linspace(1, 2, len(past_premierships))  # simple linear weighting

In [227]:
past_premierships.tail()

| | season | date | home_team | away_team | home_score | away_score | Tries_home | Tries_away | Conversions_home | Conversions_away | Penalties_home | Penalties_away | Drops_home | Drops_away | weight |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 220 | 2025-2026 | 02.01.2026 | Bristol Bears | Sale Sharks | 19 | 17 | 3 | 2 | 2 | 2 | 0 | 1 | 0 | 0 | 1.982143 |
| 221 | 2025-2026 | 02.01.2026 | Newcastle Red Bulls | Gloucester Rugby | 25 | 19 | 3 | 3 | 2 | 2 | 2 | 0 | 0 | 0 | 1.986607 |
| 222 | 2025-2026 | 03.01.2026 | Bath Rugby | Exeter Chiefs | 33 | 26 | 5 | 4 | 4 | 3 | 0 | 0 | 0 | 0 | 1.991071 |
| 223 | 2025-2026 | 03.01.2026 | Northampton Saints | Harlequins | 66 | 21 | 10 | 3 | 8 | 3 | 0 | 0 | 0 | 0 | 1.995536 |
| 224 | 2025-2026 | 04.01.2026 | Leicester Tigers | Saracens | 36 | 28 | 5 | 4 | 4 | 4 | 1 | 0 | 0 | 0 | 2.0 |
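
The weights produced by `np.linspace(1, 2, n)` follow a simple closed form: row `i` of `n` gets `1 + i / (n - 1)`. The sketch below reproduces two of the values in the table above; `linear_weight` is a hypothetical helper for illustration. The weighting depends only on row order, so it assumes the rows are sorted from oldest to most recent match.

```python
def linear_weight(i: int, n: int, low: float = 1.0, high: float = 2.0) -> float:
    """Weight of row i (0-based) when n rows are spaced evenly from low to high."""
    return low + (high - low) * i / (n - 1)

n_matches = 225
print(round(linear_weight(220, n_matches), 6))  # 1.982143
print(linear_weight(224, n_matches))            # 2.0
```

Under this scheme the most recent match counts exactly twice as much as the oldest one.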


In [240]:
# Home advantage in tries: mean home tries minus mean away tries per match
home_advantage_tries = past_premierships['Tries_home'].mean() - past_premierships['Tries_away'].mean()
print("Home advantage (average tries):", home_advantage_tries)

# Same measure for penalties
home_advantage_penalties = past_premierships['Penalties_home'].mean() - past_premierships['Penalties_away'].mean()
print("Home advantage (average penalties):", home_advantage_penalties)
# No home advantage is applied to penalties: the difference is small and slightly negative, so it is ignored.

Home advantage (average tries): 1.1511111111111108
Home advantage (average penalties): -0.06222222222222218
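
The figure above is a difference, measured in tries per match. If home advantage is later applied as a multiplier on an expected try count, a dimensionless ratio (mean home tries divided by mean away tries) is one alternative worth comparing. The sketch below computes both on the five first-round rows shown earlier, purely as an illustration; `home_advantage_measures` is a hypothetical helper and the five-row sample is far too small to estimate anything.

```python
def home_advantage_measures(tries_home: list[int], tries_away: list[int]) -> tuple[float, float]:
    """Return (difference, ratio) of mean home tries vs mean away tries."""
    mean_home = sum(tries_home) / len(tries_home)
    mean_away = sum(tries_away) / len(tries_away)
    return mean_home - mean_away, mean_home / mean_away

# Tries from the first five scraped matches (illustration only)
diff, ratio = home_advantage_measures([3, 11, 6, 4, 3], [2, 2, 3, 3, 2])
print(round(diff, 3), round(ratio, 3))
```

The two measures only coincide numerically by accident, so it matters which one the downstream formula expects.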


In [377]:
import pandas as pd

# All teams
teams = pd.unique(past_premierships[["home_team", "away_team"]].values.ravel("K"))

# Initialize
team_stats = {}
attack_points = pd.Series(1.0, index=teams)
defense_points = pd.Series(1.0, index=teams)
attack_tries = pd.Series(1.0, index=teams)
defense_tries = pd.Series(1.0, index=teams)

for team in teams:
    home_games = past_premierships[past_premierships["home_team"] == team]
    away_games = past_premierships[past_premierships["away_team"] == team]

    # Points scored/conceded
    points_scored = (home_games["home_score"]*home_games["weight"]).sum() + \
                    (away_games["away_score"]*away_games["weight"]).sum()
    points_conceded = (home_games["away_score"]*home_games["weight"]).sum() + \
                      (away_games["home_score"]*away_games["weight"]).sum()

    # Tries scored/conceded
    tries_scored = (home_games["Tries_home"]*home_games["weight"]).sum() + \
                   (away_games["Tries_away"]*away_games["weight"]).sum()
    tries_conceded = (home_games["Tries_away"]*home_games["weight"]).sum() + \
                     (away_games["Tries_home"]*away_games["weight"]).sum()

    # Weighted matches
    matches = home_games["weight"].sum() + away_games["weight"].sum()

    # Save per game stats
    team_stats[team] = {
        "points_scored_per_game": points_scored/matches,
        "points_conceded_per_game": points_conceded/matches,
        "tries_scored_per_game": tries_scored/matches,
        "tries_conceded_per_game": tries_conceded/matches
    }

# Total points and tries
#total_points = past_premierships["home_score"].sum() + past_premierships["away_score"].sum()
total_tries  = past_premierships["Tries_home"].sum() + past_premierships["Tries_away"].sum()
# Total games (each row = 1 game)
total_games = len(past_premierships)
# League average per team per game
#league_avg_points = total_points / (2 * total_games)
league_avg_tries  = total_tries / (2 * total_games)

# Relative scores
for team in teams:
#    attack_points[team] = team_stats[team]["points_scored_per_game"]
#    defense_points[team] = team_stats[team]["points_conceded_per_game"]
    attack_tries[team] = team_stats[team]["tries_scored_per_game"]/league_avg_tries
    defense_tries[team] = team_stats[team]["tries_conceded_per_game"]/league_avg_tries

# Weighted conversion rate per team
teams_conv = pd.concat([
    past_premierships[['home_team','Tries_home','Conversions_home','weight']].rename(
        columns={'home_team':'team','Tries_home':'tries','Conversions_home':'conversions'}
    ),
    past_premierships[['away_team','Tries_away','Conversions_away','weight']].rename(
        columns={'away_team':'team','Tries_away':'tries','Conversions_away':'conversions'}
    )
])
team_conversion = teams_conv.groupby('team').apply(
    lambda x: (x['conversions']*x['weight']).sum() / (x['tries']*x['weight']).sum()
)

# League average per team per game - penalties and drops
total_penalties = past_premierships["Penalties_home"].sum() + past_premierships["Penalties_away"].sum()
league_avg_penalties  = total_penalties / (2 * total_games) 
total_drops = past_premierships["Drops_home"].sum() + past_premierships["Drops_away"].sum()
league_avg_drops  = total_drops / (2 * total_games)


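# team_penalty_drop and team_penalty_drop_def (per-team penalties and drop goals
# scored and conceded per game) are assumed to have been computed in an earlier cell
# --- Relative penalties and drops (offensive) ---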
penalties_per_game_rel = team_penalty_drop['penalties_per_game'] / league_avg_penalties
drops_per_game_rel     = team_penalty_drop['drops_per_game'] / league_avg_drops

# --- Relative penalties and drops (defensive) ---
penalties_conceded_per_game_rel = team_penalty_drop_def['penalties_conceded_per_game'] / league_avg_penalties
drops_conceded_per_game_rel     = team_penalty_drop_def['drops_conceded_per_game'] / league_avg_drops

# --- Combine all into team_strengths ---
team_strengths = pd.DataFrame({
    "attack_tries": attack_tries,
    "defense_tries": defense_tries,
    "conversion_rate": team_conversion,
    "attack_penalties": penalties_per_game_rel,
    "attack_drops": drops_per_game_rel,
    "defense_penalties": penalties_conceded_per_game_rel,
    "defense_drops": drops_conceded_per_game_rel
})

team_strengths = team_strengths.sort_values("attack_tries", ascending=False)
team_strengths


| | attack_tries | defense_tries | conversion_rate | attack_penalties | attack_drops | defense_penalties | defense_drops |
|---|---|---|---|---|---|---|---|
| Bristol Bears | 1.291643 | 1.048457 | 0.74387 | 0.869004 | 0.0 | 1.027919 | 1.190066 |
| Bath Rugby | 1.256366 | 0.833847 | 0.752044 | 0.710982 | 0.565252 | 1.03627 | 2.481158 |
| Northampton Saints | 1.158368 | 1.016774 | 0.755482 | 0.879905 | 1.224368 | 1.039991 | 0.53777 |
| Saracens | 1.124291 | 0.893809 | 0.650784 | 1.168249 | 2.159143 | 0.986109 | 0.745489 |
| Gloucester Rugby | 1.007253 | 1.142476 | 0.704966 | 0.630689 | 0.0 | 0.969686 | 0.539897 |
| Exeter Chiefs | 0.972519 | 0.921898 | 0.737683 | 0.88835 | 0.0 | 0.850845 | 2.481665 |
| Leicester Tigers | 0.972191 | 0.850438 | 0.754449 | 1.388427 | 1.371564 | 1.147339 | 1.329942 |
| Sale Sharks | 0.953807 | 0.882135 | 0.729418 | 1.322508 | 3.351867 | 1.128676 | 0.565925 |
| Harlequins | 0.906493 | 1.118217 | 0.760148 | 0.809378 | 0.539575 | 0.778414 | 0.0 |
| Newcastle Red Bulls | 0.542903 | 1.478593 | 0.748248 | 0.885343 | 0.663181 | 0.587885 | 0.0 |
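
The cell above uses `team_penalty_drop` and `team_penalty_drop_def`, whose construction is not shown. The following is a minimal sketch of one way they could be built, assuming they hold recency-weighted per-game rates in the same style as the tries calculation. Whether the original applied the weights is not known, and `penalty_drop_rates` is a hypothetical helper.

```python
import pandas as pd

def penalty_drop_rates(matches: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Weighted per-game penalties and drop goals, scored and conceded, per team."""
    sides = []
    # Stack home and away views so each match contributes one row per team
    for own, opp in (("home", "away"), ("away", "home")):
        sides.append(pd.DataFrame({
            "team": matches[f"{own}_team"],
            "weight": matches["weight"],
            "penalties_per_game": matches[f"Penalties_{own}"],
            "drops_per_game": matches[f"Drops_{own}"],
            "penalties_conceded_per_game": matches[f"Penalties_{opp}"],
            "drops_conceded_per_game": matches[f"Drops_{opp}"],
        }))
    long = pd.concat(sides, ignore_index=True)

    stat_cols = [c for c in long.columns if c not in ("team", "weight")]
    weighted_sums = long[stat_cols].mul(long["weight"], axis=0).groupby(long["team"]).sum()
    weight_sums = long.groupby("team")["weight"].sum()
    rates = weighted_sums.div(weight_sums, axis=0)  # weighted mean per game

    offensive = rates[["penalties_per_game", "drops_per_game"]]
    defensive = rates[["penalties_conceded_per_game", "drops_conceded_per_game"]]
    return offensive, defensive

# Toy data (not real results): two matches between teams A and B
toy = pd.DataFrame({
    "home_team": ["A", "B"], "away_team": ["B", "A"],
    "Penalties_home": [2, 3], "Penalties_away": [1, 1],
    "Drops_home": [0, 1], "Drops_away": [0, 0],
    "weight": [1.0, 2.0],
})
toy_off, toy_def = penalty_drop_rates(toy)
print(toy_off)
```

With real data the call would be `penalty_drop_rates(past_premierships)`. A team with no drop goals in the sample gets a relative strength of exactly 0.0, which then forces its expected drop goals to zero in every future match.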


## 6. Model each component with Poisson

In [375]:
# --- Team name mapping ---
team_name_map = {
    "Bath": "Bath Rugby",
    "Bath Rugby": "Bath Rugby",
    "Exeter": "Exeter Chiefs",
    "Exeter Chiefs": "Exeter Chiefs",
    "Northampton": "Northampton Saints",
    "Northampton Saints": "Northampton Saints",
    "Newcastle Red Bulls": "Newcastle Red Bulls",
    "Newcastle Falcons": "Newcastle Red Bulls", # for premiership table
    "Bristol Bears": "Bristol Bears",
    "Bristol Rugby": "Bristol Bears",
    "Saracens": "Saracens",
    "Leicester Tigers": "Leicester Tigers",
    "Sale Sharks": "Sale Sharks",
    "Gloucester": "Gloucester Rugby",
    "Gloucester Rugby": "Gloucester Rugby",
    "Harlequins": "Harlequins"
}

# --- Apply mapping to df_future ---
df_future['home_team'] = df_future['home_team'].map(team_name_map)
df_future['away_team'] = df_future['away_team'].map(team_name_map)

# --- Apply mapping to premiership table
premiership['team'] = premiership['team'].map(team_name_map)

In [380]:
import pandas as pd
from scipy.stats import poisson

# --- FULL JOINT OUTCOMES SIMULATION ---
def simulate_match_joint(home_team, away_team, team_strengths, sims=10000):
    """
    Simulate a match multiple times and return joint probabilities for
    all 20 mutually exclusive outcomes:
    - 16 for win combinations (winner's bonus × loser's bonus)
    - 4 for draws (draw with/without bonus combinations)
    """
    # Get team stats
    home = team_strengths.loc[home_team]
    away = team_strengths.loc[away_team]

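    # Expected tries: attack strength x opposing defence strength x league average.
    # The home side is also scaled by home_advantage_tries, i.e. the home-minus-away
    # difference in mean tries computed earlier, used here as a multiplier.
    home_tries_exp = home['attack_tries'] * away['defense_tries'] * home_advantage_tries * league_avg_tries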
    away_tries_exp = away['attack_tries'] * home['defense_tries'] * league_avg_tries

    home_pen_exp = home['attack_penalties'] * away['defense_penalties'] * league_avg_penalties
    away_pen_exp = away['attack_penalties'] * home['defense_penalties'] * league_avg_penalties

    home_drop_exp = home['attack_drops'] * away['defense_drops'] * league_avg_drops
    away_drop_exp = away['attack_drops'] * home['defense_drops'] * league_avg_drops
    
    exp = {
        'home_tries': home_tries_exp,
        'away_tries': away_tries_exp,
        'home_converted': home_tries_exp * home['conversion_rate'],
        'away_converted': away_tries_exp * away['conversion_rate'],
        'home_pen': home_pen_exp,
        'away_pen': away_pen_exp,
        'home_drop': home_drop_exp,
        'away_drop': away_drop_exp
    }

    # Initialize counters for the 20 valid joint outcomes (home outcome + "_" + away outcome)
    valid_keys = [
        'win_no_bonus_lose_no_bonus','win_no_bonus_lose_try_bonus','win_no_bonus_lose_close_bonus','win_no_bonus_lose_bonus2',
        'win_bonus_lose_no_bonus','win_bonus_lose_try_bonus','win_bonus_lose_close_bonus','win_bonus_lose_bonus2',
        'lose_no_bonus_win_no_bonus','lose_no_bonus_win_bonus','lose_try_bonus_win_no_bonus','lose_try_bonus_win_bonus',
        'lose_close_bonus_win_no_bonus','lose_close_bonus_win_bonus','lose_bonus2_win_no_bonus','lose_bonus2_win_bonus',
        'draw_no_bonus_draw_no_bonus','draw_no_bonus_draw_bonus','draw_bonus_draw_no_bonus','draw_bonus_draw_bonus'
    ]
    outcome_counts = {k: 0 for k in valid_keys}

    # --- Run simulations ---
    for _ in range(sims):
        home_tries = poisson.rvs(exp['home_tries'])
        away_tries = poisson.rvs(exp['away_tries'])
        home_converted = min(home_tries, poisson.rvs(exp['home_converted']))
        away_converted = min(away_tries, poisson.rvs(exp['away_converted']))
        home_pen = poisson.rvs(exp['home_pen'])
        away_pen = poisson.rvs(exp['away_pen'])
        home_drop = poisson.rvs(exp['home_drop'])
        away_drop = poisson.rvs(exp['away_drop'])

        home_points = home_tries*5 + home_converted*2 + home_pen*3 + home_drop*3
        away_points = away_tries*5 + away_converted*2 + away_pen*3 + away_drop*3

        home_bonus_try = home_tries >= 4
        away_bonus_try = away_tries >= 4
        home_bonus_close = (away_points - home_points) <= 7 and home_points < away_points
        away_bonus_close = (home_points - away_points) <= 7 and away_points < home_points

        # Determine outcomes
        if home_points > away_points:
            home_outcome = 'win_bonus' if home_bonus_try else 'win_no_bonus'
            if away_bonus_try and away_bonus_close:
                away_outcome = 'lose_bonus2'
            elif away_bonus_try:
                away_outcome = 'lose_try_bonus'
            elif away_bonus_close:
                away_outcome = 'lose_close_bonus'
            else:
                away_outcome = 'lose_no_bonus'
        elif home_points < away_points:
            away_outcome = 'win_bonus' if away_bonus_try else 'win_no_bonus'
            if home_bonus_try and home_bonus_close:
                home_outcome = 'lose_bonus2'
            elif home_bonus_try:
                home_outcome = 'lose_try_bonus'
            elif home_bonus_close:
                home_outcome = 'lose_close_bonus'
            else:
                home_outcome = 'lose_no_bonus'
        else:  # draw
            home_outcome = 'draw_bonus' if home_bonus_try else 'draw_no_bonus'
            away_outcome = 'draw_bonus' if away_bonus_try else 'draw_no_bonus'

        outcome_counts[f'{home_outcome}_{away_outcome}'] += 1

    # Convert counts to probabilities
    return {k: v / sims for k, v in outcome_counts.items()}
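
Calling `poisson.rvs` eight times per iteration inside a Python loop is slow for large `sims`. The sketch below is a vectorised variant that draws all simulations at once with NumPy and returns league points directly. It is not the notebook's method: it swaps the Poisson draw for conversions with a binomial draw (each try converted with probability `conversion_rate`), and the function name, rate values, and seed are all illustrative assumptions.

```python
import numpy as np

def simulate_league_points(rates: dict, conv_home: float, conv_away: float,
                           sims: int = 10_000, seed: int = 14):
    """Vectorised match simulation; returns arrays of home and away league points."""
    rng = np.random.default_rng(seed)
    home_tries = rng.poisson(rates["home_tries"], sims)
    away_tries = rng.poisson(rates["away_tries"], sims)
    # Assumption: conversions are binomial given tries, not a separate Poisson draw
    home_conv = rng.binomial(home_tries, conv_home)
    away_conv = rng.binomial(away_tries, conv_away)
    home_kicks = rng.poisson(rates["home_pen"], sims) + rng.poisson(rates["home_drop"], sims)
    away_kicks = rng.poisson(rates["away_pen"], sims) + rng.poisson(rates["away_drop"], sims)

    home_score = 5 * home_tries + 2 * home_conv + 3 * home_kicks
    away_score = 5 * away_tries + 2 * away_conv + 3 * away_kicks
    margin = home_score - away_score

    # 4 points for a win, 2 for a draw, 0 for a loss
    home_pts = np.where(margin > 0, 4, np.where(margin == 0, 2, 0))
    away_pts = np.where(margin < 0, 4, np.where(margin == 0, 2, 0))
    # Try bonus (4+ tries) and losing bonus (defeat by 7 or fewer)
    home_pts = home_pts + (home_tries >= 4) + ((margin < 0) & (margin >= -7))
    away_pts = away_pts + (away_tries >= 4) + ((margin > 0) & (margin <= 7))
    return home_pts, away_pts

# Hypothetical expected rates, for illustration only
demo_rates = {"home_tries": 4.0, "away_tries": 3.0, "home_pen": 1.5,
              "away_pen": 1.5, "home_drop": 0.05, "away_drop": 0.05}
demo_home, demo_away = simulate_league_points(demo_rates, 0.75, 0.72)
print("Mean league points (home, away):", demo_home.mean(), demo_away.mean())
```

The bonus rules mirror those in `simulate_match_joint`, so the 20 joint outcome probabilities could be recovered by tabulating the pairs `(home_pts, away_pts)` together with the bonus flags.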

In [381]:
simulate_match_joint("Bath Rugby", "Bristol Bears", team_strengths, sims=10000)

{'win_no_bonus_lose_no_bonus': 0.016,
 'win_no_bonus_lose_try_bonus': 0.0,
 'win_no_bonus_lose_close_bonus': 0.0238,
 'win_no_bonus_lose_bonus2': 0.0017,
 'win_bonus_lose_no_bonus': 0.3291,
 'win_bonus_lose_try_bonus': 0.2239,
 'win_bonus_lose_close_bonus': 0.0217,
 'win_bonus_lose_bonus2': 0.0968,
 'lose_no_bonus_win_no_bonus': 0.0079,
 'lose_no_bonus_win_bonus': 0.0794,
 'lose_try_bonus_win_no_bonus': 0.0,
 'lose_try_bonus_win_bonus': 0.0667,
 'lose_close_bonus_win_no_bonus': 0.0179,
 'lose_close_bonus_win_bonus': 0.0181,
 'lose_bonus2_win_no_bonus': 0.0016,
 'lose_bonus2_win_bonus': 0.0734,
 'draw_no_bonus_draw_no_bonus': 0.0062,
 'draw_no_bonus_draw_bonus': 0.001,
 'draw_bonus_draw_no_bonus': 0.0006,
 'draw_bonus_draw_bonus': 0.0142}
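
The 20 joint probabilities are easier to sanity-check once collapsed into home win, draw, and away win, and into an expected league-points figure per team. The sketch below does this for the Bath Rugby v Bristol Bears output above. The points values follow the league scoring (4 for a win, 2 for a draw, 1 per bonus point); `bath_bristol`, `points_map`, and the other names are illustrative.

```python
bath_bristol = {
    'win_no_bonus_lose_no_bonus': 0.016, 'win_no_bonus_lose_try_bonus': 0.0,
    'win_no_bonus_lose_close_bonus': 0.0238, 'win_no_bonus_lose_bonus2': 0.0017,
    'win_bonus_lose_no_bonus': 0.3291, 'win_bonus_lose_try_bonus': 0.2239,
    'win_bonus_lose_close_bonus': 0.0217, 'win_bonus_lose_bonus2': 0.0968,
    'lose_no_bonus_win_no_bonus': 0.0079, 'lose_no_bonus_win_bonus': 0.0794,
    'lose_try_bonus_win_no_bonus': 0.0, 'lose_try_bonus_win_bonus': 0.0667,
    'lose_close_bonus_win_no_bonus': 0.0179, 'lose_close_bonus_win_bonus': 0.0181,
    'lose_bonus2_win_no_bonus': 0.0016, 'lose_bonus2_win_bonus': 0.0734,
    'draw_no_bonus_draw_no_bonus': 0.0062, 'draw_no_bonus_draw_bonus': 0.001,
    'draw_bonus_draw_no_bonus': 0.0006, 'draw_bonus_draw_bonus': 0.0142,
}

# League points for each single-team outcome
win_pts = {'win_no_bonus': 4, 'win_bonus': 5}
lose_pts = {'lose_no_bonus': 0, 'lose_try_bonus': 1, 'lose_close_bonus': 1, 'lose_bonus2': 2}
draw_pts = {'draw_no_bonus': 2, 'draw_bonus': 3}

# Joint key is "<home outcome>_<away outcome>" -> (home points, away points)
points_map = {}
for w, wp in win_pts.items():
    for l, lp in lose_pts.items():
        points_map[f'{w}_{l}'] = (wp, lp)   # home win
        points_map[f'{l}_{w}'] = (lp, wp)   # away win
for h, hp in draw_pts.items():
    for a, ap in draw_pts.items():
        points_map[f'{h}_{a}'] = (hp, ap)   # draw

p_home = sum(p for k, p in bath_bristol.items() if k.startswith('win_'))
p_draw = sum(p for k, p in bath_bristol.items() if k.startswith('draw_'))
p_away = sum(p for k, p in bath_bristol.items() if k.startswith('lose_'))
exp_home = sum(p * points_map[k][0] for k, p in bath_bristol.items())
exp_away = sum(p * points_map[k][1] for k, p in bath_bristol.items())

print(round(p_home, 4), round(p_draw, 4), round(p_away, 4))  # 0.713 0.022 0.265
print("Expected league points (home, away):", round(exp_home, 3), round(exp_away, 3))
```

The three collapsed probabilities sum to 1, which confirms that the 20 outcomes are exhaustive for this run.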

In [382]:
# List of all 20 joint outcome keys
joint_keys = [
    'win_no_bonus_lose_no_bonus','win_no_bonus_lose_try_bonus','win_no_bonus_lose_close_bonus','win_no_bonus_lose_bonus2',
    'win_bonus_lose_no_bonus','win_bonus_lose_try_bonus','win_bonus_lose_close_bonus','win_bonus_lose_bonus2',
    'lose_no_bonus_win_no_bonus','lose_no_bonus_win_bonus','lose_try_bonus_win_no_bonus','lose_try_bonus_win_bonus',
    'lose_close_bonus_win_no_bonus','lose_close_bonus_win_bonus','lose_bonus2_win_no_bonus','lose_bonus2_win_bonus',
    'draw_no_bonus_draw_no_bonus','draw_no_bonus_draw_bonus','draw_bonus_draw_no_bonus','draw_bonus_draw_bonus'
]

def predict_fixtures_joint(df_future, team_strengths, sims=5000):
    """
    Run simulate_match_joint for each future fixture, 
    returning a DataFrame where each of the 20 outcomes is a column.
    """
    results = []

    for _, row in df_future.iterrows():
        home = row['home_team']
        away = row['away_team']

        # Run joint simulation
        outcome_probs = simulate_match_joint(home, away, team_strengths, sims)

        # Make sure all keys are present
        row_data = row.to_dict()
        for k in joint_keys:
            row_data[k] = outcome_probs.get(k, 0.0)

        results.append(row_data)

    return pd.DataFrame(results)

In [452]:
np.random.seed(14)
df_predictions = predict_fixtures_joint(df_future, team_strengths, sims=10000)

In [453]:
df_predictions.head()

Unnamed: 0,month,date,time,home_team,away_team,win_no_bonus_lose_no_bonus,win_no_bonus_lose_try_bonus,win_no_bonus_lose_close_bonus,win_no_bonus_lose_bonus2,win_bonus_lose_no_bonus,...,lose_try_bonus_win_no_bonus,lose_try_bonus_win_bonus,lose_close_bonus_win_no_bonus,lose_close_bonus_win_bonus,lose_bonus2_win_no_bonus,lose_bonus2_win_bonus,draw_no_bonus_draw_no_bonus,draw_no_bonus_draw_bonus,draw_bonus_draw_no_bonus,draw_bonus_draw_bonus
0,March 2026,Friday 20th March,19:45,Bath Rugby,Saracens,0.0318,0.0001,0.0438,0.0036,0.3376,...,0.0005,0.0538,0.0394,0.0209,0.0062,0.0624,0.0086,0.0009,0.0026,0.0123
1,March 2026,Saturday 21st March,15:00,Harlequins,Gloucester Rugby,0.0291,0.0001,0.0397,0.0042,0.2323,...,0.0001,0.0795,0.0318,0.0339,0.0015,0.085,0.0137,0.0007,0.0015,0.0164
2,March 2026,Saturday 21st March,15:00,Northampton Saints,Newcastle Red Bulls,0.0243,0.0001,0.0178,0.0003,0.7838,...,0.0,0.0012,0.0055,0.0028,0.0005,0.0053,0.0019,0.0001,0.0002,0.0015
3,March 2026,Saturday 21st March,15:05,Exeter Chiefs,Sale Sharks,0.0546,0.0001,0.0758,0.004,0.2402,...,0.0009,0.0469,0.0753,0.031,0.0094,0.0536,0.0175,0.0015,0.0031,0.0091
4,March 2026,Sunday 22nd March,15:00,Leicester Tigers,Bristol Bears,0.0361,0.0006,0.0437,0.0062,0.2347,...,0.0001,0.0751,0.032,0.0347,0.0019,0.0763,0.0114,0.0021,0.0015,0.0154


## 7. Run simulations to build the Premiership table probabilities

In [454]:
def simulate_once_rugby(fixtures, table):
    table_sim = table.copy()
    points = dict(zip(table_sim["team"], table_sim["pts"]))

    # List of all 20 outcome columns in the fixtures DataFrame
    outcome_keys = [
        'win_no_bonus_lose_no_bonus', 'win_no_bonus_lose_try_bonus', 'win_no_bonus_lose_close_bonus', 'win_no_bonus_lose_bonus2',
        'win_bonus_lose_no_bonus', 'win_bonus_lose_try_bonus', 'win_bonus_lose_close_bonus', 'win_bonus_lose_bonus2',
        'lose_no_bonus_win_no_bonus', 'lose_no_bonus_win_bonus', 'lose_try_bonus_win_no_bonus', 'lose_try_bonus_win_bonus',
        'lose_close_bonus_win_no_bonus', 'lose_close_bonus_win_bonus', 'lose_bonus2_win_no_bonus', 'lose_bonus2_win_bonus',
        'draw_no_bonus_draw_no_bonus', 'draw_no_bonus_draw_bonus', 'draw_bonus_draw_no_bonus', 'draw_bonus_draw_bonus'
    ]

    # Mapping outcome columns to points (home points, away points)
    points_map = {
        # Home wins
        'win_no_bonus_lose_no_bonus': (4, 0),
        'win_no_bonus_lose_try_bonus': (4, 1),
        'win_no_bonus_lose_close_bonus': (4, 1),
        'win_no_bonus_lose_bonus2': (4, 2),
        'win_bonus_lose_no_bonus': (5, 0),
        'win_bonus_lose_try_bonus': (5, 1),
        'win_bonus_lose_close_bonus': (5, 1),
        'win_bonus_lose_bonus2': (5, 2),
    
        # Home loses
        'lose_no_bonus_win_no_bonus': (0, 4),
        'lose_no_bonus_win_bonus': (0, 5),
        'lose_try_bonus_win_no_bonus': (1, 4),
        'lose_try_bonus_win_bonus': (1, 5),
        'lose_close_bonus_win_no_bonus': (1, 4),
        'lose_close_bonus_win_bonus': (1, 5),
        'lose_bonus2_win_no_bonus': (2, 4),
        'lose_bonus2_win_bonus': (2, 5),
    
        # Draws
        'draw_no_bonus_draw_no_bonus': (2, 2),
        'draw_no_bonus_draw_bonus': (2, 3),
        'draw_bonus_draw_no_bonus': (3, 2),
        'draw_bonus_draw_bonus': (3, 3)
    }

    for _, row in fixtures.iterrows():
        home = row["home_team"]
        away = row["away_team"]
        # Sample one of the 20 joint outcomes according to its predicted probability
        probs = [row[k] for k in outcome_keys]
        outcome = np.random.choice(outcome_keys, p=probs)

        home_pts, away_pts = points_map[outcome]
        points[home] += home_pts
        points[away] += away_pts

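    # Rank teams by final points; teams level on points are ordered arbitrarily
    # (no league tie-break rules are applied)
    result_df = table_sim.copy()
    result_df["pts"] = result_df["team"].map(points)
    result_df = result_df.sort_values(["pts"], ascending=False)
    result_df["position"] = np.arange(1, len(result_df)+1)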

    return result_df


# -------------------------------
# RUN MULTIPLE SIMULATIONS
# -------------------------------
def run_simulations_rugby(fixtures, table, n_sim=10000):
    """
    Run many simulations to get probability distribution of finishing positions.
    Returns a DataFrame: rows=positions, columns=teams, values=counts.
    """
    position_counts = {team: np.zeros(len(table)) for team in table["team"]}

    for _ in range(n_sim):
        final_table = simulate_once_rugby(fixtures, table)
        for _, row in final_table.iterrows():
            position_counts[row["team"]][row["position"]-1] += 1

    pos_df = pd.DataFrame(position_counts, index=np.arange(1, len(table)+1))
    pos_df.index.name = "position"
    return pos_df
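
The functions above draw one outcome per fixture per simulation using `iterrows`, which can be slow for large `n_sim`. As a possible alternative, the same sampling idea can be sketched by drawing all simulations for a fixture in one call. This is an illustration under assumptions, not the notebook's method: the function name, the integer team indices and the toy three-outcome league are all hypothetical, and it will not reproduce the random stream of `simulate_once_rugby`.

```python
import numpy as np

def simulate_points_vectorised(home_idx, away_idx, probs, pts_home, pts_away,
                               start_pts, n_sim=1000, seed=14):
    """
    home_idx, away_idx : team index of the home/away side for each fixture
    probs              : (n_fixtures, n_outcomes) outcome probabilities
    pts_home, pts_away : league points awarded to each side per outcome
    start_pts          : current league points per team
    Returns an (n_sim, n_teams) array of simulated final points.
    """
    probs = np.asarray(probs, dtype=float)
    pts_home = np.asarray(pts_home)
    pts_away = np.asarray(pts_away)
    rng = np.random.default_rng(seed)
    totals = np.tile(np.asarray(start_pts, dtype=float), (n_sim, 1))
    for f, (h, a) in enumerate(zip(home_idx, away_idx)):
        # One outcome draw per simulation for this fixture
        outcomes = rng.choice(probs.shape[1], size=n_sim, p=probs[f])
        totals[:, h] += pts_home[outcomes]
        totals[:, a] += pts_away[outcomes]
    return totals

# Toy league: 2 teams, 1 fixture, 3 outcomes (home win, draw, away win)
totals = simulate_points_vectorised(
    home_idx=[0], away_idx=[1],
    probs=[[1.0, 0.0, 0.0]],          # home win is certain in this toy case
    pts_home=[4, 2, 0], pts_away=[0, 2, 4],
    start_pts=[10, 12], n_sim=5,
)
# Position of each team in each simulation (1 = top); ties follow team order
positions = (-totals).argsort(axis=1, kind="stable").argsort(axis=1) + 1
```

With the 20 rugby outcomes, `pts_home` and `pts_away` would be read from `points_map` in the same order as the probability columns.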


In [455]:
np.random.seed(14)

n_sim = 10000

# Run simulations
position_counts = run_simulations_rugby(df_predictions, premiership, n_sim=n_sim)

# Convert counts to probabilities
position_probs = position_counts / n_sim
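
Because each probability is a proportion over `n_sim` simulated seasons, its simulation noise can be approximated with the binomial standard error $\sqrt{p(1-p)/n}$. The sketch below treats the simulations as independent trials; the helper name `mc_standard_error` is illustrative. It measures Monte Carlo noise only, not uncertainty in the underlying match model.

```python
import numpy as np

def mc_standard_error(p, n_sim):
    """Binomial standard error of a probability estimated from n_sim independent runs."""
    p = np.asarray(p, dtype=float)
    return np.sqrt(p * (1.0 - p) / n_sim)

# Worst case (p = 0.5) with the 10,000 simulations used in this notebook
se = float(mc_standard_error(0.5, 10_000))
print(f"{se:.4f}")  # 0.0050
```

So with 10,000 runs, an estimated probability is accurate to roughly half a percentage point (one standard error) at worst.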

In [456]:
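# The index currently holds positions; after transposing, this name labels
# the column axis and is displayed in the table's top-left corner
position_probs.index.name = "TEAM"
position_distribution_t = position_probs.T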

In [457]:
# Re-normalise each team's row so it sums to 100 and express it as percentages
position_distribution_pct = position_distribution_t.div(
    position_distribution_t.sum(axis=1),
    axis=0
) * 100


In [458]:
position_distribution_pct

TEAM,1,2,3,4,5,6,7,8,9,10
Northampton Saints,33.98,32.43,17.74,10.05,4.25,1.49,0.06,0.0,0.0,0.0
Bath Rugby,49.65,28.29,12.88,5.95,2.56,0.66,0.01,0.0,0.0,0.0
Bristol Bears,10.37,21.25,29.99,20.7,12.06,5.35,0.28,0.0,0.0,0.0
Leicester Tigers,3.55,8.72,17.94,25.49,24.92,17.66,1.72,0.0,0.0,0.0
Exeter Chiefs,1.51,5.13,11.61,20.03,28.45,29.5,3.76,0.01,0.0,0.0
Saracens,0.94,4.18,9.81,17.41,26.01,36.61,5.02,0.02,0.0,0.0
Sale Sharks,0.0,0.0,0.03,0.37,1.75,8.7,83.53,5.16,0.46,0.0
Gloucester Rugby,0.0,0.0,0.0,0.0,0.0,0.03,3.96,57.74,38.2,0.07
Harlequins,0.0,0.0,0.0,0.0,0.0,0.0,1.66,37.05,60.91,0.38
Newcastle Red Bulls,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.43,99.55
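
Key outcomes such as a top-four (play-off) finish follow by summing the relevant position columns. A minimal sketch, using the position 1-4 percentages of three teams copied from the table above; the helper `prob_top_n` is an illustrative name and assumes the columns are integer positions, as in `position_distribution_pct`.

```python
import pandas as pd

# Percentages for positions 1-4, copied from the table above (three teams only)
pct = pd.DataFrame(
    {1: [33.98, 49.65, 10.37], 2: [32.43, 28.29, 21.25],
     3: [17.74, 12.88, 29.99], 4: [10.05, 5.95, 20.70]},
    index=["Northampton Saints", "Bath Rugby", "Bristol Bears"],
)

def prob_top_n(dist_pct, n):
    """Sum the percentage columns for positions 1..n (label-based, inclusive slice)."""
    return dist_pct.loc[:, 1:n].sum(axis=1)

top4 = prob_top_n(pct, 4).round(2)
print(top4)
```

Applied to the full table, `prob_top_n(position_distribution_pct, 1)` would give the probability of finishing top of the table, and `n=4` the play-off probability.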


## 8. Preview and present the results graphically

In [459]:
# Build label mapping "position  team", padded with non-breaking spaces
# (extra padding for positions 1-9 so the team names line up)
team_labels = (
    premiership[["team", "position"]]
    .assign(
        label=lambda df: df.apply(
            lambda r: f"{r['position']}{'&nbsp;&nbsp;&nbsp;&nbsp;' if r['position'] < 10 else '&nbsp;&nbsp;'}{r['team']}",
            axis=1
        )
    )
    .set_index("team")["label"]
)


# Apply labels to your table index
position_distribution_pct.index = position_distribution_pct.index.map(team_labels)

# Drop position column if present
position_distribution_pct = position_distribution_pct.drop(columns=["position"], errors="ignore")

# Remove index name
position_distribution_pct.index.name = None


In [460]:
# Copy to avoid modifying original
position_distribution_pct = position_distribution_pct.copy()

# Append current points from the premiership table to each team label
# (relies on both tables listing the teams in the same order)
position_distribution_pct.index = [
    f"{team} ({pts})" 
    for team, pts in zip(position_distribution_pct.index, premiership["pts"])
]

In [462]:
greens = plt.cm.Greens
green_cmap = LinearSegmentedColormap.from_list(
    "Greens_soft",
    greens(np.linspace(0.03, 0.65, 256))
)

vmax = 45

def zero_style(val):
    if val < 0.005:
        return "background-color: white !important;"
    return ""

# ---- transform ONLY for colouring ----
color_data = position_distribution_pct.copy()
color_data = (color_data / vmax).pow(0.65) * vmax

position_distribution_pct.style \
    .background_gradient(
        cmap=green_cmap,
        vmin=0,
        vmax=vmax,
        gmap=color_data,
        axis=None          # required: a DataFrame gmap only works with axis=None
    ) \
    .applymap(zero_style) \
    .format("{:.2f}%") \
    .set_table_styles([
        {"selector": "th", "props": [
            ("background-color", "#e6edf4"),
            ("color", "#333"),
            ("text-align", "center"),
            ("font-family", "Inter, Roboto, Arial, sans-serif"),
            ("font-size", "13px"),
            ("font-weight", "600")
        ]},

        {"selector": "th.col_heading", "props": [
            ("text-align", "center")
        ]},

        {"selector": "th.row_heading", "props": [
            ("text-align", "left"),
            ("font-size", "13px"),
            ("font-weight", "600"),
            ("white-space", "nowrap"),
            ("max-width", "250px"),
            ("overflow", "hidden"),
            ("text-overflow", "ellipsis")
        ]},

        {"selector": "tr:nth-child(odd) th.row_heading", "props": [
            ("background-color", "#fbfcfe")
        ]},
        {"selector": "tr:nth-child(even) th.row_heading", "props": [
            ("background-color", "#e6edf4")
        ]},

        {"selector": "td", "props": [
            ("text-align", "center"),
            ("font-family", "Inter, Roboto, Arial, sans-serif"),
            ("font-size", "12px"),
            ("font-weight", "500"),
            ("color", "#000")
        ]}
    ])



TEAM,1,2,3,4,5,6,7,8,9,10
1 Northampton Saints (43),33.98%,32.43%,17.74%,10.05%,4.25%,1.49%,0.06%,0.00%,0.00%,0.00%
2 Bath Rugby (41),49.65%,28.29%,12.88%,5.95%,2.56%,0.66%,0.01%,0.00%,0.00%,0.00%
3 Bristol Bears (37),10.37%,21.25%,29.99%,20.70%,12.06%,5.35%,0.28%,0.00%,0.00%,0.00%
4 Leicester Tigers (36),3.55%,8.72%,17.94%,25.49%,24.92%,17.66%,1.72%,0.00%,0.00%,0.00%
5 Exeter Chiefs (35),1.51%,5.13%,11.61%,20.03%,28.45%,29.50%,3.76%,0.01%,0.00%,0.00%
6 Saracens (32),0.94%,4.18%,9.81%,17.41%,26.01%,36.61%,5.02%,0.02%,0.00%,0.00%
7 Sale Sharks (20),0.00%,0.00%,0.03%,0.37%,1.75%,8.70%,83.53%,5.16%,0.46%,0.00%
8 Gloucester Rugby (11),0.00%,0.00%,0.00%,0.00%,0.00%,0.03%,3.96%,57.74%,38.20%,0.07%
9 Harlequins (10),0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,1.66%,37.05%,60.91%,0.38%
10 Newcastle Red Bulls (5),0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.02%,0.43%,99.55%
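
The colouring in the styled table uses a power-law rescaling, `(value / vmax) ** 0.65 * vmax`, so that small probabilities remain visible while the displayed numbers stay unchanged. A small sketch of that transform in isolation; the function name `soften` and the sample values are illustrative.

```python
import numpy as np

def soften(values, vmax=45, gamma=0.65):
    """Power-law rescaling used only for colouring: (v / vmax) ** gamma * vmax."""
    values = np.asarray(values, dtype=float)
    return (values / vmax) ** gamma * vmax

shades = soften([0, 5, 45])
print(shades.round(2))
```

The endpoints 0 and `vmax` map to themselves, while values in between are lifted (an exponent below 1 brightens the low end), which is why cells with only a few percent still receive a visible tint.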
