<h1>Matildas Result Prediction SVR Model <i>(February 2025 Update)</i></h1>

<h3>Function of the model</h3>
<p>This is an updated version of a support vector regression (SVR) model that predicts the outcome of future matches played by the Australian women's national football team (the Matildas). It is trained on all of the team's home and away results from the past five years (2020 through 7 December 2024).</p>

<h2>Model training code with explanations</h2>

<p>Import the required libraries:</p>

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.metrics import mean_absolute_error, mean_squared_error, root_mean_squared_error, r2_score

<p>Read the home and away win-loss data for the past five years from Excel files, label each set with its location, and concatenate the two into a single DataFrame:</p>

In [None]:
home_data = pd.read_excel('matildas_winlossHomeFeb2025Update.xlsx')
away_data = pd.read_excel('matildas_winlossAwayLast5Years.xlsx')

home_data['location'] = 'Home'
away_data['location'] = 'Away'

data = pd.concat([home_data, away_data], ignore_index = True)
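
<p>As a minimal sketch of what the concatenation does, the example below uses two hypothetical one-row frames in place of the spreadsheets (the team names and column subset are illustrative only). It shows that <code>ignore_index = True</code> gives the combined frame a fresh 0..n-1 index and that the <code>location</code> labels survive the merge:</p>

```python
import pandas as pd

# Hypothetical stand-ins for the two spreadsheets (not the real data).
home = pd.DataFrame({'home_team': ['Australia'], 'away_team': ['Japan']})
away = pd.DataFrame({'home_team': ['Colombia'], 'away_team': ['Australia']})

# Label each set before merging, as in the cell above.
home['location'] = 'Home'
away['location'] = 'Away'

# ignore_index=True discards the per-file indices and renumbers the rows.
combined = pd.concat([home, away], ignore_index=True)
print(combined)
```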

In [None]:
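# Encode the outcome as one numeric target: win = 1, draw = 0.5, loss = 0.
data['result'] = data['win'] * 1 + data['draw'] * 0.5 + data['lose'] * 0
# The indicator columns are no longer needed once the target exists.
data = data.drop(['win', 'draw', 'lose'], axis = 1)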
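
<p>The encoding only behaves as intended if every row has exactly one of <code>win</code>, <code>draw</code> and <code>lose</code> set. The sketch below applies the same formula to three hypothetical rows and checks that assumption first; the toy values are not taken from the real spreadsheets:</p>

```python
import pandas as pd

# Hypothetical toy rows; the real spreadsheets are not reproduced here.
toy = pd.DataFrame({
    'win':  [1, 0, 0],
    'draw': [0, 1, 0],
    'lose': [0, 0, 1],
})

# Each match should have exactly one outcome flag set.
assert (toy[['win', 'draw', 'lose']].sum(axis=1) == 1).all()

# Same formula as the notebook cell: win = 1, draw = 0.5, loss = 0.
toy['result'] = toy['win'] * 1 + toy['draw'] * 0.5 + toy['lose'] * 0
encoded = toy['result'].tolist()
print(encoded)
```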

In [None]:
features = data[['home_team', 'away_team', 'home_score', 'away_score', 'tournament', 'city', 'country', 'location']]
target = data['result']

In [None]:
num_features = ['home_score', 'away_score']
cat_features = ['home_team', 'away_team', 'tournament', 'city', 'country', 'location']

In [None]:
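# Standardise the numeric columns and one-hot encode the categorical ones.
# handle_unknown = 'ignore' encodes categories unseen during fitting as all zeros.
preprocessor = ColumnTransformer(
    transformers = [
        ('num', StandardScaler(), num_features),
        ('cat', OneHotEncoder(handle_unknown = 'ignore', drop = 'first'), cat_features)
        ])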

In [None]:
# Chain the preprocessing step and a linear-kernel support vector regressor.
pipeline = Pipeline(steps = [
    ('preprocessor', preprocessor),
    ('regressor', SVR(kernel = 'linear'))
])

In [None]:
# Hold out 20% of the matches for evaluation; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size = 0.2, random_state = 42)

In [None]:
pipeline.fit(X_train, y_train)

In [None]:
y_pred = pipeline.predict(X_test)

In [None]:
home_teams = ['Japan', 'United States', 'Australia']
away_teams = ['Australia', 'Australia', 'Colombia']
cities = ['Houston', 'Glendale', 'San Diego']
country = 'United States'
tournament = 'SheBelieves Cup'
venues = ['Shell Energy Stadium', 'State Farm Stadium', 'Snapdragon Stadium']

In [None]:
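# Build the feature frame for the upcoming fixtures.
future_matches = pd.DataFrame({
    'home_team': home_teams,
    'away_team': away_teams,
    'tournament': [tournament] * 3,
    'city': cities,
    'country': [country] * 3,
    # The model was trained with 'Home'/'Away' here, not venue names. All three
    # matches are played in the United States, so they are assumed to be away games.
    'location': ['Away'] * 3,
    # Scores are unknown before kick-off, so 0-0 is used as a placeholder.
    'home_score': [0] * 3,
    'away_score': [0] * 3
})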
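
<p>Because the encoder silently ignores categories it did not see during training, it can help to compare new fixtures against the training data before predicting. The following is a minimal sketch under assumed toy data; <code>unseen_categories</code> is a hypothetical helper, not part of pandas or scikit-learn:</p>

```python
import pandas as pd

# Hypothetical training rows and fixtures, for illustration only.
train = pd.DataFrame({
    'city': ['Sydney', 'Melbourne'],
    'location': ['Home', 'Away'],
})
future = pd.DataFrame({
    'city': ['Houston', 'Sydney'],
    'location': ['Shell Energy Stadium', 'Home'],
})

def unseen_categories(train_df, new_df, columns):
    """Return, per column, the values in new_df that never occur in train_df."""
    report = {}
    for col in columns:
        missing = sorted(set(new_df[col]) - set(train_df[col]))
        if missing:
            report[col] = missing
    return report

report = unseen_categories(train, future, ['city', 'location'])
print(report)
```

<p>In the notebook, the same check could be run with <code>X_train</code>, <code>future_matches</code> and <code>cat_features</code> to see which columns contribute no information to a given prediction.</p>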

In [None]:
predicted_results = pipeline.predict(future_matches)

In [None]:
future_matches['raw_predictiondata'] = predicted_results

In [None]:
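def classify_result(predicted):
    # Targets are encoded as win = 1, draw = 0.5, loss = 0, so map each
    # prediction to the nearest of those values (cut-offs at 0.75 and 0.25).
    if predicted >= 0.75:
        return 'Win'
    elif predicted <= 0.25:
        return 'Lose'
    else:
        return 'Draw'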

In [None]:
future_matches['predicted_result'] = future_matches['raw_predictiondata'].apply(classify_result)

In [None]:
future_matches = future_matches.drop(columns = ['home_score', 'away_score'])

In [None]:
future_matches

In [None]:
mae = mean_absolute_error(y_test, y_pred)
print("Mean absolute error:", mae)

In [None]:
mse = mean_squared_error(y_test, y_pred)
print("Mean squared error:", mse)

In [None]:
rmse = root_mean_squared_error(y_test, y_pred)
print("Root mean squared error:", rmse)

In [None]:
r2 = r2_score(y_test, y_pred)
print("R-squared value:", r2)
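
<p>To make the four regression metrics concrete, the sketch below computes them by hand in plain Python on a handful of hypothetical targets and predictions (on the same 0 / 0.5 / 1 scale as <code>result</code>). It is meant to show what the scikit-learn functions measure, not to replace them:</p>

```python
import math

# Hypothetical targets and predictions on the 0 / 0.5 / 1 result scale.
y_true = [1.0, 0.5, 0.0, 1.0]
y_hat = [0.5, 0.5, 0.5, 1.0]

n = len(y_true)
errors = [t - p for t, p in zip(y_true, y_hat)]

# Mean absolute error: average size of the errors.
mae = sum(abs(e) for e in errors) / n
# Mean squared error: average squared error, penalising large misses more.
mse = sum(e ** 2 for e in errors) / n
# Root mean squared error: square root of the MSE, back on the target's scale.
rmse = math.sqrt(mse)

# R-squared: 1 minus the ratio of residual variation to total variation.
mean_true = sum(y_true) / n
ss_res = sum(e ** 2 for e in errors)
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

print(mae, mse, rmse, r2)
```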

In [None]:
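# Round each prediction to the nearest label (0, 0.5 or 1) so that draws can be matched too.
y_pred_class = [min(max(round(pred * 2) / 2, 0), 1) for pred in y_pred]
accuracy = (y_test == y_pred_class).mean()
print("Accuracy:", accuracy)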