# Team Ranking based on Colley's method
Resources used:
- https://www.colleyrankings.com/matrate.pdf (**Colley's own description of the method**: reliable, but lengthy and with poor notation)
- https://www.dcs.bbk.ac.uk/~ale/dsta+dsat/dsta+dsat-3/lm-ch3-colley.pdf (watch out for small errors)

###  Method Description
One can imagine a simple ranking system that rates each team $i$ by its winning percentage

$$r_i = \frac{w_i}{t_i},$$

where $w_i$ is the number of wins and $t_i$ the total number of games played by team $i$. Colley proposed a simple modification which
- takes the strength of the opponents into account
- makes the ratings "zero sum" (teams steal rating score from each other; the ratings do not actually sum to 0, but to $n_{\text{teams}}/2$)
- is more stable initially (a team that loses its first game does not drop to rating 0, but moves more slowly away from the equilibrium rating of $\frac{1}{2}$)

by instead using the (Laplace modified) winning percentage

$$r_i = \frac{w_i+1}{t_i+2}.$$
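As a small illustration of the difference between the two formulas, the sketch below evaluates both for a team early in the season. The function names `raw_rating` and `laplace_rating` are made up for this example and are not part of any library.

```python
def raw_rating(wins, games):
    """Plain winning percentage w / t (undefined before the first game)."""
    return wins / games


def laplace_rating(wins, games):
    """Laplace-modified winning percentage (w + 1) / (t + 2)."""
    return (wins + 1) / (games + 2)


# Before any game is played the Laplace rating sits at the equilibrium 1/2.
start = laplace_rating(0, 0)

# After a single loss the raw rating is 0, the Laplace rating only drops to 1/3.
after_loss_raw = raw_rating(0, 1)
after_loss_laplace = laplace_rating(0, 1)

print(start, after_loss_raw, after_loss_laplace)
```

The raw rating cannot be evaluated at all for `games = 0`, which is one more reason the modified form is convenient at the start of a season.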

With $l_i$ the number of losses of team $i$ (so that $t_i = w_i + l_i$), we may write

$$ w_i = \frac{w_i-l_i}{2} + \frac{w_i+l_i}{2} = \frac{w_i-l_i}{2} + \frac{t_i}{2}$$

where we may estimate the term $\frac{t_i}{2}$ as

$$ \frac{t_i}{2} = \sum_{k=1}^{t_i} \frac{1}{2} \approx \sum_{j\in \mathcal{O}_i} n_{i,j} r_j $$

where $\mathcal{O}_i$ is the set of opponents of team $i$ and $n_{i,j}$ is the number of matches between $i$ and $j$. The approximation is justified by the fact that the equilibrium rating is $\frac{1}{2}$; it may be poor if several teams only win or only lose, which should not be a problem with fair match-ups. We may then write

$$ r_i \approx \frac{\frac{w_i-l_i}{2} + \sum_{j\in \mathcal{O}_i} n_{i,j} r_j +1}{t_i+2} $$

where, under this approximation, the rating of each team depends on the ratings of its opponents. Rearranging gives a system of $n_{\text{teams}}$ linear equations in $n_{\text{teams}}$ unknowns, one for each $i \in \{ 1,2,\dots,n_{\text{teams}} \}$:

$$ (t_i + 2)r_i - \sum_{j\in \mathcal{O}_i} n_{i,j} r_j = \frac{w_i-l_i}{2} + 1 $$

or, in matrix form

$$ Cr = b $$

where $b_i = \frac{w_i-l_i}{2} +1 $ and the diagonal elements of $C$ are $C_{i,i}=t_i+2$ and the off-diagonal elements are $C_{i,j} = -n_{i,j}$ for $i \neq j$.
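To make the definitions of $C$ and $b$ concrete, here is a minimal sketch for a hypothetical three-team league without draws. The helper `colley_system` and the list `games` are illustrative assumptions, not part of the original notebook; each game is given as a `(winner, loser)` pair of team indices.

```python
import numpy as np


def colley_system(n_teams, games):
    """Build the Colley matrix C and vector b from (winner, loser) index pairs."""
    C = 2.0 * np.eye(n_teams)   # diagonal starts at 2, becomes t_i + 2
    b = np.ones(n_teams)        # starts at 1, becomes 1 + (w_i - l_i) / 2
    for winner, loser in games:
        C[winner, winner] += 1  # one more game for the winner
        C[loser, loser] += 1    # one more game for the loser
        C[winner, loser] -= 1   # off-diagonal entries are -n_ij
        C[loser, winner] -= 1
        b[winner] += 0.5
        b[loser] -= 0.5
    return C, b


# Hypothetical results: team 0 beats teams 1 and 2, team 1 beats team 2.
games = [(0, 1), (0, 2), (1, 2)]
C, b = colley_system(3, games)
r = np.linalg.solve(C, b)

print(C)
print(b)
print(r, r.sum())
```

For this toy league the ratings come out as 0.7, 0.5 and 0.3, and they sum to $n_{\text{teams}}/2 = 1.5$, in line with the "zero sum" property.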

Note that $C$ is
- symmetric
- real
- positive definite (discussed by Colley)

hence we may solve the matrix equation to rate teams by first carrying out a Cholesky decomposition of $C$.
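A minimal sketch of the Cholesky route, using NumPy only and a small hard-coded system for three hypothetical teams (the numbers are an assumption chosen for illustration): factor $C = LL^T$, then solve $Ly = b$ followed by $L^T r = y$.

```python
import numpy as np

# Hypothetical Colley system for three teams that each played the other two once.
C = np.array([[4.0, -1.0, -1.0],
              [-1.0, 4.0, -1.0],
              [-1.0, -1.0, 4.0]])
b = np.array([2.0, 1.0, 0.0])

# Cholesky factor: lower-triangular L with C = L @ L.T
# (np.linalg.cholesky raises LinAlgError if C is not positive definite).
L = np.linalg.cholesky(C)

# Two solves with the factor: first L y = b, then L^T r = y.
y = np.linalg.solve(L, b)
r = np.linalg.solve(L.T, y)

print(r)
```

`np.linalg.solve` does not exploit the triangular structure of `L`; if SciPy is available, `scipy.linalg.cho_factor` together with `scipy.linalg.cho_solve` is one way to do so.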

### Applying the method
We will apply the method to the 2021 results of the Swedish soccer league 'Allsvenskan' to rank the teams. We can then compare the rankings to the official results found at https://www.svenskfotboll.se/serier-cuper/tabell-och-resultat/allsvenskan-2021/88307/.

In [19]:
import pandas as pd
import numpy as np
# Load the Allsvenskan match results
data_temp = pd.read_csv('C:/Users/xsoni/Desktop/allsvenskan.txt')


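# Reformat dates as integers (YYYYMMDD) so that the 2021 season can be selected by comparison
data_temp["Date"] = data_temp["Date"].str.replace("-", "").astype(int)
# .copy() so that adding the 'Result' column later does not write into a view of data_temp
df = data_temp[(20210000 < data_temp['Date']) & (data_temp['Date'] < 20220000)].copy()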


# The "Result" value is 'H' if the home team won, 'D' for a draw and 'A' if the away team won
def winner(row):
    if row['FTHG']>row['FTAG']:
        return 'H'
    elif row['FTHG']==row['FTAG']:
        return 'D'
    else:
        return 'A'

# Add the match outcome (home win / draw / away win) as a new column
df['Result'] = df.apply(winner, axis=1)

# Pick only what we need: H_team, A_team, Date (in case) and Result
allsvenskan2021 = df[['Date', 'H_team', 'A_team', 'Result']]

allsvenskan2021.head()

| | Date | H_team | A_team | Result |
|---|---|---|---|---|
| 1922 | 20210410 | MalmoFF | Hammarby | H |
| 1923 | 20210410 | Orebro | Goteborg | D |
| 1924 | 20210411 | Mjallby | Varberg | D |
| 1925 | 20210411 | Halmstad | Hacken | H |
| 1926 | 20210411 | Norrkoping | Sirius | D |


In [65]:
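# Collect every team that appears as home or away side;
# sorting gives a fixed order for the rows and columns of C
A_teams = allsvenskan2021['A_team'].unique()
H_teams = allsvenskan2021['H_team'].unique()
teams = sorted(set(np.concatenate((A_teams, H_teams), axis=0)))
n_teams = len(teams)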

# Initialize Colley matrix C and vector b
C = np.zeros([n_teams, n_teams])
b = np.zeros(n_teams)

# https://github.com/PercyJaiswal/Colley_Rankings/blob/master/Colley.py

- Draw: Half a win, half a loss
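
One way to read the draw convention above: a draw adds half a win and half a loss to both teams, so $w_i - l_i$ (and hence $b_i$) is unchanged while $t_i$ and $n_{i,j}$ still increase by one. The sketch below fills $C$ and $b$ under that assumption for results coded 'H', 'D', 'A'. The function `colley_from_results`, the team names and the two matches are hypothetical and only serve as an illustration.

```python
import numpy as np


def colley_from_results(teams, results):
    """Build C and b from (home, away, result) tuples, result in {'H', 'D', 'A'}."""
    index = {team: k for k, team in enumerate(teams)}
    n = len(teams)
    C = 2.0 * np.eye(n)
    b = np.ones(n)
    for home, away, result in results:
        h, a = index[home], index[away]
        # Every match, drawn or not, counts as one game for both teams.
        C[h, h] += 1
        C[a, a] += 1
        C[h, a] -= 1
        C[a, h] -= 1
        if result == 'H':
            b[h] += 0.5
            b[a] -= 0.5
        elif result == 'A':
            b[a] += 0.5
            b[h] -= 0.5
        # 'D': half a win and half a loss each, so w - l and b stay unchanged.
    return C, b


# Hypothetical mini-league: A beats B at home, then B and C draw.
teams = ['A', 'B', 'C']
results = [('A', 'B', 'H'), ('B', 'C', 'D')]
C, b = colley_from_results(teams, results)
r = np.linalg.solve(C, b)

print(dict(zip(teams, r)))
```

In this toy example team C, whose only result is a draw against the weaker team B, ends up slightly below the equilibrium rating of $\frac{1}{2}$.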