# TrueSkill

Purpose: Create a feature that rates every team in every season using the TrueSkill rating system.

### Notebook Outline

- Import Libraries
- Upload Data
- Create Rating Function
- Create TrueSkill Ratings
- Create TrueSkill Dataframe

### Import Libraries

In [8]:
import pandas as pd
from trueskill import Rating, rate_1vs1
from collections import defaultdict

### Upload Data

In [9]:
results = pd.read_csv('Data/Stage2DataFiles/RegularSeasonCompactResults.csv')
tourney = pd.read_csv('Data/Stage2DataFiles/NCAATourneyCompactResults.csv')
seeds = pd.read_csv('Data/Stage2DataFiles/NCAATourneySeeds.csv')
teams = pd.read_csv('Data/Stage2DataFiles/Teams.csv')

### Create Rating Function

In [12]:
def get_ratings(season):
    '''
    Compute a TrueSkill rating for every team that played in the given season.

    Every team starts with the default rating. The results dataframe is filtered
    to the given season, and each game in that season updates the ratings of the
    winning and losing teams. The more games a team plays, the more reliable its
    rating becomes.
    -
    Inputs:
    season: the season (year) to rate
    -
    Outputs:
    Dictionary mapping each team ID to its mean TrueSkill rating (mu)
    '''
    # start all teams with a default rating
    ratings = defaultdict(Rating)         
    # get data for season
    current_results = results[results['Season'] == season]                                           
    # at the start, all teams are equal which is not realistic so we loop
    # through the season's games several times to get better starting ratings
    for epoch in range(10):                                 
        # loop through the games in order
        for _, row in current_results.sort_values('DayNum').iterrows():                                                    
            wteamid = row['WTeamID']                                                                 
            lteamid = row['LTeamID']    
            # have TrueSkill compute new ratings based on the game result
            ratings[wteamid], ratings[lteamid] = rate_1vs1(ratings[wteamid], ratings[lteamid])       
    # just keep the mean rating
    return {team_id: rating.mu for team_id, rating in ratings.items()}

### Create TrueSkill Ratings

The function above is slow, so we use multiprocessing to compute the ratings for all seasons in parallel, one season per task.

In [13]:
from multiprocessing import Pool

seasons = results['Season'].unique()
# the pool is cleaned up automatically when the with-block exits
with Pool() as p:
    ratings = p.map(get_ratings, seasons)

# put ratings into a dict for easy access
ratings = dict(zip(seasons, ratings))

# let's take a look at the 2019 ratings; map team IDs to names first
team_names = dict(zip(teams['TeamID'], teams['TeamName']))

Create a ratings DataFrame for 2019, sorted from highest to lowest rating.

In [14]:
# the first column holds team names (looked up from team_names), not IDs
ratings_2019 = [(team_names[t], r) for t, r in ratings[2019].items()]
pd.DataFrame(ratings_2019, columns=['TeamName', 'Rating']).sort_values('Rating', ascending=False)

| | TeamName | Rating |
|---|---|---|
| 58 | Virginia | 43.571669 |
| 104 | Duke | 42.585001 |
| 44 | North Carolina | 41.289900 |
| 249 | Tennessee | 40.798078 |
| 318 | Houston | 40.511739 |
| ... | ... | ... |
| 149 | MS Valley St | 10.735895 |
| 141 | MD E Shore | 10.562677 |
| 5 | Alabama A&M | 10.394607 |
| 103 | Delaware St | 9.887045 |


Create a list of dictionaries that includes each team's TrueSkill rating for each year.

In [15]:
all_ratings_list = []
for year, season_ratings in ratings.items():
    for team, rating in season_ratings.items():
        # 'ts_rank' holds the mean TrueSkill rating (mu), not an ordinal rank
        all_ratings_list.append({'year': year, 'team': team, 'ts_rank': rating})

### Create TrueSkill Dataframe

In [16]:
true_skill_df = pd.DataFrame(all_ratings_list)

In [17]:
true_skill_df.head()

| | year | team | ts_rank |
|---|---|---|---|
| 0 | 1985 | 1228 | 37.405463 |
| 1 | 1985 | 1328 | 37.848866 |
| 2 | 1985 | 1417 | 30.283133 |
| 3 | 1985 | 1225 | 18.579489 |
| 4 | 1985 | 1412 | 33.268603 |
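
Despite its name, the `ts_rank` column holds each team's mean rating rather than an ordinal position. If a within-season rank is also wanted, one way to derive it from a single season's rating dictionary is sketched below. The helper `rank_teams` and the toy ratings are hypothetical and are not part of the original notebook.

```python
def rank_teams(season_ratings):
    """Map each team to a 1-based rank, where rank 1 is the highest rating."""
    # Sort team IDs by their rating, highest first.
    ordered = sorted(season_ratings, key=season_ratings.get, reverse=True)
    return {team: position for position, team in enumerate(ordered, start=1)}


# Toy ratings keyed by made-up team IDs (not real results).
toy_ratings = {1101: 31.2, 1102: 24.8, 1103: 36.5}
toy_ranks = rank_teams(toy_ratings)
print(toy_ranks)  # {1103: 1, 1101: 2, 1102: 3}
```

Applied to the notebook's data, the same helper could be called as `rank_teams(ratings[2019])`, assuming `ratings` is the season-to-dictionary mapping built earlier.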


In [18]:
# true_skill_df.to_csv("Data/TrueSkill.csv", index=False)

Source: [Kaggle Notebook](https://www.kaggle.com/gkoundry/rating-teams-using-microsoft-s-trueskill-algorithm)