
# Midterm 1, Fall 2021: Chess Ratings #

_Version 1.0_

[Solution](https://gatech.instructure.com/courses/411942/files/52015055?wrap=1)

Change Log:
1.0 - Initial Release

This problem builds on your knowledge of **Python data structures, string processing, and implementing mathematical functions**.

For other preliminaries and pointers, refer back to the Piazza post titled **"Midterm 1 Release Notes"**.
- Total Exercises: **8**  
- Total Points: **16**
- Time Limit: **3 Hours**

Each exercise builds logically on the previous one, but you may **solve them in any order**. That is, if you can't solve an exercise, you can still move on and try the next one. **However, if you see a code cell introduced by the phrase, "Sample result for ...", please run it.** Some demo cells in the notebook may depend on these precomputed results.

The point values of individual exercises are as follows:

- Exercise 0: 3 points
- Exercise 1: 2 points
- Exercise 2: 1 point
- Exercise 3: 2 points
- Exercise 4: 1 point
- Exercise 5: 3 points
- Exercise 6: 2 points
- Exercise 7: 2 points


**Good luck!**

## Elo Ratings

The Elo (rhymes with "Hello") rating system is a widely used method for quantifying relative skill levels of players in a game or sport. The method was originally used to rate chess players and is named for its creator, Arpad Elo. This system is very simple but is able to rate players much more effectively than a win/loss record.

On a high level, the winning player in a game takes rating points away from the losing player. How many points change hands is determined by the difference in the initial ratings of each player. For example, if a highly rated player records a victory over a lower rated player, then they would gain only a few points. This is reflective of the highly rated player being expected to win. However, if the lower rated player is able to pull off an upset, a larger quantity of points would be exchanged. The idea is that over time the system will adjust players' ratings to their true relative skill levels. Additionally, the difference in Elo ratings between two players can be used to calculate the expectation for the number of wins each player would accrue, which is often expressed as "win probability". 

Here we will extract data from a recent chess tournament that captures players' ratings at the start of the tournament and the outcome of all games played. We will then use that data to calculate expected wins based on the matchups and compare our expectations with the observed results. Finally, we will determine the updated Elo ratings for the players. There are many variations on this system, but here we will use the original version. You can find more information about the Elo rating system [here](https://en.wikipedia.org/wiki/Elo_rating_system).

Let's get started by taking a look at the data!

In [1]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###

import run_tests as test_utils
raw_data = test_utils.read_raw_data('Bucharest2021.pgn')
test_utils.get_mem_usage_str()

'47.6 MiB'

Take note of how the data is **split** into sections by **blank lines** (`'\n\n'`); this fact might be useful later on! _(hint! hint!)_ Here are the first 4 sections.

In [2]:
demo_raw_data = '\n\n'.join(raw_data.split('\n\n')[:4])
print(demo_raw_data)

[Event "Superbet Classic 2021"]
[Site "Bucharest ROU"]
[Date "2021.06.05"]
[Round "1.5"]
[White "Deac,Bogdan-Daniel"]
[Black "Giri,A"]
[Result "1/2-1/2"]
[WhiteElo "2627"]
[BlackElo "2780"]
[ECO "D43"]

1.d4 d5 2.c4 c6 3.Nc3 Nf6 4.Nf3 e6 5.Bg5 h6 6.Bh4 dxc4 7.e4 g5 8.Bg3 b5 9.Be2 Bb7
10.Qc2 Nh5 11.Rd1 Nxg3 12.hxg3 Na6 13.a3 Bg7 14.e5 Qe7 15.Ne4 O-O-O 16.Nd6+ Rxd6
17.exd6 Qxd6 18.O-O g4 19.Ne5 Bxe5 20.dxe5 Qxe5 21.Bxg4 h5 22.Rfe1 Qf6 23.Bf3 h4
24.b3 cxb3 25.Qxb3 hxg3 26.fxg3 Qg7 27.Qd3 Nc7 28.Qd6 c5 29.Qd7+ Kb8 30.Bxb7 Kxb7
31.Rxe6 Qxg3 32.Qc6+ Kb8 33.Qd6 Qxd6 34.Rexd6 Kb7 35.Rf6 Rh7 36.Rd7 b4 37.axb4 cxb4
38.Kf2 a5 39.Ke2 Rg7 40.Rfxf7 Rxg2+ 41.Kd1 Rg1+ 42.Kc2 Rg2+ 43.Kb1 Rg1+ 44.Kb2 Rg2+
45.Kb1 Rg1+ 46.Kb2 Rg2+ 47.Kb1 Rg1+  1/2-1/2

[Event "Superbet Classic 2021"]
[Site "Bucharest ROU"]
[Date "2021.06.05"]
[Round "1.4"]
[White "Lupulescu,C"]
[Black "Aronian,L"]
[Result "1/2-1/2"]
[WhiteElo "2656"]
[BlackElo "2781"]
[ECO "E39"]

1.d4 Nf6 2.c4 e6 3.Nc3 Bb4 4.Qc2 c5 5.dxc5 O-O 6.Nf3 Na6 7

The sections in the raw data alternate between **metadata** and **moves data**. The metadata is information about the game, such as who is playing with what pieces, the ratings of each player, and the results of the game. The moves data contains a record of each chess move executed in the game. Since players' Elo ratings are only affected by the outcomes of the games, we are primarily concerned with the metadata.

## Exercise 0 (3 points)

The first thing we need to do in our analysis is get the data in a more structured form. 

Fill out the function `extract_games(raw_data)` in the code cell below with the following requirements:

Given a string `raw_data` read from a text file, extract the following information about each game and store it in a **list of dictionaries**, `games`. Each dictionary should have the following entries:
* `games[i]['white_player']` - String - Name of the player assigned the white pieces.
  * Example from `raw_data`: [White "Deac,Bogdan-Daniel"]
  * Example value: `'Deac,Bogdan-Daniel'`  
  * Value type: `str`  
  
  
* `games[i]['black_player']` - String - Name of the player assigned the black pieces.
  * Example from `raw_data`: [Black "Giri,A"]
  * Example value: `'Giri,A'`  
  * Value type: `str`
    

* `games[i]['white_rating']` - Integer - Pre-tournament rating of the white player.
  * Example from `raw_data`: [WhiteElo "2627"]
  * Example value: `2627`  
  * Value type: `int`
    
    
* `games[i]['black_rating']` - Integer - Pre-tournament rating of the black player.
  * Example from `raw_data`: [BlackElo "2780"]
  * Example value: `2780`  
  * Value type: `int`
    
    
* `games[i]['result']` - String - Result of the game.
  * Example from `raw_data`: [Result "1/2-1/2"]
  * Example value: `'1/2-1/2'`
  * Value type: `str`

You may assume that the required metadata is included, that sections are separated by blank lines, and that the sections alternate between metadata and moves data (starting with metadata). Additional metadata tags (beyond the 5 you are tasked with extracting) may be present, but they should be ignored. The ordering of the metadata **may be different** from the example above. Additionally, the moves data sections **may not be formatted** the same way as the example above.

A demo of your function run on the `demo_raw_data` defined above is included in the solution cell. The result should be:
```
[{  'white_player': 'Deac,Bogdan-Daniel',
    'black_player': 'Giri,A',
    'result': '1/2-1/2',
    'white_rating': 2627,
    'black_rating': 2780},
  { 'white_player': 'Lupulescu,C', 
    'black_player': 'Aronian,L', 
    'result': '1/2-1/2', 
    'white_rating': 2656, 
    'black_rating': 2781}]
```

To help you get started, consider the following snippet, which converts `demo_raw_data` into a nested list of lists. A similar strategy may be helpful in processing the `raw_data` parameter in the exercise.

In [3]:
demo_metadata_list = [metadata.splitlines() for metadata in demo_raw_data.split('\n\n')[::2]]
print(f'type(demo_metadata_list[0]): {type(demo_metadata_list[0])}') # outer list items are lists
print(f'type(demo_metadata_list[0][0]): {type(demo_metadata_list[0][0])}') # inner list items are strings
demo_metadata_list

type(demo_metadata_list[0]): <class 'list'>
type(demo_metadata_list[0][0]): <class 'str'>


[['[Event "Superbet Classic 2021"]',
  '[Site "Bucharest ROU"]',
  '[Date "2021.06.05"]',
  '[Round "1.5"]',
  '[White "Deac,Bogdan-Daniel"]',
  '[Black "Giri,A"]',
  '[Result "1/2-1/2"]',
  '[WhiteElo "2627"]',
  '[BlackElo "2780"]',
  '[ECO "D43"]'],
 ['[Event "Superbet Classic 2021"]',
  '[Site "Bucharest ROU"]',
  '[Date "2021.06.05"]',
  '[Round "1.4"]',
  '[White "Lupulescu,C"]',
  '[Black "Aronian,L"]',
  '[Result "1/2-1/2"]',
  '[WhiteElo "2656"]',
  '[BlackElo "2781"]',
  '[ECO "E39"]']]
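Each inner string above has the shape `[Tag "value"]`. As a minimal sketch of one way to take such a line apart (the helper name `parse_tag_line` and the pattern `TAG_LINE` are hypothetical and not part of the exam materials), a regular expression can capture the tag and the quoted value separately:

```python
import re

# Hypothetical pattern, for illustration only: matches lines like [Tag "value"].
TAG_LINE = re.compile(r'\[(\w+) "(.*)"\]')

def parse_tag_line(line):
    """Return (tag, value) for a metadata line, or None for any other line."""
    match = TAG_LINE.fullmatch(line.strip())
    if match is None:
        return None  # e.g. a line of moves data
    return match.group(1), match.group(2)

print(parse_tag_line('[WhiteElo "2627"]'))
print(parse_tag_line('1.d4 d5 2.c4 c6'))
```

The captured value is always a string, so rating values would still need an explicit `int(...)` conversion.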

In [5]:
def extract_games(raw_data):
    import re
    ###
    # PGN tag name -> (output key, converter applied to the quoted value)
    fields = {'White': ('white_player', str),
              'Black': ('black_player', str),
              'Result': ('result', str),
              'WhiteElo': ('white_rating', int),
              'BlackElo': ('black_rating', int)}
    # Metadata lines look like [Tag "value"]; lines of moves never match.
    tag_pattern = re.compile(r'\[(\w+) "(.*)"\]')

    games = []
    for section in raw_data.split('\n\n'):
        game = {}
        for line in section.splitlines():
            m = tag_pattern.match(line.strip())
            if m and m.group(1) in fields:
                key, convert = fields[m.group(1)]
                game[key] = convert(m.group(2))
        if game:  # moves sections contain no tags, so they are skipped
            games.append(game)
    return games
    ###

# Demo
extract_games(demo_raw_data)


[{'white_player': 'Deac,Bogdan-Daniel',
  'black_player': 'Giri,A',
  'result': '1/2-1/2',
  'white_rating': 2627,
  'black_rating': 2780},
 {'white_player': 'Lupulescu,C',
  'black_player': 'Aronian,L',
  'result': '1/2-1/2',
  'white_rating': 2656,
  'black_rating': 2781}]

The test cell below runs your function **many times**. Remove or comment out any `print` statements to avoid generating excessive output.

In [6]:
# `ex0_test`: Test cell
from run_tests import ex0_test
for _ in range(100):
    ex0_test(10, 4, extract_games)
print('Passed!')

###
### AUTOGRADER TEST - DO NOT REMOVE
###
test_utils.get_mem_usage_str()

Passed!


'47.9 MiB'

**Run the following cell, even if you skipped Exercise 0.**

We are loading a pre-computed solution that will be used in the following sections. The first two items in the list are displayed.

In [7]:
# Sample result for ex0
games_metadata = test_utils.read_pickle('games_metadata')
print(games_metadata[:2])
test_utils.get_mem_usage_str()

[{'white_player': 'Deac,Bogdan-Daniel', 'black_player': 'Giri,A', 'result': '1/2-1/2', 'white_rating': 2627, 'black_rating': 2780}, {'white_player': 'Lupulescu,C', 'black_player': 'Aronian,L', 'result': '1/2-1/2', 'white_rating': 2656, 'black_rating': 2781}]


'47.9 MiB'

## Exercise 1 (2 points)

The next bit of information we will need in our analysis is the outcome of each player's games paired with their opponent.

Fill out the function `extract_player_results(games)` in the code cell below with the following requirements:

Given `games`, a list of dictionaries containing the metadata for each game, create a dictionary `player_results` mapping each player's name to a list of the outcomes of that player's games. Each outcome should be a Tuple containing the opponent's name (String) and the number of points that the player received from the game (Float).

The order of tuples in the list associated with each player should be the **same as the order of the matchups in `games`**. 

You should interpret the value associated with `'result'` as `"<white player points>-<black player points>"` separated by a dash "-". There are three possible outcomes of a game of chess: White wins (`'1-0'`), black wins (`'0-1'`), or draw (`'1/2-1/2'`).

For example, if the input is:

`[{'white_player': 'Dwight Schrute', 'black_player': 'Jim Halpert', 'result': '1-0'}, {'white_player': 'Stanley Hudson', 'black_player': 'Dwight Schrute', 'result': '1/2-1/2'}]`

Then the output should be:

`{'Dwight Schrute': [('Jim Halpert', 1.0), ('Stanley Hudson', 0.5)], 'Jim Halpert': [('Dwight Schrute', 0.0)], 'Stanley Hudson': [('Dwight Schrute', 0.5)]}`

You can assume that each dictionary in `games` will have the keys `'white_player'`, `'black_player'`, and `'result'` and that the values associated with each of those keys are Strings. There may be duplicated matchups where the same two players are paired in the tournament more than once. These cases should be handled the same as any other game and do not require any special treatment.
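Because only three result strings are possible, the interpretation of `'result'` described above can be written as a lookup table. The following is a small sketch under that assumption; the names `RESULT_POINTS` and `split_result` are hypothetical, chosen for illustration:

```python
# Hypothetical lookup table: result string -> (white points, black points).
RESULT_POINTS = {
    '1-0': (1.0, 0.0),          # white wins
    '0-1': (0.0, 1.0),          # black wins
    '1/2-1/2': (0.5, 0.5),      # draw
}

def split_result(result):
    """Return (white_points, black_points); raises KeyError for any other string."""
    return RESULT_POINTS[result]

white_pts, black_pts = split_result('1/2-1/2')
print(white_pts, black_pts)
```

A table like this fails loudly on an unexpected result string, whereas an `if`/`else` chain with a catch-all branch would silently treat it as a draw.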

In [8]:
demo_games_metadata = [{'white_player': 'Dwight Schrute', 'black_player': 'Jim Halpert', 'result': '1-0'}, {'white_player': 'Stanley Hudson', 'black_player': 'Dwight Schrute', 'result': '1/2-1/2'}]

In [9]:
def extract_player_results(games):
    ###
    from collections import defaultdict
    results = defaultdict(list)
    for game in games:
        white, black = game['white_player'], game['black_player']
        if game['result'] == "1-0":
            w_pts = 1.0
        elif game['result'] == "0-1":
            w_pts = 0.0
        else:
            w_pts = 0.5
        b_pts = 1-w_pts
        results[white].append((black,w_pts))
        results[black].append((white,b_pts))
    return results
    ###

# Demo
extract_player_results(demo_games_metadata)

defaultdict(list,
            {'Dwight Schrute': [('Jim Halpert', 1.0), ('Stanley Hudson', 0.5)],
             'Jim Halpert': [('Dwight Schrute', 0.0)],
             'Stanley Hudson': [('Dwight Schrute', 0.5)]})

The test cell below runs your function **many times**. Remove or comment out any `print` statements to avoid generating excessive output.

In [10]:
# `ex1_test`: Test cell
from run_tests import ex1_test
for _ in range(100):
    ex1_test(10, 4, extract_player_results)
print('Passed!')

###
### AUTOGRADER TEST - DO NOT REMOVE
###
test_utils.get_mem_usage_str()

Passed!


'48.1 MiB'

**Run the following cell, even if you skipped Exercise 1.**

We are loading a pre-computed solution that will be used in the following sections. The first two entries are displayed.

In [11]:
# Sample result for ex1
player_results = test_utils.read_pickle('player_results')
{k:v for k, v in list(player_results.items())[:2]}

{'Deac,Bogdan-Daniel': [('Giri,A', 0.5),
  ('Vachier', 1.0),
  ('Mamedyarov,S', 0.5),
  ('Grischuk,A', 0.0),
  ('So,W', 0.5),
  ('Radjabov,T', 0.5),
  ('Lupulescu,C', 0.5),
  ('Aronian,L', 0.0),
  ('Caruana,F', 0.5)],
 'Giri,A': [('Deac,Bogdan-Daniel', 0.5),
  ('Radjabov,T', 0.5),
  ('Lupulescu,C', 0.0),
  ('Aronian,L', 0.5),
  ('Caruana,F', 0.5),
  ('So,W', 0.5),
  ('Vachier', 1.0),
  ('Grischuk,A', 0.5)]}

## Exercise 2 (1 point)

Our next task is to compute the total tournament score for each player.

Fill in the function `calculate_score(player_results)` satisfying the following requirements:

Given a dictionary `player_results` mapping player names to their tournament results (similar to the output of Exercise 1), create a **new** dictionary `player_scores` that maps each player (String) to their total score for the tournament (Float).

For example, given the following input: 

`{'Angela Martin': [('Oscar Martinez', 1.0), ('Kevin Malone', 0.5), ('Andy Bernard', 0.0)], 'Michael Scott': [('Pam Halpert', 0.0), ('Toby Flenderson', 0.0), ('Todd Packer', 0.0)]}`

Your function should output:

`{'Angela Martin': 1.5, 'Michael Scott': 0.0}`

(Michael isn't exactly a chess prodigy...)

You can assume that each list in the input contains tuples of the form (String, Float). You do not need to verify that all of the games implied by the input are present. If you look closely at the example, you will see that they are **not**: none of the listed opponents appears as a key.


In [12]:

demo_player_results = {'Angela Martin': [('Oscar Martinez', 1.0), ('Kevin Malone', 0.5), ('Andy Bernard', 0.0)], 'Michael Scott': [('Pam Halpert', 0.0), ('Toby Flenderson', 0.0), ('Todd Packer', 0.0)]}

In [25]:
def calculate_score(player_results):
    ###
    # Sum the points (second tuple element) over each player's games.
    # float(...) keeps the total a Float even for an empty list of games.
    return {player: float(sum(points for _, points in outcomes))
            for player, outcomes in player_results.items()}
    ###

# Demo
calculate_score(demo_player_results)

{'Angela Martin': 1.5, 'Michael Scott': 0.0}

The test cell below runs your function **many times**. Remove or comment out any `print` statements to avoid generating excessive output.

In [26]:
# `ex2_test`: Test cell
from run_tests import ex2_test
for _ in range(200):
    ex2_test(10, 4, calculate_score)
print('Passed!')

###
### AUTOGRADER TEST - DO NOT REMOVE
###
test_utils.get_mem_usage_str()

Passed!


'79.5 MiB'

**Run the following cell, even if you skipped Exercise 2.**

We are loading a pre-computed solution that will be used in the following sections. The first two entries are displayed.

In [27]:
# Sample result for ex2
player_scores = test_utils.read_pickle('player_scores')
{k:v for k, v in list(player_scores.items())[:2]}

{'Deac,Bogdan-Daniel': 4.0, 'Giri,A': 4.0}

## Exercise 3 (2 points)

Our next task is to extract the Elo rating of each player from the metadata.

Fill in the function `extract_ratings(games)` to satisfy the following requirements:

Given a list of dictionaries, `games`, create a dictionary `player_ratings` that maps each player to their Elo rating before the tournament. You can assume that each dictionary in `games` will have the following keys and value types: `'white_player'`: (String), `'black_player'`: (String), `'white_rating'`: (Integer), and `'black_rating'`: (Integer).

Additionally, if the same player has different ratings in the input, your function should raise a `ValueError`.

For example:

Input : `[{'white_player': 'Jim Halpert', 'black_player': 'Darryl Philbin', 'white_rating': 1600, 'black_rating': 1800}, {'white_player': 'Darryl Philbin', 'black_player': 'Phyllis Vance', 'white_rating': 1800, 'black_rating': 1700}]`

Output : `{'Darryl Philbin': 1800, 'Jim Halpert': 1600, 'Phyllis Vance': 1700}`

Input : `[{'white_player': 'Jim Halpert', 'black_player': 'Darryl Philbin', 'white_rating': 1600, 'black_rating': 1800}, {'white_player': 'Darryl Philbin', 'black_player': 'Phyllis Vance', 'white_rating': 1850, 'black_rating': 1700}]`

Here `'Darryl Philbin'` has two ratings: 1800 in his first game and 1850 in his second. Your function should raise a `ValueError`!
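One way to detect a conflict like this is to remember the first rating seen for each player and compare every later rating against it. The sketch below assumes that approach; `record_rating` is a hypothetical helper name, not something defined by the exam:

```python
def record_rating(ratings, player, rating):
    """Store a player's rating on first sight; raise if a later value differs."""
    # setdefault returns the existing value if the player is already recorded.
    previous = ratings.setdefault(player, rating)
    if previous != rating:
        raise ValueError(f'{player} has conflicting ratings: {previous} and {rating}')

ratings = {}
record_rating(ratings, 'Darryl Philbin', 1800)
record_rating(ratings, 'Darryl Philbin', 1800)   # same value again: accepted
try:
    record_rating(ratings, 'Darryl Philbin', 1850)
except ValueError as err:
    print(err)
```

Including the player and both values in the error message is optional, but it makes a bad input much easier to track down.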

In [30]:
demo_metadata_good = [{'white_player': 'Jim Halpert', 'black_player': 'Darryl Philbin', 'white_rating': 1600, 'black_rating': 1800}, {'white_player': 'Darryl Philbin', 'black_player': 'Phyllis Vance', 'white_rating': 1800, 'black_rating': 1700}]
demo_metadata_bad = [{'white_player': 'Jim Halpert', 'black_player': 'Darryl Philbin', 'white_rating': 1600, 'black_rating': 1800}, {'white_player': 'Darryl Philbin', 'black_player': 'Phyllis Vance', 'white_rating': 1850, 'black_rating': 1700}]

In [45]:
def extract_ratings(games):
    ###
    from collections import defaultdict
    ratings = defaultdict(set)
    finalOutput = defaultdict(int)

    for i in games:
        whitePlayer, blackPlayer = i['white_player'], i['black_player']
        whiteRating, blackRating = i['white_rating'], i['black_rating']
        ratings[whitePlayer].add(whiteRating)
        ratings[blackPlayer].add(blackRating)

    for keys in ratings:
        if len(ratings[keys]) != 1:
            raise ValueError
        else:
            finalOutput[keys] = ratings[keys].pop()

    return finalOutput
    ###

# Demo
try:
    extract_ratings(demo_metadata_bad)
    print('This should raise a ValueError')
except ValueError:
    print('Correctly raised ValueError')
extract_ratings(demo_metadata_good)

Correctly raised ValueError


defaultdict(int,
            {'Jim Halpert': 1600,
             'Darryl Philbin': 1800,
             'Phyllis Vance': 1700})

The test cell below runs your function **many times**. Remove or comment out any `print` statements to avoid generating excessive output.

In [46]:
# `ex3_test`: Test cell
from run_tests import ex3_test
for _ in range(200):
    ex3_test(10, 4, extract_ratings)
print('Passed!')

###
### AUTOGRADER TEST - DO NOT REMOVE
###


Passed!


**Run the following cell, even if you skipped Exercise 3.**

We are loading a pre-computed solution that will be used in the following sections. The first two entries are displayed.

In [47]:
# Sample result for ex3
player_ratings = test_utils.read_pickle('player_ratings')
{k:v for k, v in list(player_ratings.items())[:2]}

{'Deac,Bogdan-Daniel': 2627, 'Giri,A': 2780}

## Exercise 4 (1 point)

The last task before we begin analysis is to implement some functionality to calculate the expected result of a match based on the Elo ratings of each player.

Fill out the function `expected_match_score(r_player, r_opponent)` to satisfy the following requirements:

Given a player's rating (Integer) and their opponent's rating (Integer), compute the player's expected score in a game against that opponent. The formula for the expected score is:

$$\text{Expected Score} =  \frac{1}{1 + 10^{d}}$$
where 
$$d = \frac{r_{\text{opponent}} - r_{\text{player}}}{400}$$

Output the expected score as a Float. **Do not round**.

For example:

`expected_match_score(1900, 1500)` should return about `0.909`  
`expected_match_score(1500, 1500)` should return about `0.5`  
`expected_match_score(1900, 1700)` should return about `0.76`
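
As a worked check of the first example (a hand calculation from the formula above, not an additional requirement), take $r_{\text{player}} = 1900$ and $r_{\text{opponent}} = 1500$:

$$d = \frac{1500 - 1900}{400} = -1, \qquad \text{Expected Score} = \frac{1}{1 + 10^{-1}} = \frac{1}{1.1} \approx 0.909$$

Swapping the two ratings gives $d = 1$ and an expected score of $\frac{1}{1 + 10^{1}} = \frac{1}{11} \approx 0.091$, so the two players' expected scores sum to 1.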

In [49]:
demo_ratings = [(1900, 1500), (1500, 1500), (1900, 1700)]

In [50]:
def expected_match_score(r_player, r_opponent):
    ###
    d = (r_opponent - r_player)/400
    expectedScore = 1/(1+10**d)
    return expectedScore
    ###

# Demo
for rp, ro in demo_ratings:
    print(f'expected_match_score({rp}, {ro}) = {expected_match_score(rp, ro)}')

expected_match_score(1900, 1500) = 0.9090909090909091
expected_match_score(1500, 1500) = 0.5
expected_match_score(1900, 1700) = 0.7597469266479578


The test cell below runs your function **many times**. Remove or comment out any `print` statements to avoid generating excessive output.

In [51]:
# `ex4_test`: Test cell
###
### AUTOGRADER TEST - DO NOT REMOVE
###
from run_tests import ex4_test
for _ in range(200):
    ex4_test(expected_match_score)
print('Passed!')

###
### AUTOGRADER TEST - DO NOT REMOVE
###
test_utils.get_mem_usage_str()

Passed!


'79.5 MiB'

## Aside - Functional Programming

It is often useful to write functions that take other functions as arguments. Inside your function, the function argument is called in a consistent way, which lets the caller of your function customize its behavior.

Here is an over-engineered arithmetic calculator as an example. These functions define mathematical operations.

In [52]:
# add
def a(a, b):
    return a+b
# subtract
def s(a, b):
    return a-b
# multiply
def m(a, b):
    return a*b
# divide
def d(a,b):
    return a/b

This function, `calc`, takes two numbers as its first two arguments and a third argument, a function, that determines how they are combined.

In [53]:
def calc(a, b, opp):
    return opp(a,b)

Now we can use any function that takes two arguments, like the four defined above, to determine the behavior of `calc`.

In [54]:
calc(3,5,a)

8

In [55]:
calc(3,5,d)

0.6
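
The function passed in does not have to be one we wrote ourselves. As a small sketch of the same pattern, the snippet below redefines `calc` so that it is self-contained, then passes in a standard-library function, an anonymous `lambda`, and a built-in:

```python
import operator

def calc(a, b, opp):
    # Same shape as `calc` above: delegate the work to the function passed in.
    return opp(a, b)

print(calc(3, 5, operator.mul))          # a standard-library function
print(calc(3, 5, lambda x, y: x ** y))   # an anonymous function
print(calc(3, 5, max))                   # a built-in
```

Exercise 5 uses this pattern: its `es_func` argument plays the role that `opp` plays here.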

## Exercise 5 (3 points)

Our next task is to write some functionality to determine each player's expected tournament score.

Fill in the function `expected_tournament_score(player_results, player_ratings, es_func)` to satisfy the following requirements:

Given a dictionary, `player_results`, mapping players to their tournament results as a list of tuples (similar to the output from Exercise 1) and a dictionary, `player_ratings`, mapping players to their Elo ratings, compute the **total** expected score for each player (you only need to compute the total expected score for players that are keys in `player_results`). The total expected score is simply the sum of the expected scores for each of that player's games. Output the results as a dictionary mapping players (String) to their expected tournament score (Float).

The third argument `es_func` is a function that takes two arguments (the player's rating and opponent's rating respectively) and returns an "expected score". You should use it to compute the expected scores for this exercise. **It might not be the same as the solution to Exercise 4!**

A call to `es_func(1450, 1575)` inside of your function would compute the "expected score" for the 1450-rated player against a 1575-rated player.

For example given:

`player_results = {'Angela Martin': [('Dwight Schrute', 1.0), ('Stanley Hudson', 0.5)], 'Dwight Schrute': [('Angela Martin', 0.0), ('Jim Halpert', 0.5)]}`

`player_ratings = {'Angela Martin': 1600, 'Dwight Schrute': 1750, 'Stanley Hudson': 1800, 'Jim Halpert': 1700}`

`es_func = lambda r_player, r_opponent: float(r_player - r_opponent)`

The output would be:

`{'Angela Martin': -350.0, 'Dwight Schrute': 200.0}`
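
The arithmetic behind this output, using the demo `es_func` (the player's rating minus the opponent's rating), works out as follows:

$$\text{Angela Martin}: (1600 - 1750) + (1600 - 1800) = -150 - 200 = -350$$

$$\text{Dwight Schrute}: (1750 - 1600) + (1750 - 1700) = 150 + 50 = 200$$

The observed points in each tuple play no role here; only the opponent names are used, to look up ratings.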

In [56]:
demo_player_results = {'Angela Martin': [('Dwight Schrute', 1.0), ('Stanley Hudson', 0.5)], 'Dwight Schrute': [('Angela Martin', 0.0), ('Jim Halpert', 0.5)]}
demo_player_ratings = {'Angela Martin': 1600, 'Dwight Schrute': 1750, 'Stanley Hudson': 1800, 'Jim Halpert': 1700}
demo_es_func = lambda r_player, r_opponent: float(r_player - r_opponent)

In [85]:
def expected_tournament_score(player_results, player_ratings, es_func):
    ###
    from collections import defaultdict
    expected_scores = defaultdict(float)

    for player, outcomes in player_results.items():
        # Only the opponent's name matters here; the observed points are ignored.
        for opponent, _ in outcomes:
            expected_scores[player] += es_func(player_ratings[player], player_ratings[opponent])

    return expected_scores
    ###

# Demo
expected_tournament_score(demo_player_results, demo_player_ratings, demo_es_func)

defaultdict(float, {'Angela Martin': -350.0, 'Dwight Schrute': 200.0})

The test cell below runs your function **many times**. Remove or comment out any `print` statements to avoid generating excessive output.

In [86]:
# `ex5_test`: Test cell
from run_tests import ex5_test
for _ in range(200):
    ex5_test(10, 4, expected_tournament_score)
print('Passed!')

###
### AUTOGRADER TEST - DO NOT REMOVE
###
test_utils.get_mem_usage_str()

Passed!


'83.3 MiB'

**Run the following cell, even if you skipped Exercise 5.**

We are loading a pre-computed solution that will be used in the following sections. The first two entries are displayed.

In [87]:
# Sample result for ex5
player_expected_score = test_utils.read_pickle('player_expected_score')
{k:v for k, v in list(player_expected_score.items())[:2]}

{'Deac,Bogdan-Daniel': 2.827559638896802, 'Giri,A': 4.389932419673484}

## Exercise 6 (2 points)

Fill in the function `compute_final_ratings(player_scores, expected_player_scores, player_ratings)` to meet the following requirements:

Given three dictionaries:

* `player_scores`: mapping players (String) to their observed tournament scores (Float)  
* `expected_player_scores`: mapping players (String) to their expected tournament scores (Float)  
* `player_ratings`: mapping players (String) to their pre-tournament Elo ratings (Float)  

calculate each player's post-tournament Elo ratings using this formula:

$$\text{Rating}_{\text{post}} = \text{Rating}_{\text{pre}} + 10(\text{Score}_{\text{observed}} - \text{Score}_{\text{expected}})$$

Return a dictionary mapping each player (String) to their post-tournament rating **rounded to the nearest integer**.

You can assume that all keys are common between the three input dictionaries.

For example:

`player_scores = {'Jim Halpert': 3.0, 'Dwight Schrute': 4.0, 'Stanley Hudson': 3.0}`

`expected_player_scores = {'Jim Halpert': 2.736, 'Dwight Schrute': 4.67, 'Stanley Hudson': 2.85}`

`player_ratings = {'Jim Halpert': 1500, 'Dwight Schrute': 1575, 'Stanley Hudson': 1452}`

Results:
`{'Jim Halpert': 1503, 'Dwight Schrute': 1568, 'Stanley Hudson': 1454}`
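
Substituting the example values into the formula above gives, by hand:

$$\text{Jim Halpert}: 1500 + 10(3.0 - 2.736) = 1502.64 \rightarrow 1503$$

$$\text{Dwight Schrute}: 1575 + 10(4.0 - 4.67) = 1568.3 \rightarrow 1568$$

$$\text{Stanley Hudson}: 1452 + 10(3.0 - 2.85) = 1453.5 \rightarrow 1454$$

Stanley's value falls exactly halfway between two integers; the expected result above lists it as 1454.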

In [89]:
demo_player_scores = {'Jim Halpert': 3.0, 'Dwight Schrute': 4.0, 'Stanley Hudson': 3.0}
demo_expected_player_scores = {'Jim Halpert': 2.736, 'Dwight Schrute': 4.67, 'Stanley Hudson': 2.85}
demo_player_ratings = {'Jim Halpert': 1500, 'Dwight Schrute': 1575, 'Stanley Hudson': 1452}

In [94]:
def compute_final_ratings(player_scores, expected_player_scores, player_ratings):
    ###
    from collections import defaultdict
    results = defaultdict(int)

    # Rating_post = Rating_pre + 10 * (Score_observed - Score_expected), rounded.
    for player, pre_rating in player_ratings.items():
        post_rating = pre_rating + 10 * (player_scores[player] - expected_player_scores[player])
        results[player] = round(post_rating)

    return results
    ###

# Demo
compute_final_ratings(demo_player_scores, demo_expected_player_scores, demo_player_ratings)

defaultdict(int,
            {'Jim Halpert': 1503,
             'Dwight Schrute': 1568,
             'Stanley Hudson': 1454})

The test cell below runs your function **many times**. Remove or comment out any `print` statements to avoid generating excessive output.

In [95]:
# `ex6_test`: Test cell
from run_tests import ex6_test
for _ in range(200):
    ex6_test(10, compute_final_ratings)
print('Passed!')

###
### AUTOGRADER TEST - DO NOT REMOVE
###
test_utils.get_mem_usage_str()

Passed!


'84.3 MiB'

**Run the following cell, even if you skipped Exercise 6.**

We are loading a pre-computed solution that will be used in the following sections. The first two entries are displayed.

In [96]:
# Sample result for ex6
player_final_ratings = test_utils.read_pickle('player_final_ratings')
{k:v for k, v in list(player_final_ratings.items())[:2]}

{'Deac,Bogdan-Daniel': 2639, 'Giri,A': 2776}

## Exercise 7 (2 points)

The last task is to compute each player's change in rating. This is more than an intermediate step from Exercise 6, because we also have to handle players who appear in only one of the two rating dictionaries.

Fill in the function `compute_deltas(old_ratings, new_ratings)` to meet the following requirements:

Given dictionaries `old_ratings` mapping players (String) to their pre-tournament Elo ratings (Integer) and `new_ratings` mapping players (String) to their post-tournament Elo ratings (Integer), determine the change in each player's rating. Return your result as a dictionary mapping players (String) to their delta (Integer).

Compute the delta as $$\Delta = \text{Rating}_{\text{new}} - \text{Rating}_{\text{old}}$$

If a player is not present as a key in the `old_ratings` input but is present as a key in the `new_ratings` input, then assume this is a new player with a starting rating of `1200`, so their delta is their new rating minus `1200`. Likewise, if a player is present as a key in `old_ratings` but is not present in `new_ratings`, assume that player did not play in the tournament and their rating is unchanged (a delta of `0`).

For example:

`old_ratings = {'Ryan Howard': 1755, 'Dwight Schrute': 1675}`

`new_ratings = {'Michael Scott': 1250, 'Ryan Howard': 1750}`

Should return:

`{'Michael Scott': 50, 'Ryan Howard': -5, 'Dwight Schrute': 0}`
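
One way to see the three cases is to split the players with set operations on the dictionary keys. The sketch below only classifies the players in the example. It does not compute the deltas, and the variable names `returning`, `newcomers`, and `absent` are assumptions made for illustration.

```python
old_ratings = {'Ryan Howard': 1755, 'Dwight Schrute': 1675}
new_ratings = {'Michael Scott': 1250, 'Ryan Howard': 1750}

# Dictionary key views support set operations directly.
returning = old_ratings.keys() & new_ratings.keys()   # in both: ordinary delta
newcomers = new_ratings.keys() - old_ratings.keys()   # only in new: start from 1200
absent = old_ratings.keys() - new_ratings.keys()      # only in old: delta of 0
```

Every player in the expected result falls into exactly one of these three sets.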

In [99]:
demo_old_ratings = {'Ryan Howard': 1755, 'Dwight Schrute': 1675}
demo_new_ratings = {'Michael Scott': 1250, 'Ryan Howard': 1750}

In [103]:
def compute_deltas(old_ratings, new_ratings):
    ###
    from collections import defaultdict
    results = defaultdict(int)

    # Known players: delta is new minus old, or 0 if they did not play.
    for player, old in old_ratings.items():
        results[player] = new_ratings.get(player, old) - old
    # Players missing from old_ratings are new and start from 1200.
    for player, new in new_ratings.items():
        if player not in old_ratings:
            results[player] = new - 1200
    return results
    ###

# Demo
compute_deltas(demo_old_ratings, demo_new_ratings)

defaultdict(int, {'Ryan Howard': -5, 'Dwight Schrute': 0, 'Michael Scott': 50})

The test cell below runs your function **many times**. Remove or comment out any `print` statements to avoid generating excessive output.

In [104]:
# `ex7_test`: Test cell
from run_tests import ex7_test
for _ in range(200):
    ex7_test(10, compute_deltas)
print('Passed!')

###
### AUTOGRADER TEST - DO NOT REMOVE
###
test_utils.get_mem_usage_str()

Passed!


'84.3 MiB'

## Wrapping up
After parsing all of the information from the text file, we can display a summary of the tournament results. 

In [105]:
import pandas as pd
df = pd.DataFrame(index=player_scores.keys())
df['Initial Rating'] = pd.Series(player_ratings)
df['Score'] = pd.Series(player_scores)
df['Expected Score'] = pd.Series(player_expected_score)
df['Final Rating'] = pd.Series(player_final_ratings)
df['Delta'] = pd.Series(test_utils.read_pickle('player_deltas'))
display(df)

| Player | Initial Rating | Score | Expected Score | Final Rating | Delta |
|---|---|---|---|---|---|
| Deac,Bogdan-Daniel | 2627 | 4.0 | 2.82756 | 2639 | 12 |
| Giri,A | 2780 | 4.0 | 4.389932 | 2776 | -4 |
| Lupulescu,C | 2656 | 3.5 | 3.197736 | 2659 | 3 |
| Aronian,L | 2781 | 4.5 | 4.395254 | 2782 | 1 |
| Grischuk,A | 2776 | 5.0 | 4.848488 | 2778 | 2 |
| Vachier | 2760 | 3.0 | 4.131705 | 2749 | -11 |
| Mamedyarov,S | 2770 | 5.5 | 4.278985 | 2782 | 12 |
| So,W | 2770 | 5.0 | 4.764597 | 2772 | 2 |
| Caruana,F | 2820 | 3.5 | 4.876846 | 2806 | -14 |
| Radjabov,T | 2765 | 3.0 | 3.288897 | 2762 | -3 |

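As a quick sanity check on the summary, the sketch below re-derives the last two columns for a few rows copied from the output above, using the Exercise 6 formula and the Exercise 7 delta. The `rows` list and its layout are assumptions for this sketch; the notebook itself keeps these values in the DataFrame `df`.

```python
# (player, initial rating, score, expected score, final rating, delta),
# copied from the summary output.
rows = [
    ('Deac,Bogdan-Daniel', 2627, 4.0, 2.82756, 2639, 12),
    ('Giri,A', 2780, 4.0, 4.389932, 2776, -4),
    ('Caruana,F', 2820, 3.5, 4.876846, 2806, -14),
]

checks = []
for player, initial, score, expected, final, delta in rows:
    # Exercise 6: post = pre + 10 * (observed - expected), rounded.
    recomputed_final = round(initial + 10 * (score - expected))
    # Exercise 7: delta = new - old.
    checks.append(recomputed_final == final and final - initial == delta)

all_consistent = all(checks)
```
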

**Fin!** You’ve reached the end of this part. Don’t forget to restart and run all cells again to make sure everything works when run in sequence, and make sure your work passes the submission process. Good luck!