# Data

In this notebook we take the raw data, extract and enrich it, and store the result under the data folder for the analysis notebook to use.

Note that, because of its size, the data is excluded from the GitHub repository and is uploaded as data.zip instead. If you want to work with it, make sure you uncompress that file first.

## Raw data extraction

Let's first download Hikaru Nakamura's 2023 game archive using the public chess.com API.

In [64]:
!pip install chess.com
!pip install jsonlines


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m23.3.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m23.3.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [65]:
#Preparing our chessdotcom client

from chessdotcom import Client

Client.request_config["headers"]["User-Agent"] = (
    "Python project on chess cheating analysis. "
    "Contact me at vedad.zornic@gmail.com"
)


In [66]:
from chessdotcom import get_player_games_by_month
import jsonlines

games = []

with jsonlines.open('./data/raw/raw_2023.json', 'w') as writer:
    for x in range(1, 13):
        res = get_player_games_by_month("hikaru", "2023", x)
        games.extend(res.json['games'])
    
    print(len(games))
    writer.write_all(games)


7153


## Data enrichment
We now have all of Hikaru's games for 2023 downloaded. The next step is to enrich and reformat this data so that it is easier to use in our analysis. We will focus on enriching each move, and we will also add some data to the game as a whole.

<br>

#### PGN
The data that comes from chess.com is in PGN format. This format is neither pleasant to read nor easy to parse at runtime for analysis, so I want to add a parsed `moves` field to our data: a list of moves with some metadata attached to each move.

<br>

#### Stockfish

Stockfish is a state-of-the-art chess engine, which we will use to extract information about the board position. Apart from the `Threads`, `Hash` and `Minimum Thinking Time` parameters passed in the code below, we keep the default Stockfish settings.
<br>

### Move

The function below extracts our move objects and enriches them with a lot of data. Since we will use this data in our analysis, let's go through each field.
<br>
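
To make the structure concrete, a single enriched move might look roughly like the record below. The field names follow `enrich_game_moves` further down, but every value is made up for illustration, and the optional `wdl` field is left out.

```python
# Illustrative record only: the field names match enrich_game_moves below,
# but every value here is invented for the example.
example_move = {
    "move": "e2e4",                        # move in UCI notation
    "eval": {"type": "cp", "value": 35},   # Stockfish evaluation after the move
    "best_move": True,                     # among Stockfish's top two moves
    "pieces": 32,                          # pieces left on the board
    "phase": "OPENING",
    "playing": "White",
    "player": "Hikaru",
    "under_time_pressure": False,
    "time_took": 0.4,                      # seconds spent on the move
    "time_left": 179.6,                    # seconds left on the clock
    "wdl_flags": {},                       # deprecated WDL-based flags
}

for field, value in example_move.items():
    print(f"{field}: {value}")
```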

##### Eval
This is simply the evaluation given by Stockfish after the current move has been made. As per the Stockfish documentation, it is either in **centipawns** (positive for a White advantage, negative for a Black advantage) or **mate** if there is a forced mate in N moves.
<br><br>
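
As a small sketch of how the two flavors can be read, the helper below turns an evaluation into a short string. The name `describe_eval` is hypothetical, and the dictionary shape (`type` and `value` keys) is assumed to match what the enrichment code stores in the `eval` field.

```python
def describe_eval(evaluation):
    """Turn a Stockfish-style evaluation dict into a short human-readable string."""
    if evaluation["type"] == "cp":
        # Centipawns: 100 cp is roughly one pawn; positive favors White.
        return f"{evaluation['value'] / 100:+.2f}"
    # Mate: the value is the number of moves to mate; the sign gives the side.
    side = "White" if evaluation["value"] > 0 else "Black"
    return f"mate in {abs(evaluation['value'])} for {side}"

print(describe_eval({"type": "cp", "value": 35}))
print(describe_eval({"type": "mate", "value": -3}))
```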


##### Best move
This is true if the current move is one of the top two moves suggested by Stockfish.
<br><br>


##### Pieces
Number of pieces left on the board (all pieces, including pawns) after the move has been made.
<br><br>

##### ~~WDL~~ (Deprecated in favor of eval)
These are the WDL statistics offered by Stockfish after the move has been made: an array of 3 integers on a scale of 0 - 1000, representing the W/D/L scores. Since Stockfish returns them for the player who is about to move, I have swapped W and L to get the stats for the player who just made the move.
<br><br>

##### Phase

Represents the phase of the game, which is one of OPENING, MIDDLEGAME and ENDGAME. I consider the first 10 moves to be the OPENING and any later position with 15 or fewer pieces on the board to be the ENDGAME; everything else is MIDDLEGAME.
<br><br>
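
The phase rule can be sketched as a small standalone function. It uses the same comparisons as `enrich_game_moves` (`move_count/2 <= 10` and `pieces <= 15`); the name `classify_phase` and its parameter names are hypothetical.

```python
def classify_phase(ply_count, pieces):
    """Return the game phase for a position.

    ply_count: number of half-moves played so far (hypothetical parameter name).
    pieces: number of pieces left on the board, pawns included.
    """
    if ply_count / 2 <= 10:   # first 10 full moves
        return "OPENING"
    if pieces <= 15:          # threshold used by the enrichment code
        return "ENDGAME"
    return "MIDDLEGAME"

print(classify_phase(8, 32))
print(classify_phase(41, 24))
print(classify_phase(70, 12))
```

Note that the opening check comes first, so an early position is an OPENING position regardless of how many pieces are left.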

##### Time took

Time that the player took to play the move.
<br><br>

##### Time left

Time left on the player's clock after the move has been made.
<br><br>

##### Under time pressure
This is true if the player had less than 5 seconds left when playing the move.
<br><br>
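
The three timing fields can be sketched together. The helper below mirrors the bookkeeping in `enrich_game_moves`, including adding the increment after the move; the name `clock_update` and its parameters are hypothetical, and all values are in seconds.

```python
def clock_update(time_before, clock_after, increment=0, pressure_threshold=5):
    """Compute per-move timing values from clock readings, all in seconds.

    time_before: time the player had when starting to think.
    clock_after: clock value recorded in the PGN after the move.
    """
    return {
        "under_time_pressure": time_before < pressure_threshold,
        "time_took": time_before - clock_after,
        "time_left": clock_after + increment,   # increment is added after the move
    }

print(clock_update(time_before=180, clock_after=177.5, increment=2))
```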

##### ~~WDL Flags~~ (Deprecated in favor of eval flags)
This map contains flags that I extracted from WDL and that may be interesting for our analysis. To compute them, I compare the player's previous WDL with the player's current WDL. The flags are:

- **bad_move** The losing chances increased by more than the threshold (100) after this move.
- **keep_advantage_move** The player had a positive W score and it increased compared to the previous one.
- **take_advantage_move** The player's W score increased by more than the threshold (100) after this move.
- **recover_move** The player's L score decreased compared to the previous one and the D score increased by more than the threshold (70).
- **decrease_advantage** The player still has a positive W score, but it decreased compared to the previous one.
<br><br>

##### Eval Flags
Here we extract flags based on the chess.com move classification, which ranges from Best to Blunder. Besides classifying the move, we also extract a flag for what we call an "only move". Our only moves are not the same as in the chess.com classification: here they simply mean that the opponent made a bad move and our move took advantage of it.
I have also extracted whether the only move was used to take the advantage or to create drawing chances in a lost position, and whether an only move was missed.
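
A simplified sketch of the only-move idea follows. It assumes each move carries a `best_move` boolean and an `eval_flags` dictionary, as the enrichment code produces; the function name `only_move_status` and its return labels are hypothetical, and the recovery and advantage thresholds are left out.

```python
BAD_FLAGS = ("inaccuracy", "mistake", "blunder")
STRONG_FLAGS = ("best", "excellent")

def only_move_status(move, opponent_previous_move):
    """Classify a reply to the opponent's previous move (simplified sketch)."""
    opponent_erred = any(
        opponent_previous_move["eval_flags"].get(flag, False) for flag in BAD_FLAGS
    )
    strong_reply = move["best_move"] and any(
        move["eval_flags"].get(flag, False) for flag in STRONG_FLAGS
    )
    if not opponent_erred:
        return "not_a_candidate"
    return "only_move" if strong_reply else "missed_only_move"

blunder = {"eval_flags": {"blunder": True}}
reply = {"best_move": True, "eval_flags": {"excellent": True}}
print(only_move_status(reply, blunder))
```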


In [67]:
!pip install chess
!pip install stockfish



[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m23.3.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m23.3.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [68]:
import chess.pgn
import io
from stockfish import Stockfish, models
from concurrent.futures import ThreadPoolExecutor

BLUNDER_WDL_THRESHOLD = 100
TAKE_ADVANTAGE_WDL_THRESHOLD = 100
RECOVER_WDL_THRESHOLD = 70

pool = ThreadPoolExecutor(max_workers=8)

def enrich_game_moves(game):
    stockfish = Stockfish(parameters={"Threads": 1, "Hash": 256, "Minimum Thinking Time": 10})

    moves = []
    parsed_game = chess.pgn.read_game(io.StringIO(game['pgn']))

    time_controls = parsed_game.headers['TimeControl'].split('+')
    time_control = int(time_controls[0])
    increment = 0
    
    
    if len(time_controls) > 1:
        increment = int(time_controls[1])
    
    #Exclude really fast bullet games
    if time_control < 120:
        return None
    
    print(f'Game time control {time_control} with increment: {increment}')
    
    white_player = parsed_game.headers['White']
    black_player = parsed_game.headers['Black']
    white_time = time_control
    black_time = time_control
    
    black_previous_wdl = [0, 1000, 0]
    white_previous_wdl = [0, 1000, 0]
    
    move_count = 0
    pieces = 32 # count number of pieces left
    
    
    for move in parsed_game.mainline_moves():
        top_moves = stockfish.get_top_moves(2)
        if stockfish.will_move_be_a_capture(move.uci()) != models.Stockfish.Capture.NO_CAPTURE:
            pieces-=1
    
        stockfish.make_moves_from_current_position([move.uci()])
        # Calculate WDL after move has been made and switch w/l sides to apply to this move
        wdl = stockfish.get_wdl_stats()   

        enriched_move = {
            'move': move.uci(),
            'eval': stockfish.get_evaluation(),
            'best_move': any(item['Move'] == move.uci() for item in top_moves),
            'pieces': pieces
        }
        
        if wdl is not None:
            enriched_move['wdl'] = {
                'win': wdl[2],
                'draw': wdl[1],
                'loose': wdl[0]
            }
                
        
        current_time = parsed_game.variation(move).clock()
        move_count+=1
        white_move=True
        
        if move_count % 2 == 0:
            white_move=False
            
        if move_count/2 <= 10:
            game_phase = 'OPENING'
        elif pieces <= 15:
            game_phase = 'ENDGAME'
        else:
            game_phase = 'MIDDLEGAME'
            
        enriched_move['phase'] = game_phase
        enriched_move['playing'] = 'White' if white_move else 'Black'
        enriched_move['player'] = white_player if white_move else black_player

        wdl_flags = {}
        if white_move:
            enriched_move['under_time_pressure'] = True if white_time < 5 else False
            time_diff = white_time - current_time
            white_time = current_time + increment
            
            if wdl is not None:
                wdl_flags = compare_wdl(previous=white_previous_wdl, current=wdl)
                white_previous_wdl = wdl.copy()
        else:
            enriched_move['under_time_pressure'] = True if black_time < 5 else False
            time_diff = black_time - current_time
            black_time = current_time + increment
            if wdl is not None:
                wdl_flags = compare_wdl(previous=black_previous_wdl, current=wdl)
                black_previous_wdl = wdl.copy()
        
        enriched_move['time_took'] = time_diff
        enriched_move['time_left'] = white_time if white_move else black_time
        enriched_move['wdl_flags'] = wdl_flags

        moves.append(enriched_move)        
        parsed_game = parsed_game.next()
    
    game['moves'] = moves
    
    return game

def compare_wdl(previous, current):
    data = {}
    
    # Swapping win and loose as we're looking at the current move player
    win_diff = current[2] - previous[2]
    draw_diff = current[1] - previous[1]
    loose_diff = current[0] - previous[0]
    
    if loose_diff > BLUNDER_WDL_THRESHOLD:
        data['bad_move'] = True
        
    if current[2] > previous[2] > 0:
        data['keep_advantage_move'] = True
        
    if win_diff > TAKE_ADVANTAGE_WDL_THRESHOLD:
        data['take_advantage_move'] = True
        
    if current[0] < previous[0] and draw_diff > RECOVER_WDL_THRESHOLD:
        data['recover_move'] = True
        
    # W score is still positive but lower than before
    if 0 < current[2] < previous[2]:
        data['decrease_advantage'] = True
    return data
    

Now we run the move enrichment. With the current Stockfish settings this process takes quite some time (10h+) on my M1 MacBook. As I have already run it once and stored the data, there is no need to repeat this step; we can simply read the file. If the `enrich_game_moves` function above changes, we will need to uncomment the code below and rerun the enrichment.


In [69]:
# with jsonlines.open('./data/enriched/enriched_moves_2023.json', 'w') as writer:
#  for g in pool.map(enrich_game_moves, games):
#     if g is not None:
#         writer.write(g)

After working with the data for a bit, I found that the WDL stats are not as useful as the Stockfish eval, so I have to load the data back and enrich it with eval stats. Fortunately, I had already saved all the evals in the file, so there is no need to run Stockfish again: all we have to do is read the file, run enrichment v2 and save it again.

This evaluation is the same kind that chess.com uses, given in cp or, if a mate is available, in mate. I found the [chess.com article](https://support.chess.com/article/2965-how-are-moves-classified-what-is-a-blunder-or-brilliant-and-etc) on this where, as I understand it, they assign each move to one of several categories based on how much it changed the evaluation compared to the previous move. For example, if the move made no difference (i.e. it was +1 before and it is +1 now), it is considered a Best move.

However, instead of the raw difference, chess.com uses normalized values (what they call Expected points) that range from 0 to 1. I couldn't find how they convert cp into expected points, but I did find the following [article](https://www.chessprogramming.org/Pawn_Advantage,_Win_Percentage,_and_Elo), whose calculation seems similar to what Expected points are. It gives us the probability of winning, and we will use the difference between two such probabilities as the Expected points difference in the calculation below.
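
As a quick worked example of that calculation, the sketch below applies the logistic formula with K = 4, the value used in the code further down, to a few centipawn values. Whether chess.com uses the same mapping is not confirmed; this only illustrates the approximation adopted here.

```python
def winning_probability(cp, k=4):
    """Logistic mapping from a centipawn evaluation to a winning probability."""
    pawns = cp / 100
    return 1 / (1 + 10 ** (-pawns / k))

for cp in (0, 100, 300, -100):
    print(cp, round(winning_probability(cp), 2))
```

An equal position maps to 0.5, and going from +1.00 to 0.00 changes the probability by about 0.14.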
 
We will use the same logic, together with the chess.com table, to categorize our own dataset.
<br><br>

| Classification | Lower Limit | Upper Limit |
|----------------|-------------|-------------|
| Best           | 0.00        | 0.00        |
| Excellent      | 0.00        | 0.02        |
| Good           | 0.02        | 0.05        |
| Inaccuracy     | 0.05        | 0.10        |
| Mistake        | 0.10        | 0.20        |
| Blunder        | 0.20        | 1.00        |


chess.com also has other classifications built on top of these, such as Great Move or Brilliant Move. Their calculation isn't straightforward, so we will skip them for the time being.
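
The threshold table above can be sketched as a plain function. Like `evaluate_move` below, it works on the absolute change in winning probability; the name `classify_move` is hypothetical.

```python
def classify_move(wp_before, wp_after):
    """Map a change in winning probability to a classification from the table."""
    diff = abs(wp_after - wp_before)
    if diff > 0.20:
        return "blunder"
    if diff > 0.10:
        return "mistake"
    if diff > 0.05:
        return "inaccuracy"
    if diff > 0.02:
        return "good"
    if diff > 0:
        return "excellent"
    return "best"

print(classify_move(0.64, 0.50))
print(classify_move(0.50, 0.50))
```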


In [70]:
import json
 
def enrich_moves(game):
    game = enrich_with_eval(game)
    return enrich_with_only_move(game)
 
def enrich_with_eval(game):
    if game is None or 'moves' not in game:
        return game
    
    previous_move_eval = None
    
    for move in game['moves']:
        evaluation = evaluate_move(move, previous_move_eval)
        move['eval_flags'] = evaluation
        
        previous_move_eval = evaluation
    
    return game

def enrich_with_only_move(game):
    previous_move = None
    
    if game.get('moves', None) is None:
        return game

    for idx, move in enumerate(game['moves']):
        if previous_move is None:
            previous_move = move
            continue

        best_move = move['best_move'] and (move['eval_flags'].get('excellent', False) or move['eval_flags'].get('best', False))

        if best_move and previous_move['eval_flags'].get('inaccuracy', False):
            move['only_move_on_inaccuracy'] = True

        if best_move and previous_move['eval_flags'].get('mistake', False):
            move['only_move_on_mistake'] = True

        if best_move and previous_move['eval_flags'].get('blunder', False):
            move['only_move_on_blunder'] = True

        opponent_time_diff = move['time_left'] - previous_move['time_left']
        move['opponent_time_diff'] = opponent_time_diff
        move['only_move'] = any([move.get('only_move_on_inaccuracy', False), move.get('only_move_on_mistake', False), move.get('only_move_on_blunder', False)])

        opponent_previous_move_is_bad_move = any([ previous_move['eval_flags'].get('inaccuracy', False), 
                                                     previous_move['eval_flags'].get('mistake', False),
                                                     previous_move['eval_flags'].get('blunder', False)])

        move['only_move_candidate'] = opponent_previous_move_is_bad_move

        if move['only_move']:
            previous_move_by_this_player_index = idx - 2
            if previous_move_by_this_player_index >= 0:
                ONLY_MOVE_RECOVER_THRESHOLD = 0.5
                ONLY_MOVE_TAKE_ADVANTAGE_THRESHOLD = 0.7
                
                previous_move_by_this_player = game['moves'][previous_move_by_this_player_index]
                cp_diff = abs(normalize_cp(move) - normalize_cp(previous_move_by_this_player))
                was_loosing = is_loosing(previous_move_by_this_player)
                
                if was_loosing and cp_diff > ONLY_MOVE_RECOVER_THRESHOLD:
                     move['only_move_recovery'] = True
                     
                     is_winning_now = not is_loosing(move)
                     if is_winning_now:
                         move['only_move_recovery_turning_point'] = True
                     
                elif not was_loosing and cp_diff > ONLY_MOVE_TAKE_ADVANTAGE_THRESHOLD:
                     move['only_move_advantage'] = True    
        elif not move['only_move'] and opponent_previous_move_is_bad_move:
            move['missed_only_move'] = True
                
        previous_move = move

    return game

def is_loosing(move):
    color = move['playing']
    cp = normalize_cp(move)
    
    if color == 'Black':
        return cp > 0
    else:
        return cp < 0

def evaluate_move(move, previous_eval):
    flags = {}
    normalized_eval = normalize_cp(move)
    wp = winning_probability(normalized_eval)
    flags['wp'] = round(wp, 2)
    
    if previous_eval is not None:
        wp_diff = abs(wp - previous_eval['wp'])
        
        if wp_diff > 0.20:
            flags['blunder'] = True
        elif 0.10 < wp_diff <= 0.20:
            flags['mistake'] = True
        elif 0.05 < wp_diff <= 0.10:
            flags['inaccuracy'] = True
        elif 0.02 < wp_diff <= 0.05:
            flags['good'] = True
        elif 0 < wp_diff <= 0.02:
            flags['excellent'] = True
        elif wp_diff == 0:
            flags['best'] = True
    
    return flags

def normalize_cp(move):
    """
        Takes a move and returns its evaluation as a single centipawn number. The eval comes in two
        flavors, cp and mate: cp values are kept as is, and mate values are mapped to -10000 or +10000.

        The result stays in centipawns; callers divide by 100 when they need pawns.
    """
    evaluation = move['eval']

    if evaluation['type'] == 'cp':
        cp = evaluation['value']
    else:
        cp = 10000 if evaluation['value'] > 0 else -10000
    
    return cp

def winning_probability(cp):
    P = cp/100
    K = 4 
    return 1 / (1 + 10**(-P/K))

with open('./data/enriched/enriched_moves_2023.json', 'r') as json_file:
    games = [json.loads(line) for line in json_file]

enriched_games = list(map(lambda g: enrich_moves(g), games))

with jsonlines.open('data/enriched/enriched_moves_2023-v2.jsonl', 'w') as writer:
    writer.write_all(enriched_games)





### Game


Now that we have enriched our move data, we want to enrich the game data with some stats. We will extract the same stats for both Hikaru and his opponent, and in our analysis we will use both to see whether there is a correlation between Hikaru's features and the opponent's features. The data we extract here is the data we will actually use in our analysis.

The stats that we will extract here are:
<br><br>

##### Was losing but won anyway (range of losing 1 - 5).
These metrics tell us whether the player was losing by an eval of at least 1 to 5 at some point but won anyway. They are useful for understanding the rate of comebacks in games.
<br><br>
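
The comeback check can be sketched as follows, assuming evaluations are given in pawns from White's point of view. The sign flip matches the `eval_multiplier` idea in `extract_flags`; the function name and the sample evaluations are made up.

```python
def was_losing_but_won(evals_in_pawns, color, won, threshold):
    """Check whether a player was behind by at least `threshold` pawns yet won.

    evals_in_pawns: evaluations from White's point of view, in pawns.
    """
    # Flip the sign for White so that a positive number always means
    # "this player is behind".
    multiplier = -1 if color == "white" else 1
    was_losing = any(e * multiplier >= threshold for e in evals_in_pawns)
    return was_losing and won

evals = [0.3, -1.2, -3.4, 0.8, 2.5]   # made-up evaluations for one game
for threshold in range(1, 6):
    print(threshold, was_losing_but_won(evals, "white", True, threshold))
```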

##### Top two moves, Only moves and classified moves percentages
For each of the move types we extracted before, we will also extract their percentage in the game. Remember that we extracted the following move types:

- Top two moves, which tell us whether the move was one of the top two Stockfish suggestions.
- Only moves, which tell us whether the best move was played after the opponent made a bad move. Here we will also extract a relative percentage, based on the number of only-move opportunities in the game.
- Best, excellent, good, inaccuracy, mistake and blunder move percentages.
<br><br>
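
The difference between the plain and the relative percentage can be shown with a few made-up counts. The helper name `percentage` is hypothetical; the zero guard plays the same role as the `return 0` fallbacks in the functions below.

```python
def percentage(part, whole):
    """Return part/whole as a percentage, or 0 when there is nothing to count."""
    return part / whole * 100 if whole else 0

# Made-up counts for one player in one game (opening moves already excluded).
total_moves = 40
only_move_candidates = 16   # moves played right after an opponent's bad move
only_moves_found = 10

print(percentage(only_moves_found, total_moves))            # share of all moves
print(percentage(only_moves_found, only_move_candidates))   # relative share
```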

##### Top two moves, Only moves and classified move count
Same as above, but instead of the percentage we get the raw move count.
<br><br>

##### Top two moves, Only moves and classified moves average time taken
For all the mentioned move types we will gather the average time taken per move.
<br><br>

##### Top two moves, Only moves and classified moves metrics per game phase
Same as the last three items, broken down by game phase (MIDDLEGAME and ENDGAME). The stats above are aggregated over both phases, as OPENING moves are excluded.
<br><br>
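
A minimal sketch of the per-phase breakdown, using average time as the example metric. The helper `average_time` is hypothetical and reads the flag directly from the move, which works for `best_move`; the real functions below look some flags up under `eval_flags` instead.

```python
import statistics

def average_time(moves, phases, flag):
    """Mean time spent on moves that carry `flag` and fall in one of `phases`."""
    times = [
        m["time_took"]
        for m in moves
        if m["phase"] in phases and m.get(flag, False)
    ]
    return statistics.mean(times) if times else 0

# Made-up moves; only the fields needed for the example are included.
moves = [
    {"phase": "MIDDLEGAME", "best_move": True, "time_took": 4.0},
    {"phase": "MIDDLEGAME", "best_move": False, "time_took": 9.0},
    {"phase": "ENDGAME", "best_move": True, "time_took": 2.0},
]

print(average_time(moves, ["MIDDLEGAME", "ENDGAME"], "best_move"))   # both phases
print(average_time(moves, ["ENDGAME"], "best_move"))                 # endgame only
```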

##### Wins via timeout
A flag marking games that Hikaru won because the opponent lost on time; counting it gives the number of such games.




In [71]:
import statistics

def enrich_game(game):
    if game is None or 'moves' not in game:
        return game
    
    hikaru_moves = list(filter(lambda move: move['player'] == 'Hikaru', game['moves']))
    opponent_moves = list(filter(lambda move: move['player'] != 'Hikaru', game['moves']))
    
    if game['white']['username'] == 'Hikaru':
        hikaru_color = 'white'
        won = game['white']['result'] == 'win'
    else:
        hikaru_color = 'black'
        won = game['black']['result'] == 'win'

    opponent_color = 'white' if hikaru_color == 'black' else 'black'
    
    hikaru_stats = extract_flags(hikaru_moves, won, hikaru_color)
    
    opponent_stats = extract_flags(opponent_moves, not won, opponent_color)
    
    opponent = get_opponent(game)
    hikaru_stats['opponent_username'] = opponent['username']
    hikaru_stats['opponent_rating'] = opponent['rating']
    hikaru_stats['is_timeout_win'] = opponent['result'] == 'timeout'
    
    game['hikaru_stats'] = hikaru_stats
    game['opponent_stats'] = opponent_stats
    
    return game

def extract_flags(moves, won, color): 
    stats = {}
    
    stats['total_moves'] = len(moves)
    stats['top_2_moves_count'] = len(list(filter(lambda move: move['phase'] in ['MIDDLEGAME', 'ENDGAME'] and move['best_move'] == True, moves)))
    stats['only_moves_count'] = len(list(filter(lambda move: move['phase'] in ['MIDDLEGAME', 'ENDGAME'] and move.get('only_move', False), moves)))
    
    stats['top_2_move_avg'] = top_2_moves_average(moves, ['MIDDLEGAME', 'ENDGAME'])
    stats['top_2_move_middlegame_avg'] = top_2_moves_average(moves, ['MIDDLEGAME'])
    stats['top_2_move_endgame_avg'] = top_2_moves_average(moves, ['ENDGAME'])
    
    stats['top_2_move_perc'] = top_2_moves_percentage(moves, ['MIDDLEGAME', 'ENDGAME'])
    stats['top_2_move_middlegame_perc'] = top_2_moves_percentage(moves, ['MIDDLEGAME'])
    stats['top_2_move_endgame_perc'] = top_2_moves_percentage(moves, ['ENDGAME'])

    stats['only_move_avg'] = only_moves_average(moves, ['MIDDLEGAME', 'ENDGAME'])
    stats['only_move_middlegame_avg'] = only_moves_average(moves, ['MIDDLEGAME'])
    stats['only_move_endgame_avg'] = only_moves_average(moves, ['ENDGAME'])

    stats['only_move_perc'] = only_moves_percentage(moves, ['MIDDLEGAME', 'ENDGAME'])
    stats['only_move_middlegame_perc'] = only_moves_percentage(moves, ['MIDDLEGAME'])
    stats['only_move_endgame_perc'] = only_moves_percentage(moves, ['ENDGAME'])

    stats['only_move_relative_perc'] = only_moves_relative_percentage(moves, ['MIDDLEGAME', 'ENDGAME'])
    stats['only_move_relative_middlegame_perc'] = only_moves_relative_percentage(moves, ['MIDDLEGAME'])
    stats['only_move_relative_endgame_perc'] = only_moves_relative_percentage(moves, ['ENDGAME'])

    for move_type in ['best', 'excellent', 'good', 'inaccuracy', 'mistake', 'blunder']:
        stats[f'{move_type}_move_avg'] = eval_moves_average(moves, ['MIDDLEGAME', 'ENDGAME'], move_type)
        stats[f'{move_type}_middlegame_move_avg'] = eval_moves_average(moves, ['MIDDLEGAME'], move_type)
        stats[f'{move_type}_endgame_move_avg'] = eval_moves_average(moves, ['ENDGAME'], move_type)
        
        stats[f'{move_type}_move_perc'] = eval_moves_percentage(moves, ['MIDDLEGAME', 'ENDGAME'], move_type)
        stats[f'{move_type}_middlegame_move_perc'] = eval_moves_percentage(moves, ['MIDDLEGAME'], move_type)
        stats[f'{move_type}_endgame_move_perc'] = eval_moves_percentage(moves, ['ENDGAME'], move_type)
        
        stats[f'{move_type}_move_count'] = eval_moves_count(moves, ['MIDDLEGAME', 'ENDGAME'], move_type)
        stats[f'{move_type}_middlegame_move_count'] = eval_moves_count(moves, ['MIDDLEGAME'], move_type)
        stats[f'{move_type}_endgame_move_count'] = eval_moves_count(moves, ['ENDGAME'], move_type)

    
    # flip the sign of the eval for white so that a positive value means the player is losing
    eval_multiplier = -1 if color == 'white' else 1
      
    was_loosing_gte_1 = any(normalize_cp(move)/100 * eval_multiplier >= 1 for move in moves)
    was_loosing_gte_2 = any(normalize_cp(move)/100 * eval_multiplier >= 2 for move in moves)
    was_loosing_gte_3 = any(normalize_cp(move)/100 * eval_multiplier >= 3 for move in moves)
    was_loosing_gte_4 = any(normalize_cp(move)/100 * eval_multiplier >= 4 for move in moves)
    was_loosing_gte_5 = any(normalize_cp(move)/100 * eval_multiplier >= 5 for move in moves)

    stats['was_losing_gte_1_but_won'] = was_loosing_gte_1 and won
    stats['was_losing_gte_2_but_won'] = was_loosing_gte_2 and won
    stats['was_losing_gte_3_but_won'] = was_loosing_gte_3 and won
    stats['was_losing_gte_4_but_won'] = was_loosing_gte_4 and won
    stats['was_losing_gte_5_but_won'] = was_loosing_gte_5 and won
    
    stats['win'] = won
    
    return stats    

def get_opponent(game):
    """
        Returns opponent object from chess.com response
    """
    return game['white'] if game['white']['username'] != 'Hikaru' else game['black']

def eval_moves_percentage(all_moves, phases, move_type):
    """
        Calculates percentage of moves out of total moves with phase and move type filter applied
    """
    total_moves = list(filter(lambda move: move['phase'] in phases, all_moves))
    target_moves = list(filter(lambda move: move['eval_flags'].get(move_type, False), total_moves))
    
    if len(target_moves) > 0:
        return len(target_moves)/len(total_moves) * 100
    return 0

def eval_moves_average(all_moves, phases, move_type):
    """
       Calculates simple average of move with phase and move type filter applied
    """
    target_moves = list(filter(lambda move: move['phase'] in phases and move['eval_flags'].get(move_type, False), all_moves))
    
    if len(target_moves) > 0:
        return statistics.mean(list(map(lambda move: move['time_took'], target_moves)))
    return 0

def eval_moves_count(all_moves, phases, move_type):
    """
       Returns number of moves for phase and move_type filter applied
    """
    target_moves = list(filter(lambda move: move['phase'] in phases and move['eval_flags'].get(move_type, False), all_moves))
    
    return len(target_moves)

def top_2_moves_percentage(all_moves, phases):
    """
        Calculates percentage of best moves out of total moves with phase filter applied
    """
    total_moves = list(filter(lambda move: move['phase'] in phases, all_moves))
    best_moves = list(filter(lambda move: move['best_move'], total_moves))
    
    if len(best_moves) > 0:
        return len(best_moves)/len(total_moves) * 100
    return 0

def top_2_moves_average(all_moves, phases):
    """
       Calculates simple average of best move with phase filter applied
    """
    best_moves = list(filter(lambda move: move['phase'] in phases and move['best_move'], all_moves))
    
    if len(best_moves) > 0:
        return statistics.mean(list(map(lambda move: move['time_took'], best_moves)))
    return 0

def only_moves_percentage(all_moves, phases):
    """
        Calculates percentage of only moves out of total moves with phase filter applied
    """
    total_moves = list(filter(lambda move: move['phase'] in phases, all_moves))
    best_moves = list(filter(lambda move: move.get('only_move', False), total_moves))
    
    if len(best_moves) > 0:
        return len(best_moves)/len(total_moves) * 100
    return 0

def only_moves_relative_percentage(all_moves, phases):
    """
        Calculates relative percentage of only moves out of total moves with phase filter applied related to total only move candidates
    """
    total_moves = list(filter(lambda move: move['phase'] in phases and move['only_move_candidate'], all_moves))
    best_moves = list(filter(lambda move: move.get('only_move', False), total_moves))
    
    if len(best_moves) > 0:
        return len(best_moves)/len(total_moves) * 100
    return 0

def only_moves_average(all_moves, phases):
    """
       Calculates simple average of only move with phase filter applied
    """
    best_moves = list(filter(lambda move: move['phase'] in phases and move.get('only_move', False), all_moves))
    
    if len(best_moves) > 0:
        return statistics.mean(list(map(lambda move: move['time_took'], best_moves)))
    return 0


In [72]:
import json

with open('./data/enriched/enriched_moves_2023-v2.jsonl', 'r') as json_file:
    games = [json.loads(line) for line in json_file]

enriched_games = list(map(lambda g: enrich_game(g), games))
    
with jsonlines.open('data/enriched/enriched_games_2023.jsonl', 'w') as writer:
    writer.write_all(enriched_games)
        

Now that the entire data set has our flags and stats, let's move on to the analysis notebook.