# Analyze All-Time Kicks

Every kick from every match is also saved in a separate branch of the database so that we can run analytics on players and their kicks across multiple matches. We call this the "all-time kicks" dataset.

We have replaced player usernames with anonymous labels (such as "Player 0") so that the dataset can be analyzed without identifying players. This notebook summarizes the dataset.
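The anonymization step itself is not shown in this notebook. As a minimal sketch of how usernames could be mapped to stable labels, assuming a `fromName` column and made-up usernames (the actual procedure used for this dataset may differ):

```python
import pandas as pd

# Toy kicks with made-up usernames (not taken from the real dataset).
raw = pd.DataFrame({
    "fromName": ["alice", "bob", "bob", "carol", "alice"],
    "fromTeam": ["red", "blue", "blue", "red", "red"],
})

# pd.factorize assigns integer codes in order of first appearance,
# so each distinct username maps to one stable "Player N" label.
codes, uniques = pd.factorize(raw["fromName"])
anonymized = raw.assign(fromName=[f"Player {code}" for code in codes])

print(anonymized["fromName"].tolist())
```

Because the labels are consistent across rows, per-player counts such as `nunique()` are unchanged by the mapping.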

In [1]:
import sys
sys.path.append("../")

from haxml.utils import (
    is_target_stadium,
    is_shot
)
import json
import pandas as pd

In [2]:
ALL_TIME_KICKS = "../data/all_time_kicks.json"
with open(ALL_TIME_KICKS, "r") as file:
    all_time_kicks = json.load(file)

In [3]:
df = pd.DataFrame(all_time_kicks.values())

## Dataset Summary

In [4]:
df.head().T

| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| fromName | Player 0 | Player 1 | Player 1 | Player 1 | Player 1 |
| fromTeam | red | blue | blue | blue | blue |
| fromX | 12 | 14 | -219 | -676 | -651 |
| fromY | 0 | 0 | 11 | -203 | 120 |
| match | -MOTVkwbfE_IKa15MVn9 | -MOTVkwbfE_IKa15MVn9 | -MOTVkwbfE_IKa15MVn9 | -MOTVkwbfE_IKa15MVn9 | -MOTVkwbfE_IKa15MVn9 |
| saved | 1607903416292 | 1607903416293 | 1607903416294 | 1607903416296 | 1607903416297 |
| scoreBlue | 0 | 0 | 0 | 0 | 1 |
| scoreLimit | 2 | 2 | 2 | 2 | 2 |
| scoreRed | 1 | 1 | 1 | 1 | 1 |
| stadium | NAFL Official Map v1 | NAFL Official Map v1 | NAFL Official Map v1 | NAFL Official Map v1 | NAFL Official Map v1 |


We will spend most of our time playing in and analyzing three stadiums: the two main NAFL (Futsal) maps, which cover 1v1, 2v2, 3v3, and 4v4 play, and the Classic HaxBall map. We want a model that can be applied across multiple stadiums (as long as each stadium meets our assumptions), but we can focus our analysis and development on these three.
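The filter used later in this notebook comes from `haxml.utils.is_target_stadium`, whose implementation is not shown here. A hypothetical stand-in, assuming the three target stadiums are matched by exact name (the names are taken from the per-stadium tables in this notebook; the real helper may use different rules):

```python
# Hypothetical stand-in for haxml.utils.is_target_stadium; the real
# helper may use different names or matching rules.
TARGET_STADIUMS_SKETCH = {
    "NAFL Official Map v1",
    "NAFL 1v1/2v2 Map v1",
    "Classic",
}


def is_target_stadium_sketch(stadium):
    """Return True if the stadium name is one of the three assumed targets."""
    return stadium in TARGET_STADIUMS_SKETCH


print(is_target_stadium_sketch("Classic"))
print(is_target_stadium_sketch("Small"))
```

A set lookup keeps the check cheap when it is applied row by row with `Series.apply`.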

In [5]:
def summarize_df(df):
    n_matches = df["match"].nunique()
    n_players = df["fromName"].nunique()
    print(f"Rows: {df.shape[0]:,} kicks")
    print(f"Columns: {df.shape[1]:,} features")
    print(f"Matches: {n_matches:,} matches")
    print(f"Players: {n_players:,} unique players")

    
print("All-Time Kicks:")
summarize_df(df)
print()
print("From Target Stadiums:")
df_target = df[df["stadium"].apply(is_target_stadium)]
summarize_df(df_target)

All-Time Kicks:
Rows: 149,875 kicks
Columns: 22 features
Matches: 1,890 matches
Players: 711 unique players

From Target Stadiums:
Rows: 139,541 kicks
Columns: 22 features
Matches: 1,701 matches
Players: 675 unique players
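
To see how much of the dataset the target-stadium filter keeps, the two summaries can be compared directly. This is a small arithmetic sketch using the counts printed above; the dictionary names are illustrative only:

```python
# Counts copied from the two summaries printed above.
all_time = {"kicks": 149_875, "matches": 1_890, "players": 711}
target = {"kicks": 139_541, "matches": 1_701, "players": 675}

# Fraction of each quantity that survives the target-stadium filter.
retained = {key: target[key] / all_time[key] for key in all_time}
for key, fraction in retained.items():
    print(f"{key}: {fraction:.1%} retained")
```

The target stadiums account for roughly 93% of kicks, 90% of matches, and 95% of unique players.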


### Table 1. Count of kicks, by stadium.

In [6]:
gp_stadium = df.groupby("stadium")["kick"].count()
pd.DataFrame(gp_stadium.sort_values(ascending=False)).reset_index()

| | stadium | kick |
|---|---|---|
| 0 | NAFL Official Map v1 | 113082 |
| 1 | NAFL 1v1/2v2 Map v1 | 22327 |
| 2 | Classic | 4132 |
| 3 | Futsal 3x3 4x4 from HaxMaps | 3505 |
| 4 | HBA Fixed Map | 808 |
| 5 | Small | 782 |
| 6 | Futsal 1x1 2x2 from HaxMaps | 577 |
| 7 | 🎮Kralın Futsal 3v🎮 | 559 |
| 8 | BFF Classic v3 | 433 |
| 9 | hb.jakjus.com WATER POLO | 390 |


### Table 2. Count of matches, by stadium.

In [7]:
gp_matches = df.groupby("stadium")["match"].nunique()
pd.DataFrame(gp_matches.sort_values(ascending=False)).reset_index()

| | stadium | match |
|---|---|---|
| 0 | NAFL Official Map v1 | 1086 |
| 1 | NAFL 1v1/2v2 Map v1 | 543 |
| 2 | Classic | 72 |
| 3 | Futsal 3x3 4x4 from HaxMaps | 35 |
| 4 | Futsal 1x1 2x2 from HaxMaps | 20 |
| 5 | Small | 16 |
| 6 | HBA Fixed Map | 15 |
| 7 | Strong Ball Classic from HaxMaps | 10 |
| 8 | Esquinas by Boom from HaxMaps | 7 |
| 9 | BFF Classic v3 | 7 |


### Table 3. Count of unique players who kicked (from), by stadium.

In [8]:
gp_players = df.groupby("stadium")["fromName"].nunique()
pd.DataFrame(gp_players.sort_values(ascending=False)).reset_index()

| | stadium | fromName |
|---|---|---|
| 0 | NAFL Official Map v1 | 604 |
| 1 | NAFL 1v1/2v2 Map v1 | 275 |
| 2 | Futsal 3x3 4x4 from HaxMaps | 53 |
| 3 | Classic | 52 |
| 4 | Esquinas by Boom from HaxMaps | 19 |
| 5 | Futsal 1x1 2x2 from HaxMaps | 18 |
| 6 | hb.jakjus.com WATER POLO | 18 |
| 7 | Small | 17 |
| 8 | Big | 15 |
| 9 | Strong Ball Classic from HaxMaps | 14 |


### Table 4. Count of kicks, by stadium and by type.

Showing target stadiums only.

In [12]:
# Sort stadiums by total kicks, then kick types by count within each stadium.
# This avoids sort_index(key=...), which requires pandas >= 1.1.
df_types = df_target.groupby(["stadium", "type"])["kick"].count().reset_index()
df_types["stadium_kicks"] = df_types["stadium"].map(gp_stadium)
df_tsk = df_types.sort_values(["stadium_kicks", "kick"], ascending=False)
pd.DataFrame(df_tsk.set_index(["stadium", "type"])["kick"])
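
The `key` argument to `sort_index()` was added in pandas 1.1, so older installations raise this `TypeError`. As a sketch of the intended ordering on toy data (stadiums by total kicks, then kick types by count within each stadium), the following maps each stadium's total onto the grouped counts and sorts on both columns. The stadium names, kick types, and the `stadium_kicks` helper column are made up for illustration:

```python
import pandas as pd

# Toy data: stadium "B" has 5 kicks, stadium "A" has 3.
toy = pd.DataFrame({
    "stadium": ["A", "A", "A", "B", "B", "B", "B", "B"],
    "type": ["pass", "pass", "goal", "pass", "pass", "pass", "goal", "goal"],
    "kick": range(8),
})

# Total kicks per stadium, used as the primary sort key.
totals = toy.groupby("stadium")["kick"].count()

# Kicks per (stadium, type), with the stadium total attached to each row.
counts = toy.groupby(["stadium", "type"])["kick"].count().reset_index()
counts["stadium_kicks"] = counts["stadium"].map(totals)

# Sort by stadium total, then by count within the stadium, both descending.
ordered = (
    counts.sort_values(["stadium_kicks", "kick"], ascending=False)
    .set_index(["stadium", "type"])["kick"]
)
print(ordered)
```

Sorting on ordinary columns works the same way across pandas versions, which is why this sketch does not depend on `sort_index(key=...)`.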

### Table 5. Fraction of kicks that are goals, by stadium.

In [13]:
def get_goal_fraction(kick_types):
    """Calculate the fraction of kicks that are goals."""
    goals = sum(1 if val == "goal" else 0 for val in kick_types)
    return goals / len(kick_types)


gp_goals = df.groupby("stadium")["type"].agg(get_goal_fraction)
# Order stadiums by total kicks, descending. reindex avoids
# sort_index(key=...), which requires pandas >= 1.1.
stadium_order = gp_stadium.sort_values(ascending=False).index
gp_goals_sorted = gp_goals.reindex(stadium_order)
pd.DataFrame(gp_goals_sorted).reset_index()
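
The goal fraction can also be computed without a custom aggregation function. The sketch below is an alternative formulation, not the notebook's own method: it compares the `type` column to `"goal"` and averages the resulting booleans per stadium. The toy stadium names and the `"pass"` kick type are made up for illustration:

```python
import pandas as pd

# Toy data: stadium "A" has 1 goal in 4 kicks, "B" has 1 goal in 2 kicks.
toy = pd.DataFrame({
    "stadium": ["A", "A", "A", "A", "B", "B"],
    "type": ["goal", "pass", "pass", "pass", "goal", "pass"],
})

# Comparing to "goal" yields booleans; their mean is the fraction of goals.
goal_fraction = toy["type"].eq("goal").groupby(toy["stadium"]).mean()
print(goal_fraction)
```

This gives the same per-stadium fractions as counting goals and dividing by the number of kicks, since `True` counts as 1 and `False` as 0 in the mean.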