# Analyze All-Time Kicks

Every kick from every match is also saved in a separate branch of the database so that we can query analytics for players and their kicks across multiple matches. We call these the "all-time kicks" dataset.

We have replaced player usernames with generic labels such as "Player 0" so that the dataset can be analyzed. This notebook summarizes the dataset.

In [1]:
import sys
sys.path.append("../")

from haxml.utils import is_shot
import json
import pandas as pd

In [2]:
ALL_TIME_KICKS = "../data/all_time_kicks.json"
with open(ALL_TIME_KICKS, "r") as file:
    all_time_kicks = json.load(file)

In [3]:
df = pd.DataFrame(all_time_kicks.values())
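
The call to `all_time_kicks.values()` implies that the JSON file is a dictionary keyed by kick ID, with one flat record per kick. As a minimal sketch of how that shape becomes a table, the example below uses two made-up records; the IDs and field values are hypothetical, and only the field names are taken from the dataset.

```python
import pandas as pd

# Hypothetical records shaped like the all-time kicks export: a dict keyed by
# kick ID whose values are flat dicts of fields. IDs and values are made up.
sample_kicks = {
    "kick-a": {"kick": "kick-a", "match": "match-1", "fromName": "Player 0",
               "fromTeam": "red", "type": "goal", "stadium": "Classic"},
    "kick-b": {"kick": "kick-b", "match": "match-1", "fromName": "Player 1",
               "fromTeam": "blue", "type": "steal", "stadium": "Classic",
               "toName": "Player 0", "toTeam": "red"},
}

# One row per kick; a field that is absent from a record becomes NaN.
sample_df = pd.DataFrame(list(sample_kicks.values()))
print(sample_df.shape)
print(sample_df["toName"].isna().tolist())
```

Records do not need identical fields: pandas takes the union of keys as columns and fills the gaps with missing values.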

## Dataset Summary

In [4]:
df.head()

,fromName,fromTeam,fromX,fromY,match,saved,scoreBlue,scoreLimit,scoreRed,stadium,...,type,kick,toName,toTeam,toX,toY,assistName,assistTeam,assistX,assistY
0,Player 0,red,12,0,-MOTVkwbfE_IKa15MVn9,1607903416292,0,2,1,NAFL Official Map v1,...,goal,-MOTVkyxy_n8JAv-qlf9,,,,,,,,
1,Player 1,blue,14,0,-MOTVkwbfE_IKa15MVn9,1607903416293,0,2,1,NAFL Official Map v1,...,steal,-MOTVkyyL55nLXoECmzF,Player 0,red,-201.0,0.0,,,,
2,Player 1,blue,-219,11,-MOTVkwbfE_IKa15MVn9,1607903416294,0,2,1,NAFL Official Map v1,...,steal,-MOTVkyyL55nLXoECmzG,Player 0,red,-253.0,0.0,,,,
3,Player 1,blue,-676,-203,-MOTVkwbfE_IKa15MVn9,1607903416296,0,2,1,NAFL Official Map v1,...,save,-MOTVkyzdyLltWs_jad9,Player 0,red,-677.0,-73.0,,,,
4,Player 1,blue,-651,120,-MOTVkwbfE_IKa15MVn9,1607903416297,1,2,1,NAFL Official Map v1,...,goal,-MOTVkyzdyLltWs_jadA,,,,,,,,


We will spend most of our time playing in and analyzing three stadiums: the two main NAFL (Futsal) maps, which together cover 1v1 through 4v4 play, and the Classic HaxBall map. We want a model that can be applied across multiple stadiums (as long as a stadium meets our assumptions), but we focus our analysis and development on these three.

In [5]:
def summarize_df(df):
    """Print row, column, match, and player counts for a kicks DataFrame."""
    n_matches = df["match"].nunique()
    n_players = df["fromName"].nunique()
    print(f"Rows: {df.shape[0]:,} kicks")
    print(f"Columns: {df.shape[1]:,} features")
    print(f"Matches: {n_matches:,} matches")
    print(f"Players: {n_players:,} unique players")

    
# Stadiums we focus on, stored as a set for membership tests.
target_stadiums = {
    "NAFL Official Map v1",
    "NAFL 1v1/2v2 Map v1",
    "Classic",
}


print("All-Time Kicks:")
summarize_df(df)
print()
print("From Target Stadiums:")
df_target = df[df["stadium"].isin(target_stadiums)]
summarize_df(df_target)

All-Time Kicks:
Rows: 35,382 kicks
Columns: 22 features
Matches: 457 matches
Players: 278 unique players

From Target Stadiums:
Rows: 34,439 kicks
Columns: 22 features
Matches: 439 matches
Players: 270 unique players
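
To put the two summaries side by side, the short sketch below computes the share of kicks, matches, and players that fall in the target stadiums. The counts are copied by hand from the output above; the variable names are illustrative only.

```python
# Counts copied from the summary output above.
all_time = {"kicks": 35_382, "matches": 457, "players": 278}
target = {"kicks": 34_439, "matches": 439, "players": 270}

# Share of each quantity that comes from the three target stadiums.
coverage = {name: target[name] / all_time[name] for name in all_time}
for name, share in coverage.items():
    print(f"{name}: {share:.1%} in target stadiums")
```

Roughly 97% of all kicks come from the three target stadiums, so restricting the analysis to them discards very little data.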


### Table 1. Count of kicks, by stadium.

In [6]:
gp_stadium = df.groupby("stadium")["kick"].count()
pd.DataFrame(gp_stadium.sort_values(ascending=False)).reset_index()

,stadium,kick
0,NAFL Official Map v1,29196
1,NAFL 1v1/2v2 Map v1,3637
2,Classic,1606
3,Futsal 3x3 4x4 from HaxMaps,430
4,Futsal 1x1 2x2 from HaxMaps,116
5,Big,105
6,Small,102
7,Futsal 3v3 v2 from HaxMaps,79
8,Futsal 1v1 by Luchooo from HaxMaps,70
9,Easy,26


### Table 2. Count of matches, by stadium.

In [7]:
gp_matches = df.groupby("stadium")["match"].nunique()
pd.DataFrame(gp_matches.sort_values(ascending=False)).reset_index()

,stadium,match
0,NAFL Official Map v1,307
1,NAFL 1v1/2v2 Map v1,98
2,Classic,34
3,Futsal 3x3 4x4 from HaxMaps,4
4,Small,3
5,Futsal 1x1 2x2 from HaxMaps,3
6,Quidditch by Pael from HaxMaps,1
7,POWER 4v4 MNC from HaxMaps,1
8,Kafa Topu by Vhagar,1
9,KERO CANZ Power Classic Dark v2 from HaxMaps,1


### Table 3. Count of unique players who kicked (from), by stadium.

In [8]:
gp_players = df.groupby("stadium")["fromName"].nunique()
pd.DataFrame(gp_players.sort_values(ascending=False)).reset_index()

,stadium,fromName
0,NAFL Official Map v1,249
1,NAFL 1v1/2v2 Map v1,89
2,Classic,19
3,Futsal 3x3 4x4 from HaxMaps,12
4,Futsal 3v3 v2 from HaxMaps,8
5,Futsal 1x1 2x2 from HaxMaps,7
6,Big,7
7,Small,3
8,KERO CANZ Power Classic Dark v2 from HaxMaps,2
9,Futsal 1v1 by Luchooo from HaxMaps,2
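
Tables 1 to 3 apply the same per-stadium grouping with a different aggregation each time. As one possible alternative, pandas named aggregation can produce all three counts in a single table. The sketch below runs on a small made-up DataFrame (`toy` is a hypothetical stand-in for `df`) so that it is self-contained.

```python
import pandas as pd

# Toy stand-in for the kicks DataFrame; all values are made up.
toy = pd.DataFrame({
    "stadium": ["Classic", "Classic", "Classic", "Big", "Big"],
    "match": ["m1", "m1", "m2", "m3", "m3"],
    "fromName": ["Player 0", "Player 1", "Player 0", "Player 2", "Player 2"],
    "kick": ["k1", "k2", "k3", "k4", "k5"],
})

# One groupby with named aggregations: kicks, distinct matches, distinct kickers.
summary = (
    toy.groupby("stadium")
    .agg(
        kicks=("kick", "count"),
        matches=("match", "nunique"),
        players=("fromName", "nunique"),
    )
    .sort_values("kicks", ascending=False)
    .reset_index()
)
print(summary)
```

Swapping `toy` for `df` should give the three tables as columns of one frame, sorted by kick count.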


### Table 4. Count of kicks, by stadium and by type.

In [9]:
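# Count kicks for each (stadium, type) pair.
df_types = df.groupby(["stadium", "type"])["kick"].count()
df_types_stadium = df_types.groupby("stadium", group_keys=False)
df_tsv = df_types_stadium.apply(lambda x: x.sort_values(ascending=False))
# Order stadiums by total kicks (as in Table 1). Note that sort_index also
# re-sorts the "type" level (sort_remaining defaults to True), so within each
# stadium the types appear in reverse-alphabetical order, not by count.
df_tsk = df_tsv.sort_index(level=0, key=lambda x: gp_stadium[x], ascending=False)
pd.DataFrame(df_tsk)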

stadium,type,kick
NAFL Official Map v1,steal,14958
NAFL Official Map v1,save,4086
NAFL Official Map v1,pass,9114
NAFL Official Map v1,own_goal,211
NAFL Official Map v1,goal,761
NAFL Official Map v1,error,66
NAFL 1v1/2v2 Map v1,steal,1823
NAFL 1v1/2v2 Map v1,save,1152
NAFL 1v1/2v2 Map v1,pass,255
NAFL 1v1/2v2 Map v1,own_goal,95


### Table 5. Fraction of kicks that are goals, by stadium.

In [10]:
def get_goal_fraction(kick_types):
    """Calculate the fraction of kicks that are goals (own goals excluded)."""
    return (kick_types == "goal").mean()


gp_goals = df.groupby("stadium")["type"].agg(get_goal_fraction).rename("goal_fraction")
gp_goals_sorted = gp_goals.sort_index(key=lambda x: gp_stadium[x], ascending=False)
pd.DataFrame(gp_goals_sorted).reset_index()

,stadium,goal_fraction
0,NAFL Official Map v1,0.026065
1,NAFL 1v1/2v2 Map v1,0.079736
2,Classic,0.055417
3,Futsal 3x3 4x4 from HaxMaps,0.025581
4,Futsal 1x1 2x2 from HaxMaps,0.051724
5,Big,0.028571
6,Small,0.04902
7,Futsal 3v3 v2 from HaxMaps,0.063291
8,Futsal 1v1 by Luchooo from HaxMaps,0.057143
9,Easy,0.038462
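
As a sanity check, the goal fractions in Table 5 can be multiplied by the kick counts in Table 1 to recover goal counts. The sketch below does this for the three target stadiums using values copied by hand from those tables; the dictionary names are illustrative only.

```python
# Kick counts copied from Table 1 and goal fractions copied from Table 5.
kicks = {
    "NAFL Official Map v1": 29_196,
    "NAFL 1v1/2v2 Map v1": 3_637,
    "Classic": 1_606,
}
goal_fraction = {
    "NAFL Official Map v1": 0.026065,
    "NAFL 1v1/2v2 Map v1": 0.079736,
    "Classic": 0.055417,
}

# Fractions are rounded to six decimals, so round the product to a whole goal.
goals = {name: round(kicks[name] * goal_fraction[name]) for name in kicks}
print(goals)
```

The result for NAFL Official Map v1 is 761 goals, which agrees with the goal row for that stadium in Table 4.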
