# Soccer2D Data Analysis

## Introduction

This workshop offers a practical, direct introduction to data analysis using data collected from RoboCup Soccer Simulation 2D matches, a competition in which two teams of 11 simulated autonomous robots play a soccer match against each other.

## Objectives

1. Process the data from a match;
2. Extract useful insights from the collected data;
3. Identify the strategies used by the best teams in the competition.

Before starting the analysis, we need the following libraries to be available:

- [NumPy](http://www.numpy.org/) for all matrix operations;
- [Matplotlib](https://matplotlib.org/) for plotting charts;
- [Pandas](https://pandas.pydata.org/docs/index.html) for data manipulation and visualization functions and tools;
- [Seaborn](https://seaborn.pydata.org/index.html) for data visualization;
- [Scikit-learn](https://scikit-learn.org/stable/index.html) for machine learning algorithms and other useful functions.

In [1]:
# # grants access to the data
# from google.colab import drive
# drive.mount('/content/drive/')

In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import math

## 1. Data processing

In this step, we load the dataset and then split it to make the analysis easier.

In [3]:
# adjust the path to the location of the csv file
# df = pd.read_csv('/content/drive/MyDrive/semuni24/Oxsy_1-vs-YuShan2023_3-2.csv')
df = pd.read_csv('Oxsy_1-vs-YuShan2023_3-2.csv')

In [4]:
# viewing the data
df.head()

Unnamed: 0,show_time,playmode,team_name_l,team_name_r,team_score_l,team_score_r,team_pen_score_l,team_pen_score_r,team_pen_miss_l,team_pen_miss_r,...,player_r11_counting_dash,player_r11_counting_turn,player_r11_counting_catch,player_r11_counting_move,player_r11_counting_turn_neck,player_r11_counting_change_view,player_r11_counting_say,player_r11_counting_tackle,player_r11_counting_point_to,player_r11_counting_attention_to
0,1,kick_off_l,Oxsy,YuShan2023,0,0,0,0,0,0,...,1,98,0,1,100,1,0,0,0,1
1,2,kick_off_l,Oxsy,YuShan2023,0,0,0,0,0,0,...,2,98,0,1,101,2,0,0,0,2
2,3,kick_off_l,Oxsy,YuShan2023,0,0,0,0,0,0,...,2,99,0,1,102,2,0,0,0,2
3,4,kick_off_l,Oxsy,YuShan2023,0,0,0,0,0,0,...,2,100,0,1,103,2,0,0,0,2
4,5,kick_off_l,Oxsy,YuShan2023,0,0,0,0,0,0,...,2,101,0,1,104,2,0,0,0,3


In [5]:
# viewing all columns as a list
df.columns.to_list()

['show_time',
 'playmode',
 'team_name_l',
 'team_name_r',
 'team_score_l',
 'team_score_r',
 'team_pen_score_l',
 'team_pen_score_r',
 'team_pen_miss_l',
 'team_pen_miss_r',
 'ball_x',
 'ball_y',
 'ball_vx',
 'ball_vy',
 'player_l1_side',
 'player_l1_unum',
 'player_l1_type',
 'player_l1_state',
 'player_l1_x',
 'player_l1_y',
 'player_l1_vx',
 'player_l1_vy',
 'player_l1_body',
 'player_l1_neck',
 'player_l1_point_to_x',
 'player_l1_point_to_y',
 'player_l1_view_quality',
 'player_l1_view_width',
 'player_l1_attribute_stamina',
 'player_l1_attribute_effort',
 'player_l1_attribute_recovery',
 'player_l1_attribute_stamina_capacity',
 'player_l1_focus_side',
 'player_l1_counting_kick',
 'player_l1_counting_dash',
 'player_l1_counting_turn',
 'player_l1_counting_catch',
 'player_l1_counting_move',
 'player_l1_counting_turn_neck',
 'player_l1_counting_change_view',
 'player_l1_counting_say',
 'player_l1_counting_tackle',
 'player_l1_counting_point_to',
 'player_l1_counting_attention_to',


The dataset columns contain information about the match and about each player on each team.

To make the analysis easier, we will split the dataset into smaller datasets, organized as follows:

game:
- cycle: server cycle (~0.1 second)
- stopped: whether the game is stopped
- playmode: current state of the game (corner kick, penalty, foul, etc.)
- [SIDE]_name: name of the team on each side
- [SIDE]_score: goals scored by each team
- [SIDE]_pen_score: penalty goals
- b_[COORD]: x or y coordinate of the ball
- b_v[COORD]: ball velocity along the x or y axis

teams:
- [SIDE][PLAYER_NUM]_t: player type
- [SIDE][PLAYER_NUM][COORD]: x or y coordinate of the player
- [SIDE][PLAYER_NUM]_v[COORD]: player velocity
- [SIDE][PLAYER_NUM]_body: body angle of the player, in the range (-180, 180) degrees
- [SIDE][PLAYER_NUM]_neck: neck angle of the player, in the range (-90, 90) degrees

We will call `team_l` the team on the left side of the field and `team_r` the team on the right side.

We will also remove the blank space before each column name.

In [6]:
# remove leading whitespace
df.columns = df.columns.str.lstrip()
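
To see what this cleanup does, here is a minimal sketch on a hypothetical CSV snippet (the `raw_csv` string is made up for illustration and is not taken from the match file). It also shows an alternative, the `skipinitialspace` option of `pd.read_csv`, which skips the space after each delimiter at load time.

```python
import io
import pandas as pd

# Hypothetical CSV snippet whose header and values carry a space after each comma
raw_csv = "show_time, playmode, ball_x\n1, kick_off_l, 0.0\n"

toy = pd.read_csv(io.StringIO(raw_csv))
print(toy.columns.to_list())  # the names still carry the leading space here

# Option 1: clean the names after loading, as in the cell above
toy.columns = toy.columns.str.lstrip()

# Option 2: let the parser skip the space that follows each delimiter
toy_clean = pd.read_csv(io.StringIO(raw_csv), skipinitialspace=True)
print(toy_clean.columns.to_list())
```

Option 1 only touches the column names, while option 2 also strips the leading space from the string values.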

In [7]:
game_cols = ['playmode', 'team_name_l', 'team_name_r', 'team_score_l', 'team_score_r',
             'team_pen_score_l', 'team_pen_score_r', 'ball_x', 'ball_y', 'ball_vx', 'ball_vy']

In [8]:
game = df[game_cols]
teams = df.drop(columns=game_cols)

In [9]:
game.head()

Unnamed: 0,playmode,team_name_l,team_name_r,team_score_l,team_score_r,team_pen_score_l,team_pen_score_r,ball_x,ball_y,ball_vx,ball_vy
0,kick_off_l,Oxsy,YuShan2023,0,0,0,0,0.0,0.0,0.0,0.0
1,kick_off_l,Oxsy,YuShan2023,0,0,0,0,0.0,0.0,0.0,0.0
2,kick_off_l,Oxsy,YuShan2023,0,0,0,0,0.0,0.0,0.0,0.0
3,kick_off_l,Oxsy,YuShan2023,0,0,0,0,0.0,0.0,0.0,0.0
4,kick_off_l,Oxsy,YuShan2023,0,0,0,0,0.0,0.0,0.0,0.0


In [10]:
teams.head()

Unnamed: 0,show_time,team_pen_miss_l,team_pen_miss_r,player_l1_side,player_l1_unum,player_l1_type,player_l1_state,player_l1_x,player_l1_y,player_l1_vx,...,player_r11_counting_dash,player_r11_counting_turn,player_r11_counting_catch,player_r11_counting_move,player_r11_counting_turn_neck,player_r11_counting_change_view,player_r11_counting_say,player_r11_counting_tackle,player_r11_counting_point_to,player_r11_counting_attention_to
0,1,0,0,l,1,7,9,-34.6,0.0,0.0,...,1,98,0,1,100,1,0,0,0,1
1,2,0,0,l,1,7,9,-34.6,0.0,0.0,...,2,98,0,1,101,2,0,0,0,2
2,3,0,0,l,1,7,9,-34.6,0.0,0.0,...,2,99,0,1,102,2,0,0,0,2
3,4,0,0,l,1,7,9,-34.6,0.0,0.0,...,2,100,0,1,103,2,0,0,0,2
4,5,0,0,l,1,7,9,-34.6,0.0,0.0,...,2,101,0,1,104,2,0,0,0,3
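
The cells above do not show how `team_l` and `team_r` would be built from `teams`. One possible way, assuming the `player_l<N>_` / `player_r<N>_` column prefixes listed earlier, is to select columns by regular expression. The frame `teams_toy` below is a small stand-in created only for this sketch.

```python
import pandas as pd

# Toy stand-in for the `teams` frame, following the column naming shown above
teams_toy = pd.DataFrame({
    "show_time": [1, 2],
    "player_l1_x": [-34.6, -34.6],
    "player_l1_y": [0.0, 0.0],
    "player_r11_x": [10.0, 10.5],
    "player_r11_y": [5.0, 5.2],
})

# Select columns by prefix: player_l* for the left team, player_r* for the right team
team_l = teams_toy.filter(regex=r"^player_l\d+_")
team_r = teams_toy.filter(regex=r"^player_r\d+_")

print(team_l.columns.to_list())
print(team_r.columns.to_list())
```

On the real data the same two `filter` calls could be applied to `teams` directly.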


## 2. Data exploration

In this step, we focus on exploring the data we have.

The `playmode` column describes the game modes that can occur during a match. Each mode corresponds to an event, similar to real-life soccer.

In [11]:
game['playmode'].unique()

array(['kick_off_l', 'play_on', 'kick_in_r', 'goal_r', 'foul_charge_l',
       'free_kick_r', 'kick_in_l', 'goal_l', 'kick_off_r', 'offside_r',
       'free_kick_l', 'foul_charge_r', 'goal_kick_r', 'goal_kick_l',
       'corner_kick_l', 'time_over'], dtype=object)

Each game mode is described below:

- **kick_off_[SIDE]**: Start or restart of the match, with the left or right team taking the kick-off.
- **play_on**: The game is running normally, without interruptions.
- **kick_in_[SIDE]**: The left or right team takes a kick-in from the sideline (the counterpart of a throw-in).
- **goal_[SIDE]**: Goal scored by the left or right team.
- **foul_charge_[SIDE]**: Foul committed by the left or right team for illegal charging.
- **free_kick_[SIDE]**: Free kick awarded to the left or right team.
- **offside_[SIDE]**: Offside by the left or right team.
- **goal_kick_[SIDE]**: Goal kick taken by the left or right team.
- **corner_kick_[SIDE]**: Corner kick taken by the left or right team.
- **time_over**: The regular playing time has ended.
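
As a rough illustration of how these modes can be explored, the sketch below counts how many cycles are spent in each mode and splits each mode name into an event and a side. The short `playmode` series is made up for the example; on the real data the same calls could be applied to `game['playmode']`.

```python
import pandas as pd

# Made-up sequence of play modes, one entry per cycle
playmode = pd.Series(
    ["kick_off_l", "play_on", "play_on", "kick_in_r", "play_on", "goal_r", "time_over"],
    name="playmode",
)

# Number of cycles spent in each mode
cycles_per_mode = playmode.value_counts()

# Split "<event>_<side>" into two columns; modes without an _l/_r suffix get a missing side
parts = playmode.str.extract(r"^(?P<event>.+?)(?:_(?P<side>[lr]))?$")

print(cycles_per_mode)
print(parts)
```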

## 3. Analyzing the data

### 3.1 Helper functions and classes

Before starting the analysis, it is worth defining some functions and classes that may be useful along the way.

To keep things simple, all helper functions and classes are implemented in the files `helpers.py` and `utils.py`, respectively, so all that is left is to import them.

Still, we recommend opening these files and reading the code to better understand the analysis as a whole.

In [12]:
import helpers

#### Ball possession

To compute ball possession, we consider that the team closest to the ball has possession.

In [13]:
# collect the coordinate columns of each player
players_left = []
players_right = []
for i in range(1, 12):
    players_left.append((f'player_l{i}_x', f'player_l{i}_y'))
    players_right.append((f'player_r{i}_x', f'player_r{i}_y'))
players = [players_left, players_right]

In [14]:
# keep only the cycles in which the game was not stopped
filtered_game = df[df['playmode'] == 'play_on']
time_left = time_right = 0

for idx, row in filtered_game.iterrows():
  # use the helper function at() to determine possession
  possession_side = helpers.at(idx, filtered_game, players)

  if possession_side == 'left':
    time_left += 1
  else:
    time_right += 1

total_time = time_left + time_right
possession_left = time_left/ total_time
possession_right = time_right/ total_time

print(f"l = {possession_left} r = {possession_right}")

l = 0.375030599755202 r = 0.6249694002447981
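
The code of `helpers.at` is not shown here, so the sketch below is only one way the "closest team has possession" rule could be written, not the helper's actual implementation. The function `nearest_side`, the toy frame, and the choice to give ties to the left team are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def nearest_side(row, players_left, players_right):
    """Return 'left' or 'right' for the team whose closest player is nearest to the ball."""
    ball = np.array([row["ball_x"], row["ball_y"]], dtype=float)

    def min_dist(cols):
        # Euclidean distance from every player of one team to the ball
        coords = np.array([[row[x], row[y]] for x, y in cols], dtype=float)
        return np.linalg.norm(coords - ball, axis=1).min()

    # Assumption: ties go to the left team
    return "left" if min_dist(players_left) <= min_dist(players_right) else "right"

# Toy data with a single player per side
toy = pd.DataFrame({
    "ball_x": [0.0, 10.0], "ball_y": [0.0, 0.0],
    "player_l1_x": [-1.0, -5.0], "player_l1_y": [0.0, 0.0],
    "player_r1_x": [3.0, 11.0], "player_r1_y": [0.0, 0.0],
})
left_cols = [("player_l1_x", "player_l1_y")]
right_cols = [("player_r1_x", "player_r1_y")]

sides = toy.apply(nearest_side, axis=1, args=(left_cols, right_cols))
print(sides.value_counts(normalize=True))
```

With the real data, `left_cols` and `right_cols` would correspond to the `players_left` and `players_right` lists built above.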


#### Passes


In [15]:
correct_passes_l = 0
wrong_passes_l = 0
intercepted_passes_l = 0

correct_passes_r = 0
wrong_passes_r = 0
intercepted_passes_r = 0

pass_r = False
pass_l = False

player_right_position = helpers.Point()
player_left_position = helpers.Point()

for current_cycle, row in df.iterrows():
    #Right Passing
    if not pass_r:
        pass_r, player_who_kicked = helpers.kick(current_cycle, 'r', df, True) # Checks if a pass occurred
    else:
        pass_l = False

        if df['playmode'][current_cycle] == 'kick_in_l': # Ball out by right team
            pass_r  = False
            wrong_passes_r += 1
            continue

        possession, player_who_possesses = helpers.define_player_possession(current_cycle, player_left_position, 
                                                                            player_right_position, players, df, True)

        if possession == 'right': #ball to the same team
            pass_r = False
            if player_who_kicked != player_who_possesses: # only counts if the pass is to another player
                correct_passes_r += 1


        if possession == 'left': # enemy intercepted
            pass_r = False
            intercepted_passes_l += 1
            try:
                for offset in range(5):
                    if df['playmode'][current_cycle + offset] in ['kick_off_l', 'foul_charge_l']:
                        intercepted_passes_l -= 1
                        break
            except KeyError:
                # the look-ahead ran past the last cycle of the match
                pass

    # Left Passing
    if not pass_l:
        pass_l, player_who_kicked = helpers.kick(current_cycle, 'l', df, True) # Checks if a pass occurred
    else:
        pass_r = False

        if df['playmode'][current_cycle] == 'kick_in_r': # Ball out by left team
            pass_l  = False
            wrong_passes_l += 1
            continue

        possession, player_who_possesses = helpers.define_player_possession(current_cycle, player_left_position, 
                                                                            player_right_position, players, df, True)
        if possession == 'left': # ball to the same team
            pass_l = False
            if player_who_kicked != player_who_possesses: # only counts if the pass is to another player
                correct_passes_l += 1


        if possession == 'right': # enemy intercepted
            pass_l = False
            intercepted_passes_r += 1
            try:
                for offset in range(5):
                    if df['playmode'][current_cycle + offset] in ['kick_off_r', 'foul_charge_r']:
                        intercepted_passes_r -= 1
                        break
            except KeyError:
                # the look-ahead ran past the last cycle of the match
                pass

left_team_total_passes = correct_passes_l + wrong_passes_l + intercepted_passes_r
right_team_total_passes = correct_passes_r + wrong_passes_r + intercepted_passes_l

left_team_passing_accuracy = correct_passes_l/(left_team_total_passes)
right_team_passing_accuracy = correct_passes_r/(right_team_total_passes)

left_team_completed_passes = correct_passes_l
right_team_completed_passes = correct_passes_r

left_team_interceptions = intercepted_passes_l
right_team_interceptions = intercepted_passes_r

team_name_left = df['team_name_l'][0]
team_name_right = df['team_name_r'][0]

print(
    f"""
    ==========================
    {team_name_left}

    total passes: {left_team_total_passes}
    completed passes: {left_team_completed_passes}
    accuracy: {left_team_passing_accuracy:.2f}
    interceptions: {left_team_interceptions}
    
    ==========================
    {team_name_right}

    total passes: {right_team_total_passes}
    completed passes: {right_team_completed_passes}
    accuracy: {right_team_passing_accuracy:.2f}
    interceptions: {right_team_interceptions}

    ===========================
    """
)


    Oxsy

    total passes: 122
    completed passes: 104
    accuracy: 0.85
    interceptions: 20
    
    YuShan2023

    total passes: 276
    completed passes: 253
    accuracy: 0.92
    interceptions: 17
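
As a quick check of the numbers above: in the cell, a team's total is its completed passes, plus its passes that went out of play, plus its passes intercepted by the opponent, and accuracy is completed passes divided by that total. The sketch below redoes this arithmetic from the printed values. The helper `passing_accuracy` is a hypothetical name introduced here, and its guard against a zero total is an addition that the original cell does not have.

```python
def passing_accuracy(completed, total):
    """Completed passes over attempted passes; 0.0 when no pass was attempted."""
    return completed / total if total else 0.0

# Values taken from the printed output above
oxsy_total, oxsy_completed, oxsy_interceptions = 122, 104, 20
yushan_total, yushan_completed, yushan_interceptions = 276, 253, 17

oxsy_accuracy = passing_accuracy(oxsy_completed, oxsy_total)
yushan_accuracy = passing_accuracy(yushan_completed, yushan_total)

# A team's lost passes are either intercepted by the opponent or sent out of play
oxsy_out_of_play = oxsy_total - oxsy_completed - yushan_interceptions
yushan_out_of_play = yushan_total - yushan_completed - oxsy_interceptions

print(f"Oxsy: {oxsy_accuracy:.2f}, out of play: {oxsy_out_of_play}")
print(f"YuShan2023: {yushan_accuracy:.2f}, out of play: {yushan_out_of_play}")
```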

    


### Corner kicks

In [21]:
# returns a list with the cycle at which each corner kick occurred, for each team
left_occurrencies = helpers.find_last_unique_event_ocurrences(df, "corner_kick_l")
right_occurrencies = helpers.find_last_unique_event_ocurrences(df, "corner_kick_r")

print(f"left occurencies at cycles: {left_occurrencies}")
print("====================")
print(f"right occurencies at cycles: {right_occurrencies}")

left occurencies at cycle: [6358]
right occurencies at cycle: []
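
The helper used above is not shown in this notebook. Judging only by its name, it reports the last cycle of each consecutive run of a given play mode. A minimal sketch of that idea follows; the function `last_cycle_of_each_run`, its `time_col` parameter, and the toy frame are assumptions for illustration and may differ from what `helpers.py` actually does.

```python
import pandas as pd

def last_cycle_of_each_run(frame, event, time_col="show_time"):
    """Return the time value of the last row of every consecutive run of `event`."""
    mask = frame["playmode"] == event
    # A run ends where the current row matches and the next row does not
    run_ends = mask & ~mask.shift(-1, fill_value=False)
    return frame.loc[run_ends, time_col].to_list()

# Toy data: two separate corner-kick runs for the left team, none for the right team
toy = pd.DataFrame({
    "show_time": [1, 2, 3, 4, 5],
    "playmode": ["play_on", "corner_kick_l", "corner_kick_l", "play_on", "corner_kick_l"],
})

left_cycles = last_cycle_of_each_run(toy, "corner_kick_l")
right_cycles = last_cycle_of_each_run(toy, "corner_kick_r")
print(left_cycles, right_cycles)
```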


### Fouls

In [17]:
left_charges, right_charges = helpers.analyze_fouls(df)

print("position of left foul charges:")
for f in left_charges:
    print(f)

print("position of right foul charges:")
for f in right_charges:
    print(f)

position of left foul charges:
x: -51 - y: -29
position of right foul charges:
x: 38 - y: -25
x: 23 - y: -27


### Goalkeeper saves

In [22]:
catches, adversary_goal_quantity, distances, ball_positions, goalie_positions = helpers.analyze_goalkeeper(df, players, "left")

print(
    f"""
    Team {df['team_name_l'][0]} goalie's stats:

    catches: {catches}
    goals given: {adversary_goal_quantity}
    distances of the goalkeeper from the ball: {distances}
    positions of the goalie when the goals occured: {goalie_positions}
    positions of the ball when the goals occured: {ball_positions}
    """    
)


    Team Oxsy goalie's stats:

    catches: 2
    goals given: 4
    distances of the goalkeeper from the ball: [2.063761073864895, 5.811892505888254, 88.21718849096246, 2.8081142159819645]
    positions of the goalie when the goals occured: [<helpers.Point object at 0x7b4b835f3430>, <helpers.Point object at 0x7b4b835f25c0>, <helpers.Point object at 0x7b4b835f3a00>, <helpers.Point object at 0x7b4b835f3820>]
    positions of the ball when the goals occured: [<helpers.Point object at 0x7b4b835f1360>, <helpers.Point object at 0x7b4b835f0a30>, <helpers.Point object at 0x7b4b835f3850>, <helpers.Point object at 0x7b4b835f31f0>]
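
The positions above print as raw object addresses because the objects have no readable representation. As a sketch of how this could be improved, the hypothetical `Point` below (a stand-in written for this example, not the class from `helpers.py`) uses a dataclass, which generates a readable `repr`, and computes the goalie-to-ball distance with `math.hypot`. The coordinates are made up.

```python
import math
from dataclasses import dataclass

@dataclass
class Point:
    """Hypothetical 2D point; the dataclass decorator generates a readable repr."""
    x: float = 0.0
    y: float = 0.0

    def distance_to(self, other):
        # Euclidean distance between two points on the field
        return math.hypot(self.x - other.x, self.y - other.y)

# Made-up positions for illustration
goalie = Point(-50.0, 0.0)
ball = Point(-52.0, 0.5)

distance = goalie.distance_to(ball)
print(goalie, ball, round(distance, 3))
```

Printing a list of such points would show coordinates instead of memory addresses.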
    


### Shots and expected goals (xG)