## Northeastern University Open Sports Data & Analytics Conference
### Getting Started with IMPECT Open Event Data and [Kloppy](https://kloppy.pysport.org/) (powered by [PySport](https://pysport.org/))

### Install Packages:
- Install Python 3.11+ if you don't have it already.
- Create a virtual environment to hold the Python packages for this project.
- Activate the virtual environment and select it as the kernel for this Jupyter Notebook.

Install the following packages to use this notebook:

In [1]:
!pip install "kloppy>=3.18.0" polars pyarrow






### Kloppy

Kloppy is _the_ industry-standard open-source package for standardizing soccer data, used by clubs in the English Premier League, Italian Serie A, La Liga, the German Bundesliga, Major League Soccer, the Dutch Eredivisie and other leagues. Each data provider uses its own proprietary formats, event definitions and coordinate systems, so Kloppy converts data from different providers into a single format.

We can use Kloppy to directly load and access [Open IMPECT Event Data](https://github.com/ImpectAPI/open-data).

### IMPECT Open Event Data

[IMPECT](https://www.impect.com/en/) is a large data provider that offers free event data for the 2023/24 Bundesliga season for research purposes.

### 306 Bundesliga Games
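
As a quick sanity check on the number in the heading: the Bundesliga has 18 clubs that play a double round-robin, so every club hosts every other club once. A minimal sketch of that arithmetic (the variable names are only illustrative):

```python
# Double round-robin: every team hosts every other team exactly once.
n_teams = 18                             # clubs in a Bundesliga season
total_matches = n_teams * (n_teams - 1)  # one home game per ordered pair of teams
matches_per_matchday = n_teams // 2      # every club plays once per matchday
matchdays = total_matches // matches_per_matchday

print(total_matches, matchdays)
```

The same two numbers are a handy check on the `matches` table once it is loaded: its row count and its largest `matchDay` value should agree with them if the open data covers the full season.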

We can list all publicly available matches using the steps below.

1. We load the [**matches file**](https://github.com/ImpectAPI/open-data/blob/main/data/matches/matches_743.json) and the [**squads file**](https://github.com/ImpectAPI/open-data/blob/main/data/squads/squads_743.json) directly from GitHub.
2. We drop unused columns, rename others and unpack (unnest) nested fields of the JSON files using [**Polars**](https://pola.rs/), a faster alternative to pandas, to obtain `matches` and `squads`.

In [3]:
import polars as pl
import requests
import io

from kloppy.utils import github_resolve_raw_data_url

# 1. Load matches and squads data from IMPECT Open Data repository
match_url = github_resolve_raw_data_url(
    repository="ImpectAPI/open-data",
    branch="main",
    file="data/matches/matches_743.json"
)
squads_url = github_resolve_raw_data_url(
    repository="ImpectAPI/open-data",
    branch="main",
    file="data/squads/squads_743.json"
)

# 2. Load and process matches data
response = requests.get(match_url)
response.raise_for_status()  # fail early if the download did not succeed
matches = (
    pl.read_json(io.StringIO(response.text))
    .unnest("matchDay")  # lift the nested matchDay fields to top-level columns
    .rename({'iterationId': 'competitionId', 'id': 'matchId'})
    .drop(['idMappings', 'lastCalculationDate', 'name', 'available'])
    # index + 1 gives the matchday number
    .with_columns([
        (pl.col("index") + 1).alias("matchDay")
    ])
    .drop("index")
)

# Load squads data and drop the columns we do not need
response = requests.get(squads_url)
response.raise_for_status()
squads = (
    pl.read_json(io.StringIO(response.text))
    .drop(['type', 'gender', 'imageUrl', 'idMappings', 'access', 'countryId'])
)
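
To make the unnest and rename steps less abstract, here is a minimal plain-Python sketch of the same reshaping on a single record. The record layout below is an assumption made for illustration (which fields sit inside `matchDay`, and all values, are hypothetical); the helper names `unnest` and `tidy_match` are not part of Polars or Kloppy.

```python
def unnest(record, key):
    """Lift the fields of the nested dict record[key] to the top level."""
    flat = {k: v for k, v in record.items() if k != key}
    flat.update(record[key])
    return flat


def tidy_match(record):
    """Mimic the unnest / rename / drop steps for one match record."""
    flat = unnest(record, "matchDay")
    flat.pop("name", None)                        # dropped in the notebook
    flat["competitionId"] = flat.pop("iterationId")
    flat["matchId"] = flat.pop("id")
    flat["matchDay"] = flat.pop("index") + 1      # index + 1, as in the notebook
    return flat


# Hypothetical record: field placement and values are assumptions.
raw = {
    "id": 100,
    "iterationId": 743,
    "homeSquadId": 1,
    "awaySquadId": 2,
    "matchDay": {"index": 0, "name": "Matchday 1"},
}

print(tidy_match(raw))
```

Polars applies the same idea column-wise to every row at once, which is why the notebook can express it as a single method chain.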


3. We join `matches` with `squads` to add the `homeTeam` and `awayTeam` names.

In [4]:
matches = (
    matches
    .join(
        squads.rename({"name": "homeTeam"}),
        left_on="homeSquadId",
        right_on="id",
        how="left"
    )
    .join(
        squads.rename({"name": "awayTeam"}),
        left_on="awaySquadId",
        right_on="id",
        how="left"
    )
)
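
The two left joins above amount to looking up each squad ID in a table of names. A minimal plain-Python sketch of that behaviour, using made-up squads and matches (all IDs and names here are hypothetical):

```python
# Hypothetical data: IDs and names are made up for illustration.
squads = [
    {"id": 1, "name": "Team A"},
    {"id": 2, "name": "Team B"},
]
matches = [
    {"matchId": 100, "homeSquadId": 1, "awaySquadId": 2},
    {"matchId": 101, "homeSquadId": 2, "awaySquadId": 3},  # squad 3 is unknown
]

# Lookup table from squad ID to squad name.
name_by_id = {squad["id"]: squad["name"] for squad in squads}


def add_team_names(match):
    """Attach home and away names; unknown IDs become None, like a left join."""
    return {
        **match,
        "homeTeam": name_by_id.get(match["homeSquadId"]),
        "awayTeam": name_by_id.get(match["awaySquadId"]),
    }


enriched = [add_team_names(match) for match in matches]
print(enriched)
```

A left join keeps every match even when a squad ID has no counterpart in `squads`; the missing name shows up as `None` here and as `null` in Polars.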

Now we can load one game at a time using Kloppy's `impect.load_open_data` function.

In [5]:
from kloppy import impect

match_id = 122838 
dataset = impect.load_open_data(
    match_id=match_id,
    competition_id=743,
)



You are about to use IMPECT public data.
By using this data, you are agreeing to the user agreement. 
The user agreement can be found here: https://github.com/ImpectAPI/open-data/blob/main/LICENSE.pdf



In [6]:
dataset

<EventDataset record_count=3057>

### Basic Kloppy Operations

- Transform the [**coordinate system**](https://kloppy.pysport.org/user-guide/concepts/coordinates/) to meters, with X $\in$ (-52.5, 52.5) and Y $\in$ (-34.0, 34.0) (called "secondspectrum").
    Note: Kloppy supports many different coordinate systems, including custom ones.
- Filter for Passes and Shots
- Output to [Polars](https://pola.rs/) dataframe

In [7]:
(
    dataset
    .transform(to_coordinate_system="secondspectrum")  
    .filter(lambda event: event.event_type.name in ["PASS", "SHOT"])
    .to_df(engine="polars")  # or engine="pandas"
)

| event_id | event_type | period_id | timestamp | end_timestamp | ball_state | ball_owning_team | team_id | player_id | coordinates_x | coordinates_y | end_coordinates_x | end_coordinates_y | receiver_player_id | body_part_type | set_piece_type | result | success | pass_type | is_under_pressure |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| str | str | i64 | duration[μs] | duration[μs] | str | str | str | str | f64 | f64 | f64 | f64 | str | str | str | str | bool | str | bool |
| "4858179098" | "PASS" | 1 | 0µs | 332ms | "alive" | "33" | "33" | "204" | 0.0 | 0.0 | | | | "RIGHT_FOOT" | "KICK_OFF" | "INCOMPLETE" | false | | |
| "4858179102" | "PASS" | 1 | 4s 192999µs | 6s 904999µs | "alive" | "33" | "33" | "1202" | -30.9 | -0.1 | 13.4 | 27.8 | | "RIGHT_FOOT" | | "INCOMPLETE" | false | | |
| "4858179104" | "PASS" | 1 | 6s 905100µs | 8s 260ms | "alive" | "33" | "38" | "1028" | -13.4 | -27.8 | -20.1 | -24.9 | "13599" | "HEAD" | | "COMPLETE" | true | "HEAD_PASS" | true |
| "4858179108" | "PASS" | 1 | 11s 740100µs | 14s 175ms | "alive" | "33" | "33" | "50321" | -9.5 | 14.3 | -9.6 | 34.0 | | "HEAD" | | "INCOMPLETE" | false | "HEAD_PASS" | true |
| "4858179110" | "PASS" | 1 | 24s 350ms | 25s 161ms | "alive" | "38" | "38" | "1028" | 9.6 | -34.0 | 13.3 | -24.5 | "9550" | "KEEPER_ARM" | "THROW_IN" | "COMPLETE" | true | "HAND_PASS" | |
| … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
| "4858182062" | "PASS" | 2 | 50m 29s 618999µs | 50m 34s 489999µs | "alive" | "38" | "38" | "109588" | 41.4 | 27.0 | 47.9 | -27.8 | | "RIGHT_FOOT" | | "INCOMPLETE" | false | | true |
| "4858182065" | "PASS" | 2 | 50m 39s 229ms | 50m 40s 712ms | "alive" | "33" | "33" | "32432" | -29.0 | 25.5 | -15.7 | 9.4 | "7594" | "RIGHT_FOOT" | | "COMPLETE" | true | | |
| "4858182068" | "PASS" | 2 | 50m 42s 126ms | 50m 43s 678ms | "alive" | "33" | "33" | "7594" | -9.2 | 13.5 | 3.0 | 6.6 | "929" | "RIGHT_FOOT" | | "COMPLETE" | true | | true |
| "4858182070" | "PASS" | 2 | 50m 43s 678100µs | 50m 44s 981ms | "alive" | "33" | "33" | "929" | 3.0 | 6.6 | 0.3 | -4.9 | "216" | "RIGHT_FOOT" | | "COMPLETE" | true | | |
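
To give an intuition for what a coordinate transformation does, the sketch below maps a point from a hypothetical normalized system (x and y in [0, 1]) to meters centered on the center spot, matching the X and Y ranges listed above. This is only an illustration under stated assumptions: it assumes a 105 m by 68 m pitch, the normalized input system is not claimed to be IMPECT's native one, and the function name is made up. Kloppy's own `transform` also takes care of details such as axis direction and playing orientation, which this sketch ignores.

```python
PITCH_LENGTH = 105.0  # meters, assumed pitch length
PITCH_WIDTH = 68.0    # meters, assumed pitch width


def normalized_to_centered_meters(x, y):
    """Map x, y in [0, 1] to meters with the origin at the center spot."""
    return ((x - 0.5) * PITCH_LENGTH, (y - 0.5) * PITCH_WIDTH)


# The center spot maps to the origin; the corners map to the range limits.
print(normalized_to_centered_meters(0.5, 0.5))
print(normalized_to_centered_meters(0.0, 0.0))
print(normalized_to_centered_meters(1.0, 1.0))
```

With the origin at the center spot, the kick-off in the first row of the output sits at (0.0, 0.0), and values near ±52.5 or ±34.0 lie on the pitch boundary.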


### Basic Kloppy Functionalities
- [EventDataset](https://kloppy.pysport.org/user-guide/concepts/event-data/)
- [Metadata (players, team names etc.)](https://kloppy.pysport.org/user-guide/concepts/metadata/)
- [Coordinate Systems](https://kloppy.pysport.org/user-guide/concepts/coordinates/#built-in-coordinate-systems)
- [Transformations](https://kloppy.pysport.org/user-guide/transformations/coordinates/)
- [Filter](https://kloppy.pysport.org/user-guide/getting-started/#filtering-data)
- [Exporting to pandas / polars DataFrames](https://kloppy.pysport.org/user-guide/exporting-data/dataframes/)

### Plotting

Use `mplsoccer` and `matplotlib` to plot different views of the event data.

See [Plotting Examples](https://kloppy.pysport.org/user-guide/getting-started/#exec-51--__tabbed_1_2)