## Northeastern University Open Sports Data & Analytics Conference
### Getting Started with IMPECT Open Event Data and [Kloppy](https://kloppy.pysport.org/) (powered by [PySport](https://pysport.org/))

### Install Packages
- Install Python 3.11+ if you don't have it already.
- Create a virtual environment to hold all the Python packages for this project.
- Activate the virtual environment (and select it as the kernel for this Jupyter Notebook).

Install the following packages to use this notebook:

In [None]:
!pip install "kloppy>=3.18.0" polars pyarrow
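After installing, it can help to confirm that the active environment picked up a recent enough kloppy. The sketch below uses only the standard library (`importlib.metadata`). The helper names are hypothetical, and the sketch assumes plain numeric version strings. For full PEP 440 version handling, the `Version` class from the `packaging` library is the more robust choice.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str, width: int = 3) -> tuple[int, ...]:
    """Turn a plain release string such as '3.18.0' into a comparable tuple (3, 18, 0).

    Assumes purely numeric parts; pre-release tags like 'rc1' are not handled.
    """
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (width - len(parts))  # pad '3.18' to (3, 18, 0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if the installed version is at least the required minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_kloppy(minimum: str = "3.18.0") -> str:
    """Report whether kloppy is installed and recent enough for this notebook."""
    try:
        installed = version("kloppy")
    except PackageNotFoundError:
        return "kloppy is not installed"
    if meets_minimum(installed, minimum):
        return f"kloppy {installed} is OK"
    return f"kloppy {installed} is older than {minimum}"


print(check_kloppy())
```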

### Kloppy

Kloppy is _the_ industry-standard open-source package for standardizing soccer data. Clubs in the English Premier League, Italian Serie A, La Liga, German Bundesliga, Major League Soccer, Dutch Eredivisie and other leagues use it. Each data provider uses its own proprietary formats, event definitions and coordinate systems, and Kloppy converts data from these providers into a single, common format.
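To make the standardization problem concrete, here is a toy sketch that maps the event labels of two *hypothetical* providers onto one shared vocabulary. The provider tables, their labels and the catch-all `"GENERIC"` type are invented for illustration. This only shows the idea; it is not how kloppy is implemented internally.

```python
# Hypothetical raw event labels from two imaginary providers,
# each mapped onto one shared vocabulary.
PROVIDER_A_TYPES = {"Pass": "PASS", "Shot": "SHOT", "Tackle": "DUEL"}
PROVIDER_B_TYPES = {"pass_attempt": "PASS", "shot_on_goal": "SHOT", "shot_off_target": "SHOT"}


def standardize(raw_events: list[dict], mapping: dict[str, str]) -> list[dict]:
    """Add a shared 'event_type' to each event; unknown labels fall back to 'GENERIC'."""
    return [
        {**event, "event_type": mapping.get(event["type"], "GENERIC")}
        for event in raw_events
    ]


a = standardize([{"type": "Pass"}, {"type": "Shot"}], PROVIDER_A_TYPES)
b = standardize([{"type": "pass_attempt"}, {"type": "shot_off_target"}], PROVIDER_B_TYPES)
print([e["event_type"] for e in a])  # ['PASS', 'SHOT']
print([e["event_type"] for e in b])  # ['PASS', 'SHOT']
```

Once both feeds share the same labels, the same downstream code (filters, counts, plots) works for either provider.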

We can use Kloppy to directly load and access [Open IMPECT Event Data](https://github.com/ImpectAPI/open-data).

### IMPECT Open Event Data

[IMPECT](https://www.impect.com/en/) is a large data provider that offers free event data for the 2023/24 Bundesliga season, available for research purposes.

### 306 Bundesliga Games

We can easily access and inspect all publicly available matches of this competition using the code below.

1. We load the [**matches file**](https://github.com/ImpectAPI/open-data/blob/main/data/matches/matches_743.json) and the [**squads file**](https://github.com/ImpectAPI/open-data/blob/main/data/squads/squads_743.json) directly from GitHub.
2. We unnest, rename and drop columns from the JSON files using [**Polars**](https://pola.rs/), a fast DataFrame library and alternative to pandas, to obtain `matches` and `squads`.
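For a single row, the Polars pipeline below amounts to the reshaping sketched here in plain Python. The record is hypothetical: its fields are inferred from the columns the Polars code touches, and the real `matches_743.json` entries may contain more.

```python
# Hypothetical match record, shaped after the columns the Polars code touches.
raw_match = {
    "id": 1,
    "iterationId": 743,
    "matchDay": {"index": 0, "name": "Matchday 1"},
    "homeSquadId": 10,
    "awaySquadId": 20,
    "idMappings": [],
    "lastCalculationDate": "2024-01-01",
    "available": True,
}


def flatten_match(record: dict) -> dict:
    """Unnest, rename and drop fields the way the Polars pipeline does, for one row."""
    row = dict(record)  # shallow copy so the input record is left untouched
    # unnest: lift the fields of the nested "matchDay" struct to the top level
    row.update(row.pop("matchDay"))
    # rename: iterationId -> competitionId, id -> matchId
    row["competitionId"] = row.pop("iterationId")
    row["matchId"] = row.pop("id")
    # drop fields that are not needed
    for key in ("idMappings", "lastCalculationDate", "name", "available"):
        row.pop(key)
    # rebuild matchDay as a 1-based number from the 0-based index
    row["matchDay"] = row.pop("index") + 1
    return row


print(flatten_match(raw_match))
```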

In [None]:
import polars as pl
import requests
import io

from kloppy.utils import github_resolve_raw_data_url

# 1. Load matches and squads data from IMPECT Open Data repository
match_url = github_resolve_raw_data_url(
    repository="ImpectAPI/open-data",
    branch="main",
    file="data/matches/matches_743.json"
)
squads_url = github_resolve_raw_data_url(
    repository="ImpectAPI/open-data",
    branch="main",
    file="data/squads/squads_743.json"
)

# 2. Load and process matches data
response = requests.get(match_url)
response.raise_for_status()  # stop early on HTTP errors (e.g. a wrong URL)
matches = (
    pl.read_json(io.StringIO(response.text))
    .unnest("matchDay")
    .rename({'iterationId': 'competitionId', 'id': 'matchId'})
    .drop(['idMappings', 'lastCalculationDate', 'name', 'available'])
    .with_columns([
        (pl.col("index") + 1).alias("matchDay")
    ])
    .drop("index")
)

response = requests.get(squads_url)
response.raise_for_status()  # stop early on HTTP errors (e.g. a wrong URL)
squads = (
    pl.read_json(io.StringIO(response.text))
    .drop(['type', 'gender', 'imageUrl', 'idMappings', 'access', 'countryId'])
)
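
A quick sanity check on the "306 Bundesliga Games" figure. In an 18-team double round robin, every team hosts every other team once, which gives $18 \times 17 = 306$ matches over $2 \times 17 = 34$ matchdays. The helper names below are hypothetical. The commented-out assertions assume that the file covers the full season and that `matchDay` runs from 1 to 34.

```python
def expected_matches(n_teams: int) -> int:
    """Matches in a double round robin: every team hosts every other team once."""
    return n_teams * (n_teams - 1)


def expected_matchdays(n_teams: int) -> int:
    """Matchdays in a double round robin with an even number of teams."""
    return 2 * (n_teams - 1)


print(expected_matches(18), expected_matchdays(18))  # 306 34

# With the `matches` DataFrame loaded above (assuming the file covers the full season):
# assert matches.height == expected_matches(18)
# assert matches["matchDay"].max() == expected_matchdays(18)
```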


3. We join `matches` with `squads` to add the `homeTeam` and `awayTeam` names, which come from the `squads` file.

In [None]:
matches = (
    matches
    # Look up the home team's name via its squad id
    .join(
        squads.rename({"name": "homeTeam"}),
        left_on="homeSquadId",
        right_on="id",
        how="left"
    )
    # Look up the away team's name the same way
    .join(
        squads.rename({"name": "awayTeam"}),
        left_on="awaySquadId",
        right_on="id",
        how="left"
    )
)
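
Each match needs two lookups into the same squads table, one per side. That is why `name` is renamed before each join: two joins that both bring in `name` would otherwise clash. `how="left"` keeps every match even if a squad id were missing from the squads file, in which case the team name would be null. A dictionary-based sketch of the same logic, with hypothetical IDs and team names:

```python
# Hypothetical squad lookup: id -> name (the real values come from squads_743.json).
squad_names = {10: "Team A", 20: "Team B"}

matches_rows = [
    {"matchId": 1, "homeSquadId": 10, "awaySquadId": 20},
    {"matchId": 2, "homeSquadId": 20, "awaySquadId": 99},  # 99 has no squad entry
]


def add_team_names(rows: list[dict], names: dict[int, str]) -> list[dict]:
    """Left-join semantics: keep every match and use None where a squad id is unknown."""
    return [
        {
            **row,
            "homeTeam": names.get(row["homeSquadId"]),
            "awayTeam": names.get(row["awaySquadId"]),
        }
        for row in rows
    ]


for row in add_team_names(matches_rows, squad_names):
    print(row)
```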

Now we can load one game at a time using Kloppy's `impect.load_open_data` function.

In [None]:
from kloppy import impect

match_id = 122838 
dataset = impect.load_open_data(
    match_id=match_id,
    competition_id=743,
)


In [None]:
dataset
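
Because `impect.load_open_data` loads one game per call, loading something like a whole matchday means looping over match IDs taken from `matches`. The sketch below passes the loader in as a parameter so it runs without network access. The helper names and rows are hypothetical. The commented line shows how the kloppy call used above could be plugged in.

```python
from typing import Callable, Iterable


def match_ids_for_matchday(rows: Iterable[dict], matchday: int) -> list[int]:
    """Return the matchIds scheduled on a given (1-based) matchday."""
    return [row["matchId"] for row in rows if row["matchDay"] == matchday]


def load_matches(match_ids: list[int], loader: Callable[[int], object]) -> dict[int, object]:
    """Call `loader` once per match id, since each call loads a single game."""
    return {match_id: loader(match_id) for match_id in match_ids}


# Hypothetical rows; in the notebook these could come from `matches.to_dicts()`.
rows = [
    {"matchId": 1, "matchDay": 1},
    {"matchId": 2, "matchDay": 1},
    {"matchId": 3, "matchDay": 2},
]
ids = match_ids_for_matchday(rows, 1)
print(ids)  # [1, 2]

# With kloppy, the loader could wrap the call used above, e.g.:
# datasets = load_matches(ids, lambda mid: impect.load_open_data(match_id=mid, competition_id=743))
datasets = load_matches(ids, lambda mid: f"dataset {mid}")  # stand-in loader for illustration
```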

### Basic Kloppy Operations

- Transform the [**coordinate system**](https://kloppy.pysport.org/user-guide/concepts/coordinates/) to "secondspectrum", which uses meters with the origin at the center of the pitch, so that X $\in$ (-52.5, 52.5) and Y $\in$ (-34.0, 34.0) on a 105 × 68 m pitch.
    Note: kloppy supports many different coordinate systems, and even custom coordinate systems.
- Filter for Passes and Shots
- Output to [Polars](https://pola.rs/) dataframe
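
kloppy applies this transformation for you. The arithmetic behind a center-origin metric system is just a shift and a scale, shown in the minimal sketch below. It assumes input coordinates normalized to [0, 1] on both axes, a 105 × 68 m pitch (actual pitch dimensions can differ), and that Y points the same way in both systems. The function name is hypothetical.

```python
PITCH_LENGTH = 105.0  # meters; assumed pitch size
PITCH_WIDTH = 68.0


def normalized_to_center_meters(
    x: float,
    y: float,
    length: float = PITCH_LENGTH,
    width: float = PITCH_WIDTH,
) -> tuple[float, float]:
    """Map (x, y) in [0, 1] x [0, 1] to meters with the origin at the center spot."""
    return (x - 0.5) * length, (y - 0.5) * width


print(normalized_to_center_meters(0.0, 0.0))  # (-52.5, -34.0)
print(normalized_to_center_meters(1.0, 1.0))  # (52.5, 34.0)
print(normalized_to_center_meters(0.5, 0.5))  # (0.0, 0.0)
```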

In [None]:
(
    dataset
    .transform(to_coordinate_system="secondspectrum")  
    .filter(lambda event: event.event_type.name in ["PASS", "SHOT"])
    .to_df(engine="polars")  # or engine="pandas"
)
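
A common next step after filtering is to count events per team and type. The stdlib sketch below does this on hypothetical rows. The column names `team_id` and `event_type` and the `"PASS"`/`"SHOT"` values are assumptions about kloppy's default DataFrame output; check them against the real frame before relying on them.

```python
from collections import Counter

# Hypothetical rows standing in for the filtered DataFrame above; the column
# names "team_id" and "event_type" are assumed, not confirmed.
rows = [
    {"team_id": "home", "event_type": "PASS"},
    {"team_id": "home", "event_type": "PASS"},
    {"team_id": "home", "event_type": "SHOT"},
    {"team_id": "away", "event_type": "PASS"},
]


def count_by_team_and_type(events: list[dict]) -> Counter:
    """Count events per (team_id, event_type) pair."""
    return Counter((e["team_id"], e["event_type"]) for e in events)


counts = count_by_team_and_type(rows)
print(counts[("home", "PASS")], counts[("home", "SHOT")], counts[("away", "SHOT")])  # 2 1 0

# Roughly equivalent in Polars on the real DataFrame (assuming the same columns):
# df.group_by(["team_id", "event_type"]).len()
```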

### Basic Kloppy Functionalities
- [EventDataset](https://kloppy.pysport.org/user-guide/concepts/event-data/)
- [Metadata (players, team names etc.)](https://kloppy.pysport.org/user-guide/concepts/metadata/)
- [Coordinate Systems](https://kloppy.pysport.org/user-guide/concepts/coordinates/#built-in-coordinate-systems)
- [Transformations](https://kloppy.pysport.org/user-guide/transformations/coordinates/)
- [Filter](https://kloppy.pysport.org/user-guide/getting-started/#filtering-data)
- [Exporting to pandas / polars DataFrames](https://kloppy.pysport.org/user-guide/exporting-data/dataframes/)

### Plotting

Use `mplsoccer` and `matplotlib` to plot the event data in different configurations.

See [Plotting Examples](https://kloppy.pysport.org/user-guide/getting-started/#exec-51--__tabbed_1_2)
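
As a starting point before reaching for `mplsoccer` (whose `Pitch` class draws full pitch markings), here is a plain `matplotlib` sketch. It scatters event locations on a bare pitch outline in the center-origin "secondspectrum" meters used above. The shot locations are hypothetical, and the `coordinates_x` / `coordinates_y` column names mentioned in the comment are an assumption about kloppy's DataFrame output.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle


def plot_events(xs, ys, length=105.0, width=68.0):
    """Scatter event locations on a bare pitch outline in center-origin meters."""
    fig, ax = plt.subplots(figsize=(7, 4.5))
    # Pitch boundary, anchored at the bottom-left corner (-length/2, -width/2)
    ax.add_patch(Rectangle((-length / 2, -width / 2), length, width, fill=False))
    # Halfway line at x = 0
    ax.plot([0, 0], [-width / 2, width / 2], color="black", linewidth=1)
    ax.scatter(xs, ys, s=20)
    ax.set_xlim(-length / 2 - 2, length / 2 + 2)
    ax.set_ylim(-width / 2 - 2, width / 2 + 2)
    ax.set_aspect("equal")
    return fig, ax


# Hypothetical shot locations; in the notebook they could come from the
# coordinates_x / coordinates_y columns of the filtered DataFrame (assumed names).
fig, ax = plot_events([40.0, 45.5, 30.2], [-5.0, 3.1, 10.0])
# fig.savefig("shots.png") would write the figure to disk.
```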