# Loading and Investigating World Cup Data

In this notebook, we load and inspect event data from Women's World Cup matches. We follow Prof. David Sumpter's [video](https://www.youtube.com/watch?v=GTtuOt03FM0&ab_channel=FriendsofTracking) on downloading the data and inspecting it with Python. Throughout the notebook, we assume that both the Statsbomb and Wyscout data are available in the `data` directory. URLs for downloading the data are provided in the *References* section.

The event data is provided in JSON files, so we need to import the `json` package to load these files. We will need `matplotlib` to plot the data and `numpy` to transform the data.

In [None]:
import json

import matplotlib.pyplot as plt
import numpy as np

First, we will use the Statsbomb data. Let us load information about the competitions for which data is available.

In [None]:
with open("./data/statsbomb/data/competitions.json", "r") as f:
    competitions: list = json.load(f)

We have a list of 19 competitions covered in the Statsbomb data. Let us look at the information of the first competition.

In [None]:
competitions[0]

In this notebook, we want to inspect data for the 2019 Women's World Cup. Its competition ID is `72`.

In [None]:
competition_id: int = 72
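Rather than hard-coding the ID, it could also be looked up by name. The following is a minimal sketch, assuming each entry in the competitions list carries `competition_id`, `season_id`, `competition_name` and `season_name` keys. The `sample_competitions` list and the `find_competition` helper are hypothetical stand-ins so that the cell runs without the data files; only the Women's World Cup IDs come from this notebook.

```python
from typing import Optional

# Hypothetical stand-in for the list loaded from competitions.json.
sample_competitions: list = [
    {"competition_id": 999, "season_id": 1,
     "competition_name": "Example League", "season_name": "2018/2019"},
    {"competition_id": 72, "season_id": 30,
     "competition_name": "Women's World Cup", "season_name": "2019"},
]


def find_competition(entries: list, name: str, season: str) -> Optional[dict]:
    """Return the first entry matching both names, or None if there is none."""
    for entry in entries:
        if entry["competition_name"] == name and entry["season_name"] == season:
            return entry
    return None


world_cup = find_competition(sample_competitions, "Women's World Cup", "2019")
print(world_cup["competition_id"], world_cup["season_id"])
```

With the real data, `competitions` would be passed in place of `sample_competitions`.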

Let us load information about all matches from the competition. The matches file is named after the season ID, which is `30` for the 2019 tournament.

In [None]:
with open(f"./data/statsbomb/data/matches/{competition_id}/30.json", "r") as f:
    matches: list = json.load(f)
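The path used above follows the pattern `matches/<competition ID>/<season ID>.json`. As a small sketch, the path can be assembled with `pathlib` so that both IDs are explicit. The `matches_path` helper and the `statsbomb_dir` name are hypothetical and assume the same directory layout as in this notebook.

```python
from pathlib import Path


def matches_path(data_dir: Path, competition_id: int, season_id: int) -> Path:
    """Build the path to the matches file for one competition and season."""
    return data_dir / "matches" / str(competition_id) / f"{season_id}.json"


# Assumed location of the Statsbomb data, relative to the notebook.
statsbomb_dir = Path("data") / "statsbomb" / "data"
path = matches_path(statsbomb_dir, competition_id=72, season_id=30)
print(path.as_posix())
```

Calling `path.exists()` before opening the file gives an early hint if the data has not been downloaded yet.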

There were 52 matches played during the World Cup.

In [None]:
len(matches)

Let us now print the result of every match in the World Cup. This will help us understand the structure of a match record.

In [None]:
match: dict
for match in matches:
    home_team_name: str = match["home_team"]["country"]["name"]
    away_team_name: str = match["away_team"]["country"]["name"]
    home_score: int = match["home_score"]
    away_score: int = match["away_score"]
    print(f"The match between {home_team_name} and {away_team_name} finished {home_score}-{away_score}")
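The same fields can be used to state who won each match. The sketch below relies only on the keys accessed in the loop above (`home_team`, `away_team`, `home_score`, `away_score`). The `describe_result` helper, the team names and the scores in `sample_matches` are made up for illustration.

```python
# Hypothetical records mimicking the structure of entries in `matches`.
sample_matches: list = [
    {"home_team": {"country": {"name": "Team A"}},
     "away_team": {"country": {"name": "Team B"}},
     "home_score": 2, "away_score": 0},
    {"home_team": {"country": {"name": "Team C"}},
     "away_team": {"country": {"name": "Team D"}},
     "home_score": 1, "away_score": 1},
]


def describe_result(match: dict) -> str:
    """Summarise one match as 'Home x-y Away (outcome)'."""
    home = match["home_team"]["country"]["name"]
    away = match["away_team"]["country"]["name"]
    home_score = match["home_score"]
    away_score = match["away_score"]
    if home_score > away_score:
        outcome = f"{home} won"
    elif home_score < away_score:
        outcome = f"{away} won"
    else:
        outcome = "draw"
    return f"{home} {home_score}-{away_score} {away} ({outcome})"


for sample in sample_matches:
    print(describe_result(sample))
```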

Let us consider the final of the World Cup between the USA and the Netherlands and find its match ID.

In [None]:
required_home_team: str = "United States of America"
required_away_team: str = "Netherlands"

In [None]:
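# Start with None so that a missing match is detected instead of raising a NameError.
required_match_id = None
for match in matches:
    home_team_name: str = match["home_team"]["country"]["name"]
    away_team_name: str = match["away_team"]["country"]["name"]
    if (home_team_name == required_home_team) and (away_team_name == required_away_team):
        required_match_id = match["match_id"]
        break  # stop at the first matching fixture

if required_match_id is None:
    raise ValueError(f"No match found for {required_home_team} vs {required_away_team}")

print(f"{required_home_team} vs {required_away_team} has ID: {required_match_id}")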
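The search above depends on knowing which side is listed as the home team. A possible variant, sketched here under the assumption that the records have the same keys as above, compares the two team names as a set so the order does not matter. The `find_match_id` helper and the `sample_fixtures` data are hypothetical.

```python
from typing import Optional

# Hypothetical records with only the keys needed for the lookup.
sample_fixtures: list = [
    {"match_id": 1,
     "home_team": {"country": {"name": "Team A"}},
     "away_team": {"country": {"name": "Team B"}}},
    {"match_id": 2,
     "home_team": {"country": {"name": "Team C"}},
     "away_team": {"country": {"name": "Team A"}}},
]


def find_match_id(entries: list, team_one: str, team_two: str) -> Optional[int]:
    """Return the ID of the first match between the two teams, or None."""
    wanted = {team_one, team_two}
    for entry in entries:
        teams = {
            entry["home_team"]["country"]["name"],
            entry["away_team"]["country"]["name"],
        }
        if teams == wanted:
            return entry["match_id"]
    return None


print(find_match_id(sample_fixtures, "Team A", "Team C"))
```

Returning `None` leaves it to the caller to decide how to handle a pairing that never occurred.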

Let us now load the event data for this match based on its ID.

In [None]:
with open(f"./data/statsbomb/data/events/{required_match_id}.json", "r") as f:
    match_events: list = json.load(f)

This is the event data that we can use for various purposes, such as creating different kinds of plots and building models like expected goals. The first part of the data contains information about lineups and formations. After that, every on-the-ball event of the match is recorded.
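A quick way to get a feel for the event list is to count event types and read the formations from the lineup events. The sketch below assumes that each event has a `type` with a `name`, a `team` with a `name`, and that lineup events are of type `"Starting XI"` with a `tactics` entry holding the `formation`. The `sample_events` list is a made-up stand-in for `match_events`.

```python
from collections import Counter

# Hypothetical events mimicking the assumed layout of `match_events`.
sample_events: list = [
    {"type": {"name": "Starting XI"}, "team": {"name": "Team A"},
     "tactics": {"formation": 433}},
    {"type": {"name": "Starting XI"}, "team": {"name": "Team B"},
     "tactics": {"formation": 4231}},
    {"type": {"name": "Pass"}, "team": {"name": "Team A"}},
    {"type": {"name": "Pass"}, "team": {"name": "Team B"}},
    {"type": {"name": "Shot"}, "team": {"name": "Team A"}},
]

# Count how often each event type occurs.
event_counts = Counter(event["type"]["name"] for event in sample_events)

# Map each team to the formation given in its lineup event.
formations = {
    event["team"]["name"]: event["tactics"]["formation"]
    for event in sample_events
    if event["type"]["name"] == "Starting XI"
}

print(event_counts.most_common())
print(formations)
```

With the real data, `match_events` would replace `sample_events`.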

## References
- [Statsbomb event data](https://github.com/statsbomb/open-data)
- [Wyscout event data](https://figshare.com/collections/Soccer_match_event_dataset/4415000/5)
- [Loading in and investigating World Cup data in Python](https://www.youtube.com/watch?v=GTtuOt03FM0&ab_channel=FriendsofTracking)
- [Making Your Own Shot and Pass Maps](https://www.youtube.com/watch?v=oOAnERLiN5U&ab_channel=FriendsofTracking)