# **Expected Goals Classifier**

### Overview

Build an Expected Goals (xG) classification model from historical match data. The model's output is intended to support technical and tactical analysis aimed at improving goal-scoring.

Project detailed on GitHub: [Expected Goals Classifier]()

# Data Extraction Notebook

*Notebook 1 of 7*

### Index

1. Data extracted in [expected_goals_data_extraction_notebook]()
2. Features engineered in [expected_goals_feature_engineering_notebook]()
3. Data cleaned in [expected_goals_data_cleaning_notebook]()
4. Data explored in [expected_goals_data_exploration_notebook]()
5. Data preprocessed in [expected_goals_data_preprocessing_notebook]()
6. Modeling in [expected_goals_model_fitting_notebook]()
7. Conclusions in [expected_goals_model_assessment_notebook]()

### Data

Data sourced from [StatsBomb](https://statsbomb.com/), a United Kingdom-based football (soccer) data analytics company.

StatsBomb has provided free access to their proprietary dataset via GitHub: [StatsBomb Open Data](https://github.com/statsbomb/open-data)

StatsBomb Open Data is organized in JSON files:
* **[Matches](https://github.com/statsbomb/open-data/tree/master/data/matches)**
  * Folders organized by competition (league or tournament)
    * Files organized by season (year) ID
    * Files contain nested dictionaries with descriptive data for each individual match
* **[Events](https://github.com/statsbomb/open-data/tree/master/data/events)**
  * Files organized by match ID
  * Files contain nested dictionaries with descriptive data for each event within each individual match
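
The layout above maps directly onto file paths. The following is a minimal sketch, assuming match files sit at `data/matches/<competition_id>/<season_id>.json` and event files at `data/events/<match_id>.json`. The `RAW_BASE` URL and both helper names are illustrative assumptions, not part of any StatsBomb library.

```python
# Illustrative helpers (hypothetical names) that build raw-file URLs
# for the StatsBomb Open Data layout described above.
RAW_BASE = "https://raw.githubusercontent.com/statsbomb/open-data/master/data"


def matches_url(competition_id: int, season_id: int) -> str:
    """URL of the JSON file listing every match in one competition season."""
    return f"{RAW_BASE}/matches/{competition_id}/{season_id}.json"


def events_url(match_id: int) -> str:
    """URL of the JSON file listing every event in one match."""
    return f"{RAW_BASE}/events/{match_id}.json"


print(matches_url(37, 42))
print(events_url(19743))
```

The `statsbombpy` package used in the rest of this notebook hides these paths, so the helpers are only useful for inspecting the raw files directly.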

## Packages

In [None]:
# rpy2 to run R
%load_ext rpy2.ipython

# Drive and IO to access saved files
from google.colab import drive, files
drive.mount('/content/drive')

import io

# PyPy to improve speed
!apt-get install pypy

# warnings to ignore warnings
import warnings
warnings.filterwarnings('ignore')

# Pathlib for file retrieval
import pathlib
from pathlib import Path as path

# Statsbombpy package for extracting StatsBomb data
!pip install statsbombpy
from statsbombpy import sb

# Pandas for dataframes
import pandas as pd

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Reading package lists... Done
Building dependency tree       
Reading state information... Done
pypy is already the newest version (5.10.0+dfsg-3build2).
0 upgraded, 0 newly installed, 0 to remove and 40 not upgraded.


## Matches Data

In [None]:
# View competitions available through StatsBomb Open Data

competitions_df = sb.competitions()
competitions_df.head()

credentials were not supplied. open data access only


2021-09-13 19:51:57,602 - INFO     - NumExpr defaulting to 2 threads.



In [None]:
print('Available Competitions:',
      competitions_df['competition_name'].unique())

credentials were not supplied. open data access only
Available Competitions: ['Champions League' "FA Women's Super League" 'FIFA World Cup' 'La Liga'
 'NWSL' 'Premier League' "Women's World Cup"]


In [None]:
# Isolate target competitions from StatsBomb Open Data
# Women's competitions

target_comp_df = competitions_df.loc[competitions_df['competition_gender'] == 'female']

target_comp_ids = target_comp_df['competition_id'].unique()

target_season_ids = target_comp_df['season_id'].unique()

credentials were not supplied. open data access only


In [None]:
print('Target Competitions:',
      target_comp_df['competition_name'].unique(),
      '\n',
      'Target competition_ids:',
      target_comp_ids,
      '\n',
      'Target season_ids:',
      target_season_ids)

Target Competitions: ["FA Women's Super League" 'NWSL' "Women's World Cup"] 
 Target competition_ids: [37 49 72] 
 Target season_ids: [42  4  3 30]


In [None]:
# Refine target competitions
# Women's club competitions

target_comp_df = competitions_df.loc[competitions_df['competition_id'].isin([37, 49])]

target_comp_ids = target_comp_df['competition_id'].unique()

target_season_ids = target_comp_df['season_id'].unique()

credentials were not supplied. open data access only


In [None]:
print('Target Competitions:',
      target_comp_df['competition_name'].unique(),
      '\n',
      'Target competition_ids:',
      target_comp_ids,
      '\n',
      'Target season_ids:',
      target_season_ids)

Target Competitions: ["FA Women's Super League" 'NWSL'] 
 Target competition_ids: [37 49] 
 Target season_ids: [42  4  3]


In [None]:
print('Number of Seasons:',
      len(target_season_ids))

Number of Seasons: 3
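
A `season_id` is only meaningful together with its `competition_id`, so it can help to keep the two as pairs rather than as two separate arrays. The following is a minimal sketch on a hand-built stand-in for `competitions_df`. The rows are reconstructed from the IDs printed above and from the `sb.matches` calls in this notebook, and the name `target_pairs` is illustrative.

```python
import pandas as pd

# Stand-in for competitions_df, limited to the columns used here.
competitions = pd.DataFrame({
    "competition_id": [37, 37, 49, 72],
    "season_id": [42, 4, 3, 30],
    "competition_name": ["FA Women's Super League", "FA Women's Super League",
                         "NWSL", "Women's World Cup"],
})

# Keep the club competitions and collect (competition_id, season_id) pairs.
club = competitions[competitions["competition_id"].isin([37, 49])]
target_pairs = list(
    club[["competition_id", "season_id"]]
    .drop_duplicates()
    .itertuples(index=False, name=None)
)

print(target_pairs)
```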


In [None]:
# Create dataframes for the matches in each season of the target competitions

matches_df_37_42 = sb.matches(competition_id = 37,
                              season_id = 42)

matches_df_37_4 = sb.matches(competition_id = 37,
                             season_id = 4)

matches_df_49_3 = sb.matches(competition_id = 49,
                             season_id = 3)

credentials were not supplied. open data access only


In [None]:
# Combine dataframes for the matches in each season of the target leagues

matches_df = pd.concat([matches_df_37_42,
                        matches_df_37_4,
                        matches_df_49_3],
                       ignore_index = True)
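
The three `sb.matches` calls and the `pd.concat` can also be written as a loop over (competition, season) pairs. In this minimal sketch, `load_matches` and `fake_matches` are hypothetical names, and the stub stands in for `sb.matches` so that the example runs without network access.

```python
import pandas as pd


def load_matches(fetch, pairs):
    """Fetch one dataframe per (competition_id, season_id) pair and stack them."""
    frames = [fetch(competition_id=c, season_id=s) for c, s in pairs]
    return pd.concat(frames, ignore_index=True)


def fake_matches(competition_id, season_id):
    """Offline stand-in for sb.matches; returns two dummy rows per season."""
    return pd.DataFrame({
        "match_id": [competition_id * 1000 + season_id * 10 + i for i in range(2)],
        "competition_id": competition_id,
        "season_id": season_id,
    })


demo_matches = load_matches(fake_matches, [(37, 42), (37, 4), (49, 3)])
print(len(demo_matches))
```

In the notebook, `load_matches(sb.matches, [(37, 42), (37, 4), (49, 3)])` would be the equivalent call, assuming `sb.matches` accepts the `competition_id` and `season_id` keyword arguments used above.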

In [None]:
matches_df.head()

Unnamed: 0,match_id,match_date,kick_off,competition,season,home_team,away_team,home_score,away_score,match_status,match_status_360,last_updated,last_updated_360,match_week,competition_stage,stadium,referee,data_version,shot_fidelity_version,xy_fidelity_version
0,2275054,2020-01-05,15:00:00.000,England - FA Women's Super League,2019/2020,Brighton & Hove Albion WFC,Liverpool WFC,1,0,available,unscheduled,2020-07-29T05:00,,11,Regular Season,The People's Pension Stadium,A. Fearn,1.1.0,2,2
1,2275072,2020-01-05,13:30:00.000,England - FA Women's Super League,2019/2020,Chelsea FCW,Reading WFC,3,1,available,unscheduled,2020-07-29T05:00,,11,Regular Season,The Cherry Red Records Stadium,S. Pearson,1.1.0,2,2
2,2275085,2020-01-05,15:00:00.000,England - FA Women's Super League,2019/2020,Tottenham Hotspur Women,Manchester City WFC,1,4,available,unscheduled,2020-07-29T05:00,,11,Regular Season,The Hive Stadium,H. Conley,1.1.0,2,2
3,2275113,2020-01-19,16:00:00.000,England - FA Women's Super League,2019/2020,West Ham United LFC,Brighton & Hove Albion WFC,2,1,available,unscheduled,2020-07-29T05:00,,13,Regular Season,The Rush Green Stadium,Ryan Atkin,1.1.0,2,2
4,2275142,2020-01-05,13:00:00.000,England - FA Women's Super League,2019/2020,Manchester United,Bristol City WFC,0,1,available,unscheduled,2020-10-20T18:35:32.568528,,11,Regular Season,Leigh Sports Village Stadium,L. Oliver,1.1.0,2,2


In [None]:
print('Total Matches:', len(matches_df))

Total Matches: 230


In [None]:
# Save matches_df

matches_df.to_parquet('/content/drive/MyDrive/flatiron/expected_goals/data_extraction/dataframes/matches_df.parquet')

In [None]:
print('matches_df Filesize:',
      path('/content/drive/MyDrive/flatiron/expected_goals/data_extraction/dataframes/matches_df.parquet').stat().st_size,
      'bytes')

matches_df Filesize: 20604 bytes
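
Raw byte counts become hard to read as the saved files grow. The following is a small sketch of a formatter that could wrap the `st_size` value printed above; the name `human_size` is illustrative.

```python
def human_size(num_bytes: int) -> str:
    """Format a byte count using binary units (1 KiB = 1024 bytes)."""
    size = float(num_bytes)
    for unit in ("bytes", "KiB", "MiB", "GiB"):
        # Stop at the first unit that keeps the number below 1024,
        # or at the largest unit available.
        if size < 1024 or unit == "GiB":
            return f"{size:.1f} {unit}"
        size /= 1024


print(human_size(20604))
```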


## Shots Events Data

In [None]:
# Create dataframes for the target events in each season of the target competitions
# Shots
# Dataframe names follow competition_id and season_id:
# season_id 4 is 2018/2019 and season_id 42 is 2019/2020

shots_df_37_4 = sb.competition_events(country = 'England',
                                      division = "FA Women's Super League",
                                      season = '2018/2019',
                                      gender = 'female',
                                      split = True)['shots']

shots_df_37_42 = sb.competition_events(country = 'England',
                                       division = "FA Women's Super League",
                                       season = '2019/2020',
                                       gender = 'female',
                                       split = True)['shots']

shots_df_49_3 = sb.competition_events(country = 'United States of America',
                                      division = 'NWSL',
                                      season = '2018',
                                      gender = 'female',
                                      split = True)['shots']

credentials were not supplied. open data access only

In [None]:
# Combine dataframes for the target events in each season of the target leagues

shots_df = pd.concat([shots_df_37_4,
                      shots_df_37_42,
                      shots_df_49_3],
                     ignore_index = True)

In [None]:
shots_df.head()

Unnamed: 0,id,index,period,timestamp,minute,second,type,possession,possession_team,play_pattern,team,player,position,location,duration,under_pressure,related_events,match_id,shot_statsbomb_xg,shot_end_location,shot_key_pass_id,shot_technique,shot_outcome,shot_type,shot_body_part,shot_freeze_frame,shot_one_on_one,shot_aerial_won,shot_open_goal,shot_first_time,out,shot_redirect,shot_deflected,off_camera,shot_saved_off_target,shot_saved_to_post,shot_follows_dribble
0,8f5a3b7c-db0b-42ec-bac0-adc0bedca2ea,258,1,00:04:38.609,4,38,Shot,11,Chelsea FCW,Regular Play,Chelsea FCW,Francesca Kirby,Center Forward,"[109.0, 46.0]",0.2788,True,"[011167bc-9cbc-46a3-9b7b-28065eab7af1, 2c37831...",19743,0.266154,"[112.0, 45.0]",bf82ea91-c3e3-4d8c-b91d-c9d0ccd44f11,Normal,Blocked,Open Play,Left Foot,"[{'location': [104.0, 50.0], 'player': {'id': ...",,,,,,,,,,,
1,60ead7a6-4aa2-41ab-85a1-21357f50e4e0,542,1,00:11:45.046,11,45,Shot,24,Chelsea FCW,From Free Kick,Chelsea FCW,Bethany England,Left Midfield,"[113.0, 35.0]",0.25673,True,"[a4b77cbb-14d0-4bd3-ba8b-7312335098fe, b9b246c...",19743,0.093521,"[120.0, 32.9, 0.4]",b99082e1-812b-48dd-bf94-8856b1ff079b,Normal,Off T,Open Play,Head,"[{'location': [108.0, 45.0], 'player': {'id': ...",True,True,,,,,,,,,
2,f68deb6f-0711-4b9d-8081-122dc3722c55,614,1,00:18:03.461,18,3,Shot,29,Chelsea FCW,Regular Play,Chelsea FCW,Drew Spence,Left Defensive Midfield,"[94.0, 43.0]",1.147883,True,"[3c03553f-3bed-4d21-8096-ed4ef269da62, bb13e23...",19743,0.036171,"[120.0, 42.8, 0.5]",5022d0b3-ea32-42a8-bd41-b46cc244beb9,Normal,Saved,Open Play,Left Foot,"[{'location': [118.0, 41.0], 'player': {'id': ...",,,,,,,,,,,
3,f301190f-cc0a-4f16-8278-27e5279ea24e,877,1,00:23:11.935,23,11,Shot,43,Birmingham City WFC,From Goal Kick,Birmingham City WFC,Chloe Arthur,Right Back,"[86.0, 34.0]",2.161012,True,"[0bfe1b6c-d690-41a6-be3e-f9b6295ddd85, 570e15b...",19743,0.016625,"[119.0, 33.3, 0.5]",fdf4a564-4973-46e5-bc07-d84785f8c183,Normal,Off T,Open Play,Left Foot,"[{'location': [78.0, 58.0], 'player': {'id': 1...",,,,,,,,,,,
4,8558535e-b1ee-4f53-b003-1b5fba2712bd,892,1,00:23:45.810,23,45,Shot,44,Chelsea FCW,From Goal Kick,Chelsea FCW,Bethany England,Left Midfield,"[94.0, 33.0]",1.225187,,[1455cb46-43a3-4e6f-b845-171abcd344bc],19743,0.030716,"[120.0, 34.8, 0.5]",37712221-3b0b-4090-a30c-08a3ee6492be,Normal,Off T,Open Play,Right Foot,"[{'location': [117.0, 40.0], 'player': {'id': ...",,,,,,,,,,,


In [None]:
print('Total Shot Events:',
      len(shots_df))

Total Shot Events: 6080


In [None]:
print('Total Shot Features:',
      shots_df.shape[1])

Total Shot Features: 37


In [None]:
# Save shots_df

shots_df.to_parquet('/content/drive/MyDrive/flatiron/expected_goals/data_extraction/dataframes/shots_df.parquet')

In [None]:
print('shots_df Filesize:',
      path('/content/drive/MyDrive/flatiron/expected_goals/data_extraction/dataframes/shots_df.parquet').stat().st_size,
      'bytes')

shots_df Filesize: 1588328 bytes


## Pass Events Data

In [None]:
# Create dataframes for the target events in each season of the target competitions
# Passes
# Dataframe names follow competition_id and season_id:
# season_id 4 is 2018/2019 and season_id 42 is 2019/2020

passes_df_37_4 = sb.competition_events(country = 'England',
                                       division = "FA Women's Super League",
                                       season = '2018/2019',
                                       gender = 'female',
                                       split = True)['passes']

passes_df_37_42 = sb.competition_events(country = 'England',
                                        division = "FA Women's Super League",
                                        season = '2019/2020',
                                        gender = 'female',
                                        split = True)['passes']

passes_df_49_3 = sb.competition_events(country = 'United States of America',
                                       division = 'NWSL',
                                       season = '2018',
                                       gender = 'female',
                                       split = True)['passes']

credentials were not supplied. open data access only

In [None]:
# Combine dataframes for the target events in each season of the target leagues

passes_df = pd.concat([passes_df_37_4,
                       passes_df_37_42,
                       passes_df_49_3],
                      ignore_index = True)

In [None]:
passes_df.head()

Unnamed: 0,id,index,period,timestamp,minute,second,type,possession,possession_team,play_pattern,team,player,position,location,duration,related_events,match_id,pass_recipient,pass_length,pass_angle,pass_height,pass_end_location,pass_body_part,pass_type,under_pressure,pass_outcome,pass_aerial_won,pass_assisted_shot_id,pass_shot_assist,off_camera,pass_switch,pass_through_ball,pass_technique,pass_backheel,pass_cross,counterpress,pass_cut_back,pass_deflected,pass_goal_assist,pass_miscommunication,pass_inswinging,pass_straight,pass_outswinging,pass_no_touch,out
0,667dda2e-b35d-4d46-ad09-40b3f491f160,5,1,00:00:01.324,0,1,Pass,2,Chelsea FCW,From Kick Off,Chelsea FCW,Francesca Kirby,Center Forward,"[61.0, 41.0]",1.228695,[8dc92bd7-d6a0-4d60-b24e-b0352d135b62],19743,Sophie Ingle,9.848858,2.723368,Ground Pass,"[52.0, 45.0]",Right Foot,Kick Off,,,,,,,,,,,,,,,,,,,,,
1,932644ad-d6be-4cf3-b6e2-048d4e9e1651,8,1,00:00:03.388,0,3,Pass,2,Chelsea FCW,From Kick Off,Chelsea FCW,Sophie Ingle,Right Defensive Midfield,"[53.0, 45.0]",1.693583,[d0b87c61-dff3-424c-8543-d8fdf4267fe7],19743,Magdalena Ericsson,21.213203,-2.356194,Ground Pass,"[38.0, 30.0]",Right Foot,,,,,,,,,,,,,,,,,,,,,,
2,0bb993c0-3201-4169-ac17-a594b5dd66c1,11,1,00:00:05.122,0,5,Pass,2,Chelsea FCW,From Kick Off,Chelsea FCW,Magdalena Ericsson,Left Center Back,"[38.0, 30.0]",1.257417,[fc41344a-265d-4216-8802-0f56add7b85f],19743,Millie Bright,22.561028,1.794273,Ground Pass,"[33.0, 52.0]",Left Foot,,,,,,,,,,,,,,,,,,,,,,
3,175af0fd-bd34-4cc7-bbf6-e82802734429,14,1,00:00:09.208,0,9,Pass,2,Chelsea FCW,From Kick Off,Chelsea FCW,Millie Bright,Right Center Back,"[36.0, 57.0]",1.58506,[7e99b3a1-2880-46e6-8d35-fd38e1adb9cd],19743,Jessica Carter,21.540659,1.19029,Ground Pass,"[44.0, 77.0]",Right Foot,,,,,,,,,,,,,,,,,,,,,,
4,db69f6dd-0f11-4091-8896-bc55158021ee,18,1,00:00:12.945,0,12,Pass,2,Chelsea FCW,From Kick Off,Chelsea FCW,Jessica Carter,Right Back,"[38.0, 74.0]",2.457301,"[076e57c9-375d-4c9a-8fc5-6945b1cf7f43, e0f3fb6...",19743,Rut Hedvig Lindahl,36.05551,-2.55359,Ground Pass,"[8.0, 54.0]",Right Foot,,True,,,,,,,,,,,,,,,,,,,,


In [None]:
print('Total Pass Events:',
      len(passes_df))

Total Pass Events: 208122


In [None]:
print('Total Pass Features:',
      passes_df.shape[1])

Total Pass Features: 45


In [None]:
# Save passes_df

passes_df.to_parquet('/content/drive/MyDrive/flatiron/expected_goals/data_extraction/dataframes/passes_df.parquet')

In [None]:
print('passes_df Filesize:',
      path('/content/drive/MyDrive/flatiron/expected_goals/data_extraction/dataframes/passes_df.parquet').stat().st_size,
      'bytes')

passes_df Filesize: 26022813 bytes
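
The shots and passes sections call `sb.competition_events(..., split = True)` with the same arguments and then index the result by event type, so each season appears to be requested twice. The following is a sketch of fetching once per season and collecting both tables. `collect_tables` and `fake_events` are hypothetical names, and the stub replaces the real call so that the example runs offline.

```python
import pandas as pd


def collect_tables(fetch, seasons, keys):
    """Call fetch once per season and stack the requested event tables."""
    per_key = {key: [] for key in keys}
    for season_kwargs in seasons:
        tables = fetch(**season_kwargs)  # one request per season
        for key in keys:
            per_key[key].append(tables[key])
    return {key: pd.concat(frames, ignore_index=True)
            for key, frames in per_key.items()}


calls = []


def fake_events(**kwargs):
    """Offline stand-in for sb.competition_events(..., split=True)."""
    calls.append(kwargs)
    return {"shots": pd.DataFrame({"id": ["s1"]}),
            "passes": pd.DataFrame({"id": ["p1", "p2"]})}


seasons = [
    {"country": "England", "division": "FA Women's Super League",
     "season": "2018/2019", "gender": "female"},
    {"country": "United States of America", "division": "NWSL",
     "season": "2018", "gender": "female"},
]
tables = collect_tables(fake_events, seasons, keys=("shots", "passes"))
print(len(calls), len(tables["shots"]), len(tables["passes"]))
```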


## Merge Data

In [None]:
# Merge pass data from passes_df with shots_df
# Match shots_df 'shot_key_pass_id' to passes_df 'id'

passes_df2 = passes_df.rename(columns = {'id': 'shot_key_pass_id'})

extracted_data = pd.merge(shots_df, passes_df2, on = ['shot_key_pass_id'], how = 'left')

In [None]:
extracted_data.head()

Unnamed: 0,id,index_x,period_x,timestamp_x,minute_x,second_x,type_x,possession_x,possession_team_x,play_pattern_x,team_x,player_x,position_x,location_x,duration_x,under_pressure_x,related_events_x,match_id_x,shot_statsbomb_xg,shot_end_location,shot_key_pass_id,shot_technique,shot_outcome,shot_type,shot_body_part,shot_freeze_frame,shot_one_on_one,shot_aerial_won,shot_open_goal,shot_first_time,out_x,shot_redirect,shot_deflected,off_camera_x,shot_saved_off_target,shot_saved_to_post,shot_follows_dribble,index_y,period_y,timestamp_y,...,second_y,type_y,possession_y,possession_team_y,play_pattern_y,team_y,player_y,position_y,location_y,duration_y,related_events_y,match_id_y,pass_recipient,pass_length,pass_angle,pass_height,pass_end_location,pass_body_part,pass_type,under_pressure_y,pass_outcome,pass_aerial_won,pass_assisted_shot_id,pass_shot_assist,off_camera_y,pass_switch,pass_through_ball,pass_technique,pass_backheel,pass_cross,counterpress,pass_cut_back,pass_deflected,pass_goal_assist,pass_miscommunication,pass_inswinging,pass_straight,pass_outswinging,pass_no_touch,out_y
0,8f5a3b7c-db0b-42ec-bac0-adc0bedca2ea,258,1,00:04:38.609,4,38,Shot,11,Chelsea FCW,Regular Play,Chelsea FCW,Francesca Kirby,Center Forward,"[109.0, 46.0]",0.2788,True,"[011167bc-9cbc-46a3-9b7b-28065eab7af1, 2c37831...",19743,0.266154,"[112.0, 45.0]",bf82ea91-c3e3-4d8c-b91d-c9d0ccd44f11,Normal,Blocked,Open Play,Left Foot,"[{'location': [104.0, 50.0], 'player': {'id': ...",,,,,,,,,,,,253.0,1.0,00:04:35.786,...,35.0,Pass,11.0,Chelsea FCW,Regular Play,Chelsea FCW,Bethany England,Left Midfield,"[95.0, 49.0]",1.361685,"[58da4d74-7684-405d-a8cc-bef1d658f1b6, 60d1337...",19743.0,Francesca Kirby,11.18034,0.463648,Ground Pass,"[105.0, 54.0]",Left Foot,,True,,,8f5a3b7c-db0b-42ec-bac0-adc0bedca2ea,True,,,,,,,,,,,,,,,,
1,60ead7a6-4aa2-41ab-85a1-21357f50e4e0,542,1,00:11:45.046,11,45,Shot,24,Chelsea FCW,From Free Kick,Chelsea FCW,Bethany England,Left Midfield,"[113.0, 35.0]",0.25673,True,"[a4b77cbb-14d0-4bd3-ba8b-7312335098fe, b9b246c...",19743,0.093521,"[120.0, 32.9, 0.4]",b99082e1-812b-48dd-bf94-8856b1ff079b,Normal,Off T,Open Play,Head,"[{'location': [108.0, 45.0], 'player': {'id': ...",True,True,,,,,,,,,,539.0,1.0,00:11:42.863,...,42.0,Pass,24.0,Chelsea FCW,From Free Kick,Chelsea FCW,Erin Cuthbert,Right Midfield,"[82.0, 54.0]",2.1038,[540a29f4-8533-4852-b492-307d124cf084],19743.0,Bethany England,37.735924,-0.558599,High Pass,"[114.0, 34.0]",Right Foot,Free Kick,,,,60ead7a6-4aa2-41ab-85a1-21357f50e4e0,True,,,,,,,,,,,,,,,,
2,f68deb6f-0711-4b9d-8081-122dc3722c55,614,1,00:18:03.461,18,3,Shot,29,Chelsea FCW,Regular Play,Chelsea FCW,Drew Spence,Left Defensive Midfield,"[94.0, 43.0]",1.147883,True,"[3c03553f-3bed-4d21-8096-ed4ef269da62, bb13e23...",19743,0.036171,"[120.0, 42.8, 0.5]",5022d0b3-ea32-42a8-bd41-b46cc244beb9,Normal,Saved,Open Play,Left Foot,"[{'location': [118.0, 41.0], 'player': {'id': ...",,,,,,,,,,,,610.0,1.0,00:18:01.596,...,1.0,Pass,29.0,Chelsea FCW,Regular Play,Chelsea FCW,So-yun Ji,Center Attacking Midfield,"[98.0, 60.0]",0.918187,"[753c6e78-72f9-4963-bcb7-c3e4ed58be6a, c884125...",19743.0,Drew Spence,11.18034,-2.034444,Ground Pass,"[93.0, 50.0]",Right Foot,,True,,,f68deb6f-0711-4b9d-8081-122dc3722c55,True,,,,,,,,,,,,,,,,
3,f301190f-cc0a-4f16-8278-27e5279ea24e,877,1,00:23:11.935,23,11,Shot,43,Birmingham City WFC,From Goal Kick,Birmingham City WFC,Chloe Arthur,Right Back,"[86.0, 34.0]",2.161012,True,"[0bfe1b6c-d690-41a6-be3e-f9b6295ddd85, 570e15b...",19743,0.016625,"[119.0, 33.3, 0.5]",fdf4a564-4973-46e5-bc07-d84785f8c183,Normal,Off T,Open Play,Left Foot,"[{'location': [78.0, 58.0], 'player': {'id': 1...",,,,,,,,,,,,873.0,1.0,00:23:08.192,...,8.0,Pass,43.0,Birmingham City WFC,From Goal Kick,Birmingham City WFC,Emma Follis,Center Forward,"[86.0, 15.0]",2.033567,[7d3eb214-4b99-4e3f-ad83-155793b118fc],19743.0,Chloe Arthur,13.892444,2.098871,Ground Pass,"[79.0, 27.0]",Right Foot,,,,,f301190f-cc0a-4f16-8278-27e5279ea24e,True,,,,,,,,,,,,,,,,
4,8558535e-b1ee-4f53-b003-1b5fba2712bd,892,1,00:23:45.810,23,45,Shot,44,Chelsea FCW,From Goal Kick,Chelsea FCW,Bethany England,Left Midfield,"[94.0, 33.0]",1.225187,,[1455cb46-43a3-4e6f-b845-171abcd344bc],19743,0.030716,"[120.0, 34.8, 0.5]",37712221-3b0b-4090-a30c-08a3ee6492be,Normal,Off T,Open Play,Right Foot,"[{'location': [117.0, 40.0], 'player': {'id': ...",,,,,,,,,,,,888.0,1.0,00:23:41.728,...,41.0,Pass,44.0,Chelsea FCW,From Goal Kick,Chelsea FCW,Jonna Andersson,Left Back,"[83.0, 10.0]",1.243357,[fad5af63-bf6e-4e51-9321-644b99e9f2b8],19743.0,Bethany England,14.56022,1.292497,Ground Pass,"[87.0, 24.0]",Left Foot,,,,,8558535e-b1ee-4f53-b003-1b5fba2712bd,True,,,,,,,,,,,,,,,,


In [None]:
print('Total Events:',
      len(extracted_data))

Total Events: 6080


In [None]:
print('Total Features:',
      extracted_data.shape[1])

Total Features: 81
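
These counts are what a left join on a unique key should produce: all 6080 shots are kept, and 37 shot columns plus 45 pass columns minus the one shared key gives 81. The following is a minimal sketch on toy data showing how the `validate` and `indicator` arguments of `pd.merge` can make those expectations explicit. The toy frames and the `_shot`/`_pass` suffixes are illustrative; the notebook itself uses the default `_x`/`_y` suffixes.

```python
import pandas as pd

shots = pd.DataFrame({
    "id": ["s1", "s2", "s3"],
    "shot_key_pass_id": ["p1", None, "p2"],  # s2 has no key pass
    "minute": [4, 11, 18],
})
passes = pd.DataFrame({
    "id": ["p1", "p2", "p3"],
    "minute": [4, 17, 30],
})

merged = pd.merge(
    shots,
    passes.rename(columns={"id": "shot_key_pass_id"}),
    on="shot_key_pass_id",
    how="left",
    suffixes=("_shot", "_pass"),
    validate="many_to_one",  # raise if a pass id appears more than once in passes
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

print(len(merged), merged.shape[1])
print(merged["_merge"].value_counts().to_dict())
```

Shots without a key pass stay in the result with missing pass columns, and the `_merge` column shows how many shots were actually matched to a pass.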


In [None]:
# Save extracted_data

extracted_data.to_parquet('/content/drive/MyDrive/flatiron/expected_goals/data_extraction/dataframes/extracted_data.parquet')

In [None]:
print('extracted_data Filesize:',
      path('/content/drive/MyDrive/flatiron/expected_goals/data_extraction/dataframes/extracted_data.parquet').stat().st_size,
      'bytes')

extracted_data Filesize: 2247593 bytes


Continued in [expected_goals_feature_engineering_notebook]()

*2 of 7*