# **Expected Goals Classifier**

## Overview

Build an Expected Goals (xG) classification model from historical match data and use it to produce actionable recommendations for technical and tactical analysis aimed at improving goal-scoring.

Project detailed on Github: [Expected Goals Classifier]()

# Feature Engineering Notebook

*Notebook 2 of 7*

### Index

1. Data extracted in [expected_goals_data_extraction_notebook]()
2. Features engineered in [expected_goals_feature_engineering_notebook]()
3. Data cleaned in [expected_goals_data_cleaning_notebook]()
4. Data explored in [expected_goals_data_exploration_notebook]()
5. Data preprocessed in [expected_goals_data_preprocessing_notebook]()
6. Modeling in [expected_goals_model_fitting_notebook]()
7. Conclusions in [expected_goals_model_assessment_notebook]()

# Packages

In [None]:
# Drive and IO to access saved data
from google.colab import drive, files
drive.mount('/content/drive')

import io

# Pathlib for file retrieval
import pathlib
from pathlib import Path

# Pandas for Dataframes
import pandas as pd

# Numpy for mathematical functions
import numpy as np

import math
from math import atan2

# Shapely for geometric functions
import shapely
from shapely import wkt
from shapely.geometry import Point, Polygon, LineString, GeometryCollection

import warnings
warnings.filterwarnings('ignore')

Mounted at /content/drive


### Data

Data is sourced from [StatsBomb](https://statsbomb.com/), a United Kingdom-based football (soccer) data analytics company.

StatsBomb have provided free access to their proprietary dataset via GitHub: [StatsBomb Open Data](https://github.com/statsbomb/open-data)

In [None]:
# Import extracted_data from expected_goals_data_extraction_notebook

extracted_data = pd.read_parquet('/content/drive/MyDrive/flatiron/expected_goals/data_extraction/dataframes/extracted_data.parquet')

In [None]:
extracted_data.head()

Unnamed: 0,id,index_x,period_x,timestamp_x,minute_x,second_x,type_x,possession_x,possession_team_x,play_pattern_x,team_x,player_x,position_x,location_x,duration_x,under_pressure_x,related_events_x,match_id_x,shot_statsbomb_xg,shot_end_location,shot_key_pass_id,shot_technique,shot_outcome,shot_type,shot_body_part,shot_freeze_frame,shot_one_on_one,shot_aerial_won,shot_open_goal,shot_first_time,out_x,shot_redirect,shot_deflected,off_camera_x,shot_saved_off_target,shot_saved_to_post,shot_follows_dribble,index_y,period_y,timestamp_y,...,second_y,type_y,possession_y,possession_team_y,play_pattern_y,team_y,player_y,position_y,location_y,duration_y,related_events_y,match_id_y,pass_recipient,pass_length,pass_angle,pass_height,pass_end_location,pass_body_part,pass_type,under_pressure_y,pass_outcome,pass_aerial_won,pass_assisted_shot_id,pass_shot_assist,off_camera_y,pass_switch,pass_through_ball,pass_technique,pass_backheel,pass_cross,counterpress,pass_cut_back,pass_deflected,pass_goal_assist,pass_miscommunication,pass_inswinging,pass_straight,pass_outswinging,pass_no_touch,out_y
0,8f5a3b7c-db0b-42ec-bac0-adc0bedca2ea,258,1,00:04:38.609,4,38,Shot,11,Chelsea FCW,Regular Play,Chelsea FCW,Francesca Kirby,Center Forward,"[109.0, 46.0]",0.2788,True,"[011167bc-9cbc-46a3-9b7b-28065eab7af1, 2c37831...",19743,0.266154,"[112.0, 45.0]",bf82ea91-c3e3-4d8c-b91d-c9d0ccd44f11,Normal,Blocked,Open Play,Left Foot,"[{'location': [104.0, 50.0], 'player': {'id': ...",,,,,,,,,,,,253.0,1.0,00:04:35.786,...,35.0,Pass,11.0,Chelsea FCW,Regular Play,Chelsea FCW,Bethany England,Left Midfield,"[95.0, 49.0]",1.361685,"[58da4d74-7684-405d-a8cc-bef1d658f1b6, 60d1337...",19743.0,Francesca Kirby,11.18034,0.463648,Ground Pass,"[105.0, 54.0]",Left Foot,,True,,,8f5a3b7c-db0b-42ec-bac0-adc0bedca2ea,True,,,,,,,,,,,,,,,,
1,60ead7a6-4aa2-41ab-85a1-21357f50e4e0,542,1,00:11:45.046,11,45,Shot,24,Chelsea FCW,From Free Kick,Chelsea FCW,Bethany England,Left Midfield,"[113.0, 35.0]",0.25673,True,"[a4b77cbb-14d0-4bd3-ba8b-7312335098fe, b9b246c...",19743,0.093521,"[120.0, 32.9, 0.4]",b99082e1-812b-48dd-bf94-8856b1ff079b,Normal,Off T,Open Play,Head,"[{'location': [108.0, 45.0], 'player': {'id': ...",True,True,,,,,,,,,,539.0,1.0,00:11:42.863,...,42.0,Pass,24.0,Chelsea FCW,From Free Kick,Chelsea FCW,Erin Cuthbert,Right Midfield,"[82.0, 54.0]",2.1038,[540a29f4-8533-4852-b492-307d124cf084],19743.0,Bethany England,37.735924,-0.558599,High Pass,"[114.0, 34.0]",Right Foot,Free Kick,,,,60ead7a6-4aa2-41ab-85a1-21357f50e4e0,True,,,,,,,,,,,,,,,,
2,f68deb6f-0711-4b9d-8081-122dc3722c55,614,1,00:18:03.461,18,3,Shot,29,Chelsea FCW,Regular Play,Chelsea FCW,Drew Spence,Left Defensive Midfield,"[94.0, 43.0]",1.147883,True,"[3c03553f-3bed-4d21-8096-ed4ef269da62, bb13e23...",19743,0.036171,"[120.0, 42.8, 0.5]",5022d0b3-ea32-42a8-bd41-b46cc244beb9,Normal,Saved,Open Play,Left Foot,"[{'location': [118.0, 41.0], 'player': {'id': ...",,,,,,,,,,,,610.0,1.0,00:18:01.596,...,1.0,Pass,29.0,Chelsea FCW,Regular Play,Chelsea FCW,So-yun Ji,Center Attacking Midfield,"[98.0, 60.0]",0.918187,"[753c6e78-72f9-4963-bcb7-c3e4ed58be6a, c884125...",19743.0,Drew Spence,11.18034,-2.034444,Ground Pass,"[93.0, 50.0]",Right Foot,,True,,,f68deb6f-0711-4b9d-8081-122dc3722c55,True,,,,,,,,,,,,,,,,
3,f301190f-cc0a-4f16-8278-27e5279ea24e,877,1,00:23:11.935,23,11,Shot,43,Birmingham City WFC,From Goal Kick,Birmingham City WFC,Chloe Arthur,Right Back,"[86.0, 34.0]",2.161012,True,"[0bfe1b6c-d690-41a6-be3e-f9b6295ddd85, 570e15b...",19743,0.016625,"[119.0, 33.3, 0.5]",fdf4a564-4973-46e5-bc07-d84785f8c183,Normal,Off T,Open Play,Left Foot,"[{'location': [78.0, 58.0], 'player': {'id': 1...",,,,,,,,,,,,873.0,1.0,00:23:08.192,...,8.0,Pass,43.0,Birmingham City WFC,From Goal Kick,Birmingham City WFC,Emma Follis,Center Forward,"[86.0, 15.0]",2.033567,[7d3eb214-4b99-4e3f-ad83-155793b118fc],19743.0,Chloe Arthur,13.892444,2.098871,Ground Pass,"[79.0, 27.0]",Right Foot,,,,,f301190f-cc0a-4f16-8278-27e5279ea24e,True,,,,,,,,,,,,,,,,
4,8558535e-b1ee-4f53-b003-1b5fba2712bd,892,1,00:23:45.810,23,45,Shot,44,Chelsea FCW,From Goal Kick,Chelsea FCW,Bethany England,Left Midfield,"[94.0, 33.0]",1.225187,,[1455cb46-43a3-4e6f-b845-171abcd344bc],19743,0.030716,"[120.0, 34.8, 0.5]",37712221-3b0b-4090-a30c-08a3ee6492be,Normal,Off T,Open Play,Right Foot,"[{'location': [117.0, 40.0], 'player': {'id': ...",,,,,,,,,,,,888.0,1.0,00:23:41.728,...,41.0,Pass,44.0,Chelsea FCW,From Goal Kick,Chelsea FCW,Jonna Andersson,Left Back,"[83.0, 10.0]",1.243357,[fad5af63-bf6e-4e51-9321-644b99e9f2b8],19743.0,Bethany England,14.56022,1.292497,Ground Pass,"[87.0, 24.0]",Left Foot,,,,,8558535e-b1ee-4f53-b003-1b5fba2712bd,True,,,,,,,,,,,,,,,,


# Inside 18-Yard Box

### Width

In [None]:
# Flag shots taken within the width of the 18-yard box
# (22 < location_y < 58, evaluated over the whole column at once)

organized_data['inside_18_width'] = ((organized_data['location_y'] > 22) &
                                     (organized_data['location_y'] < 58))

In [None]:
organized_data['inside_18_width'].value_counts()

True     5719
False     385
Name: inside_18_width, dtype: int64

### Depth

In [None]:
# Flag shots taken within the depth of the 18-yard box
# (location_x > 102, evaluated over the whole column at once)

organized_data['inside_18_depth'] = organized_data['location_x'] > 102

In [None]:
organized_data['inside_18_depth'].value_counts()

True     3747
False    2357
Name: inside_18_depth, dtype: int64

### Total

In [None]:
# Flag shots taken within the 18-yard box
# (inside both its width and its depth)

organized_data['inside_18'] = (organized_data['inside_18_width'] &
                               organized_data['inside_18_depth'])

In [None]:
organized_data['inside_18'].value_counts()

True     3588
False    2516
Name: inside_18, dtype: int64
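
The width, depth, and combined checks above can also be folded into a single function. The following is a minimal sketch, not the notebook's own code: `flag_inside_18` and `sample_shots` are hypothetical names, and the default bounds simply copy the thresholds used in the cells above. The sample coordinates are the first five shot locations in the data.

```python
import pandas as pd

# Hypothetical helper; the default bounds copy the thresholds used in this notebook
def flag_inside_18(df, y_min=22, y_max=58, x_min=102):
  """Return a boolean Series marking shots inside the width and depth bounds."""
  inside_width = (df['location_y'] > y_min) & (df['location_y'] < y_max)
  inside_depth = df['location_x'] > x_min
  return inside_width & inside_depth

# First five shot locations from the data
sample_shots = pd.DataFrame({'location_x': [109.0, 113.0, 94.0, 86.0, 94.0],
                             'location_y': [46.0, 35.0, 43.0, 34.0, 33.0]})

sample_inside_18 = flag_inside_18(sample_shots)
print(sample_inside_18.tolist())
# [True, True, False, False, False]
```

Keeping the bounds as parameters makes it straightforward to compare this box definition against a different one without rewriting the checks.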

# Distance

In [None]:
# Define goal center
# Event coordinates are recorded from the perspective of the team in possession
# Because the pitch map is flipped so both teams attack the same goal,
# goal center is the same for home and away teams

goal_center = (120, 40)

In [None]:
# Use location_x and location_y to define shot coordinates

shot_location_list = []
for i in range(0, len(organized_data)):
  shot_location_list.append((organized_data.iloc[i]['location_x'],
                             organized_data.iloc[i]['location_y']))

In [None]:
# Calculate distance from shot location to goal_center

shot_distance_list = []
for sl in shot_location_list:
  shot_distance_list.append(Point(sl).distance(Point(goal_center)))

organized_data['shot_distance'] = shot_distance_list

In [None]:
pd.DataFrame(organized_data['shot_distance'].describe())

| | shot_distance |
|---|---|
| count | 6104.0 |
| mean | 18.916359 |
| std | 9.071373 |
| min | 1.0 |
| 25% | 11.526057 |
| 50% | 17.804494 |
| 75% | 25.546526 |
| max | 66.540213 |
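
The distance above is the straight-line (Euclidean) distance from the shot location to `goal_center`. As a rough sketch, the same quantity can be computed column-wise with NumPy instead of building Shapely points one row at a time. `sample_shots` is a hypothetical frame holding the first five shot locations from the data, and `np.hypot` is swapped in here for `Point.distance`.

```python
import numpy as np
import pandas as pd

goal_center = (120, 40)  # same goal center as defined in the notebook

# First five shot locations from the data
sample_shots = pd.DataFrame({'location_x': [109.0, 113.0, 94.0, 86.0, 94.0],
                             'location_y': [46.0, 35.0, 43.0, 34.0, 33.0]})

# sqrt(dx^2 + dy^2) for every row at once
sample_shots['shot_distance'] = np.hypot(goal_center[0] - sample_shots['location_x'],
                                         goal_center[1] - sample_shots['location_y'])

print(sample_shots['shot_distance'].round(6).tolist())
# [12.529964, 8.602325, 26.172505, 34.525353, 26.925824]
```

For the first shot at (109, 46), the offsets are 11 and 6, so the distance is the square root of 157, about 12.53.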


# Angle

In [None]:
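# Calculate the angle, in degrees, from the shot location to goal_center
# atan2 receives the x offset first, so the angle is measured from the y-axis:
# 90 is directly in front of goal, below 90 means location_y < 40,
# and above 90 means location_y > 40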

shot_angle_list = []
for i in range(0, len(organized_data)):
  shot_angle_list.append(round(math.degrees(math.atan2((goal_center[0] - organized_data.iloc[i]['location_x']),
                                                       (goal_center[1] - organized_data.iloc[i]['location_y']))), 2))

organized_data['shot_angle'] = shot_angle_list

In [None]:
pd.DataFrame(organized_data['shot_angle'].describe())

| | shot_angle |
|---|---|
| count | 6104.0 |
| mean | 91.022638 |
| std | 33.911162 |
| min | 0.0 |
| 25% | 64.65 |
| 50% | 90.46 |
| 75% | 116.995 |
| max | 180.0 |
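
The angle convention is easier to see on a few concrete shots. The sketch below is an illustration rather than the notebook's own cell: it applies the same `atan2(x offset, y offset)` formula column-wise with NumPy to the first five shot locations in the data. `sample_shots`, `dx`, and `dy` are hypothetical names.

```python
import numpy as np
import pandas as pd

goal_center = (120, 40)  # same goal center as defined in the notebook

# First five shot locations from the data
sample_shots = pd.DataFrame({'location_x': [109.0, 113.0, 94.0, 86.0, 94.0],
                             'location_y': [46.0, 35.0, 43.0, 34.0, 33.0]})

dx = goal_center[0] - sample_shots['location_x']  # distance still to cover toward the goal line
dy = goal_center[1] - sample_shots['location_y']  # positive when location_y < 40

# x offset first, y offset second, matching the notebook's formula
sample_shots['shot_angle'] = np.degrees(np.arctan2(dx, dy)).round(2)

print(sample_shots['shot_angle'].tolist())
```

With this argument order, a shot at `location_y = 46` gets an angle above 90 and a shot at `location_y = 35` gets an angle below 90, which is the split the body-part comparison relies on.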


# Bodypart Angle

In [None]:
organized_data['bodypart'].value_counts()

Right Foot    3493
Left Foot     1676
Head           926
Other            9
Name: bodypart, dtype: int64

In [None]:
# Compare the side the shot was taken from to
# the body part the shot was taken with
# Shots at exactly 90 degrees, or with bodypart 'Other', fall through to 'Other'

bodypart_angle_list = []
for i in range(0, len(organized_data)):
  if ((organized_data.iloc[i]['shot_angle'] > 90) &
      (organized_data.iloc[i]['bodypart'] == 'Right Foot')):
    bodypart_angle_list.append('Right - Outside Foot')
  
  elif ((organized_data.iloc[i]['shot_angle'] > 90) &
        (organized_data.iloc[i]['bodypart'] == 'Left Foot')):
    bodypart_angle_list.append('Right - Inside Foot')
  
  elif ((organized_data.iloc[i]['shot_angle'] > 90) &
        (organized_data.iloc[i]['bodypart'] == 'Head')):
    bodypart_angle_list.append('Right - Head')
  
  elif ((organized_data.iloc[i]['shot_angle'] < 90) &
        (organized_data.iloc[i]['bodypart'] == 'Left Foot')):
    bodypart_angle_list.append('Left - Outside Foot')
  
  elif ((organized_data.iloc[i]['shot_angle'] < 90) &
        (organized_data.iloc[i]['bodypart'] == 'Right Foot')):
    bodypart_angle_list.append('Left - Inside Foot')
  
  elif ((organized_data.iloc[i]['shot_angle'] < 90) &
        (organized_data.iloc[i]['bodypart'] == 'Head')):
    bodypart_angle_list.append('Left - Head')
  
  else:
    bodypart_angle_list.append('Other')

organized_data['bodypart_angle'] = bodypart_angle_list

In [None]:
organized_data['bodypart_angle'].value_counts()

Right - Outside Foot    1882
Left - Inside Foot      1518
Left - Outside Foot      891
Right - Inside Foot      734
Right - Head             440
Left - Head              436
Other                    203
Name: bodypart_angle, dtype: int64
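
The chain of `if`/`elif` branches above maps each (side, body part) pair to a label. One possible column-wise equivalent uses `np.select`, sketched below under the assumption that `shot_angle` and `bodypart` hold the values shown earlier. `sample_shots` is a hypothetical frame: its first four rows come from the data and the last row (an angle of exactly 90) is made up to show the fall-through case.

```python
import numpy as np
import pandas as pd

sample_shots = pd.DataFrame({
    'shot_angle': [118.61, 54.46, 79.99, 74.93, 90.0],
    'bodypart': ['Left Foot', 'Head', 'Left Foot', 'Right Foot', 'Right Foot']})

right_side = sample_shots['shot_angle'] > 90
left_side = sample_shots['shot_angle'] < 90
bodypart = sample_shots['bodypart']

# Conditions and labels are paired by position; the first match wins
conditions = [right_side & (bodypart == 'Right Foot'),
              right_side & (bodypart == 'Left Foot'),
              right_side & (bodypart == 'Head'),
              left_side & (bodypart == 'Left Foot'),
              left_side & (bodypart == 'Right Foot'),
              left_side & (bodypart == 'Head')]
labels = ['Right - Outside Foot', 'Right - Inside Foot', 'Right - Head',
          'Left - Outside Foot', 'Left - Inside Foot', 'Left - Head']

# Anything unmatched (angle of exactly 90, or bodypart 'Other') becomes 'Other'
sample_shots['bodypart_angle'] = np.select(conditions, labels, default='Other')

print(sample_shots['bodypart_angle'].tolist())
# ['Right - Inside Foot', 'Left - Head', 'Left - Outside Foot', 'Left - Inside Foot', 'Other']
```

This also explains why the 'Other' count above is larger than the number of shots with bodypart 'Other': shots taken from exactly 90 degrees match neither side.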

# Significant Time

In [None]:
# Convert time datatype to datetime
# Time-only strings are parsed with the current date, so pin the date
# to 2021-06-14, the date the thresholds below are built on

organized_data['time'] = pd.to_datetime('2021-06-14 ' + organized_data['time'].astype(str))

In [None]:
# Label each shot by its timestamp: before 5 minutes, after 45 minutes,
# between 40 and 45 minutes, or regular time otherwise

significant_time_list = []
for i in range(0, len(organized_data)):
  if organized_data.iloc[i]['time'] < pd.Timestamp(2021, 6, 14, 0, 5, 0):
    significant_time_list.append('First 5min')
  
  elif organized_data.iloc[i]['time'] > pd.Timestamp(2021, 6, 14, 0, 45, 0):
    significant_time_list.append('Stoppage Time')
  
  elif organized_data.iloc[i]['time'] > pd.Timestamp(2021, 6, 14, 0, 40, 0):
    significant_time_list.append('Last 5min')
  
  else:
    significant_time_list.append('Regular Time')

organized_data['significant_time'] = significant_time_list

In [None]:
organized_data['significant_time'].value_counts()

Regular Time     4433
Last 5min         611
First 5min        611
Stoppage Time     449
Name: significant_time, dtype: int64
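
The thresholds above compare against timestamps that carry a calendar date. A date-free variant could treat the timestamp as an elapsed duration instead. The sketch below assumes the timestamps are `HH:MM:SS.fff` strings and uses `pd.to_timedelta` in place of `pd.to_datetime`. `sample_times` and `elapsed` are hypothetical names: the first two values come from the data and the last two are made up for illustration.

```python
import numpy as np
import pandas as pd

sample_times = pd.Series(['00:04:38.609', '00:11:45.046',
                          '00:42:00.000', '00:46:10.000'])

# Elapsed time, with no calendar date attached
elapsed = pd.to_timedelta(sample_times)

# Same order of checks as the loop above; the first match wins
conditions = [elapsed < pd.Timedelta(minutes=5),
              elapsed > pd.Timedelta(minutes=45),
              elapsed > pd.Timedelta(minutes=40)]
labels = ['First 5min', 'Stoppage Time', 'Last 5min']

significant_time = pd.Series(np.select(conditions, labels, default='Regular Time'),
                             index=sample_times.index)

print(significant_time.tolist())
# ['First 5min', 'Regular Time', 'Last 5min', 'Stoppage Time']
```

Because durations do not depend on the day the code is run, the labels stay the same from one run to the next.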

# Data with Engineered Features

In [None]:
data_with_engineered_features = organized_data
data_with_engineered_features.head()

Unnamed: 0,location_x,location_y,time,statsbomb_xg,outcome,player,team,bodypart,technique,first_touch,state_of_play,assist1,assist2,assist3,assist_state_of_play,inside_18_width,inside_18_depth,inside_18,shot_distance,shot_angle,bodypart_angle,significant_time
0,109.0,46.0,2021-06-14 00:04:38.609,0.266154,Blocked,Francesca Kirby,Chelsea FCW,Left Foot,Normal,False,Open Play,Ground Pass,,,Regular Play,True,True,True,12.529964,118.61,Right - Inside Foot,First 5min
1,113.0,35.0,2021-06-14 00:11:45.046,0.093521,Off T,Bethany England,Chelsea FCW,Head,Normal,False,Open Play,High Pass,,,From Free Kick,True,True,True,8.602325,54.46,Left - Head,Regular Time
2,94.0,43.0,2021-06-14 00:18:03.461,0.036171,Saved,Drew Spence,Chelsea FCW,Left Foot,Normal,False,Open Play,Ground Pass,,,Regular Play,True,False,False,26.172505,96.58,Right - Inside Foot,Regular Time
3,86.0,34.0,2021-06-14 00:23:11.935,0.016625,Off T,Chloe Arthur,Birmingham City WFC,Left Foot,Normal,False,Open Play,Ground Pass,,,From Goal Kick,True,False,False,34.525353,79.99,Left - Outside Foot,Regular Time
4,94.0,33.0,2021-06-14 00:23:45.810,0.030716,Off T,Bethany England,Chelsea FCW,Right Foot,Normal,False,Open Play,Ground Pass,,,From Goal Kick,True,False,False,26.925824,74.93,Left - Inside Foot,Regular Time


In [None]:
data_with_engineered_features.to_csv('/content/drive/MyDrive/flatiron/expected_goals/feature_engineering/data_with_engineered_features.csv')

Continued in [expected_goals_data_cleaning_notebook](https://github.com/wswager/milwaukee_rampage_fc/blob/main/data_cleaning/expected_goals_data_cleaning_notebook.ipynb)

*3 of 7*