# **Expected Goals Classifier**

## Overview

Create an Expected Goals (xG) classification model using existing historical match data to produce actionable recommendations which can be utilized in technical and tactical analysis to improve goal-scoring.

Project detailed on Github: [Expected Goals Classifier]()

# Feature Engineering Notebook

Continued from [expected_goals_data_exploration_notebook]()

*Notebook 3 of 7*

### Index

1. Data extracted in [expected_goals_data_extraction_notebook]()
2. Data cleaned in [expected_goals_data_cleaning_notebook]()
3. Data explored in [expected_goals_data_exploration_notebook]()
4. Features engineered in [expected_goals_feature_engineering_notebook]()
5. Data preprocessed in [expected_goals_data_preprocessing_notebook]()
6. Modeling in [expected_goals_model_fitting_notebook]()
7. Conclusions in [expected_goals_model_assessment_notebook]()

# Packages

In [1]:
# rpy2 to run R
%load_ext rpy2.ipython

# Drive  and IO to access saved files
from google.colab import drive, files
drive.mount('/content/drive')

import io

# Pathlib for file retrieval
import pathlib
from pathlib import Path as path

# PyPy to improve speed
!apt-get install pypy

# warnings to ignore warnings
import warnings
warnings.filterwarnings('ignore')

# Pandas for dataframes
import pandas as pd

# Numpy for mathematical functions
import numpy as np

# Shapely for geometric functions
import shapely
from shapely import wkt
from shapely.geometry import Point, Polygon, LineString, GeometryCollection

Mounted at /content/drive
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  pypy-lib
Suggested packages:
  pypy-doc pypy-tk
The following NEW packages will be installed:
  pypy pypy-lib
0 upgraded, 2 newly installed, 0 to remove and 37 not upgraded.
Need to get 13.1 MB of archives.
After this operation, 84.6 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 pypy-lib amd64 5.10.0+dfsg-3build2 [2,303 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic/universe amd64 pypy amd64 5.10.0+dfsg-3build2 [10.8 MB]
Fetched 13.1 MB in 2s (7,777 kB/s)
Selecting previously unselected package pypy-lib:amd64.
(Reading database ... 155013 files and directories currently installed.)
Preparing to unpack .../pypy-lib_5.10.0+dfsg-3build2_amd64.deb ...
Unpacking pypy-lib:amd64 (5.10.0+dfsg-3build2) ...
Selecting previously unselected package pypy.
Preparing 

### Data

Data sourced from [StatsBomb](https://statsbomb.com/), a United Kingdom based football (soccer) data analytics company.

StatsBomb have provided free access to their proprietary dataset via GitHub: [StatsBomb Open Data](https://github.com/statsbomb/open-data)

In [3]:
# Import cleaned_data from expected_goals_data_extraction_notebook

cleaned_data = pd.read_parquet('/content/drive/MyDrive/flatiron/expected_goals/data_cleaning/dataframes/cleaned_data.parquet')

In [4]:
cleaned_data.head()

Unnamed: 0,period_x,timestamp_x,play_pattern_x,under_pressure_x,shot_statsbomb_xg,shot_end_location,shot_technique,goal,shot_type,shot_body_part,shot_one_on_one,shot_aerial_won,shot_open_goal,shot_first_time,shot_redirect,shot_deflected,shot_follows_dribble,play_pattern_y,pass_length,pass_angle,pass_height,pass_body_part,pass_type,pass_switch,pass_through_ball,pass_technique,pass_backheel,pass_cross,counterpress,pass_cut_back,pass_deflected,pass_inswinging,pass_straight,pass_outswinging,pass_no_touch,shot_location_y,shot_location_x
0,1,00:04:38.609,Regular Play,True,0.266154,45.0,Normal,False,Open Play,Left Foot,False,False,False,False,False,False,False,Regular Play,11.18034,0.463648,Ground Pass,Left Foot,Open Play,False,False,Standard,False,False,False,False,False,False,False,False,False,109.0,46.0
1,1,00:11:45.046,From Free Kick,True,0.093521,32.9,Normal,False,Open Play,Head,True,True,False,False,False,False,False,From Free Kick,37.735924,-0.558599,High Pass,Right Foot,Free Kick,False,False,Standard,False,False,False,False,False,False,False,False,False,113.0,35.0
2,1,00:18:03.461,Regular Play,True,0.036171,42.8,Normal,False,Open Play,Left Foot,False,False,False,False,False,False,False,Regular Play,11.18034,-2.034444,Ground Pass,Right Foot,Open Play,False,False,Standard,False,False,False,False,False,False,False,False,False,94.0,43.0
3,1,00:23:11.935,From Goal Kick,True,0.016625,33.3,Normal,False,Open Play,Left Foot,False,False,False,False,False,False,False,From Goal Kick,13.892444,2.098871,Ground Pass,Right Foot,Open Play,False,False,Standard,False,False,False,False,False,False,False,False,False,86.0,34.0
4,1,00:23:45.810,From Goal Kick,False,0.030716,34.8,Normal,False,Open Play,Right Foot,False,False,False,False,False,False,False,From Goal Kick,14.56022,1.292497,Ground Pass,Left Foot,Open Play,False,False,Standard,False,False,False,False,False,False,False,False,False,94.0,33.0


# Distance

In [5]:
# Define goal center
# Note: Field coordinates for events measured for in-possession team

goal_center = (120, 40)

In [6]:
# Use location_x and location_y to define shot coordinates

shot_location_list = []
for i in range(0, len(cleaned_data)):
  shot_location_list.append((cleaned_data.iloc[i]['shot_location_x'],
                             cleaned_data.iloc[i]['shot_location_y']))

In [13]:
# Calculate distance from shot location to shot end location

shot_distance_list = []
for sl in shot_location_list:
  shot_distance_list.append(Point(sl).distance(Point((cleaned_data.iloc[i]['shot_end_location'],
                                                      120))))

In [14]:
# Create new feature in cleaned_data for shot_distance

cleaned_data['shot_distance'] = shot_distance_list

In [15]:
cleaned_data['shot_distance'].describe()

count    6080.000000
mean       20.617300
std         9.187915
min         1.400000
25%        13.300376
50%        19.798990
75%        27.221356
max        71.478668
Name: shot_distance, dtype: float64

# Angle

In [None]:
# Calculate angle between the shot location and goal_center

shot_angle_list = []
for i in range(0, len(organized_data)):
  shot_angle_list.append(round(math.degrees(math.atan2((goal_center[0] - organized_data.iloc[i]['location_x']),
                                                       (goal_center[1] - organized_data.iloc[i]['location_y']))), 2))

organized_data['shot_angle'] = shot_angle_list

In [None]:
pd.DataFrame(organized_data['shot_angle'].describe())

Unnamed: 0,shot_angle
count,6104.0
mean,91.022638
std,33.911162
min,0.0
25%,64.65
50%,90.46
75%,116.995
max,180.0


# Bodypart Angle

In [None]:
organized_data['bodypart'].value_counts()

Right Foot    3493
Left Foot     1676
Head           926
Other            9
Name: bodypart, dtype: int64

In [None]:
# Combare the side the shot was taken from to
# the foot the shot was taken with

bodypart_angle_list = []
for i in range(0, len(organized_data)):
  if ((organized_data.iloc[i]['shot_angle'] > 90) &
      (organized_data.iloc[i]['bodypart'] == 'Right Foot')):
    bodypart_angle_list.append('Right - Outside Foot')
  
  elif ((organized_data.iloc[i]['shot_angle'] > 90) &
        (organized_data.iloc[i]['bodypart'] == 'Left Foot')):
    bodypart_angle_list.append('Right - Inside Foot')
  
  elif ((organized_data.iloc[i]['shot_angle'] > 90) &
        (organized_data.iloc[i]['bodypart'] == 'Head')):
    bodypart_angle_list.append('Right - Head')
  
  elif ((organized_data.iloc[i]['shot_angle'] < 90) &
        (organized_data.iloc[i]['bodypart'] == 'Left Foot')):
    bodypart_angle_list.append('Left - Outside Foot')
  
  elif ((organized_data.iloc[i]['shot_angle'] < 90) &
        (organized_data.iloc[i]['bodypart'] == 'Right Foot')):
    bodypart_angle_list.append('Left - Inside Foot')
  
  elif ((organized_data.iloc[i]['shot_angle'] < 90) &
        (organized_data.iloc[i]['bodypart'] == 'Head')):
    bodypart_angle_list.append('Left - Head')
  
  else:
    bodypart_angle_list.append('Other')

organized_data['bodypart_angle'] = bodypart_angle_list

In [None]:
organized_data['bodypart_angle'].value_counts()

Right - Outside Foot    1882
Left - Inside Foot      1518
Left - Outside Foot      891
Right - Inside Foot      734
Right - Head             440
Left - Head              436
Other                    203
Name: bodypart_angle, dtype: int64

# Significant Time

In [None]:
# Convert time datatype to datetime

organized_data['time'] = organized_data['time'].astype(str)
organized_data['time'] = pd.to_datetime(organized_data['time'])

In [None]:
significant_time_list = []
for i in range(0, len(organized_data)):
  if organized_data.iloc[i]['time'] < pd.Timestamp(2021, 6, 14, 0, 5, 0):
    significant_time_list.append('First 5min')
  
  elif organized_data.iloc[i]['time'] >  pd.Timestamp(2021, 6, 14, 0, 45, 0):
    significant_time_list.append('Stoppage Time')
  
  elif organized_data.iloc[i]['time'] > pd.Timestamp(2021, 6, 14, 0, 40, 0):
    significant_time_list.append('Last 5min')
  
  else:
    significant_time_list.append('Regular Time')

organized_data['significant_time'] = significant_time_list

In [None]:
organized_data['significant_time'].value_counts()

Regular Time     4433
Last 5min         611
First 5min        611
Stoppage Time     449
Name: significant_time, dtype: int64

# Data with Engineered Features

In [None]:
data_with_engineered_features = organized_data
data_with_engineered_features.head()

Unnamed: 0,location_x,location_y,time,statsbomb_xg,outcome,player,team,bodypart,technique,first_touch,state_of_play,assist1,assist2,assist3,assist_state_of_play,inside_18_width,inside_18_depth,inside_18,shot_distance,shot_angle,bodypart_angle,significant_time
0,109.0,46.0,2021-06-14 00:04:38.609,0.266154,Blocked,Francesca Kirby,Chelsea FCW,Left Foot,Normal,False,Open Play,Ground Pass,,,Regular Play,True,True,True,12.529964,118.61,Right - Inside Foot,First 5min
1,113.0,35.0,2021-06-14 00:11:45.046,0.093521,Off T,Bethany England,Chelsea FCW,Head,Normal,False,Open Play,High Pass,,,From Free Kick,True,True,True,8.602325,54.46,Left - Head,Regular Time
2,94.0,43.0,2021-06-14 00:18:03.461,0.036171,Saved,Drew Spence,Chelsea FCW,Left Foot,Normal,False,Open Play,Ground Pass,,,Regular Play,True,False,False,26.172505,96.58,Right - Inside Foot,Regular Time
3,86.0,34.0,2021-06-14 00:23:11.935,0.016625,Off T,Chloe Arthur,Birmingham City WFC,Left Foot,Normal,False,Open Play,Ground Pass,,,From Goal Kick,True,False,False,34.525353,79.99,Left - Outside Foot,Regular Time
4,94.0,33.0,2021-06-14 00:23:45.810,0.030716,Off T,Bethany England,Chelsea FCW,Right Foot,Normal,False,Open Play,Ground Pass,,,From Goal Kick,True,False,False,26.925824,74.93,Left - Inside Foot,Regular Time


In [None]:
data_with_engineered_features.to_csv('/content/drive/MyDrive/flatiron/expected_goals/feature_engineering/data_with_engineered_features.csv')

Continued in [expected_goals_data_cleaning_notebook](https://github.com/wswager/milwaukee_rampage_fc/blob/main/data_cleaning/expected_goals_data_cleaning_notebook.ipynb)

*3 of 7*