# Contents
1. Introduction <br>
    1.1. Import Libraries<br>
    1.2. Terminology <br>
    
    
2. Gathering and Cleaning Data <br> 
    2.1. Player Data <br>
    2.2. Fixture Data <br>
    2.3. Game Data <br>
    
    
3. Data Engineering (Team Data) <br>
    3.1. Team npxG <br>
    3.2. Team npxG (Home Fixtures) <br>
    3.3. Team npxG (Away Fixtures) <br>
    3.4. Team npxGA <br>
    3.5. Team npxGA (Home Fixtures) <br>
    3.6. Team npxGA (Away Fixtures) <br>
    
    
4. Home/Away Modifiers for npxG/npxGA <br>
    4.1. Home/Away Modifier Calculation <br>
    4.2. Dictionaries <br> 


5. Team Fixture Tables <br>
    5.1. Home Fixture Table <br>
    5.2. Away Fixture Table <br>
    5.3. Home Fixture Table - Insert Stats <br> 
    5.4. Away Fixture Table - Insert Stats <br>
    5.5. Finalizing Fixture Table <br>
    
    
6. Data Engineering (Player Data) - Fuzzy Matching Included <br>
    6.1. Player npxG and xAG <br> 
    6.2. Fuzzy Matching <br>
    
    
7. Players Fixture Tables <br>
    7.1. Merge with Team Data <br>
    7.2. Home Fixture Table <br>
    7.3. Away Fixture Table <br>
    7.4. Finalizing Fixture Table <br>
    
    
8. Final Dataframes <br>
    8.1. Team Stats <br>
    8.2. Team Fixture Tables <br>
    8.3. Team npxG Predictions <br> 
    8.4. Player npxG and xAG Predictions <br>

# 1. Introduction 

Welcome to our Fantasy Premier League (FPL) kernel! We will be gathering data from the official FPL API and popular football stats website FBref to produce a prediction system for expected goals, assists and goals conceeded for each team and player in the premier league for the upcoming five gameweeks.   

## 1.1. Import Libraries

In [2]:
import pandas as pd
import numpy as np
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')
import soccerdata as sd
import re
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
from itertools import repeat
from fpl import FPL
import requests, json
from thefuzz import fuzz, process
from sklearn.preprocessing import MinMaxScaler

## 1.2. Terminology


### Non-Penalty Expected Goals (npxG)
Non-penalty expected goals (npxG) is the probability that a non-penalty shot will result in a goal. This metric is known to be the best predictor for future goals. 

### Non-Penalty Expected Goals Against (npxGA)
Non-penalty expected goals against (npxGA) is the probability that an opponent's non-penalty shot will result in a goal conceeded.

### Expected Assist (xAG)
Expected assists (xAG) is the probability that a given pass will become an assist to a goal.

# 2. Gathering and Cleaning Data

We need to gather data from the FPL API and the FBref site. The former will be accessed directly. The latter will be accessed using a python library (soccerdata) which serves as a wrapper. 

## 2.1. Player Data


We'll be gathering 3 datasets from the official FPL API and joining them;

Team

Player

Player position 

In [3]:
#Calling FPL API for various player-related datasets.
base_url = 'https://fantasy.premierleague.com/api/'
r = requests.get(base_url+'bootstrap-static/').json()

teams = pd.json_normalize(r['teams'])
players = pd.json_normalize(r['elements'])
players_pos = pd.json_normalize(r['element_types'])

teams.head(3)
players.head(3)
players_pos.head(3)

Unnamed: 0,code,draw,form,id,loss,name,played,points,position,short_name,...,team_division,unavailable,win,strength_overall_home,strength_overall_away,strength_attack_home,strength_attack_away,strength_defence_home,strength_defence_away,pulse_id
0,3,0,,1,0,Arsenal,0,0,0,ARS,...,,False,0,1220,1270,1240,1250,1200,1270,1
1,7,0,,2,0,Aston Villa,0,0,0,AVL,...,,False,0,1090,1100,1110,1130,1090,1110,2
2,91,0,,3,0,Bournemouth,0,0,0,BOU,...,,False,0,1060,1090,1070,1130,1050,1080,127


Unnamed: 0,chance_of_playing_next_round,chance_of_playing_this_round,code,cost_change_event,cost_change_event_fall,cost_change_start,cost_change_start_fall,dreamteam_count,element_type,ep_next,...,now_cost_rank,now_cost_rank_type,form_rank,form_rank_type,points_per_game_rank,points_per_game_rank_type,selected_rank,selected_rank_type,starts_per_90,clean_sheets_per_90
0,,,84450,-1,1,-1,1,1,3,6.6,...,239,138,79,27,59,26,101,34,1.06,0.42
1,0.0,0.0,153256,0,0,-4,4,1,3,0.0,...,625,320,491,183,387,169,210,64,0.81,0.0
2,,,156074,0,0,-3,3,0,2,0.6,...,598,169,339,119,421,157,433,163,0.0,0.0


Unnamed: 0,id,plural_name,plural_name_short,singular_name,singular_name_short,squad_select,squad_min_play,squad_max_play,ui_shirt_specific,sub_positions_locked,element_count
0,1,Goalkeepers,GKP,Goalkeeper,GKP,2,1,1,True,[12],77
1,2,Defenders,DEF,Defender,DEF,5,3,5,False,[],250
2,3,Midfielders,MID,Midfielder,MID,5,2,5,False,[],321


Alot of bulk, let's clean it up.

Team data filtered for: ID and corresponding Team Name.

In [4]:
dfTeam=teams[['id', 'name']]
dfTeam.head(3)

Unnamed: 0,id,name
0,1,Arsenal
1,2,Aston Villa
2,3,Bournemouth


Player position data filtered for: ID and corresponding position name. 

In [5]:
dfPlayers_pos=players_pos[['id','plural_name_short']]
dfPlayers_pos=dfPlayers_pos.rename(columns={'plural_name_short':'Position'})
dfPlayers_pos.head(3)

Unnamed: 0,id,Position
0,1,GKP
1,2,DEF
2,3,MID


Player data filtered for: Team (which is in ID form), Position (element_type), First Name and Last Name (combined to 'Player').

In [6]:
dfPlayers=players[['team', 'element_type', 'first_name', 'second_name']]
dfPlayers['Player'] = dfPlayers['first_name'] + ' ' + dfPlayers['second_name']
dfPlayers=dfPlayers.rename(columns={'team':'id'}) 
dfPlayers.head(3)

Unnamed: 0,id,element_type,first_name,second_name,Player
0,1,3,Granit,Xhaka,Granit Xhaka
1,1,3,Mohamed,Elneny,Mohamed Elneny
2,1,2,Rob,Holding,Rob Holding


In the dfPlayers dataframe, we need to replace ID with the respective team ID and similarly replace element type with the respective position.

In [7]:
#Make dictionaries for Team ID+Names & Player Position ID+Names.
team_id_dict=pd.Series(dfTeam.name.values,index=dfTeam.id).to_dict()
player_pos_id_dict=pd.Series(dfPlayers_pos.Position.values,index=dfPlayers_pos.id).to_dict()

Apply dictionaries to make new columns giving actual team names and positions.

In [8]:
dfPlayers["Team"] = dfPlayers["id"].apply(lambda x: team_id_dict.get(x))
dfPlayers["Position"] = dfPlayers["element_type"].apply(lambda x: player_pos_id_dict.get(x))
dfPlayers_FPL=dfPlayers.drop(columns=['element_type', 'id', 'first_name', 'second_name'])

dfPlayers_FPL.head()

Unnamed: 0,Player,Team,Position
0,Granit Xhaka,Arsenal,MID
1,Mohamed Elneny,Arsenal,MID
2,Rob Holding,Arsenal,DEF
3,Thomas Partey,Arsenal,MID
4,Martin Ødegaard,Arsenal,MID


Our player data is cleaned up! 

## 2.2. Fixtures Data

Our fixture data will be accessed from the official API again. Like before, we're going to see alot of excess that needs to be cleaned up. 

In [9]:
#Calling FPL API for fixture data. 
base_url = 'https://fantasy.premierleague.com/api/'
r = requests.get(base_url+'fixtures?future=1').json()
fixtures = pd.json_normalize(r)
fixtures.head()

Unnamed: 0,code,event,finished,finished_provisional,id,kickoff_time,minutes,provisional_start_time,started,team_a,team_a_score,team_h,team_h_score,stats,team_h_difficulty,team_a_difficulty,pulse_id
0,2293039,23,False,False,230,2023-02-11T12:30:00Z,0,False,False,6,,19,,[],3,3,75140
1,2293031,23,False,False,221,2023-02-11T15:00:00Z,0,False,False,4,,1,,[],2,4,75131
2,2293032,23,False,False,223,2023-02-11T15:00:00Z,0,False,False,5,,7,,[],3,3,75133
3,2293033,23,False,False,224,2023-02-11T15:00:00Z,0,False,False,16,,9,,[],2,2,75134
4,2293035,23,False,False,226,2023-02-11T15:00:00Z,0,False,False,18,,10,,[],3,3,75136


Columns team_h and team_a give the respective Home and Away team in ID form. I will use the dictionary made in the previous section to replace ID's with their respective Team Names. 

In [10]:
fixtures['Home Team']=fixtures["team_h"].apply(lambda x: team_id_dict.get(x))
fixtures['Away Team']=fixtures["team_a"].apply(lambda x: team_id_dict.get(x))
fixtures.head()

Unnamed: 0,code,event,finished,finished_provisional,id,kickoff_time,minutes,provisional_start_time,started,team_a,team_a_score,team_h,team_h_score,stats,team_h_difficulty,team_a_difficulty,pulse_id,Home Team,Away Team
0,2293039,23,False,False,230,2023-02-11T12:30:00Z,0,False,False,6,,19,,[],3,3,75140,West Ham,Chelsea
1,2293031,23,False,False,221,2023-02-11T15:00:00Z,0,False,False,4,,1,,[],2,4,75131,Arsenal,Brentford
2,2293032,23,False,False,223,2023-02-11T15:00:00Z,0,False,False,5,,7,,[],3,3,75133,Crystal Palace,Brighton
3,2293033,23,False,False,224,2023-02-11T15:00:00Z,0,False,False,16,,9,,[],2,2,75134,Fulham,Nott'm Forest
4,2293035,23,False,False,226,2023-02-11T15:00:00Z,0,False,False,18,,10,,[],3,3,75136,Leicester,Spurs


Remove excess columns and replace team names (will be important later on). 

In [11]:
fixtures=fixtures[['event', 'Home Team', 'Away Team']]
fixtures=fixtures.rename(columns={'event':'GameWeek'})
fixtures['Home Team'] = fixtures['Home Team'].replace({'Newcastle':'Newcastle Utd', 'Spurs':'Tottenham','Leeds':'Leeds United', 'Leicester':'Leicester City', 'Man City':'Manchester City', 'Man Utd':'Manchester Utd', "Nott'm Forest":"Nott'ham Forest"})
fixtures['Away Team'] = fixtures['Away Team'].replace({'Newcastle':'Newcastle Utd', 'Spurs':'Tottenham','Leeds':'Leeds United', 'Leicester':'Leicester City', 'Man City':'Manchester City', 'Man Utd':'Manchester Utd', "Nott'm Forest":"Nott'ham Forest"})
dfFixtures=fixtures
dfFixtures.head()

Unnamed: 0,GameWeek,Home Team,Away Team
0,23,West Ham,Chelsea
1,23,Arsenal,Brentford
2,23,Crystal Palace,Brighton
3,23,Fulham,Nott'ham Forest
4,23,Leicester City,Tottenham


Our fixture data is cleaned up! 

## 2.3. Game Data 

Finally we need to bring in our games database. For this, we'll be using the aforementioned scraper (soccerdata) to bring in all games from the previous and current season from FBref. 

Be warned, this can take about 5 minutes.

In [12]:
#Create scraper class objects from 'soccerdata' library for scraping seasons 2021/2022 and 2022/2023.
fbref = sd.FBref(leagues='ENG-Premier League', seasons='2021')
fbref1 = sd.FBref(leagues='ENG-Premier League', seasons='2022', no_cache=True)

In [13]:
%%capture  
#Scrape player stats for 2021/2022 and 2022/2023 season.
player_match_stats_2021=fbref.read_player_match_stats(stat_type='summary')
player_match_stats=fbref1.read_player_match_stats(stat_type='summary')

In [14]:
player_match_stats_2021.head(3)
player_match_stats.head(3)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,#,Nation,Pos,Age,Min,Performance,Performance,Performance,Performance,Performance,...,Expected,SCA,SCA,Passes,Passes,Passes,Passes,Dribbles,Dribbles,game_id
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Gls,Ast,PK,PKatt,Sh,...,xAG,SCA,GCA,Cmp,Att,Cmp%,Prog,Succ,Att,Unnamed: 25_level_1
league,season,game,team,player,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2,Unnamed: 22_level_2,Unnamed: 23_level_2,Unnamed: 24_level_2,Unnamed: 25_level_2
ENG-Premier League,2021,2020-09-12 Crystal Palace-Southampton,Crystal Palace,Andros Townsend,10.0,eng ENG,RM,29-058,90.0,0.0,1.0,0.0,0.0,0.0,...,0.6,2.0,1.0,13.0,30.0,43.3,1.0,2.0,2.0,db261cb0
ENG-Premier League,2021,2020-09-12 Crystal Palace-Southampton,Crystal Palace,Cheikhou Kouyaté,8.0,sn SEN,CB,30-266,90.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,11.0,19.0,57.9,1.0,0.0,0.0,db261cb0
ENG-Premier League,2021,2020-09-12 Crystal Palace-Southampton,Crystal Palace,Eberechi Eze,25.0,eng ENG,LM,22-075,10.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,8.0,10.0,80.0,1.0,2.0,5.0,db261cb0


Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,#,Nation,Pos,Age,Min,Performance,Performance,Performance,Performance,Performance,...,SCA,Passes,Passes,Passes,Passes,Carries,Carries,Take-Ons,Take-Ons,game_id
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Gls,Ast,PK,PKatt,Sh,...,GCA,Cmp,Att,Cmp%,PrgP,Carries,PrgC,Att,Succ,Unnamed: 25_level_1
league,season,game,team,player,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2,Unnamed: 22_level_2,Unnamed: 23_level_2,Unnamed: 24_level_2,Unnamed: 25_level_2
ENG-Premier League,2223,2022-08-05 Crystal Palace-Arsenal,Arsenal,Aaron Ramsdale,1.0,eng ENG,GK,24-083,90,0,0,0,0,0,...,0,24,32,75.0,0,24,0,0,0,e62f6e78
ENG-Premier League,2223,2022-08-05 Crystal Palace-Arsenal,Arsenal,Albert Sambi Lokonga,23.0,be BEL,"CM,RW",22-287,1,0,0,0,0,0,...,0,1,1,100.0,0,1,0,0,0,e62f6e78
ENG-Premier League,2223,2022-08-05 Crystal Palace-Arsenal,Arsenal,Ben White,4.0,eng ENG,RB,24-301,90,0,0,0,0,0,...,0,31,41,75.6,6,24,2,2,1,e62f6e78


Here we can see the data is presented in a multi-level index. This can be difficult to work with so we'll convert to a single-level index. We will also reassign columns and merge our dataframes. 

In [80]:
pd.set_option('display.max_columns', None)

df_2021=player_match_stats_2021
df_2021=df_2021.droplevel(['league', 'season'])
df_2021.columns=df_2021.columns.droplevel(0)
df_2021.columns.values[0] = "#"
df_2021.columns.values[1] = 'Nation'
df_2021.columns.values[2] = 'Pos'
df_2021.columns.values[3] = 'Age'
df_2021.columns.values[4] = 'Mins'
df_2021.columns.values[28] = 'game_id'
df_2021=df_2021.reset_index()
df_2021 = df_2021.loc[:,~df_2021.columns.duplicated()]
#------------------------------------------------------------------------------------------------------------------------------#
df_2022=player_match_stats
df_work=player_match_stats 
df_2022=df_2022.droplevel(['league', 'season'])
df_2022.columns=df_2022.columns.droplevel(0)
df_2022.columns.values[0] = "#"
df_2022.columns.values[1] = 'Nation'
df_2022.columns.values[2] = 'Pos'
df_2022.columns.values[3] = 'Age'
df_2022.columns.values[4] = 'Mins'
df_2022.columns.values[30] = 'game_id'
df_2022.rename(columns={'PrgP':'Prog'}, inplace=True)
df_2022.drop(columns=['PrgC', 'Carries'], inplace=True)
df_2022=df_2022.reset_index()
df_2022 = df_2022.loc[:,~df_2022.columns.duplicated()]
#------------------------------------------------------------------------------------------------------------------------------#
dfFinal=pd.concat([df_2021, df_2022], ignore_index=True)

In [140]:
dfFinal.head()

Unnamed: 0,date,team,player,#,Nation,Pos,Age,Mins,Gls,Ast,PK,PKatt,Sh,SoT,CrdY,CrdR,Touches,Tkl,Int,Blocks,xG,npxG,xAG,SCA,GCA,Cmp,Att,Cmp%,Prog,Succ,game_id,Location,Opposition
0,2023-02-08,Manchester Utd,Wout Weghorst,27.0,nl NED,FW,30-185,58.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,17.0,0.0,0.0,1.0,0.0,0.0,0.0,2.0,0.0,10.0,11.0,90.9,1.0,0.0,80532a54,Home,Leeds United
1,2023-02-08,Leeds United,Tyler Adams,12.0,us USA,DM,23-359,90.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,56.0,2.0,4.0,5.0,0.0,0.0,0.0,1.0,0.0,30.0,44.0,68.2,5.0,1.0,80532a54,Away,Manchester Utd
2,2023-02-08,Leeds United,Brenden Aaronson,7.0,us USA,AM,22-109,28.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,18.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,8.0,10.0,80.0,1.0,0.0,80532a54,Away,Manchester Utd
3,2023-02-08,Leeds United,Crysencio Summerville,10.0,nl NED,RW,21-101,83.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,31.0,1.0,1.0,1.0,0.0,0.0,0.0,2.0,0.0,17.0,22.0,77.3,0.0,0.0,80532a54,Away,Manchester Utd
4,2023-02-08,Leeds United,Georginio Rutter,24.0,fr FRA,FW,20-294,28.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,18.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,2.0,6.0,33.3,0.0,2.0,80532a54,Away,Manchester Utd


Upon inspection, there is no column indicating a home/away fixture or the opposition team. However, when looking at the game column, the home team appears to always be the team listed first after the date, and the away team the second. We'll be leveraging this to make our own Location column to define home/away fixtures and make a new column called 'opposition'. Also, our game column will serve as a date column. 

In [83]:
#Replicate game column to be used as a date column. Also removing the date from Location column.
dfFinal['Location'] = dfFinal['game']
dfFinal['Location'] = dfFinal['Location'].str[11:]  

In [84]:
sep = '-'
def f1(x):
    return re.sub('[^0-9]','', x) #Replaces all characters that aren't numbers or hyphens to blank spaces.
def f2(x):
    return x.split(sep, 1)[1] #Deletes any character before a hyphen.
def f3(x):
    return x.split(sep, 1)[0] #Deletes any character after a hyphen.    

Apply f1 to 'game' column to become our 'date column' and sort our dataframe by date. 

Apply f2 to our location column to leave the second listed team 

Apply f3 to our location column to leave the first listed team 

In [85]:
#Fixing up new date column.
dfFinal['game'] = dfFinal['game'].apply(f1)
dfFinal['game'] = pd.to_datetime(dfFinal['game'])
dfFinal=dfFinal.rename(columns={'game':'date'})
dfFinal=dfFinal.reset_index(drop=True)
dfFinal=dfFinal.sort_values(by='date', ascending=False)
#------------------------------------------------------------------------------------------------------------------------------#
#Creating column for second listed team.
dfFinal['team2'] = dfFinal['Location'].apply(f2)
#------------------------------------------------------------------------------------------------------------------------------#
#Fixing up location column based on first listed team.
dfFinal['Location'] = dfFinal['Location'].apply(f3)

dfFinal.head()

Unnamed: 0,date,team,player,#,Nation,Pos,Age,Mins,Gls,Ast,PK,PKatt,Sh,SoT,CrdY,CrdR,Touches,Tkl,Int,Blocks,xG,npxG,xAG,SCA,GCA,Cmp,Att,Cmp%,Prog,Succ,game_id,Location,team2
16636,2023-02-08,Manchester United,Wout Weghorst,27.0,nl NED,FW,30-185,58.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,17.0,0.0,0.0,1.0,0.0,0.0,0.0,2.0,0.0,10.0,11.0,90.9,1.0,0.0,80532a54,Manchester Utd,Leeds United
16620,2023-02-08,Leeds United,Tyler Adams,12.0,us USA,DM,23-359,90.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,56.0,2.0,4.0,5.0,0.0,0.0,0.0,1.0,0.0,30.0,44.0,68.2,5.0,1.0,80532a54,Manchester Utd,Leeds United
16605,2023-02-08,Leeds United,Brenden Aaronson,7.0,us USA,AM,22-109,28.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,18.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,8.0,10.0,80.0,1.0,0.0,80532a54,Manchester Utd,Leeds United
16606,2023-02-08,Leeds United,Crysencio Summerville,10.0,nl NED,RW,21-101,83.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,31.0,1.0,1.0,1.0,0.0,0.0,0.0,2.0,0.0,17.0,22.0,77.3,0.0,0.0,80532a54,Manchester Utd,Leeds United
16608,2023-02-08,Leeds United,Georginio Rutter,24.0,fr FRA,FW,20-294,28.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,18.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,2.0,6.0,33.3,0.0,2.0,80532a54,Manchester Utd,Leeds United


In order to reassign values in the location column to Home/Away, the names in the team and location column must have the exact same spelling. 

In [86]:
dfFinal['team'].unique()
dfFinal['Location'].unique()

array(['Manchester United', 'Leeds United', 'Nottingham Forest',
       'Tottenham Hotspur', 'Manchester City', 'Bournemouth',
       'Southampton', 'Brighton & Hove Albion', 'Arsenal', 'Aston Villa',
       'Leicester City', 'Brentford', 'Liverpool', 'West Ham United',
       'Newcastle United', 'Wolverhampton Wanderers', 'Crystal Palace',
       'Everton', 'Chelsea', 'Fulham', 'West Bromwich Albion',
       'Sheffield United', 'Burnley'], dtype=object)

array(['Manchester Utd', "Nott'ham Forest", 'Tottenham', 'Brighton',
       'Brentford', 'Everton', 'Aston Villa', 'Wolves', 'Newcastle Utd',
       'Chelsea', 'Fulham', 'Leeds United', 'Arsenal', 'Manchester City',
       'Crystal Palace', 'Leicester City', 'Bournemouth', 'Liverpool',
       'Southampton', 'West Ham', 'Sheffield Utd', 'Burnley', 'West Brom'],
      dtype=object)

As we can see, some teams are named differently (e.g. Newcastle United vs Newcastle Utd).  This is resolved by manually renaming team spellings in the team column to those observed in the Location column's.

In [87]:
df_2022['team'] = df_2022['team'].replace({'Wolverhampton Wanderers':'Wolves', 'Newcastle United':'Newcastle Utd', 'Tottenham Hotspur':'Tottenham', 'Brighton & Hove Albion':'Brighton', 'Manchester United':'Manchester Utd', 'West Ham United':'West Ham', 'Nottingham Forest':"Nott'ham Forest"})
dfFinal['team'] = dfFinal['team'].replace({'Wolverhampton Wanderers':'Wolves', 'Newcastle United':'Newcastle Utd', 'Tottenham Hotspur':'Tottenham', 'Brighton & Hove Albion':'Brighton', 'Manchester United':'Manchester Utd', 'West Ham United':'West Ham', 'Nottingham Forest':"Nott'ham Forest"})

In [88]:
dfFinal['team'].unique()
dfFinal['Location'].unique()

array(['Manchester Utd', 'Leeds United', "Nott'ham Forest", 'Tottenham',
       'Manchester City', 'Bournemouth', 'Southampton', 'Brighton',
       'Arsenal', 'Aston Villa', 'Leicester City', 'Brentford',
       'Liverpool', 'West Ham', 'Newcastle Utd', 'Wolves',
       'Crystal Palace', 'Everton', 'Chelsea', 'Fulham',
       'West Bromwich Albion', 'Sheffield United', 'Burnley'],
      dtype=object)

array(['Manchester Utd', "Nott'ham Forest", 'Tottenham', 'Brighton',
       'Brentford', 'Everton', 'Aston Villa', 'Wolves', 'Newcastle Utd',
       'Chelsea', 'Fulham', 'Leeds United', 'Arsenal', 'Manchester City',
       'Crystal Palace', 'Leicester City', 'Bournemouth', 'Liverpool',
       'Southampton', 'West Ham', 'Sheffield Utd', 'Burnley', 'West Brom'],
      dtype=object)

We can now focus on creating the opposition column.

In [89]:
#Create opposition column and give TRUE/FALSE values based on if Location/Team columns are the same.
dfFinal['Opposition'] = dfFinal['Location']
dfFinal = dfFinal.assign(Opposition = lambda x: (x['team']==x['Location']))
dfFinal.reset_index(drop = True, inplace= True)

#If TRUE, we know the opposition is the team listed in team2 column, otherwise it must be the team listed in location column. 
rowcount = 0 
for x in dfFinal['Opposition']:
    if x == True:
        dfFinal.at[rowcount, 'Opposition']=dfFinal.at[rowcount, 'team2']
    else:
        dfFinal.at[rowcount, 'Opposition']=dfFinal.at[rowcount, 'Location']
    rowcount = rowcount + 1

dfFinal.drop(columns=['team2'], inplace=True)

dfFinal.head()

Unnamed: 0,date,team,player,#,Nation,Pos,Age,Mins,Gls,Ast,PK,PKatt,Sh,SoT,CrdY,CrdR,Touches,Tkl,Int,Blocks,xG,npxG,xAG,SCA,GCA,Cmp,Att,Cmp%,Prog,Succ,game_id,Location,Opposition
0,2023-02-08,Manchester Utd,Wout Weghorst,27.0,nl NED,FW,30-185,58.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,17.0,0.0,0.0,1.0,0.0,0.0,0.0,2.0,0.0,10.0,11.0,90.9,1.0,0.0,80532a54,Manchester Utd,Leeds United
1,2023-02-08,Leeds United,Tyler Adams,12.0,us USA,DM,23-359,90.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,56.0,2.0,4.0,5.0,0.0,0.0,0.0,1.0,0.0,30.0,44.0,68.2,5.0,1.0,80532a54,Manchester Utd,Manchester Utd
2,2023-02-08,Leeds United,Brenden Aaronson,7.0,us USA,AM,22-109,28.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,18.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,8.0,10.0,80.0,1.0,0.0,80532a54,Manchester Utd,Manchester Utd
3,2023-02-08,Leeds United,Crysencio Summerville,10.0,nl NED,RW,21-101,83.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,31.0,1.0,1.0,1.0,0.0,0.0,0.0,2.0,0.0,17.0,22.0,77.3,0.0,0.0,80532a54,Manchester Utd,Manchester Utd
4,2023-02-08,Leeds United,Georginio Rutter,24.0,fr FRA,FW,20-294,28.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,18.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,2.0,6.0,33.3,0.0,2.0,80532a54,Manchester Utd,Manchester Utd


That's better. Let's now convert the team name in location to either home/away. 

In [90]:
#Test if value in column equals value in location/team (if TRUE - Home game, and if FALSE - Away game).
dfFinal = dfFinal.assign(Location = lambda x: (x['team']==x['Location']))

#Convert TRUE/FALSE boolean to string to allow for string replacements. 
mask = dfFinal.applymap(type) != bool
d = {True: 'TRUE', False: 'FALSE'}                                        
dfFinal = dfFinal.where(mask, dfFinal.replace(d))

#Replace TRUE/FALSE with Home/Away string.
dfFinal['Location'] = dfFinal['Location'].replace({'TRUE':'Home', 'FALSE':'Away'})

dfFinal.head()

Unnamed: 0,date,team,player,#,Nation,Pos,Age,Mins,Gls,Ast,PK,PKatt,Sh,SoT,CrdY,CrdR,Touches,Tkl,Int,Blocks,xG,npxG,xAG,SCA,GCA,Cmp,Att,Cmp%,Prog,Succ,game_id,Location,Opposition
0,2023-02-08,Manchester Utd,Wout Weghorst,27.0,nl NED,FW,30-185,58.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,17.0,0.0,0.0,1.0,0.0,0.0,0.0,2.0,0.0,10.0,11.0,90.9,1.0,0.0,80532a54,Home,Leeds United
1,2023-02-08,Leeds United,Tyler Adams,12.0,us USA,DM,23-359,90.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,56.0,2.0,4.0,5.0,0.0,0.0,0.0,1.0,0.0,30.0,44.0,68.2,5.0,1.0,80532a54,Away,Manchester Utd
2,2023-02-08,Leeds United,Brenden Aaronson,7.0,us USA,AM,22-109,28.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,18.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,8.0,10.0,80.0,1.0,0.0,80532a54,Away,Manchester Utd
3,2023-02-08,Leeds United,Crysencio Summerville,10.0,nl NED,RW,21-101,83.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,31.0,1.0,1.0,1.0,0.0,0.0,0.0,2.0,0.0,17.0,22.0,77.3,0.0,0.0,80532a54,Away,Manchester Utd
4,2023-02-08,Leeds United,Georginio Rutter,24.0,fr FRA,FW,20-294,28.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,18.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,2.0,6.0,33.3,0.0,2.0,80532a54,Away,Manchester Utd


Voila! Let's print out our final 3 dataframes. 

In [91]:
dfPlayers.head(2)
dfFixtures.head(2)
dfFinal.head(2)

Unnamed: 0,id,element_type,first_name,second_name,Player,Team,Position
0,1,3,Granit,Xhaka,Granit Xhaka,Arsenal,MID
1,1,3,Mohamed,Elneny,Mohamed Elneny,Arsenal,MID


Unnamed: 0,GameWeek,Home Team,Away Team
0,23,West Ham,Chelsea
1,23,Arsenal,Brentford


Unnamed: 0,date,team,player,#,Nation,Pos,Age,Mins,Gls,Ast,PK,PKatt,Sh,SoT,CrdY,CrdR,Touches,Tkl,Int,Blocks,xG,npxG,xAG,SCA,GCA,Cmp,Att,Cmp%,Prog,Succ,game_id,Location,Opposition
0,2023-02-08,Manchester Utd,Wout Weghorst,27.0,nl NED,FW,30-185,58.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,17.0,0.0,0.0,1.0,0.0,0.0,0.0,2.0,0.0,10.0,11.0,90.9,1.0,0.0,80532a54,Home,Leeds United
1,2023-02-08,Leeds United,Tyler Adams,12.0,us USA,DM,23-359,90.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,56.0,2.0,4.0,5.0,0.0,0.0,0.0,1.0,0.0,30.0,44.0,68.2,5.0,1.0,80532a54,Away,Manchester Utd


That's all of our dataframes cleaned up!

# 3. Data Engineering (Team Data)  

### npxG/npxGA Calculation
We will be applying a linear weighted mean over each team's npxG/npxGA over their 10 most recent games to predict future npxG/npxGA. The weighting chosen has been derived from previous testing. 

## 3.1. Team npxG

In [92]:
#Collecting the npxG for each team in the current season for the last 10 games.
Teams=df_2022['team'].unique().tolist()
team_list=[]
npxG_Team_Weighted_Mean=[]

#Iterate through each team's previous 10 games via their game_id and store them as a list. 
for team in Teams: 
    dfTeam=dfFinal.loc[dfFinal['team'] == team]
    gamelist=(dfTeam['game_id'].unique().tolist())[:10] 
    gamelist=gamelist[::-1] 
    
#Go through each game_id and sum each player's npxG for said game and then multiply by a constant for our weighted average. 
    x=0 
    npxgsum=0
    sumweighting=0
    for game in gamelist:
        df2=dfTeam.loc[dfTeam['game_id'] == game]
        df2=df2.drop_duplicates().reset_index(drop=True)
        k=1+x
        x+=(1/9)
        npxgsum+=df2['npxG'].sum()*k 
        sumweighting=sumweighting+k 
    #Store results.
    npxGMean=npxgsum/sumweighting
    npxG_Team_Weighted_Mean.append(npxGMean)
    team_list.append(team)

Split our df into fixtures only containing home/away games.

In [93]:
dfHome=dfFinal.loc[dfFinal['Location'] == 'Home']
dfAway=dfFinal.loc[dfFinal['Location'] == 'Away']

## 3.2. Team npxG (Home Fixtures)

In [94]:
team_list1=[]
Home_npxG_Team_Weighted_Mean=[]

for team in Teams: 
    dfTeam=dfHome.loc[dfHome['team'] == team] #This is where we specify home games. All else is same as above. 
    gamelist=(dfTeam['game_id'].unique().tolist())[:10] 
    gamelist=gamelist[::-1] 
    
    Home_npxgsum=0
    x=0
    sumweighting=0
    for game in gamelist:
        df2=dfTeam.loc[dfTeam['game_id'] == game]
        df2=df2.drop_duplicates().reset_index(drop=True)
        k=1+x
        x+=(1/9)
        Home_npxgsum+=df2['npxG'].sum()*k
        sumweighting=sumweighting+k 
    
    Home_npxGMean=Home_npxgsum/sumweighting
    Home_npxG_Team_Weighted_Mean.append(Home_npxGMean)
    team_list1.append(team)

## 3.3. Team npxG (Away Fixtures)

In [95]:
team_list2=[]
Away_npxG_Team_Weighted_Mean=[]

for team in Teams: 
    dfTeam=dfAway.loc[dfAway['team'] == team] #This is where we specify away games. All else is same as above.
    gamelist=(dfTeam['game_id'].unique().tolist())[:10] 
    gamelist=gamelist[::-1] 
    
    Away_npxgsum=0
    x=0
    sumweighting=0
    for game in gamelist:
        df2=dfTeam.loc[dfTeam['game_id'] == game]
        df2=df2.drop_duplicates().reset_index(drop=True)
        k=1+x
        x+=(1/9)
        Away_npxgsum+=df2['npxG'].sum()*k
        sumweighting=sumweighting+k 

    Away_npxGMean=Away_npxgsum/sumweighting
    Away_npxG_Team_Weighted_Mean.append(Away_npxGMean)
    team_list2.append(team)

## 3.4. Team npxGA 

Same process as above but summing npxG of the opposition to collect defensive stats (npxGA) for each team.

In [96]:
team_list1=[]
npxGA_Team_Weighted_Mean=[]

for team in Teams: 
    dfTeam=dfFinal.loc[dfFinal['team'] == team]
    gamelist=(dfTeam['game_id'].unique().tolist())[:10] 
    gamelist=gamelist[::-1] 
     
    x=0 
    npxgasum=0
    sumweighting=0
    for game in gamelist:
        df2=dfFinal.loc[dfFinal['game_id'] == game]
        df2=df2.loc[df2['team'] != team] #Here we specify collecting npxG for opposition - in turn giving us the npxGA.
        df2=df2.drop_duplicates().reset_index(drop=True)
        k=1+x
        x+=(1/9)
        npxgasum+=df2['npxG'].sum()*k
        sumweighting=sumweighting+k 

    npxGAMean=npxgasum/sumweighting
    npxGA_Team_Weighted_Mean.append(npxGAMean)
    team_list1.append(team)

## 3.5. Team npxGA (Home Fixtures)

In [97]:
team_list1=[]
Home_npxGA_Team_Weighted_Mean=[]

for team in Teams: 
    dfTeam=dfHome.loc[dfHome['team'] == team]
    gamelist=(dfTeam['game_id'].unique().tolist())[:10] 
    gamelist=gamelist[::-1] 
    
    Home_npxgasum=0
    x=0
    sumweighting=0
    for game in gamelist:
        df2=dfAway.loc[dfAway['game_id'] == game]
        df2=df2.loc[df2['team'] != team]
        df2=df2.drop_duplicates().reset_index(drop=True)
        k=1+x
        x+=(1/9)
        Home_npxgasum+=df2['npxG'].sum()*k
        sumweighting=sumweighting+k 
        
    Home_npxGAMean=Home_npxgasum/sumweighting
    Home_npxGA_Team_Weighted_Mean.append(Home_npxGAMean)
    team_list1.append(team)

## 3.6. Team npxGA (Away Fixtures)

In [98]:
team_list2=[]
Away_npxGA_Team_Weighted_Mean=[]

for team in Teams: 
    dfTeam=dfAway.loc[dfAway['team'] == team]
    gamelist=(dfTeam['game_id'].unique().tolist())[:10] 
    gamelist=gamelist[::-1] 
    
    Away_npxgasum=0
    x=0
    sumweighting=0
    for game in gamelist:
        df2=dfHome.loc[dfHome['game_id'] == game]
        df2=df2.loc[df2['team'] != team]
        df2=df2.drop_duplicates().reset_index(drop=True)
        k=1+x
        x+=(1/9)
        Away_npxgasum+=df2['npxG'].sum()*k
        sumweighting=sumweighting+k   

    Away_npxGAMean=Away_npxgasum/sumweighting
    Away_npxGA_Team_Weighted_Mean.append(Away_npxGAMean)
    team_list1.append(team)

Let's combine our npxG and npxGA home/away data into a new dataframe for inspection. 

In [99]:
df_att_team = pd.DataFrame((zip(team_list, npxG_Team_Weighted_Mean, Home_npxG_Team_Weighted_Mean, Away_npxG_Team_Weighted_Mean)),
               columns =['Team', 'npxG', 'Home npxG', 'Away npxG'])
df_def_team = pd.DataFrame((zip(team_list1, npxGA_Team_Weighted_Mean, Home_npxGA_Team_Weighted_Mean, Away_npxGA_Team_Weighted_Mean)),
               columns =['Team', 'npxGA', 'Home npxGA', 'Away npxGA'])
df_team=(pd.merge(df_att_team, df_def_team, on='Team'))
df_team.head()

Unnamed: 0,Team,npxG,Home npxG,Away npxG,npxGA,Home npxGA,Away npxGA
0,Arsenal,2.03037,2.32,1.661481,0.865926,0.59037,0.985926
1,Crystal Palace,0.777037,0.778519,0.851111,1.371111,1.415556,1.39037
2,Aston Villa,1.237778,1.334815,0.959259,1.505185,1.496296,1.116296
3,Bournemouth,0.96963,0.942963,0.74963,1.594074,1.355556,1.562222
4,Chelsea,1.225926,1.142222,1.144444,1.202963,1.168889,1.365185


Everything looks good! Note how home stats are typically better than away; higher npxG and lower npxGA i.e. they score more and concede less.

Next we will create Home/Away modifiers. 

# 4. Home/Away Modifiers for npxG/npxGA 

We will consider the impact oh Home/Away games for each team by creating modifiers for both npxG and npxGA


## 4.1. Home/Away Modifier Calculation

Our modifier will consist of two methods given equal weighting; 

1: A team's home/away performance relative to their base performance. 

2: A team's home/away performance relative to  other team's home/away performances (with a MinMax scaling applied to reduce the impact of more extreme results).

In [100]:
#Modifier 1: A team's home/away performance relative to their base performance.
scaler = MinMaxScaler(feature_range=(0.95, 1.05)) #bring in a scaler to lessen effect of outliers. 
df_team['Att_Home_Modifier_1']=df_team['Home npxG']/df_team['npxG']
df_team['Att_Home_Modifier_1']=scaler.fit_transform(df_team['Att_Home_Modifier_1'].values.reshape(-1, 1))

df_team['Att_Away_Modifier_1']=df_team['Away npxG']/df_team['npxG']
df_team['Att_Away_Modifier_1']=scaler.fit_transform(df_team['Att_Away_Modifier_1'].values.reshape(-1, 1))

df_team['Def_Home_Modifier_1']=df_team['Home npxGA']/df_team['npxGA']
df_team['Def_Home_Modifier_1']=scaler.fit_transform(df_team['Def_Home_Modifier_1'].values.reshape(-1, 1))

df_team['Def_Away_Modifier_1']=df_team['Away npxGA']/df_team['npxGA']
df_team['Def_Away_Modifier_1']=scaler.fit_transform(df_team['Def_Away_Modifier_1'].values.reshape(-1, 1))

#Modifier 2: A team's home/away performance relative to  other team's home/away performances.
scaler = MinMaxScaler(feature_range=(0.9, 1.1)) #bring in a scaler to lessen effect of outliers. 
df_team['Att_Home_Modifier_2']=df_team['Home npxG']/(df_team['Home npxG'].median())
df_team['Att_Home_Modifier_2']=scaler.fit_transform(df_team['Att_Home_Modifier_2'].values.reshape(-1, 1))

df_team['Att_Away_Modifier_2']=df_team['Away npxG']/(df_team['Away npxG'].median())
df_team['Att_Away_Modifier_2']=scaler.fit_transform(df_team['Att_Away_Modifier_2'].values.reshape(-1, 1))

df_team['Def_Home_Modifier_2']=df_team['Home npxGA']/(df_team['Home npxGA'].median())     
df_team['Def_Home_Modifier_2']=scaler.fit_transform(df_team['Def_Home_Modifier_2'].values.reshape(-1, 1))

df_team['Def_Away_Modifier_2']=df_team['Away npxGA']/(df_team['Away npxGA'].median())
df_team['Def_Away_Modifier_2']=scaler.fit_transform(df_team['Def_Away_Modifier_2'].values.reshape(-1, 1))

#Final Modifier: Modifier 1 and 2 with equal weighting.                                                        
df_team['npxG_Final_Home_Multiplier']=(df_team['Att_Home_Modifier_1']+df_team['Att_Home_Modifier_2'])/2
df_team['npxG_Final_Away_Multiplier']=(df_team['Att_Away_Modifier_1']+df_team['Att_Away_Modifier_2'])/2
df_team['npxGA_Final_Home_Multiplier']=(df_team['Def_Home_Modifier_1']+df_team['Def_Home_Modifier_2'])/2
df_team['npxGA_Final_Away_Multiplier']=(df_team['Def_Away_Modifier_1']+df_team['Def_Away_Modifier_2'])/2

#Apply modifier to relevant columns.
df_team['Home_npxG']=df_team['Home npxG']*df_team['npxG_Final_Home_Multiplier']
df_team['Away_npxG']=df_team['Away npxG']*df_team['npxG_Final_Away_Multiplier']
df_team['Home_npxGA']=df_team['Home npxGA']*df_team['npxGA_Final_Home_Multiplier']
df_team['Away_npxGA']=df_team['Away npxGA']*df_team['npxGA_Final_Away_Multiplier']

#Clean up df.
df_Team_Final=df_team[['Team', 'npxG', 'Home_npxG', 'Away_npxG', 'npxGA', 'Home_npxGA', 'Away_npxGA', 'npxG_Final_Home_Multiplier', 'npxG_Final_Away_Multiplier', 'npxGA_Final_Home_Multiplier', 'npxGA_Final_Away_Multiplier']]
df_Team_Final.sort_values(by='npxGA', ascending=False, inplace=True)

Let's print out our new updated dataframe. 

In [101]:
df_Team_Final.head()

Unnamed: 0,Team,npxG,Home_npxG,Away_npxG,npxGA,Home_npxGA,Away_npxGA,npxG_Final_Home_Multiplier,npxG_Final_Away_Multiplier,npxGA_Final_Home_Multiplier,npxGA_Final_Away_Multiplier
15,Leicester City,1.305926,0.80443,1.551667,1.781481,1.297524,1.948589,0.930574,1.048949,1.004967,1.047628
5,Everton,0.985185,1.173706,0.745086,1.622222,1.385331,1.946562,0.991554,0.941822,1.024204,1.05325
9,Wolves,0.974074,0.906271,0.818432,1.612593,1.481259,1.311491,0.958829,0.956609,1.037727,0.981982
3,Bournemouth,0.96963,0.904156,0.700373,1.594074,1.391489,1.584889,0.958846,0.934292,1.026508,1.014509
7,Liverpool,1.761481,1.963956,1.754461,1.563704,1.118936,1.771421,1.029647,1.049412,0.989236,1.036592


## 4.2. Dictionaries

Various dictionaries for future use. 

In [102]:
Dict_Att_Home_Mod=dict(zip(df_Team_Final.Team, df_Team_Final.npxG_Final_Home_Multiplier))
Dict_Att_Home=dict(zip(df_Team_Final.Team, df_Team_Final.Home_npxG)) 
Dict_Att_Away_Mod=dict(zip(df_Team_Final.Team, df_Team_Final.npxG_Final_Away_Multiplier))
Dict_Att_Away=dict(zip(df_Team_Final.Team, df_Team_Final.Away_npxG))
Dict_Def_Home=dict(zip(df_Team_Final.Team, df_Team_Final.npxGA_Final_Home_Multiplier))
Dict_Def_Away=dict(zip(df_Team_Final.Team, df_Team_Final.npxGA_Final_Away_Multiplier))

# 5. Fixture Tables

Next we'll move onto creating a fixture fixture table for each team for the next five GameWeeks. 

## 5.1. Home Fixture Table

In [103]:
#Take the current gameweek and make a list of the upcoming five GameWeeks. 
Current_GameWeek=fixtures['GameWeek'].min()
Five_GameWeeks=np.arange(Current_GameWeek, Current_GameWeek+5, 1)

#Function to convert list into concatenated strings. 
def lts(s):
    str1 = ""
    for ele in s:
        str1 += ele
    return str1

awaylist=[]
awaylist1=[]
teamlist=[]

#Go through each team and find all their home fixtures for the next five GameWeeks.
for team in Teams:
    f1=fixtures[fixtures['Home Team'].str.contains(team)]
    awaylist=[]
    for i in Five_GameWeeks:
        f2=f1[f1['GameWeek']==i]
        
        if f2.shape[0]>0:
            away=f2.iloc[0,2] + ' ' + str(i) + ' - '
            awaylist.append(away)
    teamlist.append(team)
    awayfix=lts(awaylist)
    awaylist1.append(awayfix)

d = {'Team':teamlist,'Home Fixtures':awaylist1}
Home_Fixtures = pd.DataFrame(d)
Home_Fixtures.head()

Unnamed: 0,Team,Home Fixtures
0,Arsenal,Brentford 23 - Everton 25 - Bournemouth 26 -
1,Crystal Palace,Brighton 23 - Liverpool 25 - Manchester City 2...
2,Aston Villa,Arsenal 24 - Crystal Palace 26 -
3,Bournemouth,Newcastle Utd 23 - Manchester City 25 - Liverp...
4,Chelsea,Southampton 24 - Leeds United 26 -


We've captured the important data here. We just need to clean it up by seperating each gameweek into a column and placing the fixtures in the appropriate cell. 

In [104]:
#Seperate each GameWeek into a column and place the fixtures for each team in their respective cells. 
def ics(the_list, substring):
    for i, s in enumerate(the_list):
        if substring in s:
              return i
    return -1

dlist=[]
for g in Five_GameWeeks:
    dlist=[]
    for x in Home_Fixtures['Home Fixtures']:
        fixture = x.split(" - ")
        index = ics(fixture, str(g))
        d = fixture[index]
        d = re.sub(r'\d+', '', d)
        d = d.strip()
        dlist.append(d)
    Home_Fixtures['GW' + str(g) + ' Home'] = dlist
Home_Fixtures = Home_Fixtures.drop(columns=['Home Fixtures'])
Home_Fixtures.head()

Unnamed: 0,Team,GW23 Home,GW24 Home,GW25 Home,GW26 Home,GW27 Home
0,Arsenal,Brentford,,Everton,Bournemouth,
1,Crystal Palace,Brighton,,Liverpool,,Manchester City
2,Aston Villa,,Arsenal,,Crystal Palace,
3,Bournemouth,Newcastle Utd,,Manchester City,,Liverpool
4,Chelsea,,Southampton,,Leeds United,


## 5.2. Away Fixture Table

In [105]:
homelist=[]
homelist1=[]
teamlist=[]

for team in Teams:
    f1=fixtures[fixtures['Away Team'].str.contains(team)]
    
    homelist=[]
    for i in Five_GameWeeks:
        f2=f1[f1['GameWeek']==i]
        if f2.shape[0]>0:
            home=f2.iloc[0,1] + ' ' + str(i) + ' - '
            homelist.append(home)
    teamlist.append(team)
    homefix=lts(homelist)
    homelist1.append(homefix)

d = {'Team':teamlist,'Away Fixtures':homelist1}
Away_Fixtures = pd.DataFrame(d)

#------------------------------------------------------------------------------------------------------------------------------#

dlist=[]
for g in Five_GameWeeks:
    dlist=[]
    for x in Away_Fixtures['Away Fixtures']:
        fixture = x.split(" - ")
        index = ics(fixture, str(g))
        d = fixture[index]
        d = re.sub(r'\d+', '', d)
        d = d.strip()
        dlist.append(d)
    Away_Fixtures['GW' + str(g) + ' Away'] = dlist
Away_Fixtures = Away_Fixtures.drop(columns=['Away Fixtures'])
Away_Fixtures.head()

Unnamed: 0,Team,GW23 Away,GW24 Away,GW25 Away,GW26 Away,GW27 Away
0,Arsenal,,Aston Villa,Leicester City,,Fulham
1,Crystal Palace,,Brentford,,Aston Villa,Brighton
2,Aston Villa,Manchester City,,Everton,,West Ham
3,Bournemouth,,Wolves,,Arsenal,
4,Chelsea,West Ham,,Tottenham,,Leicester City


## 5.3. Home Fixture Table - Insert Stats

Here we will be utilizing the dictionaries created earlier to insert attacking/defensive stats. 

In [106]:
Home_Fixtures_Team=Home_Fixtures.copy()
Home_Fixtures_Team['Team npxG']=Home_Fixtures_Team['Team']

#Replace team with their respective defensive away stats. 
for g in Five_GameWeeks:
    Home_Fixtures_Team=Home_Fixtures_Team.replace({"GW" + str(g) + " Home": Dict_Def_Away})
    Home_Fixtures_Team=Home_Fixtures_Team.replace({"Team": Dict_Att_Home})
    
columncount=0
for g in Five_GameWeeks:
    rowcount=0
    columncount = columncount + 1    
    
    for x in Home_Fixtures_Team["GW" + str(g) + " Home"]:
        if x == "":
            Home_Fixtures_Team.iloc[rowcount, columncount]=0 #Replace blank spaces with 0. 
        else:
            Home_Fixtures_Team.iloc[rowcount, columncount]=Home_Fixtures_Team.iloc[rowcount, 0]*x #npxG modified by npxGA of away team
        rowcount = rowcount + 1
        
Home_Fixtures_Team.head()

Unnamed: 0,Team,GW23 Home,GW24 Home,GW25 Home,GW26 Home,GW27 Home,Team npxG
0,2.461233,2.463128,0.0,2.592294,2.496943,0.0,Arsenal
1,0.739883,0.684391,0.0,0.766957,0.0,0.72327,Crystal Palace
2,1.324362,0.0,1.285257,0.0,1.32494,0.0,Aston Villa
3,0.904156,0.900638,0.0,0.883854,0.0,0.937241,Bournemouth
4,1.106504,0.0,1.109284,0.0,1.10923,0.0,Chelsea


## 5.4. Away Fixture Table - Insert Stats

In [107]:
Away_Fixtures_Team=Away_Fixtures.copy()
Away_Fixtures_Team['Team npxG']=Away_Fixtures_Team['Team']
for g in Five_GameWeeks:
    Away_Fixtures_Team=Away_Fixtures_Team.replace({"GW" + str(g) + " Away": Dict_Def_Home})
    Away_Fixtures_Team=Away_Fixtures_Team.replace({"Team": Dict_Att_Away})

columncount=0
for g in Five_GameWeeks:
    rowcount=0
    columncount = columncount + 1    
    
    for x in Away_Fixtures_Team["GW" + str(g) + " Away"]:
        if x == "":
            Away_Fixtures_Team.iloc[rowcount, columncount]=0
        else:
            Away_Fixtures_Team.iloc[rowcount, columncount]=Away_Fixtures_Team.iloc[rowcount, 0]*x
        rowcount = rowcount + 1
Away_Fixtures_Team.head()

Unnamed: 0,Team,GW23 Away,GW24 Away,GW25 Away,GW26 Away,GW27 Away,Team npxG
0,1.718526,0.0,1.817634,1.727061,0.0,1.842231,Arsenal
1,0.833547,0.0,0.857256,0.0,0.881618,0.869279,Crystal Palace
2,0.917416,0.84861,0.0,0.939622,0.0,0.886366,Aston Villa
3,0.700373,0.0,0.726796,0.0,0.652459,0.0,Bournemouth
4,1.136066,1.097616,0.0,1.130971,0.0,1.141708,Chelsea


## 5.5. Finalizing Fixture Table

Let's merge Home and Away Fixtures for each team.

In [108]:
Fixtures_Merged_Team=(pd.merge(Home_Fixtures_Team, Away_Fixtures_Team, how='left', on='Team npxG'))
Fixtures_Merged_Team.head()

Unnamed: 0,Team_x,GW23 Home,GW24 Home,GW25 Home,GW26 Home,GW27 Home,Team npxG,Team_y,GW23 Away,GW24 Away,GW25 Away,GW26 Away,GW27 Away
0,2.461233,2.463128,0.0,2.592294,2.496943,0.0,Arsenal,1.718526,0.0,1.817634,1.727061,0.0,1.842231
1,0.739883,0.684391,0.0,0.766957,0.0,0.72327,Crystal Palace,0.833547,0.0,0.857256,0.0,0.881618,0.869279
2,1.324362,0.0,1.285257,0.0,1.32494,0.0,Aston Villa,0.917416,0.84861,0.0,0.939622,0.0,0.886366
3,0.904156,0.900638,0.0,0.883854,0.0,0.937241,Bournemouth,0.700373,0.0,0.726796,0.0,0.652459,0.0
4,1.106504,0.0,1.109284,0.0,1.10923,0.0,Chelsea,1.136066,1.097616,0.0,1.130971,0.0,1.141708


Not very pretty, let's clean it up.  

In [109]:
#Combine GW'x' Home and Away columns into 1 column
for g in Five_GameWeeks:
    Fixtures_Merged_Team['GW' + str(g)] = Fixtures_Merged_Team['GW' + str(g) + ' Home'] + Fixtures_Merged_Team['GW' + str(g) + ' Away']

#Send Team npxG to last column of dataframe
new_cols = [col for col in Fixtures_Merged_Team.columns if col != 'Team npxG'] + ['Team npxG']
Fixtures_Merged_Team = Fixtures_Merged_Team[new_cols]

#Remove all other columns apart from the relevant GW columns
Fixtures_Merged_Team = Fixtures_Merged_Team.iloc[:, -(np.count_nonzero(Five_GameWeeks)+1):]

#New npxG total columns based on the sum of all GW's and renaming to 'Team' column for clarity 
Fixtures_Merged_Team['npxG Total']=Fixtures_Merged_Team.iloc[:, 0:4].sum(axis=1)
Fixtures_Merged_Team=Fixtures_Merged_Team.rename(columns={'Team npxG':'Team'})

#Reposition team column to first column for clarity and sort by npxG total 
team = Fixtures_Merged_Team['Team']
Fixtures_Merged_Team.drop(labels=['Team'], axis=1,inplace = True)
Fixtures_Merged_Team.insert(0, 'Team', team)
Fixtures_Merged_Team.sort_values(by='npxG Total', ascending=False, inplace=True)
df_Team_Final_Fixture_Predictions = Fixtures_Merged_Team

df_Team_Final_Fixture_Predictions.head()

Unnamed: 0,Team,GW23,GW24,GW25,GW26,GW27,npxG Total
0,Arsenal,2.463128,1.817634,4.319355,2.496943,1.842231,11.09706
7,Liverpool,2.068537,1.67349,3.778607,1.98144,1.800969,9.502074
18,Manchester City,3.64042,1.753756,1.845453,2.05066,1.895733,9.290289
13,Tottenham,1.019454,1.599938,1.655276,1.052686,1.732227,5.327354
16,Brighton,1.47865,1.696751,0.0,1.577241,3.107593,4.752642


Much better - our final Teams-Prediction table! 

# 6. Data Engineering (Player Data) - Fuzzy Matching Included

We calculate the npxG for each player using the same steps used in the Data Engineering (Team Data) section and apply the same fixture tables. 

However, an extra step is required in our player data since the official FPL site and FBref have named certain players differently. This means when joining datasets between our two player dataframes from our different sources, a straight join between the two datasets will be infeasible. We will use fuzzy matching to resolve this. 

## 6.1. Player npxG and xAG

Very similar process to the one used in Team npxG. We will however include expected assisted goals (xAG) as well as npxG.


In [110]:
Players=df_2022['player'].unique().tolist() #Every player in the current season.
player_list=[]
mins=[]
npxG_Player_Weighted_Mean=[]
xAG_Player_Weighted_Mean=[]

#Iterate to find each player's last 10 games.
for player in Players: 
    dfPlayer=dfFinal.loc[dfFinal['player']==player]
    dfPlayer=(dfPlayer.drop_duplicates().reset_index(drop=True))[:10]
    dfPlayer=dfPlayer.iloc[::-1]
    
    #Give weighting ranging from 1 to 2. 
    step=1/(dfPlayer.shape[0])
    weighting=(np.arange(1, 2, step)).tolist()
    dfPlayer['weighting']=weighting
    weightingsum=dfPlayer['weighting'].sum()
    totalmins=dfPlayer['Mins'].sum()
    
    npxG_Weighted=((dfPlayer['npxG']*dfPlayer['weighting']).sum())/weightingsum
    xAG_Weighted=((dfPlayer['xAG']*dfPlayer['weighting']).sum())/weightingsum
    
    #Store results
    mins.append(totalmins)
    npxG_Player_Weighted_Mean.append(npxG_Weighted)
    xAG_Player_Weighted_Mean.append(xAG_Weighted)
    player_list.append(player)

Make a dataframe from our fbref player data. 

In [111]:
df_Players_fbref = pd.DataFrame((zip(player_list, npxG_Player_Weighted_Mean, xAG_Player_Weighted_Mean, mins)),
               columns =['Player', 'npxG/Appearance', 'xAG/Appearance', 'Total Minutes'])
df_Players_fbref['npxG + xAG/Appearance']=df_Players_fbref['npxG/Appearance']+df_Players_fbref['xAG/Appearance']
df_Players_fbref.sort_values(by='npxG + xAG/Appearance', ascending=False, inplace=True)
df_Players_fbref.head()

Unnamed: 0,Player,npxG/Appearance,xAG/Appearance,Total Minutes,npxG + xAG/Appearance
482,João Félix,0.8,0.2,57.0,1.0
104,Darwin Núñez,0.572414,0.283448,722.0,0.855862
503,Tetê,0.8,0.0,82.0,0.8
6,Gabriel Jesus,0.530345,0.166897,876.0,0.697241
9,Martin Ødegaard,0.322759,0.36069,868.0,0.683448


## 6.2. Fuzzy Matching

When merging our data with the player information from the official FPL site we encounter a problem. 

In [112]:
df_player_final=(pd.merge(df_Players_fbref, dfPlayers_FPL, how='left', on='Player'))
df_player_final = df_player_final.reset_index()

df_player_final.head(5)

Unnamed: 0,index,Player,npxG/Appearance,xAG/Appearance,Total Minutes,npxG + xAG/Appearance,Team,Position
0,0,João Félix,0.8,0.2,57.0,1.0,,
1,1,Darwin Núñez,0.572414,0.283448,722.0,0.855862,,
2,2,Tetê,0.8,0.0,82.0,0.8,,
3,3,Gabriel Jesus,0.530345,0.166897,876.0,0.697241,,
4,4,Martin Ødegaard,0.322759,0.36069,868.0,0.683448,Arsenal,MID


Alot of NaN's because not every player name perfectly matches between the two sets as proven below:

In [113]:
dfPlayers_FPL_ManUtd=dfPlayers_FPL.loc[dfPlayers_FPL['Team'] == 'Man Utd']
print ('Official FPL: ' + str(dfPlayers_FPL_ManUtd.iloc[8][0]))
print ('FBref: ' + str(df_player_final.iloc[4][1]))

Official FPL: Bruno Borges Fernandes
FBref: Martin Ødegaard


We will need to use fuzzy-matching to overcome this.

We will use two different algorithms. If the player match reaches a threshhold of 90% accuracy on either algorithm, we will presume the matching is correct. If not, we will manually cross-check with the suggestions our algorithms give us. 

In [114]:
df_player_final_null = df_player_final[df_player_final['Team'].isna()] #Dataframe of players which returned null

First fuzzy-matching algorithm: 

In [115]:
Suggested_Player = []
Score = []
PlayerList = dfPlayers['Player'].to_list()

#Reference each player from the official FPL site with the respective suggested player from FBref. 
for name_to_find in df_player_final_null['Player']:
    Suggested_P =  (process.extractOne(name_to_find, PlayerList))[0] #calling fuzzywords library to fuzzyword match.
    S = (process.extractOne(name_to_find, PlayerList))[1]
    Suggested_Player.append(Suggested_P)
    Score.append(S)

#Create dataframe from the lists created above. 
df_player_final_null['Suggested_Player']=Suggested_Player
df_player_final_null['Score']=Score
df_player_final_null=df_player_final_null.reset_index()
df_player_final_null=df_player_final_null[['Player', 'Suggested_Player', 'index', 'Score']]

df_player_final_null.head()

Unnamed: 0,Player,Suggested_Player,index,Score
0,João Félix,João Félix Sequeira,0,90
1,Darwin Núñez,Darwin Núñez Ribeiro,1,90
2,Tetê,Jean-Philippe Mateta,2,90
3,Gabriel Jesus,Gabriel dos Santos Magalhães,3,86
4,Bruno Fernandes,Bruno Borges Fernandes,6,95


A score of 90 or above demonstrates very high similarity. 

Split matches between between above/below 90 score.

In [116]:
df_player_final_null1=df_player_final_null.query('Score>=90') 
df_player_final_null2=df_player_final_null.query('Score<90')

Join players where suggested players >90 score.

In [117]:
#Merging suggested players with >90 score.
df_player_final_null1=df_player_final_null1[['Suggested_Player', 'index']]
df_player_final1=(pd.merge(df_player_final, df_player_final_null1, how='left', on='index'))
df_player_final1.Suggested_Player.fillna(df_player_final1.Player, inplace=True)
df_player_final1=df_player_final1[['Suggested_Player', 'index', 'npxG/Appearance', 'xAG/Appearance', 'npxG + xAG/Appearance']]

df_player_final1.head()

Unnamed: 0,Suggested_Player,index,npxG/Appearance,xAG/Appearance,npxG + xAG/Appearance
0,João Félix Sequeira,0,0.8,0.2,1.0
1,Darwin Núñez Ribeiro,1,0.572414,0.283448,0.855862
2,Jean-Philippe Mateta,2,0.8,0.0,0.8
3,Gabriel Jesus,3,0.530345,0.166897,0.697241
4,Martin Ødegaard,4,0.322759,0.36069,0.683448


Second fuzzy-matching algorithm:

In [118]:
Suggested_Player = []
Score = []

for name_to_find in df_player_final_null2['Player']:
    Suggested_P =  (process.extractOne(name_to_find, PlayerList, scorer=fuzz.token_sort_ratio))[0]
    S = (process.extractOne(name_to_find, PlayerList, scorer=fuzz.token_sort_ratio))[1]
    Suggested_Player.append(Suggested_P)
    Score.append(S)
    
df_player_final_null2['Suggested_Player1']=Suggested_Player
df_player_final_null2['Score1']=Score
df_player_final_null2=df_player_final_null2.reset_index(drop=True)

df_player_final_null2.head()

Unnamed: 0,Player,Suggested_Player,index,Score,Suggested_Player1,Score1
0,Gabriel Jesus,Gabriel dos Santos Magalhães,3,86,Gabriel Fernando de Jesus,68
1,Diogo Jota,Diogo Teixeira da Silva,22,86,Dango Ouattara,58
2,Andreas Pereira,Andreas Hoelgebaum Pereira,53,86,Andreas Hoelgebaum Pereira,73
3,Casemiro,Carlos Henrique Casimiro,54,79,Asmir Begović,57
4,Emi Buendía,Emiliano Buendía Stati,59,86,Emiliano Buendía Stati,65


Unfortunately at this point we have to manually review the remaining NaN. 

In [119]:
df_player_final_null2.at[0, 'Player'] = 'Gabriel Fernando de Jesus'
df_player_final_null2.at[1, 'Player'] = 'Diogo Teixeira da Silva'
df_player_final_null2.at[2, 'Player'] = 'Diego Da Silva Costa'
df_player_final_null2.at[3, 'Player'] = 'Andreas Hoelgebaum Pereira'
df_player_final_null2.at[4, 'Player'] = 'Ilkay Gündogan'
df_player_final_null2.at[5, 'Player'] = 'Pedro Lomba Neto'
df_player_final_null2.at[6, 'Player'] = 'Lucas Torreira di Pascua'
df_player_final_null2.at[7, 'Player'] = 'Daniel Castelo Podence'
df_player_final_null2.at[8, 'Player'] = 'Emiliano Buendía Stati'
df_player_final_null2.at[9, 'Player'] = 'Fábio Ferreira Vieira'
df_player_final_null2.at[10, 'Player'] = 'Wilfried Gnonto'
df_player_final_null2.at[11, 'Player'] = 'João Filipe Iria Santos Moutinho'
df_player_final_null2.at[12, 'Player'] = 'Joe Ayodele-Aribo'
df_player_final_null2.at[13, 'Player'] = 'Tomas Soucek'
df_player_final_null2.at[14, 'Player'] = 'Bernardo Veiga de Carvalho e Silva'
df_player_final_null2.at[15, 'Player'] = 'Gonçalo Manuel Ganchinho Guedes'
df_player_final_null2.at[16, 'Player'] = 'Bobby De Cordova-Reid'
df_player_final_null2.at[17, 'Player'] = 'Carlos Henrique Casimiro'
df_player_final_null2.at[18, 'Player'] = 'Fábio Freitas Gouveia Carvalho'
df_player_final_null2.at[19, 'Player'] = 'Benjamin White'
df_player_final_null2.at[20, 'Player'] = 'Konstantinos Tsimikas'
df_player_final_null2.at[21, 'Player'] = 'Rasmus Kristensen'
df_player_final_null2.at[22, 'Player'] = 'Rúben da Silva Neves'
df_player_final_null2.at[23, 'Player'] = 'Mateo Kovacic'
df_player_final_null2.at[24, 'Player'] = 'Rúben Gato Alves Dias'
df_player_final_null2.at[25, 'Player'] = 'Thiago Emiliano da Silva'
df_player_final_null2.at[26, 'Player'] = 'Renan Augusto Lodi dos Santos'
df_player_final_null2.at[27, 'Player'] = 'Jonathan Castro Otto'
df_player_final_null2.at[28, 'Player'] = 'Cédric Alves Soares'
df_player_final_null2.at[29, 'Player'] = 'Toti António Gomes'
df_player_final_null2.at[30, 'Player'] = 'Joseph Gomez'
df_player_final_null2.at[31, 'Player'] = 'Rúben Nascimento Vinagre'
df_player_final_null2.at[32, 'Player'] = 'Fabio Henrique Tavares'
df_player_final_null2.at[33, 'Player'] = 'Nélson Cabral Semedo'
df_player_final_null2.at[34, 'Player'] = 'Lucas Rodrigues Moura da Silva'
df_player_final_null2.at[35, 'Player'] = 'Jorge Luiz Frello Filho'
df_player_final_null2.at[36, 'Player'] = 'Mateo Joseph Fernández'
df_player_final_null2.at[37, 'Player'] = 'José Malheiro de Sá'
df_player_final_null2.at[38, 'Player'] = 'Alex Mighten'
df_player_final_null2.at[39, 'Player'] = 'Marcus Oliveira Alencar'
df_player_final_null2.at[40, 'Player'] = 'Carlos Ribeiro Dias'
df_player_final_null2.at[41, 'Player'] = 'Lukasz Fabianski'
df_player_final_null2=df_player_final_null2[['Player', 'index']]

That should be all our players matched. Let's check. 

In [120]:
#Merge Suggested Players with Players and cleaning up. 
df_player_final2=(pd.merge(df_player_final1, df_player_final_null2, how='left', on='index'))
df_player_final2.Player.fillna(df_player_final2.Suggested_Player, inplace=True)
df_player_final2=df_player_final2[['Player', 'npxG/Appearance', 'xAG/Appearance', 'npxG + xAG/Appearance']]
df_Player_Final=(pd.merge(df_player_final2, dfPlayers_FPL, how='left', on='Player'))
df_Player_Final['Team'] = df_Player_Final['Team'].replace({'Newcastle':'Newcastle Utd', 'Spurs':'Tottenham','Leeds':'Leeds United', 'Leicester':'Leicester City', 'Man City':'Manchester City', 'Man Utd':'Manchester Utd', "Nott'm Forest":"Nott'ham Forest"})

df_Player_Final.head()

Unnamed: 0,Player,npxG/Appearance,xAG/Appearance,npxG + xAG/Appearance,Team,Position
0,João Félix Sequeira,0.8,0.2,1.0,Chelsea,FWD
1,Darwin Núñez Ribeiro,0.572414,0.283448,0.855862,Liverpool,FWD
2,Jean-Philippe Mateta,0.8,0.0,0.8,Crystal Palace,FWD
3,Gabriel Fernando de Jesus,0.530345,0.166897,0.697241,Arsenal,FWD
4,Martin Ødegaard,0.322759,0.36069,0.683448,Arsenal,MID


The teams and positions for each player have successfully updated! 

# 7. Players Fixture Tables

Again, the same steps used in the Teams Fixture Tables will be used. We will be adjusting npxG and npxGA based on the home/away multipliers of the teams each player will face. 

## 7.1. Merge with Team Data

We merge our teams dataframe to update the npxG/xAG of each player using the home/away multipliers. 

In [121]:
df_Player_Final=(pd.merge(df_Player_Final, df_Team_Final, how='left', on='Team'))
df_Player_Final.head()

Unnamed: 0,Player,npxG/Appearance,xAG/Appearance,npxG + xAG/Appearance,Team,Position,npxG,Home_npxG,Away_npxG,npxGA,Home_npxGA,Away_npxGA,npxG_Final_Home_Multiplier,npxG_Final_Away_Multiplier,npxGA_Final_Home_Multiplier,npxGA_Final_Away_Multiplier
0,João Félix Sequeira,0.8,0.2,1.0,Chelsea,FWD,1.225926,1.106504,1.136066,1.202963,1.195891,1.374191,0.968729,0.992679,1.0231,1.006597
1,Darwin Núñez Ribeiro,0.572414,0.283448,0.855862,Liverpool,FWD,1.761481,1.963956,1.754461,1.563704,1.118936,1.771421,1.029647,1.049412,0.989236,1.036592
2,Jean-Philippe Mateta,0.8,0.0,0.8,Crystal Palace,FWD,0.777037,0.739883,0.833547,1.371111,1.49267,1.390977,0.950372,0.979363,1.054476,1.000436
3,Gabriel Fernando de Jesus,0.530345,0.166897,0.697241,Arsenal,FWD,2.03037,2.461233,1.718526,0.865926,0.549982,0.956813,1.060876,1.034334,0.931588,0.970472
4,Martin Ødegaard,0.322759,0.36069,0.683448,Arsenal,MID,2.03037,2.461233,1.718526,0.865926,0.549982,0.956813,1.060876,1.034334,0.931588,0.970472


In [122]:
#Calculate our new npxG/xAG's for Home/Away Fixtures and do some cleaning. 

#npxG and xAG - Home
df_Player_Final['npxG/Appearance Home']=df_Player_Final['npxG/Appearance']*df_Player_Final['npxG_Final_Home_Multiplier']
df_Player_Final['xAG/Appearance Home']=df_Player_Final['xAG/Appearance']*df_Player_Final['npxG_Final_Home_Multiplier']

#npxG and xAG - Away
df_Player_Final['npxG/Appearance Away']=df_Player_Final['npxG/Appearance']*df_Player_Final['npxG_Final_Away_Multiplier']
df_Player_Final['xAG/Appearance Away']=df_Player_Final['xAG/Appearance']*df_Player_Final['npxG_Final_Away_Multiplier']

#npxG + xAG - Home = Away
df_Player_Final['npxG + xAG/Appearance Home']=df_Player_Final['npxG/Appearance Home']+df_Player_Final['xAG/Appearance Home']
df_Player_Final['npxG + xAG/Appearance Away']=df_Player_Final['npxG/Appearance Away']+df_Player_Final['xAG/Appearance Away']

#Df cleaning
df_Player_Final=df_Player_Final[['Player', 'Position', 'Team', 'npxG/Appearance', 'xAG/Appearance', 'npxG + xAG/Appearance', 'npxG/Appearance Home', 'xAG/Appearance Home', 'npxG + xAG/Appearance Home', 'npxG/Appearance Away', 'xAG/Appearance Away', 'npxG + xAG/Appearance Away']]
df_Player_Final.head()

Unnamed: 0,Player,Position,Team,npxG/Appearance,xAG/Appearance,npxG + xAG/Appearance,npxG/Appearance Home,xAG/Appearance Home,npxG + xAG/Appearance Home,npxG/Appearance Away,xAG/Appearance Away,npxG + xAG/Appearance Away
0,João Félix Sequeira,FWD,Chelsea,0.8,0.2,1.0,0.774983,0.193746,0.968729,0.794143,0.198536,0.992679
1,Darwin Núñez Ribeiro,FWD,Liverpool,0.572414,0.283448,0.855862,0.589384,0.291852,0.881236,0.600698,0.297454,0.898152
2,Jean-Philippe Mateta,FWD,Crystal Palace,0.8,0.0,0.8,0.760298,0.0,0.760298,0.783491,0.0,0.783491
3,Gabriel Fernando de Jesus,FWD,Arsenal,0.530345,0.166897,0.697241,0.56263,0.177057,0.739687,0.548554,0.172627,0.72118
4,Martin Ødegaard,MID,Arsenal,0.322759,0.36069,0.683448,0.342407,0.382647,0.725054,0.33384,0.373073,0.706914


Split our dataframe into Home/Away Games and collect the next 5 GW's for each Player. 

In [123]:
df_Player_Final_Home=df_Player_Final[['Player', 'Position', 'Team', 'npxG/Appearance Home', 'xAG/Appearance Home', 'npxG + xAG/Appearance Home']]
df_Player_Final_Away=df_Player_Final[['Player', 'Position', 'Team', 'npxG/Appearance Away', 'xAG/Appearance Away', 'npxG + xAG/Appearance Away']]
df_Player_Final_Home=(pd.merge(df_Player_Final_Home, Home_Fixtures, how='left', on='Team'))
df_Player_Final_Away=(pd.merge(df_Player_Final_Away, Away_Fixtures, how='left', on='Team'))

df_Player_Final_Home=df_Player_Final_Home.dropna().reset_index(drop=True)
df_Player_Final_Away=df_Player_Final_Away.dropna().reset_index(drop=True)

df_Player_Final_Home.tail()

Unnamed: 0,Player,Position,Team,npxG/Appearance Home,xAG/Appearance Home,npxG + xAG/Appearance Home,GW23 Home,GW24 Home,GW25 Home,GW26 Home,GW27 Home
512,Tyrese Francois,MID,Fulham,0.0,0.0,0.0,Nott'ham Forest,,Wolves,,Arsenal
513,Jamal Lowe,MID,Bournemouth,0.0,0.0,0.0,Newcastle Utd,,Manchester City,,Liverpool
514,Conor Coventry,MID,West Ham,0.0,0.0,0.0,Chelsea,,Nott'ham Forest,,Aston Villa
515,Alphonse Areola,GKP,West Ham,0.0,0.0,0.0,Chelsea,,Nott'ham Forest,,Aston Villa
516,Facundo Pellistri Rebollo,MID,Manchester Utd,0.0,0.0,0.0,,Leicester City,,,Southampton


## 7.2. Home Fixture Tables

In [124]:
#Home fixture npxG + xAG for each player for the next 5 GW's. 
for g in Five_GameWeeks:
    df_Player_Final_Home=df_Player_Final_Home.replace({"GW" + str(g) + " Home": Dict_Def_Away})   
    
#Replace empty strings with 0.
columncount=5
for g in Five_GameWeeks:
    rowcount=0
    columncount = columncount + 1    
    
    for x in df_Player_Final_Home["GW" + str(g) + " Home"]:
        if x == "":
            df_Player_Final_Home.iloc[rowcount, columncount]=np.float64(0) #specify a numpy 0 since these return "0" when we divide by 0.
        rowcount=rowcount + 1 

#Make new columns where dividing npxG and xAG by the npxGA multiplier 
columncount=6
for g in Five_GameWeeks:   
        npxG_Multiplied=df_Player_Final_Home.iloc[:,3]*df_Player_Final_Home.iloc[:,columncount]
        df_Player_Final_Home["GW" + str(g) + " Home npxG"]=npxG_Multiplied
        columncount=columncount+1


columncount=6
for g in Five_GameWeeks:
    xAG_Multiplied=df_Player_Final_Home.iloc[:,4]*df_Player_Final_Home.iloc[:,columncount]
    df_Player_Final_Home["GW" + str(g) + " Home xAG"]=xAG_Multiplied
    columncount+=1

# #Data cleaning and making sum columns 
df_Player_Final_Home = df_Player_Final_Home.drop(df_Player_Final_Home.columns[2:11], axis=1)
df_Player_Final_Home.head()

Unnamed: 0,Player,Position,GW23 Home npxG,GW24 Home npxG,GW25 Home npxG,GW26 Home npxG,GW27 Home npxG,GW23 Home xAG,GW24 Home xAG,GW25 Home xAG,GW26 Home xAG,GW27 Home xAG
0,João Félix Sequeira,FWD,0.0,0.776931,0.0,0.776892,0.0,0.0,0.194233,0.0,0.194223,0.0
1,Darwin Núñez Ribeiro,FWD,0.620769,0.0,0.578764,0.594631,0.0,0.307393,0.0,0.286593,0.29445,0.0
2,Jean-Philippe Mateta,FWD,0.703276,0.0,0.788119,0.0,0.743227,0.0,0.0,0.0,0.0,0.0
3,Gabriel Fernando de Jesus,FWD,0.563063,0.0,0.59259,0.570793,0.0,0.177193,0.0,0.186485,0.179626,0.0
4,Martin Ødegaard,MID,0.342671,0.0,0.36064,0.347375,0.0,0.382942,0.0,0.403023,0.388199,0.0


## 7.3. Away Fixture Tables

In [125]:
#Away fixture npxG + xAG for each player for the next 5 GW's. 
for g in Five_GameWeeks:
    df_Player_Final_Away=df_Player_Final_Away.replace({"GW" + str(g) + " Away": Dict_Def_Home})   
    
#replace empty strings with 0.
columncount=5
for g in Five_GameWeeks:
    rowcount=0
    columncount = columncount + 1    
    
    for x in df_Player_Final_Away["GW" + str(g) + " Away"]:
        if x == "":
            df_Player_Final_Away.iloc[rowcount, columncount]=np.float64(0)
        rowcount=rowcount + 1 
        
#Make new columns where multiplying npxG and xAG by the npxGA multiplier 
columncount=6
for g in Five_GameWeeks:   
    npxG_Multiplied=df_Player_Final_Away.iloc[:,3]*df_Player_Final_Away.iloc[:,columncount]
    df_Player_Final_Away["GW" + str(g) + " Away npxG"]=npxG_Multiplied
    columncount=columncount+1

columncount=6
for g in Five_GameWeeks:
    xAG_Multiplied=df_Player_Final_Away.iloc[:,4]*df_Player_Final_Away.iloc[:,columncount]
    df_Player_Final_Away["GW" + str(g) + " Away xAG"]=xAG_Multiplied
    columncount+=1
 
#data cleaning and making sum columns 
df_Player_Final_Away = df_Player_Final_Away.drop(df_Player_Final_Away.columns[2:11], axis=1)
df_Player_Final_Away.head()

Unnamed: 0,Player,Position,GW23 Away npxG,GW24 Away npxG,GW25 Away npxG,GW26 Away npxG,GW27 Away npxG,GW23 Away xAG,GW24 Away xAG,GW25 Away xAG,GW26 Away xAG,GW27 Away xAG
0,João Félix Sequeira,FWD,0.767265,0.0,0.790582,0.0,0.798087,0.191816,0.0,0.197645,0.0,0.199522
1,Darwin Núñez Ribeiro,FWD,0.0,0.572975,0.633421,0.0,0.616621,0.0,0.283726,0.313658,0.0,0.305339
2,Jean-Philippe Mateta,FWD,0.0,0.805776,0.0,0.828675,0.817077,0.0,0.0,0.0,0.0,0.0
3,Gabriel Fernando de Jesus,FWD,0.0,0.580189,0.551278,0.0,0.58804,0.0,0.182582,0.173484,0.0,0.185053
4,Martin Ødegaard,MID,0.0,0.353093,0.335498,0.0,0.357871,0.0,0.394589,0.374926,0.0,0.399928


## 7.4. Finalizing Fixture Table

Replace inf values with 0. 

In [126]:
df_Player_Final_Home_Zeroes = df_Player_Final_Home.replace([np.inf], 0)  
df_Player_Final_Away_Zeroes = df_Player_Final_Away.replace([np.inf], 0)  

Merge Home and Away Fixtures.  

In [127]:
df_Player_Final_Fixtures=(pd.merge(df_Player_Final_Home_Zeroes, df_Player_Final_Away_Zeroes, how='left', on='Player'))
df_Player_Final_Fixtures.head()

Unnamed: 0,Player,Position_x,GW23 Home npxG,GW24 Home npxG,GW25 Home npxG,GW26 Home npxG,GW27 Home npxG,GW23 Home xAG,GW24 Home xAG,GW25 Home xAG,GW26 Home xAG,GW27 Home xAG,Position_y,GW23 Away npxG,GW24 Away npxG,GW25 Away npxG,GW26 Away npxG,GW27 Away npxG,GW23 Away xAG,GW24 Away xAG,GW25 Away xAG,GW26 Away xAG,GW27 Away xAG
0,João Félix Sequeira,FWD,0.0,0.776931,0.0,0.776892,0.0,0.0,0.194233,0.0,0.194223,0.0,FWD,0.767265,0.0,0.790582,0.0,0.798087,0.191816,0.0,0.197645,0.0,0.199522
1,Darwin Núñez Ribeiro,FWD,0.620769,0.0,0.578764,0.594631,0.0,0.307393,0.0,0.286593,0.29445,0.0,FWD,0.0,0.572975,0.633421,0.0,0.616621,0.0,0.283726,0.313658,0.0,0.305339
2,Jean-Philippe Mateta,FWD,0.703276,0.0,0.788119,0.0,0.743227,0.0,0.0,0.0,0.0,0.0,FWD,0.0,0.805776,0.0,0.828675,0.817077,0.0,0.0,0.0,0.0,0.0
3,Jean-Philippe Mateta,FWD,0.703276,0.0,0.788119,0.0,0.743227,0.0,0.0,0.0,0.0,0.0,FWD,0.0,0.084051,0.0,0.086439,0.08523,0.0,0.009725,0.0,0.010001,0.009861
4,Gabriel Fernando de Jesus,FWD,0.563063,0.0,0.59259,0.570793,0.0,0.177193,0.0,0.186485,0.179626,0.0,FWD,0.0,0.580189,0.551278,0.0,0.58804,0.0,0.182582,0.173484,0.0,0.185053


Combine GW 'x' Home and Away columns into 1 column. 

In [128]:
for g in Five_GameWeeks:
    df_Player_Final_Fixtures['GW' + str(g) + ' npxG'] = df_Player_Final_Fixtures['GW' + str(g) + ' Home npxG'] + df_Player_Final_Fixtures['GW' + str(g) + ' Away npxG']
    df_Player_Final_Fixtures['GW' + str(g) + ' xAG'] = df_Player_Final_Fixtures['GW' + str(g) + ' Home xAG'] + df_Player_Final_Fixtures['GW' + str(g) + ' Away xAG']
df_Player_Final_Fixtures.head()

Unnamed: 0,Player,Position_x,GW23 Home npxG,GW24 Home npxG,GW25 Home npxG,GW26 Home npxG,GW27 Home npxG,GW23 Home xAG,GW24 Home xAG,GW25 Home xAG,GW26 Home xAG,GW27 Home xAG,Position_y,GW23 Away npxG,GW24 Away npxG,GW25 Away npxG,GW26 Away npxG,GW27 Away npxG,GW23 Away xAG,GW24 Away xAG,GW25 Away xAG,GW26 Away xAG,GW27 Away xAG,GW23 npxG,GW23 xAG,GW24 npxG,GW24 xAG,GW25 npxG,GW25 xAG,GW26 npxG,GW26 xAG,GW27 npxG,GW27 xAG
0,João Félix Sequeira,FWD,0.0,0.776931,0.0,0.776892,0.0,0.0,0.194233,0.0,0.194223,0.0,FWD,0.767265,0.0,0.790582,0.0,0.798087,0.191816,0.0,0.197645,0.0,0.199522,0.767265,0.191816,0.776931,0.194233,0.790582,0.197645,0.776892,0.194223,0.798087,0.199522
1,Darwin Núñez Ribeiro,FWD,0.620769,0.0,0.578764,0.594631,0.0,0.307393,0.0,0.286593,0.29445,0.0,FWD,0.0,0.572975,0.633421,0.0,0.616621,0.0,0.283726,0.313658,0.0,0.305339,0.620769,0.307393,0.572975,0.283726,1.212186,0.600251,0.594631,0.29445,0.616621,0.305339
2,Jean-Philippe Mateta,FWD,0.703276,0.0,0.788119,0.0,0.743227,0.0,0.0,0.0,0.0,0.0,FWD,0.0,0.805776,0.0,0.828675,0.817077,0.0,0.0,0.0,0.0,0.0,0.703276,0.0,0.805776,0.0,0.788119,0.0,0.828675,0.0,1.560304,0.0
3,Jean-Philippe Mateta,FWD,0.703276,0.0,0.788119,0.0,0.743227,0.0,0.0,0.0,0.0,0.0,FWD,0.0,0.084051,0.0,0.086439,0.08523,0.0,0.009725,0.0,0.010001,0.009861,0.703276,0.0,0.084051,0.009725,0.788119,0.0,0.086439,0.010001,0.828456,0.009861
4,Gabriel Fernando de Jesus,FWD,0.563063,0.0,0.59259,0.570793,0.0,0.177193,0.0,0.186485,0.179626,0.0,FWD,0.0,0.580189,0.551278,0.0,0.58804,0.0,0.182582,0.173484,0.0,0.185053,0.563063,0.177193,0.580189,0.182582,1.143868,0.359969,0.570793,0.179626,0.58804,0.185053


General cleanup for clarity. 

In [129]:
#Deleting unneeded columns and new columns for Total npxG, xAG and npxG + xAG. Also reordering column list

#Create lists for reordering column purposes
Player_list=df_Player_Final_Fixtures['Player'] 
Position_list=df_Player_Final_Fixtures['Position_x']

#Specify last 10 columns
df_Player_Final_Fixtures=df_Player_Final_Fixtures.iloc[:,-10:]

#Sum metrics
df_Player_Final_Fixtures['Total npxG']=df_Player_Final_Fixtures.iloc[:, 0:5].sum(axis=1)
df_Player_Final_Fixtures['Total xAG']=df_Player_Final_Fixtures.iloc[:, 5:10].sum(axis=1)
df_Player_Final_Fixtures['Total npxG + xAG']=df_Player_Final_Fixtures['Total npxG']+df_Player_Final_Fixtures['Total xAG']

#Sorting
df_Player_Final_Fixtures.sort_values(by='Total npxG + xAG', ascending=False, inplace=True)

#Reorder columns
df_Player_Final_Fixtures.insert(0, 'Player', Player_list)
df_Player_Final_Fixtures.insert(0, 'Position', Position_list)

#Delete duplicates
df_Player_Final_Fixtures=df_Player_Final_Fixtures.drop_duplicates()

#Impute NaN with 0's. 
df_Player_Final_Fixtures=df_Player_Final_Fixtures.fillna(0)

Whew! Let's check out our final dataframe for predicting player npxG/XAG over the next five gameweeks.

In [130]:
df_Player_Final_Fixtures.head()

Unnamed: 0,Position,Player,GW23 npxG,GW23 xAG,GW24 npxG,GW24 xAG,GW25 npxG,GW25 xAG,GW26 npxG,GW26 xAG,GW27 npxG,GW27 xAG,Total npxG,Total xAG,Total npxG + xAG
1,FWD,Darwin Núñez Ribeiro,0.620769,0.307393,0.572975,0.283726,1.212186,0.600251,0.594631,0.29445,0.616621,0.305339,2.997048,2.411292,5.40834
0,FWD,João Félix Sequeira,0.767265,0.191816,0.776931,0.194233,0.790582,0.197645,0.776892,0.194223,0.798087,0.199522,2.720827,2.16637,4.887196
2,FWD,Jean-Philippe Mateta,0.703276,0.0,0.805776,0.0,0.788119,0.0,0.828675,0.0,1.560304,0.0,2.297171,2.388978,4.686149
4,FWD,Gabriel Fernando de Jesus,0.563063,0.177193,0.580189,0.182582,1.143868,0.359969,0.570793,0.179626,0.58804,0.185053,2.646896,1.883481,4.530377
5,MID,Martin Ødegaard,0.342671,0.382942,0.353093,0.394589,0.696138,0.777949,0.347375,0.388199,0.357871,0.399928,2.169432,2.271323,4.440755


# 8. Final Dataframes 

## 8.1. Team Stats

### Ordered by npxG

In [131]:
df_Team_Final.sort_values(by='npxG', ascending=False)

Unnamed: 0,Team,npxG,Home_npxG,Away_npxG,npxGA,Home_npxGA,Away_npxGA,npxG_Final_Home_Multiplier,npxG_Final_Away_Multiplier,npxGA_Final_Home_Multiplier,npxGA_Final_Away_Multiplier
0,Arsenal,2.03037,2.461233,1.718526,0.865926,0.549982,0.956813,1.060876,1.034334,0.931588,0.970472
18,Manchester City,1.802222,2.058671,1.797796,0.781481,0.489222,0.955099,1.036244,1.052939,0.925,0.977547
7,Liverpool,1.761481,1.963956,1.754461,1.563704,1.118936,1.771421,1.029647,1.049412,0.989236,1.036592
17,Manchester Utd,1.725185,1.751487,1.401488,1.055556,0.841907,1.322036,1.012204,1.005318,0.974764,1.008902
16,Brighton,1.592593,1.621101,1.40226,1.127407,1.279251,0.756444,1.005276,1.011245,1.042867,0.925
10,Newcastle Utd,1.457037,1.87145,1.248571,0.725926,0.585734,1.03374,1.035009,0.9962,0.953849,0.996109
11,Nott'ham Forest,1.348148,1.251429,0.872754,1.162222,0.890237,1.733025,0.978812,0.943329,0.975503,1.053392
15,Leicester City,1.305926,0.80443,1.551667,1.781481,1.297524,1.948589,0.930574,1.048949,1.004967,1.047628
14,Brentford,1.298519,1.407921,0.968618,1.088148,1.167093,1.277279,0.997739,0.961495,1.028444,1.00077
2,Aston Villa,1.237778,1.324362,0.917416,1.505185,1.582588,1.065837,0.992169,0.95638,1.05767,0.954798


### Ordered by npxGA

In [132]:
df_Team_Final.sort_values(by='npxGA', ascending=True)

Unnamed: 0,Team,npxG,Home_npxG,Away_npxG,npxGA,Home_npxGA,Away_npxGA,npxG_Final_Home_Multiplier,npxG_Final_Away_Multiplier,npxGA_Final_Home_Multiplier,npxGA_Final_Away_Multiplier
10,Newcastle Utd,1.457037,1.87145,1.248571,0.725926,0.585734,1.03374,1.035009,0.9962,0.953849,0.996109
18,Manchester City,1.802222,2.058671,1.797796,0.781481,0.489222,0.955099,1.036244,1.052939,0.925,0.977547
0,Arsenal,2.03037,2.461233,1.718526,0.865926,0.549982,0.956813,1.060876,1.034334,0.931588,0.970472
17,Manchester Utd,1.725185,1.751487,1.401488,1.055556,0.841907,1.322036,1.012204,1.005318,0.974764,1.008902
14,Brentford,1.298519,1.407921,0.968618,1.088148,1.167093,1.277279,0.997739,0.961495,1.028444,1.00077
16,Brighton,1.592593,1.621101,1.40226,1.127407,1.279251,0.756444,1.005276,1.011245,1.042867,0.925
19,West Ham,1.040741,1.248668,0.855339,1.155556,0.834472,1.099788,0.996278,0.958264,0.966155,0.972945
11,Nott'ham Forest,1.348148,1.251429,0.872754,1.162222,0.890237,1.733025,0.978812,0.943329,0.975503,1.053392
4,Chelsea,1.225926,1.106504,1.136066,1.202963,1.195891,1.374191,0.968729,0.992679,1.0231,1.006597
12,Southampton,0.883704,0.933004,0.806119,1.295556,1.358312,1.379755,0.968889,0.962211,1.040704,1.002513


## 8.2. Team Fixture Tables

### Home Fixtures

In [133]:
Home_Fixtures

Unnamed: 0,Team,GW23 Home,GW24 Home,GW25 Home,GW26 Home,GW27 Home
0,Arsenal,Brentford,,Everton,Bournemouth,
1,Crystal Palace,Brighton,,Liverpool,,Manchester City
2,Aston Villa,,Arsenal,,Crystal Palace,
3,Bournemouth,Newcastle Utd,,Manchester City,,Liverpool
4,Chelsea,,Southampton,,Leeds United,
5,Everton,,Leeds United,Aston Villa,,Brentford
6,Fulham,Nott'ham Forest,,Wolves,,Arsenal
7,Liverpool,Everton,,Wolves,Manchester Utd,
8,Leeds United,Manchester Utd,,Southampton,,Brighton
9,Wolves,,Bournemouth,,Tottenham,


### Away Fixtures

In [134]:
Away_Fixtures

Unnamed: 0,Team,GW23 Away,GW24 Away,GW25 Away,GW26 Away,GW27 Away
0,Arsenal,,Aston Villa,Leicester City,,Fulham
1,Crystal Palace,,Brentford,,Aston Villa,Brighton
2,Aston Villa,Manchester City,,Everton,,West Ham
3,Bournemouth,,Wolves,,Arsenal,
4,Chelsea,West Ham,,Tottenham,,Leicester City
5,Everton,Liverpool,,Arsenal,Nott'ham Forest,
6,Fulham,,Brighton,,Brentford,
7,Liverpool,,Newcastle Utd,Crystal Palace,,Bournemouth
8,Leeds United,,Everton,,Chelsea,
9,Wolves,Southampton,,Fulham,,Newcastle Utd


## 8.3. Team npxG Predictions 

In [135]:
df_Team_Final_Fixture_Predictions

Unnamed: 0,Team,GW23,GW24,GW25,GW26,GW27,npxG Total
0,Arsenal,2.463128,1.817634,4.319355,2.496943,1.842231,11.09706
7,Liverpool,2.068537,1.67349,3.778607,1.98144,1.800969,9.502074
18,Manchester City,3.64042,1.753756,1.845453,2.05066,1.895733,9.290289
13,Tottenham,1.019454,1.599938,1.655276,1.052686,1.732227,5.327354
16,Brighton,1.47865,1.696751,0.0,1.577241,3.107593,4.752642
15,Leicester City,0.799367,1.512509,0.780676,1.614827,0.809736,4.70738
17,Manchester Utd,1.484967,1.834908,0.0,1.386403,1.755889,4.706277
8,Leeds United,1.062699,1.277909,1.055969,1.276532,0.974323,4.673109
6,Fulham,1.230855,1.069122,1.147414,1.054335,1.133965,4.501726
5,Everton,0.737066,1.176598,1.814765,0.726834,1.17461,4.455263


## 8.4. Player npxG and xAG Predictions

### All Players 

In [136]:
df_Player_Final_Fixtures.head(15)

Unnamed: 0,Position,Player,GW23 npxG,GW23 xAG,GW24 npxG,GW24 xAG,GW25 npxG,GW25 xAG,GW26 npxG,GW26 xAG,GW27 npxG,GW27 xAG,Total npxG,Total xAG,Total npxG + xAG
1,FWD,Darwin Núñez Ribeiro,0.620769,0.307393,0.572975,0.283726,1.212186,0.600251,0.594631,0.29445,0.616621,0.305339,2.997048,2.411292,5.40834
0,FWD,João Félix Sequeira,0.767265,0.191816,0.776931,0.194233,0.790582,0.197645,0.776892,0.194223,0.798087,0.199522,2.720827,2.16637,4.887196
2,FWD,Jean-Philippe Mateta,0.703276,0.0,0.805776,0.0,0.788119,0.0,0.828675,0.0,1.560304,0.0,2.297171,2.388978,4.686149
4,FWD,Gabriel Fernando de Jesus,0.563063,0.177193,0.580189,0.182582,1.143868,0.359969,0.570793,0.179626,0.58804,0.185053,2.646896,1.883481,4.530377
5,MID,Martin Ødegaard,0.342671,0.382942,0.353093,0.394589,0.696138,0.777949,0.347375,0.388199,0.357871,0.399928,2.169432,2.271323,4.440755
6,FWD,Erling Haaland,1.116961,0.154907,0.582285,0.080755,0.612731,0.084977,0.585157,0.081153,0.629425,0.087292,2.547639,1.468005,4.015644
9,MID,Mohamed Salah,0.478665,0.198945,0.441812,0.183628,0.934698,0.388484,0.458511,0.190568,0.475467,0.197616,2.237748,1.710646,3.948394
11,FWD,Eddie Nketiah,0.620175,0.013912,0.639038,0.014335,1.259891,0.028262,0.628689,0.014103,0.647685,0.014529,2.547351,1.333268,3.880619
14,MID,Riyad Mahrez,0.548969,0.584298,0.286184,0.304602,0.301147,0.320528,0.287596,0.306104,0.309352,0.329261,2.0252,1.552841,3.578042
15,MID,Kevin De Bruyne,0.247308,0.884601,0.128924,0.461153,0.135665,0.485265,0.12956,0.463427,0.139362,0.498486,1.857651,1.7161,3.573752


### Forward Players

In [137]:
df_Player_Final_Fixtures_FWD=df_Player_Final_Fixtures.loc[df_Player_Final_Fixtures['Position']=='FWD']
df_Player_Final_Fixtures_FWD.head(15)

Unnamed: 0,Position,Player,GW23 npxG,GW23 xAG,GW24 npxG,GW24 xAG,GW25 npxG,GW25 xAG,GW26 npxG,GW26 xAG,GW27 npxG,GW27 xAG,Total npxG,Total xAG,Total npxG + xAG
1,FWD,Darwin Núñez Ribeiro,0.620769,0.307393,0.572975,0.283726,1.212186,0.600251,0.594631,0.29445,0.616621,0.305339,2.997048,2.411292,5.40834
0,FWD,João Félix Sequeira,0.767265,0.191816,0.776931,0.194233,0.790582,0.197645,0.776892,0.194223,0.798087,0.199522,2.720827,2.16637,4.887196
2,FWD,Jean-Philippe Mateta,0.703276,0.0,0.805776,0.0,0.788119,0.0,0.828675,0.0,1.560304,0.0,2.297171,2.388978,4.686149
4,FWD,Gabriel Fernando de Jesus,0.563063,0.177193,0.580189,0.182582,1.143868,0.359969,0.570793,0.179626,0.58804,0.185053,2.646896,1.883481,4.530377
6,FWD,Erling Haaland,1.116961,0.154907,0.582285,0.080755,0.612731,0.084977,0.585157,0.081153,0.629425,0.087292,2.547639,1.468005,4.015644
11,FWD,Eddie Nketiah,0.620175,0.013912,0.639038,0.014335,1.259891,0.028262,0.628689,0.014103,0.647685,0.014529,2.547351,1.333268,3.880619
12,FWD,Harry Kane,0.456671,0.121599,0.466405,0.124191,0.482537,0.128486,0.471558,0.125563,0.504969,0.134459,1.651402,1.365035,3.016437
10,FWD,Brennan Johnson,0.384966,0.243393,0.364257,0.2303,0.346962,0.219365,0.392466,0.248135,0.357505,0.226031,1.569878,1.443503,3.013381
23,FWD,Diogo Teixeira da Silva,0.380688,0.133877,0.351379,0.123569,0.743377,0.261423,0.364659,0.12824,0.378145,0.132982,1.732889,1.265449,2.998339
255,FWD,Jean-Philippe Mateta,0.073359,0.008488,0.805776,0.0,0.082209,0.009512,0.828675,0.0,0.894603,0.00897,0.969832,1.74176,2.711592


### Midfield Players 

In [138]:
df_Player_Final_Fixtures_MID=df_Player_Final_Fixtures.loc[df_Player_Final_Fixtures['Position']=='MID']
df_Player_Final_Fixtures_MID.head(15)

Unnamed: 0,Position,Player,GW23 npxG,GW23 xAG,GW24 npxG,GW24 xAG,GW25 npxG,GW25 xAG,GW26 npxG,GW26 xAG,GW27 npxG,GW27 xAG,Total npxG,Total xAG,Total npxG + xAG
5,MID,Martin Ødegaard,0.342671,0.382942,0.353093,0.394589,0.696138,0.777949,0.347375,0.388199,0.357871,0.399928,2.169432,2.271323,4.440755
9,MID,Mohamed Salah,0.478665,0.198945,0.441812,0.183628,0.934698,0.388484,0.458511,0.190568,0.475467,0.197616,2.237748,1.710646,3.948394
14,MID,Riyad Mahrez,0.548969,0.584298,0.286184,0.304602,0.301147,0.320528,0.287596,0.306104,0.309352,0.329261,2.0252,1.552841,3.578042
15,MID,Kevin De Bruyne,0.247308,0.884601,0.128924,0.461153,0.135665,0.485265,0.12956,0.463427,0.139362,0.498486,1.857651,1.7161,3.573752
20,MID,Bukayo Saka,0.226983,0.325098,0.233886,0.334985,0.461117,0.660439,0.230099,0.329561,0.237051,0.339519,1.582069,1.796668,3.378738
8,MID,Morgan Gibbs-White,0.250367,0.387058,0.236899,0.366237,0.225651,0.348847,0.255245,0.394599,0.232508,0.359448,1.466212,1.590647,3.056859
24,MID,Jack Grealish,0.394062,0.53538,0.205429,0.2791,0.21617,0.293693,0.206442,0.280477,0.22206,0.301695,1.630142,1.304367,2.934509
34,MID,Gabriel Martinelli Silva,0.216,0.226983,0.222569,0.233886,0.438805,0.461117,0.218965,0.230099,0.225581,0.237051,1.338243,1.372813,2.711056
7,MID,Bruno Borges Fernandes,0.201286,0.476769,0.200382,0.474626,0.0,0.0,0.187926,0.445123,0.191752,0.454187,1.353062,1.278988,2.63205
22,MID,James Maddison,0.246166,0.221295,0.272191,0.244689,0.24041,0.21612,0.290604,0.261242,0.249359,0.224165,1.224751,1.24149,2.466242


### Defender Players 

In [139]:
df_Player_Final_Fixtures_DEF=df_Player_Final_Fixtures.loc[df_Player_Final_Fixtures['Position']=='DEF']
df_Player_Final_Fixtures_DEF.head(15)

Unnamed: 0,Position,Player,GW23 npxG,GW23 xAG,GW24 npxG,GW24 xAG,GW25 npxG,GW25 xAG,GW26 npxG,GW26 xAG,GW27 npxG,GW27 xAG,Total npxG,Total xAG,Total npxG + xAG
43,DEF,Trent Alexander-Arnold,0.046371,0.344788,0.042801,0.318243,0.090549,0.673274,0.044418,0.330271,0.046061,0.342485,0.842751,1.436509,2.27926
62,DEF,Andrew Robertson,0.093489,0.245316,0.086291,0.226429,0.182558,0.479033,0.089553,0.234987,0.092865,0.243677,0.834083,1.140114,1.974197
44,DEF,Jan Paul van Hecke,0.38388,0.0,0.378788,0.0,0.0,0.0,0.352108,0.0,0.74779,0.0,0.762668,1.099898,1.862566
30,DEF,Kieran Trippier,0.010579,0.425969,0.011099,0.446911,0.0,0.0,0.009533,0.383846,0.010514,0.423367,0.894558,0.82726,1.721818
60,DEF,Benoît Badiashile,0.13587,0.167839,0.137581,0.169954,0.139999,0.17294,0.137575,0.169945,0.141328,0.174582,0.751243,0.796369,1.547612
122,DEF,James Bree,0.0,0.190286,0.0,0.196888,0.0,0.203905,0.0,0.203007,0.0,0.381513,0.387174,0.788425,1.175599
109,DEF,Ivan Perišić,0.083093,0.130381,0.084864,0.13316,0.087799,0.137766,0.085801,0.134631,0.09188,0.14417,0.519296,0.594249,1.113544
132,DEF,Ben Godfrey,0.131721,0.042408,0.140531,0.045244,0.257893,0.083029,0.129892,0.041819,0.140293,0.045168,0.617797,0.440201,1.057998
108,DEF,Jordan Zemura,0.048744,0.16204,0.04948,0.164487,0.047835,0.159021,0.044419,0.147664,0.050725,0.168626,0.472587,0.570454,1.043041
86,DEF,Fabian Schär,0.181954,0.081103,0.190899,0.085091,0.0,0.0,0.163961,0.073083,0.180842,0.080608,0.539047,0.498494,1.037541
