# Fantasy Premier League 2021/2022 Season Analysis

**This notebook covers how to access the Fantasy Premier League API, build a dataframe, and analyze the data using Jupyter, Python, Pandas, and Matplotlib.**

**About the game**: Each FPL manager is given a starting budget of £100 million and must pick a total of 15 players: two goalkeepers, five defenders, five midfielders and three forwards. A squad can include at most three players from any single Premier League team. Players in your team score points based on their real-life performances, with the biggest contributors being goals, assists and clean sheets.
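The squad rules above can be written as a small validation check. This is a minimal sketch: the squad representation (a list of dicts with hypothetical `position`, `team` and `price` keys, price in £m) and the function name `squad_problems` are assumptions for illustration, and it checks only the four rules listed here, not every rule of the official game.

```python
from collections import Counter

# Squad rules as described above (only these four are checked)
BUDGET = 100.0  # £m
SQUAD_SIZE = 15
POSITION_QUOTAS = {'Goalkeeper': 2, 'Defender': 5, 'Midfielder': 5, 'Forward': 3}
MAX_PER_TEAM = 3


def squad_problems(squad):
    """Return a list of rule violations; an empty list means the squad is valid.

    `squad` is a hypothetical list of dicts with 'position', 'team' and
    'price' (in £m) keys.
    """
    problems = []
    if len(squad) != SQUAD_SIZE:
        problems.append(f'{len(squad)} players, expected {SQUAD_SIZE}')

    # Round to one decimal to avoid float noise such as 100.00000000000001
    total_cost = round(sum(p['price'] for p in squad), 1)
    if total_cost > BUDGET:
        problems.append(f'total cost £{total_cost:.1f}m exceeds £{BUDGET:.1f}m')

    positions = Counter(p['position'] for p in squad)
    for pos, quota in POSITION_QUOTAS.items():
        if positions[pos] != quota:
            problems.append(f'{positions[pos]} {pos}s, expected {quota}')

    for team, count in Counter(p['team'] for p in squad).items():
        if count > MAX_PER_TEAM:
            problems.append(f'{count} players from {team}, max {MAX_PER_TEAM}')
    return problems


# Example: 15 players from 15 different hypothetical teams at £6.0m each
positions = ['Goalkeeper'] * 2 + ['Defender'] * 5 + ['Midfielder'] * 5 + ['Forward'] * 3
squad = [{'position': pos, 'team': f'T{i}', 'price': 6.0} for i, pos in enumerate(positions)]
print(squad_problems(squad))  # []
```

Returning a list of messages rather than a single boolean makes it easy to see every rule a candidate squad breaks at once.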

**Analysis Limits**: The data used in this notebook are season totals at the end of Gameweek 38 of the 2021/2022 season. This is a season overview rather than a week-to-week analysis, so form and fixtures are not accounted for. The prices come from the FPL API and have already been updated to 2022/2023 prices. To make the analysis more accurate, I have created a 'price range' category, since price changes generally keep a player within the same price range.

## 1. Data Loading

First, we import the libraries used throughout the analysis. We then use the requests library to retrieve data from the FPL API endpoint (https://fantasy.premierleague.com/api/bootstrap-static/). Next, we parse the JSON response into a Python dictionary and convert the relevant parts of it into pandas DataFrames.


In [1]:
import requests
import pandas as pd
import numpy as np

In [2]:
# Get data and parse the JSON response
url = 'https://fantasy.premierleague.com/api/bootstrap-static/'
r = requests.get(url, timeout=30)
r.raise_for_status()  # raise an error early if the request failed
fpl_json = r.json()

In [3]:
#View dict keys
fpl_json.keys()

dict_keys(['events', 'game_settings', 'phases', 'teams', 'total_players', 'elements', 'element_stats', 'element_types'])

In [4]:
import os

# Convert JSON objects to DataFrames
elements_df = pd.DataFrame(fpl_json['elements'])
elements_types_df = pd.DataFrame(fpl_json['element_types'])
element_stats_df = pd.DataFrame(fpl_json['element_stats'])
teams_df = pd.DataFrame(fpl_json['teams'])

# Save as CSV, creating the output folder first if it does not exist
out_dir = 'data/21_22_season_data'
os.makedirs(out_dir, exist_ok=True)
elements_df.to_csv(os.path.join(out_dir, 'elements.csv'))
elements_types_df.to_csv(os.path.join(out_dir, 'element_types.csv'))
element_stats_df.to_csv(os.path.join(out_dir, 'element_stats.csv'))
teams_df.to_csv(os.path.join(out_dir, 'teams.csv'))
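By default `to_csv` writes the DataFrame index as an extra, unnamed first column. When one of these files is read back later, `read_csv` labels that column `Unnamed: 0` unless it is told the column is the index. A minimal sketch using a temporary directory and a hypothetical two-row frame (not real FPL data):

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for elements_df
sample = pd.DataFrame({'id': [1, 2], 'web_name': ['Player A', 'Player B']})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'elements.csv')
    sample.to_csv(path)  # index=True by default: writes an unnamed index column

    naive = pd.read_csv(path)                  # index column comes back as data
    restored = pd.read_csv(path, index_col=0)  # first column used as the index

print(list(naive.columns))     # ['Unnamed: 0', 'id', 'web_name']
print(list(restored.columns))  # ['id', 'web_name']
```

Alternatively, passing `index=False` to `to_csv`, as the final save in this notebook does, avoids writing the index at all.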

## 2. Data Cleaning

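The next step is to clean the data: inspect its structure, keep the relevant columns, fix data types, derive new columns, and check for duplicates and missing values.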

### Understand data

In [5]:
elements_df.head()

| | chance_of_playing_next_round | chance_of_playing_this_round | code | cost_change_event | cost_change_event_fall | cost_change_start | cost_change_start_fall | dreamteam_count | element_type | ep_next | ... | threat_rank | threat_rank_type | ict_index_rank | ict_index_rank_type | corners_and_indirect_freekicks_order | corners_and_indirect_freekicks_text | direct_freekicks_order | direct_freekicks_text | penalties_order | penalties_text |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  |  | 58822 | 0 | 0 | 0 | 0 | 0 | 2 | 2.3 | ... | 238 | 80 | 198 | 63 | 2.0 |  | 3.0 |  |  |  |
| 1 |  |  | 80201 | 0 | 0 | 0 | 0 | 0 | 1 | 2.7 | ... | 396 | 27 | 345 | 24 |  |  |  |  |  |  |
| 2 |  |  | 84450 | 0 | 0 | 0 | 0 | 0 | 3 | 2.0 | ... | 147 | 84 | 101 | 62 |  |  |  |  |  |  |
| 3 |  |  | 153256 | 0 | 0 | 0 | 0 | 0 | 3 | 1.5 | ... | 290 | 144 | 285 | 129 |  |  |  |  |  |  |
| 4 |  |  | 156074 | 0 | 0 | 0 | 0 | 0 | 2 | 2.3 | ... | 291 | 112 | 294 | 107 |  |  |  |  |  |  |


In [6]:
elements_df.shape

(528, 67)

In [7]:
elements_df.columns

Index(['chance_of_playing_next_round', 'chance_of_playing_this_round', 'code',
       'cost_change_event', 'cost_change_event_fall', 'cost_change_start',
       'cost_change_start_fall', 'dreamteam_count', 'element_type', 'ep_next',
       'ep_this', 'event_points', 'first_name', 'form', 'id', 'in_dreamteam',
       'news', 'news_added', 'now_cost', 'photo', 'points_per_game',
       'second_name', 'selected_by_percent', 'special', 'squad_number',
       'status', 'team', 'team_code', 'total_points', 'transfers_in',
       'transfers_in_event', 'transfers_out', 'transfers_out_event',
       'value_form', 'value_season', 'web_name', 'minutes', 'goals_scored',
       'assists', 'clean_sheets', 'goals_conceded', 'own_goals',
       'penalties_saved', 'penalties_missed', 'yellow_cards', 'red_cards',
       'saves', 'bonus', 'bps', 'influence', 'creativity', 'threat',
       'ict_index', 'influence_rank', 'influence_rank_type', 'creativity_rank',
       'creativity_rank_type', 'threat_rank'

### Data Transformation

Select only the columns relevant to the analysis

In [8]:
cols = ['element_type', 'first_name', 'now_cost', 'points_per_game',
       'second_name', 'selected_by_percent', 'team', 'total_points',
        'value_season', 'web_name', 'minutes', 'goals_scored',
       'assists', 'clean_sheets', 'goals_conceded',
       'saves', 'bonus', 'bps', 'influence', 'creativity', 'threat',
       'ict_index', 'influence_rank', 'creativity_rank', 'threat_rank', 
        'ict_index_rank']

# Work on a copy so later column assignments do not trigger SettingWithCopyWarning
df = elements_df[cols].copy()

Check the data types

In [9]:
df.dtypes

element_type            int64
first_name             object
now_cost                int64
points_per_game        object
second_name            object
selected_by_percent    object
team                    int64
total_points            int64
value_season           object
web_name               object
minutes                 int64
goals_scored            int64
assists                 int64
clean_sheets            int64
goals_conceded          int64
saves                   int64
bonus                   int64
bps                     int64
influence              object
creativity             object
threat                 object
ict_index              object
influence_rank          int64
creativity_rank         int64
threat_rank             int64
ict_index_rank          int64
dtype: object

In [10]:
# These columns come from the API as strings (object dtype); convert them to floats
cols = ['points_per_game', 'selected_by_percent', 'value_season',
        'influence', 'creativity', 'threat', 'ict_index']

df[cols] = df[cols].astype('float32')

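The `object` dtype in the output above shows that the API returns these fields as strings such as `'5.3'`. `astype('float32')` works as long as every value parses; if a column could ever contain a non-numeric string, `pd.to_numeric(..., errors='coerce')` is a more forgiving alternative that turns unparseable values into NaN instead of raising. A sketch on hypothetical data, where `'n/a'` is an invented bad value:

```python
import pandas as pd

# Hypothetical sample: numeric fields delivered as strings
raw = pd.DataFrame({
    'points_per_game': ['5.3', '0.0', '4.1'],
    'selected_by_percent': ['12.5', '0.4', 'n/a'],  # invented bad value
})

# errors='coerce' turns unparseable strings into NaN instead of raising
converted = raw.apply(pd.to_numeric, errors='coerce')
print(converted.dtypes)
print(converted['selected_by_percent'].isna().sum())  # 1
```

The trade-off is that coercion hides bad values silently, so a follow-up `isna()` check (like the missing-data check later in this notebook) is needed to catch them.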


In [11]:
# Add position attribute: map element_type ids to position names
pos_dict = dict(zip(elements_types_df['id'], elements_types_df['singular_name']))
df['position'] = df['element_type'].map(pos_dict)

# Map team ids to team short names using the ids from the teams table
teams_dict = dict(zip(teams_df['id'], teams_df['short_name']))
df['team'] = df['team'].map(teams_dict)

# Create price column: now_cost is stored in tenths of £1m (e.g. 55 -> £5.5m)
df['price'] = df['now_cost'] / 10

# Create G+A (goals + assists) column
df['G+A'] = df['goals_scored'] + df['assists']
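Looking team names up by id from `teams_df` works regardless of row order, whereas pairing `df['team'].unique()` with `teams_df['short_name'].unique()` only lines up if players happen to appear in team-id order. The toy example uses hypothetical miniature tables (not real FPL data) to show the difference, along with the `now_cost` to £m conversion:

```python
import pandas as pd

# Hypothetical miniature versions of teams_df and df
teams = pd.DataFrame({'id': [1, 2, 3], 'short_name': ['ARS', 'AVL', 'BRE']})
players = pd.DataFrame({'web_name': ['P1', 'P2', 'P3'],
                        'team': [3, 1, 3],          # team ids, not in id order
                        'now_cost': [55, 125, 45]})

# Fragile: pairs ids and names by position in two unrelated lists,
# so team 3 is wrongly labelled 'ARS' here
fragile = dict(zip(players['team'].unique(), teams['short_name'].unique()))

# Robust: look names up by id, independent of row order
teams_dict = dict(zip(teams['id'], teams['short_name']))
players['team'] = players['team'].map(teams_dict)

# now_cost is in tenths of £1m, so 55 -> 5.5
players['price'] = players['now_cost'] / 10

print(players[['web_name', 'team', 'price']])
```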


Drop the columns that have been replaced by `position` and `price`

In [12]:
df = df.drop(['element_type', 'now_cost'], axis=1)

Check for duplicated rows

In [13]:
df.duplicated().sum()

0

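Discretization and binning: group players into price ranges (with separate bins for each position) and into ownership categories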

In [14]:
df['position'].value_counts().index

Index(['Midfielder', 'Defender', 'Forward', 'Goalkeeper'], dtype='object')

In [15]:
# Price-range labels shared by all positions
labels = ['Budget', 'Mid', 'Premium']

# Bin edges in £m per position; intervals are left-closed, e.g. [4.0, 5.0)
price_bins = {
    'Goalkeeper': [4.0, 5.0, 6.0, 7.0],
    'Defender':   [4.0, 5.0, 6.0, 8.0],
    'Midfielder': [4.0, 6.0, 9.0, 14.0],
    'Forward':    [4.0, 6.0, 9.0, 14.0],
}

# Discretization function
def categorize_price(dataframe, bins, position, labels):
    """Return the rows for one position with a 'price_range' column added."""
    grp = dataframe[dataframe['position'] == position].copy()
    grp['price_range'] = pd.cut(grp['price'], bins, labels=labels, right=False)
    return grp

# Bin each position separately, then stack the groups back together
frames = [categorize_price(df, bins, pos, labels) for pos, bins in price_bins.items()]
df = pd.concat(frames)
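`pd.cut` with `right=False` makes each bin include its lower edge and exclude its upper edge. This sketch applies the goalkeeper bins from the cell above to a few hypothetical prices to show where boundary values land; a price equal to the top edge (7.0 here) falls outside every bin and becomes NaN.

```python
import pandas as pd

# Goalkeeper bins and labels from the cell above
gk_bins = [4.0, 5.0, 6.0, 7.0]
labels = ['Budget', 'Mid', 'Premium']

# Hypothetical goalkeeper prices in £m, including edge cases
prices = pd.Series([4.0, 4.5, 5.0, 5.5, 6.0, 7.0])

# right=False makes each interval left-closed: [4, 5), [5, 6), [6, 7)
ranges = pd.cut(prices, gk_bins, labels=labels, right=False)
print(ranges.tolist())
# 4.0, 4.5 -> 'Budget'; 5.0, 5.5 -> 'Mid'; 6.0 -> 'Premium'; 7.0 -> NaN
```

This is why the top edge of each position's bins has to sit above the most expensive player; the missing-value check later in the notebook confirms that no `price_range` ended up empty.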


In [16]:
# Create labels
labels = ['Differential', 'High']

# Create bins (percent of managers who selected the player); intervals are
# left-closed, so [0, 15) is 'Differential' and [15, 99) is 'High'
selected_by = [0.0, 15.0, 99.0]

# Discretization function
def categorize_own(dataframe, bins, labels):
    """Add an 'ownership' category column based on selected_by_percent."""
    ownership = dataframe['selected_by_percent']
    dataframe['ownership'] = pd.cut(ownership, bins, labels=labels, right=False)
    return dataframe

df = categorize_own(df, selected_by, labels)
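With both categories in place, a cross-tabulation is a quick way to sanity-check how they combine, for example how many budget players are also differentials. A minimal sketch on hypothetical rows (not real FPL data); on the real data the equivalent call would be `pd.crosstab(df['price_range'], df['ownership'])`.

```python
import pandas as pd

# Hypothetical rows carrying the two categories created above
sample = pd.DataFrame({
    'price_range': pd.Categorical(['Budget', 'Premium', 'Mid', 'Budget'],
                                  categories=['Budget', 'Mid', 'Premium'], ordered=True),
    'ownership': pd.Categorical(['Differential', 'High', 'Differential', 'High'],
                                categories=['Differential', 'High'], ordered=True),
})

# Count players in each price range / ownership combination
table = pd.crosstab(sample['price_range'], sample['ownership'])
print(table)
```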

### Handle missing data

In [17]:
df.isnull().sum()

first_name             0
points_per_game        0
second_name            0
selected_by_percent    0
team                   0
total_points           0
value_season           0
web_name               0
minutes                0
goals_scored           0
assists                0
clean_sheets           0
goals_conceded         0
saves                  0
bonus                  0
bps                    0
influence              0
creativity             0
threat                 0
ict_index              0
influence_rank         0
creativity_rank        0
threat_rank            0
ict_index_rank         0
position               0
price                  0
G+A                    0
price_range            0
ownership              0
dtype: int64

### Save clean data

In [18]:
df.to_csv('data/21_22_season_data/fpl_clean.csv', index=False)
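CSV stores plain text, so `price_range` and `ownership` are saved as strings and lose their `category` dtype and ordering. When reloading `fpl_clean.csv`, one option is to rebuild the ordered categories explicitly. The sketch uses an in-memory buffer and hypothetical rows instead of the real file:

```python
import io

import pandas as pd

# Hypothetical two-row extract of the clean data
clean = pd.DataFrame({
    'web_name': ['A', 'B'],
    'price_range': pd.Categorical(['Premium', 'Budget'],
                                  categories=['Budget', 'Mid', 'Premium'], ordered=True),
})

buffer = io.StringIO()
clean.to_csv(buffer, index=False)
buffer.seek(0)

# The category dtype does not survive the round trip: values come back as plain strings
reloaded = pd.read_csv(buffer)
before = reloaded['price_range'].dtype
print(before)

# Re-create the ordered category so sorting follows Budget < Mid < Premium
price_type = pd.CategoricalDtype(['Budget', 'Mid', 'Premium'], ordered=True)
reloaded['price_range'] = reloaded['price_range'].astype(price_type)
print(reloaded.sort_values('price_range')['web_name'].tolist())  # ['B', 'A']
```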