# Visualizing FF 2020 Data -- PWL

https://www.pro-football-reference.com/years/2020/fantasy.htm

https://beautiful-soup-4.readthedocs.io/en/latest/index.html

This notebook explores 2020 fantasy football data and shows how to pull that data from Pro Football Reference (PFR). We examine the data through the lens of the PWL scoring structure. The analysis is purely positional and focuses on QB, RB, WR, and TE.

After exploring overall league trends with traditional PFR fantasy value rankings, we will recreate the scoring system our league followed and compare the differences. The goal is to inform future decisions about scoring in our league and personal drafting strategy.

We can also go through data since 2010 and compare consecutive pairs of years (2010 vs. 2011, 2011 vs. 2012, and so on) to create a rolling visualization of how scoring and positional importance fluctuate over the years.
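
As a rough sketch of the year-over-year pairing described above, the snippet below builds one season URL per year and pairs each season with the next. The helper names `season_url` and `consecutive_pairs` are hypothetical, and the code assumes that earlier seasons use the same `/years/<year>/fantasy.htm` path as the 2020 page linked at the top of this notebook.

```python
BASE_URL = 'https://www.pro-football-reference.com'

def season_url(year):
    """Build the season-level fantasy page URL for one year.

    Assumption: every season follows the same path pattern as 2020.
    """
    return f'{BASE_URL}/years/{year}/fantasy.htm'

def consecutive_pairs(first, last):
    """Return (year, next_year) tuples covering first..last inclusive."""
    years = list(range(first, last + 1))
    return list(zip(years, years[1:]))

pairs = consecutive_pairs(2010, 2020)
for prev_year, next_year in pairs[:2]:
    print(prev_year, 'vs', next_year)
    print('  ', season_url(prev_year))
    print('  ', season_url(next_year))
```

Each pair could then be scraped and plotted side by side, one comparison per step of the rolling visualization.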

In [45]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import requests
from bs4 import BeautifulSoup
pd.set_option('display.max_columns', None)

In [146]:
# define base URL and year to get data
# no trailing slash: the paths appended below already start with '/'
url = 'https://www.pro-football-reference.com'
year = 2020

# get webpage
r = requests.get(url + '/years/' + str(year) + '/fantasy.htm')
# parse HTML
soup = BeautifulSoup(r.content, 'html.parser')
# find first table with all player data
parsed = soup.find_all('table')[0]

In [149]:
# example of first player in table, Derrick Henry
# the first 2 rows (indices 0 and 1) are headers
parsed.find_all('tr')[2]

<tr><th class="right" csk="1" data-stat="ranker" scope="row">1</th><td class="left" csk="Henry,Derrick" data-append-csv="HenrDe00" data-stat="player"><a href="/players/H/HenrDe00.htm">Derrick Henry </a>*+</td><td class="left" data-stat="team"><a href="/teams/oti/2020.htm" title="Tennessee Titans">TEN</a></td><td class="right" csk="20" data-stat="fantasy_pos">RB</td><td class="right" data-stat="age">26</td><td class="right" data-stat="g">16</td><td class="right" data-stat="gs">16</td><td class="right iz" data-stat="pass_cmp">0</td><td class="right iz" data-stat="pass_att">0</td><td class="right iz" data-stat="pass_yds">0</td><td class="right iz" data-stat="pass_td">0</td><td class="right iz" data-stat="pass_int">0</td><td class="right" data-stat="rush_att">378</td><td class="right" data-stat="rush_yds">2027</td><td class="right" data-stat="rush_yds_per_att">5.36</td><td class="right" data-stat="rush_td">17</td><td class="right" data-stat="targets">31</td><td class="right" data-stat="rec

In [154]:
for i, row in enumerate(parsed.find_all('tr')):
    # try-except skips the rows that are not players (e.g. header rows)
    try:
        # find players by row
        # gets only the cell whose data-stat attribute is "player" (see excerpt above)
        dat = row.find('td', attrs={'data-stat': 'player'})
        # get player name
        # gets the text within the <a> tags
        name = dat.a.get_text()
        # get link to player page
        # href is defined by the <a href> tag
        stub = dat.a.get('href')
        
        #print(dat)
        #print(name)
        #print(stub)
    except AttributeError:
        # row.find returned None, so this row has no player cell
        pass

Awesome! Up to this point, we are able to extract each player's name and the `.htm` link to their player page; uncomment `print(name)` and `print(stub)` within the loop to see them. This is a neat introduction to how BeautifulSoup works and how we can extract data from HTML. Next, we will save each player's game logs to a dataframe and export summaries to a .csv file for ease of use.
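
To see the same extraction without hitting the network, here is a minimal sketch that runs the `find` / `get_text` / `get('href')` steps on a small inline table. The HTML string is a hypothetical, trimmed-down imitation of the PFR row shown earlier, and `extract_players` is an illustrative helper, not part of the notebook. It checks for `None` explicitly instead of relying on try-except.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row table shaped like the PFR markup shown earlier:
# one header row (no <td> player cell) and one player row.
html = """
<table>
  <tr><th data-stat="ranker">Rk</th><th data-stat="player">Player</th></tr>
  <tr>
    <th data-stat="ranker">1</th>
    <td data-stat="player"><a href="/players/H/HenrDe00.htm">Derrick Henry </a>*+</td>
    <td data-stat="fantasy_pos">RB</td>
  </tr>
</table>
"""

def extract_players(table_html):
    """Return (name, href) tuples for every row that has a player cell."""
    soup = BeautifulSoup(table_html, 'html.parser')
    players = []
    for row in soup.find_all('tr'):
        cell = row.find('td', attrs={'data-stat': 'player'})
        # header rows have no <td data-stat="player">, so find() returns None
        if cell is None or cell.a is None:
            continue
        # strip() drops the trailing space PFR leaves inside the <a> tag
        players.append((cell.a.get_text().strip(), cell.a.get('href')))
    return players

players = extract_players(html)
print(players)
```

The explicit `None` check does the same job as the try-except in the loop above, but it leaves unrelated errors visible.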

In [133]:
df = []
maxp = 10

for i, row in enumerate(parsed.find_all('tr')):
    if i % 10 == 0: print(i, end=' ')
    if i >= maxp:
        print('\nComplete.')
        break
    
    try:
        # find players by row
        dat = row.find('td', attrs={'data-stat': 'player'})
        # get fantasy position of player
        fpos = row.find('td', attrs={'data-stat': 'fantasy_pos'}).get_text()
        # get name of player
        name = dat.a.get_text()
        # get URL stub to attach to PFR homepage
        stub = dat.a.get('href')
        stub = stub[:-4] + '/gamelog/' + str(year)
        print(stub)
        
        # get fantasy data for this player
        temp_df = pd.read_html(url + stub)[0]
        # fix column names to drill down
        # this joins the MultiIndex column labels by delimiter
        # allows us to retain info about rush yds, pass yds
        temp_df.columns = temp_df.columns.map('_'.join)
        newcol = []
        for colname in temp_df.columns:
            # clean up empty column df names from PFR
            if 'Unnamed' in colname:
                # rpartition splits the string at the LAST occurrence of the delimiter
                # ex: with delimiter '_', "this_variable" --> ('this', '_', 'variable')
                # use of [-1]: takes the final piece of the string
                newcol.append(colname.rpartition('_')[-1])
            else:
                newcol.append(colname)
        temp_df.columns = newcol
        temp_df = temp_df.drop('Rk', axis=1)
        
        # fix Away indicator variable
        # the blank-header '@' column ends up named '1' after the cleanup above
        temp_df = temp_df.rename(columns={'1': 'Away'})
        temp_df['Away'] = (temp_df['Away'] == '@').astype(int)
        
        # add supplemental info
        temp_df['Name'] = name
        temp_df['Position'] = fpos
        temp_df['Season'] = year
        
        df.append(temp_df)
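    except AttributeError:
        # header rows have no player cell, so skip them;
        # other errors (e.g. a failed download) are no longer silently hidden
        pass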
    
df = pd.concat(df).reset_index(drop=True)

0 /players/H/HenrDe00/gamelog/2020
/players/K/KamaAl00/gamelog/2020
/players/C/CookDa01/gamelog/2020
/players/K/KelcTr00/gamelog/2020
/players/A/AdamDa01/gamelog/2020
/players/H/HillTy00/gamelog/2020
/players/A/AlleJo02/gamelog/2020
/players/R/RodgAa00/gamelog/2020
10 
Complete.
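
The column cleanup inside the loop is easier to follow on a toy input. The sketch below applies the same join-then-`rpartition` logic to a plain list of two-level labels, with no pandas required. The sample labels are assumptions about what the game log headers look like after `pd.read_html` (blank header cells showing up as `Unnamed: ...` placeholders), and `flatten_columns` is a hypothetical helper.

```python
def flatten_columns(columns):
    """Join two-level labels with '_' and trim 'Unnamed' placeholders."""
    flat = []
    for top, bottom in columns:
        joined = '_'.join((top, bottom))
        if 'Unnamed' in joined:
            # keep only the text after the last underscore
            flat.append(joined.rpartition('_')[-1])
        else:
            flat.append(joined)
    return flat

# Assumed examples of (top level, bottom level) column labels
columns = [
    ('Unnamed: 0_level_0', 'Rk'),
    ('Unnamed: 1_level_0', 'Date'),
    ('Unnamed: 7_level_0', 'Unnamed: 7_level_1'),  # blank header over the '@' column
    ('Rushing', 'Yds'),
    ('Receiving', 'Yds'),
]

flat = flatten_columns(columns)
print(flat)
```

Under these assumptions, the fully blank header collapses to `'1'`, which is why the loop renames `'1'` to `'Away'`, while `Rushing_Yds` and `Receiving_Yds` keep their prefixes and stay distinguishable.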


In [156]:
df.tail(17)

Unnamed: 0,Date,G#,Week,Age,Tm,Away,Opp,Result,GS,Rushing_Att,Rushing_Yds,Rushing_Y/A,Rushing_TD,Receiving_Tgt,Receiving_Rec,Receiving_Yds,Receiving_Y/R,Receiving_TD,Receiving_Ctch%,Receiving_Y/Tgt,Scoring_2PM,Scoring_TD,Scoring_Pts,Fumbles_Fmb,Fumbles_FL,Fumbles_FF,Fumbles_FR,Fumbles_Yds,Fumbles_TD,Off. Snaps_Num,Off. Snaps_Pct,Def. Snaps_Num,Def. Snaps_Pct,ST Snaps_Num,ST Snaps_Pct,Name,Position,Season,Kick Returns_Rt,Kick Returns_Yds,Kick Returns_Y/Rt,Kick Returns_TD,Punt Returns_Ret,Punt Returns_Yds,Punt Returns_Y/R,Punt Returns_TD,Passing_Cmp,Passing_Att,Passing_Cmp%,Passing_Yds,Passing_TD,Passing_Int,Passing_Rate,Passing_Sk,Passing_Yds.1,Passing_Y/A,Passing_AY/A
119,2020-09-13,1.0,1.0,36.286,GNB,1,MIN,W 43-34,*,1,2,2.0,0,0,0,0,,0,0.0%,,,0,0,0,0,0,0,0,0,76.0,97%,0.0,0%,0.0,0%,Aaron Rodgers,QB,2020,,,,,,,,,32,44,72.73,364,4,0,127.5,0,0,8.27,10.09
120,2020-09-20,2.0,2.0,36.293,GNB,0,DET,W 42-21,*,2,12,6.0,0,1,1,-6,-6.0,0,100.0%,-6.0,,0,0,0,0,0,0,0,0,68.0,93%,0.0,0%,0.0,0%,Aaron Rodgers,QB,2020,,,,,,,,,18,30,60.0,240,2,0,107.6,1,11,8.0,9.33
121,2020-09-27,3.0,3.0,36.3,GNB,1,NOR,W 37-30,*,2,12,6.0,0,0,0,0,,0,0.0%,,,0,0,0,0,0,0,0,0,62.0,100%,0.0,0%,0.0,0%,Aaron Rodgers,QB,2020,,,,,,,,,21,32,65.63,283,3,0,124.9,1,12,8.84,10.72
122,2020-10-05,4.0,4.0,36.308,GNB,0,ATL,W 30-16,*,1,5,5.0,0,0,0,0,,0,0.0%,,,0,0,0,0,0,0,0,0,62.0,98%,0.0,0%,0.0,0%,Aaron Rodgers,QB,2020,,,,,,,,,27,33,81.82,327,4,0,147.5,1,12,9.91,12.33
123,2020-10-18,5.0,6.0,36.321,GNB,1,TAM,L 10-38,*,2,14,7.0,0,0,0,0,,0,0.0%,,,0,0,0,0,0,0,0,0,60.0,95%,0.0,0%,0.0,0%,Aaron Rodgers,QB,2020,,,,,,,,,16,35,45.71,160,0,2,35.4,4,42,4.57,2.0
124,2020-10-25,6.0,7.0,36.328,GNB,1,HOU,W 35-20,*,0,0,,0,0,0,0,,0,0.0%,,,0,0,0,0,0,0,0,0,60.0,97%,0.0,0%,0.0,0%,Aaron Rodgers,QB,2020,,,,,,,,,23,34,67.65,283,4,0,132.4,0,0,8.32,10.68
125,2020-11-01,7.0,8.0,36.335,GNB,0,MIN,L 22-28,*,2,9,4.5,0,0,0,0,,0,0.0%,,,0,0,1,1,0,0,0,0,74.0,100%,0.0,0%,0.0,0%,Aaron Rodgers,QB,2020,,,,,,,,,27,41,65.85,291,3,0,110.9,1,0,7.1,8.56
126,2020-11-05,8.0,9.0,36.339,GNB,1,SFO,W 34-17,*,1,7,7.0,0,0,0,0,,0,0.0%,,,0,0,0,0,0,0,0,0,62.0,94%,0.0,0%,0.0,0%,Aaron Rodgers,QB,2020,,,,,,,,,25,31,80.65,305,4,0,147.2,1,11,9.84,12.42
127,2020-11-15,9.0,10.0,36.349,GNB,0,JAX,W 24-20,*,3,4,1.33,1,0,0,0,,0,0.0%,,,1,6,0,0,0,0,0,0,65.0,100%,0.0,0%,0.0,0%,Aaron Rodgers,QB,2020,,,,,,,,,24,34,70.59,325,2,1,108.1,1,10,9.56,9.41
128,2020-11-22,10.0,11.0,36.356,GNB,1,IND,L 31-34,*,3,13,4.33,0,0,0,0,,0,0.0%,,,0,0,1,1,0,0,-1,0,60.0,100%,0.0,0%,0.0,0%,Aaron Rodgers,QB,2020,,,,,,,,,27,38,71.05,311,3,1,110.7,1,10,8.18,8.58
