# Position Players for the Green Bay Packers to Target in the 2019 NFL Draft
###### By Tommy Evans-Barton, Wannabe Data Scientist and Diehard Packers Fan

## Introduction

This year, the Green Bay Packers are in the alien position of having a new head coach, Matt LaFleur, and with him, an entirely new offensive scheme predicated on the "illusion of complexity" and the zone running scheme. While we could go on and on analyzing this scheme, a lot of people have frankly already done this (I particularly like __[this](http://www.packtothefuture.com/articles/xs-and-os-lafleurs-pass-game-in-week-2/)__ series of breakdowns by *Pack to the Future's* Ben Clubb), and maybe I will too in the future. However, for now, I wanted to try to do something a little bit different.


It's no secret that there have been some issues with the skill positions (Wide Receivers, Tight Ends, and Running Backs) in the past for the Packers. For Wide Receivers, it's a case of injuries and inexperience, with rookies playing heavy snap percentages on offense (Data courtesy of __[Football Outsiders](https://www.footballoutsiders.com/stats/snapcounts)__):

| Rookie | Offensive Snap Percentage|
|---|---|
| M. Valdes-Scantling | 64.4% |
| E. St. Brown | 33.3% |
| J. Moore | 6.9% |
| J. Kumerow | 12.7% |


due to injuries to Randall Cobb and Geronimo Allison. For the Tight End room, Jimmy Graham and Marcedes Lewis have been hit hard by Father Time, and Lance Kendricks, in my dad's words, is J.A.G. (Just A Guy). Running backs have also felt the sting of injury, with Aaron Jones suffering the second knee injury of his NFL career in 2018, but unlike the other two positions, their borderline *criminal* misuse last year makes me hold off on judgement, at least for another season.


All of this is to say that the skill players could use an upgrade, or at the very least some reinforcements. But who will fit this new coach's schemes? Who can answer the call? That's what we hope to find out, so don't touch that dial.


In terms of success in this scheme, no two coaches have shown more promise than Los Angeles's Sean McVay and San Francisco's Kyle Shanahan, coincidentally Matt LaFleur's two mentors. It seems reasonable, therefore, that with the relative success of these two schemes (yes I know the 49ers went 4-12, but considering they were starting Nick Mullens, I'll cut Shanahan some slack) LaFleur may try to acquire similar skill players for our team too.


In this analysis, we're going to go in depth on the position players that have recently fit this new-age scheme, and how analyzing their attributes, testing, and college statistics may lead us to find similar players and contributors in this year's draft. Ready? Let's do this.

In [16]:
%load_ext autoreload
%autoreload 2



In [17]:
import SkillPlayerFunctions as spf

In [18]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import glob
from scipy.stats import linregress
import os

***

## A Note on Coding, and Some Terms to Know

I know a lot of people who may be reading this just looked at the cell above and said "Uh-uh, this ain't it, chief," or "Well, that's the nerdiest thing I've ever seen," and to be honest you guys are spot on with that second one. I'm doing this analysis in the only way I know how, with Python, but I know a lot of people reading this won't have a very strong background in coding or statistics, and may find a lot of the code hard to follow, or even intimidating. But I want this breakdown to cater to everyone: the stats nerds (like me), the armchair GMs (also me), and the casual fans who tune in every other week. That means I'll be doing my best to explain:

1. Why I decided to do what I did when it comes to the data
2. What each method that I wrote is aiming to do
3. How the code aims to do this (where an explanation is easy)
4. What the statistical conclusions mean

and anything else that may come up along the way.

There are, however, some terms I may use casually that you might want to know, such as:

- **Cell** - what you're reading this in right now! It can either be filled with text, like this one, or with code, like the one above.

- **Function** - essentially a section of code that will give you a value or values at the end. What this value is (a number vs. a group of numbers vs. a word) depends on the function, and can really be almost anything! 

- **Null/NaN (Not a Number)** - this means exactly what you think it means: Nothing. This happens in a set of data when there's just nothing put in. It can either happen on purpose (e.g. if a receiver didn't play any games, their receiving yards might be `Null/NaN`, with no value entered for them) or by accident (e.g. someone forgot to put a piece of data in). The second of these is unlikely in our dataset, but just keep in mind that it CAN happen.

- **DataFrame** - whenever I say this, you can pretty much replace it with the word 'table'. They're basically the same thing, DataFrame is just the coding word for this specific type of table.



So try not to get too worried about the code, or the tables, and just get as deep into it as you want. Any feedback is more than welcome, so hit me up with some comments, etc. And for those of you who DO find code interesting, the Python file is in the directory. So, without further ado, let's do some cool ass football data shit!

***

## The Data
Our data is going to come from a mix of places, but mostly __[Pro Football Reference](https://www.pro-football-reference.com/)__, so a huge shoutout to them for having a good source to draw from. What we're going to do first is look at the 49ers' and Rams' rosters. I've loaded them into tables below, and printed out the first few lines.

In [198]:
sf_full_roster = pd.read_csv('data/49ersRoster.csv') #This is the complete San Francisco 49ers 2018 Roster
rams_full_roster = pd.read_csv('data/RamsRoster.csv') #This is the complete Los Angeles Rams 2018 Roster

In [199]:
sf_full_roster.head()

Unnamed: 0,No.,Player,Age,Pos,G,GS,Wt,Ht,College/Univ,BirthDate,Yrs,AV,Drafted (tm/rnd/yr),Salary
0,4.0,Nick Mullens\MullNi00,23.0,QB,8,8.0,210.0,6-1,Southern Miss,3/21/1995,Rook,6.0,,$129200
1,10.0,Jimmy Garoppolo\GaroJi00,27.0,qb,3,3.0,225.0,6-2,East. Illinois,11/2/1991,4,2.0,New England Patriots / 2nd / 62nd pick / 2014,$6200000
2,3.0,C.J. Beathard\BeatC.00,25.0,qb,6,5.0,215.0,6-2,Iowa,11/16/1993,1,3.0,San Francisco 49ers / 3rd / 104th pick / 2017,$625393
3,41.0,Jeff Wilson\WilsJe01,23.0,rb,6,2.0,194.0,6-0,,11/16/1995,Rook,2.0,,$129200
4,36.0,Alfred Morris\MorrAl00,30.0,rb,12,1.0,224.0,5-10,Florida Atlantic,12/12/1988,6,3.0,Washington Redskins / 6th / 173rd pick / 2012,$790000


In [200]:
rams_full_roster.head()

Unnamed: 0,No.,Player,Age,Pos,G,GS,Wt,Ht,College/Univ,BirthDate,Yrs,AV,Drafted (tm/rnd/yr),Salary
0,8.0,Brandon Allen\AlleBr00,26.0,,1,0.0,209.0,6-2,Arkansas,9/5/1992,Rook,0.0,Jacksonville Jaguars / 6th / 201st pick / 2016,$630000
1,55.0,Brian Allen\AlleBr02,23.0,,12,0.0,303.0,6-2,Michigan St.,10/11/1995,Rook,1.0,Los Angeles Rams / 4th / 111th pick / 2018,$480000
2,35.0,C.J. Anderson\AndeC.00,27.0,rb,2,2.0,225.0,5-8,California,2/10/1991,5,2.0,,$92941
3,26.0,Mark Barron\BarrMa00,29.0,LB,12,12.0,230.0,6-2,Alabama,10/27/1989,6,5.0,Tampa Bay Buccaneers / 1st / 7th pick / 2012,$6499999
4,66.0,Austin Blythe\BlytAu00,26.0,RG,16,16.0,298.0,6-3,Iowa,6/16/1992,2,10.0,Indianapolis Colts / 7th / 248th pick / 2016,$630000


***

### Data Cleaning

Unfortunately, our data is neither complete nor perfect, and there's a lot of work to be done on it. To be honest, unless you're extremely interested in the nitty-gritty of data science, I recommend you skip through this part, because it's a little annoying, even more boring, and overall doesn't need to be fully read through to understand the analysis at the end.

As we said before, we only care about the skill positions, but this data has every player. The function below filters out the players we don't care about, drops entries that don't have a position listed (`NaN` entries), and cleans the position data so that it's uniformly formatted. In this case, that just means making every position upper case, which will make the data easier to work with later.


In [201]:
sf_skill_positions = spf.clean_position(sf_full_roster)
rams_skill_positions = spf.clean_position(rams_full_roster)
#See the attached .py file for the code that does this

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  df['Pos'] = [str.upper(x) for x in df['Pos']]
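For the curious, the warning above shows up because a column is being assigned on a filtered slice of the roster. Here's a minimal sketch of how a function like this could avoid it. To be clear, this is NOT the actual `spf.clean_position` code (that lives in the .py file); the name `clean_position_sketch` and the `SKILL_POSITIONS` set are my own stand-ins, and the RB/WR/TE list is an assumption based on the positions that survive in the cleaned tables.

```python
import pandas as pd

# Assumption: the only positions that survive the cleaning are RB, WR, and TE
SKILL_POSITIONS = {"RB", "WR", "TE"}

def clean_position_sketch(roster):
    """Drop rows with no position, upper-case 'Pos', keep skill positions only."""
    # .copy() gives us an independent table, so assigning to a column
    # below does not trigger the SettingWithCopyWarning
    df = roster.dropna(subset=["Pos"]).copy()
    df["Pos"] = df["Pos"].str.upper()
    df = df[df["Pos"].isin(SKILL_POSITIONS)]
    # Renumber the rows from 0 after filtering
    return df.reset_index(drop=True)

# Tiny toy roster (hypothetical rows, just to show the behavior)
toy_roster = pd.DataFrame({
    "Player": ["Nick Mullens", "Jeff Wilson", "Brian Allen", "George Kittle"],
    "Pos": ["QB", "rb", None, "TE"],
})
cleaned = clean_position_sketch(toy_roster)
print(cleaned)
```

The quarterback and the player with no listed position drop out, and `rb` becomes `RB`.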


In [202]:
sf_skill_positions

Unnamed: 0,No.,Player,Age,Pos,G,GS,Wt,Ht,College/Univ,BirthDate,Yrs,AV,Drafted (tm/rnd/yr),Salary
0,41.0,Jeff Wilson\WilsJe01,23.0,RB,6,2.0,194.0,6-0,,11/16/1995,Rook,2.0,,$129200
1,36.0,Alfred Morris\MorrAl00,30.0,RB,12,1.0,224.0,5-10,Florida Atlantic,12/12/1988,6,3.0,Washington Redskins / 6th / 173rd pick / 2012,$790000
2,22.0,Matt Breida\BreiMa00,23.0,RB,14,13.0,190.0,5-10,Georgia Southern,2/28/1995,1,6.0,,$555000
3,18.0,Dante Pettis\PettDa00,23.0,WR,12,7.0,195.0,6-1,Washington,10/23/1995,Rook,3.0,San Francisco 49ers / 2nd / 44th pick / 2018,$480000
4,13.0,Richie James\JameRi00,23.0,WR,13,2.0,185.0,5-9,Middle Tenn. St.,9/5/1995,Rook,2.0,San Francisco 49ers / 7th / 240th pick / 2018,$480000
5,11.0,Marquise Goodwin\GoodMa00,28.0,WR,11,8.0,180.0,5-9,Texas,11/19/1990,5,3.0,Buffalo Bills / 3rd / 78th pick / 2013,$1450000
6,15.0,Pierre Garcon\GarcPi00,32.0,WR,8,8.0,211.0,6-0,Mount Union,8/8/1986,10,2.0,Indianapolis Colts / 6th / 205th pick / 2008,$6625000
7,84.0,Kendrick Bourne\BourKe00,23.0,WR,16,8.0,203.0,6-1,East. Washington,8/4/1995,1,4.0,,$555000
8,85.0,George Kittle*\KittGe00,25.0,TE,16,16.0,250.0,6-4,Iowa,10/9/1993,1,10.0,San Francisco 49ers / 5th / 146th pick / 2017,$555000
9,88.0,Garrett Celek\CeleGa00,30.0,TE,15,1.0,252.0,6-5,Michigan St.,5/29/1988,6,1.0,,$1650000


In [203]:
rams_skill_positions

Unnamed: 0,No.,Player,Age,Pos,G,GS,Wt,Ht,College/Univ,BirthDate,Yrs,AV,Drafted (tm/rnd/yr),Salary
0,35.0,C.J. Anderson\AndeC.00,27.0,RB,2,2.0,225.0,5-8,California,2/10/1991,5,2.0,,$92941
1,12.0,Brandin Cooks\CookBr00,25.0,WR,16,16.0,183.0,5-10,Oregon St.,9/25/1993,4,13.0,New Orleans Saints / 1st / 20th pick / 2014,$4000000
2,30.0,Todd Gurley*+\GurlTo01,24.0,RB,14,14.0,224.0,6-1,Georgia,8/3/1994,3,16.0,St. Louis Rams / 1st / 10th pick / 2015,$950000
3,89.0,Tyler Higbee\HigbTy00,25.0,TE,16,16.0,255.0,6-6,Western Kentucky,1/1/1993,2,3.0,Los Angeles Rams / 4th / 110th pick / 2016,$630000
4,18.0,Cooper Kupp\KuppCo00,25.0,WR,8,8.0,208.0,6-2,East. Washington,6/15/1993,1,6.0,Los Angeles Rams / 3rd / 69th pick / 2017,$640000
5,83.0,Josh Reynolds\ReynJo00,23.0,WR,16,8.0,196.0,6-3,Texas A&M,2/16/1995,1,4.0,Los Angeles Rams / 4th / 117th pick / 2017,$555000
6,17.0,Robert Woods\WoodRo02,26.0,WR,16,16.0,195.0,6-0,USC,4/10/1992,5,14.0,Buffalo Bills / 2nd / 41st pick / 2013,$790000


So now we've cleaned this data a little bit, but we've still got a few problems with it:


1. We have columns we don't care about. We care about the player, their traits, their stats, their college and when they were drafted (because this will help us find some more information on them later), and that's about it. We should get rid of all the other extra stuff.

2. If you look at the player entries, there's a whole bunch of nonsense after their names because of the way the data was received. We wanna get rid of this, just to make it look prettier.

To do the first one, we'll write a quick function to get rid of the columns we don't want and run each DataFrame through it:

In [204]:
sf_skill_positions = spf.drop_unnecessary_columns_rosters(sf_skill_positions)
rams_skill_positions = spf.drop_unnecessary_columns_rosters(rams_skill_positions)
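If you're wondering what a column-dropping function might look like, here's a rough sketch. The `KEEP_COLUMNS` list is my guess based on the description above (player, traits, games, college, draft info), and `drop_columns_sketch` is a made-up name; the real `spf.drop_unnecessary_columns_rosters` may keep a different set.

```python
import pandas as pd

# Assumption: a guess at the columns worth keeping, not the real function's list
KEEP_COLUMNS = ["Player", "Age", "Pos", "G", "GS", "Wt", "Ht",
                "College/Univ", "Drafted (tm/rnd/yr)"]

def drop_columns_sketch(roster, keep=KEEP_COLUMNS):
    # Only ask for columns that actually exist in this table,
    # so a missing column doesn't raise a KeyError
    present = [col for col in keep if col in roster.columns]
    return roster[present].copy()

# Toy one-row roster (values taken from the 49ers table above)
toy = pd.DataFrame({
    "No.": [85.0], "Player": ["George Kittle"], "Pos": ["TE"],
    "Wt": [250.0], "Ht": ["6-4"], "College/Univ": ["Iowa"],
    "Salary": ["$555000"],
})
trimmed = drop_columns_sketch(toy)
print(list(trimmed.columns))
```

Listing the columns to keep (rather than the ones to drop) means any unexpected extra column in a new file gets removed automatically.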

And similarly with the second issue, we'll have to write a (slightly more complicated) function, but still just a function:

In [210]:
sf_skill_positions = spf.clean_player_column(sf_skill_positions)
rams_skill_positions = spf.clean_player_column(rams_skill_positions)
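Here's one way the name cleanup could work, as a sketch only. I'm assuming the junk always follows a backslash (like `\KittGe00`) and that any trailing `*` or `+` characters are markers we don't need; `clean_player_sketch` is a hypothetical name, not the function in the .py file.

```python
import pandas as pd

def clean_player_sketch(df):
    out = df.copy()
    # Keep everything before the first backslash ("George Kittle*")
    names = out["Player"].str.split("\\", n=1).str[0]
    # Strip trailing '*' and '+' markers, then any leftover whitespace
    out["Player"] = names.str.rstrip("*+").str.strip()
    return out

# Raw names copied from the roster tables above
raw = pd.DataFrame({
    "Player": ["George Kittle*\\KittGe00",
               "Todd Gurley*+\\GurlTo01",
               "Matt Breida\\BreiMa00"],
})
tidy = clean_player_sketch(raw)
print(tidy["Player"].tolist())
```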

So we've made some progress, but we probably want some stats to go along with these players, or else this is all just useless information. So we're going to have to go get more data, and for our purposes we probably want:

- Their NFL Statistics, while in this system, to see how successful they've been
- Their College Statistics, to compare their production to the people in this year's draft
- Their Combine Data, to compare athletic profiles

Time to go hunting. First the NFL stats. We'll be using 2017 and 2018 because these are the years that McVay and Shanahan were head coaches for their respective teams.

In [265]:
rams_stats_2018 = pd.read_csv('data/RamsRecAndRush2018.csv', skiprows = [0])
rams_stats_2017 = pd.read_csv('data/RamsRecAndRush2017.csv', skiprows = [0])
sf_stats_2018 = pd.read_csv('data/49ersRecAndRush2018.csv', skiprows = [0])
sf_stats_2017 = pd.read_csv('data/49ersRecAndRush2017.csv', skiprows = [0])

In [266]:
rams_stats_2018.head()

Unnamed: 0,No.,Player,Age,Pos,G,GS,Att,Yds,TD,Lng,...,TD.1,Lng.1,R/G,Y/G.1,Ctch%,Touch,Y/Tch,YScm,RRTD,Fmb
0,30.0,Todd Gurley*+\GurlTo01,24.0,RB,14,14.0,256,1251,17,36.0,...,4.0,56.0,4.2,41.4,72.8%,315.0,5.8,1831,21,1
1,34.0,Malcolm Brown\BrowMa03,25.0,,12,0.0,43,212,0,19.0,...,1.0,18.0,0.4,4.3,71.4%,48.0,5.5,264,1,0
2,35.0,C.J. Anderson\AndeC.00,27.0,rb,2,2.0,43,299,2,46.0,...,0.0,13.0,2.0,8.5,66.7%,47.0,6.7,316,2,0
3,16.0,Jared Goff*\GoffJa00,24.0,QB,16,16.0,43,108,2,16.0,...,,,,,,43.0,2.5,108,2,12
4,42.0,John Kelly\KellJo00,22.0,,4,0.0,27,74,0,7.0,...,0.0,18.0,0.5,6.8,66.7%,29.0,3.5,101,0,0


The data loaded in above are the stats for the Rams and 49ers for 2018 and 2017, but they need to get worked on a little bit before we can combine them with our other data.

First, the column names don't exactly make it clear as to what the values are, so we'll rename them by looking at the original data.

In [267]:
rams_stats_2018 = spf.fix_column_names(rams_stats_2018)
rams_stats_2017 = spf.fix_column_names(rams_stats_2017)
sf_stats_2018 = spf.fix_column_names(sf_stats_2018)
sf_stats_2017 = spf.fix_column_names(sf_stats_2017)

In [268]:
rams_stats_2018.head()

Unnamed: 0,No.,Player,Age,Pos,G,GS,Rush Att,Rush Yds,Rush TD,Rush Lng,...,Receiving TD,Receiving Lng,R/G,Receiving Y/G,Ctch%,Touch,Y/Tch,YScm,RRTD,Fmb
0,30.0,Todd Gurley*+\GurlTo01,24.0,RB,14,14.0,256,1251,17,36.0,...,4.0,56.0,4.2,41.4,72.8%,315.0,5.8,1831,21,1
1,34.0,Malcolm Brown\BrowMa03,25.0,,12,0.0,43,212,0,19.0,...,1.0,18.0,0.4,4.3,71.4%,48.0,5.5,264,1,0
2,35.0,C.J. Anderson\AndeC.00,27.0,rb,2,2.0,43,299,2,46.0,...,0.0,13.0,2.0,8.5,66.7%,47.0,6.7,316,2,0
3,16.0,Jared Goff*\GoffJa00,24.0,QB,16,16.0,43,108,2,16.0,...,,,,,,43.0,2.5,108,2,12
4,42.0,John Kelly\KellJo00,22.0,,4,0.0,27,74,0,7.0,...,0.0,18.0,0.5,6.8,66.7%,29.0,3.5,101,0,0
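A quick note on where names like `TD.1` come from: when a file has two columns with the same name (rushing `TD` and receiving `TD`), pandas tacks `.1` onto the second one when it reads the file. A renaming function can then be little more than a lookup table. The sketch below only covers the renames visible in the output above, plus `Yds.1`, which I'm assuming maps to `Receiving Yds`; `RENAMES` and `fix_column_names_sketch` are hypothetical names, not the real `spf.fix_column_names`.

```python
import pandas as pd

# Renames read off the before/after tables above.
# Assumption: "Yds.1" is the receiving-yards column.
RENAMES = {
    "Att": "Rush Att", "Yds": "Rush Yds", "TD": "Rush TD", "Lng": "Rush Lng",
    "Yds.1": "Receiving Yds", "TD.1": "Receiving TD",
    "Lng.1": "Receiving Lng", "Y/G.1": "Receiving Y/G",
}

def fix_column_names_sketch(stats):
    # rename() silently skips keys that aren't in the table,
    # so the same mapping works across all four files
    return stats.rename(columns=RENAMES)

# One toy row with a handful of the ambiguous columns
toy = pd.DataFrame([[256, 1251, 17, 580, 4]],
                   columns=["Att", "Yds", "TD", "Yds.1", "TD.1"])
renamed = fix_column_names_sketch(toy)
print(list(renamed.columns))
```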


We also have the same issue with the player names that we had before, but we can just use our old function for these data sets too!

In [319]:
rams_stats_2018 = spf.clean_player_column(rams_stats_2018)
rams_stats_2017 = spf.clean_player_column(rams_stats_2017)
sf_stats_2018 = spf.clean_player_column(sf_stats_2018)
sf_stats_2017 = spf.clean_player_column(sf_stats_2017)

In [320]:
rams_stats_2018.head()

Unnamed: 0,No.,Player,Age,Pos,G,GS,Rush Att,Rush Yds,Rush TD,Rush Lng,...,Receiving TD,Receiving Lng,R/G,Receiving Y/G,Ctch%,Touch,Y/Tch,YScm,RRTD,Fmb
0,30.0,Todd Gurley,24.0,RB,14,14.0,256,1251,17,36.0,...,4.0,56.0,4.2,41.4,72.8%,315.0,5.8,1831,21,1
1,34.0,Malcolm Brown,25.0,,12,0.0,43,212,0,19.0,...,1.0,18.0,0.4,4.3,71.4%,48.0,5.5,264,1,0
2,35.0,C.J. Anderson,27.0,rb,2,2.0,43,299,2,46.0,...,0.0,13.0,2.0,8.5,66.7%,47.0,6.7,316,2,0
3,16.0,Jared Goff,24.0,QB,16,16.0,43,108,2,16.0,...,,,,,,43.0,2.5,108,2,12
4,42.0,John Kelly,22.0,,4,0.0,27,74,0,7.0,...,0.0,18.0,0.5,6.8,66.7%,29.0,3.5,101,0,0


Now, since we have two years' worth of stats, we want to find a good way to combine these two seasons. I've decided it's best to go on a per-game basis for attempts, receptions, and touchdowns, and a per-reception/per-attempt basis for yards, as I believe this will limit the effect that missed games have. With two years (a maximum of 32 games) as our sample size, this should also reduce the risk of unusually large or small outliers (looking at you, Derrick Henry) greatly affecting our data. Catch percentage will also be recalculated over the two-year span.

Before combining, however, we can fill all null values in these stats DataFrames with zero. After exploring the data, these appear to be what's called `Missing by Design` nulls: the `NaN` in this case just means 0 targets/receptions/etc.

In [450]:
rams_stats_2018 = rams_stats_2018.fillna(0)
rams_stats_2017 = rams_stats_2017.fillna(0)
sf_stats_2018 = sf_stats_2018.fillna(0)
sf_stats_2017 = sf_stats_2017.fillna(0)

In [451]:
rams_stats = spf.merge_stats_cols(rams_stats_2017, rams_stats_2018)
sf_stats = spf.merge_stats_cols(sf_stats_2017, sf_stats_2018)

In [452]:
rams_stats.head()

Unnamed: 0,Player,G,GS,Rush Att,Rush Yds,Rush TD,Tgt,Rec,Receiving Yds,Receiving TD,Touch,Catch Percentage
0,Todd Gurley,29.0,29.0,535.0,2556.0,30.0,168.0,123.0,1368.0,10.0,658.0,0.732143
1,Malcolm Brown,23.0,1.0,106.0,458.0,1.0,18.0,14.0,105.0,1.0,120.0,0.777778
2,Tavon Austin,16.0,9.0,59.0,270.0,1.0,22.0,13.0,47.0,0.0,72.0,0.590909
3,Jared Goff,31.0,31.0,71.0,159.0,3.0,0.0,0.0,0.0,0.0,71.0,
4,Lance Dunbar,4.0,0.0,11.0,51.0,1.0,3.0,1.0,1.0,0.0,12.0,0.333333


In [454]:
sf_stats.head()

Unnamed: 0,Player,G,GS,Rush Att,Rush Yds,Rush TD,Tgt,Rec,Receiving Yds,Receiving TD,Touch,Catch Percentage
0,Carlos Hyde,16.0,16.0,240.0,938.0,8.0,88.0,59.0,350.0,0.0,299.0,0.670455
1,Matt Breida,30.0,13.0,258.0,1279.0,5.0,67.0,48.0,441.0,3.0,306.0,0.716418
2,C.J. Beathard,13.0,10.0,45.0,205.0,4.0,0.0,0.0,0.0,0.0,45.0,
3,Jimmy Garoppolo,9.0,8.0,23.0,44.0,1.0,1.0,1.0,-6.0,0.0,24.0,1.0
4,Kyle Juszczyk,30.0,24.0,15.0,61.0,0.0,83.0,63.0,639.0,2.0,78.0,0.759036
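For anyone curious how two seasons can be rolled into one row per player, here's a sketch under a few assumptions: stack the two tables, add up the counting stats for each player, and recompute catch percentage from the totals (receptions divided by targets). That last step lines up with the tables above (Gurley's 123 catches on 168 targets is about 0.732), but `merge_stats_sketch` and `COUNTING_COLS` are my own hypothetical names, and the real `spf.merge_stats_cols` may be written differently.

```python
import pandas as pd

# Assumption: these are the columns that get added up across seasons
COUNTING_COLS = ["G", "GS", "Rush Att", "Rush Yds", "Rush TD",
                 "Tgt", "Rec", "Receiving Yds", "Receiving TD", "Touch"]

def merge_stats_sketch(first, second, cols=COUNTING_COLS):
    # Stack the two seasons on top of each other
    stacked = pd.concat([first, second], ignore_index=True)
    # One row per player; sort=False keeps order of first appearance
    totals = stacked.groupby("Player", sort=False)[cols].sum().reset_index()
    # Recompute from totals instead of averaging two percentages.
    # A player with 0 targets gets NaN here (0 divided by 0).
    totals["Catch Percentage"] = totals["Rec"] / totals["Tgt"]
    return totals

# Toy seasons with made-up players A, B, and C
year_one = pd.DataFrame({"Player": ["A", "B"], "G": [16, 8],
                         "Tgt": [10, 0], "Rec": [7, 0]})
year_two = pd.DataFrame({"Player": ["A", "C"], "G": [14, 4],
                         "Tgt": [10, 5], "Rec": [8, 1]})
merged = merge_stats_sketch(year_one, year_two, cols=["G", "Tgt", "Rec"])
print(merged)
```

Players who only appear in one season (B and C here) still get a row, and a player with no targets ends up with an empty catch percentage, just like the quarterbacks in the tables above.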


Ok, so we've been able to combine each team's years together, and now for the easy part: converting each statistic to a per-game statistic.