In [1]:
## Importing packages

# This R environment comes with all of CRAN and many other helpful packages preinstalled.
# You can see which packages are installed by checking out the kaggle/rstats docker image: 
# https://github.com/kaggle/docker-rstats

library(tidyverse) # metapackage with lots of helpful functions
library(data.table)
library(stargazer)
── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──

✔ ggplot2 3.2.1.9000     ✔ purrr   0.3.3     
✔ tibble  2.1.3          ✔ dplyr   0.8.3     
✔ tidyr   1.0.0          ✔ stringr 1.4.0     
✔ readr   1.3.1          ✔ forcats 0.4.0     

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()


Attaching package: ‘data.table’


The following objects are masked from ‘package:dplyr’:

    between, first, last


The following object is masked from ‘package:purrr’:

    transpose



Please cite as: 


 Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.

 R pac

In [48]:
list.files(path = "../input/nfl-playing-surface-analytics")
di <- fread("../input/nfl-playing-surface-analytics/InjuryRecord.csv")
dpt <- fread("../input/nfl-playing-surface-analytics/PlayerTrackData.csv")
dpl <- fread("../input/nfl-playing-surface-analytics/PlayList.csv")

## Data prep: convert the categorical columns to factors and create a Severity column as the sum of the four DM_M* indicator columns

In [49]:
di[,Surface:=as.factor(Surface)]
di[,BodyPart:=as.factor(BodyPart)]
dpl[,StadiumType:=as.factor(StadiumType)]
dpl[,FieldType:=as.factor(FieldType)]
dpl[,Weather:=as.factor(Weather)]
dpl[,PlayType:=as.factor(PlayType)]
dpl[,Position:=as.factor(Position)]

di[,Severity:=DM_M1+DM_M7+DM_M28+DM_M42]
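The four DM_M* columns appear to be cumulative flags (an injury counted under the 28-day column is also counted under the 1-day and 7-day columns), so their sum acts as an ordinal severity score from 1 to 4. A minimal sketch on made-up rows, assuming the data.table package is installed; the `toy` table is hypothetical and not taken from InjuryRecord.csv:

```r
library(data.table)

# Hypothetical toy rows: each DM_M* flag is 1 if the injury cost at least that many days
toy <- data.table(
  DM_M1  = c(1L, 1L, 1L, 1L),
  DM_M7  = c(0L, 1L, 1L, 1L),
  DM_M28 = c(0L, 0L, 1L, 1L),
  DM_M42 = c(0L, 0L, 0L, 1L)
)

# Same expression as in the cell above: the row-wise sum gives a 1-4 score
toy[, Severity := DM_M1 + DM_M7 + DM_M28 + DM_M42]
print(toy)
```

A score of 1 then means the player missed at least one day but fewer than seven, and a score of 4 means 42 days or more.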


### Pivoting the data by surface shows that synthetic turf has more recorded injuries than natural turf at all four severity thresholds (raw counts, not adjusted for the number of plays on each surface)

In [50]:
di[, .(DM_M1 =sum(DM_M1), DM_M7 =sum(DM_M7), 
                                 DM_M28=sum(DM_M28),DM_M42=sum(DM_M42) ), by=c("Surface")]


Surface,DM_M1,DM_M7,DM_M28,DM_M42
<fct>,<int>,<int>,<int>,<int>
Synthetic,57,41,22,16
Natural,48,35,15,13
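Since the two surfaces have different injury totals, it can help to look at each threshold as a share of all recorded injuries on that surface. The sketch below copies the counts from the table above into a base R data frame; the object names `counts` and `shares` are illustrative only. These are shares of injuries, not injury rates per play: an exposure-adjusted rate would also need the play counts from PlayList.csv.

```r
# Counts copied from the table above
counts <- data.frame(
  Surface = c("Synthetic", "Natural"),
  DM_M1   = c(57, 48),
  DM_M7   = c(41, 35),
  DM_M28  = c(22, 15),
  DM_M42  = c(16, 13)
)

# Share of all recorded injuries on each surface that reach each threshold
shares <- counts[, c("DM_M7", "DM_M28", "DM_M42")] / counts$DM_M1
shares <- cbind(Surface = counts$Surface, round(shares, 3))
print(shares)
```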


### Breaking down further by body part

In [51]:
di[order(BodyPart,Surface), .(DM_M1 =sum(DM_M1),DM_M7 =sum(DM_M7), 
                                 DM_M28=sum(DM_M28),DM_M42=sum(DM_M42) ), by=c("Surface","BodyPart")]


Surface,BodyPart,DM_M1,DM_M7,DM_M28,DM_M42
<fct>,<fct>,<int>,<int>,<int>,<int>
Natural,Ankle,17,9,3,3
Synthetic,Ankle,25,17,10,8
Natural,Foot,5,5,5,4
Synthetic,Foot,2,2,2,1
Natural,Heel,1,1,0,0
Natural,Knee,24,19,7,6
Synthetic,Knee,24,18,9,7
Natural,Toes,1,1,0,0
Synthetic,Toes,6,4,1,0


In [60]:
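# Left-join the injury records onto the play list by player and game.
# Plays without a matching injury record get NA in the DM_* columns; set those to 0.
d <- merge(dpl, di, by = c("PlayerKey", "GameID"), all.x = TRUE)
d[is.na(DM_M1) | is.na(DM_M7) | is.na(DM_M28) | is.na(DM_M42),
  `:=`(DM_M1 = 0L, DM_M7 = 0L, DM_M28 = 0L, DM_M42 = 0L)]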
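To see what this left join does row by row, here is a small sketch with hypothetical toy tables (the column names mirror the ones used above, but the values are invented). Note that joining on PlayerKey and GameID only attaches the injury record to every play of that game, not just the play on which the injury occurred.

```r
library(data.table)

# Hypothetical toy tables; column names mirror the ones used in the merge above
plays <- data.table(PlayerKey = c(1L, 1L, 2L),
                    GameID    = c("1-1", "1-1", "2-1"),
                    PlayKey   = c("1-1-1", "1-1-2", "2-1-1"))
inj   <- data.table(PlayerKey = 1L, GameID = "1-1", DM_M1 = 1L, DM_M42 = 0L)

# all.x = TRUE keeps every play; unmatched plays get NA in the injury columns
m <- merge(plays, inj, by = c("PlayerKey", "GameID"), all.x = TRUE)

# Replace the NAs with 0 so the flags can be used as a binary response
m[is.na(DM_M1), `:=`(DM_M1 = 0L, DM_M42 = 0L)]
print(m)
```

Both plays of player 1 carry the injury flag after the merge, which is worth keeping in mind when reading any model fitted on the merged table.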

In [85]:
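# Logistic regression of the 42+ day injury flag on body part, surface, and their interaction
lm.1 <- d[, glm(DM_M42 ~ BodyPart * Surface, family = binomial, maxit = 100)]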
summary(lm.1)



Call:
glm(formula = DM_M42 ~ BodyPart * Surface, family = binomial, 
    maxit = 100)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.22931  -0.76003  -0.50246  -0.00022   2.14837  

Coefficients: (1 not defined because of singularities)
                              Estimate Std. Error z value Pr(>|z|)    
(Intercept)                    -2.2030     0.1384 -15.921   <2e-16 ***
BodyPartFoot                    4.6009     0.2955  15.569   <2e-16 ***
BodyPartHeel                  -15.3631   907.6100  -0.017    0.986    
BodyPartKnee                    0.1971     0.1857   1.062    0.288    
BodyPartToes                  -15.3631   548.6235  -0.028    0.978    
SurfaceSynthetic                1.3969     0.1539   9.075   <2e-16 ***
BodyPartFoot:SurfaceSynthetic  -4.6667     0.3918 -11.910   <2e-16 ***
BodyPartHeel:SurfaceSynthetic       NA         NA      NA       NA    
BodyPartKnee:SurfaceSynthetic  -0.4851     0.2193  -2.212    0.027 *  
BodyPartToes:SurfaceSyn