# The three hypotheses/questions we have for this data are:
## 1. Did NFL home teams perform better in 2018?
## 2. Did offenses get better as the season progressed?
## 3. Are passes thrown to the right side of the field more often than to the left? And how do throws to the middle and right compare with throws to the left in terms of air yards?

The data for this notebook can be found at https://www.kaggle.com/maxhorowitz/nflplaybyplay2009to2016?select=NFL+Play+by+Play+2009-2018+%28v5%29.csv
Please download the CSV that contains the NFL play-by-play data for 2009 to 2018 (NFL Play by Play 2009-2018 (v5).csv).

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
# requests and StringIO are only needed for the commented-out Google Drive download below
import requests
from io import StringIO

In [None]:
# This download code is not working, so you will have to download the file to your computer instead

#Read the csv into a pandas dataframe
#orig_url = 'https://drive.google.com/file/d/1JHf7_TE1t35QXq_2n3VYngD9BhMpTUl1/view?usp=sharing'
#file_id = orig_url.split('/')[-2]
#dwn_url='https://drive.google.com/uc?export=download&id=' + file_id
#url = requests.get(dwn_url).text
#csv_raw = StringIO(url)
#NFL = pd.read_csv(csv_raw)

In [None]:
#Take the location of the downloaded csv on your computer and enter it into the input
file = input('Enter the location of the downloaded CSV: ')

In [None]:
NFL = pd.read_csv(file)

### First, we entered the location of the NFL CSV, which is then read into a pandas DataFrame. Next, we create a DataFrame that includes only games played in 2018.

In [None]:
NFL2018 = NFL.copy()
NFL2018['game_date'] = pd.to_datetime(NFL2018['game_date'], format = '%Y-%m-%d')
NFL2018 = NFL2018.loc[NFL2018['game_date'].astype(str).str[:4] == '2018']
NFL2018
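Because `game_date` has already been converted to a datetime, the same 2018 filter can also be written with pandas' `.dt.year` accessor instead of slicing strings. A minimal sketch on a tiny frame; the rows and `game_id` values below are made up for illustration, not taken from the dataset:

```python
import pandas as pd

# Hypothetical sample rows standing in for the full play-by-play data
sample = pd.DataFrame({
    'game_id': [2017123100, 2018090600, 2018121600],
    'game_date': ['2017-12-31', '2018-09-06', '2018-12-16'],
})

# Parse the dates, then keep only rows whose calendar year is 2018
sample['game_date'] = pd.to_datetime(sample['game_date'], format='%Y-%m-%d')
sample_2018 = sample.loc[sample['game_date'].dt.year == 2018]
print(sample_2018)
```

Comparing integer years avoids converting every timestamp to a string, which can be noticeably slower on the full 2009 to 2018 file.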

## Every play in a game carries that game's game_id. Next, we find the last play of each game to get the final score, then compare how many times the home team won with how many times the away team won.

In [None]:
game = NFL2018.groupby('game_id')
FinalResults = game.last()
Scores = FinalResults[['total_home_score','total_away_score']]
Scores
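One subtlety worth knowing: `groupby(...).last()` returns the last *non-null* value of each column within a group, not necessarily the literal last row. For cumulative score columns that is normally the final score, but if you want the actual final row of each game, `groupby(...).tail(1)` returns it. A small illustration with made-up values:

```python
import pandas as pd

# Made-up plays: the last play of game 1 has a missing home score
plays = pd.DataFrame({
    'game_id': [1, 1, 1, 2, 2],
    'total_home_score': [0, 7, None, 3, 10],
    'total_away_score': [0, 3, 3, 0, 14],
})

# last(): for each column, the last non-null value in each group
by_last = plays.groupby('game_id').last()

# tail(1): the actual final row of each group, with missing values kept
by_tail = plays.groupby('game_id').tail(1).set_index('game_id')

print(by_last)
print(by_tail)
```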

In [None]:
# We will query here to count the instances where the home team won and the instances where the away team won
HomeWins = len(Scores.query('total_home_score > total_away_score'))
AwayWins = len(Scores.query('total_home_score < total_away_score'))
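NFL regular-season games can end in a tie, and the two strict comparisons above leave tied games out of both counts. A sketch that counts all three outcomes with boolean masks, on hypothetical final scores laid out like `Scores`:

```python
import pandas as pd

# Hypothetical final scores, one row per game (same columns as Scores)
scores = pd.DataFrame({
    'total_home_score': [24, 17, 21, 10],
    'total_away_score': [20, 27, 21, 13],
})

home_wins = int((scores['total_home_score'] > scores['total_away_score']).sum())
away_wins = int((scores['total_home_score'] < scores['total_away_score']).sum())
ties = int((scores['total_home_score'] == scores['total_away_score']).sum())

# Every game should fall into exactly one of the three outcomes
assert home_wins + away_wins + ties == len(scores)
print(home_wins, away_wins, ties)  # 1 2 1
```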

In [None]:
# Now visualize the counts of home vs. away wins in a bar chart
teams = ['Home', 'Away']
results = [HomeWins, AwayWins]
plt.bar(teams,results)
plt.xlabel('Home/Away')
plt.ylabel('Number of Wins')
plt.title('2018 NFL Home vs. Away Team Wins')
plt.show()

### Home teams won more games than away teams, so it appears to have been advantageous to be the home team in 2018.
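A bar chart alone does not say whether the gap could be due to chance. One way to check, assuming each non-tied game is an independent 50/50 outcome under the null hypothesis of no home advantage, is an exact two-sided binomial test. This sketch uses only the standard library; `binomial_two_sided_p` is a hypothetical helper, and the counts in the example call are placeholders, not the 2018 results.

```python
from math import comb

def binomial_two_sided_p(successes: int, trials: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: the total probability of all
    outcomes that are no more likely than the observed one."""
    probs = [comb(trials, k) * p**k * (1 - p)**(trials - k) for k in range(trials + 1)]
    observed = probs[successes]
    # Small relative tolerance so equally likely outcomes are not lost to rounding
    return min(1.0, sum(pr for pr in probs if pr <= observed * (1 + 1e-7)))

# Placeholder counts for illustration only
p_value = binomial_two_sided_p(successes=60, trials=100)
print(round(p_value, 4))
```

With the notebook's counts this would be `binomial_two_sided_p(HomeWins, HomeWins + AwayWins)`. If SciPy is available, `scipy.stats.binomtest` performs the same kind of test.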

## Next, we check whether offenses gained momentum over the year by comparing the total points scored in the first half of the season with the second half.

In [None]:
totalscore = Scores['total_home_score'] + Scores['total_away_score']

In [None]:
Scores2 = Scores.copy()
Scores2['points_scored'] = totalscore

In [None]:
Scores2

In [None]:
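# Compare the total points scored in the first half of the games with the second half.
# Scores2 is ordered by game_id (the groupby key), whose values begin with the game date in this dataset,
# so splitting at the midpoint separates earlier games from later ones.
half = len(Scores2) // 2
First_Half_Scores = Scores2[:half]
Second_Half_Scores = Scores2[half:]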

In [None]:
# Now we visualize the difference in points scored in a bar chart
hlfs = ['First Half', 'Second Half']
points = [First_Half_Scores['points_scored'].sum(), Second_Half_Scores['points_scored'].sum()]
plt.bar(hlfs,points)
plt.xlabel('First/Second Half of the year')
plt.ylabel('Total Points Scored')
plt.title('2018 Points Scored in First Half vs. Second Half of Season')
plt.show()
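Total points only compare fairly when both halves contain the same number of games. A sketch that splits at the midpoint and compares average points per game instead, assuming the rows are already in chronological order; `points_per_game_by_half` is a hypothetical helper and the six totals are made up:

```python
import pandas as pd

def points_per_game_by_half(points: pd.Series) -> tuple[float, float]:
    """Split chronologically ordered per-game point totals at the midpoint
    and return the mean points per game of each half.
    With an odd number of games, the second half gets the extra game."""
    half = len(points) // 2
    return points.iloc[:half].mean(), points.iloc[half:].mean()

# Made-up totals for six games, in date order
sample_points = pd.Series([40, 44, 45, 50, 38, 49])
first_avg, second_avg = points_per_game_by_half(sample_points)
print(first_avg, second_avg)
```

On the notebook's data the call would be `points_per_game_by_half(Scores2['points_scored'])`.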

### This is surprising, since many believe offenses struggle to reach their potential early in the season.

### A better way to measure offensive efficiency over the season may be to compute the offenses' average yards per play week by week, since many factors other than offensive efficiency (such as defensive or special-teams touchdowns and starting field position) contribute to the points scored that we measured earlier.

In [None]:
# To better measure offensive efficiency, we average yards per play over run and pass plays only, since kickoffs, punts, and field goals are not run by the offense.

In [None]:
# Create a DataFrame called Plays that is indexed by the play type
Plays = NFL2018.set_index('play_type')

In [None]:
#Now create the offensive plays dataframe by including only plays that were a run or a pass
Off_Plays = Plays.loc[["pass", "run"]]
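Selecting rows through the non-unique `play_type` index works, but a boolean mask built with `Series.isin` does the same without changing the index, and it does not raise a `KeyError` if one of the listed play types happens to be absent. Illustrated on made-up rows:

```python
import pandas as pd

# Hypothetical plays; only 'pass' and 'run' count as offensive plays here
plays = pd.DataFrame({
    'play_type': ['kickoff', 'run', 'pass', 'punt', 'pass', 'field_goal'],
    'yards_gained': [0, 4, 12, 0, -3, 0],
})

# True for run and pass plays, False for special-teams plays
is_offense = plays['play_type'].isin(['pass', 'run'])
off_plays = plays.loc[is_offense]
print(off_plays['yards_gained'].mean())
```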

In [None]:
#Sort the df by date 
Off_Plays = Off_Plays.sort_values(by="game_date")

In [None]:
# Get a list of the dates of games so that we can calculate the average yards per play per week
Off_Plays.game_date.unique()

In [None]:
# Throughout the season the NFL also plays Thursday night and Monday night games (and, in Week 15, Saturday games),
# so we combine those games with the Sunday games of the same week to get the average yards per play across all games played that week

In [None]:
# Change the game_date type to string to more easily group together games on certain dates
Off_Plays['game_date']= Off_Plays['game_date'].astype(str)

In [None]:
# Index the Off_Plays DataFrame by the game date
Off_Plays = Off_Plays.set_index('game_date')

In [None]:
#Week 1 average yards per play
week1 = Off_Plays.loc[['2018-09-06', '2018-09-09', '2018-09-10']]
avg1 = week1['yards_gained'].mean()

#Week 2 average yards per play
week2 = Off_Plays.loc[['2018-09-13', '2018-09-16', '2018-09-17']]
avg2 = week2['yards_gained'].mean()

#Week 3 average yards per play
week3 = Off_Plays.loc[['2018-09-20', '2018-09-23', '2018-09-24']]
avg3 = week3['yards_gained'].mean()

#Week 4 average yards per play
week4 = Off_Plays.loc[['2018-09-27', '2018-09-30', '2018-10-01']]
avg4 = week4['yards_gained'].mean()

#Week 5 average yards per play
week5 = Off_Plays.loc[['2018-10-04', '2018-10-07', '2018-10-08']]
avg5 = week5['yards_gained'].mean()

#Week 6 average yards per play
week6 = Off_Plays.loc[['2018-10-11', '2018-10-14', '2018-10-15']]
avg6 = week6['yards_gained'].mean()

#Week 7 average yards per play
week7 = Off_Plays.loc[['2018-10-18', '2018-10-21', '2018-10-22']]
avg7 = week7['yards_gained'].mean()

#Week 8 average yards per play
week8 = Off_Plays.loc[['2018-10-25', '2018-10-28', '2018-10-29']]
avg8 = week8['yards_gained'].mean()

#Week 9 average yards per play
week9 = Off_Plays.loc[['2018-11-01', '2018-11-04', '2018-11-05']]
avg9 = week9['yards_gained'].mean()

#Week 10 average yards per play
week10 = Off_Plays.loc[['2018-11-08', '2018-11-11', '2018-11-12']]
avg10 = week10['yards_gained'].mean()

#Week 11 average yards per play
week11 = Off_Plays.loc[['2018-11-15', '2018-11-18', '2018-11-19']]
avg11 = week11['yards_gained'].mean()

#Week 12 average yards per play
week12 = Off_Plays.loc[['2018-11-22', '2018-11-25', '2018-11-26']]
avg12 = week12['yards_gained'].mean()

#Week 13 average yards per play
week13 = Off_Plays.loc[['2018-11-29', '2018-12-02', '2018-12-03']]
avg13 = week13['yards_gained'].mean()

#Week 14 average yards per play
week14 = Off_Plays.loc[['2018-12-06', '2018-12-09', '2018-12-10']]
avg14 = week14['yards_gained'].mean()

#Week 15 average yards per play
week15 = Off_Plays.loc[['2018-12-13', '2018-12-15', '2018-12-16', '2018-12-17']]
avg15 = week15['yards_gained'].mean()
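The fifteen hand-written date lists above can also be derived. Assuming each NFL week runs Thursday through Wednesday, starting from the 2018 opener on Thursday 2018-09-06 (the first date in the Week 1 list), the week number is the count of whole 7-day periods since the opener, plus one. A sketch using `groupby`; `add_week_number` is a hypothetical helper and the sample plays are made up:

```python
import pandas as pd

def add_week_number(df: pd.DataFrame, season_start: str = '2018-09-06') -> pd.DataFrame:
    """Return a copy of df with a 'week' column computed from 'game_date'.
    Assumes weeks run Thursday through Wednesday starting at season_start."""
    out = df.copy()
    days = (pd.to_datetime(out['game_date']) - pd.Timestamp(season_start)).dt.days
    out['week'] = days // 7 + 1
    return out

# Made-up plays on a Thursday, Sunday, and Monday (Week 1) and a Saturday (Week 15)
sample = pd.DataFrame({
    'game_date': ['2018-09-06', '2018-09-09', '2018-09-10', '2018-12-15'],
    'yards_gained': [5, 7, 3, 10],
})
weekly = add_week_number(sample)
avg_by_week = weekly.groupby('week')['yards_gained'].mean()
print(avg_by_week)
```

Because `game_date` is the index of `Off_Plays` at this point, the notebook version would be something like `add_week_number(Off_Plays.reset_index()).groupby('week')['yards_gained'].mean()`.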

In [None]:
# Now put those averages into a line graph to chart the offenses' progress over the weeks
week_labels = [f'Wk{i}' for i in range(1, 16)]
weekly_avgs = [avg1, avg2, avg3, avg4, avg5, avg6, avg7, avg8, avg9, avg10, avg11, avg12, avg13, avg14, avg15]
plt.plot(week_labels, weekly_avgs)
plt.xlabel("Week of the Season")
plt.ylabel("Average Offensive Yards Per Play")
plt.title('2018 Weekly Average Yards Per Play')
# bottom/top replace the deprecated ymin/ymax keywords
plt.ylim(bottom=0, top=8)
plt.xticks(rotation='vertical')
plt.show()

#### I expected yards per play to increase week by week as offenses reached their potential. However, the chart does not support that hypothesis: on average, offenses did not get more efficient as the 2018 season progressed (through Week 15).
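To put a number on "no improvement", one option is the slope of a least-squares line through the weekly averages: a slope near zero means no steady week-over-week change. A standard-library sketch; `least_squares_slope` is a hypothetical helper and the example inputs are made up (with the notebook's values you would pass `[avg1, avg2, ..., avg15]`):

```python
def least_squares_slope(values: list[float]) -> float:
    """Slope of the ordinary least-squares line through (week, value) pairs,
    with weeks numbered 1, 2, ..., len(values). Needs at least two values."""
    n = len(values)
    weeks = range(1, n + 1)
    mean_x = sum(weeks) / n
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(weeks, values))
    var = sum((x - mean_x) ** 2 for x in weeks)
    return cov / var

# Made-up weekly averages that rise by 0.1 yards per week
print(least_squares_slope([5.0, 5.1, 5.2, 5.3]))
```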

## Since most NFL quarterbacks are right-handed, their vision may favor the right side of the field. Let's evaluate our hypotheses about throwing the ball to the right or left side of the field.

## First, are passes thrown to the right side of the field more than the left?

In [None]:
# Use the Plays DataFrame to create a DataFrame with only passes

In [None]:
Pass_Plays = Plays.loc[["pass"]]

In [None]:
# Count the total number of pass plays (the denominator for the percentages below)
TotalPasses = Pass_Plays.shape[0]

In [None]:
Pass_Plays['pass_location'].value_counts()
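Instead of typing the counts into a percentage calculation, `value_counts(normalize=True)` returns shares directly. Passing `dropna=False` keeps pass plays with a missing `pass_location` in the denominator, so the shares are out of all pass plays, matching the `TotalPasses` denominator computed above. A sketch on made-up locations:

```python
import pandas as pd

# Hypothetical pass_location values, including one missing location
locations = pd.Series(['right', 'left', 'right', 'middle', None, 'left', 'right', 'left'])

# Percentage of all pass plays, keeping missing locations in the denominator
shares = locations.value_counts(normalize=True, dropna=False) * 100
print(shares)
```

On the notebook's data: `Pass_Plays['pass_location'].value_counts(normalize=True, dropna=False) * 100`.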

In [None]:
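# Use the counts from value_counts() instead of hard-coding them
location_counts = Pass_Plays['pass_location'].value_counts()
print('A pass was thrown to the right:', (location_counts['right']/TotalPasses)*100, 'percent of the throws.')
print('A pass was thrown to the left:', (location_counts['left']/TotalPasses)*100, 'percent of the throws.')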

### Yes, passes are thrown to the right side of the field more often, but only slightly; quarterbacks did not seem to mind throwing to the left. Next, let's compare the distance of passes thrown to the middle and right with passes thrown to the left, since the middle and right of the field are more within a right-handed quarterback's vision.

In [None]:
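# Create separate right/middle and left DataFrames depending on the location of the pass in Pass_Plays
# (isin keeps only rows with a recorded right or middle location; != 'left' would also keep missing locations)
R_M_Passes = Pass_Plays.loc[Pass_Plays['pass_location'].isin(['right', 'middle'])]
L_Passes = Pass_Plays.loc[Pass_Plays['pass_location'] == 'left']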

In [None]:
# Take the average distance of throws in the air of the two different dataframes
RM_Pass_Length = R_M_Passes['air_yards'].mean()
L_Pass_Length = L_Passes['air_yards'].mean()
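A mean can be pulled around by a few long throws, so it may help to look at the median and the sample size for each location as well. A sketch with `groupby(...).agg` on made-up air-yards values:

```python
import pandas as pd

# Hypothetical passes with their location and air yards
passes = pd.DataFrame({
    'pass_location': ['left', 'left', 'middle', 'right', 'right', 'right'],
    'air_yards': [5, 25, 8, 2, 6, 40],
})

# Mean, median, and count of air yards for each pass location
summary = passes.groupby('pass_location')['air_yards'].agg(['mean', 'median', 'count'])
print(summary)
```

On the notebook's data: `Pass_Plays.groupby('pass_location')['air_yards'].agg(['mean', 'median', 'count'])`.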

In [None]:
#Now plot the averages to see if there is a notable difference
PassLoc = ['Right or Middle', 'Left']
passlength = [RM_Pass_Length, L_Pass_Length]
plt.bar(PassLoc,passlength)
plt.xlabel('Pass Location')
plt.ylabel('Average Distance Thrown (Yards)')
plt.title('Average Distance of Right/Middle vs. Left Passes')
plt.show()

## Another result I did not expect: there is barely any difference between the percentage of passes thrown to the right versus the left, and barely any difference between the average air yards of passes thrown to the middle/right versus the left.
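Whether "barely any difference" in average air yards is within noise can be checked with Welch's two-sample t statistic, which does not assume equal variances. For samples as large as a season of passes, the t statistic is close to standard normal under the null hypothesis, so this sketch approximates the two-sided p-value with the normal distribution; `welch_t` is a hypothetical helper. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` gives the t-distribution version.

```python
from math import erfc, sqrt
from statistics import mean, variance

def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic for mean(a) - mean(b) and a two-sided p-value
    from the normal approximation (reasonable only for large samples).
    Each sample needs at least two values and some spread."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    p_normal = erfc(abs(t) / sqrt(2))  # equals 2 * (1 - Phi(|t|))
    return t, p_normal

# Made-up samples for illustration
t_stat, p_approx = welch_t([2, 4, 6], [1, 2, 3])
print(t_stat, p_approx)
```

With the notebook's data, drop missing values first: `welch_t(R_M_Passes['air_yards'].dropna().tolist(), L_Passes['air_yards'].dropna().tolist())`.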