# NBA Player Data

### One .csv from:
https://www.kaggle.com/drgilermo/nba-players-stats

### Data collected by Omri Goldstein and published on Kaggle, where it has 162 community votes. He scraped it from basketball-reference.com.

### Data Contents: NBA Player Stats from 1950-2017


* Seasons_Stats.csv: Detailed season stats for each player. The column headers cover the main game statistics, including:

    Team, games, games started, minutes played, player efficiency rating, true shooting percentage, 3-point attempt rate, free throw rate, offensive rebound percentage, etc.
    
    Size: roughly 24.7k rows x 53 columns




# Why this dataset?

## This dataset is needed to answer the following questions


# Questions:
* How have major injuries impacted player performance?
 
    * Do players perform the same or worse after returning?
    * Case study: Grant Hill, a Hall of Famer known for his injury-plagued career (chiefly recurring ankle injuries).
    
* How do LeBron James's stats reflect his age? Is LeBron slowing down with age, holding steady, or getting even better?

* Which center has the best FT%?

In [244]:
import pandas as pd 
import numpy as np
from functools import reduce


In [245]:
season_stats = pd.read_csv("nba-players-stats/Seasons_Stats.csv")

In [246]:
season_stats[:5]

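| Unnamed: 0.1 | Unnamed: 0 | Year | Player | Pos | Age | Tm | G | GS | MP | PER | ... | FT% | ORB | DRB | TRB | AST | STL | BLK | TOV | PF | PTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1950.0 | Curly Armstrong | G-F | 31.0 | FTW | 63.0 | NaN | NaN | NaN | ... | 0.705 | NaN | NaN | NaN | 176.0 | NaN | NaN | NaN | 217.0 | 458.0 |
| 1 | 1 | 1950.0 | Cliff Barker | SG | 29.0 | INO | 49.0 | NaN | NaN | NaN | ... | 0.708 | NaN | NaN | NaN | 109.0 | NaN | NaN | NaN | 99.0 | 279.0 |
| 2 | 2 | 1950.0 | Leo Barnhorst | SF | 25.0 | CHS | 67.0 | NaN | NaN | NaN | ... | 0.698 | NaN | NaN | NaN | 140.0 | NaN | NaN | NaN | 192.0 | 438.0 |
| 3 | 3 | 1950.0 | Ed Bartels | F | 24.0 | TOT | 15.0 | NaN | NaN | NaN | ... | 0.559 | NaN | NaN | NaN | 20.0 | NaN | NaN | NaN | 29.0 | 63.0 |
| 4 | 4 | 1950.0 | Ed Bartels | F | 24.0 | DNN | 13.0 | NaN | NaN | NaN | ... | 0.548 | NaN | NaN | NaN | 20.0 | NaN | NaN | NaN | 27.0 | 59.0 |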


# Cleaning:

* Problems: Some entries are NaN, for two reasons. Some stats didn't exist in certain years, e.g., 3-pointers weren't added until '79. In other years, certain stats (TOV, ORB, DRB, etc.) were simply not recorded. For these entries, the csv records null.


* This problem only applies to "antique" player records. Modern players do not have this gap in their entries.
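One way to see which columns are affected is to count the missing values per column. The sketch below rebuilds a few columns of the five rows shown in the `season_stats[:5]` output; the column subset and the names `sample`, `missing_share`, and `never_recorded` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# A few columns of the first five rows of Seasons_Stats.csv, as displayed above.
sample = pd.DataFrame({
    "Year": [1950.0] * 5,
    "Player": ["Curly Armstrong", "Cliff Barker", "Leo Barnhorst",
               "Ed Bartels", "Ed Bartels"],
    "G": [63.0, 49.0, 67.0, 15.0, 13.0],
    "MP": [np.nan] * 5,
    "PER": [np.nan] * 5,
    "FT%": [0.705, 0.708, 0.698, 0.559, 0.548],
    "TOV": [np.nan] * 5,
    "PTS": [458.0, 279.0, 438.0, 63.0, 59.0],
})

# Share of missing entries in each column (0.0 = complete, 1.0 = all missing).
missing_share = sample.isna().mean()

# Columns with no recorded value at all in these rows.
never_recorded = missing_share[missing_share == 1.0].index.tolist()
print(never_recorded)
```

On the full dataset, the same check would be `season_stats.isna().mean()`, which should make it easy to confirm that the gaps are concentrated in the early seasons.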

# Question 1
## Starting with a given player, find the year when their injury happened and look at some essential stats:
* G: Games
* MP: Minutes Played
* PER: Player Efficiency Rating
    * This is John Hollinger's "all in one basketball rating, which attempts to boil down all of a player's contributions into one number" - (https://en.wikipedia.org/wiki/Player_efficiency_rating)




# The Case of Grant Hill

* The seasons we're interested in are those before his injury (1995-2001) and those after it (2002-2013).

## First, we will examine his PER. 

In [283]:
# Return a player's average for one stat over an inclusive range of seasons.
# The stat parameter must name a numeric column, i.e. floats and integers only.
def average_stats(player_name, dataframe, stat, year_start, year_end):

    # Fail loudly on a non-numeric column instead of returning an error string,
    # which callers would otherwise try to subtract
    if not np.issubdtype(dataframe[stat].dtype, np.number):
        raise TypeError(f"Column '{stat}' is not numeric. Select a column that consists of numbers.")

    # Rows for the given player_name
    player_stats = dataframe.loc[dataframe["Player"] == player_name]

    # Keep only the seasons from year_start through year_end
    in_range = (player_stats["Year"] >= year_start) & (player_stats["Year"] <= year_end)

    # Values of the 'stat' column for those seasons
    stat_values = player_stats.loc[in_range, stat]

    # Series.mean() skips NaN entries and returns NaN when no rows match
    average = stat_values.mean()

    print(f"{average} is {player_name}'s average {stat} score between {year_start} to {year_end}.")

    return average

    
    
    
        

In [284]:
before_injury = average_stats("Grant Hill", season_stats, "PER", 1995, 2001)
after_injury = average_stats("Grant Hill", season_stats, "PER", 2002, 2013)
# Named per_decline so it does not clash with the injury_impact function
per_decline = before_injury - after_injury



21.87142857142857 is Grant Hill's average PER score between 1995 to 2001.
15.409090909090908 is Grant Hill's average PER score between 2002 to 2013.
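To put the two averages in perspective, the difference can also be expressed relative to the pre-injury level. This is a small arithmetic sketch using the two values printed above; the variable names are illustrative.

```python
# Averages printed by average_stats for Grant Hill's PER.
before_per = 21.87142857142857   # 1995 to 2001
after_per = 15.409090909090908   # 2002 to 2013

# Absolute drop in PER points and the same drop as a share of the earlier level.
absolute_drop = before_per - after_per
relative_drop = absolute_drop / before_per

print(f"PER fell by {absolute_drop:.2f} points ({relative_drop:.1%} of the pre-injury average).")
```

A relative figure like this makes it easier to compare players whose baseline PER differs.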


## Second, we'll look at his games played (G) and minutes played (MP)
* Two important stats that can serve as general indicators of a player's consistency and overall fitness.

In [285]:
# years holds four values: indices 0 and 1 are the first and last season before the injury,
# indices 2 and 3 are the first and last season after it
def injury_impact(player_name, stat, *years):

    before = average_stats(player_name, season_stats, stat, years[0], years[1])
    after = average_stats(player_name, season_stats, stat, years[2], years[3])

    # Positive means the stat dropped after the injury.
    # The local name differs from the function name to avoid shadowing it.
    impact = before - after

    if impact < 0:
        return f'The impact of injury is: {impact}. Fortunately, this player has improved after injury.'

    if impact == 0:
        return f'The impact of injury is: {impact}. This player is unchanged after injury.'

    return f'The impact of injury is: {impact}. Unfortunately, this player has declined after injury.'


In [282]:
print(injury_impact("Grant Hill", "G", 1995, 2001, 2002, 2013))

62.714285714285715 is Grant Hill's average G score between 1995 to 2001.
53.36363636363637 is Grant Hill's average G score between 2002 to 2013.
The impact of injury is: 9.350649350649348. Unfortunately, this player has declined after injury.


# Conclusion to Q1:
* Grant Hill's average PER and average games played (G) both declined after 2001, the season this analysis uses as the cutoff for his injury. Minutes played (MP) can be compared the same way by passing "MP" to `injury_impact`.
* This code doesn't account for other factors that can affect these stats, such as age, team changes, or position changes, but it gives a general idea of the impact of his injury on his NBA career.
* The functions used to answer Q1 can be used to analyze other players as well by changing the parameters. 
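The functions above handle one stat per call. As a possible extension, the sketch below computes before/after averages for several stats at once with a `groupby`. The helper `before_after_means`, its `last_year_before` parameter, and the toy rows for "Player A" and "Player B" are all hypothetical; the numbers are made up for illustration and are not real stats.

```python
import numpy as np
import pandas as pd

def before_after_means(dataframe, player_name, stats, last_year_before):
    """Mean of each stat before and after a cutoff season for one player."""
    rows = dataframe[dataframe["Player"] == player_name]
    # Label each season "before" (up to and including the cutoff) or "after".
    period = pd.Series(
        np.where(rows["Year"] <= last_year_before, "before", "after"),
        index=rows.index,
        name="period",
    )
    # Average every requested stat within each period, in a fixed row order.
    return rows.groupby(period)[list(stats)].mean().reindex(["before", "after"])

# Toy data with the same column names as Seasons_Stats.csv (values are invented).
toy = pd.DataFrame({
    "Player": ["Player A"] * 4 + ["Player B"] * 2,
    "Year": [1999, 2000, 2001, 2002, 2000, 2002],
    "G": [80, 70, 40, 50, 82, 82],
    "MP": [3000, 2600, 1200, 1500, 2900, 2800],
})

table = before_after_means(toy, "Player A", ["G", "MP"], 2000)
print(table)
```

With the real data, a call such as `before_after_means(season_stats, "Grant Hill", ["G", "MP", "PER"], 2001)` would put all three comparisons in one table.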

# Question 2