# 1. Reading the CSV File:

In [2]:
import pandas as pd

acc_players = pd.read_csv('/Users/emorywise/Desktop/DS2002 Data/acc_players-2324F.csv')

# The next two lines of code were provided by ChatGPT:
acc_players.columns = acc_players.iloc[0] # Make the first row the column headers
acc_players = acc_players[1:].reset_index(drop=True) # Drop the first row, which is now redundant

acc_players.head()

Unnamed: 0,Rk,Player,Class,Pos,School,G,MP,TRB,AST,STL,...,TOV,PF,PTS,FG%,2P%,3P%,FT%,PER,WS,BPM
0,1,Amaree Abram,SO,G,Georgia Tech,10,108,17,11,1,...,9,10,34,0.262,0.318,0.2,0.615,4.1,-0.1,-6.4
1,2,Sola Adebisi,FR,F,Florida State,7,9,1,1,0,...,1,1,2,0.5,0.5,,,3.5,0.0,-6.7
2,3,Prince Aligbe,SO,F,Boston College,35,651,119,21,15,...,30,51,164,0.435,0.522,0.147,0.62,9.4,0.9,-1.3
3,4,Abe Atiyeh,SR,G,Boston College,4,6,0,0,0,...,1,0,3,0.333,0.0,1.0,,1.1,0.0,-8.2
4,5,Zack Austin,JR,F,Pittsburgh,33,746,137,29,32,...,13,35,216,0.417,0.563,0.295,0.737,18.0,2.9,7.9
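
The header fix in the cell above promotes the first data row to column names after loading. Assuming the file's first line is an extra grouping row and the real column names sit on the second line (which is what that fix implies), the same result can probably be reached by passing `header=1` to `pd.read_csv`. The sketch below uses a small made-up CSV string rather than the real file, so its rows and values are purely illustrative.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file: line 1 is a grouping row,
# line 2 holds the actual column names.
raw = (
    ",,,Totals,Totals\n"
    "Rk,Player,School,MP,PTS\n"
    "1,Player A,School X,108,34\n"
    "2,Player B,School Y,651,164\n"
)

# header=1 tells pandas to use the second line (0-indexed) as the header
demo = pd.read_csv(io.StringIO(raw), header=1)

print(demo.columns.tolist())
print(demo.dtypes)
```

Because the header text never ends up among the data rows in this version, pandas can infer numeric dtypes for columns such as `MP` and `PTS` at load time, which would likely make the later `pd.to_numeric` calls unnecessary.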


# 2. Basic Analysis:

In [40]:
acc_players['PTS'] = pd.to_numeric(acc_players['PTS'], errors='coerce') # Changes 'PTS' to numeric so I can sum them
acc_players['MP'] = pd.to_numeric(acc_players['MP'], errors='coerce') # Changes MP to numeric
acc_players['TRB'] = pd.to_numeric(acc_players['TRB'], errors='coerce') # Changes TRB to numeric

total_pts = acc_players['PTS'].sum() # Sum points

max_mins = acc_players[['Player', 'MP']].sort_values(by='MP', ascending=False).head(1).reset_index(drop=True) # Finds the player with the highest MP

top_rbds = acc_players[['Player', 'TRB']].sort_values(by='TRB', ascending=False).head(5).reset_index(drop=True) # Finds the top 5 players with the most rebounds

print('Total combined points: ')
print(total_pts)

print('\nPlayer with the most minutes: ')
print(max_mins)

print('\nTop 5 players with the most rebounds: ')
print(top_rbds)

Total combined points: 
38411

Player with the most minutes: 
0         Player    MP
0  Casey Morsell  1333

Top 5 players with the most rebounds: 
0           Player  TRB
0    Armando Bacot  380
1  Ian Schieffelin  340
2  Harrison Ingram  327
3   Mohamed Diarra  311
4    Norchad Omier  309
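
The `sort_values(...).head(n)` pattern used above works, but pandas also offers `idxmax` and `nlargest` for the same kind of question. Here is a minimal sketch on made-up rows (the `demo` DataFrame and its values are hypothetical, not taken from the ACC file):

```python
import pandas as pd

# Made-up rows for illustration only
demo = pd.DataFrame({
    "Player": ["A", "B", "C", "D"],
    "MP": [900, 1200, 300, 750],
    "TRB": [210, 95, 40, 310],
})

# Row with the largest MP: idxmax returns the index label of the maximum
max_mins = demo.loc[demo["MP"].idxmax(), ["Player", "MP"]]

# Top 2 rebounders: nlargest returns the n rows with the largest values,
# already ordered from highest to lowest
top_rbds = demo.nlargest(2, "TRB")[["Player", "TRB"]].reset_index(drop=True)

print(max_mins)
print(top_rbds)
```

Note that `max_mins` here is a Series for a single row rather than a one-row DataFrame, so it prints differently from the output above.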


# 3. Player Filtering:

In [41]:
acc_players['AST'] = pd.to_numeric(acc_players['AST'], errors='coerce') # Changes AST to numeric
acc_players['BLK'] = pd.to_numeric(acc_players['BLK'], errors='coerce') # Changes BLK to numeric

most_active_players = acc_players[acc_players['MP'] > 500].reset_index(drop=True) # DataFrame of only players with over 500 MP

top_assists_player = most_active_players[['Player', 'AST']].sort_values(by='AST', ascending=False).head(1).reset_index(drop=True) # Finds the player with the highest total assists

top_three_assists = most_active_players[['Player', 'AST']].sort_values(by='AST', ascending=False).head(3).reset_index(drop=True) # Finds the three players with the highest total assists

top_three_blockers = most_active_players[['Player', 'BLK']].sort_values(by='BLK', ascending=False).head(3).reset_index(drop=True) # Finds the three players with the highest blocks

print('The player with the highest total assists: ')
print(top_assists_player)

print('\nTop 3 players with the most assists: ')
print(top_three_assists)

print('\nTop 3 players with the most blocks: ')
print(top_three_blockers)

The player with the highest total assists: 
0         Player  AST
0  Reece Beekman  212

Top 3 players with the most assists: 
0          Player  AST
0   Reece Beekman  212
1  Jaeden Zackery  152
2   Elliot Cadeau  150

Top 3 players with the most blocks: 
0         Player  BLK
0      Ryan Dunn   77
1   Quinten Post   61
2  Armando Bacot   56


In [52]:
grouped_school = acc_players.groupby('School') # Groups players by school

points_per_school = grouped_school['PTS'].sum() # Finds the points scored per school

assists_per_school = grouped_school['AST'].sum() # Finds the assists per school

top_three_points = points_per_school.sort_values(ascending=False).head(3) # Finds the top three teams with the most points scored, along with their points

print('Points scored per school: ')
print(points_per_school)

print('\nAssists per school: ')
print(assists_per_school)

print('\nTop three teams with the most points scored: ')
top_three_points

Points scored per school: 
School
Boston College    2667
Clemson           2785
Duke              2830
Florida State     2526
Georgia Tech      2272
Louisville        2304
Miami (FL)        2424
NC State          3101
North Carolina    3032
Notre Dame        2113
Pittsburgh        2495
Syracuse          2442
Virginia          2140
Virginia Tech     2547
Wake Forest       2733
Name: PTS, dtype: int64

Assists per school: 
School
Boston College    509
Clemson           533
Duke              551
Florida State     406
Georgia Tech      425
Louisville        356
Miami (FL)        454
NC State          536
North Carolina    536
Notre Dame        335
Pittsburgh        452
Syracuse          442
Virginia          509
Virginia Tech     514
Wake Forest       429
Name: AST, dtype: int64

Top three teams with the most points scored: 


School
NC State          3101
North Carolina    3032
Duke              2830
Name: PTS, dtype: int64
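
The per-school sums above are computed one column at a time. As an alternative sketch, named aggregation with `agg` can produce both totals in a single table. The `demo` DataFrame and the output column names `total_pts` and `total_ast` are made up for illustration.

```python
import pandas as pd

# Made-up rows for illustration only
demo = pd.DataFrame({
    "School": ["X", "X", "Y", "Y", "Z"],
    "PTS": [10, 20, 5, 7, 40],
    "AST": [3, 4, 1, 2, 9],
})

# One row per school, one column per aggregate
per_school = demo.groupby("School").agg(
    total_pts=("PTS", "sum"),
    total_ast=("AST", "sum"),
)

# Schools with the two highest point totals
top_two = per_school["total_pts"].nlargest(2)

print(per_school)
print(top_two)
```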

# 4. Reflection:

**What did you learn about working with CSV files and pandas DataFrames in this assignment?**

This assignment was extremely beneficial for improving my ability to manipulate and filter data. First, it showed me the importance of inspecting and cleaning data before conducting any analysis. Throughout the assignment, I converted column data types and adjusted the DataFrame piecemeal, as each question required. That approach is disorganized and inefficient; I should instead do the data cleaning and preparation up front, before trying to answer any questions. Second, I improved my ability to recognize how individual lines of code are structured and which methods can be used on which data types.
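
One way to do the cleaning before the analysis is to convert every numeric column in a single step right after loading. The sketch below is only an illustration: the `demo` rows and the `numeric_cols` list are made up, and the real file would need its own list of columns.

```python
import pandas as pd

# Made-up rows that mimic columns loaded as strings
demo = pd.DataFrame({
    "Player": ["A", "B", "C"],
    "MP": ["108", "651", "n/a"],
    "PTS": ["34", "164", "3"],
})

# Hypothetical list of columns that should be numeric
numeric_cols = ["MP", "PTS"]

# Convert each listed column once; values that cannot be parsed become NaN
for col in numeric_cols:
    demo[col] = pd.to_numeric(demo[col], errors="coerce")

print(demo.dtypes)
```

With the conversions gathered in one place, later cells can assume the columns are already numeric.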

**What was the most challenging aspect of this assignment, and how did you overcome it?**

The most challenging aspect of this assignment was the final question. I struggled to understand the grouped object and to apply methods to it. To overcome this, I made myself learn more about that data type: I displayed the data in several different ways until I understood it well enough to know what code to use and to explain exactly what I was doing.
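
A few ways of looking inside a grouped object are sketched below. This is not a record of the exact steps I took; the `demo` DataFrame is made up for illustration.

```python
import pandas as pd

# Made-up rows for illustration only
demo = pd.DataFrame({
    "School": ["X", "X", "Y"],
    "Player": ["A", "B", "C"],
    "PTS": [10, 20, 5],
})

grouped = demo.groupby("School")

# Number of rows in each group
print(grouped.size())

# The rows belonging to one group, as an ordinary DataFrame
print(grouped.get_group("X"))

# Iterating yields (group key, sub-DataFrame) pairs
for school, rows in grouped:
    print(school, rows["PTS"].sum())
```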

**How do you think the insights gained from analyzing ACC basketball statistics could be applied to other real-world datasets?**

The data analysis techniques I used here can be applied to a variety of other situations. In particular, the workflow I realized in retrospect I should have followed, cleaning first and analyzing second, works in any context. I didn't realize that understanding which parts of the data need cleaning, and carrying out that cleaning efficiently, would be such a big part of data analysis. Having the data already cleaned makes the analysis more effective and less time-consuming, and it saves you from searching back through code to find where a change was made. In the future, I will apply the methods I learned in this assignment to any analysis I do.