# Data Acquisition via File Ingestion (JSON Format)

---

## Setup and Ingestion

### Import Modules

In [1]:
# Import the modules used in this notebook:
# json for reading the JSON source file with the standard library.
# pandas for data ingestion, manipulation, and analysis.
# datetime for timestamping the exported CSV filename.
import json
import pandas as pd
from datetime import datetime

print("Modules imported.")

Modules imported.


### Define File Path and Ingestion Method
This cell defines the file path and uses the standard library's `json.load()` to read a JSON file whose top level is an array of records, producing a native Python list of dictionaries.

In [3]:
# Define the path to the JSON file
FILE_PATH = "nba_records.json" 

# --- 1. Load the JSON data from the file into a Python list ---
try:
    with open(FILE_PATH, 'r', encoding='utf-8') as file_pointer:
        # json.load() parses the whole file as a single JSON document;
        # here that document is a top-level array of team records.
        nba_data_list = json.load(file_pointer)
    
    print(f"File '{FILE_PATH}' successfully loaded into a Python list.")
    print(f"Loaded {len(nba_data_list)} records.")
    
except FileNotFoundError:
    print(f"ERROR: File not found at {FILE_PATH}. Please ensure the file is in the correct directory.")
    raise  # stop here instead of failing later with a NameError on nba_data_list

File 'nba_records.json' successfully loaded into a Python list.
Loaded 30 records.
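
`json.load()` returns whatever the top-level JSON value is, so a file that unexpectedly holds a single object instead of an array would still load without error and only fail later. A minimal sketch of a stricter loader, assuming the file holds one top-level JSON array of objects; the helper name `load_records` and the sample records are hypothetical, not part of the original notebook:

```python
import json
import os
import tempfile


def load_records(path):
    """Hypothetical helper: load a JSON file whose top level is an array of objects.

    Raises instead of printing, so a malformed file stops the notebook at the
    ingestion step rather than in a later cell.
    """
    with open(path, "r", encoding="utf-8") as fh:
        data = json.load(fh)  # parses the whole file as ONE JSON document

    if not isinstance(data, list):
        raise ValueError(f"Expected a top-level JSON array, got {type(data).__name__}")
    bad = [i for i, rec in enumerate(data) if not isinstance(rec, dict)]
    if bad:
        raise ValueError(f"Records at positions {bad} are not JSON objects")
    return data


# Usage with a small throwaway file (the two records are illustrative).
sample = [
    {"TEAM_ID": "1610612744", "TEAM_NAME": "Warriors", "PTS": 114.9},
    {"TEAM_ID": "1610612759", "TEAM_NAME": "Spurs", "PTS": 103.5},
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(sample, fh)
    records = load_records(path)

print(len(records))  # 2
```

A file with one JSON object per line (JSON Lines) is not a single document, so `json.load()` raises `json.JSONDecodeError` on it; for that layout, `pd.read_json(path, lines=True)` is the usual alternative.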


### Take a Look at Some of the Raw Data

In [4]:
nba_data_list[0:3]

[{'TEAM_ID': '1610612744',
  'TEAM_CITY': 'Golden State',
  'TEAM_NAME': 'Warriors',
  'TEAM_ABBREVIATION': 'GSW',
  'TEAM_CODE': '',
  'GP': 82,
  'MIN': 48.7,
  'PTS': 114.9,
  'PTS_DRIVE': 14.9,
  'FGP_DRIVE': 0.498,
  'PTS_CLOSE': 16.7,
  'FGP_CLOSE': 0.645,
  'PTS_CATCH_SHOOT': 33.7,
  'FGP_CATCH_SHOOT': 0.428,
  'PTS_PULL_UP': 21.5,
  'FGP_PULL_UP': 0.418,
  'FGA_DRIVE': 11.0,
  'FGA_CLOSE': 11.1,
  'FGA_CATCH_SHOOT': 28.3,
  'FGA_PULL_UP': 21.5,
  'EFG_PCT': 0.563,
  'CFGM': 21.4,
  'CFGA': 44.8,
  'CFGP': 0.478,
  'UFGM': 21.2,
  'UFGA': 42.5,
  'UFGP': 0.497,
  'CFG3M': 2.3,
  'CFG3A': 6.3,
  'CFG3P': 0.363,
  'UFG3M': 10.8,
  'UFG3A': 25.3,
  'UFG3P': 0.429},
 {'TEAM_ID': '1610612759',
  'TEAM_CITY': 'San Antonio',
  'TEAM_NAME': 'Spurs',
  'TEAM_ABBREVIATION': 'SAS',
  'TEAM_CODE': '',
  'GP': 82,
  'MIN': 48.3,
  'PTS': 103.5,
  'PTS_DRIVE': 14.8,
  'FGP_DRIVE': 0.481,
  'PTS_CLOSE': 17.8,
  'FGP_CLOSE': 0.611,
  'PTS_CATCH_SHOOT': 27.1,
  'FGP_CATCH_SHOOT': 0.419,
  'PTS_P
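
The preview shows only the first few records. Because `pd.DataFrame()` fills any key missing from a record with NaN, it can be worth confirming that every record carries the same keys before converting. A minimal sketch using only the standard library; the sample records are illustrative (one deliberately lacks PTS), and in the notebook `records` would be `nba_data_list`:

```python
from collections import Counter

# Illustrative records; in the notebook this would be nba_data_list.
records = [
    {"TEAM_ID": "1610612744", "TEAM_NAME": "Warriors", "PTS": 114.9},
    {"TEAM_ID": "1610612759", "TEAM_NAME": "Spurs", "PTS": 103.5},
    {"TEAM_ID": "1610612739", "TEAM_NAME": "Cavaliers"},  # PTS deliberately missing
]

# Count how many records share each distinct set of keys.
schema_counts = Counter(frozenset(rec) for rec in records)

# Every key seen in any record, then the keys each record is missing.
all_keys = set().union(*records)
missing = {
    rec.get("TEAM_NAME", f"record {i}"): sorted(all_keys - set(rec))
    for i, rec in enumerate(records)
    if all_keys - set(rec)
}

print(len(schema_counts))  # 2 distinct key sets
print(missing)             # {'Cavaliers': ['PTS']}
```

If `len(schema_counts)` is 1, every record has an identical set of keys and the resulting DataFrame will have no structurally missing cells.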

### Convert to Pandas DataFrame
This is the key step: `pd.DataFrame()` turns the raw list of dictionaries into a structured, tabular object for analysis, with one row per record and one column per dictionary key.

In [5]:
# Convert the list of dictionaries directly into a Pandas DataFrame
nba_df = pd.DataFrame(nba_data_list)

print("List converted into a DataFrame.")
print(f"DataFrame shape: {nba_df.shape}")
print("\nFirst 3 rows of the full DataFrame:")
display(nba_df.head(3))

List converted into a DataFrame.
DataFrame shape: (30, 33)

First 3 rows of the full DataFrame:


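|   | TEAM_ID | TEAM_CITY | TEAM_NAME | TEAM_ABBREVIATION | TEAM_CODE | GP | MIN | PTS | PTS_DRIVE | FGP_DRIVE | ... | CFGP | UFGM | UFGA | UFGP | CFG3M | CFG3A | CFG3P | UFG3M | UFG3A | UFG3P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1610612744 | Golden State | Warriors | GSW |  | 82 | 48.7 | 114.9 | 14.9 | 0.498 | ... | 0.478 | 21.2 | 42.5 | 0.497 | 2.3 | 6.3 | 0.363 | 10.8 | 25.3 | 0.429 |
| 1 | 1610612759 | San Antonio | Spurs | SAS |  | 82 | 48.3 | 103.5 | 14.8 | 0.481 | ... | 0.506 | 18.3 | 39.8 | 0.46 | 0.9 | 2.6 | 0.341 | 6.1 | 15.9 | 0.381 |
| 2 | 1610612739 | Cleveland | Cavaliers | CLE |  | 82 | 48.7 | 104.3 | 16.9 | 0.481 | ... | 0.473 | 18.2 | 40.7 | 0.447 | 1.7 | 5.7 | 0.299 | 9.0 | 23.9 | 0.378 |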
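
`pd.DataFrame()` builds one row per dictionary and one column per distinct key, in the order keys are first seen. A small sketch with made-up records illustrates two consequences for this dataset: a key absent from a record becomes NaN, and TEAM_ID, stored as a JSON string, stays a string rather than being parsed as a number:

```python
import pandas as pd

# Illustrative records; the second one lacks PTS on purpose.
records = [
    {"TEAM_ID": "1610612744", "TEAM_NAME": "Warriors", "PTS": 114.9},
    {"TEAM_ID": "1610612759", "TEAM_NAME": "Spurs"},
]

df = pd.DataFrame(records)

print(df.shape)                # (2, 3): one row per dict, one column per distinct key
print(list(df.columns))        # ['TEAM_ID', 'TEAM_NAME', 'PTS']
print(df["PTS"].isna().sum())  # 1: the missing key became NaN
print(type(df.loc[0, "TEAM_ID"]).__name__)  # str: TEAM_ID is kept as text
```

Keeping TEAM_ID as text is usually what you want for identifiers, since they are labels rather than quantities to add or average.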


---

## Data Filtering and Analysis

### Filter Columns
We filter the DataFrame to keep only the columns relevant for the current analysis: TEAM_CITY, TEAM_NAME, and PTS.

In [6]:
# Select only the desired columns and create a copy of the new DataFrame
team_analysis_df = nba_df[['TEAM_CITY', 'TEAM_NAME', 'PTS']].copy()

print("DataFrame filtered to only display TEAM_CITY, TEAM_NAME, and PTS.")
display(team_analysis_df.head())

DataFrame filtered to only display TEAM_CITY, TEAM_NAME, and PTS.


|   | TEAM_CITY | TEAM_NAME | PTS |
|---|---|---|---|
| 0 | Golden State | Warriors | 114.9 |
| 1 | San Antonio | Spurs | 103.5 |
| 2 | Cleveland | Cavaliers | 104.3 |
| 3 | Los Angeles | Clippers | 104.5 |
| 4 | Oklahoma City | Thunder | 110.2 |
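
Two details of this cell are easy to miss: indexing with a *list* of labels (double brackets) returns a DataFrame, whereas a single label returns a Series, and `.copy()` makes the subset an explicit, independent object, which avoids the `SettingWithCopyWarning` some pandas versions emit when a column is later added to a selection. A small sketch with a made-up two-team frame; the `PTS_x2` column is a hypothetical placeholder:

```python
import pandas as pd

nba_df = pd.DataFrame({
    "TEAM_CITY": ["Golden State", "San Antonio"],
    "TEAM_NAME": ["Warriors", "Spurs"],
    "PTS": [114.9, 103.5],
    "GP": [82, 82],
})

cols = ["TEAM_CITY", "TEAM_NAME", "PTS"]

print(type(nba_df["PTS"]).__name__)  # Series: single label
subset = nba_df[cols].copy()         # list of labels -> DataFrame, made independent
print(type(subset).__name__)         # DataFrame

# Because subset is an explicit copy, adding a column to it never affects nba_df.
subset["PTS_x2"] = subset["PTS"] * 2  # hypothetical derived column

print("PTS_x2" in nba_df.columns)  # False
print(list(subset.columns))        # ['TEAM_CITY', 'TEAM_NAME', 'PTS', 'PTS_x2']
```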


### Calculate Average Points (Average_PTS)
We calculate a single scalar value, the mean of the PTS column across all 30 teams, which serves as the league-average baseline for the rest of the analysis.

In [7]:
# Calculate the Average_PTS across all teams using the .mean() method
average_pts = team_analysis_df['PTS'].mean()

print(f"Calculated Average Points (Average_PTS) across all teams: {average_pts:.2f}")

Calculated Average Points (Average_PTS) across all teams: 102.66
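
`.mean()` here is the unweighted mean of the per-team PTS values, that is, their sum divided by the number of values. One behavior matters if the source file ever has gaps: pandas skips missing values by default. A short sketch with illustrative numbers:

```python
import math

import pandas as pd

pts = pd.Series([114.9, 103.5, 104.3, None])  # None is stored as NaN

# Series.mean() skips NaN by default (skipna=True), so it divides by the
# 3 non-missing values, not by the length of the Series (4).
avg = pts.mean()

print(math.isclose(avg, (114.9 + 103.5 + 104.3) / 3))  # True
print(math.isnan(pts.mean(skipna=False)))              # True: NaN propagates
```

Passing `skipna=False` is a cheap way to make a missing score surface as NaN instead of silently shifting the baseline.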


### Calculate the Delta Column
This step demonstrates vectorized arithmetic in pandas: the scalar `average_pts` is subtracted from every value in the PTS column in a single operation, with no explicit Python loop.

In [8]:
# Create the Delta column: PTS - Average_PTS
# This shows how much each team scored above (positive) or below (negative) the league average.
team_analysis_df['Delta'] = team_analysis_df['PTS'] - average_pts

print("New 'Delta' column created.")
display(team_analysis_df.head())

New 'Delta' column created.


|   | TEAM_CITY | TEAM_NAME | PTS | Delta |
|---|---|---|---|---|
| 0 | Golden State | Warriors | 114.9 | 12.24 |
| 1 | San Antonio | Spurs | 103.5 | 0.84 |
| 2 | Cleveland | Cavaliers | 104.3 | 1.64 |
| 3 | Los Angeles | Clippers | 104.5 | 1.84 |
| 4 | Oklahoma City | Thunder | 110.2 | 7.54 |
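
pandas broadcasts a scalar across a Series element by element, which is what lets the Delta assignment stay a one-liner. A small sketch using only the five PTS values visible in the preview (so its mean differs from the full-league 102.66) compares the vectorized form with an explicit loop and checks a property of any column built this way: deviations from the mean sum to zero, up to floating-point error.

```python
import math

import pandas as pd

# The five PTS values shown in the preview (not the full 30 teams).
pts = pd.Series([114.9, 103.5, 104.3, 104.5, 110.2])
avg = pts.mean()

# Vectorized: the scalar is broadcast against every element in one operation.
delta_vec = pts - avg

# Equivalent explicit loop, for comparison only.
delta_loop = pd.Series([p - avg for p in pts])

print(delta_vec.equals(delta_loop))                       # True
print(math.isclose(delta_vec.sum(), 0.0, abs_tol=1e-9))  # True
```

The zero-sum property is a useful sanity check on the full Delta column as well: a sum far from zero would mean the baseline was not the mean of the same PTS values.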


---

## Output and Load

### Sort and Finalize
We sort the DataFrame by the new Delta column in descending order, so the teams scoring furthest above the average appear first and those furthest below it appear last.

In [9]:
# Sort the DataFrame from greatest to least based on the Delta column
team_analysis_df = team_analysis_df.sort_values(by='Delta', ascending=False)

print("DataFrame sorted by Delta (greatest difference above average first).")
display(team_analysis_df.head(5))

DataFrame sorted by Delta (greatest difference above average first).


|    | TEAM_CITY | TEAM_NAME | PTS | Delta |
|---|---|---|---|---|
| 0 | Golden State | Warriors | 114.9 | 12.24 |
| 4 | Oklahoma City | Thunder | 110.2 | 7.54 |
| 8 | Sacramento | Kings | 106.7 | 4.04 |
| 6 | Houston | Rockets | 106.5 | 3.84 |
| 23 | Boston | Celtics | 105.6 | 2.94 |
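
In the output above, `sort_values()` keeps each row's original index label (0, 4, 8, 6, 23), which can be confusing if the frame is later accessed by position. A sketch using the five Delta values from the earlier preview shows how to renumber the rows and how `nlargest()` and `nsmallest()` pull the top or bottom of the ranking directly:

```python
import pandas as pd

# The five teams from the earlier preview, with default row labels 0..4.
df = pd.DataFrame(
    {"TEAM_NAME": ["Warriors", "Spurs", "Cavaliers", "Clippers", "Thunder"],
     "Delta": [12.24, 0.84, 1.64, 1.84, 7.54]}
)

ranked = df.sort_values(by="Delta", ascending=False)
print(ranked.index.tolist())  # [0, 4, 3, 2, 1]: original labels travel with the rows

# Renumber 0..n-1 when the old labels are not needed.
ranked = ranked.reset_index(drop=True)
print(ranked.index.tolist())  # [0, 1, 2, 3, 4]

# Shortcuts for the top and bottom of the ranking.
top2 = df.nlargest(2, "Delta")
bottom2 = df.nsmallest(2, "Delta")
print(top2["TEAM_NAME"].tolist())     # ['Warriors', 'Thunder']
print(bottom2["TEAM_NAME"].tolist())  # ['Spurs', 'Cavaliers']
```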


### Export to CSV
This is the final Load (L) phase of the extract-transform-load (ETL) workflow: the cleaned and analyzed data is saved to a CSV file on disk.

In [10]:
# Build a timestamped output filename so repeated runs do not overwrite earlier exports
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
output_filename = f'nba_team_points_analysis_{timestamp}.csv'

# Export the final DataFrame to a CSV file (index=False omits the row labels)
team_analysis_df.to_csv(output_filename, index=False)

print("--- Analysis and Load Phase Complete ---")
print(f"Clean, analyzed data successfully exported to: {output_filename}")

--- Analysis and Load Phase Complete ---
Clean, analyzed data successfully exported to: nba_team_points_analysis_20250929_234803.csv
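
A quick way to confirm the Load step worked is to read the file straight back and compare its structure with what was written. A minimal sketch, assuming a temporary directory as a stand-in for the notebook's working directory and a two-row illustrative frame:

```python
import os
import tempfile
from datetime import datetime

import pandas as pd

df = pd.DataFrame(
    {"TEAM_CITY": ["Golden State", "Oklahoma City"],
     "TEAM_NAME": ["Warriors", "Thunder"],
     "PTS": [114.9, 110.2],
     "Delta": [12.24, 7.54]}
)

with tempfile.TemporaryDirectory() as out_dir:  # stand-in for the working directory
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = os.path.join(out_dir, f"nba_team_points_analysis_{stamp}.csv")

    df.to_csv(path, index=False)  # index=False: no unnamed index column in the file
    reread = pd.read_csv(path)

# Same shape and headers after the round trip.
print(reread.shape == df.shape)                  # True
print(list(reread.columns) == list(df.columns))  # True
```

If the unrounded Delta values carry long floating-point tails, `to_csv(..., float_format="%.2f")` writes them rounded to two decimals; whether that is appropriate depends on how the file is used downstream.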
