# Data Acquisition via File Ingestion (JSON Format)

---

## Setup and Ingestion

### Import Modules

In [None]:
# Import necessary modules:
# json for reading the raw JSON file with the standard library.
# pandas for data ingestion, manipulation, and analysis.
# datetime for timestamping the exported output file.
import json
import pandas as pd
from datetime import datetime

print("Modules imported.")

### Define File Path and Ingestion Method
This cell defines the file path and uses the standard library's `json.load()` to read a JSON file whose top-level value is an array of records into a native Python list of dictionaries.

In [None]:
# Define the path to the JSON file
FILE_PATH = "nba_records.json"

# --- 1. Load the JSON data from the file into a Python list ---
try:
    with open(FILE_PATH, 'r', encoding='utf-8') as file_pointer:
        # json.load() parses the whole file as one JSON document;
        # the top-level value is expected to be an array of record objects
        nba_data_list = json.load(file_pointer)

    print(f"File '{FILE_PATH}' successfully loaded into a Python list.")
    print(f"Loaded {len(nba_data_list)} records.")

except FileNotFoundError:
    print(f"ERROR: File not found at {FILE_PATH}. Please ensure the file is in the correct directory.")
    raise  # stop here: every later cell depends on nba_data_list
except json.JSONDecodeError as err:
    print(f"ERROR: {FILE_PATH} is not valid JSON: {err}")
    raise
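For reference, the sketch below runs the same parsing step on a small in-memory sample instead of `nba_records.json`, assuming the file holds a JSON array of record objects; the sample teams and values are hypothetical. It also shows `pd.read_json()` reading the same array directly, as an alternative to `json.load()` followed by `pd.DataFrame()`.

```python
import io
import json

import pandas as pd

# Hypothetical sample in the layout this notebook assumes:
# a JSON array whose elements are record objects (names and values are made up)
SAMPLE_JSON = """
[
  {"TEAM_CITY": "Alpha City", "TEAM_NAME": "Aces", "PTS": 110},
  {"TEAM_CITY": "Beta Town", "TEAM_NAME": "Bears", "PTS": 98}
]
"""

# json.load() accepts any file-like object, so StringIO stands in for open(FILE_PATH)
records = json.load(io.StringIO(SAMPLE_JSON))
print(type(records).__name__, len(records))  # list 2
print(records[0]["TEAM_NAME"])  # Aces

# pandas can parse the same array directly; orient="records" states the expected layout
df_direct = pd.read_json(io.StringIO(SAMPLE_JSON), orient="records")
print(df_direct.shape)  # (2, 3)
```

If the file were newline-delimited JSON (one object per line), `json.load()` would raise a `JSONDecodeError`; `pd.read_json(..., lines=True)` is the usual way to read that layout.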

### Take a Look at Some of the Raw Data

In [None]:
# Show the first three raw records (each one is a dictionary)
nba_data_list[:3]
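Before converting, it can help to check whether every record carries the same fields, because a key that is missing from some records becomes a column with gaps. A minimal sketch on hypothetical records (`sample_records` is illustrative only; on the real data, `nba_data_list` would take its place):

```python
from collections import Counter

# Hypothetical raw records; the second one lacks "PTS" to show how gaps surface
sample_records = [
    {"TEAM_CITY": "Alpha City", "TEAM_NAME": "Aces", "PTS": 110},
    {"TEAM_CITY": "Beta Town", "TEAM_NAME": "Bears"},
    {"TEAM_CITY": "Gamma Falls", "TEAM_NAME": "Gulls", "PTS": 98},
]

# Count how many records contain each key
key_counts = Counter(key for record in sample_records for key in record)
total = len(sample_records)

# Report which keys appear in every record and which are sometimes missing
for key, count in key_counts.items():
    status = "all records" if count == total else f"missing in {total - count}"
    print(f"{key}: {status}")
```

Here `TEAM_CITY` and `TEAM_NAME` are reported in all records, while `PTS` is reported as missing in one.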

### Convert to Pandas DataFrame
This is the key step: it transforms the raw list of dictionaries into a tabular DataFrame, where each dictionary becomes a row and each key becomes a column.

In [None]:
# Convert the list of dictionaries directly into a Pandas DataFrame
nba_df = pd.DataFrame(nba_data_list)

print("List converted into a DataFrame.")
print(f"DataFrame shape: {nba_df.shape}")
print("\nFirst 3 rows of the full DataFrame:")
display(nba_df.head(3))
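The sketch below shows what `pd.DataFrame()` does when records do not all share the same keys, using hypothetical records: the columns are the union of all keys, and any missing value is filled with `NaN`.

```python
import pandas as pd

# Hypothetical records: the second one has no "PTS" key
records = [
    {"TEAM_CITY": "Alpha City", "TEAM_NAME": "Aces", "PTS": 110},
    {"TEAM_CITY": "Beta Town", "TEAM_NAME": "Bears"},
]

# Each dict becomes a row; the union of keys becomes the columns
df = pd.DataFrame(records)
print(df)

# The missing value is filled with NaN, which also forces PTS to a float dtype
print(df["PTS"].isna().sum())  # 1
print(df["PTS"].dtype)  # float64
```

A float-typed `PTS` column on the real data is therefore a quick hint that some records lack a points value.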

---

## Data Filtering and Analysis

### Filter Columns
We select only the columns relevant to the current analysis: TEAM_CITY, TEAM_NAME, and PTS.

In [None]:
# Select only the desired columns; .copy() makes an independent DataFrame so that
# adding columns later does not modify nba_df or trigger a SettingWithCopyWarning
team_analysis_df = nba_df[['TEAM_CITY', 'TEAM_NAME', 'PTS']].copy()

print("DataFrame reduced to the TEAM_CITY, TEAM_NAME, and PTS columns.")
display(team_analysis_df.head())
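A small sketch of the column selection on a hypothetical frame (the extra `REB` column and the `FLAG` column are illustrative only). It shows that the `.copy()` result is independent of the source and that an absent column name raises `KeyError`.

```python
import pandas as pd

# Hypothetical wide frame with an extra column the analysis does not need
wide_df = pd.DataFrame({
    "TEAM_CITY": ["Alpha City", "Beta Town"],
    "TEAM_NAME": ["Aces", "Bears"],
    "PTS": [110, 98],
    "REB": [45, 50],
})

# A list of labels selects those columns in that order; .copy() gives an independent frame
subset_df = wide_df[["TEAM_CITY", "TEAM_NAME", "PTS"]].copy()

# Adding a column to the copy leaves the original untouched
subset_df["FLAG"] = True
print(list(subset_df.columns))  # ['TEAM_CITY', 'TEAM_NAME', 'PTS', 'FLAG']
print("FLAG" in wide_df.columns)  # False

# A misspelled or absent column raises KeyError instead of returning an empty column
missing_raised = False
try:
    wide_df[["TEAM_CITY", "POINTS"]]
except KeyError as err:
    missing_raised = True
    print("KeyError:", err)
```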

### Calculate Average Points (Average_PTS)
We calculate a single scalar value, the mean of the PTS column, which serves as the baseline for the comparison that follows.

In [None]:
# Calculate Average_PTS as the mean of the PTS column (NaN values are skipped by default)
average_pts = team_analysis_df['PTS'].mean()

print(f"Calculated Average Points (Average_PTS) across all records: {average_pts:.2f}")
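`.mean()` sums the non-missing values and divides by their count. The sketch below works that arithmetic on hypothetical PTS values and shows how a missing entry is handled.

```python
import pandas as pd

# Hypothetical PTS values; None stands in for a missing score
pts = pd.Series([110, 104, 98, None])

# .mean() skips NaN by default (skipna=True): (110 + 104 + 98) / 3 = 104.0
average = pts.mean()
print(f"{average:.2f}")  # 104.00

# With skipna=False, a single NaN makes the whole result NaN
print(pts.mean(skipna=False))  # nan
```

The mean is taken over rows: if the file holds several records per team, teams with more rows weigh more heavily in the baseline.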

### Calculate the Delta Column
This demonstrates vectorization in pandas: the scalar `average_pts` is broadcast across the entire PTS column, so the subtraction runs on every row at once without an explicit loop.

In [None]:
# Create the Delta column: PTS - Average_PTS
# This shows how much each team scored above (positive) or below (negative) the average computed in the previous cell.
team_analysis_df['Delta'] = team_analysis_df['PTS'] - average_pts

print("New 'Delta' column created.")
display(team_analysis_df.head())
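A minimal sketch on hypothetical scores compares the vectorized subtraction with an equivalent explicit loop, and checks a property every Delta column should have: deviations from the mean sum to zero.

```python
import pandas as pd

# Hypothetical team scores
df = pd.DataFrame({"TEAM_NAME": ["Aces", "Bears", "Gulls"], "PTS": [110, 104, 98]})
average_pts = df["PTS"].mean()  # 104.0

# Vectorized: the scalar is broadcast across every row in one operation
df["Delta"] = df["PTS"] - average_pts

# Equivalent explicit loop, shown only for comparison
loop_delta = [pts - average_pts for pts in df["PTS"]]

print(df["Delta"].tolist())  # [6.0, 0.0, -6.0]
print(df["Delta"].tolist() == loop_delta)  # True

# Deviations from the mean always sum to zero (up to floating-point rounding)
print(abs(df["Delta"].sum()) < 1e-9)  # True
```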

---

## Output and Load

### Sort and Finalize
We sort the DataFrame by the new Delta column in descending order, so the teams furthest above the average appear first and those furthest below appear last.

In [None]:
# Sort the DataFrame from greatest to least based on the Delta column
team_analysis_df = team_analysis_df.sort_values(by='Delta', ascending=False)

print("DataFrame sorted by Delta (greatest difference above average first).")
display(team_analysis_df.head(5))
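The sketch below uses hypothetical Delta values to show how the descending sort orders rows, how `head()` and `tail()` then pick out the top and bottom teams, and how `nlargest()`/`nsmallest()` return the same extremes directly.

```python
import pandas as pd

# Hypothetical Delta values
df = pd.DataFrame({
    "TEAM_NAME": ["Aces", "Bears", "Gulls", "Owls"],
    "Delta": [-3.5, 7.0, 1.5, -5.0],
})

# Descending sort: largest positive Delta first, most negative last
ranked = df.sort_values(by="Delta", ascending=False)
print(ranked["TEAM_NAME"].tolist())  # ['Bears', 'Gulls', 'Aces', 'Owls']

# head() and tail() then give the top and bottom of the ranking
print(ranked.head(2)["TEAM_NAME"].tolist())  # ['Bears', 'Gulls']
print(ranked.tail(2)["TEAM_NAME"].tolist())  # ['Aces', 'Owls']

# nlargest()/nsmallest() return the n extreme rows without an explicit sort step
print(df.nlargest(2, "Delta")["TEAM_NAME"].tolist())   # ['Bears', 'Gulls']
print(df.nsmallest(2, "Delta")["TEAM_NAME"].tolist())  # ['Owls', 'Aces']
```

`sort_values()` keeps the original index labels; chaining `.reset_index(drop=True)` gives a clean 0-based order if that is needed.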

### Export to CSV
This is the final Load step of the extract-transform-load (ETL) pattern: the filtered and analyzed data is saved to a persistent CSV file.

In [None]:
# Build a timestamped output filename (YYYYMMDD_HHMMSS) so repeated runs do not overwrite earlier exports
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
output_filename = f'nba_team_points_analysis_{timestamp}.csv'

# Export the final DataFrame to a CSV file; index=False omits the row index
team_analysis_df.to_csv(output_filename, index=False)

print("--- Analysis and Load Phase Complete ---")
print(f"Filtered, analyzed data successfully exported to: {output_filename}")
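One way to confirm the export step is a round trip: write a frame to CSV, read it back, and compare. The sketch below does this on a hypothetical result frame inside a temporary directory, so no file is left behind; the filename pattern matches the notebook's.

```python
import os
import tempfile
from datetime import datetime

import pandas as pd

# Hypothetical analysis result
result_df = pd.DataFrame({
    "TEAM_CITY": ["Beta Town", "Alpha City"],
    "TEAM_NAME": ["Bears", "Aces"],
    "PTS": [110, 98],
    "Delta": [6.0, -6.0],
})

# Timestamp in the same format as the notebook: YYYYMMDD_HHMMSS
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, f"nba_team_points_analysis_{timestamp}.csv")

    # index=False keeps the row labels out of the file
    result_df.to_csv(path, index=False)

    # Read the file back and compare to confirm nothing was lost in the round trip
    reloaded_df = pd.read_csv(path)
    round_trip_ok = reloaded_df.equals(result_df)

print(round_trip_ok)  # True
print(len(timestamp))  # 15
```

Without `index=False`, the row index would be written as an extra unnamed first column, and reading the file back would produce an `Unnamed: 0` column.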