# Final Integration: AQI, Race, and Median Household Income

### Objective
This notebook documents the final step in our data preparation pipeline: combining air quality data (AQI), racial demographic data, and economic data (Median Household Income) into a single dataset. The integrated dataset lets us examine air quality alongside race and income for each county, which is the basis for our environmental justice analysis.

In [None]:
import pandas as pd

# File Paths
AQI_RACE_PATH = '../JOINED-aqi-race/aqi_race_joined.csv'
AQI_INCOME_PATH = '../JOINED-aqi-householdincome-dataset/aqi_income_joined.csv'
OUTPUT_PATH = 'aqi_income_race_joined.csv'

def load_data():
    """Read the two previously joined CSVs and return (race_df, income_df)."""
    race_df = pd.read_csv(AQI_RACE_PATH)
    income_df = pd.read_csv(AQI_INCOME_PATH)
    return race_df, income_df

race_df, income_df = load_data()

print(f"AQI/Race Dataset: {race_df.shape[0]} rows")
print(f"AQI/Income Dataset: {income_df.shape[0]} rows")

## 1. Triple Join Strategy

Both source datasets already carry the baseline AQI columns (`median_aqi`, `Year`, `sample_weight`). We join them on `State` and `County` with an inner join, so only counties present in both datasets appear in the result.

Because the AQI columns are identical in both datasets, we drop them from the income dataset before merging. Otherwise pandas would keep both copies and disambiguate them with `_x` and `_y` suffixes.
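
The join on `State` and `County` yields one row per county only if that pair is unique in each input. A minimal sketch of how this could be checked, using small made-up frames in place of the real CSVs (every value, and the `pct_nonwhite` and `median_income` column names, are illustrative assumptions rather than the actual schema):

```python
import pandas as pd

# Toy stand-ins for race_df and income_df; all values are invented.
race_toy = pd.DataFrame({
    "State": ["Alabama", "Alabama", "Alaska"],
    "County": ["Baldwin", "Clay", "Denali"],
    "Year": [2022, 2022, 2022],
    "median_aqi": [38, 41, 12],
    "pct_nonwhite": [0.17, 0.19, 0.12],       # hypothetical column name
})
income_toy = pd.DataFrame({
    "State": ["Alabama", "Alabama", "Alaska"],
    "County": ["Baldwin", "Clay", "Denali"],
    "Year": [2022, 2022, 2022],
    "median_aqi": [38, 41, 12],
    "median_income": [64000, 45000, 81000],   # hypothetical column name
})

keys = ["State", "County"]

# Count rows that share a (State, County) pair with at least one other row.
race_dupes = int(race_toy.duplicated(subset=keys, keep=False).sum())
income_dupes = int(income_toy.duplicated(subset=keys, keep=False).sum())
print(f"Rows with repeated keys: race={race_dupes}, income={income_dupes}")

# validate="one_to_one" makes pandas raise a MergeError if either side
# repeats a key pair, instead of silently multiplying rows.
merged_toy = pd.merge(
    race_toy,
    income_toy.drop(columns=["Year", "median_aqi"]),
    on=keys,
    how="inner",
    validate="one_to_one",
)
print(merged_toy.shape)
```

If the real files hold several years per county, the duplicate counts would be non-zero and `Year` would probably need to be added to the join keys. Whether that applies here depends on how the upstream notebooks aggregated the data.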

In [None]:
# Drop redundant AQI columns from the income dataset before merging
income_df_subset = income_df.drop(columns=['Year', 'median_aqi', 'sample_weight'])

# Inner join: keep only counties present in both datasets
final_df = pd.merge(
    race_df,
    income_df_subset,
    on=['State', 'County'],
    how='inner'
)

print(f"Final Integrated Dataset: {final_df.shape[0]} rows")
final_df.head(10)
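
An inner join silently drops counties that appear in only one input. One way to see how many rows would be lost is an outer merge with `indicator=True`, sketched here on invented toy frames (the values and the `median_income` column name are assumptions for illustration only):

```python
import pandas as pd

# Toy frames with deliberately mismatched counties; all values are invented.
race_toy = pd.DataFrame({
    "State": ["Alabama", "Alabama", "Alaska"],
    "County": ["Baldwin", "Clay", "Denali"],
    "median_aqi": [38, 41, 12],
})
income_toy = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Arizona"],
    "County": ["Baldwin", "Denali", "Pima"],
    "median_income": [64000, 81000, 59000],   # hypothetical column name
})

# The outer merge keeps every row; the added _merge column records
# whether each key came from the left frame, the right frame, or both.
audit = pd.merge(
    race_toy,
    income_toy,
    on=["State", "County"],
    how="outer",
    indicator=True,
)
counts = audit["_merge"].value_counts()
print(counts)

# Rows that an inner join would discard
unmatched = audit.loc[audit["_merge"] != "both", ["State", "County", "_merge"]]
print(unmatched)
```

Running the same audit on `race_df` and `income_df_subset` would show whether the row count printed above reflects real coverage gaps or mismatched county spellings.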

## 2. Exporting Results

We save the final combined dataset to `aqi_income_race_joined.csv` in the notebook's working directory, omitting the pandas index.

In [None]:
final_df.to_csv(OUTPUT_PATH, index=False)
print(f"Final dataset successfully exported to {OUTPUT_PATH}")