## EPA-Justice QC

This notebook runs an independent validation of the results produced in the `fetch_data_and_export.ipynb` notebook. The goals of the QC are to:


- **Validate the API:** The functions in the `fetch_data_and_export.ipynb` notebook fetch census data through APIs. Here we instead read the same data from downloaded CSVs and compare it against the results in `data_to_export.csv`. A match should confirm that the data gathered from the API agrees with the raw tabular data.

- **Check the math:** We will do an independent aggregation of the original data to check our math, and keep those original data values visible for inspection.

- **Ensure reasonable values:** A number of QC checks confirm that values fall within expected ranges and assess missing data.
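
As a rough illustration of the three goals above, the sketch below compares a toy "exported" table against independently obtained values, then runs a simple range and missing-data check. The column names `total_population` and `pct_insured` and all values are hypothetical stand-ins, not the actual columns of `data_to_export.csv`.

```python
import numpy as np
import pandas as pd

# Toy stand-ins: `exported` mimics rows from data_to_export.csv and `raw`
# mimics values aggregated independently from downloaded CSVs.
# Column names and values here are hypothetical.
exported = pd.DataFrame({
    "id": ["AK1", "AK2", "AK3"],
    "total_population": [1200, 340, 58],
    "pct_insured": [88.5, np.nan, 101.2],
})
raw = pd.DataFrame({
    "id": ["AK1", "AK2", "AK3"],
    "total_population": [1200, 341, 58],
})

# Goals 1 and 2: join on the place id and flag values that disagree.
merged = exported.merge(raw, on="id", suffixes=("_api", "_csv"))
merged["pop_matches"] = np.isclose(
    merged["total_population_api"], merged["total_population_csv"]
)
mismatched_ids = merged.loc[~merged["pop_matches"], "id"].tolist()

# Goal 3: percentages should fall in [0, 100]; missing values are counted separately.
out_of_range_ids = exported.loc[
    (exported["pct_insured"] < 0) | (exported["pct_insured"] > 100), "id"
].tolist()
n_missing = int(exported["pct_insured"].isna().sum())
```

Keeping the `_api` and `_csv` columns side by side in `merged` leaves the original values visible for inspection rather than reducing the check to a single pass/fail flag.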


Note: The 2020 US Census data CSVs (DHC tables P12 and P9, and ACS 5-year tables S1810 and S2701) were downloaded from the [data.census.gov](https://data.census.gov) online application, after filtering for all counties, places, and tracts in Alaska. The CDC PLACES and SDOH data CSVs for counties, places, and tracts in Alaska were downloaded from the CDC [data portal](https://data.cdc.gov/browse?category=500+Cities+%26+Places).

In [26]:
import pandas as pd
import random
from utilities.luts import *

places = pd.read_csv('tbl/NCRPlaces_Census_04192024.csv')
results = pd.read_csv('tbl/data_to_export.csv')

Choose 20 random places for QC, and get corresponding results.

In [27]:
# Sample 20 place ids, then keep matching rows (boolean indexing preserves column dtypes).
qc_ids = random.sample(places['id'].to_list(), 20)
qc_places = places[places['id'].isin(qc_ids)]
qc_results = results[results['id'].isin(qc_places['id'])]
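
The cell above draws a different sample on every run because no seed is set. If a repeatable QC sample is wanted, one option is pandas' `DataFrame.sample` with a fixed `random_state`. This is a minimal sketch on a toy table; `places_demo` and `SEED` are hypothetical names, and only an `id` column is assumed. Note that `sample` draws rows rather than ids, so the result can differ from the cell above if an `id` appears more than once.

```python
import pandas as pd

# Toy stand-in for the places table; only the `id` column is assumed.
places_demo = pd.DataFrame({"id": [f"AK{i}" for i in range(100)]})

# random_state fixes the draw so the same 20 rows are chosen on every run.
SEED = 42  # arbitrary, hypothetical choice
sample_a = places_demo.sample(n=20, random_state=SEED)
sample_b = places_demo.sample(n=20, random_state=SEED)
```

With the seed fixed, `sample_a` and `sample_b` contain the same rows in the same order, which makes a QC run easier to reproduce and discuss.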