```python
from fidap import fidap_client
from config import api_key  # local config module holding the Fidap API key
import pandas as pd

# Instantiate the Fidap API connection
fidap = fidap_client(api_key=api_key)
```

### ZIP Codes to Census Tracts
  
ZIP Codes are maintained by the USPS and can change as often as every quarter. They are defined by mail delivery routes and service areas rather than as geospatial statistical units meant to stay stable over time.
  
On the other hand, Census Tracts, Block Groups, and Blocks are stable statistical units maintained by the US Census Bureau. A glossary of their definitions can be found [here](https://www.census.gov/programs-surveys/geography/about/glossary.html).   
  
Because data can be reported at either scale, HUD publishes a crosswalk table that lists the USPS ZIP Code(s) each Census Tract intersects. Tract boundaries and ZIP Code boundaries do not line up, so a single Census Tract can intersect multiple ZIP Codes and appear in several rows of the crosswalk.

```python
# Count how many ZIP Codes each Census Tract intersects,
# keeping only tracts that intersect more than one
census_tracts_zipcode_count = fidap.sql("""
WITH zipcode_count AS (
    SELECT census_tract_geoid, COUNT(zip_code) AS zip_count
    FROM bigquery-public-data.hud_zipcode_crosswalk.zipcode_to_census_tracts
    GROUP BY census_tract_geoid
)
SELECT *
FROM zipcode_count
WHERE zip_count > 1
""")

# Show the ten tracts that intersect the most ZIP Codes
census_tracts_zipcode_count.sort_values('zip_count', ascending=False).head(10)
```

| Row index | census_tract_geoid | zip_count |
|---|---|---|
| 228 | 11001006202 | 44 |
| 233 | 11001010700 | 30 |
| 236 | 11001005800 | 29 |
| 238 | 11001010200 | 28 |
| 270 | 54011000600 | 25 |
| 227 | 54039000900 | 23 |
| 265 | 18003001300 | 22 |
| 30 | 51770000602 | 20 |
| 237 | 11001010100 | 20 |
| 243 | 36001001100 | 19 |


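In the table above, I counted the ZIP Codes that each Census Tract intersects (keeping only Tracts that intersect more than one) and sorted the Tracts in descending order. The first two digits of a Tract GEOID are its state FIPS code, so the top four Tracts are in DC (`11`), followed by two in West Virginia (`54`).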
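As a small sketch, an 11-digit Tract GEOID (2-digit state + 3-digit county + 6-digit tract) can be split with pandas string slicing. The rows are copied from the table above; the `STATE_FIPS` dictionary is a hand-written, partial lookup for illustration only, and the sketch assumes the GEOID column may come back as either integers or strings.

```python
import pandas as pd

# Partial, hand-written lookup of state FIPS codes (illustrative, not exhaustive)
STATE_FIPS = {"11": "District of Columbia", "18": "Indiana", "36": "New York",
              "51": "Virginia", "54": "West Virginia"}

# A few rows copied from the top-10 output above
top_tracts = pd.DataFrame({
    "census_tract_geoid": [11001006202, 54011000600, 18003001300, 51770000602],
    "zip_count": [44, 25, 22, 20],
})

# Cast to string and left-pad to 11 digits, restoring any leading zero lost
# when a GEOID is stored as an integer (e.g. Alabama's state code is "01")
geoids = top_tracts["census_tract_geoid"].astype(str).str.zfill(11)
top_tracts["state_fips"] = geoids.str[:2]
top_tracts["county_fips"] = geoids.str[2:5]
top_tracts["tract_code"] = geoids.str[5:]
top_tracts["state"] = top_tracts["state_fips"].map(STATE_FIPS)
print(top_tracts)
```

The `zfill(11)` step matters mainly for states with single-digit FIPS codes, whose GEOIDs would otherwise be one character short.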

There is another geospatial statistical product we may run into: ZIP Code Tabulation Areas (ZCTAs). An in-depth explanation can be found [here](https://www.census.gov/programs-surveys/geography/guidance/geo-areas/zctas.html).
  
To cut a long story short, the US Census Bureau creates ZCTAs by assigning each Census Block to the ZIP Code that occurs most commonly among the addresses in that Block. As shown above, a Tract (and likewise a Block) can intersect multiple ZIP Codes, so picking the most common ZIP Code means some addresses end up in a ZCTA whose code differs from their USPS mailing ZIP Code. For the most part, however, the ZCTA code is the same as the ZIP Code for an area.
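The block-assignment rule can be illustrated with a simplified pandas sketch. The address records, block IDs, and ZIP Codes below are hypothetical, and the sketch ignores details of the Census Bureau's actual procedure (ties, blocks without addresses, and so on); it only shows why an address can land in a ZCTA that differs from its own ZIP Code.

```python
import pandas as pd

# Hypothetical address records: the Census Block each address sits in
# and the USPS ZIP Code on its mailing address
addresses = pd.DataFrame({
    "address_id": [1, 2, 3, 4, 5, 6, 7],
    "block_geoid": ["B1", "B1", "B1", "B2", "B2", "B3", "B3"],
    "usps_zip": ["20001", "20001", "20002", "20002", "20002", "20003", "20003"],
})

# Simplified rule: each block gets the ZIP Code occurring most often among
# its addresses (ties are not handled in this sketch)
block_zcta = (
    addresses.groupby("block_geoid")["usps_zip"]
    .agg(lambda s: s.value_counts().idxmax())
    .rename("zcta")
)

# Every address inherits its block's ZCTA, which may differ from its own ZIP
addresses = addresses.join(block_zcta, on="block_geoid")
mismatched = addresses[addresses["usps_zip"] != addresses["zcta"]]
print(block_zcta)
print(mismatched)
```

Here address 3 has mailing ZIP `20002` but falls in block `B1`, whose majority ZIP is `20001`, so it is counted in ZCTA `20001`.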
  
Fidap should ingest the 2020 ZCTA shapefiles, which are available [here](https://www.census.gov/cgi-bin/geo/shapefiles/index.php?year=2020&layergroup=ZIP+Code+Tabulation+Areas).
  
Because ZIP Codes and ZCTAs are usually larger than Census Tracts, it can be difficult or outright impossible to break data collected at the ZIP Code or ZCTA scale down to Census Tracts and Blocks without making assumptions along the way, for example about how a quantity is distributed within each ZIP Code.
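One common assumption is proportional allocation: split each ZIP Code's value across its tracts according to the share of the ZIP's addresses in each tract. HUD's crosswalk files publish address-share ratios of this kind, but the column name `res_ratio` and all values below are hypothetical; this is a minimal sketch of the technique, not the crosswalk's actual schema.

```python
import pandas as pd

# Hypothetical ZIP-level data to push down to Census Tracts
zip_data = pd.DataFrame({"zip_code": ["20001", "20002"], "households": [1000, 500]})

# Hypothetical crosswalk: `res_ratio` is assumed to be the share of the ZIP's
# residential addresses located in each tract (shares sum to 1 per ZIP)
crosswalk = pd.DataFrame({
    "zip_code": ["20001", "20001", "20002", "20002"],
    "census_tract_geoid": ["T1", "T2", "T2", "T3"],
    "res_ratio": [0.6, 0.4, 0.5, 0.5],
})

# Assumption: households are distributed like residential addresses
allocated = crosswalk.merge(zip_data, on="zip_code", how="left")
allocated["households_est"] = allocated["households"] * allocated["res_ratio"]

# A tract that intersects several ZIP Codes (T2 here) collects from each of them
tract_estimates = allocated.groupby("census_tract_geoid")["households_est"].sum()
print(tract_estimates)
```

As long as the ratios for each ZIP Code sum to 1, the tract estimates add back up to the ZIP totals; whether the per-tract split is realistic depends entirely on the allocation assumption.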
  
On the flip side, data collected at the Census Block scale can be aggregated upward. Because ZCTAs are assembled from whole Census Blocks, block-level data can be summed to ZCTAs, which in turn approximate the corresponding ZIP Codes.
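Since each block belongs to exactly one ZCTA, aggregation is a join followed by a group-by sum. The block IDs, populations, and block-to-ZCTA table below are hypothetical stand-ins for a real relationship file; `validate="one_to_one"` makes pandas raise an error if a block appears more than once on either side.

```python
import pandas as pd

# Hypothetical block-level counts
blocks = pd.DataFrame({
    "block_geoid": ["B1", "B2", "B3", "B4"],
    "population": [120, 80, 200, 50],
})

# Hypothetical block-to-ZCTA relationship: one ZCTA per block
block_to_zcta = pd.DataFrame({
    "block_geoid": ["B1", "B2", "B3", "B4"],
    "zcta": ["20001", "20001", "20002", "20002"],
})

# Attach each block's ZCTA, then sum block values within each ZCTA
zcta_totals = (
    blocks.merge(block_to_zcta, on="block_geoid", how="inner", validate="one_to_one")
    .groupby("zcta")["population"]
    .sum()
)
print(zcta_totals)
```

The sum is exact at the ZCTA level; treating the result as a USPS ZIP Code figure is where the approximation comes in.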