In [82]:
from fidap import fidap_client
from config import api_key
import pandas as pd
import geopandas as gpd
import folium

# instantiate api connection
fidap = fidap_client(api_key=api_key)

### ZIP Codes and Census Tracts  
  
ZIP Codes are maintained by the USPS and can change as often as every quarter. USPS ZIP Codes are tied to mail delivery routes and service areas rather than to strict geospatial statistical units that are maintained over time.   
  
On the other hand, Census Tracts, Block Groups, and Blocks are stable statistical units maintained by the US Census Bureau. A glossary of their definitions can be found [here](https://www.census.gov/programs-surveys/geography/about/glossary.html).   
  
However, because data can be reported at either scale, HUD has created a crosswalk table that lists the USPS ZIP Codes that each Census Tract intersects. A single Census Tract can intersect multiple ZIP Codes, so it can appear in the table more than once. 

In [4]:
census_tracts_zipcode_count = fidap.sql("""
WITH zipcode_count AS (
SELECT census_tract_geoid, COUNT(zip_code) AS zip_count
FROM  bigquery-public-data.hud_zipcode_crosswalk.zipcode_to_census_tracts
GROUP BY census_tract_geoid)

SELECT *
FROM zipcode_count
WHERE zip_count > 1
""")

census_tracts_zipcode_count.sort_values('zip_count', ascending=False).head(n = 10)

Unnamed: 0,census_tract_geoid,zip_count
228,11001006202,44
233,11001010700,30
236,11001005800,29
238,11001010200,28
270,54011000600,25
227,54039000900,23
265,18003001300,22
30,51770000602,20
237,11001010100,20
243,36001001100,19


In the table above, I counted the number of ZIP Codes that each Census Tract intersects, kept the Tracts that intersect more than one, and sorted them in descending order. The top four Tracts are in DC (GEOIDs starting with state code `11`), followed by two in West Virginia (state code `54`). 

There is another geospatial statistical concept/product that we might come into contact with - ZIP Code Tabulation Areas (ZCTAs). An in-depth explanation can be found [here](https://www.census.gov/programs-surveys/geography/guidance/geo-areas/zctas.html).  
  
To cut a long story short, ZCTAs are created by the US Census Bureau by assigning each Census Block the ZIP Code that occurs most commonly among the addresses in the Block. As I have pointed out above, each Tract (and Block) can intersect multiple ZIP Codes. Because ZCTAs pick the most commonly occurring ZIP Code, an address in a Block can be assigned to a ZCTA that differs from its USPS mailing ZIP Code. For the most part, however, the ZCTA code assigned to a Block matches the USPS ZIP Code of the addresses in it.   
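
The assignment rule described above can be illustrated with a minimal sketch. The block IDs and address list are made up for illustration, and the Census Bureau's actual delineation involves additional rules that are not modelled here; the sketch only shows the "most common ZIP Code per Block" idea and how a mismatch can arise.

```python
from collections import Counter, defaultdict

# Hypothetical (block_id, usps_zip) pairs, one per address; not real data.
addresses = [
    ("block_A", "90024"),
    ("block_A", "90024"),
    ("block_A", "90049"),
    ("block_B", "90049"),
    ("block_B", "90049"),
]

# Tally how often each ZIP Code appears among the addresses of each block.
zip_counts = defaultdict(Counter)
for block_id, usps_zip in addresses:
    zip_counts[block_id][usps_zip] += 1

# Assign each block the ZIP Code that occurs most often among its addresses.
block_to_zcta = {
    block_id: counts.most_common(1)[0][0]
    for block_id, counts in zip_counts.items()
}

# Addresses whose mailing ZIP Code differs from the code assigned to their block.
mismatched = [
    (block_id, usps_zip)
    for block_id, usps_zip in addresses
    if block_to_zcta[block_id] != usps_zip
]
```

In this toy data, the single `90049` address in `block_A` ends up in a ZCTA that differs from its mailing ZIP Code.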
  
Fidap should pipe in the ZCTA data file found [here](https://www.census.gov/cgi-bin/geo/shapefiles/index.php?year=2020&layergroup=ZIP+Code+Tabulation+Areas).  
  
As ZIP codes/ZCTAs are usually larger in size than Census Tracts, it can be difficult or downright impossible to subdivide or break down data collected at the scale of ZIP Codes or ZCTAs to Census Tracts and Blocks without some assumptions made along the way.  
  
For example, say we have a DMV dataset on the total number of cars registered in a ZCTA with 20 blocks. If we want to find out how many cars there are in each Block, we can do so according to the proportion of the total ZCTA population that each Block represents, assuming equal ability to own cars among the Blocks.  
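
A minimal sketch of this proportional allocation, assuming made-up numbers and only three Blocks instead of 20 for brevity. The function name `allocate_by_population` is hypothetical and not part of any library.

```python
def allocate_by_population(zcta_total, block_populations):
    """Split a ZCTA-level total across blocks in proportion to population."""
    total_population = sum(block_populations.values())
    if total_population == 0:
        raise ValueError("total population must be positive")
    return {
        block: zcta_total * population / total_population
        for block, population in block_populations.items()
    }


# Hypothetical numbers: 1,000 registered cars in a ZCTA with three blocks.
block_populations = {"block_1": 500, "block_2": 300, "block_3": 200}
cars_per_block = allocate_by_population(1000, block_populations)
```

The estimate is only as good as the assumption behind it: if car ownership varies between Blocks, a population share will misallocate the total.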
  
What if we want to find out the volume of mail that each household in a Tract receives on average? Suppose the USPS releases data on the volume of mail it receives and delivers per USPS ZIP Code. We are interested in Census Tract `11001006202` which crosses 44 ZIP Codes. How are we supposed to do this? Do we count the number of addresses in the portion of Census Tract `11001006202` that is in each ZIP Code, take the mean for each ZIP Code, multiply by the number of addresses with that ZIP Code and sum it up? To do this, we will need a list of all possible addresses in the Census Tract!    
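
If such an address list were available, the calculation described above might look like the sketch below. All inputs are hypothetical placeholders (`ZIP_A` and `ZIP_B` stand in for real ZIP Codes, and a two-ZIP Tract stands in for the 44-ZIP one), and the approach assumes mail is spread evenly across the addresses within each ZIP Code.

```python
# Hypothetical inputs for one tract that crosses two ZIP Codes.
zip_mail_volume = {"ZIP_A": 120000, "ZIP_B": 90000}   # pieces of mail per ZIP Code
zip_address_count = {"ZIP_A": 4000, "ZIP_B": 4500}    # all addresses in each ZIP Code
tract_addresses_by_zip = {"ZIP_A": 300, "ZIP_B": 100}  # tract addresses in each ZIP Code


def mean_mail_per_address(tract_addresses_by_zip, zip_mail_volume, zip_address_count):
    """Estimate the average mail volume per address in a tract."""
    estimated_mail = 0.0
    for zip_code, n_addresses in tract_addresses_by_zip.items():
        # Mean mail per address in this ZIP Code, assumed uniform within the ZIP.
        per_address = zip_mail_volume[zip_code] / zip_address_count[zip_code]
        estimated_mail += per_address * n_addresses
    return estimated_mail / sum(tract_addresses_by_zip.values())


tract_mean = mean_mail_per_address(
    tract_addresses_by_zip, zip_mail_volume, zip_address_count
)
```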
  
On the flip side, because Census Blocks and Census Tracts are much smaller geographical units that are more likely to lie entirely within a ZIP Code, data collected at the scale of Census Blocks can sometimes be aggregated to approximate the ZIP Code equivalent. That being said, I can think of situations where certain broad assumptions have to be made.  
  
Suppose the US Census Bureau has a dataset on the number of credit cards per person within a Census Tract, say Census Tract `06037265420` in West Los Angeles. This Tract crosses the two ZIP Codes `90024` and `90049`. A bank might want data on the existing number of credit card holders in the `CA 90024` ZIP Code to figure out how best to target the large numbers of UCLA students in the area. But to piece together the picture for `CA 90024`, the bank would have to stitch together data that includes the part of the Tract that crosses into `CA 90049`. `CA 90049` is Brentwood, a markedly different and more upscale residential community than Westwood, which is `CA 90024`. Data from Brentwood would probably skew the result.  
  
Even though all data collected in each Census Block can be ascribed to exactly one ZCTA, each Tract is made up of multiple Blocks, so one Tract can still be ascribed to multiple ZCTAs. Further, the ZCTA might not match the USPS ZIP Code perfectly. The example below uses the US Census Bureau ZCTA to Tract relationship file to illustrate this issue.   

In [None]:
zcta_tract_rs = pd.read_csv("https://www2.census.gov/geo/docs/maps-data/data/rel/zcta_tract_rel_10.txt")
zcta_tract_rs_repeat = zcta_tract_rs.copy().loc[:,"GEOID"]
zcta_tract_rs_repeat = zcta_tract_rs_repeat[zcta_tract_rs_repeat.duplicated()]

In [102]:
repeat_tract = zcta_tract_rs_repeat.tail(n = 1).tolist()
zcta_tract_rs.loc[zcta_tract_rs.loc[:,"GEOID"] == repeat_tract[0],:]

Unnamed: 0,ZCTA5,STATE,COUNTY,TRACT,GEOID,POPPT,HUPT,AREAPT,AREALANDPT,ZPOP,ZHU,ZAREA,ZAREALAND,TRPOP,TRHU,TRAREA,TRAREALAND,ZPOPPCT,ZHUPCT,ZAREAPCT,ZAREALANDPCT,TRPOPPCT,TRHUPCT,TRAREAPCT,TRAREALANDPCT
148886,99903,2,275,300,2275000300,31,69,456712842,401532954,31,69,456712842,401532954,2369,1428,8966687298,6582411037,100.0,100.0,100.0,100.0,1.31,4.83,5.09,6.1
148896,99929,2,275,300,2275000300,2338,1339,6235969341,5598556684,2338,1339,6235969341,5598556684,2369,1428,8966687298,6582411037,100.0,100.0,100.0,100.0,98.69,93.77,69.55,85.05


We can also see how this might work in real life using the map below.

In [3]:
# importing ZCTA, USPS ZIPs, and LA's Census Tracts respectively
tl_19_zcta = gpd.read_file("./tl_2019_us_zcta510/tl_2019_us_zcta510.shp")
la_zip_codes = gpd.read_file("./Zip_Codes_(LA_County)/Zip_Codes_(LA_County).shp")
la_census_tracts_json_url = "https://opendata.arcgis.com/datasets/7d2bb4e7c31e4c64b18479c9eb3b63d4_0.geojson"
la_census_tracts_json = gpd.read_file(la_census_tracts_json_url)

In [33]:
# filtering for CA 90024, CA 90049, and Census Tract 06037265420
west_side_la_zcta = tl_19_zcta.loc[tl_19_zcta.loc[:, "ZCTA5CE10"].isin(["90024", "90049"]), :]
west_side_la_zcta_json = west_side_la_zcta.to_json()
west_side_la_USPS = la_zip_codes.loc[(la_zip_codes.loc[:,"ZIPCODE"] == "90024") | (la_zip_codes.loc[:,"ZIPCODE"] == "90049"), :]
west_side_la_USPS_json = west_side_la_USPS.to_json()
brentwood_la_census_tracts_json = la_census_tracts_json.loc[la_census_tracts_json.GEOID10 == '06037265420', :].to_json()

In [79]:
# setting up map
west_side = folium.Map(
    location = [34.0703288, -118.4572098], zoom_start = 17
)

# setting up styling functions
def style_function_zcta(feature):
    return {
        'fillColor': "#fb8072",
        'color': "#fb8072",
    }


def style_function_zip(feature):
    return {
        'fillColor': "#fdb462",
        'color': "#fdb462",
    }

# adding layers
folium.GeoJson(west_side_la_zcta_json, name = "ZCTA", style_function=style_function_zcta).add_to(west_side)
folium.GeoJson(west_side_la_USPS_json, name = "USPS ZIP", style_function= style_function_zip).add_to(west_side)
folium.GeoJson(brentwood_la_census_tracts_json, name = "Sample Census Tract").add_to(west_side)
folium.LayerControl().add_to(west_side)
loc = "Differences between US Census Bureau ZCTA and USPS ZIP Codes, and Census Tracts"
title_html = '''
             <h4 align="center" style="font-size:14px"><b>{}</b></h4>
             '''.format(loc)   

west_side.get_root().html.add_child(folium.Element(title_html))

<branca.element.Element at 0x1d6de629e48>

The map below shows:  
1) US Census Bureau ZCTA Boundaries (Red)  
2) USPS ZIP Code Boundaries (Orange)  
3) The sample US Census Bureau Census Tract that overlaps them (Blue)

In [80]:
west_side

### Recommendations  
  
There is no panacea for the non-equivalence and incompatibility between ZIP Codes or ZCTAs on the one hand and Census Tracts on the other.  
  
What can be done is to use a ZCTA to Census Tract relationship file as shown below, or a USPS ZIP Code to Census Tract relationship file from HUD, to map either of the two to Census Tracts. Most data collected at the Census Block level is only available at the Tract level anyway. Estimating values at the ZCTA level will most probably require apportioning by share of population.    
  
As for mapping between ZCTA and ZIP Codes, it might not be worth the effort to be too particular because the differences are likely to be small.   
  
Fidap can also import the ZCTA relationship file found [here](https://www2.census.gov/geo/docs/maps-data/data/rel/zcta_tract_rel_10.txt) to supplement the existing HUD one that maps USPS ZIPs to Census Tracts.

In [88]:
zcta_90024 = zcta_tract_rs.loc[zcta_tract_rs.ZCTA5 == 90024,:]
zcta_90024.loc[:, "ZCTA5":"GEOID"].head()

Unnamed: 0,ZCTA5,STATE,COUNTY,TRACT,GEOID
131094,90024,6,37,265100,6037265100
131095,90024,6,37,265201,6037265201
131096,90024,6,37,265202,6037265202
131097,90024,6,37,265301,6037265301
131098,90024,6,37,265303,6037265303
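
The population-based estimation mentioned in the recommendations could be sketched as follows. The rows reuse the `ZCTA5`, `GEOID`, and `TRPOPPCT` values printed earlier for Tract `2275000300`, where `TRPOPPCT` is taken to be the percentage of the Tract's population that falls in each ZCTA part. The Tract-level value of 1,000 and the function name `apportion_to_zcta` are hypothetical, and the sketch assumes the quantity of interest is spread evenly across the Tract's population.

```python
from collections import defaultdict

# Rows mirror three columns of the relationship file; values are the two
# records shown earlier for tract 2275000300.
relationship_rows = [
    {"ZCTA5": 99903, "GEOID": 2275000300, "TRPOPPCT": 1.31},
    {"ZCTA5": 99929, "GEOID": 2275000300, "TRPOPPCT": 98.69},
]

# Hypothetical tract-level value to apportion (not real data).
tract_values = {2275000300: 1000.0}


def apportion_to_zcta(rows, tract_values):
    """Spread tract totals over ZCTAs using each record's tract population share."""
    zcta_totals = defaultdict(float)
    for row in rows:
        share = row["TRPOPPCT"] / 100.0
        zcta_totals[row["ZCTA5"]] += tract_values[row["GEOID"]] * share
    return dict(zcta_totals)


zcta_estimates = apportion_to_zcta(relationship_rows, tract_values)
```

With more Tracts in `tract_values`, the same loop accumulates each ZCTA's estimate across every Tract that contributes to it.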
