# Zip Codes

Exploratory data analysis of the raw TIGER/Line Shapefile for zip codes.

### Summary

The dataset contains 33,791 records and 10 columns. Apart from the geometry column, the most relevant field is the ZIP Code Tabulation Area (ZCTA) code, stored identically in `ZCTA5CE20` and `GEOID20`, which represents a valid USPS ZIP Code as of January 1, 2020. There are 57 ZCTAs in the District of Columbia and 148 in the U.S. territories. The dataset uses the coordinate reference system (CRS) EPSG:4269 (NAD83), which is standard for federal agencies.

It is important to note that ZIP Codes differ from ZCTAs, as [documented](https://www.census.gov/programs-surveys/geography/guidance/geo-areas/zctas.html) by the U.S. Census Bureau:

>First introduced in 1963, ZIP Code is a trademark of the USPS created to coordinate mail handling and delivery. The USPS assigns ZIP Code ranges to regional post offices, which in turn assign ZIP Codes to delivery routes. Each delivery route is composed of street networks, and/or individual units with high mail volumes, such as high-rise buildings or individual business locations. Within the Census Bureau’s MAF/TIGER System, ZIP Codes are stored as one component of discrete addresses tied to delivery points, including specific housing unit locations. The result is a point-based dataset unsuitable for mapping and many analysis applications.

>ZCTAs are generalized areal representations of the geographic extent and distribution of the point-based ZIP Codes built using 2020 Census tabulation blocks. The Census Bureau is restricted by Title 13 from releasing individual housing unit addresses or location information to the public, so point-based ZIP Code data are unsuitable for distribution and publication. However, by aggregating the points and extrapolating them to a polygonal-based unit (census tabulation blocks), disclosure concerns can be addressed and “geographic subtraction*” is prevented.



### Exploration

In [1]:
import geopandas as gpd

In [2]:
gdf = gpd.read_file("../data/raw/census/zip_codes/tl_2020_us_zcta520.zip")
gdf.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 33791 entries, 0 to 33790
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   ZCTA5CE20   33791 non-null  object  
 1   GEOID20     33791 non-null  object  
 2   CLASSFP20   33791 non-null  object  
 3   MTFCC20     33791 non-null  object  
 4   FUNCSTAT20  33791 non-null  object  
 5   ALAND20     33791 non-null  int64   
 6   AWATER20    33791 non-null  int64   
 7   INTPTLAT20  33791 non-null  object  
 8   INTPTLON20  33791 non-null  object  
 9   geometry    33791 non-null  geometry
dtypes: geometry(1), int64(2), object(7)
memory usage: 2.6+ MB
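The `info()` output shows that the internal-point columns `INTPTLAT20` and `INTPTLON20` are stored as `object` (string) values, so they need converting before any numeric use. The following is a minimal sketch of that conversion, assuming pandas is available. It uses a small stand-in frame built from the first two records of the dataset rather than the full file, and the `_num` column names are hypothetical.

```python
import pandas as pd

# Stand-in for the real GeoDataFrame: only the two string-typed columns,
# with values copied from the first two records of the dataset.
sample = pd.DataFrame(
    {
        "INTPTLAT20": ["33.7427261", "34.7395036"],
        "INTPTLON20": ["-88.0973903", "-88.0193814"],
    }
)

# errors="raise" (the default) fails loudly on any unparseable value
# instead of silently producing NaN.
for col in ["INTPTLAT20", "INTPTLON20"]:
    sample[f"{col}_num"] = pd.to_numeric(sample[col], errors="raise")

print(sample.dtypes)
```

The same loop should apply to `gdf` directly. Writing to new columns keeps the original strings available for comparison.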


In [3]:
gdf.head(2)

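|   | ZCTA5CE20 | GEOID20 | CLASSFP20 | MTFCC20 | FUNCSTAT20 | ALAND20 | AWATER20 | INTPTLAT20 | INTPTLON20 | geometry |
|---|-----------|---------|-----------|---------|------------|---------|----------|------------|------------|----------|
| 0 | 35592 | 35592 | B5 | G6350 | S | 298552385 | 235989 | 33.7427261 | -88.0973903 | POLYGON ((-88.24735 33.65390, -88.24713 33.654... |
| 1 | 35616 | 35616 | B5 | G6350 | S | 559506992 | 41870756 | 34.7395036 | -88.0193814 | POLYGON ((-88.13997 34.58184, -88.13995 34.582... |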


In [4]:
# Confirm the two identifier columns are duplicates of each other
(gdf["ZCTA5CE20"] == gdf["GEOID20"]).all()

True

In [5]:
# See: https://en.wikipedia.org/wiki/List_of_ZIP_Code_prefixes
# Despite the variable name, these are the prefixes counted as U.S. territories in the summary.
international_prefixes = ["001", "002", "004", "006", "007", "008", "009", "969"]
dc_prefixes = ["200", "202", "203", "204", "205"]
gdf["prefix"] = gdf["ZCTA5CE20"].str[:3]

In [6]:
# Count ZCTAs whose prefix belongs to the District of Columbia
len(gdf.query("prefix in @dc_prefixes"))

57

In [7]:
# Count ZCTAs whose prefix belongs to a U.S. territory
len(gdf.query("prefix in @international_prefixes"))

148
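The prefix-based counts above can also be expressed as a single labelling step, which makes the grouping reusable. This is a rough sketch using only the standard library. The function name `classify_zcta`, the label strings, and the constant names are hypothetical, and the prefix sets are copied from the cell that defines `dc_prefixes` and `international_prefixes`. The last two sample codes are illustrative inputs rather than values read from the file.

```python
from collections import Counter

# Prefix sets copied from the notebook; the constant names are hypothetical.
TERRITORY_PREFIXES = {"001", "002", "004", "006", "007", "008", "009", "969"}
DC_PREFIXES = {"200", "202", "203", "204", "205"}


def classify_zcta(code: str) -> str:
    """Label a 5-digit ZCTA code as 'dc', 'territory', or 'other' by prefix."""
    if len(code) != 5 or not code.isdigit():
        raise ValueError(f"expected a 5-digit string, got {code!r}")
    prefix = code[:3]
    if prefix in DC_PREFIXES:
        return "dc"
    if prefix in TERRITORY_PREFIXES:
        return "territory"
    return "other"


# First two codes come from the dataset head; the last two are illustrative.
codes = ["35592", "35616", "20001", "00601"]
counts = Counter(classify_zcta(c) for c in codes)
print(counts)
```

With the real data, something like `gdf["ZCTA5CE20"].map(classify_zcta).value_counts()` would presumably reproduce the 57 and 148 figures in one pass.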

In [8]:
gdf.crs

<Geographic 2D CRS: EPSG:4269>
Name: NAD83
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: North America - onshore and offshore: Canada - Alberta; British Columbia; Manitoba; New Brunswick; Newfoundland and Labrador; Northwest Territories; Nova Scotia; Nunavut; Ontario; Prince Edward Island; Quebec; Saskatchewan; Yukon. Puerto Rico. United States (USA) - Alabama; Alaska; Arizona; Arkansas; California; Colorado; Connecticut; Delaware; Florida; Georgia; Hawaii; Idaho; Illinois; Indiana; Iowa; Kansas; Kentucky; Louisiana; Maine; Maryland; Massachusetts; Michigan; Minnesota; Mississippi; Missouri; Montana; Nebraska; Nevada; New Hampshire; New Jersey; New Mexico; New York; North Carolina; North Dakota; Ohio; Oklahoma; Oregon; Pennsylvania; Rhode Island; South Carolina; South Dakota; Tennessee; Texas; Utah; Vermont; Virginia; Washington; West Virginia; Wisconsin; Wyoming. US Virgin Islands. British Virgin Islands
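
Because EPSG:4269 is a geographic CRS with axes in degrees, area and distance computed directly on the geometry come out in degree units. Below is a minimal sketch of reprojecting before measuring, assuming geopandas and shapely are installed. The rectangle is a hypothetical stand-in rather than a real ZCTA boundary, and EPSG:5070 (NAD83 / Conus Albers) is an assumed choice of equal-area projection that only suits the contiguous United States.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical 0.1 x 0.1 degree rectangle in Alabama, not a real ZCTA polygon.
toy = gpd.GeoDataFrame(
    {"name": ["toy"]},
    geometry=[box(-88.15, 33.70, -88.05, 33.80)],
    crs="EPSG:4269",
)

# Reproject to an equal-area CRS with meter units (assumed: EPSG:5070).
projected = toy.to_crs(epsg=5070)

# Areas are now in square meters; convert to square kilometers.
area_km2 = projected.geometry.area / 1e6
print(area_km2.iloc[0])
```

ZCTAs in Alaska, Hawaii, and the territories fall outside the intended extent of EPSG:5070, so a different projection would likely be needed for those records.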