# Census Tracts

Exploratory data analysis of the raw 2020 TIGER/Line Shapefiles for U.S. census tracts.

### Summary

The raw dataset of U.S. census tracts for 2020 is split into 56 files, each representing a single state or state equivalent: the 50 states, the District of Columbia, and five territories (American Samoa, Guam, the Northern Mariana Islands, Puerto Rico, and the U.S. Virgin Islands). There are 85,528 records in total: 84,414 in the 50 states and the District of Columbia, and 1,114 in the five territories. In addition to the geometry column, relevant columns include the FIPS codes for states (`STATEFP`), counties (`COUNTYFP`), and tracts (`TRACTCE`); the unique geography identifier, created from the FIPS codes (`GEOID`); and the full geography name (`NAMELSAD`). The dataset uses the coordinate reference system (CRS) EPSG:4269 (NAD83), which is standard for federal agencies.

Tracts from the 2010 census must also be examined for the island territories that had not been evaluated for low-income community status at the time of writing (October 9, 2023): American Samoa, Guam, the Northern Mariana Islands, and the U.S. Virgin Islands. This dataset contains 132 records across four files and also has a CRS of EPSG:4269. In addition to the geometry column, relevant columns include the FIPS codes for states (`STATEFP10`), counties (`COUNTYFP10`), and tracts (`TRACTCE10`); the unique geography identifier, created from the FIPS codes (`GEOID10`); and the full geography name (`NAMELSAD10`).

### Exploration

In [1]:
import glob
import geopandas as gpd

Examine tracts from the 2020 U.S. census. 

In [2]:
# One zipped shapefile per state or state equivalent
fpaths = glob.glob("../data/raw/tl_2020_*_tract.zip")
print(len(fpaths))

56


In [3]:
gdf = gpd.read_file(fpaths[0])
gdf.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 2672 entries, 0 to 2671
Data columns (total 13 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   STATEFP   2672 non-null   object  
 1   COUNTYFP  2672 non-null   object  
 2   TRACTCE   2672 non-null   object  
 3   GEOID     2672 non-null   object  
 4   NAME      2672 non-null   object  
 5   NAMELSAD  2672 non-null   object  
 6   MTFCC     2672 non-null   object  
 7   FUNCSTAT  2672 non-null   object  
 8   ALAND     2672 non-null   int64   
 9   AWATER    2672 non-null   int64   
 10  INTPTLAT  2672 non-null   object  
 11  INTPTLON  2672 non-null   object  
 12  geometry  2672 non-null   geometry
dtypes: geometry(1), int64(2), object(10)
memory usage: 271.5+ KB


In [4]:
gdf.head(2)

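|   | STATEFP | COUNTYFP | TRACTCE | GEOID | NAME | NAMELSAD | MTFCC | FUNCSTAT | ALAND | AWATER | INTPTLAT | INTPTLON | geometry |
|---|---------|----------|---------|-------|------|----------|-------|----------|-------|--------|----------|----------|----------|
| 0 | 37 | 141 | 920300 | 37141920300 | 9203 | Census Tract 9203 | G5020 | S | 236115472 | 605423 | 34.6713171 | -78.0020072 | POLYGON ((-78.15648 34.67909, -78.15458 34.680... |
| 1 | 37 | 141 | 990100 | 37141990100 | 9901 | Census Tract 9901 | G5020 | S | 0 | 133298393 | 34.3492469 | -77.6031267 | POLYGON ((-77.71433 34.29117, -77.71406 34.291... |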
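
The two sample rows suggest that `GEOID` is the state, county, and tract codes concatenated (2 + 3 + 6 characters). A minimal sketch of that relationship, assuming the codes are zero-padded digit strings as they appear in the shapefile; `build_geoid` and `split_geoid` are hypothetical helper names, not part of geopandas:

```python
def build_geoid(statefp: str, countyfp: str, tractce: str) -> str:
    """Concatenate FIPS components into an 11-character tract GEOID."""
    # zfill restores leading zeros in case a code was read as a shorter string
    return f"{statefp.zfill(2)}{countyfp.zfill(3)}{tractce.zfill(6)}"


def split_geoid(geoid: str) -> tuple[str, str, str]:
    """Split an 11-digit tract GEOID into (state, county, tract) codes."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"Expected an 11-digit tract GEOID, got {geoid!r}")
    return geoid[:2], geoid[2:5], geoid[5:]


# Values taken from the first sample row above
print(build_geoid("37", "141", "920300"))
print(split_geoid("37141920300"))
```

On the loaded frame, the same consistency check could be written as `(gdf.STATEFP + gdf.COUNTYFP + gdf.TRACTCE == gdf.GEOID).all()`.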


In [5]:
gdf.crs

<Geographic 2D CRS: EPSG:4269>
Name: NAD83
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: North America - onshore and offshore: Canada - Alberta; British Columbia; Manitoba; New Brunswick; Newfoundland and Labrador; Northwest Territories; Nova Scotia; Nunavut; Ontario; Prince Edward Island; Quebec; Saskatchewan; Yukon. Puerto Rico. United States (USA) - Alabama; Alaska; Arizona; Arkansas; California; Colorado; Connecticut; Delaware; Florida; Georgia; Hawaii; Idaho; Illinois; Indiana; Iowa; Kansas; Kentucky; Louisiana; Maine; Maryland; Massachusetts; Michigan; Minnesota; Mississippi; Missouri; Montana; Nebraska; Nevada; New Hampshire; New Jersey; New Mexico; New York; North Carolina; North Dakota; Ohio; Oklahoma; Oregon; Pennsylvania; Rhode Island; South Carolina; South Dakota; Tennessee; Texas; Utah; Vermont; Virginia; Washington; West Virginia; Wisconsin; Wyoming. US Virgin Islands. British Virgin Islands

In [6]:
num_tracts_states = 0
num_tracts_territories = 0
crs_name = []

for fpath in fpaths:
    # Load file
    gdf = gpd.read_file(fpath)

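    # Record number of tracts. The third token of the file name is the
    # state FIPS code; codes of 60 or above denote territories, so the
    # District of Columbia (11) is counted with the states.
    is_terr = fpath.split("/")[-1].split("_")[2] >= "60"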
    if is_terr:
        num_tracts_territories += len(gdf)
    else:
        num_tracts_states += len(gdf)

    # Record CRS
    crs_name.append(gdf.crs.name)
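
The `>= '60'` comparison relies on the state FIPS code embedded in each file name: American Samoa (60), Guam (66), the Northern Mariana Islands (69), Puerto Rico (72), and the U.S. Virgin Islands (78) all sort at or above 60, while the District of Columbia (11) falls in the other bucket. A small sketch of that classification as standalone functions, assuming file names follow the `tl_2020_<FIPS>_tract.zip` pattern seen above; `state_fips_from_path` and `is_territory` are illustrative names only:

```python
from pathlib import PurePosixPath


def state_fips_from_path(fpath: str) -> str:
    """Return the two-digit state FIPS code from a TIGER/Line tract file path."""
    # Assumed file name layout: tl_<year>_<state FIPS>_tract.zip
    parts = PurePosixPath(fpath).name.split("_")
    if len(parts) < 3 or len(parts[2]) != 2 or not parts[2].isdigit():
        raise ValueError(f"Unexpected file name: {fpath!r}")
    return parts[2]


def is_territory(state_fips: str) -> bool:
    """Treat state FIPS codes of 60 or above as territories."""
    return int(state_fips) >= 60


for path in ["../data/raw/tl_2020_11_tract.zip", "../data/raw/tl_2020_72_tract.zip"]:
    fips = state_fips_from_path(path)
    print(fips, "territory" if is_territory(fips) else "state or D.C.")
```

Comparing integers rather than strings gives the same result for two-digit codes and makes the intent a little more explicit.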

In [7]:
print(f"There are {num_tracts_states:,} tract(s) in the 50 U.S. states "
      f"and {num_tracts_territories:,} tract(s) in the U.S. territories, "
      f"({num_tracts_states + num_tracts_territories:,} total).")

There are 84,414 tract(s) in the 50 U.S. states and 1,114 tract(s) in the U.S. territories, (85,528 total).


In [8]:
any([name != "NAD83" for name in crs_name])

False
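
The check above reports only whether any file deviates from NAD83. A slightly more informative variant, sketched under the assumption that CRS names are collected per file in a dictionary (`crs_by_file` and `find_crs_mismatches` are hypothetical names), returns the files that deviate:

```python
def find_crs_mismatches(crs_by_file: dict[str, str], expected: str = "NAD83") -> dict[str, str]:
    """Return the subset of files whose CRS name differs from the expected one."""
    return {fpath: name for fpath, name in crs_by_file.items() if name != expected}


# Illustrative input; in the notebook this would be filled inside the loop,
# e.g. crs_by_file[fpath] = gdf.crs.name
crs_by_file = {
    "tl_2020_37_tract.zip": "NAD83",
    "tl_2020_66_tract.zip": "NAD83",
}
print(find_crs_mismatches(crs_by_file))
```

An empty dictionary corresponds to the `False` result above.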

Examine tracts from the 2010 U.S. census.

In [9]:
# One zipped shapefile per remaining territory
fpaths = glob.glob("../data/raw/tl_2010_*_tract10.zip")
print(len(fpaths))

4


In [10]:
gdf = gpd.read_file(fpaths[0])
gdf.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 57 entries, 0 to 56
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   STATEFP10   57 non-null     object  
 1   COUNTYFP10  57 non-null     object  
 2   TRACTCE10   57 non-null     object  
 3   GEOID10     57 non-null     object  
 4   NAME10      57 non-null     object  
 5   NAMELSAD10  57 non-null     object  
 6   MTFCC10     57 non-null     object  
 7   FUNCSTAT10  57 non-null     object  
 8   ALAND10     57 non-null     int64   
 9   AWATER10    57 non-null     int64   
 10  INTPTLAT10  57 non-null     object  
 11  INTPTLON10  57 non-null     object  
 12  geometry    57 non-null     geometry
dtypes: geometry(1), int64(2), object(10)
memory usage: 5.9+ KB


In [11]:
gdf.head(2)

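|   | STATEFP10 | COUNTYFP10 | TRACTCE10 | GEOID10 | NAME10 | NAMELSAD10 | MTFCC10 | FUNCSTAT10 | ALAND10 | AWATER10 | INTPTLAT10 | INTPTLON10 | geometry |
|---|-----------|------------|-----------|---------|--------|------------|---------|------------|---------|----------|------------|------------|----------|
| 0 | 66 | 010 | 950401 | 66010950401 | 9504.01 | Census Tract 9504.01 | G5020 | S | 3012167 | 0 | 13.5634095 | 144.8537892 | POLYGON ((144.85748 13.56992, 144.85768 13.569... |
| 1 | 66 | 010 | 951901 | 66010951901 | 9519.01 | Census Tract 9519.01 | G5020 | S | 1135163 | 0 | 13.5124768 | 144.8141771 | POLYGON ((144.81719 13.50794, 144.81685 13.507... |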
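
The 2010 files carry the same fields as the 2020 files, with a `10` suffix on every attribute column. If the two vintages ever need to share a schema, one option is to strip that suffix first. A minimal sketch, assuming `geometry` is the only column without the suffix; `strip_vintage_suffix` is a hypothetical helper:

```python
def strip_vintage_suffix(columns, suffix="10"):
    """Map each suffixed column name to its name without the vintage suffix."""
    return {
        col: col[: -len(suffix)]
        for col in columns
        if col != "geometry" and col.endswith(suffix)
    }


columns_2010 = ["STATEFP10", "COUNTYFP10", "TRACTCE10", "GEOID10", "NAMELSAD10", "geometry"]
print(strip_vintage_suffix(columns_2010))
```

The resulting mapping could then be passed to `gdf.rename(columns=...)` on the 2010 frame.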


In [12]:
num_tracts_territories = 0
crs_name = []

for fpath in fpaths:
    gdf = gpd.read_file(fpath)
    num_tracts_territories += len(gdf)
    crs_name.append(gdf.crs.name)

In [13]:
print(f"There are {num_tracts_territories:,} census tracts (2010) "
      "for the remaining U.S. territories.")

There are 132 census tracts (2010) for the remaining U.S. territories.


In [14]:
any([name != "NAD83" for name in crs_name])

False