# Municipalities

Exploratory data analysis of the raw 2020 TIGER/Line Shapefiles for U.S. census places and county subdivisions.

### Summary

We initially considered both U.S. census places and county subdivisions to generate the dataset:

**Places.** The raw dataset of U.S. places for 2020 is split into 56 files. The U.S. Census Bureau describes a place as a "concentration of population [... that] may or may not have legally prescribed limits, powers, or functions. This concentration of population must have a name, be locally recognized, and not be part of any other place" ([Reference](https://www2.census.gov/geo/pdfs/reference/GARM/Ch9GARM.pdf)). There are 32,188 places in total, spanning the 50 U.S. states, the District of Columbia, American Samoa, Guam, Northern Mariana Islands, Puerto Rico, and the U.S. Virgin Islands. In addition to the geometry column, relevant columns include the geography name (`NAME`) and the computed name (`NAMELSAD`), which adds the type of political subdivision (e.g., town or village). The dataset uses the coordinate reference system (CRS) EPSG:4269 (NAD83), which is commonly used by U.S. federal agencies.

**County Subdivisions.** The U.S. Census Bureau defines county subdivisions as "minor civil divisions (MCDs) or census county divisions (CCDs). A State has either MCDs or their statistical equivalents, or CCDs; it cannot contain both. [...] In the State of Alaska, which has no counties and no MCDs, the Census Bureau and State officials have established census subareas (CSAs) as the statistical equivalents of MCDs" ([Reference](https://www2.census.gov/geo/pdfs/reference/GARM/Ch8GARM.pdf)). The raw dataset of U.S. county subdivisions is also split into 56 files covering the same geographic extent. It contains 36,639 records and uses the same CRS, EPSG:4269. Relevant columns include the geometry, the name (`NAME`), and the computed name (`NAMELSAD`).

However, we later determined that county subdivisions represent municipalities more accurately, so the pipeline uses only that dataset.
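The 56-file split described above follows state-level FIPS codes, which also appear in the archive names used later in this notebook (for example, `tl_2020_39_place.zip`). The following is a minimal sketch of how the expected file names could be listed, assuming every archive follows the `tl_2020_<FIPS>_<layer>.zip` pattern. The helper `expected_archives` is hypothetical and not part of the pipeline.

```python
# State-level FIPS codes run from 01 to 56 with five unused values
# (the range includes the District of Columbia, code 11).
UNUSED = {3, 7, 14, 43, 52}
STATE_FIPS = [f"{n:02d}" for n in range(1, 57) if n not in UNUSED]

# American Samoa, Guam, Northern Mariana Islands, Puerto Rico, U.S. Virgin Islands
TERRITORY_FIPS = ["60", "66", "69", "72", "78"]
ALL_FIPS = STATE_FIPS + TERRITORY_FIPS


def expected_archives(layer):
    """Return the expected archive names for a layer ("place" or "cousub").

    Assumes the naming pattern seen in this notebook's file paths.
    """
    return [f"tl_2020_{fips}_{layer}.zip" for fips in ALL_FIPS]


place_files = expected_archives("place")
cousub_files = expected_archives("cousub")
print(len(place_files), len(cousub_files))  # 56 56
```

Comparing such a list against the contents of the raw data directory is one way to spot a missing download before running the counts.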

### Exploration

Examine census places.

In [1]:
import geopandas as gpd
import glob

In [2]:
fpaths = glob.glob("../data/raw/census/places/tl_2020_39_place.zip")
fpaths.sort()

In [3]:
gdf = gpd.read_file(fpaths[0])
gdf.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 1265 entries, 0 to 1264
Data columns (total 17 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   STATEFP   1265 non-null   object  
 1   PLACEFP   1265 non-null   object  
 2   PLACENS   1265 non-null   object  
 3   GEOID     1265 non-null   object  
 4   NAME      1265 non-null   object  
 5   NAMELSAD  1265 non-null   object  
 6   LSAD      1265 non-null   object  
 7   CLASSFP   1265 non-null   object  
 8   PCICBSA   1265 non-null   object  
 9   PCINECTA  1265 non-null   object  
 10  MTFCC     1265 non-null   object  
 11  FUNCSTAT  1265 non-null   object  
 12  ALAND     1265 non-null   int64   
 13  AWATER    1265 non-null   int64   
 14  INTPTLAT  1265 non-null   object  
 15  INTPTLON  1265 non-null   object  
 16  geometry  1265 non-null   geometry
dtypes: geometry(1), int64(2), object(14)
memory usage: 168.1+ KB


In [4]:
gdf.head(2)

| | STATEFP | PLACEFP | PLACENS | GEOID | NAME | NAMELSAD | LSAD | CLASSFP | PCICBSA | PCINECTA | MTFCC | FUNCSTAT | ALAND | AWATER | INTPTLAT | INTPTLON | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | 58912 | 2399590 | 3958912 | Osgood | Osgood village | 47 | C1 | N | N | G4110 | A | 902595 | 0 | 40.339539 | -84.4960179 | POLYGON ((-84.50548 40.33951, -84.50548 40.339... |
| 1 | 39 | 83972 | 2400154 | 3983972 | Weston | Weston village | 47 | C1 | N | N | G4110 | A | 2951747 | 6786 | 41.3459807 | -83.7946092 | POLYGON ((-83.80560 41.35394, -83.80521 41.353... |


In [5]:
gdf.crs

<Geographic 2D CRS: EPSG:4269>
Name: NAD83
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: North America - onshore and offshore: Canada - Alberta; British Columbia; Manitoba; New Brunswick; Newfoundland and Labrador; Northwest Territories; Nova Scotia; Nunavut; Ontario; Prince Edward Island; Quebec; Saskatchewan; Yukon. Puerto Rico. United States (USA) - Alabama; Alaska; Arizona; Arkansas; California; Colorado; Connecticut; Delaware; Florida; Georgia; Hawaii; Idaho; Illinois; Indiana; Iowa; Kansas; Kentucky; Louisiana; Maine; Maryland; Massachusetts; Michigan; Minnesota; Mississippi; Missouri; Montana; Nebraska; Nevada; New Hampshire; New Jersey; New Mexico; New York; North Carolina; North Dakota; Ohio; Oklahoma; Oregon; Pennsylvania; Rhode Island; South Carolina; South Dakota; Tennessee; Texas; Utah; Vermont; Virginia; Washington; West Virginia; Wisconsin; Wyoming. US Virgin Islands. British Virgin Islands

In [6]:
num_places = 0
class_name = []
crs_name = []

for fpath in fpaths:
    gdf = gpd.read_file(fpath)
    num_places += len(gdf)
    class_name.extend(gdf["CLASSFP"].unique().tolist())
    crs_name.append(gdf.crs.name)

In [7]:
print(f"There are {num_places:,} place(s) spanning the 50 U.S. states, "
      f"the District of Columbia, and the U.S. territories.")

There are 1,265 place(s) spanning the 50 U.S. states, the District of Columbia, and the U.S. territories.
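The pattern passed to `glob.glob` in cell 2 names a single archive (state FIPS code 39), so the tally above covers only that file. A wildcard in the FIPS position would gather every per-state archive instead. The sketch below is a minimal illustration under the assumption that all archives sit in one directory and follow the `tl_2020_<FIPS>_place.zip` naming; `find_place_archives` is a hypothetical helper, and the demonstration uses empty placeholder files rather than real shapefiles.

```python
import glob
import os
import tempfile


def find_place_archives(root):
    """Return the sorted paths of every per-state place archive under root."""
    pattern = os.path.join(root, "tl_2020_*_place.zip")
    return sorted(glob.glob(pattern))


# Demonstrate on a throwaway directory holding empty placeholder files.
with tempfile.TemporaryDirectory() as root:
    for fips in ("20", "39", "72"):
        open(os.path.join(root, f"tl_2020_{fips}_place.zip"), "w").close()
    # A county subdivision archive must not match the place pattern.
    open(os.path.join(root, "tl_2020_39_cousub.zip"), "w").close()
    matches = [os.path.basename(p) for p in find_place_archives(root)]

print(matches)
# ['tl_2020_20_place.zip', 'tl_2020_39_place.zip', 'tl_2020_72_place.zip']
```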


In [8]:
for p in gdf["NAMELSAD"].sort_values().tolist():
    print(p)

Aberdeen village
Ada village
Adamsville village
Addyston village
Adelphi village
Adena village
Ai CDP
Akron city
Albany village
Alexandria village
Alger village
Alliance city
Alvordton CDP
Amanda village
Amberley village
Amelia CDP
Amesville village
Amherst city
Amsterdam village
Andersonville CDP
Andover village
Anna village
Ansonia village
Antioch village
Antwerp village
Apple Creek village
Apple Valley CDP
Aquilla village
Arcadia village
Arcanum village
Archbold village
Arlington Heights village
Arlington village
Ashland city
Ashley village
Ashtabula city
Ashville village
Athalia village
Athens city
Attica village
Atwater CDP
Aurora city
Austinburg CDP
Austintown CDP
Avon Lake city
Avon city
Bailey Lakes village
Bainbridge CDP
Bainbridge village
Bairdstown village
Ballville CDP
Baltic village
Baltimore village
Bannock CDP
Barberton city
Barnesville village
Barnhill village
Bascom CDP
Bass Lake CDP
Batavia village
Batesville village
Bay View village
Bay Village city
Beach City villag

In [9]:
set(class_name)
# Exclude C9, M2, 

{'C1', 'C2', 'C5', 'C6', 'M2', 'U1', 'U2'}
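The comment in the cell above marks some class codes for exclusion. The following is a minimal sketch of that filtering step on plain dictionaries, assuming the excluded codes are exactly the two named in the comment (the list there may be incomplete). The `M2` record is hypothetical, and `drop_excluded` is an illustrative helper, not part of the pipeline.

```python
# Codes taken from the notebook comment; the list there may be incomplete.
EXCLUDED_CLASSFP = {"C9", "M2"}

records = [
    {"NAME": "Osgood", "CLASSFP": "C1"},        # row shown in this notebook
    {"NAME": "Weston", "CLASSFP": "C1"},        # row shown in this notebook
    {"NAME": "Example Base", "CLASSFP": "M2"},  # hypothetical record
]


def drop_excluded(rows, excluded=EXCLUDED_CLASSFP):
    """Keep only the rows whose CLASSFP is not in the excluded set."""
    return [row for row in rows if row["CLASSFP"] not in excluded]


kept = drop_excluded(records)
print([row["NAME"] for row in kept])  # ['Osgood', 'Weston']
```

On a GeoDataFrame, the same filter could be written as `gdf[~gdf["CLASSFP"].isin(EXCLUDED_CLASSFP)]`.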

In [10]:
set(crs_name)

{'NAD83'}

Examine county subdivisions.

In [11]:
fpaths = glob.glob("../data/raw/census/county_subdivisions/tl_2020_20_cousub.zip")
fpaths.sort()
print(len(fpaths))

1


In [12]:
gdf = gpd.read_file(fpaths[0])
gdf.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 1531 entries, 0 to 1530
Data columns (total 19 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   STATEFP   1531 non-null   object  
 1   COUNTYFP  1531 non-null   object  
 2   COUSUBFP  1531 non-null   object  
 3   COUSUBNS  1531 non-null   object  
 4   GEOID     1531 non-null   object  
 5   NAME      1531 non-null   object  
 6   NAMELSAD  1531 non-null   object  
 7   LSAD      1531 non-null   object  
 8   CLASSFP   1531 non-null   object  
 9   MTFCC     1531 non-null   object  
 10  CNECTAFP  0 non-null      float64 
 11  NECTAFP   0 non-null      float64 
 12  NCTADVFP  0 non-null      float64 
 13  FUNCSTAT  1531 non-null   object  
 14  ALAND     1531 non-null   int64   
 15  AWATER    1531 non-null   int64   
 16  INTPTLAT  1531 non-null   object  
 17  INTPTLON  1531 non-null   object  
 18  geometry  1531 non-null   geometry
dtypes: float64(3), geometry(1), int64(2), ob

In [13]:
gdf.query("GEOID == '2015171242'")

| | STATEFP | COUNTYFP | COUSUBFP | COUSUBNS | GEOID | NAME | NAMELSAD | LSAD | CLASSFP | MTFCC | CNECTAFP | NECTAFP | NCTADVFP | FUNCSTAT | ALAND | AWATER | INTPTLAT | INTPTLON | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 909 | 20 | 151 | 71242 | 470371 | 2015171242 | 10 | Township 10 | 45 | T1 | G4040 | | | | A | 187584433 | 1333 | 37.5123571 | -98.8940644 | POLYGON ((-99.01404 37.55687, -99.00651 37.556... |


In [14]:
gdf.query("NAMELSAD == 'Adams township'")

| | STATEFP | COUNTYFP | COUSUBFP | COUSUBNS | GEOID | NAME | NAMELSAD | LSAD | CLASSFP | MTFCC | CNECTAFP | NECTAFP | NCTADVFP | FUNCSTAT | ALAND | AWATER | INTPTLAT | INTPTLON | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 125 | 20 | 131 | 275 | 472955 | 2013100275 | Adams | Adams township | 44 | T1 | G4040 | | | | A | 93063030 | 60835 | 39.783312 | -95.9578619 | POLYGON ((-96.01448 39.76902, -96.01429 39.783... |
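The two queries above show that `NAMELSAD` can place the subdivision type either after the name ("Adams township") or before it ("Township 10"). The following is a minimal sketch of separating that type from `NAME`, assuming the type is always joined to the name by a single space on one side. The helper `subdivision_type` is hypothetical.

```python
def subdivision_type(name, namelsad):
    """Return the part of NAMELSAD that is not the NAME itself.

    Handles a trailing type ("Adams township") and a leading type
    ("Township 10"); returns an empty string when neither form matches.
    """
    if namelsad.startswith(name + " "):
        return namelsad[len(name) + 1:]
    if namelsad.endswith(" " + name):
        return namelsad[: -len(name) - 1]
    return ""


# (NAME, NAMELSAD) pairs taken from rows shown in this notebook
rows = [
    ("Adams", "Adams township"),
    ("10", "Township 10"),
    ("Osgood", "Osgood village"),
]
for name, namelsad in rows:
    print(name, "->", subdivision_type(name, namelsad))
```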


In [15]:
gdf.head(2)

| | STATEFP | COUNTYFP | COUSUBFP | COUSUBNS | GEOID | NAME | NAMELSAD | LSAD | CLASSFP | MTFCC | CNECTAFP | NECTAFP | NCTADVFP | FUNCSTAT | ALAND | AWATER | INTPTLAT | INTPTLON | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20 | 79 | 63825 | 473687 | 2007963825 | Sedgwick | Sedgwick township | 44 | T1 | G4040 | | | | A | 92612259 | 0 | 37.9554353 | -97.4266635 | POLYGON ((-97.48222 37.96047, -97.48222 37.963... |
| 1 | 20 | 79 | 59350 | 473711 | 2007959350 | Richland | Richland township | 44 | T1 | G4040 | | | | A | 94141352 | 194211 | 37.9628333 | -97.2169326 | POLYGON ((-97.26373 37.99964, -97.26358 37.999... |


In [16]:
gdf.crs

<Geographic 2D CRS: EPSG:4269>
Name: NAD83
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: North America - onshore and offshore: Canada - Alberta; British Columbia; Manitoba; New Brunswick; Newfoundland and Labrador; Northwest Territories; Nova Scotia; Nunavut; Ontario; Prince Edward Island; Quebec; Saskatchewan; Yukon. Puerto Rico. United States (USA) - Alabama; Alaska; Arizona; Arkansas; California; Colorado; Connecticut; Delaware; Florida; Georgia; Hawaii; Idaho; Illinois; Indiana; Iowa; Kansas; Kentucky; Louisiana; Maine; Maryland; Massachusetts; Michigan; Minnesota; Mississippi; Missouri; Montana; Nebraska; Nevada; New Hampshire; New Jersey; New Mexico; New York; North Carolina; North Dakota; Ohio; Oklahoma; Oregon; Pennsylvania; Rhode Island; South Carolina; South Dakota; Tennessee; Texas; Utah; Vermont; Virginia; Washington; West Virginia; Wisconsin; Wyoming. US Virgin Islands. British Virgin Islands

In [17]:
num_divisions = 0
class_name = []
crs_name = []

for fpath in fpaths:
    gdf = gpd.read_file(fpath)
    num_divisions += len(gdf)
    class_name.extend(gdf["CLASSFP"].unique().tolist())
    crs_name.append(gdf.crs.name)

In [18]:
print(f"There are {num_divisions:,} county subdivision(s) spanning the 50 U.S. "
      f"states, the District of Columbia, and the U.S. territories.")

There are 1,531 county subdivision(s) spanning the 50 U.S. states, the District of Columbia, and the U.S. territories.


In [19]:
for p in gdf["NAMELSAD"].sort_values().tolist():
    print(p)

Abilene city
Achilles township
Adams township
Adell township
Adrian township
Aetna township
Afton township
Agency township
Agnes City township
Alamota township
Albano township
Albion township
Albion township
Albion township
Alexander-Belle Prairie township
Alexandria township
Allen township
Allen township
Allison township
Allodium township
Alma township
Almena-District 4 township
Alta township
Altory township
Americus township
Anthony city
Appanoose township
Appleton township
Arcade township
Arion township
Arkansas City city
Arlington township
Arvonia township
Ash Creek township
Ash Valley township
Asherville township
Ashland township
Atchison city
Athelstane township
Athens township
Atlanta township
Attica township
Atwood township
Aubry township
Auburn township
Augusta city
Augusta township
Augustine township
Aurora township
Avilla township
Avon township
Avon township
Bachelor township
Baker township
Baker township
Bala township
Balderson township
Banner township
Banner township
Banne
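The listing above repeats several names (for example, "Albion township" appears three times), so `NAMELSAD` alone does not identify a record within a state. The following is a minimal sketch of counting such repeats with `collections.Counter`, using a handful of names copied from the listing. On the full table the same check could be run on the `NAMELSAD` column.

```python
from collections import Counter

# A few names copied from the sorted listing above
names = [
    "Abilene city",
    "Achilles township",
    "Adams township",
    "Albion township",
    "Albion township",
    "Albion township",
    "Allen township",
    "Allen township",
]

counts = Counter(names)
# Keep only the names that occur more than once
duplicates = {name: n for name, n in counts.items() if n > 1}
print(duplicates)  # {'Albion township': 3, 'Allen township': 2}
```

A key such as `GEOID`, which cell 13 already uses to select a single row, is a safer identifier than the name.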

In [20]:
set(class_name)
# Exclude T9, Z1 (inactive), Z9

{'C5', 'T1', 'T9'}

In [21]:
set(crs_name)

{'NAD83'}