# Counties

Exploratory data analysis of the raw TIGER/Line Shapefile for counties.

#### Summary

The dataset contains 3,234 counties and county equivalents. Relevant fields include the state and county FIPS codes, `STATEFP` and `COUNTYFP`, which are concatenated to form the `GEOID` column, and the full name column, `NAMELSAD`. Of these records, 91 are located in the following U.S. territories (`STATEFP` of 60 or higher):

1. American Samoa
2. Commonwealth of the Northern Mariana Islands
3. Guam
4. Puerto Rico
5. United States Virgin Islands

Note that `NAME` has only 1,923 unique values and `NAMELSAD` only 1,969, so neither column identifies a county on its own. Pairing the full name with a state identifier such as `STATEFP` makes every county name unique. Finally, the dataset uses the EPSG:4269 (NAD83) coordinate reference system (CRS), which is commonly used by U.S. federal agencies.
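As a minimal sketch of that pairing, the snippet below builds a state-qualified label from `NAMELSAD` and `STATEFP` and confirms that it is unique. The four-row sample frame and the `county_label` column name are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Hypothetical sample: Lancaster County exists in both Nebraska (31) and Pennsylvania (42).
sample = pd.DataFrame(
    {
        "STATEFP": ["31", "42", "31", "53"],
        "NAMELSAD": [
            "Lancaster County",
            "Lancaster County",
            "Cuming County",
            "Wahkiakum County",
        ],
    }
)

# NAMELSAD alone repeats across states...
names_unique = sample["NAMELSAD"].is_unique

# ...but pairing it with the state FIPS code removes the ambiguity.
sample["county_label"] = sample["NAMELSAD"] + " (" + sample["STATEFP"] + ")"
labels_unique = sample["county_label"].is_unique

print(names_unique, labels_unique)
```

A state abbreviation or name could be substituted for the FIPS code if a more readable label is preferred.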

#### Exploration

In [1]:
import geopandas as gpd

In [2]:
gdf = gpd.read_file("../data/raw/census/counties/tl_2020_us_county.zip")
gdf.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 3234 entries, 0 to 3233
Data columns (total 18 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   STATEFP   3234 non-null   object  
 1   COUNTYFP  3234 non-null   object  
 2   COUNTYNS  3234 non-null   object  
 3   GEOID     3234 non-null   object  
 4   NAME      3234 non-null   object  
 5   NAMELSAD  3234 non-null   object  
 6   LSAD      3234 non-null   object  
 7   CLASSFP   3234 non-null   object  
 8   MTFCC     3234 non-null   object  
 9   CSAFP     1256 non-null   object  
 10  CBSAFP    1916 non-null   object  
 11  METDIVFP  110 non-null    object  
 12  FUNCSTAT  3234 non-null   object  
 13  ALAND     3234 non-null   int64   
 14  AWATER    3234 non-null   int64   
 15  INTPTLAT  3234 non-null   object  
 16  INTPTLON  3234 non-null   object  
 17  geometry  3234 non-null   geometry
dtypes: geometry(1), int64(2), object(15)
memory usage: 454.9+ KB
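The `info()` output shows `INTPTLAT` and `INTPTLON` stored as `object` (string) columns rather than numbers. If numeric coordinates are needed, a conversion along the following lines should work. The two-row sample frame is a hypothetical stand-in for `gdf`, using values from the first rows of the dataset.

```python
import pandas as pd

# Hypothetical stand-in for gdf: interior-point coordinates stored as strings.
sample = pd.DataFrame(
    {
        "INTPTLAT": ["41.9158651", "46.2946377"],
        "INTPTLON": ["-96.7885168", "-123.4244583"],
    }
)

# Cast the string columns to floats so they can be used in arithmetic.
coords = sample[["INTPTLAT", "INTPTLON"]].astype(float)
print(coords.dtypes)
```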


In [3]:
gdf.head(5)

| | STATEFP | COUNTYFP | COUNTYNS | GEOID | NAME | NAMELSAD | LSAD | CLASSFP | MTFCC | CSAFP | CBSAFP | METDIVFP | FUNCSTAT | ALAND | AWATER | INTPTLAT | INTPTLON | geometry |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 31 | 39 | 835841 | 31039 | Cuming | Cuming County | 6 | H1 | G4020 | | | | A | 1477645345 | 10690204 | 41.9158651 | -96.7885168 | POLYGON ((-97.01952 42.00410, -97.01952 42.004... |
| 1 | 53 | 69 | 1513275 | 53069 | Wahkiakum | Wahkiakum County | 6 | H1 | G4020 | | | | A | 680976231 | 61568965 | 46.2946377 | -123.4244583 | POLYGON ((-123.43639 46.23820, -123.44759 46.2... |
| 2 | 35 | 11 | 933054 | 35011 | De Baca | De Baca County | 6 | H1 | G4020 | | | | A | 6016818946 | 29090018 | 34.3592729 | -104.3686961 | POLYGON ((-104.56739 33.99757, -104.56772 33.9... |
| 3 | 31 | 109 | 835876 | 31109 | Lancaster | Lancaster County | 6 | H1 | G4020 | 339.0 | 30700.0 | | A | 2169272970 | 22847034 | 40.7835474 | -96.6886584 | POLYGON ((-96.91075 40.78494, -96.91075 40.790... |
| 4 | 31 | 129 | 835886 | 31129 | Nuckolls | Nuckolls County | 6 | H1 | G4020 | | | | A | 1489645188 | 1718484 | 40.1764918 | -98.0468422 | POLYGON ((-98.27367 40.08940, -98.27367 40.089... |
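The summary states that `GEOID` combines `STATEFP` and `COUNTYFP`. A quick way to confirm this is to rebuild the identifier and compare, as sketched below. The sample frame is hypothetical and assumes the codes are zero-padded strings (two digits for the state, three for the county), for example `039` rather than the `39` displayed in the preview.

```python
import pandas as pd

# Hypothetical sample mirroring the first rows, with zero-padded string codes.
sample = pd.DataFrame(
    {
        "STATEFP": ["31", "53", "35"],
        "COUNTYFP": ["039", "069", "011"],
        "GEOID": ["31039", "53069", "35011"],
    }
)

# GEOID should equal the 2-digit state code followed by the 3-digit county code.
# zfill is a safeguard in case leading zeros were lost along the way.
rebuilt = sample["STATEFP"].str.zfill(2) + sample["COUNTYFP"].str.zfill(3)
geoid_matches = (rebuilt == sample["GEOID"]).all()
print(geoid_matches)
```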


In [4]:
len(gdf.NAME.unique())

1923

In [5]:
len(gdf.NAMELSAD.unique())

1969

In [6]:
gdf["count"] = 1
state_county_counts = gdf.groupby(["STATEFP", "NAMELSAD"]).count()
state_county_counts["count"].describe()

count    3234.0
mean        1.0
std         0.0
min         1.0
25%         1.0
50%         1.0
75%         1.0
max         1.0
Name: count, dtype: float64
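An equivalent check that avoids adding a helper column is to ask pandas directly whether any (`STATEFP`, `NAMELSAD`) pair repeats. The sketch below uses a small hypothetical frame in place of `gdf`.

```python
import pandas as pd

# Hypothetical stand-in for gdf with one name repeated across two states.
sample = pd.DataFrame(
    {
        "STATEFP": ["31", "42", "31"],
        "NAMELSAD": ["Lancaster County", "Lancaster County", "Cuming County"],
    }
)

# keep=False flags every member of a duplicated group, not just the later ones.
dup_by_name = sample.duplicated(subset=["NAMELSAD"], keep=False)
dup_by_state_and_name = sample.duplicated(subset=["STATEFP", "NAMELSAD"], keep=False)

print(int(dup_by_name.sum()), int(dup_by_state_and_name.sum()))
```

If the second sum is zero on the full dataset, the state and full name together form a unique key, which agrees with the `describe()` output above.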

In [7]:
counties_outside_states = gdf.query("STATEFP >= '60'")
len(counties_outside_states)

91

In [8]:
counties_outside_states["STATEFP"].sort_values().unique()

array(['60', '66', '69', '72', '78'], dtype=object)
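The values in this array are state-level FIPS codes. A small lookup table, sketched below, decodes them into territory names. The `TERRITORY_NAMES` dictionary and the sample frame are illustrative additions rather than part of the notebook. Note that the District of Columbia (FIPS 11) falls below the `'60'` threshold and is therefore not selected by the query.

```python
import pandas as pd

# State-level FIPS codes for the five territories returned by the query.
TERRITORY_NAMES = {
    "60": "American Samoa",
    "66": "Guam",
    "69": "Commonwealth of the Northern Mariana Islands",
    "72": "Puerto Rico",
    "78": "United States Virgin Islands",
}

# Hypothetical sample: Nebraska (31), District of Columbia (11), Guam (66), Puerto Rico (72).
sample = pd.DataFrame({"STATEFP": ["31", "11", "66", "72"]})

# Two-character, zero-padded strings sort lexicographically in numeric order,
# so this string comparison behaves like the query used above.
territories = sample[sample["STATEFP"] >= "60"].copy()
territories["territory"] = territories["STATEFP"].map(TERRITORY_NAMES)
print(territories)
```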

In [9]:
gdf.crs

<Geographic 2D CRS: EPSG:4269>
Name: NAD83
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: North America - onshore and offshore: Canada - Alberta; British Columbia; Manitoba; New Brunswick; Newfoundland and Labrador; Northwest Territories; Nova Scotia; Nunavut; Ontario; Prince Edward Island; Quebec; Saskatchewan; Yukon. Puerto Rico. United States (USA) - Alabama; Alaska; Arizona; Arkansas; California; Colorado; Connecticut; Delaware; Florida; Georgia; Hawaii; Idaho; Illinois; Indiana; Iowa; Kansas; Kentucky; Louisiana; Maine; Maryland; Massachusetts; Michigan; Minnesota; Mississippi; Missouri; Montana; Nebraska; Nevada; New Hampshire; New Jersey; New Mexico; New York; North Carolina; North Dakota; Ohio; Oklahoma; Oregon; Pennsylvania; Rhode Island; South Carolina; South Dakota; Tennessee; Texas; Utah; Vermont; Virginia; Washington; West Virginia; Wisconsin; Wyoming. US Virgin Islands. British Virgin Islands