# How to Build a Geo Lookup Table
This notebook illustrates how to build a geo lookup table for geocoding addresses. In the ZRP pipeline, data is input as a dataframe with the following columns: first name, middle name, last name, house number, street address (street name), city, state, ZIP code, and zest key. The 'zest key' must be specified to establish correspondence between inputs and outputs; it effectively serves as an index for the data table. The address data is mapped to a geocoded location (block group, census tract, or ZIP code) using the lookup tables generated by the process demonstrated in this example. The geocoded address is then cross-referenced with the ACS tables to determine the ACS features that become part of the feature vector the model is ultimately trained on. In this example, Alabama county-level Census Tigerline shapefiles are used to generate a lookup table.
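To make the input layout concrete, here is a minimal sketch of a one-row input dataframe. The column names (`first_name`, `ZEST_KEY`, and so on) and all values are illustrative assumptions, not taken from this notebook; check the ZRP documentation for the exact names the pipeline expects.

```python
import pandas as pd

# Hypothetical column names and made-up values, for illustration only.
sample_input = pd.DataFrame(
    {
        "first_name": ["Jane"],
        "middle_name": ["A"],
        "last_name": ["Doe"],
        "house_number": ["123"],
        "street_address": ["Main St"],
        "city": ["Anytown"],
        "state": ["AL"],
        "zip_code": ["00000"],
        "ZEST_KEY": ["rec-0001"],
    }
)

# The key ties each output row back to its input row, so it should be unique.
key_is_unique = sample_input["ZEST_KEY"].is_unique
print(key_is_unique)
```

Keeping identifiers such as the ZIP code as strings avoids silently dropping leading zeros.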

In [1]:
%load_ext autoreload
%autoreload 2
%config Completer.use_jedi=False

In [2]:
from os.path import join, expanduser, dirname
import pandas as pd
import sys
import os
import re
import warnings

In [3]:
warnings.filterwarnings(action='ignore')
home = expanduser('~')

src_path = '{}/zrp'.format(home)
sys.path.append(src_path)

### Predefine paths & required parameters

In [12]:
# Support files path pointing to where the raw tigerline shapefile data is stored
support_files_path = "INSERT-PATH-HERE"
# Year of shapefile data
year = "2019"
# State-county FIPS code (2-digit state + 3-digit county) of the county to build the lookup table for
st_cty_code = "01001"
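The value `"01001"` is a five-digit state-county FIPS code: the first two digits identify the state (`01`, Alabama) and the last three the county (`001`, Autauga County). The helper below is a hypothetical convenience for validating and splitting such a code; it is not part of ZRP.

```python
from typing import Tuple


def split_st_cty_code(code: str) -> Tuple[str, str]:
    """Split a 5-digit state-county FIPS code into (state, county) parts."""
    if len(code) != 5 or not code.isdigit():
        raise ValueError(f"expected a 5-digit FIPS string, got {code!r}")
    return code[:2], code[2:]


state_fips, county_fips = split_st_cty_code("01001")
print(state_fips, county_fips)
```

Passing the code as a string matters: the integer `1001` would lose the leading zero of the state part.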

### Import Geo Lookup Functions

In [5]:
# Import only the class used in this notebook instead of a wildcard import
from zrp.prepare.geo_lookup import GeoLookUpBuilder



### Initialize `GeoLookUpBuilder`
This class constructs geographic lookup tables that enable geocoding. Census Tigerline shapefiles are required for this module to run. You can retrieve 2019 shapefiles from [https://www2.census.gov/geo/tiger/TIGER2019/](https://www2.census.gov/geo/tiger/TIGER2019/)
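County-level files under that directory are grouped into one folder per layer. As a rough sketch, the helper below builds download URLs for a county. The function name and the choice of layers (`ADDRFEAT`, `EDGES`, `FACES`) are assumptions made for illustration; this notebook does not state which layers `GeoLookUpBuilder` reads, and the naming pattern should be checked against the directory listing before relying on it.

```python
def tiger_county_url(layer: str, st_cty_code: str, year: str = "2019") -> str:
    """Build the URL of a county-level TIGER zip file (hypothetical helper)."""
    return (
        f"https://www2.census.gov/geo/tiger/TIGER{year}/"
        f"{layer.upper()}/tl_{year}_{st_cty_code}_{layer.lower()}.zip"
    )


# Assumed set of layers; adjust to whatever the builder actually requires.
for layer in ("addrfeat", "edges", "faces"):
    print(tiger_county_url(layer, "01001"))
```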

In [8]:
geo_build = GeoLookUpBuilder(support_files_path=support_files_path, year=year)

### Run `GeoLookUpBuilder`
Provide the state-county FIPS code to build a county-level lookup table.
- No data is written out because `save_table` is set to `False`. If it is `True`, the table is saved to a file.

In [9]:
%%time
output = geo_build.transform(st_cty_code, save_table=False)

Directory already exists
Directory already exists
Shapefile input: /d/shared/zrp/shared_data/raw/geo/2019
Lookup Table output: /d/shared/zrp/shared_data/processed/geo/2019_backup

 ... Loading requirements
 ... Creating lookup table
 ... Formatting lookup table
   [Start] Processing lookup data
     ...processing
   [Completed] Processing lookup data
     Number of observations: 6134
     Is key unique: False
{'is_empty': False, 'is_all_missing': False, 'n_obs': 6134, 'is_unique_key': False, 'pct_na': {'TLID': 0.0, 'TFID': 0.0, 'ARID': 0.0, 'LINEARID': 0.0, 'ZEST_FULLNAME': 0.0, 'FROMHN': 0.0, 'TOHN': 0.0, 'ZEST_ZIP': 0.0, 'EDGE_MTFCC': 0.0, 'ROAD_MTFCC': 0.0, 'PARITY': 0.0, 'FROMTYP': 0.7508966416693837, 'TOTYP': 0.7425823280078252, 'OFFSET': 0.0, 'PLUS4': 1.0, 'STATEFP': 0.0, 'COUNTYFP': 0.0, 'FROMADD': 0.0, 'TOADD': 0.0, 'SIDE': 0.0, 'STATEFP10': 0.0, 'COUNTYFP10': 0.0, 'TRACTCE10': 0.0, 'BLKGRPCE10': 0.0, 'BLOCKCE10': 0.0, 'ZCTA5CE10': 0.0, 'PUMACE10': 0.0, 'TRACTCE': 0.0, 'BLKGRPC
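The summary dictionary above reports the row count, whether the key is unique, and the share of missing values per column. A minimal sketch of how comparable statistics could be computed with pandas is shown below. It is not ZRP's implementation; the function name, the toy data, and the use of `TLID` as the key are assumptions.

```python
import pandas as pd


def summarize_table(df: pd.DataFrame, key: str) -> dict:
    """Compute summary statistics similar in shape to the printed dictionary."""
    return {
        "is_empty": df.empty,
        "is_all_missing": bool(df.isna().all().all()),
        "n_obs": len(df),
        "is_unique_key": bool(df[key].is_unique),
        # Fraction of missing values in each column
        "pct_na": df.isna().mean().to_dict(),
    }


# Toy table with a repeated key and a mostly missing column
toy = pd.DataFrame({"TLID": [1, 2, 2, 3], "FROMTYP": [None, "I", None, None]})
summary = summarize_table(toy, key="TLID")
print(summary)
```

Read this way, a `pct_na` of 1.0 for `PLUS4` means the column is entirely empty, and roughly 0.75 for `FROMTYP` means about three quarters of its rows are missing.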

### Inspect the output


In [10]:
output.head()

Unnamed: 0,TLID,TFID,ARID,LINEARID,ZEST_FULLNAME,FROMHN,TOHN,ZEST_ZIP,EDGE_MTFCC,ROAD_MTFCC,...,PUMACE,RAW_ZEST_ZIP,RAW_ZEST_STATEFP,RAW_ZEST_COUNTYFP,RAW_ZEST_FULLNAME,RAW_ZEST_TRACTCE,RAW_ZEST_BLKGRPCE,GEOID_ZIP,GEOID_CT,GEOID_BG
0,2824306,215953570,400540115507,110585092994,VERNON SHEPPARD RD,1601,1699,36758,S1400,S1400,...,2100,36758,1,1,VERNON SHEPPARD RD,21000,1,36758,1001021000,10010210001
1,2827185,215951565,400540110481,110585080961,HUIE ST,1201,1257,36066,S1400,S1400,...,2100,36066,1,1,HUIE ST,20400,3,36066,1001020400,10010204003
2,2827183,215951565,400540112823,110585081823,PLUM ST,1246,1200,36066,S1400,S1400,...,2100,36066,1,1,PLUM ST,20400,3,36066,1001020400,10010204003
3,2827193,215951562,400540110742,110585082908,ADELL PL,1249,1299,36066,S1400,S1400,...,2100,36066,1,1,ADELL PL,20400,3,36066,1001020400,10010204003
5,632515963,215954868,4003991212476,1104281921831,US HWY 31,1837,1853,36067,S1200,S1200,...,2100,36067,1,1,US HWY 31,20802,4,36067,1001020802,10010208024
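In the displayed rows the GEOID values appear without leading zeros: `GEOID_CT` shows 10 digits, whereas a standard census tract GEOID has 11 (2 state + 3 county + 6 tract), and a block group GEOID has 12. The sketch below restores the standard widths and splits a block group GEOID into its parts. The helper name is hypothetical, and it assumes the values are integers or unpadded strings as displayed.

```python
def pad_geoid(value, width: int) -> str:
    """Left-pad a GEOID with zeros to its standard width (hypothetical helper)."""
    return str(value).zfill(width)


geoid_ct = pad_geoid(1001021000, 11)   # census tract: state + county + tract
geoid_bg = pad_geoid(10010210001, 12)  # block group: tract GEOID + block group digit

state = geoid_bg[:2]
county = geoid_bg[2:5]
tract = geoid_bg[5:11]
block_group = geoid_bg[11]
print(geoid_ct, geoid_bg, state, county, tract, block_group)
```

Padded string GEOIDs are generally safer join keys against ACS tables than the unpadded values.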


In [11]:
output.tail()

Unnamed: 0,TLID,TFID,ARID,LINEARID,ZEST_FULLNAME,FROMHN,TOHN,ZEST_ZIP,EDGE_MTFCC,ROAD_MTFCC,...,PUMACE,RAW_ZEST_ZIP,RAW_ZEST_STATEFP,RAW_ZEST_COUNTYFP,RAW_ZEST_FULLNAME,RAW_ZEST_TRACTCE,RAW_ZEST_BLKGRPCE,GEOID_ZIP,GEOID_CT,GEOID_BG
3819,639781811,263372395,400540115283,110585091166,OREGON CT,300,422,36067,S1400,S1400,...,2100,36067,1,1,OREGON CT,20100,2,36067,1001020100,10010201002
3820,2837232,263372395,4005599626924,110585080948,GREG ST,408,498,36067,S1400,S1400,...,2100,36067,1,1,GREG ST,20100,2,36067,1001020100,10010201002
3821,639781818,263372395,4005599633498,110585091751,BIRDSONG ST,299,201,36067,S1400,S1400,...,2100,36067,1,1,BIRDSONG ST,20100,2,36067,1001020100,10010201002
3822,2843384,263675520,4003990667645,110585093255,AUTAUGA COUNTY 68,171,143,36022,S1400,S1400,...,2100,36022,1,1,AUTAUGA COUNTY 68,20900,3,36022,1001020900,10010209003
3823,2832920,263675520,4005554327180,1104281921831,US HWY 31,2754,2758,36022,S1200,S1200,...,2100,36022,1,1,US HWY 31,20900,3,36022,1001020900,10010209003
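Each row describes a street segment with a house-number range (`FROMHN` to `TOHN`) and the geographies it belongs to. To illustrate how such a table supports geocoding, the sketch below looks up a street name and house number against a few rows copied from the output above. It is not ZRP's matching logic: the function is hypothetical, it assumes numeric house numbers, and it ignores the `PARITY` and `SIDE` columns.

```python
import pandas as pd

# A few rows copied from the displayed output
lookup = pd.DataFrame(
    {
        "ZEST_FULLNAME": ["HUIE ST", "PLUM ST", "US HWY 31", "US HWY 31"],
        "FROMHN": [1201, 1246, 1837, 2754],
        "TOHN": [1257, 1200, 1853, 2758],
        "GEOID_BG": [10010204003, 10010204003, 10010208024, 10010209003],
    }
)


def match_house_number(table: pd.DataFrame, street: str, house_number: int) -> pd.DataFrame:
    """Return rows whose street name matches and whose range contains the number."""
    # Ranges can run in either direction (e.g. 1246 to 1200), so normalize them first.
    low = table[["FROMHN", "TOHN"]].min(axis=1)
    high = table[["FROMHN", "TOHN"]].max(axis=1)
    same_street = table["ZEST_FULLNAME"] == street.upper()
    return table[same_street & (low <= house_number) & (house_number <= high)]


plum = match_house_number(lookup, "Plum St", 1210)
hwy = match_house_number(lookup, "US HWY 31", 2756)
print(plum["GEOID_BG"].tolist(), hwy["GEOID_BG"].tolist())
```

The same street name can appear in several segments with different block groups, as `US HWY 31` does here, which is why the house-number range is needed to pick the right row.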
