# How to Build a Geo Lookup Table
The purpose of this notebook is to illustrate how to build a geo lookup table for geocoding addresses. In the ZRP pipeline, data is input as a dataframe with the following columns: first name, middle name, last name, house number, street address (street name), city, state, ZIP code, and zest key. The zest key must be specified to establish correspondence between inputs and outputs; it effectively serves as the index of the data table. Each address is mapped to a geographic unit (block group, census tract, or ZIP code) using the lookup tables generated by the process demonstrated here. The geocoded address is then cross-referenced with the American Community Survey (ACS) tables to obtain the ACS features that become part of the feature vector the model is ultimately trained on. In this example, county-level Census TIGER/Line shapefiles for Alabama (Autauga County, FIPS code 01001) are used to generate a lookup table.
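The input format described above can be sketched with pandas. This is a minimal illustration only. The snake_case column names (`first_name`, `zip_code`, `zest_key`, and so on) and the records are hypothetical, because this notebook does not give the exact column names ZRP expects. The sketch reads every field as a string and makes the zest key the index after checking that it is unique.

```python
import pandas as pd

# Hypothetical example records; column names are illustrative and may not
# match the exact names ZRP expects.
records = [
    {"first_name": "JANE", "middle_name": "Q", "last_name": "DOE",
     "house_number": "2450", "street_address": "ACADEMY ST",
     "city": "ANYTOWN", "state": "AL", "zip_code": "36003", "zest_key": "K001"},
    {"first_name": "JOHN", "middle_name": "", "last_name": "ROE",
     "house_number": "4471", "street_address": "FISCHER LN",
     "city": "ANYTOWN", "state": "AL", "zip_code": "36758", "zest_key": "K002"},
]

# Keep every field as a string so ZIP codes and house numbers are not
# coerced to integers (which would drop leading zeros in some ZIP codes).
df = pd.DataFrame(records, dtype=str)

# The zest key links each input row to its output row, so it must be unique.
assert df["zest_key"].is_unique, "zest_key must be unique"
df = df.set_index("zest_key")

print(df.loc["K002", "street_address"])  # FISCHER LN
```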

In [1]:
%load_ext autoreload
%autoreload 2
%config Completer.use_jedi=False

In [2]:
from os.path import join, expanduser, dirname
import pandas as pd
import sys
import os
import re
import warnings

In [3]:
warnings.filterwarnings(action='ignore')
home = expanduser('~')

src_path = '{}/zrp'.format(home)
sys.path.append(src_path)

Predefine paths & required parameters

In [4]:
# Path to the support files directory containing the raw TIGER/Line shapefile data
support_files_path = "INSERT-PATH-HERE"
# Year of the shapefile data
year = "2019"
# State + county FIPS code of the area to build the lookup table for (01001 = Autauga County, AL)
st_cty_code = "01001"
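`st_cty_code` joins the 2-digit state FIPS code to the 3-digit county FIPS code. For example, "01001" is state 01 (Alabama) and county 001 (Autauga County). The hypothetical helper below is not part of ZRP. It splits such a code and shows why the value should stay a string: as an integer, "01001" would become 1001.

```python
from typing import Tuple


def split_st_cty_code(st_cty_code: str) -> Tuple[str, str]:
    """Split a 5-digit state+county FIPS code into its state and county parts."""
    # Reject integers-turned-strings like "1001" that have lost their leading zero.
    if len(st_cty_code) != 5 or not st_cty_code.isdigit():
        raise ValueError(f"expected a 5-digit FIPS code, got {st_cty_code!r}")
    return st_cty_code[:2], st_cty_code[2:]


state_fips, county_fips = split_st_cty_code("01001")
print(state_fips, county_fips)  # 01 001
```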

Import Geo Lookup Functions

In [5]:
from zrp.prepare.geo_lookup import *

### Initialize `GeoLookUpBuilder`
This class constructs the geographic lookup tables used for geocoding. Census TIGER/Line shapefiles are required for this module to run. The 2019 shapefiles can be downloaded from [https://www2.census.gov/geo/tiger/TIGER2019/](https://www2.census.gov/geo/tiger/TIGER2019/)
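County-based TIGER/Line layers such as ADDR, ADDRFEAT, EDGES and FACES are published under the link above using the pattern `TIGER<year>/<LAYER>/tl_<year>_<state+county FIPS>_<layer>.zip`. The hypothetical helper below builds those URLs for one county. Which layers `GeoLookUpBuilder` actually reads is not stated in this notebook. The layers passed in the example are an assumption.

```python
from typing import Iterable, List

TIGER_ROOT = "https://www2.census.gov/geo/tiger"


def tiger_county_urls(year: str, st_cty_code: str, layers: Iterable[str]) -> List[str]:
    """Build download URLs for county-based TIGER/Line layers.

    Assumes the layout TIGER<year>/<LAYER>/tl_<year>_<ssccc>_<layer>.zip
    used by county-based layers (ADDR, ADDRFEAT, EDGES, FACES).
    """
    if len(st_cty_code) != 5 or not st_cty_code.isdigit():
        raise ValueError(f"expected a 5-digit state+county FIPS code, got {st_cty_code!r}")
    return [
        f"{TIGER_ROOT}/TIGER{year}/{layer.upper()}/tl_{year}_{st_cty_code}_{layer.lower()}.zip"
        for layer in layers
    ]


# Assumed layer choice; check which files GeoLookUpBuilder needs.
for url in tiger_county_urls("2019", "01001", ["ADDR", "FACES"]):
    print(url)
```

Each URL could then be fetched with `urllib.request.urlretrieve(url, filename)` and unzipped under `support_files_path`. Judging from the log printed by `transform` further down, the builder appears to read shapefiles from a `raw/geo/<year>` subdirectory, but the exact layout it expects is not documented here.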

In [6]:
geo_build = GeoLookUpBuilder(support_files_path = support_files_path, year = year, output_folder_suffix='_out00')

### Run `GeoLookUpBuilder`
Provide the state-county FIPS code to build a county-level lookup table.
- No data is written out because `save_table` is set to `False`. If it is `True`, the table is saved to a file.

In [7]:
%%time
output = geo_build.transform(st_cty_code, save_table = False)

Directory already exists
Directory already exists
Shapefile input: /d/shared/zrp/shared_data/raw/geo/2019
Lookup Table output: /d/shared/zrp/shared_data/processed/geo/2019__out00

 ... Loading requirements
 ... Creating lookup table
 ... Formatting lookup table
   [Start] Processing lookup data
     ...processing
         ...Base
         ...Map street suffixes...
         ...Mapped & split by street suffixes...
         ...Number processing...

         Address dataframe expansion is complete! (n=7169)
   [Completed] Processing lookup data
     Number of observations: 7174
     Is key unique: False
No tables were saved
CPU times: user 17 s, sys: 180 ms, total: 17.2 s
Wall time: 17.2 s
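The log reports `Is key unique: False`. The notebook does not say which columns form that key. As a hypothetical illustration, the sketch below treats street name plus ZIP as the key. It uses pandas `DataFrame.duplicated` on three rows copied from the `output.head()` result further down. A street split into several address ranges (here AUTAUGA COUNTY 101) produces repeated street/ZIP pairs, which is one way a key like this can end up non-unique.

```python
import pandas as pd

# Three rows copied from the output.head() result shown further down.
rows = pd.DataFrame(
    {
        "ZEST_FULLNAME": ["ACADEMY ST", "AUTAUGA COUNTY 101", "AUTAUGA COUNTY 101"],
        "ZEST_ZIP": ["36003", "36003", "36003"],
        "FROMHN": [2498, 420, 500],
        "TOHN": [2400, 438, 598],
    }
)

key_cols = ["ZEST_FULLNAME", "ZEST_ZIP"]  # hypothetical key, not ZRP's definition

# duplicated() flags repeats; keep=False marks every member of a duplicate group.
is_unique = not rows.duplicated(subset=key_cols).any()
dupes = rows[rows.duplicated(subset=key_cols, keep=False)]

print(is_unique)   # False
print(len(dupes))  # 2
```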


### Inspect the output


In [8]:
output.head()

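| | STATEFP | COUNTYFP | TRACTCE | BLKGRPCE | ZEST_FULLNAME | FROMHN | TOHN | ZEST_ZIP | ZCTA5CE | ZCTA5CE10 | FROMHN_LEFT | FROMHN_RIGHT | TOHN_LEFT | TOHN_RIGHT | PARITY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 21100 | 3 | ACADEMY ST | 2498 | 2400 | 36003 | 36003 | 36003 | | 2400 | | 2498 | E |
| 1 | 1 | 1 | 21100 | 2 | ANDREWS DR | 3698 | 3600 | 36003 | 36003 | 36003 | | 3600 | | 3698 | E |
| 2 | 1 | 1 | 21100 | 2 | AUTAUGA COUNTY 101 | 420 | 438 | 36003 | 36003 | 36003 | | 420 | | 438 | E |
| 3 | 1 | 1 | 21100 | 2 | AUTAUGA COUNTY 101 | 500 | 598 | 36003 | 36003 | 36003 | | 500 | | 598 | E |
| 4 | 1 | 1 | 21100 | 3 | AUTAUGA COUNTY 133 | 232 | 100 | 36003 | 36003 | 36003 | | 100 | | 232 | E |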
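In the `head()` output above, `STATEFP` is shown as 1, `COUNTYFP` as 1 and `TRACTCE` as 21100. In other words, the FIPS identifiers appear without their leading zeros, which happens when they are parsed as integers. Matching against Census GEOIDs generally needs the zero-padded strings: state (2 digits) + county (3) + tract (6) + block group (1). The hypothetical helper below restores the padding and builds a 12-digit block group GEOID. It is not part of ZRP.

```python
import pandas as pd

# Standard widths of the Census geographic identifiers.
FIPS_WIDTHS = {"STATEFP": 2, "COUNTYFP": 3, "TRACTCE": 6, "BLKGRPCE": 1}


def restore_fips_padding(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with FIPS columns converted to zero-padded strings."""
    out = df.copy()
    for col, width in FIPS_WIDTHS.items():
        if col in out.columns:
            out[col] = out[col].astype(str).str.zfill(width)
    return out


# First row of the head() output above, as displayed.
sample = pd.DataFrame({"STATEFP": [1], "COUNTYFP": [1], "TRACTCE": [21100], "BLKGRPCE": [3]})
fixed = restore_fips_padding(sample)

# Block group GEOID = state + county + tract + block group.
fixed["GEOID_BG"] = fixed["STATEFP"] + fixed["COUNTYFP"] + fixed["TRACTCE"] + fixed["BLKGRPCE"]
print(fixed.loc[0, "GEOID_BG"])  # 010010211003
```

A simpler way to avoid the problem is to read any saved table back with `pd.read_csv(path, dtype=str)` so that the identifiers are never converted to numbers.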


In [9]:
output.tail()

| | STATEFP | COUNTYFP | TRACTCE | BLKGRPCE | ZEST_FULLNAME | FROMHN | TOHN | ZEST_ZIP | ZCTA5CE | ZCTA5CE10 | FROMHN_LEFT | FROMHN_RIGHT | TOHN_LEFT | TOHN_RIGHT | PARITY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7169 | 1 | 1 | 21000 | 1 | FISCHER LN | 4463 | 4401 | 36758 | 36758 | 36758 | | 4401 | | 4463 | O |
| 7170 | 1 | 1 | 21000 | 1 | FISCHER LN | 4499 | 4467 | 36758 | 36758 | 36758 | | 4467 | | 4499 | O |
| 7171 | 1 | 1 | 21000 | 1 | KENT LN | 4099 | 4001 | 36758 | 36758 | 36758 | | 4001 | | 4099 | O |
| 7172 | 1 | 1 | 21000 | 1 | MARVIN CT | 4401 | 4499 | 36758 | 36758 | 36758 | | 4401 | | 4499 | O |
| 7173 | 1 | 1 | 21000 | 1 | VERNON SHEPPARD RD | 1601 | 1699 | 36758 | 36758 | 36758 | | 1601 | | 1699 | O |
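The rows above suggest how the table can be used to geocode an address. In every row shown, `FROMHN_RIGHT` is the smaller and `TOHN_RIGHT` the larger of the two house numbers. `PARITY` also matches the parity of those numbers (E for even, O for odd). The sketch below assumes the following matching rule:

- the street name and ZIP code are equal;
- the house-number parity is the same;
- the house number lies in `[FROMHN_RIGHT, TOHN_RIGHT]`.

This is an illustration consistent with the displayed output, not ZRP's actual matching logic. The queried addresses are hypothetical.

```python
import pandas as pd

# A few rows copied from the head() and tail() output above (not the full table).
lookup = pd.DataFrame(
    {
        "TRACTCE": [21100, 21100, 21100, 21000, 21000],
        "BLKGRPCE": [3, 2, 2, 1, 1],
        "ZEST_FULLNAME": ["ACADEMY ST", "AUTAUGA COUNTY 101", "AUTAUGA COUNTY 101",
                          "FISCHER LN", "FISCHER LN"],
        "ZEST_ZIP": ["36003", "36003", "36003", "36758", "36758"],
        "FROMHN_RIGHT": [2400, 420, 500, 4401, 4467],
        "TOHN_RIGHT": [2498, 438, 598, 4463, 4499],
        "PARITY": ["E", "E", "E", "O", "O"],
    }
)


def match_address(house_number: int, street: str, zip_code: str) -> pd.DataFrame:
    """Return the tract/block group of rows whose address range covers the address.

    Assumed rule: same street name and ZIP, same house-number parity,
    and FROMHN_RIGHT <= house_number <= TOHN_RIGHT.
    """
    parity = "E" if house_number % 2 == 0 else "O"
    hits = lookup[
        (lookup["ZEST_FULLNAME"] == street)
        & (lookup["ZEST_ZIP"] == zip_code)
        & (lookup["PARITY"] == parity)
        & (lookup["FROMHN_RIGHT"] <= house_number)
        & (lookup["TOHN_RIGHT"] >= house_number)
    ]
    return hits[["TRACTCE", "BLKGRPCE"]]


print(match_address(2450, "ACADEMY ST", "36003"))  # one row: tract 21100, block group 3
print(match_address(4471, "FISCHER LN", "36758"))  # one row: tract 21000, block group 1
print(match_address(2451, "ACADEMY ST", "36003"))  # empty: odd number, range is even-only
```

The matched tract and block group (after zero-padding) are what would then be joined to the ACS tables.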
