# ZRP Example Usage
This notebook illustrates how to use `ZRP`, the main class of the zrp package, which processes user input data and returns race/ethnicity predictions.

In [1]:
%load_ext autoreload
%autoreload 2
%config Completer.use_jedi=False

In [2]:
from os.path import join, expanduser
import pandas as pd
import sys
import os
import re
import warnings

## Set source code path here

In [3]:
warnings.filterwarnings(action='once')
home = expanduser('~')

src_path = os.getcwd()
sys.path.append(src_path)

In [4]:
from zrp import ZRP
from zrp.prepare.utils import load_file, load_json

## Load sample data for prediction
Load the list of New Jersey mayors, downloaded from https://www.nj.gov/dca/home/2022mayors.csv

In [5]:
nj_mayors = load_file(join(src_path, "2022-nj-mayors.csv"))
nj_mayors.shape

(565, 18)
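
Several columns in this file (for example `MUNI CODE` values such as `0101`) carry leading zeros. How `load_file` handles them is not shown here, so the following is only a sketch of the general pitfall using plain pandas and a small invented CSV snippet: reading with default type inference turns such codes into integers, while `dtype=str` preserves them.

```python
import io
import pandas as pd

# Tiny stand-in for the real CSV; only the column names come from the file above.
csv_text = "MUNI CODE,CITY,ZIP\n0101,Absecon,08201\n1330,Aberdeen,07747-2300\n"

# Default inference parses MUNI CODE as integers and drops the leading zero.
as_default = pd.read_csv(io.StringIO(csv_text))

# Reading every column as text keeps codes exactly as written.
as_text = pd.read_csv(io.StringIO(csv_text), dtype=str)

print(as_default["MUNI CODE"].tolist())
print(as_text["MUNI CODE"].tolist())
```

If the loaded `nj_mayors` frame shows codes without their leading zeros, re-reading the file with `dtype=str` is one way to recover them.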

In [6]:
nj_mayors

Unnamed: 0,MUNI CODE,MUNI NAME,COUNTY,ADDRESS 1,ADDRESS 2,CITY,STATE,ZIP,PHONE,FAX,MAYOR NAME,TERM START,TERM END,FORM,TERM LEGNTH,EMAIL,SOCIAL MEDIA HANDLE,Municipal Contact List
0,1330,Aberdeen Township,Monmouth,One Aberdeen Square,,Aberdeen,NJ,07747-2300,(732) 583-4200,,Fred Tagliarini,,12/31/2025,COUNCIL-MANAGER,4,fred.tagliarini@aberdeennj.org,,
1,0101,Absecon City,Atlantic,Absecon Municipal Complex,500 Mill Road,Absecon,NJ,08201,(609) 641-0663,(609) 645-5098,Kimberly Horton,,12/31/2024,MAYOR-COUNCIL,3,khorton@abseconnj.org,,
2,1001,Alexandria Township,Hunterdon,782 Frenchtown Road,,Milford,NJ,08848,(908) 996-7071,,Gabe Plumer,,12/31/2022,TOWNSHIP,3,clerk@alexandrianj.gov,,
3,2101,Allamuchy Township,Warren,Post Office Box A,,Allamuchy,NJ,07820,(908) 852-5132,,Rosemary Tuohy,,12/31/2024,FAULKNER ACT,3,mayor@allamuchynj.org,,
4,0201,Allendale Borough,Bergen,500 West Crescent Avenue,,Allendale,NJ,07401,(201) 818-4400,,Ari Bernstein,,12/31/2022,,,aribernstein@allendalenj.gov,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
560,0269,Wood-Ridge Borough,Bergen,85 Humboldt Street,,Wood-Ridge,NJ,07075-2344,(201) 939-0202,,Paul A Sarlo,,12/31/2023,,,psarlo@njwoodridge.org,,
561,1715,Woodstown Borough,Salem,Post Office Box 286,,Woodstown,NJ,08098,(856) 769-2200,,Donald Dietrich,,12/31/2023,,,Don.dietrich@comcast.net,,
562,0824,Woolwich Township,Gloucester,120 Village Green Drive,,Woolwich Township,NJ,08085-3180,(856) 467-2666,,Craig Frederick,,12/31/2024,,,cfrederick@woolwichtwp.org,,
563,0340,Wrightstown Borough,Burlington,21 Saylors Pond Road,,Wrightstown,NJ,08562,(609) 723-4450,(609) 723-7137,Donald Cottrell,,12/31/2022,,,mayor@wrightstownborough.com,,


### Wrangle NJ mayor data for predictions
Prepare the NJ mayor data. This simple parsing of the NJ mayors file leaves some NAs, but it is sufficient for demonstration purposes.


In [None]:
zrp_sample = pd.DataFrame(columns=['first_name', 'middle_name', 'last_name', 'house_number', 'street_address', 'city', 'state', 'zip_code'])

Prepare Names

In [None]:
split_mayor_names = nj_mayors['MAYOR NAME'].str.split(' ')
zrp_sample['first_name'] = split_mayor_names.str[0]
zrp_sample['last_name'] = split_mayor_names.str[-1]
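
The split above fills only the first and last name, so `middle_name` stays empty. As a hedged sketch (not part of the original notebook), the tokens between the first and last could be treated as the middle name. That rule is an assumption and will mislabel suffixes or multi-word surnames.

```python
import pandas as pd

# Toy names in the same "First [Middle] Last" shape as the MAYOR NAME column.
names = pd.Series(["Fred Tagliarini", "Paul A Sarlo"])

parts = names.str.split()  # split on any run of whitespace

name_parts = pd.DataFrame({
    "first_name": parts.str[0],
    # Assumption: everything between the first and last token is the middle name.
    "middle_name": parts.apply(lambda p: " ".join(p[1:-1]) or None),
    "last_name": parts.str[-1],
})
print(name_parts)
```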

City, State, Zip

In [None]:
zrp_sample['city'] = nj_mayors['CITY']
zrp_sample['state'] = nj_mayors['STATE']
zrp_sample['zip_code'] = nj_mayors['ZIP']

Address

In [None]:
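# expand=False returns a Series, which assigns cleanly to a single column
zrp_sample['house_number'] = nj_mayors['ADDRESS 1'].str.extract(r'([0-9]+)', expand=False)
zrp_sample['street_address'] = nj_mayors['ADDRESS 1'].str.extract(r'.*[0-9]+([^0-9]+)', expand=False)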
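
To see where the NAs come from, here is a small sketch that applies the same two patterns to three addresses taken from the table above. The `.str.strip()` call is an addition for illustration, since the captured street text otherwise keeps its leading space.

```python
import pandas as pd

addresses = pd.Series([
    "500 West Crescent Avenue",  # digits followed by text: both patterns match
    "One Aberdeen Square",       # no digits: neither pattern matches
    "Post Office Box 286",       # digits at the end: no street text follows them
])

house_number = addresses.str.extract(r"([0-9]+)", expand=False)
street = addresses.str.extract(r".*[0-9]+([^0-9]+)", expand=False).str.strip()

parsed = pd.DataFrame({"house_number": house_number, "street_address": street})
print(parsed)
```

Addresses without a leading number, such as spelled-out numbers and P.O. boxes, are the rows left partly or fully missing.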


In [None]:
# A key must be specified to establish correspondence between inputs and outputs
zrp_sample['ZEST_KEY'] = zrp_sample.index.astype(str)
zrp_sample
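
Because the key is what ties each prediction back to its input row, it may be worth confirming that it is unique and complete before running the predictor. A minimal sketch on invented toy data:

```python
import pandas as pd

# Toy frame with a non-default index to show that the key mirrors the index.
sample = pd.DataFrame({"last_name": ["Alpha", "Beta", "Gamma"]}, index=[10, 11, 12])
sample["ZEST_KEY"] = sample.index.astype(str)

# Duplicate or missing keys would make the input/output correspondence ambiguous.
assert sample["ZEST_KEY"].is_unique
assert sample["ZEST_KEY"].notna().all()
print(sample)
```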

### Invoke the Zest Race Predictor on the sample data

To run with custom column names, provide a mapping from the expected column names to your custom column names, for example:

`        ZRP(**{'first_name':'example_first_name',
               'middle_name':'example_middle_name',
               'last_name':'example_last_name',
               'house_number':'example_house_number',
               'street_address':'example_street_address',
               'zip_code':'example_zip_code',
               'state':'example_state',
               'census_tract':'example_census_tract',
               'block_group':'example_block_group',
        })`
        
Providing all of the dictionary keys above is recommended. If Census tract or Census block group is unavailable, `ZRP()` will geocode the input data using Census shapefile data. If house number is also unavailable, `ZRP()` will use zip/postal codes and underlying data to return proxies. All other columns are required, but `ZRP()` can still generate proxies when columns such as middle name (or even first or last name) are mostly or entirely missing. To support fairer audit workflows, ZRP can generate ZRP (name + geo), BISG (name + geo), ZRP name-only, and ZRP geo-only proxies.
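
Before running the predictor, it can help to see which of the expected columns are present and how sparse each one is. The helper below, `summarize_inputs`, is hypothetical and not part of zrp. It is a plain pandas sketch that uses the key names from the mapping above.

```python
import pandas as pd

# Expected column names, taken from the mapping keys listed above.
EXPECTED = ["first_name", "middle_name", "last_name", "house_number",
            "street_address", "zip_code", "state", "census_tract", "block_group"]

def summarize_inputs(df: pd.DataFrame) -> pd.DataFrame:
    """Report, per expected column, whether it exists and its share of missing values."""
    rows = []
    for col in EXPECTED:
        present = col in df.columns
        # An absent column is counted as fully missing.
        missing = float(df[col].isna().mean()) if present else 1.0
        rows.append({"column": col, "present": present, "missing_share": missing})
    return pd.DataFrame(rows).set_index("column")

# Invented toy input with one absent house number and no census columns.
toy = pd.DataFrame({
    "first_name": ["Ann", "Bo"],
    "last_name": ["Alpha", "Beta"],
    "house_number": [None, "85"],
    "zip_code": ["07747", "07075"],
    "state": ["NJ", "NJ"],
})
summary = summarize_inputs(toy)
print(summary)
```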

Initialize, fit & transform `ZRP()`

In [None]:
%%time
zest_race_predictor = ZRP()
zest_race_predictor.fit()
zrp_output = zest_race_predictor.transform(zrp_sample)

### Inspect the output and join

In [None]:
zrp_output
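
One way to join predictions back to the inputs is a left merge on `ZEST_KEY`. The sketch below uses invented stand-in frames and assumes the output carries `ZEST_KEY` as its index. If the key is an ordinary column in your output, drop the `reset_index()` call.

```python
import pandas as pd

inputs = pd.DataFrame({
    "ZEST_KEY": ["0", "1", "2"],
    "last_name": ["Alpha", "Beta", "Gamma"],
})

# Stand-in for zrp_output; the values are invented for illustration only.
predictions = pd.DataFrame({
    "ZEST_KEY": ["0", "2"],
    "race_proxy": ["HISPANIC", "BLACK"],
}).set_index("ZEST_KEY")

# A left join keeps every input row; rows without a prediction get NaN.
joined = inputs.merge(
    predictions.reset_index(),  # turn the key index into a column for merging
    on="ZEST_KEY",
    how="left",
    validate="one_to_one",      # fail loudly if keys are duplicated on either side
)
print(joined)
```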

### Check the records most likely to be Hispanic

In [None]:
zrp_output.nlargest(10, "HISPANIC")

### Check the records most likely to be Black

In [None]:
zrp_output.nlargest(10, "BLACK")

BISG proxies are saved by default when `ZRP` is run. Below we load the saved BISG proxies.

In [None]:
bisg_output = pd.read_feather("artifacts/bisg_proxy_output.feather")

In [None]:
bisg_output.head()

In [None]:
bisg_output

How many proxies does BISG return?

In [None]:
f"Out of {bisg_output.shape[0]} records only {bisg_output.race_proxy.notna().sum()} proxies are returned"

How many proxies does ZRP return?

In [None]:
f"Out of {zrp_output.shape[0]} records {zrp_output.race_proxy.notna().sum()} proxies are returned"
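
The two counts can also be compared as coverage rates. The sketch below uses a hypothetical helper, `coverage`, and invented toy frames in place of `bisg_output` and `zrp_output`. It only assumes that missing proxies appear as NA in `race_proxy`, as in the cells above.

```python
import pandas as pd

def coverage(df: pd.DataFrame, column: str = "race_proxy") -> float:
    """Share of records with a non-missing proxy."""
    return float(df[column].notna().mean())

# Invented stand-ins for bisg_output and zrp_output.
toy_bisg = pd.DataFrame({"race_proxy": ["HISPANIC", None, None, "BLACK"]})
toy_zrp = pd.DataFrame({"race_proxy": ["HISPANIC", "BLACK", None, "BLACK"]})

report = pd.Series(
    {"BISG": coverage(toy_bisg), "ZRP": coverage(toy_zrp)},
    name="coverage",
)
print(report)
```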
