# How to Parse Raw ACS Data
This notebook parses American Community Survey (ACS) data. As stated on the United States Census Bureau site, "The 5-year estimates from the ACS are 'period' estimates that represent data collected over a period of time. The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups." This example parses the ACS data for Alabama and returns the parsed tables in a dictionary.

In [1]:
%load_ext autoreload
%autoreload 2
%config Completer.use_jedi=False

In [2]:
from os.path import join, expanduser
import datetime as dt
import pandas as pd
import numpy as np
import warnings
import glob
import json
import sys
import os
import re

In [3]:
warnings.filterwarnings(action='once')
home = expanduser('~')

In [4]:
src_path = '{}/zest-race-predictor/playground/kam/zrp'.format(home)
sys.path.append(src_path)
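
Appending a path that does not exist fails silently and only surfaces later as an import error. The sketch below shows one way to guard against that. `add_to_sys_path` is a hypothetical helper, not part of the `zrp` package, and the current working directory is used only as a stand-in for `src_path`.

```python
import sys
from pathlib import Path


def add_to_sys_path(path):
    """Append `path` to sys.path if it is an existing directory and not already listed.

    Hypothetical helper for illustration; not part of zrp.
    """
    resolved = str(Path(path).expanduser().resolve())
    if not Path(resolved).is_dir():
        raise FileNotFoundError(f"source directory not found: {resolved}")
    if resolved not in sys.path:
        sys.path.append(resolved)
    return resolved


# Stand-in for src_path: the current working directory always exists.
added = add_to_sys_path(".")
print(added in sys.path)  # True
```

In the notebook, the call would be `add_to_sys_path(src_path)` in place of the bare `sys.path.append`.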

Predefine paths & required parameters

In [5]:
# Support files path pointing to where the raw ACS data is stored
support_files_path = "/d/shared/zrp/shared_data"
# Year of ACS data
year = "2019"
# Span of ACS data. ACS data is available in 1-year or 5-year spans.
span = "5"
# State
state_level = "al"
# State + county FIPS code
st_cty_code = "01001"
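
A five-digit county FIPS code is a two-digit state code followed by a three-digit county code, so `"01001"` is state `01` (Alabama) and county `001` (Autauga County). The parameters are kept as strings because the leading zeros are significant. As a minimal sketch, the values above could be sanity-checked before parsing. `split_county_fips` is a hypothetical helper and not part of `zrp`.

```python
def split_county_fips(code):
    """Split a 5-digit state+county FIPS string into (state, county).

    Hypothetical helper for illustration; not part of zrp.
    """
    if len(code) != 5 or not code.isdigit():
        raise ValueError(f"expected a 5-digit FIPS string, got {code!r}")
    return code[:2], code[2:]


state_fips, county_fips = split_county_fips("01001")
print(state_fips, county_fips)  # 01 001

# The ACS is published in 1-year and 5-year spans only.
span = "5"
if span not in {"1", "5"}:
    raise ValueError(f"unsupported span: {span!r}")
```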

Import ACS Lookup Functions

In [6]:
from prepare.acs_lookup import *


### Initialize `ACS_Parser`
This class constructs American Community Survey lookup tables that enable race approximation.

In [7]:
acs_parse = ACS_Parser(support_files_path = support_files_path, year = year, span = span, state_level = state_level, n_jobs=-1 )

### Run `ACS_Parser`
Lookup tables are saved by default. 

In [8]:
%%time 
output = acs_parse.transform(False)

  0%|          | 0/141 [00:00<?, ?it/s][Parallel(n_jobs=-1)]: Using backend LokyBackend with 60 concurrent workers.
100%|██████████| 141/141 [00:57<00:00,  2.45it/s]


CPU times: user 20.7 s, sys: 6.07 s, total: 26.8 s
Wall time: 2min 38s


[Parallel(n_jobs=-1)]: Done 141 out of 141 | elapsed:  2.6min finished


### Preview 

In [9]:
output.keys()

dict_keys(['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59', '60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '70', '71', '72', '73', '74', '75', '76', '77', '78', '79', '80', '81', '82', '83', '84', '85', '86', '87', '88', '89', '90', '91', '92', '93', '94', '95', '96', '97', '98', '99', '100', '101', '102', '103', '104', '105', '106', '107', '108', '109', '110', '111', '112', '113', '114', '115', '116', '117', '118', '119', '120', '121', '122', '123', '124', '125', '126', '127', '128', '129', '130', '131', '132', '133', '134', '135', '136', '137', '138', '139', '140', '141'])
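
The keys are strings, so sorting them directly would place `'10'` before `'2'`. Judging from the preview of table `'100'`, whose `SEQUENCE` column reads `0100`, each key appears to be an ACS sequence number without zero padding; treat that mapping as an assumption. A small sketch of numeric ordering and padding, using a stand-in list rather than the real `output`:

```python
# Stand-in for list(output.keys()); the real keys come from the parser.
keys = [str(i) for i in range(1, 142)]

# Sort on the integer value so '2' comes before '10'.
ordered = sorted(keys, key=int)

# Zero-pad to four digits to match the SEQUENCE column format.
padded = {k: k.zfill(4) for k in ordered}
print(padded["100"])  # 0100
```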

In [10]:
# Shape (rows, columns) of the first parsed table
print(output['1']['data'].shape)

(8857, 244)
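
Checking tables one at a time gets tedious with 141 entries. The sketch below collects the shape of every table into one summary frame, along with a count of columns that contain only missing values. It assumes each entry is a dictionary holding a DataFrame under `'data'`, as in the cells above. `mock_output` and its column names are invented placeholders so the example runs without the raw ACS files.

```python
import pandas as pd

# Mock stand-in for the parser output: {sequence key: {'data': DataFrame}}.
# Column names and values are placeholders, not real ACS estimates.
mock_output = {
    "1": {"data": pd.DataFrame({"LOGRECNO": ["0001772", "0001773"],
                                "EST_A": [10, 20]})},
    "2": {"data": pd.DataFrame({"LOGRECNO": ["0001772"],
                                "EST_B": [None]})},
}


def summarize(output):
    """Return one row per table with its shape and number of all-NaN columns."""
    rows = []
    for key, entry in output.items():
        df = entry["data"]
        rows.append({
            "sequence": key,
            "n_rows": df.shape[0],
            "n_cols": df.shape[1],
            "all_nan_cols": int(df.isna().all().sum()),
        })
    return pd.DataFrame(rows)


summary = summarize(mock_output)
print(summary)
```

With the real parser result, `summarize(output)` would give the same overview for all tables at once.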


In [11]:
# Preview the first rows of table 100
print(output['100']['data'].head())

  FILEID FILETYPE STUSAB CHARITER SEQUENCE LOGRECNO B24123_246 B24123_247  \
0  ACSSF   2019e5     al      000     0100  0001772        NaN        NaN   
1  ACSSF   2019e5     al      000     0100  0001773        NaN        NaN   
2  ACSSF   2019e5     al      000     0100  0001774        NaN        NaN   
3  ACSSF   2019e5     al      000     0100  0001775        NaN        NaN   
4  ACSSF   2019e5     al      000     0100  0001776        NaN        NaN   

  B24123_248 B24123_249  ... B24123_484 B24123_485 B24123_486 B24123_487  \
0        NaN        NaN  ...        NaN        NaN        NaN        NaN   
1        NaN        NaN  ...        NaN        NaN        NaN        NaN   
2        NaN        NaN  ...        NaN        NaN        NaN        NaN   
3        NaN        NaN  ...        NaN        NaN        NaN        NaN   
4        NaN        NaN  ...        NaN        NaN        NaN        NaN   

  B24123_488 B24123_489 B24123_490 State               GEOID  \
0        NaN    