# How to Parse Raw ACS Data
The purpose of this notebook is to parse American Community Survey (ACS) data. As stated on the United States Census Bureau site, "The 5-year estimates from the ACS are 'period' estimates that represent data collected over a period of time. The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups." This example parses ACS data for Alabama and returns each parsed table in a dictionary.

In [1]:
%load_ext autoreload
%autoreload 2
%config Completer.use_jedi=False

In [2]:
from os.path import join, expanduser, dirname
import pandas as pd
import sys
import os
import re
import warnings

In [3]:
warnings.filterwarnings(action='ignore')
home = expanduser('~')

src_path = join(home, 'zrp')
sys.path.append(src_path)

In [4]:
from zrp.prepare import ProcessStrings


Predefine paths & required parameters

In [5]:
# Support files path pointing to where the raw ACS data is stored
support_files_path = "INSERT-PATH-HERE"
# Year of ACS data
year = "2019"
# Span of ACS data. The ACS data is available in 1- or 5-year spans.
span = "5"
# State
state_level = "al"
# State and county FIPS code
st_cty_code = "01001"
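
The `st_cty_code` value above is a five-digit FIPS code: the first two digits identify the state (`01`, Alabama) and the last three identify the county (`001`, Autauga County). The snippet below is a minimal sketch of that split. `split_fips` is a hypothetical helper written for illustration and is not part of `zrp`. The code is kept as a string so that leading zeros survive.

```python
def split_fips(st_cty_code):
    """Split a 5-digit state+county FIPS code into (state, county) strings."""
    if len(st_cty_code) != 5 or not st_cty_code.isdigit():
        raise ValueError(f"expected a 5-digit FIPS code, got {st_cty_code!r}")
    # First two digits: state. Last three digits: county within that state.
    return st_cty_code[:2], st_cty_code[2:]


state_fips, county_fips = split_fips("01001")
print(state_fips, county_fips)
```

Storing the code as an integer would turn `01001` into `1001`, which no longer lines up with the GEOID strings in the parsed tables.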

Import ACS Lookup Functions

In [6]:
from zrp.prepare.acs_lookup import ACS_Parser


### Initialize `ACS_Parser`
This class constructs American Community Survey lookup tables that enable race approximation. Census American Community Survey data is required for this module to run. You can retrieve the 2019 data from:
- https://www2.census.gov/programs-surveys/acs/summary_file/2019/data/2019_5yr_Summary_FileTemplates.zip
- https://www2.census.gov/programs-surveys/acs/summary_file/2019/data/5_year_entire_sf/2019_ACS_Geography_Files.zip
- https://www2.census.gov/programs-surveys/acs/summary_file/2019/data/5_year_entire_sf/All_Geographies_Not_Tracts_Block_Groups.zip
- https://www2.census.gov/programs-surveys/acs/summary_file/2019/data/5_year_entire_sf/Tracts_Block_Groups_Only.zip
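
One way to fetch these archives is sketched below using only the Python standard library. The helper names (`archive_urls`, `download_and_extract`) are hypothetical, and extracting each archive into a folder named after the zip file is an assumption: this notebook does not state the directory layout that `ACS_Parser` expects under `support_files_path`, so adjust the target paths to match your setup. The `5_year_entire_sf` archives cover every state and may be large.

```python
import urllib.request
import zipfile
from pathlib import Path

BASE_URL = "https://www2.census.gov/programs-surveys/acs/summary_file/2019/data"
ARCHIVES = [
    "2019_5yr_Summary_FileTemplates.zip",
    "5_year_entire_sf/2019_ACS_Geography_Files.zip",
    "5_year_entire_sf/All_Geographies_Not_Tracts_Block_Groups.zip",
    "5_year_entire_sf/Tracts_Block_Groups_Only.zip",
]


def archive_urls():
    """Build the full download URLs listed above."""
    return [f"{BASE_URL}/{name}" for name in ARCHIVES]


def download_and_extract(url, dest_dir):
    """Download one archive (if not already present) and unzip it.

    Assumption: each archive is extracted into dest_dir/<zip name without .zip>.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    zip_path = dest_dir / url.rsplit("/", 1)[-1]
    if not zip_path.exists():  # skip the download when the file is already there
        urllib.request.urlretrieve(url, str(zip_path))
    target = dest_dir / zip_path.stem
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target)
    return target


# Listing the URLs is cheap; call download_and_extract(url, support_files_path)
# explicitly for each one when you are ready to download.
for url in archive_urls():
    print(url)
```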


In [7]:
acs_parse = ACS_Parser(support_files_path=support_files_path, year=year, span=span, state_level=state_level, n_jobs=-1)

### Run `ACS_Parser`
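Lookup tables are saved by default. The cell below passes `save_table = False` to skip saving and work with the returned dictionary instead.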

In [9]:
%%time 
output = acs_parse.transform(save_table = False)

  0%|          | 0/141 [00:00<?, ?it/s][Parallel(n_jobs=-1)]: Using backend LokyBackend with 15 concurrent workers.
 32%|███▏      | 45/141 [00:43<01:10,  1.36it/s] [Parallel(n_jobs=-1)]: Done  20 tasks      | elapsed:   44.6s
100%|██████████| 141/141 [02:40<00:00,  1.14s/it]


CPU times: user 15.7 s, sys: 5.2 s, total: 20.9 s
Wall time: 3min 23s


[Parallel(n_jobs=-1)]: Done 141 out of 141 | elapsed:  3.4min finished


### Preview 

In [10]:
output.keys()

dict_keys(['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59', '60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '70', '71', '72', '73', '74', '75', '76', '77', '78', '79', '80', '81', '82', '83', '84', '85', '86', '87', '88', '89', '90', '91', '92', '93', '94', '95', '96', '97', '98', '99', '100', '101', '102', '103', '104', '105', '106', '107', '108', '109', '110', '111', '112', '113', '114', '115', '116', '117', '118', '119', '120', '121', '122', '123', '124', '125', '126', '127', '128', '129', '130', '131', '132', '133', '134', '135', '136', '137', '138', '139', '140', '141'])

In [11]:
output['1'].keys()

dict_keys(['data', 'sequence', 'headers', 'Tracts_Block_Groups', 'Not_Tracts_Block_Groups', 'description'])
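
Because `output` is keyed by strings (`'1'` through `'141'`), iterating over it in numeric order takes a `key=int` sort. The sketch below walks the dictionary in that order and reports which entries hold columns for a given table ID. It assumes each entry's `data` value exposes a `.columns` attribute, as a pandas DataFrame does. `sequences_with_table` is a hypothetical helper, and `toy_output` with its `TOY…` column names is made-up stand-in data so the sketch runs without the parsed ACS files.

```python
from types import SimpleNamespace


def sequences_with_table(output, table_id):
    """Return the keys (in numeric order) whose data has columns for table_id."""
    prefix = f"{table_id}_"  # e.g. 'B01001_' matches B01001_001 but not B01001F_025
    hits = []
    for key in sorted(output, key=int):  # '1', '2', ..., '10' rather than '1', '10', '2'
        columns = output[key]["data"].columns
        if any(str(col).startswith(prefix) for col in columns):
            hits.append(key)
    return hits


# Toy stand-in for the parsed output; SimpleNamespace mimics an object with .columns.
toy_output = {
    "10": {"data": SimpleNamespace(columns=["GEOID", "TOY01_001"])},
    "2": {"data": SimpleNamespace(columns=["GEOID", "TOY02_001"])},
    "1": {"data": SimpleNamespace(columns=["GEOID", "TOY01_001", "TOY01A_001"])},
}

print(sequences_with_table(toy_output, "TOY01"))
```

With the real result, the same call would look like `sequences_with_table(output, "B01001")`.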

In [12]:
output['1']['sequence']

'2019_5yr_Summary_FileTemplates/seq1.xlsx'

In [13]:
print(output['1']['data'].shape)
output['1']['data'].head()

(8857, 244)


| | FILEID | FILETYPE | STUSAB | CHARITER | SEQUENCE | LOGRECNO | B01001_001 | B01001_002 | B01001_003 | B01001_004 | ... | B01001F_025 | B01001F_026 | B01001F_027 | B01001F_028 | B01001F_029 | B01001F_030 | B01001F_031 | State | GEOID | Geography Name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ACSSF | 201900000.0 | al | 0 | 1 | 1772 | 1993 | 907 | 34 | 55 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | AL | 14000US01001020100 | Census Tract 201, Autauga County, Alabama |
| 1 | ACSSF | 201900000.0 | al | 0 | 1 | 1773 | 1959 | 1058 | 79 | 115 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | AL | 14000US01001020200 | Census Tract 202, Autauga County, Alabama |
| 2 | ACSSF | 201900000.0 | al | 0 | 1 | 1774 | 3507 | 1731 | 62 | 143 | ... | 0 | 0 | 0 | 33 | 0 | 0 | 0 | AL | 14000US01001020300 | Census Tract 203, Autauga County, Alabama |
| 3 | ACSSF | 201900000.0 | al | 0 | 1 | 1775 | 3878 | 1949 | 64 | 159 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | AL | 14000US01001020400 | Census Tract 204, Autauga County, Alabama |
| 4 | ACSSF | 201900000.0 | al | 0 | 1 | 1776 | 10596 | 5256 | 229 | 488 | ... | 0 | 20 | 0 | 0 | 0 | 0 | 0 | AL | 14000US01001020500 | Census Tract 205, Autauga County, Alabama |
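
The `GEOID` values in this preview start with a three-digit summary level (`140` for census tracts), and the part after `US` begins with the state and county FIPS codes. That makes it possible to narrow a table to one county using a code such as `st_cty_code`. The sketch below is one hedged way to do it; `in_county` is a hypothetical helper, not part of `zrp`. It only accepts county, tract, and block-group summary levels (`050`, `140`, `150`), because other levels, such as places, put a different code after the state digits and could match by accident.

```python
COUNTY_BASED_LEVELS = {"050", "140", "150"}  # county, census tract, block group


def in_county(geoid, st_cty_code):
    """True if geoid is a county, tract, or block group inside st_cty_code."""
    summary_level = geoid[:3]
    _, _, geo_code = geoid.partition("US")  # text after 'US' starts with state+county
    return summary_level in COUNTY_BASED_LEVELS and geo_code.startswith(st_cty_code)


geoids = [
    "14000US01001020100",  # Census Tract 201, Autauga County (from the preview)
    "14000US01001020500",  # Census Tract 205, Autauga County (from the preview)
    "14000US01003010100",  # illustrative tract GEOID with a different county code
    "16000US0100124",      # illustrative place-level GEOID; not county-based
]

matches = [g for g in geoids if in_county(g, "01001")]
print(matches)

# With the parsed table, assuming GEOID holds strings like those above:
# df = output['1']['data']
# county_df = df[df["GEOID"].map(lambda g: in_county(g, st_cty_code))]
```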
