# Collect census tract information from ACS

This notebook uses the 2019 five-year ACS data as an example. The script collects the ACS data in preparation for later meta-learning exercises.

Census documentation: https://docs.google.com/spreadsheets/d/1I3FJPtLnG0cgy4wI5KlzwcU-qvmW2b9S/edit?usp=sharing&ouid=115425849068992643587&rtpof=true&sd=true


In [1]:
# install the census data package
!pip install censusdata

Collecting censusdata
  Using cached CensusData-1.15.post1.tar.gz (26.6 MB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: censusdata
  Building wheel for censusdata (setup.py) ... done
  Created wheel for censusdata: filename=CensusData-1.15.post1-py3-none-any.whl size=28205746 sha256=a4078cb589b4c980fa44baaf94371443458d92b03ab9857e3db80efae273ba26
  Stored in directory: /Users/autumnstar/Library/Caches/pip/wheels/7c/5b/55/834c5472b44ab5688be29f6009667601fbc13f38cff9dd36e6
Successfully built censusdata
Installing collected packages: censusdata
Successfully installed censusdata-1.15.post1


In [2]:
# import modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import censusdata # to download ACS data
import copy
import pickle
import time



# Collecting data

Downloading census data directly is challenging because it requires working through lengthy documentation. Other researchers have addressed this by wrapping the data collection process in a package; censusdata is one example.

In [3]:
# The censusdata package supports data collection only up to 2019.
# Check references/ACS_2019_SF_5YR_Appendices.xlsx to understand the meaning of the variables.
# Then use the following line to print the details of a table.
# Update var_list and var_names to suit the research question, but keep the two lists in the same order.
censusdata.printtable(censusdata.censustable('acs5', 2019, 'B01002')) # acs5: five year average.

Variable     | Table                          | Label                                                    | Type 
-------------------------------------------------------------------------------------------------------------------
B01002_001E  | MEDIAN AGE BY SEX              | !! !! Estimate Median age -- Total:                      | float
B01002_002E  | MEDIAN AGE BY SEX              | !! !! Estimate Median age -- Male                        | float
B01002_003E  | MEDIAN AGE BY SEX              | !! !! Estimate Median age -- Female                      | float
-------------------------------------------------------------------------------------------------------------------
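
The variable IDs printed above follow a fixed pattern: a table ID, an underscore, a three-digit line number, and a suffix letter (`E` for an estimate, `M` for a margin of error). As a minimal sketch, the helper below splits an ID into those parts. The function name `parse_acs_variable` and the regular expression are illustrative assumptions, not part of the censusdata package, and the pattern does not cover annotation suffixes such as `EA` or `MA`.

```python
import re

# Assumed pattern for detailed-table IDs such as B01002_001E:
# table ID, underscore, three-digit line number, then E or M.
_VAR_PATTERN = re.compile(r'^([A-Z]\d{5}[A-Z]*)_(\d{3})([EM])$')

def parse_acs_variable(code):
    """Split an ACS variable ID into table, line number, and value kind."""
    match = _VAR_PATTERN.match(code)
    if match is None:
        raise ValueError(f"unrecognized ACS variable ID: {code!r}")
    table, line, suffix = match.groups()
    kind = 'estimate' if suffix == 'E' else 'margin of error'
    return {'table': table, 'line': int(line), 'kind': kind}

print(parse_acs_variable('B01002_001E'))
# {'table': 'B01002', 'line': 1, 'kind': 'estimate'}
```

Grouping IDs by their `table` field is one way to see which tables to pass to `censusdata.censustable` when checking labels.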


In [4]:
# define variable list
var_list = [
# population
            'B01003_001E',
            'B01001_001E', 'B01001_002E', 'B01001_026E',
            'B01002_001E',
# households
            'B11001_001E',
# race
            'B02001_001E', 'B02001_002E', 'B02001_003E', 'B02001_004E', 'B02001_005E',
# income info (a lot of NAs)
            'B06010_001E', 'B06010_002E', 'B06010_003E', 'B06010_004E', 'B06010_005E', 'B06010_006E', 'B06010_007E', 'B06010_008E', 'B06010_009E', 'B06010_010E', 'B06010_011E',
            'B06011_001E',
# travel
            'B08301_001E', 'B08301_002E', 'B08301_010E', 'B08301_016E', 'B08301_018E', 'B08301_019E', 'B08301_021E',
# education
            'B15001_001E',
            'B15001_017E', 'B15001_018E', 'B15001_025E', 'B15001_026E', 'B15001_033E', 'B15001_034E', 'B15001_041E', 'B15001_042E',
            'B15001_058E', 'B15001_059E', 'B15001_066E', 'B15001_067E', 'B15001_074E', 'B15001_075E', 'B15001_082E', 'B15001_083E',
            'B15003_001E', 'B15003_022E', 'B15003_023E', 'B15003_025E',
# income info (more complete)
            'B19013_001E', 'B19301_001E',
# employment
            'B23025_001E', 'B23025_002E', 'B23025_007E',
# properties
            'B25002_001E', 'B25002_002E', 'B25002_003E',
            'B25064_001E',
            'B25075_001E', 'B25077_001E',
# imputation
            'B99082_001E'
           ]


In [5]:
var_names = [
# population
             'pop_total',
             'sex_total', 'sex_male', 'sex_female',
             'age_median',
# households
             'households',
# race
             'race_total', 'race_white', 'race_black', 'race_native', 'race_asian',
# income info (a lot of NAs)
             'inc_total_pop', 'inc_no_pop', 'inc_with_pop', 'inc_pop_10k', 'inc_pop_1k_15k', 'inc_pop_15k_25k', 'inc_pop_25k_35k', 'inc_pop_35k_50k', 'inc_pop_50k_65k', 'inc_pop_65k_75k', 'inc_pop_75k',
             'inc_median_ind',
# travel
             'travel_total_to_work', 'travel_driving_to_work', 'travel_pt_to_work', 'travel_taxi_to_work', 'travel_cycle_to_work', 'travel_walk_to_work', 'travel_work_from_home',
# education
             'edu_total_pop',
             'bachelor_male_25_34', 'master_phd_male_25_34', 'bachelor_male_35_44', 'master_phd_male_35_44', 'bachelor_male_45_64', 'master_phd_male_45_64',  'bachelor_male_65_over', 'master_phd_male_65_over',
             'bachelor_female_25_34', 'master_phd_female_25_34', 'bachelor_female_35_44', 'master_phd_female_35_44', 'bachelor_female_45_64', 'master_phd_female_45_64',  'bachelor_female_65_over', 'master_phd_female_65_over',
             'edu_total', 'edu_bachelor', 'edu_master', 'edu_phd',
# income info (more complete)
             'inc_median_household', 'inc_per_capita',
# employment
             'employment_total_labor', 'employment_employed', 'employment_unemployed',
# properties
             'housing_units_total', 'housing_units_occupied', 'housing_units_vacant',
             'rent_median',
             'property_value_total', 'property_value_median',
# imputation
            'vehicle_total_imputed'
            ]
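
Because the two lists are matched purely by position, a single missing or extra entry would silently mislabel every later column. One possible safeguard, sketched here with a hypothetical three-variable subset (`var_list_demo`, `var_names_demo`) and a hypothetical helper `build_rename_map`, is to pair the lists explicitly and fail loudly on a mismatch.

```python
# Hypothetical three-variable subset of the notebook's lists, for illustration only.
var_list_demo = ['B01003_001E', 'B01002_001E', 'B11001_001E']
var_names_demo = ['pop_total', 'age_median', 'households']

def build_rename_map(codes, names):
    """Pair ACS codes with readable names, raising on length or duplicate problems."""
    if len(codes) != len(names):
        raise ValueError(f"{len(codes)} codes but {len(names)} names")
    if len(set(codes)) != len(codes):
        raise ValueError("duplicate ACS codes")
    if len(set(names)) != len(names):
        raise ValueError("duplicate column names")
    return dict(zip(codes, names))

rename_map = build_rename_map(var_list_demo, var_names_demo)
print(rename_map['B01002_001E'])  # age_median
```

With a mapping like this, pandas' `DataFrame.rename(columns=rename_map)` renames columns by key rather than by position.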

In [6]:
# States to download, keyed by FIPS code (only Florida in this example).
# State FIPS codes for the 50 states and DC are assigned in roughly alphabetical order.
# Check: https://www.usgs.gov/faqs/what-constitutes-united-states-what-are-official-definitions
# Check: https://www.mercercountypa.gov/dps/state_fips_code_listing.htm

states_and_fips = {
       '12':'FL'
}

# number of states to download
print(len(states_and_fips))

1


In [7]:
# download Florida data
starting_time = time.time()
us_state_ct = {}

for key_ in states_and_fips.keys():
    print(key_)
    state_ct = censusdata.download('acs5', 2019, censusdata.censusgeo([('state', key_), ('tract', '*')]), var_list)
    us_state_ct[key_] = state_ct

# check the time
end_time = time.time()
processing_time = end_time-starting_time
print(processing_time, ' seconds') # roughly 6 to 8 seconds for one state.

12
7.4676690101623535  seconds
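
The loop above stops at the first failed request, which matters more once several states are downloaded in one run. The following is a rough sketch of a per-state retry wrapper. The names `download_states`, `fetch`, `retries`, and `pause` are hypothetical, and the stand-in `fake_fetch` only imitates a download so the example runs without network access.

```python
import time

def download_states(fips_codes, fetch, retries=2, pause=1.0):
    """Call fetch(fips) for each state, retrying on errors.

    Returns (results, failed): a dict of successful downloads and a
    list of FIPS codes that still failed after all retries.
    """
    results, failed = {}, []
    for fips in fips_codes:
        for attempt in range(retries + 1):
            try:
                results[fips] = fetch(fips)
                break
            except Exception:
                if attempt == retries:
                    failed.append(fips)  # give up on this state, keep the others
                else:
                    time.sleep(pause)    # wait before trying again
    return results, failed

# Stand-in for a real download; '25' is the FIPS code for Massachusetts.
def fake_fetch(fips):
    return f"data for {fips}"

results, failed = download_states(['12', '25'], fake_fetch, pause=0)
print(sorted(results), failed)  # ['12', '25'] []
```

In this notebook, `fetch` could wrap the existing call, for example `lambda fips: censusdata.download('acs5', 2019, censusdata.censusgeo([('state', fips), ('tract', '*')]), var_list)`.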


In [8]:
# check the downloaded data.
us_state_ct['12']

Unnamed: 0,B01003_001E,B01001_001E,B01001_002E,B01001_026E,B01002_001E,B11001_001E,B02001_001E,B02001_002E,B02001_003E,B02001_004E,...,B23025_001E,B23025_002E,B23025_007E,B25002_001E,B25002_002E,B25002_003E,B25064_001E,B25075_001E,B25077_001E,B99082_001E
"Census Tract 2.11, Miami-Dade County, Florida: Summary level: 140, state:12> county:086> tract:000211",2812,2812,1383,1429,39.4,931,2812,2086,517,0,...,2327,1660,667,1103,931,172,1592,562,240400,1485
"Census Tract 2.12, Miami-Dade County, Florida: Summary level: 140, state:12> county:086> tract:000212",4709,4709,2272,2437,34.2,1668,4709,2382,1953,0,...,3754,2559,1195,1969,1668,301,1109,295,179900,2167
"Census Tract 2.13, Miami-Dade County, Florida: Summary level: 140, state:12> county:086> tract:000213",5005,5005,2444,2561,34.1,1379,5005,2334,2206,224,...,3760,2381,1379,1646,1379,267,1291,685,254900,2257
"Census Tract 2.14, Miami-Dade County, Florida: Summary level: 140, state:12> county:086> tract:000214",6754,6754,2934,3820,31.3,2238,6754,4052,1671,326,...,4802,3292,1510,2725,2238,487,1135,1029,147800,3207
"Census Tract 1.28, Miami-Dade County, Florida: Summary level: 140, state:12> county:086> tract:000128",3021,3021,1695,1326,44.1,1364,3021,2861,121,0,...,2678,2093,585,2054,1364,690,1349,638,205900,1991
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
"Census Tract 312, Clay County, Florida: Summary level: 140, state:12> county:019> tract:031200",15742,15742,7957,7785,41.0,5517,15742,13894,1128,64,...,12250,7331,4919,6124,5517,607,1346,4153,206600,6719
"Census Tract 308.01, Clay County, Florida: Summary level: 140, state:12> county:019> tract:030801",5723,5723,2914,2809,43.0,2001,5723,4664,482,0,...,4603,2711,1892,2084,2001,83,1313,1743,211200,2522
"Census Tract 309.02, Clay County, Florida: Summary level: 140, state:12> county:019> tract:030902",10342,10342,4657,5685,37.6,3746,10342,7956,1351,13,...,8149,5270,2879,3907,3746,161,1105,2565,141700,4992
"Census Tract 303.01, Clay County, Florida: Summary level: 140, state:12> county:019> tract:030301",8960,8960,4166,4794,37.2,3324,8960,6286,1831,0,...,7054,4490,2564,3383,3324,59,1061,1948,169800,4288


In [9]:
# Add state, county, and tract FIPS columns, then replace the
# censusgeo index with a default integer index.
def add_fips(df_state):
    state_fips = []
    county_fips = []
    tract_fips = []
    full_ct_fips = []

    # Each index entry is a censusgeo object; params() holds the
    # (level, code) pairs in the order state, county, tract.
    for geo in df_state.index:
        params = geo.params()
        state, county, tract = params[0][1], params[1][1], params[2][1]
        state_fips.append(state)
        county_fips.append(county)
        tract_fips.append(tract)
        full_ct_fips.append(state + county + tract)  # 11-digit tract identifier

    df_state['state_fips'] = state_fips
    df_state['county_fips'] = county_fips
    df_state['tract_fips'] = tract_fips
    df_state['full_ct_fips'] = full_ct_fips

    # Note: the input DataFrame is modified in place and also returned.
    df_state.reset_index(drop=True, inplace=True)
    return df_state
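
The concatenation in `add_fips` relies on the codes being zero-padded strings: 2 digits for the state, 3 for the county, and 6 for the tract. As a small sketch of that rule, the hypothetical helper `build_tract_geoid` below takes pairs shaped like the `params()` output (an assumption based on how the function indexes them) and checks the result is 11 digits long.

```python
def build_tract_geoid(params):
    """Join state (2), county (3), and tract (6) codes into an 11-digit ID."""
    parts = dict(params)
    geoid = (parts['state'].zfill(2)
             + parts['county'].zfill(3)
             + parts['tract'].zfill(6))
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"unexpected tract identifier: {geoid!r}")
    return geoid

# Values taken from the first Miami-Dade row shown in the download output.
example_params = (('state', '12'), ('county', '086'), ('tract', '000211'))
print(build_tract_geoid(example_params))  # 12086000211
```

Keeping these columns as strings matters: converting them to integers would drop the leading zeros and break joins against other tract-level data.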

In [10]:
# quick preprocessing: adding state name, replacing column names, and adding FIPS info.
for key_ in us_state_ct.keys():
    us_state_ct[key_].columns = var_names
    us_state_ct[key_]['state'] = states_and_fips[key_]
    us_state_ct[key_] = add_fips(us_state_ct[key_])

In [11]:
# check head, FL example
us_state_ct['12'].head()

Unnamed: 0,pop_total,sex_total,sex_male,sex_female,age_median,households,race_total,race_white,race_black,race_native,...,housing_units_vacant,rent_median,property_value_total,property_value_median,vehicle_total_imputed,state,state_fips,county_fips,tract_fips,full_ct_fips
0,2812,2812,1383,1429,39.4,931,2812,2086,517,0,...,172,1592,562,240400,1485,FL,12,086,000211,12086000211
1,4709,4709,2272,2437,34.2,1668,4709,2382,1953,0,...,301,1109,295,179900,2167,FL,12,086,000212,12086000212
2,5005,5005,2444,2561,34.1,1379,5005,2334,2206,224,...,267,1291,685,254900,2257,FL,12,086,000213,12086000213
3,6754,6754,2934,3820,31.3,2238,6754,4052,1671,326,...,487,1135,1029,147800,3207,FL,12,086,000214,12086000214
4,3021,3021,1695,1326,44.1,1364,3021,2861,121,0,...,690,1349,638,205900,1991,FL,12,086,000128,12086000128


## **Exercise.** Download the census data for MA, including the socioeconomic and travel variables.