Error getting block group data for all states #136

Closed
ummel opened this issue Nov 30, 2018 · 10 comments

ummel commented Nov 30, 2018

I am attempting to obtain block group data nationally following @walkerke's example here.

I get an odd error when running the code block below. Hopefully, you can replicate it. I am using version 0.8.3.

The traceback looks to me like the culprit might have to do with the match.call() and eval() calls in get_acs() when multiple environments are involved -- but that's just a hunch.

library(tidycensus)
library(purrr)

us <- unique(fips_codes$state)[1:51]

# Process all block groups by state (fails)
totalpop <- map_df(us, function(x) {
  get_acs(geography = "block group", variables = "B01003_001", state = x)
})

# Use map() formula convention. Also fails, but for a different reason.
# Suggests environment/evaluation issue.
totalpop <- map_df(us, ~ get_acs(geography = "block group", variables = "B01003_001", state = .x))

# Try to trap input value that generates error
for (x in us) {
  get_acs(geography = "block group", variables = "B01003_001", state = x)
}

# Value of 'x' at error
x

# This fails
get_acs(geography = "block group", variables = "B01003_001", state = x)

# Naturally, this works...
get_acs(geography = "block group", variables = "B01003_001", state = "AL")

# But so does this.
y <- x
get_acs(geography = "block group", variables = "B01003_001", state = y)
Collaborator

mfherman commented Nov 30, 2018

It looks like the way the package currently handles block group calls for multiple counties is failing when using the purrr::map() family of functions. I believe the issue is that match.call() is looking for a string to evaluate while making the multiple calls, but it is getting . or .x from map() instead of the states you are trying to iterate over:

library(tidycensus)
library(tidyverse)

my_states <- c("ME", "VT")

map_dfr(
  my_states,
  ~ get_acs(
    geography = "block group",
    variables = "B01003_001",
    state = .x
  )
)
#> Getting data from the 2012-2016 5-year ACS
#> Getting data from the 2012-2016 5-year ACS
#> Error in load_data_acs(geography, vars, key, year, state, county, survey): object '.x' not found

It seems like one solution to this might be to use rlang::ensym() instead of match.call() in get_acs(); see tidyverse/purrr#570. (This might also be a solution to #94?)
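To illustrate the suspected mechanism, here is a minimal base R sketch. It is not the tidycensus source: refetch(), refetch_fixed(), and caller() are hypothetical names. It shows how a function that re-evaluates its own match.call() can lose track of a variable that exists only in the caller's environment, and how passing the already-evaluated value avoids the problem.

```r
# Hypothetical function that rebuilds and re-evaluates its own call.
refetch <- function(state, depth = 0) {
  if (depth > 0) return(paste("state:", state))
  cl <- match.call()   # stores the expression the caller typed, not its value
  cl$depth <- 1
  eval(cl)             # symbols are looked up from this function's frame
}

# Hypothetical variant that passes the evaluated argument along explicitly.
refetch_fixed <- function(state, depth = 0) {
  if (depth > 0) return(paste("state:", state))
  refetch_fixed(state = state, depth = 1)
}

# The variable lives only inside caller(), much like .x lives inside a purrr lambda.
caller <- function(f) {
  local_state <- "ME"
  f(state = local_state)
}

broken <- tryCatch(caller(refetch), error = function(e) conditionMessage(e))
fixed  <- caller(refetch_fixed)

print(broken)  # an error message saying local_state cannot be found
print(fixed)   # "state: ME"
```

The first version fails because the re-evaluated call still contains the symbol local_state, which is not visible from inside refetch().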

In any case, one way to get around this for now is to iterate over pairs of states and counties using map2(). This is similar to the example in #121 (comment), and I also wrote up a couple of blog posts with code: https://mattherman.info/blog/tidycensus-mult/ and https://mattherman.info/blog/tidycensus-mult-year/. Here I demonstrate it with just two states, but this will work for as many state/county combos as you want:

my_counties <- fips_codes %>%
  filter(state %in% my_states)

map2_dfr(
  .x = my_counties$state_code,
  .y = my_counties$county_code,
  ~ get_acs(
    geography = "block group",
    variables = "B01003_001",
    state = .x,
    county = .y
  )
)
#> # A tibble: 1,608 x 5
#>    GEOID      NAME                                 variable  estimate   moe
#>    <chr>      <chr>                                <chr>        <dbl> <dbl>
#>  1 230010101… Block Group 1, Census Tract 101, An… B01003_0…      759   224
#>  2 230010101… Block Group 2, Census Tract 101, An… B01003_0…      957   185
#>  3 230010102… Block Group 1, Census Tract 102, An… B01003_0…      917   308
#>  4 230010102… Block Group 2, Census Tract 102, An… B01003_0…     1242   433
#>  5 230010102… Block Group 3, Census Tract 102, An… B01003_0…     2525   467
#>  6 230010103… Block Group 1, Census Tract 103, An… B01003_0…     1033   259
#>  7 230010103… Block Group 2, Census Tract 103, An… B01003_0…     1096   228
#>  8 230010104… Block Group 1, Census Tract 104, An… B01003_0…     1801   235
#>  9 230010104… Block Group 2, Census Tract 104, An… B01003_0…      607   181
#> 10 230010105… Block Group 1, Census Tract 105, An… B01003_0…      926   202
#> # ... with 1,598 more rows

Created on 2018-11-30 by the reprex package (v0.2.1)

One interesting wrinkle I discovered when trying to run this for all states is that there are two counties whose names and FIPS codes changed in 2015:

Wade Hampton Census Area, Alaska (02-270)
Changed name and code to Kusilvak Census Area (02-158) effective July 1, 2015.
Shannon County, South Dakota (46-113)
Changed name and code to Oglala Lakota County (46-102) effective May 1, 2015.

So when using the technique I describe above, the API call fails for those counties that are not in the 2016 ACS data file, but still in the tidycensus::fips_codes dataset:

my_states2 <- c("SD", "ND")

my_counties2 <- fips_codes %>%
  filter(state %in% my_states2)

map2_dfr(
  .x = my_counties2$state_code,
  .y = my_counties2$county_code,
  ~ get_acs(
    geography = "block group",
    variables = "B01003_001",
    state = .x,
    county = .y
  )
)
#> Getting data from the 2012-2016 5-year ACS
#> No encoding supplied: defaulting to UTF-8.
#> Error: Your API call has errors.  The API message returned is .

To avoid this, you could grab the county and state combinations using tigris::counties() (which happens to be how get_acs() handles multi-county requests internally):

county_state <- tigris::counties(
  state = my_states2,
  cb = TRUE,
  resolution = "20m",
  year = "2016",
  class = "sf"
)

map2_dfr(
  .x = county_state$STATEFP,
  .y = county_state$COUNTYFP,
  ~ get_acs(
    geography = "block group",
    variables = "B01003_001",
    state = .x,
    county = .y
  )
)

#> # A tibble: 1,226 x 5
#>    GEOID     NAME                                  variable  estimate   moe
#>    <chr>     <chr>                                 <chr>        <dbl> <dbl>
#>  1 46003973… Block Group 1, Census Tract 9736, Au… B01003_0…      414    86
#>  2 46003973… Block Group 2, Census Tract 9736, Au… B01003_0…      911   107
#>  3 46003973… Block Group 3, Census Tract 9736, Au… B01003_0…      770    93
#>  4 46003973… Block Group 4, Census Tract 9736, Au… B01003_0…      637    96
#>  5 46057955… Block Group 1, Census Tract 9551, Ha… B01003_0…     1091   125
#>  6 46057955… Block Group 2, Census Tract 9551, Ha… B01003_0…     1570   164
#>  7 46057955… Block Group 1, Census Tract 9552, Ha… B01003_0…     1142   149
#>  8 46057955… Block Group 2, Census Tract 9552, Ha… B01003_0…      845   146
#>  9 46057955… Block Group 3, Census Tract 9552, Ha… B01003_0…     1347   169
#> 10 38039968… Block Group 1, Census Tract 9686, Gr… B01003_0…     1348   128
#> # ... with 1,216 more rows

Created on 2018-11-30 by the reprex package (v0.2.1)
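If downloading county boundaries is not desirable, another option might be to drop the two retired codes mentioned above before iterating. The sketch below is base R only. The county_codes table is a toy stand-in for tidycensus::fips_codes, limited to the four codes named in this thread, and key() is a hypothetical helper.

```r
# Toy stand-in for tidycensus::fips_codes (only the codes named in this thread).
county_codes <- data.frame(
  state_code  = c("02", "02", "46", "46"),
  county_code = c("270", "158", "113", "102"),
  stringsAsFactors = FALSE
)

# Codes retired in 2015: Wade Hampton Census Area (02-270) and Shannon County (46-113).
retired <- data.frame(
  state_code  = c("02", "46"),
  county_code = c("270", "113"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: combine state and county codes into one lookup key.
key <- function(df) paste0(df$state_code, df$county_code)

# Keep only the rows whose combined key is not in the retired set.
current <- county_codes[!key(county_codes) %in% key(retired), ]
print(current)
```

The remaining state_code and county_code columns could then be passed to map2_dfr() in the same way as my_counties2 above. This only helps for data years after the change; earlier years would need the old codes instead.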

Owner

walkerke commented Dec 7, 2018

Thanks @mfherman for the detailed response! You've pointed to a number of internal issues with tidycensus that I'd like to clean up when I have the time.

As an aside - I tend not to recommend tidycensus for bulk block group data pulls. NHGIS tends to be a better option given that block group data are only available by county from the API (and only available by tract for 2013-2017 at the moment).


NikKrieger commented Dec 7, 2018

Does this have anything to do with the fact that pre-2015 calls to acs5 don't work at the block group level?

# Doesn't work:
tidycensus::get_acs(geography = "block group", variables = "B01003_001", year = 2010,
                    output = "wide", state = "OH", geometry = TRUE)
#> Error: Your API call has errors.  The API message returned is error: unknown/unsupported geography heirarchy.
# Works:
tidycensus::get_acs(geography = "block group", variables = "B01003_001", year = 2015,
                    output = "wide", state = "OH", geometry = TRUE)

It seems that get_acs()'s pre-2015 calls are trying to use URLs like this one, which throws an error:
https://api.census.gov/data/2010/acs/acs5?get=B01003_001E%2CB01003_001M%2CNAME&for=block%20group%3A%2A&in=state%3A39%2Bcounty%3A055&key=CENSUS_API_KEY
But removing the third element of the URL (/acs) results in a call that DOES work:
https://api.census.gov/data/2010/acs5?get=B01003_001E%2CB01003_001M%2CNAME&for=block%20group%3A%2A&in=state%3A39%2Bcounty%3A055&key=CENSUS_API_KEY

Starting in 2015, both forms of this link work.
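For reference, the only difference between the two URL forms quoted above is the extra /acs path segment. A small base R sketch makes that explicit. acs5_base_url() is a hypothetical helper, not a tidycensus function, and it reflects only the URL layouts quoted in this comment, which may not match what the API accepts today.

```r
# Hypothetical helper: build the base URL for the ACS 5-year endpoint.
# with_acs_segment = TRUE gives the ".../acs/acs5" form,
# FALSE gives the shorter ".../acs5" form.
acs5_base_url <- function(year, with_acs_segment = TRUE) {
  path <- if (with_acs_segment) "acs/acs5" else "acs5"
  sprintf("https://api.census.gov/data/%d/%s", as.integer(year), path)
}

long_form  <- acs5_base_url(2010)
short_form <- acs5_base_url(2010, with_acs_segment = FALSE)

print(long_form)
print(short_form)
```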

Owner

walkerke commented Dec 7, 2018

@NikKrieger your links represent some current issues with a migration of the ACS API to a new endpoint. The first link you call is the new endpoint, which includes some limitations on geographic hierarchies. I've put in a request to Census to try to resolve them. The second link is the old endpoint, which is slated to be shut down by the end of the year (originally, this was going to happen in August).

I'm hoping Census will give us more geographic flexibility; if they don't I'll work on ways to handle this internally within tidycensus.

@NikKrieger

Thanks for the explanation. I'm certainly grateful for the Census API to a degree, but I'm experiencing a measure of disillusionment with the number of quirks and unavailabilities it has.

@camille-s

@NikKrieger not that it's all that juicy, but we can see your API key in those URLs!


larcat commented Dec 11, 2018

Thanks for this thread and the workaround, all :)

@NikKrieger

@NikKrieger not that it's all that juicy, but we can see your API key in those URLs!

Fixed. Thanks.

@walkerke
Owner

@mfherman your contributions here are very much appreciated! One of my plans as research/dev time opens up soon (very heavy teaching semester this fall) is to do a full refactor of how tidycensus handles that internal block group logic, removing match.call() and probably writing out the arguments one-by-one. I'll also iron out some of the FIPS code issues by having tidycensus check tigris for the valid county codes under the hood for a given year, which should make national block group calls possible (though subject to API instability).

@walkerke
Owner

This should now work in tidycensus master (0.9.2). You just may have to wait a while as you have to hit the API for each county in the United States...
