Errors in values when combining API calls #256
Comments
Thanks for the report, @mbsabath. I can confirm this error. It looks like this is happening where the two calls are being merged (lines 285 to 297 in 02b1dcb). I'll see what we need to do to fix this. Here is a slightly pared-down reprex illustrating the issue:
library(tidycensus)
library(dplyr)
dec_vars <- c("P001001", "P087002", "P087001", "P087008", "P087009", "P087016",
              "P087017", "H076001", "P007003", "P007001", "P007002", "P007004",
              "P007005", "PCT025004", "PCT025005", "PCT025012", "PCT025013",
              "PCT025020", "PCT025021", "PCT025028", "PCT025029", "PCT025036",
              "PCT025037", "PCT025045", "PCT025046", "PCT025053", "PCT025054",
              "PCT025061", "PCT025062", "PCT025069", "PCT025070", "PCT025078",
              "PCT025079", "PCT025002", "PCT025043", "PCT025077", "PCT025035",
              "PCT025076", "P053001", "H004002", "H004001", "P013001", "P012003",
              "P012004", "P012005", "P012027", "P012028", "P012029", "P012001")
dec_data <- get_decennial(
  geography = "state",
  state = "NY",
  variables = dec_vars,
  year = 2000
)
#> Getting data from the 2000 decennial Census
dec_data_sm <- get_decennial(
  geography = "state",
  state = "NY",
  variables = c("P007001", "P007002", "P007004", "P007005"),
  year = 2000
)
#> Getting data from the 2000 decennial Census
dec_data %>%
  inner_join(dec_data_sm, by = c("GEOID", "NAME", "variable"))
#> # A tibble: 4 x 5
#>   GEOID NAME     variable  value.x  value.y
#>   <chr> <chr>    <chr>       <dbl>    <dbl>
#> 1 36    New York P007001  18976457 18976457
#> 2 36    New York P007002  16111441 12893689
#> 3 36    New York P007004   2791904    82461
#> 4 36    New York P007005     53637  1044976

Created on 2020-06-11 by the reprex package (v0.3.0)
Thanks for filing! The issue here is that variable names are duplicated across SF1 and SF3 in the 2000 decennial Census. A brief example:
library(tidycensus)
library(tidyverse)
vars_sf1 <- load_variables(2000, "sf1", cache = TRUE)
vars_sf3 <- load_variables(2000, "sf3", cache = TRUE)
And to check:
> filter(vars_sf1, str_detect(name, "P007001"))
# A tibble: 1 x 3
  name    label      concept
  <chr>   <chr>      <chr>
1 P007001 RACE:Total P7. Race [8]
>
> filter(vars_sf3, str_detect(name, "P007001"))
# A tibble: 1 x 3
  name    label            concept
  <chr>   <chr>            <chr>
1 P007001 Total population P7. Hispanic or Latino by Race [17]
@mfherman perhaps we should remove this behavior and throw an error message to avoid this issue, or maybe issue a warning so people know what they are getting.
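For anyone who wants to see which names collide before making a request, here is a minimal sketch of the check. It assumes two lookup tables shaped like the output of load_variables(); the tables below are small stand-ins built by hand so the snippet runs offline. Only the P007001 rows come from the output above, and SF1_ONLY_VAR and SF3_ONLY_VAR are made-up placeholder names.

```r
# Stand-in lookup tables. In practice these would come from
# load_variables(2000, "sf1") and load_variables(2000, "sf3").
# Only the P007001 rows are taken from the thread; the *_ONLY_VAR rows
# are made-up placeholders, not real Census variable names.
vars_sf1 <- data.frame(
  name  = c("P007001", "SF1_ONLY_VAR"),
  label = c("RACE:Total", "placeholder"),
  stringsAsFactors = FALSE
)
vars_sf3 <- data.frame(
  name  = c("P007001", "SF3_ONLY_VAR"),
  label = c("Total population", "placeholder"),
  stringsAsFactors = FALSE
)

# Names present in both summary files are ambiguous unless the
# summary file is stated explicitly.
dup_names <- intersect(vars_sf1$name, vars_sf3$name)

# Show the two competing definitions side by side.
merge(vars_sf1, vars_sf3, by = "name", suffixes = c(".sf1", ".sf3"))
```

With the real lookup tables, dup_names would list every variable name whose meaning depends on the summary file.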
@walkerke Aha -- that makes much more sense than what I was seeing! I didn't realize the variable names were duplicated. Maybe it's safer to force the user to set the summary file explicitly. It is convenient to try SF3 for those vars not found in SF1, but if you don't know there are duplicate variable names (like me!) you get this unexpected result.
I like that approach. Writing code to split up a varlist into sf1 and sf3 variables is straightforward enough. The other option could be to do the check internally, assume sf1, and throw a warning for the duplicate variables saying that sf1 was assumed.
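As a rough sketch of what that split could look like (this is not the code in tidycensus): split_by_sumfile() is a hypothetical helper, and sf1_names and sf3_names are placeholder vectors standing in for the name columns returned by load_variables(). Apart from P007001, the variable names are made up.

```r
# Hypothetical helper, not part of tidycensus: split a requested variable
# list by summary file. Names found in both files are assigned to SF1,
# with a warning saying so.
split_by_sumfile <- function(variables, sf1_names, sf3_names) {
  in_sf1 <- variables %in% sf1_names
  in_sf3 <- variables %in% sf3_names

  unknown <- variables[!in_sf1 & !in_sf3]
  if (length(unknown) > 0) {
    stop("Variables not found in SF1 or SF3: ",
         paste(unknown, collapse = ", "), call. = FALSE)
  }

  ambiguous <- variables[in_sf1 & in_sf3]
  if (length(ambiguous) > 0) {
    warning("Variables found in both SF1 and SF3; assuming SF1: ",
            paste(ambiguous, collapse = ", "), call. = FALSE)
  }

  list(sf1 = variables[in_sf1], sf3 = variables[!in_sf1 & in_sf3])
}

# Placeholder name lists; real ones would be vars_sf1$name and vars_sf3$name.
sf1_names <- c("P007001", "SF1_ONLY_VAR")
sf3_names <- c("P007001", "SF3_ONLY_VAR")

requested <- c("P007001", "SF1_ONLY_VAR", "SF3_ONLY_VAR")
parts <- split_by_sumfile(requested, sf1_names, sf3_names)  # warns about P007001
```

Each piece could then go to its own get_decennial() call with sumfile = "sf1" or sumfile = "sf3", so nothing is merged across summary files without the user knowing.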
I pushed a solution to this with some commits this morning. I didn't want to introduce breaking changes in case people have functioning code that uses SF3 variables while leaving I still need to do some testing of this, however, as your above examples now run without error - possibly due to the way that
I'm using tidycensus version 0.9.9.5, and am getting errors in values in the 2000 decennial census when requesting a large list of variables (from both sf1 and sf3). The call to the API is as follows:
Where dec_data is the following vector:
When we query on a smaller subset for the same year (we noticed the issue when looking at the race statistics) we get results that seem correct. The query for the correct results is as follows: