data.get_census 50vars fixes #1 #8

Merged
merged 4 commits into from
May 16, 2023

Conversation

jbousquin
Contributor

Breaks the variables into manageable chunks, loops over those chunks, and merges the results at the end. The merge defaults to the intersection of the columns in both DataFrames, which should be either state, county, etc. or GEOID.
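For readers who want to see the shape of this approach, here is a minimal sketch, not the code merged in this PR. The names `chunk`, `fetch_in_chunks`, and `fake_fetch` are hypothetical, `fake_fetch` stands in for a real API request, and the chunk size of 48 is an assumption chosen only to stay under a 50-variable limit. It relies on `DataFrame.merge` defaulting to an inner join on the columns the two frames share.

```python
import pandas as pd


def chunk(seq, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def fetch_in_chunks(variables, fetch, size=48):
    """Call `fetch` once per chunk of variables and merge the results.

    `fetch` is any callable that takes a list of variable names and
    returns a DataFrame containing those variables plus the shared
    geography columns (here just GEOID).
    """
    frames = [fetch(c) for c in chunk(variables, size)]
    out = frames[0]
    for df in frames[1:]:
        # With no `on` argument, merge joins on the intersection of the
        # column names, which here is the geography column(s).
        out = out.merge(df)
    return out


def fake_fetch(var_chunk):
    """Hypothetical stand-in for an API request (no network access)."""
    data = {"GEOID": ["001", "002", "003"]}
    for v in var_chunk:
        data[v] = [1, 2, 3]
    return pd.DataFrame(data)


variables = [f"V{i:03d}" for i in range(120)]
result = fetch_in_chunks(variables, fake_fetch)
print(result.shape)
```

Because the join is on shared columns rather than on row position, rows are matched by geography even if separate requests return them in a different order.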

…ead to df to one line (subset references to out were causing issues). Cleaned up white space.
Fix whitespace, other checks.
@jbousquin
Contributor Author

I wasn't sure how you might want test cases structured. It makes sense to test with a request that returns index differences depending on the number of variables, as discussed in the issue. However, I had trouble reproducing index differences across requests with the two examples below, and neither of them uses 50+ vars (I suggest just adding some vars to the second). I'm sharing my interpretation of them in Python in case I'm missing something.

From: walkerke/tidycensus#165

# Assumed import path, based on the PR title (data.get_census)
from pygris.data import get_census

variables = ["B01001_003E", "B01001_004E", "B01001_005E", "B01001_006E", "B01001_007E",
             "B01001_008E", "B01001_009E", "B01001_010E", "B01001_011E", "B01001_012E", 
             "B01001_013E", "B01001_014E", "B01001_015E", "B01001_016E", "B01001_017E", 
             "B01001_018E", "B01001_019E", "B01001_020E", "B01001_021E", "B01001_022E", 
             "B01001_023E", "B01001_024E", "B01001_025E", "B01001_026E", "B25002_002E",
             "B03003_003E"]

test1 = get_census(dataset = "acs/acs5",
                   variables = "B03003_003E",
                   year = 2017,
                   params = {
                             "for": "tract:*",
                             "in": "state:36;county:*",
                            },
                   return_geoid = True)

test2 = get_census(dataset = "acs/acs5",
                   variables = variables,
                   year = 2017,
                   params = {
                             "for": "tract:*",
                             "in": "state:36;county:*",
                            },
                   return_geoid = True)

test3 = test1.merge(test2, left_index=True, right_index=True, how='left', indicator=True)
assert len(test3[test3['_merge'] == 'both']) == len(test3), 'Batch index mis-match'

From: hrecht/censusapi#82

# Group B01001 (001-049E)
estimates = ['0' + str(z) for z in range(1, 10)]
estimates += list(range(10, 50))
group_B01001 = ['B01001_0' + str(v) + 'E' for v in estimates]

acs_pop_group = get_census(dataset = "acs/acs5",
                           variables = group_B01001,
                           year = 2017,
                           params = {
                               "for": "tract:*",
                               "in": "state:02;county:*",
                           },
                           return_geoid = True)

acs_pop_manual = get_census(dataset = "acs/acs5",
                            variables = 'B01001_001E',
                            year = 2017,
                            params = {
                                "for": "tract:*",
                                "in": "state:02;county:*",
                            },
                            return_geoid = True)

# Check they are all equal
comp = acs_pop_group['B01001_001E'] == acs_pop_manual['B01001_001E']
comp.value_counts()

# Or assert they are all equal
test_acs_pop_group = acs_pop_group.merge(acs_pop_manual, left_index=True, right_index=True, how='left', indicator=True)
assert len(test_acs_pop_group[test_acs_pop_group['_merge'] == 'both']) == len(test_acs_pop_group), 'Batch index mis-match'
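
Both snippets end with the same merge-and-assert pattern, so it could be factored into a small helper. The sketch below is only a suggestion: `assert_index_match` is a hypothetical name, not something in the package, and the DataFrames are tiny made-up examples so the check runs without an API request.

```python
import pandas as pd


def assert_index_match(left, right, msg="Batch index mis-match"):
    """Fail if any index label in `left` is missing from `right`."""
    merged = left.merge(
        right,
        left_index=True,
        right_index=True,
        how="left",
        indicator=True,  # adds a '_merge' column: 'both' or 'left_only'
    )
    assert (merged["_merge"] == "both").all(), msg


# Made-up frames standing in for a single-variable and a multi-variable request
single = pd.DataFrame({"B01001_001E": [10, 20]}, index=["t1", "t2"])
batch = pd.DataFrame({"B01001_002E": [5, 9]}, index=["t1", "t2"])
assert_index_match(single, batch)  # passes: the indexes agree
```

Note that a left merge only checks that every row of the first frame has a partner in the second; swapping the arguments, or using `how="outer"`, would also catch extra rows in the second frame.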

@walkerke
Owner

walkerke commented May 3, 2023

Thanks! I'll spend some time going through this and doing some checks. Appreciate the PR!

@walkerke walkerke merged commit 848c9f6 into walkerke:main May 16, 2023