Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update: new web agg write #105

Merged
merged 8 commits into from Jun 29, 2021
Merged

update: new web agg write #105

merged 8 commits into from Jun 29, 2021

Conversation

nmmarquez
Copy link
Contributor

New aggregate function which aggregates data by different jurisdictions. The function logic is very similar to calc_aggregate_counts except each state has a count for each jurisdiction. The main part of the code is here.

ucla_df <- read_scrape_data(
        window = window, all_dates = all_dates, wide_data = FALSE) %>%
        mutate(Web.Group = case_when(
            Jurisdiction == "immigration" ~ "ICE",
            Jurisdiction == "federal" ~ "Federal",
            Age == "Juvenile" ~ "Juvenile",
            Jurisdiction == "state" ~ "Prison",
            Jurisdiction == "psychiatric" ~ "Psychiatric",
            Jurisdiction == "county" ~ "County",
            TRUE ~ NA_character_
        ))

Leaving this up as a draft for now to see if we agree on this categorizing and how we want to deal with incorporating MP data.

@nmmarquez nmmarquez marked this pull request as draft June 3, 2021 15:01
@nmmarquez nmmarquez linked an issue Jun 3, 2021 that may be closed by this pull request
Copy link
Member

@erika-tyagi erika-tyagi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay thank you for doing this! And thank you for adding max_na_rm as a function too! A few little thoughts:

  1. Do you think it's worth having a state param that works the same way as in the other calc_aggregate_counts? Obviously extremely easy to get the state = F version from the output of this, so not sure if it's worth it.
  2. Is there a reason you called the web grouping Prison instead of State? (thinking about federal prisons)
  3. Re: MP – I think we have to assign the whole totals to the state web grouping...? Probably whenever MP value > our value (as opposed to just when we don't have a value) to catch Texas, Ohio, etc., I think?

One thing I'm still iffy about is rates (literally never sure about rates). For State-Web.Group combinations where we have Residents.Population and/or Population.Feb20 for ALL (or basically all) facilities, that seems super straightforward. And for combinations where we don't have ANY population data, that also seems like a straightforward NA situation. But when we have population values for some (but not all) facilities in a grouping, do we use the total numerator for the measure but the partial denominator? Even if that overestimates rates? I honestly don't know what we're doing right now on the website to address this, so might be a perfect enemy of good situation.

Comment on lines +32 to +37
Jurisdiction == "immigration" ~ "ICE",
Jurisdiction == "federal" ~ "Federal",
Age == "Juvenile" ~ "Juvenile",
Jurisdiction == "state" ~ "Prison",
Jurisdiction == "psychiatric" ~ "Psychiatric",
Jurisdiction == "county" ~ "County",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order here means that

  • facilities that are Juvenile and immigration or federal would NOT be Juvenile, but
  • facilities that are Juvenile and county, psychiatric, or state WOULD be

There's only one federal or immigration row this affects right now (population data for ALL BOP CONTRACT JUVENILES), so I feel like it's fine the way it is – but maybe worth adding a comment for future selves!

))

fac_long_df <- ucla_df %>%
filter(State != "Not Available") %>%
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on leaving these in and just keeping the state as Not Available? Means we'd keep both federal / ice facilities that we haven't gotten around to adding yet, but also federal aggregations (e.g. RRCs) where they're cross-state so the actual assigned state is Not Available.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea but I'm not sure, practically, whether this matters. If state = "Not Available", would the numbers show up anywhere on the website? I don't think so

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think they actually do show up on the website, just with the jurisdiction where the name of the state would be!
Screen Shot 2021-06-07 at 12 46 31 PM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just checking, since these are aggregated they wouldn't show up here right? this is only for "facilities"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's my understanding!

@nmmarquez nmmarquez marked this pull request as ready for review June 15, 2021 17:55
Age == "Juvenile" ~ "Juvenile",
Jurisdiction == "state" ~ "Prison",
Jurisdiction == "psychiatric" ~ "Psychiatric",
Jurisdiction == "cou nty" ~ "County",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

misspelling here on "county"

Copy link
Contributor

@hjohns12 hjohns12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from one tiny typo, this all looks good to me! Thank you so much for dealing with all this craziness and using good variable names to wit :)

Copy link
Member

@erika-tyagi erika-tyagi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks wonderful! Made one suggestion, and I still feel like we should keep the Not Available state rows, but overall looks great // thank you so much for working through this!!

Comment on lines +102 to +104
pri_df <- state_df %>%
filter(Web.Group == "Prison") %>%
full_join(mp_df, by = c("State", "Date", "Measure"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want the Web.Group for everything from MP to be Prison? If so, should we assign that here? (or wherever might make more sense) otherwise there are NA Web.Groups so that data won't make it onto the website :(

@nmmarquez
Copy link
Contributor Author

Added back in the "state not available" and changed the calculation of rate to include values for population in the denominator even when we dont have a numerator. This makes our rates more comparable to Marshall project and makes our denominators more sensible.
f52500f1-75f2-43e6-a9c2-bbd60bb04ef0

#'
#' @examples
#' \dontrun{
#' read_staff_popfeb20()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be read_mpap_pop_data().

@@ -0,0 +1,25 @@
#' Get Feb 20, 2020 population data for applicable rows in the fac_data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we use this function anywhere? // I think it's useful, but curious why it ended up here!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to add it in if were to start using staff data denoms anywhere. Not using yet.

Copy link
Member

@erika-tyagi erika-tyagi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay ty ty lgtm! Added a few last comments, but I don't think they need to preclude merging this. I guess the next step is to update write_latest_data // let HO know?

@nmmarquez nmmarquez merged commit 1e688d7 into master Jun 29, 2021
@erika-tyagi erika-tyagi deleted the new_agg_func branch July 21, 2021 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New aggregation function
3 participants