Proposing add_count and add_tally for adding an n column within groups #2078

Merged
merged 2 commits into from Dec 6, 2016

@dgrtwo (Member) commented Aug 18, 2016

I very often find myself doing the following three steps (for example, before filtering out all groups below a certain size):

data %>%
  group_by(...) %>%
  mutate(n = n()) %>%
  ungroup()

I've contributed two verbs that shorten this, with working titles add_count and add_tally. They add an n column (or an nn column if one is already present) to the data frame with counts within groups. Unlike count and tally, the data is otherwise unchanged (same number of observations).

Also included basic unit tests and examples but I can flesh those out more if necessary.
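
A minimal sketch of the equivalence described above, assuming the verbs behave as proposed. The data frame `df` and its columns `g` and `x` are made up for illustration:

```r
library(dplyr)

# Hypothetical example data; `g` is an illustrative grouping column.
df <- tibble(g = c("a", "a", "b", "c", "c", "c"), x = 1:6)

# The three-step pattern: group, add the group size, ungroup.
long_form <- df %>%
  group_by(g) %>%
  mutate(n = n()) %>%
  ungroup()

# The proposed shorthand: count within `g` and keep every row.
short_form <- df %>% add_count(g)

# add_tally() counts within whatever grouping the input already has.
tallied <- df %>%
  group_by(g) %>%
  add_tally() %>%
  ungroup()
```

All three results should have the same six rows as `df`, plus an `n` column holding the size of each row's group.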

@dgrtwo (Member, Author) commented Aug 18, 2016

Two main questions:

  • why the name, and is there an alternative: I'm working off the possible precedent of add_predictions, add_predictors, and add_residuals from the modelr package. These are functions that take a table and add specifically calculated columns, like the above.

    The alternative we've used in the tidytext package is bind_tf_idf, which adds three columns (tf, idf, and tf_idf) to a table without otherwise changing it. If there's another prefix that means "add a column" I'm open to it, though I don't think mutate_ or append_ would be right.

    We could try to come up with clever verbs that don't refer to count or tally specifically (like... enumerate and tick or something), but I think that would make them more obscure.

  • why the grouping behavior: I think it is important to return the tbl to its original grouping, because most of the time when this verb is useful, it is immediately followed by filtering or mutating, which are faster when ungrouped (filtering a still-grouped table is a particularly common pitfall). I also think that if the new grouping were kept, one might as well just have used group_by, which would be clearer.
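
To illustrate the grouping point, here is a hedged sketch of the typical follow-up step, filtering out small groups. The data frame `df`, its columns, and the threshold of 2 are hypothetical:

```r
library(dplyr)

# Hypothetical example data.
df <- tibble(g = c("a", "a", "b", "c", "c", "c"), x = 1:6)

# Count within `g`, then filter; no explicit ungroup() is needed in between.
kept <- df %>%
  add_count(g) %>%
  filter(n >= 2)

# The input was ungrouped, so the result is expected to be ungrouped too.
is_grouped_df(kept)
```

Because the result comes back with the input's grouping (none, in this case), the filter runs on a plain table rather than group by group.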

@mgacc0 commented Sep 12, 2016

It would be a nice feature to have.
Alternatively, I would propose the name add_tally_by (or add_count_by).

@krlmlr (Member) commented Nov 7, 2016

LGTM.

@hadley (Member) commented Dec 2, 2016

LGTM

@krlmlr merged commit dabc82b into tidyverse:master on Dec 6, 2016
3 checks passed

@krlmlr (Member) commented Dec 6, 2016

Thanks!

@lock (bot) commented Jan 18, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock (bot) locked and limited conversation to collaborators on Jan 18, 2019