Skip to content

select not working for grouped_df #170

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dickoa opened this issue Dec 24, 2013 · 7 comments
Closed

select not working for grouped_df #170

dickoa opened this issue Dec 24, 2013 · 7 comments
Assignees
Labels
bug an unexpected problem or unintended behavior
Milestone

Comments

@dickoa
Copy link

dickoa commented Dec 24, 2013

Hi,

I think that select verb is not working as expected for grouped_df.
Here's a small reproductible example :

Using select

require(dplyr)
group_by(mtcars, vs) %.% select(mpg)
Error: index out of bounds

Using subset

str(group_by(mtcars, vs) %.% subset(select = mpg))
Classesgrouped_df’, ‘tbl_df’, ‘tbland 'data.frame':    32 obs. of  1 variable:
 $ mpg: num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
@romainfrancois
Copy link
Member

Here is the implementation of select for grouped data frames:

select.grouped_df <- function(.data, ...) {
  grouped_df(select.data.frame(.data, ...), groups(.data))
}

So the error comes from the grouped_df call that uses variables no longer in the data. The data that goes into grouped_df only has the mpg variable, so cannot be grouped by vs.

Maybe we should lose the groupings or force selection of grouping variables. @hadley ?

@hadley
Copy link
Member

hadley commented Dec 24, 2013

Ooh, I didn't think of that possibility. I think it would be best to give an error in this situation: e.g. "selection doesn't include grouping variables: vs".

@piccolbo
Copy link

It seems to me Romain's proposal to implicitly select grouping variables was superior from a DRY or "war on boilerplate" point of view. Now we have to write

group_by(data, group.vars) %.% select(select.vars, group.vars)

Mandated repetition of group.vars or error. Can't be The Right Way.

@hadley
Copy link
Member

hadley commented Mar 19, 2014

Yes, that's an excellent point. No reason to let people do something that they shouldn't want to do.

@hadley hadley reopened this Mar 19, 2014
@hadley hadley modified the milestones: v0.2, v0.1.1 Mar 19, 2014
@romainfrancois
Copy link
Member

Perhaps this should come with some sort of message letting the user know we've added variables to the selection. In any case, this can be handled in the select.grouped_df method.

@hadley
Copy link
Member

hadley commented Mar 20, 2014

I think it's reasonable to say that it's implicit in the grouping - i.e. a grouped select includes the grouping variables in the same way that a grouped arrange orders by the grouped variables.

@piccolbo
Copy link

An alternative could be to remove the constraint and allow to drop grouping variables. This makes more sense than it seems: imagine you are doing resampling and you are only interested in the variation of an estimator, not exactly which group it comes from. But there are implications to this: I don't think it's natural to preserve the grouping when the grouping variables are gone. Hence it would become an implicit ungroup. plyrmr goes with implicit selection, but I am not sure it's the last word.

@lock lock bot locked as resolved and limited conversation to collaborators Jun 10, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug an unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

4 participants