New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

select not working for grouped_df #170

Closed
dickoa opened this Issue Dec 24, 2013 · 7 comments

Comments

Projects
None yet
4 participants
@dickoa

dickoa commented Dec 24, 2013

Hi,

I think that select verb is not working as expected for grouped_df.
Here's a small reproductible example :

Using select

require(dplyr)
group_by(mtcars, vs) %.% select(mpg)
Error: index out of bounds

Using subset

str(group_by(mtcars, vs) %.% subset(select = mpg))
Classesgrouped_df’, ‘tbl_df’, ‘tbland 'data.frame':    32 obs. of  1 variable:
 $ mpg: num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
@romainfrancois

This comment has been minimized.

Member

romainfrancois commented Dec 24, 2013

Here is the implementation of select for grouped data frames:

select.grouped_df <- function(.data, ...) {
  grouped_df(select.data.frame(.data, ...), groups(.data))
}

So the error comes from the grouped_df call that uses variables no longer in the data. The data that goes into grouped_df only has the mpg variable, so cannot be grouped by vs.

Maybe we should lose the groupings or force selection of grouping variables. @hadley ?

@hadley

This comment has been minimized.

Member

hadley commented Dec 24, 2013

Ooh, I didn't think of that possibility. I think it would be best to give an error in this situation: e.g. "selection doesn't include grouping variables: vs".

@piccolbo

This comment has been minimized.

piccolbo commented Mar 19, 2014

It seems to me Romain's proposal to implicitly select grouping variables was superior from a DRY or "war on boilerplate" point of view. Now we have to write

group_by(data, group.vars) %.% select(select.vars, group.vars)

Mandated repetition of group.vars or error. Can't be The Right Way.

@hadley

This comment has been minimized.

Member

hadley commented Mar 19, 2014

Yes, that's an excellent point. No reason to let people do something that they shouldn't want to do.

@hadley hadley reopened this Mar 19, 2014

@hadley hadley modified the milestones: v0.2, v0.1.1 Mar 19, 2014

@romainfrancois

This comment has been minimized.

Member

romainfrancois commented Mar 19, 2014

Perhaps this should come with some sort of message letting the user know we've added variables to the selection. In any case, this can be handled in the select.grouped_df method.

@hadley

This comment has been minimized.

Member

hadley commented Mar 20, 2014

I think it's reasonable to say that it's implicit in the grouping - i.e. a grouped select includes the grouping variables in the same way that a grouped arrange orders by the grouped variables.

@piccolbo

This comment has been minimized.

piccolbo commented Mar 20, 2014

An alternative could be to remove the constraint and allow to drop grouping variables. This makes more sense than it seems: imagine you are doing resampling and you are only interested in the variation of an estimator, not exactly which group it comes from. But there are implications to this: I don't think it's natural to preserve the grouping when the grouping variables are gone. Hence it would become an implicit ungroup. plyrmr goes with implicit selection, but I am not sure it's the last word.

hadley added a commit that referenced this issue Mar 25, 2014

@lock lock bot locked as resolved and limited conversation to collaborators Jun 10, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.