Selecting columns from a grouped_df with `[` results in lost grouping #398

Closed
wch opened this Issue Apr 17, 2014 · 4 comments

@wch
Member

wch commented Apr 17, 2014

m <- mtcars %>% group_by(cyl)

# Selecting rows keeps grouping
m[1:3, ]
# Source: local data frame [3 x 11]
# Groups: cyl
# 
#                mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
# Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1

# Selecting columns loses grouping
m[1:3, 1:3]
# Source: local data frame [3 x 3]
# Groups: 
# 
#                mpg cyl disp
# Mazda RX4     21.0   6  160
# Mazda RX4 Wag 21.0   6  160
# Datsun 710    22.8   4  108

# The grouping attributes are lost, even though the grouped_df class is kept
str(m[1:3, 1:3])
# Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame':  3 obs. of  3 variables:
#  $ mpg : num  21 21 22.8
#  $ cyl : num  6 6 4
#  $ disp: num  160 160 108

This is especially problematic because R crashes if you use do() on this grouped_df which has no groups:

m[, 1:3] %>% do(mpg = mean(.$mpg))
# [segfault]

@wch wch changed the title from Selecting columns from a grouped tbl_df with `[` results in lost grouping to Selecting columns from a grouped_df with `[` results in lost grouping Apr 17, 2014

romainfrancois added a commit that referenced this issue Apr 17, 2014

@romainfrancois

Member

romainfrancois commented Apr 17, 2014

I've put some protection in place. We now get an error:

> m <- mtcars %>% group_by(cyl)
> m[1:3,1:3] %>% do(mpg = mean(.$mpg))
Error: no variables to group by
@wch

Member

wch commented Apr 17, 2014

It might make sense to keep the groups of any grouping columns that are selected (cyl in this case), but drop the groups of any grouping columns that aren't selected. If no grouping columns are selected, you could drop the grouped_df class.
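A minimal sketch of that rule, assuming a recent dplyr (1.0 or later) for `group_vars()`, `across()` and `all_of()`. The helper name `select_cols_keep_groups()` is hypothetical and not part of dplyr; this is only an illustration of the proposal, not the fix that was eventually committed.

```r
library(dplyr)

# Hypothetical helper (not a dplyr function): select columns `j` and keep
# only the grouping variables that are still present afterwards.
select_cols_keep_groups <- function(x, j) {
  groups <- group_vars(x)            # grouping columns before subsetting
  out <- ungroup(x)[, j]             # plain column selection, no grouping
  kept <- intersect(groups, names(out))
  if (length(kept) > 0) {
    # Regroup by the grouping columns that were selected
    group_by(out, across(all_of(kept)))
  } else {
    # No grouping column selected: return an ungrouped tbl
    out
  }
}

m <- mtcars %>% group_by(cyl)
group_vars(select_cols_keep_groups(m, 1:3))               # cyl is selected
group_vars(select_cols_keep_groups(m, c("mpg", "disp")))  # cyl is not selected
```

With this approach the result never carries the grouped_df class without at least one grouping variable, which is the state that triggered the crash in `do()`.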

I realize that select() works differently -- it keeps the grouping columns, even if the user doesn't ask for them specifically.
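For comparison, a short example of the `select()` behaviour mentioned above, again assuming a recent dplyr where `group_vars()` is available:

```r
library(dplyr)

m <- mtcars %>% group_by(cyl)

# Only mpg is requested, but select() adds the grouping column back
# (dplyr prints a message saying so) and the result stays grouped.
s <- m %>% select(mpg)

names(s)       # includes cyl as well as mpg
group_vars(s)  # still grouped by cyl
```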

@romainfrancois

Member

romainfrancois commented Apr 17, 2014

We'd have to write a `[.grouped_df` method for this. Not sure we want to go there.

@hadley

Member

hadley commented Apr 21, 2014

I think we should probably protect attributes in `[` methods (we wouldn't encourage users to use these methods, but they are useful for developers). I'll add the code.

@hadley hadley added the enhancement label Aug 1, 2014

@hadley hadley modified the milestones: 0.3.1, 0.3 Aug 1, 2014

@hadley hadley self-assigned this Aug 1, 2014

@hadley hadley closed this in 7deb5ab Aug 12, 2014

krlmlr pushed a commit to krlmlr/dplyr that referenced this issue Mar 2, 2016

@lock lock bot locked as resolved and limited conversation to collaborators Jun 10, 2018
