group_by + summarize drops last grouping var #862

Closed
dpastoor opened this issue Jan 1, 2015 · 4 comments

Comments

@dpastoor
Contributor

dpastoor commented Jan 1, 2015

I tried this on two computers, with both the most recent dev version and 0.4, and the bug persists.

mtcars %>% group_by(mpg, cyl) %>% summarize(mean_hp = mean(hp))

drops cyl from the grouping (but still keeps the cyl column):

Source: local data frame [27 x 3]
Groups: mpg

    mpg cyl mean_hp
1  10.4   8   210.0
2  13.3   8   245.0
3  14.3   8   245.0

mtcars %>% group_by(mpg, cyl, qsec) %>% summarize(mean_hp = mean(hp))

drops qsec from the grouping (but still keeps the qsec column):

Source: local data frame [32 x 4]
Groups: mpg, cyl

    mpg cyl  qsec mean_hp
1  10.4   8 17.82     215
2  10.4   8 17.98     205
3  13.3   8 15.41     245
4  14.3   8 15.84     245
5  14.7   8 17.42     230
> devtools::session_info()
Session info------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.1.2 (2014-10-31)
 system   x86_64, darwin14.0.0        
 ui       RStudio (0.98.1091)         
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       America/New_York            

Packages----------------------------------------------------------------------------------------------------
 package    * version    date       source                       
 assertthat   0.1        2013-12-06 CRAN (R 3.1.2)               
 DBI          0.3.1      2014-09-24 CRAN (R 3.1.2)               
 devtools     1.6.1      2014-10-07 CRAN (R 3.1.2)               
 dplyr      * 0.3.0.9000 2015-01-01 Github (hadley/dplyr@1fc07de)
 lazyeval     0.1.9      2014-10-01 CRAN (R 3.1.2)               
 magrittr     1.5        2014-11-22 CRAN (R 3.1.2)               
 Rcpp         0.11.3     2014-09-29 CRAN (R 3.1.2)               
 rstudioapi   0.1        2014-03-27 CRAN (R 3.1.2) 
@hadley
Member

hadley commented Jan 1, 2015

This is by design. After you've summarised, the last group will only have one row per group, so it's not useful to group on it.
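
A minimal sketch of the behaviour described above, assuming a dplyr version that provides `group_vars()` (0.7.0 or later, which is newer than the versions discussed in this thread). The object names `by_two` and `res` are made up for illustration.

```r
library(dplyr)

# Group by two variables, then summarise.
by_two <- mtcars %>% group_by(mpg, cyl)
res <- by_two %>% summarise(mean_hp = mean(hp))

group_vars(by_two)  # grouping variables before summarising
group_vars(res)     # one grouping level fewer after summarising

# Each (mpg, cyl) combination now occupies a single row, so grouping
# on cyl as well would only produce one-row groups.
any(duplicated(as.data.frame(res)[c("mpg", "cyl")]))
```

Recent dplyr releases may also print a message about the grouping of the result; that message does not change the output.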

@hadley hadley closed this as completed Jan 1, 2015
@dpastoor
Contributor Author

dpastoor commented Jan 1, 2015

I see, so what if I want to maintain groups?

The work around I am using (in a function) currently is:

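  # remember the incoming grouping; groups() returns NULL for an ungrouped df
  grps <- dplyr::groups(df)
  out <- df %>% dplyr::summarize_(...stuff...)
  # restore the grouping, since summarize_() drops the last grouping var
  if (!is.null(grps)) out <- dplyr::group_by_(out, .dots = grps)
  return(out)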

The 2 unexpected side effects of this (and maybe it is a poor design on my part?) are:

  1. I have a function that behaves differently depending on whether it is passed a grouped_df or not. If only one grouping variable was present, the returned df loses its grouped_df class. This mangles my output some :-(

  2. I have a function that computes colwise summaries on all the columns of the resulting summarized df (excluding grouping vars). Because the last group is dropped, it tries to apply the function to the dropped grouping var and fails.

I guess since it is by design (and won't go away), I should just factor that into my checks, or otherwise restore the groups myself if I want to force carrying all grouping variables after summarization?
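
For readers on later dplyr releases, a hedged sketch of one way to carry all grouping variables through a summary. It assumes dplyr 1.0.0 or later, where `summarise()` accepts a `.groups` argument; that argument is not available in the versions used in this thread, where a workaround like the one above is still needed. The names `kept` and `one_var` are illustrative only.

```r
library(dplyr)

# .groups = "keep" asks summarise() to retain the full input grouping
# instead of dropping the last grouping variable (assumes dplyr >= 1.0.0).
kept <- mtcars %>%
  group_by(mpg, cyl) %>%
  summarise(mean_hp = mean(hp), .groups = "keep")

group_vars(kept)

# With a single grouping variable the result stays a grouped_df,
# which addresses the first side effect listed above.
one_var <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_hp = mean(hp), .groups = "keep")

inherits(one_var, "grouped_df")
```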

@hadley
Member

hadley commented Jan 2, 2015

Then that's your problem 😛

If you give me a bit more context about what you're trying to achieve, I can suggest alternative approaches.

@dpastoor
Contributor Author

dpastoor commented Jan 2, 2015

Touché!

I'm going to let it digest a bit before I waste any of your time running down a blind alley. I'll be at your workshop in a couple of weeks, so if I'm still scratching my head I can pick your brain then. It's a subtle design choice on your end, but I see why it makes sense.

@lock lock bot locked as resolved and limited conversation to collaborators Jun 10, 2018