summarise() does not correctly coerce factors with different levels #1678

Closed
ghost opened this issue Mar 1, 2016 · 2 comments

ghost commented Mar 1, 2016

In response to #1556, using the latest dev version.

I expected the output for group b-1 (2nd row) to be 0 for all summarised variables, but the factor columns (any1.0, any2.0, any3.0) show 1.

library(dplyr)
df <- data.frame(grp=c("a","a","a","b","b","b"), grp2=c("2","2","2","2","1","1"), fac=c("1","1","1","1","0","0"))

dplyr::summarise(dplyr::group_by(df, grp, grp2),
  mean   = mean(as.numeric(fac) - 1),
  sum1   = sum(as.numeric(fac) - 1),
  sum2   = sum(fac == 1),
  any1.0 = factor(ifelse(any(fac == 1), 1, 0)),
  any1.1 = ifelse(any(fac == 1), 1, 0),
  any2.0 = factor(if (any(fac == 1)) 1 else 0),
  any2.1 = if (any(fac == 1)) 1 else 0,
  any3.0 = factor(if (any(fac == factor(1, levels = c(0, 1)))) 1 else 0),
  any3.1 = if (any(fac == factor(1, levels = c(0, 1)))) 1 else 0)

Source: local data frame [3 x 11]
Groups: grp [?]

     grp   grp2  mean  sum1  sum2 any1.0 any1.1 any2.0 any2.1 any3.0 any3.1
  (fctr) (fctr) (dbl) (dbl) (int) (fctr)  (dbl) (fctr)  (dbl) (fctr)  (dbl)
1      a      2     1     3     3      1      1      1      1      1      1
2      b      1     0     0     0      1      0      1      0      1      0
3      b      2     1     1     1      1      1      1      1      1      1

Session info ------------------------------------------------------------
 setting  value                       
 version  R version 3.2.3 (2015-12-10)
 system   x86_64, mingw32             
 ui       RStudio (0.99.875)          
 language (EN)                        
 collate  German_Germany.1252         
 tz       Europe/Berlin               
 date     2016-03-01                  

Packages ----------------------------------------------------------------
 package      * version    date       source                       
 assertthat     0.1        2013-12-06 CRAN (R 3.2.2)               
 curl           0.9.6      2016-02-17 CRAN (R 3.2.3)               
 DBI            0.3.1      2014-09-24 CRAN (R 3.2.2)               
 devtools       1.10.0     2016-01-23 CRAN (R 3.2.3)               
 digest         0.6.9      2016-01-08 CRAN (R 3.2.3)               
 dplyr        * 0.4.3.9000 2016-03-01 Github (hadley/dplyr@7d4e0ba)
 git2r          0.13.1     2015-12-10 CRAN (R 3.2.3)               
 httr           1.1.0      2016-01-28 CRAN (R 3.2.3)               
 knitr          1.12.3     2016-01-22 CRAN (R 3.2.3)               
 lazyeval       0.1.10     2015-01-02 CRAN (R 3.2.2)               
 magrittr       1.5        2014-11-22 CRAN (R 3.2.2)               
 memoise        1.0.0      2016-01-29 CRAN (R 3.2.3)               
 nycflights13 * 0.1        2014-07-22 CRAN (R 3.2.3)               
 R6             2.1.2      2016-01-26 CRAN (R 3.2.3)               
 Rcpp           0.12.3     2016-01-10 CRAN (R 3.2.3)               
 rstudioapi     0.5        2016-01-24 CRAN (R 3.2.3)               
 withr          1.0.1      2016-02-04 CRAN (R 3.2.3)        

1va commented Mar 3, 2016

You could specify the levels (of the if/else results) to prevent wrong/unexpected concatenation of factors with different levels:
factor(ifelse(any(fac == 1), 1, 0), levels = c(0, 1))

But I agree that the expected behaviour would be that the levels are handled automatically similar to
unlist(list(factor(0), factor(1)))

Compare c(factor(0), factor(1)) and c(factor(0, levels = c(0, 1)), factor(1, levels = c(0, 1)))
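
To make the comparison above concrete, here is a small base-R sketch (not taken from the thread; the variable names are illustrative). It looks at the integer codes directly rather than calling c() on factors, because what c() returns for factors depends on the R version.

```r
# Two single-level factors: each stores integer code 1, but the labels differ.
f0 <- factor(0)
f1 <- factor(1)
codes_separate <- c(as.integer(f0), as.integer(f1))

# unlist() on a list of factors builds a factor whose levels are the
# union of the individual level sets, so the labels survive.
merged <- unlist(list(f0, f1))
merged_levels <- levels(merged)
merged_values <- as.character(merged)

# With an explicit, shared level set the codes are comparable from the start.
g0 <- factor(0, levels = c(0, 1))
g1 <- factor(1, levels = c(0, 1))
codes_shared <- c(as.integer(g0), as.integer(g1))

print(codes_separate)  # both codes are 1
print(merged_values)   # "0" "1"
print(codes_shared)    # 1 2
```

Spelling out the levels inside summarise(), as suggested above, puts every group into the second situation, so the result no longer depends on how the per-group factors are combined.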

hadley changed the title from "summarise with factors" to "summarise() does not correctly coerce factors with different levels" on Mar 8, 2016

hadley commented Mar 8, 2016

Minimal reprex:

data_frame(x = 1:2) %>% 
  group_by(x) %>% 
  summarise(
    y = if(x == 1) "a" else "b",
    z = factor(y)
  )
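
The symptom can be imitated in base R without dplyr. This is only a sketch under an assumption: that the per-group factors are combined by integer code and labelled with the first group's levels. The thread does not confirm that this is what summarise() does internally, and per_group, naive, and safe are hypothetical names.

```r
# Per-group results for x = 1 and x = 2, each a single-level factor.
per_group <- lapply(1:2, function(xi) factor(if (xi == 1) "a" else "b"))

# ASSUMED faulty combination: keep the integer codes and reuse the
# first group's levels for the whole column.
codes <- vapply(per_group, as.integer, integer(1))
naive <- structure(codes, levels = levels(per_group[[1]]), class = "factor")
naive_values <- as.character(naive)

# Safer combination: go through the labels, then rebuild the factor.
safe <- factor(vapply(per_group, as.character, character(1)))
safe_values <- as.character(safe)

print(naive_values)  # "a" "a": the second group's label is lost
print(safe_values)   # "a" "b"
```

Under that assumption the second group silently takes the first group's label, which would match the 1s reported in row 2 of the original output.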

hadley added the "bug" (an unexpected problem or unintended behavior) and "data frame" labels on Mar 8, 2016
hadley added this to the 0.5 milestone on Mar 8, 2016
lock bot locked this issue as resolved and limited conversation to collaborators on Jun 9, 2018