summarise() does not correctly coerce factors with different levels #1678

Closed
helix123 opened this issue Mar 1, 2016 · 2 comments

@helix123 helix123 commented Mar 1, 2016

In response to #1556: this is with the latest dev version.

I expected every summarised variable for group b-1 (2nd row) to be 0, but the factor columns any1.0, any2.0 and any3.0 show 1.

library(dplyr)
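# stringsAsFactors = TRUE was the default in R 3.2.3; it is stated explicitly
# here so the columns are still factors on R >= 4.0.
df <- data.frame(grp  = c("a", "a", "a", "b", "b", "b"),
                 grp2 = c("2", "2", "2", "2", "1", "1"),
                 fac  = c("1", "1", "1", "1", "0", "0"),
                 stringsAsFactors = TRUE)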

dplyr::summarise(dplyr::group_by(df, grp, grp2),
                 mean   = mean(as.numeric(fac) - 1),
                 sum1   = sum(as.numeric(fac) - 1),
                 sum2   = sum(fac == 1),
                 any1.0 = factor(ifelse(any(fac == 1), 1, 0)),
                 any1.1 = ifelse(any(fac == 1), 1, 0),
                 any2.0 = factor(if (any(fac == 1)) 1 else 0),
                 any2.1 = if (any(fac == 1)) 1 else 0,
                 any3.0 = factor(if (any(fac == factor(1, levels = c(0, 1)))) 1 else 0),
                 any3.1 = if (any(fac == factor(1, levels = c(0, 1)))) 1 else 0)

Source: local data frame [3 x 11]
Groups: grp [?]

     grp   grp2  mean  sum1  sum2 any1.0 any1.1 any2.0 any2.1 any3.0 any3.1
  (fctr) (fctr) (dbl) (dbl) (int) (fctr)  (dbl) (fctr)  (dbl) (fctr)  (dbl)
1      a      2     1     3     3      1      1      1      1      1      1
2      b      1     0     0     0      1      0      1      0      1      0
3      b      2     1     1     1      1      1      1      1      1      1

Session info ------------------------------------------------------------------
 setting  value                       
 version  R version 3.2.3 (2015-12-10)
 system   x86_64, mingw32             
 ui       RStudio (0.99.875)          
 language (EN)                        
 collate  German_Germany.1252         
 tz       Europe/Berlin               
 date     2016-03-01                  

Packages ----------------------------------------------------------------------
 package      * version    date       source                       
 assertthat     0.1        2013-12-06 CRAN (R 3.2.2)               
 curl           0.9.6      2016-02-17 CRAN (R 3.2.3)               
 DBI            0.3.1      2014-09-24 CRAN (R 3.2.2)               
 devtools       1.10.0     2016-01-23 CRAN (R 3.2.3)               
 digest         0.6.9      2016-01-08 CRAN (R 3.2.3)               
 dplyr        * 0.4.3.9000 2016-03-01 Github (hadley/dplyr@7d4e0ba)
 git2r          0.13.1     2015-12-10 CRAN (R 3.2.3)               
 httr           1.1.0      2016-01-28 CRAN (R 3.2.3)               
 knitr          1.12.3     2016-01-22 CRAN (R 3.2.3)               
 lazyeval       0.1.10     2015-01-02 CRAN (R 3.2.2)               
 magrittr       1.5        2014-11-22 CRAN (R 3.2.2)               
 memoise        1.0.0      2016-01-29 CRAN (R 3.2.3)               
 nycflights13 * 0.1        2014-07-22 CRAN (R 3.2.3)               
 R6             2.1.2      2016-01-26 CRAN (R 3.2.3)               
 Rcpp           0.12.3     2016-01-10 CRAN (R 3.2.3)               
 rstudioapi     0.5        2016-01-24 CRAN (R 3.2.3)               
 withr          1.0.1      2016-02-04 CRAN (R 3.2.3)        

@1va 1va commented Mar 3, 2016

You could specify the levels (of the if/else results) to prevent wrong/unexpected concatenation of factors with different levels:
factor(ifelse(any(fac == 1), 1, 0), levels = c(0, 1))

But I agree that the expected behaviour would be that the levels are handled automatically similar to
unlist(list(factor(0), factor(1)))

Compare c(factor(0), factor(1)) and c(factor(0, levels = c(0, 1)), factor(1, levels = c(0, 1)))
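
To make the comparison above concrete, here is a minimal base R sketch. It only illustrates why factors with different level sets are easy to combine wrongly; it is an assumption about the kind of mechanism involved, not dplyr's actual internal code. The variable names are made up for the illustration. It looks at the integer codes directly instead of calling c() on the factors, because c() on factors returns different things depending on the R version (since R 4.1.0 it returns a factor with merged levels).

```r
# Two single-value factors, each knowing only its own level, much like the
# per-group results of factor(ifelse(any(fac == 1), 1, 0)).
f0 <- factor(0)
f1 <- factor(1)

# Each factor has exactly one level, so both values are stored as code 1.
# Stacking the raw codes therefore loses the difference between "0" and "1".
codes <- c(as.integer(f0), as.integer(f1))
print(codes)

# unlist() on a list of factors takes the union of the level sets,
# so the two values stay distinct.
merged <- unlist(list(f0, f1))
print(levels(merged))
print(as.character(merged))

# Declaring the full level set up front (the workaround suggested above)
# gives each value its own code from the start.
g0 <- factor(0, levels = c(0, 1))
g1 <- factor(1, levels = c(0, 1))
shared_codes <- c(as.integer(g0), as.integer(g1))
print(shared_codes)
```

If the per-group results were stacked by code and labelled with the levels of the first group only, every row would show that first level, which would be consistent with the any1.0, any2.0 and any3.0 columns in the report.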

@hadley hadley changed the title from "summarise with factors" to "summarise() does not correctly coerce factors with different levels" Mar 8, 2016

@hadley hadley commented Mar 8, 2016

Minimal reprex:

data_frame(x = 1:2) %>% 
  group_by(x) %>% 
  summarise(
    y = if(x == 1) "a" else "b",
    z = factor(y)
  )

@hadley hadley added this to the 0.5 milestone Mar 8, 2016
@lock lock bot locked as resolved and limited conversation to collaborators Jun 9, 2018