summarise cannot recognize true levels of factor #2217

Closed
lampk opened this issue Oct 29, 2016 · 1 comment

@lampk commented Oct 29, 2016

When I create a factor X inside summarise() and then use that factor to derive another variable Z in the same call, summarise() does not recognize the levels of factor X.

For example:

library(dplyr)

# create a fake dataset
tmp <- data.frame(id = 1:3,
                  x1 = c(1, 0, 0),
                  x2 = factor(c("Yes", "Yes", "No"), levels = c("Yes", "No")))

# test summarise()
tmp2 <- tmp %>%
  group_by(id) %>%
  summarise(y1 = ifelse(x1 > 0, "Yes", "No"),
            y2 = factor(ifelse(x1 > 0, "Yes", "No"), levels = c("Yes", "No")),
            y3 = ifelse(x2 == "Yes" & y1 == "Yes", "Yes", "No"),
            y4 = ifelse(x2 == "Yes" & y2 == "Yes", "Yes", "No"),
            y5 = ifelse(x2 == "Yes" & y2 == 1, "Yes", "No")
            )
tmp2

gives:

# A tibble: 3 × 6
     id    y1     y2    y3    y4    y5
  <int> <chr> <fctr> <chr> <chr> <chr>
1     1   Yes    Yes   Yes    No   Yes
2     2    No     No    No    No    No
3     3    No     No    No    No    No

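y4 is clearly wrong: for id 1 both x2 and y2 are "Yes", so y4 should be "Yes". It seems that summarise() only sees y2 as its integer codes (1 and 2) and does not recognize its levels, which would also explain why y5 (y2 == 1) returns "Yes" for id 1.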
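A minimal base-R sketch of what appears to be happening, assuming (this is not confirmed in the thread) that summarise() hands later expressions only the integer codes of y2 rather than the factor itself. The names y2_codes, y4_expected, y4_buggy, y5_expected and y5_buggy are hypothetical and chosen for illustration; dplyr is not needed to run it.

```r
# Hypothetical sketch: assumes the bug amounts to y2 losing its factor
# class and levels, so that only the integer codes reach later expressions.
x2 <- factor(c("Yes", "Yes", "No"), levels = c("Yes", "No"))
y2 <- factor(c("Yes", "No", "No"), levels = c("Yes", "No"))

# Correct factor semantics: `==` compares against the level labels,
# so y2 == "Yes" matches row 1 and y2 == 1 matches nothing.
y4_expected <- ifelse(x2 == "Yes" & y2 == "Yes", "Yes", "No")
y5_expected <- ifelse(x2 == "Yes" & y2 == 1, "Yes", "No")

# Simulated bug: keep only the integer codes (1 = "Yes", 2 = "No").
y2_codes <- as.integer(y2)
y4_buggy <- ifelse(x2 == "Yes" & y2_codes == "Yes", "Yes", "No")
y5_buggy <- ifelse(x2 == "Yes" & y2_codes == 1, "Yes", "No")

print(data.frame(y4_expected, y4_buggy, y5_expected, y5_buggy))
```

Under this assumption, y4_buggy and y5_buggy match the y4 and y5 columns in the output above, while y4_expected and y5_expected show what ordinary factor comparisons should produce.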

My sessionInfo()

R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8      
 [8] LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.5.0

@krlmlr (Member) commented Nov 7, 2016

Confirmed. The following works as expected:

tmp2 <- tmp %>%
  mutate(y1 = ifelse(x1 > 0, "Yes", "No"),
         y2 = factor(ifelse(x1 > 0, "Yes", "No"), levels = c("Yes", "No")),
         y3 = ifelse(x2 == "Yes" & y1 == "Yes", "Yes", "No"),
         y4 = ifelse(x2 == "Yes" & y2 == "Yes", "Yes", "No"),
         y5 = ifelse(x2 == "Yes" & y2 == 1, "Yes", "No")
         )
tmp2


krlmlr self-assigned this Feb 10, 2017
krlmlr added this to the data frame 2 milestone Feb 20, 2017
krlmlr added a commit to krlmlr/dplyr that referenced this issue Mar 21, 2017
krlmlr closed this in #2547 Mar 21, 2017
lock bot locked as resolved and limited conversation to collaborators Jun 8, 2018