spread when id column has names #525

Closed
wangyuchen opened this Issue Dec 28, 2018 · 2 comments

wangyuchen commented Dec 28, 2018

tidyr/R/id.R

Line 41 in cbdd14e

if (!is_null(attr(x, "n")) && !drop) return(x)

I think you'd want to add exact = TRUE here to match attribute "n" exactly.

Otherwise, when drop = FALSE, attr(x, "n") falls back to partial matching and returns any attribute whose name is the unique one starting with "n". I had a named vector in my data frame that was passed into id_var(); the lookup matched its "names" attribute, so the condition was true and the variable was returned without the "n" attribute being added.
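The partial matching described above can be seen in base R alone, without tidyr. This is a minimal sketch using the same named vector as the reprex further down; the variable names has_n_partial and has_n_exact are made up for illustration and do not appear in tidyr.

```r
# Named vector: its only attribute is "names".
id_col <- c(x = 1, y = 2, z = 3)

# By default attr() accepts a unique partial match, so "n" resolves to "names".
attr(id_col, "n")
#> [1] "x" "y" "z"

# With exact = TRUE there is no attribute literally called "n".
attr(id_col, "n", exact = TRUE)
#> NULL

# Hypothetical stand-ins for the condition on line 41 of id.R
# (is.null() from base R is used here in place of is_null()).
has_n_partial <- !is.null(attr(id_col, "n"))               # TRUE: early return is taken
has_n_exact   <- !is.null(attr(id_col, "n", exact = TRUE)) # FALSE: "n" would still be added
```

With exact matching, the early return should only trigger for a vector that really carries an "n" attribute.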

@wangyuchen wangyuchen changed the title spread when id columns has names spread when id column has names Dec 28, 2018

@hadley hadley added the reprex label Jan 4, 2019

hadley (Member) commented Jan 4, 2019

Can you please provide a minimal reprex (reproducible example)? That will help us create a unit test to make sure we fix the bug.

wangyuchen (Author) commented Jan 4, 2019

Yes, of course. Please see if this works.

library(tidyr)
#> Warning: package 'tidyr' was built under R version 3.4.4
library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.4.4
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# named vector
id_col <- c(x = 1, y = 2, z = 3)

# the key factor has levels (d, e) that do not appear in the data,
# so spread() needs drop = FALSE to keep them as columns
spread_df <- 
  data.frame(key = factor(1:3, 1:5, letters[1:5]),
             out = 1:3,
             id = id_col)  # name attribute will be dropped
spread_df
#>   key out id
#> x   a   1  1
#> y   b   2  2
#> z   c   3  3

# both work fine
spread_df %>%  
  spread(key, out, drop = TRUE) 
#>   id  a  b  c
#> 1  1  1 NA NA
#> 2  2 NA  2 NA
#> 3  3 NA NA  3

spread_df %>%  
  spread(key, out, drop = FALSE) 
#>   id  a  b  c  d  e
#> 1  1  1 NA NA NA NA
#> 2  2 NA  2 NA NA NA
#> 3  3 NA NA  3 NA NA


spread_df2 <- 
  data.frame(key = factor(1:3, 1:5, letters[1:5]),
             out = 1:3) %>% 
  mutate(id = id_col)  # the name attribute will be preserved in mutate

spread_df2 %>%  
  spread(key, out, drop = TRUE) 
#>   id  a  b  c
#> 1  1  1 NA NA
#> 2  2 NA  2 NA
#> 3  3 NA NA  3

spread_df2 %>%  
  spread(key, out, drop = FALSE) 
#> Error: Result 1 is not a length 1 atomic vector

Created on 2019-01-04 by the reprex package (v0.2.1)

Ryo-N7 added a commit to Ryo-N7/tidyr that referenced this issue Jan 19, 2019

romainfrancois added a commit to romainfrancois/tidyr that referenced this issue Feb 5, 2019

@hadley hadley closed this in 0b27690 Feb 5, 2019
