Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

relocate.data.frame() shouldn't rely on vec_unique() retaining names #6209

Closed
DavisVaughan opened this issue Mar 7, 2022 · 1 comment · Fixed by #6340
Closed

relocate.data.frame() shouldn't rely on vec_unique() retaining names #6209

DavisVaughan opened this issue Mar 7, 2022 · 1 comment · Fixed by #6340

Comments

@DavisVaughan
Copy link
Member

DavisVaughan commented Mar 7, 2022

Currently, relocate() relies on the fact that vec_unique() keeps names

dplyr/R/relocate.R

Lines 83 to 85 in 8abb54b

pos <- vec_unique(c(lhs, to_move, rhs))
out <- .data[pos]
new_names <- names(pos)

We think we should actually drop names from the result of vec_unique(), since the returned names are reliant on the fact that only the first encountered duplicate value is returned (this is also consistent with unique()). See r-lib/vctrs#1442 and r-lib/vctrs#1532 where we tried to implement this, but found that dplyr relies on this behavior currently.


One easy way to fix this is to use vec_unique_loc() to get the locations, and then vec_slice(), which will retain names.

That should work in the immediate term but it may be worth looking at the code to also see why we currently get names off the unique values, and if there is something else we should be doing instead, since that feels unreliable.

@eutwt
Copy link
Contributor

eutwt commented Apr 1, 2022

The lhs and rhs in c(lhs, to_move, rhs) are already unique, and neither contain any elements in to_move.

dplyr/R/relocate.R

Lines 80 to 85 in 8abb54b

lhs <- setdiff(seq2(1, where - 1), to_move)
rhs <- setdiff(seq2(where + 1, ncol(.data)), to_move)
pos <- vec_unique(c(lhs, to_move, rhs))
out <- .data[pos]
new_names <- names(pos)

This means the only effect of vec_unique is to remove duplicates in to_move. And to_move only contains duplicates when the user supplies the same column twice with different names (if they supply the same column twice with no name or the same name, eval_select would give a unique output).

So, the only thing vec_unique does here is silently prevent users from creating copies of columns with relocate. Maybe there should be a check for this up front with an error that copying columns using relocate is not supported?

df <- tibble(a = 1, b = 2)

#### current behavior
df %>% 
  relocate(new = b, new2 = b)
# # A tibble: 1 × 2
#     new     a
#   <dbl> <dbl>
# 1     2     1

#### no `vec_unique()` i.e. `pos` defined as `pos <- c(lhs, to_move, rhs)`
df %>% 
  relocate(new = b, new2 = b)
# # A tibble: 1 × 3
#     new  new2     a
#   <dbl> <dbl> <dbl>
# 1     2     2     1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants