New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

names<- method should protect against invalid names #466

Closed
hadley opened this Issue Aug 23, 2018 · 6 comments

Comments

Projects
None yet
3 participants
@hadley
Copy link
Member

hadley commented Aug 23, 2018

i.e. the following code should generate errors:

library(tibble)

df <- tibble(x = 1, y = 2)

names(df) <- "x"
names(df) <- c("x", "x")
names(df) <- c("", "")
@jennybc

This comment has been minimized.

Copy link
Member

jennybc commented Aug 23, 2018

I think this could create some self-contradiction because, if I see how things are going, there will be a way to authorize, e.g., creation of a tibble with duplicate names, if you really want to.

I suspect the only thing that is categorically not allowed is names(df) <- NULL. And maybe names<- could also enforce this principle (if adopted): treat a name of NA as "".

@hadley

This comment has been minimized.

Copy link
Member

hadley commented Aug 23, 2018

We could still allow (e.g.)

names(df, .names = "tidy") <- "x"
@krlmlr

This comment has been minimized.

Copy link
Member

krlmlr commented Aug 23, 2018

I see there's names<-.POSIXlt(), so I guess we might be able to implement names<-.tbl_df() . But would it affect rlang::set_names() ?

Would it work if we warn first, instead of throwing an error, to reduce friction with existing code?

krlmlr added a commit that referenced this issue Aug 30, 2018

Refactor and revise name repair (#469)
Fixes #463 Exposure (or lack thereof) of syntactic argument in tibble() and as_tibble() 
Fixes #461 Advertise .tidy_names
Fixed #446 Justify tidy_names

---

Other name-related tibble issues

  * Finalize interface of `.name_repair = <A_FUNCTION>` #476 *let's decide/discuss there*
  * tibble doesn't make it easy to get variable names that match our style guide #472 *let's solve in follow-up PR*
  * names<- method should protect against invalid names #466 *let's solve in follow-up PR*
  * Move name repair to vctrs #464 *PR lays groundwork for this long-term goal*
  * revisit strategy for making names syntactic #459 *let's solve in follow-up PR*
  * Re-encode character columns and column names to UTF-8 #87 *relevant, but I don't touch this*

Related issue in [tidyverse/principles](https://github.com/tidyverse/principles):

  * [Vector names](tidyverse/principles#4)

---

Overview:

  * Defines and motivates 3 levels of repaired names:
    - `minimal`: not `NULL`, no `NA`s, à la `rlang::names2()`
    - `unique`: `minimal` + no `""`s and no dupes (cf `tidy_names(syntactic = FALSE)`)
    - `syntactic`: `unique` + syntactic (cf `tidy_names(syntactic = TRUE)`)
  * Internal utility functions for each level of name repair, i.e. all combinations of "repair vs check" and "operate on the names vs the named vector".
  * `tibble()` and `as_tibble()` expose name repair via `.name_repair`. Replaces `.tidy_names` (piloted only in dev) and `as_tibble(..., validate = TRUE/FALSE)` (in CRAN version).
  * Default behaviour (`.name_repair = "assert_unique")`): do no repair but check that the names are `unique`.
  * `.name_repair = "none"` means just make sure the names are `minimal`
  * `.name_repair = "unique"` and `.name_repair = "syntactic"` mean what you think
  * `.name_repair` can also take a user-supplied function.

---

To discuss:

  * ~Should `valid` just be `unique`?~ YES and I have made it so in the source.
  * Let us review the "names degrees of freedom" and facilities offered by base, tidyverse for independent control:
    - exist or not
    - unique or not
    - syntactic or not
  * Automatic naming, based on expression. We currently offer no control over this in `tibble()`.
  * How much to reveal re: long-term plans. For example, why `tidy_names()` and `set_tidy_names()` are still here, but haven't gained the obvious counterparts like `unique_names()` and `set_syntactic_names()`. Do we need a Life Cycle section in the docs?
  * My backwards compatibility report

@krlmlr krlmlr added this to the 1.5.0 milestone Oct 5, 2018

@jennybc

This comment has been minimized.

Copy link
Member

jennybc commented Oct 16, 2018

So is this a go for 1.5.0?

To restate @hadley's example with current syntax, it could look like:

names(df, .name_repair = "minimal") <- "x"

and presumably .name_repair would generally work as in tibble() and as_tibble()?

@krlmlr

This comment has been minimized.

Copy link
Member

krlmlr commented Oct 19, 2018

With 6165f02 I'm seeing:

devtools::load_all("~/git/R/tibble")
#> Loading tibble
df <- tibble(a = 1)
names(df, .name_repair = "minimal") <- NULL
#> Error in `names<-`(`*tmp*`, .name_repair = "minimal", value = NULL): 3 arguments passed to 'names<-' which requires 2

Created on 2018-10-19 by the reprex package (v0.2.1.9000)

So a names<-() with an extra argument doesn't seem to be supported.

I'll add a simple "minimal" check for now, we'll see how many downstream packages are affected. Also, the user can always subvert with rlang::set_names() .

@krlmlr

This comment has been minimized.

Copy link
Member

krlmlr commented Oct 19, 2018

Seems like set_names() somehow forwards to names<-(), so we're safe here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment