Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

names<- method should protect against invalid names #466

Closed
hadley opened this issue Aug 23, 2018 · 7 comments
Closed

names<- method should protect against invalid names #466

hadley opened this issue Aug 23, 2018 · 7 comments
Milestone

Comments

@hadley
Copy link
Member

@hadley hadley commented Aug 23, 2018

i.e. the following code should generate errors:

library(tibble)

df <- tibble(x = 1, y = 2)

names(df) <- "x"
names(df) <- c("x", "x")
names(df) <- c("", "")
@jennybc
Copy link
Member

@jennybc jennybc commented Aug 23, 2018

I think this could create some self-contradiction because, if I see how things are going, there will be a way to authorize, e.g., creation of a tibble with duplicate names, if you really want to.

I suspect the only thing that is categorically not allowed is names(df) <- NULL. And maybe names<- could also enforce this principle (if adopted): treat a name of NA as "".

@hadley
Copy link
Member Author

@hadley hadley commented Aug 23, 2018

We could still allow (e.g.)

names(df, .names = "tidy") <- "x"
@krlmlr
Copy link
Member

@krlmlr krlmlr commented Aug 23, 2018

I see there's names<-.POSIXlt(), so I guess we might be able to implement names<-.tbl_df() . But would it affect rlang::set_names() ?

Would it work if we warn first, instead of throwing an error, to reduce friction with existing code?

krlmlr added a commit that referenced this issue Aug 30, 2018
Fixes #463 Exposure (or lack thereof) of syntactic argument in tibble() and as_tibble() 
Fixes #461 Advertise .tidy_names
Fixed #446 Justify tidy_names

---

Other name-related tibble issues

  * Finalize interface of `.name_repair = <A_FUNCTION>` #476 *let's decide/discuss there*
  * tibble doesn't make it easy to get variable names that match our style guide #472 *let's solve in follow-up PR*
  * names<- method should protect against invalid names #466 *let's solve in follow-up PR*
  * Move name repair to vctrs #464 *PR lays groundwork for this long-term goal*
  * revisit strategy for making names syntactic #459 *let's solve in follow-up PR*
  * Re-encode character columns and column names to UTF-8 #87 *relevant, but I don't touch this*

Related issue in [tidyverse/principles](https://github.com/tidyverse/principles):

  * [Vector names](tidyverse/design#4)

---

Overview:

  * Defines and motivates 3 levels of repaired names:
    - `minimal`: not `NULL`, no `NA`s, à la `rlang::names2()`
    - `unique`: `minimal` + no `""`s and no dupes (cf `tidy_names(syntactic = FALSE)`)
    - `syntactic`: `unique` + syntactic (cf `tidy_names(syntactic = TRUE)`)
  * Internal utility functions for each level of name repair, i.e. all combinations of "repair vs check" and "operate on the names vs the named vector".
  * `tibble()` and `as_tibble()` expose name repair via `.name_repair`. Replaces `.tidy_names` (piloted only in dev) and `as_tibble(..., validate = TRUE/FALSE)` (in CRAN version).
  * Default behaviour (`.name_repair = "assert_unique")`): do no repair but check that the names are `unique`.
  * `.name_repair = "none"` means just make sure the names are `minimal`
  * `.name_repair = "unique"` and `.name_repair = "syntactic"` mean what you think
  * `.name_repair` can also take a user-supplied function.

---

To discuss:

  * ~Should `valid` just be `unique`?~ YES and I have made it so in the source.
  * Let us review the "names degrees of freedom" and facilities offered by base, tidyverse for independent control:
    - exist or not
    - unique or not
    - syntactic or not
  * Automatic naming, based on expression. We currently offer no control over this in `tibble()`.
  * How much to reveal re: long-term plans. For example, why `tidy_names()` and `set_tidy_names()` are still here, but haven't gained the obvious counterparts like `unique_names()` and `set_syntactic_names()`. Do we need a Life Cycle section in the docs?
  * My backwards compatibility report
@krlmlr krlmlr added this to the 1.5.0 milestone Oct 5, 2018
@jennybc
Copy link
Member

@jennybc jennybc commented Oct 16, 2018

So is this a go for 1.5.0?

To restate @hadley's example with current syntax, it could look like:

names(df, .name_repair = "minimal") <- "x"

and presumably .name_repair would generally work as in tibble() and as_tibble()?

@krlmlr
Copy link
Member

@krlmlr krlmlr commented Oct 19, 2018

With 6165f02 I'm seeing:

devtools::load_all("~/git/R/tibble")
#> Loading tibble
df <- tibble(a = 1)
names(df, .name_repair = "minimal") <- NULL
#> Error in `names<-`(`*tmp*`, .name_repair = "minimal", value = NULL): 3 arguments passed to 'names<-' which requires 2

Created on 2018-10-19 by the reprex package (v0.2.1.9000)

So a names<-() with an extra argument doesn't seem to be supported.

I'll add a simple "minimal" check for now, we'll see how many downstream packages are affected. Also, the user can always subvert with rlang::set_names() .

@krlmlr
Copy link
Member

@krlmlr krlmlr commented Oct 19, 2018

Seems like set_names() somehow forwards to names<-(), so we're safe here.

@github-actions
Copy link

@github-actions github-actions bot commented Dec 9, 2020

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

@github-actions github-actions bot locked and limited conversation to collaborators Dec 9, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

3 participants