tibble() should auto-splice unnamed tibble columns #581

Closed
hadley opened this issue Feb 21, 2019 · 12 comments
Labels
vctrs ↗️ Requires vctrs package

Comments

@hadley
Member

hadley commented Feb 21, 2019

For compatibility with the proposed dplyr semantics.

cc @lionel-

@krlmlr
Member

krlmlr commented Feb 22, 2019

I don't understand auto-splicing. Should anything change here:

library(rlang)
library(tibble)

quos <- quos(a = 5)
tibble(!!!quos)
#> # A tibble: 1 x 1
#>       a
#>   <dbl>
#> 1     5
tibble(quos)
#> # A tibble: 1 x 1
#>   quos          
#>   <S3: quosures>
#> 1 ~5

Created on 2019-02-22 by the reprex package (v0.2.1.9000)

@lionel-
Member

lionel- commented Feb 22, 2019

  • tibble(tibble(a = 1)) would be equivalent to tibble(a = 1).

  • tibble(A = tibble(a = 1)) would create a df-col.

The idea is that giving a name to a tibble namespaces it, i.e. creates a df-col. There were discussions about auto-splicing vs namespacing semantics in tidyverse/dplyr#3721, tidyverse/dplyr#4169, tidyverse/dplyr#3967, and r-lib/tidyselect#86. I'll write a meta issue about it next week.
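A minimal sketch of the two cases above. It assumes a tibble release in which this proposal has been implemented; on older releases the first call instead yields a single column labelled after the expression, so no expected output is shown for it.

```r
library(tibble)

# Unnamed tibble argument: under the proposal its columns are spliced in,
# so this should be equivalent to tibble(a = 1).
spliced <- tibble(tibble(a = 1))
names(spliced)

# Named tibble argument: the name "namespaces" the tibble, giving a df-col,
# i.e. a single column `A` that is itself a data frame.
packed <- tibble(A = tibble(a = 1))
names(packed)
is.data.frame(packed$A)
```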

Implementation might be tricky. It involves turning off auto-labelling of quosures and manually labelling unnamed objects which are not tibbles at the right time. We now have rlang::as_label() for this.
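The labelling step described above could look roughly like the sketch below. It is not tibble's implementation: `splice_dfs_sketch()` is a hypothetical helper that returns a plain named list, and it resolves name conflicts by letting later arguments overwrite earlier ones, which is an assumption rather than anything decided in this thread.

```r
library(rlang)

# Hypothetical helper, for illustration only.
splice_dfs_sketch <- function(...) {
  quos <- enquos(...)    # no auto-labelling: unnamed arguments keep "" as name
  nms <- names2(quos)
  out <- list()
  for (i in seq_along(quos)) {
    value <- eval_tidy(quos[[i]])
    if (nms[[i]] != "") {
      # Named argument: keep as one column (a df-col if it is a data frame).
      out[[nms[[i]]]] <- value
    } else if (is.data.frame(value)) {
      # Unnamed data frame: splice its columns into the result.
      out[names(value)] <- as.list(value)
    } else {
      # Unnamed non-data-frame: label it manually, only now.
      out[[as_label(quos[[i]])]] <- value
    }
  }
  out
}

res <- splice_dfs_sketch(data.frame(a = 1, b = 2), A = data.frame(c = 3), 1:3)
names(res)
```

The point of deferring the label is that an unnamed argument cannot be named before it has been evaluated, since only then is it known whether it is a data frame.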

krlmlr modified the milestone: 2.x.y Feb 22, 2019
@hadley
Member Author

hadley commented Mar 4, 2019

as_tibble() would need the same treatment.

@jennybc
Member

jennybc commented Mar 4, 2019

I feel compelled to mention that having shape or type depend on the presence/absence of names has felt awkward in the past. I think that comes up in the long-running purrr::map_df[rc]() conversation as well.

@krlmlr
Member

krlmlr commented Jun 29, 2019

This feels awkward. Users can always splice manually. What's the advantage of auto-splicing?

Thinking about namespaces in C++: an unnamed namespace is still a namespace and doesn't somehow get embedded into its parent. Auto-named tibble columns feel similar.

@lionel-
Member

lionel- commented Jul 1, 2019

Auto-splicing doesn't use complicated syntax and doesn't have problems of evaluation timing.

It will be important to have auto-splicing in dplyr to implement mapping():

data %>%
  mutate(
    bar = bar(),
    mapping(starts_with("foo"), ~ .x / sd(.x))
  )

If you splice, mapping() will be evaluated too early and won't get access to the tidyselect variables.
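The timing problem can be reproduced without dplyr. In this sketch `with_mask()` is a hypothetical stand-in for a verb like mutate(), and a plain data.frame() call stands in for mapping(); it assumes no object named `foo` exists in the calling environment.

```r
library(rlang)

# Hypothetical stand-in for a verb such as mutate(): captures its arguments
# and evaluates each one later, with the columns of `data` in scope.
with_mask <- function(data, ...) {
  quos <- enquos(...)
  lapply(seq_along(quos), function(i) eval_tidy(quos[[i]], data = data))
}

df <- data.frame(foo = c(2, 4))

# Passed as an unnamed argument: evaluation is delayed until the data mask
# exists, so `foo` resolves to the column.
lazy <- with_mask(df, data.frame(scaled = foo / 2))

# Spliced with !!!: the operand is evaluated right away in the calling
# environment, before any mask exists, so `foo` cannot be found there.
eager <- tryCatch(
  with_mask(df, !!!data.frame(scaled = foo / 2)),
  error = function(e) conditionMessage(e)
)
```

Here `lazy` holds the computed data frame, while `eager` holds the error message from the failed early evaluation.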

@hadley
Member Author

hadley commented Jul 1, 2019

And we're trying it out with tibble() as it seems like a low-risk environment. If adding this behaviour to tibble() causes significant problems, we'll re-evaluate.

@krlmlr
Member

krlmlr commented Jul 1, 2019

I'm not yet convinced the principal constructor for data frames in the tidyverse is a low-risk environment ;-)

I'm especially concerned about the handling of name conflicts. Even if we apply tidyverse rules, this may lead to very surprising behavior.

To reiterate the argument:

  1. We are implementing (or planning to) adverbs like mapping() that work inside other verbs and generate a data frame. The net effect should be creation and updating of columns in the existing data frame:

    tibble(foo1 = 1:3, foo2 = 2:4, bar = 5) %>%
      mutate(
        bar = 6,
        mapping(starts_with("foo"), ~ .x / sd(.x))
      )
    ## tibble(foo1 = as.numeric(1:3), foo2 = as.numeric(2:4), bar = 6)

  2. I suppose the following should also work:

    tibble(foo1 = 1:3, foo2 = 2:4, bar = 5) %>%
      mutate(
        bar = 6,
        tibble(foo1 = 2:4, foo2 = 3:5)
      )
    ## tibble(foo1 = 2:4, foo2 = 3:5, bar = 6)

  3. For consistency, tibble() should behave identically.

Am I missing anything?

As an alternative, how about an auto-splicing adverb:

tibble(foo1 = 1:3, foo2 = 2:4, bar = 5) %>%
  mutate(
    bar = 6,
    auto_splice(tibble(foo1 = 2:4, foo2 = 3:5))
  ) %>%
  flatten_auto_splice()
## tibble(foo1 = 2:4, foo2 = 3:5, bar = 6)

auto_splice() would work by attaching a class to the object, which would be picked up by flatten_auto_splice(). Of course, flatten_auto_splice() could become embedded into mutate() and tibble().
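A rough base-R sketch of that class-tagging idea. Neither function exists in any package; the bodies are assumptions made for illustration, and a plain list of columns stands in for the result of mutate().

```r
# Hypothetical: mark a data frame so a later step knows to splice it.
auto_splice <- function(x) {
  stopifnot(is.data.frame(x))
  class(x) <- c("auto_splice", class(x))
  x
}

# Hypothetical: walk the columns, splicing every marked data frame in place.
# Spliced columns overwrite existing columns of the same name.
flatten_auto_splice <- function(cols) {
  out <- list()
  nms <- names(cols)
  for (i in seq_along(cols)) {
    col <- cols[[i]]
    if (inherits(col, "auto_splice")) {
      class(col) <- setdiff(class(col), "auto_splice")
      out[names(col)] <- as.list(col)
    } else {
      out[[nms[[i]]]] <- col
    }
  }
  as.data.frame(out)
}

cols <- list(
  foo1 = 1:3, foo2 = 2:4, bar = 6,
  auto_splice(data.frame(foo1 = 2:4, foo2 = 3:5))
)
res <- flatten_auto_splice(cols)
names(res)
```

Because the marker travels with the object, the decision to splice is made by the caller rather than inferred from whether the argument was named.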

@lionel-
Member

lionel- commented Jul 1, 2019

An auto_splice() adverb seems less elegant.

We have already started using the named/unnamed syntax in vec_cbind(); it wouldn't be convenient to use an adverb there.

@hadley
Member Author

hadley commented Jul 1, 2019

It is low risk because I find it very hard to imagine that anyone is counting on the current behaviour:

library(tibble)
names(tibble(tibble(x = 1, y = 2)))
#> [1] "tibble(x = 1, y = 2)"

Created on 2019-07-01 by the reprex package (v0.3.0)

@krlmlr
Member

krlmlr commented Aug 7, 2019

There's also:

names(tibble::tibble(mtcars))
#> [1] "mtcars"

Created on 2019-08-07 by the reprex package (v0.3.0)

Are we still considering auto-splice? I'm happy to implement, just double-checking.

@github-actions
Contributor

github-actions bot commented Dec 8, 2020

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

github-actions bot locked and limited conversation to collaborators Dec 8, 2020