Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve recycling for reused atomic values in tibble() #213

Closed
krlmlr opened this issue Jan 16, 2017 · 4 comments
Closed

Improve recycling for reused atomic values in tibble() #213

krlmlr opened this issue Jan 16, 2017 · 4 comments

Comments

@krlmlr
Copy link
Member

@krlmlr krlmlr commented Jan 16, 2017

Original issue tidyverse/dplyr#2360 by @rpruim.

The following result for data_frame() is unexpected (to me) given the result of mutate() below it.

data_frame(a = 1:3, b = 1, c = b/sum(b))
## # A tibble: 3 × 3
##       a     b     c
##   <int> <dbl> <dbl>
## 1     1     1     1
## 2     2     1     1
## 3     3     1     1
data_frame(a = 1:3) %>% mutate(b = 1, c = b/sum(b))
## # A tibble: 3 × 3
##       a     b         c
##   <int> <dbl>     <dbl>
## 1     1     1 0.3333333
## 2     2     1 0.3333333
## 3     3     1 0.3333333
@rpruim
Copy link

@rpruim rpruim commented Jan 17, 2017

Glad to see this is on the list for fixing. Thanks.

@NikNakk
Copy link

@NikNakk NikNakk commented Apr 21, 2017

tibble is currently implemented using as_tibble(lst(...)). The eval_tidy is done in lst, but the length == 1L columns aren't expanded to length == nrow(tbl) until the as_tibble is done. It looks as though either lst would need to be altered to take a parameter to indicate that scalars should be duplicated to fill the number of rows, or that some of the code will need to be duplicated from lst into tibble.

The bigger issue is that we don't know the number of rows until the first non-scalar column is evaluated. For example:

tibble(w = 1, x = 2, y = sum(w) / mean(x), z = 3:5)

is valid syntactically but when y is evaluated, the number of rows is unknown. Another example is:

f <- function(a) {a + c(4, 9, 12)}
tibble(w = 1, x = 2, y = sum(w) / mean(x), z = f(x))

It's even less evident looking at the call to know that there will eventually be three rows until all of the arguments have been evaluated.

@krlmlr
Copy link
Member Author

@krlmlr krlmlr commented Apr 21, 2017

Thanks @NikNakk for the analysis. I think tibble(a = ..., b = ...) generally should return the same as tibble(a = ...) %>% mutate(b = ...), so your last two examples aren't a big problem. Still, I don't see an easy way here, let's shelve this for now.

krlmlr added a commit that referenced this issue Aug 19, 2017
@krlmlr krlmlr closed this in #298 Aug 19, 2017
krlmlr added a commit that referenced this issue Aug 19, 2017
- Values of length 1 in a `tibble()` call are recycled prior to evaluating subsequent arguments, improving consistency with `mutate()` (#213).
- Recycling of values of length 1 in a `tibble()` call maintains their class (#284).
krlmlr added a commit that referenced this issue Aug 22, 2017
Bug fixes
---------

- Values of length 1 in a `tibble()` call are recycled prior to evaluating subsequent arguments, improving consistency with `mutate()` (#213).
- Recycling of values of length 1 in a `tibble()` call maintains their class (#284).
- `add_row()` now always preserves the column data types of the input data frame the same way as `rbind()` does (#296).
- `lst()` now again handles duplicate names, the value defined last is used in case of a clash.
- Adding columns to zero-row data frames now also works when mixing lengths 1 and 0 in the new columns (#167).
- The `validate` argument is now also supported in `as_tibble.tbl_df()`, with default to `FALSE` (#278).  It must be passed as named argument, as in `as_tibble(validate = TRUE)`.

Formatting
----------

- `format_v()` now always surrounds lists with `[]` brackets, even if their length is one. This affects `glimpse()` output for list columns (#106).
- Factor levels are escaped when printing (#277).
- Non-syntactic names are now also escaped in `glimpse()` (#280).
- `tibble()` gives a consistent error message in the case of duplicate column names (#291).
@github-actions
Copy link

@github-actions github-actions bot commented Dec 13, 2020

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

@github-actions github-actions bot locked and limited conversation to collaborators Dec 13, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

3 participants