New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve recycling for reused atomic values in tibble() #213

Closed
krlmlr opened this Issue Jan 16, 2017 · 3 comments

Comments

Projects
3 participants
@krlmlr
Member

krlmlr commented Jan 16, 2017

Original issue tidyverse/dplyr#2360 by @rpruim.

The following result for data_frame() is unexpected (to me) given the result of mutate() below it.

data_frame(a = 1:3, b = 1, c = b/sum(b))
## # A tibble: 3 × 3
##       a     b     c
##   <int> <dbl> <dbl>
## 1     1     1     1
## 2     2     1     1
## 3     3     1     1
data_frame(a = 1:3) %>% mutate(b = 1, c = b/sum(b))
## # A tibble: 3 × 3
##       a     b         c
##   <int> <dbl>     <dbl>
## 1     1     1 0.3333333
## 2     2     1 0.3333333
## 3     3     1 0.3333333
@rpruim

This comment has been minimized.

rpruim commented Jan 17, 2017

Glad to see this is on the list for fixing. Thanks.

@krlmlr krlmlr added this to To Do in krlmlr Apr 17, 2017

@NikNakk

This comment has been minimized.

NikNakk commented Apr 21, 2017

tibble is currently implemented using as_tibble(lst(...)). The eval_tidy is done in lst, but the length == 1L columns aren't expanded to length == nrow(tbl) until the as_tibble is done. It looks as though either lst would need to be altered to take a parameter to indicate that scalars should be duplicated to fill the number of rows, or that some of the code will need to be duplicated from lst into tibble.

The bigger issue is that we don't know the number of rows until the first non-scalar column is evaluated. For example:

tibble(w = 1, x = 2, y = sum(w) / mean(x), z = 3:5)

is valid syntactically but when y is evaluated, the number of rows is unknown. Another example is:

f <- function(a) {a + c(4, 9, 12)}
tibble(w = 1, x = 2, y = sum(w) / mean(x), z = f(x))

It's even less evident looking at the call to know that there will eventually be three rows until all of the arguments have been evaluated.

@krlmlr

This comment has been minimized.

Member

krlmlr commented Apr 21, 2017

Thanks @NikNakk for the analysis. I think tibble(a = ..., b = ...) generally should return the same as tibble(a = ...) %>% mutate(b = ...), so your last two examples aren't a big problem. Still, I don't see an easy way here, let's shelve this for now.

@krlmlr krlmlr moved this from To Do to Blocked in krlmlr Apr 21, 2017

krlmlr added a commit that referenced this issue Aug 19, 2017

@krlmlr krlmlr closed this in #298 Aug 19, 2017

krlmlr added a commit that referenced this issue Aug 19, 2017

Merge pull request #298 from tidyverse/b-#213-recycle
- Values of length 1 in a `tibble()` call are recycled prior to evaluating subsequent arguments, improving consistency with `mutate()` (#213).
- Recycling of values of length 1 in a `tibble()` call maintains their class (#284).

krlmlr added a commit that referenced this issue Aug 22, 2017

Merge tag 'v1.3.4'
Bug fixes
---------

- Values of length 1 in a `tibble()` call are recycled prior to evaluating subsequent arguments, improving consistency with `mutate()` (#213).
- Recycling of values of length 1 in a `tibble()` call maintains their class (#284).
- `add_row()` now always preserves the column data types of the input data frame the same way as `rbind()` does (#296).
- `lst()` now again handles duplicate names, the value defined last is used in case of a clash.
- Adding columns to zero-row data frames now also works when mixing lengths 1 and 0 in the new columns (#167).
- The `validate` argument is now also supported in `as_tibble.tbl_df()`, with default to `FALSE` (#278).  It must be passed as named argument, as in `as_tibble(validate = TRUE)`.

Formatting
----------

- `format_v()` now always surrounds lists with `[]` brackets, even if their length is one. This affects `glimpse()` output for list columns (#106).
- Factor levels are escaped when printing (#277).
- Non-syntactic names are now also escaped in `glimpse()` (#280).
- `tibble()` gives a consistent error message in the case of duplicate column names (#291).

@krlmlr krlmlr moved this from Blocked to Done in krlmlr Aug 25, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment