Rework flattening #912

hadley · 2022-09-04T14:43:39Z

This is a large PR that reworks flattening, removing inconsistencies and switching to vctrs as a backend.

The key idea is to introduce a new family of "combining" functions: list_c(), list_rbind(), and list_cbind(), which replace flatten_lgl(), flatten_int(), flatten_dbl(), flatten_chr() (now list_c()), flatten_dfc() (list_cbind()), and flatten_dfr() (list_rbind()). The new functions are straightforward wrappers around vctrs functions, but somehow feel natural in purrr to me.
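As a rough illustration of the three combining functions described above, here is a minimal sketch. It assumes a purrr version that exports list_c(), list_rbind(), and list_cbind(); the input lists are made up for the example.

```r
library(purrr)

# Combine a list of vectors into a single vector
# (the job previously done by flatten_dbl() and friends).
x <- list(1, 2:3, 4)          # made-up input
combined <- list_c(x)
combined

# Row-bind a list of data frames (previously flatten_dfr()).
rows <- list_rbind(list(data.frame(a = 1), data.frame(a = 2)))
rows

# Column-bind a list of data frames (previously flatten_dfc()).
cols <- list_cbind(list(data.frame(a = 1:2), data.frame(b = 3:4)))
cols
```

Types are combined using vctrs coercion rules, so mixing integer and double elements in list_c() yields a double vector.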

This leaves flatten(), which had a rather idiosyncratic interface. It has been replaced by list_flatten(), which always removes a single layer of list hierarchy (and nothing else). While working on this I realised that this was actually what splice() did, so overall this feels like a major improvement in naming consistency.
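A small sketch of the "single layer" behaviour, assuming a purrr version that exports list_flatten(); the nested list is invented for illustration.

```r
library(purrr)

# One non-list element, one flat sub-list, one sub-list with further nesting.
x <- list(1, list(2, 3), list(4, list(5)))

flat <- list_flatten(x)

# Only the outermost layer is removed: the inner list(5) survives as a list.
str(flat)
```

The non-list element (1) is kept as is, and the deeper list(5) is left untouched because only one level of hierarchy is removed.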

With those functions in place we can deprecate map_dfr() and map_dfc(), which are actually "flat" map functions because they combine, rather than simplify, the results. They have never really belonged with map_int() and friends because they don't have the restriction that .f needs to return a length-1 result. This also strongly implies that flat_map() would just be map_c() and is thus not necessary.

* Fixes "map_dfc() fails when the result contains list columns" (#376) by deprecating map_dfc()
* Fixes "Bring back flatmap" (#405) by clearly ruling against map_c()
* Fixes "map_dfr() column binds vectors" (#472) by deprecating map_dfr()
* Fixes "Is flattening c + splicing?" (#575) by introducing list_c(), list_rbind(), and list_cbind()
* Fixes "Reconsider flatten_dfc() and flatten_dfr()" (#757) by deprecating flatten_dfr() and flatten_dfc()
* Fixes "Use vctrs instead of dplyr" (#758) by introducing list_rbind() and list_cbind()
* Part of "Plan for flatten(), simplify() and friends" (#900)

Updated to reflect that we no longer believe that map_cbind(), map_rbind(), and map_c() are necessary, as they would be very lightweight wrappers.

DavisVaughan · 2022-09-07T22:54:28Z

NAMESPACE

@@ -129,8 +129,12 @@ export(lift_lv)
 export(lift_vd)


High level thought:

I really like list_c(), list_cbind(), and list_rbind() as the main high level functions here.

Since flat-map and dfc/dfr operations are somewhat rare, I would actually be very happy with:

    x |> map(f) |> list_c()
    x |> map(f) |> list_rbind()
    x |> map(f) |> list_cbind()

i.e. no need for map_c(), map_cbind(), and map_rbind().

I feel like those may clutter the map API for little benefit, especially considering you need to add map2/pmap variants, which greatly expands the number of options.

I would really love it if all map*() functions had the invariant length(x) == length(map*(x, f)), and we'd get there eventually by removing map_dfr()/map_dfc().

The list_c/rbind/cbind() tools just feel semantically separate from map(), since they are something you do after the map or on a generic list.

For example, list_c() and friends will be very useful to call on furrr::future_map() results too, but I'd rather not expand the furrr API with future_map_c(), future_map_cbind(), map2/pmap variants, etc. if we don't have to, and it doesn't really seem like we need to.

The other reason I think these are separate is that map_c() physically can't be faster than sequential calls to map() + list_c(): you need all of the results from the map() to be able to compute the output size that we'd get from list_c(). To me that suggests they shouldn't be merged together.

Yeah, agreed. I think the length(x) == length(map*(x, f)) invariant is key.
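The invariant discussed above can be sketched as follows. This is only an illustration, assuming a purrr version with list_c(); `f` is a hypothetical function that returns vectors of varying length.

```r
library(purrr)

x <- 1:3
f <- function(i) seq_len(i)   # hypothetical .f with variable-length results

# map() always returns one element per input element.
mapped <- map(x, f)
length(mapped) == length(x)

# Combining is a separate step that can only run once every result is
# available, and it is free to change the length.
combined <- list_c(mapped)
length(combined)
```

Keeping the combine step outside map() means the same list_c() call works on any list, including results from other mapping functions.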

x |> map_vec(f) feels a bit like this because it'll begin by being implemented as x |> map(f) |> list_simplify() but I think that's more of a current implementation detail. Especially if you do supply a ptype we could potentially implement map_vec() much more efficiently.

DavisVaughan · 2022-09-07T23:59:59Z

R/list-combine.R

+  check_is_list(x)
+  vctrs::vec_unchop(x, ptype = ptype)


Is the vec_unchop() error here bad enough that you needed check_is_list()?

I've left it for now for consistency with the others.

lionel- · 2022-09-08T13:33:47Z

R/list-combine.R

+#' @param ptype An optional prototype to ensure that the output type is always
+#'   the same.
+#' @param id By default, `names(x)` are lost. Alternatively, supply a string
+#'   and the names will be saved into a column with name `id`. If `id`


This makes it sound like the column is always named id. Perhaps:

Alternatively, supply id with a column name into which the names will be saved.

I'll use {id} since that's a convention we use elsewhere and should be familiar to many readers.

lionel- · 2022-09-08T13:36:04Z

R/list-combine.R

+  vec_check_list(x)
+  vctrs::vec_cbind(!!!x, .name_repair = name_repair, .size = size, .call = current_env())
+}
+
+#' @export
+#' @rdname list_c
+list_rbind <- function(x, id = rlang::zap(), ptype = NULL) {
+  vec_check_list(x)
+  vctrs::vec_rbind(!!!x, .names_to = id, .ptype = ptype, .call = current_env())
+}


Do we want to require each element to be a data frame, or do we embrace the way vec_rbind() and vec_cbind() convert vectors to data frames?

Let's try requiring data frames.
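To make the trade-off concrete, here is a hedged sketch contrasting the two behaviours. It assumes a released purrr in which list_rbind() rejects non-data-frame elements, as decided above; the exact error message is not shown because it may differ between versions.

```r
library(purrr)

# vctrs::vec_rbind() treats each bare vector as one row of a data frame.
promoted <- vctrs::vec_rbind(c(a = 1, b = 2), c(a = 3, b = 4))
promoted

# list_rbind() is expected to refuse the same input because the elements
# are not data frames. The error is caught so the script keeps running.
result <- tryCatch(
  list_rbind(list(c(a = 1, b = 2), c(a = 3, b = 4))),
  error = function(e) "error"
)
result
```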

lionel- · 2022-09-08T13:44:51Z

R/list-flatten.R

+  x <- modify_if(x, vec_is_list, identity, .else = list)
+  vec_unchop(
+    x,
+    ptype = list(),


I'm thinking this should be ptype = x (the original x) to preserve the input type.

Can this ever be something other than a list though? x can't be a data frame, and it seems unlikely to be a list_of unless it's a list_of<list>?

Oh, it has to be a list, which we check with vec_check_list(). But it could be some kind of vctrs list. I agree that list_of is unlikely until we start supporting more complex type definitions.

I think my concern is that it might make sense to preserve types in the future if we have more vctrs lists, but it will be difficult to change then, so better do it now.

Can we assume a list() can always be coerced to x, even if x is a subtype of list? I don't think we can, so this seems risky?

But I tried it anyway, and then I got:

    list_flatten(list_of(list(1, 2, 3), list(4), list(4)))
    #> Error:
    #> ! Can't convert <list> to <list_of<list>>.

So I'm going to leave as is for this PR and we can revisit in the future.

Ah yes. I think the proper way would be to initialise the ptype of the input to the full output size, then assign elements into it.

lionel- · 2022-09-08T13:48:48Z

R/map-if-at.R

+    return(map(.x, .f, ...))
+  }
+
+  # Should this be replaced with a generic way of figuring out atomic


Can we now use vec_is_list()?

Yeah, tracking in #920

lionel-

🎉

DavisVaughan · 2022-09-08T18:59:08Z

R/reduce.R

-#'   map_dfr(~ tibble(value = .x, step = 1:100), .id = "simulation") %>%
+#'   map_rbind(~ tibble(value = .x, step = 1:100), .id = "simulation") %>%


Needs an update, not map_rbind() anymore

Good catch. IMO it reads better now that list_rbind() is a separate step.
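A sketch of the two-step form, with the row binding as its own step. Note the assumptions: the inputs are invented, data.frame() stands in for tibble() to avoid an extra dependency, and the names argument is spelled `names_to` as in released purrr (the diff in this PR calls it `id`).

```r
library(purrr)

sims <- list(a = 1:2, b = 3:4)   # hypothetical named simulation results

out <- sims |>
  map(function(v) data.frame(value = v, step = seq_along(v))) |>
  # names_to is the released spelling; this PR's diff uses `id` instead.
  list_rbind(names_to = "simulation")

out
```

The names of the input list end up in the `simulation` column, which takes over the role of `.id` in map_dfr().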

hadley added 12 commits September 2, 2022 15:01

Implement list_flatten()/_c()/_rbind()/_cbind()

078c80e

Deprecate flatten_*

50c74f3

Add news bullets

13cf2c3

Move _dfc() and _dfr() to own file

aeba6bc

Polish list_flatten() implementation

87304a7

* Use `modify_if`
* Use `vec_unchop()`
* Test
* Add `name_spec` argument
* Simplify `splice_if` using `name_spec`

Update connection to splice

c126644

Test list_c() and friends

90ce7e0

Move map_at(), _if(), _depth() to own file

3a5192f

Tweak map docs

bf27747

Implement map_c()/map_cbind()/map_rbind()

96dfd4d

Get R CMD check passing again

7a618c0

Fix narrowing of .f

af378ad

hadley mentioned this pull request Sep 4, 2022

Use vctrs for flattening into atomic vectors #785

Closed

DavisVaughan reviewed Sep 8, 2022

hadley and others added 10 commits September 8, 2022 07:56

Apply suggestions from code review

94d7324

Co-authored-by: Davis Vaughan <davis@rstudio.com>

Remove map wrappers

d687da0

Re-document

1e143d2

Improve error reporting

0fee03e

list_flatten() improvements

b6b939e

Doc updates

32ce653

Test that data frames aren't ok

cf301ae

Merge commit 'c3ad48c251d6909bf4710cee287e5e14c3428aa6'

3cd7a1a

Conflicts: NEWS.md R/map.R man/imap.Rd man/map.Rd man/map2.Rd man/pmap.Rd

stringsAsFactors = ugh

e058e71

Update news and function reference

68c72eb

hadley marked this pull request as ready for review September 8, 2022 13:15

Tweak docs

a209d5c

lionel- reviewed Sep 8, 2022

hadley added 3 commits September 8, 2022 08:59

Doc tweak

10077a4

Update/finish deprecation messages

91a76f0

Use map_if instead of modify_if

eb288c9

Require data frames

c71fcc8

lionel- mentioned this pull request Sep 8, 2022

Draft list_flatten() r-lib/vctrs#1214

Closed

lionel- approved these changes Sep 8, 2022

hadley mentioned this pull request Sep 8, 2022

rlang reconciliation #320

Closed

13 tasks

DavisVaughan approved these changes Sep 8, 2022

hadley added 2 commits September 8, 2022 16:24

Doc fixes

dc8ca02

Merge commit 'd2896e2a2951f40f6b87f2a530eb4ee2dda0b38f'

5206ecd

Conflicts: NEWS.md R/map.R man/map.Rd man/map2.Rd man/pmap.Rd

hadley merged commit 4f78bd3 into main Sep 8, 2022

hadley deleted the flatten branch September 8, 2022 21:38

mlane3 mentioned this pull request Nov 11, 2022

Concerns about if 1.0.0 really is user focused. #1001

Closed
