Is flattening c + splicing? #575

hadley · 2018-11-19T17:14:08Z

i.e. should flatten(x, .type = foo) be equivalent to vec_c(!!!x, .type = foo) or should it be equivalent to simplify()?

lionel- · 2018-11-22T09:52:54Z

I think the difference between flatten() and simplify() is that the latter always returns a vector of the same length as the input. So these are equivalent:

map(x, f, .type = int())
simplify(map(x, f), .type = int())

(Though when the .type is supplied, I guess simplify() is a straight vec_cast(), so maybe it shouldn't have such a parameter.)

hadley · 2018-11-22T14:43:23Z

What function do you use to turn list(1, list(2), 3, list(4)) into list(1, 2, 3, 4) ?

lionel- · 2018-11-22T14:59:57Z

That is flatten(). Or do you mean a function that flattens but also requires flattened lists to be length 1? Do we need such a function?

hadley · 2018-11-22T15:04:04Z

Oh so what's the equivalent of vec_c(!!!x, .type = foo) then?

lionel- · 2018-11-22T15:49:20Z

Under simple scenarios, that appears to be flatten(x, .type = foo). Taking your example data:

x <- list(1, list(2), 3, list(4))

If you pass .type = list(), you get these successive transformations:

tmp <- list(list(1), list(2), list(3), list(4))
out <- list(1, 2, 3, 4)

If you pass .type = int(), you get these:

tmp <- list(1L, 2L, 3L, 4L)
out <- int(1L, 2L, 3L, 4L)

However, I'm not sure that's the correct behaviour. For instance, what should flatten(list(mtcars, list(mtcars), mtcars)) return? The rlang behaviour is to only flatten lists, or equivalently, it wraps non-list objects in a list before concatenation.

I'm not sure how to translate this in vec_c(). I guess the simplest way is to wrap all non-list objects in a list. This could be parameterised with a .if = type that leaves alone all objects for which vec_is(x, type) is TRUE, and rewraps all other objects in a vector of type type and size 1. And then it calls vec_c() on the result, possibly with type = .type. Does that make sense?
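
A rough base R sketch of the wrapping idea above, not the vctrs implementation. The helper names `wrap_non_lists()` and `flatten_sketch()` are hypothetical, the predicate `is.list(x) && !is.data.frame(x)` stands in for the proposed `.if` check, and `c()` stands in for vec_c():

```r
# Hypothetical helpers for illustration only; base R, no vctrs.

# Leave lists alone, wrap everything else (including data frames) in a
# list of size 1.
wrap_non_lists <- function(x) {
  lapply(x, function(elt) {
    if (is.list(elt) && !is.data.frame(elt)) elt else list(elt)
  })
}

# Concatenate the wrapped elements, which removes one level of nesting.
flatten_sketch <- function(x) {
  wrapped <- unname(wrap_non_lists(x))
  if (length(wrapped) == 0L) {
    return(list())
  }
  do.call(c, wrapped)
}

df <- data.frame(a = 1:2)
out_df <- flatten_sketch(list(df, list(df), df))
out_num <- flatten_sketch(list(1, list(2), 3, list(4)))
```

Under these assumptions `out_df` is a list of three data frames and `out_num` is list(1, 2, 3, 4). Outer names are dropped here for simplicity.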

hadley · 2018-11-22T16:00:05Z

So maybe flatten() always inputs and outputs a list, but removes a single layer of nesting. The output will be longer than the input when it contains any lists?

And then simplify(x) is equivalent to vec_c(!!!x)?

I'm not sure we need a function here where the input and output have equal length? (That seems antithetical to flattening/simplifying)

lionel- · 2018-11-22T16:15:19Z

That makes sense. And then map_int() is map() + vec_cast(type = int()) and flatten_int() is flatten() + vec_cast(type = int())?

And if int(...) == vec_c(..., .type = int()), this means that we no longer need the typed variants (we can keep the current ones but never add new ones). These would be equivalent:

map_int(x, f)
int(map(x, f))
x %>% map(f) %>% int()

I.e. the length constraint is done by map(), and then we can use unconstrained int().
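
To make the equivalence concrete, here is a base R analogy rather than the purrr or vctrs API: `vapply()` plays the role of the typed variant, `lapply()` plays the role of map(), and `as.integer(unlist())` is only an assumed stand-in for the proposed unconstrained int():

```r
# Base R analogy only; `f` and `x` are made-up example inputs.
x <- list(1:3, 4:6, 7:10)
f <- function(v) length(v)

# Typed variant: the result type and the one-value-per-element rule are
# checked in a single step.
typed <- vapply(x, f, FUN.VALUE = integer(1))

# Untyped map first (output has one element per input element),
# then a separate, unconstrained conversion to integer.
mapped <- lapply(x, f)
converted <- as.integer(unlist(mapped))
```

Both routes give the same integer vector for this input. The difference is where the size and type checks happen.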

lionel- · 2018-11-22T16:16:23Z

(or maybe the rule is that we keep typed variants when they have a computational advantage)

hadley · 2018-11-22T16:21:54Z

Yeah, that sounds good.

Are we sure we have the names around the right way? Because flat_map(x, f) would be equivalent to map(x, f) %>% simplify()?

lionel- · 2018-11-22T16:52:27Z

I think flatten() and simplify() have the right names. The former only flattens (always returns a list) while the latter returns the common type. However, I'm not sure what flat_map() should do... Do we really need it though, given that flatten() and simplify() are one function call away?

Actually, one argument against using simplify() for flat_map() is that it wouldn't be type-stable, since we don't know what's in x. Perhaps only _vec functions should use vec_c() and simplify(), to make it clear you're programming against the general vector interface instead of a specific vector type?

lionel- · 2018-12-04T11:14:52Z

From Michel on Slack:

# list of factors
xf <- lapply(as.factor(letters[1:3]), identity)

# vector of factors
unlist(xf, recursive = FALSE)

# list of integers
purrr::flatten(xf)

# vector of integers, converted to character
purrr::flatten_chr(xf)

# list of factors
rlang::flatten(xf)

# error
rlang::flatten_chr(xf)

lionel- · 2019-02-08T10:54:56Z

Quoting hadley: "So maybe flatten() always inputs and outputs a list, but removes a single layer of nesting."

For a generic flatten_vec(), the rule might be that only the input type is considered a recursive type, all others are treated as atomic.

So flatten_vec(list(list(1), mtcars)) returns list(1, mtcars). On the other hand, when passed a tibble, it would flatten all df-cols, but not the list-cols.

hadley · 2019-02-08T15:09:52Z

The primary distinction between flatten() and simplify() is that flatten() preserves the input type, and simplify() preserves the input size. Both flatten() and simplify() will always succeed — to make simplify() safe, you'd need to provide a .ptype.
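
A small base R sketch of the "simplify() preserves the input size, and a .ptype makes it safe" half of this. `simplify_sketch()` is a hypothetical name, and `vapply()` is only an approximation of what a vctrs-based version would check:

```r
# Hypothetical helper for illustration; base R only.
simplify_sketch <- function(x, ptype) {
  stopifnot(is.list(x))
  # vapply() errors unless every element has length 1 and matches the type
  # of `ptype` (it allows a few promotions, such as logical to integer).
  vapply(x, identity, FUN.VALUE = vector(typeof(ptype), 1L))
}

# Same size as the input, now an integer vector.
ok <- simplify_sketch(list(1L, 2L, 3L), ptype = integer())

# A character element is incompatible with the integer ptype.
bad_type <- try(simplify_sketch(list(1L, "a"), ptype = integer()), silent = TRUE)

# An element of size 2 would change the output size, so it is rejected too.
bad_size <- try(simplify_sketch(list(1:2), ptype = integer()), silent = TRUE)
```

With the ptype supplied, incompatible elements fail loudly instead of silently changing the output type.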

lionel- · 2019-10-29T10:33:40Z

I no longer think flatten() should be generic based on input type. It should just work with lists (or subtypes) and flatten any element that is a subtype of list. A rough predicate for list subtype might be:

is_list <- function(x) typeof(x) == "list" && !is.data.frame(x) && vec_is(x)

hadley · 2019-10-29T11:01:04Z

Or inherits(x, "list")?

lionel- · 2019-10-29T13:25:22Z

This would be covered by vec_is().

hadley · 2019-10-30T12:22:12Z

I think you're missing my point — the only things where inherits(x, "list") is true are for bare lists, or S3 objects that have lists in their subclasses. It seems to me that it separates out lists from data frames, records, and list-scalars in a single step, without any additional conditions.
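
A quick base R check of this claim. The class names "my_list" and "my_record" are made up for illustration:

```r
# Four objects that are all stored as lists internally.
bare   <- list(1, 2)
sub    <- structure(list(1, 2), class = c("my_list", "list"))
df     <- data.frame(x = 1:2)
record <- structure(list(a = 1, b = 2), class = "my_record")

candidates <- list(bare = bare, subclass = sub, data_frame = df, record = record)

# typeof() cannot tell them apart: every one of them is "list".
stored_as_list <- vapply(candidates, function(x) typeof(x) == "list", logical(1))

# inherits() separates the bare list and the explicit list subclass
# from the data frame and the record-like object.
inherits_list <- vapply(candidates, inherits, logical(1), what = "list")
```

Here `stored_as_list` is TRUE for all four objects, while `inherits_list` is TRUE only for `bare` and `subclass`.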

lionel- · 2019-10-30T13:40:11Z

You're right, I was confused.

DavisVaughan · 2019-10-30T18:45:39Z

Here is an R implementation of flatten() and flatten_vec(). The latter is implemented as a flatten() followed by a reduction into a ptype.

library(vctrs)
library(rlang)
library(purrr)

is_list <- function(x) {
  inherits(x, "list")
}

# - `x` must be a list as defined by `is_list()`
# - `vec_ptype(flatten(x)) == list()`

flatten <- function(x) {
  if (!is_list(x)) {
    abort("`x` must be a list or list subclass.")
  }
  
  # Gather output size
  size <- vec_size(x)
  for (i in seq_along(x)) {
    elt <- x[[i]]
    
    if (is_list(elt)) {
      size <- size + vec_size(elt) - 1L
    }
  }
  
  # Always returns a list
  idx <- 1L
  out <- vec_init(list(), n = size)
  
  # If `elt` is not a list (as defined by `is_list()`), insert it into `out` as is
  # If `elt` is a list, flatten by inserting each of its elements into `out`
  for (i in seq_along(x)) {
    elt <- x[[i]]
    
    if (is_list(elt)) {
      for (j in seq_along(elt)) {
        out[[idx]] <- elt[[j]]
        idx <- idx + 1L
      }
      next
    }
    
    out[[idx]] <- elt
    idx <- idx + 1L
  }
  
  out
}

# - `x` must be a list as defined by `is_list()`
# - `vec_ptype(flatten_vec(x)) == ptype %||% vec_ptype_common(!!! flatten(x))`

flatten_vec <- function(x, ptype = NULL) {
  x <- flatten(x)
  
  sizes <- map_int(x, vec_size)
  size <- sum(sizes)
  
  ptype <- ptype %||% vec_ptype_common(!!! x)
  
  out <- vec_init(ptype, n = size)
  
  pos <- 1L
  for (i in seq_along(x)) {
    size <- sizes[[i]]

    if (size == 0L) {
      next
    }

    idx <- pos + 0L:(size - 1L)
    
    vec_slice(out, idx) <- x[[i]]
    
    pos <- pos + size
  }
  
  out
}

flatten_int <- function(x) {
  flatten_vec(x, ptype = integer())
}

With flatten()

# - return value of flatten() is a list
# - flatten() must take a list as input

df <- data.frame(x = 1:2)

flatten(1)
#> Error: `x` must be a list or list subclass.

flatten(df)
#> Error: `x` must be a list or list subclass.

flatten(list(1))
#> [[1]]
#> [1] 1

flatten(list(df))
#> [[1]]
#>   x
#> 1 1
#> 2 2

flatten(list(1, list(1)))
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 1

flatten(list(1, list(1:2, 2:3), list(3:4)))
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 1 2
#> 
#> [[3]]
#> [1] 2 3
#> 
#> [[4]]
#> [1] 3 4

# flattens just 1 level
flatten(list(1, list(list(1))))
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [[2]][[1]]
#> [1] 1

With flatten_int()

# flatten_int() is a flatten() followed by insertion into an integer vector
# output size is determined after the flatten()

flatten_int(1)
#> Error: `x` must be a list or list subclass.

flatten_int(list(1))
#> [1] 1

flatten_int(list("x"))
#> Error: No common type for `value` <character> and `x` <integer>.

flatten_int(list(1, list(1)))
#> [1] 1 1

flatten_int(list(1, list(1:5, 2:3)))
#> [1] 1 1 2 3 4 5 2 3

# only 1 layer of flattening is allowed
flatten_int(list(1, list(list(1))))
#> Error: No common type for `value` <list> and `x` <integer>.

Generic

flatten_vec(list(1))
#> [1] 1

flatten_vec(list(1, list(2, 1:5)))
#> [1] 1 2 1 2 3 4 5

flatten_vec(list(Sys.Date(), list(Sys.Date() + 0:2, Sys.Date())))
#> [1] "2019-10-30" "2019-10-30" "2019-10-31" "2019-11-01" "2019-10-30"

flatten_vec(list(df, list(df, df)))
#>   x
#> 1 1
#> 2 2
#> 3 1
#> 4 2
#> 5 1
#> 6 2

lionel- added the vctrs ♣️ label Nov 22, 2018

lionel- mentioned this issue Nov 26, 2018

flatten() fails on lists with unevaluated expressions #557

Closed

lionel- added the flatten 🌎 label Nov 26, 2018

lionel- added this to the vctrs milestone Nov 30, 2018

lionel- mentioned this issue Jun 26, 2019

Flattening, squashing and unpacking r-lib/vctrs#395

Closed

DavisVaughan mentioned this issue Jan 9, 2020

Atomic constructors tidyverse/funs#45

Closed

hadley mentioned this issue Sep 4, 2022

Rework flattening #912

Merged

hadley closed this as completed in #912 Sep 8, 2022

hadley closed this as completed in 4f78bd3 Sep 8, 2022
