Is flattening c + splicing? #575

Closed
hadley opened this issue Nov 19, 2018 · 19 comments · Fixed by #912

Comments

hadley commented Nov 19, 2018

i.e. should flatten(x, .type = foo) be equivalent to vec_c(!!!x, .type = foo) or should it be equivalent to simplify()?

lionel- commented Nov 22, 2018

I think the difference between flatten() and simplify() is that the latter always returns a vector of the same length as the input. So these are equivalent:

map(x, f, .type = int())
simplify(map(x, f), .type = int())

(Though when the .type is supplied, I guess simplify() is a straight vec_cast(), so maybe it shouldn't have such a parameter.)

hadley commented Nov 22, 2018

What function do you use to turn list(1, list(2), 3, list(4)) into list(1, 2, 3, 4) ?

lionel- commented Nov 22, 2018

That is flatten(). Or do you mean a function that flattens but also requires flattened lists to be length 1? Do we need such a function?

hadley commented Nov 22, 2018

Oh so what's the equivalent of vec_c(!!!x, .type = foo) then?

lionel- commented Nov 22, 2018

Under simple scenarios, that appears to be flatten(x, .type = foo). Taking your example data:

x <- list(1, list(2), 3, list(4))

If you pass .type = list(), you get these successive transformations:

tmp <- list(list(1), list(2), list(3), list(4))
out <- list(1, 2, 3, 4)

If you pass .type = int(), you get these:

tmp <- list(1L, 2L, 3L, 4L)
out <- int(1L, 2L, 3L, 4L)

However, I'm not sure that's the correct behaviour. For instance, what should flatten(list(mtcars, list(mtcars), mtcars)) return? The rlang behaviour is to only flatten lists, or equivalently to wrap non-list objects in a list before concatenation.

I'm not sure how to translate this in vec_c(). I guess the simplest way is to wrap all non-list objects in a list. This could be parameterised with a .if = type that leaves alone all objects for which vec_is(x, type) is TRUE, and rewraps all other objects in a vector of type type and size 1. And then it calls vec_c() on the result, possibly with type = .type. Does that make sense?
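
A minimal base R sketch of the wrap-then-concatenate idea described above, assuming that "list" means a bare list with no class attribute. The names `flatten_once()` and `is_bare_list()` are hypothetical and are not part of rlang, purrr, or vctrs, and the `.if` and `.type` parameters discussed above are left out.

```r
# Hypothetical helper: TRUE only for lists without a class attribute,
# so data frames count as non-lists here.
is_bare_list <- function(x) is.list(x) && !is.object(x)

# Hypothetical sketch: wrap every non-list element in a list of size 1,
# then concatenate, which removes exactly one layer of nesting.
flatten_once <- function(x) {
  stopifnot(is_bare_list(x))
  if (length(x) == 0L) {
    return(list())
  }
  wrapped <- lapply(x, function(elt) {
    if (is_bare_list(elt)) elt else list(elt)
  })
  # Outer names are dropped to keep the sketch simple
  do.call(c, unname(wrapped))
}

x <- list(1, list(2), 3, list(4))
identical(flatten_once(x), list(1, 2, 3, 4))
#> [1] TRUE

# Data frames are wrapped rather than spliced, so three elements come back
out <- flatten_once(list(mtcars, list(mtcars), mtcars))
length(out)
#> [1] 3
```

Under this sketch the mtcars case returns a list of three data frames, which matches the "only flatten lists" behaviour attributed to rlang above.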

hadley commented Nov 22, 2018

So maybe flatten() always inputs and outputs a list, but removes a single layer of nesting. The output will be longer than the input when it contains any lists?

And then simplify(x) is equivalent to vec_c(!!!x)?

I'm not sure we need a function here where the input and output have equal length? (That seems antithetical to flattening/simplifying)

lionel- commented Nov 22, 2018

That makes sense. And then map_int() is map() + vec_cast(type = int()) and flatten_int() is flatten() + vec_cast(type = int())?

And if int(...) == vec_c(..., .type = int()), this means that we no longer need the typed variants (we can keep the current ones but never add new ones). These would be equivalent:

map_int(x, f)
int(map(x, f))
x %>% map(f) %>% int()

I.e. the length constraint is done by map(), and then we can use unconstrained int().
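
A small sketch of that equivalence, assuming purrr and vctrs are installed. The `int()` constructor mentioned above is hypothetical, so `vec_c()` with an integer prototype stands in for it here. Current vctrs releases spell the argument `.ptype` rather than the `.type` used in this thread.

```r
library(purrr)
library(vctrs)

x <- list(1:3, 4:5, 6L)
f <- function(v) length(v)  # returns a single integer per element

# Typed variant: map_int() enforces size 1 and integer type per result
typed <- map_int(x, f)

# Untyped map() followed by an unconstrained combine into an integer vector
untyped <- vec_c(!!!map(x, f), .ptype = integer())

identical(typed, untyped)
#> [1] TRUE

# vec_c() itself has no size constraint: two inputs, three outputs
vec_c(!!!list(1:2, 3L), .ptype = integer())
#> [1] 1 2 3
```

The last call illustrates why the size check has to come from map(): the combining step alone accepts elements of any size.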

lionel- commented Nov 22, 2018

(or maybe the rule is that we keep typed variants when they have a computational advantage)

hadley commented Nov 22, 2018

Yeah, that sounds good.

Are we sure we have the names around the right way? Because flat_map(x, f) would be equivalent to map(x, f) %>% simplify()?

lionel- commented Nov 22, 2018

I think flatten() and simplify() have the right names. The former only flattens (always returns a list) while the latter returns the common type. However, I'm not sure what flat_map() should do... Do we really need it though, given that flatten() and simplify() are one function call away?

Actually, one argument against using simplify() for flat_map() is that it wouldn't be type-stable, since we don't know what's in x. Perhaps only _vec functions should use vec_c() and simplify(), to make it clear you're programming against the general vector interface instead of a specific vector type?

lionel- commented Dec 4, 2018

From Michel on Slack:

# list of factors
xf <- lapply(as.factor(letters[1:3]), identity)

# vector of factors
unlist(xf, recursive = FALSE)

# list of integers
purrr::flatten(xf)

# vector of integers, converted to character
purrr::flatten_chr(xf)

# list of factors
rlang::flatten(xf)

# error
rlang::flatten_chr(xf)

lionel- commented Feb 8, 2019

> So maybe flatten() always inputs and outputs a list, but removes a single layer of nesting.

For a generic flatten_vec(), the rule might be that only the input type is considered a recursive type, all others are treated as atomic.

So flatten_vec(list(list(1), mtcars)) returns list(1, mtcars). On the other hand, when passed a tibble, it would flatten all df-cols, but not the list-cols.

hadley commented Feb 8, 2019

The primary distinction between flatten() and simplify() is that flatten() preserves the input type, and simplify() preserves the input size. Both flatten() and simplify() will always succeed — to make simplify() safe, you'd need to provide a .ptype.

lionel- commented Oct 29, 2019

I no longer think flatten() should be generic based on input type. It should just work with lists (or subtypes) and flatten any element that is a subtype of list. A rough predicate for list subtype might be:

is_list <- function(x) typeof(x) == "list" && !is.data.frame(x) && vec_is(x)

hadley commented Oct 29, 2019

Or inherits(x, "list")?

lionel- commented Oct 29, 2019

This would be covered by vec_is().

hadley commented Oct 30, 2019

I think you're missing my point — the only things where inherits(x, "list") is true are for bare lists, or S3 objects that have lists in their subclasses. It seems to me that it separates out lists from data frames, records, and list-scalars in a single step, without any additional conditions.
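
A short base R check of that claim. The `my_list` class is a hypothetical S3 subclass made up for illustration, and `POSIXlt` stands in for a record-like type stored as a list.

```r
is_list <- function(x) inherits(x, "list")

df <- data.frame(x = 1:2)
my_list <- structure(list(1), class = c("my_list", "list"))  # hypothetical S3 list subclass
lt <- as.POSIXlt(Sys.time())  # record-like object whose storage is a list

is.list(df)        # TRUE: is.list() only looks at the underlying storage
is_list(df)        # FALSE
is_list(list(1))   # TRUE: bare lists have the implicit class "list"
is_list(my_list)   # TRUE
is_list(lt)        # FALSE
```

The contrast between `is.list(df)` and `is_list(df)` is the separation being described: the class-based test excludes data frames and records without extra conditions.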

lionel- commented Oct 30, 2019

You're right, I was confused.

DavisVaughan commented Oct 30, 2019

Here is an R implementation of flatten() and flatten_vec(). The latter is implemented as a flatten() followed by a reduction into a ptype:

library(vctrs)
library(rlang)
library(purrr)

is_list <- function(x) {
  inherits(x, "list")
}

# - `x` must be a list as defined by `is_list()`
# - `vec_ptype(flatten(x)) == list()`

flatten <- function(x) {
  if (!is_list(x)) {
    abort("`x` must be a list or list subclass.")
  }
  
  # Gather output size
  size <- vec_size(x)
  for (i in seq_along(x)) {
    elt <- x[[i]]
    
    if (is_list(elt)) {
      size <- size + vec_size(elt) - 1L
    }
  }
  
  # Always returns a list
  idx <- 1L
  out <- vec_init(list(), n = size)
  
  # If atomic, insert into `out` immediately
  # If list, flatten by inserting each element into `out`
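  for (i in seq_along(x)) {
    elt <- x[[i]]
    
    if (is_list(elt)) {
      for (j in seq_along(elt)) {
        # Assign via `[` with a wrapped value: `out[[idx]] <- NULL` would
        # delete the slot instead of storing a NULL element
        out[idx] <- list(elt[[j]])
        idx <- idx + 1L
      }
      next
    }
    
    out[idx] <- list(elt)
    idx <- idx + 1L
  }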
  
  out
}

# - `x` must be a list as defined by `is_list()`
# - `vec_ptype(flatten_vec(x)) == ptype %||% vec_ptype_common(!!! flatten(x))`

flatten_vec <- function(x, ptype = NULL) {
  x <- flatten(x)
  
  sizes <- map_int(x, vec_size)
  size <- sum(sizes)
  
  ptype <- ptype %||% vec_ptype_common(!!! x)
  
  out <- vec_init(ptype, n = size)
  
  pos <- 1L
  for (i in seq_along(x)) {
    size <- sizes[[i]]

    if (size == 0L) {
      next
    }

    idx <- pos + 0L:(size - 1L)
    
    vec_slice(out, idx) <- x[[i]]
    
    pos <- pos + size
  }
  
  out
} 

flatten_int <- function(x) {
  flatten_vec(x, ptype = integer())
}

With flatten()

# - return value of flatten() is a list
# - flatten() must take a list as input

df <- data.frame(x = 1:2)

flatten(1)
#> Error: `x` must be a list or list subclass.

flatten(df)
#> Error: `x` must be a list or list subclass.

flatten(list(1))
#> [[1]]
#> [1] 1

flatten(list(df))
#> [[1]]
#>   x
#> 1 1
#> 2 2

flatten(list(1, list(1)))
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 1

flatten(list(1, list(1:2, 2:3), list(3:4)))
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 1 2
#> 
#> [[3]]
#> [1] 2 3
#> 
#> [[4]]
#> [1] 3 4

# flattens just 1 level
flatten(list(1, list(list(1))))
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [[2]][[1]]
#> [1] 1

With flatten_int()

# flatten_int() is a flatten() followed by insertion into an integer vector
# output size is determined after the flatten()

flatten_int(1)
#> Error: `x` must be a list or list subclass.

flatten_int(list(1))
#> [1] 1

flatten_int(list("x"))
#> Error: No common type for `value` <character> and `x` <integer>.

flatten_int(list(1, list(1)))
#> [1] 1 1

flatten_int(list(1, list(1:5, 2:3)))
#> [1] 1 1 2 3 4 5 2 3

# only 1 layer of flattening is allowed
flatten_int(list(1, list(list(1))))
#> Error: No common type for `value` <list> and `x` <integer>.

Generic

flatten_vec(list(1))
#> [1] 1

flatten_vec(list(1, list(2, 1:5)))
#> [1] 1 2 1 2 3 4 5

flatten_vec(list(Sys.Date(), list(Sys.Date() + 0:2, Sys.Date())))
#> [1] "2019-10-30" "2019-10-30" "2019-10-31" "2019-11-01" "2019-10-30"

flatten_vec(list(df, list(df, df)))
#>   x
#> 1 1
#> 2 2
#> 3 1
#> 4 2
#> 5 1
#> 6 2
