Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recursive map over nested lists #720

Closed
vspinu opened this issue Nov 20, 2019 · 16 comments · Fixed by #946
Closed

Recursive map over nested lists #720

vspinu opened this issue Nov 20, 2019 · 16 comments · Fixed by #946
Labels
feature a feature request or enhancement

Comments

@vspinu
Copy link
Member

vspinu commented Nov 20, 2019

A common scenario for me is to map over nested lists. AFAIK there is no built in or purr functionality for this.

Basically an rapply version but which would operate not just on leaves but also on the inner nodes.

Example:

rmap(sessionInfo(), unclass)
@vspinu
Copy link
Member Author

vspinu commented Nov 20, 2019

To make this fully featured one would ideally need two versions one pre and one post visiting.

See clojure's walk functionality. With examples postwalk-demo and prewalk-demo.

@lionel- lionel- added the feature a feature request or enhancement label Jan 20, 2020
@hadley
Copy link
Member

hadley commented Aug 24, 2022

I think this is out of scope for purrr just because it's so hard to get the general pattern right. In my experience, there's been relatively little code in common across the places I've had to recursively walk across a list.

@hadley hadley closed this as completed Aug 24, 2022
@hyiltiz
Copy link

hyiltiz commented Aug 29, 2022

I think something like Haskell's reduce/fold functionals would be able to cover most of the cases needed for recursion (python/clj's walk should be similar). Based on the binary function argument, the final result could be a single value (scalar), a flat list/vector, or just the original tree (like Haskell's fmap). If this is too general to implement and too abstract to be useful for typical R users, then a fmap-like function that preserves the shape of the original data structure but maps values at each node/leaves to another would be quite welcome.

@vspinu
Copy link
Member Author

vspinu commented Aug 30, 2022

This is my current basic implementation. All one needs is a pre, post and leaf mappers for a fully generic case. What is missing is one extra argument for the "leaf" check and more care with attributes.

#' @details `rmap()`: is a recursive version of map which applies traverses a
#'   nested tree in the depth-first-order and applies `.fpre` to each node on
#'   the way down and `.fpost` on the way up and `.fleaf` to non-recursive leafs.
#' @rdname programming
#' @param .x a recursive object
#' @param .fpre,.fpost,.fleaf purrr style functions applied during down-ward
#'   (.fpre) and up-ward (.fpost) traversal of the tree. .fleaf is applied to
#'   tails only. All default to `identity`.
#' @param ... extra parameters supplied to `.fpre` and `.fpost` and .fleaf.
#' @export
rmap <- function(.x, .fpre = NULL, .fpost = NULL, .fleaf = NULL, ...) {
  .fpost <- if (is.null(.fpost)) identity
             else purrr::as_mapper(.fpost, ...)
  .fpre <- if (is.null(.fpre)) identity
           else purrr::as_mapper(.fpre, ...)
  .fleaf <- if(is.null(.fleaf)) identity
            else purrr::as_mapper(.fleaf, ...)
  worker <- function(x) {
    x <- .fpre(x)
    if (is.recursive(x) && !(is.language(x) || is.function(x))) {
      y <- lapply(x, worker)
      attributes(y) <- attributes(x)
      .fpost(y)
    } else {
      .fpost(.fleaf(x))
    }
  }
  .x <- .fpre(.x)
  .y <- lapply(.x, worker)
  attributes(.y) <- attributes(.x)
  invisible(.fpost(.y))
}

@hadley
Copy link
Member

hadley commented Aug 30, 2022

(FWIW I'm pretty sure you don't want is.recursive() because it returns TRUE for functions 😬. Probably better to use inherits(x, "list") or vctrs::vec_is_list() to also avoid recursing into data frame columns, which won't generally work with the attribute restoration you're doing.)

@vspinu
Copy link
Member Author

vspinu commented Aug 30, 2022

(FWIW I'm pretty sure you don't want is.recursive() because it returns TRUE for functions

Sure. This is why I check for language and function in the above implementation 😉

A better option is to have a custom is_leaf as an extra custom argument. Being able to recure into data.frames or to treat classed lists is leafs is a feature.

rmap <- function(.x, .fpre = NULL, .fpost = NULL, .fleaf = NULL, ...,
                 is_leaf = function(x) {
                   is.recursive(x) && !(is.language(x) || is.function(x))
                 }) {
  .fpost <- if (is.null(.fpost)) identity
             else purrr::as_mapper(.fpost, ...)
  .fpre <- if (is.null(.fpre)) identity
           else purrr::as_mapper(.fpre, ...)
  .fleaf <- if(is.null(.fleaf)) identity
            else purrr::as_mapper(.fleaf, ...)
  worker <- function(x) {
    x <- .fpre(x)
    if (is_leaf(x)) {
      .fpost(.fleaf(x))
    } else {
      y <- lapply(x, worker)
      attributes(y) <- attributes(x)
      .fpost(y)
    }
  }
  .x <- .fpre(.x)
  .y <- lapply(.x, worker)
  attributes(.y) <- attributes(.x)
  invisible(.fpost(.y))
}

As you see it's not that complicated. I hope you could reconsider adding this into the package. One way or another I end up needing this in most of my non-trivial projects.

@hadley
Copy link
Member

hadley commented Aug 30, 2022

Can you show me a few examples of how you use it? I'm still not convinced that this level of generality is at a good place on the compactness vs readability spectrum.

@vspinu
Copy link
Member Author

vspinu commented Sep 1, 2022

Whenever one needs to alter selectively elements of a nested structure (jsons, shiny UI elements etc) this function is pretty much the only way to go.

Three examples from my recent projects:

  1. Cleaning recipy environment of data in order to avoid saving all the captured garbage (to adress the large size of recipes issue:
clean_recipe <- function(rp) {
  rp$steps <-
    rmap(rp$steps, .fleaf = function(x) {
      if (!is.null(attr(x, ".Environment"))) {
        attr(x, ".Environment") <- .GlobalEnv
      }
      x
    })
  if (!is.null(rp$template))
    rp$template <- head(rp$template, 10)
  rp
}
  1. Processing shiny html elements nested classes:
out <- tabsetPanel(...)
out$children[[1]] <-
  rmap(out$children[[1]], .fpost = function(x) {
    if (is.list(x) && identical(x$name, "a"))
      x$attribs$class <- paste("nav-link", x$attribs$class)
    x
  })
  1. Recursive sort of nested jsons. I needed to normalize two jsons for testing purposes. Two jsons are equivalent if fields are the same ignoring the order.
## recursive sort
rsort <- function(x) {
  rmap(x, .fpost = function(el) {
    if (is.data.table(el))
      el <- as.data.frame(el)
    if (!is.null(names(el)))
      el <- el[sort(names(el))]
    if (is.list(el) && !is.data.frame(el))
      el <- discard(el, ~ is_empty(.) || (is.data.frame(.) && nrow(.) == 0))
    el
  })
}

@hadley hadley reopened this Sep 1, 2022
@hyiltiz
Copy link

hyiltiz commented Sep 1, 2022

3rd example on json is something I'd end up doing quite frequently; though not necessarily on json, since any nested list (like sexp, yaml, file system direcatories) could be recursed similarly.

@hadley
Copy link
Member

hadley commented Sep 2, 2022

I think I don't find rmap() very appealing because I'd prefer to write the recursion myself since it's (a) not that much code and (b) a pattern that I think more readers could recognise (rather than relying on a special map).

clean_recipe <- function(rp) {
  rp$steps <- clean_recipe_steps(rp$steps)
  if (!is.null(rp$template))
    rp$template <- head(rp$template, 10)
  rp
}

clean_recipe_steps <- function(x) {
  if (is.list(x)) {
    x[] <- map(x, clean_recipe_steps)
  } else if (is.atomic(x)) {
    if (!is.null(attr(x, ".Environment"))) {
      attr(x, ".Environment") <- .GlobalEnv
    }
    x
  } else {
    x
  }
}

# -------------------------

out <- tabsetPanel(...)
out$children[[1]] <- add_nav_link(out$children[[1]])

add_nav_link <- function(x) {
  if (is.list(x)) {
    x[] <- map(x, add_nav_link)
    
    if (identical(x$name, "a"))
      x$attribs$class <- paste("nav-link", x$attribs$class)
  } else {
    x
  }
}

# --------------------------

rsort <- function(x) {
  if (is.atomic(x)) {
    return(x)
  }
  
  if (is.data.table(x))
    x <- as.data.frame(x)

  if (!is.null(names(x)))
    x <- x[sort(names(x))]
  
  if (inherits(x, "list")) {
    x[] <- map(x, rsort)
    x <- discard(x, ~ NROW(.x) == 0) 
  }
  
  x
}

I think I find the other map functions more compelling because they remove more boilerplate that's the same from problem to problem. Every recursion problem is a little different so I feel like making the repeated code invisible makes it harder to understand overall.

@hadley
Copy link
Member

hadley commented Sep 2, 2022

I rewrote fmap() to better understand it and align it a bit better with the purrr style. That make me realise it probably makes more sense to call it rmodify() as it is recursively calling modify().

rmodify <- function(
    .x, 
    .fpre = identity, 
    .fpost = identity, 
    .fleaf = identity, ...,
    is_leaf = is_recursive
  ) {
  
  .fpost <- .fpost %||% purrr::as_mapper(.fpost, ...)
  .fpre <- .fpre %||% purrr::as_mapper(.fpre, ...)
  .fleaf <- .fleaf %||% purrr::as_mapper(.fleaf, ...)

  worker <- function(x) {
    out <- .fpre(x)
    if (is_leaf(out)) {
      out <- .fleaf(x)
    } else {
      out <- modify(f, worker)
    }
    .fpost(out)
  }
  out <- .fpre(.x)
  out <- modify(out, worker)
  .fpost(out)
}

is_recursive <- function(x) {
  inherits(x, "list") || is.pairlist(x) || is.expression(x) || is.call(x)
}

@vspinu
Copy link
Member Author

vspinu commented Sep 2, 2022

The is_leaf should be a negation of recursive !is.recursive(x) || s.language(x) || is.function(x).

I'd prefer to write the recursion myself

I see your point, but my experience is different. Reading rmap(.fpre = ...), or rmap(.fleaf=) is semantically very clear to me. It conveys the tree processing succinctly and cleanly. Just like modify(...) conveys the pattern better thanx <- ...; for (i in seq_along(x)) x[i] <- ...}.

probably makes more sense to call it rmodify() as it is recursively calling modify()

Indeed. It seems to fit better into midify family functions. modify is a more generic version of map, which we want here for internal node processing. The difference is that with .fpost one could change the container type, thus achieving the same effect as map_xyz do.

@hadley
Copy link
Member

hadley commented Sep 2, 2022

I see your point, but my experience is different. Reading rmap(.fpre = ...), or rmap(.fleaf=) is semantically very clear to me.

I don't disagree, but my gut feeling is that the proportion of R programmers that this is true for is relatively small, small enough that it doesn't feel like quite the right fit for purrr. That said, the cost for including it is low, but we are currently on "simplifying purrr" cycle which makes me reluctant to add it now.

@hyiltiz
Copy link

hyiltiz commented Sep 3, 2022

If we can't have Haskell's fold/reduce, could we please have fmap? fmap shouldn't overcomplicate the interface or introduce much maintenance burden.

@hadley
Copy link
Member

hadley commented Sep 8, 2022

@hyiltiz if you want that I'd recommend starting with an issue that describes exactly what fmap is, how it would map to R, and why it's important.

@hyiltiz
Copy link

hyiltiz commented Sep 11, 2022

The fmap I am referring to, when applied to tree-like structures (like nested lists), is a subset of the rmap implemented above -- subset in the sense that there only one of .fpre, .fpost, .fleaf is needed. It is a functional that traverses (in unspecified traversal order) and applies a custom input function to each node/leaf (whatever stores content in the structure) and returns the same structure.

While it could also be applied to other structures like vectors, tibbles and matrices, what fmap should do is not unique and is dependent on its object and its purpose. I think that is out of scope for purrr, and thus doesn't merit a new issue by itself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature a feature request or enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants