R crashes when using tidyr::unnest_wider #1348

Closed
mattnolan001 opened this issue Apr 11, 2022 · 6 comments · Fixed by r-lib/vctrs#1553

Comments

@mattnolan001

When I run tidyr::unnest_wider as part of a pipe, R crashes and reports the following error:

[91205:91206:20220410,071753.955164:ERROR file_io_posix.cc:148] open /home/matt/.r/crashpad_database/pending/dc2183c4-0851-4c62-908e-7d4e41a2702e.lock: File exists (17)
[91205:91205:20220410,071753.957703:ERROR process_memory_range.cc:86] read out of range
[91205:91205:20220410,071753.957712:ERROR elf_image_reader.cc:558] missing nul-terminator
[91205:91205:20220410,071753.957794:ERROR elf_dynamic_array_reader.h:61] tag not found
[91205:91205:20220410,071753.960132:ERROR elf_dynamic_array_reader.h:61] tag not found
[91205:91205:20220410,071753.960189:ERROR elf_dynamic_array_reader.h:61] tag not found
[91205:91205:20220410,071753.960236:ERROR elf_dynamic_array_reader.h:61] tag not found
[91205:91205:20220410,071753.960281:ERROR elf_dynamic_array_reader.h:61] tag not found

Further description and discussion here: https://stackoverflow.com/questions/71820155/r-crashes-when-using-tidyrunnest-wider

Here is a reproducible example (with thanks to Ben Bolker):

library(tidyverse)
f <- function(n) {
  tibble(neuron = 0:(n - 1), r.squared = rnorm(n),
         slope = rnorm(n), p.value = rnorm(n))
}
df <- replicate(1000, f(1000), simplify = FALSE)

dff <- tibble(x = df)
for (i in 1:100) {
  cat(i, "\n")
  unnest_wider(dff, x)
}
@mattnolan001
Author

@lionel- A more reliable reprex is:

library(tidyverse)
f <- function(n) {
  df <- tibble(neuron = 0:(n - 1), r.squared = rnorm(n),
               slope = rnorm(n), p.value = rnorm(n))
  df$p.value[2] <- NA
  df
}
df <- replicate(1000, f(1000), simplify = FALSE)

dff <- tibble(x = df)
for (i in 1:100) {
  cat(i, "\n")
  unnest_wider(dff, x)
}

I've tested this with tidyr from CRAN and with tidyr 1.2.0.9000. It reliably crashes with both.

@DavisVaughan
Member

r-lib/vctrs#1553 should fix this

@mattnolan001
Author

mattnolan001 commented Apr 12, 2022

Is r-lib/vctrs#1553 included in v1.2.0.9000? That is the version number of the copy I just pulled from the main GitHub repo, and it still crashes.

@DavisVaughan
Member

You'd have to run devtools::install_github("r-lib/vctrs#1553"), then restart R. After that, both CRAN tidyr and dev tidyr should work.

@mattnolan001
Author

Thanks! That works now.

@lionel-
Member

lionel- commented Apr 13, 2022

The fix is now on CRAN, thanks for the reprex and report @mattnolan001 @bbolker!
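
For anyone landing here later, a minimal sketch of how one might check whether the installed vctrs already carries the fix. It assumes that vctrs 0.4.1 is the first CRAN release containing it, which is what the 0.4.1 changelog entry referencing tidyverse/tidyr#1348 suggests. The helper name `has_fix()` is hypothetical and not part of vctrs or tidyr.

```r
# Assumption: vctrs 0.4.1 is the first CRAN release with the fix
# (its changelog entry references tidyverse/tidyr#1348).
fixed_in <- package_version("0.4.1")

# Hypothetical helper: TRUE if `installed` is at least the fixed version.
# Accepts a version string or the object returned by packageVersion().
has_fix <- function(installed) {
  package_version(as.character(installed)) >= fixed_in
}

if (requireNamespace("vctrs", quietly = TRUE)) {
  installed <- utils::packageVersion("vctrs")
  if (has_fix(installed)) {
    message("vctrs ", installed, " should include the fix.")
  } else {
    message("vctrs ", installed, " predates the fix; try install.packages(\"vctrs\") and restart R.")
  }
} else {
  message("vctrs is not installed.")
}
```

Restarting R after updating matters because a vctrs version that is already loaded stays in use until the session ends.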

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue May 5, 2022
# vctrs 0.4.1

* OOB errors with `character()` indexes use "that don't exist" instead
  of "past the end" (#1543).

* Fixed memory protection issues related to common type
  determination (#1551, tidyverse/tidyr#1348).


# vctrs 0.4.0

* New experimental `vec_locate_sorted_groups()` for returning the locations of
  groups in sorted order. This is equivalent to, but faster than, calling
  `vec_group_loc()` and then sorting by the `key` column of the result.

* New experimental `vec_locate_matches()` for locating where each observation
  in one vector matches one or more observations in another vector. It is
  similar to `vec_match()`, but returns all matches by default (rather than just
  the first), and can match on binary conditions other than equality. The
  algorithm is inspired by data.table's very fast binary merge procedure.

* The `vec_proxy_equal()`, `vec_proxy_compare()`, and `vec_proxy_order()`
  methods for `vctrs_rcrd` are now applied recursively over the fields (#1503).

* Lossy cast errors now inherit from incompatible type errors.

* `vec_is_list()` now returns `TRUE` for `AsIs` lists (#1463).

* `vec_assert()`, `vec_ptype2()`, `vec_cast()`, and `vec_as_location()`
  now use `caller_arg()` to infer a default `arg` value from the
  caller.

  This may result in unhelpful arguments being mentioned in error
  messages. In general, you should consider snapshotting vctrs error
  messages thrown in your package and supply `arg` and `call`
  arguments if the error context is not adequately reported to your
  users.

* `vec_ptype_common()`, `vec_cast_common()`, `vec_size_common()`, and
  `vec_recycle_common()` gain `call` and `arg` arguments for
  specifying an error context.

* `vec_compare()` can now compare zero column data frames (#1500).

* `new_data_frame()` now errors on negative and missing `n` values (#1477).

* `vec_order()` now correctly orders zero column data frames (#1499).

* vctrs now depends on cli to help with error message generation.

* New `vec_check_list()` and `list_check_all_vectors()` input
  checkers, and an accompanying `list_all_vectors()` predicate.

* New `vec_interleave()` for combining multiple vectors together, interleaving
  their elements in the process (#1396).

* `vec_equal_na(NULL)` now returns `logical(0)` rather than erroring (#1494).

* `vec_as_location(missing = "error")` now fails with `NA` and `NA_character_`
  in addition to `NA_integer_` (#1420, @krlmlr).

* Starting with rlang 1.0.0, errors are displayed with the contextual
  function call. Several vctrs operations gain a `call` argument that
  makes it possible to report the correct context in error messages.
  This concerns:

  - `vec_cast()` and `vec_ptype2()`
  - `vec_default_cast()` and `vec_default_ptype2()`
  - `vec_assert()`
  - `vec_as_names()`
  - `stop_` constructors like `stop_incompatible_type()`

  Note that default `vec_cast()` and `vec_ptype2()` methods
  automatically support this if they pass `...` to the corresponding
  `vec_default_` functions. If you throw a non-internal error from a
  non-default method, add a `call = caller_env()` argument in the
  method and pass it to `rlang::abort()`.

* If `NA_character_` is specified as a name for `vctrs_vctr` objects, it is
  now automatically repaired to `""` (#780).

* `""` is now an allowed name for `vctrs_vctr` objects and all its
  subclasses (`vctrs_list_of` in particular) (#780).

* `list_of()` is now much faster when many values are provided.

* `vec_as_location()` evaluates `arg` only in case of error, for performance
  (#1150, @krlmlr).

* `levels.vctrs_vctr()` now returns `NULL` instead of failing (#1186, @krlmlr).

* `vec_assert()` produces a more informative error when `size` is invalid
  (#1470).

* `vec_duplicate_detect()` is a bit faster when there are many unique values.

* `vec_proxy_order()` is described in `vignette("s3-vectors")` (#1373, @krlmlr).

* `vec_chop()` now materializes ALTREP vectors before chopping, which is more
  efficient than creating many small ALTREP pieces (#1450).

* New `list_drop_empty()` for removing empty elements from a list (#1395).

* `list_sizes()` now propagates the names of the list onto the result.

* Name repair messages are now signaled by `rlang::names_inform_repair()`. This
  means that the messages are now sent to stdout by default rather than to
  stderr, resulting in prettier messages. Additionally, name repair messages can
  now be silenced through the global option `rlib_name_repair_verbosity`, which
  is useful for testing purposes. See `?names_inform_repair` for more
  information (#1429).

* `vctrs_vctr` methods for `na.omit()`, `na.exclude()`, and `na.fail()` have
  been added (#1413).

* `vec_init()` is now slightly faster (#1423).

* `vec_set_names()` no longer corrupts `vctrs_rcrd` types (#1419).

* `vec_detect_complete()` now computes completeness for `vctrs_rcrd` types in
  the same way as data frames, which means that if any field is missing, the
  entire record is considered incomplete (#1386).

* The `na_value` argument of `vec_order()` and `vec_sort()` now correctly
  respect missing values in lists (#1401).

* `vec_rep()` and `vec_rep_each()` are much faster for `times = 0` and
  `times = 1` (@mgirlich, #1392).

* `vec_equal_na()` and `vec_fill_missing()` now work with integer64 vectors
  (#1304).

* The `xtfrm()` method for vctrs_vctr objects no longer accidentally breaks
  ties (#1354).

* `min()`, `max()` and `range()` no longer throw an error if `na.rm = TRUE` is
  set and all values are `NA` (@gorcha, #1357). In this case, and where an empty
  input is given, it will return `Inf`/`-Inf`, or `NA` if `Inf` can't be cast
  to the input type.

* `vec_group_loc()`, used for grouping in dplyr, now correctly handles
  vectors with billions of elements (up to `.Machine$integer.max`) (#1133).
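
As a rough illustration of two of the functions listed as new in vctrs 0.4.0, the sketch below calls `vec_interleave()` and `list_drop_empty()` on small inputs. It assumes vctrs 0.4.0 or later is installed; the input values are made up for the example.

```r
library(vctrs)

# vec_interleave() combines vectors by alternating their elements.
interleaved <- vec_interleave(c(1, 2), c(3, 4))
print(interleaved)

# list_drop_empty() removes size-zero elements (NULL, integer(), ...).
kept <- list_drop_empty(list(1, NULL, integer(), 2))
print(length(kept))
```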