Memory errors when using Rscript with read_tsv_chunked() with trailing tab #1145

Closed
Shians opened this issue Nov 6, 2020 · 2 comments

Shians commented Nov 6, 2020

Sorry for the bizarre problem. When using read_tsv_chunked() on a sufficiently large malformed file, R fails with seemingly random memory errors. In RStudio I get the classic bomb dialog and an unexpected termination. When running via Rscript from the console, I get one of the following at random:

 *** caught segfault ***
address 0x55fd97fb94f8, cause 'memory not mapped'

or

malloc(): unsorted double linked list corrupted

or

*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
...many more

and potentially other memory errors that I have not noted down.

The code below is the minimal triggering condition on Pop!_OS 20.04 LTS (Ubuntu 20.04), run via Rscript with R 4.0.3 and readr 1.4.0.

writeLines(paste("x", 1:6, sep = "", collapse = "\t"), "test.txt")
write(
    rep(paste0(paste(letters[1:6], collapse = "\t"), "\t"), 1e7), # trailing tab
    "test.txt",
    append = TRUE
)


readr::read_tsv_chunked(
    "test.txt",
    callback = readr::SideEffectChunkCallback$new(
        function(x, i) {}
    )
)

This particular instance throws

malloc(): unsorted double linked list corrupted
  • Does not trigger when file does not have trailing tabs
  • Does not trigger with 1e6 rows
  • Does not trigger using regular read_tsv(), though it will flag many problems

I can simply fix my file, but the nature of the error messages made it incredibly difficult to diagnose the root of the problem.
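For anyone hitting similar crashes, a quick pre-check for trailing tabs might save some debugging time. The following is only a sketch in base R: `has_trailing_tab()` is a hypothetical helper (not part of readr), and it inspects just the first `n` lines, so it assumes the malformation shows up near the top of the file.

```r
# Hypothetical helper: TRUE if any of the first `n` lines ends in a tab.
has_trailing_tab <- function(path, n = 1000L) {
  lines <- readLines(path, n = n, warn = FALSE)
  any(grepl("\t$", lines))
}

# Small demonstration file shaped like the one in the report:
# a 6-column header, then rows that each carry an extra trailing tab.
demo <- tempfile(fileext = ".txt")
writeLines(paste0("x", 1:6, collapse = "\t"), demo)
write(
    rep(paste0(paste(letters[1:6], collapse = "\t"), "\t"), 10),
    demo,
    append = TRUE
)

has_trailing_tab(demo)
```

If the check is positive, the file can be repaired before it is passed to read_tsv_chunked(), for example by stripping the trailing tab from each line.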

jimhester (Collaborator) commented Nov 6, 2020

This turned out to be a protection issue in the cpp11 package. I believe with that change this is now fixed.

Unfortunately we just had a release of cpp11 yesterday, so we will have to wait at least a week to do another CRAN release with this fix.

In the meantime you can install the development version of cpp11 and then re-install readr.
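One way to do this is sketched below. It assumes the remotes package is installed and that the cpp11 development version is hosted in the `r-lib/cpp11` GitHub repository; `install_patched_readr()` is a hypothetical wrapper name used here so that nothing is installed until it is called explicitly.

```r
# Hypothetical wrapper: installs development cpp11, then rebuilds readr.
# Assumes the remotes package is available and cpp11 lives at r-lib/cpp11.
install_patched_readr <- function() {
  remotes::install_github("r-lib/cpp11")
  # cpp11 is header-only, so readr only picks up the fix when it is
  # recompiled from source against the newly installed headers.
  install.packages("readr", type = "source")
}
```

Calling `install_patched_readr()` in a fresh R session and then restarting R should be enough for the rebuilt readr to be loaded.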

Shians (Author) commented Nov 7, 2020

Confirming fix in my test case and actual data.

Shians added a commit to Shians/NanoMethViz that referenced this issue Feb 8, 2021
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Jun 12, 2021
# cpp11 0.2.7

* Fix a transient memory leak for functions that return values from
  `cpp11::unwind_protect()` and `cpp11::safe` (#154)

# cpp11 0.2.6

* `cpp_register()` now uses symbols exclusively in the `.Call()`
  interface. This allows it to be more robust in interactive use with
  the pkgload package.

# cpp11 0.2.5

* `cpp_source()` gains a `cxx_std` argument to control which C++
  standard is used.  This allows you to use code from `C++14` and
  later standards with cpp_source(). (#100)

* The cpp11 knitr engine now allows you to set the `cxx_std` chunk
  option to control the C++ standard used.

* `cpp_source()` now has much more informative error messages when
  compilation fails (#125, #139)

* `cpp_source()` now uses a unique name for the DLL, so works when run
  multiple times on the same source file on Windows (#143)

* `writable::list_of<T>` now supports modification of vectors as
  intended (#131).

* Errors when running
  `tools::package_native_routine_registration_skeleton()` are no
  longer swallowed (#134)

* `cpp_source()` can now accept a source file called `cpp11.cpp`
  (#133)

* `named_arg` now explicitly protects its values, avoiding protection
  issues when using large inputs (tidyverse/readr#1145).

* `r_string(std::string)` now uses `Rf_mkCharLenCE()` instead of
  `Rf_mkChar()`, which avoids the performance cost of checking the
  string length.

* Writable vector classes now properly set their lengths as intended
  when being copied to a read only class (#128).