read_lines crashes on big gzipped file #309
I'm trying to read a big text file (~4.5 GB gzipped, an English Wikipedia dump). read_lines fails with a "long vectors not supported" error. EDIT: I'm doing this on a machine with 250 GB of RAM, so RAM is not the issue.
Also, I'm using Rcpp_0.12.1; see RcppCore/Rcpp#302.
I can report the same thing: trying to read a large text file with read_fwf(), 220 MB gzipped, 2.5 GB uncompressed. It reads fine when uncompressed but fails with the "long vectors not supported" error above. I am using readr 0.2.2, Rcpp 0.12.3, R 3.02. Just let me know if there is any more info I can provide or any testing I can do.
Deleting all the data.table stuff, which is peripheral to this issue. @dselivanov, can you please provide a reproducible example?
library(readr)

# works: 2^15 - 1 lines of 2^16 characters each keeps the total
# character count just under 2^31 - 1
txt <- rep(paste(rep('a', 2^16), collapse = ''), 2^15 - 1)
con <- gzfile('~/temp/test_read_lines.gz', open = 'w', compression = 1)
writeLines(txt, con)
close(con)
rm(txt)
txt <- read_lines("~/temp/test_read_lines.gz")
rm(txt)

# does not work: one more line pushes the total past 2^31 - 1
txt <- rep(paste(rep('a', 2^16), collapse = ''), 2^15)
con <- gzfile('~/temp/test_read_lines.gz', open = 'w', compression = 1)
writeLines(txt, con)
close(con)
rm(txt)
txt <- read_lines("~/temp/test_read_lines.gz")
Minimal reprex:

tmp <- tempfile(fileext = ".gz")
x <- rep(paste(rep('a', 2^16), collapse = ''), 2^15)
con <- gzfile(tmp, open = 'w', compression = 1)
writeLines(x, con)
close(con)
y <- readr::read_lines(tmp)
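The boundary in the reprex is not accidental: 2^16 characters per line times 2^15 lines is exactly 2^31, one past the 2^31 - 1 limit of R's int-based lengths. A quick check of the arithmetic (a sketch; the exact on-disk size also includes one newline byte per line):

chars_per_line <- 2^16
(2^15 - 1) * chars_per_line <= .Machine$integer.max  # TRUE:  works
2^15 * chars_per_line <= .Machine$integer.max        # FALSE: fails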
@jimhester this should be fairly straightforward; it will just require using the long vector API.
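For context, the "long vector API" means indexing with R_xlen_t and Rf_xlength() instead of int and Rf_length() at the C level, so lengths beyond 2^31 - 1 do not overflow. A minimal sketch of the pattern (a hypothetical illustration, not readr's actual code; assumes Rcpp is installed):

# xsum() is a made-up example: it sums a numeric vector of any length,
# including long vectors, by using R_xlen_t rather than int for indexing.
Rcpp::cppFunction('
double xsum(SEXP x) {
  R_xlen_t n = Rf_xlength(x);  // 64-bit length; Rf_length() returns int
  double total = 0;
  double *p = REAL(x);
  for (R_xlen_t i = 0; i < n; ++i) total += p[i];
  return total;
}
')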