read_lines crashes on big gzipped file #309
Also, I'm using Rcpp_0.12.1; see RcppCore/Rcpp#302.
I can report the same thing. I am trying to read a large text file with read_fwf(): 220 MB gzipped, 2.5 GB uncompressed. It reads fine when uncompressed, but the gzipped file fails with the "long vectors not supported" error above. I am using readr 0.2.2, Rcpp 0.12.3, R 3.02. Let me know if I can provide more information or do any testing.
Deleting all the data.table stuff, which is peripheral to this issue. @dselivanov can you please provide a reproducible example?
library(readr)
# works
txt = rep(paste(rep('a', 2 ^ 16), collapse = ''), 2 ^ 15 - 1)
writeLines(txt, con = gzfile('~/temp/test_read_lines.gz', open = 'w+', compression = 1))
rm(txt)
txt = read_lines("~/temp/test_read_lines.gz")
rm(txt)
# fails
txt = rep(paste(rep('a', 2 ^ 16), collapse = ''), 2 ^ 15)
writeLines(txt, con = gzfile('~/temp/test_read_lines.gz', open = 'w+', compression = 1))
rm(txt)
txt = read_lines("~/temp/test_read_lines.gz")
Minimal reprex:
tmp <- tempfile(fileext = ".gz")
x <- rep(paste(rep('a', 2 ^ 16), collapse = ''), 2 ^ 15)
writeLines(x, con = gzfile(tmp, open = 'w+', compression = 1))
y <- readr::read_lines(tmp)
@jimhester this should be fairly straightforward: it will just require using the long vector API.
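The working and failing cases in the reproducible example differ by a single line, which points to a size threshold. Here is a minimal sketch of the arithmetic. It assumes, without confirmation from the thread, that the failure appears once the uncompressed content no longer fits in a regular R vector of at most `.Machine$integer.max` (2^31 - 1) elements. It also assumes `writeLines()` adds a one-byte `"\n"` after each line. `uncompressed_bytes` is a hypothetical helper introduced only for illustration.

```r
# Hypothetical helper (not part of readr): bytes written by writeLines() for
# n_lines lines of line_len ASCII characters, assuming one "\n" per line.
uncompressed_bytes <- function(n_lines, line_len = 2^16) {
  n_lines * (line_len + 1)
}

# Largest length of a regular (non-long) R vector: 2^31 - 1.
max_regular_length <- .Machine$integer.max

# Case reported as working in the thread: 2^15 - 1 lines.
fits_small <- uncompressed_bytes(2^15 - 1) <= max_regular_length

# Case reported as failing in the thread: 2^15 lines.
fits_large <- uncompressed_bytes(2^15) <= max_regular_length
```

Under these assumptions `fits_small` is `TRUE` and `fits_large` is `FALSE`. That matches the "long vectors not supported" error mentioned earlier and the suggestion to use the long vector API.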
I'm trying to read a big text file (~4.5 GB in gzipped form, English Wikipedia dump).
read_lines
produces the following error:
EDIT: I'm doing this on a machine with 250 GB of RAM, so RAM is not an issue.