read_lines crashes on big gzipped file #309

Closed
dselivanov opened this Issue Nov 8, 2015 · 6 comments

dselivanov commented Nov 8, 2015

I'm trying to read a big text file (~4.5 GB gzipped, an English Wikipedia dump).
read_lines produces the following error:

Error in read_lines_(ds, locale_ = locale, n_max = n_max, progress = progress) :
long vectors not supported yet: ../../src/include/Rinlinedfuns.h:137

EDIT: I'm doing this on a machine with 250 GB of RAM, so RAM is not an issue.

dselivanov commented Nov 9, 2015

Also I'm using Rcpp_0.12.1, see RcppCore/Rcpp#302.

MagicForrest commented Feb 29, 2016

I can report the same thing. I am trying to read a large text file with read_fwf(): 220 MB gzipped, 2.5 GB uncompressed. It reads fine when uncompressed, but reading the gzipped file fails with the "long vectors not supported" error above. I am using readr 0.2.2, Rcpp 0.12.3, R 3.02.

Just let me know if there is any more info or testing I can provide.

Member

hadley commented Jun 2, 2016

Deleting all the data.table stuff which is peripheral to this issue. @dselivanov can you please provide a reproducible example?

dselivanov commented Jun 2, 2016

library(readr)
# works: 2^15 - 1 lines of 2^16 characters each
txt = rep(paste(rep('a', 2 ^ 16), collapse = ''), 2 ^ 15 - 1)
writeLines(txt, con = gzfile('~/temp/test_read_lines.gz', open = 'w+', compression = 1))
rm(txt)
txt = read_lines("~/temp/test_read_lines.gz")
rm(txt)
# does not work: 2^15 lines of 2^16 characters each
txt = rep(paste(rep('a', 2 ^ 16), collapse = ''), 2 ^ 15)
writeLines(txt, con = gzfile('~/temp/test_read_lines.gz', open = 'w+', compression = 1))
rm(txt)
txt = read_lines("~/temp/test_read_lines.gz")
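
A rough check of why the first case succeeds and the second fails, as a sketch only. It assumes the gzipped file is decompressed into a single in-memory vector of bytes before parsing, which this thread does not confirm. The variable names are illustrative. R vectors with more than 2^31 - 1 elements are "long vectors", and the two test files sit on either side of that limit:

```r
# Uncompressed size of each test file in bytes.
# Each line holds 2^16 'a' characters plus the newline added by writeLines().
# 2^16 + 1 is a double in R, so the products below cannot overflow.
bytes_per_line <- 2^16 + 1

size_ok   <- (2^15 - 1) * bytes_per_line  # the file that reads fine
size_fail <- 2^15 * bytes_per_line        # the file that triggers the error

# Largest length of an ordinary (non-long) R vector: 2^31 - 1
limit <- .Machine$integer.max

size_ok <= limit    # TRUE:  2147450879 bytes fits
size_fail <= limit  # FALSE: 2147516416 bytes needs a long vector
```

Under that assumption, the 2.5 GB uncompressed file reported above would also exceed the limit.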

Member

hadley commented Jun 2, 2016

Minimal reprex

tmp <- tempfile(fileext = ".gz")

x <- rep(paste(rep('a', 2 ^ 16), collapse = ''), 2 ^ 15)
writeLines(x, con = gzfile(tmp, open = 'w+', compression = 1))
y <- readr::read_lines(tmp)

Member

hadley commented Jun 14, 2016

@jimhester this should be fairly straightforward - will just require using the long vector API.

jimhester added a commit to jimhester/readr that referenced this issue Jun 14, 2016

jimhester added a commit to jimhester/readr that referenced this issue Jun 14, 2016

jimhester removed the ready label Jun 15, 2016

lock bot locked and limited conversation to collaborators Sep 24, 2018
