Reading from connections #610

Closed · jimhester opened this issue Feb 10, 2017 · 8 comments

@jimhester (Member) commented Feb 10, 2017

The current implementation reads the full connection content into a raw vector; it would be nice to have a streaming interface as well.

Streaming is slightly more complicated in most cases because we typically need to read part of the file to figure out the column formats before reading the rest.
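
For illustration, a rough sketch of the two-pass pattern this implies, using readr's existing `spec_csv()` type-guesser (the file name and peek size are placeholders). For a seekable file the peeked part can simply be re-read; a true non-seekable stream would have to retain those bytes, which is where the complication comes in.

```r
library(readr)

# Pass 1: peek at the head of the input to guess column types.
con <- file("large.csv", "r")                        # "large.csv" is a stand-in path
head_txt <- paste(readLines(con, n = 1000), collapse = "\n")
spec <- spec_csv(head_txt)                           # guess a column spec from the peeked text
close(con)

# Pass 2: parse with the guessed spec locked in. A seekable file can simply be
# re-read from the start; a non-seekable connection would need the peeked bytes
# kept around and prepended to the remaining stream.
df <- read_csv("large.csv", col_types = spec)
```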

@vnijs commented Apr 28, 2017

readr is really great, but it slows down a ton when trying to read big files in chunks (e.g., to move them into a database). This is the one advantage that read.csv still has, and it would be great if read_csv could support this. Is it feasible? See also #185
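
For reference, the chunked workflow in question looks roughly like the sketch below, using readr's `read_csv_chunked()` with a DBI backend; the file, table, and SQLite backend are placeholders.

```r
library(readr)
library(DBI)

db <- dbConnect(RSQLite::SQLite(), "data.sqlite")    # any DBI backend works

# Stream the file in 10,000-row chunks, appending each chunk to the database
# so only one chunk is held in memory at a time. The first chunk creates the
# table; later chunks append to it.
read_csv_chunked(
  "big.csv",
  callback = function(chunk, pos) {
    dbWriteTable(db, "big_table", chunk, append = dbExistsTable(db, "big_table"))
  },
  chunk_size = 10000
)

dbDisconnect(db)
```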

@schelhorn commented Apr 28, 2017

I support this argument. Especially when reading gzipped files, readr seems to expand the whole content in memory before constructing the R representations.

That becomes clear when using readr's chunked reading mode with a filter function that picks only a subset of the rows: while read.table accumulates memory only in proportion to the selected rows (plus the current chunk being parsed, using a gzcon with text=T), readr stalls until the whole multi-GB gzip file is in memory. That really isn't useful in the age of biggish data, where compression is the norm.
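
A minimal sketch of the base-R pattern described here, with a hypothetical filter column (`value`); memory then grows with the kept rows plus one chunk at a time:

```r
# Stream a gzipped CSV chunk by chunk, keeping only the rows that pass a
# filter. File name, chunk size, and filter are placeholders.
con <- gzcon(file("big.csv.gz", "rb"), text = TRUE)
cols <- strsplit(readLines(con, n = 1), ",")[[1]]    # header line

kept <- list()
repeat {
  chunk <- tryCatch(
    read.table(con, sep = ",", nrows = 1e4, col.names = cols),
    error = function(e) NULL                         # "no lines available" = EOF
  )
  if (is.null(chunk)) break
  kept[[length(kept) + 1L]] <- subset(chunk, value > 0)  # hypothetical filter
  if (nrow(chunk) < 1e4) break
}
close(con)
result <- do.call(rbind, kept)
```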

@schelhorn commented Jun 13, 2017

Hi - may I ask if this feature has made it onto the roadmap?

@krlmlr (Member) commented Jan 18, 2018

I'm seeing "negative length" errors on Windows when reading a compressed (but not an uncompressed) 3.5 GB CSV file. I suspect this is due to RcppCore/Rcpp#804. Reading compressed files as a stream would fix these symptoms.

@jnolis commented Jan 24, 2018

This would be a great feature for my team. We have large files we want to load into a SQL Server database, and R and readr are great tools for pre-processing the data. Unfortunately, without a way to read from connections, we can't load files that won't fit into memory.

@tinyheero commented Mar 14, 2018

+1 for this issue. It would be a great feature to have, as we are running into memory issues reading in large compressed files.

@dpprdan commented Feb 27, 2019

Sounds like a duplicate of #76, or am I missing something?

@jimhester (Member, Author) commented May 6, 2021

Fixed by #1172

@jimhester closed this May 6, 2021