IO error reading multiple Feather files in R #177
Comments
|
Can you try calling
maybe_gc <- local({
  i <- 1
  function(every = 1000) {
    if (i %% every == 0) {
      message("GC")
      gc()
    }
    i <<- i + 1
  }
}) |
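For anyone wanting to try this suggestion, here is a rough sketch of a read loop that forces a garbage collection every so often. The helper name `read_all` and its `reader` and `gc_every` arguments are made up for illustration; in the setting of this issue `reader` would presumably be `read_feather`. The demo uses base R `saveRDS()`/`readRDS()` only so it runs without any Feather files, and it assumes (without confirming) that a collection is what lets lazily opened files be released.

```r
# Hypothetical helper (not part of feather): read every path with `reader`,
# calling gc() after every `gc_every` reads.
read_all <- function(paths, reader, gc_every = 1000) {
  out <- vector("list", length(paths))
  for (k in seq_along(paths)) {
    out[[k]] <- reader(paths[[k]])
    # Assumption: a collection gives finalizers a chance to release
    # resources held by objects that are no longer referenced.
    if (k %% gc_every == 0) {
      gc()
    }
  }
  out
}

# Self-contained demo with base R only (RDS files stand in for Feather files).
demo_dir <- tempfile("read-all-demo")
dir.create(demo_dir)
paths <- file.path(demo_dir, sprintf("part-%02d.rds", 1:25))
for (p in paths) saveRDS(data.frame(x = 1:3), p)

frames <- read_all(paths, readRDS, gc_every = 10)
length(frames)
```

Whether this helps depends on what is actually holding the files open, so treat it as a diagnostic rather than a fix.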
|
I added a call to |
|
@krlmlr any thoughts on how we could solve this problem? This is the downside of the lazy approach - if you are reading too many files, I'm assuming the OS runs out of file locks. |
|
I guess we need |
|
@lmullen: This should be fixed now, could you please confirm? |
|
After reinstalling Feather from master, I re-ran my original code above. The first time I got one feather file that failed to load. Every other time I've run it all the files loaded correctly. I can't reproduce that first error, so I think it is fixed. Thanks! |
|
Is it possible this same problem could happen in Python, with pandas' |
|
Could you report the issue either on pandas's issue tracker or the JIRA for Apache Arrow? Thanks |
I have 18,500 Feather files (all with the same columns and column types) which I want to read in. So I do this:
When I run that code, some number of the Feather files (it varies from about 150 to 1500) fail to load with this error.
There is nothing actually wrong with those Feather files though. I can read_feather(path_to_problem_file) and get back a data frame as expected. I can also get the paths which failed to load, then run map(paths_that_didnt_load, read_feather) and load all of them fine.
My only suspicion is that Feather is too fast---that it reads the files so quickly that the disk can't get to the next file in time. FWIW, the files that don't load in the batch tend to come in sequence. The files are stored on a RAID 10 array, so it's not as fast as an SSD, but it's fast.
When I put a Sys.sleep(1) call in between loading each file, that cuts down on the number of errors.
I can't think of a good way to provide a reproducible example, but happy to do so if you can give instructions.
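The workaround described here (collect the paths that failed, then read them again, optionally pausing in between) could be sketched roughly as follows. `read_with_retry`, `flaky_reader`, and the `pause` argument are hypothetical names for illustration, not part of feather; the simulated reader simply fails the first time it sees certain paths so the example runs without real files.

```r
# Hypothetical helper: try every path once, remember the ones that errored,
# then read those again, sleeping `pause` seconds before each retry.
read_with_retry <- function(paths, reader, pause = 0) {
  attempt <- function(p) tryCatch(reader(p), error = function(e) NULL)
  results <- lapply(paths, attempt)
  names(results) <- paths
  failed <- paths[vapply(results, is.null, logical(1))]
  for (p in failed) {
    Sys.sleep(pause)
    results[[p]] <- reader(p)  # a second failure is raised as a normal error
  }
  list(data = results, retried = failed)
}

# Simulated reader: errors on the first attempt for paths containing "3".
seen <- new.env()
flaky_reader <- function(p) {
  first_time <- is.null(seen[[p]])
  seen[[p]] <- TRUE
  if (first_time && grepl("3", p)) stop("IO error (simulated)")
  data.frame(path = p)
}

out <- read_with_retry(c("f1", "f2", "f3", "f4"), flaky_reader)
out$retried
```

This only papers over the failures; it does not explain why the first attempt errors.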