
During batch feed, OSError: [Errno 24] Too many open files #389

Closed
neo-anderson opened this issue Nov 3, 2022 · 9 comments

@neo-anderson

Hi 👋
When I try to ingest data into Vespa Cloud, I get this error - OSError: [Errno 24] Too many open files.

When I select only the first few documents in my dataset, the feed works. If I use the whole dataset, I get that error. I don't see a way to reset the connections or close the files, so pyvespa won't let me upload any more data unless I quit the Python session and start over. Synchronous batch feed works, but it is too slow for my use case.
Code:

# works
app.feed_batch(schema="myschema", batch=batch_data[:1000], batch_size=1000, total_timeout=200, asynchronous=True)
# fails
app.feed_batch(schema="myschema", batch=batch_data, batch_size=1000, total_timeout=200, asynchronous=True)
@thigm85
Contributor

thigm85 commented Nov 3, 2022

This seems to be an OS limitation. I was once running a script on an AWS EC2 instance and I had to increase the open file soft limit when using app.feed_batch.

I increased the limit with ulimit -n 10000 and checked with ulimit -Sn. It worked after that.
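For anyone hitting this from inside a long-running Python session, the same soft limit can also be raised programmatically with the standard-library `resource` module (Unix only; the 10000 here just mirrors the `ulimit -n 10000` suggestion above). A minimal sketch:

```python
import resource  # Unix-only stdlib module

# Equivalent to `ulimit -Sn` / `ulimit -Hn`: current soft and hard limits.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# Raise the soft limit toward 10000 without leaving the Python session.
# An unprivileged process can only raise the soft limit up to the hard limit.
target = 10000 if hard == resource.RLIM_INFINITY else min(10000, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
```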

If the above does not work, try reducing the number of async connections via the connections parameter:

app.feed_batch(..., connections=<number>, ...)

@neo-anderson
Author

I didn't change ulimit, but I tried reducing the number of connections as you suggested, and it worked! The default number of connections (100) was fine for a local Docker deployment, but I had to reduce it to ingest the same data into Vespa Cloud. Thanks for your help.

@thigm85
Contributor

thigm85 commented Nov 4, 2022

Glad it worked. Did you find the highest value that worked? I might consider changing the default connections value.

@thigm85 thigm85 closed this as completed Nov 7, 2022
@neo-anderson
Author

Hi @thigm85 , connections set to 100 and 50 didn't work for me, but setting it to 20 did. I don't know if that's the maximum value that would work, though. Cheers!
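The trial-and-error here (100, then 50, then 20) could be automated. Below is a small illustrative helper, not part of pyvespa: `feed_with_backoff` and its `feed` argument are hypothetical names, where `feed` stands for any callable that forwards a `connections` keyword (e.g. a `functools.partial` around `app.feed_batch`). It halves the connection count whenever the OS reports EMFILE:

```python
import errno

def feed_with_backoff(feed, connections=100, floor=1):
    """Call feed(connections=n), halving n on EMFILE until it succeeds.

    Illustrative only; `feed` is any callable accepting a `connections`
    keyword. Returns (result, connections_that_worked).
    """
    n = connections
    while True:
        try:
            return feed(connections=n), n
        except OSError as e:
            # Only retry on "Too many open files", and stop at the floor.
            if e.errno != errno.EMFILE or n <= floor:
                raise
            n = max(floor, n // 2)
```

With a starting value of 100 and a file-descriptor budget that only tolerates about 20 connections, this would retry at 50, then 25, then 12, and succeed there.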

@thigm85
Contributor

thigm85 commented Nov 8, 2022

Thanks @neo-anderson

@neo-anderson
Author

I tried ulimit -n 10000. It fixed the issue completely. I'm now ingesting with 100 connections without any issues 👍. Thanks!

@bratseth
Member

@lesters put this in the doc somewhere and/or change default?

@thigm85
Contributor

thigm85 commented Nov 10, 2022

@bratseth @lesters I can create a troubleshooting section and include this. It was on my to-do list anyway.

@bratseth
Member

Yes, great!
