s3store: Performance improvements #478
Conversation
I am going to merge this into a new
@Acconut Sorry for the long delay. The improvements seem reasonable to me, though I'm still concerned (and possibly confused) about what happens if a part fails to upload when we're making API calls in parallel. For example, if your upload has three parts (1, 2, 3) and 1 and 3 succeed but 2 fails, what happens? My understanding was that the offset calculation (e.g. at the bottom of

We ran tusd in production for a while with a similar parallel part upload mechanism, a prototype of the code in #402, and there were occasional random failures that seemed to be caused by failed parts (not certain of this, but it seemed likely). That's why I didn't use parallelism for requests to S3 in #402. My understanding might not be correct, though, and I'd love to see this improvement in tusd. I hope this context helps. :) Thanks for your continued work on this project.
Thanks for the feedback. Your concerns are justified :) Right now, such a situation would lead to a corrupted upload, but I think we can solve this in the future by storing a list of usable parts on S3, instead of always fetching a list of all uploaded parts and treating every one of them as usable (i.e. if we upload parts 1, 2, 3 and 2 fails, only 1 would be considered usable). That should make sure the upload never ends up in a corrupted state.

However, I first wanted to measure the real-world impact of this PR, so I merged it into a v2 branch, which lets us test the prerelease without affecting users on the mainline version. That being said, we haven't experienced problems with uploading parts to S3 in production, so this was not a pressing issue for us. So, all in all, this should be improved in the future :)
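To make the "usable parts" idea concrete, here is a minimal Go sketch. All names (`usableParts`, `markUsable`, `offset`) are hypothetical, not the actual S3Store API; the real store would persist this record alongside the upload's metadata on S3:

```go
package main

import "sort"

// usableParts is a hypothetical record of parts whose upload is known
// to have succeeded. Instead of listing all parts on S3 and treating
// every one as usable, the store would persist this list and derive
// the upload offset only from confirmed parts.
type usableParts struct {
	sizes map[int]int64 // part number -> size in bytes
}

// markUsable records a part only after its upload has succeeded.
func (u *usableParts) markUsable(partNumber int, size int64) {
	if u.sizes == nil {
		u.sizes = make(map[int]int64)
	}
	u.sizes[partNumber] = size
}

// offset sums the sizes of the longest contiguous run of usable parts
// starting at part 1. If parts 1 and 3 succeeded but 2 failed, only
// part 1 counts, so the upload can resume after part 1 instead of
// ending up corrupted.
func (u *usableParts) offset() int64 {
	nums := make([]int, 0, len(u.sizes))
	for n := range u.sizes {
		nums = append(nums, n)
	}
	sort.Ints(nums)
	var off int64
	next := 1
	for _, n := range nums {
		if n != next {
			break
		}
		off += u.sizes[n]
		next++
	}
	return off
}
```

With this, a failed part 2 simply never gets marked usable, so the reported offset stops at the end of part 1 and the client re-sends from there.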
This PR attempts to improve the performance (especially the throughput) of uploads using the S3Store. It does this through the following changes:

- The `GetInfo` call is now cached, so a `WriteChunk` and `FinishUpload` call do not have to repeat the calls to list the parts on S3 etc.
- The requests inside `GetInfo` are now executed concurrently instead of sequentially to reduce total time.
- `UploadPart` calls inside `WriteChunk` are now executed concurrently instead of sequentially. This allows better utilization of the available throughput to S3. The concurrency is limited by a semaphore so as not to drain the system of resources.

Remaining tasks:
@acj I remember that you spent some time in the past working on the S3Store, so I figured you might be interested in these changes. Let me know if you have any feedback :)