
s3store: Performance improvements #478

Merged
merged 17 commits into v2 from feat/parallel-parts on May 18, 2021
Conversation

Acconut
Member

@Acconut Acconut commented May 16, 2021

This PR attempts to improve the performance (especially the throughput) of uploads using the S3Store. It does this with the following changes:

  • The information retrieved during a GetInfo call is now cached, so that WriteChunk and FinishUpload calls do not have to repeat the calls to list the parts on S3 etc.
  • The calls to S3 for GetInfo are now executed concurrently instead of sequentially to reduce the total time.
  • The UploadPart calls inside WriteChunk are now executed concurrently instead of sequentially. This allows us to better utilize the available throughput to S3. The concurrency is limited by a semaphore so that the system is not drained of resources (see the sketch after this list).
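A minimal sketch of the semaphore-limited concurrency described in the last point, assuming a hypothetical uploadPart helper and a made-up concurrency limit rather than the actual S3Store API (a buffered channel serves as the counting semaphore):

```go
package main

import (
	"fmt"
	"sync"
)

// uploadPart is a hypothetical stand-in for the S3 UploadPart call.
func uploadPart(partNumber int, data []byte) error {
	fmt.Printf("uploading part %d (%d bytes)\n", partNumber, len(data))
	return nil
}

func uploadParts(parts [][]byte) error {
	// A buffered channel acts as a counting semaphore so that at most
	// maxConcurrency uploads are in flight at any time.
	const maxConcurrency = 10
	sem := make(chan struct{}, maxConcurrency)

	var wg sync.WaitGroup
	var mu sync.Mutex
	var firstErr error

	for i, data := range parts {
		sem <- struct{}{} // acquire a slot; blocks once the limit is reached
		wg.Add(1)
		go func(partNumber int, data []byte) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot

			if err := uploadPart(partNumber, data); err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
			}
		}(i+1, data)
	}

	wg.Wait()
	return firstErr
}

func main() {
	parts := [][]byte{make([]byte, 5<<20), make([]byte, 5<<20), make([]byte, 3<<20)}
	if err := uploadParts(parts); err != nil {
		fmt.Println("upload failed:", err)
	}
}
```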

Remaining tasks:

  • Address TODO comments
  • Implement more resilient error handling (in another PR)
  • Make pre-release for testing

⚠️ This PR is best released in a new major version because it changes the behavior of the S3 store quite significantly.

@acj I remember that you spent some time in the past working on the S3Store, so I figured you might be interested in these changes. Let me know if you have any feedback :)

@Acconut Acconut marked this pull request as ready for review May 18, 2021 08:23
@Acconut Acconut changed the base branch from master to v2 May 18, 2021 08:27
@Acconut
Member Author

Acconut commented May 18, 2021

I am going to merge this into a new v2 branch which will be used as the base for working on the next major release.

@Acconut Acconut merged commit 8fd1836 into v2 May 18, 2021
@Acconut Acconut deleted the feat/parallel-parts branch May 18, 2021 08:29
@acj
Contributor

acj commented Jul 6, 2021

@Acconut Sorry for the long delay. The improvements seem reasonable to me, though I'm still concerned (and possibly confused) about what happens if a part fails to upload when we're making API calls in parallel. For example, if your upload has three parts (1, 2, 3) and 1 and 3 succeed but 2 fails, what happens? My understanding was that the offset calculation (e.g. at the bottom of fetchInfo) assumed that all successfully-uploaded parts were consecutive, which isn't true in my example, and that the upload would be broken at that point.
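To illustrate the concern, here is a hypothetical sketch of an offset calculation that simply sums the sizes of all parts S3 lists; the part struct and sizes are invented for the example and do not reflect the actual fetchInfo code:

```go
package main

import "fmt"

// part mirrors, for this sketch only, the information a ListParts call returns.
type part struct {
	number int
	size   int64
}

func main() {
	// Parts 1 and 3 uploaded successfully; part 2 failed and is not listed.
	listed := []part{{number: 1, size: 5 << 20}, {number: 3, size: 5 << 20}}

	// Summing every listed part reports 10 MiB, even though only the first
	// 5 MiB are contiguous, so a client resuming at this offset would leave
	// a hole in the object.
	var offset int64
	for _, p := range listed {
		offset += p.size
	}
	fmt.Println("summed offset:", offset)
}
```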

We ran tusd in production for a while with a similar parallel part upload mechanism, a prototype of the code in #402, and there were occasional random failures that seemed to be caused by failed parts (not certain of this, but it seemed likely). That's why I didn't use parallelism for requests to S3 in #402.

My understanding might not be correct, though, and I'd love to see this improvement in tusd. I hope this context helps. :)

Thanks for your continued work on this project.

@Acconut
Member Author

Acconut commented Jul 9, 2021

Thanks for the feedback. Your concerns are justified :) Right now, such a situation would lead to a corrupted upload, but I think we can solve this in the future by storing a list of usable parts on S3, instead of always fetching a list of all uploaded parts and treating every one of them as usable (i.e. if we upload parts 1, 2, 3 and part 2 fails, only part 1 would be considered usable). That should make sure the upload never ends up in a corrupted state.
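A rough sketch of that future idea, with entirely hypothetical field and function names and the simplifying assumption of a fixed part size: the upload's info object would record only the contiguous prefix of successfully uploaded parts, so a failed middle part never counts toward the offset.

```go
package main

import "fmt"

// FileInfo is a hypothetical stand-in for the upload's info object stored on
// S3; the field names are made up for this sketch, not the current tusd model.
type FileInfo struct {
	ID     string
	Offset int64
	// UsableParts records only the contiguous prefix of successfully
	// uploaded parts, so a failed middle part is never counted.
	UsableParts []int
}

// markUsable extends the usable prefix with newly completed part numbers
// (sorted ascending). A gap, e.g. part 2 failing while part 3 succeeded,
// stops the offset from advancing past the failure. A fixed part size is
// assumed purely to keep the sketch short.
func (fi *FileInfo) markUsable(completed []int, partSize int64) {
	next := len(fi.UsableParts) + 1
	for _, p := range completed {
		if p != next {
			break
		}
		fi.UsableParts = append(fi.UsableParts, p)
		fi.Offset += partSize
		next++
	}
}

func main() {
	fi := &FileInfo{ID: "example-upload"}
	// Parts 1 and 3 finished, part 2 failed: only part 1 becomes usable.
	fi.markUsable([]int{1, 3}, 5<<20)
	fmt.Printf("usable parts: %v, offset: %d\n", fi.UsableParts, fi.Offset)
}
```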

However, I first wanted to measure the real-world impact of this PR, which is why I merged it into the v2 branch: there we can test the pre-release without affecting users on the mainline version. That being said, we haven't experienced problems with uploading parts to S3 in production, so this was not a pressing issue for us.

So, all in all, this should be improved in the future :)
