
When using deferred length with S3 compatible service, last part of multi-upload is ignored #396

Closed
ecoessentials opened this issue Jun 16, 2020 · 16 comments · Fixed by #462

@ecoessentials

Describe the bug
I use tusd with an S3-compatible service (Scaleway). When I upload a file in chunks with Upload-Defer-Length set to 1, the last part is not concatenated with the previous parts.

See the following screenshot, obtained with a chunk size of 5 MB.

[screenshot: s3-multi-upload]

Here is the content of .info file:

{
 "SizeIsDeferred": false,
 "PartialUploads": null,
 "Storage": {
  "Key": "d484497fc33fbd87bf6d698a01a14e69",
  "Bucket": "essentials-test",
  "Type": "s3store"
 },
 "Offset": 10485760,
 "IsFinal": false,
 "ID": "d484497fc33fbd87bf6d698a01a14e69+NmQxZTY2NDUtYWQxYi00ZWMyLThlMmMtN2M4OThiZDU2Zjli",
 "IsPartial": false,
 "Size": 10691054,
 "MetaData": {}
}

Setup details

  • Operating System: Linux
  • Used tusd version: 1.3.0 docker
  • Used tusd data storage: AWS S3
  • Used tusd configuration: -s3-endpoint https://s3.fr-par.scw.cloud -s3-bucket essentials-test --hooks-dir /srv/tusd-hooks
  • Used tus client library: tus-js-client
@Acconut
Member

Acconut commented Jun 19, 2020

Interesting situation. Can you share the client code? Can you confirm that one of the PATCH requests includes the Upload-Length header? Otherwise, tusd does not know that the upload is complete and will not attach the last part.

@ecoessentials
Author

I confirm that the last PATCH request contains the Upload-Length header: I had already checked, and in any case it works correctly when I use the file store.

I did some tests and I found that for the last chunk containing the Upload-Length header:

  • before the call to upload.WriteChunk at line 642 of pkg/handler/unrouted_handler.go, the info variable is correctly updated (in particular, info.SizeIsDeferred is false)
  • when the WriteChunk function in file pkg/s3store/s3store.go is called, the info variable doesn't contain the correct data (in particular info.SizeIsDeferred is true).

Since I use 5 MB chunks, I worked around the problem by removing the check on the chunk length (lines 351-352). That is fine for my use case, but it is obviously not a proper fix.

I hope this will help you.

@Acconut
Member

Acconut commented Jun 19, 2020

Thank you very much for the details! They definitely help a lot!
Would you be interested in opening a PR to fix this issue?

@ecoessentials
Author

My workaround works for me but does not fix the issue, because the check I removed is necessary when the chunk size is less than 5 MB.

@ashitikov

I have the same problem with a single small chunk (< 5 MB). After the upload, the S3 bucket contains three objects:

  1. hash-named-file - 0 B
  2. hash-named-file.info - 400-500 B
  3. hash-named-file.part - 3.5 MB

@Acconut
Member

Acconut commented Jan 3, 2021

Thanks for reporting! I will try to look into it soon but I cannot promise anything. Would you be able to look into it as well?

@ashitikov

> Thanks for reporting! I will try to look into it soon but I cannot promise anything. Would you be able to look into it as well?

Sorry, I'm not very experienced with Go, but I could test a branch for you.

@ashitikov

Any progress on that? ;)

@Acconut
Member

Acconut commented Feb 10, 2021

I have identified the root cause, but unfortunately the patch is a bit more complex and I am short on time right now. I hope to be able to work on this in about two weeks.

@acj
Contributor

acj commented Feb 16, 2021

@Acconut I'm running into this as well after upgrading one of our services from pre-1.x. Can you share what the root cause is?

If it would be helpful, I'm willing to work with you on preparing a fix, testing, etc.

@Acconut
Member

Acconut commented Feb 18, 2021

@acj Sure, I would love any help here!

The problem is that the last chunk of an upload with a deferred length is not stored directly as a part of the multipart upload on S3. Instead, we store it as a separate object with a .part extension (we call this an incomplete part). So when we tell S3 to complete the multipart upload, it does not take the incomplete part into consideration, because it is not part of the multipart upload.

A solution, I think, would be to ensure inside the FinishUpload method of the s3store that any remaining incomplete part is copied to a part of the multipart upload. This way, the data will not get lost when completing the multipart upload.

I hope you are able to understand what I mean and that you are not confused with all the different parts :)

@acj
Contributor

acj commented Feb 18, 2021

@Acconut Yep, I remember from when we worked on it together a few years ago 😄 (#219)

We've been depending on deferred-length uploads to S3 in production, so the feature was working in a previous version of tusd. I'm not sure when it got broken.

I investigated this a bit today, and I think the problem is that the s3Upload struct's FileInfo field gets out of sync (specifically Size and SizeIsDeferred) when we call DeclareLength here:

lengthDeclarableUpload := handler.composer.LengthDeferrer.AsLengthDeclarableUpload(upload)
if err := lengthDeclarableUpload.DeclareLength(ctx, uploadLength); err != nil {
	handler.sendError(w, r, err)
	return
}
info.Size = uploadLength
info.SizeIsDeferred = false

We update info with the correct values there, but they aren't propagated to the upload struct itself. That confirms the behavior that @ecoessentials mentioned earlier (which was a very helpful tip).

I noticed that DeclareLength's receiver is a value type. If I modify it to be a pointer (to match the semantics of WriteChunk) like so:

-func (upload s3Upload) DeclareLength(ctx context.Context, length int64) error {
+func (upload *s3Upload) DeclareLength(ctx context.Context, length int64) error {

... then everything works as expected. The upload completes, and the S3 object has the correct bytes.

So, it seems like a subtle interaction between UnroutedHandler and S3Store. If my fix above seems reasonable, I'll open a PR for it. Do you have any suggestions for how to capture this in a test?

@acj
Contributor

acj commented Mar 4, 2021

@Acconut ping

@Acconut
Member

Acconut commented Mar 11, 2021

Thank you for the investigation, @acj! I think you are correct but we likely have two different bugs in here, depending on whether the request with the Upload-Length header also includes a body:

  1. If we include the Upload-Length header in the last PATCH request alongside a body, the problem is that the s3Upload struct is not properly updated, as you mentioned. Changing the signature of the DeclareLength function likely fixes that (amazing finding 👍). To prevent this in the future, we should extend the test for it to also include a GetInfo call after the DeclareLength call and assert that the size is properly persisted:
    func TestDeclareLength(t *testing.T) {
    Proper end-to-end tests would also help, but we have nothing like that in tusd yet.
  2. If the Upload-Length header is sent in a separate PATCH request, i.e. one without a request body, we have the problem that I described in my original comment: the WriteChunk function from a previous PATCH request might create an incomplete part object on S3, because the upload size was not yet declared when the function was called. Then, in the final PATCH request, DeclareLength and FinishUpload are called, but the incomplete part object is never pushed into the multipart upload, so it is not included in the final object. Does that make sense? :)

I can work on the second issue, but it would be great if you could open a PR for the first problem, @acj!

@acj
Contributor

acj commented Mar 11, 2021

@Acconut PR opened 👍

Thanks for the additional details. I think I understand your reasoning in the second point, but I can't reproduce that behavior after fixing the first issue.

Here's an example from uploading a small (~1 MB) file to a tusd instance with default config running the code from #462:

# POST
> POST /files/ HTTP/1.1
> Host: 0.0.0.0:1080
> User-Agent: curl/7.64.1
> Accept: */*
> Tus-Resumable: 1.0.0
> Content-Length: 0
> Upload-Defer-Length: 1
< HTTP/1.1 201 Created
< Location: http://0.0.0.0:1080/files/c1054c4875d2885460492725463f84ce+bfaf0c97-ff23-4e1b-a50a-620220997406
< Tus-Resumable: 1.0.0
< X-Content-Type-Options: nosniff
< Date: Thu, 11 Mar 2021 15:23:25 GMT
< Content-Length: 0

# PATCH with the entire file contents in the request body, but no upload-length header. Creates incomplete object.
> PATCH /files/c1054c4875d2885460492725463f84ce+bfaf0c97-ff23-4e1b-a50a-620220997406 HTTP/1.1
> Host: 0.0.0.0:1080
> User-Agent: curl/7.64.1
> Accept: */*
> Tus-Resumable: 1.0.0
> Upload-Offset: 0
> Content-Length:   995459
> Content-Type: application/offset+octet-stream
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
< HTTP/1.1 204 No Content
< Tus-Resumable: 1.0.0
< Upload-Offset: 995459
< X-Content-Type-Options: nosniff
< Date: Thu, 11 Mar 2021 15:23:25 GMT

# PATCH declaring the final size and clearing Upload-Defer-Length, but no request body
> PATCH /files/c1054c4875d2885460492725463f84ce+bfaf0c97-ff23-4e1b-a50a-620220997406 HTTP/1.1
> Host: 0.0.0.0:1080
> User-Agent: curl/7.64.1
> Accept: */*
> Tus-Resumable: 1.0.0
> Upload-Defer-Length: 0
> Upload-Offset:   995459
> Upload-Length:   995459
> Content-Length: 0
> Content-Type: application/offset+octet-stream
>
< HTTP/1.1 204 No Content
< Tus-Resumable: 1.0.0
< Upload-Offset: 995459
< X-Content-Type-Options: nosniff
< Date: Thu, 11 Mar 2021 15:23:25 GMT

# From the tusd logs, we see that the upload has completed with the expected size
[tusd] 2021/03/11 10:23:25 event="UploadFinished" id="c1054c4875d2885460492725463f84ce+bfaf0c97-ff23-4e1b-a50a-620220997406" size="995459"

I've tried larger files as well and see the same behavior. The upload completes as expected, and there's no leftover .info file in S3.

@Acconut
Member

Acconut commented Mar 11, 2021

> PR opened 👍

❤️

> I can't reproduce that behavior after fixing the first issue.

Oh, I see. I thought that S3Store.WriteChunk would not be called for an empty body, but that's not the case. So, you are completely right and your fix patched everything 🎉 Thank you very much!

This issue was closed.