
When using S3 storage sometimes files are shown as uploaded but a zero byte object is stored #497

Open
OKaluza opened this issue Aug 4, 2021 · 11 comments

OKaluza commented Aug 4, 2021

Describe the bug
I'm not sure if this is an issue with Uppy or tusd. I have a tusd server connected to an S3-compatible store (on OpenStack Swift) that I'm uploading files to with the Uppy React Dashboard element.
Everything works well most of the time, but when uploading large numbers of files (e.g. 907 JPG images in my test case), all of the files are reported as uploaded successfully and all appear as entries in the S3 bucket, yet some of the objects are zero bytes (~5 out of 907 images in this case).

The metadata stored for the invalid images has the correct file sizes and the previews appear fine in Uppy.

It is then impossible to upload these images again without manually deleting them from the bucket, as the existing uploads are always treated as valid, even though the data is missing.
If I disable S3 storage, all the files upload fine.

To Reproduce
Upload a large number of images (hundreds) from the Uppy React Dashboard to an S3-backed tusd server.

Expected behavior
All the images should be stored in the S3 bucket.

Setup details

I am using this helm chart to run the server https://github.com/sagikazarmark/helm-charts/tree/master/charts/tusd

  • Operating System: Linux
  • Used tusd version: I've tried both the default "v1.6.0" and "v2.0.0-rc4" tusd Docker images.
  • Used tusd data storage: AWS S3 interface to OpenStack Swift
  • Used tusd configuration: -s3-bucket xxx -s3-endpoint yyy
  • Used tus client library: Uppy.js
OKaluza added the bug label Aug 4, 2021
Acconut (Member) commented Aug 10, 2021

Interesting report, thank you! My first guess is that OpenStack Swift is behaving differently from S3 and confusing tusd. Can you please record the tusd logs for all uploads and send them over? If the file is too large for GitHub, you can also mail it to marius@transloadit.com.

> It is then impossible to upload these images again without manually deleting them from the bucket, as it always sees the existing uploads as valid, even though the data is missing.

Yes, that is because tus-js-client and Uppy store the upload URL associated with your files in localStorage. Either cleaning localStorage or setting resume: false for Uppy's tus plugin should help there.

OKaluza (Author) commented Aug 11, 2021

Thanks @Acconut, I tried to find a log file but it doesn't seem to exist. Where is it kept?
Do I need to pass a flag to enable logging? I tried adding "-verbose", but it didn't seem to make a difference; according to https://github.com/tus/tusd/blob/master/docs/usage-binary.md it is on by default anyway.

Acconut (Member) commented Oct 17, 2021

Apologies for my late reply, @OKaluza. I was out of the office for some time. Let me know if this behavior is still occurring.

> I tried to find a log file but it doesn't seem to be there, where is it kept?

tusd does not create a log file but simply writes its output to stdout/stderr. It is then the user's responsibility to store the output in a file if desired. We decided not to include file-logging functionality in tusd to reduce complexity.

OKaluza (Author) commented Oct 20, 2021

Hi @Acconut, no worries. I stopped using the S3 interface as I couldn't find enough info to determine whether it was an issue with our storage backend or with tusd, and I was unable to make it 100% reliable.
Thanks for the info, I did end up realising the logs all go to stdout/stderr.
I'll reopen this if I get a chance to test it again on the OpenStack S3 interface.

OKaluza closed this as completed Oct 20, 2021
OKaluza (Author) commented May 12, 2023

Hi @Acconut, I am trying to use the S3 interface again and am running into the same issue.
An example is this upload: https://tusd.dev.asdc.cloud.edu.au/files/004dd2ebad8322404987399b7922a280+MzJmYjVmOTgtZjk3Mi00YzhlLWFkNjItYmYzYzEyOThhMmY5
If I look at the file in our object storage it is zero bytes, although the correct file size is listed in the .info file. If I request the URL with wget, it responds 200 and attempts to download, but the transfer fails and keeps retrying in a loop.
Here are the logs: https://gist.github.com/OKaluza/30ef62c77935ccdc65d80e6242f6baed. I can't see any issues from when the file was uploaded; everything seemed to go fine.

OKaluza reopened this May 12, 2023
Acconut (Member) commented May 12, 2023

The logs contain multiple errors of the same type:

[tusd] 2023/05/11 00:47:44.977929 event="ResponseOutgoing" status="500" method="PATCH" path="1dd661d49f35c3947a46654e48952ef2+MjY2ZjljYmQtYWNjYi00ZTZkLWI5ZDItODMyMTQ2MTcwMjcx" error="EntityTooSmall: Your proposed upload is smaller than the minimum allowed object size.
status code: 400, request id: tx85a5a2e141b242a3811ab-00645c3b2f, host id: tx85a5a2e141b242a3811ab-00645c3b2f" requestId="bc11b2865c89a4cea657fa44d4f16ed7"

Each upload to S3 is split into multiple parts where each part is uploaded on its own (called S3 Multipart Upload). Each part must meet a minimum part size requirement. For AWS S3, this is 5MB. But in this case a few parts did not meet this minimum. Maybe OpenStack has a different minimum size? Did you change any configuration of tusd?
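To make that rule concrete, here is a small illustrative check (a sketch only, not tusd's code; the function name is hypothetical) of the AWS constraint: every part except the last must be at least 5 MiB, so a CompleteMultipartUpload containing an undersized non-final part is rejected with EntityTooSmall.

```python
# Illustrative sketch only (not tusd's code): S3 rejects a multipart
# completion with EntityTooSmall if any part other than the last one is
# below the minimum part size (5 MiB on AWS S3).
MIN_PART_SIZE = 5 * 1024 * 1024

def undersized_parts(part_sizes):
    """Return the 1-based numbers of parts that violate the minimum size.
    The final part is allowed to be smaller than the minimum."""
    return [
        number
        for number, size in enumerate(part_sizes, start=1)
        if number != len(part_sizes) and size < MIN_PART_SIZE
    ]

# 5 MiB + 5 MiB + 2 MiB is fine: only the last part is under the minimum.
print(undersized_parts([5 << 20, 5 << 20, 2 << 20]))  # []

# A zero-byte part in a non-final position would be rejected.
print(undersized_parts([0, 5 << 20]))  # [1]
```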

Also, I am noticing multiple PATCH requests for the upload 1dd661d49f35c3947a46654e48952ef2. That looks suspicious.

OKaluza (Author) commented May 17, 2023

Thanks for looking at it. I haven't changed anything in the tusd config apart from setting s3-bucket, s3-endpoint, and behind-proxy.

Is it worth playing with the "partSize" option?

OpenStack should have the same 5 MB minimum; it aims to be completely compatible with the basic S3 interface, and I haven't had any issues using it with other applications that support S3, or in Python with boto3 etc.
I thought it might only be occurring with files whose total size was < 5 MB, but it appears some are under and some are over 5 MB. The frequency is quite low: at most ~5% of the uploads end up as zero-byte objects.

My attempts to debug the issue today haven't turned up anything very helpful. There are no 500 errors, or any obvious errors in the logs at all this time; the uploads appear to be fine. However, there are 204 No Content responses when accessing the uploads that are stored as zero bytes. For example, this upload:

$ wget https://tusd.dev.asdc.cloud.edu.au/files/9eec7f3c271cb983cc4f883c97289f23+MTc5YjIxZjktMDhjYi00OGZjLWI4MjItYTkyYjFhZDE1NWUz
--2023-05-17 15:46:15--  https://tusd.dev.asdc.cloud.edu.au/files/9eec7f3c271cb983cc4f883c97289f23+MTc5YjIxZjktMDhjYi00OGZjLWI4MjItYTkyYjFhZDE1NWUz
Resolving tusd.dev.asdc.cloud.edu.au (tusd.dev.asdc.cloud.edu.au)... 118.138.249.245
Connecting to tusd.dev.asdc.cloud.edu.au (tusd.dev.asdc.cloud.edu.au)|118.138.249.245|:443... connected.
HTTP request sent, awaiting response... 204 No Content

The upload log entries:

[tusd] 2023/05/17 05:18:00.848108 event="UploadCreated" id="9eec7f3c271cb983cc4f883c97289f23+MTc5YjIxZjktMDhjYi00OGZjLWI4MjItYTkyYjFhZDE1NWUz" size="3744012" url="https://tusd.dev.asdc.cloud.edu.au/files/9eec7f3c271cb983cc4f883c97289f23+MTc5YjIxZjktMDhjYi00OGZjLWI4MjItYTkyYjFhZDE1NWUz" 
[tusd] 2023/05/17 05:18:01.240267 event="RequestIncoming" method="PATCH" path="9eec7f3c271cb983cc4f883c97289f23+MTc5YjIxZjktMDhjYi00OGZjLWI4MjItYTkyYjFhZDE1NWUz" requestId="b10a374902da0f56c499fd63e46f7115" 
[tusd] 2023/05/17 05:18:01.408458 event="ChunkWriteStart" id="9eec7f3c271cb983cc4f883c97289f23+MTc5YjIxZjktMDhjYi00OGZjLWI4MjItYTkyYjFhZDE1NWUz" maxSize="3744012" offset="0" 
[tusd] 2023/05/17 05:18:02.919309 event="ChunkWriteComplete" id="9eec7f3c271cb983cc4f883c97289f23+MTc5YjIxZjktMDhjYi00OGZjLWI4MjItYTkyYjFhZDE1NWUz" bytesWritten="3744012" 
[tusd] 2023/05/17 05:18:04.528701 event="ResponseOutgoing" status="204" method="PATCH" path="9eec7f3c271cb983cc4f883c97289f23+MTc5YjIxZjktMDhjYi00OGZjLWI4MjItYTkyYjFhZDE1NWUz" requestId="b10a374902da0f56c499fd63e46f7115" 
[tusd] 2023/05/17 05:18:04.528723 event="UploadFinished" id="9eec7f3c271cb983cc4f883c97289f23+MTc5YjIxZjktMDhjYi00OGZjLWI4MjItYTkyYjFhZDE1NWUz" size="3744012" 

The download log entries:

[tusd] 2023/05/17 05:18:34.182769 event="RequestIncoming" method="GET" path="9eec7f3c271cb983cc4f883c97289f23+MTc5YjIxZjktMDhjYi00OGZjLWI4MjItYTkyYjFhZDE1NWUz" requestId="ab0d0994b9ab05eb1597c387d80afdbb" 
[tusd] 2023/05/17 05:18:34.518524 event="ResponseOutgoing" status="204" method="GET" path="9eec7f3c271cb983cc4f883c97289f23+MTc5YjIxZjktMDhjYi00OGZjLWI4MjItYTkyYjFhZDE1NWUz" requestId="ab0d0994b9ab05eb1597c387d80afdbb" 

The 9eec7f3c271cb983cc4f883c97289f23.info file:

{"ID":"9eec7f3c271cb983cc4f883c97289f23+MTc5YjIxZjktMDhjYi00OGZjLWI4MjItYTkyYjFhZDE1NWUz","Size":3744012,"SizeIsDeferred":false,"Offset":0,"MetaData":{"filename":"DSC_0425.JPG","filetype":"image/jpeg","name":"DSC_0425.JPG","relativePath":"null","type":"image/jpeg"},"IsPartial":false,"IsFinal":false,"PartialUploads":null,"Storage":{"Bucket":"asdc-uploads","Key":"9eec7f3c271cb983cc4f883c97289f23","Type":"s3store"}}

dylanmcculloch commented

Hi @Acconut, I help administer the OpenStack Swift cluster that @OKaluza is using.
We've reproduced the issue in our test environment.

  • Operating System: Linux
  • Used tusd version: built from git main
  • Used tusd data storage: AWS S3 interface to OpenStack Swift
  • Used tusd configuration: -s3-bucket xxx -s3-endpoint yyy -hooks-dir=./examples/hooks/
  • Used tus client library: tuspy 1.0.0

For an upload that results in a 0 byte object, it looks like the tusd/aws client initially sends the correct object to Swift/S3, but then almost immediately overwrites it with a 0 byte object.

The main issue appears to be that after the tusd/aws client sends the UploadPart request, it immediately performs a subsequent ListParts call.
It looks like there's a race condition due to the "eventually consistent" nature of OpenStack Swift/S3: the response from the ListParts call sometimes does not include the part that was just sent. In that circumstance tusd performs another upload, but at this point it only sends a 0 byte object.
I patched tusd to enable AWS client debug logging [1], which revealed the following.

The normal good sequence is:
s3/CreateMultipartUpload
s3/PutObject
s3/GetObject
s3/ListParts
s3/GetObject
s3/ListParts
s3/GetObject
s3/UploadPart
s3/ListParts <---- response includes both PartNumber 0 & 1
s3/CompleteMultipartUpload

But for cases in which we see the problematic 0 byte objects we see the sequence:
s3/CreateMultipartUpload
s3/PutObject
s3/GetObject
s3/ListParts
s3/GetObject
s3/ListParts
s3/GetObject
s3/UploadPart
s3/ListParts <---- response only includes PartNumber 0, does not include PartNumber 1
s3/UploadPart <---- extra-step, uploads 0 byte file for PartNumber=1
s3/CompleteMultipartUpload

For that extra step, the UploadPart request has:
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 (SHA-256 of a 0 byte file)
And the response shows:
ETag: "d41d8cd98f00b204e9800998ecf8427e" (MD5 of a 0 byte file)
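Those two digests are indeed the well-known hashes of empty input, confirming that the extra UploadPart carried zero bytes; this can be checked in a couple of lines:

```python
import hashlib

# The digests seen in the extra UploadPart request/response match empty input.
print(hashlib.sha256(b"").hexdigest())
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
print(hashlib.md5(b"").hexdigest())
# d41d8cd98f00b204e9800998ecf8427e
```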

I've tested adding a 1 second sleep before the ListParts call [2] and I haven't been able to reproduce the issue, which seems to support the race-condition theory.
This appears to be the relevant code that sends an empty part when the bug is triggered [3]:

if len(parts) == 0 {
	// AWS expects at least one part to be present when completing the multipart
	// upload. So if the tus upload has a size of 0, we create an empty part
	// and use that for completing the multipart upload.

I'm assuming this issue wouldn't be apparent when using AWS itself as a backend as their S3 is now strongly consistent. [4]
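The race can be modelled in a few lines. This is a toy simulation under simplifying assumptions (the class and function names are hypothetical, not tusd's): a write is durable immediately, but ListParts serves a stale view until the store catches up, so the len(parts) == 0 branch quoted above fires even though a real part exists.

```python
# Toy model of the ListParts race (hypothetical names, not tusd code).
class EventuallyConsistentStore:
    def __init__(self):
        self.parts = {}        # part number -> stored bytes (durable view)
        self.visible = set()   # what ListParts currently reports (stale view)

    def upload_part(self, number, data):
        self.parts[number] = data       # the write itself succeeds right away

    def list_parts(self):
        return sorted(self.visible)     # may lag behind self.parts

    def propagate(self):
        self.visible = set(self.parts)  # consistency catches up later

def complete_upload(store):
    # Mirrors the quoted tusd branch: if ListParts reports no parts,
    # upload an empty part before completing the multipart upload.
    if len(store.list_parts()) == 0:
        store.upload_part(1, b"")       # clobbers the real part 1

store = EventuallyConsistentStore()
store.upload_part(1, b"x" * 1000)       # the real data is stored...
complete_upload(store)                  # ...but a stale ListParts sees nothing
store.propagate()
print(len(store.parts[1]))              # 0 -> the zero-byte object
```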

I've attached a couple of logs:

  • a good upload sequence [5]
  • a bad upload sequence that results in a 0 byte object [6]

[1] https://gist.github.com/dylanmcculloch/41f3e0e0254c990e39d0abfc6137a4bb#file-tusd_aws_debug-patch
[2] https://gist.github.com/dylanmcculloch/809e812e223c5bd3c05b7429cd1d645c#file-tusd_sleep_before_listallparts-patch
[3] tusd/pkg/s3store/s3store.go, lines 648 to 651 in 318aab4 (the snippet quoted above)

[4] https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-read-after-write-consistency/
[5] https://gist.github.com/dylanmcculloch/a07b09568b5870f6bd8dde7c47a5d41e#file-tusd_s3_swift_debug_good-log
[6] https://gist.github.com/dylanmcculloch/a72ecaf9088029f6cedc2a558f429d1f#file-tusd_s3_swift_debug_bad-log

Acconut (Member) commented Jun 2, 2023

Wow, thank you very much for this investigation! Given the eventual consistency of ListParts on OpenStack, the error now makes sense. tusd's reliance on the consistency of ListParts was questioned in the past (see #204), but since we never ran into issues with AWS S3, we didn't see a reason to drop ListParts. Now I am wondering whether other issues reported to us might be caused by this reliance as well.

I would expect this to get better in the v2 release, because we cache ListParts results there. Instead of calling ListParts whenever we need a list of parts, we fetch the list at the beginning of the request and then update and use our internal cache for everything else. So if you upload your file in one PATCH request, there should not be a race between UploadPart and ListParts. Could you try one of the latest release candidates for v2?
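The caching strategy described here can be sketched as follows (a simplified illustration with hypothetical names, not tusd's actual v2 code): the part list is fetched once at the start of the request and then maintained locally, so a stale ListParts can no longer hide a part that the same request just uploaded.

```python
# Simplified sketch of per-request part-list caching (hypothetical names).
class StaleStore:
    """Backend whose ListParts is maximally stale: it never shows new parts."""
    def __init__(self):
        self.parts = {}

    def upload_part(self, number, data):
        self.parts[number] = data

    def list_parts(self):
        return []               # simulates eventual consistency at its worst

class CachedPartList:
    def __init__(self, store):
        self.store = store
        self.cached = list(store.list_parts())  # fetched once, at request start

    def upload_part(self, number, data):
        self.store.upload_part(number, data)
        self.cached.append(number)  # update the cache instead of re-listing

    def list_parts(self):
        return sorted(self.cached)

store = StaleStore()
request = CachedPartList(store)
request.upload_part(1, b"data")
print(request.list_parts())     # [1], even though store.list_parts() is stale
```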

OKaluza (Author) commented Jun 5, 2023

Thanks heaps @dylanmcculloch for debugging and updating this!

@Acconut I've tried v2.0.0-rc19 and have not seen the issue so far; I'll keep testing. Is there anything we should be aware of when using v2? Is it production-ready?

Acconut (Member) commented Jun 5, 2023

That's great to hear! v2 incorporates many changes, but we have also been running it in production for some time without issues, so it should be good. We hope to release it properly soon!
