When using S3 storage sometimes files are shown as uploaded but a zero byte object is stored #497
Interesting report, thank you! My first guess is that maybe OpenStack Swift is behaving differently than S3 and confusing tusd. Can you please record the tusd logs for all uploads and send them over? If the file is too large for GitHub, you can also mail it to marius@transloadit.com.
Yes, that is because tus-js-client and Uppy store the upload URL associated with your files in localStorage. Either clearing localStorage or setting
Thanks @Acconut, I tried to find a log file but it doesn't seem to be there. Where is it kept?
Apologies for my late reply, @OKaluza. I was out of the office for some time. Let me know if this behavior is still occurring.
tusd does not create a log file but simply writes its output to stdout/stderr. It is then the user's responsibility to store that output in a file if they wish. We have decided not to include file logging functionality in tusd to reduce complexity.
Hi @Acconut, no worries. I stopped using the S3 interface as I couldn't find enough information to debug whether it was an issue with our storage backend or with tusd, and I was unable to make it 100% reliable.
Hi @Acconut, I am trying to use the S3 interface again and am running into the same issue.
The logs contain multiple errors of the same type:
Each upload to S3 is split into multiple parts, where each part is uploaded on its own (this is called an S3 Multipart Upload). Each part must meet a minimum part size requirement; for AWS S3, this is 5MB. But in this case a few parts did not meet this minimum. Maybe OpenStack has a different minimum size? Did you change any configuration of tusd? Also, I am noticing multiple PATCH requests for the upload 1dd661d49f35c3947a46654e48952ef2. That looks suspicious.
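For readers unfamiliar with the flow being described, the sketch below shows roughly what a multipart upload looks like against the plain S3 API, using aws-sdk-go (the SDK generation tusd v1 builds on). The bucket, key, and data are placeholders and this is not tusd's actual code; it only illustrates the CreateMultipartUpload / UploadPart / CompleteMultipartUpload sequence and the minimum size for non-final parts.

```go
// Illustrative sketch of an S3 multipart upload with aws-sdk-go v1.
// Bucket, key and data are placeholders; this is not tusd's code.
// On AWS S3, every part except the final one must be at least 5 MiB.
package main

import (
	"bytes"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

const minPartSize = 5 * 1024 * 1024 // minimum size for non-final parts

func main() {
	svc := s3.New(session.Must(session.NewSession()))

	// Step 1: open the multipart upload.
	create, err := svc.CreateMultipartUpload(&s3.CreateMultipartUploadInput{
		Bucket: aws.String("my-bucket"),
		Key:    aws.String("my-object"),
	})
	if err != nil {
		log.Fatal(err)
	}

	// Step 2: upload one part of the object.
	data := make([]byte, minPartSize) // placeholder data
	part, err := svc.UploadPart(&s3.UploadPartInput{
		Bucket:     create.Bucket,
		Key:        create.Key,
		UploadId:   create.UploadId,
		PartNumber: aws.Int64(1),
		Body:       bytes.NewReader(data),
	})
	if err != nil {
		log.Fatal(err)
	}

	// Step 3: complete the upload by naming the parts that were sent.
	_, err = svc.CompleteMultipartUpload(&s3.CompleteMultipartUploadInput{
		Bucket:   create.Bucket,
		Key:      create.Key,
		UploadId: create.UploadId,
		MultipartUpload: &s3.CompletedMultipartUpload{
			Parts: []*s3.CompletedPart{{ETag: part.ETag, PartNumber: aws.Int64(1)}},
		},
	})
	if err != nil {
		log.Fatal(err)
	}
}
```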
Thanks for looking at it. I haven't changed anything in the tusd config apart from setting s3-bucket, s3-endpoint, and behind-proxy. Is it worth playing with the "partSize" option? OpenStack should have the same 5MB minimum; it aims to be completely compatible with the basic S3 interface, and I haven't had any issues using it with other applications that support S3, or in Python with boto3 etc. My attempts to debug the issue today haven't turned up anything very helpful: there are no 500 errors or any other obvious errors in the logs this time, and the uploads appear to be fine, but accessing the uploads that are stored as zero bytes returns 204 No Content responses. For example, this upload:
The upload log entries:
The download log entries:
The 9eec7f3c271cb983cc4f883c97289f23.info file:
Hi @Acconut, I help administer the OpenStack Swift cluster that @OKaluza is using. Operating System: Linux

For an upload that results in a 0 byte object, it looks like the tusd/aws client initially sends the correct object to Swift/S3, but then the client almost immediately overwrites the object with a 0 byte object. The main issue appears to be that after the tusd/aws client sends the UploadPart request it immediately performs a subsequent ListParts call.

The normal good sequence is:

But for cases in which we see the problematic 0 byte objects we see the sequence:

For that extra step, the UploadPart request has:

I've tested adding a 1 second sleep before the ListParts call [2] and I haven't been able to reproduce the same issue, so that seems to support the theory of a race condition (a sketch of this idea is shown after the reference links below).
I'm assuming this issue wouldn't be apparent when using AWS itself as a backend, as their S3 is now strongly consistent [4]. I've attached a couple of logs:
[1] https://gist.github.com/dylanmcculloch/41f3e0e0254c990e39d0abfc6137a4bb#file-tusd_aws_debug-patch
Lines 648 to 651 in 318aab4
[4] https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-read-after-write-consistency/
[5] https://gist.github.com/dylanmcculloch/a07b09568b5870f6bd8dde7c47a5d41e#file-tusd_s3_swift_debug_good-log
[6] https://gist.github.com/dylanmcculloch/a72ecaf9088029f6cedc2a558f429d1f#file-tusd_s3_swift_debug_bad-log
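To make that workaround concrete, the following is a rough sketch of the idea behind the 1 second sleep: instead of a fixed delay, poll ListParts until the part that was just uploaded becomes visible, giving an eventually consistent backend time to catch up. It uses aws-sdk-go with placeholder parameter names; it is not the actual patch from the linked gists.

```go
// Rough sketch of the workaround idea: poll ListParts until the freshly
// uploaded part is visible. Parameter names are placeholders; this is not
// the patch referenced above.
package s3workaround

import (
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
)

func waitForPart(svc *s3.S3, bucket, key, uploadID string, partNumber int64) error {
	for attempt := 0; attempt < 10; attempt++ {
		out, err := svc.ListParts(&s3.ListPartsInput{
			Bucket:   aws.String(bucket),
			Key:      aws.String(key),
			UploadId: aws.String(uploadID),
		})
		if err != nil {
			return err
		}
		for _, p := range out.Parts {
			if p.PartNumber != nil && *p.PartNumber == partNumber {
				return nil // the part is listed; safe to continue
			}
		}
		time.Sleep(500 * time.Millisecond) // back off before listing again
	}
	return fmt.Errorf("part %d still not visible after retries", partNumber)
}
```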
Wow, thank you very much for this investigation! Given the eventual consistency of ListParts on OpenStack, the error now makes sense. tusd's reliance on ListParts being consistent was questioned in the past (see #204), but since we never ran into issues with AWS S3, we didn't see a reason to drop ListParts. Now I am wondering if our issues reported here might be caused by this reliance as well. I would expect this to become better in the v2 release, because we cache ListParts results there. Instead of calling ListParts whenever we need a list of parts, we fetch the list at the beginning of the request and then update and use our internal cache for everything else. So if you upload your file in one PATCH request, there should not be a race condition between UploadPart and ListParts. Could you try one of the latest release candidates for v2?
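As an illustration of the caching approach described above, the sketch below lists the parts once at the start of a request and then maintains an in-memory slice after each UploadPart instead of re-listing. The type and function names are made up for this example and do not come from tusd's v2 code; pagination of ListParts is omitted for brevity.

```go
// Sketch of the caching idea: one ListParts call per request, then keep the
// slice up to date locally after each UploadPart. Names are illustrative and
// not taken from tusd's v2 code; pagination handling is omitted.
package s3cache

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
)

type partCache struct {
	parts []*s3.Part
}

// newPartCache performs the single ListParts call for this request.
func newPartCache(svc *s3.S3, bucket, key, uploadID string) (*partCache, error) {
	out, err := svc.ListParts(&s3.ListPartsInput{
		Bucket:   aws.String(bucket),
		Key:      aws.String(key),
		UploadId: aws.String(uploadID),
	})
	if err != nil {
		return nil, err
	}
	return &partCache{parts: out.Parts}, nil
}

// add records a part we just uploaded ourselves, so later offset calculations
// can use the cache rather than another (possibly stale) ListParts response.
func (c *partCache) add(number, size int64, etag *string) {
	c.parts = append(c.parts, &s3.Part{
		PartNumber: aws.Int64(number),
		Size:       aws.Int64(size),
		ETag:       etag,
	})
}
```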
Thanks heaps @dylanmcculloch for debugging and updating this! @Acconut I've tried v2.0.0-rc19 and have not seen the issue so far; I will keep testing. Is there anything we should be aware of when using v2? Is it production ready?
That's great to hear! v2 incorporates many changes, but we have also been running it in production for some time without issues. So it should be good. We hope to properly release it soon!
Describe the bug
I'm not sure if this is an issue with Uppy or tusd, but I have a tusd server connected to an S3-compatible store (on OpenStack Swift) which I'm uploading files to with the Uppy React Dashboard element.
Everything works well most of the time, but when uploading large numbers of files (e.g. 907 JPG images in my test case) all of the files are marked as uploaded successfully and all appear as entries in the S3 bucket, yet some of the files are zero bytes: ~5 out of 907 images in this case.
The metadata stored for the invalid images has the correct file sizes and the previews appear fine in Uppy.
It is then impossible to upload these images again without manually deleting them from the bucket, as it always sees the existing uploads as valid, even though the data is missing.
If I disable S3 storage all the files are uploaded fine.
To Reproduce
Upload a large number of images (100s) from the Uppy React Dashboard to an S3-backed tusd server.
Expected behavior
All the images should be stored in the S3 bucket.
Setup details
I am using this Helm chart to run the server: https://github.com/sagikazarmark/helm-charts/tree/master/charts/tusd