Implement Upload-Defer-Length Extension #182
Conversation
That is amazing, great work! I never expected to see such a great PR out of the void! I only had a quick look over this, but it appears to be very solid. Your concern about data stores which do not support deferring lengths is very valid. As you said, the S3 data store would not support this extension without additional work. Therefore, I would propose that data stores explicitly opt in if they support deferring the length. This would work similarly to how data stores announce that they support the concatenation extension, for example:

What do you think about this approach? If this is a bit confusing, I am happy to explain it to you.
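A minimal sketch of what such an opt-in could look like. The interface and method names below follow this thread's later comments (`LengthDeferrer`, `DeclareLength`, `UsesLengthDeferrer`), while `memStore` is a toy store invented purely for illustration:

```go
package main

import "fmt"

// LengthDeferrerDataStore is the opt-in interface a data store would
// implement if it can accept uploads whose final length is declared later.
type LengthDeferrerDataStore interface {
	DeclareLength(id string, length int64) error
}

// memStore is a toy store used only to illustrate the opt-in check.
type memStore struct {
	sizes map[string]int64
}

func (s *memStore) DeclareLength(id string, length int64) error {
	// Persist the now-known size so later HEAD/PATCH requests can see it.
	s.sizes[id] = length
	return nil
}

func main() {
	var store interface{} = &memStore{sizes: map[string]int64{}}
	// The handler checks at runtime whether the configured store opts in,
	// similar to how support for the concatenation extension is announced.
	if deferrer, ok := store.(LengthDeferrerDataStore); ok {
		deferrer.DeclareLength("upload-1", 1024)
		fmt.Println("store supports deferred length")
	}
}
```

Stores that never implement the interface (e.g. S3, for now) would simply fail the type assertion and the extension would stay disabled for them.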
@Acconut That approach sounds good to me. I'll work on implementing it for this PR. Thank you for the detailed suggestion. The S3 data store is my primary use case, so I'd like to make it capable of deferring the upload length -- probably in a separate PR. I think that the key problem is recognizing when we've written a chunk smaller than MinPartSize. Do you think it would suffice to add a HaveWrittenShortChunk field?
Sounds good to me 👍
Yes, exactly, that is the main problem with the S3 storage.
No, I don't think that this is a good solution. If a user sends a PATCH request with a size smaller than MinPartSize (this does not have to happen on purpose but can just be caused by a network interruption), they will not be able to continue the upload because HaveWrittenShortChunk will be set to true. This forces users to be aware that tusd is using AWS S3 as a backend and to act accordingly. IMO, tusd should be an abstraction above the data store and not leak such implementation details.

On the other hand, I do not have a perfect alternative approach. The one that I think is best suited is the following: if we receive a chunk which is larger than MinPartSize, we upload it as a part to our multipart upload. If the chunk is too small, we do not upload it to the multipart upload but instead store it inside a separate intermediate S3 object (which is not a multipart upload). Then, when we receive the next chunk, we first download the intermediate S3 object and prepend it to the next chunk, which then (hopefully) is bigger than MinPartSize, so we can upload it as a multipart part. This approach is rather complex but might solve the problem. What do you think?
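The buffering idea described above can be sketched as follows. Everything here is illustrative: a tiny `minPartSize` stands in for S3's real 5 MiB minimum, and the in-memory slices stand in for the actual multipart parts and the intermediate S3 object:

```go
package main

import "fmt"

// minPartSize stands in for S3's minimum multipart part size (really 5 MiB).
const minPartSize = 5

// s3Writer simulates the proposed approach: chunks smaller than the minimum
// part size are parked in an "intermediate object" and prepended to the next
// incoming chunk before another upload attempt.
type s3Writer struct {
	parts        [][]byte // parts flushed to the multipart upload
	intermediate []byte   // small remainder parked as a plain object
}

func (w *s3Writer) WriteChunk(chunk []byte) {
	// Prepend whatever was parked from the previous request.
	data := append(w.intermediate, chunk...)
	w.intermediate = nil
	if len(data) >= minPartSize {
		// Large enough: upload as a multipart part.
		w.parts = append(w.parts, data)
	} else {
		// Still too small: park it again and wait for more data.
		w.intermediate = data
	}
}

func main() {
	w := &s3Writer{}
	w.WriteChunk([]byte("ab"))   // 2 bytes: parked in the intermediate object
	w.WriteChunk([]byte("cdef")) // 2+4 = 6 bytes: flushed as a part
	fmt.Println(len(w.parts), len(w.intermediate))
}
```

In a real implementation each `WriteChunk` would issue S3 requests (and a locker would guard against concurrent PATCHes), but the control flow is the same.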
Sorry for the delay. I hadn't considered the network interruption case -- that changes things a bit. Your approach makes sense to me, though, and seems safe if tusd is configured with a locker. I appreciate the suggestion and will try it in a separate PR.
gcsstore/gcsstore.go (outdated)

```diff
@@ -257,3 +258,7 @@ func (store GCSStore) GetReader(id string) (io.Reader, error) {
 
 	return r, nil
 }
+
+func (store GCSStore) SupportsDeferredLength() bool {
+	return true
+}
```
Frankly, I am not so sure whether GCSStore supports this extension out of the box.
No worries, I wasn't really available either. May I ask what's the status on this PR? Is it finished or are you still working on it?
It's finished in terms of implementation, but I wanted to ask if any additional tests are needed. Since you mentioned it in another comment, is there anything specific to GCSStore that concerns you?
Great, however I believe there is still a function missing in the PATCH handler which permanently stores the upload's size once it's known for uploads with deferred length. Currently, the length is only attached to the PATCH request's response (https://github.com/tus/tusd/pull/182/files#diff-4e64569f67c7c0214e072f52dbe8d504R466) but further PATCH or HEAD requests will not contain the upload's correct length, AFAIK.
Yes, there is. Frankly, I am not very proficient in the GCSStore code and since it has not been shown yet that GCSStore is compatible with this extension, I would not assume it is.
Yes, good catch. I've added a
Fair enough. I haven't found any documentation that suggests a problem, but it would be good to test this against a real GCS bucket. I've disabled it for now. Please let me know if any other changes or tests are needed.
Great idea; however, I would have made the function less general, e.g. Sorry for this inconvenience, but does this make sense? Everything else is looking perfect 👍
No worries. That makes sense and sounds good to me. I'll make the suggested change.
Do you think this property should be documented somewhere, e.g. on each field of the
Force-pushed from 89d69fc to 675a499 (compare)
Okay, I did a bit of branch cleanup and force-pushed. Your
This looks amazing, thank you very much! I had a quick look over this but I am going to test it in full depth this week.
I had a deeper look into this and it's very solid work! There are still some edge cases, but those are hard to spot; let me know what you think about them. Also, I took the liberty to merge master into your branch, so you don't have to deal with the merge conflict.

Furthermore, the sizeOfUploads function, which is used for calculating the size when concatenating uploads, needs a check if the partial uploads have a deferred size: https://github.com/tus/tusd/pull/182/files#diff-4e64569f67c7c0214e072f52dbe8d504R867
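A sketch of the requested guard in `sizeOfUploads` (the signature here is simplified and illustrative; the real function works on the handler's data store):

```go
package main

import (
	"errors"
	"fmt"
)

// FileInfo mirrors the fields referenced in this PR's discussion.
type FileInfo struct {
	Size           int64
	SizeIsDeferred bool
}

// sizeOfUploads sums the sizes of the partial uploads being concatenated,
// failing if any of them still has a deferred (unknown) length.
func sizeOfUploads(infos []FileInfo) (int64, error) {
	var total int64
	for _, info := range infos {
		if info.SizeIsDeferred {
			return 0, errors.New("partial upload has deferred length")
		}
		total += info.Size
	}
	return total, nil
}

func main() {
	fmt.Println(sizeOfUploads([]FileInfo{{Size: 10}, {Size: 20}}))
	fmt.Println(sizeOfUploads([]FileInfo{{Size: 10}, {SizeIsDeferred: true}}))
}
```

Concatenation can only proceed once every partial upload's final length is known, since the combined size must be computed up front.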
unrouted_handler.go (outdated)

```diff
@@ -115,7 +119,7 @@ func NewUnroutedHandler(config Config) (*UnroutedHandler, error) {
 	}
 
 	// Only promote extensions using the Tus-Extension header which are implemented
-	extensions := "creation,creation-with-upload"
+	extensions := "creation,creation-with-upload,creation-defer-length"
```
creation-defer-length should only be added if the data store supports it, similar to how the termination extension is handled in the next lines.
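A sketch of that conditional announcement. `UsesLengthDeferrer` appears later in this PR; `UsesTerminater` is assumed here by analogy with the termination handling mentioned above:

```go
package main

import "fmt"

// StoreComposer holds flags describing which optional interfaces the
// configured data store implements (field names assumed from this thread).
type StoreComposer struct {
	UsesTerminater     bool
	UsesLengthDeferrer bool
}

// supportedExtensions builds the Tus-Extension header value, announcing
// each optional extension only if the composed store supports it.
func supportedExtensions(c StoreComposer) string {
	ext := "creation,creation-with-upload"
	if c.UsesTerminater {
		ext += ",termination"
	}
	if c.UsesLengthDeferrer {
		ext += ",creation-defer-length"
	}
	return ext
}

func main() {
	fmt.Println(supportedExtensions(StoreComposer{UsesLengthDeferrer: true}))
}
```

This keeps the OPTIONS response honest: clients only see creation-defer-length when the backend can actually handle it.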
unrouted_handler.go (outdated)

```go
if handler.composer.UsesLengthDeferrer && info.SizeIsDeferred && r.Header.Get("Upload-Length") != "" {
	uploadLength, err := strconv.ParseInt(r.Header.Get("Upload-Length"), 10, 64)
	if err != nil || uploadLength < 0 {
		handler.sendError(w, r, ErrInvalidOffset)
```
ErrInvalidOffset seems to not be the correct error here. What about ErrInvalidUploadLength?
unrouted_handler.go (outdated)

```go
info.Size = uploadLength
info.SizeIsDeferred = false
if err := handler.composer.LengthDeferrer.DeclareLength(id, info.Size); err != nil {
```
An additional check that the new length is not smaller than the current offset (info.Offset <= uploadLength) and that the new length does not exceed the file limit (see https://github.com/tus/tusd/blob/master/unrouted_handler.go#L264-L268) would be needed before the call to DeclareLength.
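A standalone sketch of those two checks (function and error names here are illustrative, not tusd's exact code):

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrInvalidUploadLength = errors.New("invalid Upload-Length")
	ErrMaxSizeExceeded     = errors.New("maximum size exceeded")
)

// validateDeclaredLength performs the checks requested in the review:
// the declared length must cover the bytes already written, and must not
// exceed the configured maximum upload size (0 means no limit).
func validateDeclaredLength(offset, uploadLength, maxSize int64) error {
	if uploadLength < offset {
		return ErrInvalidUploadLength
	}
	if maxSize > 0 && uploadLength > maxSize {
		return ErrMaxSizeExceeded
	}
	return nil
}

func main() {
	fmt.Println(validateDeclaredLength(100, 50, 0))    // length below current offset
	fmt.Println(validateDeclaredLength(100, 200, 150)) // length above MaxSize
	fmt.Println(validateDeclaredLength(100, 200, 0))   // valid
}
```

Running these checks before DeclareLength avoids persisting a length the upload can never satisfy.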
unrouted_handler.go (outdated)

```go
	w.Header().Set("Upload-Offset", strconv.FormatInt(offset, 10))
	handler.sendResp(w, r, http.StatusNoContent)
	return
}

if handler.composer.UsesLengthDeferrer && info.SizeIsDeferred && r.Header.Get("Upload-Length") != "" {
```
Although this check is perfectly fine, I would love to see an error if someone tries to declare the length for a) an upload where length deferring is not enabled and b) an upload where the length has already been declared.
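A sketch of those two error cases (ErrUploadLengthAlreadySet is a hypothetical error name; the actual implementation may choose differently):

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrNotImplemented         = errors.New("feature not implemented")
	ErrUploadLengthAlreadySet = errors.New("upload length already declared") // hypothetical name
)

// mayDeclareLength covers the two cases above: declaring a length is only
// valid if the store supports deferring lengths and the upload's length is
// still unknown.
func mayDeclareLength(usesLengthDeferrer, sizeIsDeferred bool) error {
	if !usesLengthDeferrer {
		return ErrNotImplemented
	}
	if !sizeIsDeferred {
		return ErrUploadLengthAlreadySet
	}
	return nil
}

func main() {
	fmt.Println(mayDeclareLength(false, true))
	fmt.Println(mayDeclareLength(true, false))
	fmt.Println(mayDeclareLength(true, true))
}
```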
```diff
@@ -460,7 +488,7 @@ func (handler *UnroutedHandler) writeChunk(id string, info FileInfo, w http.Resp
 	offset := info.Offset
 
 	// Test if this upload fits into the file's size
-	if offset+length > info.Size {
+	if !info.SizeIsDeferred && offset+length > info.Size {
 		return ErrSizeExceeded
 	}
```
The following lines, where maxSize is defined, also need to consider whether the size is deferred. Something like:

```go
maxSize := info.Size - offset
// If the upload's length is deferred and the PATCH request does not contain the Content-Length
// header (which is allowed if 'Transfer-Encoding: chunked' is used), we still need to set limits for
// the body size.
if info.SizeIsDeferred {
	if handler.config.MaxSize > 0 {
		// Ensure that the upload does not exceed the maximum upload size
		maxSize = handler.config.MaxSize - offset
	} else {
		// If no upload limit is given, we allow arbitrary sizes
		maxSize = math.MaxInt64
	}
}

if length > 0 {
	maxSize = length
}
```
Force-pushed from 34b0afd to 8385faa (compare)
Force-pushed from 8f9785b to c605926 (compare)
Thanks again for the detailed feedback. I've fixed the things you pointed out, rebased onto current master, and pushed. Let me know if anything else needs to be done.
Amazing! Thank you very, very much for your hard work and patience with my feedback :)
```go
		handler.sendError(w, r, ErrInvalidUploadLength)
	}
} else {
	handler.sendError(w, r, ErrNotImplemented)
```
I think christmas trees like these should be avoided if reasonably possible by returning early. I think there's an official term for it, but I couldn't find it. Maybe this post that I did find helps illustrate the concept though: http://blog.timoxley.com/post/47041269194/avoid-else-return-early :)
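The nested branch quoted above could be flattened with guard clauses that return early, for example (a simplified sketch away from the HTTP handler; the real code sends errors via handler.sendError):

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrNotImplemented      = errors.New("feature not implemented")
	ErrInvalidUploadLength = errors.New("invalid Upload-Length")
)

// declareLength uses early returns instead of nested if/else: each guard
// clause handles one failure case and exits, leaving the happy path unindented.
func declareLength(supported bool, uploadLength int64) error {
	if !supported {
		return ErrNotImplemented
	}
	if uploadLength < 0 {
		return ErrInvalidUploadLength
	}
	return nil
}

func main() {
	fmt.Println(declareLength(false, 10))
	fmt.Println(declareLength(true, -1))
	fmt.Println(declareLength(true, 10))
}
```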
I realize this PR has been merged already and it's nothing critical, just a friendly tip for future endeavours :)
Thanks for the feedback, @kvz. I agree -- an early return would be better. I also realized that this PR doesn't add Upload-Defer-Length to the CORS allowed header list, so I'll create a followup PR that addresses both.
This is a first pass at supporting the upload-defer-length protocol extension. It's worked well in my own testing, but I suspect that there may be edge cases that aren't being handled properly. For example, it's difficult for the S3 backend to ensure that the minimum chunk size is respected when the full file size isn't known.
Happy to discuss and iterate on this further.