
Implement Upload-Defer-Length Extension #182

Merged: 23 commits merged into tus:master from acj/add-upload-defer-length-support on Jun 4, 2018

Conversation

@acj (Contributor) commented Apr 29, 2018

This is a first pass at supporting the upload-defer-length protocol extension. It's worked well in my own testing, but I suspect that there may be edge cases that aren't being handled properly. For example, it's difficult for the S3 backend to ensure that the minimum chunk size is respected when the full file size isn't known.

Happy to discuss and iterate on this further.

@Acconut (Member) commented Apr 29, 2018

That is amazing, great work! I never expected to see such a great PR out of the void! I have only had a quick look over this, but it appears to be very solid.

Your concern about data stores which do not support deferring lengths is very valid. As you said, the S3 data store would not support this extension without additional work. Therefore, I would propose that data stores explicitly opt in if they support deferring the length. This would work similarly to how data stores announce that they support the concatenation extension, for example:
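Something along these lines, as a rough sketch (the exact naming is up for discussion):

type LengthDeferrerDataStore interface {
	// SupportsDeferredLength reports whether this store can accept uploads
	// whose final size is declared after creation.
	SupportsDeferredLength() bool
}

The composer could then expose a UsesLengthDeferrer flag so the handler knows whether the configured store has opted in.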

What do you think about this approach? If this is a bit confusing, I am happy to explain it to you.

@acj (Contributor, Author) commented Apr 30, 2018

@Acconut That approach sounds good to me. I'll work on implementing it for this PR. Thank you for the detailed suggestion.

The S3 data store is my primary use case, so I'd like to make it capable of deferring the upload length -- probably in a separate PR. I think that the key problem is recognizing when we've written a chunk smaller than MinPartSize (which should be the terminal chunk) and then we try to write a subsequent chunk.

Do you think it would suffice to add a field to FileInfo, something like HaveWrittenShortChunk bool, that we can inspect in the S3 data store's WriteChunk function and then abort the upload with an appropriate error?

@Acconut (Member) commented Apr 30, 2018

The S3 data store is my primary use case, so I'd like to make it capable of deferring the upload length -- probably in a separate PR

Sounds good to me 👍

I think that the key problem is recognizing when we've written a chunk smaller than MinPartSize (which should be the terminal chunk) and then we try to write a subsequent chunk.

Yes, exactly, that is the main problem with the S3 storage.

Do you think it would suffice to add a field to FileInfo, something like HaveWrittenShortChunk bool, that we can inspect in the S3 data store's WriteChunk function and then abort the upload with an appropriate error?

No, I don't think that this is a good solution. If a user sends a PATCH request with a size smaller than MinPartSize (this does not have to happen on purpose but can simply be caused by a network interruption), they will not be able to continue the upload because HaveWrittenShortChunk will be set to true. This forces users to be aware that tusd is using AWS S3 as a backend and to act accordingly. IMO, tusd should be an abstraction above the data store and not leak such implementation details.

On the other hand, I do not have a perfect alternative approach. The one that I think is best suited is the following: If we receive a chunk which is larger than MinPartSize, we upload it as a part of our multipart upload. If the chunk is too small, we do not upload it to the multipart upload but instead store it in a separate, intermediate S3 object (which is not a multipart upload). When we receive the next chunk, we first download the intermediate S3 object and prepend it to the new chunk, which then (hopefully) is bigger than MinPartSize, so we can upload it as a multipart part.
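In rough pseudocode, the write path could look like this (the helper functions are hypothetical placeholders for the real S3 calls, not existing tusd APIs):

func (store S3Store) writeChunkWithDeferredLength(id string, chunk []byte) error {
	// If a previous chunk was too small, it was parked in an intermediate
	// S3 object; fetch it and prepend it to the incoming data.
	if buffered, found := store.getIntermediateObject(id); found { // hypothetical helper
		chunk = append(buffered, chunk...)
		store.deleteIntermediateObject(id) // hypothetical helper
	}
	if int64(len(chunk)) >= store.MinPartSize {
		// Big enough: upload as a regular part of the multipart upload.
		return store.uploadPart(id, chunk) // hypothetical helper
	}
	// Still too small: park it in a plain S3 object until more data arrives
	// or the upload is finished.
	return store.putIntermediateObject(id, chunk) // hypothetical helper
}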

This approach is rather complex but might solve the problem. What do you think?

@acj (Contributor, Author) commented May 3, 2018

Sorry for the delay. I hadn't considered the network interruption case -- that changes things a bit. Your approach makes sense to me, though, and seems safe if tusd is configured with a locker. I appreciate the suggestion and will try it in a separate PR.

@@ -257,3 +258,7 @@ func (store GCSStore) GetReader(id string) (io.Reader, error) {
 
 	return r, nil
 }
+
+func (store GCSStore) SupportsDeferredLength() bool {
+	return true
+}
@Acconut (Member) commented on this diff:

Frankly, I am not so sure whether GCSStore supports this extension out of the box.

@Acconut (Member) commented May 8, 2018

No worries, I wasn't really available either. May I ask what's the status on this PR? Is it finished or are you still working on it?

@acj (Contributor, Author) commented May 8, 2018

It's finished in terms of implementation, but I wanted to ask if any additional tests are needed. Since you mentioned it in another comment, is there anything specific to GCSStore that concerns you?

@Acconut (Member) commented May 10, 2018

Great! However, I believe there is still a function missing in the PATCH handler which permanently stores the upload's size once it's known for uploads with deferred length. Currently, the length is only attached to the PATCH request's response (https://github.com/tus/tusd/pull/182/files#diff-4e64569f67c7c0214e072f52dbe8d504R466), but further PATCH or HEAD requests will not contain the correct upload length, AFAIK.

Since you mentioned it in another comment, is there anything specific to GCSStore that concerns you?

Yes, there is. Frankly, I am not very proficient in the GCSStore code, and since it has not been shown yet that GCSStore is compatible with this extension, I would not assume it is.

@acj (Contributor, Author) commented May 12, 2018

I believe there is still a function missing in the PATCH handler which permanently stores the upload's size once it's known for uploads with deferred length.

Yes, good catch. I've added a PutInfo function on DataStore to handle persistence.

since it has not been shown yet that GCSStore is compatible with this extension, I would not assume it is.

Fair enough. I haven't found any documentation that suggests a problem, but it would be good to test this against a real GCS bucket. I've disabled it for now.

Please let me know if any other changes or tests are needed.

@Acconut (Member) commented May 13, 2018

I've added a PutInfo function on DataStore to handle persistence.

Great idea! However, I would have made the function less general, e.g. DeclareLength instead of PutInfo. The reason is that users may assume that the PutInfo function can overwrite any value in the FileInfo struct. However, this is not always the case:
For example, the ID, IsPartial, IsFinal and PartialUploads properties should never be changed after the upload has been created. The Offset property should only be changed when WriteChunk is involved. On S3 (and possibly other stores) metadata cannot be modified after an upload has been created.
Therefore, I would propose to just add a DeclareLength(length int) error function to the LengthDeferrerDataStore interface. In turn, we can also get rid of SupportsDeferredLength, which I only proposed so that we do not have an empty interface.
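As a sketch, the interface would then shrink to the following (the later diffs in this PR suggest the handler also passes the upload ID and an int64 length):

type LengthDeferrerDataStore interface {
	// DeclareLength permanently stores the upload's final size once it is known.
	DeclareLength(id string, length int64) error
}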

Sorry for the inconvenience, but does this make sense? Everything else is looking perfect 👍

@acj (Contributor, Author) commented May 13, 2018

No worries. That makes sense and sounds good to me. I'll make the suggested change.

the ID, IsPartial, IsFinal and PartialUploads properties should never be changed after the upload has been created

Do you think this property should be documented somewhere, e.g. on each field of the FileInfo struct? It seems obvious in retrospect, but I didn't think about it when I was working on the last batch of changes.

@acj force-pushed the acj/add-upload-defer-length-support branch from 89d69fc to 675a499 on May 13, 2018
@acj (Contributor, Author) commented May 13, 2018

Okay, I did a bit of branch cleanup and force-pushed. Your DeclareLength idea worked out very well, I think. It's cleaner, simpler, and easier to understand. 👍

@Acconut (Member) commented May 15, 2018

This looks amazing, thank you very much! I had a quick look over this but I am going to test it in full depth this week.

@Acconut (Member) left a review:

I had a deeper look into this, and it's very solid work! There are still some edge cases, but those are hard to spot; let me know what you think about them. Also, I took the liberty of merging master into your branch, so you don't have to deal with the merge conflict.

Furthermore, the sizeOfUploads function, which is used for calculating the size when concatenating uploads, needs a check for whether any of the partial uploads has a deferred size: https://github.com/tus/tusd/pull/182/files#diff-4e64569f67c7c0214e072f52dbe8d504R867
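A sketch of the missing check (the signature and error value are illustrative, not the actual tusd code):

func sizeOfUploads(store DataStore, ids []string) (int64, error) {
	var size int64
	for _, id := range ids {
		info, err := store.GetInfo(id)
		if err != nil {
			return 0, err
		}
		if info.SizeIsDeferred {
			// A partial upload whose length is still deferred has no usable
			// size yet; the real implementation should pick a fitting error.
			return 0, ErrNotImplemented
		}
		size += info.Size
	}
	return size, nil
}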

@@ -115,7 +119,7 @@ func NewUnroutedHandler(config Config) (*UnroutedHandler, error) {
 	}

 	// Only promote extensions using the Tus-Extension header which are implemented
-	extensions := "creation,creation-with-upload"
+	extensions := "creation,creation-with-upload,creation-defer-length"
creation-defer-length should only be added if the data storage support it, similar to how the Termination extension is handled in the next lines.
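For instance (a sketch mirroring the termination handling; the exact field access may differ):

extensions := "creation,creation-with-upload"
if handler.composer.UsesLengthDeferrer {
	extensions += ",creation-defer-length"
}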

if handler.composer.UsesLengthDeferrer && info.SizeIsDeferred && r.Header.Get("Upload-Length") != "" {
	uploadLength, err := strconv.ParseInt(r.Header.Get("Upload-Length"), 10, 64)
	if err != nil || uploadLength < 0 {
		handler.sendError(w, r, ErrInvalidOffset)
ErrInvalidOffset does not seem to be the correct error here. What about ErrInvalidUploadLength?


	info.Size = uploadLength
	info.SizeIsDeferred = false
	if err := handler.composer.LengthDeferrer.DeclareLength(id, info.Size); err != nil {
An additional check that the new length is not smaller than the current offset (info.Offset <= uploadLength) and that the new length does not exceed the file size limit (see https://github.com/tus/tusd/blob/master/unrouted_handler.go#L264-L268) would be needed before the call to DeclareLength.
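Roughly like this (a sketch; ErrMaxSizeExceeded is assumed to be the error already used for the file size limit):

if uploadLength < info.Offset {
	handler.sendError(w, r, ErrInvalidUploadLength)
	return
}
if handler.config.MaxSize > 0 && uploadLength > handler.config.MaxSize {
	handler.sendError(w, r, ErrMaxSizeExceeded)
	return
}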

	w.Header().Set("Upload-Offset", strconv.FormatInt(offset, 10))
	handler.sendResp(w, r, http.StatusNoContent)
	return
}

if handler.composer.UsesLengthDeferrer && info.SizeIsDeferred && r.Header.Get("Upload-Length") != "" {
Although this check is perfectly fine, I would love to see an error if someone tries to declare the length for a) an upload where length deferring is not enabled or b) an upload where the length has already been declared.
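In other words, something like this (a sketch; the error values are placeholders):

if r.Header.Get("Upload-Length") != "" {
	if !handler.composer.UsesLengthDeferrer {
		// Declaring a length is not supported by this data store.
		handler.sendError(w, r, ErrNotImplemented)
		return
	}
	if !info.SizeIsDeferred {
		// The length has already been declared for this upload.
		handler.sendError(w, r, ErrInvalidUploadLength)
		return
	}
	// ... parse and declare the length as above ...
}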

@@ -460,7 +488,7 @@ func (handler *UnroutedHandler) writeChunk(id string, info FileInfo, w http.Resp
 	offset := info.Offset

 	// Test if this upload fits into the file's size
-	if offset+length > info.Size {
+	if !info.SizeIsDeferred && offset+length > info.Size {
 		return ErrSizeExceeded
 	}

The following lines, where maxSize is defined, also need to take a deferred size into account. Something like:

maxSize := info.Size - offset
// If the upload's length is deferred and the PATCH request does not contain the Content-Length
// header (which is allowed if 'Transfer-Encoding: chunked' is used), we still need to set limits for
// the body size.
if info.SizeIsDeferred {
	if handler.config.MaxSize > 0 {
		// Ensure that the upload does not exceed the maximum upload size
		maxSize = handler.config.MaxSize - offset
	} else {
		// If no upload limit is given, we will allow arbitrary sizes
		maxSize = math.MaxInt64
	}
}
if length > 0 {
	maxSize = length
}

@acj force-pushed the acj/add-upload-defer-length-support branch from 34b0afd to 8385faa on Jun 3, 2018
@acj force-pushed the acj/add-upload-defer-length-support branch from 8f9785b to c605926 on Jun 3, 2018
@acj (Contributor, Author) commented Jun 3, 2018

Thanks again for the detailed feedback. I've fixed the things you pointed out, rebased onto current master, and pushed. Let me know if anything else needs to be done.

@Acconut (Member) commented Jun 4, 2018

Amazing! Thank you very, very much for your hard work and patience with my feedback :)

@Acconut merged commit d9d0f7c into tus:master on Jun 4, 2018
		handler.sendError(w, r, ErrInvalidUploadLength)
	}
} else {
	handler.sendError(w, r, ErrNotImplemented)
@kvz (Member) commented on this diff:

I think christmas trees like these should be avoided where reasonably possible by returning early. I think there's an official term for it, but I couldn't find it. Maybe this post that I did find helps illustrate the concept, though: http://blog.timoxley.com/post/47041269194/avoid-else-return-early :)
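For example, a nested tree like

if handler.composer.UsesLengthDeferrer {
	// ... handle the declaration ...
} else {
	handler.sendError(w, r, ErrNotImplemented)
}

can be flattened by returning early:

if !handler.composer.UsesLengthDeferrer {
	handler.sendError(w, r, ErrNotImplemented)
	return
}
// ... handle the declaration ...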

@kvz (Member) commented:

I realize this PR has been merged already, and it's nothing critical. Just a friendly tip for future endeavours :)

@acj (Contributor, Author) commented:

Thanks for the feedback, @kvz. I agree -- an early return would be better. I also realized that this PR doesn't add Upload-Defer-Length to the CORS allowed header list, so I'll create a followup PR that addresses both.
