This repository has been archived by the owner on Sep 10, 2022. It is now read-only.

How does a client handle backtracking? #53

Closed
guoye-zhang opened this issue Jun 21, 2022 · 7 comments

Comments

@guoye-zhang
Contributor

If the request body is dynamically generated and the server backtracks to an offset that's no longer available, what should a client do?

If the server backtracks to 0, could the client upload a different thing? (My thinking is no)

@gregw

gregw commented Jun 21, 2022

A client that sends a dynamically generated body which cannot be regenerated exactly MUST NOT send an upload-token.

Also, a resumed request must send bytes identical to any already sent.
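The identical-bytes requirement can be enforced with a small client-side check, sketched here in Python (the helper name and digest scheme are illustrative, not part of any tus specification): the client records a digest of every byte it has sent, and on resume verifies that the regenerated body reproduces the same prefix before transmitting.

```python
import hashlib

def verify_resume_prefix(sent_digest: str, regenerated: bytes, sent_length: int) -> bool:
    """Check that the first sent_length bytes of a regenerated body
    match what was already transmitted, by comparing SHA-256 digests."""
    if len(regenerated) < sent_length:
        return False
    return hashlib.sha256(regenerated[:sent_length]).hexdigest() == sent_digest

# The client keeps a digest of everything sent so far:
sent = b"dynamically generated body"[:10]
digest = hashlib.sha256(sent).hexdigest()

# On resume, the regenerated body must reproduce the same prefix:
assert verify_resume_prefix(digest, b"dynamically generated body", 10)
assert not verify_resume_prefix(digest, b"different body entirely!!!", 10)
```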

@gregw

gregw commented Jun 21, 2022

I also think a note in the security implications about non-idempotent requests would be good.

@guoye-zhang
Contributor Author

I think how the content is generated shouldn’t be the focus. It works fine as long as the client can reproduce the exact same bytes, so that requirement alone should be sufficient.

We should also mention upload cancellation in case that the client cannot resume.

@Acconut
Member

Acconut commented Jul 19, 2022

I can share some details on how we handle backtracking in tus-js-client and tus-java-client. Basically, we divide data sources into two categories:

  1. Seekable data sources, where we can easily seek back to any position of the data source without any additional buffering. Examples are files on disk or in-memory blobs. Backtracking for them is easy and non-problematic.
  2. Non-seekable data sources, which are generated dynamically and are not permanently stored on disk or in-memory. As such, seeking backwards is not possible. Examples are web cam streams, streaming output from other programs etc.
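The two categories map naturally onto Python's file-object protocol; a minimal sketch (the `LiveStream` class is a hypothetical stand-in for a webcam feed or program output):

```python
import io

def is_seekable(source) -> bool:
    """Category 1 sources (files on disk, in-memory blobs) can seek
    back to any position; category 2 sources (live streams) cannot."""
    return hasattr(source, "seekable") and source.seekable()

# An in-memory blob can backtrack to any offset without extra buffering:
blob = io.BytesIO(b"stored upload data")
blob.read(6)
blob.seek(0)
assert blob.read(6) == b"stored"
assert is_seekable(blob)

class LiveStream:
    """Hypothetical non-seekable source, e.g. a webcam feed."""
    def seekable(self):
        return False

assert not is_seekable(LiveStream())
```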

For non-seekable data sources, the client forces the end user to specify a maximum request payload size, which is the maximum number of bytes in a single Upload Transfer Procedure. If the entire upload size exceeds this payload size, the upload has to be split across multiple consecutive Upload Transfer Procedures, which is easily possible.

The key is that the client will buffer all of the data it sends in this Upload Transfer Procedure (either in memory or on disk). If the procedure gets interrupted due to connection issues, the client can simply backtrack to any position in this Upload Transfer Procedure and resume from there. If the Upload Transfer Procedure completes, the client discards the previous buffer and reads the next chunk of data to fill the buffer for the next Upload Transfer Procedure. We can safely discard the previous buffer because the Upload-Offset header in the response guarantees that the server has successfully saved the upload up to this offset.

The maximum request payload size lets the client balance smaller buffer usage against larger request sizes (for less overhead).

@guoye-zhang
Contributor Author

Yes, that makes a lot of sense. What I'm thinking about is implementing a rolling window buffer (maybe the last 1MB, but we'll adjust based on telemetry) in the case that the content is dynamically generated. Having chunked uploads is definitely more reliable, since you get real confirmation from the server that a chunk has been received, but in our situation, if we aren't aware of server support, we can't start with chunked uploads.
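A rolling window like that could be sketched as follows (the class name and API are assumptions, and the capacity is shrunk for the example): the client retains only the trailing bytes of what it has sent, so it can replay from any offset inside the window but must give up if the server backtracks further.

```python
class RollingWindow:
    """Hypothetical rolling buffer that retains only the last
    `capacity` bytes sent (e.g. the last 1 MB in practice)."""
    def __init__(self, capacity=1024 * 1024):
        self.capacity = capacity
        self.end = 0      # total bytes sent so far
        self.buf = b""    # trailing window of the sent data

    def append(self, data: bytes):
        self.buf = (self.buf + data)[-self.capacity:]
        self.end += len(data)

    def replay_from(self, offset: int) -> bytes:
        start = self.end - len(self.buf)
        if offset < start:
            raise ValueError("offset outside the window; cannot resume")
        return self.buf[offset - start:]

w = RollingWindow(capacity=8)
w.append(b"0123456789")            # 10 bytes sent, last 8 retained
assert w.replay_from(5) == b"56789"   # inside the window: resumable

try:
    w.replay_from(1)               # before the window: upload must fail
    assert False
except ValueError:
    pass
```

This is the best-effort trade-off: a server backtracking within the last megabyte is recoverable, anything earlier forces the client to abandon or restart the upload.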

@Acconut
Member

Acconut commented Jul 20, 2022

Your approach is more optimistic and sounds like a best-effort approach without strict guarantees. But that might be OK, depending on your application and situation.

Maybe in the future it will be possible to receive new Upload-Offsets using multiple intermediate 1XX responses, so the client has guarantees about the uploaded data without having to use multiple Upload Transfer Procedures. However, I don't know if multiple intermediate responses are allowed right now in HTTP.

I think the tus v2 specification should not force clients to a specific behavior regarding backtracking, so they can choose how to handle such situations. But we can add recommendations so people know how to handle it.

@guoye-zhang
Contributor Author

Yes, multiple intermediate responses are allowed. It could work, sort of like an HTTP-level ACK.

Agreed, we don't need to specify how to handle backtracking, just need to mention that backtracking is expected and should be handled gracefully.
