Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce the concept of an Upload Collection to rate-limit hook firing better #268

Open
tim-kos opened this issue Apr 18, 2019 · 5 comments

Comments

@tim-kos
Copy link
Member

tim-kos commented Apr 18, 2019

Is your feature request related to a problem? Please describe.
There is one hook fired per second per upload by default. In a multi-upload scenario (let's say a Transloadit Assembly with 50 files), up to 50 hooks are fired every second. This is a lot. draining resources.

Depending on what you want from the hooks, it would be enough to know how many bytes (in sum, across all upload files) were received, versus the sum of bytes expected. And then have this communicated at most once per second.

Describe the solution you'd like
It would be really nice if we could introduce the concept of an upload collection, which has its own hooks. The upload collection would consist of:

  • ID
  • EXPECTED_NUM_OF_FILES
  • EXPECTED_BYTES
  • RECEIVED_BYTES
  • ?

... and come with its own hooks. It would fire the collection hooks as soon as a chunk of an upload of that collection was received, communicating these variables. But the hook would then also be fired at most every 1s.

Describe alternatives you've considered
One could introduce something like that in the hook system itself and use a local data store like redis within the post-receive hooks to calculate how many bytes in total were received so far and if we need to notify the underlying system about a received chunk.

However, this would still mean the cli/http hooks would be invoked once per second per uploaded file. This is not so big of a problem for cli hooks, but for http hooks it could be hundreds if not thousands of requests in a short time span, just for one upload context.

It would be nice to have first class support for this and limit the number of hook invocations in the first place.

Can you provide help with implementing this feature?
Possibly, yes.

Additional context
None

@Acconut
Copy link
Member

Acconut commented Apr 29, 2019

I have to say that this is a very Transloadit-specific request and I have never heard something comparable from another person. Especially in conjunction with #269, implementing this could get hairy but let's worry about that later. If I understand it correctly, the goal is to reduce the cost of delivering upload progress events to the application and there are basically two ways to achieve this:

Option A: Group multiple uploads into a collection by putting uploads with a shared meta data value in the same collection. After that, upload progress is only reported for the entire group as you mentioned, bringing down the number of total hook executions.

Option B would be to reduce the cost of a single hook execution. You mention that HTTP hooks would hammer an endpoint, however, I believe that HTTP/2 might be a solution here. HTTP/2 uses a single, multiplexed TCP connection meaning that multiple request/responses can be transmitted at the same time. Requests also do not need a new connection (as is already the case using HTTP/1.1 keep alive). Furthermore, the protocol is binary and not text-based so the parsing overhead is even less. If it's possible that tusd sends a HTTP/2 request to the application directly (i.e. without it getting converted to HTTP/1.1), that might be a possible alternative as the cost of a HTTP/2 request is significantly lower than for HTTP/1.1.

One could introduce something like that in the hook system itself and use a local data store like redis within the post-receive hooks to calculate how many bytes in total were received so far and if we need to notify the underlying system about a received chunk.

I don't think that that's a good solution in conjunction with #269 where multiple hooks depend on the response from the underlying system to determine which exit code they should return. Achieving that is possible but likely too complicated.

@SomeDeveloper13
Copy link

We have a similar situation where a bunch of users are uploading multiple files at the same time. the post-receive hook is kicked off once per second per concurrent upload. For hundreds of large files being uploaded concurrently, that's hundreds hooks per second for quite a long time, just for the post-receive hook alone.

For me I'd prefer a way of rate-limiting the hooks being fired on a per-hook basis by making the hard-coded once per second maximum per file a configurable value controlled by a series of environment variables, or a single environment variable that takes in a comma delimited string.
This would allow control per environment, per hook, while still getting the benefit of tracking the upload progress.

@Acconut
Copy link
Member

Acconut commented Nov 21, 2022

@SomeDeveloper13 The tusd v2 branch already contains a flag for controlling the interval of the post-receive hook:

flag.Int64Var(&Flags.ProgressHooksInterval, "progress-hooks-interval", 1000, "Interval in milliseconds at which the post-receive progress hooks are emitted for each active upload")

That might be interesting for you.

@mshunjan
Copy link

I have a similar situation where each user is uploading hundreds of files (sometimes small, sometimes big, depending on what they are doing). An upload collection would greatly reduce the number of requests we get from hooks.

@Acconut
Copy link
Member

Acconut commented Mar 26, 2023

Thank you for your feedback, @mshunjan. We have currently no plans to implement this right now, but we will consider it in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants