
RFD 110: Discussion #52

Open
jmwski opened this issue Aug 22, 2017 · 5 comments

Comments

@jmwski
Contributor

jmwski commented Aug 22, 2017

This issue represents an opportunity for discussion of RFD 108 Operator Configurable Throttles for Manta while it remains in a pre-draft state.

@jmwski jmwski changed the title RFD 108: Discussion RFD 109: Discussion Aug 23, 2017
@trentm trentm changed the title RFD 109: Discussion RFD 108: Discussion Aug 23, 2017
@jmwski jmwski changed the title RFD 108: Discussion RFD 110: Discussion Sep 1, 2017
@trentm
Contributor

trentm commented Sep 11, 2017

Note that restify@5.x has server.inflightRequests() and a simple plugin to respond with 429s if the number of inflight requests goes above a hard threshold: https://github.com/restify/node-restify/blob/master/lib/plugins/inflightRequestThrottle.js
This seems equivalent to the "concurrency + queueTolerance" behavior described in the current proposal.

A discussion of, or comparison with, restify's current throttle plugin (http://restify.com/docs/plugins-api/#throttle) would be useful. Sorry if I missed that discussion elsewhere.
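
For reference, a minimal sketch of the inflight-counting approach described above, written as a plain restify pre handler rather than the plugin itself; the threshold, handler name, and port are illustrative assumptions, not restify's API:

```js
// Illustrative sketch only: a hand-rolled inflight-request throttle in the
// spirit of restify's inflightRequestThrottle plugin. The threshold and
// handler name are made up for this example.
var restify = require('restify');

var MAX_INFLIGHT = 100;         // roughly "concurrency + queueTolerance"
var inflight = 0;

var server = restify.createServer();

server.pre(function throttleInflight(req, res, next) {
    if (inflight >= MAX_INFLIGHT) {
        // Too many requests already running or queued: shed load.
        res.send(429, { message: 'server busy, retry later' });
        return (next(false));
    }
    inflight++;
    res.once('finish', function () {
        inflight--;
    });
    return (next());
});

server.get('/ping', function (req, res, next) {
    res.send(200, { ok: true });
    return (next());
});

server.listen(8080);
```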

@jmwski
Contributor Author

jmwski commented Sep 11, 2017

At a manta call a couple of weeks back I proposed using a similar restify plugin, but we decided against it because (1) not all of the services that we want to throttle use restify, and (2) we didn't want to add a dependency on restify. I can include some notes about this in the RFD itself.

@rmustacc
Contributor

Thanks for putting this together.

Is it possible to get a discussion of why we ended up going with a per-process model with no state shared between them? I'm not sure if this was just for simplicity, if other things were ruled out already, or not. I mostly ask because other major cases that have come up for throttling are to have per-user throttles, where the stateless nature can break down right away. In addition, it can be harder to reason about what the per-process throttle should be, as that ends up meaning that the throttle scales with the number of zones. Also, was there any consideration for doing the throttling based on the backend resources? For example, the global handled request rate may vary a lot depending on the actual consumed resources.

Is throttling going to be enabled by default, or will it be disabled by default and left up to operators to enable somehow? If the latter, how do we plan on managing this across the fleet, even statically? Would these be sapi properties that config-agent picks up, or is there some other means that we're expecting to use?

@jmwski
Contributor Author

jmwski commented Sep 12, 2017

The per-process muskie throttle was just an initial iteration, intended mostly as a proof of concept but also as a base implementation to build a more generic library on.

I think making throttling decisions based on the availability of backend resources is a good idea. Do you have any suggestions as to what specific resources would be useful to take into account and how I might be able to interface with such metrics from a node module?
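
As one purely illustrative example of a local pressure signal a node process can sample with no external dependencies, the sketch below measures event-loop lag; the interval, threshold, and function names are assumptions for illustration, not something the RFD proposes:

```js
// Illustrative only: sample event-loop lag as a crude local "pressure"
// signal that a throttle could consult. Interval and threshold are made up.
var SAMPLE_MS = 500;
var lastSample = Date.now();
var loopLagMs = 0;

setInterval(function sampleLag() {
    var now = Date.now();
    // Any time beyond the expected interval is time the event loop was blocked.
    loopLagMs = Math.max(0, now - lastSample - SAMPLE_MS);
    lastSample = now;
}, SAMPLE_MS);

function underPressure() {
    // Hypothetical policy: start shedding load once the loop lags by >= 100ms.
    return (loopLagMs >= 100);
}
```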

The eventual plan is to have a globally aware, stateful throttling service. If I recall correctly, in an early discussion at the manta calls we decided that sapi might not be the right place for these tunables to be pushed out, because those are intended for long-lived properties that don't change frequently (I'm not 100% certain, but I seem to recall dap saying something along those lines).

In the RFD I propose the idea of having an HTTP endpoint for setting and getting throttle parameters dynamically. I guess what I really want is for it to play the role of an agent that could be queried periodically by a global throttling service, which could then make decisions based on information from all the muskie processes.
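
A rough sketch of what such an agent-style endpoint might look like, using only node's built-in http module (the route, field names, and port are hypothetical, not what the RFD specifies):

```js
// Hypothetical GET/PUT endpoint for throttle tunables, for illustration only.
var http = require('http');

var throttleParams = {
    concurrency: 50,
    queueTolerance: 25,
    enabled: false
};

http.createServer(function (req, res) {
    if (req.url !== '/throttle/params') {
        res.writeHead(404);
        res.end();
        return;
    }
    if (req.method === 'GET') {
        res.writeHead(200, { 'content-type': 'application/json' });
        res.end(JSON.stringify(throttleParams));
        return;
    }
    if (req.method === 'PUT') {
        var body = '';
        req.on('data', function (chunk) { body += chunk; });
        req.on('end', function () {
            try {
                var update = JSON.parse(body);
                // Only accept keys that are already known tunables.
                Object.keys(update).forEach(function (k) {
                    if (throttleParams.hasOwnProperty(k)) {
                        throttleParams[k] = update[k];
                    }
                });
                res.writeHead(204);
            } catch (e) {
                res.writeHead(400);
            }
            res.end();
        });
        return;
    }
    res.writeHead(405);
    res.end();
}).listen(8081);
```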

I think maybe leaving throttling off by default might be less invasive, but I could also see having a boolean per-service parameter that decides whether throttling is enabled.

UPDATE: I've modified the RFD to (hopefully) address the above feedback! Thanks to everyone who has read through it so far!

@kellymclaughlin

kellymclaughlin commented Sep 18, 2017

Returning a 429 for every throttled request seems potentially confusing since the throttle is not actually looking at the user account making the requests to make decisions. It seems like there are two ideas here: user rate-limiting and backpressure to prevent system overload. This makes me think about two different things:

  1. It seems like most of the cases we're trying to handle with the initial iteration are actually server-side problems, and something like a 503 would be more appropriate. For internal services like moray, where database replication is laggy, that isn't a user problem; it's our internal service that is unable to keep up. If the return status code is going to be static, maybe it should at least be configurable.

  2. Maybe the throttle (now or in the future) should have a backpressure component for managing overload that receives feedback from the servers it communicates with, as well as a user rate-limiting component. It would be great to be able to distinguish these two cases and return different status codes based on the problem (see the sketch after this list).
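
To make that concrete, a tiny sketch in which the shed status code is chosen per cause rather than hard-coded; the reason names and mapping are hypothetical, and `res` is assumed to be a restify response object:

```js
// Illustrative only: map the reason a request is shed to a status code rather
// than always returning 429. Reason names and mapping are hypothetical.
var STATUS_BY_REASON = {
    userRateLimit: 429,     // this particular account is over its limit
    serverOverload: 503     // the service (or a backend) cannot keep up
};

function rejectRequest(res, reason) {
    var code = STATUS_BY_REASON[reason] || 503;
    res.send(code, {
        message: 'request throttled',
        reason: reason
    });
}
```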

As an aside, I wonder if it would be a good idea to standardize our manta components to all use either an HTTP framework like restify or the RPC framework. I can't claim to know the history about why different pieces use one or the other, but having more commonality has a lot of advantages when learning and debugging and also means less software (and accompanying bugs) to support.

@jmwski jmwski closed this as completed Sep 29, 2017
@jmwski jmwski reopened this Sep 29, 2017