proxy: add mcp.backend_depth_limit(int) #1137

dormando · 2024-05-16T00:29:38Z

If a backend has a queue depth limt over this amount, fast-fail any further requests.

A global parallel request limit can mean a single slow or overloaded backend causes the entire proxy to stop working. As a first layer of defence the depth for a particular backend should be capped.

TODO:

~~- [ ] might add a counter for failed-due-to-bad while I'm in here.~~
~~- [ ] might also add a tunable for fast-failing while be is connecting/validating. been meaning to do that forever.~~

delaying further TODO's until after the TLS work is in testing.

dormando · 2024-05-16T21:06:23Z

definitely failing the stability suite... requires a combo of running the fault-blocked test first and then any other test after that. working on it.

dormando · 2024-05-16T22:13:09Z

dumbassery.

dormando · 2024-05-17T00:59:25Z

got a bizarre protocol desync failure an hour in... not reproducing it on a retry using short times. need to re-run using longer timings. blocking merge until I can get multiple clean full runs.

dormando · 2024-05-17T04:27:49Z

passed full suite at 10s time and 30s time per test. running again at 60s with the same instance as the 30s. sadly fat-fingered the wrong window and didn't dump a core when it failed earlier. :/ This might be a rare unrelated problem.

If it passes the 60s suite I'll still merge but, but will try to keep running the tests every day for a while to see if there are any other rare issues.

dormando · 2024-05-17T04:31:30Z

New stat does get hit:

STAT proxy_config_reloads 875
STAT proxy_config_reload_fails 0
STAT proxy_config_cron_runs 0
STAT proxy_config_cron_fails 0
STAT proxy_backend_total 60
STAT proxy_backend_marked_bad 222
STAT proxy_backend_failed 9844
STAT proxy_request_failed_depth 848466

wonder if I have a mcshredder bug that's seeing a protocol desync?

dormando · 2024-05-20T16:29:51Z

STAT proxy_conn_requests 24064087641
STAT proxy_conn_errors 0
STAT proxy_conn_oom 0
STAT proxy_req_active 0
STAT proxy_await_active 0
STAT proxy_config_reloads 7692
STAT proxy_config_reload_fails 0
STAT proxy_config_cron_runs 0
STAT proxy_config_cron_fails 0
STAT proxy_backend_total 60
STAT proxy_backend_marked_bad 1450
STAT proxy_backend_failed 83787
STAT proxy_request_failed_depth 8268609

24b queries. 7.6k reloads... didn't see the issue I saw again. going to merge this now.

If a backend has a queue depth limt over this amount, fast-fail any further requests. A global parallel request limit can mean a single slow or overloaded backend causes the entire proxy to stop working. As a first layer of defence the depth for a particular backend should be capped.

dormando added needs review/testing proxy worklogs and issues related to proxy labels May 16, 2024

dormando force-pushed the proxy_bedepthlim branch from 0e9d0da to b23c737 Compare May 16, 2024 00:35

dormando force-pushed the proxy_bedepthlim branch from c3d7c30 to 4a75ac2 Compare May 20, 2024 16:35

dormando merged commit 4a75ac2 into memcached:next May 20, 2024
1 check passed

dormando deleted the proxy_bedepthlim branch May 20, 2024 16:35

dormando added merged/fixed for next and removed needs review/testing labels May 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proxy: add mcp.backend_depth_limit(int) #1137

proxy: add mcp.backend_depth_limit(int) #1137

dormando commented May 16, 2024 •

edited

dormando commented May 16, 2024

dormando commented May 16, 2024

dormando commented May 17, 2024

dormando commented May 17, 2024 •

edited

dormando commented May 17, 2024

dormando commented May 20, 2024

proxy: add mcp.backend_depth_limit(int) #1137

proxy: add mcp.backend_depth_limit(int) #1137

Conversation

dormando commented May 16, 2024 • edited

dormando commented May 16, 2024

dormando commented May 16, 2024

dormando commented May 17, 2024

dormando commented May 17, 2024 • edited

dormando commented May 17, 2024

dormando commented May 20, 2024

dormando commented May 16, 2024 •

edited

dormando commented May 17, 2024 •

edited