feat(relay): prioritize mux dispatch and expose script health by maybeknott · Pull Request #1388 · therealaleph/MasterHttpRelayVPN-RUST

maybeknott · 2026-05-24T17:20:06Z

This groups the relay batching, local quota steering, failure quarantine, and script-health visibility pieces into one coherent relay behavior slice.

TunnelMux now classifies queued operations by dispatch urgency before applying the coalescing wait. Connection opens, connect-and-send opens, and non-empty TCP/UDP data operations bypass the short batching delay once already queued work has been drained. Empty polls and close notices remain batch-friendly so idle long-poll cadence and cleanup traffic can still piggyback without forcing extra Apps Script batches.

DomainFronter also keeps a local rolling 24-hour ledger per configured Apps Script deployment. Selection prunes expired observations and prefers non-blacklisted deployments whose locally observed call count is still below the free-tier steering threshold. If all healthy deployments are locally saturated, routing continues instead of creating a client-side outage, preserving compatibility with paid quotas, shared deployments, and cases where the local estimate is conservative.

Failure handling now separates hard quota/account failures from transient relay failures. HTTP 429/403 and recognizable quota-exceeded bodies quarantine the affected deployment for 24 hours. Transient relay failures such as 5xx responses use a short cooldown, while timeout strikes continue to protect against repeatedly selecting a stalled deployment.

The desktop UI exposes a read-only Script health panel with masked deployment IDs, local rolling-window usage, saturation status, cooldown expiry, failure reason, and timeout strike counts. The guide documents that this is local telemetry rather than an authoritative Google quota reading.

Validation:

git diff --check
cargo test mux_priority --lib
cargo test should_fire --lib
cargo test quota --lib
cargo test quarantine --lib
cargo test script_health --lib
cargo test compact_seconds_formatter --features ui --bin mhrv-rs-ui

Classify tunnel mux messages by dispatch urgency before applying the coalescing wait. Plain connection opens, connect-and-send opens, and data-bearing TCP or UDP operations now bypass the short batching delay once any already queued work has been drained. Empty polling operations and close notices remain batch-friendly so idle long-poll cadence and cleanup traffic can still piggyback without forcing extra Apps Script batches.

The change leaves batch serialization, response indexing, payload-size limits, operation-count limits, deployment selection, and Apps Script quota accounting unchanged. It only decides whether the mux should wait for additional operations before processing the current group, reducing avoidable latency for interactive flows while preserving batching behavior for low-urgency traffic.

Add focused unit coverage for immediate opening and payload-carrying messages, batchable empty polls and closes, and mixed groups where one urgent operation should short-circuit the wait.
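The classification described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `MuxOp`, `Urgency`, `classify`, and `should_fire_immediately` are hypothetical names, and an empty poll is modeled here as a data operation with an empty payload.

```rust
/// Hypothetical stand-in for the operations queued on the tunnel mux.
#[derive(Debug, Clone, PartialEq)]
pub enum MuxOp {
    Connect,
    ConnectAndSend(Vec<u8>),
    TcpData(Vec<u8>),
    UdpData(Vec<u8>),
    Close,
}

/// Whether an operation should skip the coalescing wait.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Urgency {
    Immediate,
    Batchable,
}

/// Opens and non-empty data are urgent; empty polls and closes can wait.
pub fn classify(op: &MuxOp) -> Urgency {
    match op {
        MuxOp::Connect | MuxOp::ConnectAndSend(_) => Urgency::Immediate,
        MuxOp::TcpData(p) | MuxOp::UdpData(p) if !p.is_empty() => Urgency::Immediate,
        // Empty payloads act as polls (an assumption of this sketch).
        MuxOp::TcpData(_) | MuxOp::UdpData(_) | MuxOp::Close => Urgency::Batchable,
    }
}

/// A single urgent operation in the drained group short-circuits the wait.
pub fn should_fire_immediately(group: &[MuxOp]) -> bool {
    group.iter().any(|op| classify(op) == Urgency::Immediate)
}
```

In this sketch the decision only affects whether the mux waits for more operations; how the group is serialized and sent is left untouched, matching the scope stated in the commit message.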

Apps Script quota is consumed per relay invocation, but a plain round-robin selector has no memory of how heavily this client has used each deployment inside the recent quota window. When multiple script IDs are configured, continuing to select an already saturated deployment while another configured deployment is still locally underused wastes available capacity and increases the chance of quota-related relay stalls.

DomainFronter now keeps a per-script local ledger of selection timestamps in a rolling 24-hour window. Before choosing a script ID, the selector prunes expired observations and prefers non-blacklisted deployments whose local call count remains below the free-tier request budget. Both the single-request selector and the parallel fan-out selector use the same ledger so Apps Script batches and relay fan-out draw from the same local capacity model.

The ledger records selections at dispatch time. That deliberately accounts for concurrent fan-out attempts and for requests that may still complete server-side after the Rust future is dropped. The ledger is a local steering signal rather than an authoritative Google quota reading: if every non-blacklisted deployment is locally saturated, the selector still returns a deployment instead of creating a client-side outage. This preserves connectivity for paid Workspace quotas, shared deployments whose external usage is invisible to this process, and cases where the local estimate is conservative.

Selection remains decoupled from the existing failure blacklist. Blacklisted deployments are still skipped first; the rolling quota ledger only orders otherwise healthy deployments by locally observed capacity. If all deployments are blacklisted, the existing earliest-cooldown recovery path is preserved and the selected deployment is recorded in the ledger.

The guide now describes the local rolling 24-hour ledger in the Full Mode deployment-scaling section, including the fact that it steers away from deployments this client has already driven near the free-tier request budget. Unit coverage exercises saturated deployment skipping, expired observation pruning, all-saturated connectivity fallback, and parallel selection preferring unsaturated deployments.
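A rolling-window ledger of this kind might look like the sketch below. All names (`QuotaLedger`, `select`, `usage`) are hypothetical, the threshold is passed in rather than taken from any real free-tier figure, round-robin ordering is omitted, and the all-blacklisted case is simplified to "first configured ID" instead of the earliest-cooldown recovery path the PR preserves.

```rust
use std::collections::{HashMap, HashSet, VecDeque};
use std::time::{Duration, Instant};

/// Rolling window over which local selections are counted.
pub const WINDOW: Duration = Duration::from_secs(24 * 60 * 60);

/// Hypothetical per-deployment ledger of selection timestamps.
pub struct QuotaLedger {
    threshold: usize,
    calls: HashMap<String, VecDeque<Instant>>,
}

impl QuotaLedger {
    pub fn new(threshold: usize) -> Self {
        Self { threshold, calls: HashMap::new() }
    }

    /// Drop observations that have aged out of the window.
    fn prune(&mut self, now: Instant) {
        for q in self.calls.values_mut() {
            while q.front().map_or(false, |t| now.duration_since(*t) >= WINDOW) {
                q.pop_front();
            }
        }
    }

    /// Locally observed calls still inside the window (as of the last prune).
    pub fn usage(&self, id: &str) -> usize {
        self.calls.get(id).map_or(0, VecDeque::len)
    }

    /// Prefer healthy, unsaturated deployments, but always return one if any
    /// ID is configured. The selection is recorded at dispatch time.
    pub fn select(
        &mut self,
        ids: &[String],
        blacklisted: &HashSet<String>,
        now: Instant,
    ) -> Option<String> {
        self.prune(now);
        let healthy: Vec<&String> =
            ids.iter().filter(|id| !blacklisted.contains(*id)).collect();
        let pick = healthy
            .iter()
            .copied()
            .find(|id| self.usage(id) < self.threshold)
            // All healthy deployments saturated: keep routing anyway.
            .or_else(|| healthy.first().copied())
            // All blacklisted: simplified fallback (assumption of this sketch).
            .or_else(|| ids.first())?
            .clone();
        self.calls.entry(pick.clone()).or_default().push_back(now);
        Some(pick)
    }
}
```

Recording the timestamp inside `select` rather than on response mirrors the dispatch-time accounting described above: a request that is dropped client-side still counts against the local estimate.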

A single cooldown duration is too coarse for Apps Script deployment failures. Quota exhaustion and account-level authorization failures recover on a much longer cadence than transient Google edge or Apps Script backend failures. Treating both classes the same either probes exhausted deployments too aggressively or removes transiently unhealthy deployments for longer than necessary.

Relay failure handling now classifies script failures into two explicit quarantine classes. HTTP 429, HTTP 403, and response bodies that match quota or service-invocation limit text are treated as hard quota/account failures and quarantined for 24 hours. Google or Apps Script transient 5xx responses are treated as temporary relay failures and use the existing short cooldown window.

The transient class is deliberately narrow. Generic upstream 5xx bodies such as a destination-origin bad gateway do not quarantine a script ID by themselves; the body must look like a Google, Apps Script, GFE, backend, service-unavailable, temporary, or timeout failure. This avoids punishing healthy deployments for ordinary origin-side errors that Apps Script relayed correctly.

The same classifier is used across the direct relay path, h1 fallback path, tunnel single-operation path, and tunnel batch path. Quota-like errors returned inside the Apps Script JSON envelope still force the hard quarantine path even when the outer HTTP status is 200.

The English and Persian guides now describe auto-quarantine as two failure classes instead of a single ten-minute blacklist. Unit coverage verifies hard quota/account classification, transient Google-edge classification, ordinary upstream 5xx pass-through, and the quarantine durations for both classes.
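The two-class split could be expressed along these lines. This is a sketch under stated assumptions: `Quarantine` and `classify_failure` are hypothetical names, the marker strings are illustrative rather than the PR's actual match list, and the ten-minute transient cooldown is inferred from the guide's earlier "single ten-minute blacklist" wording.

```rust
use std::time::Duration;

/// Hypothetical quarantine classes for a failing deployment.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Quarantine {
    /// Quota exhaustion or account-level failure.
    HardQuota,
    /// Temporary Google edge or Apps Script backend failure.
    Transient,
}

impl Quarantine {
    pub fn duration(self) -> Duration {
        match self {
            Quarantine::HardQuota => Duration::from_secs(24 * 60 * 60),
            // Assumed value for the "existing short cooldown window".
            Quarantine::Transient => Duration::from_secs(10 * 60),
        }
    }
}

/// Returns `None` when the failure should not quarantine the script ID,
/// for example an ordinary origin-side 5xx that was relayed correctly.
pub fn classify_failure(status: u16, body: &str) -> Option<Quarantine> {
    let body = body.to_ascii_lowercase();

    // Illustrative markers only; the real list may differ.
    let quota_like = ["quota", "service invoked too many times"]
        .iter()
        .any(|m| body.contains(*m));
    // Checked before the status so a quota message in a 200 envelope counts.
    if status == 429 || status == 403 || quota_like {
        return Some(Quarantine::HardQuota);
    }

    let google_like = [
        "google", "apps script", "gfe", "backend",
        "service unavailable", "temporar", "timeout",
    ]
    .iter()
    .any(|m| body.contains(*m));
    if (500..=599).contains(&status) && google_like {
        return Some(Quarantine::Transient);
    }
    None
}
```

Keeping the classifier a pure function of status and body makes it straightforward to share across the direct, h1 fallback, and tunnel paths mentioned above.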

Add a read-only per-deployment health snapshot over the existing relay state so operators can inspect how deployment selection is behaving without changing the scheduler itself. The snapshot reports masked script IDs, locally observed rolling quota usage, the configured local quota threshold, saturation state, active cooldown seconds, cooldown reason, and timeout strike count. Cooldown reasons are tracked alongside the existing blacklist timestamps and are pruned whenever expired blacklist entries are removed.

Surface the snapshot in the desktop UI as a collapsible Script health table, clear stale rows when the proxy stops or exits, and document that these values are local client observations rather than authoritative Google-side quota counters.

Add focused unit coverage for quota saturation, cooldown reason exposure, timeout strike visibility, and compact duration formatting. The relay routing, quarantine durations, and selection behavior remain unchanged.
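A snapshot row with the fields listed above might be shaped like this. The struct, the masking rule (first and last four characters), and the duration format are all assumptions made for illustration; the PR's actual `compact_seconds_formatter` output and masking scheme are not shown in this description.

```rust
/// Hypothetical read-only health row for one deployment.
#[derive(Debug, Clone, PartialEq)]
pub struct ScriptHealth {
    pub masked_id: String,
    pub calls_in_window: usize,
    pub threshold: usize,
    pub cooldown_secs: Option<u64>,
    pub cooldown_reason: Option<String>,
    pub timeout_strikes: u32,
}

/// Keep the first and last four characters; fully mask short IDs.
pub fn mask_script_id(id: &str) -> String {
    let chars: Vec<char> = id.chars().collect();
    if chars.len() <= 8 {
        return "*".repeat(chars.len());
    }
    let head: String = chars[..4].iter().collect();
    let tail: String = chars[chars.len() - 4..].iter().collect();
    format!("{head}...{tail}")
}

/// Render seconds using at most two units (format is an assumption).
pub fn compact_seconds(secs: u64) -> String {
    match secs {
        0..=59 => format!("{secs}s"),
        60..=3_599 => format!("{}m {}s", secs / 60, secs % 60),
        _ => format!("{}h {}m", secs / 3_600, (secs % 3_600) / 60),
    }
}

impl ScriptHealth {
    /// Saturation is derived from local observations only.
    pub fn saturated(&self) -> bool {
        self.calls_in_window >= self.threshold
    }

    /// One text row, roughly as a UI table might display it.
    pub fn row(&self) -> String {
        let cooldown = match (self.cooldown_secs, &self.cooldown_reason) {
            (Some(s), Some(r)) => format!("{} ({r})", compact_seconds(s)),
            (Some(s), None) => compact_seconds(s),
            _ => "-".to_string(),
        };
        format!(
            "{} {}/{} saturated={} cooldown={} strikes={}",
            self.masked_id,
            self.calls_in_window,
            self.threshold,
            self.saturated(),
            cooldown,
            self.timeout_strikes
        )
    }
}
```

Because the row is built from a copy of the relay state, rendering it cannot influence selection, which is consistent with the statement that routing and quarantine behavior remain unchanged.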

CaptainMirage · 2026-05-24T18:48:36Z

One thing I would recommend: first finish the task you want to do, then push the branch, and only then open the PR. That generally makes it easier to merge, and opening a PR early can cause problems in more active repos. Currently the owner, who is the only maintainer, is sleeping for a few days, so it's fine in this case; just mentioning it for later on.

Especially the push part: if you make a mistake or want to squash some commits together, it's a lot easier to fix those problems if you haven't pushed the branch yet. Once you push, the history stays and it can make the branch commits a bit messy. A fully local branch is a lot easier to rearrange and fix than one that is already pushed, just so you know :)

github-actions Bot added the type: feature feat: PR — auto-applied by release-drafter label May 24, 2026

maybeknott added 3 commits May 24, 2026 20:59

maybeknott changed the title ~~feat(tunnel): prioritize interactive mux operations~~ feat(relay): prioritize mux dispatch and expose script health May 24, 2026

This was referenced May 24, 2026

feat(relay): steer deployments with rolling quota ledger #1367

Closed

fix(relay): classify script quarantine by failure type #1368

Closed

maybeknott wants to merge 4 commits into therealaleph:main from maybeknott:feat/tunnel-request-priority


2 participants
