Gate policy phases by Mode() and add tests#1686
Conversation
Make Mode() the authoritative source for whether a policy participates in each phase and perform mode-first gating at execution time. The chain executor now skips header and streaming callbacks unless the policy's Mode indicates processing/streaming. Chain construction (kernel, xds, handler) computes phase requirements from Mode() and uses type assertions only to validate streaming/method support, emitting warnings when a policy declares STREAM/PROCESS but doesn't implement the corresponding interface. Updated registry comments and SDK interface docs to reflect the Mode-first model. Added comprehensive tests that verify header and streaming gating and chain-build behavior, and added structured slog warnings where mismatches are detected.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
SummaryThis pull request refactors the policy engine to establish Key ChangesExecution Gating by Mode Declaration
Chain Construction Refactoring
Documentation and Intent Clarity
Test Coverage
ImpactThese changes unify policy routing logic around mode declarations, reducing complexity from the previous hybrid approach of mode checks and type assertions. The stricter routing prevents misrouted policy executions while maintaining validation of declared capabilities. WalkthroughChain building and execution now use each policy’s Mode() to decide phase participation; executor skips policies whose declared header/body/streaming modes don’t match the current phase. Kernel, registry, xDS, executor, tests, and docs updated; mismatches emit warning logs. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Kernel
participant Registry
participant Executor
participant Policy
Client->>Kernel: Incoming request
Kernel->>Registry: BuildPolicyChain(policies)
Registry-->>Kernel: PolicyChain (flags derived from Mode())
Kernel->>Executor: Execute phase with PolicyChain
loop per policy
Executor->>Policy: Read Mode()
alt Mode==PROCESS / BodyMode==STREAM matches phase
Executor->>Policy: Invoke phase handler (OnRequestHeaders / OnRequestBodyChunk / ...)
Policy-->>Executor: Handler result
else Mode does not match
Executor-->>Policy: Skip handler (no call)
end
end
Executor-->>Kernel: Aggregated execution results
Kernel-->>Client: Response / actions
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
gateway/gateway-runtime/policy-engine/internal/executor/chain.go (2)
255-259:⚠️ Potential issue | 🟠 MajorRequire an explicit body mode before running
OnRequestBody.This still executes buffered request callbacks for policies whose
Mode().RequestBodyModeis the zero value (""). If another policy makes the chain buffered, those policies will run even though they never opted into request-body participation, which breaks the new “Mode()is authoritative” contract.Proposed fix
- // Skip if the policy's mode says to skip request body processing - if pol.Mode().RequestBodyMode == policy.BodyModeSkip { + // Mode()-first: only execute buffered body callbacks for explicit body participants. + switch pol.Mode().RequestBodyMode { + case policy.BodyModeBuffer, policy.BodyModeStream: + // allowed + default: span.End() continue }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gateway/gateway-runtime/policy-engine/internal/executor/chain.go` around lines 255 - 259, The code currently treats the zero-value RequestBodyMode as participating; change the guard so OnRequestBody is only invoked when the policy explicitly opted in (i.e., RequestBodyMode == policy.BodyModeBuffered). In the loop in chain.go where you check pol.Mode().RequestBodyMode and currently only compare to policy.BodyModeSkip, update the condition to first reject empty/zero values and only proceed when RequestBodyMode == policy.BodyModeBuffered (keep the existing BodyModeSkip check as explicit skip). This ensures policies with "" do not run their buffered request callbacks.
572-576:⚠️ Potential issue | 🟠 MajorRequire an explicit body mode before running
OnResponseBody.The response path has the same gap: a policy with
ResponseBodyMode == ""still executesOnResponseBodywhenever the chain is buffered for some other policy. That reintroduces interface-based participation on the buffered path and undermines the mode-first gating this PR is trying to enforce.Proposed fix
- // Skip if the policy's mode says to skip response body processing - if pol.Mode().ResponseBodyMode == policy.BodyModeSkip { + // Mode()-first: only execute buffered body callbacks for explicit body participants. + switch pol.Mode().ResponseBodyMode { + case policy.BodyModeBuffer, policy.BodyModeStream: + // allowed + default: span.End() continue }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gateway/gateway-runtime/policy-engine/internal/executor/chain.go` around lines 572 - 576, The current response body handling still calls OnResponseBody for policies where pol.Mode().ResponseBodyMode is empty; modify the gating in chain.go so OnResponseBody is only invoked when the policy explicitly opts-in: change the existing check that skips when ResponseBodyMode == policy.BodyModeSkip to instead skip unless pol.Mode().ResponseBodyMode is non-empty and not policy.BodyModeSkip (i.e., require pol.Mode().ResponseBodyMode != "" && pol.Mode().ResponseBodyMode != policy.BodyModeSkip before calling OnResponseBody), ensuring you preserve existing span.End()/continue behavior for skipped policies.
🧹 Nitpick comments (1)
gateway/gateway-runtime/policy-engine/internal/xdsclient/handler.go (1)
352-480: Extract the shared chain-builder logic before these copies drift again.This method is now a near-verbatim copy of the mode/interface reconciliation in
gateway/gateway-runtime/policy-engine/internal/kernel/xds.goandgateway/gateway-runtime/policy-engine/internal/kernel/body_mode.go. Since the PR is changing those rules right now, the next tweak can easily leave file-loaded, kernel-built, and xDS-loaded chains with different behavior.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gateway/gateway-runtime/policy-engine/internal/xdsclient/handler.go` around lines 352 - 480, This is a duplication of the policy mode/interface reconciliation logic: extract the shared chain-builder code used by ResourceHandler.buildPolicyChain into a single reusable function (e.g., BuildPolicyChain or NewPolicyChain) and call it from both ResourceHandler.buildPolicyChain and the kernel side (previously in kernel.ConfigLoader.buildPolicyChain / kernel/xds.go / kernel/body_mode.go); move the mode/implementation checks (RequestBodyMode/ResponseBodyMode/HeaderMode and StreamingRequestPolicy/StreamingResponsePolicy/RequestHeaderPolicy/ResponseHeaderPolicy type assertions) into that new function returning a registry.PolicyChain, update both callers to pass the policy config + metadata + registry instance, and add a unit test to ensure behavior parity so future changes affect both code paths consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@sdk/core/policy/v1alpha2/interface.go`:
- Around line 3-6: Update the doc comments for RequestPolicy and ResponsePolicy
to reflect the new Mode() contract: state that Mode() is the authoritative
source that declares phase participation and that implementing RequestPolicy or
ResponsePolicy only supplies the corresponding method implementations
(body-handling capability) for phases selected by Mode(), rather than implying
these interfaces alone enable participation; reference the Mode() method and the
RequestPolicy/ResponsePolicy interface names in the updated comments so readers
clearly understand the separation of participation (Mode()) and available
handlers (the interfaces).
---
Outside diff comments:
In `@gateway/gateway-runtime/policy-engine/internal/executor/chain.go`:
- Around line 255-259: The code currently treats the zero-value RequestBodyMode
as participating; change the guard so OnRequestBody is only invoked when the
policy explicitly opted in (i.e., RequestBodyMode == policy.BodyModeBuffered).
In the loop in chain.go where you check pol.Mode().RequestBodyMode and currently
only compare to policy.BodyModeSkip, update the condition to first reject
empty/zero values and only proceed when RequestBodyMode ==
policy.BodyModeBuffered (keep the existing BodyModeSkip check as explicit skip).
This ensures policies with "" do not run their buffered request callbacks.
- Around line 572-576: The current response body handling still calls
OnResponseBody for policies where pol.Mode().ResponseBodyMode is empty; modify
the gating in chain.go so OnResponseBody is only invoked when the policy
explicitly opts-in: change the existing check that skips when ResponseBodyMode
== policy.BodyModeSkip to instead skip unless pol.Mode().ResponseBodyMode is
non-empty and not policy.BodyModeSkip (i.e., require pol.Mode().ResponseBodyMode
!= "" && pol.Mode().ResponseBodyMode != policy.BodyModeSkip before calling
OnResponseBody), ensuring you preserve existing span.End()/continue behavior for
skipped policies.
---
Nitpick comments:
In `@gateway/gateway-runtime/policy-engine/internal/xdsclient/handler.go`:
- Around line 352-480: This is a duplication of the policy mode/interface
reconciliation logic: extract the shared chain-builder code used by
ResourceHandler.buildPolicyChain into a single reusable function (e.g.,
BuildPolicyChain or NewPolicyChain) and call it from both
ResourceHandler.buildPolicyChain and the kernel side (previously in
kernel.ConfigLoader.buildPolicyChain / kernel/xds.go / kernel/body_mode.go);
move the mode/implementation checks (RequestBodyMode/ResponseBodyMode/HeaderMode
and
StreamingRequestPolicy/StreamingResponsePolicy/RequestHeaderPolicy/ResponseHeaderPolicy
type assertions) into that new function returning a registry.PolicyChain, update
both callers to pass the policy config + metadata + registry instance, and add a
unit test to ensure behavior parity so future changes affect both code paths
consistently.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 400b5452-cd6c-4a86-8240-4ffa2d463b4b
📒 Files selected for processing (8)
gateway/gateway-runtime/policy-engine/internal/executor/chain.gogateway/gateway-runtime/policy-engine/internal/executor/chain_test.gogateway/gateway-runtime/policy-engine/internal/kernel/body_mode.gogateway/gateway-runtime/policy-engine/internal/kernel/body_mode_test.gogateway/gateway-runtime/policy-engine/internal/kernel/xds.gogateway/gateway-runtime/policy-engine/internal/registry/chain.gogateway/gateway-runtime/policy-engine/internal/xdsclient/handler.gosdk/core/policy/v1alpha2/interface.go
| // Policies declare phase participation via Mode(), which is the authoritative | ||
| // source for all six processing phases. Phase-specific sub-interfaces provide | ||
| // the method implementations; the kernel uses type assertions only for method | ||
| // dispatch and streaming capability validation at chain-build time. |
There was a problem hiding this comment.
Update the per-interface comments to match the new Mode() contract.
The package header now says Mode() is authoritative, but the RequestPolicy and ResponsePolicy comments below still imply that implementing those interfaces alone enables body participation. That contradiction will mislead policy authors.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk/core/policy/v1alpha2/interface.go` around lines 3 - 6, Update the doc
comments for RequestPolicy and ResponsePolicy to reflect the new Mode()
contract: state that Mode() is the authoritative source that declares phase
participation and that implementing RequestPolicy or ResponsePolicy only
supplies the corresponding method implementations (body-handling capability) for
phases selected by Mode(), rather than implying these interfaces alone enable
participation; reference the Mode() method and the RequestPolicy/ResponsePolicy
interface names in the updated comments so readers clearly understand the
separation of participation (Mode()) and available handlers (the interfaces).
Insert 'And I wait for 2 seconds' after HTTP 201 responses in gateway/it/features/llm-cost-based-ratelimit.feature across multiple scenarios. This short fixed delay helps reduce test flakiness by allowing the backend to stabilize before the subsequent policy snapshot sync steps.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@gateway/it/features/llm-cost-based-ratelimit.feature`:
- Line 81: Replace brittle fixed sleeps in the feature scenarios with a
version-aware synchronization: update the feature file steps that say "And I
wait for 2 seconds" to instead call the existing waitForPolicySnapshotSync step,
and harden the waitForPolicySnapshotSync implementation to wait until the
controller/runtime report the expected policy version has BOTH changed from the
previous observed version and equals the new target version; specifically modify
the waitForPolicySnapshotSync function to first read the current version, poll
until it observes a different version, then continue polling until the reported
version equals the expected version (with the existing timeout/backoff), and
remove the explicit "And I wait for 2 seconds" lines from scenarios so they rely
on the strengthened waitForPolicySnapshotSync behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 15c30454-228b-4ae0-9538-1ecab3a2a37c
📒 Files selected for processing (1)
gateway/it/features/llm-cost-based-ratelimit.feature
Remove redundant "And I wait for 2 seconds" steps from llm-cost-based-ratelimit.feature to eliminate fixed sleeps and speed up the integration tests. Refactor waitForPolicySnapshotSync in steps_health.go to first detect a new controller policy version, then wait for the policy-engine to reach that target version; improve error handling, remove unused variables, and reduce flakiness by controlling sleeps between attempts.
Insert 2s waits in the LLM cost-based rate limit feature steps after policy creation to give the system time to propagate snapshots. Refactor waitForPolicySnapshotSync to poll controller and policy-engine versions concurrently and succeed when they match, returning a detailed error with the last observed versions and errors if sync fails. This removes the previous two-stage initial/target logic and improves robustness and error reporting for flaky syncs.
Co-authored-by: Renuka Piyumal Fernando <renukapiyumal@gmail.com>
Related Discussion
Purpose
The primary motivation for this change is the upcoming Python Policy Runtime. To support Python policies, the engine requires a single unified Go
PythonBridgestruct that implements all six policy interfaces. This bridge will dynamically delegate execution to the Python environment based on the specific policy's configuration.Previously, the policy engine relied on an inconsistent mix of
Mode()declarations and Go type assertions to determine phase participation. Because the unifiedPythonBridgeimplements every interface, the engine's legacy type-assertion gating was incorrectly routing buffered Python policies into the streaming executors.To fix this, we are refactoring the engine to establish
Mode()as the strict, sole source of truth for phase participation, explicitly removing arbitrary type assertion fallbacks.Goals
Establish
Mode()as the single, authoritative source of truth for all policy chain building and runtime execution gating. This ensures safe dynamic multi-interface dispatch and resolves the bug where streaming executors dispatched non-streaming traffic simply because a policy returned!= BodyModeSkip.Approach
kernel/body_mode.go,kernel/xds.go, andxdsclient/handler.goto dynamically compute boolean flags (e.g.,RequiresRequestBody,RequiresRequestHeader) explicitly fromMode()configuration prior to interface validation.ChainExecutorinexecutor/chain.goto explicitly gate header executions againstpol.Mode().RequestHeaderMode == HeaderModeProcess.ChainExecutorto strictly gate on== BodyModeStreamrather than the legacy!= BodyModeSkip, safely enabling the Python bridge to operate under buffered logic execution without accidental routing into streaming paths.registry/chain.goaccurately separating Envoy lifecycle obligations from phase execution intent.User stories
As a gateway developer, I need the policy engine to cleanly decouple phase participation from interface existence, allowing dynamic environments (like the new Python bridge) to execute policy callbacks precisely according to their explicit
Mode()without impacting or hijacking streaming traffic.Documentation
N/A. Engine-internal architectural changes; SDK structural impacts and execution intent nuances were documented directly via inline codebase comments (
registry/chain.goandinterface.go).Automation tests
Security checks
Samples
N/A
Related PRs
Test environment