Gate policy phases by Mode() and add tests #1686

Merged
renuka-fernando merged 5 commits into wso2:main from sehan-dissanayake:modeCheck
Apr 16, 2026

@sehan-dissanayake sehan-dissanayake commented Apr 10, 2026

Related Discussion

Purpose

The primary motivation for this change is the upcoming Python Policy Runtime. To support Python policies, the engine requires a single unified Go PythonBridge struct that implements all six policy interfaces. This bridge will dynamically delegate execution to the Python environment based on the specific policy's configuration.

Previously, the policy engine relied on an inconsistent mix of Mode() declarations and Go type assertions to determine phase participation. Because the unified PythonBridge implements every interface, the engine's legacy type-assertion gating was incorrectly routing buffered Python policies into the streaming executors.

To fix this, we are refactoring the engine to establish Mode() as the strict, sole source of truth for phase participation, explicitly removing arbitrary type assertion fallbacks.
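The misrouting described above can be reproduced in miniature. The following is a minimal sketch, not the engine's actual code: the type definitions, constant values, and the `OnRequestBodyChunk` signature are assumptions made for illustration, and `bridge`, `legacyStreams`, and `modeFirstStreams` are hypothetical names. It shows why a struct that implements every interface defeats type-assertion gating, while a Mode()-first check does not.

```go
package main

import "fmt"

// BodyMode and its string values are hypothetical stand-ins for the SDK types.
type BodyMode string

const (
	BodyModeSkip   BodyMode = "SKIP"
	BodyModeBuffer BodyMode = "BUFFER"
	BodyModeStream BodyMode = "STREAM"
)

// ProcessingMode is trimmed to the single field this sketch needs.
type ProcessingMode struct {
	RequestBodyMode BodyMode
}

type Policy interface {
	Mode() ProcessingMode
}

// StreamingRequestPolicy uses an assumed method signature.
type StreamingRequestPolicy interface {
	OnRequestBodyChunk(chunk []byte)
}

// bridge mimics a unified bridge: it implements every interface and relies
// on Mode() to say which phases it actually wants.
type bridge struct{ mode ProcessingMode }

func (b bridge) Mode() ProcessingMode            { return b.mode }
func (b bridge) OnRequestBodyChunk(chunk []byte) {}

// legacyStreams is a simplified reconstruction of the old gating:
// the interface is present and the mode is anything other than skip.
func legacyStreams(p Policy) bool {
	_, ok := p.(StreamingRequestPolicy)
	return ok && p.Mode().RequestBodyMode != BodyModeSkip
}

// modeFirstStreams routes to streaming only on an explicit STREAM declaration.
func modeFirstStreams(p Policy) bool {
	return p.Mode().RequestBodyMode == BodyModeStream
}

func main() {
	buffered := bridge{mode: ProcessingMode{RequestBodyMode: BodyModeBuffer}}
	fmt.Println("legacy routes to streaming:", legacyStreams(buffered))
	fmt.Println("mode-first routes to streaming:", modeFirstStreams(buffered))
}
```

With a buffered bridge, the legacy check reports true and the mode-first check reports false, which is the behavior difference this PR targets.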

Goals

Establish Mode() as the single, authoritative source of truth for all policy chain building and runtime execution gating. This ensures safe dynamic multi-interface dispatch and resolves the bug where streaming executors handled non-streaming traffic simply because a policy's declared body mode was anything other than BodyModeSkip (the legacy check was != BodyModeSkip).

Approach

  • Chain-Build Refactoring: Updated kernel/body_mode.go, kernel/xds.go, and xdsclient/handler.go to dynamically compute boolean flags (e.g., RequiresRequestBody, RequiresRequestHeader) explicitly from Mode() configuration prior to interface validation.
  • Execution Gating: Refactored ChainExecutor in executor/chain.go to explicitly gate header executions against pol.Mode().RequestHeaderMode == HeaderModeProcess.
  • Tightened Streaming: Modified streaming executor routing in ChainExecutor to gate strictly on == BodyModeStream rather than the legacy != BodyModeSkip, so the Python bridge can run under buffered execution without being routed into streaming paths by accident.
  • Documentation: Updated struct documentation in registry/chain.go to separate Envoy lifecycle obligations from phase execution intent.
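The execution gating in the list above can be sketched as follows. This is an illustrative approximation rather than the code in executor/chain.go: the callback signatures, the `HeaderModeSkip` constant, the constant values, and the `countingPolicy`, `executeRequestHeaders`, and `executeStreamingRequestBody` names are all assumptions. The counting double is in the spirit of the test doubles added to the executor tests.

```go
package main

import "fmt"

type HeaderMode string
type BodyMode string

const (
	HeaderModeSkip    HeaderMode = "SKIP" // assumed name and value
	HeaderModeProcess HeaderMode = "PROCESS"

	BodyModeSkip   BodyMode = "SKIP"
	BodyModeBuffer BodyMode = "BUFFER"
	BodyModeStream BodyMode = "STREAM"
)

type ProcessingMode struct {
	RequestHeaderMode HeaderMode
	RequestBodyMode   BodyMode
}

// countingPolicy implements both callbacks and records how often each runs,
// so the gating decision is observable.
type countingPolicy struct {
	mode        ProcessingMode
	headerCalls int
	chunkCalls  int
}

func (p *countingPolicy) Mode() ProcessingMode            { return p.mode }
func (p *countingPolicy) OnRequestHeaders()               { p.headerCalls++ }
func (p *countingPolicy) OnRequestBodyChunk(chunk []byte) { p.chunkCalls++ }

// executeRequestHeaders runs the header callback only for policies that
// declare HeaderModeProcess.
func executeRequestHeaders(chain []*countingPolicy) {
	for _, pol := range chain {
		if pol.Mode().RequestHeaderMode != HeaderModeProcess {
			continue
		}
		pol.OnRequestHeaders()
	}
}

// executeStreamingRequestBody runs the chunk callback only for policies that
// declare BodyModeStream; buffered policies are bypassed.
func executeStreamingRequestBody(chain []*countingPolicy, chunk []byte) {
	for _, pol := range chain {
		if pol.Mode().RequestBodyMode != BodyModeStream {
			continue
		}
		pol.OnRequestBodyChunk(chunk)
	}
}

func main() {
	buffered := &countingPolicy{mode: ProcessingMode{RequestHeaderMode: HeaderModeSkip, RequestBodyMode: BodyModeBuffer}}
	streaming := &countingPolicy{mode: ProcessingMode{RequestHeaderMode: HeaderModeProcess, RequestBodyMode: BodyModeStream}}
	chain := []*countingPolicy{buffered, streaming}

	executeRequestHeaders(chain)
	executeStreamingRequestBody(chain, []byte("chunk"))

	fmt.Println("buffered:", buffered.headerCalls, buffered.chunkCalls)
	fmt.Println("streaming:", streaming.headerCalls, streaming.chunkCalls)
}
```

Both policies expose the same methods, yet only the one whose Mode() asks for header processing and streaming is invoked.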

User stories

As a gateway developer, I need the policy engine to cleanly decouple phase participation from interface existence, allowing dynamic environments (like the new Python bridge) to execute policy callbacks precisely according to their explicit Mode() without impacting or hijacking streaming traffic.

Documentation

N/A. Engine-internal architectural changes; SDK structural impacts and execution intent nuances were documented directly via inline codebase comments (registry/chain.go and interface.go).

Automation tests

  • Unit tests

    Updated the ChainExecutor unit tests to verify that streaming callbacks are bypassed for any policy returning BodyModeBuffer, keeping execution in line with the declared mode.

Security checks

Samples

N/A

Related PRs

Test environment

  • macOS / Linux

Make Mode() the authoritative source for whether a policy participates in each phase and perform mode-first gating at execution time. The chain executor now skips header and streaming callbacks unless the policy's Mode indicates processing/streaming. Chain construction (kernel, xds, handler) computes phase requirements from Mode() and uses type assertions only to validate streaming/method support, emitting warnings when a policy declares STREAM/PROCESS but doesn't implement the corresponding interface. Updated registry comments and SDK interface docs to reflect the Mode-first model. Added comprehensive tests that verify header and streaming gating and chain-build behavior, and added structured slog warnings where mismatches are detected.

coderabbitai bot commented Apr 10, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 439f4af7-768e-41a0-b1a6-14234772c7b6

📥 Commits

Reviewing files that changed from the base of the PR and between fa882aa and d9d90c4.

📒 Files selected for processing (2)
  • gateway/gateway-runtime/policy-engine/internal/kernel/body_mode.go
  • gateway/gateway-runtime/policy-engine/internal/kernel/xds.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • gateway/gateway-runtime/policy-engine/internal/kernel/xds.go
  • gateway/gateway-runtime/policy-engine/internal/kernel/body_mode.go

Summary

This pull request refactors the policy engine to establish Mode() as the authoritative source for determining which processing phases each policy participates in and whether those phases execute at runtime. The changes improve consistency in policy execution routing and enable support for a unified Python Policy Runtime bridge.

Key Changes

Execution Gating by Mode Declaration

  • Header execution now gates based on explicit mode declarations: RequestHeaderMode and ResponseHeaderMode must be set to PROCESS for header handlers to execute
  • Streaming execution is restricted to policies that declare BodyModeStream, preventing buffered policies from being routed into streaming execution paths
  • These gates apply at the executor level during policy invocation

Chain Construction Refactoring

  • Policy chain building now derives participation flags (RequiresRequestBody, RequiresResponseBody, RequiresRequestHeader, RequiresResponseHeader) primarily from each policy's declared Mode() values
  • Type assertions are retained to validate that policies declaring participation in a phase implement the corresponding interfaces; mismatches generate warnings rather than silently allowing execution
  • The refactoring affects the kernel body mode handler, XDS builder, and XDS client handler
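As a rough illustration of the chain-construction rule above, the sketch below derives participation flags from Mode() and uses type assertions only to validate the declaration, logging a warning on mismatch. It is a simplified approximation under stated assumptions: the `Name()` method, the callback signatures, the constant values, the warning text, and the `buildChain` and `headerOnly` names are hypothetical, and only the request side is modeled.

```go
package main

import (
	"fmt"
	"log/slog"
)

type HeaderMode string
type BodyMode string

const (
	HeaderModeProcess HeaderMode = "PROCESS"
	BodyModeBuffer    BodyMode   = "BUFFER"
)

type ProcessingMode struct {
	RequestHeaderMode HeaderMode
	RequestBodyMode   BodyMode
}

type Policy interface {
	Name() string // hypothetical; used here only for log context
	Mode() ProcessingMode
}

// Assumed callback signatures.
type RequestHeaderPolicy interface{ OnRequestHeaders() }
type RequestPolicy interface{ OnRequestBody(body []byte) }

// PolicyChain carries only the two request-side flags for this sketch.
type PolicyChain struct {
	RequiresRequestHeader bool
	RequiresRequestBody   bool
}

// buildChain sets a flag when a policy both declares the phase in Mode()
// and implements the matching interface; a declaration without the
// interface produces a warning instead of participation.
func buildChain(policies []Policy) PolicyChain {
	var chain PolicyChain
	for _, p := range policies {
		mode := p.Mode()
		if mode.RequestHeaderMode == HeaderModeProcess {
			if _, ok := p.(RequestHeaderPolicy); ok {
				chain.RequiresRequestHeader = true
			} else {
				slog.Warn("mode declares request header processing but interface is missing", "policy", p.Name())
			}
		}
		if mode.RequestBodyMode == BodyModeBuffer {
			if _, ok := p.(RequestPolicy); ok {
				chain.RequiresRequestBody = true
			} else {
				slog.Warn("mode declares buffered request body but interface is missing", "policy", p.Name())
			}
		}
	}
	return chain
}

// headerOnly declares both header and body participation but implements
// only the header callback.
type headerOnly struct{}

func (headerOnly) Name() string { return "header-only" }
func (headerOnly) Mode() ProcessingMode {
	return ProcessingMode{RequestHeaderMode: HeaderModeProcess, RequestBodyMode: BodyModeBuffer}
}
func (headerOnly) OnRequestHeaders() {}

func main() {
	chain := buildChain([]Policy{headerOnly{}})
	fmt.Printf("%+v\n", chain)
}
```

Here the header flag is set, the body flag stays false, and one warning is logged for the missing body interface.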

Documentation and Intent Clarity

  • Inline documentation updated to distinguish between Envoy lifecycle requirements (headers always flow for lifecycle reasons) and phase execution intent (which is now declared via Mode())
  • Package documentation clarified to establish that policies declare phase participation through Mode() method, with type assertions used only for method dispatch and capability validation

Test Coverage

  • New unit tests verify mode-first gating behavior for header and streaming execution paths
  • Tests confirm that policies with non-matching modes have their handlers bypassed and do not contribute to result processing
  • Feature test timing adjusted to ensure proper synchronization of policy snapshots

Impact

These changes unify policy routing logic around mode declarations, reducing complexity from the previous hybrid approach of mode checks and type assertions. The stricter routing prevents misrouted policy executions while maintaining validation of declared capabilities.

Walkthrough

Chain building and execution now use each policy’s Mode() to decide phase participation; executor skips policies whose declared header/body/streaming modes don’t match the current phase. Kernel, registry, xDS, executor, tests, and docs updated; mismatches emit warning logs.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Executor chain execution | gateway/gateway-runtime/policy-engine/internal/executor/chain.go | Added mode-first gating to header and streaming execution paths: policies are skipped unless their Mode() declares the matching header/body mode; span ending/continuation preserved. |
| Executor tests | gateway/gateway-runtime/policy-engine/internal/executor/chain_test.go | Added four counting policy test doubles and tests asserting header/body streaming handlers are invoked only when the policy Mode() permits processing; verifies skip behavior and empty results otherwise. |
| Kernel chain building | gateway/gateway-runtime/policy-engine/internal/kernel/body_mode.go, gateway/gateway-runtime/policy-engine/internal/kernel/xds.go | BuildPolicyChain now derives participation flags from policy.Mode() and validates required interfaces; streaming support gated on BodyMode==STREAM plus streaming interface, non-skip body/header modes without interfaces emit warnings. |
| Registry docs / semantics | gateway/gateway-runtime/policy-engine/internal/registry/chain.go | Clarified comments: RequiresRequestHeader/RequiresResponseHeader represent mode+interface "callback participation intent" rather than Envoy header transport behavior. |
| xDS client chain construction | gateway/gateway-runtime/policy-engine/internal/xdsclient/handler.go | Mirrors kernel changes: capability/requirement flags set from declared modes with interface checks and warning logs when modes and implemented interfaces disagree. |
| Kernel tests | gateway/gateway-runtime/policy-engine/internal/kernel/body_mode_test.go | Added test helper policies and cases validating that chain flags (header/body/streaming) follow declared ProcessingMode and are gated by implemented interfaces. |
| Policy interface docs | sdk/core/policy/v1alpha2/interface.go | Updated package documentation to state Mode() is authoritative for phase participation; type assertions are used for dispatch and capability validation only. |
| Integration test timing | gateway/it/features/llm-cost-based-ratelimit.feature | Inserted a 2s wait after provider creation and before policy snapshot sync in scenarios, changing test timing for snapshot propagation. |

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant Kernel
  participant Registry
  participant Executor
  participant Policy

  Client->>Kernel: Incoming request
  Kernel->>Registry: BuildPolicyChain(policies)
  Registry-->>Kernel: PolicyChain (flags derived from Mode())
  Kernel->>Executor: Execute phase with PolicyChain
  loop per policy
    Executor->>Policy: Read Mode()
    alt Mode==PROCESS / BodyMode==STREAM matches phase
      Executor->>Policy: Invoke phase handler (OnRequestHeaders / OnRequestBodyChunk / ...)
      Policy-->>Executor: Handler result
    else Mode does not match
      Executor-->>Policy: Skip handler (no call)
    end
  end
  Executor-->>Kernel: Aggregated execution results
  Kernel-->>Client: Response / actions

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change: establishing Mode() as the authoritative source for phase participation and adding corresponding test cases. |
| Description check | ✅ Passed | The description includes all major template sections (Purpose, Goals, Approach, User stories, Documentation, Automation tests, Security checks) with substantial content addressing the architectural refactoring and motivation. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
gateway/gateway-runtime/policy-engine/internal/executor/chain.go (2)

255-259: ⚠️ Potential issue | 🟠 Major

Require an explicit body mode before running OnRequestBody.

This still executes buffered request callbacks for policies whose Mode().RequestBodyMode is the zero value (""). If another policy makes the chain buffered, those policies will run even though they never opted into request-body participation, which breaks the new “Mode() is authoritative” contract.

Proposed fix
-		// Skip if the policy's mode says to skip request body processing
-		if pol.Mode().RequestBodyMode == policy.BodyModeSkip {
+		// Mode()-first: only execute buffered body callbacks for explicit body participants.
+		switch pol.Mode().RequestBodyMode {
+		case policy.BodyModeBuffer, policy.BodyModeStream:
+			// allowed
+		default:
 			span.End()
 			continue
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gateway/gateway-runtime/policy-engine/internal/executor/chain.go` around
lines 255 - 259, The code currently treats the zero-value RequestBodyMode as
participating; change the guard so OnRequestBody is only invoked when the policy
explicitly opted in (i.e., RequestBodyMode == policy.BodyModeBuffer). In the
loop in chain.go where you check pol.Mode().RequestBodyMode and currently only
compare to policy.BodyModeSkip, update the condition to first reject empty/zero
values and only proceed when RequestBodyMode == policy.BodyModeBuffer (keep
the existing BodyModeSkip check as explicit skip). This ensures policies with ""
do not run their buffered request callbacks.

572-576: ⚠️ Potential issue | 🟠 Major

Require an explicit body mode before running OnResponseBody.

The response path has the same gap: a policy with ResponseBodyMode == "" still executes OnResponseBody whenever the chain is buffered for some other policy. That reintroduces interface-based participation on the buffered path and undermines the mode-first gating this PR is trying to enforce.

Proposed fix
-		// Skip if the policy's mode says to skip response body processing
-		if pol.Mode().ResponseBodyMode == policy.BodyModeSkip {
+		// Mode()-first: only execute buffered body callbacks for explicit body participants.
+		switch pol.Mode().ResponseBodyMode {
+		case policy.BodyModeBuffer, policy.BodyModeStream:
+			// allowed
+		default:
 			span.End()
 			continue
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gateway/gateway-runtime/policy-engine/internal/executor/chain.go` around
lines 572 - 576, The current response body handling still calls OnResponseBody
for policies where pol.Mode().ResponseBodyMode is empty; modify the gating in
chain.go so OnResponseBody is only invoked when the policy explicitly opts-in:
change the existing check that skips when ResponseBodyMode ==
policy.BodyModeSkip to instead skip unless pol.Mode().ResponseBodyMode is
non-empty and not policy.BodyModeSkip (i.e., require pol.Mode().ResponseBodyMode
!= "" && pol.Mode().ResponseBodyMode != policy.BodyModeSkip before calling
OnResponseBody), ensuring you preserve existing span.End()/continue behavior for
skipped policies.
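The zero-value gap raised in the two findings above can be seen in isolation. This is a minimal sketch, assuming BodyMode is a string type whose zero value is "" as the findings state; the constant values and the `skipOnlyGuard` and `explicitGuard` names are hypothetical, and the second function follows the shape of the proposed switch rather than any merged code.

```go
package main

import "fmt"

// BodyMode values are assumed; only the zero value "" is taken from the review.
type BodyMode string

const (
	BodyModeSkip   BodyMode = "SKIP"
	BodyModeBuffer BodyMode = "BUFFER"
	BodyModeStream BodyMode = "STREAM"
)

// skipOnlyGuard mirrors the guard under review: only an explicit skip opts out,
// so an unset mode is treated as participating.
func skipOnlyGuard(m BodyMode) bool {
	return m != BodyModeSkip
}

// explicitGuard mirrors the proposed switch: only an explicit body mode opts in.
func explicitGuard(m BodyMode) bool {
	switch m {
	case BodyModeBuffer, BodyModeStream:
		return true
	default:
		return false
	}
}

func main() {
	var unset BodyMode // zero value ""
	for _, m := range []BodyMode{unset, BodyModeSkip, BodyModeBuffer, BodyModeStream} {
		fmt.Printf("%q skip-only=%v explicit=%v\n", m, skipOnlyGuard(m), explicitGuard(m))
	}
}
```

The two guards disagree only on the unset mode, which is exactly the case where a policy that never opted in would still run its buffered callback.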
🧹 Nitpick comments (1)
gateway/gateway-runtime/policy-engine/internal/xdsclient/handler.go (1)

352-480: Extract the shared chain-builder logic before these copies drift again.

This method is now a near-verbatim copy of the mode/interface reconciliation in gateway/gateway-runtime/policy-engine/internal/kernel/xds.go and gateway/gateway-runtime/policy-engine/internal/kernel/body_mode.go. Since the PR is changing those rules right now, the next tweak can easily leave file-loaded, kernel-built, and xDS-loaded chains with different behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gateway/gateway-runtime/policy-engine/internal/xdsclient/handler.go` around
lines 352 - 480, This is a duplication of the policy mode/interface
reconciliation logic: extract the shared chain-builder code used by
ResourceHandler.buildPolicyChain into a single reusable function (e.g.,
BuildPolicyChain or NewPolicyChain) and call it from both
ResourceHandler.buildPolicyChain and the kernel side (previously in
kernel.ConfigLoader.buildPolicyChain / kernel/xds.go / kernel/body_mode.go);
move the mode/implementation checks (RequestBodyMode/ResponseBodyMode/HeaderMode
and
StreamingRequestPolicy/StreamingResponsePolicy/RequestHeaderPolicy/ResponseHeaderPolicy
type assertions) into that new function returning a registry.PolicyChain, update
both callers to pass the policy config + metadata + registry instance, and add a
unit test to ensure behavior parity so future changes affect both code paths
consistently.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 400b5452-cd6c-4a86-8240-4ffa2d463b4b

📥 Commits

Reviewing files that changed from the base of the PR and between aa2d35b and dcc3402.

📒 Files selected for processing (8)
  • gateway/gateway-runtime/policy-engine/internal/executor/chain.go
  • gateway/gateway-runtime/policy-engine/internal/executor/chain_test.go
  • gateway/gateway-runtime/policy-engine/internal/kernel/body_mode.go
  • gateway/gateway-runtime/policy-engine/internal/kernel/body_mode_test.go
  • gateway/gateway-runtime/policy-engine/internal/kernel/xds.go
  • gateway/gateway-runtime/policy-engine/internal/registry/chain.go
  • gateway/gateway-runtime/policy-engine/internal/xdsclient/handler.go
  • sdk/core/policy/v1alpha2/interface.go

Comment on lines +3 to +6
// Policies declare phase participation via Mode(), which is the authoritative
// source for all six processing phases. Phase-specific sub-interfaces provide
// the method implementations; the kernel uses type assertions only for method
// dispatch and streaming capability validation at chain-build time.

⚠️ Potential issue | 🟡 Minor

Update the per-interface comments to match the new Mode() contract.

The package header now says Mode() is authoritative, but the RequestPolicy and ResponsePolicy comments below still imply that implementing those interfaces alone enables body participation. That contradiction will mislead policy authors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/core/policy/v1alpha2/interface.go` around lines 3 - 6, Update the doc
comments for RequestPolicy and ResponsePolicy to reflect the new Mode()
contract: state that Mode() is the authoritative source that declares phase
participation and that implementing RequestPolicy or ResponsePolicy only
supplies the corresponding method implementations (body-handling capability) for
phases selected by Mode(), rather than implying these interfaces alone enable
participation; reference the Mode() method and the RequestPolicy/ResponsePolicy
interface names in the updated comments so readers clearly understand the
separation of participation (Mode()) and available handlers (the interfaces).

Insert 'And I wait for 2 seconds' after HTTP 201 responses in gateway/it/features/llm-cost-based-ratelimit.feature across multiple scenarios. This short fixed delay helps reduce test flakiness by allowing the backend to stabilize before the subsequent policy snapshot sync steps.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@gateway/it/features/llm-cost-based-ratelimit.feature`:
- Line 81: Replace brittle fixed sleeps in the feature scenarios with a
version-aware synchronization: update the feature file steps that say "And I
wait for 2 seconds" to instead call the existing waitForPolicySnapshotSync step,
and harden the waitForPolicySnapshotSync implementation to wait until the
controller/runtime report the expected policy version has BOTH changed from the
previous observed version and equals the new target version; specifically modify
the waitForPolicySnapshotSync function to first read the current version, poll
until it observes a different version, then continue polling until the reported
version equals the expected version (with the existing timeout/backoff), and
remove the explicit "And I wait for 2 seconds" lines from scenarios so they rely
on the strengthened waitForPolicySnapshotSync behavior.
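The two-stage wait suggested in this comment could look roughly like the sketch below. It is not the implementation in steps_health.go: `versionFn`, `pollUntil`, `waitForSync`, `scripted`, and `demo` are hypothetical names, and the version sources are scripted in memory instead of calling the controller or the policy engine.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// versionFn is a hypothetical reader for a component's policy version.
type versionFn func() (string, error)

// pollUntil calls read every interval until done accepts the value or ctx ends.
func pollUntil(ctx context.Context, interval time.Duration, read versionFn, done func(string) bool) (string, error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if v, err := read(); err == nil && done(v) {
			return v, nil
		}
		select {
		case <-ctx.Done():
			return "", ctx.Err()
		case <-ticker.C:
		}
	}
}

// waitForSync first waits for the controller version to move away from the
// initially observed value, then waits for the engine to report that version.
func waitForSync(ctx context.Context, controller, engine versionFn, interval time.Duration) (string, error) {
	initial, err := controller()
	if err != nil {
		return "", fmt.Errorf("read initial controller version: %w", err)
	}
	target, err := pollUntil(ctx, interval, controller, func(v string) bool { return v != initial })
	if err != nil {
		return "", fmt.Errorf("controller version never changed from %q: %w", initial, err)
	}
	if _, err := pollUntil(ctx, interval, engine, func(v string) bool { return v == target }); err != nil {
		return "", fmt.Errorf("policy engine never reached %q: %w", target, err)
	}
	return target, nil
}

// scripted returns a versionFn that walks through versions, then repeats the last.
func scripted(versions ...string) versionFn {
	i := 0
	return func() (string, error) {
		v := versions[i]
		if i < len(versions)-1 {
			i++
		}
		return v, nil
	}
}

// demo simulates a controller that moves from v1 to v2 and an engine that lags.
func demo() (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	return waitForSync(ctx, scripted("v1", "v2"), scripted("v1", "v1", "v2"), time.Millisecond)
}

func main() {
	v, err := demo()
	fmt.Println(v, err)
}
```

One caveat of this shape: if the controller has already published the new version before the initial read, the first stage waits until the timeout, so the initial version has to be captured before the change is triggered.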

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 15c30454-228b-4ae0-9538-1ecab3a2a37c

📥 Commits

Reviewing files that changed from the base of the PR and between dcc3402 and fa882aa.

📒 Files selected for processing (1)
  • gateway/it/features/llm-cost-based-ratelimit.feature

Comment thread gateway/it/features/llm-cost-based-ratelimit.feature
Remove redundant "And I wait for 2 seconds" steps from llm-cost-based-ratelimit.feature to eliminate fixed sleeps and speed up the integration tests. Refactor waitForPolicySnapshotSync in steps_health.go to first detect a new controller policy version, then wait for the policy-engine to reach that target version; improve error handling, remove unused variables, and reduce flakiness by controlling sleeps between attempts.
Insert 2s waits in the LLM cost-based rate limit feature steps after policy creation to give the system time to propagate snapshots. Refactor waitForPolicySnapshotSync to poll controller and policy-engine versions concurrently and succeed when they match, returning a detailed error with the last observed versions and errors if sync fails. This removes the previous two-stage initial/target logic and improves robustness and error reporting for flaky syncs.
Comment thread gateway/gateway-runtime/policy-engine/internal/kernel/body_mode.go Outdated
Comment thread gateway/gateway-runtime/policy-engine/internal/kernel/xds.go Outdated
Co-authored-by: Renuka Piyumal Fernando <renukapiyumal@gmail.com>
@renuka-fernando renuka-fernando merged commit 7c1cf54 into wso2:main Apr 16, 2026
6 of 7 checks passed