feat(conformance) add EPP unavailable fail-open test #999

Open
wants to merge 13 commits into
base: main

zetxqx
Contributor

@zetxqx zetxqx commented Jun 17, 2025

This pull request introduces a new conformance test, epp_unavailable_fail_open, to validate the behavior of the extensionRef.failureMode==FailOpen setting.

  • A new conformance test, epp_unavailable_fail_open, has been added.
  • The test simulates an EPP becoming unavailable by deleting the EPP deployment and verifies that the gateway fails open and routes traffic to the backend service.
  • This PR builds on #961 (feat(conformance): Add EPP conformance test for Gateway routing), which added the initial EPP conformance test for Gateway routing.
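
The behavior the two test phases check can be sketched in isolation. This is a minimal model, not the gateway's or the conformance suite's implementation: the `picker` type, the `route` function, the pod IPs, and the `FailClose` name for the opposite mode are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// FailureMode models the failureMode setting; FailClose is this sketch's
// name for the opposite of FailOpen.
type FailureMode string

const (
	FailOpen  FailureMode = "FailOpen"
	FailClose FailureMode = "FailClose"
)

// errPickerUnavailable stands in for any error reaching the endpoint picker.
var errPickerUnavailable = errors.New("endpoint picker unavailable")

// picker is a hypothetical stand-in for the EPP: it names the pod to use.
type picker func() (string, error)

// route returns the pod that should serve a request. If the picker fails
// and the mode is FailOpen, it falls back to the first known pod; in any
// other case the picker error is returned to the caller.
func route(pick picker, pods []string, mode FailureMode) (string, error) {
	pod, err := pick()
	if err == nil {
		return pod, nil
	}
	if mode == FailOpen && len(pods) > 0 {
		return pods[0], nil
	}
	return "", fmt.Errorf("routing failed: %w", err)
}

func main() {
	pods := []string{"10.0.0.1", "10.0.0.2"}
	healthy := func() (string, error) { return "10.0.0.2", nil }
	down := func() (string, error) { return "", errPickerUnavailable }

	// Phase 1: the picker is available, so its choice is followed.
	pod, _ := route(healthy, pods, FailOpen)
	fmt.Println("phase 1:", pod)

	// Phase 2: the picker is gone, but FailOpen still reaches a backend.
	pod, _ = route(down, pods, FailOpen)
	fmt.Println("phase 2:", pod)

	// For contrast, the same outage under FailClose yields an error.
	_, err := route(down, pods, FailClose)
	fmt.Println("fail-close:", err)
}
```

In the real test the "picker is gone" condition is produced by deleting the EPP deployment; the sketch replaces that with a function that always returns an error.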

Verification

gke-l7

go test -v ./conformance -args -debug -gateway-class gke-l7-regional-external-managed -cleanup-base-resources=false -allow-crds-mismatch=true -run-test EppUnAvailableFailOpen
=== NAME  TestConformance/EppUnAvailableFailOpen
    apply.go:283: 2025-06-17T01:19:39.311334464Z: Deleting epp-to-inference-model-reader RoleBinding
    apply.go:283: 2025-06-17T01:19:39.403450067Z: Deleting inference-model-reader Role
    apply.go:283: 2025-06-17T01:19:39.482466834Z: Deleting infra-backend-epp Deployment
    apply.go:283: 2025-06-17T01:19:39.494211326Z: Deleting infra-backend-endpoint-picker Service
    apply.go:283: 2025-06-17T01:19:39.61129453Z: Deleting httproute-for-primary-gw HTTPRoute
    apply.go:283: 2025-06-17T01:19:39.679690563Z: Deleting normal-gateway-pool InferencePool
    apply.go:283: 2025-06-17T01:19:39.746823836Z: Deleting conformance-fake-model-server InferenceModel
    apply.go:283: 2025-06-17T01:19:39.823244608Z: Deleting infra-backend-deployment Deployment
=== RUN   TestConformance/GatewayFollowingEPPRouting
    conformance.go:68: Skipping GatewayFollowingEPPRouting: test explicitly skipped
=== RUN   TestConformance/HTTPRouteInvalidInferencePoolRef
    conformance.go:68: Skipping HTTPRouteInvalidInferencePoolRef: test explicitly skipped
=== RUN   TestConformance/InferencePoolAccepted
    conformance.go:68: Skipping InferencePoolAccepted: test explicitly skipped
=== RUN   TestConformance/InferencePoolResolvedRefsCondition
    conformance.go:68: Skipping InferencePoolResolvedRefsCondition: test explicitly skipped
--- PASS: TestConformance (71.37s)
    --- PASS: TestConformance/EppUnAvailableFailOpen (68.24s)
        --- PASS: TestConformance/EppUnAvailableFailOpen/Phase_1:_Verify_baseline_connectivity_with_EPP_available (48.22s)
        --- PASS: TestConformance/EppUnAvailableFailOpen/Phase_2:_Verify_fail-open_behavior_after_EPP_becomes_unavailable (0.16s)
    --- SKIP: TestConformance/GatewayFollowingEPPRouting (0.00s)
    --- SKIP: TestConformance/HTTPRouteInvalidInferencePoolRef (0.00s)
    --- SKIP: TestConformance/InferencePoolAccepted (0.00s)
    --- SKIP: TestConformance/InferencePoolResolvedRefsCondition (0.00s)
PASS
ok  	sigs.k8s.io/gateway-api-inference-extension/conformance	71.575s

istio

go test -v ./conformance -args -debug -gateway-class istio -cleanup-base-resources=false -allow-crds-mismatch=true -run-test EppUnAvailableFailOpen
=== RUN   TestConformance/InferencePoolResolvedRefsCondition
    conformance.go:68: Skipping InferencePoolResolvedRefsCondition: test explicitly skipped
--- PASS: TestConformance (11.52s)
    --- PASS: TestConformance/EppUnAvailableFailOpen (8.80s)
        --- PASS: TestConformance/EppUnAvailableFailOpen/Phase_1:_Verify_baseline_connectivity_with_EPP_available (7.06s)
        --- PASS: TestConformance/EppUnAvailableFailOpen/Phase_2:_Verify_fail-open_behavior_after_EPP_becomes_unavailable (0.09s)
    --- SKIP: TestConformance/GatewayFollowingEPPRouting (0.00s)
    --- SKIP: TestConformance/HTTPRouteInvalidInferencePoolRef (0.00s)
    --- SKIP: TestConformance/InferencePoolAccepted (0.00s)
    --- SKIP: TestConformance/InferencePoolResolvedRefsCondition (0.00s)
PASS
ok  	sigs.k8s.io/gateway-api-inference-extension/conformance	11.715s

netlify bot commented Jun 17, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 32627c8
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/6850bee445ccc200083ddbce
😎 Deploy Preview https://deploy-preview-999--gateway-api-inference-extension.netlify.app
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: zetxqx
Once this PR has been reviewed and has the lgtm label, please assign danehans for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 17, 2025
@k8s-ci-robot
Contributor

Hi @zetxqx. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 17, 2025
@zetxqx zetxqx changed the title feat(conformance) add epp_unavailable_fail_open test to test the extensionRef.failureMode==FailOpen behavior feat(conformance) add EPP unavailable fail-open test Jun 17, 2025
@zetxqx zetxqx mentioned this pull request Jun 17, 2025
12 tasks
Member

@robscott robscott left a comment

Thanks @zetxqx! A few small comments, otherwise LGTM.

type: PathPrefix
value: /primary-gateway-test
---
# --- Conformance EPP service Definition ---
Member

I would really like to see everything below this moved into the base manifests and reused, to simplify the test cases.

Comment on lines +51 to +62
# --- InferenceModel Definition ---
# Service for the infra-backend-deployment.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: conformance-fake-model-server
  namespace: gateway-conformance-app-backend
spec:
  modelName: conformance-fake-model
  criticality: Critical # Mark it as critical to bypass the saturation check, since the model server is fake and doesn't have such metrics.
  poolRef:
    name: normal-gateway-pool
Member

Recommend removing this. A follow-up issue is fine, since I know it requires corresponding EPP changes.

Headers: map[string]string{eppSelectionHeaderName: targetPodIP},
Method: http.MethodPost,
Body: requestBody,
Backend: appPodBackendPrefix,
Member

In the first test, don't we need to ensure that the Gateway routed to a specific Pod, while in the second test any Pod with this prefix is sufficient?
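
The distinction raised here can be sketched as a single check with two modes. This is an illustrative helper only; `servedBy`, its parameters, and the pod names are hypothetical and are not taken from the PR or from the conformance utilities.

```go
package main

import (
	"fmt"
	"strings"
)

// servedBy reports whether the pod that answered satisfies the expectation.
// When exactPod is non-empty, only that pod is accepted (the EPP-available
// case). When it is empty, any pod whose name starts with prefix passes
// (the fail-open case).
func servedBy(gotPod, prefix, exactPod string) bool {
	if exactPod != "" {
		return gotPod == exactPod
	}
	return strings.HasPrefix(gotPod, prefix)
}

func main() {
	const prefix = "infra-backend-deployment" // hypothetical backend name prefix

	// Phase 1 style: the response must come from the one selected pod.
	fmt.Println(servedBy("infra-backend-deployment-aaa", prefix, "infra-backend-deployment-bbb"))

	// Phase 2 style: any pod behind the backend is acceptable.
	fmt.Println(servedBy("infra-backend-deployment-aaa", prefix, ""))
}
```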

fieldRef:
fieldPath: status.podIP
---
# --- InferenceModel Definition ---
Member

Same comment as above: I would really like to minimize repetition here.

extensionRef:
name: infra-backend-endpoint-picker
---
# --- HTTPRoute for Primary Gateway (conformance-gateway) ---
Member

It's possible that the only variation we need for most tests is in the HTTPRoute, so it is likely worth exploring how much we can share across tests. (This can be a follow-up issue.)


// MakeRequestAndExpectEventuallyConsistentResponse makes a request using the parameters
// from the Request struct and waits for the response to consistently match the expectations.
func MakeRequestAndExpectEventuallyConsistentResponse(
Member

Please make sure you're converging with @SinaChavoshi on these functions across PRs.
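
For readers unfamiliar with the "eventually consistent" pattern that the function's doc comment describes, a minimal sketch follows. It is not the PR's implementation: `eventuallyConsistent`, its parameters, and the consecutive-success threshold are assumptions about how such a helper is commonly written.

```go
package main

import (
	"fmt"
	"time"
)

// eventuallyConsistent polls check once per interval until it has succeeded
// threshold times in a row. Any failure resets the streak. It returns false
// if the timeout elapses before the streak is reached.
func eventuallyConsistent(check func() bool, threshold int, interval, timeout time.Duration) bool {
	deadline := time.Now().Add(timeout)
	streak := 0
	for {
		if check() {
			streak++
			if streak >= threshold {
				return true
			}
		} else {
			streak = 0
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(interval)
	}
}

func main() {
	calls := 0
	// Simulated backend: fails on the first two calls, then always succeeds.
	flaky := func() bool {
		calls++
		return calls > 2
	}
	ok := eventuallyConsistent(flaky, 3, time.Millisecond, time.Second)
	fmt.Println("consistent:", ok, "after calls:", calls)
}
```

Requiring a streak rather than a single success guards against a response that matches once while the gateway configuration is still settling.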

@k8s-ci-robot
Contributor

PR needs rebase.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 17, 2025
Labels
  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test.)
  • needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.)
  • size/XL (Denotes a PR that changes 500-999 lines, ignoring generated files.)
3 participants