feat(gatewayapi): mirror tigera-ca-bundle into each Gateway namespace #4822

Closed
electricjesus wants to merge 9 commits into tigera:master from electricjesus:seth/ca-bundle-per-ns

Conversation

@electricjesus
Member

⏸️ Hold review — pending merge of #4690 ("Gatewayapi Namespaced Mode" — Walter Neto). This PR is stacked on radixo:gatewayapi-deployment-enterprise, so the diff currently includes Walter's commits. Once #4690 merges, this diff collapses to just the 13-line ca-bundle copy + test (~30 lines of pkg/render/gatewayapi/).

Summary

Under deploy.type=GatewayNamespace (#4690), envoy-proxy pods land in each Gateway's own namespace and mount the operator trust bundle at /etc/pki/tls/certs (added by #4796). The mount references a ConfigMap in the proxy pod's own namespace, but #4796 only writes the ConfigMap into calico-system (the controller's namespace), so the proxy Pod stops at Init:0/2 with:

Warning  FailedMount  MountVolume.SetUp failed for volume
                      "tigera-ca-bundle": configmap not found

This PR mirrors the trust bundle into each Gateway namespace, alongside the existing per-namespace propagation of tigera-pull-secret and the waf-http-filter SA / RoleBindings. It reuses the existing reserved-NS guard and follows the same delete-before-RoleBinding ordering as the pull-secret cleanup.

Changes

  • pkg/render/gatewayapi/gateway_api.go: in both the create and delete per-NS loops, append pr.cfg.TrustedBundle.ConfigMap(ns) alongside the existing pull-secret propagation. Both appends are gated by the existing !isReservedOperatorNamespace(ns) check plus a nil-check on TrustedBundle.
  • pkg/render/gatewayapi/gateway_api_test.go: positive test (should copy the trust bundle ConfigMap into each Gateway namespace) covering the create path with two Gateway namespaces.
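
The gating described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in gateway_api.go: the `ConfigMap` and `TrustedBundle` types are simplified stand-ins, `mirrorTrustBundle` is a hypothetical helper name, and the set of reserved namespaces is assumed for the example.

```go
package main

import "fmt"

// ConfigMap is a simplified stand-in for the Kubernetes ConfigMap type.
type ConfigMap struct {
	Namespace string
	Name      string
}

// TrustedBundle is a simplified stand-in for the operator's trusted bundle.
type TrustedBundle struct{ Name string }

// ConfigMap returns the bundle ConfigMap targeted at namespace ns.
func (b *TrustedBundle) ConfigMap(ns string) ConfigMap {
	return ConfigMap{Namespace: ns, Name: b.Name}
}

// isReservedOperatorNamespace uses the guard name from the PR; the namespaces
// listed here are an assumption made for this sketch.
func isReservedOperatorNamespace(ns string) bool {
	switch ns {
	case "calico-system", "tigera-gateway":
		return true
	}
	return false
}

// mirrorTrustBundle (hypothetical name) returns one bundle ConfigMap per
// non-reserved Gateway namespace, or nothing when no bundle is configured.
func mirrorTrustBundle(bundle *TrustedBundle, gatewayNamespaces []string) []ConfigMap {
	var objs []ConfigMap
	for _, ns := range gatewayNamespaces {
		if bundle == nil || isReservedOperatorNamespace(ns) {
			continue
		}
		objs = append(objs, bundle.ConfigMap(ns))
	}
	return objs
}

func main() {
	bundle := &TrustedBundle{Name: "tigera-ca-bundle"}
	namespaces := []string{"default", "app-ns", "calico-system"}
	for _, cm := range mirrorTrustBundle(bundle, namespaces) {
		fmt.Printf("%s/%s\n", cm.Namespace, cm.Name)
	}
}
```

In this sketch the same helper would feed both the create and the delete lists, so a namespace that stops hosting a Gateway loses its copy through the same guard that created it.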

Verification

End-to-end reproducer + fix verified live on seth-ez-a3b5:

  1. Baseline (operator built off radixo:gatewayapi-deployment-enterprise HEAD): fresh Gateway namespace → proxy pod stuck at Init:0/2 with FailedMount (tigera-ca-bundle ConfigMap not found). Manually cloning the ConfigMap unblocks the pod.
  2. Patch applied + image redeployed: fresh Gateway namespace → tigera-ca-bundle ConfigMap auto-created in NS, proxy pod reaches 4/4 Running, Gateway Accepted=True. No manual cloning.

Brief with full reproducer + observed-vs-expected table: tigera/gateway-extensions-controller/docs/planning/briefs/2026-05-19-ca-bundle-propagation-brief.md.

Test plan

  • go test ./pkg/render/gatewayapi/... -count=1 — all green
  • go build ./... — clean
  • E2E on seth-ez-a3b5 — see Verification above

Release Note

```release-note
Operator now mirrors the trusted CA bundle ConfigMap into every Gateway-hosting namespace under namespaced-mode (`deploy.type=GatewayNamespace`), so envoy-proxy pods in user namespaces can mount the bundle and successfully validate TLS to public upstreams (wasm OCI registries, OIDC providers).
```

Linked

radixo and others added 9 commits May 7, 2026 17:31
- Swap the checked-in gateway_api_resources.yaml for the embedded gateway-helm.tgz rendered via the helm SDK at startup; K8SGatewayAPICRDs/GatewayAPICRDs now take a runtime.Scheme and return an error (istio_controller updated for the new signature)
- Deploy two envoy-gateway controllers: legacy in tigera-gateway (user-declared classes via Spec.GatewayClasses) and a new one in calico-system with deploy.type=GatewayNamespace; auto-provision the tigera-gateway-class-ns GatewayClass bound to the new controller
- Group the tigera-gateway install behind legacyObjects/legacyTeardownObjects so the eventual deprecation is a single delete
- HasLegacyGateways classifier in the controller: build a className -> controllerName map seeded from Spec.GatewayClasses + existing GatewayClass resources, classify every live Gateway; when no Gateway targets the tigera-gateway controller, the install is torn down; during the teardown-then-redeploy race the legacy render is deferred to avoid a "Namespace is terminating, skipping creation" log flood
- Legacy teardown queues only the Namespace + cluster-scoped objects + the Deployment (for status.RemoveDeployments); in-namespace RBAC/Secrets ride the cascade to avoid the tigera-operator-secrets RoleBinding race
- Move the shared waf-http-filter ClusterRoles out of the legacy bundle so the calico-system-side proxies keep their cluster-scoped perms after tigera-gateway is retired
- Per-namespace Enterprise resources (SA, RoleBindings, pull secret, shared CRB subject) for namespaces hosting a namespaced-class Gateway; reserved namespaces skip shared resource create/delete; Secret goes before RoleBinding on cleanup to avoid 403
- Gate v3 NetworkPolicies on the calico-system Tier; render calico-system.envoy-gateway allow for the controller and certgen
- Update unit tests and Makefile/docs accordingly

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
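
The HasLegacyGateways classification described in the commit message above might look something like the sketch below. It is an illustration under assumptions, not the controller's code: the types are simplified stand-ins, `legacyControllerName` is a placeholder value, and letting existing GatewayClass resources override declared ones is a choice made only for this example.

```go
package main

import "fmt"

// GatewayClass and Gateway are simplified stand-ins for the Gateway API types.
type GatewayClass struct {
	Name           string
	ControllerName string
}

type Gateway struct {
	Namespace string
	Name      string
	ClassName string
}

// legacyControllerName is a placeholder, not the operator's real value.
const legacyControllerName = "example.io/legacy-controller"

// hasLegacyGateways builds a className -> controllerName map from the declared
// classes and the classes already in the cluster, then reports whether any
// live Gateway resolves to the legacy controller.
func hasLegacyGateways(declared, existing []GatewayClass, gateways []Gateway) bool {
	controllerByClass := make(map[string]string)
	for _, gc := range declared {
		controllerByClass[gc.Name] = gc.ControllerName
	}
	// Assumption for this sketch: existing resources win over declarations.
	for _, gc := range existing {
		controllerByClass[gc.Name] = gc.ControllerName
	}
	for _, gw := range gateways {
		if controllerByClass[gw.ClassName] == legacyControllerName {
			return true
		}
	}
	return false
}

func main() {
	declared := []GatewayClass{{Name: "user-class", ControllerName: legacyControllerName}}
	existing := []GatewayClass{{Name: "tigera-gateway-class-ns", ControllerName: "example.io/namespaced-controller"}}
	gateways := []Gateway{{Namespace: "app-ns", Name: "gw", ClassName: "tigera-gateway-class-ns"}}
	fmt.Println("legacy install still needed:", hasLegacyGateways(declared, existing, gateways))
}
```

A Gateway whose class is unknown resolves to the empty string and is therefore not counted as legacy in this sketch.
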
- Cover the calico-system envoy-gateway controller lifecycle, per-namespace resource provisioning and cleanup, custom EnvoyProxy and EnvoyGateway ConfigMap watches, owning-gateway env vars in l7-log-collector, and the legacy-class teardown path
- Teardown sequencing for tigera-gateway cascading

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lico-system

- Render one envoy-gateway controller in calico-system with deploy.type=GatewayNamespace
- Auto-provision tigera-gateway-class; honour user overrides if redeclared in Spec.GatewayClasses
- Enumerate every operator-owned object from the legacy tigera-gateway install for cleanup (pull Secrets before tigera-operator-secrets); keep the Namespace itself in case users placed their own resources there
- Point GatewayAPI finalizer at the calico-system envoy-gateway Deployment
- Drop dual-controller fixtures and the legacy-undeploy test; consolidate FV tests to the calico-system layout
Upstream envoy-gateway rejects the combination of mergeGateways: true
and GatewayNamespaceMode, so any user-supplied EnvoyProxy with merging
enabled would cause its referenced Gateways to silently stop being
programmed after the switch to GatewayNamespace
(https://gateway.envoyproxy.io/docs/tasks/operations/gateway-namespace-mode/).

In the GatewayAPI reconciler, when a Spec.GatewayClasses[].EnvoyProxyRef
points at an EnvoyProxy with Spec.MergeGateways == true, force the
field to false in our managed copy and log a warning naming the
EnvoyProxy and GatewayClass. The user's source CR is not mutated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
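
A rough sketch of the mergeGateways override described above, assuming a simplified `EnvoyProxy` stand-in type and a hypothetical `managedCopy` helper. The point it illustrates is that the managed copy gets its own boolean, so the user's source object is left untouched.

```go
package main

import "fmt"

// EnvoyProxy is a simplified, hypothetical stand-in for the envoy-gateway CR.
type EnvoyProxy struct {
	Namespace     string
	Name          string
	MergeGateways *bool
}

// managedCopy (hypothetical name) returns a copy of src with merging forced
// off, calling warn when it had to override the user's setting.
func managedCopy(src EnvoyProxy, gatewayClass string, warn func(string)) EnvoyProxy {
	out := src
	if src.MergeGateways != nil {
		// Copy the value so the returned object never aliases the source pointer.
		merge := *src.MergeGateways
		if merge {
			warn(fmt.Sprintf("EnvoyProxy %s/%s (GatewayClass %s): forcing mergeGateways to false",
				src.Namespace, src.Name, gatewayClass))
			merge = false
		}
		out.MergeGateways = &merge
	}
	return out
}

func main() {
	enabled := true
	user := EnvoyProxy{Namespace: "default", Name: "custom-proxy", MergeGateways: &enabled}
	managed := managedCopy(user, "user-class", func(msg string) { fmt.Println("WARN:", msg) })
	fmt.Println("source:", *user.MergeGateways, "managed:", *managed.MergeGateways)
}
```
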
- remove controllerName param (never set by callers)
- inline ReleaseName and GatewayNamespace deploy type
- add DeploymentNamespace constant for the install namespace
- drop now-unused helmGateway type
- parseManifest now errors on kinds it doesn't recognize so a chart
  bump that emits a new kind trips the existing render tests
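
The fail-on-unknown-kind behaviour mentioned for parseManifest could be sketched as below. This is only an illustration: `groupByKind`, the `object` type, and the contents of `knownKinds` are assumptions for the example and do not reflect the real function's signature or the chart's actual kinds.

```go
package main

import "fmt"

// object is a minimal stand-in for a decoded manifest document.
type object struct {
	Kind string
	Name string
}

// knownKinds is an assumed allow-list for this sketch.
var knownKinds = map[string]bool{
	"Deployment": true,
	"Service":    true,
	"ConfigMap":  true,
}

// groupByKind (hypothetical name) buckets objects by kind and returns an error
// on the first kind it does not recognize, so a new kind cannot slip through
// silently.
func groupByKind(objs []object) (map[string][]object, error) {
	byKind := make(map[string][]object)
	for _, o := range objs {
		if !knownKinds[o.Kind] {
			return nil, fmt.Errorf("unrecognized kind %q (object %q)", o.Kind, o.Name)
		}
		byKind[o.Kind] = append(byKind[o.Kind], o)
	}
	return byKind, nil
}

func main() {
	_, err := groupByKind([]object{
		{Kind: "Deployment", Name: "envoy-gateway"},
		{Kind: "PodDisruptionBudget", Name: "envoy-gateway"},
	})
	fmt.Println("error:", err)
}
```

Returning an error instead of skipping the object is what lets an existing render test fail when a chart upgrade introduces a kind nobody has handled yet.
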
Under deploy.type=GatewayNamespace (tigera#4690), envoy-proxy pods land in the
Gateway's own namespace and mount the operator trust bundle at
/etc/pki/tls/certs (added by tigera#4796).  The mount references a ConfigMap
in the proxy pod's own namespace, but tigera#4796 only writes the ConfigMap
into calico-system (the controller's namespace), so the proxy Pod stops
at Init:0/2 with:

  Warning  FailedMount  MountVolume.SetUp failed for volume
                        "tigera-ca-bundle": configmap not found

Mirror the trust bundle into each Gateway namespace alongside the
existing per-namespace propagation of tigera-pull-secret and the
waf-http-filter SA / RoleBindings.  Reuses the existing reserved-NS
guard and follows the same delete-before-RoleBinding ordering as the
pull-secret cleanup.

Reproduced live on seth-ez-a3b5 2026-05-19 with operator
walter-merge-2026-05-18 (has both tigera#4690 and tigera#4796): fresh Gateway
namespace -> everything else propagates but tigera-ca-bundle does not,
proxy Pod stuck Init:0/2.

Brief:
  tigera/gateway-extensions-controller/docs/planning/briefs/2026-05-19-ca-bundle-propagation-brief.md
Walter-supplied positive test: configure two Gateway namespaces
("default" and "app-ns") with a TrustedBundle, render, assert the
trust bundle ConfigMap (TrustedCertConfigMapName) lands in each
Gateway namespace.

Companion to the per-NS ConfigMap copy added in the previous commit.
@electricjesus
Member Author

Closing — Walter asked us to push directly to #4690 instead of stacking a follow-up. The two commits (mirror trust bundle into per-NS loop + the positive test Walter supplied) are now on radixo:gatewayapi-deployment-enterprise at SHAs 2593ce036 and 9b8e574ee. Verified end-to-end on seth-ez-a3b5 before the push.

Labels

hold merge Do not merge

3 participants