ZEP-0033: registry proxy by AustinAbro321 · Pull Request #37 · zarf-dev/proposals

AustinAbro321 · 2025-07-07T15:00:47Z

One-line PR description: This builds on ZEP-0032: support ipv6 #31 to make a broader proposal

Issue link: Add proxy method to accessing the Zarf registry #33 / Initial Zarf IPv6 support #32

Signed-off-by: Austin Abro <AustinAbro321@gmail.com>

Signed-off-by: Austin Abro <austinabro321@gmail.com>

soltysh · 2025-07-22T14:15:50Z

+
+The proxy and the registry will connect over mTLS. Zarf will create a certificate authority along with a client and server certificate using the authority. These certificates will be automatically rotated during `zarf init` if they have less than half of their total duration remaining. Users will be able to specify their own certificates through flags on `zarf init`: `--registry-server-cert-file`, `--registry-server-key-file`, `--registry-client-key-file`, and `--registry-client-cert-file`. 
+
+A controller will run in the cluster that watches when the proxy daemonset fails to pull an image and spins up the injector when this happens. This is important because when a new node is added to the cluster the injector will not be present for the daemonset to bootstrap itself, and the proxy image will not be cached on the node.


A different idea: how about using an init container in the DaemonSet to handle the injector? That way you don't need a controller, and it will automatically work for every node without extra work.

Good idea. This would mean that the injector would now be long-lived rather than short-lived, but we'd need a long-lived controller anyway. I'll have to make sure I'm not missing anything, but that does seem like a cleaner approach.

I think the problem with init containers is that the proxy's regular container won't start until the init container completes, and we'd have no way to trigger pulling that image or to know when it's done. As you pointed out in a synchronous discussion, we could likely use a sidecar instead of a DaemonSet. I'll have to test that out, but that would likely be better.

It looks like sidecars will most likely be the best solution, at least for now.

For whatever reason, I could not get sidecar containers to work, so I have it as its own DaemonSet for now.

Co-authored-by: Maciej Szulik <maciej@defenseunicorns.com> Signed-off-by: Austin Abro <37223396+AustinAbro321@users.noreply.github.com>

Signed-off-by: Austin Abro <austinabro321@gmail.com>

…into host-network-support

Signed-off-by: Austin Abro <austinabro321@gmail.com>

mkcp · 2025-08-01T16:08:31Z

+The current NodePort service solution does not support IPv6 as IPv6 does not enable route_localnet which is required to call NodePort services using [::1] ([#90236](https://github.com/kubernetes/kubernetes/issues/90236#issuecomment-624721859)). There is a mandate ([wayback machine link because white house site is flaky ATM](https://web.archive.org/web/20250116092323/https://www.whitehouse.gov/wp-content/uploads/2020/11/M-21-07.pdf)) for government agencies to migrate to IPv6 single stack by end of fiscal year (FY) 2025. Given how often Zarf is used in government environments it's important IPv6 is enabled.
+A similar issue also makes the Zarf registry unusable for distros such as OpenShift which blocks rewriting traffic to localhost. In both of these situations, hostPort will not work, however hostNetwork will. 
+
+The registry proxy solution comes with security advantages. The registry will only be accessible from within the cluster. This is an advantage over the current solution since NodePort services default to being accessible externally to anyone who can connect to a node. Additionally, we will force the registry to connect to the proxy and Zarf CLI with mTLS. With this approach, the only unencrypted traffic during a kubelet call occurs between the kubelet and proxy, ensuring this traffic never leaves the host. The Zarf CLI will connect directly to the registry over mTLS and Kubernetes port forwards.


With this approach, the only unencrypted traffic during a kubelet call occurs between the kubelet and proxy, ensuring this traffic never leaves the host.

🔑

Signed-off-by: Austin Abro <austinabro321@gmail.com>

mkcp · 2025-08-04T15:05:50Z

+
+As an administrator of a Kubernetes cluster who wants a greater security posture when using the Zarf registry, I run `zarf init --registry-proxy`.
+
+### Risks and Mitigations


Good exploration here

mkcp · 2025-08-04T15:13:29Z

+```
+
+The baseline [pod security standards](https://kubernetes.io/docs/concepts/security/pod-security-standards/) recommends that pods should not set hostPort or HostNetwork. Users with controllers that enforce these standards, such as Kyverno, will need to make an exemption. Additionally, some distros will disable hostPort and hostNetwork by default and users will need to use admin permissions to allow these features. 
+For example, OpenShift requires hostPort or hostNetwork pods to be run with a privileged service account while Talos requires that the namespace be privileged for hostPort to be enabled. For this feature to be considered stable, the Zarf documentation must include instructions for which settings to change to enable hostPort / hostNetwork on the most common Kubernetes distributions. Zarf currently has no distro specific documentation, but plans to add this, see ([#3686](https://github.com/zarf-dev/zarf/issues/3686)).


For this feature to be considered stable, the Zarf documentation must include instructions for which settings to change to enable hostPort / hostNetwork on the most common Kubernetes distributions

++

When we get here, I expect that a few different approaches will help users the most. A high-level networking architecture overview after Start Here would help orient users, and then under Best Practices or Reference we can walk users through granular options such as distro-specific settings.

mkcp · 2025-08-04T15:19:49Z

+The registry is no longer accessible from outside of the cluster by default. Some users may rely on this, and will instead have to set up their own exposed service to connect to the registry.


Would it make sense to target example configs as part of GA?

For exposing the registry, no imo. It'll be exposable through standard Kubernetes means, and I believe only a very small percentage of our user base will want to do this.

brandtkeller

Non-blocking comments / questions. There are a lot of benefits wrapped into this proposal.

brandtkeller · 2025-08-04T16:32:43Z

+below is for the real nitty-gritty.
+-->
+
+A new `--registry-proxy` flag will be added to zarf init. Enabling this flag causes Zarf to create a DaemonSet running a proxy on each node that will connect directly to the registry service. Both the injector and proxy will require DaemonSets, and the injector will be long lived. Eventually, `--registry-proxy` may default to true. 


The benefits of a long-lived injector are good: it makes the setup more resilient to registry restarts.

Both the injector and proxy will require DaemonSets, and the injector will be long lived

This makes sense given that the injector is currently temporary, but it could confuse some people since the proxy will be long-lived as well.

It could, yeah. We'll definitely want docs explaining how and why the proxy process works.

brandtkeller · 2025-08-04T16:34:08Z

+
+A user can run `--registry-proxy` during `zarf init` and their choice will be saved to the cluster and used on subsequent runs during `init`. If a user wants to switch back to the localhost NodePort solution they must run `zarf init --registry-proxy=false`. If a user runs `zarf init` without the `--registry-proxy` flag on an already initialized cluster, Zarf will continue using the registry setup that was used during the initial init, whether that is the registry proxy or NodePort solution. 
+
+The proxy and the registry will connect over mTLS. Zarf will create a certificate authority along with a client and server certificate using the authority. If a certificate has less than half of its total lifecycle remaining, then it will be rotated automatically during `zarf init`. Users will be able to specify their own certificates through flags on `zarf init`: `--registry-server-cert-file`, `--registry-server-key-file`, `--registry-client-key-file`, and `--registry-client-cert-file`. 


Nit/out-of-scope: Do we intend to support other services that want to access the registry, now over HTTPS, given that the TLS secrets are not accessible cluster-wide?

Not yet. For those use cases, I'd rather change the allowed hostIP and have apps connect to the proxy. We will have to see how this works out as these use cases become more concrete.
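The persistence rule in the hunk above covers three cases: an explicit flag wins, a missing flag reuses the saved choice, and a fresh cluster gets the default. That resolution can be sketched as one function. The names are hypothetical. With Cobra, `flagSet` would presumably come from something like `cmd.Flags().Changed("registry-proxy")`. The default of `false` reflects the proposal's note that `--registry-proxy` "may" default to true only eventually.

```go
package main

// resolveRegistryProxy decides whether `zarf init` uses the registry proxy.
// flagSet reports whether --registry-proxy was passed explicitly; stored is
// the choice saved in the cluster by a previous init, or nil on a fresh
// cluster. All names are hypothetical.
func resolveRegistryProxy(flagSet, flagValue bool, stored *bool) bool {
	if flagSet {
		// An explicit flag always wins, including --registry-proxy=false,
		// which switches back to the localhost NodePort solution.
		return flagValue
	}
	if stored != nil {
		// No flag: keep whatever the cluster was initialized with.
		return *stored
	}
	// Fresh cluster and no flag: NodePort stays the default for now.
	return false
}
```

Distinguishing "flag not passed" from "flag passed as false" is the key detail. A plain `bool` flag value alone cannot tell those apart.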

brandtkeller · 2025-08-04T16:37:47Z

+[testing-guidelines]: https://docs.zarf.dev/contribute/testing/
+-->
+
+There should be a test that verifies that `zarf init --registry-proxy` works with both NFTables and IPv6. These should both go through the entire e2e suite.


Out-of-scope: I would like to see some chaos engineering work its way into these tests: purposely restart components of the init infrastructure and expect them to recover during package operations.

Agreed, or other things like restarting / adding new nodes

AustinAbro321 added 2 commits July 3, 2025 20:20

WIP additional notes

51280ef

Signed-off-by: Austin Abro <AustinAbro321@gmail.com>

WIP host network support

2b89951

Signed-off-by: Austin Abro <AustinAbro321@gmail.com>

AustinAbro321 changed the title ~~Host network support~~ registry proxy initial proposal Jul 7, 2025

AustinAbro321 added 3 commits July 7, 2025 15:45

go over nodeport solution

5538a63

Signed-off-by: Austin Abro <AustinAbro321@gmail.com>

cleanup notes

35b939d

Signed-off-by: Austin Abro <AustinAbro321@gmail.com>

update security risks

d2bfe9c

Signed-off-by: Austin Abro <AustinAbro321@gmail.com>

AustinAbro321 mentioned this pull request Jul 7, 2025

Improve security of zarf registry NodePort zarf-dev/zarf#2146

Closed

AustinAbro321 added 3 commits July 7, 2025 19:12

notes on what's needed

2cbb057

Signed-off-by: Austin Abro <AustinAbro321@gmail.com>

mention spegel as a potential solution to new nodes spinning up

825ba55

Signed-off-by: Austin Abro <AustinAbro321@gmail.com>

start on alternative

c2067fd

Signed-off-by: Austin Abro <AustinAbro321@gmail.com>

AustinAbro321 changed the title ~~registry proxy initial proposal~~ ZEP-0033: registry proxy Jul 8, 2025

AustinAbro321 changed the title ~~ZEP-0033: registry proxy~~ ZEP-0033: registry proxy initial proposal Jul 8, 2025

todos

43ae8f8

Signed-off-by: Austin Abro <AustinAbro321@gmail.com>

AustinAbro321 mentioned this pull request Jul 8, 2025

ZEP-0032: support ipv6 #31

Merged

AustinAbro321 and others added 6 commits July 9, 2025 18:25

upgrade / downgrade is in proposal

237ee55

Signed-off-by: Austin Abro <AustinAbro321@gmail.com>

move section to proposals

bc0737c

Signed-off-by: Austin Abro <AustinAbro321@gmail.com>

update controller strat

8937a4c

Signed-off-by: Austin Abro <AustinAbro321@gmail.com>

update spelling

bc02d00

Signed-off-by: Austin Abro <austinabro321@gmail.com>

fix grammar

dd32a23

Signed-off-by: Austin Abro <austinabro321@gmail.com>

corrections

2dcaef9

Signed-off-by: Austin Abro <austinabro321@gmail.com>