Conversation
Signed-off-by: Austin Abro <AustinAbro321@gmail.com>
> The proxy and the registry will connect over mTLS. Zarf will create a certificate authority along with a client and server certificate signed by that authority. These certificates will be rotated automatically during `zarf init` if they have less than half of their total duration remaining. Users will be able to specify their own certificates through flags on `zarf init`: `--registry-server-cert-file`, `--registry-server-key-file`, `--registry-client-key-file`, and `--registry-client-cert-file`.
> A controller will run in the cluster that watches for the proxy DaemonSet failing to pull an image and spins up the injector when this happens. This is important because when a new node is added to the cluster, the injector will not be present for the DaemonSet to bootstrap itself, and the proxy image will not be cached on the node.
A different idea: how about using an init container in the DaemonSet to handle the injector? That way you're not required to use any controller, and it will automatically work for every node without extra work.
Good idea. This would mean the injector is now long-lived rather than short-lived, but we'd need a long-lived controller anyway. I'll have to make sure I'm not missing anything, but that does seem like a cleaner approach.
I think the problem with init containers is that the regular container for the proxy won't start until after the init container is complete, and we'll have no way to trigger the pulling of that image or to know when it's done. As you pointed out in a synchronous discussion, we could likely use a sidecar instead of a DaemonSet. I'll have to test that out, but it would likely be better.
It looks like sidecars will most likely be the best solution, at least for now.
For whatever reason, I could not get sidecar containers to work, so it currently runs as its own DaemonSet.
Co-authored-by: Maciej Szulik <maciej@defenseunicorns.com> Signed-off-by: Austin Abro <37223396+AustinAbro321@users.noreply.github.com>
> The current NodePort service solution does not support IPv6, because route_localnet, which is required to reach NodePort services via [::1], does not apply to IPv6 ([#90236](https://github.com/kubernetes/kubernetes/issues/90236#issuecomment-624721859)). There is a mandate ([Wayback Machine link because the White House site is flaky at the moment](https://web.archive.org/web/20250116092323/https://www.whitehouse.gov/wp-content/uploads/2020/11/M-21-07.pdf)) for government agencies to migrate to IPv6 single stack by the end of fiscal year (FY) 2025. Given how often Zarf is used in government environments, it's important that IPv6 is supported.
> A similar issue also makes the Zarf registry unusable on distros such as OpenShift, which block rewriting traffic to localhost. In both of these situations hostPort will not work; hostNetwork will.
> The registry proxy solution comes with security advantages. The registry will only be accessible from within the cluster. This is an advantage over the current solution, since NodePort services default to being accessible externally to anyone who can connect to a node. Additionally, we will force the registry to connect to the proxy and Zarf CLI with mTLS. With this approach, the only unencrypted traffic during a kubelet call occurs between the kubelet and proxy, ensuring this traffic never leaves the host. The Zarf CLI will connect directly to the registry over mTLS and Kubernetes port forwards.
> With this approach, the only unencrypted traffic during a kubelet call occurs between the kubelet and proxy, ensuring this traffic never leaves the host.

🔑
> As an administrator of a Kubernetes cluster who wants a stronger security posture when using the Zarf registry, I run `zarf init --registry-proxy`.
### Risks and Mitigations
> The baseline [pod security standards](https://kubernetes.io/docs/concepts/security/pod-security-standards/) recommend that pods not set hostPort or hostNetwork. Users with controllers that enforce these standards, such as Kyverno, will need to make an exemption. Additionally, some distros disable hostPort and hostNetwork by default, and users will need admin permissions to allow these features.
> For example, OpenShift requires hostPort or hostNetwork pods to run with a privileged service account, while Talos requires that the namespace be privileged for hostPort to be enabled. For this feature to be considered stable, the Zarf documentation must include instructions for which settings to change to enable hostPort / hostNetwork on the most common Kubernetes distributions. Zarf currently has no distro-specific documentation but plans to add it ([#3686](https://github.com/zarf-dev/zarf/issues/3686)).
> For this feature to be considered stable, the Zarf documentation must include instructions for which settings to change to enable hostPort / hostNetwork on the most common Kubernetes distributions
++
When we get here, I expect that a few different approaches combined will help users the most: a high-level networking architecture overview after Start Here to orient users, and then, under Best Practices or Reference, guidance through granular settings like distro-specific configuration.
> The registry is no longer accessible from outside of the cluster by default. Some users may rely on this and will instead have to set up their own exposed service to connect to the registry.
Would it make sense to target example configs as part of GA?
For exposing the registry, no imo. It'll be exposable through standard Kubernetes means, and I believe only a very small percentage of our user base will want to do this.
brandtkeller left a comment:
Non-blocking comments / questions. There are a lot of benefits wrapped into this proposal.
> A new `--registry-proxy` flag will be added to `zarf init`. Enabling this flag causes Zarf to create a DaemonSet running a proxy on each node that connects directly to the registry service. Both the injector and the proxy will require DaemonSets, and the injector will be long-lived. Eventually, `--registry-proxy` may default to true.
Benefits of the injector being long-lived are good: more resilient to restarts of the registry.

> Both the injector and proxy will require DaemonSets, and the injector will be long lived

This makes sense from the current context of the injector being temporary, but it could confuse some people, since the proxy will be long-lived as well.
It could, yeah. We'll definitely want docs explaining how and why the proxy process works.
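A minimal sketch of the pod-level settings such a hostNetwork proxy DaemonSet would likely need. The field names mirror the Kubernetes PodSpec but this is plain illustrative Go, not client-go, and not Zarf's actual manifest; `ClusterFirstWithHostNet` is the standard DNS policy a hostNetwork pod needs to resolve in-cluster service names such as the registry service.

```go
package main

import "fmt"

// proxyPodSpec mirrors the two PodSpec fields the proxy DaemonSet cares
// about. Illustrative only; the real manifest is an assumption.
type proxyPodSpec struct {
	HostNetwork bool   // bind the proxy on each node's host network
	DNSPolicy   string // DNS behavior for the hostNetwork pod
}

func proxySpec() proxyPodSpec {
	return proxyPodSpec{
		HostNetwork: true,
		// Without ClusterFirstWithHostNet, a hostNetwork pod uses the node's
		// resolver and cannot look up in-cluster service names.
		DNSPolicy: "ClusterFirstWithHostNet",
	}
}

func main() {
	fmt.Printf("%+v\n", proxySpec())
}
```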
> A user can pass `--registry-proxy` during `zarf init`, and their choice will be saved to the cluster and reused on subsequent runs of `init`. If a user wants to switch back to the localhost NodePort solution, they must run `zarf init --registry-proxy=false`. If a user runs `zarf init` without the `--registry-proxy` flag on an already initialized cluster, Zarf will continue using the registry setup chosen during the initial init, whether that is the registry proxy or the NodePort solution.
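The precedence described above is small enough to sketch directly: an explicitly passed flag wins (and would be persisted to the cluster); otherwise the value saved by the previous init is reused. Function and parameter names here are hypothetical, not Zarf's actual code.

```go
package main

import "fmt"

// resolveRegistryProxy returns the effective registry mode for this init.
// flagProvided distinguishes "--registry-proxy=false" from "flag omitted".
func resolveRegistryProxy(flagProvided, flagValue, savedValue bool) bool {
	if flagProvided {
		return flagValue // explicit choice; would also be saved to the cluster
	}
	return savedValue // fall back to the previous init's choice
}

func main() {
	// No flag on an already initialized cluster: keep the previous choice.
	fmt.Println(resolveRegistryProxy(false, false, true)) // true
	// Explicit --registry-proxy=false: switch back to the NodePort solution.
	fmt.Println(resolveRegistryProxy(true, false, true)) // false
}
```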
> The proxy and the registry will connect over mTLS. Zarf will create a certificate authority along with a client and server certificate signed by that authority. If a certificate has less than half of its total lifetime remaining, it will be rotated automatically during `zarf init`. Users will be able to specify their own certificates through flags on `zarf init`: `--registry-server-cert-file`, `--registry-server-key-file`, `--registry-client-key-file`, and `--registry-client-cert-file`.
Nit/out-of-scope: do we have any intention of supporting other services that want to access the registry, now over HTTPS, given that the TLS secrets are not accessible globally in the cluster?
Not yet. For those use cases I'd rather change the allowed hostIP and have apps connect to the proxy. We'll see how this works out as those use cases become more concrete.
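The mTLS setup above amounts to a TLS config that trusts only the Zarf-generated CA and presents a client certificate. This is a sketch using Go's standard `crypto/tls`; the file paths and function name are illustrative assumptions, not Zarf's actual layout or API.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"os"
)

// mutualTLSConfig builds a TLS config that verifies the peer against the
// Zarf-generated CA and presents our certificate for mTLS. Illustrative
// sketch; paths and naming are assumptions.
func mutualTLSConfig(caFile, certFile, keyFile string) (*tls.Config, error) {
	caPEM, err := os.ReadFile(caFile)
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, fmt.Errorf("no CA certificates found in %s", caFile)
	}
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, err
	}
	return &tls.Config{
		RootCAs:      pool,                    // trust only the Zarf CA
		Certificates: []tls.Certificate{cert}, // our certificate for mTLS
		MinVersion:   tls.VersionTLS12,
	}, nil
}

func main() {
	// With no cert files on disk this returns an error; paths are placeholders.
	_, err := mutualTLSConfig("ca.pem", "client.pem", "client.key")
	fmt.Println("config built:", err == nil)
}
```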
> There should be a test verifying that `zarf init --registry-proxy` works with both nftables and IPv6. Both should go through the entire e2e suite.
Out-of-scope: would like to see some chaos engineering work its way into these tests, purposely restarting components of the init infrastructure and expecting them to recover during package operations.
Agreed, or other things like restarting or adding new nodes.