
Utility Cluster #186

Closed
runyontr opened this issue Dec 6, 2021 · 10 comments

@runyontr (Contributor) commented Dec 6, 2021

As a system admin and application admin, I would like to not be dependent on connectivity to upstream repo1/registry1 or the DevSecOps platform my application is being developed on. This use case should also cover situations where my production environment is airgapped from my development environment.

Until #174 is closed, I'll be using the terms "Hub/Spoke/appliance" clusters in line with this comment: #174 (comment)

Architecture:

Proposed architecture, requirements and workflows:

[Diagram: Zarf-Hub-Spoke architecture (drawio)]

  1. Deploy one (or more) appliance-mode registries
  2. (Optional) Front multiple appliance registries with a load balancer
  3. Load Big Bang components into the appliance-mode registries
  4. Deploy BBHub backed by the appliance registries
  5. Load Big Bang components into the Hub registry
  6. Deploy spoke-cluster Big Bang pointed at the Hub registry
  7. Load mission app artifacts into the Hub registry
  8. Deploy the mission app pointed at the Hub registry
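
One hedged way to read steps 1–8 as commands, assuming pre-built Zarf packages already exist for each layer; the package filenames below are invented placeholders, not real packages, and the exact init components are deployment-specific:

```bash
# 1. Stand up each appliance-mode registry (each appliance VM is its own single-node k3s cluster)
zarf init

# 2. (Optional) Front the appliance registries with a load balancer or an External IP Service
#    (see the Appliance section below)

# 3. Load Big Bang components (images + git repos) into the appliance registries
zarf package deploy zarf-package-big-bang-artifacts.tar.zst   # placeholder package name

# 4./5. Deploy BBHub backed by the appliance registries, then load Big Bang
#       components into the Hub registry
zarf package deploy zarf-package-bb-hub.tar.zst               # placeholder package name

# 6. Deploy spoke-cluster Big Bang pointed at the Hub registry
zarf package deploy zarf-package-bb-spoke.tar.zst             # placeholder package name

# 7./8. Load mission app artifacts into the Hub registry and deploy the mission app
zarf package deploy zarf-package-mission-app.tar.zst          # placeholder package name
```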

Appliance

The appliance cluster(s) are responsible for providing the images and git repositories needed to run the Hub cluster. Each node is its own single-node k3s appliance cluster, stood up with the same zarf init.

Requirements:

  • Highly Available (HA). We don't want the hub cluster (which depends on this for images/repos) to go down.
  • Iron Bank images. As part of a production system, the images used in this deployment should be from Iron Bank
  • Docker Registry. This needs to be able to host a docker registry that can be used by a SINGLE tenant (the hub cluster)
  • Git Repo. To GitOps the Hub cluster, we need to be able to host git repos here

To make these images/repos highly available, multiple appliance nodes will be stood up on individual VMs and fronted either by an explicit load balancer or by an External IP Service (see the sketch below).
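
On the External IP Service option: one common way to give the Hub cluster a single stable endpoint for the appliance VMs is a selector-less Service plus a manually managed Endpoints object. A minimal sketch; the namespace, names, port, and IPs are placeholders:

```bash
kubectl apply -f - <<'EOF'
# Service with no selector: kube-proxy load-balances across the Endpoints below
apiVersion: v1
kind: Service
metadata:
  name: appliance-registry
  namespace: zarf
spec:
  ports:
    - port: 5000
      targetPort: 5000
---
# Manually managed Endpoints pointing at the appliance VMs
apiVersion: v1
kind: Endpoints
metadata:
  name: appliance-registry
  namespace: zarf
subsets:
  - addresses:
      - ip: 10.0.0.11   # appliance node 1
      - ip: 10.0.0.12   # appliance node 2
    ports:
      - port: 5000
EOF
```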

Hub Cluster

The Hub cluster is responsible for providing the images and git repositories to support running the spoke clusters.

Requirements:

  • HA. This cluster needs to be highly available.
  • Kubernetes agnostic. It should be possible to use any Kubernetes cluster for this, e.g. EKS, k3s, RKE2, etc.
  • Pulls images/repos from the Appliance Cluster (maybe by …)
  • Hosts a multi-tenant docker registry (Iron Bank image)
  • Hosts a multi-tenant git repo (Iron Bank image)
  • Runs BB core
  • Single tenant (i.e. just one cluster owner)
  • Spoke tenants can upload their own artifacts to git/docker

Spoke Cluster

Requirements:

  • Kubernetes agnostic. It should be possible to use any Kubernetes cluster for this
  • Pulls images/repos from the Hub cluster
  • Either single-tenant or multi-tenant
  • Runs BB core, pulling from the Hub cluster (see the sketch below)
  • Runs spoke tenant apps, pulling from the Hub cluster
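
On "Runs BB core, pulling from the Hub cluster": Big Bang is driven by Flux, so in practice the spoke's Git source would point at the Hub's git server rather than repo1. A hypothetical sketch only; the URL, tag, namespace, and secret name are placeholders:

```bash
kubectl apply -f - <<'EOF'
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: bigbang
  namespace: bigbang
spec:
  interval: 5m
  # Mirror of the Big Bang repo hosted in the Hub cluster's git server (placeholder URL)
  url: https://git.hub.example.internal/mirror/bigbang.git
  ref:
    tag: "1.21.1"        # placeholder Big Bang release tag
  secretRef:
    name: hub-git-credentials
EOF
```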

Relates to #134

@RothAndrew (Contributor)

@runyontr what's the outcome you are looking for here? A new example that implements this architecture? Or just validation that this can all be done?

@runyontr (Contributor, Author)

The outcome is threefold:

  1. Agreement that this is an appropriate architecture for cloud deployments that need to be HA. Particular focus is on:
    i. Ensuring the Hub has HA access to pull images
    ii. Enabling multi-tenancy in the registry on the Hub so multiple spoke tenants can share the same hub
    iii. Enabling multi-tenancy in the git repo on the Hub so multiple spoke tenants can share the same hub
    iv. Agreement on the workflow for upgrading this environment with the 3 unique zarf packages (appliance, hub, and mission app)
  2. Ensuring capabilities are in place to meet the architecture and workflows
  3. Guide/tutorial/demo of creating said environment for end users. This is Main README: Coming Soon - Disconnected GitOps (The Utility Cluster) #134

@RothAndrew (Contributor) commented Dec 16, 2021

Note: I'm gonna hit "Comment" prematurely as I write this so people can start looking at it. It's gonna be kinda long.

Edit: Okay, I think I'm done for now


There's a lot here, such that I think we should break this down into multiple issues (I think I'm seeing somewhere around 6 issues).

Note: A lot of this assumes we have the native apply stuff done, and some of it might change as we finish out that work.

In no particular order:

Example on how to do HA bootstrapping

Once the native apply stuff is done, the initial images for the Hub's docker registry have to come from somewhere. Right now the plan is to use a bastion host where the Zarf binary is installed; when you zarf init, a Dockerv2 registry is run in what we're calling "embedded" mode as a systemd service.

Doing so would not be HA, so if that's desired we'd need to run multiple bastions, all with Zarf "embedded registries", behind a load balancer. We see this as an extreme use case, where your SLA has lots of 9's in it.

My initial SWAG on the work here is a Terraform project with documentation that deploys the bastions and the load balancer to AWS. It would also have to include how to keep each bastion in sync when updates are needed.
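
Not the Terraform work itself, but a sketch of what the "embedded" registry on each bastion could look like if it were simply a Dockerv2 registry container wrapped in a systemd unit; the unit name, port, and paths are assumptions, not Zarf's actual implementation:

```bash
# Run on each bastion; assumes Docker is installed and /var/lib/registry holds the registry data
cat <<'EOF' | sudo tee /etc/systemd/system/embedded-registry.service
[Unit]
Description=Embedded Dockerv2 registry for bootstrapping
After=docker.service
Requires=docker.service

[Service]
ExecStartPre=-/usr/bin/docker rm -f embedded-registry
ExecStart=/usr/bin/docker run --rm --name embedded-registry \
  -p 5000:5000 \
  -v /var/lib/registry:/var/lib/registry \
  registry:2
ExecStop=/usr/bin/docker stop embedded-registry
Restart=always

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now embedded-registry
```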

Multi-Tenant "Utility Cluster" / "GitOps Service" (both registry & git server)

Need to talk through this more. The way I understand it: User A has access to some set of images and repos, and User B does not have access to User A's images and repos; User B has their own images and repos that User A doesn't have access to.

If that's the way you're thinking, we'll need to come together and decide what's in scope and get that feature request into the roadmap.

Example of an HA "Utility Cluster" / "GitOps Service" / "Hub Cluster"

This one looks pretty straightforward: we need an example people can look at that does a prod-like hub cluster with multiple k8s nodes and HA services for the docker registry and Gitea (can Gitea do HA?). It would likely be worth doing it simply first without Big Bang, then following up with an enhancement to run it in Big Bang (which would require work to get the docker registry and Gitea running in Big Bang, as right now they do not).

Maybe we don't use Dockerv2 and Gitea for a prod-like HA hub cluster? We could do a GitOps-managed, HA deployment of other, more prod-ready services. This would require that Zarf be modified to push images and repos to any registry or git service, not just Dockerv2 and Gitea.
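
As a hedged starting point for the "simply first, without Big Bang" version, the community Helm charts for a Dockerv2 registry and Gitea could be used. The chart repos and values below are assumptions; real HA for the registry also needs a shared storage backend (e.g. S3), and whether Gitea can run HA is the open question above:

```bash
helm repo add twuni https://helm.twun.io
helm repo add gitea-charts https://dl.gitea.io/charts/

# Registry: multiple replicas only help if they share a storage backend (e.g. S3)
helm install registry twuni/docker-registry \
  --namespace utility --create-namespace \
  --set replicaCount=2

# Gitea: single replica shown; HA would need an external database and cache
helm install gitea gitea-charts/gitea --namespace utility
```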

Example of an HA "Spoke" / "Workload" cluster

Also pretty straightforward. We need an example of a multi-node workload cluster that talks to a multi-node "hub" cluster. It would likely be worth doing it simply first without Big Bang, then following up with an enhancement to run the workload on top of Big Bang (potential issues with Traefik, cluster policies, etc.).

No non-Iron Bank images

See #214 and #215 (maybe #213 also, depending on how this all shakes out)

Example that "puts it all together"

Also straightforward. We need an example that composes all of this into a prod-like holistic system architecture.

@RothAndrew (Contributor) commented Dec 16, 2021

Here's an alternative to the diagram Tom posted in the original description, which I came up with a while back. It's simpler in scope and complexity, but not as highly available as Tom's would be. It's also not multi-tenant.

[Diagram: alternative, simpler hub/spoke architecture]

@zack-is-cool (Contributor)

Is there a "utility cluster" example coming down the pipe? I think at this point it would basically be airgapping all git repos/images + throwing an ingress on the zarf deployment so the gitea and docker registry can be reached.

@jeff-mccoy (Contributor)

This is how Zarf worked out of the box before v0.15.0, but we found it much more complicated for most of our use-cases. I think we could do a utility cluster example, which would essentially just add an ingress to expose the registry and git server. The utility cluster concept really changed with the new design in v0.15.0; namely, the "hub/spoke" model went away in favor of in-cluster services for the default configuration. Happy to discuss further or work together on an example, but our primary design now works off this in-cluster architecture.
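
For reference, the "just add an ingress" idea might look roughly like the following; the namespace, service names, ports, and hostnames are guesses, not the actual names Zarf creates in-cluster:

```bash
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: zarf-utility
  namespace: zarf
spec:
  rules:
    - host: registry.utility.example.internal   # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: zarf-docker-registry      # placeholder service name
                port:
                  number: 5000
    - host: git.utility.example.internal        # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: zarf-gitea-http           # placeholder service name
                port:
                  number: 3000
EOF
```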

@zack-is-cool (Contributor)

Yeah, this is why we are still using v0.14.0 for our use case - it just works out of the box. Haven't had a ton of time to look at all the differences with the zarf init package and deploying Traefik or some other ingress.

@jeff-mccoy (Contributor)

Yeah, one of the issues we ran into when Zarf went multi-distro is ingress: it gets pretty complicated pretty fast, and some distros don't even have one available out of the box. Prior to v0.15.0, Zarf was effectively a single-node K3s wrapper for air gap. It's certainly a lot more now, but if that's your exclusive use-case it might be possible to stay there for now. We haven't had the bandwidth to discuss backporting anything for that version as we're still pre-release, but I'm sure if someone in the community wanted to, we could support PRs to do that.

@JasonvanBrackel (Contributor)

@jeff-mccoy @runyontr @RothAndrew Is this still a need, given #186 (comment)?

This is either relevant to @ActionN and related work, and should be an epic for a supportable use case, or it is no longer relevant and should be closed.

Labeling this as an epic until I hear back from y'all.

@jeff-mccoy (Contributor)

Going to close this now as a stale issue. A lot of things have changed with the support of #560 and #570 and how Zarf usage has evolved over time. This doesn't feel like a thing we need to directly support outside of documentation & examples at this point.
