Currently the only way to manage EKS Anywhere CloudStack clusters is through the CLI. We need to support a native Kubernetes experience for the CloudStack cluster lifecycle:
- Create/Upgrade/Delete CloudStack workload clusters using: kubectl apply -f cluster.yaml
- Create/Update/Upgrade/Delete CloudStack workload clusters using GitOps/Terraform
- Create/Upgrade/Delete self-managed clusters with API
- We currently only support Kubernetes versions below 1.24 on CloudStack due to the current CAPC version.
The EKS-A controller running in the management cluster fully manages workload clusters, including reconciling EKS-A CRDs and installing the CNI in the workload cluster. Although the machine provider is different, the CloudStack cluster reconciliation process watches and reconciles the same CAPI objects and EKS-A cluster object as vSphere, so the reconciliation flow mirrors the vSphere reconciler: it watches resources and uses an event handler to enqueue reconcile requests in response to those events, as sketched below.
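Below is a minimal sketch of that watch/enqueue wiring, assuming controller-runtime v0.12-style APIs. setupClusterController and the mapToCluster mapping function are illustrative names, not the actual EKS-A wiring.

package controllers

import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
	"sigs.k8s.io/controller-runtime/pkg/source"

	anywherev1 "github.com/aws/eks-anywhere/pkg/api/v1alpha1"
)

// setupClusterController registers watches so that changes to the EKS-A Cluster
// object or to the CloudStack provider objects enqueue a reconcile request for
// the owning cluster. mapToCluster maps a provider object back to the cluster(s)
// that reference it (lookup logic elided in this sketch).
func setupClusterController(mgr ctrl.Manager, r reconcile.Reconciler, mapToCluster handler.MapFunc) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&anywherev1.Cluster{}).
		Watches(&source.Kind{Type: &anywherev1.CloudStackDatacenterConfig{}},
			handler.EnqueueRequestsFromMapFunc(mapToCluster)).
		Watches(&source.Kind{Type: &anywherev1.CloudStackMachineConfig{}},
			handler.EnqueueRequestsFromMapFunc(mapToCluster)).
		Complete(r)
}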
To maintain the same level of validation logic that we currently run in the CLI, we'll port those validations into two categories: data validation and runtime validation.
- Data validation: Kubernetes offers validating webhooks for CRDs, which run data validation before an object is accepted by the kube-apiserver. Data validations should be light and fast; see the sketch after this list.
- Runtime validation: some validations require calling the CloudStack API, such as checking whether a provided availability zone or account exists, which is too heavy for the webhook. We will implement those validations in a separate datacenter reconciler and block the reconciliation process on failure, similar to the vSphere datacenter reconciler.
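A minimal sketch of the data validation the webhook could run, assuming the CloudStackDatacenterConfig types in the v1alpha1 API package; the helper name and the exact checks are illustrative, not the final list.

package v1alpha1

import (
	"k8s.io/apimachinery/pkg/util/validation/field"
)

// validateCloudStackDatacenterConfigData runs light, data-only checks that need
// no CloudStack API call, so they are cheap enough for the validating webhook.
func validateCloudStackDatacenterConfigData(spec *CloudStackDatacenterConfigSpec) field.ErrorList {
	var errs field.ErrorList
	azPath := field.NewPath("spec", "availabilityZones")

	if len(spec.AvailabilityZones) == 0 {
		errs = append(errs, field.Required(azPath, "at least one availability zone is required"))
	}
	for i, az := range spec.AvailabilityZones {
		if az.ManagementApiEndpoint == "" {
			errs = append(errs, field.Required(azPath.Index(i).Child("managementApiEndpoint"), "management API endpoint is required"))
		}
		// Illustrative check: require the zone network to be identified by id or name.
		if az.Zone.Network.Id == "" && az.Zone.Network.Name == "" {
			errs = append(errs, field.Required(azPath.Index(i).Child("zone", "network"), "either network id or name is required"))
		}
	}
	return errs
}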
As part of the CLI logic, we set default values for some spec fields when they're missing from what customers provide, such as the datacenter availability zone. This defaulting can be leveraged in a mutating webhook (a sketch follows below).
We don't want to modify spec fields that customers have already specified. For example, the control plane host port could be missing from the control plane host customers provide, and the reconciler would set a default port when generating CAPI objects.
CloudStack machine config doesn’t have default values at this stage.
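A minimal sketch of that defaulting, assuming the controller-runtime webhook.Defaulter pattern (a Default() method on the API type invoked from the mutating webhook); the availability-zone naming scheme is illustrative.

package v1alpha1

import "fmt"

// Default fills in missing values without touching anything the customer has
// already specified. Here, availability zones without a name get a generated
// one so machine configs can reference them consistently.
func (r *CloudStackDatacenterConfig) Default() {
	for i := range r.Spec.AvailabilityZones {
		if r.Spec.AvailabilityZones[i].Name == "" {
			// Assumption: a deterministic generated name of this form is acceptable.
			r.Spec.AvailabilityZones[i].Name = fmt.Sprintf("default-az-%d", i)
		}
	}
}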
To better separate data validation from runtime validation, we'll reorganize the validations in the CLI.
- Extract the data validation from ValidateClusterMachineConfigs (https://github.com/aws/eks-anywhere/blob/main/pkg/providers/cloudstack/validator.go#L127) and call it from the webhook.
- We have immutable-field validation in both the webhook (validateImmutableFieldsCloudStackMachineConfig) and the provider (validateMachineConfigImmutability). There are some gaps between the two, so we'll need to decide on the final set of immutable fields (a sketch follows this list):
- affinity group
- disk offering (can be modified from CAPI v1.4)
- We have a Kubernetes version limitation due to the CAPC version. This lightweight validation can be done in the cluster webhook; see the sketch below.
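Hedged sketches for the last two items, assuming the CloudStackMachineConfig fields and KubernetesVersion constants in the v1alpha1 API package; the helper names, the final immutable field set, and the version ceiling all depend on the decisions above.

package v1alpha1

import (
	"fmt"
	"reflect"

	"k8s.io/apimachinery/pkg/util/validation/field"
)

// validateImmutableFieldsMachineConfig rejects updates to the candidate immutable
// fields. Whether diskOffering stays in this list depends on the CAPI/CAPC
// version decision above.
func validateImmutableFieldsMachineConfig(new, old *CloudStackMachineConfig) field.ErrorList {
	var errs field.ErrorList
	if !reflect.DeepEqual(new.Spec.AffinityGroupIds, old.Spec.AffinityGroupIds) {
		errs = append(errs, field.Forbidden(field.NewPath("spec", "affinityGroupIds"), "field is immutable"))
	}
	if !reflect.DeepEqual(new.Spec.DiskOffering, old.Spec.DiskOffering) {
		errs = append(errs, field.Forbidden(field.NewPath("spec", "diskOffering"), "field is immutable"))
	}
	return errs
}

// validateKubernetesVersionForCloudStack enforces the CAPC-driven version
// ceiling (< 1.24) in the cluster webhook; the ceiling moves as CAPC is upgraded.
func validateKubernetesVersionForCloudStack(version KubernetesVersion) error {
	switch version {
	case Kube120, Kube121, Kube122, Kube123:
		return nil
	default:
		return fmt.Errorf("kubernetes version %s is not supported on CloudStack with the current CAPC version", version)
	}
}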
We'll have a CloudStackDatacenterReconciler to validate the provided datacenter spec and update the corresponding status and failure message. Any validation failure at this step will stop further reconciliation (CNI, control plane nodes, worker nodes) until the customer provides valid datacenter information.
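A sketch of how the cluster reconciler could consume that result and stop the phase runner until the datacenter spec becomes valid; the status fields (SpecValid, FailureMessage) are the ones this design proposes, and the exact names may change.

package cloudstack

import (
	"context"
	"fmt"

	"github.com/go-logr/logr"

	"github.com/aws/eks-anywhere/pkg/cluster"
	"github.com/aws/eks-anywhere/pkg/controller"
)

// ValidateDatacenterConfig stops cluster reconciliation (CNI, control plane,
// workers) until the datacenter reconciler has marked the datacenter spec valid.
func (r *CloudstackReconciler) ValidateDatacenterConfig(ctx context.Context, log logr.Logger, spec *cluster.Spec) (controller.Result, error) {
	log = log.WithValues("phase", "validateDatacenterConfig")
	dc := spec.CloudStackDatacenter

	if !dc.Status.SpecValid {
		// Status fields proposed by this design; surface the failure on the cluster.
		if msg := dc.Status.FailureMessage; msg != nil {
			failureMessage := fmt.Sprintf("Invalid %s CloudStackDatacenterConfig: %s", dc.Name, *msg)
			spec.Cluster.Status.FailureMessage = &failureMessage
			log.Info("Invalid CloudStackDatacenterConfig, blocking further reconciliation", "failureMessage", *msg)
		} else {
			log.Info("CloudStackDatacenterConfig hasn't been validated yet")
		}
		// A Result with Return set stops the phase runner without surfacing an error;
		// reconciliation resumes once the datacenter status changes.
		return controller.ResultWithReturn(), nil
	}

	return controller.Result{}, nil
}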
The datacenter reconciler needs a validator with CloudStack credentials to build cmk to talk to the CloudStack API. The reconciler will retrieve the CloudStack credentials from secrets. These secrets are created by the CLI during management cluster creation or added by customers. CloudStack supports multiple secrets by referencing the secret name and credential profile name.
The CLI doesn't allow customers to rotate credentials, but since the reconciler reads credentials from secrets dynamically, customers can rotate secrets.
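A sketch of how the reconciler could read a credential profile from its secret on each reconcile loop, which is what makes rotation work; the eksa-system namespace and the way the profile is keyed by secret name are assumptions for this sketch.

package cloudstack

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// credentialProfile reads the CloudStack credential profile stored in the named
// secret. Because this runs on every reconcile loop, rotated secrets take
// effect without restarting the controller.
func (r *CloudstackReconciler) credentialProfile(ctx context.Context, secretName string) (map[string]string, error) {
	secret := &corev1.Secret{}
	// Assumption: credentials secrets live in the eksa-system namespace.
	key := client.ObjectKey{Namespace: "eksa-system", Name: secretName}
	if err := r.client.Get(ctx, key, secret); err != nil {
		return nil, fmt.Errorf("retrieving CloudStack credentials secret %s: %v", secretName, err)
	}

	// One secret per credential profile; the profile is addressed by secret name.
	profile := make(map[string]string, len(secret.Data))
	for k, v := range secret.Data {
		profile[k] = string(v)
	}
	return profile, nil
}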
We can use the validator factory to pass a validatorRegistry to the controller, so the validator is built on every reconcile loop:
func (r *CloudstackReconciler) Reconcile(ctx context.Context, log logr.Logger, cluster *anywherev1.Cluster) (controller.Result, error) {
	log = log.WithValues("provider", "cloudstack")

	// c is the pkg/cluster package; BuildSpec assembles the full cluster spec from the EKS-A objects.
	clusterSpec, err := c.BuildSpec(ctx, clientutil.NewKubeClient(r.client), cluster)
	if err != nil {
		return controller.Result{}, err
	}

	return controller.NewPhaseRunner().Register(
		r.ipValidator.ValidateControlPlaneIP, // checks whether the control plane IP is used by another cluster
		r.ValidateDatacenterConfig,           // checks that the datacenter reconciler reported no failure
		r.ValidateMachineConfig,              // once the datacenter is validated, use its availability zones to run runtime validation on the machine configs
		clusters.CleanupStatusAfterValidate,  // removes errors from the cluster status after all validation phases have been executed
		r.ReconcileControlPlane,
		r.CheckControlPlaneReady,
		r.ReconcileCNI,
		r.ReconcileWorkers,
	).Run(ctx, log, clusterSpec)
}