Context
- cluster-api-provider-vsphere, cloud-provider-vsphere & image-builder are currently using a VMware-owned and -managed VMC project for CI
- The goal of this issue is to track the migration to a new community-owned environment (Google Cloud VMware Engine (GVE))
Prerequisites
- [WIP] Finalize funding & billing (responsible people are already engaged)
  - Currently waiting for final approval
  - Clarify how billing works exactly
Critical path
- Set up new GCP project + GVE instance (+ related objects)
  - WIP PR: [WIP][DoNotReview] infra: add k8s-infra-vsphere for vSphere based CI #6851 (folder: infra/gcp/terraform/k8s-infra-vsphere); see the Terraform sketch at the end of this list
- Set up authentication / authorization (Okta?) (see the section below for details)
- Set up networking (see the section below for details)
- Set up vCenter configuration
  - WIP PR: [WIP][DoNotReview] infra: add k8s-infra-vsphere for vSphere based CI #6851 (folder: infra/gcp/terraform/k8s-infra-vsphere/vsphere)
- Ensure we can upload new OVA images after Kubernetes minor releases
- Set up Boskos configuration (see the separate section below for details)
  - Add new resources to the Boskos resources ConfigMap
  - Add YAML with the user data configuration
  - Ensure we have a reaper configured for the new resource types
  - Documentation: especially about how to apply the user data (we already have bash snippets here: https://github.com/sbueringer/k8s.io/blob/pr-vsphere/vsphere-boskos-poc/README.md#boskos)
- Change existing ProwJobs & presets (+ corresponding secrets in GCP Secret Manager)
  - Update ProwJobs to run on a community-owned cluster (including setting resources)
  - Update presets to point to secrets with the new vSphere config (URL, credentials, thumbprint, ...) and VPN credentials; see the preset sketch at the end of this list
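To make the GVE setup item above a bit more concrete, here is a rough, hypothetical Terraform sketch of the kind of resources the k8s-infra-vsphere folder would contain (all names, the zone, CIDR, node type and node count are placeholders, not the values from PR #6851):

```hcl
# Hypothetical sketch only; the actual definitions live in
# infra/gcp/terraform/k8s-infra-vsphere (see WIP PR #6851).
resource "google_vmwareengine_network" "vsphere_ci" {
  name     = "k8s-infra-vsphere-network" # placeholder name
  location = "global"
  type     = "STANDARD"
}

resource "google_vmwareengine_private_cloud" "vsphere_ci" {
  name     = "k8s-infra-vsphere" # placeholder name
  location = "us-central1-a"     # placeholder zone

  network_config {
    management_cidr       = "192.168.0.0/24" # placeholder management CIDR
    vmware_engine_network = google_vmwareengine_network.vsphere_ci.id
  }

  management_cluster {
    cluster_id = "mgmt-cluster"
    node_type_configs {
      node_type_id = "standard-72" # standard GVE node type
      node_count   = 3             # minimum node count for a GVE private cloud
    }
  }
}
```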
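And for the presets item, a minimal sketch of what a preset wiring the new vSphere credentials and VPN config into the ProwJobs could look like. The preset label, secret names and keys are assumptions, and how the GCP Secret Manager secrets get synced into the cluster is not covered here:

```yaml
presets:
- labels:
    preset-vsphere-gve: "true"          # hypothetical preset label
  env:
  - name: VSPHERE_SERVER
    valueFrom:
      secretKeyRef:
        name: vsphere-credentials       # hypothetical secret with the new vCenter config
        key: server
  - name: VSPHERE_USERNAME
    valueFrom:
      secretKeyRef:
        name: vsphere-credentials
        key: username
  - name: VSPHERE_PASSWORD
    valueFrom:
      secretKeyRef:
        name: vsphere-credentials
        key: password
  - name: VSPHERE_TLS_THUMBPRINT
    valueFrom:
      secretKeyRef:
        name: vsphere-credentials
        key: thumbprint
  volumes:
  - name: vpn-config
    secret:
      secretName: vsphere-vpn-config    # hypothetical secret with VPN certificates & client config
  volumeMounts:
  - name: vpn-config
    mountPath: /etc/vpn
    readOnly: true
```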
Open points
Networking
Requirements:
- ProwJobs (tests & janitor) need access to the vCenter API and to VMs running inside of vCenter
- VMs running inside of vCenter need access to the vCenter API
Current implementation in VMC: (VPN tunnel)
- VPN VM with a public IP running within vCenter
- vCenter and VM IPs are not public
- ProwJobs get VPN certificates & config via presets (see the sketch below)
- Advantages:
  - We already have a working implementation; we just have to replicate it
  - No restrictions regarding how many IPs we can use for VMs within vCenter because they are private (we need at least ~1024, more would be better)
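For illustration, assuming an OpenVPN-based tunnel (the concrete VPN software used in VMC is not spelled out here), the client config a ProwJob gets via the preset would look roughly like this; the IP, port and file paths are placeholders:

```
# Hypothetical OpenVPN client config injected into ProwJobs via a preset
client
dev tun
proto udp
remote 203.0.113.10 1194   # placeholder: public IP of the VPN VM running within vCenter
ca /etc/vpn/ca.crt         # placeholder paths for the mounted VPN certificates
cert /etc/vpn/client.crt
key /etc/vpn/client.key
remote-cert-tls server
nobind
persist-key
persist-tun
```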
Alternatives to be explored: (sorry, I didn't understand the entire discussion in the meeting, please chime in below)
- Expose the vCenter API & VM IPs publicly
  - I really would like to avoid this for security reasons
- Peering between the existing Prow cluster and the GVE instance
- An additional Prow cluster for vSphere jobs in the same private network as the GVE instance
Authentication / Authorization (Okta?)
Requirements:
- vCenter access for the following users:
  - for tests: (technical users)
    - cluster-api-provider-vsphere
    - cloud-provider-vsphere
    - image-builder
  - for cleanup: (technical user)
    - janitor (currently implemented as a periodic ProwJob, cleans up resources from Boskos)
  - administrative access:
    - (users/groups to be defined)
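As a sketch of how the per-project technical users could be wired up in vCenter (mirroring the per-project isolation described in the Boskos section below): the user, role and privilege names as well as the inventory paths are placeholders, and this assumes creating local SSO users is possible in GVE at all; otherwise the users would come from the chosen identity provider (e.g. Okta).

```sh
# Hypothetical example: one technical user per project, scoped to that project's
# resource pool & folder only (names, privileges and paths are placeholders).
govc sso.user.create -p "${CAPV_CI_PASSWORD}" capv-ci

# A role with only the privileges the e2e tests need (privilege list is illustrative)
govc role.create capv-ci-role \
  VirtualMachine.Inventory.Create VirtualMachine.Inventory.Delete \
  VirtualMachine.Interact.PowerOn VirtualMachine.Interact.PowerOff \
  Resource.AssignVMToPool Folder.Create Folder.Delete

# Grant the role only on the project's resource pool and VM folder
govc permissions.set -principal capv-ci@vsphere.local -role capv-ci-role -propagate=true \
  "/Datacenter/host/Cluster/Resources/prow/cluster-api-provider-vsphere"
govc permissions.set -principal capv-ci@vsphere.local -role capv-ci-role -propagate=true \
  "/Datacenter/vm/prow/cluster-api-provider-vsphere"
```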
Boskos configuration & presets
The following describes our current setup in VMC. We would like to use the same setup in the new GVE environment; reusing it will also make the migration simpler and faster.
Notes:
- vCenter:
  - Resource pools and folders have the following structure, e.g. /prow/cluster-api-provider-vsphere/{001, 002, ...}
  - This allows us to track resource usage per repository/project
  - One user per project (which only has permissions on the corresponding project resource pool & folder)
  - This ensures we have isolation between projects
  - One user for the janitor which has access to all project resource pools / folders to clean up
- Presets:
  - VPN credentials and the respective user credentials are injected into the ProwJobs via presets
- Boskos:
  - Contains one resource for each (resource pool, folder) pair (user data also contains the corresponding IP pool configuration)
  - We use different resource types for the different repositories/projects
(picture source on sbueringer#1, can be opened with drawio)
(current Boskos setup in the old VMC environment can be seen here: sbueringer#1)
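To make this a bit more concrete, a minimal sketch of what the Boskos resources entry and the per-resource user data could look like in the new environment; the type/resource names, user data keys and IP ranges are placeholders, not the final values:

```yaml
# Boskos resources ConfigMap: one resource type per repository/project,
# one resource per (resource pool, folder) pair - placeholder names
resources:
- type: vsphere-project-cluster-api-provider-vsphere
  state: free
  names:
  - "cluster-api-provider-vsphere-001"
  - "cluster-api-provider-vsphere-002"
- type: vsphere-project-cloud-provider-vsphere
  state: free
  names:
  - "cloud-provider-vsphere-001"
---
# User data attached to e.g. "cluster-api-provider-vsphere-001"
# (applied separately, see the bash snippets linked above) - keys/values are placeholders
folder: /prow/cluster-api-provider-vsphere/001
resourcePool: /prow/cluster-api-provider-vsphere/001
ipPool: '{"addresses": ["192.168.32.10-192.168.32.60"], "gateway": "192.168.32.1", "prefix": 22}'
```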
Jobs that still have to be migrated
I checked all jobs that are still using the current vSphere environment and also the ones that are still using credentials from a VMware-owned GCP project to push images, for: cluster-api-provider-vsphere, cloud-provider-vsphere, vsphere-csi-driver and image-builder. No surprises there.
The following jobs can be migrated once the new env is functional:
- cluster-api-provider-vsphere:
  - periodic-cluster-api-provider-vsphere-e2e-{{ $mode }}-{{ ReplaceAll $.branch "." "-" }}
  - periodic-cluster-api-provider-vsphere-e2e-{{ $mode }}-conformance-{{ ReplaceAll $.branch "." "-" }}
  - periodic-cluster-api-provider-vsphere-e2e-{{ $mode }}-conformance-ci-latest-{{ ReplaceAll $.branch "." "-" }}
  - periodic-cluster-api-provider-vsphere-janitor
  - periodic-cluster-api-provider-vsphere-e2e-exp-kk-alpha-features
  - periodic-cluster-api-provider-vsphere-e2e-exp-kk-serial
  - periodic-cluster-api-provider-vsphere-e2e-exp-kk-slow
  - periodic-cluster-api-provider-vsphere-e2e-exp-kk
  - periodic-cluster-api-provider-vsphere-e2e-{{ $mode }}-upgrade
  - pull-cluster-api-provider-vsphere-e2e-{{ $mode }}-blocking-{{ ReplaceAll $.branch "." "-" }}
  - pull-cluster-api-provider-vsphere-e2e-{{ $mode }}-{{ ReplaceAll $.branch "." "-" }}
  - pull-cluster-api-provider-vsphere-e2e-{{ $mode }}-upgrade
  - pull-cluster-api-provider-vsphere-e2e-{{ $mode }}-conformance-{{ ReplaceAll $.branch "." "-" }}
  - pull-cluster-api-provider-vsphere-e2e-{{ $mode }}-conformance-ci-latest-{{ ReplaceAll $.branch "." "-" }}
  - pull-cluster-api-provider-vsphere-janitor-main
- cloud-provider-vsphere:
  - pull-cloud-provider-vsphere-e2e-test
  - pull-cloud-provider-vsphere-e2e-test-on-latest-k8s-version
  - pull-cloud-provider-vsphere-e2e-test-1-26-minus
- image-builder:
  - pull-ova-all
The following jobs can be migrated today: (I talked to the maintainers of vsphere-csi-driver about it)
- vsphere-csi-driver:
  - post-vsphere-csi-driver-deploy
  - post-vsphere-csi-driver-release