
Migration to new vSphere environment #6877

Open
@sbueringer

Description

Context

  • cluster-api-provider-vsphere, cloud-provider-vsphere & image-builder currently use a VMware-owned and VMware-managed VMC project for CI
  • Goal of this issue is to track the migration to a new community-owned environment (Google Cloud VMware Engine (GVE))

Prerequisites

  • [WIP] Finalize funding & billing (responsible people are already engaged)
    • Currently waiting for final approval
    • Clarify how billing works exactly

Critical path

  • Setup new GCP project + GVE instance (+ related objects)
  • Setup vCenter configuration
  • Setup Boskos configuration (see separate section below for details)
  • Change existing ProwJobs & presets (+ corresponding secrets in GCP Secrets Manager)
    • Update ProwJobs to run on a community-owned cluster (including setting resources)
    • Update presets to point to secrets with the new vSphere config (URL, credentials, thumbprint, ...) and VPN credentials

Open points

Networking

Requirements:

  • ProwJobs (tests & janitor) need access to the vCenter API and to VMs running inside of vCenter
  • VMs running inside of vCenter need access to the vCenter API

Current implementation in VMC: (VPN tunnel)

  • VPN VM with public IP running within vCenter
  • vCenter and VM IPs are not public
  • ProwJobs get VPN certificates & config via presets
  • Advantages:
    • We already have a working implementation, we just have to replicate it
    • No restrictions regarding how many IPs we can use for VMs within vCenter because they are private (we need at least ~1024, more would be better)
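If the VM IPs come from a single subnet, note that "at least 1024" sits right on a prefix boundary: a /22 holds 1024 addresses, but only 1022 are usable once the network and broadcast addresses are excluded, so 1024 usable IPs need a /21. A small sketch of that arithmetic, assuming only those two addresses are reserved per subnet (an IP pool may reserve more, e.g. for a gateway); the base address is just an example from the private 10.0.0.0/8 range:

```go
package main

import (
	"fmt"
	"math/bits"
	"net/netip"
)

// prefixForHosts returns the longest IPv4 prefix length whose subnet holds at
// least n usable addresses, assuming only the network and broadcast addresses
// are reserved.
func prefixForHosts(n int) int {
	total := uint(n) + 2
	// Smallest hostBits with 2^hostBits >= total.
	hostBits := bits.Len(total - 1)
	return 32 - hostBits
}

func main() {
	for _, n := range []int{1024, 2048} {
		p := prefixForHosts(n)
		// Example base address inside the private 10.0.0.0/8 range.
		prefix := netip.PrefixFrom(netip.MustParseAddr("10.0.0.0"), p)
		fmt.Printf("%d usable IPs -> %s (%d addresses)\n", n, prefix, 1<<(32-p))
	}
}
```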

Alternatives to be explored: (sorry, I didn't fully follow the discussion in the meeting; please chime in below)

  • Expose vCenter API & VM IPs publicly
    • I really would like to avoid this for security reasons
  • Peering between existing Prow cluster and GVE instance
  • Additional Prow cluster for vSphere jobs in the same private network as the GVE instance

Authentication / Authorization (Okta?)

Requirements:

  • vCenter access for the following users:
    • for tests: (technical users)
      • cluster-api-provider-vsphere
      • cloud-provider-vsphere
      • image-builder
    • for cleanup: (technical user)
      • janitor (currently implemented as periodic ProwJob, cleans up resources from Boskos)
    • administrative access:
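The intended isolation between these users (each project user limited to its own resource pool and folder, the janitor covering all project resource pools and folders) boils down to a path-prefix rule. The sketch below only models that rule; it is not vCenter's permission API, and the user names and paths are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// scopes maps each hypothetical technical user to the inventory path prefix
// it may operate on. Test users are confined to their project's subtree;
// the janitor covers everything below /prow.
var scopes = map[string]string{
	"cluster-api-provider-vsphere": "/prow/cluster-api-provider-vsphere",
	"cloud-provider-vsphere":       "/prow/cloud-provider-vsphere",
	"image-builder":                "/prow/image-builder",
	"janitor":                      "/prow",
}

// canAccess reports whether user may act on the resource pool or folder at
// path. A path is in scope if it equals the prefix or lies strictly below it,
// so "/prow/image-builder-x" is NOT inside "/prow/image-builder".
func canAccess(user, path string) bool {
	prefix, ok := scopes[user]
	if !ok {
		return false
	}
	return path == prefix || strings.HasPrefix(path, prefix+"/")
}

func main() {
	fmt.Println(canAccess("image-builder", "/prow/image-builder/001"))
	fmt.Println(canAccess("image-builder", "/prow/cloud-provider-vsphere/001"))
	fmt.Println(canAccess("janitor", "/prow/cloud-provider-vsphere/001"))
}
```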

Boskos configuration & presets

The following describes our current setup in VMC. We would like to use the same setup in the new GVE environment; doing so will also make the migration simpler and faster.

Notes:

  • vCenter:
    • Resource pools and folders have the following structure, e.g. /prow/cluster-api-provider-vsphere/{001, 002, ...}
      • This allows us to track resource usage per repository/project
    • One user per project (which only has permissions on the corresponding project resource pool & folder)
      • This ensures we have isolation between projects
    • One user for the janitor, which has access to all project resource pools/folders so it can clean them up
  • Presets:
    • VPN credentials and the respective user credentials are injected into the ProwJobs via presets
  • Boskos:
    • Contains one resource for each (resource pool, folder) pair (user data also contains the corresponding IP pool configuration)
    • We use different resource types for the different repositories/projects

[Diagram: Boskos setup (boskos drawio)]

(picture source on sbueringer#1, can be opened with drawio)
(current Boskos setup in the old VMC environment can be seen here: sbueringer#1)

Jobs that still have to be migrated

I checked all jobs that still use the current vSphere environment, as well as those that still use credentials from a VMware-owned GCP project to push images, for: cluster-api-provider-vsphere, cloud-provider-vsphere, vsphere-csi-driver and image-builder. No surprises there.

The following jobs can be migrated once the new environment is functional:

  • cluster-api-provider-vsphere:
    • periodic-cluster-api-provider-vsphere-e2e-{{ $mode }}-{{ ReplaceAll $.branch "." "-" }}
    • periodic-cluster-api-provider-vsphere-e2e-{{ $mode }}-conformance-{{ ReplaceAll $.branch "." "-" }}
    • periodic-cluster-api-provider-vsphere-e2e-{{ $mode }}-conformance-ci-latest-{{ ReplaceAll $.branch "." "-" }}
    • periodic-cluster-api-provider-vsphere-janitor
    • periodic-cluster-api-provider-vsphere-e2e-exp-kk-alpha-features
    • periodic-cluster-api-provider-vsphere-e2e-exp-kk-serial
    • periodic-cluster-api-provider-vsphere-e2e-exp-kk-slow
    • periodic-cluster-api-provider-vsphere-e2e-exp-kk
    • periodic-cluster-api-provider-vsphere-e2e-{{ $mode }}-upgrade
    • pull-cluster-api-provider-vsphere-e2e-{{ $mode }}-blocking-{{ ReplaceAll $.branch "." "-" }}
    • pull-cluster-api-provider-vsphere-e2e-{{ $mode }}-{{ ReplaceAll $.branch "." "-" }}
    • pull-cluster-api-provider-vsphere-e2e-{{ $mode }}-upgrade
    • pull-cluster-api-provider-vsphere-e2e-{{ $mode }}-conformance-{{ ReplaceAll $.branch "." "-" }}
    • pull-cluster-api-provider-vsphere-e2e-{{ $mode }}-conformance-ci-latest-{{ ReplaceAll $.branch "." "-" }}
    • pull-cluster-api-provider-vsphere-janitor-main
  • cloud-provider-vsphere:
    • pull-cloud-provider-vsphere-e2e-test
    • pull-cloud-provider-vsphere-e2e-test-on-latest-k8s-version
    • pull-cloud-provider-vsphere-e2e-test-1-26-minus
  • image-builder:
    • pull-ova-all

The following jobs can be migrated today: (I talked to the maintainers of vsphere-csi-driver about it)

  • vsphere-csi-driver:
    • post-vsphere-csi-driver-deploy
    • post-vsphere-csi-driver-release

Metadata

Labels: lifecycle/frozen, priority/backlog, sig/k8s-infra


Project status: Backlog
