Usage-based billing for AI Clouds running vCluster or vMetal
Auto-discovers Tenant Clusters, meters node capacity and GPU SKUs, and streams usage events to your billing adapter.
Built for AI Clouds and platform teams running Kubernetes with vCluster or vMetal. vBilling is the pipe, not the billing engine. You keep your billing backend (Lago today; Metronome, Stripe Meters, OpenMeter, or a custom adapter coming next).
┌─────────────────────────────────────────────────────────────────────┐
│ Control Plane Cluster │
│ │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ Tenant │ │ Tenant │ │ Tenant │ │
│ │ Cluster │ │ Cluster │ │ Cluster │ │
│ │ team-alpha │ │ team-beta │ │ team-gpu │ │
│ │ │ │ │ │ │ │
│ │ private · │ │ private · │ │ private · │ │
│ │ 4× A100 │ │ 2× L40S │ │ 8× H100 │ │
│ └───────┬───────┘ └───────┬───────┘ └───────┬───────┘ │
│ │ │ │ │
│ └──────────────────┼──────────────────┘ │
│ │ │
│ ┌───────────▼────────────┐ │
│ │ vBilling │ │
│ │ Controller │ │
│ │ │ │
│ │ • Auto-discovers │ │
│ │ • Meters node capacity│ │
│ │ • Streams events │ │
│ └───────────┬────────────┘ │
└─────────────────────────────┼───────────────────────────────────────┘
│ Usage events (HTTP)
┌────────────▼────────────┐
│ Billing Adapter │
│ │
│ Lago · Stripe · │
│ Metronome · Custom │
│ │
│ Plans · Subscriptions │ ← Provider configures
│ Invoices · Wallets │
└─────────────────────────┘
vBilling handles metrics collection and event delivery. Your billing adapter handles pricing, plans, and invoicing. Providers configure pricing in the adapter. vBilling never decides what to charge.
Each tenant gets dedicated bare-metal nodes (GPUs, high-memory, etc.). Billing is based on full node allocation. The entire node is theirs.
team-gpu's dedicated nodes
├── node-1: 8× H100, 96 CPU, 1TB RAM → metered at full node capacity
├── node-2: 8× H100, 96 CPU, 1TB RAM → metered at full node capacity
└── node-3: 8× H100, 96 CPU, 1TB RAM → metered at full node capacity
Pod-level metering for shared-node platforms exists in the code but is not the documented path. The primary focus is dedicated-node tenants running on AI Clouds.
| Metric | Source | Granularity |
|---|---|---|
| Node hours | Node watch | Per dedicated node |
| CPU core-hours | Node capacity | Full node capacity |
| Memory GB-hours | Node capacity | Full node capacity |
| GPU hours (by SKU) | Node labels | Per GPU SKU |
| GPU utilization | DCGM via Prometheus | Per GPU % |
| Storage GB-hours | PVC sizes | Per PVC |
| Network egress GB | CNI / Prometheus | Per tenant |
| LoadBalancer hours | Service count | Per LB service |
| Control plane hours | Tenant Cluster watch | 1 per cluster |
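Each metric above is streamed as a discrete usage event. As a rough illustration of what one event might carry, here is a sketch in Go — the struct and field names are hypothetical, not vBilling's actual schema, which is ultimately defined by the adapter's event API:

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// UsageEvent is a hypothetical shape for one metered sample.
// Field names are illustrative; the real schema is set by the
// billing adapter (e.g. Lago's event ingestion API).
type UsageEvent struct {
	TenantCluster string            `json:"tenant_cluster"` // e.g. "team-gpu"
	MetricCode    string            `json:"metric_code"`    // e.g. "gpu_hours"
	Value         float64           `json:"value"`          // amount in billing units
	Properties    map[string]string `json:"properties"`     // e.g. {"gpu_sku": "NVIDIA-H100"}
	Timestamp     time.Time         `json:"timestamp"`
}

func main() {
	ev := UsageEvent{
		TenantCluster: "team-gpu",
		MetricCode:    "gpu_hours",
		Value:         8, // 8 GPUs × 1 hour
		Properties:    map[string]string{"gpu_sku": "NVIDIA-H100"},
		Timestamp:     time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC),
	}
	b, _ := json.Marshal(ev)
	fmt.Println(string(b))
}
```

Tagging the SKU in event properties (rather than baking it into the metric code) is what lets the adapter price each GPU type separately, as described below.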
GPU SKU detection reads from node labels:
- `nvidia.com/gpu.product` (NVIDIA GPU Operator)
- `cloud.google.com/gke-accelerator` (GKE)
- `k8s.amazonaws.com/accelerator` (EKS)
An H100 hour and a T4 hour are emitted as separate events so providers can price each SKU differently in their billing adapter.
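The detection can be sketched as a priority-ordered label lookup. This is my own illustration of the logic, not the project's code (which lives in internal/metrics/collector.go and may differ):

```go
package main

import "fmt"

// gpuSKULabels lists the node labels checked for a GPU product name,
// in priority order: NVIDIA GPU Operator, then GKE, then EKS.
var gpuSKULabels = []string{
	"nvidia.com/gpu.product",
	"cloud.google.com/gke-accelerator",
	"k8s.amazonaws.com/accelerator",
}

// detectGPUSKU returns the first non-empty SKU label value found on
// the node, or "unknown" if none of the known labels are present.
func detectGPUSKU(nodeLabels map[string]string) string {
	for _, key := range gpuSKULabels {
		if sku, ok := nodeLabels[key]; ok && sku != "" {
			return sku
		}
	}
	return "unknown"
}

func main() {
	labels := map[string]string{"nvidia.com/gpu.product": "NVIDIA-H100-80GB-HBM3"}
	fmt.Println(detectGPUSKU(labels)) // the GPU Operator label wins when present
}
```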
vBilling ships with a Lago adapter today. Metronome, Stripe Meters, OpenMeter, and custom adapters are on the roadmap. The install pattern is the same regardless of adapter. The walkthrough below uses Lago.
- Kubernetes cluster with Tenant Clusters running (vCluster)
- metrics-server installed
- A billing adapter: Lago instance (see Deploying Lago)
- Optional: Prometheus with DCGM Exporter for GPU utilization
# Clone the vBilling repo (includes Lago docker-compose)
git clone https://github.com/vClusterLabs-Experiments/vbilling.git
cd vbilling/deploy/lago
# Generate RSA key for Lago (required for JWT signing)
openssl genrsa 2048 > lago_rsa.key
openssl rsa -in lago_rsa.key -out lago_rsa.key -traditional 2>/dev/null
# Create .env with Base64-encoded RSA key (Lago expects LAGO_RSA_PRIVATE_KEY)
echo "LAGO_RSA_PRIVATE_KEY=$(base64 -i lago_rsa.key | tr -d '\n')" > .env
# Start Lago
docker compose --env-file .env up -d
# Wait for API to be ready (~30s for database migrations)
# UI: http://localhost:8080 | API: http://localhost:3000

Create an organization in Lago (first-time only):
curl -s -X POST http://localhost:3000/graphql \
-H "Content-Type: application/json" \
-d '{"query":"mutation { registerUser(input: { email: \"admin@example.com\", password: \"yourpassword\", organizationName: \"My Org\" }) { token } }"}'
# Get your API key
docker exec lago-db-1 psql -U lago -d lago -t -c "SELECT value FROM api_keys LIMIT 1;"

Option A: Helm (production)
# Build and push the image first
docker buildx build --platform linux/amd64,linux/arm64 \
-t <your-registry>/vbilling:v0.1.0 --push .
# Install via Helm, point vBilling at your adapter
helm upgrade --install vbilling deploy/helm/vbilling \
--namespace vbilling-system --create-namespace \
--set image.repository=<your-registry>/vbilling \
--set image.tag=v0.1.0 \
--set adapter=lago \
--set lago.apiURL=http://lago-api.lago-system:3000 \
--set lago.apiKey=YOUR_LAGO_API_KEY

Option B: Run locally (development/testing)
make build
LAGO_API_KEY=<key> LAGO_API_URL=http://localhost:3000 ./bin/vbilling

vBilling creates billable metrics and a skeleton plan with $0 pricing. You set your own prices in the adapter's UI or API:
1. Open Lago UI → Plans → vCluster Standard
2. Edit each charge with your pricing:
   - CPU Core-Hours: `$0.065` (your cost + margin)
   - Memory GB-Hours: `$0.009`
   - GPU Hours (H100): `$4.50`, or use Lago's graduated pricing for volume discounts
   - Storage GB-Hours: `$0.0002`
   - Network Egress GB: `$0.09`
   - Node Hours: `$25.00` (for dedicated node billing)
3. Save. Pricing takes effect immediately for all tenants.
You can also create multiple plans (e.g., "GPU Premium", "Dev Tier") and assign different plans to different customers via the adapter's API.
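Assigning a customer to a plan goes through the adapter's subscription API. As a hedged sketch of how that call could be constructed against Lago — the payload shape follows Lago's public `POST /api/v1/subscriptions` endpoint, but verify it against your Lago version; the helper name and example IDs are my own:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// assignPlanRequest builds an HTTP request that subscribes a customer
// to a plan in Lago. Illustrative only: the body shape is based on
// Lago's public API and should be checked against your version.
func assignPlanRequest(apiURL, apiKey, customerID, planCode string) (*http.Request, error) {
	body := map[string]any{
		"subscription": map[string]string{
			"external_customer_id": customerID, // e.g. the Tenant Cluster name
			"plan_code":            planCode,   // e.g. "gpu-premium"
		},
	}
	b, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, apiURL+"/api/v1/subscriptions", bytes.NewReader(b))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, _ := assignPlanRequest("http://localhost:3000", "secret", "team-gpu", "gpu-premium")
	fmt.Println(req.Method, req.URL.String())
}
```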
vBilling will:
- Auto-discover all Tenant Clusters (via StatefulSet labels or Platform API)
- Create a billing customer in your adapter for each Tenant Cluster
- Stream usage events every 60 seconds
- Hand off to your adapter, which generates invoices at the end of each billing period
vBilling finds Tenant Clusters using two methods:
- Label scanning (works with OSS vCluster): Watches StatefulSets and Deployments with the `app=vcluster` label
- Platform API (works with vCluster Platform): Lists `VirtualClusterInstance` resources via the management API
Every collection interval (default 60s):
For each discovered Tenant Cluster:
1. Read dedicated node capacity (labels: vcluster.loft.sh/managed-by=<name>)
→ Read full node capacity: CPU, memory, GPUs, storage
2. Collect storage from PVCs in the namespace
3. Collect GPU allocation from pod nvidia.com/gpu requests
→ Detect GPU SKU from the node's nvidia.com/gpu.product label
4. Count LoadBalancer services
5. Check spot vs on-demand node status for cost attribution
6. (If Prometheus configured) Query DCGM for GPU utilization
7. (If Prometheus configured) Query network egress bytes
8. Convert all metrics to billing units:
CPU: cores × interval_hours = core-hours
Memory: GB × interval_hours = GB-hours
GPU: count × interval_hours = GPU-hours (tagged with GPU SKU)
9. Stream all events to the configured billing adapter in batch
Tenant Cluster created → Customer auto-created in adapter
→ Subscription started (plan: vcluster-standard)
→ Usage events every 60s
→ Adapter aggregates over billing period
→ Invoice generated (monthly)
→ Webhook to payment provider (optional)
Tenant Cluster deleted → Subscription terminated
→ Final prorated invoice
| Variable | Default | Description |
|---|---|---|
| `ADAPTER` | `lago` | Billing adapter to use (lago today; more adapters coming) |
| `LAGO_API_URL` | `http://localhost:3000` | Lago API endpoint (when `ADAPTER=lago`) |
| `LAGO_API_KEY` | (required for Lago) | Lago API key |
| `COLLECTION_INTERVAL` | `60s` | How often to scrape metrics |
| `RECONCILE_INTERVAL` | `30s` | How often to discover Tenant Clusters |
| `DEFAULT_PLAN_CODE` | `vcluster-standard` | Default plan code in the adapter |
| `BILLING_CURRENCY` | `USD` | Currency for billing |
| `PROMETHEUS_URL` | (empty) | Prometheus URL for DCGM/network |
| `SPOT_DISCOUNT_PERCENT` | `60` | Discount for pods on spot nodes |
Note: Pricing is NOT configured via environment variables. Configure pricing in your billing adapter's UI or API.
adapter: lago # billing adapter (lago today)
lago:
apiURL: "http://lago-api:3000"
apiKey: ""
existingSecret: "lago-credentials" # or use existing K8s secret
billing:
collectionInterval: "60s"
reconcileInterval: "30s"
prometheus:
  url: "http://prometheus.monitoring:9090"  # optional

Each customer gets a Tenant Cluster with dedicated bare-metal GPU nodes. vBilling meters the full node allocation by GPU SKU and streams events into whichever billing adapter you run.
Customer signs up
→ Platform provisions Tenant Cluster + dedicated nodes (8× H100)
→ vBilling discovers Tenant Cluster, detects the dedicated nodes
→ Streams events: 8 GPU-hours (H100) + 96 CPU-hours + 1 TB memory-hours per hour
→ Adapter (Lago) invoices monthly at the provider's rates
→ Customer pays via the adapter's payments integration (e.g., Stripe webhooks)
A lightweight billing dashboard is included at dashboard/index.html. It queries the Lago API directly and shows per-tenant usage breakdowns. (Adapter-specific dashboards for Metronome and Stripe will ship alongside those adapters.)
# Serve the dashboard
cd dashboard
python3 -m http.server 9090
# Open http://localhost:9090

Features:
- Per-tenant usage cards with metric breakdown
- Total spend across all tenants
- Auto-refresh every 30 seconds
- No framework dependencies
cmd/vbilling/main.go Entry point
internal/
config/config.go Configuration from env vars
lago/
client.go Lago HTTP API client (current adapter)
bootstrap.go Auto-creates metrics + skeleton plan
discovery/discovery.go Tenant Cluster discovery (labels + Platform API)
metrics/collector.go All metrics: CPU, memory, GPU, storage,
network, DCGM, dedicated nodes, spot/on-demand
controller/controller.go Main reconciliation + event-streaming loop
deploy/
helm/vbilling/ Helm chart with RBAC
lago/ Docker Compose for Lago
dashboard/index.html Billing dashboard
scripts/demo.sh End-to-end demo using vind
Dockerfile Multi-stage distroless build
Makefile Build targets
Multi-adapter refactor (Source/Destination plugin pattern) is planned. Today Lago is wired directly; future adapters will live under `internal/destinations/<name>/`.
make build # Build binary (local OS/arch)
make docker-build # Build Docker image (local arch)
make test # Run tests
make helm-install # Install via Helm
make tidy # go mod tidy

For production K8s clusters (linux/amd64) and Apple Silicon (linux/arm64):
# Build and push multi-arch image
docker buildx create --use --name vbilling-builder 2>/dev/null || true
docker buildx build \
--platform linux/amd64,linux/arm64 \
-t <your-registry>/vbilling:v0.1.0 \
--push .

The Dockerfile uses a multi-stage build with `gcr.io/distroless/static:nonroot` as the final image (~10MB).
cd deploy/lago
docker compose --env-file .env up -d

Deploy Lago as Kubernetes workloads. Key components: PostgreSQL, Redis, API (Rails), Sidekiq worker, Clock, Frontend. See Lago docs for production guidance.
- Source / Destination plugin refactor (adapter pattern)
- Metronome adapter
- Stripe Meters adapter
- OpenMeter adapter (native CloudEvents)
- Custom adapter developer guide
- MIG (Multi-Instance GPU) partition tracking
- Grafana dashboard integration
- Budget alerts per Tenant Cluster
- Reserved capacity / commitment pricing
- Auto Nodes billing (dynamic node provisioning events)
- Netris network isolation billing integration
Apache 2.0