vBilling

Usage-based billing for AI Clouds running vCluster or vMetal
Auto-discovers Tenant Clusters, meters node capacity and GPU SKUs, and streams usage events to your billing adapter.


Built for AI Clouds and platform teams running Kubernetes with vCluster or vMetal. vBilling is the pipe, not the billing engine. You keep your billing backend (Lago today; Metronome, Stripe Meters, OpenMeter, or a custom adapter coming next).

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                      Control Plane Cluster                          │
│                                                                     │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐            │
│  │ Tenant        │  │ Tenant        │  │ Tenant        │            │
│  │ Cluster       │  │ Cluster       │  │ Cluster       │            │
│  │ team-alpha    │  │ team-beta     │  │ team-gpu      │            │
│  │               │  │               │  │               │            │
│  │ private ·     │  │ private ·     │  │ private ·     │            │
│  │ 4× A100       │  │ 2× L40S       │  │ 8× H100       │            │
│  └───────┬───────┘  └───────┬───────┘  └───────┬───────┘            │
│          │                  │                  │                    │
│          └──────────────────┼──────────────────┘                    │
│                             │                                       │
│                 ┌───────────▼────────────┐                          │
│                 │      vBilling          │                          │
│                 │      Controller        │                          │
│                 │                        │                          │
│                 │  • Auto-discovers      │                          │
│                 │  • Meters node capacity│                          │
│                 │  • Streams events      │                          │
│                 └───────────┬────────────┘                          │
└─────────────────────────────┼───────────────────────────────────────┘
                              │ Usage events (HTTP)
                 ┌────────────▼────────────┐
                 │     Billing Adapter     │
                 │                         │
                 │  Lago · Stripe ·        │
                 │  Metronome · Custom     │
                 │                         │
                 │  Plans · Subscriptions  │  ← Provider configures
                 │  Invoices · Wallets     │
                 └─────────────────────────┘

vBilling handles metrics collection and event delivery. Your billing adapter handles pricing, plans, and invoicing. Providers configure pricing in the adapter. vBilling never decides what to charge.

Deployment Model

Each tenant gets dedicated bare-metal nodes (GPUs, high-memory, etc.). Billing is based on full node allocation. The entire node is theirs.

team-gpu's dedicated nodes
├── node-1: 8× H100, 96 CPU, 1TB RAM → metered at full node capacity
├── node-2: 8× H100, 96 CPU, 1TB RAM → metered at full node capacity
└── node-3: 8× H100, 96 CPU, 1TB RAM → metered at full node capacity

Pod-level metering for shared-node platforms exists in the code but is not the documented path; the primary focus is dedicated-node tenants running on AI Clouds.

What Gets Metered

| Metric | Source | Granularity |
|---|---|---|
| Node hours | Node watch | Per dedicated node |
| CPU core-hours | Node capacity | Full node capacity |
| Memory GB-hours | Node capacity | Full node capacity |
| GPU hours (by SKU) | Node labels | Per GPU SKU |
| GPU utilization | DCGM via Prometheus | Per GPU, percent |
| Storage GB-hours | PVC sizes | Per PVC |
| Network egress GB | CNI / Prometheus | Per tenant |
| LoadBalancer hours | Service count | Per LB service |
| Control plane hours | Tenant Cluster watch | One per cluster |

GPU SKU detection reads from node labels:

  • nvidia.com/gpu.product (NVIDIA GPU Operator)
  • cloud.google.com/gke-accelerator (GKE)
  • k8s.amazonaws.com/accelerator (EKS)

An H100 hour and a T4 hour are emitted as separate events so providers can price each SKU differently in their billing adapter.
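The label precedence can be sketched in Go along these lines (the label keys are the three listed above; the function name and ordering are illustrative assumptions, not the repo's actual code):

```go
package main

import "fmt"

// gpuSKULabels are checked in order; the first label present wins.
// Keys come from the NVIDIA GPU Operator, GKE, and EKS respectively.
var gpuSKULabels = []string{
	"nvidia.com/gpu.product",
	"cloud.google.com/gke-accelerator",
	"k8s.amazonaws.com/accelerator",
}

// gpuSKUFromLabels returns the GPU SKU advertised by a node's labels,
// or "" if none of the known accelerator labels are set.
func gpuSKUFromLabels(labels map[string]string) string {
	for _, key := range gpuSKULabels {
		if sku, ok := labels[key]; ok && sku != "" {
			return sku
		}
	}
	return ""
}

func main() {
	node := map[string]string{"nvidia.com/gpu.product": "NVIDIA-H100-80GB-HBM3"}
	fmt.Println(gpuSKUFromLabels(node))
}
```

A node with no accelerator label yields an empty SKU, so non-GPU nodes simply emit no GPU-hour events.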

Quick Start

vBilling ships with a Lago adapter today. Metronome, Stripe Meters, OpenMeter, and custom adapters are on the roadmap. The install pattern is the same regardless of adapter. The walkthrough below uses Lago.

Prerequisites

1. Choose & deploy your billing adapter (Lago)

# Clone the vBilling repo (includes Lago docker-compose)
git clone https://github.com/vClusterLabs-Experiments/vbilling.git
cd vbilling/deploy/lago

# Generate RSA key for Lago (required for JWT signing)
openssl genrsa 2048 > lago_rsa.key
openssl rsa -in lago_rsa.key -out lago_rsa.key -traditional 2>/dev/null

# Create .env with Base64-encoded RSA key (Lago expects LAGO_RSA_PRIVATE_KEY)
echo "LAGO_RSA_PRIVATE_KEY=$(base64 -i lago_rsa.key | tr -d '\n')" > .env

# Start Lago
docker compose --env-file .env up -d

# Wait for API to be ready (~30s for database migrations)
# UI: http://localhost:8080 | API: http://localhost:3000

Create an organization in Lago (first-time only):

curl -s -X POST http://localhost:3000/graphql \
  -H "Content-Type: application/json" \
  -d '{"query":"mutation { registerUser(input: { email: \"admin@example.com\", password: \"yourpassword\", organizationName: \"My Org\" }) { token } }"}'

# Get your API key
docker exec lago-db-1 psql -U lago -d lago -t -c "SELECT value FROM api_keys LIMIT 1;"

2. Install vBilling

Option A: Helm (production)

# Build and push the image first
docker buildx build --platform linux/amd64,linux/arm64 \
  -t <your-registry>/vbilling:v0.1.0 --push .

# Install via Helm, point vBilling at your adapter
helm upgrade --install vbilling deploy/helm/vbilling \
  --namespace vbilling-system --create-namespace \
  --set image.repository=<your-registry>/vbilling \
  --set image.tag=v0.1.0 \
  --set adapter=lago \
  --set lago.apiURL=http://lago-api.lago-system:3000 \
  --set lago.apiKey=YOUR_LAGO_API_KEY

Option B: Run locally (development/testing)

make build
LAGO_API_KEY=<key> LAGO_API_URL=http://localhost:3000 ./bin/vbilling

3. Configure pricing in your adapter

vBilling creates billable metrics and a skeleton plan with $0 pricing. You set your own prices in the adapter's UI or API:

  1. Open the Lago UI → Plans → vCluster Standard
  2. Edit each charge with your pricing:
    • CPU Core-Hours: $0.065 (your cost + margin)
    • Memory GB-Hours: $0.009
    • GPU Hours (H100): $4.50, or use Lago's graduated pricing for volume discounts
    • Storage GB-Hours: $0.0002
    • Network Egress GB: $0.09
    • Node Hours: $25.00 (for dedicated node billing)
  3. Save. Pricing takes effect immediately for all tenants

You can also create multiple plans (e.g., "GPU Premium", "Dev Tier") and assign different plans to different customers via the adapter's API.

4. Done

vBilling will:

  • Auto-discover all Tenant Clusters (via StatefulSet labels or Platform API)
  • Create a billing customer in your adapter for each Tenant Cluster
  • Stream usage events every 60 seconds

Your adapter then generates invoices at the end of each billing period.

How It Works

Discovery

vBilling finds Tenant Clusters using two methods:

  1. Label scanning (works with OSS vCluster): Watches StatefulSets and Deployments with app=vcluster label
  2. Platform API (works with vCluster Platform): Lists VirtualClusterInstance resources via the management API
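The label-scanning method reduces to a label match over workload metadata. A minimal, dependency-free Go sketch (the `workload` type is a stand-in; the real controller lists StatefulSets/Deployments via client-go):

```go
package main

import "fmt"

// workload is a minimal stand-in for a StatefulSet or Deployment's
// metadata as returned by the Kubernetes API.
type workload struct {
	Name      string
	Namespace string
	Labels    map[string]string
}

// discoverTenantClusters applies the label-scanning method: any workload
// carrying app=vcluster is treated as a Tenant Cluster.
func discoverTenantClusters(workloads []workload) []string {
	var found []string
	for _, w := range workloads {
		if w.Labels["app"] == "vcluster" {
			found = append(found, w.Namespace+"/"+w.Name)
		}
	}
	return found
}

func main() {
	ws := []workload{
		{Name: "team-alpha", Namespace: "vcluster-team-alpha", Labels: map[string]string{"app": "vcluster"}},
		{Name: "web", Namespace: "default", Labels: map[string]string{"app": "nginx"}},
	}
	fmt.Println(discoverTenantClusters(ws)) // only team-alpha matches
}
```

With client-go this match would typically be pushed server-side via a `LabelSelector: "app=vcluster"` in the list options rather than filtered locally.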

Metrics Collection Loop

Every collection interval (default 60s):

For each discovered Tenant Cluster:
  1. Read dedicated node capacity (labels: vcluster.loft.sh/managed-by=<name>)
     → Read full node capacity: CPU, memory, GPUs, storage

  2. Collect storage from PVCs in the namespace

  3. Collect GPU allocation from pod nvidia.com/gpu requests
     → Detect GPU SKU from the node's nvidia.com/gpu.product label

  4. Count LoadBalancer services

  5. Check spot vs on-demand node status for cost attribution

  6. (If Prometheus configured) Query DCGM for GPU utilization
  7. (If Prometheus configured) Query network egress bytes

  8. Convert all metrics to billing units:
     CPU: cores × interval_hours = core-hours
     Memory: GB × interval_hours = GB-hours
     GPU: count × interval_hours = GPU-hours (tagged with GPU SKU)

  9. Stream all events to the configured billing adapter in batch
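Step 8's conversion is plain arithmetic; a minimal Go sketch (function name hypothetical, values from the 8× H100 node example above):

```go
package main

import "fmt"

// billingUnits converts raw node capacity sampled over one collection
// interval into the billing units described above. intervalHours is the
// interval length expressed in hours (a 60s interval → 1.0/60).
func billingUnits(cpuCores, memGB, gpus, intervalHours float64) (coreHours, gbHours, gpuHours float64) {
	coreHours = cpuCores * intervalHours
	gbHours = memGB * intervalHours
	gpuHours = gpus * intervalHours
	return
}

func main() {
	// One 8× H100 node (96 cores, 1024 GB RAM) over a 60s interval.
	ch, gh, gpuh := billingUnits(96, 1024, 8, 1.0/60)
	fmt.Printf("core-hours=%.2f GB-hours=%.2f gpu-hours=%.3f\n", ch, gh, gpuh)
	// prints: core-hours=1.60 GB-hours=17.07 gpu-hours=0.133
}
```

Summed over a full hour, the 60 samples add up to the expected 96 core-hours, 1024 GB-hours, and 8 GPU-hours.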

Billing Flow

Tenant Cluster created  →  Customer auto-created in adapter
                        →  Subscription started (plan: vcluster-standard)
                        →  Usage events every 60s
                        →  Adapter aggregates over billing period
                        →  Invoice generated (monthly)
                        →  Webhook to payment provider (optional)

Tenant Cluster deleted  →  Subscription terminated
                        →  Final prorated invoice
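The flow above implies a small adapter surface. A sketch of what that interface might look like under the planned Source/Destination refactor (all names hypothetical; today the Lago client is wired directly):

```go
package main

import "fmt"

// UsageEvent is one metered sample for one tenant (field names hypothetical).
type UsageEvent struct {
	TenantID string
	Metric   string  // e.g. "gpu_hours"
	SKU      string  // e.g. "NVIDIA-H100-80GB-HBM3"; empty for non-GPU metrics
	Value    float64
}

// Adapter mirrors the billing flow: customer lifecycle, subscription
// lifecycle, and batched event delivery.
type Adapter interface {
	EnsureCustomer(tenantID string) error
	StartSubscription(tenantID, planCode string) error
	SendEvents(events []UsageEvent) error
	TerminateSubscription(tenantID string) error
}

// memoryAdapter buffers events in memory; useful for tests.
type memoryAdapter struct{ events []UsageEvent }

func (m *memoryAdapter) EnsureCustomer(string) error            { return nil }
func (m *memoryAdapter) StartSubscription(string, string) error { return nil }
func (m *memoryAdapter) TerminateSubscription(string) error     { return nil }
func (m *memoryAdapter) SendEvents(evs []UsageEvent) error {
	m.events = append(m.events, evs...)
	return nil
}

func main() {
	var a Adapter = &memoryAdapter{}
	_ = a.EnsureCustomer("team-gpu")
	_ = a.SendEvents([]UsageEvent{{TenantID: "team-gpu", Metric: "gpu_hours", SKU: "H100", Value: 0.133}})
	fmt.Println(len(a.(*memoryAdapter).events)) // prints: 1
}
```

Because pricing lives entirely in the adapter, nothing in this surface carries a currency amount: vBilling only ships quantities.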

Configuration

Environment Variables

| Variable | Default | Description |
|---|---|---|
| ADAPTER | lago | Billing adapter to use (lago today; more adapters coming) |
| LAGO_API_URL | http://localhost:3000 | Lago API endpoint (when ADAPTER=lago) |
| LAGO_API_KEY | (required for Lago) | Lago API key |
| COLLECTION_INTERVAL | 60s | How often to scrape metrics |
| RECONCILE_INTERVAL | 30s | How often to discover Tenant Clusters |
| DEFAULT_PLAN_CODE | vcluster-standard | Default plan code in the adapter |
| BILLING_CURRENCY | USD | Currency for billing |
| PROMETHEUS_URL | (empty) | Prometheus URL for DCGM/network metrics |
| SPOT_DISCOUNT_PERCENT | 60 | Discount for pods on spot nodes |

Note: Pricing is NOT configured via environment variables. Configure pricing in your billing adapter's UI or API.

Helm Values

adapter: lago  # billing adapter (lago today)

lago:
  apiURL: "http://lago-api:3000"
  apiKey: ""
  existingSecret: "lago-credentials"  # or use existing K8s secret

billing:
  collectionInterval: "60s"
  reconcileInterval: "30s"

prometheus:
  url: "http://prometheus.monitoring:9090"  # optional

Use Case: AI Cloud

Each customer gets a Tenant Cluster with dedicated bare-metal GPU nodes. vBilling meters the full node allocation by GPU SKU and streams events into whichever billing adapter you run.

Customer signs up
  → Platform provisions Tenant Cluster + dedicated nodes (8× H100)
  → vBilling discovers Tenant Cluster, detects the dedicated nodes
  → Streams events: 8 GPU-hours (H100), 96 CPU core-hours, and ~1,024 memory GB-hours per clock hour
  → Adapter (Lago) invoices monthly at the provider's rates
  → Customer pays via the adapter's payments integration (e.g., Stripe webhooks)

Dashboard

A lightweight billing dashboard is included at dashboard/index.html. It queries the Lago API directly and shows per-tenant usage breakdowns. (Adapter-specific dashboards for Metronome and Stripe will ship alongside those adapters.)

# Serve the dashboard
cd dashboard
python3 -m http.server 9090
# Open http://localhost:9090

Features:

  • Per-tenant usage cards with metric breakdown
  • Total spend across all tenants
  • Auto-refresh every 30 seconds
  • No framework dependencies

Project Structure

cmd/vbilling/main.go              Entry point
internal/
  config/config.go                Configuration from env vars
  lago/
    client.go                     Lago HTTP API client (current adapter)
    bootstrap.go                  Auto-creates metrics + skeleton plan
  discovery/discovery.go          Tenant Cluster discovery (labels + Platform API)
  metrics/collector.go            All metrics: CPU, memory, GPU, storage,
                                  network, DCGM, dedicated nodes, spot/on-demand
  controller/controller.go        Main reconciliation + event-streaming loop
deploy/
  helm/vbilling/                  Helm chart with RBAC
  lago/                           Docker Compose for Lago
dashboard/index.html              Billing dashboard
scripts/demo.sh                   End-to-end demo using vind
Dockerfile                        Multi-stage distroless build
Makefile                          Build targets

Multi-adapter refactor (Source/Destination plugin pattern) is planned. Today Lago is wired directly; future adapters will live under internal/destinations/<name>/.

Building

make build          # Build binary (local OS/arch)
make docker-build   # Build Docker image (local arch)
make test           # Run tests
make helm-install   # Install via Helm
make tidy           # go mod tidy

Multi-Arch Docker Image

For production K8s clusters (linux/amd64) and Apple Silicon (linux/arm64):

# Build and push multi-arch image
docker buildx create --use --name vbilling-builder 2>/dev/null || true
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t <your-registry>/vbilling:v0.1.0 \
  --push .

The Dockerfile uses a multi-stage build with gcr.io/distroless/static:nonroot as the final image (~10MB).

Deploying Lago

Docker Compose (Development/Demo)

cd deploy/lago
docker compose --env-file .env up -d

Kubernetes (Production)

Deploy Lago as Kubernetes workloads. Key components: PostgreSQL, Redis, API (Rails), Sidekiq worker, Clock, Frontend. See Lago docs for production guidance.

Roadmap

  • Source / Destination plugin refactor (adapter pattern)
  • Metronome adapter
  • Stripe Meters adapter
  • OpenMeter adapter (native CloudEvents)
  • Custom adapter developer guide
  • MIG (Multi-Instance GPU) partition tracking
  • Grafana dashboard integration
  • Budget alerts per Tenant Cluster
  • Reserved capacity / commitment pricing
  • Auto Nodes billing (dynamic node provisioning events)
  • Netris network isolation billing integration

License

Apache 2.0
