33 commits
8d83582
Add cloud payload search benchmark case
jamesgao-jpg May 8, 2026
1637484
Add cloud payload scalar label support
jamesgao-jpg May 11, 2026
829d7f4
Add cloud insert readiness case
jamesgao-jpg May 11, 2026
c943513
Merge remote-tracking branch 'upstream/main' into cloud-payload-searc…
jamesgao-jpg May 11, 2026
519f01c
Fix cloud insert CI lint failures
jamesgao-jpg May 11, 2026
4df4b8f
Add multitenant VDBBench design spec
jamesgao-jpg May 12, 2026
20befac
Add multitenant implementation plan
jamesgao-jpg May 13, 2026
76ad386
Add cloud insert concurrency and Pinecone readiness support
jamesgao-jpg May 13, 2026
5fb9930
Add cloud cold latency case design
jamesgao-jpg May 13, 2026
e611187
Add cloud cold latency implementation plan
jamesgao-jpg May 13, 2026
2a99d70
Add cloud cold latency case model
jamesgao-jpg May 13, 2026
f61b447
Add cold warm search runner
jamesgao-jpg May 13, 2026
46c5c94
Fix cold warm runner lint issues
jamesgao-jpg May 13, 2026
a9f7668
Wire cloud cold latency runner into tasks
jamesgao-jpg May 13, 2026
33361d2
Fix cloud cold latency task integration lint
jamesgao-jpg May 13, 2026
52b7158
Fix cloud cold latency CLI defaults
jamesgao-jpg May 13, 2026
fc66bc1
Add cloud multitenant search case
jamesgao-jpg May 13, 2026
dd2485d
Emit first-class cloud case result metrics
jamesgao-jpg May 13, 2026
2f92bbf
Pretty-print cloud result JSON output
jamesgao-jpg May 13, 2026
77dcb72
fix: validate multitenant search schema
jamesgao-jpg May 13, 2026
6cede04
fix: support turbopuffer multitenant payload runs
jamesgao-jpg May 13, 2026
df15217
fix: configure turbopuffer scalar payload field
jamesgao-jpg May 13, 2026
66475be
Add Turbopuffer namespace pinning CLI
jamesgao-jpg May 14, 2026
1994211
Document TurboPuffer unpin command
jamesgao-jpg May 11, 2026
9ecd79b
Record turbopuffer pinning request metadata
jamesgao-jpg May 14, 2026
0bf0fef
Add cloud insert benchmark raw results
jamesgao-jpg May 14, 2026
c4620d1
Remove internal docs and raw cloud results from PR
jamesgao-jpg May 15, 2026
c623e17
Merge remote-tracking branch 'upstream/main' into pr775-conflict-reso…
jamesgao-jpg May 18, 2026
b7ec21e
Format cloud leaderboard changes
jamesgao-jpg May 18, 2026
9648add
Add cloud leaderboard test results
jamesgao-jpg May 19, 2026
379f81e
Revert "Add cloud leaderboard test results"
jamesgao-jpg May 19, 2026
7bfa991
Add CloudLeadboard release note
jamesgao-jpg May 19, 2026
b353ac0
Document CloudLeadboard in README
jamesgao-jpg May 19, 2026
11 changes: 11 additions & 0 deletions README.md
@@ -708,6 +708,17 @@ vectordbbench batchcli --batch-config-file <your-yaml-configuration-file>
### Introduction
To facilitate the presentation of test results and provide a comprehensive performance analysis report, we offer a [leaderboard page](https://zilliz.com/benchmark). It allows us to choose from QPS, QP$, and latency metrics, and provides a comprehensive assessment of a system's performance based on the test results of various cases and a set of scoring mechanisms (to be introduced later). On this leaderboard, we can select the systems and models to be compared, and filter out cases we do not want to consider. Comprehensive scores are always ranked from best to worst, and the specific test results of each query will be presented in the list below.

### CloudLeadboard v2

VectorDBBench now includes CloudLeadboard v2 cases for production-oriented cloud vector database evaluation. These cases complement the original raw-performance leaderboard by measuring behaviors that matter for managed services:

- `CloudInsertCase`: insert throughput plus searchable and indexed readiness delays.
- `CloudPayloadSearchCase`: search performance when responses return IDs only, scalar metadata, or vectors.
- `CloudMultiTenantSearchCase`: tenant-routed search for SaaS-shaped workloads.
- `CloudColdLatencyCase`: cold and warm serial latency for first-query and cache-sensitive serving paths.

The May 2026 release note explains why the cloud leaderboard was added, what changed, which systems were tested this round, and how to run each new case: [docs/release/2026-05-cloud-leadboard.md](docs/release/2026-05-cloud-leadboard.md).

### Scoring Rules

1. For each case, select a base value and score each system based on relative values.
161 changes: 161 additions & 0 deletions docs/release/2026-05-cloud-leadboard.md
@@ -0,0 +1,161 @@
# CloudLeadboard v2 Release Note

May 2026

CloudLeadboard v2 moves VectorDBBench beyond a single raw-throughput ranking. It evaluates managed vector databases around the behaviors production teams have to plan for: ingest readiness, payload-aware search, tenant-shaped workloads, cold latency, and cost at practical QPS targets.

## Why we need a new leaderboard now

The vector database market has moved past the "highest QPS wins" phase. Production teams choosing a managed vector database also care about budget, data freshness, tail latency, recall, metadata payloads, tenant isolation, and operational predictability.

The existing VectorDBBench leaderboard remains useful for comparing baseline search performance across systems. But cloud buyers ask a wider set of questions:

- When does newly inserted data become searchable?
- When is it fully indexed?
- What happens when search returns metadata or vectors instead of only IDs?
- What happens when traffic is split across many tenants?
- What does each reachable QPS tier cost?

CloudLeadboard v2 is designed around those questions. It keeps performance visible, but puts it next to the readiness, payload, tenant, cold-start, and cost signals that determine what a customer can safely deploy.

## What CloudLeadboard v2 changes

CloudLeadboard v2 is a production cloud decision layer, not a replacement for the original raw-performance board. The main change is that benchmark cases now model cloud operating concerns directly instead of treating all products as simple warm search engines.

The new cases add:

- Insert readiness measurement, including client insert completion, searchable delay, and indexed delay.
- Explicit response payload profiles: IDs only, scalar label metadata, or vector values.
- Cloud cold-latency measurement for the first search path after idle or cache-cold conditions.
- Multi-tenant search, where data is split into deterministic tenant labels or namespaces and queries are routed by tenant.
- Cost-oriented interpretation, so raw QPS can be read together with monthly cost and readiness constraints.

This matters because a top-line QPS table can hide important tradeoffs. A system can look strong on peak throughput while losing ground on recall, p99 latency, cold-start behavior, payload cost, or sustained cost at the same target QPS.

## Who we tested this round

This round focuses on three popular cloud vector databases:

- Zilliz Cloud, including tiered and fixed-capacity configurations.
- turbopuffer, including normal, pinned, and backpressure-related configurations where applicable.
- Pinecone serverless.

The tested matrix is intentionally cloud-oriented. It compares managed products and managed-service modes rather than only local or self-hosted engine behavior.

## The new tests we added

Version 2 adds four cloud-oriented cases in VectorDBBench. Each case is designed to expose a production behavior that a plain QPS benchmark can miss.

### CloudInsertCase

**Purpose.** CloudInsertCase measures write readiness, not just client-side insert speed. This is important for backfills, migrations, daily refreshes, and release workflows where a team needs to know when newly written vectors can safely take traffic.

**How it works.** The case loads the dataset with `ConcurrentInsertRunner`, records insert completion time and rows per second, then polls the database until the inserted data is fully searchable and fully indexed. The resulting metrics are reported separately:

- `insert_completion_seconds`
- `insert_rows_per_second`
- `searchable_after_insert_seconds`
- `indexed_after_searchable_seconds`
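
The way these four values relate to each other can be sketched as follows. This is a minimal illustration, not the `ConcurrentInsertRunner` implementation: the `insert_all`, `is_searchable`, and `is_indexed` callables are hypothetical hooks that a caller would supply, and the sketch assumes each delay is measured from the end of the previous phase, as the metric names suggest.

```python
import time
from typing import Callable, Dict


def measure_insert_readiness(
    insert_all: Callable[[], int],
    is_searchable: Callable[[], bool],
    is_indexed: Callable[[], bool],
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, float]:
    """Time the insert phase, then poll until data is searchable and indexed.

    All three callables are hypothetical hooks, not VectorDBBench APIs.
    """
    start = clock()
    rows = insert_all()  # assumed to return the number of rows written
    insert_done = clock()

    # Phase 2: wait until every inserted row can be returned by a search.
    while not is_searchable():
        sleep(poll_interval)
    searchable_at = clock()

    # Phase 3: wait until the service reports the index as fully built.
    while not is_indexed():
        sleep(poll_interval)
    indexed_at = clock()

    insert_seconds = insert_done - start
    return {
        "insert_completion_seconds": insert_seconds,
        "insert_rows_per_second": rows / insert_seconds if insert_seconds > 0 else 0.0,
        "searchable_after_insert_seconds": searchable_at - insert_done,
        "indexed_after_searchable_seconds": indexed_at - searchable_at,
    }
```

The measured delays are only as fine-grained as `poll_interval`, and a real runner would also need a timeout so that a service that never reports readiness does not block the run.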

Example: run LAION 100M insert readiness on Zilliz Cloud with a 10k batch size.

```bash
vectordbbench zillizautoindex \
--case-type CloudInsertCase \
--uri "$ZILLIZ_URI" \
--token "$ZILLIZ_TOKEN" \
--collection-name cloud_insert_laion100m_bs10k \
--cloud-insert-batch-size 10000 \
--load-concurrency 16 \
--skip-search-serial \
--skip-search-concurrent \
--task-label cloud-insert-zilliz-12cu
```

### CloudPayloadSearchCase

**Purpose.** CloudPayloadSearchCase measures search when the response body resembles production traffic. Many applications return more than vector IDs: they return scalar metadata, labels, or the vector values themselves. That response payload can change throughput, latency, and even product ranking.

**How it works.** The case extends the normal performance case with an explicit `payload_profile`. Supported profiles are:

- `ids_only`
- `scalar_label`
- `vector`

The case can also run unfiltered search, integer-filter search through `--cloud-filter-rate`, or scalar-label filter search through `--cloud-label-percentage`. It records QPS, latency, recall where applicable, and estimated response payload bytes per query.
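
To see why the profile matters, here is a rough back-of-the-envelope estimate of response size per query. The `PayloadProfile` enum, the function name, and the byte sizes (8-byte IDs, 16-byte labels, 4-byte floats) are assumptions made for illustration; the case's own estimate may count fields and serialization overhead differently.

```python
from enum import Enum


class PayloadProfile(str, Enum):
    """Hypothetical mirror of the three profile names listed above."""

    IDS_ONLY = "ids_only"
    SCALAR_LABEL = "scalar_label"
    VECTOR = "vector"


def estimate_payload_bytes(
    profile: str,
    k: int,
    dim: int = 768,
    id_bytes: int = 8,      # assumed size of one ID
    label_bytes: int = 16,  # assumed size of one scalar label
    float_bytes: int = 4,   # float32 vector components
) -> int:
    """Estimate raw response bytes for one query that returns k hits."""
    profile = PayloadProfile(profile)  # raises ValueError on unknown names
    per_hit = id_bytes
    if profile is PayloadProfile.SCALAR_LABEL:
        per_hit += label_bytes
    elif profile is PayloadProfile.VECTOR:
        per_hit += dim * float_bytes
    return k * per_hit
```

Under these assumptions, a `vector` response for 768-dimensional embeddings is more than two orders of magnitude larger than an `ids_only` response at the same `k`, which is why the profiles are benchmarked separately.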

Example: run vector-payload search on Pinecone with a highly selective integer filter.

```bash
vectordbbench pinecone \
--case-type CloudPayloadSearchCase \
--api-key "$PINECONE_API_KEY" \
--index-name "$PINECONE_INDEX" \
--payload-profile vector \
--cloud-filter-rate 0.001 \
--k 100 \
--num-concurrency 60,80 \
--concurrency-duration 30 \
--task-label cloud-payload-pinecone-vector-int-filter-0-1p
```

### CloudMultiTenantSearchCase

**Purpose.** CloudMultiTenantSearchCase models SaaS-shaped traffic. Instead of treating the dataset as one flat global collection, it splits records across many tenants and routes each query to a tenant. This highlights products whose namespace, partition-key, or tenant-filter paths behave differently from single-tenant search.

**How it works.** The case defaults to the Cohere 10M dataset and assigns each row to a deterministic tenant by `row_id % tenant_count`. During search, queries are routed to the corresponding tenant label or namespace. The case supports the same payload profiles and optional filter modes as payload search.
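
The tenant assignment rule can be sketched in a few lines. The modulo rule comes from the description above; the function name and the zero-padded label format are assumptions inferred from the `--tenant-prefix` and `--tenant-id-width` options, not a confirmed implementation detail.

```python
def tenant_label(
    row_id: int,
    tenant_count: int,
    prefix: str = "tenant_",
    id_width: int = 4,
) -> str:
    """Map a row to a deterministic tenant label via row_id % tenant_count.

    Zero-padding the tenant ID to id_width digits is an assumption.
    """
    if tenant_count <= 0:
        raise ValueError("tenant_count must be positive")
    tenant_id = row_id % tenant_count
    return f"{prefix}{tenant_id:0{id_width}d}"
```

With 1,000 tenants, rows 7, 1007, and 2007 all land in `tenant_0007`, so a contiguous range of row IDs is spread evenly across tenants.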

Example: run 1,000-tenant IDs-only search on turbopuffer.

```bash
vectordbbench turbopuffer \
--case-type CloudMultiTenantSearchCase \
--dataset-with-size-type "Large Cohere (768dim, 10M)" \
--api-key "$TURBOPUFFER_API_KEY" \
--region aws-us-east-1 \
--namespace vdbbench_mt_seed \
--multitenant-namespace-prefix vdbbench_mt_ \
--tenant-count 1000 \
--tenant-prefix tenant_ \
--tenant-id-width 4 \
--payload-profile ids_only \
--num-concurrency 40,60,80 \
--concurrency-duration 30 \
--task-label cloud-multitenant-turbopuffer-1000t
```

### CloudColdLatencyCase

**Purpose.** CloudColdLatencyCase measures first-query and cold-path latency that warm benchmark loops can hide. This matters for serverless products, storage-tiered products, idle workloads, and customer-facing applications where the first query after an idle period is visible to users.

**How it works.** The case loads and optimizes data when requested, then uses `ColdWarmSearchRunner` to run serial searches in cold and warm passes. It records cold-latency details in `additional_parameters["cold_latency"]` and also records payload profile and estimated payload bytes per query.
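
The cold and warm passes can be pictured with the sketch below. It is not the `ColdWarmSearchRunner` code: the function names, the result keys, and the choice to replay the same queries for the warm pass are assumptions, and the caller is assumed to have put the service into a cache-cold state before the first pass.

```python
import math
import time
from statistics import mean
from typing import Any, Callable, Dict, List, Sequence


def timed_pass(
    search: Callable[[Any], Any],
    queries: Sequence[Any],
    clock: Callable[[], float] = time.perf_counter,
) -> List[float]:
    """Run queries one at a time and return per-query latency in seconds."""
    latencies = []
    for query in queries:
        start = clock()
        search(query)
        latencies.append(clock() - start)
    return latencies


def summarize(latencies: Sequence[float]) -> Dict[str, float]:
    """Summarize one pass; the key names are illustrative, not the real schema."""
    if not latencies:
        raise ValueError("at least one query is required")
    ordered = sorted(latencies)
    p99_index = max(0, math.ceil(0.99 * len(ordered)) - 1)
    return {
        "first_query_seconds": latencies[0],
        "avg_seconds": mean(latencies),
        "p99_seconds": ordered[p99_index],
    }


def cold_warm_latency(
    search: Callable[[Any], Any],
    queries: Sequence[Any],
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, Dict[str, float]]:
    """First pass is treated as cold; the replay of the same queries as warm."""
    cold = timed_pass(search, queries, clock)
    warm = timed_pass(search, queries, clock)
    return {"cold": summarize(cold), "warm": summarize(warm)}
```

Keeping the passes serial matters here: concurrent load would warm caches during the cold pass and blur the first-query signal the case is meant to capture.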

Example: run a pinned turbopuffer cold-latency test with scalar-label payloads.

```bash
vectordbbench turbopuffer \
--case-type CloudColdLatencyCase \
--api-key "$TURBOPUFFER_API_KEY" \
--region aws-us-east-1 \
--namespace cloud_cold_latency_scalar_label \
--pin-namespace \
--pin-replicas 2 \
--payload-profile scalar_label \
--cloud-cold-query-count 1000 \
--skip-search-concurrent \
--task-label cloud-cold-latency-turbopuffer-pinned-scalar-label
```

## Caveats

This release note introduces the new CloudLeadboard v2 direction; it is not the full benchmark report. Detailed tables, raw JSON artifacts, pricing worksheets, and edge-case analysis should live in the benchmark report or external result artifact repository.

Important caveats:

- Pricing changes over time. Cost charts need a pricing date, region, and configuration assumptions.
- Managed-service configuration can materially change results, especially for serverless scaling, pinned replicas, capacity units, and storage-tiering modes.
- "Fully indexed" and "fully searchable" readiness may be exposed differently by each vendor, so the implementation must document how each status is detected or inferred.
- The current multi-tenant case uses deterministic tenant assignment and uniform tenant routing. It does not represent every SaaS tenant distribution.
- Cold latency depends on cache state, idle window, replica pinning, storage architecture, and service warmup behavior. The idle and warmup rules must be applied identically across products.
- Payload search rankings are workload-specific. IDs-only, scalar-label, vector-return, integer-filter, and label-filter runs can produce different winners.
- Cost Pareto results must be read together with recall, latency, payload profile, and readiness constraints rather than as a standalone ranking.
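
One way to read the last caveat: apply the recall and latency constraints first, and only then compare cost against QPS. The sketch below illustrates that order of operations. The `RunResult` fields and the `pareto_front` function are hypothetical and do not describe the leaderboard's actual scoring code.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class RunResult:
    """Hypothetical summary of one benchmark run for one configuration."""

    name: str
    monthly_cost_usd: float
    qps: float
    recall: float
    p99_ms: float


def pareto_front(
    results: Sequence[RunResult],
    min_recall: float,
    max_p99_ms: float,
) -> List[RunResult]:
    """Return cost/QPS Pareto-optimal runs among those meeting the constraints."""
    # Step 1: drop runs that miss the recall or tail-latency requirement.
    eligible = [
        r for r in results if r.recall >= min_recall and r.p99_ms <= max_p99_ms
    ]
    # Step 2: keep runs that no other eligible run beats on both cost and QPS.
    front = []
    for r in eligible:
        dominated = any(
            o.monthly_cost_usd <= r.monthly_cost_usd
            and o.qps >= r.qps
            and (o.monthly_cost_usd < r.monthly_cost_usd or o.qps > r.qps)
            for o in eligible
        )
        if not dominated:
            front.append(r)
    return sorted(front, key=lambda r: r.monthly_cost_usd)
```

A run with very high QPS at low cost can still be excluded in step 1 if its recall is below the threshold, which is the situation a standalone cost ranking would hide.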