Weaviate Benchmarking

This repo contains a library for benchmarking Weaviate end to end (e2e) and a CLI tool built on that same library.

Documentation for benchmarker

📊 results and context can be found in the Weaviate documentation
💬 discuss the results on our Slack channel or Twitter

ANN benchmark

Spin up two machines:

| Machine description | CPU type | CPUs | Memory | Disk size | Disk type | Misc. |
| --- | --- | --- | --- | --- | --- | --- |
| Machine to run Weaviate | C2 | 30 | 120 GB | 500 GB | SSD | Ubuntu 22.04 with docker-compose |
| Machine to run benchmark script | N2 | 8 | 64 GB | 500 GB | SSD | Ubuntu 22.04 with docker-compose |

Prepare the Weaviate machine

Clone this repo and cd into it:

$ git clone https://github.com/semi-technologies/weaviate-benchmarking && cd weaviate-benchmarking

Spin up Weaviate:

$ docker-compose up -d weaviate

Note the internal IP address and the number of CPU cores of this machine. Both are needed when configuring the benchmark machine.
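If Python is available on the Weaviate machine, both values can be read with a short script. This is a minimal sketch and not part of the repo: the function names are hypothetical, and the reported address is simply the one the OS would use for outbound traffic, which is assumed to be the internal IP on a single-interface cloud VM.

```python
import os
import socket


def internal_ip() -> str:
    """Return the local address the OS would use for outbound traffic."""
    # Connecting a UDP socket sends no packets; it only makes the OS pick a
    # route, which reveals the local address. The target address is an
    # arbitrary private IP chosen for this sketch.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.connect(("10.255.255.255", 1))
            return sock.getsockname()[0]
        except OSError:
            # No usable route: fall back to loopback.
            return "127.0.0.1"


def cpu_count() -> int:
    """Return the number of logical CPUs visible to this process."""
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(f"internal IP: {internal_ip()}")
    print(f"CPU cores:   {cpu_count()}")
```

On machines with several network interfaces, check the result against the cloud console before using it.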

Prepare the benchmark machine

Check that the Weaviate machine is reachable. The instance listens on port 8080, so the URL looks like http://10.128.15.12:8080/v1/meta:

$ curl http://{IP OF WEAVIATE INSTANCE}:8080/v1/meta
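The same availability check can be scripted, for example to wait for Weaviate before starting a benchmark. The sketch below uses only the Python standard library. The helper names are hypothetical, and it assumes that any JSON response from /v1/meta means the instance is up.

```python
import json
import urllib.request


def meta_url(host: str, port: int = 8080) -> str:
    """Build the /v1/meta URL for a Weaviate instance."""
    return f"http://{host}:{port}/v1/meta"


def fetch_meta(host: str, port: int = 8080, timeout: float = 5.0) -> dict:
    """Request /v1/meta and return the decoded JSON body."""
    with urllib.request.urlopen(meta_url(host, port), timeout=timeout) as resp:
        return json.load(resp)


def is_available(host: str, port: int = 8080, timeout: float = 5.0) -> bool:
    """Return True if the instance answers /v1/meta with valid JSON."""
    try:
        fetch_meta(host, port, timeout)
        return True
    except (OSError, ValueError):
        # OSError covers connection failures, timeouts, and HTTP errors;
        # ValueError covers a body that is not valid JSON.
        return False
```

Calling is_available("10.128.15.12") from the benchmark machine should return True once the Weaviate container is running and reachable.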

Clone this repo and cd into it:

$ git clone https://github.com/semi-technologies/weaviate-benchmarking && cd weaviate-benchmarking

Download the files into a benchmark-data folder as outlined below.

$ mkdir benchmark-data && \
    curl -o ./benchmark-data/deep-image-96-angular.hdf5 http://ann-benchmarks.com/deep-image-96-angular.hdf5 && \
    curl -o ./benchmark-data/mnist-784-euclidean.hdf5 http://ann-benchmarks.com/mnist-784-euclidean.hdf5 && \
    curl -o ./benchmark-data/gist-960-euclidean.hdf5 http://ann-benchmarks.com/gist-960-euclidean.hdf5 && \
    curl -o ./benchmark-data/glove-25-angular.hdf5 http://ann-benchmarks.com/glove-25-angular.hdf5
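Because curl writes whatever the server returns, a failed download can leave an HTML error page saved under an .hdf5 name. A quick sanity check is to compare the first eight bytes of each file with the HDF5 format signature. This is a minimal sketch with hypothetical function names. It only inspects offset 0, where the signature normally sits, and does not validate the rest of the file.

```python
from pathlib import Path

# The 8-byte signature that starts an HDF5 file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path) -> bool:
    """Return True if the file starts with the HDF5 signature."""
    with open(path, "rb") as fh:
        return fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


def check_downloads(folder="benchmark-data"):
    """Return the names of .hdf5 files in folder that fail the check."""
    return sorted(
        p.name for p in Path(folder).glob("*.hdf5") if not looks_like_hdf5(p)
    )


if __name__ == "__main__":
    broken = check_downloads()
    if broken:
        print("Files that do not look like HDF5:", ", ".join(broken))
    else:
        print("All .hdf5 files start with the HDF5 signature.")
```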

Update the following lines in docker-compose.yml.

services:
  benchmark-ann:
      dockerfile: Dockerfile-ann # <== update this line

In ./benchmark-scripts/ann/benchmark.py, set weaviate_url to the URL of the Weaviate instance and CPUs to the number of CPUs on the machine running Weaviate.

Build the container:

$ docker-compose build --no-cache

Run the container:

$ docker-compose up -d benchmark-ann

The benchmark container will output files in the format: results/weaviate_benchmark__{benchmark file}__{ef construction}__{max connections}.json
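When collecting results across several runs, it can help to build and parse these file names programmatically. The sketch below follows the naming format stated above. It is not taken from the repo: the helper names are hypothetical, it assumes that ef construction and max connections are integers, and the values 128 and 32 in the usage lines are illustrative only.

```python
import re
from typing import NamedTuple


class ResultName(NamedTuple):
    benchmark_file: str
    ef_construction: int
    max_connections: int


# Mirrors weaviate_benchmark__{benchmark file}__{ef construction}__{max connections}.json
_PATTERN = re.compile(
    r"^weaviate_benchmark__(?P<file>.+)__(?P<ef>\d+)__(?P<maxconn>\d+)\.json$"
)


def build_result_path(benchmark_file: str, ef_construction: int,
                      max_connections: int) -> str:
    """Build a result path in the documented format."""
    return (
        f"results/weaviate_benchmark__{benchmark_file}"
        f"__{ef_construction}__{max_connections}.json"
    )


def parse_result_path(path: str) -> ResultName:
    """Split a result path back into its three components."""
    name = path.rsplit("/", 1)[-1]  # drop the results/ directory if present
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a benchmark result file: {path!r}")
    return ResultName(match["file"], int(match["ef"]), int(match["maxconn"]))


if __name__ == "__main__":
    path = build_result_path("glove-25-angular", 128, 32)  # illustrative values
    print(path)
    print(parse_result_path(path))
```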

Update the benchmark config

You can update the HNSW build config for this benchmark here.

ANN 1B benchmark

Kubernetes cluster

Follow these steps in the Weaviate docs to create a Weaviate Kubernetes cluster.

Our K8s setup: 5 pods, each with:

- 320 GB RAM
- 80 CPUs
- 960 GB SSD

Update weaviate_url in benchmark-scripts/ann-1B/benchmark.py to reflect the URL of the cluster.

Import machine

Create a machine with at least 16 CPUs, 16 GB of memory, and a 200 GB SSD. The import will run from this machine.

Clone this repo and cd into it:

$ git clone https://github.com/semi-technologies/weaviate-benchmarking && cd weaviate-benchmarking

Download the files into a benchmark-data folder as outlined below.

$ mkdir benchmark-data && \
    curl -o ./benchmark-data/sift-128-euclidean.hdf5 https://storage.googleapis.com/semi-technologies-public-data/sift-1B-128-euclidean.hdf5

Update the following lines in docker-compose.yml.

services:
  benchmark-ann:
      dockerfile: Dockerfile-ann1b # <== update this line

Build the container:

$ docker-compose build --no-cache

Run the container:

$ docker-compose up -d

The benchmark container will output files in the format: results/weaviate_benchmark__{benchmark file}__{ef construction}__{max connections}.json

Inverted index benchmark

Spin up a large machine with Docker installed. We used one with 32 CPUs, 400 GB of memory, and a 1000 GB SSD persistent disk.

Clone this repo and cd into it:

$ git clone https://github.com/semi-technologies/weaviate-benchmarking && cd weaviate-benchmarking

Update the following lines in docker-compose.yml.

services:
  benchmark-ann:
      dockerfile: Dockerfile-ii # <== update this line

ANN + inverted index benchmark

coming soon

Import transformers module benchmark

coming soon

Documentation for speed benchmarker

Once installed (see below), the tool tries to be entirely self-documenting. Every command has a -h help option. Start with the root help command, benchmarker -h, which prints output like the following to tell you where to go from there:

A Weaviate Benchmarker

Usage:
  benchmarker [flags]
  benchmarker [command]

Available Commands:
  dataset        Benchmark vectors from an existing dataset
  help           Help about any command
  random-text    Benchmark nearText searches
  random-vectors Benchmark nearVector searches

Flags:
  -h, --help   help for benchmarker

Use "benchmarker [command] --help" for more information about a command.

Once you have picked the command you are interested in, use its help option to learn about the flags. For example, running benchmarker dataset -h prints the following:

Specify an existing dataset as a list of query vectors in a .json file to parse the query vectors and then query them with the specified parallelism

Usage:
  benchmarker dataset [flags]

Flags:
  -a, --api string         The API to use on benchmarks (default "graphql")
  -c, --className string   The Weaviate class to run the benchmark against
  -f, --format string      Output format, one of [text, json] (default "text")
  -h, --help               help for dataset
  -l, --limit int          Set the query limit (top_k) (default 10)
  -u, --origin string      The origin that Weaviate is running at (default "http://localhost:8080")
  -o, --output string      Filename for an output file. If none provided, output to stdout only
  -p, --parallel int       Set the number of parallel threads which send queries (default 8)
  -q, --queries string     Point to the queries file, (.json)
  -w, --where string       An entire where filter as a string

Installation / Running the CLI

Option 1: Download a pre-compiled binary

Not supported yet: there is no CI pipeline that publishes artifacts.

Option 2: With a local Go runtime, compiling on the fly

Print the available commands

cd benchmarker
go run . help

An example command

go run . random-vectors -c MyClass -d 384 -q 10000 -p 8 -a graphql -l 10

or the same command with the long-style flags:

go run . \
  random-vectors \
  --className MyClass \
  --dimensions 384 \
  --queries 10000 \
  --parallel 8 \
  --api graphql \
  --limit 10

Option 3: With a local Go runtime, compile and install just once

Install:

cd benchmarker && go install .

(Make sure your PATH includes the directory where go install places binaries.)

Run an example command

benchmarker random-vectors -c MyClass -d 384 -q 10000 -p 8 -a graphql -l 10

or the same command with the long-style flags:

benchmarker \
  random-vectors \
  --className MyClass \
  --dimensions 384 \
  --queries 10000 \
  --parallel 8 \
  --api graphql \
  --limit 10
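To repeat the example command at several parallelism levels, the invocation can be assembled and run from a small script. This is a sketch under stated assumptions: it assumes benchmarker is on the PATH (Option 3), it uses only the long-style flags shown above, and the function names and the sweep values are hypothetical.

```python
import subprocess
from typing import Iterable, List


def random_vectors_cmd(class_name: str, dimensions: int, queries: int,
                       parallel: int, api: str = "graphql",
                       limit: int = 10) -> List[str]:
    """Build the argument list for `benchmarker random-vectors`."""
    return [
        "benchmarker", "random-vectors",
        "--className", class_name,
        "--dimensions", str(dimensions),
        "--queries", str(queries),
        "--parallel", str(parallel),
        "--api", api,
        "--limit", str(limit),
    ]


def run_sweep(class_name: str, dimensions: int, queries: int,
              parallel_levels: Iterable[int]) -> None:
    """Run the benchmark once per parallelism level, stopping on failure."""
    for parallel in parallel_levels:
        cmd = random_vectors_cmd(class_name, dimensions, queries, parallel)
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


# Example (requires a running Weaviate instance and an existing class):
# run_sweep("MyClass", 384, 10000, parallel_levels=[1, 4, 8])
```

Passing the arguments as a list avoids shell quoting issues if a value contains spaces.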

Use benchmarking API programmatically

TODO

Roadmap

support random vectors
support specific vectors from json input file
print results as json
store results to file
take in ground-truth file to calculate recall
add versioning
pre-build binaries on CI and attach them to releases
