Starling

In this repository, we share the implementations and experiments of our work Starling: An I/O-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data Segment (arXiv).

It contains the following features:

For build,

- Build disk graph.
- Build in-memory navigation graph, based on
  1. Nodes that are uniformly-sampled.
  2. Nodes that are generated by search frequency.
- Perform Graph Partition on given base data.

For search,

	With Cache Nodes	With Nav Graph	With Graph Partition	With `use_ratio`	Use SQ
Beam Search	✅	✅
Page Search	✅	✅	✅	✅	✅

Datasets

The datasets used in our experiments can be downloaded from the NeurIPS'21 Big-ANN Benchmark, which also explains the data formats.

Dataset	Data type	Dimensions	Distance	# Query	Query type
BIGANN	uint8	128	L2	10000	ANNS/RS
DEEP	float	96	L2	10000	ANNS/RS
SSNPP	uint8	256	L2	100000	RS
Text2image	float	200	IP	100000	ANNS
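As a quick sanity check on a downloaded file, the header can be inspected from the shell. The sketch below assumes the common Big-ANN binary layout as we understand it: two uint32 values (number of points, number of dimensions) followed by the vectors stored one after another. The function name `read_header` and the tiny sample file are made up for illustration and are not part of this repository.

```shell
#!/bin/sh
# Print "<num_points> <num_dims>" from the 8-byte header of a Big-ANN style file.
# Assumption: od decodes in host byte order, so this expects a little-endian machine.
read_header() {
  file="$1"
  # -An: no offsets, -tu4: unsigned 4-byte integers, -N8: first 8 bytes only
  set -- $(od -An -tu4 -N8 "$file")
  echo "$1 $2"
}

# Hypothetical sample: 2 points, 3 dimensions, uint8 values 1..6.
sample="$(mktemp)"
printf '\002\000\000\000\003\000\000\000\001\002\003\004\005\006' > "$sample"

header="$(read_header "$sample")"
num_points="${header% *}"
num_dims="${header#* }"

# For uint8 data each value takes 1 byte; for float data use 4 instead.
expected_bytes=$((8 + num_points * num_dims * 1))
actual_bytes=$(($(wc -c < "$sample")))

echo "points=$num_points dims=$num_dims expected=$expected_bytes actual=$actual_bytes"
rm -f "$sample"
```

Comparing the expected and actual byte counts is a cheap way to catch a truncated download or a wrong data type before building an index.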

Quick Start

To install dependencies, run

apt install build-essential libboost-all-dev make cmake g++ libaio-dev libgoogle-perftools-dev clang-format libmkl-full-dev

To run the benchmarks, go to the scripts directory, copy config_sample.sh to config_local.sh, modify the dataset paths in config_dataset.sh, and run

./run_benchmark.sh [debug/release] [build/build_mem/freq/gp/search] [knn/range]

Argument	Description
`debug/release`	Build type (Debug or Release), passed to CMake
`build`	Build index
`build_mem`	Build memory index
`freq`	Generate visit-frequency file
`gp`	Partition the graph of a given index file
`search`	Search index
`knn`	Find k-nearest neighbors
`range`	Range search
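To make the argument combinations concrete, here is a minimal dry-run sketch that checks its arguments against the values in the table and prints the matching command line without executing it. The helper name `benchmark_cmd` is hypothetical. The text does not say whether stages other than `search` need the third argument, or in which order the stages must run, so the helper simply treats the query type as optional.

```shell
#!/bin/sh
# Dry-run helper (illustrative only): validate arguments and print the
# run_benchmark.sh command line instead of running it.
benchmark_cmd() {
  mode="$1"; stage="$2"; query="${3:-}"
  case "$mode" in
    debug|release) ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  case "$stage" in
    build|build_mem|freq|gp|search) ;;
    *) echo "unknown stage: $stage" >&2; return 1 ;;
  esac
  case "$query" in
    ""|knn|range) ;;
    *) echo "unknown query type: $query" >&2; return 1 ;;
  esac
  # Assumption: the query type is appended only when one is given.
  echo "./run_benchmark.sh $mode $stage${query:+ $query}"
}

# Example invocations; each prints one command line.
benchmark_cmd release build
benchmark_cmd release search knn
benchmark_cmd debug search range
```

Printing the command first makes it easy to review a batch of runs before committing to a long index build.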

Configure datasets and parameters in config_local.sh.

To run large-scale experiments with multiple segments, please visit https://github.com/PwzXxm/segment-framework.


