CLANN is an algorithm for nearest neighbor search, built on top of PUFFINN (Parameterless and Universal FInding of Nearest Neighbors). Instead of building a single index over the whole dataset, CLANN first partitions the dataset into clusters and then builds a separate PUFFINN index for each cluster.
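A minimal, std-only sketch can make the cluster-then-index idea concrete. It assumes clusters are formed by assigning each point to its nearest centroid (the centroids are given), and it substitutes a brute-force scan for each per-cluster PUFFINN index. The function names `assign_to_clusters` and `search_all_clusters` are hypothetical, not CLANN's API, and the sketch does not reflect how CLANN chooses clusters or which clusters it probes at query time. For brevity it uses squared Euclidean distance; on unit-normalized vectors this gives the same ranking as cosine similarity.

```rust
// Hypothetical sketch: linear scans stand in for per-cluster PUFFINN indexes.

/// Squared Euclidean distance between two vectors of equal length.
fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Assign every point to its nearest centroid; returns the point ids of each cluster.
fn assign_to_clusters(data: &[Vec<f32>], centroids: &[Vec<f32>]) -> Vec<Vec<usize>> {
    let mut clusters = vec![Vec::new(); centroids.len()];
    for (i, p) in data.iter().enumerate() {
        let nearest = (0..centroids.len())
            .min_by(|&a, &b| sq_dist(p, &centroids[a]).total_cmp(&sq_dist(p, &centroids[b])))
            .expect("at least one centroid");
        clusters[nearest].push(i);
    }
    clusters
}

/// Query every cluster's "index", keep each cluster's top k, then merge into a global top k.
fn search_all_clusters(data: &[Vec<f32>], clusters: &[Vec<usize>], query: &[f32], k: usize) -> Vec<usize> {
    let mut candidates: Vec<(f32, usize)> = Vec::new();
    for members in clusters {
        // Per-cluster top k (stand-in for querying that cluster's PUFFINN index).
        let mut local: Vec<(f32, usize)> =
            members.iter().map(|&i| (sq_dist(&data[i], query), i)).collect();
        local.sort_by(|a, b| a.0.total_cmp(&b.0));
        local.truncate(k);
        candidates.extend(local);
    }
    candidates.sort_by(|a, b| a.0.total_cmp(&b.0));
    candidates.into_iter().take(k).map(|(_, i)| i).collect()
}

fn main() {
    let data = vec![vec![0.0, 0.0], vec![0.1, 0.0], vec![5.0, 5.0], vec![5.1, 5.0]];
    let centroids = vec![vec![0.0, 0.0], vec![5.0, 5.0]];
    let clusters = assign_to_clusters(&data, &centroids);
    // prints [2, 3]
    println!("{:?}", search_all_clusters(&data, &clusters, &[4.9, 5.0], 2));
}
```

Because each cluster contributes its own top k before the merge, this exhaustive version returns the exact k nearest neighbors; with approximate per-cluster indexes the merged result is approximate as well.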
Similarity Measures
- Cosine Similarity
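For reference, cosine similarity is the dot product of two vectors divided by the product of their Euclidean norms. The sketch below is a plain Rust version of that formula, not CLANN's internal implementation; returning 0.0 when either vector is all zeros is a convention chosen here, since the similarity is undefined in that case.

```rust
/// Cosine similarity: dot(a, b) / (|a| * |b|).
/// Returns 0.0 if either vector has zero norm (a convention chosen for this sketch).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

fn main() {
    // Parallel vectors score 1, orthogonal vectors 0, opposite vectors -1.
    println!("{}", cosine_similarity(&[1.0, 2.0], &[2.0, 4.0]));
    println!("{}", cosine_similarity(&[1.0, 0.0], &[0.0, 3.0]));
    println!("{}", cosine_similarity(&[1.0, 1.0], &[-1.0, -1.0]));
}
```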
Search Options
- k-nearest neighbor search
- Configurable recall targets
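Recall for k-nearest neighbor search is commonly measured as the fraction of the true k nearest neighbors that an approximate search actually returns. The sketch below computes that measure; `recall_at_k` is a hypothetical helper, and the ids in `main` are illustrative, not results from CLANN.

```rust
use std::collections::HashSet;

/// Recall@k: fraction of the ground-truth neighbor ids that appear among the returned ids.
/// Duplicate ids in either list are counted once.
fn recall_at_k(returned: &[usize], ground_truth: &[usize]) -> f64 {
    let truth: HashSet<usize> = ground_truth.iter().copied().collect();
    if truth.is_empty() {
        return 1.0; // convention: nothing to find
    }
    let found: HashSet<usize> = returned.iter().copied().collect();
    found.intersection(&truth).count() as f64 / truth.len() as f64
}

fn main() {
    // Illustrative ids: exact 10-NN vs. what an approximate index might return.
    let exact = [3, 17, 42, 8, 99, 5, 61, 23, 70, 12];
    let approx = [3, 17, 42, 8, 99, 5, 61, 23, 70, 44];
    // prints "recall@10 = 0.9"
    println!("recall@10 = {}", recall_at_k(&approx, &exact));
}
```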
Performance Metrics
- Distance computation tracking
- Memory usage monitoring
- Build and search time measurements
- Per-cluster statistics
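One common way to track distance computations is to wrap the metric in a counter and time the search with `std::time::Instant`. The sketch below assumes that approach; `CountingMetric` is a hypothetical name, and how CLANN itself collects these statistics is not shown here.

```rust
use std::cell::Cell;
use std::time::Instant;

/// Wraps a similarity/distance function and counts how often it is evaluated.
/// Hypothetical helper for illustration, not part of CLANN.
struct CountingMetric<F> {
    metric: F,
    calls: Cell<u64>,
}

impl<F: Fn(&[f32], &[f32]) -> f32> CountingMetric<F> {
    fn new(metric: F) -> Self {
        Self { metric, calls: Cell::new(0) }
    }

    /// Evaluate the wrapped metric and increment the call counter.
    fn eval(&self, a: &[f32], b: &[f32]) -> f32 {
        self.calls.set(self.calls.get() + 1);
        (self.metric)(a, b)
    }

    fn count(&self) -> u64 {
        self.calls.get()
    }
}

fn main() {
    let data = vec![vec![1.0_f32, 0.0], vec![0.0, 1.0], vec![0.6, 0.8]];
    let query = [1.0_f32, 0.0];
    // Dot product as the similarity (equals cosine similarity for unit-length vectors).
    let metric = CountingMetric::new(|a: &[f32], b: &[f32]| {
        a.iter().zip(b).map(|(x, y)| x * y).sum::<f32>()
    });

    let start = Instant::now();
    // Linear scan: each data point costs exactly one evaluation.
    let best = data.iter().map(|p| metric.eval(p, &query)).fold(f32::MIN, f32::max);
    let elapsed = start.elapsed();

    println!("best = {best}, evaluations = {}, time = {elapsed:?}", metric.count());
}
```

`Cell` lets the counter be updated through a shared reference, but it is not thread-safe; a multithreaded search would need something like `AtomicU64` instead.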
Serialization Support
- HDF5-based storage
- Versioned index format
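A versioned format lets a reader reject files it does not understand. CLANN stores indexes in HDF5, and its actual layout is not described here; the std-only sketch below just illustrates the version check with a hypothetical 4-byte magic tag followed by a little-endian `u32` version. In an HDF5 file, the version could instead be stored as an attribute.

```rust
/// Hypothetical 4-byte tag identifying the file type (not CLANN's real format).
const MAGIC: &[u8; 4] = b"CLNN";
/// Highest format version this hypothetical reader understands.
const CURRENT_VERSION: u32 = 1;

/// Write the magic tag followed by the format version as a little-endian u32.
fn write_header(buf: &mut Vec<u8>, version: u32) {
    buf.extend_from_slice(MAGIC);
    buf.extend_from_slice(&version.to_le_bytes());
}

/// Validate the header: reject foreign files and versions newer than this reader supports.
fn read_version(buf: &[u8]) -> Result<u32, String> {
    if buf.len() < 8 || buf[..4] != MAGIC[..] {
        return Err("not an index file".to_string());
    }
    let version = u32::from_le_bytes(buf[4..8].try_into().unwrap());
    if version > CURRENT_VERSION {
        return Err(format!("unsupported index format version {version}"));
    }
    Ok(version)
}

fn main() {
    let mut buf = Vec::new();
    write_header(&mut buf, CURRENT_VERSION);
    // prints Ok(1)
    println!("{:?}", read_version(&buf));
}
```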
The algorithm requires several dependencies for compilation and execution:
- Clang 9.0 or greater
- OpenMP
- HDF5 library
- CMake (>= 3.10)
- Rust toolchain (2021 edition or newer)
If you have Nix installed, you can enter the project's development shell with:
nix develop
Clone the Repository
git clone https://github.com/your-username/clann.git
cd clann
Build the Project
cargo build --release
Run Benchmarks
To compare PUFFINN and CLANN in terms of distance computations, set the parameters and the dataset in benches/configs.json, then run:
cargo bench --bench=distance_benches
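Usage example:
// Assumes build and search are exported from the clann crate alongside init_with_config
use clann::{build, init_with_config, search, Config, MetricsOutput};
use ndarray::Array2;
// Array2::random comes from the ndarray-rand crate's RandomExt trait
use ndarray_rand::rand_distr::Uniform;
use ndarray_rand::RandomExt;

fn main() {
    // Create configuration
    let config = Config {
        num_tables: 84,
        num_clusters_factor: 0.4,
        k: 10,
        delta: 0.9,
        dataset_name: "glove-25-angular".to_owned(),
        metrics_output: MetricsOutput::DB,
    };

    // Initialize and build the index over 10,000 random 128-dimensional vectors
    let data = Array2::random((10000, 128), Uniform::new(0.0, 1.0));
    let mut index = init_with_config(data, config).unwrap();
    build(&mut index).unwrap();

    // Perform a search
    let query = vec![0.1; 128];
    let results = search(&mut index, &query).unwrap();
}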
We welcome contributions! Please see our Contributing Guidelines for details on:
- Code style
- Testing requirements
- Pull request process
- Development setup
This project is licensed under the MIT License - see the LICENSE file for details.