A simple C++ library providing a few variations of Bloom filters.
Three types of Bloom filters are supported:
- Ordinary BFs
- Counting BFs
- Paired BFs (as seen in Mick et al.)
The following operations are supported on all types:
There are also a few operations supported by particular types:
- Counting and paired BFs also support a Delete operation
- Ordinary and paired BFs also support a Union operation
- Counting BFs support conversion into ordinary BFs
- Ordinary BFs support conversion into paired BFs
- Ordinary BFs can be compressed using the halving method (as seen in Wang et al.)
A few additional helper mechanisms may eventually be implemented:
- A helper to compute the optimal BF parameters given the number of content objects to be indexed
- Helpers to compute the probabilities of false positives for queries on existing populated BFs
Doxygen documentation can be compiled with
Everything lives in the namespace
These classes are templated, so you must specify the type of the object you are indexing when you instantiate a Bloom filter. For example:
OrdinaryBloomFilter<std::string> bf(4, 32);
This creates an ordinary BF with 4 hashes and a 32-bit array, capable of indexing strings.
You must specialize
HashParams<T> for each type
T you wish to construct a BF for. The
HashParams type is a structure consisting of the template parameter
T and a
uint8_t serves as a salt, to allow multiple hashes to be generated for a single object (as per the semantics of a BF).
A helper class implementing a 32-bit FNV-1 hash is given in
FnvHash.hpp. An example of how to specialize
std::hash using it can be found in
To insert an object
o into the BF, call
bf.Insert(o), and to check for existence of an object, call
bf.Query(o). If using a CountingBloomFilter or PairedBloomFilter, you can remove items using
To serialize a BF into a
bf.Serialize(os). To deserialize a BF from a
is, use the static function
Deserialize(is) within the appropriate BF class.
For information about the other operations, refer to the Doxygen documentation or read the comments in the code.