Go package implementing Bloom filters
Latest commit c358096 Sep 16, 2016 @willf committed on GitHub Merge pull request #29 from AutoRoute/master
Add equal function.
Failed to load latest commit information.
.travis.yml add Travis CI support Feb 25, 2016
LICENSE.txt update license.txt Nov 24, 2014
README.md Update README.md Feb 25, 2016
bloom.go Merge pull request #29 from AutoRoute/master Sep 16, 2016
bloom_test.go Add equal function. Sep 8, 2016


Bloom filters

Build Status

A Bloom filter is a representation of a set of n items, where the main requirement is to make membership queries; i.e., whether an item is a member of a set.

A Bloom filter has two parameters: m, a maximum size (typically a reasonably large multiple of the cardinality of the set to represent) and k, the number of hashing functions on elements of the set. (The actual hashing functions are important, too, but this is not a parameter for this implementation). A Bloom filter is backed by a BitSet; a key is represented in the filter by setting the bits at each value of the hashing functions (modulo m). Set membership is done by testing whether the bits at each value of the hashing functions (again, modulo m) are set. If so, the item is in the set. If the item is actually in the set, a Bloom filter will never fail (the true positive rate is 1.0); but it is susceptible to false positives. The art is to choose k and m correctly.

In this implementation, the hashing functions used is a local version of FNV, a non-cryptographic hashing function, seeded with the index number of the kth hashing function.

This implementation accepts keys for setting and testing as []byte. Thus, to add a string item, "Love":

uint n = 1000
filter := bloom.New(20*n, 5) // load of 20, 5 keys

Similarly, to test if "Love" is in bloom:

if filter.Test([]byte("Love"))

For numeric data, I recommend that you look into the binary/encoding library. But, for example, to add a uint32 to the filter:

i := uint32(100)
n1 := make([]byte, 4)
binary.BigEndian.PutUint32(n1, i)

Finally, there is a method to estimate the false positive rate of a particular bloom filter for a set of size n:

if filter.EstimateFalsePositiveRate(1000) > 0.001 

Given the particular hashing scheme, it's best to be empirical about this. Note that estimating the FP rate will clear the Bloom filter.

Discussion here: Bloom filter

Godoc documentation: https://godoc.org/github.com/willf/bloom