space-efficient immutable integer sets
Latest commit 63fbe23 Oct 16, 2014 @ztellman Merge pull request #11 from danielcompton/fix-profile-ordering
Fix profile ordering, add 1.6 final and 1.7-alpha2 profiles

Project Moved to Contrib!

ztellman/immutable-bitset is now clojure/

Immutable Integer Sets

Clojure's immutable sets are great, but they can be hugely inefficient when we're only trying to store integers. This library contains an implementation for immutable sets which are both faster and smaller for this special case.


[immutable-bitset "0.1.6"]

All functions are in the immutable-bitset namespace. There are two constructors, sparse-bitset and dense-bitset, and three operators, union, intersection, and difference.

immutable-bitset> (sparse-bitset)
immutable-bitset> (conj *1 1 2 3)
#{1 2 3}
immutable-bitset> (disj *1 2)
#{1 3}
immutable-bitset> (sparse-bitset [1 2 3])
#{1 2 3}
immutable-bitset> (transient *1)
#<TransientBitSet immutable_bitset.TransientBitSet@bee743d>
immutable-bitset> (disj! *1 2)
#<TransientBitSet immutable_bitset.TransientBitSet@2aa42d42>
immutable-bitset> (conj! *1 5)
#<TransientBitSet immutable_bitset.TransientBitSet@7c079ab0>
immutable-bitset> (persistent! *1)
#{1 3 5}
immutable-bitset> (union *1 (sparse-bitset [5 7 9]))
#{1 3 5 7 9}

The set algebra operators take two bitsets, and return a bitset. Use clojure.set for general set operations.

dense-bitset behaves the same as sparse-bitset, the difference is only in their memory efficiency. Consider a case where we create a set of all numbers between one and one million:

(def s (range 1e6))

(into #{} s)              ; ~100mb
(into (sparse-bitset) s)  ; ~1mb
(into (dense-bitset) s)   ; ~150kb

Both of these are significantly smaller than the standard set, but the dense bitset is almost an order of magnitude smaller than the sparse variant. This is because the dense bitset allocates larger contiguous chunks, which is great if the numbers are densely clustered. However, if the numbers are sparse:

(def s (map (partial * 1e6) (range 1e6)))

(into #{} s)              ; ~100mb
(into (sparse-bitset) s)  ; ~130mb
(into (dense-bitset) s)   ; ~670mb

In this case, the dense bitset is much less efficient than the standard set, while the sparse bitset is about equally large. So as a rule of thumb, use dense-bitset where the elements are densely clustered (each element has multiple elements within +/- 1000), and sparse-bitset for everything else.

The bitsets are both somewhat faster than the standard set, as well; adding elements to the transient bitset can be ~50-100% faster than a normal set.


Copyright © 2013 Zachary Tellman

Distributed under the MIT License