Release BitMagic v.9.1.0 · tlk00/BitMagic

This update to the BitMagic library adds the sparse_vector_float container, designed to store both sparse and dense floating-point data in a succinct format that reduces memory overhead, with additional compression when serialized to disk. Alongside this new container, the update adds support for serialization and deserialization of this new container, as well as optimized search capabilities integrated directly into the existing sparse_vector_scanner class. Some use cases for a sparse_vector_float include storing geospatial data such as elevations or temperatures, machine learning, sparse graphs, and storing time-series such as financial data.

New Classes:

In this update the following classes have been added

sparse_vector_float
sparse_vector_float_serial_layout
sparse_vector_float_serializer
sparse_vector_float_deserializer
These are wrapper classes for the sparse vector class type, and allow for the storage of 32-bit floats.

As floats do not have an unsigned version, they cannot be stored in a normal sparse vector. The sparse_vector_float series of classes allows for the storage of these floats by splitting received floats into their sign, exponent, and mantissa. The exponent and mantissa are stored in two sparse vectors, the type of which is given by the user, and the sign is stored in a bit vector of the same type as the given sparse vectors.

This allows for the storage of floats in a single sparse vector object, and instead of the user having to split a float and store it in 3 separate containers themselves, they can instead store it in a single sparse_vector_float object and have operations on that sparse_vector_float affect all 3 inner containers simultaneously.

The sparse_vector_float_serial_layout, sparse_vector_float_serializer, and the sparse_vector_float_deserializer are used for serialization and deserialization of sparse_vector_float objects, and are largely used in the same way as the sparse vector version of the classes.

New Functions:

The following functions have been added:
sparse_vector_float_serialize()
sparse_vector_float_deserialize()
These functions are the same as their sparse vector counterparts but use the sparse_vector_float serializer/deserializer’s

New Methods:

The following methods have been added:

sparse_vector_scanner::find_gt_float
sparse_vector_scanner::find_lt_float
sparse_vector_scanner::find_ge_float
sparse_vector_scanner::find_le_float
sparse_vector_scanner::find_range_float

These methods allow for a scanner to find the greater than, less than, greater than or equal to, less than or equal to, and a specific range of values in a sparse_vector_float object, allowing easier parsing of sparse_vector_floats.

New Samples:

The following samples have been created on how to use the sparse_vector_float class

svfsample01 - Basic operations(import, set, get, optimize, etc.)
svfsample02 - Serialization and Const iterator operations
svfsample03 - Range methods(clear, compare, equal, join, merge)
svfsample04 - Extracting from a sparse_vector_float(decode, extract, gather, extract_range)

Comparing a std::vector to sparse_vector_float objects:

Memory Size Comparison between
Calculated using Bitmagic’s calc_stat::memory_used

std::vector
SVF - sparse_vector_float using a sparse_vector
- With Linear data going from -N/2 * .00123 to N/2 * .00123
- With Random data going from -1,000,000 to 1,000,000
RSC SVF - sparse_vector_float using an rsc_sparse_vector with 35% NULL’s
- With Linear data going from -N/2 * .00123 to N/2 * .00123
- With Random data going from -1,000,000 to 1,000,000

Container	100,000 Elements(bytes)	1,000,000 Elements(bytes)	10,000,000 Elements(bytes)
std::vector	400,024	4,000,024	40,000,024
SVF Linear	338,176	1,819,648	12,703,488
SVF Random	539,232	3,870,464	36,430,080
RSC SVF Linear	266,256	1,558,288	11,536,280
RSC SVF Random	351,344	2,765,584	26,786,576

Serialized Memory Size Comparison

Container	100,000 Elements(bytes)	1,000,000 Elements(bytes)	10,000,000 Elements(bytes)
std::vector	400,024	4,000,024	40,000,024
SVF Linear	191,310	956,513	6,890,859
SVF Random	375,352	3,470,964	34,693,378
RSC SVF Linear	149,633	1,166,626	9,122,900
RSC SVF Random	254,964	2,546,823	25,480,286

Search Comparison between Data structures with 10,000,000 elements

Linear Data
- Data in Data Structures
  - 10,000,000 linear elements
  - Range of -6150 to 6150 or ( -N/2 * .00123 to N/2 * .00123)
- Search Info
  - 1000 searches performed on 1000 ranges
  - Ranges are 2 random numbers from -6150 to 6150
- Legend
  - Linearly search on an std::vector
  - sparse_vector_scanner on a sparse_vector_float using a sparse_vector
  - Using the Const Iterator of a sparse_vector_float using a sparse_vector
  - sparse_vector_scanner on a sparse_vector_float using an rsc_sparse_vector
  - Using the Const Iterator of a sparse_vector_float using an rsc_sparse_vector
Random data
- Data in Data structures
  - 10,000,000 random numbers
  - Range of -1,000,000 to 1,000,000
- Search Info
  - 1000 searches performed on 1000 ranges
  - Ranges are 2 random numbers from -1,000,000 to 1,000,000
- Legend
  - Same as Linear Data

Query Method / Dataset	Linear Data(sec)	Random Data(sec)
std::vector	18.6000 s	41.1800 s
sparse_vector_scanner	0.8433 s	5.1880 s
Const Iterator	112.8000 s	239.2800 s
std::vector(35% NaN)	24.0400 s	32.9500 s
RSC sparse_vector_scanner(35% Null)	106.6200 s	230.5200 s
RSC Const Iterator(35% Null)	172.3200 s	268.3200 s

Fixes:

General:

Fixed the use of chrono_taker<> in the perf.cpp and t.cpp as it was used without the <>, causing errors.
Fixed compilation warnings in bmaggregator.h, bmblocks.h, bmfunc.h, bmserial.h, perf.cpp, and t.cpp
Updated CMakeLists.txt to not use -march=native by default for non-x86 builds, and to instead use -O2

sparse_vector_scanner

Fixed an error where in the find_gt_horizontal() method a duplicate of a given bvector was created, but if the given bvector was read only it would copy the read only, causing an error. Changed how it was copying the bvector to using bit_or()

t.cpp:

Fixed an error in the TestCompressSparseVector() test where in the Compressed load() stress test cmp 3 was testing csv2 instead of csv1
Fixed an error where in DetailedCheckCompressedDecode() where an assert would fail if given an empty rsc_sparse_vector, despite some tests being an empty rsc_sparse_vector

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BitMagic v.9.1.0

Choose a tag to compare

Sorry, something went wrong.