HyperLogLog

HyperLogLog Value Format (uint32)

|<empty> (8bit) |  rho (8bit) |  reg_id (16bit)

HyperLogLog Vector (Dense/Sparse Interleaving) Format

each hll output value in dense format takes 16384 bytes
each hll output value in sparse format takes numNonZeroRegisters * 4 bytes
total size of hll output vector: 16384 bytes * numDenseFormattedDims + 4 * numNonZeroRegisters * numSparseFormattedDims
threshold for determining dense/sparse format: 4096 non zero registers since 4096 * 4 = 16384
we only allocate the hll output vector after then end of last batch after we found out the number of non zero registers for each dimension

HyperLogLog Query Result Format

HyperLogLog query result format defines how we serialize/deserialize hyperloglog query results when client specify "Accept" as "application/hll". Its main purpose is to reduce the memory footprint of the result and let query client deserialize the results into a more meaningful format. The magic number can be used as a version number to support the new format.

Sparse values and dense values are interleaving with each other with a count vector to point to count of how many register IDs are in a single dim value row. If this count is less than 4kb, we think it's a sparse value. Otherwise, we will take it as a dense value.

Hyperloglog query results are serialized in the following format:

 [uint32] magic_number [uint32] padding

-----------query result 0-------------------
 <header> 
 [uint32] query result 0 size [uint8] error or result [3 bytes padding]
 [uint8] num_enum_columns [uint8] num dims per width ... [padding for 8 bytes]
 [uint32] result_size [uint32] raw_dim_values_vector_length
 [uint8] dim_index_0... [uint8] dim_index_n [padding for 8 bytes]
 [uint32] data_type_0...[uint32] data_type_n [padding for 8 bytes]

 <enum cases 0>
 [uint32_t] number of bytes of enum cases [uint16] column_index [2 bytes: padding]
 <enum values 0> delimited by "\u0000\n" [padding for 8 bytes]
 <end of header>
 <raw dim values vector>
 ...
 [padding for 8 byte alignment]
<count vector> 2 bytes * result_size
 [padding for 8 byte alignment]
 <raw hll value vector (sparse and dense)>
 ...
------------error 1----------
 [uint32] query result 1 size  [uint8] error or result [3 bytes padding] 
...

HyperLogLog Query Engine Aggregation Step

** Sort ** current batch using combined hash (higher 48bit from dimension value hash + 16bit reg_id from hll value)
Reduce current batch using same hash produced from sort step as key and max as reduction function
Merge previous batch results and current batch results using hash value (asce) + hll value (desc) as key
(last batch only) Create hll vector
copy dimension (same as regular aggregations)

Home

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HyperLogLog

HyperLogLog Value Format (uint32)

HyperLogLog Vector (Dense/Sparse Interleaving) Format

HyperLogLog Query Result Format

HyperLogLog Query Engine Aggregation Step

AresDB Documentation

Administration

Design Docs

Clone this wiki locally