Endianness #34

at15 · 2017-05-18T02:02:29Z

Came across the little- vs. big-endian problem when starting on the on-disk storage (#32). Looked at some existing TSDBs but couldn't figure out which byte order they actually use, or why.

Prometheus

The new tsdb seems to use BigEndian: https://github.com/prometheus/tsdb/blob/master/encoding_helpers.go (it only has BE helpers, no LE)
The old storage seems to use LittleEndian: https://github.com/prometheus/prometheus/blob/master/storage/local/persistence.go#L574
- but there is an encoding helper that only has BigEndian as well: https://github.com/prometheus/prometheus/blob/master/storage/local/codable/codable.go#L113

InfluxDB

Seems to be BigEndian, at least for the header and checksum: https://github.com/influxdata/influxdb/blob/master/tsdb/engine/tsm1/writer.go
But for the data, they don't specify the byte order explicitly in https://github.com/influxdata/influxdb/blob/master/tsdb/engine/tsm1/encoding.go

SQLite

Uses BigEndian: http://www.sqlite.org/fileformat2.html

Gob

From the encoding/gob package documentation: "Floating-point numbers are always sent as a representation of a float64 value. That value is converted to a uint64 using math.Float64bits. The uint64 is then byte-reversed and sent as a regular unsigned integer. The byte-reversal means the exponent and high-precision part of the mantissa go first. Since the low bits are often zero, this can save encoding bytes. For instance, 17.0 is encoded in only three bytes (FE 31 40)."
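To check my reading of that paragraph, here is a rough sketch that follows the described steps by hand. The helper name `gobFloatBytes` is made up for illustration and this is not the actual gob implementation; it only assumes gob's documented unsigned integer rule (values below 128 take one byte, otherwise a negated byte count followed by the minimal big-endian bytes).

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// gobFloatBytes is a hypothetical helper that follows the steps in the gob
// documentation: float64 -> uint64 bits -> byte-reverse -> unsigned int encoding.
func gobFloatBytes(f float64) []byte {
	u := bits.ReverseBytes64(math.Float64bits(f))
	if u < 128 {
		// Small values are sent as a single byte.
		return []byte{byte(u)}
	}
	// Number of bytes needed to hold u without leading zero bytes.
	n := (bits.Len64(u) + 7) / 8
	// First byte is the negated byte count, then the value in big-endian order.
	out := []byte{byte(-n)}
	for i := n - 1; i >= 0; i-- {
		out = append(out, byte(u>>(8*uint(i))))
	}
	return out
}

func main() {
	fmt.Printf("% X\n", gobFloatBytes(17.0)) // FE 31 40
}
```

17.0 has the bit pattern 0x4031000000000000, so after the byte reversal only 0x3140 is left to send, which is where the three bytes come from.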

Parquet

Uses little-endian: https://github.com/Parquet/parquet-format/blob/master/Encodings.md

Arrow

Uses little-endian: https://github.com/apache/arrow/blob/master/format/Layout.md

BDB

https://docs.oracle.com/cd/E17275_01/html/programmer_reference/am_misc_faq.html
Using little-endian integers as keys results in bad performance because BDB sorts integer keys as byte strings, so numerically adjacent keys do not end up next to each other.

little-endian system

254	fe 0 0 0
255	ff 0 0 0
256	 0 1 0 0
257	 1 1 0 0

These sort badly as byte strings: 256 and 257 come before 254 and 255.

big-endian system

254	0 0 0 fe
255	0 0 0 ff
256	0 0 1 0
257	0 0 1 1
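The two tables can be reproduced with a small sketch. It assumes 4-byte unsigned keys compared with `bytes.Compare`, which is my stand-in for a byte-string key comparison, not BDB's actual code; `sortAsByteStrings` is a made-up helper name.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

// sortAsByteStrings encodes each value as a 4-byte key in the chosen byte
// order, sorts the keys as raw byte strings, and decodes them back.
func sortAsByteStrings(values []uint32, bigEndian bool) []uint32 {
	var order binary.ByteOrder = binary.LittleEndian
	if bigEndian {
		order = binary.BigEndian
	}
	keys := make([][]byte, len(values))
	for i, v := range values {
		keys[i] = make([]byte, 4)
		order.PutUint32(keys[i], v)
	}
	// Lexicographic comparison, the way a byte-string key would be compared.
	sort.Slice(keys, func(i, j int) bool {
		return bytes.Compare(keys[i], keys[j]) < 0
	})
	out := make([]uint32, len(keys))
	for i, k := range keys {
		out[i] = order.Uint32(k)
	}
	return out
}

func main() {
	values := []uint32{254, 255, 256, 257}
	fmt.Println("little endian:", sortAsByteStrings(values, false)) // [256 257 254 255]
	fmt.Println("big endian:   ", sortAsByteStrings(values, true))  // [254 255 256 257]
}
```

With big-endian keys the byte-string order matches the numeric order, so range scans over integer keys stay sequential.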

at15 · 2017-06-10T00:07:17Z

Currently using big endian for all the numbers that are read and written manually. Metadata uses protobuf (which should be little endian, I suppose), and data blocks use varint (which should also be little endian).
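For reference, a minimal sketch of the difference between the two encodings, using only `encoding/binary` from the standard library. The helper names `fixedBE` and `uvarint` are made up for illustration and are not the project's actual helpers.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// fixedBE encodes v as a fixed-width 4-byte big-endian integer.
func fixedBE(v uint32) []byte {
	buf := make([]byte, 4)
	binary.BigEndian.PutUint32(buf, v)
	return buf
}

// uvarint encodes v with the variable-length encoding from encoding/binary,
// which emits the least significant 7-bit group first.
func uvarint(v uint64) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, v)
	return buf[:n]
}

func main() {
	fmt.Printf("% x\n", fixedBE(300)) // 00 00 01 2c
	fmt.Printf("% x\n", uvarint(300)) // ac 02
}
```

The varint puts the low 7 bits first, which is why it is usually described as little-endian; it also means varint-encoded keys would not sort numerically as byte strings.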

at15 added disk help wanted question labels May 18, 2017

at15 added a commit that referenced this issue May 18, 2017

[play][disk] Encounter Endianness problem #34

60e5adc

at15 added the backlog label Jun 10, 2017

at15 added this to the 0.2.0 milestone Jun 10, 2017

at15 closed this as completed Jun 10, 2017

at15 added this to BACKLOG in Local disk storage Jun 10, 2017
