Binary JSON Serialization
Voldemort supports pluggable serialization, including one serialization type that is custom to Voldemort and is called, somewhat misleadingly, “json”.
This serialization type uses a JSON data model but a more compact byte format, and also checks data against an expected schema for type correctness.
int8, int16, int32, int64 – big endian signed integers.
float32, float64 – IEEE 754 floating-point format.
Each of these takes the given number of bits to encode (e.g. int16 uses 16 bits).
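As a rough illustration of these fixed-width encodings, the following Python sketch uses the standard `struct` module. The helper `encode_number` and its lookup table are hypothetical names made up for this example, not part of Voldemort. The text above states big-endian order only for the integers; applying it to the float types as well is an assumption.

```python
import struct

# Hypothetical table: schema type name -> big-endian struct format code.
# Big-endian floats are an assumption; the text only specifies it for integers.
_FORMATS = {
    "int8": ">b",
    "int16": ">h",
    "int32": ">i",
    "int64": ">q",
    "float32": ">f",
    "float64": ">d",
}

def encode_number(type_name, value):
    """Pack a number into the fixed width implied by its type name."""
    return struct.pack(_FORMATS[type_name], value)

print(encode_number("int16", 258).hex())   # 0102
print(len(encode_number("float64", 1.5)))  # 8
```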
Strings are stored as a length N, giving the number of bytes in the string, followed by N UTF-8 bytes. Strings are really just arrays of bytes (see below).
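A minimal sketch of the string layout, assuming the length prefix counts UTF-8 bytes rather than characters and follows the length rules given at the end of this page. The names `encode_length` and `encode_string` are hypothetical, and the upper bound of 2^30 is an inference from two bits of the int32 being used as a flag.

```python
import struct

def encode_length(n):
    """Length prefix: int16 below 2**15, otherwise int32 with the top two bits set."""
    if n < 2**15:
        return struct.pack(">h", n)
    if n >= 2**30:  # assumed limit: only 30 bits remain for the value
        raise ValueError("length too large for this sketch")
    return struct.pack(">I", n | 0xC0000000)

def encode_string(s):
    """Length of the UTF-8 data, followed by the UTF-8 bytes themselves."""
    data = s.encode("utf-8")
    return encode_length(len(data)) + data

# "é" takes two bytes in UTF-8, so the prefix is 6 even though there are 5 characters.
print(encode_string("héllo").hex())  # 000668c3a96c6c6f
```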
Dates are stored as int64 Unix timestamps.
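The date encoding can be sketched as below. The text does not say which unit the timestamp uses, so the hypothetical `encode_date` helper takes it as a parameter; the default of milliseconds is an assumption, chosen because that is what Java's `Date.getTime()` returns.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def encode_date(dt, ticks_per_second=1000):
    """Pack a timezone-aware datetime as a big-endian int64 count of ticks since the Unix epoch.

    ticks_per_second=1000 (milliseconds) is an assumption, not stated in the text.
    """
    tick = timedelta(seconds=1) / ticks_per_second
    return struct.pack(">q", (dt - EPOCH) // tick)

one_second_in = datetime(1970, 1, 1, 0, 0, 1, tzinfo=timezone.utc)
print(encode_date(one_second_in).hex())  # 00000000000003e8
```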
Primitive types are denoted in a schema by a JSON string with the type name, for example “int8”, “float64”, “string”, or “date”.
Compound types are recursive types composed of primitive types and other compound types, so for example an Object with Array fields is legitimate.
An object is denoted in a schema by a JSON object. For example, the following object has two fields, an int8 ‘foo’ and a string ‘bar’:
{"foo":"int8", "bar":"string"}
An object is stored as a nullness byte followed by its entries.
The nullness byte is an int8 with value -1 if the object is null and 1 otherwise.
The entries are the object's fields in alphabetical order by name, each stored in the format this document describes for its type.
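To make the object layout concrete, here is a small sketch that encodes a value against a schema with an int8 field ‘foo’ and a string field ‘bar’. The function `encode_value` is a hypothetical name. Several points are assumptions: a null object is written as the nullness byte alone, "alphabetical order" is approximated by Python's default string sort, and strings are limited to the short (int16) length form.

```python
import struct

_NUMBER_FORMATS = {
    "int8": ">b", "int16": ">h", "int32": ">i", "int64": ">q",
    "float32": ">f", "float64": ">d",
}

def encode_value(schema, value):
    """Encode a value against a schema (numbers, strings, nested objects only)."""
    if isinstance(schema, dict):
        if value is None:
            # Assumption: a null object is just the nullness byte, with no entries.
            return struct.pack(">b", -1)
        # Fields are written in sorted name order, not in insertion order.
        body = b"".join(
            encode_value(schema[name], value[name]) for name in sorted(schema)
        )
        return struct.pack(">b", 1) + body
    if schema == "string":
        data = value.encode("utf-8")
        # Short-form length only; struct raises an error for 2**15 bytes or more.
        return struct.pack(">h", len(data)) + data
    return struct.pack(_NUMBER_FORMATS[schema], value)

schema = {"foo": "int8", "bar": "string"}
# Nullness byte 01, then 'bar' (length 0002 + "hi"), then 'foo' (07).
print(encode_value(schema, {"foo": 7, "bar": "hi"}).hex())  # 010002686907
```

Because the field order is fixed by the schema, field names never appear in the encoded bytes, which is where most of the size saving over textual JSON comes from.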
Arrays are homogeneously typed sequences of values. They can contain primitives like numbers and strings or objects. The format is a length N followed by N entries of the given type.
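The array layout can be sketched for one concrete element type, int32. The helpers `encode_length` and `encode_int32_array` are hypothetical names for this example, and the 2^30 upper bound is an inference rather than something the text states.

```python
import struct

def encode_length(n):
    """Length prefix: int16 below 2**15, otherwise int32 with the top two bits set."""
    if n < 2**15:
        return struct.pack(">h", n)
    if n >= 2**30:  # assumed limit: only 30 bits remain for the value
        raise ValueError("length too large for this sketch")
    return struct.pack(">I", n | 0xC0000000)

def encode_int32_array(values):
    """Item count, followed by each item as a big-endian int32."""
    items = b"".join(struct.pack(">i", v) for v in values)
    return encode_length(len(values)) + items

print(encode_int32_array([1, 2]).hex())  # 00020000000100000002
```

Here the length is the number of items, not the number of bytes: the two-item array above occupies 10 bytes in total.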
Lengths are not a type but are used to specify the number of bytes in a string or the number of items in an array. A length is encoded as follows:
If the length is less than 2^15 it is stored as an int16. Otherwise it is stored as an int32 with the two most significant bits set to 1.
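The rule above can be sketched as an encoder and a matching decoder. Both function names are hypothetical. The sketch assumes lengths must stay below 2^30, since two of the 32 bits act as a flag, and it places a length of exactly 2^15 in the long form because that value does not fit in a signed int16.

```python
import struct

def encode_length(n):
    """Short form (int16) below 2**15, long form (flagged int32) otherwise."""
    if n < 0 or n >= 2**30:  # assumed limit: 30 bits remain for the value
        raise ValueError("length out of range for this sketch")
    if n < 2**15:
        return struct.pack(">h", n)
    return struct.pack(">I", n | 0xC0000000)

def decode_length(buf):
    """Return (length, number of bytes consumed) from the start of buf."""
    if (buf[0] & 0xC0) == 0xC0:
        # Top two bits set: long form, so mask the flag bits off.
        (raw,) = struct.unpack(">I", buf[:4])
        return raw & 0x3FFFFFFF, 4
    (n,) = struct.unpack(">h", buf[:2])
    return n, 2

print(encode_length(5).hex())      # 0005
print(encode_length(32768).hex())  # c0008000
print(decode_length(encode_length(100000)))  # (100000, 4)
```

A decoder can tell the two forms apart from the first byte alone: a non-negative int16 always has its top bit clear, so it can never begin with two set bits.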