Since: Tarantool EE 3.7
The MemCS engine now supports dictionary-encoded columns. They are
created with the layout option (here should be link to layout doc page):
-- `format` and `field_count` are assumed to be defined beforehand.
local s = box.schema.create_space('test', {
    engine = 'memcs', format = format, field_count = field_count,
})
local pk = s:create_index('pk', {layout = 'dict'})
Only string non-key columns can use this layout. Other columns silently
ignore this option.
The maximum number of unique values in such a column (in other words,
the maximum dictionary size) is UINT16_MAX (65535). Once the dictionary
is full, writing a new unique value fails with an error. Each stored
value is a dictionary index that occupies 2 bytes (the uint16 type is
used for indexes under the hood). Hence, such a column occupies
2 * space_size + dict_size bytes.
The dictionary is accounted for in the space:bsize() statistics.
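The size formula above can be checked with a small standalone Lua
sketch. It assumes that space_size is the number of tuples and dict_size
is the dictionary size in bytes; the function name dict_column_bytes is
hypothetical and is not part of the Tarantool API.

```lua
-- Rough footprint of one dictionary-encoded column, in bytes.
-- Assumption: space_size is the tuple count and dict_size is the
-- dictionary size in bytes. This helper is illustrative only.
local INDEX_SIZE = 2 -- each stored value is a uint16 dictionary index

local function dict_column_bytes(space_size, dict_size)
    return INDEX_SIZE * space_size + dict_size
end

-- 1,000,000 tuples drawing from 100 distinct 10-byte strings.
-- Only the string payload is counted as dictionary size here.
local encoded = dict_column_bytes(1000000, 100 * 10)
-- The same payload stored inline, ignoring any per-value overhead.
local plain = 1000000 * 10
print(encoded, plain)
```

Under these assumptions the encoded column takes 2,001,000 bytes
against 10,000,000 bytes of raw payload. The saving comes from low
cardinality: the index array costs the same 2 bytes per tuple whatever
the string length, so the layout pays off when values repeat often.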
An Arrow stream over dictionary-encoded columns always returns values
in the dictionary-encoded Arrow layout. The dictionary is returned in
the string-view layout, and the indexes have the uint16 type. When an
Arrow stream is used, the following guarantees hold for dictionaries:
- Unless a new unique value has been written to the space, all batches
have the same dictionary.
- Dictionaries are not copied, so exporting them to an ArrowArray is cheap.
- A dictionary can only grow, so values in the middle of the dictionary
are never deleted. Hence, after the dictionary has changed, the new
dictionary can still be used for batches produced with the old one.
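
The grow-only guarantee can be illustrated with a toy model in plain
Lua. This is a sketch of the general technique, not the MemCS
implementation: the names dict, encode, encode_batch and decode are all
hypothetical, and indexes are 0-based only to mirror Arrow conventions.

```lua
-- Toy model of an append-only dictionary. Not the MemCS implementation.
local dict = {}      -- index -> value
local index_of = {}  -- value -> index
local dict_len = 0

-- Return the index of a value, appending it to the dictionary if new.
local function encode(value)
    local idx = index_of[value]
    if idx == nil then
        idx = dict_len -- new values always go to the end
        dict[idx] = value
        index_of[value] = idx
        dict_len = dict_len + 1
    end
    return idx
end

-- Encode a list of values into a batch of indexes.
local function encode_batch(values)
    local batch = {}
    for i, value in ipairs(values) do
        batch[i] = encode(value)
    end
    return batch
end

-- Decode a batch with the current state of the dictionary.
local function decode(batch)
    local out = {}
    for i, idx in ipairs(batch) do
        out[i] = dict[idx]
    end
    return out
end

-- The first batch is produced while the dictionary holds two values.
local batch1 = encode_batch({'red', 'green', 'red'})
-- A new unique value ('blue') grows the dictionary.
local batch2 = encode_batch({'blue', 'green'})

-- batch1 still decodes correctly with the grown dictionary, because
-- existing indexes never move or disappear.
print(table.concat(decode(batch1), ','))
print(table.concat(decode(batch2), ','))
```

Because entries are only appended, a consumer that keeps the latest
dictionary can decode every earlier batch without re-fetching or
remapping indexes.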
Requested by @drewdzzz in https://github.com/tarantool/tarantool-ee/commit/cd7bd1ca233b1db4211f57439ef08364d4d27c6b.