Compression cache of numeric docvalues #14803

Open
@gf2121

Description

While recently benchmarking some OLAP engines (no indexes, no stored fields, only column data), I was surprised to find that they need only 50-70% of the storage that NumericDocValues uses, with comparable performance. I looked into their implementation, and it turns out they simply apply BitShuffle and LZ4 to compress data blocks on the write side, and use a global cache on the read side to hold decompressed blocks.
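To make the write-side idea concrete, here is a minimal sketch of a bit-shuffle step followed by a general-purpose compressor. It is not taken from any of the engines benchmarked: `BitShuffleBlock` and its methods are hypothetical names, the bit-plane layout is one simple way to do the transpose, and `java.util.zip.Deflater` stands in for LZ4 only because the JDK does not ship an LZ4 implementation.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper for illustration; not a Lucene class.
final class BitShuffleBlock {

  // Transpose values into 64 bit planes: plane b holds bit b of every value,
  // packed 8 values per byte. Planes that are constant across the block become
  // long runs of identical bytes, which a byte-oriented compressor handles well.
  static byte[] shuffle(long[] values) {
    int bytesPerPlane = (values.length + 7) / 8;
    byte[] out = new byte[64 * bytesPerPlane];
    for (int i = 0; i < values.length; i++) {
      long v = values[i];
      for (int b = 0; b < 64; b++) {
        if (((v >>> b) & 1L) != 0) {
          out[b * bytesPerPlane + (i >>> 3)] |= (byte) (1 << (i & 7));
        }
      }
    }
    return out;
  }

  // Inverse of shuffle(): rebuild each value from its 64 bit planes.
  static long[] unshuffle(byte[] planes, int count) {
    int bytesPerPlane = (count + 7) / 8;
    long[] values = new long[count];
    for (int b = 0; b < 64; b++) {
      for (int i = 0; i < count; i++) {
        int bit = (planes[b * bytesPerPlane + (i >>> 3)] >>> (i & 7)) & 1;
        values[i] |= ((long) bit) << b;
      }
    }
    return values;
  }

  // Stand-in for LZ4: DEFLATE at its fastest level.
  static byte[] compress(byte[] input) {
    Deflater deflater = new Deflater(Deflater.BEST_SPEED);
    deflater.setInput(input);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    while (!deflater.finished()) {
      out.write(buf, 0, deflater.deflate(buf));
    }
    deflater.end();
    return out.toByteArray();
  }

  // The caller must remember the uncompressed length (e.g. in block metadata).
  static byte[] decompress(byte[] input, int rawLength) {
    Inflater inflater = new Inflater();
    inflater.setInput(input);
    byte[] out = new byte[rawLength];
    try {
      int off = 0;
      while (off < rawLength) {
        int n = inflater.inflate(out, off, rawLength - off);
        if (n == 0) {
          throw new IllegalStateException("truncated block");
        }
        off += n;
      }
    } catch (DataFormatException e) {
      throw new IllegalStateException("corrupt block", e);
    } finally {
      inflater.end();
    }
    return out;
  }
}
```

A block would be written as `compress(shuffle(values))` and read back as `unshuffle(decompress(bytes, rawLength), count)`. The compression ratio depends heavily on the data, so this sketch says nothing about whether the 50-70% figure would carry over.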

So Lucene keeps uncompressed data both on disk and in memory (via mmap), whereas these engines keep compressed data on disk and decompressed data in memory, which sounds quite reasonable to me. I believe something like a global cache can easily be implemented in a service (such as ES) through a custom codec, but I still wonder whether we can do something in our default codec.
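For discussion, the read-side global cache could look roughly like the sketch below: a bounded LRU map from (file, block index) to the decompressed block. Everything here is an assumption for illustration. `DecompressedBlockCache` is a hypothetical name, the cache is bounded by block count rather than bytes, and a single lock is used for simplicity where a real shared cache would likely need something more concurrent.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical cache for illustration; not a Lucene class.
final class DecompressedBlockCache {
  private final LinkedHashMap<String, long[]> lru;

  DecompressedBlockCache(final int maxBlocks) {
    // accessOrder = true makes iteration order least-recently-used first,
    // so the "eldest" entry is the one to evict.
    this.lru = new LinkedHashMap<String, long[]>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, long[]> eldest) {
        return size() > maxBlocks;
      }
    };
  }

  // Returns the cached block, or runs the loader (read + decompress) on a miss.
  synchronized long[] get(String file, long blockIndex, Supplier<long[]> loader) {
    String key = file + "#" + blockIndex;
    long[] block = lru.get(key);
    if (block == null) {
      block = loader.get();
      lru.put(key, block);
    }
    return block;
  }

  synchronized int size() {
    return lru.size();
  }
}
```

A reader would call something like `cache.get("_0.dvd", blockIndex, () -> readAndDecompress(blockIndex))`, where the file name is only an example and `readAndDecompress` is a placeholder for whatever the codec does on a miss. Holding the lock while loading serializes misses, which keeps the sketch short but would be a bottleneck in practice.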
