Description
When I recently benchmarked some OLAP engines (no indexes, no stored fields, only column data), the results showed that they occupy only 50-70% of the storage that NumericDocValues needs, with comparable performance, which is surprising. I looked into their implementation, and it turns out they simply use BitShuffle and LZ4 to compress data blocks on the write side, and use a global cache on the read side to cache decompressed data.
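To make the write-side idea concrete, here is a minimal sketch of a bit-shuffle (bit transpose) over a block of longs followed by a general-purpose compressor. It is only an illustration under assumptions: the class and method names are hypothetical, the block length is assumed to be a multiple of 8, and `java.util.zip.Deflater` stands in for LZ4 because the JDK ships no LZ4 implementation. It is not the code those engines use.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch: bit-shuffle a block of longs, then compress the result.
class BitShuffleSketch {

    // Plane b of the output holds bit b of every value, 8 values per byte.
    // Values that share high bits produce long runs of identical bytes.
    static byte[] shuffle(long[] values) {
        int n = values.length;
        if (n % 8 != 0) throw new IllegalArgumentException("block size must be a multiple of 8");
        int planeBytes = n / 8;
        byte[] out = new byte[64 * planeBytes];
        for (int bit = 0; bit < 64; bit++) {
            for (int i = 0; i < n; i++) {
                if (((values[i] >>> bit) & 1L) != 0) {
                    out[bit * planeBytes + (i >>> 3)] |= (byte) (1 << (i & 7));
                }
            }
        }
        return out;
    }

    // Inverse of shuffle: rebuild n longs from the 64 bit planes.
    static long[] unshuffle(byte[] planes, int n) {
        int planeBytes = n / 8;
        long[] values = new long[n];
        for (int bit = 0; bit < 64; bit++) {
            for (int i = 0; i < n; i++) {
                if ((planes[bit * planeBytes + (i >>> 3)] & (1 << (i & 7))) != 0) {
                    values[i] |= 1L << bit;
                }
            }
        }
        return values;
    }

    // Stand-in for LZ4: DEFLATE at its fastest setting.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int len = deflater.deflate(buf);
            bos.write(buf, 0, len);
        }
        deflater.end();
        return bos.toByteArray();
    }

    static byte[] decompress(byte[] compressed, int originalLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] out = new byte[originalLength];
        int off = 0;
        try {
            while (off < originalLength && !inflater.finished()) {
                int read = inflater.inflate(out, off, originalLength - off);
                if (read == 0 && inflater.needsInput()) break; // truncated input
                off += read;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block", e);
        } finally {
            inflater.end();
        }
        if (off != originalLength) throw new IllegalStateException("short block");
        return out;
    }

    public static void main(String[] args) {
        long[] values = new long[1024];
        for (int i = 0; i < values.length; i++) values[i] = 1_000_000L + (i % 50);
        byte[] planes = shuffle(values);
        byte[] packed = compress(planes);
        long[] restored = unshuffle(decompress(packed, planes.length), values.length);
        System.out.println("raw bytes: " + values.length * 8 + ", compressed bytes: " + packed.length);
        System.out.println("round trip ok: " + Arrays.equals(values, restored));
    }
}
```

The transpose itself saves nothing; it only groups the mostly constant high-order bits together so that the byte-oriented compressor that follows can find long repeats.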
So in Lucene, we have uncompressed data (MMap) both on disk and in memory, whereas they have compressed data on disk and decompressed data in memory, which sounds quite reasonable to me. I believe that something like a global cache can easily be done in a service (like ES) through a custom codec, but I still wonder whether we can do something in our default codec.
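For reference, the read-side cache could look roughly like the sketch below: a small LRU map from a block key to its decompressed values, filled on demand by a loader. Everything here is hypothetical (the class name, the `file#block` key format, the loader callback), and none of it is an existing Lucene or ES API; a real cache would also bound memory by bytes rather than by block count.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a process-wide LRU cache of decompressed blocks.
final class DecompressedBlockCache {
    private final Map<String, long[]> lru;
    private int loads; // number of cache misses, kept for illustration

    DecompressedBlockCache(final int maxBlocks) {
        // accessOrder=true makes iteration order least-recently-used first.
        this.lru = new LinkedHashMap<String, long[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, long[]> eldest) {
                return size() > maxBlocks;
            }
        };
    }

    // Assumed key format: file name plus block index.
    static String key(String file, int blockIndex) {
        return file + "#" + blockIndex;
    }

    // Returns the cached block, or decompresses it through the loader on a miss.
    synchronized long[] get(String key, Function<String, long[]> loader) {
        long[] block = lru.get(key);
        if (block == null) {
            block = loader.apply(key);
            loads++;
            lru.put(key, block);
        }
        return block;
    }

    synchronized int loads() {
        return loads;
    }
}
```

A single lock keeps the sketch short; a default-codec implementation would have to decide where such a cache lives and how it is shared, which is exactly the open question above.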