
Added compression NONE #1045

Merged: 11 commits into ClickHouse:master from the nocompression branch, Aug 1, 2017

Conversation

@prog8 (Contributor) commented Jul 31, 2017

At the moment, no-compression is implemented in the most naive way: we use memcpy to copy an uncompressed block into the compressed buffer. This does not differ from compression, except that we don't apply any compression algorithm. Ideally we could avoid the memcpy, but this will probably require deeper refactoring.
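
A minimal sketch of the naive path described here (illustrative only: the 1-byte method tag plus two UInt32 sizes header layout is taken from the LZ4/ZSTD reading code quoted later in this thread; names and the NONE method byte are assumptions, not the actual patch):

#include <cstdint>
#include <cstring>

static const uint8_t METHOD_NONE = 0x02;  /// assumed value of CompressionMethodByte::NONE
static const uint32_t HEADER_SIZE = 9;    /// 1 (method) + 4 (compressed size) + 4 (decompressed size)

size_t compressNone(const char * source, uint32_t source_size, char * dest)
{
    uint32_t size_compressed = HEADER_SIZE + source_size;
    dest[0] = static_cast<char>(METHOD_NONE);
    memcpy(&dest[1], &size_compressed, sizeof(size_compressed));  /// compressed size, incl. header
    memcpy(&dest[5], &source_size, sizeof(source_size));          /// decompressed size
    memcpy(&dest[HEADER_SIZE], source, source_size);              /// the copy the author wants to avoid
    return size_compressed;
}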

@robot-metrika-test commented

Can one of the admins verify this patch?


if (method == static_cast<UInt8>(CompressionMethodByte::LZ4) || method == static_cast<UInt8>(CompressionMethodByte::ZSTD))
{
    size_compressed = unalignedLoad<UInt32>(&own_compressed_buffer[1]);
    size_decompressed = unalignedLoad<UInt32>(&own_compressed_buffer[5]);
}
else if (method == static_cast<UInt8>(CompressionMethodByte::NONE))

Member commented:

This does not differ from above.
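
In other words, since the NONE branch reads the header exactly like LZ4/ZSTD, the cases could share one load; a sketch of what merging the identical branches might look like (not from the actual patch):

if (method == static_cast<UInt8>(CompressionMethodByte::LZ4)
    || method == static_cast<UInt8>(CompressionMethodByte::ZSTD)
    || method == static_cast<UInt8>(CompressionMethodByte::NONE))
{
    size_compressed = unalignedLoad<UInt32>(&own_compressed_buffer[1]);
    size_decompressed = unalignedLoad<UInt32>(&own_compressed_buffer[5]);
}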

@@ -1030,8 +1030,11 @@ MergeTreeData::AlterDataPartTransactionPtr MergeTreeData::alterDataPart(
        *this, part, DEFAULT_MERGE_BLOCK_SIZE, 0, 0, expression->getRequiredColumns(), ranges,
        false, nullptr, "", false, 0, DBMS_DEFAULT_BUFFER_SIZE, false);

    auto compression_method = this->context.chooseCompressionMethod(
        this->getTotalActiveSizeInBytes(),

Member commented:

This looks incorrect. We need the ratio of the data part size to the total table size.

Contributor Author commented:

Does that mean I should take this->getTotalActiveSizeInBytes() and divide it by the total compressed size? I'm asking because I'm not fully aware of which methods stand for which values, and I'm not very familiar with the ClickHouse code.

@@ -146,8 +146,12 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataWriter::writeTempPart(BlockWithDa
        ProfileEvents::increment(ProfileEvents::MergeTreeDataWriterBlocksAlreadySorted);
    }

    auto compression_method = data.context.chooseCompressionMethod(
        data.getTotalActiveSizeInBytes(),
        static_cast<double>(data.getTotalCompressedSize()) / data.getTotalActiveSizeInBytes());

Member commented:

Same.

@@ -146,8 +146,12 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataWriter::writeTempPart(BlockWithDa
        ProfileEvents::increment(ProfileEvents::MergeTreeDataWriterBlocksAlreadySorted);
    }

    auto compression_method = data.context.chooseCompressionMethod(
        data.getTotalActiveSizeInBytes(),
        static_cast<double>(data.getTotalActiveSizeInBytes()) / data.getTotalCompressedSize());

Member commented:

This is a little tricky. You need to provide the data part size and the ratio of the data part size to the whole table size as arguments of chooseCompressionMethod. But the data part size is not known in advance, before we write it.

This logic is intended for cases when existing data gets recompressed (as in merges and alters), and we take the data part size to decide whether we need to recompress it.
But when writing a new part, it is meaningless.

I think you need to pass two zeros as arguments. A compression method will be selected only if both min_part_size and min_part_size_ratio are zero or were not specified in the configuration for the corresponding compression method.

You will definitely have to specify min_part_size and min_part_size_ratio as zeros for the none compression method to be chosen, because otherwise lz4 will be used.
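
A rough sketch of the selection logic being described (hypothetical names; the real chooseCompressionMethod lives elsewhere and reads the compression cases from the server configuration):

#include <string>
#include <vector>

struct CompressionCase
{
    std::size_t min_part_size = 0;    /// min_part_size from the config
    double min_part_size_ratio = 0;   /// min_part_size_ratio from the config
    std::string method;               /// "lz4", "zstd" or "none"
};

std::string chooseCompressionMethod(std::size_t part_size, double part_size_ratio,
                                    const std::vector<CompressionCase> & cases)
{
    std::string chosen = "lz4";       /// default when no case matches
    for (const auto & c : cases)
        if (part_size >= c.min_part_size && part_size_ratio >= c.min_part_size_ratio)
            chosen = c.method;
    return chosen;
}

Called with (0, 0), only a case whose thresholds are both zero can match, which is why none has to be configured with zero min_part_size and min_part_size_ratio.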

Contributor Author commented:

Honestly, my local version of the code started out with a single compression method (both min_part_size and min_part_size_ratio set to 0 in the config), but when preparing the pull request I thought that, for it to be accepted, it should respect the existing configuration in the "compression" tag, so I used chooseCompressionMethod as intended. If you are OK with chooseCompressionMethod(0, 0), I can add it.

@alexey-milovidov (Member) left a comment

.

@alexey-milovidov merged commit ae8783a into ClickHouse:master on Aug 1, 2017
@prog8 (Contributor, Author) commented Aug 1, 2017

Thanks @alexey-milovidov

@alexey-milovidov (Member) commented

Ok. I have added the remaining changes. Thank you!

About copy avoidance: this is definitely worth doing.
For example, look at CompressedReadBufferFromFile::nextImpl.
This method prepares a buffer for the decompressed data (memory), sets working_buffer to point to it, and then decompresses into working_buffer.

If you want to avoid the excessive copy, you should just point working_buffer to the range inside the "compressed" data (all data without the header).
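
A hedged sketch of that idea for the NONE method inside nextImpl (buffer and constant names are assumed from the description above, not copied from the actual code):

if (method == static_cast<UInt8>(CompressionMethodByte::NONE))
{
    /// The payload after the 9-byte header is already the raw data, so point
    /// working_buffer at it instead of decompressing/memcpy-ing into `memory`.
    working_buffer = Buffer(
        compressed_buffer + header_size,
        compressed_buffer + size_compressed);
}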

@alexey-milovidov (Member) commented

Also, it would be nice if you could share performance testing results.
Both the total numbers (query execution speed) and perf listings are interesting!

@prog8 (Contributor, Author) commented Aug 1, 2017

Yeah, I can do a copy-free version, but only for reading; for writes there will still be a memcpy because of the hash function (checksum).
I don't think I will use the no-compression version in production, because it turns out I would waste too much disk space; I cannot afford to speed up queries at the cost of storage usage. I think it is better to invest in more CPU cores and get the same results in terms of query speed.
Since I had started playing with this, I decided to push even a small change instead of abandoning it, so maybe someone will make use of it.

@alexey-milovidov (Member) commented

Ok.

@prog8 deleted the nocompression branch on August 2, 2017 09:12
@prog8 (Contributor, Author) commented Aug 2, 2017

@alexey-milovidov I just remembered why I didn't drop the memcpy for decompression. It is not a low-hanging-fruit task, because CompressedReadBuffer allocates the memory, so I'd have to refactor CompressedReadBuffer. In addition, I would have to change CompressedReadBufferBase::decompress to accept a pointer to a pointer instead of a char *. It is all doable, but I didn't feel comfortable making bigger changes.
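
For reference, an illustrative sketch of the signature change being described (hypothetical, not the actual ClickHouse interface): decompress would receive a pointer to the destination pointer, so the NONE method could redirect it instead of filling a caller-allocated buffer.

/// Hypothetical: `char ** to` lets the NONE method redirect the output without a copy.
void decompress(char ** to, size_t size_decompressed)
{
    if (method == static_cast<UInt8>(CompressionMethodByte::NONE))
        *to = compressed_buffer + header_size;  /// point at the raw payload, no memcpy
    else
        decompressImpl(compressed_buffer + header_size, *to, size_decompressed);  /// LZ4/ZSTD path (decompressImpl is a stand-in name)
}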

@alexey-milovidov (Member) commented

Ok. If you are not going to use this compression method, it's not worth implementing.
