Added compression NONE #1045
Conversation
Can one of the admins verify this patch?
```cpp
if (method == static_cast<UInt8>(CompressionMethodByte::LZ4) || method == static_cast<UInt8>(CompressionMethodByte::ZSTD))
{
    size_compressed = unalignedLoad<UInt32>(&own_compressed_buffer[1]);
    size_decompressed = unalignedLoad<UInt32>(&own_compressed_buffer[5]);
}
else if (method == static_cast<UInt8>(CompressionMethodByte::NONE))
```
This does not differ from above.
```cpp
@@ -1030,8 +1030,11 @@ MergeTreeData::AlterDataPartTransactionPtr MergeTreeData::alterDataPart(
        *this, part, DEFAULT_MERGE_BLOCK_SIZE, 0, 0, expression->getRequiredColumns(), ranges,
        false, nullptr, "", false, 0, DBMS_DEFAULT_BUFFER_SIZE, false);

    auto compression_method = this->context.chooseCompressionMethod(
        this->getTotalActiveSizeInBytes(),
```
Looks incorrect. We need the ratio of the size of the data part to the size of the table.
Does that mean I should take `this->getTotalActiveSizeInBytes()` and divide it by the total compressed size? I'm asking because I'm not fully aware of which methods stand for which values, and I'm not very familiar with the CH code.
```cpp
@@ -146,8 +146,12 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataWriter::writeTempPart(BlockWithDa
        ProfileEvents::increment(ProfileEvents::MergeTreeDataWriterBlocksAlreadySorted);
    }

    auto compression_method = data.context.chooseCompressionMethod(
        data.getTotalActiveSizeInBytes(),
        static_cast<double>(data.getTotalCompressedSize()) / data.getTotalActiveSizeInBytes());
```
Same.
```cpp
@@ -146,8 +146,12 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataWriter::writeTempPart(BlockWithDa
        ProfileEvents::increment(ProfileEvents::MergeTreeDataWriterBlocksAlreadySorted);
    }

    auto compression_method = data.context.chooseCompressionMethod(
        data.getTotalActiveSizeInBytes(),
        static_cast<double>(data.getTotalActiveSizeInBytes()) / data.getTotalCompressedSize());
```
This is a little tricky. You need to pass the data part size and the ratio of the data part size to the whole table size as arguments of `chooseCompressionMethod`. But the data part size is not known in advance, before we write it.
This logic is intended for cases when existing data is recompressed (as with merges and alters), where we use the data part size to decide whether we need to recompress it.
But when writing a new part, it is meaningless.
I think you need to pass two zeros as arguments. A compression method will be selected only if both `min_part_size` and `min_part_size_ratio` are zero, or were not specified in the configuration for the corresponding compression method.
You will definitely specify `min_part_size` and `min_part_size_ratio` as zeros when choosing the compression method `none`, because otherwise `lz4` will be used.
Honestly, my local version of the code started out working with a single compression method (both `min_part_size` and `min_part_size_ratio` set to 0 in the config). But then, preparing the pull request, I thought that if it is to be accepted, it should respect the existing configuration of the "compression" tag, so I used `chooseCompressionMethod` as intended. If you are OK with `chooseCompressionMethod(0, 0)`, I can add it.
Thanks @alexey-milovidov
Ok. I have added the remaining changes. Thank you! About copy avoidance: this is definitely worth doing. If you want to avoid the excessive copy, you should just point
Also, it would be nice if you could share performance testing results.
Yeah, I can do a copy-free version, but that is only for reading; for writes there will still be
Ok.
@alexey-milovidov I just reminded myself why I didn't drop
Ok. If you are not going to use this compression method, it's not worth implementing.
At this moment no-compression is implemented in the most naive way. This means we use `memcpy` to copy an uncompressed block to a compressed buffer. This doesn't differ from compression except that we don't use any compressing algorithm. Ideally we could avoid the `memcpy`, but that will probably require deeper refactoring.