VReplication: Support MySQL Binary Log Transaction Compression #12950
Conversation
Signed-off-by: Matt Lord <mattalord@gmail.com>
Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General
- If a new flag is being introduced:
- If a workflow is added or modified:
- Bug fixes
- Non-trivial changes
- New/Existing features
- Backward compatibility
We needed more random data in order to generate the necessary binlog size when compression is enabled.
Very nice :-)

Similar to VStreamerEventsStreamed, should we add a VStreamerCompressedEventsStreamed, and possibly metrics for bytes before/after compression for visibility? Not sure if it will add value, so just thinking aloud here.
Good idea. We don't stream the compressed events themselves, but we decode them and stream the internal events, so I added a metric for that. I'll open a docs PR to document it.
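For illustration, a counter like the one discussed might look like the sketch below. The metric name is hypothetical (the actual name added in the PR is not shown in this thread), and the standard library's expvar package stands in for Vitess's own go/stats package:

```go
package main

import (
	"expvar"
	"fmt"
)

// Hypothetical metric name; the actual metric added in this PR may
// differ. expvar is used here purely for illustration -- Vitess has
// its own stats package for exporting counters.
var compressedTransactionsDecoded = expvar.NewInt("VStreamerCompressedTransactionsDecoded")

func main() {
	// Each time a compressed transaction payload is decoded, the
	// counter would be incremented by the vstreamer.
	compressedTransactionsDecoded.Add(1)
	fmt.Println(compressedTransactionsDecoded.Value())
}
```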
```go
// Create a reader that caches decompressors.
var zstdDecoder, _ = zstd.NewReader(nil, zstd.WithDecoderConcurrency(0))
```
Is this concurrency safe? Would we ever have multiple decodings happening in parallel?
Ah wait, looks like https://pkg.go.dev/github.com/klauspost/compress/zstd#NewReader answers this, as we're using DecodeAll.

Would it be useful at some point to do streaming mode? Since, from what I understand, a single compressed entry can contain multiple events, we could already emit events while decompressing so we don't have to decompress everything into memory.
I was using the streaming mode at first and it's pretty slow. It would make sense to potentially switch to that based on the uncompressed payload size.... maybe we switch to streaming if it's > 128MiB or something?
@mattlord Hmm, interesting. How slow? Seems like maybe something we should dig into at some point but it doesn't have to hold up this PR I think.
Not THAT slow, but slow enough that it caused failures in the OnlineDDL VRepl stress tests — which have a bunch of threads executing small transactions against the DB while the workflow is running — as the vstream could not keep up enough to perform the final cutover before hitting the cutover timeout.
@mattlord That reminds me. I think we should update to the latest https://github.com/klauspost/compress here as well; looks like there were quite a few zstd improvements recently.
Thanks for the review, @dbussink! I think that I've now addressed most of your comments.
go/mysql/binlog_event_compression.go
Outdated
```go
// At what size should we switch from the in-memory buffer
// decoding to streaming mode -- which is slower, but does not
// require everything be done in memory.
const zstdInMemoryDecompressorMaxSize = 128 << (10 * 2) // 128MB
```
Maybe we should make this a vttablet flag? ...
Just a few comments. I realize you've turned some magic numbers into constants, but while we're here: the current codebase makes false assumptions when parsing the events (it assumes CRC32 is enabled when it might not be).

Could we add a unit test file in go/mysql to test the new TransactionPayload parsing?
go/mysql/binlog_event_common.go
Outdated
```go
// Offset from 0 where the 4 byte length is stored.
binlogEventLenOffset = 9
// Byte length of the checksum suffix.
binlogChecksumLen = 4
```
Please rename this as binlogCRC32ChecksumLen; there are potentially, even if not practically, other types of checksums with different lengths.
Please make all of the above public constants; I have a use case to use those values externally (and I have, like you, created constants for those outside Vitess).
go/mysql/binlog_event_common.go
Outdated
```diff
 // dataBytes returns the event bytes without header prefix and without checksum suffix
 func (ev binlogEvent) dataBytes(f BinlogFormat) []byte {
 	data := ev.Bytes()[f.HeaderLength:]
-	data = data[0 : len(data)-4]
+	data = data[0 : len(data)-binlogChecksumLen]
```
We should really check BinlogFormat to see which checksum is being used, before assuming a checksum is used at all. Sample code:

```go
switch f.ChecksumAlgorithm {
case mysql.BinlogChecksumAlgCRC32:
	// ... size is binlogChecksumLen
case mysql.BinlogChecksumAlgOff:
	// ... size is 0
default:
	return vterrors.Errorf(vtrpc.Code_INVALID_ARGUMENT, "unsupported checksum algorithm: %v", format.ChecksumAlgorithm)
}
```
I'd rather not expand this PR. I don't disagree with you that we can improve this but it's not directly related and I want to keep this focused.
go/mysql/binlog_event_mariadb.go
Outdated
```diff
-	checksum := data[length-4:]
-	data = data[:length-4]
+	checksum := data[length-binlogChecksumLen:]
+	data = data[:length-binlogChecksumLen]
```
Again, we must not assume CRC32 here.
Again, sure. But it's not related to this PR, I only made the 4 less "magical". 🙂
It's also worth noting that the checksum is either 0 bytes, when there is no checksum, or it's 4 bytes regardless of the algorithm used. In both MySQL and MariaDB. Although in both cases none and CRC32 are the ONLY options supported today:
https://dev.mysql.com/doc/refman/8.0/en/replication-options-binary-log.html#option_mysqld_binlog-checksum
So we're being forward looking here, which I'm all for. But again, I don't want to expand the scope of this PR.
👍
```go
var transactionPayloadCompressionTypes = map[uint64]string{
	transactionPayloadCompressionZstd: "ZSTD",
	transactionPayloadCompressionNone: "NONE",
}
```
I might have a need for these to be public too, but I will iterate in a different PR if needed.
Thank you for the review, @shlomi-noach! ❤️ I believe that I've addressed all of your comments now (without expanding the scope of the PR too much). Please let me know if I missed something though.
This looks great. I think we can still remove some allocations from the ZSTD decompression code but I don't want to block the PR, since the behavior is correct. I'll run some benchmarks next week!
@vmg Yeah, I agree. That's why I wanted you, the perf master, to review. 😄 Thank you! ❤️
…ansaction Compression (vitessio#2047): cherry pick of 12950; fix merge conflicts. Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com> Co-authored-by: Dirkjan Bussink <d.bussink@gmail.com>
Description
MySQL 8.0 added support for binary log compression via transaction (GTID) compression in 8.0.20. You can read more about this feature here: https://dev.mysql.com/doc/refman/8.0/en/binary-log-transaction-compression.html
This can, at the cost of increased CPU usage, dramatically reduce the amount of data sent over the wire for MySQL replication, while also dramatically reducing the overall storage space needed to retain binary logs (for replication, backup and recovery, CDC, etc.). For larger installations this was a very desirable feature, and while you could technically use it with Vitess (the MySQL replica sets making up each shard could use it fine), there was one big limitation: VReplication workflows would not work. Given the criticality of VReplication workflows within Vitess, this meant that in practice this MySQL feature was not usable within Vitess clusters.
This PR addresses the issue by adding support for processing the compressed transaction events in VReplication, without any (known) limitations.
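To illustrate the core idea: once a Transaction_payload event's body has been decompressed, it is simply a concatenation of ordinary binlog events, each carrying its total size as a 4-byte little-endian field at offset 9 of its 19-byte v4 header. The sketch below, with hypothetical names and not Vitess's actual code, shows how such a payload can be split into the individual internal events that VReplication then processes:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	binlogEventHeaderLen = 19 // v4 binlog event header length
	binlogEventLenOffset = 9  // offset of the 4-byte total event length
)

// splitEvents walks a decompressed transaction payload and returns the
// bytes of each contained binlog event. Illustrative sketch only.
func splitEvents(payload []byte) ([][]byte, error) {
	var events [][]byte
	for len(payload) > 0 {
		if len(payload) < binlogEventHeaderLen {
			return nil, fmt.Errorf("truncated event header: %d bytes left", len(payload))
		}
		size := binary.LittleEndian.Uint32(payload[binlogEventLenOffset : binlogEventLenOffset+4])
		if size < binlogEventHeaderLen || uint32(len(payload)) < size {
			return nil, fmt.Errorf("invalid event size %d", size)
		}
		events = append(events, payload[:size])
		payload = payload[size:]
	}
	return events, nil
}

// mkEvent builds a zeroed fake event of the given total size with only
// the length field populated, for demonstration purposes.
func mkEvent(size uint32) []byte {
	ev := make([]byte, size)
	binary.LittleEndian.PutUint32(ev[binlogEventLenOffset:], size)
	return ev
}

func main() {
	// Two fake events of total sizes 25 and 19 bytes.
	payload := append(mkEvent(25), mkEvent(19)...)
	events, err := splitEvents(payload)
	fmt.Println(err == nil, len(events))
}
```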
Related Issue(s)
Checklist