New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Event buffer size limit #4296
Event buffer size limit #4296
Conversation
transactionPolicy: workflow.TransactionPolicyActive, | ||
signals: 2, | ||
maxEvents: 2, | ||
maxSizeInBytes: 0, | ||
expectFlush: true, | ||
}, | ||
} { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test to check if the calculation is done correctly? Something like two signals each has 50 bytes of data and threshold is 80 bytes, then assert flush is triggered.
Seems like existing tests can be passed even if the size calculation logic just return the buffer event count.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The size is non-deterministic unfortunately, but I added a test to verify that it's non-zero and less than 2x the normal size I computed.
service/history/configs/config.go
Outdated
ShardSyncMinInterval: dc.GetDurationProperty(dynamicconfig.ShardSyncMinInterval, 5*time.Minute), | ||
ShardSyncTimerJitterCoefficient: dc.GetFloat64Property(dynamicconfig.TransferProcessorMaxPollIntervalJitterCoefficient, 0.15), | ||
MaximumBufferedEventsBatch: dc.GetIntProperty(dynamicconfig.MaximumBufferedEventsBatch, 100), | ||
MaximumBufferedEventsSizeInBytes: dc.GetIntProperty(dynamicconfig.MaximumBufferedEventsSizeInBytes, 1<<20), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @yiminc any comment on the limit?
I think we want to use 2MB?
nit: I think x * 1024 * 1024
is easier to understand than x << 20
, but no strong opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed to be 2 * 1024 * 1024
func (b *HistoryBuilder) SizeInBytesOfBufferedEvents() int { | ||
size := 0 | ||
for _, ev := range append(b.dbBufferBatch, b.memBufferBatch...) { | ||
size += ev.Size() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
return len(b.dbBufferBatch) + len(b.memBufferBatch) | ||
} | ||
|
||
func (b *HistoryBuilder) SizeInBytesOfBufferedEvents() int { | ||
size := 0 | ||
for _, ev := range append(b.dbBufferBatch, b.memBufferBatch...) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I would write two for loop and avoid additional allocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -4601,6 +4601,17 @@ func (ms *MutableStateImpl) closeTransactionWithPolicyCheck( | |||
} | |||
} | |||
|
|||
func (ms *MutableStateImpl) BufferSizeAcceptable() bool { | |||
if ms.hBuilder.NumBufferedEvents() >= ms.config.MaximumBufferedEventsBatch() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can change this to >
, so that when someone set the config to be 2, it really mean 2 event can be buffered, not 1...
should be safe for existing deployment as well as we are basically allowing one more event.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed for both.
0d74a73
to
6115b16
Compare
6115b16
to
02df343
Compare
* Event buffer size limit * Make size limit inclusive * Make default size limit more readable * Add history builder test for proto size * Split db and memory batch for loop * Use values instead of pointers to avoid parallel test glitch * Make a copy of the test case for parallel test to work
* Event buffer size limit * Make size limit inclusive * Make default size limit more readable * Add history builder test for proto size * Split db and memory batch for loop * Use values instead of pointers to avoid parallel test glitch * Make a copy of the test case for parallel test to work
What changed?
Add a limit on the total size of the history events buffer. Here, size means the sum of the number of bytes of each HistoryEvent proto in the batch, as determined by protobuf.
Why?
Because we don't want to have huge mutable state objects.
How did you test it?
I added test cases which verify that we force-flush and restart WFTs when this limit is exceeded.
Potential risks
There are risks that this will break users that somehow had huge event buffers, but that behavior would already degrade performance anyway.
Is hotfix candidate?
I think we should definitely include a message about this new limit in the patch notes.