Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Event buffer size limit #4296

Merged
merged 7 commits into from May 8, 2023
Merged

Conversation

MichaelSnowden
Copy link
Contributor

What changed?
Add a limit on the total size of the history events buffer. Here, size means the sum of the number of bytes of each HistoryEvent proto in the batch, as determined by protobuf.

Why?
Because we don't want to have huge mutable state objects.

How did you test it?
I added test cases which verify that we force-flush and restart WFTs when this limit is exceeded.

Potential risks
There are risks that this will break users that somehow had huge event buffers, but that behavior would already degrade performance anyway.

Is hotfix candidate?
I think we should definitely include a message about this new limit in the patch notes.

@MichaelSnowden MichaelSnowden requested a review from a team as a code owner May 5, 2023 23:32
transactionPolicy: workflow.TransactionPolicyActive,
signals: 2,
maxEvents: 2,
maxSizeInBytes: 0,
expectFlush: true,
},
} {
Copy link
Member

@yycptt yycptt May 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test to check if the calculation is done correctly? Something like two signals each has 50 bytes of data and threshold is 80 bytes, then assert flush is triggered.
Seems like existing tests can be passed even if the size calculation logic just return the buffer event count.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size is non-deterministic unfortunately, but I added a test to verify that it's non-zero and less than 2x the normal size I computed.

ShardSyncMinInterval: dc.GetDurationProperty(dynamicconfig.ShardSyncMinInterval, 5*time.Minute),
ShardSyncTimerJitterCoefficient: dc.GetFloat64Property(dynamicconfig.TransferProcessorMaxPollIntervalJitterCoefficient, 0.15),
MaximumBufferedEventsBatch: dc.GetIntProperty(dynamicconfig.MaximumBufferedEventsBatch, 100),
MaximumBufferedEventsSizeInBytes: dc.GetIntProperty(dynamicconfig.MaximumBufferedEventsSizeInBytes, 1<<20),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @yiminc any comment on the limit?
I think we want to use 2MB?

nit: I think x * 1024 * 1024 is easier to understand than x << 20, but no strong opinion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed to be 2 * 1024 * 1024

func (b *HistoryBuilder) SizeInBytesOfBufferedEvents() int {
size := 0
for _, ev := range append(b.dbBufferBatch, b.memBufferBatch...) {
size += ev.Size()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

return len(b.dbBufferBatch) + len(b.memBufferBatch)
}

func (b *HistoryBuilder) SizeInBytesOfBufferedEvents() int {
size := 0
for _, ev := range append(b.dbBufferBatch, b.memBufferBatch...) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would write two for loop and avoid additional allocation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -4601,6 +4601,17 @@ func (ms *MutableStateImpl) closeTransactionWithPolicyCheck(
}
}

func (ms *MutableStateImpl) BufferSizeAcceptable() bool {
if ms.hBuilder.NumBufferedEvents() >= ms.config.MaximumBufferedEventsBatch() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can change this to >, so that when someone set the config to be 2, it really mean 2 event can be buffered, not 1...
should be safe for existing deployment as well as we are basically allowing one more event.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed for both.

@MichaelSnowden MichaelSnowden force-pushed the snowden/event-batch-byte-size-limit branch from 0d74a73 to 6115b16 Compare May 8, 2023 21:06
@MichaelSnowden MichaelSnowden force-pushed the snowden/event-batch-byte-size-limit branch from 6115b16 to 02df343 Compare May 8, 2023 21:09
@MichaelSnowden MichaelSnowden merged commit 8e6203a into master May 8, 2023
10 checks passed
@MichaelSnowden MichaelSnowden deleted the snowden/event-batch-byte-size-limit branch May 8, 2023 21:46
rodrigozhou pushed a commit that referenced this pull request May 13, 2023
* Event buffer size limit

* Make size limit inclusive

* Make default size limit more readable

* Add history builder test for proto size

* Split db and memory batch for loop

* Use values instead of pointers to avoid parallel test glitch

* Make a copy of the test case for parallel test to work
rodrigozhou pushed a commit that referenced this pull request May 16, 2023
* Event buffer size limit

* Make size limit inclusive

* Make default size limit more readable

* Add history builder test for proto size

* Split db and memory batch for loop

* Use values instead of pointers to avoid parallel test glitch

* Make a copy of the test case for parallel test to work
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants