Improve ackManager.completeTask performance by two orders of magnitude#5216
Improve ackManager.completeTask performance by two orders of magnitude#5216tdeebswihart merged 7 commits intomainfrom
Conversation
Tests before refactors!
The old implementation of completeTask required a full scan of the task
map in order to move the ack level which had terrible performance.
By storing tasks in an ordered set we can limit the scan's size by
stopping at the first unacked task
Before:
$ go test -bench=AckManager ./service/matching/... -run=FooBarBaz
goos: darwin
goarch: arm64
pkg: go.temporal.io/server/service/matching
BenchmarkAckManager_AddTask-12 22768 52206 ns/op
BenchmarkAckManager_CompleteTask-12 38 29293019 ns/op
After:
$ go test -bench=AckManager ./service/matching -run=FooBarBaz
goos: darwin
goarch: arm64
pkg: go.temporal.io/server/service/matching
BenchmarkAckManager_AddTask-12 8127 147226 ns/op
BenchmarkAckManager_CompleteTask-12 8626 136614 ns/op
|
Not to be too pedantic since it's already a huge improvement, but I'm curious how the performance would compare to using https://pkg.go.dev/github.com/google/btree, specifically the ordered generic version (NewOrderedG). That's the ordered map for Go I've used a few times in the past |
|
Oh, I see, you picked that one since it's already used in the test. Well, I'm still curious. If it's faster maybe we can change both |
Looks like this doesn't allow us to store values alongside the keys, hrm. I'll build it with this alongside a map of K->V to see how it performs |
Oh right, I forgot you need the bool. You can totally store values next to the keys, but yeah, you can't use the Ordered one in that case, you need to make a struct with two members and implement a Less |
The implementation using Here's the diff if you want to vet it: |
Fair. I'll try that and see if it performs better |
service/matching/ack_manager.go
Outdated
| "go.temporal.io/server/common/util" | ||
| ) | ||
|
|
||
| func int64Comparator(a, b interface{}) int { |
There was a problem hiding this comment.
I see there's a github.com/emirpasic/gods/utils.Int64Comparator.. why copy it here?
There was a problem hiding this comment.
Good question. I copied it from the other place in our codebase that uses the gods library
There was a problem hiding this comment.
can we replace them both with that, then?
service/matching/ack_manager.go
Outdated
| taskIDs := maps.Keys(m.outstandingTasks) | ||
| util.SortSlice(taskIDs) | ||
| min, _ := m.outstandingTasks.Min() | ||
| if taskID != min { |
There was a problem hiding this comment.
taskID here is int64 and min is interface{}.. I don't think this comparison is valid. actually I'm not sure how the compiler accepts this...
service/matching/ack_manager.go
Outdated
| delete(m.outstandingTasks, taskID) | ||
| } else { | ||
| return m.ackLevel | ||
| // We've acked the minimum task, so should adjust the ack level as far as we can |
There was a problem hiding this comment.
wouldn't it be simpler to write the whole thing (starting from line 146) like this:
for {
min, acked := m.outstandingTasks.Min()
if min == nil || !acked.(bool) {
return m.ackLevel
}
m.outstandingTasks.Remove(min)
}
note that we currently never complete tasks out of order
There was a problem hiding this comment.
You're right, I overcomplicated this
service/matching/ack_manager.go
Outdated
| "go.temporal.io/server/common/util" | ||
| ) | ||
|
|
||
| func int64Comparator(a, b interface{}) int { |
There was a problem hiding this comment.
can we replace them both with that, then?
service/matching/ack_manager.go
Outdated
| min, _ := m.outstandingTasks.Min() | ||
| if taskID != min.(int64) { | ||
| return m.ackLevel | ||
| } |
There was a problem hiding this comment.
let's just get rid of this check? the loop below will take care of it
I was lazy on my previous commit and should have done this then
What changed?
I replaced the outstandingTasks map with an ordered treemap and optimized
completeTask to only scan what was necessary to update the ack level.
Why?
The old implementation of completeTask required a full scan of the task
map in order to move the ack level which had terrible performance.
By storing tasks in an ordered set we can limit the scan's size by
stopping at the first unacked task.
This trades addTask performance for completeTask performance but since all
added tasks are presumably completed we should be fine with 1/3 the
performance on addTask for 227x the completeTask performance. With this
change both operations run in about the same amount of time.
Before:
After:
How did you test it?
I added both tests and benchmarks to ensure the ackManager worked as before
Potential risks
None.
Is hotfix candidate?
No