
Use proto encoding for scheduler workflow next time cache #5277

Merged: 3 commits into temporalio:main from sched76 on Jan 12, 2024

Conversation

@dnr dnr (Member) commented Jan 10, 2024

What changed?

The scheduler workflow keeps a cache of a sequence of calls to getNextTime to reduce overhead. The cache was serialized as JSON; this change serializes it as proto instead. Forwards and backwards compatibility is preserved by attempting to deserialize into both types.

Why?

The JSON decoding of this cache showed up as a significant cost in a CPU profile of a worker that was doing a lot of schedule workflow replays. In a simple decode-only benchmark on my laptop, decoding the proto was about 20 times faster with no jitter (14 times with jitter), and the encoded proto was 8 times smaller (5 times with jitter). Encoding was about 4 times faster (but that's less important).

```
no jitter:
BenchmarkDecodeJson-12            178678             33114 ns/op            1848 B/op         13 allocs/op
BenchmarkDecodeProto-12          2249361              1571 ns/op             288 B/op          3 allocs/op
with jitter:
BenchmarkDecodeJson-12            110619             32657 ns/op            1848 B/op         13 allocs/op
BenchmarkDecodeProto-12          1565641              2336 ns/op             400 B/op          4 allocs/op
```

How did you test it?

Existing tests (especially the replay test, which covers backwards and forwards compatibility).

Potential risks

The code is a little more complicated and there could be bugs in the conversion. Note that both JSON and proto support nanosecond resolution for timestamps.

@dnr dnr requested a review from a team as a code owner January 10, 2024 17:19
@ast2023 ast2023 (Contributor) left a comment:

LGTM

```
cache := s.nextTimeCacheV2
start := cache.StartTime.AsTime()
afterOffset := int64(after.Sub(start))
for i, nextOffset := range cache.NextTimes {
```
A contributor commented on this snippet:

Could this block (3 assignments and the for loop) be moved into a separate function?

Member Author (dnr) replied:
sure

@dnr dnr merged commit 17fc86a into temporalio:main Jan 12, 2024
12 checks passed
@dnr dnr deleted the sched76 branch January 12, 2024 02:46
dnr added a commit to dnr/temporal that referenced this pull request Apr 10, 2024
dnr added a commit that referenced this pull request Apr 10, 2024
…5698)

## What changed?
Activate schedule workflow logic changes. Some of this code is not in
1.23.0, but we can patch those PRs into 1.23.1 to support downgrades to
the 1.23 series.

## Why?
Fix bugs, support new features, and make the schedule workflow more efficient.

## How did you test it?
existing tests (on those PRs)

## Potential risks
schedule workflow determinism errors
yycptt pushed a commit that referenced this pull request Apr 27, 2024