
Use proto encoding for scheduler workflow next time cache #5277

Merged: 3 commits into temporalio:main from sched76 on Jan 12, 2024

Conversation

@dnr dnr (Member) commented Jan 10, 2024

What changed?

The scheduler workflow keeps a cache of a sequence of calls to getNextTime to reduce overhead. The cache was serialized as JSON; this change serializes it as proto instead. Forwards and backwards compatibility is preserved by attempting to deserialize into both types.

Why?

The JSON decoding of this cache showed up as a significant cost in a CPU profile of a worker that was doing a lot of schedule workflow replays. In a simple decode-only benchmark on my laptop, decoding the proto was about 20 times faster with no jitter (14 times with jitter), and the encoded proto was 8 times smaller (5 times with jitter). Encoding was about 4 times faster (but that's less important).

```
no jitter:
BenchmarkDecodeJson-12            178678             33114 ns/op            1848 B/op         13 allocs/op
BenchmarkDecodeProto-12          2249361              1571 ns/op             288 B/op          3 allocs/op
with jitter:
BenchmarkDecodeJson-12            110619             32657 ns/op            1848 B/op         13 allocs/op
BenchmarkDecodeProto-12          1565641              2336 ns/op             400 B/op          4 allocs/op
```

How did you test it?

Existing tests (especially the replay test, which covers backwards and forwards compatibility).

Potential risks

The code is a little more complicated and there could be bugs in the conversion. Note that both JSON and proto support nanosecond resolution for timestamps.

@dnr dnr requested a review from a team as a code owner January 10, 2024 17:19
@ast2023 ast2023 (Contributor) left a comment:

LGTM

```
cache := s.nextTimeCacheV2
start := cache.StartTime.AsTime()
afterOffset := int64(after.Sub(start))
for i, nextOffset := range cache.NextTimes {
```
A contributor commented on this snippet:

Could this block (3 assignments and the for loop) be moved into a separate function?

Member Author (dnr) replied:
sure

@dnr dnr merged commit 17fc86a into temporalio:main Jan 12, 2024
12 checks passed
@dnr dnr deleted the sched76 branch January 12, 2024 02:46
dnr added a commit to dnr/temporal that referenced this pull request Apr 10, 2024
dnr added a commit that referenced this pull request Apr 10, 2024
…5698)

## What changed?
Activate schedule workflow logic changes. Some of this code is not in
1.23.0, but we can patch those PRs into 1.23.1 to support downgrades to
the 1.23 series.

## Why?
Fix bugs, support new features, and make the schedule workflow more efficient.

## How did you test it?
existing tests (on those PRs)

## Potential risks
schedule workflow determinism errors
yycptt pushed a commit that referenced this pull request Apr 27, 2024