While trying to improve Temporal performance for ScyllaDB, I (OpenCode really, but under my direction) came across some mostly unrelated memory optimization opportunities. Is the team interested in them? I don't want to overload the team if there's little interest or value.
They can be cherry-picked independently. Here's the proposed content:
commit 0db4ad05c4601796c1c8f3f3675ca350ab8cda8b (HEAD -> main)
Author: Yaniv Michael Kaul <yaniv.kaul@scylladb.com>
Date: Tue Mar 17 23:57:05 2026 +0200
perf: cache Tagged scope lookups in tallyMetricsHandler to reduce allocations
Add a scopeCache (sync.Map) to tallyMetricsHandler that caches the
result of scope.Tagged() calls per unique tag combination. This avoids
repeated map allocations and tally scope registry lookups on every
Counter/Gauge/Timer/Histogram Record call with inline tags.
The scope cache builds on the WithTags handler cache from the previous
commit: WithTags caches entire handler subtrees, while scopeCache
targets the per-metric-emission path where tags are passed inline.
The cache is bounded to 1024 entries (approximate, may slightly overshoot
under high concurrency due to check-then-store races). Tags are
normalized through excludeTags before cache key computation so that
different raw values which map to the same excluded placeholder share
a single cache entry.
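A rough sketch of the bounded cache described above, for illustration only and not the code from the commit. `Tag`, `scope`, `scopeCache`, `tagged`, the key format, and the `_tag_excluded_` placeholder value are all assumptions; the real handler wraps tally scopes and may build its keys differently.

```go
package main

import (
	"sort"
	"strings"
	"sync"
	"sync/atomic"
)

const (
	maxScopeCacheEntries = 1024             // bound quoted in the commit message
	excludedPlaceholder  = "_tag_excluded_" // assumed placeholder value
)

// Tag is a hypothetical stand-in for the metrics tag type.
type Tag struct {
	Key, Value string
}

// scope is a hypothetical stand-in for a tagged tally scope.
type scope struct {
	tags map[string]string
}

// scopeCache caches one scope per normalized tag combination.
type scopeCache struct {
	entries sync.Map     // cache key (string) -> *scope
	size    atomic.Int64 // approximate entry count
	// exclude maps a tag key to the values that are kept as-is;
	// any other value for that key is replaced by the placeholder.
	exclude map[string]map[string]struct{}
}

// normalize applies the exclusion rules and sorts by key so that
// equivalent tag sets produce the same cache key.
func (c *scopeCache) normalize(tags []Tag) []Tag {
	out := make([]Tag, len(tags))
	for i, t := range tags {
		if allowed, ok := c.exclude[t.Key]; ok {
			if _, keep := allowed[t.Value]; !keep {
				t.Value = excludedPlaceholder
			}
		}
		out[i] = t
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Key < out[j].Key })
	return out
}

// cacheKey joins keys and values with NUL separators.
func cacheKey(tags []Tag) string {
	var b strings.Builder
	for _, t := range tags {
		b.WriteString(t.Key)
		b.WriteByte(0)
		b.WriteString(t.Value)
		b.WriteByte(0)
	}
	return b.String()
}

// tagged returns the cached scope for tags, creating it on a miss.
func (c *scopeCache) tagged(tags []Tag) *scope {
	norm := c.normalize(tags)
	key := cacheKey(norm)
	if s, ok := c.entries.Load(key); ok {
		return s.(*scope)
	}
	m := make(map[string]string, len(norm))
	for _, t := range norm {
		m[t.Key] = t.Value
	}
	s := &scope{tags: m}
	// Check-then-store: the size check and the store are not atomic
	// together, so concurrent misses can overshoot the bound slightly.
	if c.size.Load() >= maxScopeCacheEntries {
		return s // cache full: serve an uncached scope
	}
	actual, loaded := c.entries.LoadOrStore(key, s)
	if !loaded {
		c.size.Add(1)
	}
	return actual.(*scope)
}
```

With an exclusion rule on `namespace` that only keeps `default`, calls with `ns-1` and `ns-2` both normalize to the placeholder and return the same cached scope. Once the bound is reached, new tag combinations still work but are no longer stored.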
Combined with the WithTags cache, the two layers eliminated the top
allocation sources from tally (verified via pprof alloc_space, ScyllaDB
omes throughput_stress 5-min run, 60 iterations):
| Allocation site                 | Pre-metrics | Scope-cache | Delta              |
|---------------------------------|-------------|-------------|--------------------|
| tagsToMap.func1 (map inserts)   | 1,101 MB    | 0 MB        | -100%              |
| tally.scopeRegistry.Subscope    | 1,012 MB    | 0 MB        | -100%              |
| tagsToMap (make map)            | 177 MB      | 0 MB        | -100%              |
| tagsCacheKey (new: cache keys)  | 0 MB        | 369 MB      | +369 MB            |
| WithTags (cum)                  | 1,972 MB    | 326 MB      | -83.5%             |
| Total allocations               | 18,465 MB   | 16,511 MB   | -1,954 MB (-10.6%) |
Net metrics savings: ~1,921 MB eliminated, 369 MB new cost
Benchmark: omes throughput_stress, 5 min, 128 shards, QPS unlimited,
GOMAXPROCS=4, GOGC=200, 4-core pinning per component (i7-1270P).
UpdateWorkflowExecution latency (avg) and total persistence ops:
ScyllaDB:
prev commit (WithTags cache): 2.33ms, 85,885 ops, 60 iters
this commit (+ scope cache): 2.27ms, 85,886 ops, 60 iters
delta: -0.06ms (-2.6%)
Cassandra:
tuned baseline (no metrics opts): 2.64ms, 85,739 ops, 60 iters
this commit (WithTags + scope): 2.61ms, 85,922 ops, 60 iters
delta: -0.03ms (-1.2%)
And:
commit fb46e34a00742cf18f659c72c4a1674b312251fd
Author: Yaniv Michael Kaul <yaniv.kaul@scylladb.com>
Date: Tue Mar 17 21:21:31 2026 +0200
perf: cache WithTags handlers in tallyMetricsHandler to reduce allocations
Add sync.Map-based caching of child handlers in WithTags(). On a cache
hit there are zero allocations: the call skips tagsToMap(),
scope.Tagged(), and the handler struct allocation entirely.
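The handler-level cache can be sketched roughly as follows. This is an illustration under assumptions, not the commit's code: `Tag`, `handler`, and the order-independent key produced by `tagsCacheKey` here are hypothetical, and this version allocates while building the key, so it does not try to reproduce the allocation profile reported in the commit.

```go
package main

import (
	"sort"
	"strings"
	"sync"
)

// Tag is a hypothetical stand-in for the metrics tag type.
type Tag struct {
	Key, Value string
}

// handler is a hypothetical stand-in for tallyMetricsHandler.
type handler struct {
	tags     map[string]string // merged tags carried by this handler
	children sync.Map          // cache key (string) -> *handler
}

// tagsCacheKey builds a deterministic key from a copy of tags sorted
// by tag key, so argument order does not affect the lookup.
func tagsCacheKey(tags []Tag) string {
	sorted := make([]Tag, len(tags))
	copy(sorted, tags)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Key < sorted[j].Key })
	var b strings.Builder
	for _, t := range sorted {
		b.WriteString(t.Key)
		b.WriteByte(0)
		b.WriteString(t.Value)
		b.WriteByte(0)
	}
	return b.String()
}

// WithTags returns a child handler carrying the extra tags. On a hit
// the cached child is returned and no map or handler is built.
func (h *handler) WithTags(tags ...Tag) *handler {
	if len(tags) == 0 {
		return h
	}
	key := tagsCacheKey(tags)
	if cached, ok := h.children.Load(key); ok {
		return cached.(*handler)
	}
	merged := make(map[string]string, len(h.tags)+len(tags))
	for k, v := range h.tags {
		merged[k] = v
	}
	for _, t := range tags {
		merged[t.Key] = t.Value
	}
	child := &handler{tags: merged}
	// LoadOrStore keeps a single winner if two goroutines miss at once.
	actual, _ := h.children.LoadOrStore(key, child)
	return actual.(*handler)
}
```

Because each child owns its own `children` map, a chain such as `root.WithTags(a).WithTags(b)` is cached level by level, which is one way to read "caches entire handler subtrees".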
Benchmark (omes throughput_stress, 5min, ScyllaDB, standard 4-core
layout: Temporal cores 0-3, DB cores 4-7, GOMAXPROCS=4 GOGC=200,
128 shards, QPS=0):
WithTags cache: avg=2.33ms ops=85,885 iters=60
Tuned baseline: avg=2.27ms ops=86,133 iters=60
Delta: ops -0.29% (within run-to-run noise)
Memory profiling (pprof alloc_space, 5min ScyllaDB workload):
Total allocations: 16,481 MB (vs 18,030 MB baseline = -8.6%)
Per-function breakdown:
| Allocation site | Baseline | WithTags | Delta |
|------------------------------|------------|------------|--------|
| tagsToMap.func1 (map insert) | 1,106 MB | 215 MB | -80.6% |
| tally Subscope | 1,014 MB | 164 MB | -83.8% |
| tagsCacheKey (new: keys) | 0 MB | 317 MB | +317MB |
| WithTags (cumulative) | 1,930 MB | 316 MB | -83.6% |
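For anyone wanting to reproduce numbers of this kind, the sketch below shows one way to capture a cumulative allocation profile from inside a Go process. It is an assumption about tooling, not a description of how the figures above were collected; `allocProfileBytes` is a hypothetical helper name.

```go
package main

import (
	"bytes"
	"runtime"
	"runtime/pprof"
)

// allocProfileBytes returns the "allocs" profile (all past allocations)
// in the gzip-compressed protobuf format that `go tool pprof` reads.
func allocProfileBytes() ([]byte, error) {
	// The profile reflects the most recently completed GC cycle,
	// so force one to include recent allocations.
	runtime.GC()
	var buf bytes.Buffer
	if err := pprof.Lookup("allocs").WriteTo(&buf, 0); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}
```

Writing the returned bytes to a file and opening it with `go tool pprof -sample_index=alloc_space <file>` shows per-function allocated bytes, which is the view the tables in these commits are based on.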
The throughput delta is not measurable at this concurrency level.
Metrics overhead is ~3.2% of total server CPU; eliminating WithTags
allocations frees ~3% of CPU which is below the noise floor of the
end-to-end benchmark. The allocation reduction is real but the system
is throughput-capped by aggregate server compute across many subsystems
(syscalls 13%, GC 10%, gRPC 9%, proto 11%), not by any single one.
Let me know and I'll be happy to submit them. I only briefly reviewed the code (this is certainly not an area I focus on, am interested in, or have experience or knowledge in).