perf: cache WithTags handlers in tallyMetricsHandler to reduce allocations (replaced with https://github.com/temporalio/temporal/pull/9620) #10049

Closed
mykaul wants to merge 1 commit into temporalio:main from mykaul:perf/cache-withtags-handlers

Conversation

Contributor

@mykaul mykaul commented Apr 24, 2026

Summary

Cache WithTags() child handlers in tallyMetricsHandler via sync.Map to eliminate repeated tagsToMap(), scope.Tagged(), and handler struct allocations on the hot path. On cache hit: zero allocations.

Design

  • childCache (sync.Map): maps normalized tag cache keys to *tallyMetricsHandler children. Uses LoadOrStore for safe concurrent access.
  • normalizeTagsForCaching: normalizes excluded tags before cache key computation so high-cardinality excluded values (e.g. activityType with thousands of distinct activity names) collapse to a single cache entry, preventing unbounded cache growth. Zero-alloc fast path when no tags need substitution.
  • tagsCacheKey: builds compact \x00-separated string keys from tag slices, with a single-tag fast path that avoids byte buffer allocation.
  • Empty WithTags() calls short-circuit and return self.
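
The design points above can be illustrated with a minimal sketch. This is not the PR's code: the `Tag` and `handler` types, the `excluded` map, and the `_excluded_` placeholder are hypothetical stand-ins, and a plain map stands in for the tagged scope. It also assumes the placeholder value is what ends up stored on the shared child. The sketch shows only the caching shape (normalize, build a `\x00`-separated key, then `Load` followed by `LoadOrStore`) and makes no attempt to reproduce the zero-allocation behavior claimed for cache hits.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Tag is a hypothetical stand-in for the server's metrics tag type.
type Tag struct{ Key, Value string }

// excludedValue is an assumed placeholder substituted for excluded tag values.
const excludedValue = "_excluded_"

// handler is a hypothetical, simplified stand-in for tallyMetricsHandler.
type handler struct {
	tags       map[string]string // accumulated tags (stands in for the tagged scope)
	excluded   map[string]bool   // tag keys whose values collapse before caching
	childCache sync.Map          // cache key (string) -> *handler
}

// normalizeTagsForCaching replaces the values of excluded tags with a placeholder.
// It returns the input slice untouched when no tag needs substitution.
func normalizeTagsForCaching(tags []Tag, excluded map[string]bool) []Tag {
	for i := range tags {
		if !excluded[tags[i].Key] {
			continue
		}
		// First excluded tag found: copy once, then substitute from here on.
		out := make([]Tag, len(tags))
		copy(out, tags)
		for j := i; j < len(out); j++ {
			if excluded[out[j].Key] {
				out[j].Value = excludedValue
			}
		}
		return out
	}
	return tags
}

// tagsCacheKey joins keys and values with \x00 so tag boundaries stay unambiguous.
func tagsCacheKey(tags []Tag) string {
	if len(tags) == 1 { // single-tag fast path: no builder needed
		return tags[0].Key + "\x00" + tags[0].Value
	}
	var b strings.Builder
	for i, t := range tags {
		if i > 0 {
			b.WriteByte(0)
		}
		b.WriteString(t.Key)
		b.WriteByte(0)
		b.WriteString(t.Value)
	}
	return b.String()
}

// WithTags returns the cached child for the given tags, creating it on first use.
func (h *handler) WithTags(tags ...Tag) *handler {
	if len(tags) == 0 {
		return h // empty call short-circuits and returns self
	}
	tags = normalizeTagsForCaching(tags, h.excluded)
	key := tagsCacheKey(tags)
	if cached, ok := h.childCache.Load(key); ok {
		return cached.(*handler)
	}
	merged := make(map[string]string, len(h.tags)+len(tags))
	for k, v := range h.tags {
		merged[k] = v
	}
	for _, t := range tags {
		merged[t.Key] = t.Value
	}
	child := &handler{tags: merged, excluded: h.excluded}
	// LoadOrStore keeps exactly one child per key even if two goroutines race here.
	actual, _ := h.childCache.LoadOrStore(key, child)
	return actual.(*handler)
}

func main() {
	root := &handler{excluded: map[string]bool{"activityType": true}}
	a := root.WithTags(Tag{"namespace", "ns1"})
	b := root.WithTags(Tag{"namespace", "ns1"})
	x := root.WithTags(Tag{"activityType", "SendEmail"})
	y := root.WithTags(Tag{"activityType", "ChargeCard"})
	fmt.Println(a == b, x == y, a == x) // prints: true true false
}
```

In this sketch, two calls with the same tags return the same pointer, and two different `activityType` values share one child because the excluded value is collapsed before the key is computed.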

Allocation Reduction (pprof alloc_space, 5min ScyllaDB workload)

| Metric | Before | After | Reduction |
|---|---|---|---|
| WithTags cumulative | 1,930 MB | 316 MB | -83.6% |
| Total server allocs | 18,030 MB | 16,481 MB | -8.6% |

Benchmark (omes throughput_stress, mc150, 5 min)

Host networking, i7-1270P 4 cores/component, inter-run data resets:

| Database | Baseline (iterations) | After (iterations) | Change |
|---|---|---|---|
| Cassandra | 280 | 294 | +5.0% |
| ScyllaDB | 290 | 296 | +2.1% |

Note: Throughput variance at mc150 is ~5-10%. The allocation reduction is confirmed by pprof but throughput gains are within noise at this concurrency level.

Testing

  • Cache hit returns same pointer (require.Same)
  • Different tags return different handlers
  • Cached handler records metrics correctly
  • Multi-tag caching
  • Tag order sensitivity (documented, not normalized)
  • Child caches are independent per handler level
  • Excluded tags share a single child handler (prevents unbounded growth)
  • Concurrent access (32 goroutines × 100 iterations, -race)
  • tagsCacheKey boundary ambiguity prevention
  • normalizeTagsForCaching unit tests (zero-alloc fast path, excluded, allowed, mixed, empty)
  • All existing tests continue to pass
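
The concurrent-access check in the list above can be sketched as a small standalone program. The `cache` and `child` types and the `distinctPointers` helper are hypothetical and not taken from the PR's test file; the goroutine and iteration counts mirror the 32 × 100 figures in the list. It assumes the property under test is that every goroutine observes the same cached pointer for the same key.

```go
package main

import (
	"fmt"
	"sync"
)

// child is a hypothetical stand-in for a cached child handler.
type child struct{ key string }

// cache is a hypothetical stand-in for the handler's childCache.
type cache struct{ m sync.Map }

// getOrCreate returns the single child stored for key, creating it on first use.
func (c *cache) getOrCreate(key string) *child {
	if v, ok := c.m.Load(key); ok {
		return v.(*child)
	}
	v, _ := c.m.LoadOrStore(key, &child{key: key})
	return v.(*child)
}

// distinctPointers hammers one key from many goroutines and reports how many
// distinct *child values were observed across all calls.
func distinctPointers(c *cache, goroutines, iterations int) int {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		seen = map[*child]struct{}{}
	)
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < iterations; i++ {
				p := c.getOrCreate("namespace\x00ns1")
				mu.Lock()
				seen[p] = struct{}{}
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return len(seen)
}

func main() {
	// 32 goroutines x 100 iterations, as in the test list.
	fmt.Println(distinctPointers(&cache{}, 32, 100)) // prints: 1
}
```

Running it with `go run -race` exercises the same kind of check the `-race` note refers to: a single observed pointer and no reported data races.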

…tions

Add sync.Map-based caching of child handlers in WithTags(). On cache
hit, zero allocations — skips tagsToMap(), scope.Tagged(), and handler
struct allocation entirely.

Allocation reduction (pprof alloc_space, 5min ScyllaDB workload):
  WithTags cumulative: 1,930 MB -> 316 MB (-83.6%)
  Total server allocs: 18,030 MB -> 16,481 MB (-8.6%)

Benchmark (omes throughput_stress, mc150, 5 min, host networking,
i7-1270P 4 cores/component, inter-run data resets):
  Cassandra: 294 iterations (+5.0% vs 280 baseline, +13.5% vs prev)
  ScyllaDB:  296 iterations (+2.1% vs 290 baseline, -1.3% vs prev)
@mykaul mykaul requested review from a team as code owners April 24, 2026 09:39
Contributor Author

mykaul commented Apr 25, 2026

#9620 is a more complete PR.

@mykaul mykaul closed this Apr 25, 2026
@mykaul mykaul changed the title from "perf: cache WithTags handlers in tallyMetricsHandler to reduce allocations" to "perf: cache WithTags handlers in tallyMetricsHandler to reduce allocations (replaced with https://github.com/temporalio/temporal/pull/9620 )" May 3, 2026