VersionMembershipCache: Metrics and refactorings #8894
Conversation
```go
type testVersionMembershipCache struct {
	mu sync.Mutex
	m  map[testVersionMembershipCacheKey]bool
}
```
I was actually using an instance of cache.Cache in some of the unit tests in this file. I thought I might as well change that to use this newly implemented cache, since that also tests its functionality.
```go
func newVersionMembershipCache(c cache.Cache, metricsHandler metrics.Handler) worker_versioning.VersionMembershipCache {
	h := metricsHandler.WithTags(metrics.CacheTypeTag("version_membership"))
	return &versionMembershipCache{
		cache: c,
```
It's concerning to me that I don't see eviction happening anywhere. But maybe I've just missed it. In the events cache, the underlying cache is the LRU cache, which makes sense to me. That also provides the hit and miss metrics that you need, in a way that is already standardized. Is it possible to just use the existing LRU cache instead of having to reimplement and re-test eviction logic elsewhere?
This is how the cache here is being initialized (in the fx.go file):
```go
func VersionMembershipCacheProvider(
	lc fx.Lifecycle,
	serviceConfig *configs.Config,
	metricsHandler metrics.Handler,
) worker_versioning.VersionMembershipCache {
	c := commoncache.New(serviceConfig.VersionMembershipCacheMaxSize(), &commoncache.Options{
		TTL: max(1*time.Second, serviceConfig.VersionMembershipCacheTTL()),
	})
	lc.Append(fx.Hook{
		OnStop: func(context.Context) error {
			c.Stop()
			return nil
		},
	})
	return newVersionMembershipCache(c, metricsHandler)
}
```
The underlying cache here is a StoppableCache (exactly like the events cache), which is also an LRU cache. Since the versioning cache is an LRU cache, the eviction logic lives in the lru.go file, and the tests I added in my previous PR validate it.
The underlying StoppableCache does not emit cache-hit and cache-miss metrics (which is what we are interested in), which is why I defined this new wrapper on top of it. That also seems to be one of the main reasons the events cache was implemented as a layer on top of the StoppableCache.
Actually, the LRU cache can emit metrics. There is a constructor which will create a cache with a metrics handler:
Line 143 in abb6359
Oh, I see that it doesn't emit hit and miss metrics :/
ty for showing me where the TTL is set! my main concern is fine then.
I thought that the LRU cache / stoppable would have metrics to provide the hit rate, but honestly I'm not sure what some of the metrics it emits even mean, or how to compute hit rate from them:
```go
NewGaugeDef("cache_pinned_usage")          // I looked this up: the count of elements blocked from eviction even if they are the LRU element
NewTimerDef("cache_entry_age_on_eviction")
NewGaugeDef("cache_usage")
NewTimerDef("cache_entry_age_on_get")
```
Note
Introduces a typed, instrumented cache for worker versioning and refactors call sites to use it.
- Adds a VersionMembershipCache interface and a NewVersionMembershipCache wrapper emitting metrics (VersionMembershipCacheGet/Put with cache_type=version_membership)
- Replaces cache.Cache usage and ad-hoc keys in worker_versioning validation with the new cache API
- Updates the relevant call sites (startworkflow, signalwithstartworkflow, resetworkflow, multioperation, updateworkflowoptions)

Written by Cursor Bugbot for commit 920106f.