Fix dynamicConfigOverrides defaults getting tied to test's lifecycle#9918

Merged
long-nt-tran merged 1 commit intotemporalio:mainfrom
long-nt-tran:tescore-dedicated-cluster
Apr 13, 2026

@long-nt-tran long-nt-tran commented Apr 10, 2026

What changed?

Changed onebox.go to set global dynamicConfigOverrides without binding their cleanup to a test, and added a unit test verifying that a test reusing the cluster gets the same default dynamicConfigOverrides.

Also renamed the existing function overrideDynamicConfig -> overrideTestLevelDynamicConfig to make it clearer that this function's dynamic config override is bound to the test's lifecycle.

Why?

When onebox.go sets up a new Temporal cluster, it applies some default overrides that are not set by the test.

However, these are set via overrideDynamicConfig(...), which ties the cleanup of these configs to a specific test's lifecycle.

As a result, if Test1 gets a fresh cluster and some Test2 later reuses that cluster after Test1 ends, Test2 will not have these default dynamicConfigOverrides. In fact, the dynamicConfig would be empty because it was cleaned up after Test1 finished.

These defaults should not be bound to the lifetime of the test that instantiated the fresh cluster.

Note

Existing tests that instantiate their env with testcore.WithDynamicConfigOverrides(...) options already get a fresh cluster, so this bug affected only tests that don't override any dynamic config options when calling testcore.NewEnv(...).

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

I added test_cluster_pool_test.go, which deterministically reproduces the bug when the onebox.go change is absent.

Expected failure without the change in onebox.go:

$ go test ./tests/testcore/ -run TestDedicatedClusterReuseDropsGlobalOverrides -count=1
--- FAIL: TestDedicatedClusterReuseDropsGlobalOverrides (0.05s)
    --- FAIL: TestDedicatedClusterReuseDropsGlobalOverrides/SecondUse (0.00s)
        test_cluster_pool_test.go:42:
                Error Trace:    /Users/longtran/work/forks/temporal-wt/testcore-dedicated-cluster/tests/testcore/test_cluster_pool_test.go:57
                                                        /Users/longtran/work/forks/temporal-wt/testcore-dedicated-cluster/tests/testcore/test_cluster_pool_test.go:42
                Error:          Should NOT be empty, but was []
                Test:           TestDedicatedClusterReuseDropsGlobalOverrides/SecondUse
                Messages:       global override defaults should still be present on reused cluster: key frontend.maxconcurrentbatchoperationpernamespace missing
FAIL
FAIL    go.temporal.io/server/tests/testcore    1.175s
FAIL

With change, it passes:

$ go test ./tests/testcore/ -run TestDedicatedClusterReuseDropsGlobalOverrides -count=1
ok      go.temporal.io/server/tests/testcore    1.086s

Potential risks

This might be a breaking change if some tests rely on the existing bug for some reason (?)

@long-nt-tran long-nt-tran marked this pull request as ready for review April 10, 2026 21:16
@long-nt-tran long-nt-tran requested review from a team as code owners April 10, 2026 21:16
@long-nt-tran long-nt-tran force-pushed the tescore-dedicated-cluster branch from 2eb0117 to cd21545 Compare April 10, 2026 21:21
Comment thread tests/testcore/test_cluster_pool_test.go Outdated
Comment thread tests/testcore/onebox.go Outdated
Comment thread tests/testcore/onebox.go
Comment thread tests/testcore/onebox.go Outdated
Comment thread tests/testcore/onebox.go Outdated
@long-nt-tran long-nt-tran force-pushed the tescore-dedicated-cluster branch from cd21545 to c02f83d Compare April 13, 2026 18:13
@long-nt-tran long-nt-tran requested a review from stephanos April 13, 2026 19:05

@stephanos stephanos left a comment

Awesome - thank you for tracking that down!

@long-nt-tran long-nt-tran merged commit dc2a5b9 into temporalio:main Apr 13, 2026
68 of 70 checks passed
@long-nt-tran long-nt-tran deleted the tescore-dedicated-cluster branch April 13, 2026 20:49
long-nt-tran added a commit that referenced this pull request Apr 28, 2026
## What changed?

When we set the Nexus callback URL in test_env.go, the dynamic config
override is still tied to the test's lifetime, not the cluster's
lifetime, so a subsequent test that reuses this cluster will not have
that override. This moves the override to onebox.go (similar pattern to
#9918) so this default lives
for the lifetime of the cluster.

## Why?

Ran into an issue with the task token not being set in
#9614; this solves it.
Splitting the fix into a separate PR for ease of review and to check
this in first.

## How did you test it?
- [ ] built
- [ ] run locally and tested manually
- [x] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)

2 participants