Update agent code review guidelines flaky test findings#9669

Merged

spkane31 merged 4 commits intomainfrom

spk/fix-common-testing-issues

Mar 30, 2026

Contributor

spkane31 commented Mar 25, 2026

What changed?

Add new review guidelines for test reliability and fix instances found in the current codebase.

Why?

I have found these to be the most common patterns across the last two weeks of targeting flaky tests.

How did you test it?

Potential risks

Minimal


          Update agent code review w/ flaky test findings

1fb38cb

spkane31 requested review from a team as code owners

March 25, 2026 23:03

spkane31 requested a review from stephanos

March 25, 2026 23:03

stephanos reviewed

View reviewed changes

tests/activity_test.go Outdated

+              	select {
+              	case <-cancelCh:
+              	case <-ctx.Done():
+              		s.Fail("Test timed out after 90s waiting for activity cancellation", ctx.Err())

Contributor

stephanos Mar 25, 2026

nit: let's not hard-code timeouts (90s) as they might change

tests/nexus_endpoint_test.go

+              			case <-listReqDone:
+              			case <-ctx.Done():
+              				s.Fail("timed out waiting for list nexus endpoints request to complete")
               			}

Contributor

stephanos Mar 25, 2026

non-blocking musings: I wonder if this could be a helper on TestEnv; like s.WaitForChn and it uses the right context automatically. I didn't think if happens that often, but now seeing multiple examples ...

.github/copilot-instructions.md Outdated

               - Use `require.ErrorAs(t, err, &specificErr)` for specific error type checks
               - Prefer `require` over `assert` - it's rarely useful to continue a test after a failed assertion
               - Add comments explaining why `Eventually` is needed (e.g., eventual consistency)
+              - Do not use single-value type assertions on errors (`err.(*T)`); this panics instead of failing the test when the type doesn't match. Use `errors.As` with a guarded return, or the two-value form `v, ok := err.(*T)` followed by an explicit `require.True(t, ok)`.

Contributor

stephanos Mar 25, 2026

I'd suggest only errors.As and not the longer/less helpful version of checked type casting.

Contributor

stephanos Mar 25, 2026 •

edited

Loading

This 1-liner is now possible in latest Go:

s.ErrorAs(err, new(*serviceerror.NotFound))

That might be actually the best advice IMO.

.github/copilot-instructions.md Outdated

+              - Never call testify assertions (`s.NoError`, `s.Equal`, `require.NoError`, even `assert.NoError`) inside a `go func()` — if the goroutine outlives the test, the assertion panics the binary with `panic: Fail in goroutine after TestXxx has completed`. Move assertions to the test goroutine or use a buffered error channel.
+              - Any `<-ch` that isn't inside a `select` with `ctx.Done()` will hang indefinitely if the sender never sends. Always provide a context cancellation fallback.
+              - Never write to package-level or global variables in tests — parallel tests share the same process; thread values through function parameters instead.
+              - Proto message fields accessed outside the workflow lock must be cloned, not aliased: use `common.CloneProto(...)` rather than returning the pointer directly.

Contributor

stephanos Mar 25, 2026

Is this test specific? or does it belong to Concurrency and Safety?

.github/copilot-instructions.md Outdated

+              - Never write to package-level or global variables in tests — parallel tests share the same process; thread values through function parameters instead.
+              - Proto message fields accessed outside the workflow lock must be cloned, not aliased: use `common.CloneProto(...)` rather than returning the pointer directly.
+              - Never use `time.Sleep` or `time.Since(start) > threshold` to enforce ordering — use channels, `sync.WaitGroup`, or `EventuallyWithT` instead.
+              - Do not use `RealTimeSource` with a zero-duration delay in unit tests — it creates an infinite spin loop. Use a mock/event time source with a non-zero delay and advance the clock explicitly.

Contributor

stephanos Mar 25, 2026

I don't quite follow that one; is that referring to NewTimer?

Contributor Author

spkane31 Mar 26, 2026

I'm removing this one, it was a one-off that I don't think happens frequently enough for a rule

.github/copilot-instructions.md Outdated

+              - Proto message fields accessed outside the workflow lock must be cloned, not aliased: use `common.CloneProto(...)` rather than returning the pointer directly.
+              - Never use `time.Sleep` or `time.Since(start) > threshold` to enforce ordering — use channels, `sync.WaitGroup`, or `EventuallyWithT` instead.
+              - Do not use `RealTimeSource` with a zero-duration delay in unit tests — it creates an infinite spin loop. Use a mock/event time source with a non-zero delay and advance the clock explicitly.
+              - When combining a short-lived background operation with a propagation wait, verify the timeouts are compatible: if the background op can time out before the condition is satisfied, the propagation wait will never succeed.

Contributor

stephanos Mar 25, 2026

Maybe AI will get it, but I don't :D What's "propagation wait" meaning here?

Contributor Author

spkane31 Mar 26, 2026

Meaning an Eventually that is waiting on an async operation to take affect. Rewording to be more clear


          review comments

be41b64

spkane31 requested a review from a team as a code owner

March 26, 2026 15:41


          merge conflicts

66ca9c0

spkane31 requested a review from stephanos

March 30, 2026 19:13

stephanos approved these changes

View reviewed changes

.github/copilot-instructions.md Outdated

               - Prefer `sync.Mutex` over `sync.RWMutex` almost always, except when reads are much more common than writes (>1000×) or readers hold the lock for significant time
               - Don't do IO while holding locks - use side effect tasks
               - Clone data before releasing locks if it might be modified
+              - Be suspicious of `go s.someHelper(ctx, ...)` calls where the goroutine runs exactly once and the test then immediately waits for something that helper was supposed to cause. If the operation can fail transiently (network, tight deadline, busy CI), the single attempt may fail silently and the wait will never succeed. Either loop the goroutine until `ctx.Done()`, or check that the operation succeeded before proceeding.

Contributor

stephanos Mar 30, 2026

nit: should this be in ## 3. Testify Suite Correctness and Reliability ?


          linters and last comment

7e0ba6a

spkane31 enabled auto-merge (squash)

March 30, 2026 19:21

spkane31 merged commit 820d7d2 into main

46 checks passed

spkane31 deleted the spk/fix-common-testing-issues branch

March 30, 2026 19:52

ShahabT mentioned this pull request

Serverless Feature Integration #9779

Merged

chaptersix pushed a commit to chaptersix/temporal that referenced this pull request


          Update agent code review guidelines flaky test findings (temporalio#9669

f801e06

)

## What changed?
Add new review guidelines for test reliability and fix instances found
in the current codebase.

## Why?
I have found these to be the most common patterns across the last two
weeks of targeting flaky tests.

## How did you test it?
- [ ] built
- [ ] run locally and tested manually
- [ ] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)

## Potential risks
Minimal

chaptersix pushed a commit to chaptersix/temporal that referenced this pull request


          Update agent code review guidelines flaky test findings (temporalio#9669

9e5dfeb

)

## What changed?
Add new review guidelines for test reliability and fix instances found
in the current codebase.

## Why?
I have found these to be the most common patterns across the last two
weeks of targeting flaky tests.

## How did you test it?
- [ ] built
- [ ] run locally and tested manually
- [ ] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)

## Potential risks
Minimal

chaptersix pushed a commit that referenced this pull request


          Update agent code review guidelines flaky test findings (#9669)

## What changed?
Add new review guidelines for test reliability and fix instances found
in the current codebase.

## Why?
I have found these to be the most common patterns across the last two
weeks of targeting flaky tests.

## How did you test it?
- [ ] built
- [ ] run locally and tested manually
- [ ] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)

## Potential risks
Minimal

chaptersix pushed a commit to chaptersix/temporal that referenced this pull request


          Update agent code review guidelines flaky test findings (temporalio#9669

5f04d8e

)

## What changed?
Add new review guidelines for test reliability and fix instances found
in the current codebase.

## Why?
I have found these to be the most common patterns across the last two
weeks of targeting flaky tests.

## How did you test it?
- [ ] built
- [ ] run locally and tested manually
- [ ] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)

## Potential risks
Minimal

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet