Skip to content

chore: improve freeport usage#2388

Merged
ysmolski merged 24 commits intomainfrom
yury/fix_ports_flakiness
Dec 9, 2025
Merged

chore: improve freeport usage#2388
ysmolski merged 24 commits intomainfrom
yury/fix_ports_flakiness

Conversation

@ysmolski
Copy link
Copy Markdown
Contributor

@ysmolski ysmolski commented Dec 3, 2025

The "makeSafeHttpTestServer" function creates server for testing using
"httptest.NewUnstartedServer". That method creates listener on ephemeral port.
Before this PR, we were opening another listener using port returned by freeport,
and closing old ephemeral listener. Freeport by itself also checked if port is free
by starting listening on it and then closing it immediately after.
What often has failed is the listening on the port that was just used in another listener.

I have removed the usage of freeport in "makeSafeHttpTestServer" and started
using just ephemeral ports. "freeport" itself mentions that ephemeral ports should be
used whenever possible.

I moved freeport.GetOne closer to actual usage for the router listener.
I vendored freeport into our codebase.

In the future, we should not use freeport if we can use ephemeral ports.
The range of ephemeral ports contains at least 15k ports, and that should
be enought for our needs if we propely close all the connections. Freeport
has only 1 block of 1500 port available and uses tons of resources to check
if the port were free finally.

Closes ENG-8627
Closes ENG-6922

Summary by CodeRabbit

  • Refactor

    • Tests now use ephemeral/on‑demand ports throughout to avoid port conflicts.
    • CDN setup API updated to return both server instance and allocated port.
  • Tests

    • Added a retry framework for resilient test runs with configurable strategies, per-attempt logging and cleanup semantics.
    • Added a robust test port allocation utility with platform-aware ephemeral-range handling and extensive tests.
  • Chores

    • Replaced external Consul dependency with local test utilities and added a license file.

✏️ Tip: You can customize this high-level summary in your review settings.

Also I moved picking the getting of ports for router server
closer to actual usage.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Dec 3, 2025

Walkthrough

Replaces the external HashiCorp freeport dependency with an in-repo router-tests/freeport package, adds a retry test helper package, switches tests to ephemeral/on-demand ports, updates test helpers (CDN/server wiring and gRPC/http startup), and adds platform-specific ephemeral-port detection and tests.

Changes

Cohort / File(s) Change Summary
Testenv & port wiring
router-tests/testenv/testenv.go, router-tests/testenv/testexec.go
Switched from external freeport import to local router-tests/freeport; changed SetupCDNServer signature to return (server, port) and updated callers to use returned port; moved to ephemeral/on-demand port allocation.
HTTP & JWKS helpers
router-tests/jwks/jwks.go
Removed manual net listener + external freeport usage; start servers via httptest.NewUnstartedServer(...); server.Start() and derive ports from the started server.
Tests updated to local freeport & goleak ignores
router-tests/router_compatibility_version_config_poller_test.go, router-tests/lifecycle/supervisor_test.go, router-tests/lifecycle/shutdown_test.go
Removed direct consul freeport imports; use testenv.SetupCDNServer(t) return values; updated goleak ignores to reference router-tests/freeport.checkFreedPorts.
New freeport core package
router-tests/freeport/freeport.go, router-tests/freeport/freeport_test.go, router-tests/freeport/LICENSE
Added local freeport implementing port-block reservation, Take/GetN/GetOne/Return API, system-wide listener lock, background freed-port checks, logging and extensive tests; included MPL-2.0 LICENSE file.
Freeport platform ephemerals & limits
router-tests/freeport/ephemeral_linux.go, router-tests/freeport/ephemeral_linux_test.go, router-tests/freeport/ephemeral_darwin.go, router-tests/freeport/ephemeral_darwin_test.go, router-tests/freeport/ephemeral_fallback.go, router-tests/freeport/systemlimit.go, router-tests/freeport/systemlimit_windows.go
Added OS-specific ephemeral-port range detection for Linux (/proc) and Darwin (sysctl), a fallback for other OSes, non-Windows system-limit detection and Windows stub, plus platform tests.
New retry package (core)
router-tests/freeport/retry/retry.go, router-tests/freeport/retry/run.go, router-tests/freeport/retry/retry_test.go
Implemented retry.R wrapping a TestingTB with per-attempt state, panic recovery, cleanup semantics, output aggregation, Run/RunWith/Option APIs, and comprehensive tests.
Retry utilities & types
router-tests/freeport/retry/retryer.go, router-tests/freeport/retry/counter.go, router-tests/freeport/retry/timer.go, router-tests/freeport/retry/interface.go, router-tests/freeport/retry/output.go, router-tests/freeport/retry/doc.go
Added Retryer interface, Counter and Timer implementations, TestingTB interface abstraction, log/output helpers (dedup, decorate) and package documentation/examples.
Module deps
router-tests/go.mod
Removed github.com/hashicorp/consul/sdk dependency and related replace; added golang.org/x/sys v0.37.0.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Attention areas:
    • router-tests/freeport/freeport.go: concurrency, mutex/cond usage, background goroutine (checkFreedPorts), listener-based locking, port allocation/release correctness.
    • router-tests/freeport/retry/retry.go: panic recovery semantics, Fail vs FailNow handling, cleanup ordering and per-attempt state merging.
    • Platform parsers: ephemeral_linux.go and ephemeral_darwin.go parsing and error handling.
    • API change impact: SetupCDNServer signature and all updated call sites to ensure ports are used/returned correctly.

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 41.67% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'chore: improve freeport usage' accurately summarizes the main objective of replacing freeport with ephemeral ports throughout the test infrastructure and vendoring freeport into the codebase.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 64c70d3 and 3e54a0a.

⛔ Files ignored due to path filters (1)
  • router-tests/go.sum is excluded by !**/*.sum
📒 Files selected for processing (1)
  • router-tests/go.mod (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.

Applied to files:

  • router-tests/go.mod
📚 Learning: 2025-09-24T12:54:00.765Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.23+ minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.

Applied to files:

  • router-tests/go.mod
📚 Learning: 2025-09-24T12:54:00.765Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.25 minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.

Applied to files:

  • router-tests/go.mod
📚 Learning: 2025-11-19T15:13:57.821Z
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2273
File: router/core/graphql_handler.go:0-0
Timestamp: 2025-11-19T15:13:57.821Z
Learning: In the Cosmo router (wundergraph/cosmo), error handling follows a two-phase pattern: (1) Prehandler phase handles request parsing, validation, and setup errors using `httpGraphqlError` and `writeOperationError` (in files like graphql_prehandler.go, operation_processor.go, parse_multipart.go, batch.go); (2) Execution phase handles resolver execution errors using `WriteError` in GraphQLHandler.ServeHTTP. Because all `httpGraphqlError` instances are caught in the prehandler before ServeHTTP is invoked, any error type checks for `httpGraphqlError` in the execution-phase WriteError method are unreachable code.

Applied to files:

  • router-tests/go.mod
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: build_push_image
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: image_scan
  • GitHub Check: build_test
  • GitHub Check: build_test
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (1)
router-tests/go.mod (1)

38-38: Dependency strategy aligns with PR objectives.

Promoting golang.org/x/sys from an indirect to a direct dependency is sound: it's already transitively required by many dependencies in this module, and explicit version management prevents drift. The freeport package uses only stable POSIX APIs from golang.org/x/sys/unix (specifically Getrlimit() and the Rlimit type) to detect system file descriptor limits, which are well-established and available in v0.37.0.

The removal of the hashicorp/consul/sdk dependency (and its replace directive) simplifies the module and reduces external coupling, consistent with the PR's vendor-freeport approach.

Tip

✨ Issue Enrichment is now available for GitHub issues!

CodeRabbit can now help you manage issues more effectively:

  • Duplicate Detection — Identify similar or duplicate issues
  • Related Issues & PRs — Find relevant issues and PR's from your repository
  • Suggested Assignees — Find the best person to work on the issue
  • Implementation Planning — Generate detailed coding plans for engineers and agents
Disable automatic issue enrichment

To disable automatic issue enrichment, add the following to your .coderabbit.yaml:

issue_enrichment:
  auto_enrich:
    enabled: false

Comment @coderabbitai help to get the list of available commands and usage tips.

@ysmolski ysmolski changed the title chore: use free port instead of :0 chore: use freeport instead of :0 Dec 3, 2025
@github-actions
Copy link
Copy Markdown

github-actions bot commented Dec 3, 2025

Router image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-1ac9858a40d3498d606c42024cc93952296886b9

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
router-tests/testenv/testenv.go (1)

636-643: Port allocation for supervisor env matches main env and looks correct

Requesting two free ports and using ports[0] for Prometheus and ports[1] for listenerAddr mirrors the pattern in CreateTestEnv, so the supervisor-based tests should see the same behavior as the regular test env. Only these two indices are used.

Consider a tiny helper like allocateRouterPorts(t testing.TB) (promPort, listenPort int) used by both CreateTestEnv and CreateTestSupervisorEnv to keep this pattern in sync long term—this pattern is duplicated across multiple test setup functions in the file.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5376bd9 and a97e471.

📒 Files selected for processing (2)
  • router-tests/testenv/testenv.go (1 hunks)
  • router-tests/testenv/waitinglistener.go (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router-tests/testenv/testenv.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: build_test
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: build_push_image
  • GitHub Check: image_scan
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: integration_test (./events)
  • GitHub Check: build_test
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (2)
router-tests/testenv/waitinglistener.go (2)

3-12: Import additions are appropriate and minimal

The added imports are exactly what’s needed for the new listener construction below; no issues or unused imports.


43-54: Switch to free-port based listener address is consistent and correct

The change aligns with established patterns throughout the testenv package where freeport.GetOne(t) is used to allocate dynamic ports. No remaining hard-coded :0 listeners exist in the repository.

@codecov
Copy link
Copy Markdown

codecov bot commented Dec 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@24c0648). Learn more about missing BASE report.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2388   +/-   ##
=======================================
  Coverage        ?   32.49%           
=======================================
  Files           ?      213           
  Lines           ?    23242           
  Branches        ?        0           
=======================================
  Hits            ?     7553           
  Misses          ?    14749           
  Partials        ?      940           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
router-tests/testenv/testexec.go (1)

121-138: Inline free-port listener address looks good; consider normalizing host

Using freeport.GetOne(t) directly in the listenerAddr is a nice simplification and keeps LISTEN_ADDR and RouterURL in sync. One small nit: other test helpers (e.g., JWKS server) bind explicitly to 127.0.0.1, while this uses localhost. For consistency and to avoid host-resolution/IPv6 surprises, you might want to switch this to 127.0.0.1:%d as well, unless you intentionally want hostname resolution here.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a97e471 and d67b654.

📒 Files selected for processing (3)
  • router-tests/jwks/jwks.go (1 hunks)
  • router-tests/testenv/testenv.go (6 hunks)
  • router-tests/testenv/testexec.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • router-tests/testenv/testenv.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./events)
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: image_scan
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_test
  • GitHub Check: build_push_image
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: build_test
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (1)
router-tests/jwks/jwks.go (1)

170-178: Custom free-port listener wired into httptest.Server correctly

The switch to net.Listen("tcp", fmt.Sprintf("127.0.0.1:%d", freeport.GetOne(t))) and assigning that listener to the unstarted httptest.Server before Start() looks correct and aligns with the new free-port strategy. Lifetime is managed via httpServer.Close() in Server.Close, so there’s no resource leak here.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (1)
router-tests/freeport/freeport_test.go (1)

14-208: Consider removing or enabling this commented-out test code.

This large block of commented-out test code adds noise and maintenance burden. If the tests are no longer needed, remove them. If they're intended for future use, consider tracking with a TODO or issue reference.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d67b654 and 0be11dc.

⛔ Files ignored due to path filters (1)
  • router-tests/go.sum is excluded by !**/*.sum
📒 Files selected for processing (17)
  • router-tests/freeport/LICENSE (1 hunks)
  • router-tests/freeport/ephemeral_darwin.go (1 hunks)
  • router-tests/freeport/ephemeral_darwin_test.go (1 hunks)
  • router-tests/freeport/ephemeral_fallback.go (1 hunks)
  • router-tests/freeport/ephemeral_linux.go (1 hunks)
  • router-tests/freeport/ephemeral_linux_test.go (1 hunks)
  • router-tests/freeport/freeport.go (1 hunks)
  • router-tests/freeport/freeport_test.go (1 hunks)
  • router-tests/freeport/systemlimit.go (1 hunks)
  • router-tests/freeport/systemlimit_windows.go (1 hunks)
  • router-tests/go.mod (1 hunks)
  • router-tests/jwks/jwks.go (2 hunks)
  • router-tests/lifecycle/supervisor_test.go (1 hunks)
  • router-tests/router_compatibility_version_config_poller_test.go (1 hunks)
  • router-tests/testenv/testenv.go (8 hunks)
  • router-tests/testenv/testexec.go (2 hunks)
  • router-tests/testenv/waitinglistener.go (2 hunks)
✅ Files skipped from review due to trivial changes (2)
  • router-tests/freeport/LICENSE
  • router-tests/router_compatibility_version_config_poller_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • router-tests/testenv/waitinglistener.go
  • router-tests/testenv/testexec.go
  • router-tests/testenv/testenv.go
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router-tests/freeport/ephemeral_linux_test.go
  • router-tests/freeport/ephemeral_darwin_test.go
  • router-tests/freeport/freeport_test.go
  • router-tests/lifecycle/supervisor_test.go
📚 Learning: 2025-07-21T14:46:34.879Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2067
File: router/pkg/authentication/jwks_token_decoder.go:80-106
Timestamp: 2025-07-21T14:46:34.879Z
Learning: In the Cosmo router project, required field validation for JWKS configuration (Secret, Algorithm, KeyId) is handled at the JSON schema level in config.schema.json rather than through runtime validation in the Go code at router/pkg/authentication/jwks_token_decoder.go.

Applied to files:

  • router-tests/jwks/jwks.go
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.

Applied to files:

  • router-tests/jwks/jwks.go
  • router-tests/go.mod
  • router-tests/lifecycle/supervisor_test.go
📚 Learning: 2025-07-21T15:06:36.664Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2067
File: router/pkg/config/config.schema.json:1637-1644
Timestamp: 2025-07-21T15:06:36.664Z
Learning: In the Cosmo router project, when extending JSON schema validation for security-sensitive fields like JWKS secrets, backwards compatibility is maintained by implementing warnings in the Go code rather than hard validation constraints in the schema. This allows existing configurations to continue working while alerting users to potential security issues.

Applied to files:

  • router-tests/jwks/jwks.go
📚 Learning: 2025-07-29T08:19:55.720Z
Learnt from: Noroth
Repo: wundergraph/cosmo PR: 2088
File: demo/pkg/subgraphs/projects/src/main_test.go:0-0
Timestamp: 2025-07-29T08:19:55.720Z
Learning: In Go testing, t.Fatal, t.FailNow, t.Skip* and similar methods cannot be called from goroutines other than the main test goroutine, as they will cause a runtime panic. Use channels, t.Error*, or other synchronization mechanisms to communicate errors from goroutines back to the main test thread.

Applied to files:

  • router-tests/freeport/freeport_test.go
📚 Learning: 2025-09-24T12:54:00.765Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.23+ minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.

Applied to files:

  • router-tests/go.mod
📚 Learning: 2025-09-24T12:54:00.765Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.25 minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.

Applied to files:

  • router-tests/go.mod
📚 Learning: 2025-11-19T15:13:57.821Z
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2273
File: router/core/graphql_handler.go:0-0
Timestamp: 2025-11-19T15:13:57.821Z
Learning: In the Cosmo router (wundergraph/cosmo), error handling follows a two-phase pattern: (1) Prehandler phase handles request parsing, validation, and setup errors using `httpGraphqlError` and `writeOperationError` (in files like graphql_prehandler.go, operation_processor.go, parse_multipart.go, batch.go); (2) Execution phase handles resolver execution errors using `WriteError` in GraphQLHandler.ServeHTTP. Because all `httpGraphqlError` instances are caught in the prehandler before ServeHTTP is invoked, any error type checks for `httpGraphqlError` in the execution-phase WriteError method are unreachable code.

Applied to files:

  • router-tests/go.mod
📚 Learning: 2025-09-17T20:55:39.456Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.

Applied to files:

  • router-tests/lifecycle/supervisor_test.go
🧬 Code graph analysis (4)
router-tests/freeport/ephemeral_linux_test.go (1)
router-tests/freeport/ephemeral_darwin_test.go (1)
  • TestGetEphemeralPortRange (12-21)
router-tests/freeport/ephemeral_darwin_test.go (1)
router-tests/freeport/ephemeral_linux_test.go (1)
  • TestGetEphemeralPortRange (12-21)
router-tests/jwks/jwks.go (1)
router-tests/freeport/freeport.go (1)
  • GetOne (496-499)
router-tests/freeport/freeport_test.go (1)
router-tests/freeport/freeport.go (7)
  • LogLevel (292-292)
  • DEBUG (295-295)
  • INFO (296-296)
  • WARN (297-297)
  • ERROR (298-298)
  • DISABLED (299-299)
  • SetLogLevel (319-328)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: build_test
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: image_scan
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: build_push_image
  • GitHub Check: build_test
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (14)
router-tests/freeport/ephemeral_linux.go (1)

1-38: Linux ephemeral port range reader is robust and aligned with /proc format

Reading /proc/sys/net/ipv4/ip_local_port_range, parsing with an anchored regexp, and validating both Atoi conversions gives correct behavior and clear failure when the value is unexpected. No changes needed.

router-tests/freeport/ephemeral_linux_test.go (1)

1-21: Linux ephemeral range test provides adequate sanity coverage

The test mirrors the Darwin variant, asserts only stable invariants (positive bounds and ordering), and logs the values for inspection. This is sufficient for platform sanity checking.

router-tests/lifecycle/supervisor_test.go (1)

14-17: Updated goleak ignore path correctly tracks in-repo freeport.checkFreedPorts

Pointing IgnoreTopFunction at github.com/wundergraph/cosmo/router-tests/freeport.checkFreedPorts preserves the intended leak-ignore behavior after vendoring the helper; the added comments make the rationale explicit.

router-tests/freeport/ephemeral_darwin_test.go (1)

1-21: Darwin ephemeral range test is consistent with the Linux variant

The test mirrors the Linux version, asserting only basic invariants on the discovered range and logging the result, which is appropriate for platform sanity checking.

router-tests/freeport/systemlimit.go (1)

1-14: systemLimit correctly wraps RLIMIT_NOFILE; caller appropriately handles errors and zero values

The implementation is sound. The caller in initialize() properly handles both error cases (panics to fail fast) and zero/negative values (retains default blockSize of 1500 and adjusts only if a meaningful limit is detected). No changes needed.

router-tests/go.mod (1)

38-38: Direct x/sys dependency matches new freeport/systemlimit usage

Adding golang.org/x/sys v0.37.0 as a direct requirement is appropriate—router-tests/freeport/systemlimit.go imports unix and uses unix.Rlimit, unix.Getrlimit, and unix.RLIMIT_NOFILE. No remaining consul SDK imports found in the codebase; the string reference in shutdown_test.go (goleak ignore function) is not a code dependency.

router-tests/freeport/systemlimit_windows.go (1)

1-10: Sentinel value of 0 for Windows systemLimit is correctly handled

The code explicitly checks if limit > 0 && limit < blockSize (line 111 in freeport.go) before applying system limit constraints to blockSize. On Windows, systemLimit() returns (0, nil), which correctly skips the constraint adjustment and allows the default blockSize of 1500 to proceed. This is the intended behavior for "no meaningful system limit available."

router-tests/freeport/ephemeral_fallback.go (1)

1-10: Clarify how callers interpret a (0,0,nil) ephemeral range on non-Linux/Darwin

Returning (0, 0, nil) is fine as long as callers consistently treat min == 0 || max == 0 as "no ephemeral range information available" and fall back appropriately, rather than using 0 as a valid port. Verify that all callers of getEphemeralPortRange() handle the sentinel value correctly instead of treating 0 as a valid port number.

router-tests/jwks/jwks.go (1)

16-17: LGTM! Clean migration to the local freeport package.

The import change and inline usage of freeport.GetOne(t) are correct. The new helper automatically handles port cleanup via t.Cleanup(), maintaining proper test lifecycle management.

Also applies to: 172-172

router-tests/freeport/ephemeral_darwin.go (1)

1-40: LGTM! Well-structured Darwin-specific ephemeral port range detection.

The implementation correctly uses an absolute path for sysctl, handles command errors, and parses output with appropriate error handling. The regex safely captures both port values.

router-tests/freeport/freeport_test.go (1)

210-233: LGTM! Good interval overlap test coverage.

The test cases cover key scenarios (identical, serial, overlapping, nested) and correctly verify symmetry by testing both orderings.

router-tests/freeport/freeport.go (3)

102-150: Initialization logic is well-structured.

The initialize() function properly handles system limits, ephemeral range detection, block allocation, and starts the background port checker. Good use of sync.Once for thread-safe lazy initialization.


330-376: Take() correctly handles port theft and waiting.

The implementation properly detects stolen ports (removing them from circulation), waits on condNotEmpty when no ports are available, and handles the edge case of zero remaining ports. Aside from the debug print noted separately, the logic is sound.


472-499: GetN and GetOne provide clean test integration.

These helpers properly track port usage via portLastUser, register cleanup to return ports, and support the TestingT interface. This is the recommended API for test usage.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
router-tests/freeport/freeport_test.go (2)

258-269: Restore os.Stderr after the test to avoid polluting other tests.

The code modifies the global os.Stderr at line 262 but never restores it. The cleanup only closes the read pipe.

 			t.Cleanup(func() {
 				_ = r.Close()
 			})

+			origStderr := os.Stderr
 			os.Stderr = w
+			t.Cleanup(func() {
+				os.Stderr = origStderr
+			})
 			SetLogLevel(tc.level)

328-336: Same issue: restore os.Stderr after the test.

This test also modifies the global os.Stderr without restoration.

 			t.Cleanup(func() {
 				_ = r.Close()
 			})

+			origStderr := os.Stderr
 			os.Stderr = w
+			t.Cleanup(func() {
+				os.Stderr = origStderr
+			})
 			SetLogLevel(tc.level)
router-tests/freeport/freeport.go (1)

351-375: Port leak on partial failure in Take.

If Take(n) successfully allocates some ports but then encounters the "no actual free ports" error (line 354), the already-allocated ports are lost—they're removed from freePorts but never returned to pendingPorts.

This is an edge case (requires external port theft mid-request), but could cause gradual pool exhaustion:

 	for len(ports) < n {
 		for freePorts.Len() == 0 {
 			if total == 0 {
+				// Return any ports we already took before failing
+				for _, p := range ports {
+					pendingPorts.PushBack(p)
+				}
 				return nil, fmt.Errorf("freeport: impossible to satisfy request; there are no actual free ports in the block anymore")
 			}
 			condNotEmpty.Wait()
 		}
🧹 Nitpick comments (1)
router-tests/freeport/freeport.go (1)

449-457: Consider using atomic for logLevel to avoid data race.

logf reads logLevel without holding mu, while SetLogLevel writes it under the lock. This is technically a data race, though in practice the impact is minimal (worst case: a log message is missed or printed unexpectedly during level transition).

If you want to eliminate the race without adding lock contention:

+import "sync/atomic"

-var logLevel LogLevel
+var logLevel atomic.Uint32

 func SetLogLevel(level LogLevel) {
-	mu.Lock()
-	defer mu.Unlock()
 	if level > DISABLED {
 		level = DISABLED
 	}
-	logLevel = level
+	logLevel.Store(uint32(level))
 }

 func logf(severity LogLevel, format string, a ...interface{}) {
 	if severity >= DISABLED {
 		return
 	}
-	if severity >= logLevel {
+	if severity >= LogLevel(logLevel.Load()) {
 		fmt.Fprintf(os.Stderr, "["+severity.String()+"] freeport: "+format+"\n", a...)
 	}
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0b020cc and 0082610.

⛔ Files ignored due to path filters (1)
  • router-tests/go.sum is excluded by !**/*.sum
📒 Files selected for processing (3)
  • router-tests/freeport/freeport.go (1 hunks)
  • router-tests/freeport/freeport_test.go (1 hunks)
  • router-tests/go.mod (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • router-tests/go.mod
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router-tests/freeport/freeport_test.go
📚 Learning: 2025-07-29T08:19:55.720Z
Learnt from: Noroth
Repo: wundergraph/cosmo PR: 2088
File: demo/pkg/subgraphs/projects/src/main_test.go:0-0
Timestamp: 2025-07-29T08:19:55.720Z
Learning: In Go testing, t.Fatal, t.FailNow, t.Skip* and similar methods cannot be called from goroutines other than the main test goroutine, as they will cause a runtime panic. Use channels, t.Error*, or other synchronization mechanisms to communicate errors from goroutines back to the main test thread.

Applied to files:

  • router-tests/freeport/freeport_test.go
🧬 Code graph analysis (1)
router-tests/freeport/freeport_test.go (1)
router-tests/freeport/freeport.go (9)
  • Take (336-375)
  • Return (407-420)
  • LogLevel (292-292)
  • DEBUG (295-295)
  • INFO (296-296)
  • WARN (297-297)
  • ERROR (298-298)
  • DISABLED (299-299)
  • SetLogLevel (319-328)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: build_push_image
  • GitHub Check: build_test
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: image_scan
  • GitHub Check: integration_test (./events)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (5)
router-tests/freeport/freeport_test.go (2)

17-55: LGTM! Well-structured test setup.

The TestTakeReturn test properly uses defer reset() to clean up global state and defines a reusable waitForStatsReset helper using the retry library. The goroutine at lines 157-160 correctly uses a channel to communicate results back to the main test goroutine rather than calling t.Fatal from within, which is the correct pattern per Go testing requirements.


213-236: LGTM!

The test correctly verifies the symmetric property of intervalOverlap by testing both argument orderings and covers a good range of cases.

router-tests/freeport/freeport.go (3)

102-150: LGTM!

The initialization logic is well-designed with proper use of sync.Once for thread-safe lazy initialization, system limit checking, and ephemeral port range avoidance.


471-498: LGTM!

The GetN and GetOne functions provide a clean test-friendly API with automatic cleanup via t.Cleanup. The portLastUser tracking is helpful for debugging leaked ports.


405-420: LGTM!

The boundary validation correctly filters ports to only those within the allocated block.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
router-tests/testenv/testenv.go (4)

651-669: MCP port not released on error path.

The MCP port allocated at Line 652 is not included in the freeport.Return call at Line 667 when configureRouter fails. This causes a port leak in the error path.

Apply this diff to fix the leak:

+	var mcpPort int
 	if cfg.MCP.Enabled {
-		cfg.MCP.Server.ListenAddr = fmt.Sprintf("localhost:%d", freeport.GetOne(t))
+		mcpPort = freeport.GetOne(t)
+		cfg.MCP.Server.ListenAddr = fmt.Sprintf("localhost:%d", mcpPort)
 	}
 
 	routerPort := freeport.GetOne(t)
 	listenerAddr := fmt.Sprintf("localhost:%d", routerPort)
 	baseURL := fmt.Sprintf("http://%s", listenerAddr)
 
 	rs, err := core.NewRouterSupervisor(&core.RouterSupervisorOpts{
 		BaseLogger: cfg.Logger,
 		ConfigFactory: func() (*config.Config, error) {
 			return &config.Config{}, nil
 		},
 		RouterFactory: func(ctx context.Context, res *core.RouterResources) (*core.Router, error) {
 			rr, err := configureRouter(listenerAddr, cfg, &routerConfig, cdnServer, natsSetup)
 			if err != nil {
-				freeport.Return([]int{cfg.PrometheusPort, routerPort})
+				freeport.Return([]int{cfg.PrometheusPort, routerPort, mcpPort})
 				cancel(err)
 				return nil, err
 			}

1075-1127: MCP port not released on error paths.

The MCP port allocated at Line 1076 is not included in the freeport.Return calls at Lines 1083 and 1124 when errors occur. This causes a port leak in both error paths.

Apply this diff to fix the leak:

+	var mcpPort int
 	if cfg.MCP.Enabled {
-		cfg.MCP.Server.ListenAddr = fmt.Sprintf("localhost:%d", freeport.GetOne(t))
+		mcpPort = freeport.GetOne(t)
+		cfg.MCP.Server.ListenAddr = fmt.Sprintf("localhost:%d", mcpPort)
 	}
 
 	routerPort := freeport.GetOne(t)
 	listenerAddr := fmt.Sprintf("localhost:%d", routerPort)
 	rr, err := configureRouter(listenerAddr, cfg, &routerConfig, cdnServer, natsSetup)
 	if err != nil {
-		freeport.Return([]int{cfg.PrometheusPort, routerPort})
+		freeport.Return([]int{cfg.PrometheusPort, routerPort, mcpPort})
 		cancel(err)
 		return nil, err
 	}
 
 	// ... TLS config setup ...
 
 	if err := rr.Start(ctx); err != nil {
-		freeport.Return([]int{cfg.PrometheusPort, routerPort})
+		freeport.Return([]int{cfg.PrometheusPort, routerPort, mcpPort})
 		cancel(err)
 		return nil, err
 	}

1665-1676: Return allocated port if listener creation fails.

If net.Listen fails at Line 1668, the port allocated at Line 1667 is leaked before the test terminates. While the test will fail, releasing the port before calling t.Fatalf follows best practices and prevents port exhaustion during test runs.

Apply this diff:

 func makeSafeHttpTestServer(t testing.TB, handler http.Handler) *httptest.Server {
 	s := httptest.NewUnstartedServer(handler)
-	l, err := net.Listen("tcp", fmt.Sprintf("127.0.0.1:%d", freeport.GetOne(t)))
+	port := freeport.GetOne(t)
+	l, err := net.Listen("tcp", fmt.Sprintf("127.0.0.1:%d", port))
 	if err != nil {
+		freeport.Return([]int{port})
 		t.Fatalf("could not listen on port: %s", err.Error())
 	}
 	_ = s.Listener.Close()
 	s.Listener = l
 	s.Start()
 
 	return s
 }

1678-1696: Return allocated port if listener creation fails.

If net.Listen fails at Line 1682, the port allocated at Line 1681 is leaked before the test terminates. Release the port before calling require.NoError to prevent port exhaustion during test runs.

Apply this diff:

 func makeSafeGRPCServer(t testing.TB, sd *grpc.ServiceDesc, service any) (*grpc.Server, string) {
 	t.Helper()
 
-	endpoint := fmt.Sprintf("localhost:%d", freeport.GetOne(t))
+	port := freeport.GetOne(t)
+	endpoint := fmt.Sprintf("localhost:%d", port)
 	lis, err := net.Listen("tcp", endpoint)
+	if err != nil {
+		freeport.Return([]int{port})
+	}
 	require.NoError(t, err)
 
 	require.NotNil(t, service)
🧹 Nitpick comments (1)
router-tests/testenv/testenv.go (1)

100-102: Consider making the debug log level configurable or documenting the intent.

Setting freeport.DEBUG unconditionally in the init function may produce verbose logs for all tests. If this is intentional for debugging port-related flakiness, consider documenting it with a comment. Alternatively, make it configurable via an environment variable.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0082610 and e802674.

📒 Files selected for processing (1)
  • router-tests/testenv/testenv.go (10 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-09-17T20:55:39.456Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-09-19T15:08:03.085Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2223
File: router/cmd/main.go:32-33
Timestamp: 2025-09-19T15:08:03.085Z
Learning: pyroscope-go's Config.ServerAddress field accepts both full URLs (e.g., "http://localhost:4040") and simple host:port format (e.g., "localhost:4040"). The library handles URL normalization internally, so both formats work correctly.

Applied to files:

  • router-tests/testenv/testenv.go
🧬 Code graph analysis (1)
router-tests/testenv/testenv.go (1)
router-tests/freeport/freeport.go (4)
  • SetLogLevel (319-328)
  • DEBUG (295-295)
  • GetOne (495-498)
  • Return (407-420)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: integration_test (./events)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: build_test
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: image_scan
  • GitHub Check: build_push_image
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
router-tests/testenv/testenv.go (1)

59-102: Freeport import and global DEBUG log level look fine

Switching to the local router-tests/freeport package and setting its log level to DEBUG in init is consistent with the new allocator usage and should help diagnose port issues across tests. If CI logs become too noisy over time, consider dialing this back to INFO/ERROR later, but as-is this is acceptable for test code.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e802674 and 0833c34.

📒 Files selected for processing (1)
  • router-tests/testenv/testenv.go (8 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-09-17T20:55:39.456Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-09-19T15:08:03.085Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2223
File: router/cmd/main.go:32-33
Timestamp: 2025-09-19T15:08:03.085Z
Learning: pyroscope-go's Config.ServerAddress field accepts both full URLs (e.g., "http://localhost:4040") and simple host:port format (e.g., "localhost:4040"). The library handles URL normalization internally, so both formats work correctly.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-09-19T15:08:03.085Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2223
File: router/cmd/main.go:32-33
Timestamp: 2025-09-19T15:08:03.085Z
Learning: Based on direct user experience from endigma (PR author), pyroscope-go's Config.ServerAddress field accepts simple host:port format (e.g., "localhost:4040") in practice, even though the official documentation primarily shows full URL examples with schemes (e.g., "http://localhost:4040"). The library likely handles URL normalization internally.

Applied to files:

  • router-tests/testenv/testenv.go
🧬 Code graph analysis (1)
router-tests/testenv/testenv.go (1)
router-tests/freeport/freeport.go (3)
  • SetLogLevel (319-328)
  • DEBUG (295-295)
  • GetOne (495-498)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: build_push_image
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: image_scan
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: integration_test (./events)
  • GitHub Check: build_test
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (3)
router-tests/testenv/testenv.go (3)

631-656: Supervisor test env: on-demand freeport usage is correct and localized

Using freeport.GetOne(t) directly for the CDN server, Prometheus port, MCP listen address, and listenerAddr keeps port selection close to where each listener is actually bound, which should reduce flakiness from pre-allocated blocks. The same listenerAddr is passed into configureRouter and used for BaseURL, so router URL and listener stay consistent.


1053-1078: Regular test env: consistent freeport-based port allocation

The same pattern of allocating a CDN port, Prometheus port, MCP listen address, and router listenerAddr via freeport.GetOne(t) in CreateTestEnv mirrors the supervisor env and keeps concerns nicely localized. This should play well with the metrics configuration (PrometheusPort only set when a registry is provided) and with core.WithListenerAddr(listenerAddr).


1660-1691: HTTP/GRPC helpers now use freeport directly; behavior looks correct

makeSafeHttpTestServer and makeSafeGRPCServer now listen on 127.0.0.1/localhost with ports from freeport.GetOne(t), preserving the previous “explicit listener then attach to httptest.Server/gRPC server” pattern while integrating with the shared allocator. Error handling still fails the test on listen errors, and the returned endpoints remain consistent.

@ysmolski ysmolski force-pushed the yury/fix_ports_flakiness branch from 2ad7e03 to d73143e Compare December 4, 2025 16:39
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (3)
router-tests/freeport/retry/retry_test.go (2)

59-59: Reversed argument order in testify assertions.

assert.Equal expects (expected, actual) but you have (actual, expected). This affects error message clarity when tests fail. The same pattern appears on lines 76 and 78.

-		assert.Equal(t, i, 2)
+		assert.Equal(t, 2, i)

131-133: Remove empty test function.

TestCleanup_Passthrough is empty. The passthrough behavior is already tested in TestCleanup/passthrough-to-t (lines 214-244). Remove this placeholder to avoid confusion.

-func TestCleanup_Passthrough(t *testing.T) {
-
-}
-
router-tests/freeport/retry/retry.go (1)

156-166: Redundant length check inside the loop.

The if len(a.cleanups) > 0 check on line 158 is redundant since the for loop condition already guarantees this. However, the overall panic-safe recursive cleanup pattern is well-implemented.

 	for len(a.cleanups) > 0 {
 		var cleanup func()
-		if len(a.cleanups) > 0 {
-			last := len(a.cleanups) - 1
-			cleanup = a.cleanups[last]
-			a.cleanups = a.cleanups[:last]
-		}
+		last := len(a.cleanups) - 1
+		cleanup = a.cleanups[last]
+		a.cleanups = a.cleanups[:last]
 		if cleanup != nil {
 			cleanup()
 		}
 	}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6d7d768 and d73143e.

⛔ Files ignored due to path filters (1)
  • router-tests/go.sum is excluded by !**/*.sum
📒 Files selected for processing (11)
  • router-tests/freeport/freeport_test.go (1 hunks)
  • router-tests/freeport/retry/counter.go (1 hunks)
  • router-tests/freeport/retry/doc.go (1 hunks)
  • router-tests/freeport/retry/interface.go (1 hunks)
  • router-tests/freeport/retry/output.go (1 hunks)
  • router-tests/freeport/retry/retry.go (1 hunks)
  • router-tests/freeport/retry/retry_test.go (1 hunks)
  • router-tests/freeport/retry/retryer.go (1 hunks)
  • router-tests/freeport/retry/run.go (1 hunks)
  • router-tests/freeport/retry/timer.go (1 hunks)
  • router-tests/go.mod (1 hunks)
🧰 Additional context used
🧠 Learnings (8)
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router-tests/freeport/freeport_test.go
  • router-tests/freeport/retry/retry_test.go
  • router-tests/freeport/retry/interface.go
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.

Applied to files:

  • router-tests/freeport/freeport_test.go
  • router-tests/go.mod
📚 Learning: 2025-09-24T12:54:00.765Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.23+ minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.

Applied to files:

  • router-tests/go.mod
📚 Learning: 2025-09-24T12:54:00.765Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.25 minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.

Applied to files:

  • router-tests/go.mod
📚 Learning: 2025-11-19T15:13:57.821Z
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2273
File: router/core/graphql_handler.go:0-0
Timestamp: 2025-11-19T15:13:57.821Z
Learning: In the Cosmo router (wundergraph/cosmo), error handling follows a two-phase pattern: (1) Prehandler phase handles request parsing, validation, and setup errors using `httpGraphqlError` and `writeOperationError` (in files like graphql_prehandler.go, operation_processor.go, parse_multipart.go, batch.go); (2) Execution phase handles resolver execution errors using `WriteError` in GraphQLHandler.ServeHTTP. Because all `httpGraphqlError` instances are caught in the prehandler before ServeHTTP is invoked, any error type checks for `httpGraphqlError` in the execution-phase WriteError method are unreachable code.

Applied to files:

  • router-tests/go.mod
📚 Learning: 2025-07-29T08:19:55.720Z
Learnt from: Noroth
Repo: wundergraph/cosmo PR: 2088
File: demo/pkg/subgraphs/projects/src/main_test.go:0-0
Timestamp: 2025-07-29T08:19:55.720Z
Learning: In Go testing, t.Fatal, t.FailNow, t.Skip* and similar methods cannot be called from goroutines other than the main test goroutine, as they will cause a runtime panic. Use channels, t.Error*, or other synchronization mechanisms to communicate errors from goroutines back to the main test thread.

Applied to files:

  • router-tests/freeport/retry/retry_test.go
  • router-tests/freeport/retry/doc.go
  • router-tests/freeport/retry/retry.go
  • router-tests/freeport/retry/interface.go
📚 Learning: 2025-08-26T17:19:28.178Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2167
File: router/internal/expr/retry_context.go:3-10
Timestamp: 2025-08-26T17:19:28.178Z
Learning: In the Cosmo router codebase, prefer using netErr.Timeout() checks over explicit context.DeadlineExceeded checks for timeout detection in retry logic, as Go's networking stack typically wraps context deadline exceeded errors in proper net.Error implementations.

Applied to files:

  • router-tests/freeport/retry/timer.go
📚 Learning: 2025-09-17T20:50:58.518Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/internal/expr/retry_expression.go:63-67
Timestamp: 2025-09-17T20:50:58.518Z
Learning: In router/internal/expr/retry_expression.go, the maintainer SkArchon accepts the "silent no-retry" behavior when a retry expression program isn't precompiled (returning false, nil on cache miss). This is considered acceptable because config reloading is intended as a dev feature, making the potential footgun less problematic in the expected usage context.

Applied to files:

  • router-tests/freeport/retry/retry.go
  • router-tests/freeport/retry/output.go
🧬 Code graph analysis (5)
router-tests/freeport/retry/retryer.go (2)
router-tests/freeport/retry/timer.go (1)
  • Timer (10-18)
router-tests/freeport/retry/counter.go (1)
  • Counter (10-15)
router-tests/freeport/freeport_test.go (1)
router-tests/freeport/freeport.go (9)
  • Take (336-375)
  • Return (407-420)
  • LogLevel (292-292)
  • DEBUG (295-295)
  • INFO (296-296)
  • WARN (297-297)
  • ERROR (298-298)
  • DISABLED (299-299)
  • SetLogLevel (319-328)
router-tests/freeport/retry/retry_test.go (6)
router-tests/freeport/retry/retryer.go (1)
  • Retryer (10-14)
router-tests/freeport/retry/counter.go (1)
  • Counter (10-15)
router-tests/freeport/retry/timer.go (1)
  • Timer (10-18)
router-tests/freeport/retry/run.go (4)
  • Run (31-43)
  • RunWith (45-48)
  • WithImmediateCleanup (25-29)
  • WithRetryer (8-12)
router-tests/freeport/retry/retry.go (1)
  • R (13-22)
router-tests/freeport/retry/interface.go (1)
  • TestingTB (20-35)
router-tests/freeport/retry/run.go (3)
router-tests/freeport/retry/retry.go (1)
  • R (13-22)
router-tests/freeport/retry/retryer.go (2)
  • Retryer (10-14)
  • DefaultRetryer (18-20)
router-tests/freeport/retry/interface.go (1)
  • TestingTB (20-35)
router-tests/freeport/retry/interface.go (2)
router-tests/freeport/freeport.go (1)
  • TestingT (464-469)
router/pkg/pubsub/datasource/error.go (1)
  • Error (3-6)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: build_push_image
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: image_scan
  • GitHub Check: integration_test (./events)
  • GitHub Check: build_test
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: Analyze (go)
🔇 Additional comments (14)
router-tests/go.mod (1)

38-38: LGTM! Dependency change aligns with the migration.

Moving golang.org/x/sys from indirect to direct dependency is appropriate, as the new local freeport package likely uses it directly for platform-specific port allocation functionality.

router-tests/freeport/retry/output.go (2)

13-28: LGTM! Correct deduplication logic.

The function correctly removes duplicates while preserving order using a map for tracking and returns a newline-separated string.


30-42: LGTM! Standard caller decoration pattern.

The use of runtime.Caller(3) is a standard pattern for test utilities to decorate messages with file and line information. The fallback to "???" and line 1 properly handles the case when caller information is unavailable.

router-tests/freeport/retry/interface.go (2)

11-15: LGTM! Effective compile-time assertions.

The nil sentinel with type assertions ensures at compile time that TestingTB satisfies testify's require.TestingT and assert.TestingT interfaces, preventing runtime surprises.


17-35: LGTM! Complete testing interface abstraction.

The interface provides a complete abstraction over testing.TB, enabling use with alternative testing frameworks like Ginkgo as documented in the comment.

router-tests/freeport/retry/timer.go (1)

20-30: LGTM! Correct retry timing logic.

The method correctly:

  • Initializes the deadline on first invocation
  • Returns false once the timeout is exceeded
  • Sleeps between subsequent attempts (not before the first attempt)
router-tests/freeport/retry/doc.go (1)

4-22: LGTM! Clear documentation with important warnings.

The package documentation provides a clear usage example and prominently warns about the different behavior of Fatal/FailNow compared to *testing.T, which is critical for users to understand.

router-tests/freeport/retry/counter.go (1)

17-26: LGTM! Correct count-based retry logic.

The method correctly implements fixed-count retries with appropriate sleep timing between attempts.

router-tests/freeport/retry/retryer.go (1)

8-36: LGTM! Well-designed retry interface and factories.

The Retryer interface with the provided factory functions offers flexible retry strategies suitable for different test scenarios. The timeout and wait values are appropriate for their documented use cases.

router-tests/freeport/retry/run.go (1)

1-48: Well-structured functional options implementation.

The functional options pattern is cleanly implemented with appropriate documentation for WithImmediateCleanup. The Run and RunWith entry points correctly call t.Helper() for proper test output attribution.

router-tests/freeport/retry/retry_test.go (1)

247-263: Minimal fakeT implementation is sufficient for test purposes.

The fakeT type correctly overrides the methods needed to verify retry behavior without actually failing the test. The compile-time interface check at line 263 is good practice.

router-tests/freeport/retry/retry.go (3)

88-104: Clean implementation of environment variable management.

The Setenv method correctly preserves the original state (existing value or unset) and registers appropriate cleanup. This matches the semantics of testing.T.Setenv.


169-204: Retry orchestration logic is well-designed.

The panic recovery correctly distinguishes attemptFailed{} sentinel panics from genuine panics, re-propagating the latter. The run loop properly handles all termination conditions: explicit stop, success, and retry exhaustion. The defer ordering ensures cleanups run before panic handling.


206-222: Failure recording correctly aggregates output.

The fullOutput option provides useful debugging capability by combining output from all attempts. The final wrapped.FailNow() correctly propagates the failure to the underlying test.

@ysmolski ysmolski force-pushed the yury/fix_ports_flakiness branch from d73143e to e25723b Compare December 4, 2025 16:45
@ysmolski ysmolski changed the title chore: use freeport instead of :0 chore: improve freeport usage Dec 4, 2025
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
router-tests/testenv/testenv.go (1)

1658-1669: Remove commented-out code.

The commented lines 1660-1665 are now obsolete since httptest.NewUnstartedServer with s.Start() automatically allocates an ephemeral port. Removing dead code improves maintainability.

Apply this diff:

 func makeSafeHttpTestServer(t testing.TB, handler http.Handler) *httptest.Server {
 	s := httptest.NewUnstartedServer(handler)
-	// l, err := net.Listen("tcp", fmt.Sprintf("127.0.0.1:%d", freeport.GetOne(t)))
-	// if err != nil {
-	// 	t.Fatalf("could not listen on port: %s", err.Error())
-	// }
-	// _ = s.Listener.Close()
-	// s.Listener = l
 	s.Start()
 
 	return s
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 59e8015 and b76c241.

📒 Files selected for processing (1)
  • router-tests/testenv/testenv.go (8 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-09-17T20:55:39.456Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-09-19T15:08:03.085Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2223
File: router/cmd/main.go:32-33
Timestamp: 2025-09-19T15:08:03.085Z
Learning: pyroscope-go's Config.ServerAddress field accepts both full URLs (e.g., "http://localhost:4040") and simple host:port format (e.g., "localhost:4040"). The library handles URL normalization internally, so both formats work correctly.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-09-19T15:08:03.085Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2223
File: router/cmd/main.go:32-33
Timestamp: 2025-09-19T15:08:03.085Z
Learning: Based on direct user experience from endigma (PR author), pyroscope-go's Config.ServerAddress field accepts simple host:port format (e.g., "localhost:4040") in practice, even though the official documentation primarily shows full URL examples with schemes (e.g., "http://localhost:4040"). The library likely handles URL normalization internally.

Applied to files:

  • router-tests/testenv/testenv.go
🧬 Code graph analysis (1)
router-tests/testenv/testenv.go (1)
router-tests/freeport/freeport.go (2)
  • SetLogLevel (319-328)
  • GetOne (495-498)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: image_scan
  • GitHub Check: integration_test (./events)
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: build_push_image
  • GitHub Check: build_test
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (4)
router-tests/testenv/testenv.go (4)

59-59: LGTM! Import updated to internal freeport package.

The migration from HashiCorp's freeport to the internal package is complete, and setting the log level to DEBUG in init() will help diagnose any port allocation issues during test execution. This is a reasonable tradeoff for test environments where verbosity can aid troubleshooting.

Also applies to: 100-102


637-637: LGTM! On-demand port allocation improves timing.

Moving freeport.GetOne(t) calls closer to actual port usage in CreateTestSupervisorEnv reduces the window for port conflicts. The conditional allocation for Prometheus (line 637) ensures ports are only claimed when needed.

Also applies to: 655-655


1059-1059: LGTM! Consistent on-demand allocation pattern.

The same improvements from CreateTestSupervisorEnv are applied here in CreateTestEnv - ports are allocated on-demand and only when required, reducing race conditions.

Also applies to: 1077-1077


1671-1689: LGTM! Excellent use of OS ephemeral port allocation.

Using net.Listen("tcp", "127.0.0.1:0") and deriving the endpoint from lis.Addr().String() eliminates the race condition window entirely - the OS allocates the port atomically, and it's immediately bound to the listener. This is the cleanest approach for port allocation.

@ysmolski ysmolski force-pushed the yury/fix_ports_flakiness branch from c0ce188 to 8e7bdf1 Compare December 5, 2025 15:16
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
router-tests/freeport/freeport.go (1)

120-128: Missing check for effectiveMaxBlocks == 0.

The check on line 120 only handles effectiveMaxBlocks < 0, but if effectiveMaxBlocks == 0, the call to alloc() at line 128 will eventually call seededRand.Int31n(0) at line 269, causing a panic.

-	if effectiveMaxBlocks < 0 {
+	if effectiveMaxBlocks <= 0 {
 		panic("freeport: no blocks of ports available outside of ephemeral range")
 	}
🧹 Nitpick comments (1)
router-tests/testenv/testenv.go (1)

100-102: Consider a less verbose default log level.

Setting freeport.DEBUG in init() will produce verbose output for every port allocation across all tests. Consider using freeport.WARN or freeport.ERROR for quieter test output, or make this configurable via an environment variable.

 func init() {
-	freeport.SetLogLevel(freeport.DEBUG)
+	freeport.SetLogLevel(freeport.WARN)
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b76c241 and c0ce188.

📒 Files selected for processing (5)
  • router-tests/freeport/freeport.go (1 hunks)
  • router-tests/jwks/jwks.go (0 hunks)
  • router-tests/router_compatibility_version_config_poller_test.go (3 hunks)
  • router-tests/testenv/testenv.go (9 hunks)
  • router-tests/testenv/testexec.go (2 hunks)
💤 Files with no reviewable changes (1)
  • router-tests/jwks/jwks.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • router-tests/testenv/testexec.go
  • router-tests/router_compatibility_version_config_poller_test.go
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router-tests/testenv/testenv.go
  • router-tests/freeport/freeport.go
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-09-19T15:08:03.085Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2223
File: router/cmd/main.go:32-33
Timestamp: 2025-09-19T15:08:03.085Z
Learning: pyroscope-go's Config.ServerAddress field accepts both full URLs (e.g., "http://localhost:4040") and simple host:port format (e.g., "localhost:4040"). The library handles URL normalization internally, so both formats work correctly.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-09-19T15:08:03.085Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2223
File: router/cmd/main.go:32-33
Timestamp: 2025-09-19T15:08:03.085Z
Learning: Based on direct user experience from endigma (PR author), pyroscope-go's Config.ServerAddress field accepts simple host:port format (e.g., "localhost:4040") in practice, even though the official documentation primarily shows full URL examples with schemes (e.g., "http://localhost:4040"). The library likely handles URL normalization internally.

Applied to files:

  • router-tests/testenv/testenv.go
🧬 Code graph analysis (1)
router-tests/testenv/testenv.go (1)
router-tests/freeport/freeport.go (3)
  • SetLogLevel (308-317)
  • DEBUG (284-284)
  • GetOne (484-487)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: build_push_image
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: image_scan
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: integration_test (./events)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (13)
router-tests/freeport/freeport.go (8)

1-100: Package structure and initialization look correct.

The lazy initialization pattern with sync.Once, proper mutex guarding of shared state, and condition variable for blocking waits are well-designed. The constants and variable declarations are appropriate.


152-187: Shutdown and reset logic is correct.

The shutdownGoroutine() properly coordinates with the background goroutine, and reset() correctly cleans up all state for test isolation.


189-235: Port recycling logic is correct.

The background goroutine properly verifies freed ports are no longer in use before returning them to the pool, with appropriate condition variable signaling.


237-279: Block allocation logic is sound.

The ephemeral port range detection and random block allocation with TCP listener as mutex is a well-designed approach for cross-process coordination. The alloc() function correctly depends on initialize() validating effectiveMaxBlocks > 0.


281-317: Logging infrastructure is correct.

The LogLevel type and SetLogLevel function are straightforward and properly handle edge cases.


319-364: Port acquisition logic is correct.

The Take() function properly handles concurrent access, lazy initialization, and gracefully handles port theft by external processes. The wait loop pattern with condition variable is correctly implemented.


394-446: Port return and helper functions are correct.

The Return() function properly validates ports are within the allocated block before adding to pending. The helper functions (isPortInUse, tcpAddr, intervalOverlap, logf) are straightforward and correct.


448-487: Test API is well-designed.

The TestingT interface, GetN, and GetOne functions provide a clean API with automatic cleanup via t.Cleanup(). The portLastUser tracking helps debug port leaks.

router-tests/testenv/testenv.go (5)

633-655: Port allocation changes are correct.

The switch to freeport.GetOne(t) for on-demand allocation with automatic cleanup is appropriate. Ignoring the returned port from SetupCDNServer is fine since the CDN URL is extracted from the server directly.


1053-1077: Port allocation in CreateTestEnv is correct.

Consistent with CreateTestSupervisorEnv, the on-demand port allocation pattern is properly applied.


1645-1651: Comment clarifies ephemeral port usage.

The updated comments correctly explain that httptest.NewUnstartedServer already binds to an ephemeral port, making explicit freeport usage unnecessary.


1653-1673: gRPC server setup with ephemeral port is correct.

Using 127.0.0.1:0 for ephemeral port allocation and deriving the endpoint from lis.Addr().String() is the standard approach. The comments explain the rationale well.


1675-1707: SetupCDNServer signature and implementation are correct.

The updated signature returning both the server and port allows callers flexibility. Using httptest.NewUnstartedServer for ephemeral port allocation is appropriate.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
router-tests/freeport/freeport.go (2)

267-279: Potential panic if effectiveMaxBlocks is 0.

If adjustMaxBlocks() returns 0 (when the first port block overlaps with the ephemeral range), then seededRand.Int31n(0) at line 269 will panic with "invalid argument to Int31n".

This was flagged in a previous review and remains unaddressed.

 func alloc() (int, net.Listener) {
+	if effectiveMaxBlocks == 0 {
+		panic("freeport: no available port blocks (effectiveMaxBlocks is 0)")
+	}
 	for i := 0; i < attempts; i++ {
 		block := int(seededRand.Int31n(int32(effectiveMaxBlocks)))

366-371: Potential nil pointer dereference in peekFree().

If freePorts is empty, freePorts.Front() returns nil, and accessing .Value will panic. While this is an internal test helper, it could cause confusing test failures.

This was flagged in a previous review and remains unaddressed.

 func peekFree() int {
 	mu.Lock()
 	defer mu.Unlock()
+	if freePorts.Len() == 0 {
+		return 0
+	}
 	return freePorts.Front().Value.(int)
 }
🧹 Nitpick comments (1)
router-tests/testenv/testenv.go (1)

100-102: Consider a less verbose default log level.

Setting DEBUG in init() will produce verbose output for every port allocation across all tests. Consider INFO or WARN as the default to reduce noise, or make this configurable via an environment variable.

 func init() {
-	freeport.SetLogLevel(freeport.DEBUG)
+	freeport.SetLogLevel(freeport.WARN)
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c0ce188 and 8e7bdf1.

📒 Files selected for processing (5)
  • router-tests/freeport/freeport.go (1 hunks)
  • router-tests/jwks/jwks.go (0 hunks)
  • router-tests/router_compatibility_version_config_poller_test.go (3 hunks)
  • router-tests/testenv/testenv.go (9 hunks)
  • router-tests/testenv/testexec.go (2 hunks)
💤 Files with no reviewable changes (1)
  • router-tests/jwks/jwks.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • router-tests/testenv/testexec.go
🧰 Additional context used
🧠 Learnings (6)
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router-tests/router_compatibility_version_config_poller_test.go
  • router-tests/freeport/freeport.go
  • router-tests/testenv/testenv.go
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.

Applied to files:

  • router-tests/router_compatibility_version_config_poller_test.go
  • router-tests/testenv/testenv.go
📚 Learning: 2025-08-26T17:19:28.178Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2167
File: router/internal/expr/retry_context.go:3-10
Timestamp: 2025-08-26T17:19:28.178Z
Learning: In the Cosmo router codebase, prefer using netErr.Timeout() checks over explicit context.DeadlineExceeded checks for timeout detection in retry logic, as Go's networking stack typically wraps context deadline exceeded errors in proper net.Error implementations.

Applied to files:

  • router-tests/router_compatibility_version_config_poller_test.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-09-19T15:08:03.085Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2223
File: router/cmd/main.go:32-33
Timestamp: 2025-09-19T15:08:03.085Z
Learning: pyroscope-go's Config.ServerAddress field accepts both full URLs (e.g., "http://localhost:4040") and simple host:port format (e.g., "localhost:4040"). The library handles URL normalization internally, so both formats work correctly.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-09-19T15:08:03.085Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2223
File: router/cmd/main.go:32-33
Timestamp: 2025-09-19T15:08:03.085Z
Learning: Based on direct user experience from endigma (PR author), pyroscope-go's Config.ServerAddress field accepts simple host:port format (e.g., "localhost:4040") in practice, even though the official documentation primarily shows full URL examples with schemes (e.g., "http://localhost:4040"). The library likely handles URL normalization internally.

Applied to files:

  • router-tests/testenv/testenv.go
🧬 Code graph analysis (1)
router-tests/router_compatibility_version_config_poller_test.go (1)
router-tests/testenv/testenv.go (1)
  • SetupCDNServer (1675-1707)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: build_push_image
  • GitHub Check: build_test
  • GitHub Check: Analyze (go)
🔇 Additional comments (14)
router-tests/router_compatibility_version_config_poller_test.go (2)

24-32: LGTM! Clean adaptation to the new SetupCDNServer API.

The test correctly destructures the returned (cdnServer, cdnPort) tuple and uses cdnPort for CDN client construction, aligning with the on-demand port allocation approach.


56-64: Consistent pattern applied.

Same correct adaptation as the first test case.

router-tests/testenv/testenv.go (7)

59-59: Good addition of internal freeport package.

Vendoring freeport allows better control over port allocation and avoids external dependency changes affecting test stability.


631-638: Correct on-demand port allocation pattern.

Port is correctly discarded when setting up the internal CDN server since it's not needed for internal wiring. PrometheusPort allocation via freeport.GetOne(t) ensures cleanup is handled automatically.


651-656: Consistent freeport usage for MCP and listener addresses.

Both MCP listen address and router listener address now use freeport.GetOne(t), which properly registers cleanup via t.Cleanup().


1053-1060: Same correct pattern in CreateTestEnv.

Mirrors the changes in CreateTestSupervisorEnv for consistency.


1645-1651: Good clarification in comments.

The comment correctly explains that httptest.NewUnstartedServer binds to an ephemeral port anyway, so explicit freeport is unnecessary here.


1653-1673: Ephemeral port usage for gRPC server is correct.

Using 127.0.0.1:0 lets the OS assign an ephemeral port, and lis.Addr().String() correctly captures the assigned address. The comment explaining the rationale is helpful.


1675-1707: SetupCDNServer API change looks good.

The function now returns both the server and port, deriving the port from cdnServer.Listener.Addr() after the server starts. This eliminates the need to pre-allocate ports and is the correct approach for ephemeral port usage.

router-tests/freeport/freeport.go (5)

1-17: MPL-2.0 licensed code properly attributed.

HashiCorp freeport vendored with correct license header. This approach provides better control over the port allocation mechanism.


116-125: Good defensive validation of effectiveMaxBlocks.

The initialization properly validates that effectiveMaxBlocks >= 0 and that the port range doesn't exceed 65535. However, note that effectiveMaxBlocks == 0 is not caught here (only < 0), which could lead to a panic in alloc().


319-364: Robust port allocation with proper synchronization.

Take() correctly:

  • Validates input
  • Uses once.Do(initialize) for thread-safe initialization
  • Handles port theft by external processes
  • Uses condition variable for blocking when no ports are available

460-479: Test-friendly API with automatic cleanup.

GetN() properly:

  • Calls t.Helper() for correct error attribution
  • Registers cleanup via t.Cleanup() to return ports
  • Tracks port ownership for debugging leaks

481-487: Clean single-port convenience wrapper.

GetOne() correctly delegates to GetN(t, 1) and extracts the single port.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
router-tests/testenv/testenv.go (2)

633-633: Consider simplifying SetupCDNServer signature.

The port return value from SetupCDNServer (line 1675, returned at line 1705) is unused at both call sites (lines 633, 1055). Since httptest.Server.URL already contains the full URL with the port, returning the port separately appears redundant. Consider removing the port return value to simplify the signature.

Apply this diff to simplify the signature:

-func SetupCDNServer(t testing.TB) (cdnServer *httptest.Server, port int) {
+func SetupCDNServer(t testing.TB) *httptest.Server {
 	_, filePath, _, ok := runtime.Caller(0)
 	require.True(t, ok)
 	baseCdnFile := filepath.Join(path.Dir(filePath), "testdata", "cdn")
 	cdnFileServer := http.FileServer(http.Dir(baseCdnFile))
 	var cdnRequestLog []string
 	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
 		// ... handler implementation ...
 	})
 	cdnServer = httptest.NewUnstartedServer(handler)
 	cdnServer.Start()
-	port = cdnServer.Listener.Addr().(*net.TCPAddr).Port
-	return
+	return cdnServer
 }

And update the call sites:

-	cdnServer, _ = SetupCDNServer(t)
+	cdnServer = SetupCDNServer(t)

Also applies to: 1055-1055, 1675-1675, 1705-1705


1645-1651: Consider clarifying the comment for precision.

The comment at lines 1646-1647 states that NewUnstartedServer will "start listening anyway," but more precisely, it creates and binds a listener without accepting connections until Start() is called. Consider rewording for clarity.

-	// NewUnstartedServer will bind ephemeral port and start listening anyway,
-	// we don't use freeport here.
+	// NewUnstartedServer binds an ephemeral port but doesn't accept connections
+	// until Start() is called. We don't need freeport here.
 	s := httptest.NewUnstartedServer(handler)
 	s.Start()
 	return s
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8e7bdf1 and fe9150d.

📒 Files selected for processing (1)
  • router-tests/testenv/testenv.go (8 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-09-17T20:55:39.456Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-09-19T15:08:03.085Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2223
File: router/cmd/main.go:32-33
Timestamp: 2025-09-19T15:08:03.085Z
Learning: pyroscope-go's Config.ServerAddress field accepts both full URLs (e.g., "http://localhost:4040") and simple host:port format (e.g., "localhost:4040"). The library handles URL normalization internally, so both formats work correctly.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-09-19T15:08:03.085Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2223
File: router/cmd/main.go:32-33
Timestamp: 2025-09-19T15:08:03.085Z
Learning: Based on direct user experience from endigma (PR author), pyroscope-go's Config.ServerAddress field accepts simple host:port format (e.g., "localhost:4040") in practice, even though the official documentation primarily shows full URL examples with schemes (e.g., "http://localhost:4040"). The library likely handles URL normalization internally.

Applied to files:

  • router-tests/testenv/testenv.go
🧬 Code graph analysis (1)
router-tests/testenv/testenv.go (1)
router-tests/freeport/freeport.go (1)
  • GetOne (484-487)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: build_push_image
  • GitHub Check: image_scan
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: build_test
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (4)
router-tests/testenv/testenv.go (4)

100-102: LGTM!

Setting the freeport log level to ERROR in the init function is appropriate for the test environment, reducing unnecessary logging noise during test execution.


637-637: LGTM!

The dynamic port allocation using freeport.GetOne(t) is now placed closer to the actual usage, which aligns with the PR objective and reduces the window for potential port conflicts.

Also applies to: 655-655, 1059-1059, 1077-1077


1653-1673: LGTM!

The use of an ephemeral port binding (127.0.0.1:0) for the gRPC server is a clean and effective approach. The comment clearly explains the design choice, and the implementation correctly extracts the endpoint from the listener's address.


59-59: LGTM!

The integration of the internal router-tests/freeport package (replacing the external HashiCorp version) is correctly implemented throughout the file. The consistent use of freeport.GetOne(t) for dynamic port allocation aligns well with the PR objectives.

@ysmolski ysmolski requested a review from Noroth December 8, 2025 12:00
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
router-tests/testenv/testenv.go (2)

655-655: Listener address now uses freeport only at router entry points

Using listenerAddr := fmt.Sprintf("localhost:%d", freeport.GetOne(t)) right before wiring the router (and router supervisor) localizes freeport usage to the router’s own listener, avoiding the previous overlap with HTTP test servers and GRPC/CDN setup. This is consistent with the PR goal of reducing port flakiness.

If you ever extend the router to support binding to :0 and reporting the actual port, you could drop freeport here entirely, but that would be a larger follow‑up.

Also applies to: 1077-1078


1677-1707: SetupCDNServer new signature and ephemeral port usage are solid; minor note on request log concurrency

The refactor to:

  • Return both *httptest.Server and its bound port, and
  • Use httptest.NewServer(handler) to get an ephemeral port and derive it from cdnServer.Listener.Addr().(*net.TCPAddr).Port

is clean and matches the earlier feedback about explicit (non‑naked) returns. Call sites that don’t care about the port correctly discard it via _.

One minor, pre‑existing concern: cdnRequestLog is a shared slice captured by the handler and appended to on each request. If tests ever issue concurrent CDN requests (which HTTP servers allow), this can become a data race under go test -race. If that matters for your test suite, consider guarding cdnRequestLog with a small sync.Mutex around the append and the / marshal path.

Example sketch:

 func SetupCDNServer(t testing.TB) (cdnServer *httptest.Server, port int) {
@@
-  var cdnRequestLog []string
-  handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+  var (
+    cdnRequestLog   []string
+    cdnRequestLogMu sync.Mutex
+  )
+  handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
     if r.URL.Path == "/" {
-      requestLog, err := json.Marshal(cdnRequestLog)
+      cdnRequestLogMu.Lock()
+      requestLog, err := json.Marshal(cdnRequestLog)
+      cdnRequestLogMu.Unlock()
@@
-    cdnRequestLog = append(cdnRequestLog, r.Method+" "+r.URL.Path)
+    cdnRequestLogMu.Lock()
+    cdnRequestLog = append(cdnRequestLog, r.Method+" "+r.URL.Path)
+    cdnRequestLogMu.Unlock()

(plus a sync import). If you’re confident all CDN usage is single‑threaded, this can be deferred.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 302db28 and 4243fd3.

📒 Files selected for processing (1)
  • router-tests/testenv/testenv.go (8 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-09-19T15:08:03.085Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2223
File: router/cmd/main.go:32-33
Timestamp: 2025-09-19T15:08:03.085Z
Learning: pyroscope-go's Config.ServerAddress field accepts both full URLs (e.g., "http://localhost:4040") and simple host:port format (e.g., "localhost:4040"). The library handles URL normalization internally, so both formats work correctly.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-09-19T15:08:03.085Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2223
File: router/cmd/main.go:32-33
Timestamp: 2025-09-19T15:08:03.085Z
Learning: Based on direct user experience from endigma (PR author), pyroscope-go's Config.ServerAddress field accepts simple host:port format (e.g., "localhost:4040") in practice, even though the official documentation primarily shows full URL examples with schemes (e.g., "http://localhost:4040"). The library likely handles URL normalization internally.

Applied to files:

  • router-tests/testenv/testenv.go
🧬 Code graph analysis (1)
router-tests/testenv/testenv.go (1)
router-tests/freeport/freeport.go (1)
  • GetOne (484-487)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: build_test
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_push_image
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: image_scan
  • GitHub Check: integration_test (./events)
  • GitHub Check: Analyze (go)
🔇 Additional comments (3)
router-tests/testenv/testenv.go (3)

59-59: freeport import and init configuration look good

Importing the vendored freeport and setting its log level once in init keeps test logs quieter while centralizing configuration; no issues here.

Also applies to: 100-102


631-638: CDN server setup & Prometheus port allocation align with new freeport strategy

Both env constructors now:

  • Create a CDN server via SetupCDNServer(t) (on an ephemeral port) when cfg.CdnSever is nil.
  • Use freeport.GetOne(t) only to populate cfg.PrometheusPort when a registry is provided.

This cleanly separates CDN from explicit port allocation and reduces freeport usage to the few places that actually need a concrete port number; behavior looks correct.

Also applies to: 1053-1060


1646-1653: makeSafeHttpTestServer correctly switches to httptest’s ephemeral ports

Relying on httptest.NewUnstartedServer + Start() to bind an ephemeral port, with a clear comment explaining why freeport is avoided, is exactly the right move to reduce “unable to bind” races in tests. No further changes needed here.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
router-tests/testenv/testenv.go (3)

631-656: Avoid overwriting caller-specified Prometheus/MCP ports in supervisor env

CreateTestSupervisorEnv now always assigns:

  • cfg.PrometheusPort = freeport.GetOne(t) when PrometheusRegistry != nil.
  • cfg.MCP.Server.ListenAddr = fmt.Sprintf("localhost:%d", freeport.GetOne(t)) when MCP is enabled.
  • listenerAddr from another freeport.GetOne(t).

This is fine for default behavior, but it silently overrides any non-zero PrometheusPort or pre-set MCP.Server.ListenAddr that a caller might want to control explicitly.

Consider guarding these with “only if unset” checks, e.g.:

if cfg.PrometheusRegistry != nil && cfg.PrometheusPort == 0 {
    cfg.PrometheusPort = freeport.GetOne(t)
}
if cfg.MCP.Enabled && cfg.MCP.Server.ListenAddr == "" {
    cfg.MCP.Server.ListenAddr = fmt.Sprintf("localhost:%d", freeport.GetOne(t))
}

That would keep the new centralized allocation while still respecting explicit caller configuration.


1053-1078: Same optional guard for Prometheus/MCP ports in CreateTestEnv

The same pattern appears here: CreateTestEnv auto-assigns PrometheusPort, MCP ListenAddr, and listenerAddr via freeport.GetOne(t).

Behavior is correct as-is, but for symmetry with the supervisor env and to avoid surprising callers who pre-configure these values, consider the same “only if zero/empty” guards before overwriting cfg.PrometheusPort and cfg.MCP.Server.ListenAddr.


1678-1708: CDN test server helper is solid; consider tightening Authorization parsing

SetupCDNServer nicely encapsulates:

  • Locating testdata/cdn relative to this file.
  • Spinning up an httptest.NewServer and returning both the server and its TCP port.
  • Recording CDN requests and exposing them at /.
  • Enforcing a JWT-bearing Authorization header.

One small robustness nit in the handler:

authorization := r.Header.Get("Authorization")
if authorization == "" {
    require.NotEmpty(t, authorization, "missing authorization header")
}
token := authorization[len("Bearer "):]

Blindly slicing authorization[len("Bearer "):] will panic if the header is shorter than "Bearer ". It’s test-only code, but you can make this safer and clearer:

authorization := r.Header.Get("Authorization")
require.NotEmpty(t, authorization, "missing authorization header")

const bearerPrefix = "Bearer "
require.True(t, strings.HasPrefix(authorization, bearerPrefix), "invalid authorization header prefix")

token := strings.TrimPrefix(authorization, bearerPrefix)

This keeps the same assertion semantics while avoiding possible panics from malformed headers.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4243fd3 and 64c70d3.

📒 Files selected for processing (1)
  • router-tests/testenv/testenv.go (7 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-09-19T15:08:03.085Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2223
File: router/cmd/main.go:32-33
Timestamp: 2025-09-19T15:08:03.085Z
Learning: pyroscope-go's Config.ServerAddress field accepts both full URLs (e.g., "http://localhost:4040") and simple host:port format (e.g., "localhost:4040"). The library handles URL normalization internally, so both formats work correctly.

Applied to files:

  • router-tests/testenv/testenv.go
📚 Learning: 2025-09-19T15:08:03.085Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2223
File: router/cmd/main.go:32-33
Timestamp: 2025-09-19T15:08:03.085Z
Learning: Based on direct user experience from endigma (PR author), pyroscope-go's Config.ServerAddress field accepts simple host:port format (e.g., "localhost:4040") in practice, even though the official documentation primarily shows full URL examples with schemes (e.g., "http://localhost:4040"). The library likely handles URL normalization internally.

Applied to files:

  • router-tests/testenv/testenv.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: build_test
  • GitHub Check: build_push_image
  • GitHub Check: image_scan
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (3)
router-tests/testenv/testenv.go (3)

59-102: Local freeport import and log-level init are appropriate

Importing the vendored router-tests/freeport package and centralizing freeport.SetLogLevel(freeport.ERROR) in init() is a clean way to keep port allocation logging under control across tests. No issues spotted here.


1646-1652: Ephemeral HTTP test server via NewUnstartedServer looks good

Documenting and switching makeSafeHttpTestServer to rely on httptest.NewUnstartedServer’s ephemeral listener (instead of freeport) is a good move for reducing port binding races on Linux. The Start+return pattern is idiomatic and matches httptest.NewServer behavior.


1658-1673: gRPC server helper now uses ephemeral port and safe error handling

makeSafeGRPCServer now:

  • Binds an ephemeral TCP port with net.Listen("tcp", "127.0.0.1:0").
  • Returns the concrete endpoint from lis.Addr().String().
  • Runs s.Serve(lis) in a goroutine, logging non-grpc.ErrServerStopped errors with t.Errorf.

This fixes the earlier pattern of using require.NoError in a background goroutine and properly treats grpc.ErrServerStopped as a normal shutdown path. The helper is safe and matches gRPC’s expectations around Server.Stop().

@ysmolski ysmolski requested a review from Noroth December 8, 2025 15:34
@ysmolski ysmolski merged commit 469ad19 into main Dec 9, 2025
40 of 42 checks passed
@ysmolski ysmolski deleted the yury/fix_ports_flakiness branch December 9, 2025 09:49
asoorm pushed a commit that referenced this pull request Dec 16, 2025
The "makeSafeHttpTestServer" function creates server for testing using
"httptest.NewUnstartedServer". That method creates listener on ephemeral port.
Before this PR, we were opening another listener using port returned by freeport,
and closing old ephemeral listener. Freeport by itself also checked if port is free
by starting listening on it and then closing it immediately after.
What often has failed is the listening on the port that was just used in another listener.

I have removed the usage of freeport in "makeSafeHttpTestServer" and started
using just ephemeral ports. "freeport" itself mentions that ephemeral ports should be
used whenever possible.

I moved freeport.GetOne closer to actual usage for the router listener.
I vendored freeport into our codebase.

In the future, we should not use freeport if we can use ephemeral ports.
The range of ephemeral ports contains at least 15k ports, and that should
be enought for our needs if we propely close all the connections. Freeport
has only 1 block of 1500 port available and uses tons of resources to check
if the port were free finally.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants