fix(servicegraph): make virtual node tests reliable on Windows #37550

mapno · 2025-01-28T16:50:19Z

Description

The virtual node tests (TestVirtualNodeServerLabels and TestVirtualNodeClientLabels) were failing intermittently on Windows due to timing issues. This change:

- Reduces MetricsFlushInterval and StoreExpirationLoop to 10ms
- Adds a timeout-based polling mechanism to wait for metrics generation
- Replaces unreliable sleep with proper metric availability checks
- Improves error messaging for test failures

Link to tracking issue

Fixes #33679

Testing

Updates tests

Documentation

pjanotti

Still draft but LGTM, minus the change log file. Thanks @mapno!

pjanotti · 2025-01-29T01:05:03Z

.chloggen/fix_servicegraph-windows-tests.yaml

+# Use this changelog template to create an entry for release notes.
+
+# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
+change_type: bug_fix


This change doesn't require a change log: it is not visible to users. You can remove this file from the change; I will add the label to skip the change log check.

pjanotti · 2025-01-29T18:11:09Z

connector/servicegraphconnector/connector_test.go

+	// Wait for metrics to be generated with timeout
+	deadline := time.Now().Add(5 * time.Second)
+	var metrics []pmetric.Metrics
+	for time.Now().Before(deadline) {


Side note: you can keep using assert.Eventually or assert.EventuallyWithT, the main advantage being the error message in case of failure.

Re-added.

github-actions · 2025-02-13T05:21:34Z

This PR was marked stale due to lack of activity. It will be closed in 14 days.

mapno · 2025-02-17T10:07:39Z

The CI job for the Windows tests passed. Unfortunately, I don't have a Windows machine to verify that it's not flaky :/

pjanotti · 2025-02-26T03:28:57Z

@mapno 100s of runs on my box without failures, so it seems good by that metric.

pjanotti · 2025-02-26T03:41:24Z

connector/servicegraphconnector/connector_test.go

 	virtualNodeDimensions := []string{"peer.service", "db.system", "messaging.system"}
 	cfg := &Config{
 		Dimensions:                virtualNodeDimensions,
 		LatencyHistogramBuckets:   []time.Duration{time.Duration(0.1 * float64(time.Second)), time.Duration(1 * float64(time.Second)), time.Duration(10 * float64(time.Second))},
 		Store:                     StoreConfig{MaxItems: 10},
 		VirtualNodePeerAttributes: virtualNodeDimensions,
 		VirtualNodeExtraLabel:     true,
-		MetricsFlushInterval:      time.Millisecond,
+		// Reduce flush interval for faster test execution
+		MetricsFlushInterval: 1 * time.Millisecond,


In practice this shouldn't make any difference, but I'm curious whether it is deliberate that the interval is 1 ms here vs 10 ms in the other test.

github-actions bot added the connector/servicegraph label Jan 28, 2025

github-actions bot requested a review from JaredTan95 January 28, 2025 16:51

pjanotti reviewed Jan 29, 2025

pjanotti added the "Skip Changelog" label (PRs that do not require a CHANGELOG.md entry) Jan 29, 2025

pjanotti reviewed Jan 29, 2025

github-actions bot added the Stale label Feb 13, 2025

mapno added 4 commits February 17, 2025 10:48

Add chlog

15a9476

Remove chlog entry

0e633ca

Test improvements

8d273de

mapno force-pushed the fix/servicegraph-windows-tests branch from cfda079 to 8d273de February 17, 2025 09:57

mapno marked this pull request as ready for review February 17, 2025 10:27

mapno requested a review from a team as a code owner February 17, 2025 10:27

mapno requested a review from dehaansa February 17, 2025 10:27

github-actions bot assigned dashpole Feb 17, 2025

github-actions bot removed the Stale label Feb 18, 2025

pjanotti approved these changes Feb 26, 2025

atoulme approved these changes Feb 26, 2025

pjanotti added the "ready to merge" label (Code review completed; ready to merge by maintainers) Feb 26, 2025

andrzej-stencel merged commit 7993c9d into open-telemetry:main Mar 3, 2025
176 checks passed

github-actions bot added this to the next release milestone Mar 3, 2025
