-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(servicegraph): make virtual node tests reliable on Windows #37550
fix(servicegraph): make virtual node tests reliable on Windows #37550
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still draft but LGTM, minus the change log file. Thanks @mapno!
# Use this changelog template to create an entry for release notes. | ||
|
||
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix' | ||
change_type: bug_fix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change doesn't require a change log: it is not visible to users. You can remove this file from the change I will add the label to skip the check for a change log.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
// Wait for metrics to be generated with timeout | ||
deadline := time.Now().Add(5 * time.Second) | ||
var metrics []pmetric.Metrics | ||
for time.Now().Before(deadline) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side note: you can keep using the assert.Eventually
or assert.EventuallyWithT
, the main advantage being the error message in case of failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-added again
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
The virtual node tests (TestVirtualNodeServerLabels and TestVirtualNodeClientLabels) were failing intermittently on Windows due to timing issues. This change: - Reduces MetricsFlushInterval and StoreExpirationLoop to 10ms - Adds a timeout-based polling mechanism to wait for metrics generation - Replaces unreliable sleep with proper metric availability checks - Improves error messaging for test failures Fixes open-telemetry#33679
cfda079
to
8d273de
Compare
The CI job for windows tests passed. Unfortunately I don't have a windows machine to verify that it's not flaky :/ |
@mapno 100s of runs on my box without failures, it seems good by that metric. |
virtualNodeDimensions := []string{"peer.service", "db.system", "messaging.system"} | ||
cfg := &Config{ | ||
Dimensions: virtualNodeDimensions, | ||
LatencyHistogramBuckets: []time.Duration{time.Duration(0.1 * float64(time.Second)), time.Duration(1 * float64(time.Second)), time.Duration(10 * float64(time.Second))}, | ||
Store: StoreConfig{MaxItems: 10}, | ||
VirtualNodePeerAttributes: virtualNodeDimensions, | ||
VirtualNodeExtraLabel: true, | ||
MetricsFlushInterval: time.Millisecond, | ||
// Reduce flush interval for faster test execution | ||
MetricsFlushInterval: 1 * time.Millisecond, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In practice this shouldn't make any difference, but I'm curious if it is deliberate that here is 1 ms vs 10 ms on the other test. Anyway, as I said this shouldn't make a practical difference.
Description
The virtual node tests (TestVirtualNodeServerLabels and TestVirtualNodeClientLabels) were failing intermittently on Windows due to timing issues. This change:
Link to tracking issue
Fixes #33679
Testing
Updates tests
Documentation