feat: http status code not being correct for metrics#2516
Conversation
WalkthroughAdds tests asserting authentication-failure HTTP status is recorded in metrics, introduces helper functions for metric-data checks, and refactors GraphQL prehandler to wrap the ResponseWriter early to capture status and bytes for all code paths. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Comment |
Router image scan passed✅ No security vulnerabilities found in image: |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
router-tests/prometheus_test.go (1)
4261-4264: Set Content-Type to keep the auth failure deterministic.
With a nil header, a future change in request parsing order could yield a 400 instead of 401 and break the intent of this test.✅ Proposed tweak
- res, err := xEnv.MakeRequest(http.MethodPost, "/graphql", nil, + header := http.Header{"Content-Type": []string{"application/json"}} + res, err := xEnv.MakeRequest(http.MethodPost, "/graphql", header, strings.NewReader(`{"query":"{ employees { id } }"}`))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@router-tests/prometheus_test.go` around lines 4261 - 4264, The test currently calls xEnv.MakeRequest(http.MethodPost, "/graphql", nil, ...) with a nil header which can make auth failure nondeterministic; update the call in the test to pass an explicit headers map containing "Content-Type":"application/json" so the request parsing always treats the body as JSON and the unauthenticated request deterministically yields 401 (locate the xEnv.MakeRequest invocation in the test and replace the nil header argument with the headers map).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@router-tests/telemetry/telemetry_test.go`:
- Around line 7880-7898: The test currently assumes datapoints are in a stable
order by accessing DataPoints[0] for requestsMetric and durationMetric; change
the assertions to scan the DataPoints slice for the point whose Attributes
contains statusCodeKey == http.StatusUnauthorized instead of using
DataPoints[0]. Specifically, after calling metricReader.Collect and locating the
scope via integration.GetMetricScopeByName and the metrics via
integration.GetMetricByName (router.http.requests and
router.http.request.duration_milliseconds), iterate over
requestsMetric.Data.(metricdata.Sum[int64]).DataPoints and
durationMetric.Data.(metricdata.Histogram[float64]).DataPoints, find the
datapoint whose Attributes.Value(statusCodeKey).AsInt64() ==
int64(http.StatusUnauthorized), and assert on that point (fail the test if no
matching datapoint is found).
---
Nitpick comments:
In `@router-tests/prometheus_test.go`:
- Around line 4261-4264: The test currently calls
xEnv.MakeRequest(http.MethodPost, "/graphql", nil, ...) with a nil header which
can make auth failure nondeterministic; update the call in the test to pass an
explicit headers map containing "Content-Type":"application/json" so the request
parsing always treats the body as JSON and the unauthenticated request
deterministically yields 401 (locate the xEnv.MakeRequest invocation in the test
and replace the nil header argument with the headers map).
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2516 +/- ##
==========================================
+ Coverage 61.79% 62.18% +0.38%
==========================================
Files 231 231
Lines 24143 24141 -2
==========================================
+ Hits 14919 15011 +92
+ Misses 7980 7865 -115
- Partials 1244 1265 +21
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@router-tests/telemetry/telemetry_test.go`:
- Around line 7884-7885: The test currently blindly dereferences the result of
integration.GetMetricScopeByName (assigned to scopeMetric) which can be nil;
change the code to capture the pointer (e.g., scopeMetricPtr :=
integration.GetMetricScopeByName(rm.ScopeMetrics, "cosmo.router")), check if
scopeMetricPtr == nil and fail the test with a clear message (t.Fatalf or
require.NotNil) before using it, then use *scopeMetricPtr (or keep pointer
usage) for the rest of the assertions to avoid a panic and produce a helpful
test failure.
Right now if a non 200 http status code is returned from a non happy path before the final happy path, this would not be detected in metrics. As we set a static 200 http status code, this leads to cases where metrics report 200 http status codes but the status code returned to the user is 401 or such. This PR ensures that all paths will record the correct http status code for metrics.
Note: The wrapped writer roughly adds 70~100 nanoseconds.
Summary by CodeRabbit
Tests
Bug Fixes
Checklist