
add latency and availability metrics to gateway #838

Merged: 5 commits from zeeshan/emitGatewayMetrics into master, Oct 31, 2022

Conversation

zeeshan-mohd (Contributor):

Added latency metrics (covering middleware request and response handling) and availability metrics for middleware requests.
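
For readers skimming the diff, here is a minimal sketch of the kind of emission this adds, using a tally-style scope (which Zanzibar uses for metrics). The helper and metric names below are illustrative assumptions, not the exact code in this PR:

```go
package sketch

import (
	"time"

	"github.com/uber-go/tally"
)

// recordMiddlewareMetrics is a hypothetical helper, not the PR's actual code.
func recordMiddlewareMetrics(scope tally.Scope, start time.Time, statusCode int) {
	// Latency for the middleware request/response path.
	scope.Timer("middleware.latency").Record(time.Since(start))

	// Availability inputs: count every request, but only count 5xx as gateway
	// errors; 4xx reflects the caller's request, not gateway health.
	scope.Counter("middleware.requests").Inc(1)
	if statusCode >= 500 {
		scope.Counter("middleware.errors").Inc(1)
	}
}
```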

@CLAassistant commented Oct 3, 2022:

CLA assistant check
All committers have signed the CLA.

@zeeshan-mohd force-pushed the zeeshan/emitGatewayMetrics branch 3 times, most recently from 3f0797d to 5410878 on October 3, 2022 at 16:58
m.recordLatency(middlewareResponseLatencyTag, start, req)

// for error metrics, only emit when there is a gateway error and not a request error
if res.pendingStatusCode >= 500 {
Contributor:

4xx is a success?

Contributor:

I think it's the wrong place to verify statusCode. statusCode would be available only after we make a call to the downstream client - ctx = m.handle(ctx, req, res)

zeeshan-mohd (Author):

> 4xx is a success?

Removed the success part, as it can be derived from total_requests. 4xx is not counted as an error because the availability defined for the gateway should not depend on the user's request, but rather on the gateway's internal errors; that is why only 5xx is counted.
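
In other words, availability can be derived afterwards from the emitted counters. A minimal sketch of that derivation (the counter semantics are assumptions, not code from this PR):

```go
package sketch

// availability is an illustrative calculation:
// availability = 1 - gateway_errors / total_requests, where gateway_errors
// counts only 5xx responses, so client-caused 4xx does not lower availability.
func availability(totalRequests, gatewayErrors float64) float64 {
	if totalRequests == 0 {
		return 1.0 // no traffic: treat as fully available
	}
	return 1.0 - gatewayErrors/totalRequests
}
```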

zeeshan-mohd (Author):

> I think it's the wrong place to verify statusCode. statusCode would be available only after we make a call to the downstream client - ctx = m.handle(ctx, req, res)

res.pendingStatusCode = statusCode

We set pendingStatusCode whenever we call res.WriteJson from the middlewares, so it will always be populated in case of errors.
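
A rough sketch of that pattern, with simplified stand-in types rather than Zanzibar's real response object (the field and method shapes here are assumptions):

```go
package sketch

import (
	"encoding/json"
	"net/http"
)

// serverResponse is a simplified stand-in for the gateway's response type.
type serverResponse struct {
	pendingStatusCode int
	w                 http.ResponseWriter
}

// WriteJSON records the status code before writing, so later metrics code can
// read pendingStatusCode even though the reply has already been sent.
func (res *serverResponse) WriteJSON(statusCode int, body interface{}) {
	res.pendingStatusCode = statusCode
	res.w.Header().Set("Content-Type", "application/json")
	res.w.WriteHeader(statusCode)
	_ = json.NewEncoder(res.w).Encode(body)
}
```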

runtime/middlewares.go: 4 review threads marked outdated and resolved

@coveralls commented Oct 25, 2022:

Coverage increased (+0.03%) to 69.997% when pulling eb1ccc0 on zeeshan/emitGatewayMetrics into 1966351 on master.

@bishnuag (Contributor) left a comment:

LGTM

@isopropylcyanide (Contributor):

Please add the test plan and a dump of the new metrics from a sample Zanzibar example gateway for both successful / unsuccessful runs.

@@ -183,7 +184,7 @@ func TestMiddlewareRequestAbort(t *testing.T) {
 	if !assert.NoError(t, err) {
 		return
 	}
-	assert.Equal(t, resp.StatusCode, http.StatusOK)
+	assert.Equal(t, resp.StatusCode, http.StatusInternalServerError)
Contributor:

Why is this changed?

zeeshan-mohd (Author):

It was changed because reaching this point means the middleware request was aborted with an error and we return from here without calling the downstream, so sending a 200 OK in this case is not appropriate. Additionally, it increased the code coverage as well.
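
A minimal sketch of the control flow the updated assertion reflects, using plain net/http as a stand-in for the gateway's types (an illustration, not the gateway's actual abort code):

```go
package sketch

import "net/http"

// abortIfMiddlewareFailed illustrates the behavior under test: when a
// middleware aborts the request, the gateway answers with a 5xx itself and the
// downstream handler is never invoked, so the test cannot expect 200 OK.
func abortIfMiddlewareFailed(ok bool, w http.ResponseWriter) bool {
	if !ok {
		http.Error(w, "middleware aborted the request", http.StatusInternalServerError)
		return false // caller skips the downstream call
	}
	return true
}
```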

@isopropylcyanide (Contributor) left a comment:

Gave some comments

for j := i; j >= 0; j-- {
	m.middlewares[j].HandleResponse(ctx, res, shared)
}
// record latency for middleware responses in the unsuccessful case
m.recordLatency(middlewareResponseLatencyTag, middlewareResponseStartTime, req.scope)
Contributor:

Are we tracking req/resp latency for each middleware or for all of the edge middlewares?

If it's the former, I don't think the code does what it is supposed to do.

If there are 4 middlewares and handling fails for the 4th one, then we need to run the decrement loop, handling the responses for the 3 that executed. In this case, the middleware response latency recorded for the 3rd would be incorrect, as it will include the latency of the first 2 middlewares' responses.
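
For contrast, a sketch of what per-middleware response latency would look like, with assumed stand-in types and a tally-style scope (this is not what the PR implements):

```go
package sketch

import (
	"time"

	"github.com/uber-go/tally"
)

// responseMiddleware is a simplified stand-in for the gateway's middleware handle.
type responseMiddleware interface {
	Name() string
	HandleResponse()
}

// handlePerMiddlewareLatency restarts the timer for every middleware, so each
// one is measured individually instead of recording one aggregate duration
// after the whole decrement loop.
func handlePerMiddlewareLatency(middlewares []responseMiddleware, i int, scope tally.Scope) {
	for j := i; j >= 0; j-- {
		start := time.Now()
		middlewares[j].HandleResponse()
		scope.Tagged(map[string]string{"middleware": middlewares[j].Name()}).
			Timer("middleware.response.latency").
			Record(time.Since(start))
	}
}
```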


zeeshan-mohd (Author):

> Are we tracking req/resp latency for each middleware or for all of the edge middlewares?

Yes, it is for all the middlewares that a request goes through. Since these metrics are tagged with endpointID, we will apply filtering there. So we are not measuring a single middleware, but rather all the middlewares a request goes through, and we filter on the basis of endpointIDs.
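
A short sketch of that tagging, with an assumed tag key and metric name rather than the PR's exact ones:

```go
package sketch

import (
	"time"

	"github.com/uber-go/tally"
)

// recordEndpointTaggedLatency illustrates aggregate middleware latency tagged
// by endpoint ID, so dashboards can filter per endpoint even though the
// measurement spans every middleware the request passes through.
func recordEndpointTaggedLatency(root tally.Scope, endpointID string, start time.Time) {
	root.Tagged(map[string]string{"endpointid": endpointID}).
		Timer("middleware.response.latency").
		Record(time.Since(start))
}
```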

@isopropylcyanide (Contributor) left a comment:

Requested clarification

@sachsingh (Contributor):

> Please add the test plan and a dump of the new metrics from a sample Zanzibar example gateway for both successful / unsuccessful runs.

Added metrics dump https://code.uberinternal.com/P341907

@sachsingh closed this Oct 31, 2022
@sachsingh reopened this Oct 31, 2022
@sachsingh merged commit d9a8b55 into master Oct 31, 2022