Metrics namespace change #499
e26940c to 040487c
runtime/call_metrics.go
Outdated
endpointScope = "endpoint"

inboundCallsRecvd = "request"
inboundCallsLatency = "totalhandlertime"
I think "latency" is probably better.
Agreed. The name totalhandlertime is there to align with RTAPI.
We don't have to align just for the sake of alignment. The name `totalhandlertime` is historical and shouldn't be inherited moving forward.
> We don't have to align just for the sake of alignment. The name `totalhandlertime` is historical and shouldn't be inherited moving forward.
fixed
Did you forget to push to remote? Still showing `totalhandlertime` to me.
runtime/call_metrics.go
Outdated
outboundCallsErrors = "outbound.calls.errors"
outboundCallsStatus = "outbound.calls.status"
outboundCallsSent = "request"
outboundCallsLatency = "routerhandlertime"
This should also just be "latency".
Agreed. The name routerhandlertime is there to align with RTAPI.
fixed
same here
runtime/call_metrics.go
Outdated
outboundCallsAppErrors = "app-errors"
outboundCallsSystemErrors = "system-errors"
outboundCallsErrors = "errors"
outboundCallsStatus = "status"
Now that the metrics are sub-scoped with `client` or `endpoint`, the values of these metrics become call-direction agnostic. We don't need two sets of different variables to represent the same metric names, e.g. `inboundCallsSuccess` and `outboundCallsSuccess` should just be consolidated to `success`.
fixed
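For illustration, the consolidation suggested above could look roughly like the sketch below. The `scope` type and its `SubScope` and `FullName` methods are hypothetical stand-ins that only track a dotted prefix; they are not the Zanzibar or tally API, and the constant names are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// Direction-agnostic metric names (hypothetical constants): one set
// serves both inbound (endpoint) and outbound (client) calls.
const (
	metricRequest = "request"
	metricSuccess = "success"
	metricLatency = "latency"
)

// scope is a hypothetical stand-in for a metrics scope. It only
// tracks a dotted name prefix.
type scope struct{ prefix string }

// SubScope returns a child scope whose prefix is extended by name.
func (s scope) SubScope(name string) scope {
	if s.prefix == "" {
		return scope{prefix: name}
	}
	return scope{prefix: s.prefix + "." + name}
}

// FullName joins the scope prefix and the metric name with a dot.
func (s scope) FullName(metric string) string {
	return strings.TrimPrefix(s.prefix+"."+metric, ".")
}

func main() {
	root := scope{}
	// The sub-scope carries the call direction, so the same metric
	// name can be reused under both.
	fmt.Println(root.SubScope("endpoint").FullName(metricSuccess))
	fmt.Println(root.SubScope("client").FullName(metricSuccess))
}
```

With the direction carried by the sub-scope, a single `metricSuccess` constant yields `endpoint.success` and `client.success`.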
runtime/gateway.go
Outdated
@@ -476,11 +473,11 @@ func (gateway *Gateway) setupMetrics(config *StaticConfig) (err error) {
gateway.AllHostScope = gateway.RootScope.SubScope(
Is `AllHostScope` needed anymore?
AllHostScope is needed since it is related to the zap namespace and the tchannel metrics namespace. However, metrics would only have one scope, which is RootScope.
> AllHostScope is needed since it is related to zap namespace and tchannel metrics namespace
Related in what way? Can you elaborate?
> > AllHostScope is needed since it is related to zap namespace and tchannel metrics namespace
> Related in what way? Can you elaborate?
https://github.com/uber/zanzibar/blob/master/runtime/gateway.go#L556-L560
https://github.com/uber/zanzibar/blob/master/runtime/gateway.go#L619-L622
https://github.com/uber/zanzibar/blob/master/runtime/gateway.go#L580-L596
https://github.com/uber/zanzibar/blob/master/runtime/gateway.go#L682-L684
All these references can be replaced by the `RootScope`. The `AllHostScope` metric name `service + "." + env + ".all-workers"` is redundant and verbose, since it is already tagged with `env` and `service`. I believe the original purpose for its existence is to reduce the cardinality introduced by the tags `host` and `dc`, and I don't know whether that is still a valid concern. If we want to have the same effect, we can replace it with the `RootScope` as it is before being tagged with perHostTags, not the one that is overwritten here https://github.com/uber/zanzibar/pull/499/files#diff-ee4b13d64c39fd8e3d0483c1076bb31cR480.
fixed
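As a rough sketch of the idea in this thread: keep a scope tagged only with `service` and `env` for low-cardinality metrics, and derive the per-host scope from it instead of overwriting it. The `taggedScope` type, its methods, and the tag values are hypothetical and only model tag merging; this is not the tally or Zanzibar API.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// taggedScope is a hypothetical, immutable stand-in for a tagged
// metrics scope.
type taggedScope struct {
	tags map[string]string
}

// Tagged returns a new scope whose tags are a copy of the receiver's
// tags merged with extra. The receiver is left unchanged.
func (s taggedScope) Tagged(extra map[string]string) taggedScope {
	merged := make(map[string]string, len(s.tags)+len(extra))
	for k, v := range s.tags {
		merged[k] = v
	}
	for k, v := range extra {
		merged[k] = v
	}
	return taggedScope{tags: merged}
}

// Series renders a metric name followed by its tags in sorted order.
func (s taggedScope) Series(name string) string {
	keys := make([]string, 0, len(s.tags))
	for k := range s.tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+s.tags[k])
	}
	return name + "{" + strings.Join(parts, ",") + "}"
}

func main() {
	// Tagged with service and env only: one series per service/env.
	allHosts := taggedScope{}.Tagged(map[string]string{
		"service": "example-gateway", // illustrative values
		"env":     "production",
	})
	// Derived scope adds host and dc: one series per host.
	perHost := allHosts.Tagged(map[string]string{
		"host": "host-01",
		"dc":   "dc1",
	})
	fmt.Println(allHosts.Series("request"))
	fmt.Println(perHost.Series("request"))
}
```

Because `Tagged` copies rather than mutates, the scope without `host` and `dc` stays available after the per-host scope is created, and no `service + "." + env` name prefix is needed.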
This is a breaking change in the sense that dashboards built on the previous metrics will no longer work. We need to put this in the changelog and communicate it.
bfbfc13 to a89637d
Merged master, please review, thank you
runtime/metrics_namespace.go
Outdated
clientSuccess = "client.success"
clientStatus = "client.status"
clientErrors = "client.errors"
clientAppErrors = "client.app-errors"
Shouldn't there be a `clientSystemErrors = "client.system-errors"`?
fixed
runtime/tchannel_client.go
Outdated
@@ -271,13 +272,13 @@ func (c *tchannelOutboundCall) finish(ctx context.Context, err error) {
errCause := tchannel.GetSystemErrorCode(errors.Cause(err))
scopeTags := map[string]string{scopeTagError: errCause.MetricsKey()}
ctx = WithScopeTags(ctx, scopeTags)
c.metrics.IncCounter(ctx, outboundCallsSystemErrors, 1)
c.metrics.IncCounter(ctx, clientErrors, 1)
This should be clientSystemErrors
fixed
a89637d
to
41f5c5a
Compare
This reverts commit 19aa43d.
This commits: