Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metrics namespace change #499

Merged
merged 1 commit into from
Oct 26, 2018
Merged

Metrics namespace change #499

merged 1 commit into from
Oct 26, 2018

Conversation

herainman
Copy link
Contributor

@herainman herainman commented Oct 23, 2018

This commits:

  • Use RootScope instead of PerHostScope
  • Rename counter and timer
  • Add protocol tag for HTTP and TChannel

@coveralls
Copy link

Coverage Status

Coverage increased (+0.02%) to 61.772% when pulling 040487c on alignnamespace into c662e77 on master.

@coveralls
Copy link

coveralls commented Oct 23, 2018

Coverage Status

Coverage increased (+4.6%) to 66.308% when pulling 41f5c5a on alignnamespace into 2f9c950 on master.

endpointScope = "endpoint"

inboundCallsRecvd = "request"
inboundCallsLatency = "totalhandlertime"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "latency" is probably better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Totalhandlertime is to align with RTAPI

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have to align just for the sake of alignment. The name totalhandlertime is historical and shouldn't be inherited moving forward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have to align just for the sake of alignment. The name totalhandlertime is historical and shouldn't be inherited moving forward.

fixed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you forget to push to remote? Still showing totalhandlertime to me.

outboundCallsErrors = "outbound.calls.errors"
outboundCallsStatus = "outbound.calls.status"
outboundCallsSent = "request"
outboundCallsLatency = "routerhandlertime"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also just be "latency".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. routerhandlertime is to align with RTAPI

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

outboundCallsAppErrors = "app-errors"
outboundCallsSystemErrors = "system-errors"
outboundCallsErrors = "errors"
outboundCallsStatus = "status"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that the metrics are sub scoped with client or endpoint, the values of these metrics become call direction agnostic, we don't need to have two set of different variables to represent the same metric names, e.g. - inboundCallsSuccess and outboundCallsSuccess should just be consolidated to success.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -476,11 +473,11 @@ func (gateway *Gateway) setupMetrics(config *StaticConfig) (err error) {
gateway.AllHostScope = gateway.RootScope.SubScope(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is AllHostScope needed anymore?

Copy link
Contributor Author

@herainman herainman Oct 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AllHostScope is needed since it is related to zap namespace and tchannel metrics namespace, however, metrics would only have one scope which is RootScope

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AllHostScope is needed since it is related to zap namespace and tchannel metrics namespace

Related in what way? Can you elaborate?

Copy link
Contributor Author

@herainman herainman Oct 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these references can be replaced by the RootScope, the AllHostScope metric name service + "." + env + ".all-workers" is redundant and verbose, it is already tagged with env and service. I believe the original purpose for its existence is to reduce the cardinality introduced by tags host and dc, which I don't know if it is still a valid concern. If we want to have the same effect, we can replace it the RootScope before it is tagged with perHostTags, not the one that is overwritten here https://github.com/uber/zanzibar/pull/499/files#diff-ee4b13d64c39fd8e3d0483c1076bb31cR480.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@ChuntaoLu
Copy link
Contributor

This is a breaking change in a sense that dashboards built on previous metrics will no longer work. We need to put this in the changelog and communicate.

@ChuntaoLu ChuntaoLu mentioned this pull request Oct 24, 2018
@herainman herainman force-pushed the alignnamespace branch 3 times, most recently from bfbfc13 to a89637d Compare October 26, 2018 21:54
@herainman
Copy link
Contributor Author

Merged master, please review, thank you

clientSuccess = "client.success"
clientStatus = "client.status"
clientErrors = "client.errors"
clientAppErrors = "client.app-errors"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't there be a clientSystemErrors = "client.system-errors"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -271,13 +272,13 @@ func (c *tchannelOutboundCall) finish(ctx context.Context, err error) {
errCause := tchannel.GetSystemErrorCode(errors.Cause(err))
scopeTags := map[string]string{scopeTagError: errCause.MetricsKey()}
ctx = WithScopeTags(ctx, scopeTags)
c.metrics.IncCounter(ctx, outboundCallsSystemErrors, 1)
c.metrics.IncCounter(ctx, clientErrors, 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be clientSystemErrors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@herainman herainman merged commit 19aa43d into master Oct 26, 2018
herainman added a commit that referenced this pull request Oct 27, 2018
herainman added a commit that referenced this pull request Oct 27, 2018
@herainman herainman deleted the alignnamespace branch October 27, 2018 03:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants