Add metrics to the Prometheus API #1623

Merged: 17 commits, Jan 21, 2021

shkwsk
Contributor

@shkwsk shkwsk commented May 9, 2020

Hello.

This patch adds latency and availability metrics for the Predict method of the gRPC API and exposes them through the Prometheus API.
The metrics are recorded both for the process as a whole and for each model.

The following metrics are added:

  • /tensorflow/serving/all_request_count: total number of requests handled by the process
  • /tensorflow/serving/all_request_fail_count: number of failed requests across the whole process
  • /tensorflow/serving/all_request_latency_usec: total request processing latency for the whole process, in microseconds
  • /tensorflow/serving/all_request_latency_histogram_usec: histogram of request processing latency for the whole process, in microseconds
  • /tensorflow/serving/model_request_count: number of requests per model
  • /tensorflow/serving/model_request_fail_count: number of failed requests per model
  • /tensorflow/serving/model_request_latency_usec: request latency per model, in microseconds
  • /tensorflow/serving/model_request_latency_histogram_usec: histogram of request latency per model, in microseconds
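
As a rough illustration of how the eight metrics relate (a sketch only, not the PR's C++ implementation): every request updates one process-wide scope and one per-model scope. The class name `RequestMetrics`, the helper `record_request`, the bucket bounds, and the choices to treat `latency_usec` as a running sum and to count latency for failed requests too are all assumptions made for this example.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical bucket upper bounds in microseconds; the PR's real buckets
# are not shown in this thread.
BUCKET_BOUNDS_USEC = [100, 1_000, 10_000, 100_000, 1_000_000]


class RequestMetrics:
    """Toy stand-in for one metric scope (whole process, or a single model)."""

    def __init__(self):
        self.request_count = 0
        self.request_fail_count = 0
        self.request_latency_usec = 0  # assumed to be a running sum
        # One slot per bound, plus a final overflow slot.
        self.latency_histogram = [0] * (len(BUCKET_BOUNDS_USEC) + 1)

    def record(self, latency_usec, ok):
        self.request_count += 1
        if not ok:
            self.request_fail_count += 1
        self.request_latency_usec += latency_usec
        # bisect_left finds the first bucket whose upper bound is >= latency;
        # values above every bound land in the overflow slot.
        self.latency_histogram[bisect_left(BUCKET_BOUNDS_USEC, latency_usec)] += 1


all_requests = RequestMetrics()           # the all_request_* metrics
per_model = defaultdict(RequestMetrics)   # the model_request_* metrics


def record_request(model_name, latency_usec, ok):
    """Update both the process-wide scope and the scope of the named model."""
    all_requests.record(latency_usec, ok)
    per_model[model_name].record(latency_usec, ok)


# Made-up sample traffic.
record_request("model_a", 250, ok=True)
record_request("model_a", 40_000, ok=False)
record_request("model_b", 2_000_000, ok=True)
```

After the three sample calls, `all_requests` holds the totals for every model, while `per_model["model_a"]` and `per_model["model_b"]` hold only their own share.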

qisikai
qisikai previously approved these changes May 10, 2020
@shkwsk
Contributor Author

shkwsk commented May 24, 2020

It has been two weeks now. Could you please review this?

@netfs netfs self-requested a review June 1, 2020 21:19
@Windfarer

Any update on this PR? Will it be merged? @qisikai @netfs

@shkwsk
Contributor Author

shkwsk commented Aug 13, 2020

@netfs Could you please check and merge this PR?

@netfs
Collaborator

netfs commented Nov 24, 2020

Please take a look at c317582, which adds runtime (latency) metrics for the predict/regress/classify methods (these are exported as Prometheus metrics if you enable Prometheus export).

That commit does not include latency stats for RPC/HTTP handling, and that's where your PR would be useful.
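
To illustrate the distinction drawn above, here is a minimal sketch in which one timer wraps only the model run and another wraps the whole handler, including request decoding and response encoding. The names `timed`, `run_model`, and `handle_request` are hypothetical and the payload format is invented for the example; none of this is TensorFlow Serving code.

```python
import time
from contextlib import contextmanager

# Accumulated elapsed time per label, in microseconds.
timings_usec = {}


@contextmanager
def timed(label):
    """Add the wall-clock time spent inside the `with` block to timings_usec."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_usec = int((time.perf_counter() - start) * 1e6)
        timings_usec[label] = timings_usec.get(label, 0) + elapsed_usec


def run_model(inputs):
    # Stand-in for the model run: this is the part a runtime metric would cover.
    with timed("runtime"):
        return [2 * v for v in inputs]


def handle_request(payload):
    # The handler timer also covers decoding and encoding around the model run.
    with timed("handler"):
        inputs = [float(v) for v in payload.split(",")]
        outputs = run_model(inputs)
        return ",".join(str(v) for v in outputs)


response = handle_request("1,2")
```

Because the handler interval contains the runtime interval, `timings_usec["handler"]` is never smaller than `timings_usec["runtime"]`; the difference is the handling overhead that a runtime-only metric would miss.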

@shkwsk
Contributor Author

shkwsk commented Dec 5, 2020

@netfs
Yes.
I want to measure the latency stats for RPC/HTTP handling.
I also want to measure the availability of the model.

This PR makes it possible to measure the availability and latency that c317582 does not cover.
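
One common way to turn the request and failure counters into an availability figure is sketched below, assuming availability means the fraction of requests that did not fail over a window between two counter readings. The function name and the sample numbers are hypothetical; the thread does not define the formula.

```python
def availability_between(earlier, later):
    """Fraction of non-failed requests between two (request_count, fail_count) readings.

    Both readings are cumulative counters, so the window is their difference.
    Returns None when the window holds no requests (or a counter was reset).
    """
    requests = later[0] - earlier[0]
    failures = later[1] - earlier[1]
    if requests <= 0:
        return None
    return 1.0 - failures / requests


# Hypothetical readings of (model_request_count, model_request_fail_count).
window = availability_between((1000, 10), (1500, 15))
print(f"{window:.2%}")
```

Working from differences rather than raw totals keeps old failures from dragging down the figure for the current window.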

tensorflow_serving/servables/tensorflow/util.cc: 8 review threads (outdated, resolved)
Conflicts:
	tensorflow_serving/model_servers/http_rest_api_handler.cc
	tensorflow_serving/servables/tensorflow/util.cc
	tensorflow_serving/servables/tensorflow/util.h
@shkwsk
Contributor Author

shkwsk commented Jan 4, 2021

@netfs I fixed it.

Collaborator

@netfs netfs left a comment

Thanks for the recent changes. A few more suggestions.

tensorflow_serving/model_servers/http_server.cc: 1 review thread (outdated, resolved)
tensorflow_serving/servables/tensorflow/util.cc: 2 review threads (outdated, resolved)
@shkwsk
Contributor Author

shkwsk commented Jan 10, 2021

@netfs I fixed it.

Collaborator

@netfs netfs left a comment

Thanks for the change, and for being patient and addressing all my comments!

@netfs
Collaborator

netfs commented Jan 20, 2021

Will merge shortly.

@shkwsk
Contributor Author

shkwsk commented Jan 20, 2021

Sorry, I fixed the test. (I don't know the command to run the unit tests in my development environment.)

@tensorflow-copybara tensorflow-copybara merged commit dedca4c into tensorflow:master Jan 21, 2021
7 participants